DATA QUALITY - STRUCTURED AND TEXTUAL

By W H Inmon

The issues of data quality have been around since the first program was written. And if there is one concept that overshadows all others in data quality it is GIGO: garbage in, garbage out. Furthermore, new advances in technology rely on the data they operate on being “clean”: complete, up to date and accurate. And of course the data is not clean, complete and accurate.

You would think that getting data quality right would be high on the IT organization’s list of priorities. But strangely, it isn’t. Data quality is relegated to the back bench, to be addressed when there is time, and there is never time. Something more pressing always comes along and cuts in front of data quality.

Data quality is based in no small part on the work of Larry English, the original pioneer of data quality. Larry did his work in the day and age of structured data. The English approach always made the assumption that if you found incorrect data you should simply correct it.

Of course, there was a lot more to managing data quality in the structured environment than the correction of faulty data. But the thought that data should be corrected if found to be incorrect lay at the heart of managing data quality in the structured world.

Today, the world is different from what it was in Larry English’s day.

Today, in addition to having structured data, we also have textual data. And with textual data comes a whole different understanding of the meaning of data quality. In many ways textual data quality is diametrically opposed to structured data quality.

The first difference in the meaning of textual data quality concerns the notion that data should be corrected if found to be faulty. In the world of text, you absolutely do not correct text, even if it is incorrect. If the author of the text says that 1 + 1 = 5, then that is what you work with in the world of text, even if the proposition is flawed. Suppose that I write on a bank loan application that I make $1,000,000 a year. When I wrote that number I forgot the decimal point, and I actually make $10,000.00 a year. I have written the data incorrectly on my loan application. So what happens to the incorrect data? Absolutely nothing. The bank is obligated by law not to change anything on my request for a loan, even if it is incorrect. Stated differently, you are breaking the law if you correct someone else’s bank loan application.

Textual data must be managed and manipulated as recorded, regardless of the correctness of the text.
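The contrast between the two worlds can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `TextRecord`, `AnnotatedText` and `correct_structured` are hypothetical and do not come from any particular product. The structured row is corrected by replacing the bad value, while the text is held in a frozen record and any doubts about it are kept in separate notes.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TextRecord:
    """Text exactly as the author recorded it (hypothetical type).

    The dataclass is frozen, so the text cannot be edited in place.
    """
    source: str
    text: str


@dataclass
class AnnotatedText:
    """Pairs an untouched record with separate quality notes (hypothetical type)."""
    record: TextRecord
    notes: list[str] = field(default_factory=list)

    def flag(self, note: str) -> None:
        # Record a suspected problem without altering the original text.
        self.notes.append(note)


def correct_structured(row: dict, column: str, value) -> dict:
    """Structured-world approach: return a copy of the row with the bad value fixed."""
    fixed = dict(row)
    fixed[column] = value
    return fixed


# Textual world: keep the statement as written and annotate it.
application = AnnotatedText(
    TextRecord(source="loan_application", text="I make $1,000,000 a year.")
)
application.flag("Stated income looks implausible; verify with the applicant.")

# Structured world: the incorrect value is simply replaced.
row = correct_structured(
    {"customer_id": 17, "annual_income": 1_000_000}, "annual_income", 10_000
)
```

After this runs, `application.record.text` still holds the sentence as written, the doubt about it lives in `application.notes`, and only the structured `row` carries the corrected figure.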

But the differences between text and structured data do not end there. Another difference relating to data quality is the need for context to accompany text. Context is needed to assign meaning to the text that has been encountered. In structured data, there is always metadata, hidden or in plain sight, that supplies context. That metadata includes elements such as table name, key name, attribute name and so forth. But with text, context is implicit, not explicit. You have to have context in order to make sense of text.

For example – what does the word “fire” mean? Without context fire can mean many things. Fire may be a conflagration. Fire may be the firing of a gun. Fire may be a dismissal of an employee at work. Without context you don’t know what fire means.
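As a rough illustration of how surrounding words can supply that context, the sketch below picks a sense for “fire” by counting cue words in the sentence. The cue lists and the function name `sense_of_fire` are assumptions made up for this example; a real textual-analytics system would rely on a far richer taxonomy or a trained model.

```python
import re

# Hypothetical cue words for each sense of "fire" (illustrative only).
SENSE_CUES = {
    "conflagration": {"smoke", "burned", "flames", "building", "extinguished"},
    "gunfire": {"gun", "rifle", "shots", "weapon", "trigger"},
    "dismissal": {"employee", "manager", "job", "hr", "performance"},
}


def sense_of_fire(sentence: str) -> str:
    """Return the sense whose cue words overlap most with the sentence.

    Returns "unknown" when no cue word is present, i.e. when there is
    no context to work with.
    """
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    scores = {sense: len(cues & words) for sense, cues in SENSE_CUES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


print(sense_of_fire("The manager decided to fire the employee."))
print(sense_of_fire("The fire spread through the building and smoke filled the street."))
print(sense_of_fire("Fire!"))
```

With these cue lists, the first sentence resolves to the dismissal sense and the second to the conflagration sense, while the bare word “Fire!” stays unknown because nothing around it supplies context.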

And there are many other essential differences between the structured world and the textual world, all of which are relevant to the processing that is done in the different environments and all of which relate to the idea of data quality.

Data quality is vitally important in both the structured and the textual environment. But the implementation and the manifestation of quality, the very meaning of data quality, is very different in the two environments.

Bill Inmon lives in Denver with his wife and his two Scotty dogs – Jeb and Lena. Bill wakes Jeb and Lena up every morning. Jeb always greets Bill with his morning howl. If you didn’t know better you would think that Jeb was dying. But that is just his way of waking up.

I was a friend and a peer of Larry English. I spoke at many conferences with Larry over the years. Larry was passionate about his contribution to our profession and he was truly the pioneer of data quality. In addition, Larry was a truly wonderful human being. We all remember and miss Larry.
