Datum. Singular. Data. Plural.
Reinaldo Lepsch Neto
Experienced Data & Analytics Professional | Proud father | 50+
Latin-derived languages treat the word “data” as something that is “given”: a number that is there, in the very place where a certain phenomenon has happened, and that can be measured and accumulated into a set of measures. Given measures. Data sets.
You can measure temperature or rainfall using sensors in five buildings: four of them located at the four extreme geographic points (north, south, east, west) and the fifth at a central point downtown. Do it for a time interval, say one year, with recordings happening hourly.
One year means 365 days * 24 hours * 5 buildings = 43,800 recordings, or records, or rows. Each row can have n fields, each one containing some recorded piece of the measurements. Temperature. Rainfall. Wind speed. Concentration of a certain chemical element. And each field has a type, or the domain over which it varies: date, number (float, integer), string, categorical. Each one has a size, especially when you consider the storage space it occupies (for example, a name with 30 characters).
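The arithmetic and the idea of typed fields can be sketched in a few lines of Python. This is only an illustration: the `Reading` class, its field names, the units and the 2023 start date are assumptions made for the example, not part of any real data set.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

DAYS = 365
HOURS_PER_DAY = 24
BUILDINGS = ["north", "south", "east", "west", "central"]


@dataclass
class Reading:
    """One row. Field names and units are hypothetical."""
    building: str          # categorical: one of BUILDINGS
    recorded_at: datetime  # date
    temperature_c: float   # number (float)
    rainfall_mm: float     # number (float)


# The count from the text: days * hours * buildings.
expected_rows = DAYS * HOURS_PER_DAY * len(BUILDINGS)

# The same count, reached by enumerating every (building, hour) slot.
start = datetime(2023, 1, 1)  # assumed start of a non-leap year
timestamps = [start + timedelta(hours=h) for h in range(DAYS * HOURS_PER_DAY)]
slots = [(building, ts) for building in BUILDINGS for ts in timestamps]

print(expected_rows, len(slots))
```

Both numbers agree at 43,800, one row per building per hour.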
See what happened? In the second paragraph we were at the top of a building, describing what would be measured and with which tools. In the third we were in the world of numbers. Abstraction. Numbers that could be measures of the real world or components of an abstract model in some field of pure or applied mathematics.
They were given to us by the real-world phenomena, and we measured them with tools, then recorded the measurements with other tools. Pen and paper, special recording devices, or perhaps the original sensors were connected to us at the research centre through the internet or some private network.
Did you say research?
Yes, at the research centre, a university or a corporation, they arrive. Bunches of data. Scannable paper sheets, electronic worksheets, other data recorded in their respective file formats. We are no longer talking about numbers, but about how we are going to read them and import them into an automated system that will process them.
You have probably already heard the expression “number crunching”. “Data ingestion”. Data input, data reading, whatever. The data are here. What do they mean? We have no idea.
We must first open the system’s mouth so it can swallow them. Once in the beast’s belly, they must be… um… well, now there are lots of words. Treated. Handled. Formatted. Disciplined. Imputed, filled. And the list goes on.
The just-arrived data are rough stones. Uncut gems. They contain value, high value. Pure oil. All of it hidden under stones of records and more records with missing values. Dates with different formats. Numeric fields containing letters. Missing names or addresses. And the worst of all: distorted data. Doubled or tripled values. Negatives where there should only be positives.
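Some of those problems can be detected mechanically. Here is a minimal sketch using only free stuff, the Python standard library. Everything in it is an assumption made for illustration: the two date formats, the field names, the rule that rainfall cannot be negative, and the five made-up rows.

```python
from datetime import datetime

# Assumed date formats; a real data set may use others.
DATE_FORMATS = ("%Y-%m-%d %H:%M", "%d/%m/%Y %H:%M")


def parse_date(text):
    """Return a datetime, or None if no assumed format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
    return None


def parse_number(text):
    """Return a float, or None if the field is missing or has letters."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


def find_problems(rows):
    """List (row index, problem) pairs without changing the data."""
    problems = []
    seen = set()
    for i, row in enumerate(rows):
        when = parse_date(row.get("recorded_at") or "")
        rain = parse_number(row.get("rainfall_mm"))
        if when is None:
            problems.append((i, "unreadable date"))
        if rain is None:
            problems.append((i, "missing or non-numeric rainfall"))
        elif rain < 0:
            problems.append((i, "negative rainfall"))
        # Same building and same hour seen before: a doubled record.
        key = (row.get("building"), when)
        if when is not None and key in seen:
            problems.append((i, "duplicate record"))
        seen.add(key)
    return problems


# Made-up rows, one per kind of defect mentioned in the text.
raw_rows = [
    {"building": "north", "recorded_at": "2023-01-01 00:00", "rainfall_mm": "1.2"},
    {"building": "north", "recorded_at": "01/01/2023 00:00", "rainfall_mm": "1.2"},
    {"building": "south", "recorded_at": "2023-01-01 01:00", "rainfall_mm": "12a"},
    {"building": "east", "recorded_at": "", "rainfall_mm": "-3"},
    {"building": "west", "recorded_at": "2023-01-01 02:00", "rainfall_mm": None},
]

problems = find_problems(raw_rows)
for index, problem in problems:
    print(index, problem)
```

The sketch only reports what it finds. Deciding whether to drop, fix or impute each flagged row is still the human part of the job.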
So begins the part of Data Science that takes most of the professional’s time. It causes headaches. It brings no salary raise. You will not be promoted to manager when you finish it. There are lots of tools that make it much easier, but some are not trustworthy, and others are great but expensive. And the boss probably told you to go totally vanilla: use only free stuff, we have no money now. And document each step. He doesn’t need to finish the sentence with “please”. This is when data scientists go to the cafeteria for a strong espresso. Legend says some of them even add something stronger to it. Whatever. Dirty work. But someone must do it.
Well, well. It took you three whole days. Congratulations, you were one of the lucky ones: some poor unfortunate souls take so long that the data lose their validity. You used all the techniques you learned in your online courses and books and found the Greek statue inside that ordinary stone. Champion!
Easy, my friend. Not yet. What do you know about the metaphorical statue you have just brought to life? First you had a dirty and ugly database. Now you have a clean and beautiful one. But you still have no information at all. You must extract it and, still metaphorically, say which god or general or Greek philosopher this is. You have a bunch of numbers, presumably correct and clean, but what do they say to you? If they are temperature measurements, can you already tell, based on them alone, whether you’ll be able to go to the beach tomorrow? Or if they are measurements of the mechanical resistance of a newly built bridge, can you tell whether it is safe enough to be inaugurated and opened to heavy traffic? Did you seriously think you would be called a scientist without performing ultra-high-responsibility tasks like these?
Time to get a new toolbox and resume your work. Amazing things ahead.
Then you gotta go build models. Check for clusters and trends. Produce charts and graphs, plotting this and that. Are you sure you’re ready?
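As a taste of what checking a trend can mean, here is a minimal sketch that fits a straight line to equally spaced readings by ordinary least squares. The temperature values are made up for illustration, and a straight line is just one simple choice of model among many.

```python
def linear_trend(values):
    """Least-squares slope and intercept for equally spaced readings."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two readings to fit a trend")
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up hourly temperatures, in degrees Celsius.
temps = [20.0, 20.5, 21.0, 21.5, 22.0]
slope, intercept = linear_trend(temps)
print(f"trend: {slope:+.2f} degrees per hour, starting near {intercept:.1f}")
```

A positive slope says the readings are rising over the interval. It says nothing yet about whether tomorrow is a beach day.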
You can now use whatever business-domain tools you have at hand, but all of them must work from your data, turned into information, to produce knowledge. And knowledge is enough to support decisions.
But… Is it, really? What about wisdom, and how you use what you have just learned from the data? Take it easy, again.
“All I know is that I know nothing…sort of who’s paying the bill”