Art of Data: Into the (un)known
Mehrzad Karami
AI Advisor | AI Strategist | CTO | On the mission to help businesses harness the potential of AI to solve real-world challenges, foster innovation, and build lasting solutions
My journey into Scientific Data
It was only a few months ago that I took a deep dive into the world of science. Little did I know that this would change everything: the way I understand science, the inner workings of scientific research, and the large number of challenges and uncertainties that scientists have to deal with every day. All of this directed me to one aspect that kept coming back: data is one of the main cornerstones of research.
As I continued on my path to understand how scientific research and scientific data work, it became more and more apparent that a vast amount of work still needs to be done before we can talk about a proper data-friendly global infrastructure in science. Even though a tremendous amount of data is generated and used every day, data infrastructure is still seen as one of the biggest pain points in research. In the eyes of an enthusiastic data apprentice and equally motivated problem solver, it seemed to me that data was seen by many as a by-product. You need it in order to prove your hypothesis, but once you are past that stage and, hopefully, finished with the publication, it more or less loses its value to its owner (you have served me well, now please go away and make room for something else). You might as well leave it somewhere on a disk, either to be cleaned away by a network admin to make space for the next research project, or to stay on your local laptop and become part of the forgotten space, contributing to the 'data black hole'.
But wait! There is more to Data
If you are a data scientist, or you deal with data every day, you have heard the statement: in the new economy, data is the new oil! Nowadays, every technology-oriented company, big or small, depends heavily on data for its daily business. If you are big enough as a company, you have your own dedicated team or department to manage data; if you are too small for that, you hire one or two data scientists or outsource the work to a specialized data science company. In all of this, the sole purpose is to create more value and profit out of the data you have.
Even though the purpose of science, especially in an academic context, is not profit, I would argue that generated data is worth much, much more than meets the eye, even long after the research is completed and the publication is done. Generated (raw) data can usually be used for various types of analysis beyond the original hypothesis it was generated for. Raw data sets can be recombined and restructured to create richer data collections, which can then be used by scientists working on different hypotheses. Getting into the details of this subject would require another long article by itself, but you could say that these raw data sets are full of hidden gems. One needs to shine a light on them, and then things get more interesting.
Where are we heading
Every day, more and more data is being generated. According to research, within a couple of years we will face a situation where the amount of data doubles every 12 hours. Can you imagine how we, as humans and with our tools, can cope with such a large amount of data on a daily basis? Will this overflow of data help us advance, or will we be flooded by this abundance of data and contribute to the expansion of 'data black holes'? To handle such exponential growth of data, we need new tools, new ideas, and a new paradigm in data processing.
A gradual wave of change is already taking place in how we deal with our data. This change is not limited to science; it is even more apparent in market niches and businesses, as well as in policy and governmental institutions. What follows is a short summary of how the handling of data is changing:
- Political and legal. Policy makers and governments are becoming more and more aware of the importance of data and, at the same time, of the dangers and perils of having no policy on data. Compared to 2018, 2019 saw an increase of 33% in data leaks and breaches; a staggering 7.9 billion data records were exposed, and by the end of 2019 this number was estimated to reach 8.5 billion data records (Risk Based Security Q3 2019 Data Breach QuickView Report). The scandals of recent years over the misuse of personal data (did I mention Cambridge Analytica?) have been a clear sign to policymakers around the globe that something needs to be done. Major steps have been taken in some parts of the world, although there is still a long way to go before we can call the current political systems data-aware.
- Society and common awareness. The new generation, Gen Z, is much more data-aware and thinks very differently about the use of data. Whether this awareness has been raised by the series of events and scandals mentioned above is debatable. Whatever the reason, Gen Z is more concerned than previous generations about who gathers their data and what happens to it. And this awareness, even though still very basic, is creating a wave of change in how the market handles data and in how data-hungry organizations deal with data and user consent.
- Technology shift. The staggering pace at which data is being generated (remember, it will double every 12 hours a couple of years from now) is pushing technology to keep up. And technology is adapting quite rapidly in order to handle this magnitude of data. The change can be seen in how we keep reinventing technology to store, structure, clean, and analyse data. One thing is obvious: data is changing the technology landscape, and it is very likely that this trend will continue to reshape innovation in both hardware and software. The change is not unidirectional; data and technology are pushing each other forward to adapt and advance, more like a symbiotic coexistence.
- Cultural and ethical. Despite the aforementioned challenges in data privacy and security, the sharing culture is growing every day. While keeping personal diaries 'personal' was the norm among previous generations, the average member of Gen Z does not see any use in keeping a personal log if you cannot share it with others. After all, why would you write something, even something personal to some extent, if you are not going to publish it? The culture of sharing information and data is ever growing and making more and more sense. And this 'making sense' is bringing a cultural adaptation with it, along with a demand for the free flow of information, or at least for a technological society that supports the free flow of information.
What can we (un)learn
As Niels Bohr used to say (some say the quote originated with Robert Storm Petersen, or Mark Twain; nobody knows, since we did not keep any data records to validate its origin!), 'Prediction is very difficult, especially about the future'. And when it comes to making predictions about technology and progress, there have been enough predictions that turned out to be completely bogus. So let's not add one more to the list! Looking at the current state and advancements, there are some noticeable lessons we can focus on and keep applying in order to get ahead. For one thing, we need to solve the privacy issue by putting the rightful owners of data at the center of whatever solution we are offering. Taking data and using it without the owner's consent and awareness is no longer acceptable. And we need to give something back for using data. Data is already a commodity, and it has become an important one in the new data economy.
Last but not least, there is much more value in data than meets the eye. Data sets can be recombined, restructured, and applied to create richer data collections, which can then be used to test and validate more ideas and scientific hypotheses. To reach these goals, data needs to be better kept, clearly defined, and structured so that its usefulness can be verified quickly and easily.
Future is data-centric
The world of data is changing. As this progress continues, given the accelerating speed of change and discovery, we will see a completely different world within a decade. For any institution to succeed in this fast-changing world, the focus needs to be on the following guidelines:
- They should be data-driven.
- They should be forward-learning and forward-thinking.
- They should focus on measurable results.
- They should focus on privacy, collaboration and data sharing.
- They should think of data as the fuel in transforming 'Machine Learning' into 'Machine Thinking' technology.
Data is becoming an integral part of how our economic machine and our society work. And as the world makes progress towards a better future, more and more companies and business models will be open to new paradigms and innovative approaches in product development based on data and the creators of data.
The great Irish writer George Bernard Shaw tells the story of Joan of Arc in one of his masterworks. There, he imagines a conversation between Joan of Arc and King Charles, in which the king complains to Joan:
King: Why don't the voices come to me? I am the king, not you!
Joan: They do come to you; but you do not hear them. You have not sat in the field in the evening listening for them. When the Angelus rings, you cross yourself and have done with it; but if you prayed from your heart and listened to the thrilling of the bells in the air after they stop ringing, you would hear the voices as well as I do.
I would argue that science, pushing us forward every day toward a better understanding of the world and toward new discoveries, is the one who stays awake and listens to the mysteries of life long after the bells have stopped ringing. No matter how many challenges are ahead of us, it is the passion and curiosity to understand and unravel these mysteries that pave the path to a better life and to progress. And it is our obligation to provide the tools and means that help science in its everyday challenges. Let's keep listening to the change around us and stay open to new paradigms.
Passionate Software Engineer with a focus on users and quality | Tech Lead at Picnic
5 years ago: The fact that businesses are already heavily relying on data, but academia lags behind is an interesting observation. I wonder why data is not seen as a first-class citizen by all scientists.
Client Happiness Manager – PioGroup Software
5 years ago: This is definitely a very interesting read! Thank you :) Looking forward to learning more about data and its potential as we go into the new decade!