Nigerian Businesses Need Data Science. Here's Why. . .
Jesse Johnston
African Digital Creator, Content Marketer & Writer For Hire | ForbesBLK Member
Introduction
Hi there.
If you're reading the intro to this article then, in addition to feeling like I definitely nailed it with the title, I must assume one of two things. Either you're an innovative and/or competitive (no shame in that!) business owner in search of a way to overhaul your business's customer satisfaction and profit margins without breaking the bank, or you're someone who, not unlike me, has a healthy practical and academic interest in the application of data science to the Nigerian business (but also everything else) scene, if for nothing else than to finally bring an end to the pure inefficiency and poorly informed (therefore bad) decision making that apparently pervades whatever structured institutions exist in our oh-so-beloved country. Whatever your specific reasons for being drawn to this post, I'm glad to inform you that this is a great first step on a journey full of mysteries, revelations and guaranteed splendor (all in a manner of speaking, of course). That being said, I hope the next couple of lines will suffice to stoke your thirst for adventure. . .
About a year ago, not long after the onset of the pandemic-induced lockdowns (yes, the very same ones that left Nigeria, the rest of Africa and, most devastatingly, if I may sound a little selfish, my academic career, job security and ability to earn in a discombobulated state), I, like so many others around the world, was left with way too much time on my hands and little more to do than stress-eat, compulsively wash and sanitize my hands and, more reluctantly, try to catch up on some reading. I had been procrastinating on that reading for so long that I must have altogether gotten numb to the all-too-familiar feeling of almost crushing guilt that usually emerges in response to any of my neglected self-development goals. Anyway, the reading (which I am awfully tempted to discuss in a separate article) worked wonders for both my mental and emotional states at the time, but most importantly it led to a highly self-motivated foray into acquiring a set of skills that would keep me "relevant" in the ever turbulent and dynamic Nigerian job market, especially in the event of another disaster on the scale of the current pandemic. It was on said foray that I happened to come across the field and concept of data and data science. Needless to say, I got hooked.
About 14 years ago, Clive Humby, a British mathematician and data scientist, was reported to have made the claim that "Data is the new oil", a statement which, in today's world, has turned out to be a lot truer than he could ever have imagined. Data obsession has become pretty mainstream lately, with reports of sharp increases in demand for data specialists, especially now as businesses and corporations push to implement their digital revolution strategies and become data-driven organizations. Unlike oil however, data itself costs nothing and is utterly worthless until someone makes sense of it.
This article, which I intend to split into two (or more?) bite-sized parts for maximum digestibility and theatrical effect, essentially contains the (proverbial but also almost literal) fruits of my almost year-long toil down a self-engineered learning path toward data science, along with certain self-developed theories, personal notions and beliefs about why Nigeria and its businesses are in dire need of data science. Because I cannot assume that you, dear reader, have had the same considerable amount of time and self-immersion in the different realms, disciplines and factors surrounding data science that I have, this first article is intended to give as basic an introduction to them as I could conceive. Alright then, here goes nothing. . .
What even really is "Data"?
Anyone who had the privilege (or lack thereof) of experiencing an introductory education in computer studies in Nigeria, or probably anywhere even remotely West African, can probably still remember the all-time classic definition of data proposed in the classroom: data is simply raw information, and vice versa. While it is neither my place nor intention to question or dispute the academic knowledge passed down and spread by countless generations of West African elementary school teachers, knowledge whose origins could very well date back past the birth of the older gods (please excuse Lovecraftian references and metaphors wherever spotted!), I do feel the need to say that there is definitely a little more to the general idea and concept of data than the half-a-dozen words or so that this definition would have us believe, though we must assume that it meant well.
For us to truly grasp and understand the concept of data, we must begin to view it as just that: a concept.
In today's world more than ever, data is ubiquitous (abundant, global, everywhere) and pervasive (inescapable, prevalent, persistent). From birth to death, humans generate and consume data. The data trail, as I like to call it, pretty much starts at the very beginning of every human life with the birth certificate and continues all the way to a death certificate (and beyond!). In between, each individual produces and consumes enormous amounts of data.
Data is not only ubiquitous and pervasive; it is also essentially the lifeblood of all organizations that survive and prosper over time. Imagine trying to operate a business without clear knowledge or detailed records of who your customers are, what products you are selling, who works for you, who owes you money, and to whom you owe money. All businesses have to keep at least this type of data (and much more in most cases). Just as important, they must have that data available to decision makers (executives, administrators and managers) when necessary. It could even be argued that the ultimate purpose of all business information systems is to help businesses use data as an organizational resource. At the heart of all major and successful organizational systems are the collection, storage, aggregation, manipulation, dissemination, and management of data.
So, needless to say at this point, data is so much more than just the means of getting an internet connection on our various mobile devices. That isn't to imply, however, that data is in any way monotonous. Quite the opposite, dear reader: data is vast, variable and exists in several different types and formats, the most popular and recurrent of which include:
BIG DATA
An industry poster child, big data is essentially very large and diverse data sets that include structured, semi-structured and unstructured data from different sources and in different volumes which could range from terabytes to zettabytes. It consists of data sets so large and diverse that it's difficult—more like near impossible—to capture, manage and process them with traditional data management tools.
Stock exchanges, social media sites such as Facebook, and airlines are all examples of establishments and industries responsible for massive big data generation: the NYSE is said to generate upwards of one terabyte of trade data daily, Facebook users over 20 terabytes hourly, and a single jet engine over 10 terabytes for every half-hour of flight time.
STRUCTURED, SEMI-STRUCTURED AND UNSTRUCTURED DATA
All data has some sort of structure. In fact, when it really comes down to it, the only underlying difference between structured and unstructured data is whether or not said data possesses a pre-defined model or a uniform pre-defined format. In the past, data was typically stored in the tabular row-and-column format of relational databases. However, the advance of modern web, mobile, social, AI and IoT apps, coupled with new-age object-oriented programming, has broken that paradigm.
Any data that can be stored, accessed and processed in a single, fixed format is termed 'structured' data. Unstructured data, on the other hand, refers to data with an unknown form or a variety of formats. Typical examples of unstructured data are heterogeneous data sources containing a combination of simple text files, images, videos etc. And as you might have guessed by now, semi-structured data can contain both of the previously mentioned data forms.
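To make the three categories a bit more tangible, here is a minimal Python sketch. The customer records, field names and the complaint text are all made up for illustration; the point is only to show how the same kind of business information looks when it is structured (fixed columns), semi-structured (self-describing, but fields vary) and unstructured (free text).

```python
import csv
import io
import json

# Structured: every record follows the same fixed set of columns (hypothetical data).
csv_text = """customer_id,name,city,amount_ngn
101,Ada,Lagos,4500
102,Bola,Abuja,12000
"""
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Semi-structured: each record describes itself, but the fields can differ
# from one record to the next, and some fields hold free text.
json_text = """
[
  {"customer_id": 101, "name": "Ada", "tags": ["repeat"]},
  {"customer_id": 102, "name": "Bola", "review": {"stars": 2, "text": "Delivery was late"}}
]
"""
records = json.loads(json_text)

# Unstructured: plain free text with no predefined fields at all.
note = "Bola called on Tuesday to complain that her delivery arrived late."

# Structured rows all share one column layout; semi-structured records do not.
column_layouts = {tuple(row.keys()) for row in rows}
record_layouts = {tuple(sorted(rec.keys())) for rec in records}

print("distinct layouts in the CSV rows:", len(column_layouts))
print("distinct layouts in the JSON records:", len(record_layouts))
print("words in the free-text note:", len(note.split()))
```

The CSV rows can be dropped straight into a table, the JSON records need a little inspection first, and the note needs actual interpretation before it is of any use.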
DATA STORAGE
Efficient data management typically requires the use of a database. A database, in the most fundamental of terms, is any structured collection of stored data. With the development, introduction and recognized efficiency of computer databases, the Database Management System (DBMS) also became a standard. A DBMS serves as the go-between for end users and the database. The database structure itself is stored as a collection of files, and the only way to access the data in those files is through the DBMS. The DBMS presents the end user (or application program) with a single, integrated view of the data in the database. Having a DBMS between the end user’s applications and the database offers some important advantages. First, the DBMS enables the data in the database to be shared among multiple applications or users. Second, the DBMS integrates the many different users’ views of the data into a single all-encompassing data repository.
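As a small, hedged illustration of the "go-between" idea, the sketch below uses SQLite (through Python's built-in sqlite3 module) as the DBMS. The table, the customers and the two "applications" (sales_view and accounts_view) are hypothetical names invented for this example. Both functions read the same stored data, but each one only ever talks to the DBMS, never to the underlying files.

```python
import sqlite3

# SQLite plays the role of the DBMS; ":memory:" keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, balance_owed INTEGER)"
)
conn.executemany(
    "INSERT INTO customers (name, balance_owed) VALUES (?, ?)",
    [("Ada", 0), ("Bola", 15000), ("Chidi", 4000)],  # made-up customers
)
conn.commit()


def sales_view(db):
    """What a hypothetical sales app asks for: customer names only."""
    return [row[0] for row in db.execute("SELECT name FROM customers ORDER BY name")]


def accounts_view(db):
    """What a hypothetical accounts app asks for: who owes money, and how much."""
    query = (
        "SELECT name, balance_owed FROM customers "
        "WHERE balance_owed > 0 ORDER BY name"
    )
    return db.execute(query).fetchall()


names = sales_view(conn)
debtors = accounts_view(conn)
print(names)
print(debtors)
```

Two different views, one shared repository: that is the sharing and integration advantage described above, in miniature.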
Data Growth
Now, with our new-found familiarity with the different aspects of data acting as a backdrop in our minds, consider this:
According to a study beginning in 2012 that was conducted by IDC's Digital Universe and sponsored by EMC:
Since the dawn of time, up until the year of our lord 2005, the total recorded amount of data generated by all of mankind equaled about 130 exabytes (130 million terabytes). Keep in mind that this essentially consisted of all of humanity's recorded endeavors and creative activities up until that time, including every book or document that was ever produced and all of the text contained therein, every song (any sound, in fact) that was ever recorded, every film of any length or format ever recorded, and even supposedly every word that was ever spoken. I. Mean. Everything.
Now, fast forward about five years to the year 2010, and it was recorded that the previously stated amount of approximately 130 exabytes had increased, rather tremendously if I might add, to about 1,200 exabytes (or 1.2 zettabytes), an almost tenfold increase.
Fast forward still to the year 2015, another five years down the road, and studies indicated another phenomenal spike, resulting in an astonishing 7,900 exabytes (or 7.9 zettabytes) of recorded data.
By this point, dear reader, one would hope that you have begun to grasp the point I am trying to make about how rapid, indiscriminate (and dare I say exponential?) the growth of human-generated data is and, even above that, why I declare that ignorance of said data, or of how huge a role it plays in the effective running of our modern world, would be little short of utter folly (and yeah, I just used the word "folly").
But it doesn't even stop there. The study went on to report projections calculated with the already observed and massive expansion rates factored in. Those projections indicated another jaw-dropping, roughly fivefold increase by the end of the year 2020, which of course would mean a jump from the 7,900-or-so exabytes realized in 2015 to about 40,900 exabytes (or 40.9 zettabytes) just five years later. That estimate was more than likely surpassed, thanks to the unprecedented prevalence of virtual and digital means of interaction, transaction and communication that emerged as the overwhelming norm, largely due to the quarantine and lockdown procedures enforced worldwide in a not-so-successful effort to limit and contain the spread of the devastating and ever-looming COVID-19 pandemic.
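For anyone who would like to check the arithmetic rather than take my word for it, here is a minimal Python sketch. It uses only the four figures quoted above (with 2020 being a projection, not a measurement) and assumes decimal units, where 1 exabyte equals 1,000,000 terabytes. For each five-year interval it works out the overall multiplication factor and the equivalent compound annual growth rate.

```python
# Figures as quoted in the text, in exabytes; the 2020 value was a projection.
data_eb = {2005: 130, 2010: 1200, 2015: 7900, 2020: 40900}

TB_PER_EB = 1_000_000  # decimal units: 1 exabyte = 1,000,000 terabytes

years = sorted(data_eb)
growth = {}
for start, end in zip(years, years[1:]):
    # How many times larger the total became over the interval.
    factor = data_eb[end] / data_eb[start]
    # Compound annual growth rate: the steady yearly rate that would
    # produce the same overall factor across the interval.
    cagr = factor ** (1 / (end - start)) - 1
    growth[(start, end)] = (factor, cagr)

for (start, end), (factor, cagr) in growth.items():
    print(f"{start}-{end}: x{factor:.1f} overall, about {cagr:.0%} per year")

print(f"130 EB = {130 * TB_PER_EB:,} TB")
```

Running it is a quick way to see how the multiplication factor changes from one five-year interval to the next, even while the absolute amount added keeps climbing.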
Alright, so as detailed and quantitative as the facts, figures and images cited above might seem, they probably still leave the individual and/or business owner with little or no prior exposure to or interest in global data growth or statistics wondering: "Why should I give a damn?" Well, my response to said individual would be: "Because this means that we, mankind, are essentially sitting on an ever-expanding goldmine of user data and consumer information, most of which is freely accessible and can be easily gathered in various forms of data files by anyone willing to begin exploiting it." Okay, maybe not the most articulate or polished of responses, but my response nonetheless.
As a matter of fact, considering the previously explored research results, it wouldn't be so far-fetched to consider the possibility of data being the only truly inexhaustible resource currently at mankind's disposal, bound to keep increasing in both volume and variety long after all other commonly sought-after resources and commodities are depleted... Alright, so maybe that's a little too far-fetched for immediate credence, but I honestly do believe that the potential and possibilities are limitless. I mean, we already live in a time where outrageous amounts of value are attached to decentralized and untraceable non-legal-tender (ahem... crypto-) 'currencies' on the internet, whose values are solely determined and driven by online speculators and which, to top it all off, are theoretically immune to any sort of tracking or regulation from governments or any other central authorities. Next to that reality, my ideas might not be so far-fetched after all.
But, my personal grievances and wariness of cryptocurrencies aside, given data's uncharted and indiscriminate exponential growth, the perspicacious mind could just as easily think up several potentially perilous scenarios concerning data as well, an inability to keep up with or even maintain current data storage rates and standards being the very least of our worries. While there might be much more to explore down that dark and tainted path, for the sake of my (self-imposed) mission to introduce us to the wonders and benefits of data and data science, I'd very much rather steer us down the path that leads to the rainbow with all the wonderful colors and proverbial pots of gold.
Enter Data Science
I'm pretty sure that to anyone who isn't an obsessive fanboy like me (or my buddy, Patrick) the term 'data science' would most likely be met with vague perplexity, or something so close that it makes no difference. On the other hand though, I would like to think that anyone who has faithfully stuck with me on our journey thus far, past all of the hurdles, detours and segues, should already have begun to form some rudimentary ideas as to what data science is and what a data scientist might do. But fortunately for us, we needn't worry or overly strain our adventurous but understandably road-weary heads because, just like when I first came across the very concept of data science in all its glory, here comes Google to the rescue. . .
Kinda.
Upon first entering the question "What is data science?" into our beloved search engine what feels like a thousand years ago, I was immediately (that is, of course, after scrolling past the countless promoted ads promising to "turn you into a data science professional in as much time as it takes to have a driver's license made") presented with an entry from Wikipedia as one of the foremost search results.
The aforementioned Wikipedia link led to this definition:
"Data science is an interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from many structural and unstructured data. Data science is related to data mining, machine learning and big data".
I don't know about anyone else, but for me this was a definition that was about as cool sounding as it was vague and convoluted. I mean, for starters I didn't even know what most of the terms within the very definition meant. I ended up falling down this worm-hole where I was stuck studying the wiki pages and articles—where available—on each of the terms used in the data science definition and then repeating the process for terms that I found in the definitions of the very terms I had just searched for. By and large, the experience left me a little bit more informed and a lot more confused and exhausted; not a healthy ratio by any standards.
A lot of other more contemporary blogs and articles that I came across were choked with a bunch of almost meaningless superlatives such as "Profession of the Future", "Sexiest Job of the 21st Century" or "Highest Paid Career Option Right Now". All of these, in my opinion, were little more than flimsy avenues for promoting and/or selling their very own paid courses and tutorials, all of which would just as easily "turn you and your mum into data science professionals in time for a scheduled interview at Google the following day".
I have since then, after several months of participation in paid and unpaid internet courses and the consumption of several kinds of study material (not to mention gargantuan textbooks) from recognized educational and/or data science institutions such as Kaggle and IBM's Big Data University, come up with my own moderate and hopefully easier-to-swallow definition of the term "data science". It is one that shouldn't leave you with an itchy scalp, wondering why a "small-time business owner" like you would ever need to employ any of its techniques, for all of the supposedly better decision making, reduced customer churn rates and enhanced competitive edge they offer.
I define data science as the process or series of techniques that involves the gathering, scrutiny and analysis of pre-generated data, thereby determining and revealing said data's true worth or potential value to an individual or establishment.
Now it's possible that you might be thinking that this definition is still slightly inclined towards the vague. That, dear reader, is because I was rather intentional about making the definition as broad as possible. In truth, I have come to believe that the precise definition of data science might ultimately vary slightly or significantly depending on the specific industry and/or field to which it is applied. My intention however, as (hopefully) previously made clear, is to present data science as an essential tool for the overall increased efficiency of Nigerian (or African, if you prefer) businesses and other profit-oriented establishments.
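To make that broad definition a little more concrete, here is a minimal Python sketch of the gather, scrutinize and analyze steps on a toy scale. Everything in it is hypothetical: the sales records, the product names and the helper functions clean and revenue_by_product are invented for illustration, and a real project would involve far more data and far more careful checks.

```python
from collections import defaultdict

# Gather: made-up sales records, as they might come out of a shop's
# notebook or point-of-sale export (amounts in naira, stored as text).
raw_sales = [
    {"product": "Rice 5kg", "amount_ngn": "6500"},
    {"product": "rice 5kg ", "amount_ngn": "6500"},    # same product, messy name
    {"product": "Palm oil 1L", "amount_ngn": "2200"},
    {"product": "Palm oil 1L", "amount_ngn": ""},      # missing amount
    {"product": "Beans 2kg", "amount_ngn": "3100"},
]


def clean(records):
    """Scrutinize: normalize product names and drop rows with no usable amount."""
    cleaned = []
    for rec in records:
        amount = rec["amount_ngn"].strip()
        if not amount.isdigit():
            continue  # skip rows whose amount is blank or not a whole number
        cleaned.append(
            {"product": rec["product"].strip().lower(), "amount_ngn": int(amount)}
        )
    return cleaned


def revenue_by_product(records):
    """Analyze: total revenue per product."""
    totals = defaultdict(int)
    for rec in records:
        totals[rec["product"]] += rec["amount_ngn"]
    return dict(totals)


totals = revenue_by_product(clean(raw_sales))
best_seller = max(totals, key=totals.get)
print(totals)
print("Top earner:", best_seller)
```

The "true worth" in this tiny case is simply knowing which product brings in the most money, which is exactly the kind of answer a shop owner can act on.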
But first, a little more on who a data scientist is and some of the various realms of data science.
WHO IS A DATA SCIENTIST?
A pretty popular, if only slightly corny, colloquial maxim that apparently exists within data science circles says: "A data scientist is someone who is better at statistics than the average computer programmer and better at computer programming than the average statistician." (And cue the hearty gales of laughter?)
While "witty" is obviously one thing they can't ever be labeled as, there actually does exist, just as with the very field within which they operate, a plethora of definitions proposed by several different sources for anyone who cares to look on the internet. From professional and educational bodies and institutions to jokers like me blogging on the internet to pass the time, no one really seems to totally agree on who exactly a data scientist is or what she does.
So once again, it would seem that it falls upon my shoulders to put forth—at the very possible expense of my own credibility—a definition befitting this particular context, with the academic interests of my esteemed target audience squarely in mind. One that, while mostly developed for the purposes of this blog, has also been the result of many a nap-less afternoon spent perusing article after article on the internet, in the maybe not-so-diligent pursuit of satisfying my own personal curiosities.
I have since come to define a data scientist as essentially any professional who possesses and is able to apply the required data science skills and techniques to defined data and data sets where necessary in order to generate actionable results.
Another seemingly overly broad definition, I know, but one deliberately intended to address the fact that, just as with the definition of data science I previously put forth, the specifics of a data scientist's job description might vary slightly or significantly depending on the specific industry or field within which she finds herself. The details mentioned within the previously stated definition, however, are aspects that are more or less universal to data scientists of all industries or calibers.
As a matter of fact, about twenty-five years ago, anyone whose job description consisted of the aspects included in our data scientist definition (essentially the gathering, preparing and cleaning of datasets, as well as the application of various analytical and statistical methods to said data) would have gone under the job title "Statistician".
Over time however, with the steady and rapid growth of data and technological advancements, as well as the inclusion of a few new responsibilities—such as extracting patterns from data—which said technological advancements (not to mention new scientific breakthroughs with statistical models) now accommodated, the humble statistician—whose job had not necessarily changed all that much, by the way—came to be known as a "Data Mining Specialist".
A couple of years further into the future, in the wake of even more rapid data growth and technological advancement, and of course a few more tweaks in the specifics of her job description, the "Data Mining Specialist" became the "Predictive Analyst" or the "Predictive Analytics Specialist".
As of late, the preferred term is "Data Scientist". But still, our little history lesson just goes to prove that, regardless of what they were called, within the corporate, business, financial or even public and healthcare industries, there has always been a healthy demand for professionals with the ability to gather and effectively make sense of the non-stop, ever-increasing stockpile of generated data. These professionals, however, were not sought out by so many substantial industries, businesses and establishments just for their ability to efficiently transform said ever-increasing data into useful information. They were hired to turn all that beautiful, overwhelming data into an Advantage!
CONCLUSION
Alright! So before I go any further into ending this post, I'd like for you, dear reader, to just take a moment to acknowledge the magnitude of what you just accomplished. I mean, if you really did come all this way with me, then frankly, you just successfully consumed and digested more information about data and data science in a few minutes than I was initially able to accomplish intentionally within a year and a half of semi-unemployment!
Seriously though, it's no small feat and you really should be proud.
Data science is a very vast and continuously evolving field and concept, and I believe that being even just a little bit more aware of it could ultimately lead to its overall acceptance and likely introduction into crucial areas of dire need and immense deficit.
As much as I'd like to cut this short and just end, mostly because I don't handle goodbyes very well, I do feel the only slightly compulsive need to remind us once again that this post has been part one of two under the title "Nigerian Businesses Need Data Science. Here's Why. . ." The second post, in which I'll discuss with more specificity the different realms and individual disciplines of data science, as well as how they could play huge roles in making Nigerian businesses more profitable, will be available very soon, although exactly how soon would largely depend on how this article is received and generally responded to.
Thanks for reading and stay tuned!