Domain Knowledge — The Second Most Important Skill to Have as a Data Scientist.
Dr. V.K. Mishra
Domain Knowledge — How Does It Help a Business?
Contrary to what was mentioned earlier, notice how most participants in a Kaggle competition don’t have any substantial subject matter expertise. Yet, despite that absence, they go on to win competition after competition with high scores on the leaderboards.
That is because, fortunately, someone, somewhere, was smart enough to ease the process of making predictions. High-level predictive analysis libraries like scikit-learn do most of the heavy lifting in the backend, and they are robust enough to yield surprisingly good results even with default parameters. With just a couple of lines of code, literally any Tom, Dick & Harry can train a model on the dataset & submit it to Kaggle, often landing in the top half of the leaderboard with minimal effort.
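To make the "couple of lines of code" point concrete, here is a minimal sketch of training a scikit-learn model with default parameters. The dataset (scikit-learn's bundled breast cancer data), the choice of a random forest, and the 75/25 split are assumptions made for illustration; they are not taken from any particular Kaggle competition.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Load a small tabular dataset that ships with scikit-learn (no download needed).
X, y = load_breast_cancer(return_X_y=True)

# Hold out a quarter of the rows to estimate performance on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Default hyperparameters; random_state is fixed only to make the run repeatable.
model = RandomForestClassifier(random_state=0)
model.fit(X_train, y_train)

# Mean accuracy on the held-out rows.
accuracy = model.score(X_test, y_test)
print(f"Held-out accuracy with default parameters: {accuracy:.3f}")
```

Nothing in this sketch required knowing what the columns mean, which is exactly why a decent leaderboard score says little about domain understanding.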
On the flip side, businesses work under major financial & time constraints while trying to sustain their place in the market. Not to forget, they are also in the market to earn a sustainable profit margin. For most businesses, it is simply not viable to invest in developing a domain-specific algorithm in-house. Hence, they hire for the much-needed Data Science role, hoping that the new hire will help resolve the problem they are facing and, if an opportunity arises, help them capitalize on it.
Why is Domain Knowledge essential for a Data Scientist?
Three aspects of Domain Knowledge that a Data Scientist should keep in mind are interrelated yet clearly distinguishable. They can be defined in the context of:
- The source problem the business is trying to resolve and/or capitalize on.
- The set of specialized information or expertise held by the business.
- The exact know-how for domain-specific data collection mechanisms.
On the other hand, a rather unfortunate misconception the general public has about Data Science & ML is that ML & AI are a mythical Noah’s Ark, set on resolving every problem ever faced, however trivial.
“Machine Learning” - (Image 1)
The xkcd comic sums this up humorously: Data Scientists are viewed as wizards from Hogwarts with a magic wand named “Machine Learning”, capable of resolving any problem a business is facing or wants to profit from (Image 1).
But contrary to popular belief, a Data Scientist needs to prioritize planning ahead with a sustainable & logical business strategy, followed by the implementation. To give an analogy, constructing a Space Shuttle to travel between New York & Tokyo sounds like a fool’s errand. Similarly, a Cats & Dogs classifier doesn’t have any sustainable & profitable business prospects. Adapting to the business sector & gaining the necessary knowledge of the domain will benefit the business more than the technical know-how to build a prediction algorithm right away.
Secondly, and perhaps the most discussed topic in the Data Science community, is the information held by the business. This information acts as the Rosetta Stone, helping analysts find better ways and/or means to do their job. Prior information about the industry & the domain helps build more precise & accurate predictive models from the features available in the dataset. A further benefit is that the model then generalizes better to real-world situations.
Besides, the importance of Feature Engineering & how it can improve the overall accuracy of a model is a common topic of discussion across every corner of the community. But performing proper & insightful feature engineering is a skill that only a few experienced practitioners have.
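As a rough illustration of domain-driven feature engineering, the sketch below turns two raw purchase dates into features a retail analyst might care about. The retail setting, the function name `engineer_features`, and the three "insights" encoded in it (weekend behavior, purchase gaps, month-end spending) are hypothetical assumptions chosen for the example, not findings from any real dataset.

```python
from datetime import date


def engineer_features(purchase_date: date, previous_purchase_date: date) -> dict:
    """Derive domain-informed features from two raw purchase dates (hypothetical retail example)."""
    return {
        # Assumed insight: weekend shoppers behave differently from weekday shoppers.
        # date.weekday() returns 5 for Saturday and 6 for Sunday.
        "is_weekend": purchase_date.weekday() >= 5,
        # Assumed insight: a long gap between purchases may signal churn risk.
        "days_since_last_purchase": (purchase_date - previous_purchase_date).days,
        # Assumed insight: spending tends to rise around month-end paydays.
        "is_month_end": purchase_date.day >= 25,
    }


# Example: a purchase on Saturday 29 July 2023, four weeks after the previous one.
features = engineer_features(date(2023, 7, 29), date(2023, 7, 1))
print(features)
```

A raw timestamp column carries all of this information implicitly, but a model with default settings is unlikely to recover it unaided; someone who knows the business has to decide which derived columns are worth creating.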