The Grit and Mettle required of a Data Scientist
Downtown Dallas, TX, at dawn. Picture from my hotel room.


It's 10:30 pm here in Texas. As I type this note from my hotel room, I want to share what it takes to build large-scale data science applications that are designed to go to production.

In the last two days, I have spent more than twenty hours helping a client fine-tune a large-scale application that is going to production soon. The model has been built, the integration is almost complete, the deployment is to be done on systems with 20+ cores and 128+ GB of RAM, and guess what: we have performance issues. I am here to help fine-tune the application and am spending the whole week getting it production-ready. Day one was frustrating. Despite twelve hours of work, it was hard to figure out what was happening. Is it the code? Is it the system configuration? Is it the OS? Is it memory? Is it the heap? Do we have enough compute power? Should we parallelize? Should we queue tasks? Is the system thrashing? So many questions. So many places to look. So many logs to review. So many tests to design.


Building performant, scalable data science applications is hard; it is no child's play. As someone who takes his seven-year-old to piano classes three times a week, I appreciate the rigor it takes to become a musician. Yet many aspiring musicians want to take a shortcut, and many businesses thrive on the promise of teaching you an instrument in 24 hours! The same thing is happening in data science. The "MOOCification" and democratization of data science have made the field so accessible and seemingly simple that businesses and vendors are thriving on the promise of "You too can be a Data Scientist!"

After teaching more than a thousand students how to build data science applications, many of whom have taken multiple graduate-level courses under my supervision and are now leading successful careers in data science, I find it amusing to see people still wanting to spend the least possible time and effort to be called a data scientist. A PhD in nuclear physics + a 10-week data science course: we have a data scientist! A career changer who has gone through an intensive 4-week data science bootcamp, where multiple Jupyter notebooks were inspected and a capstone Kaggle kernel was implemented: we have a data scientist! Picked up a data science course on sale for just $9.99 and spent the next 4.5 hours watching videos: we have a data scientist in the making!


You wouldn't trust someone whose only building experience is with LEGO blocks to build your home. So why would you trust an amateur to haphazardly build the core piece of your IP? By some estimates, more than 80% of analytics and data science projects fail to make it into production, and multiple surveys have highlighted the challenges of adopting data science and machine learning applications in a production setting. Companies are realizing that just because an application runs on a sample dataset in a Jupyter notebook doesn't mean it's ready for production. It is also scary to see companies stake their reputation on deploying applications without applying the best practices of engineering, model governance, model risk management, and proactive monitoring.

Building data science applications requires a disciplined approach to model design, development, testing, and scaling, and to tuning both the models and the environment for deployment.


The tools and hardware choices available today require expertise and a comprehensive systems-development approach, one that factors in requirements at every step so that applications are optimized and production-ready. When we started designing data science solutions and the QuSandbox, we realized there are so many dimensions to tune that it is feasible only for experts in their respective fields, be it data engineering, data science, software development, or risk. Gone are the days when a data scientist was someone who could write rudimentary Python code in Jupyter notebooks. Companies today are leaning toward hiring experts who can engineer optimal solutions that bring the organization the competitive advantage it seeks through its applications.

There are many armchair data scientists, data enthusiasts, and those who see data science as a hobby. It's fun, and if that's what you are in it for, pursue it as a hobby and have fun! If you are serious about data science as a career, you need to be all in. It's a fantastic career path: demanding and challenging, but also gratifying if you give it your all and dedicate the time and effort to learn and hone your skills. Look for opportunities to apply and specialize in the skills where you feel you have an edge over others. Experience is key to building scalable applications, and the more challenges you encounter, the better your chances of becoming a successful data science practitioner!

On the second day of my project, things are much clearer. I understand the system architecture well enough to know where to look for the problem areas. In one application, I was able to recommend changes that cut execution time by 50%. In another, I see an opportunity for more than 80% improvement with some code refactoring. These changes will be tested atomically, and we will incorporate them only if they are proven to work consistently and provide value. It's a satisfying day in the life of a data scientist when you see your client's application getting closer to what it was meant to be: deployed and in production!

Author:


Sri Krishnamurthy, CFA, is the founder of QuantUniversity, a data and quantitative analysis company, and the creator of the Analytics Certificate program and the Fintech Certificate program. He has more than 15 years of experience in analytics, quantitative analysis, statistical modeling, and designing large-scale applications. Previously, Mr. Krishnamurthy worked for Citigroup, Endeca, and MathWorks and has consulted with more than 25 customers in the financial services and energy industries. He has trained more than 1,000 students in quantitative methods, analytics, and big data, both in industry and at Babson College, Northeastern University, and Hult International Business School, many of whom work in data science roles at financial services firms. Mr. Krishnamurthy earned an MS in computer systems engineering and an MS in computer science from Northeastern University and an MBA with a focus on investments from Babson College.




