Beginner's guide: The top 10 Data Science libraries in Python
Deena Gergis
AI & Data Science Expert @ McKinsey | Improving lives, one AI product at a time
Dear aspiring Data Scientist,
You think that Data Science is the coolest. Yes, you are right! So you decide to pursue Data Science as your career.
After digging a bit deeper, you get lost in all the references that you are supposed to use. Fear not, here are the ultimate 7 references that you will need to master Data Science. Congratulations, you’ve now untangled the references zoo.
And you are now ready to jump to the second zoo, namely the Python packages that you will be using. And here comes your guide for that.
____________________________________________________
I. Data Prep:
This is the very first stage of your work. In this stage, you will load your data, transform it, clean it and summarise it so that it is ready for the work that follows. So which packages should you use for your data prep?
1. Pandas
Pandas is a package that allows you to process, manipulate, summarise and analyse data in Python. Fun fact: Pandas was actually created to bring to Python the Data Frame functionality that R provides out of the box.
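To make the clean-and-summarise steps concrete, here is a minimal sketch. The toy data and the column names (city, sales) are made up for illustration; in a real project you would typically load a file with pd.read_csv instead.

```python
import pandas as pd

# Hypothetical toy data; the column names are made up for illustration.
df = pd.DataFrame({
    "city": ["Cairo", "Berlin", "Cairo", "Berlin", "Cairo"],
    "sales": [120.0, 80.0, None, 95.0, 130.0],
})

# Clean: fill the missing value with the mean of the column.
df["sales"] = df["sales"].fillna(df["sales"].mean())

# Summarise: average sales per city.
summary = df.groupby("city")["sales"].mean()
print(summary)
```

Filling with the mean is only one of several possible cleaning strategies; dropping the row with dropna() would be another.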
2. PySpark
PySpark is the Python API for Apache Spark, which lets you use Spark directly from a Python script. Apache Spark is one of the most commonly used engines for big data processing.
____________________________________________________
II. Visualisation:
After you have prepared your data, you are ready to visualise it to spot trends and insights. For that purpose, you will need the following packages:
3. Matplotlib:
Matplotlib is one of the most commonly used visualisation packages in Python. Personally, I find this package a painful part of the standard Data Science stack due to its non-intuitive syntax and poor visualisation quality. But it is the de facto standard.
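For a first impression of the syntax, here is a minimal sketch of a line chart. The numbers and the output file name are made up for illustration, and the Agg backend is selected only so that the script also runs on a machine without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Made-up example data.
months = ["Jan", "Feb", "Mar", "Apr"]
sales = [120, 95, 130, 150]

# Create a figure with one set of axes and draw a line with point markers.
fig, ax = plt.subplots()
ax.plot(months, sales, marker="o")
ax.set_title("Monthly sales")
ax.set_xlabel("Month")
ax.set_ylabel("Sales")

# Write the chart to an image file (hypothetical file name).
fig.savefig("monthly_sales.png")
```

In a notebook you would usually skip the backend line and let the chart render inline.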
4. Plotly:
Plotly was developed to address the limitations of Matplotlib and improve the quality of data visualisations in Python. Plotly also offers extensive interactive visualisations.
____________________________________________________
III. Machine learning:
After having spent 80% of your time with the packages mentioned above, it is now time for the advanced phase, namely Machine Learning.
5. NumPy:
Even though Python is a high-level language that is very attractive in terms of learning curve, readability and community, it is not the best in terms of performance. And given that Machine Learning relies heavily on complex numeric computing, a higher-performing solution was needed.
Thus, NumPy was developed, with its core implemented in C, a high-performance programming language.
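In practice, the performance comes from operating on whole arrays at once instead of looping over elements in Python. A minimal sketch with made-up numbers:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

# Element-wise multiplication, with no explicit Python loop.
elementwise = a * b

# Dot product of the two vectors.
dot = a @ b

# A 2x3 matrix holding 0..5, and the mean of each column.
matrix = np.arange(6).reshape(2, 3)
col_means = matrix.mean(axis=0)

print(elementwise, dot, col_means)
```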
6. SciPy:
Building on the high-performing implementation of NumPy, SciPy offers more advanced operations from linear algebra and calculus, such as integration, Fourier transforms, function minimisation, etc.
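Two of the operations mentioned above, integration and function minimisation, can be sketched as follows. The functions being integrated and minimised are arbitrary examples chosen for illustration.

```python
import numpy as np
from scipy import integrate, optimize

# Numerically integrate sin(x) from 0 to pi (the exact answer is 2).
area, abs_error = integrate.quad(np.sin, 0, np.pi)

# Find the minimum of a simple parabola, (x - 3)^2 + 1, which lies at x = 3.
result = optimize.minimize_scalar(lambda x: (x - 3) ** 2 + 1)

print(area, result.x)
```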
7. Sklearn:
Sklearn (scikit-learn) is one of the most beautifully designed and commonly used packages. It implements most of the standard Machine Learning algorithms, extensively optimised using NumPy and Cython.
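A large part of that design is the uniform fit / predict / score interface shared by the models. Here is a minimal sketch using the iris dataset bundled with Sklearn; the choice of logistic regression and of the split parameters is an assumption made for illustration only.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load a small built-in dataset: flower measurements (X) and species labels (y).
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the data to evaluate the model on unseen samples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Train the model, then measure its accuracy on the held-out data.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

Swapping in another algorithm usually means changing only the line that creates the model.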
Tip: If you are already familiar with the basics of Sklearn, check out this article: 5 advanced Scikit-learn features that will transform the way you code
8. TensorFlow:
TensorFlow is a library used to create Neural Networks and Deep Learning models. Its core is implemented in C++ to increase performance, and it can run on multiple CPUs, GPUs and TPUs.
____________________________________________________
IV. Production:
And now you have reached the point where you have a trained model that is ready to be used by your customer.
9. Flask:
You now have a model that you want to integrate into your product. One of the most common ways to expose your model is to wrap it in an API that other applications - irrespective of their language - can directly call.
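Here is a minimal sketch of such an API. The /predict route, the "features" field and the predict function are all hypothetical; the function is a stand-in that sums its inputs, where a real service would call a trained model. The endpoint is exercised with Flask's built-in test client, so no server needs to be started.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict(features):
    """Hypothetical stand-in for a trained model: returns the sum of the inputs."""
    return float(sum(features))


@app.route("/predict", methods=["POST"])
def predict_endpoint():
    # Read the JSON body; fall back to an empty dict if it is missing or invalid.
    payload = request.get_json(silent=True) or {}
    features = payload.get("features")
    if not isinstance(features, list):
        return jsonify(error="expected a JSON body with a 'features' list"), 400
    return jsonify(prediction=predict(features))


# Call the endpoint through the test client instead of a running server.
client = app.test_client()
response = client.post("/predict", json={"features": [1.0, 2.0, 3.5]})
print(response.get_json())
```

Because the interface is plain HTTP and JSON, the caller can be written in any language.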
10. Dash
As attractive as an API can be for tech geeks, it is just as useless for business stakeholders. The solution? Build a web app that your customer can use to access the results of your model.
____________________________________________________