What Are Data Scientists, Data Engineers and Data Analysts?
Ganesha Swaroop B
17+ yrs exp Software Testing | Author | Mentor | Staff SDET | Technical Writer | Technology Researcher | Java | Pytest | Python | Allure | ExtentReports | BDD | Jenkins | SME | Self Taught Data Science and ML Engineer
Hi Everyone,
Today, with the rainy weather, I wanted to share a little knowledge about Data Science and the different roles defined for people working in this field.
I myself have wondered what exactly Data Science is all about, and why so many confusing, overlapping terms are used around Data Science and Machine Learning.
Let's start!
First of all, Data Science is a field that concentrates on collecting historic data, storing it on a suitable platform, and identifying behavioral patterns that may help improve business decisions. That information is then refined further into something meaningful and fed to Machine Learning models (such as neural networks) in order to understand how the application or product is being used. The resulting insights are built back into the product as features, which improves the service offerings and also enhances the business.
I know it seems like rocket science, but in simple terms Data Science offers a way to study product usage data, find patterns in how the product is used, and fine-tune the product features according to the needs of the people who use it in the real world. This improves the product and the service offerings, and helps the end user in ways never imagined!
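As a rough illustration of what "finding patterns of usage" can mean, here is a minimal sketch in Python. The event log, the user IDs and the feature names are all made up for this example; real product data would come from your own logging system.

```python
from collections import Counter

# Hypothetical usage log: (user_id, feature_used). Invented for illustration.
events = [
    ("u1", "search"), ("u1", "export"), ("u2", "search"),
    ("u3", "search"), ("u2", "share"), ("u3", "export"),
    ("u1", "search"),
]

def feature_popularity(events):
    """Count how many distinct users touched each feature."""
    users_per_feature = {}
    for user, feature in events:
        users_per_feature.setdefault(feature, set()).add(user)
    return Counter({f: len(u) for f, u in users_per_feature.items()})

popularity = feature_popularity(events)

# List features from most widely used to least widely used.
for feature, n_users in popularity.most_common():
    print(f"{feature}: used by {n_users} user(s)")
```

A ranking like this is the kind of simple pattern that can tell a team which features deserve the most attention.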
The different stages involved are:
Stages 1, 2 and 3 are taken care of by the Data Engineer.
Stage 4 is taken care of by the Data Analyst.
Stage 5 is taken care of by people who are really good at programming and maths: the Data Scientist.
ML engineers, on the other hand, are the people who build and train machine learning models. Their work draws on a deep understanding of neural networks, which are loosely inspired by how the human brain behaves, and on learning approaches such as supervised, unsupervised and reinforcement learning that are commonly used to train machines. Large language models (LLMs) are one kind of model designed and trained this way.
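To make the difference between supervised and unsupervised learning a little more concrete, here is a small sketch in plain Python. The data (hours of product use per week) and the labels "casual" and "power" are invented for illustration, and the two functions are deliberately simple stand-ins (1-nearest-neighbour and a two-cluster k-means), not what a production ML system would use.

```python
# Toy data, invented for illustration: hours of product use per week.
labeled = [(1.0, "casual"), (2.0, "casual"), (9.0, "power"), (10.0, "power")]

def predict_supervised(x, labeled):
    """Supervised: learn from examples that already carry a label.

    Uses 1-nearest-neighbour: return the label of the closest known example.
    """
    _, nearest_label = min(labeled, key=lambda pair: abs(pair[0] - x))
    return nearest_label

def cluster_unsupervised(values, iterations=10):
    """Unsupervised: no labels given; split values into two groups.

    A tiny k-means with k=2. Assumes at least two distinct values.
    """
    low, high = min(values), max(values)  # initial cluster centres
    for _ in range(iterations):
        group_low = [v for v in values if abs(v - low) <= abs(v - high)]
        group_high = [v for v in values if abs(v - low) > abs(v - high)]
        low = sum(group_low) / len(group_low)
        high = sum(group_high) / len(group_high)
    return group_low, group_high

print(predict_supervised(8.0, labeled))
print(cluster_unsupervised([1.0, 2.0, 9.0, 10.0, 1.5, 8.5]))
```

In the supervised case someone has already told the machine what "casual" and "power" users look like; in the unsupervised case the machine only discovers that the numbers fall into two groups.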
There is a lot to explore in this field, but to enter it with an existing set of skills you need to be really good at Python/R programming, Power BI, Tableau, Pandas, NumPy, SciPy, Hadoop, Apache Spark (PySpark), PyTorch and JupyterLab.
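Deep Learning is the part of machine learning that trains neural networks with many layers. Fine-tuning an existing LLM, or training a new one, so that it understands data and information in a better manner is one application of deep learning.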
Gen AI is about building AI tools that generate a probabilistic output from the set of inputs provided to them. Because the output is sampled rather than fixed, we cannot be sure in advance about the quality of what the AI generates.
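A toy sketch of what "probabilistic output" means is shown below. The word list and the probabilities are invented for illustration; a real generative model works with a far larger vocabulary and computes these probabilities from its input, but the sampling idea is similar.

```python
import random

# Hypothetical next-word probabilities, invented for illustration.
next_word_probs = {"helpful": 0.6, "useful": 0.3, "confusing": 0.1}

def generate(probs, rng):
    """Sample one word according to its probability."""
    words = list(probs)
    weights = [probs[w] for w in words]
    return rng.choices(words, weights=weights, k=1)[0]

rng = random.Random(42)  # fixed seed so the run is repeatable
samples = [generate(next_word_probs, rng) for _ in range(1000)]

# Show how often each word was generated for the same input.
for word in next_word_probs:
    print(word, samples.count(word) / len(samples))
```

The same input can produce a different word on each call, which is why the quality of generated output varies from run to run.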
If you are looking for Data Science certifications, you can check out the 365datascience.com website.
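Apache Spark and Pandas are data manipulation frameworks. Pandas works on data that fits in the memory of a single machine, while Apache Spark spreads the processing of high volumes of data across a cluster of machines.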
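Here is a minimal sketch of the kind of data manipulation these frameworks are used for, written with Pandas. The sales records and the column names `region` and `units` are made up for this example.

```python
import pandas as pd

# Made-up sales records; the column names are assumptions for this example.
df = pd.DataFrame({
    "region": ["north", "south", "north", "south", "east"],
    "units": [10, 4, 6, 8, 5],
})

# Group rows by region and total the units sold in each one,
# then sort so the best-selling region comes first.
totals = df.groupby("region")["units"].sum().sort_values(ascending=False)
print(totals)
```

PySpark offers a similar group-and-aggregate style through its own DataFrame API (for example `groupBy` followed by `sum`), with the work distributed across machines.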
I have tried my best to explain Data Science and its related job roles. I hope this helps.
I will be posting more details in relation to ML and neural networks soon...
Thanks,
Swaroop