What are Data Scientists, Data Engineers and Data Analysts?

Hi Everyone,

Today, with the rainy weather, I wanted to share a little knowledge about Data Science and the categories within it, where different roles are defined for the different people working in this field.

I myself have wondered what exactly Data Science is all about, and why there are so many confusing, overlapping terms used with Data Science and Machine Learning.

Let's start!!

First of all, Data Science is a field that concentrates on collecting historical data, finding a suitable platform to store it, and identifying behavioral patterns that may relate, even indirectly, to better business decisions. This data is then refined to extract meaningful information, which is fed into machine learning models (such as neural networks) to understand how an application or product is actually used. Finally, those insights are incorporated as product features, improving the service offering and also enhancing the business.

I know it seems like rocket science, but in simple terms Data Science offers a way to study product usage data, find patterns in how the product is used, and fine-tune its features to the needs of the people who use it in the real world. This improves the product and its service offerings, and helps the end user in ways never imagined!!

The different stages involved are:

  1. Data Gathering and Storage: the Hadoop platform (HDFS) can be used here, or Apache Spark with PySpark and Spark SQL; for structured, low-volume data you can use Pandas together with NumPy and SciPy, often inside JupyterLab. --- Data Engineers do this
  2. Identifying unique patterns in the stored data: SQL queries --- Data Engineers do this
  3. Refining the data, extracting meaningful information and creating data pipelines: SQL queries --- Data Engineers do this
  4. Presenting this information visually: Tableau/Power BI, often reading from a cloud data warehouse such as Snowflake --- Data Analysts do this
  5. Presenting analytics about the product usage patterns: Python/R --- Data Scientists do this
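To make stages 1 to 3 more concrete, here is a minimal sketch using Python's built-in `sqlite3` module as a stand-in for a real storage platform such as Hadoop or Spark SQL. The table names (`usage_events`, `feature_summary`), columns and sample events are all hypothetical, chosen only for illustration.

```python
import sqlite3

def build_usage_db() -> sqlite3.Connection:
    """Stage 1: gather raw usage events and store them (in-memory SQLite here)."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: which user used which feature, and for how long.
    conn.execute("CREATE TABLE usage_events (user_id INTEGER, feature TEXT, seconds REAL)")
    events = [
        (1, "search", 30.0), (1, "export", 5.0), (2, "search", 45.0),
        (2, "search", 15.0), (3, "export", 12.0), (3, "share", 8.0),
    ]
    conn.executemany("INSERT INTO usage_events VALUES (?, ?, ?)", events)
    return conn

def feature_patterns(conn: sqlite3.Connection):
    """Stage 2: a SQL query that surfaces a usage pattern per feature."""
    return conn.execute(
        """
        SELECT feature,
               COUNT(DISTINCT user_id) AS users,
               SUM(seconds) AS total_seconds
        FROM usage_events
        GROUP BY feature
        ORDER BY total_seconds DESC
        """
    ).fetchall()

def build_feature_summary(conn: sqlite3.Connection):
    """Stage 3: refine the raw events into a clean summary table for dashboards."""
    conn.execute("DROP TABLE IF EXISTS feature_summary")
    conn.execute(
        """
        CREATE TABLE feature_summary AS
        SELECT feature,
               COUNT(DISTINCT user_id) AS users,
               ROUND(AVG(seconds), 1) AS avg_seconds
        FROM usage_events
        GROUP BY feature
        """
    )
    return conn.execute("SELECT * FROM feature_summary ORDER BY feature").fetchall()

conn = build_usage_db()
print(feature_patterns(conn))       # [('search', 2, 90.0), ('export', 2, 17.0), ('share', 1, 8.0)]
print(build_feature_summary(conn))  # [('export', 2, 8.5), ('search', 2, 30.0), ('share', 1, 8.0)]
```

In a real pipeline, the `feature_summary` table is the kind of refined output a Data Analyst would point Tableau or Power BI at in stage 4.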

Stages 1, 2 and 3 are taken care of by the Data Engineer.

Stage 4 is taken care of by the Data Analyst.

Stage 5 is taken care of by people who are really good at programming and maths: the Data Scientist.
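As a rough idea of stage 5, a Data Scientist might compare how users who adopted a feature behave versus those who did not. The sketch below uses only Python's `statistics` module; the session data and the `usage_lift` helper are hypothetical.

```python
from statistics import mean

# Hypothetical sessions: (user_id, used_new_feature, session_minutes)
sessions = [
    (1, True, 12.0), (2, True, 15.0), (3, True, 9.0),
    (4, False, 8.0), (5, False, 6.0), (6, False, 10.0),
]

def usage_lift(rows):
    """Return mean session length for feature users, for non-users, and the relative lift."""
    with_feature = [minutes for _, used, minutes in rows if used]
    without_feature = [minutes for _, used, minutes in rows if not used]
    a, b = mean(with_feature), mean(without_feature)
    return a, b, (a - b) / b

with_f, without_f, lift = usage_lift(sessions)
print(f"with feature: {with_f:.1f} min, without: {without_f:.1f} min, lift: {lift:.0%}")
# prints "with feature: 12.0 min, without: 8.0 min, lift: 50%"
```

A difference like this shows correlation, not causation. Part of the Data Scientist's job is applying the statistics needed to check whether such a pattern is real before it shapes product features.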

ML engineers, on the other hand, build and train machine learning models. They draw on a deep understanding of neural networks (which are loosely inspired by the human brain), of learning approaches such as supervised and unsupervised learning, and of architectures such as recurrent neural networks. Some of them also design and train LLMs (large language models).
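To show what "supervised learning" and "neural network" mean at the smallest scale, here is a sketch of a single perceptron (the simplest building block of a neural network) learning the logical AND function from labeled examples. The function names and training settings are illustrative choices; in practice ML engineers use libraries such as PyTorch rather than hand-written loops.

```python
def train_perceptron(samples, epochs=20, lr=1.0):
    """Supervised learning: adjust weights whenever a prediction disagrees with its label."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), target in samples:
            pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            err = target - pred  # 0 when correct, +1 or -1 when wrong
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            b += lr * err
    return w, b

def predict(w, b, x):
    """Fire (output 1) only if the weighted sum of inputs crosses the threshold."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

# Labeled training data: inputs paired with the correct answer for AND.
AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = train_perceptron(AND_DATA)
print([predict(w, b, x) for x, _ in AND_DATA])  # [0, 0, 0, 1]
```

Deep neural networks stack many layers of units like this one and are trained on far more data, but the core idea of learning from labeled examples is the same.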

There is a lot to explore in this field, but to enter it you need to be really good at Python/R programming, Pandas, NumPy, SciPy, PyTorch, Hadoop, Apache Spark/PySpark, JupyterLab, and visualization tools such as Power BI and Tableau.

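Deep Learning is the branch of machine learning that uses neural networks with many layers. LLMs are built with deep learning, and fine-tuning an existing LLM on new data helps it understand that data/information better.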

Gen AI (generative AI) is about building AI tools that generate new output, such as text or images, from the inputs provided to them. The output is probabilistic: the model samples from the probabilities it has learned, so we cannot be sure in advance about the quality of what the AI generates.
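The "probabilistic output" point can be seen in miniature below. A text generator turns its raw scores (logits) for each candidate next word into probabilities with a softmax, then samples one. The vocabulary and scores are made up for illustration, and real LLMs do this over tens of thousands of tokens at every step. Temperature is a common sampling setting: lower values make the output more predictable, higher values make it more varied.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Convert raw scores into probabilities; temperature reshapes the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_token(vocab, logits, temperature=1.0, rng=random):
    """Pick the next word at random, weighted by its probability."""
    probs = softmax(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]

vocab = ["sunny", "rainy", "cloudy"]  # hypothetical candidate next words
logits = [2.0, 1.0, 0.5]              # hypothetical model scores
rng = random.Random(42)               # fixed seed so runs are repeatable
print([sample_next_token(vocab, logits, 1.0, rng) for _ in range(5)])
```

Even with identical inputs, unseeded runs can return different words. This is why the quality of generated output is not guaranteed.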

For good Data Science certifications, you can check out the 365datascience.com website.

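Apache Spark and Pandas are both data manipulation frameworks. Spark is built to process high volumes of data across a cluster of machines, while Pandas works in memory on a single machine and suits smaller datasets.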
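Here is a conceptual sketch, in plain Python and not Spark's actual API, of roughly why Spark scales. It splits the data into partitions, computes a partial result on each one (on different machines in a real cluster), and then merges the partial results. The `partition`, `partial_counts` and `merge_counts` helpers and the event data are hypothetical.

```python
from collections import Counter
from functools import reduce

def partition(rows, n_parts):
    """Split rows round-robin into n_parts chunks (a stand-in for Spark partitions)."""
    return [rows[i::n_parts] for i in range(n_parts)]

def partial_counts(chunk):
    """Per-partition step: count events per feature within one chunk."""
    return Counter(feature for feature, _ in chunk)

def merge_counts(a, b):
    """Merge step: combine two partial results into one."""
    return a + b

# Hypothetical (feature, user_id) usage events.
events = [("search", 1), ("export", 1), ("search", 2),
          ("share", 3), ("search", 3), ("export", 2)]

parts = partition(events, 3)
totals = reduce(merge_counts, (partial_counts(p) for p in parts), Counter())
print(dict(totals))  # search: 3, export: 2, share: 1
```

With Pandas, the same count would be a single in-memory step such as `df.groupby("feature").size()`. With PySpark, `df.groupBy("feature").count()` expresses it, and Spark distributes the work across partitions for you.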

I have tried my best to explain Data Science and its related job roles. I hope this helps.

I will be posting more details in relation to ML and neural networks soon...

Thanks,

Swaroop




