An overview of Data Science & AI
Nayudamma Chowdary N.
Tech Enthusiast | Exploring Opportunities in AI agents, Generative AI, Machine Learning, Deep Learning, NLP, Blockchain, CRM and ERP.
THE DAWN OF THE MACHINES
"The power of AI is just beginning to emerge in ways that will reinvent video and make the seemingly impossible possible."
– Neal Mohan, CEO, YouTube
BIRTH:
The term Artificial Intelligence (AI) was coined at a Dartmouth College workshop in 1956 by a group of computer scientists and statisticians. Marvin Minsky and John McCarthy were pioneers of the movement, optimistic about the future of AI: enabling machines to simulate human intelligence.
However, the well-known English mathematician Alan Turing is regarded as the "Father of Artificial Intelligence," by virtue of the Turing Test, a practical test of a computer's intelligence. The test is a criterion for judging a computer's ability to think, and it remains a popular benchmark even today.
THE CHESS GAME:
Turing illustrated his ideas on machine intelligence with reference to chess, a useful testbed for his proposed methods of problem solving: a chess-playing computer could search exhaustively through all the available and possible moves. This idea became reality decades after his death, in 1997, when IBM's chess computer Deep Blue defeated Garry Kasparov, the reigning world champion.
Let us discuss: what exactly is AI?
Essentially, AI is an umbrella term that combines Machine Learning and Deep Learning concepts with more advanced statistical techniques; in this framing, Artificial Intelligence is one branch of the popular discipline called "Data Science."
Before explaining Data Science, let me first take a broader look at AI.
In simpler words, "AI is the process of making machines and tools mimic human behaviour and learn from their environment."
We can say Intelligence is of two types:
1. Natural Intelligence – intelligence that has developed naturally, simply by analysing and learning from past experience.
Ex: the brain; natural satellites.
2. Artificial Intelligence – natural intelligence mimicked in machines, achieved with human intervention and previously collected data.
Ex: the Sophia robot; artificial satellites.
Artificial Intelligence is divided into two broad categories:
Narrow AI: a limited form of AI focused on performing a single task well, such as Google Search or a personal voice assistant like Siri or Alexa.
General AI: a far-fetched, human-level form of AI that would allow machines to comprehend, learn, and perform intellectual tasks to solve any kind of complex problem. We have not even scratched the surface of this form of AI; it is the stuff of movies like "I, Robot" and "Terminator," and it remains purely theoretical.
So how could we develop something called General AI?
It would be a blend of a machine possessing the abilities of decision making, vision, classification analysis, communication, and speech. At ground level, however, each of these is applied separately:
Decision Making – Tesla's cars are one of the best examples of analysing a situation and making decisions while driving; even video games have developed "bots" that resemble human intelligence.
Vision – our everyday cell phones include a 'face recognition' feature to unlock the device, while camera software assists with 'object detection'.
Classification Analysis – classifying mail as 'spam/ham' is a good example of classification analysis (a minimal sketch follows this list).
Communication – language translation tools and voice search: Google Translate.
Speech – speech recognition is performed by voice assistants such as Alexa and Siri.
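To make the classification example concrete, here is a minimal spam/ham sketch using scikit-learn; the tiny inline dataset is invented purely for illustration.

```python
# A minimal spam/ham classifier sketch (illustrative toy data, not a real corpus).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training data: each message is labelled "spam" or "ham".
messages = [
    "Win a free prize now", "Lowest price on meds, click here",
    "Are we still meeting for lunch?", "Please review the attached report",
]
labels = ["spam", "spam", "ham", "ham"]

# Bag-of-words features plus a Naive Bayes classifier, chained in one pipeline.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(messages, labels)

# Classify a new, unseen message.
print(model.predict(["Claim your free prize today"]))  # expected: ['spam']
```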
How to attain AI?
As previously discussed, AI is an amalgam of disciplines such as machine learning and deep learning. Collecting data is the key to any machine learning or deep learning algorithm; preparing the data in a useful format and training the algorithm on that data are essential steps, and applying different learning algorithms is how we work toward AI.
In simple words: collect the data, analyse it, grab the insights, prepare different machine learning and deep learning algorithms, train the algorithms on the existing data, and use them to make predictions on newly collected data.
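As a minimal sketch of that pipeline, the scikit-learn example below compresses collect, train, and predict into a few lines, using the library's built-in Iris dataset as a stand-in for collected data.

```python
# Collect -> prepare -> train -> predict, compressed into a tiny scikit-learn example.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)  # "collect" the data
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

model = DecisionTreeClassifier().fit(X_train, y_train)  # train on existing data
preds = model.predict(X_test)                           # predict on new data
print(f"Accuracy: {accuracy_score(y_test, preds):.2f}")
```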
CHATGPT
ChatGPT (Chat Generative Pre-trained Transformer) has proved to be a revolutionary product since its launch on 30 November 2022 by the Microsoft-backed artificial intelligence research firm OpenAI. The language model is trained on Azure, Microsoft's cloud platform, and is pre-trained on humongous datasets of human-generated text, such as books, websites, and social media. This allows the model to learn, train its neural networks, and map the patterns of human language. In response to any input it receives, it generates text that is similar in style to human-written text.
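For a feel of how such generative models are used in code, here is a minimal sketch with the Hugging Face transformers library, using the small open GPT-2 model as a stand-in (ChatGPT itself is available only as a hosted service, not as a downloadable model).

```python
# Generate text with a small pre-trained transformer (GPT-2 as a stand-in for GPT-style models).
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
result = generator("Data science is", max_length=30, num_return_sequences=1)
print(result[0]["generated_text"])
```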
Well, there are many more AI tools besides GPT: voice assistants, DALL-E, Jasper. Even "Clippy," the office assistant Microsoft released years ago, played a role loosely similar to GPT's. Lots of other tools have already been developed, and a lot more are on the way.
Looming Transition:
Since its release and widespread recognition, ChatGPT has captured the public's attention and generated significant excitement and interest. Big Tech companies have anticipated the immense, far-reaching potential of this multi-layered field. AI's ubiquity across the globe has been advancing steadily as well, with companies like Tesla now pushing the envelope to deploy these path-breaking tools at scale.
Tesla CEO Elon Musk showcased the latest version of his humanoid robot, Optimus, predicting that these machines could eventually outnumber humans and that the robotics business could surpass his renowned vehicle division.
In India alone, the formidable software sector is likely to lead the AI market, and many startups have laid their foundations on "Vision AI." India's AI market is expected to grow to US $7.8 billion by 2025 at a healthy CAGR of 20.2%, according to market intelligence firm IDC.
ChatGPT's popularity has ruffled many feathers, with rival Alphabet reportedly declaring a 'code red' and announcing an experimental AI service of its own, Bard, powered by its own language model, LaMDA. The competitive scramble is being felt across sectors.
DATA SCIENCE AND AI
Exploring the World of Data Science
IBM, a leader in the IT industry, has stated:
"Data is the new fuel, the new currency, the new economy."
Data Science is a rapidly growing field that has taken the world by storm. It involves the use of statistical and computational techniques to extract insights and knowledge from data. In today's data-driven world, data scientists play a critical role in helping organizations make informed decisions by uncovering patterns and relationships within large and complex data sets.
But what makes Data Science such a fascinating and exciting field to work in? It is the ability to bring together diverse skills in statistics, computer programming, mathematics, databases, and domain expertise to solve complex problems.
Data science is not just about collecting and analysing data; it is a problem-solving discipline. Data scientists use a combination of techniques to understand and make predictions about complex data, identify trends, and make data-driven decisions. They use statistical models, machine learning algorithms, and data visualization tools to analyse and interpret data, which allows them to identify patterns, correlations, and anomalies that are not apparent to the human eye.
Another important aspect of data science is data visualization. Its goal is to make the insights and patterns uncovered through analysis easier to understand. This involves graphical representations, such as charts, graphs, and maps, used to communicate the findings clearly. One of the biggest benefits of the field is its ability to automate decision making.
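As a small illustration, the matplotlib sketch below turns a handful of invented monthly sales figures into a bar chart; the numbers are placeholders, not real data.

```python
# Visualize a small (invented) monthly sales series as a bar chart.
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
sales = [120, 135, 150, 145, 170, 190]  # placeholder figures for illustration

plt.bar(months, sales, color="steelblue")
plt.title("Monthly Sales")
plt.xlabel("Month")
plt.ylabel("Units Sold")
plt.show()
```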
Machine learning and deep learning are also important components of data science. The former refers to machines learning from existing data without explicit human intervention; the latter uses artificial neural networks that enable machines to process large amounts of data by finding patterns in it.
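To make the contrast concrete, here is a minimal scikit-learn sketch that fits a classical model and a small neural network to the same synthetic data; it is illustrative only, not a benchmark.

```python
# Classical ML (logistic regression) vs. a small neural network (MLP) on the same data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for model in (LogisticRegression(max_iter=1000),
              MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)):
    score = model.fit(X_train, y_train).score(X_test, y_test)
    print(f"{type(model).__name__}: {score:.2f}")
```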
Business intelligence tools such as Power BI and Tableau are also widely used to present dashboards to stakeholders. Storytelling, alongside business knowledge, is an art every data scientist must master.
Let us glance at the various steps a data scientist works through:
Problem Definition: Problem definition is a crucial step in the data science process, where you define the problem you are trying to solve. It involves understanding the business or research objectives and identifying the key questions to be answered. It may also involve defining the scope of the project, including what type of data will be used, how much data is required, and what assumptions are to be made.
Effective problem definition requires close collaboration with the stakeholders to ensure that everyone is aligned on the objectives and goals.
Data Collection: Data collection is the core of the data science process; it involves gathering the relevant data you need to analyse in order to address the problem. The data collected must be accurate, comprehensive, and representative. Collection can draw on various sources, such as databases, web scraping, surveys, or public repositories. It is important to ensure the collected data is standardized and less prone to errors and inconsistency.
Some key considerations while collecting data include:
Data Quality: The data must be accurate, complete, and representative of the domain problem. Quality covers the completeness, accuracy, consistency, and timeliness of the data used for analysis; poor-quality data leads to incorrect insights and unreliable decisions.
Data Quantity: Enough data must be collected for the analysis to be statistically significant and yield meaningful insights. With the increasing availability of data, we have access to datasets containing millions or even billions of records; however, a large dataset is not necessarily useful for analysis or modelling. The data must be relevant to the problem.
Data Format: This refers to the structure and organization that allow the data to be analysed. Common formats include CSV, JSON, XML, and Excel (see the short pandas sketch after this list).
Data Storage: Choosing an appropriate storage solution is a basic ingredient, as it governs how data is managed and organized for future analysis. Commonly used options include relational databases, NoSQL stores such as MongoDB, and cloud storage.
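As a quick sketch of handling different formats, here is how pandas reads CSV, JSON, and Excel files into the same tabular structure; the file names are hypothetical placeholders.

```python
# Load data from several common formats into pandas DataFrames.
# File names below are hypothetical placeholders.
import pandas as pd

df_csv = pd.read_csv("sales.csv")       # comma-separated values
df_json = pd.read_json("events.json")   # JSON records
df_xlsx = pd.read_excel("budget.xlsx")  # Excel workbook (needs openpyxl installed)

print(df_csv.head())  # inspect the first few rows
```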
Types of data used in practice:
· Structured Data: data with a predefined format and organization; the most common examples are databases and spreadsheets.
· Semi-Structured Data: data that has some structure, but not as much as structured data; examples include email, text messages, and social media posts.
· Unstructured Data: data with no organised format, such as images, videos, audio, and sensor data.
· Time Series Data: data that tracks changes over time; stock market fluctuations are a live example.
· Geospatial Data: data associated with specific locations, such as GPS coordinates, addresses, and zip codes.
· Text Data: data represented as text, such as news articles, social media posts, and customer reviews.
· Network Data: data representing connections between objects, such as social networks, transportation networks, and communication networks.
Data Cleaning and Pre-processing: This involves identifying and correcting errors, removing duplicate values, standardizing data formats, treating missing values through imputation or deletion, identifying outliers, and handling skewed data. It is done to ensure the data is of high quality and fit for analysis.
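A minimal pandas sketch of these cleaning steps, run on an invented toy table:

```python
# Basic cleaning steps on a tiny invented dataset.
import pandas as pd

df = pd.DataFrame({
    "age":  [25, 25, None, 40, 200],          # a duplicate, a missing value, an outlier
    "city": ["NY", "NY", "LA", "SF", "LA"],
})

df = df.drop_duplicates()                         # remove duplicate rows
df["age"] = df["age"].fillna(df["age"].median())  # impute missing values
df = df[df["age"] < 120]                          # drop an implausible outlier
print(df)
```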
Data Engineering: Data engineering is critical for organizations that rely on data to make informed decisions and gain a competitive advantage. It is the process of transforming and organising data to make it suitable for analysis, and can involve tasks such as data transformation, data integration, and data modelling. Processing the data requires programming languages; the most widely used are Python and R.
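As a tiny illustration of data integration, here is a pandas sketch that joins two invented tables on a shared key and aggregates the result:

```python
# Integrate two (invented) tables on a shared customer key.
import pandas as pd

customers = pd.DataFrame({"cust_id": [1, 2, 3], "name": ["Asha", "Ben", "Chen"]})
orders = pd.DataFrame({"cust_id": [1, 1, 3], "amount": [250, 90, 400]})

# Left join keeps every customer, then aggregate order totals per customer.
merged = customers.merge(orders, on="cust_id", how="left")
totals = merged.groupby("name", as_index=False)["amount"].sum()
print(totals)
```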
Exploratory Data Analysis: EDA is the process of analysing and summarizing the characteristics of a dataset in order to surface insights and patterns. It is an important step in data science and is typically performed before any modelling or hypothesis testing.
Here are some common techniques used in EDA:
· Data visualization: creating visual representations of the data, such as bar plots, pie charts, heat maps, scatter plots, box plots, histograms, donut charts, and KPI meters.
· Descriptive statistics: summarizing the main characteristics of the data, such as the mean, median, and standard deviation, to build a better understanding of it (see the sketch after this list).
· Data cleaning: identifying and correcting errors, inconsistencies, and missing values so the data is accurate and complete.
· Data transformation: transforming the data to make it suitable for analysis, such as scaling and normalization, feature engineering, and encoding categorical variables.
· Data exploration: generating hypotheses and testing them by analysing the relationships between variables in the data.
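Here is a compact EDA sketch that combines descriptive statistics, a correlation check, and a quick plot, again using the built-in Iris data as a stand-in.

```python
# Quick EDA pass: summary statistics, correlations, and a histogram.
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

df = load_iris(as_frame=True).frame  # Iris as a pandas DataFrame

print(df.describe())  # mean, std, quartiles per column
print(df.corr())      # pairwise correlations between variables

df["sepal length (cm)"].hist(bins=20)
plt.title("Sepal Length Distribution")
plt.show()
```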
Model Building: Once the data has been cleaned, pre-processed, and analysed, it can be used to build predictive models. This involves selecting an appropriate algorithm, training the model on the collected data, and evaluating its performance. Machine learning and deep learning models are built at this stage of a data science project; several models are usually built and compared in order to pick the one with the best outcome, the highest efficiency, and the least error. These models aim to make predictions and identify relations with the past data.
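A minimal sketch of building and comparing candidate models with scikit-learn, on a built-in dataset:

```python
# Train several candidate models and compare held-out accuracy.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=5000),
    "random_forest": RandomForestClassifier(random_state=1),
}
for name, model in candidates.items():
    acc = model.fit(X_train, y_train).score(X_test, y_test)
    print(f"{name}: accuracy = {acc:.3f}")
```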
Deployment and Monitoring: Once a model has been developed, it needs to be deployed into a production environment and monitored to ensure that it continues to perform well over time. This can involve tasks such as monitoring data quality, monitoring model performance, and updating the model as needed.
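A minimal sketch of this deploy-and-monitor loop: persist the trained model with joblib, reload it as if in production, and compare live accuracy against an alert threshold (the threshold and data here are illustrative assumptions).

```python
# Persist a trained model, reload it, and monitor its live accuracy.
import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_live, y_train, y_live = train_test_split(X, y, random_state=7)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
joblib.dump(model, "model.joblib")      # "deploy": save the trained artefact

deployed = joblib.load("model.joblib")  # later, in production
live_accuracy = deployed.score(X_live, y_live)
print(f"Live accuracy: {live_accuracy:.2f}")
if live_accuracy < 0.90:                # illustrative alert threshold
    print("Performance degraded: consider retraining the model.")
```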
In conclusion, Data Science is a rapidly growing field that holds immense potential for businesses and organizations. With the ability to unlock new insights and improve decision making, Data Science has the power to transform the way we do business and make our world a better place. Whether you are a business leader, data analyst, or student, it is worth exploring the exciting world of Data Science and discovering the many opportunities it holds.
N.N CHOWDARY