Continuing our journey after passing the midway mark.
Case Study on the Boston House Pricing Dataset:
First, import the dataset from sklearn and load the data. Next, create a DataFrame and visualize it; the scatter plots are shown below. After that, we look at the relationships between the variables using correlation.
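A minimal sketch of that loading step, assuming a scikit-learn version that still ships load_boston (it was removed in scikit-learn 1.2); the column name 'Price' is the one used in the plots below.
import pandas as pd
from sklearn.datasets import load_boston

boston = load_boston()                       # Bunch with .data, .target, .feature_names
df = pd.DataFrame(boston.data, columns=boston.feature_names)
df['Price'] = boston.target                  # target column named 'Price', as used below
print(df.head())                             # quick look at the first rows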
●Correlation is a statistical measure that expresses the extent to which two variables are linearly related (meaning they change together at a constant rate). It's a common tool for describing simple relationships without making a statement about cause and effect.
●The correlation matrix is given by df.corr(); we can understand it visually using heatmap() from seaborn.
■A heat map is a two-dimensional representation of data in which values are represented by colors. A simple heat map provides an immediate visual summary of information. More elaborate heat maps allow the viewer to understand complex data sets.
●Python's Seaborn library is a widely popular data visualization library commonly used for data science and machine learning tasks. We plot the heatmap, and then move on to feature scaling.
We divide the data into training and testing sets, apply feature scaling, build a linear regression model, and check the results through visualization.
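A minimal sketch of that split -> scale -> fit workflow, assuming the df built above (the test size and random_state are illustrative choices, not the original notebook's).
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression

X = df.drop('Price', axis=1)                 # features
y = df['Price']                              # target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

scaler = StandardScaler()                    # feature scaling
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)     # reuse the same scaler on the test data

model = LinearRegression()
model.fit(X_train_scaled, y_train)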
Insights using visualization:
plt.scatter(df['CRIM'],df['Price'])
plt.xlabel("Crime Rate")
plt.ylabel("Price")
plt.title("Crime Rate vs Price")
plt.show()
plt.scatter(df['RM'],df['Price'])
plt.xlabel("Number of Rooms")
plt.ylabel("Price")
plt.title("Rooms vs Price")
plt.show()
Feature Analysis:
df.corr()
df.corr().round(2)                           # round correlations to two decimals for readability
import seaborn as sn
plt.figure(figsize=(10,6))
sn.heatmap(df.corr().round(2), annot=True)   # annotate each cell with its correlation value
plt.show()
Next come Model Construction, Feature Selection, Performance Metrics and Interpretation, and Model Deployment.
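A minimal sketch of the performance-metrics step, reusing the model and scaled test data from the sketch above (those variable names are assumptions).
from sklearn.metrics import mean_squared_error, r2_score

y_pred = model.predict(X_test_scaled)
print("R^2 :", r2_score(y_test, y_pred))             # closer to 1 is better
print("MSE :", mean_squared_error(y_test, y_pred))   # lower is better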
Flask Web Framework: Flask is a web framework that offers conventions and configuration for creating web apps.
The Flask application's routes can be accessed using the provided URLs.
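A minimal sketch of such a Flask app; the / and /data routes are assumptions chosen only so the requests examples below have something to call.
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/")
def home():
    return "Hello from Flask!"

@app.route("/data")
def data():
    return jsonify({"message": "sample JSON payload"})

if __name__ == "__main__":
    app.run(debug=True)                      # serves on http://127.0.0.1:5000 by default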
The requests library can be used to connect to the Flask application programmatically.
import requests
url = "http://127.0.0.1:5000"                # the Flask dev server runs over HTTP, not HTTPS
data = requests.get(url)
data.text                                    # raw response body from the home route
url = "http://127.0.0.1:5000/data"
data = requests.get(url)
data.text
data.json()                                  # parse the JSON response
Now we come to the main part of machine learning:
MODEL REPRESENTATION and MODEL DEPLOYMENT
The first step is to convert the model into a pickle file.
●pickle: the pickle module lets us save our ML models. We dump the trained model into a pickle file and then move on to the predictions part. The next step is to save the script as an IDLE (.py) file on the desktop, and we wire it up using Flask.
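A minimal sketch of the pickling step (the file name regmodel.pkl and the variable model are assumptions carried over from the earlier sketch).
import pickle

with open("regmodel.pkl", "wb") as f:        # dump the trained model to disk
    pickle.dump(model, f)

with open("regmodel.pkl", "rb") as f:        # later, load it back for predictions
    loaded_model = pickle.load(f)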
FLASK: Flask is a web micro-framework developed by Armin Ronacher. We create a simple Flask object, define the routing for it, and run it in a browser.
We worked on MODEL DEPLOYMENT: we create a new file, import the modules, load the pickle files, and define the routing. In the Python shell we pass the Boston housing dataset columns as a dictionary and save the result.
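A hedged sketch of what such a deployment app could look like; the file names, the /predict_api route, and the JSON format are illustrative assumptions, not the exact internship code.
import pickle
import numpy as np
from flask import Flask, request, jsonify

app = Flask(__name__)
model = pickle.load(open("regmodel.pkl", "rb"))
scaler = pickle.load(open("scaling.pkl", "rb"))      # assumed pickled StandardScaler

@app.route("/predict_api", methods=["POST"])
def predict_api():
    data = request.json["data"]                      # dict of column name -> value
    values = np.array(list(data.values())).reshape(1, -1)
    prediction = model.predict(scaler.transform(values))
    return jsonify({"prediction": float(prediction[0])})

if __name__ == "__main__":
    app.run(debug=True)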
Next we open PythonAnywhere, a platform anyone can use; it is free and fast.
STEPS TO FOLLOW ON PYTHONANYWHERE
1. Click on Files.
2. Create a new directory; name it anything you want.
3. Upload your files into that directory.
4. Create a templates folder inside the directory you created.
5. Upload the HTML files into the templates folder.
6. Click on Web.
7. Add a new web app.
8. Click Next -> choose Flask -> choose the Python version you have.
9. Change the folder name and file name in the link they give you.
10. Open the link that was created and paste in the code from the model file.
11. Save that code.
12. Reload the web app.
13. Click on the link.
Finally, we can deploy our model. URL: https://lnkd.in/gGxR9Enm
WEB SCRAPING:
●Web scraping: Web scraping is the process of using bots to extract content and data from a website. Unlike screen scraping, which only copies pixels displayed on screen, web scraping extracts the underlying HTML code and, with it, data stored in a database.
There are mainly 4 steps in web scraping:
●Connect to the web page at the URL using the requests module.
●Fetch and parse the data using BeautifulSoup and store it in dicts/lists.
■BEAUTIFULSOUP: Beautiful Soup is a Python library used for web scraping to pull data out of HTML and XML files. It creates a parse tree from the page source code that can be used to extract data in a hierarchical and more readable manner.
●Analyze the HTML tags and their attributes.
●Store the data in .xlsx/.csv files.
First we import requests and bs4, and from bs4 we import BeautifulSoup.
Next we pick the URL from makaan.com that we want to scrape.
Then we fetch and parse the data using BeautifulSoup.
Next we extract the individual entries; right-clicking on the web page --> Inspect shows the tags to target.
Finally we save the data in a .csv file, as in the sketch below.
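A hedged sketch of that scraping flow; the listings URL and the "listing-title" class name are placeholders, since the real selectors come from inspecting the page as described above.
import csv
import requests
from bs4 import BeautifulSoup

url = "https://www.makaan.com/hyderabad-residential-property/buy-property-in-hyderabad-city"  # placeholder listings page
response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"})
soup = BeautifulSoup(response.text, "html.parser")

rows = []
for card in soup.find_all("div", class_="listing-title"):   # hypothetical selector
    rows.append([card.get_text(strip=True)])

with open("listings.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["title"])
    writer.writerows(rows)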
●Logistic regression is a data analysis technique that uses mathematics to find the relationships between two data factors. It then uses this relationship to predict the value of one of those factors based on the other. The prediction usually has a finite number of outcomes, like yes or no.
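A minimal sketch of logistic regression with scikit-learn on toy data (not the internship dataset): the model predicts a class label and the probability behind it.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression()
clf.fit(X_train, y_train)
print(clf.predict(X_test[:5]))               # predicted class labels (0 or 1)
print(clf.predict_proba(X_test[:5]))         # probabilities behind those labels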
●map() function:
map() in Python applies a function to every item of an iterable (tuple, list, etc.) and returns an iterator over the results.
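A quick example:
prices = [10.0, 12.5, 9.9]
doubled = list(map(lambda p: p * 2, prices))   # apply the lambda to every item
print(doubled)                                 # [20.0, 25.0, 19.8]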
●Bias-variance trade-off:
the property of a model whereby the variance of the parameter estimates across samples can be reduced by increasing the bias in the estimated parameters.
DECISION TREE: A decision tree is a tree-like structure that represents a series of decisions and their possible consequences. It is used in machine learning for classification and regression tasks. An example of a decision tree is a flowchart that helps a person decide what to wear based on the weather conditions.
-->Root node: the top node of the decision tree.
-->Decision node: a node in the tree that represents a decision (a test on a feature).
-->Leaf node: a terminal node that holds the final answer.
-->Feature: a variable used to make a decision.
-->Target: the variable the tree predicts; its value is read off at a leaf node.
Statistical terms:
1. Entropy --> disorder in the data.
Entropy = -Σ_{i=1..K} p_i · log(p_i)
where p_i is the probability of class i and K is the number of classes.
2. Information gain --> a measure of how much information is gained after splitting the dataset.
Information gain = Entropy(parent) - Σ_{i=1..k} (n_i / n) · Entropy(child_i)
3. Gini index --> a measure of impurity in a dataset; the Gini index should be low for a better split.
Gini index = 1 - Σ_{i=1..K} (p_i)^2
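A small sketch that computes these measures from class probabilities (the helper names are illustrative, not from the internship code).
import math

def entropy(probs):
    # disorder: 0 for a pure node, maximal when classes are equally likely
    return -sum(p * math.log2(p) for p in probs if p > 0)

def gini(probs):
    # impurity: 0 for a pure node
    return 1 - sum(p ** 2 for p in probs)

print(entropy([0.5, 0.5]))   # 1.0 -> maximum disorder for two classes
print(gini([0.5, 0.5]))      # 0.5
print(entropy([1.0]))        # 0.0 -> a pure node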
Parameters:
1. criterion --> the function used to measure the impurity/quality of a split (gini & entropy for classification; mse & mae for regression).
2. max_depth --> the maximum depth the decision tree can have (None or any integer).
3. min_samples_split --> the minimum number of samples required to split an internal node (default 2; any integer value).
4. min_samples_leaf --> the minimum number of samples required at a leaf node (default 1; any integer value).
5. max_features --> the number of features to consider when looking for the best split (None, any integer, "sqrt", or "log2").
Using these parameters we plotted a decision tree for the mushroom dataset; a sketch of that step is below.
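A hedged sketch of that step; the CSV file name, the 'class' target column, and the label-encoding approach are assumptions about the mushroom dataset, not the exact internship code.
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier, plot_tree

mushrooms = pd.read_csv("mushrooms.csv")                  # assumed file name
encoded = mushrooms.apply(LabelEncoder().fit_transform)   # encode every categorical column

X = encoded.drop("class", axis=1)        # 'class' assumed to be the edible/poisonous target
y = encoded["class"]

tree = DecisionTreeClassifier(criterion="entropy", max_depth=3)   # illustrative parameters
tree.fit(X, y)

plt.figure(figsize=(12, 6))
plot_tree(tree, feature_names=list(X.columns), filled=True)
plt.show()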
Learning & Exploring the new concepts is Exciting. Looking ahead for more to learn from the mentor?Saketh Kallepu?.
Code link: https://lnkd.in/ggs-pi7p
Mushroom dataset link: https://lnkd.in/gj8uptfZ
These are the things I have learnt so far in my 3rd week of the machine learning internship at Codegnan. I am grateful to our mentor Saketh Kallepu for providing me with this invaluable opportunity. I look forward to the remaining weeks of the internship, eager to learn, grow, and make the most of this remarkable learning experience.