Continuing our journey after passing the midway mark.
Case Study on the Boston House Pricing Dataset:
First, import the dataset from sklearn and load the data. Next, create a DataFrame and visualize it; the scatter plots are shown below. After that, we look at the relationships between the variables using correlation.
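A minimal sketch of that loading step, assuming a scikit-learn version that still ships load_boston (it was removed in scikit-learn 1.2); the column name 'Price' is the one used in the plots below.
import pandas as pd
from sklearn.datasets import load_boston

boston = load_boston()                       # Bunch with .data, .target, .feature_names
df = pd.DataFrame(boston.data, columns=boston.feature_names)
df['Price'] = boston.target                  # target column named 'Price', as used below
print(df.head())                             # quick look at the first rows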
●Correlation is a statistical measure that expresses the extent to which two variables are linearly related (meaning they change together at a constant rate). It's a common tool for describing simple relationships without making a statement about cause and effect.
●The correlation matrix is given by df.corr(); we can understand it visually using heatmap() from seaborn.
■A heat map is a two-dimensional representation of data in which values are represented by colors. A simple heat map provides an immediate visual summary of information. More elaborate heat maps allow the viewer to understand complex data sets.
●Python's Seaborn library is a widely popular data visualization library commonly used for data science and machine learning tasks. We plot the heatmap, and then move on to feature scaling.
We divide the data into training and testing sets, apply feature scaling, build a linear regression model, and check the results through visualization.
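A minimal sketch of that split -> scale -> fit workflow, assuming the df built above (the test size and random_state are illustrative choices, not the original notebook's).
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression

X = df.drop('Price', axis=1)                 # features
y = df['Price']                              # target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

scaler = StandardScaler()                    # feature scaling
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)     # reuse the same scaler on the test data

model = LinearRegression()
model.fit(X_train_scaled, y_train)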
Insights using visualization:
plt.scatter(df['CRIM'],df['Price'])
plt.xlabel("Crime Rate")
plt.ylabel("Price")
plt.title("Crime Rate vs Price")
plt.show()
plt.scatter(df['RM'],df['Price'])
plt.xlabel("Number of Rooms")
plt.ylabel("Price")
plt.title("Rooms vs Price")
plt.show()
Feature Analysis:
df.corr()
df.corr().round(2)                           # round correlations to two decimals for readability
import seaborn as sn
plt.figure(figsize=(10,6))
sn.heatmap(df.corr().round(2), annot=True)   # annotate each cell with its correlation value
plt.show()
Next come Model Construction, Feature Selection, Performance Metrics and Interpretation, and Model Deployment.
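A minimal sketch of the performance-metrics step, reusing the model and scaled test data from the sketch above (those variable names are assumptions).
from sklearn.metrics import mean_squared_error, r2_score

y_pred = model.predict(X_test_scaled)
print("R^2 :", r2_score(y_test, y_pred))             # closer to 1 is better
print("MSE :", mean_squared_error(y_test, y_pred))   # lower is better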
Flask Web Framework: Flask is a web framework that offers conventions and configuration for creating web apps.
The Flask application's routes can be accessed using the provided URLs.
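A minimal sketch of such a Flask app; the / and /data routes are assumptions chosen only so the requests examples below have something to call.
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/")
def home():
    return "Hello from Flask!"

@app.route("/data")
def data():
    return jsonify({"message": "sample JSON payload"})

if __name__ == "__main__":
    app.run(debug=True)                      # serves on http://127.0.0.1:5000 by default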
The requests library can be used to connect to the Flask application programmatically.
import requests
url = "http://127.0.0.1:5000"                # the Flask dev server runs over HTTP, not HTTPS
data = requests.get(url)
data.text                                    # raw response body from the home route
url = "http://127.0.0.1:5000/data"
data = requests.get(url)
data.text
data.json()                                  # parse the JSON response
Now we come to the main part of machine learning:
MODEL REPRESENTATION and MODEL DEPLOYMENT
The first step is to convert the model into a pickle file.
●pickle: the pickle module lets us save our ML models. We dump the trained model into a pickle file and then move on to the predictions part. The next step is to save the script as an IDLE (.py) file on the desktop, and we wire it up using Flask.
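A minimal sketch of the pickling step (the file name regmodel.pkl and the variable model are assumptions carried over from the earlier sketch).
import pickle

with open("regmodel.pkl", "wb") as f:        # dump the trained model to disk
    pickle.dump(model, f)

with open("regmodel.pkl", "rb") as f:        # later, load it back for predictions
    loaded_model = pickle.load(f)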
FLASK: Flask is a web micro-framework developed by Armin Ronacher. We create a simple Flask object, define the routing for it, and run it in a browser.
We worked on MODEL DEPLOYMENT: we create a new file, import the modules, load the pickle files, and define the routing. In the Python shell we pass the Boston housing dataset columns as a dictionary and save the result.
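A hedged sketch of what such a deployment app could look like; the file names, the /predict_api route, and the JSON format are illustrative assumptions, not the exact internship code.
import pickle
import numpy as np
from flask import Flask, request, jsonify

app = Flask(__name__)
model = pickle.load(open("regmodel.pkl", "rb"))
scaler = pickle.load(open("scaling.pkl", "rb"))      # assumed pickled StandardScaler

@app.route("/predict_api", methods=["POST"])
def predict_api():
    data = request.json["data"]                      # dict of column name -> value
    values = np.array(list(data.values())).reshape(1, -1)
    prediction = model.predict(scaler.transform(values))
    return jsonify({"prediction": float(prediction[0])})

if __name__ == "__main__":
    app.run(debug=True)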
Next we open PythonAnywhere, a platform anyone can use; it is free and fast.
STEPS TO FOLLOW ON PYTHONANYWHERE
1. Click on Files.
2. Create a new directory; name it anything you want.
3. Upload your files into that directory.
4. Create a templates folder inside the directory you created.
5. Upload the HTML files into the templates folder.
6. Click on Web.
7. Add a new web app.
8. Click Next -> choose Flask -> choose the Python version you have.
9. Change the folder name and file name in the link they give you.
10. Open the link that was created and paste in the code from the model file.
11. Save that code.
12. Reload the web app.
13. Click on the link.
Finally, we can deploy our model. URL: https://lnkd.in/gGxR9Enm
WEB SCRAPING:
●Web scraping: Web scraping is the process of using bots to extract content and data from a website. Unlike screen scraping, which only copies pixels displayed on screen, web scraping extracts the underlying HTML code and, with it, data stored in a database.
There are mainly 4 steps in web scraping:
●Connect to the web page at the URL using the requests module.
●Fetch and parse the data using BeautifulSoup and store it in dicts/lists.
■BEAUTIFULSOUP: Beautiful Soup is a Python library used for web scraping to pull data out of HTML and XML files. It creates a parse tree from the page source code that can be used to extract data in a hierarchical and more readable manner.
●Analyze the HTML tags and their attributes.
●Store the data in .xlsx/.csv files.
First we import requests and bs4, and from bs4 we import BeautifulSoup.
Next we pick the URL from makaan.com that we want to scrape.
Then we fetch and parse the data using BeautifulSoup.
Next we extract the individual entries; right-clicking on the web page --> Inspect shows the tags to target.
Finally we save the data in a .csv file, as in the sketch below.
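A hedged sketch of that scraping flow; the listings URL and the "listing-title" class name are placeholders, since the real selectors come from inspecting the page as described above.
import csv
import requests
from bs4 import BeautifulSoup

url = "https://www.makaan.com/hyderabad-residential-property/buy-property-in-hyderabad-city"  # placeholder listings page
response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"})
soup = BeautifulSoup(response.text, "html.parser")

rows = []
for card in soup.find_all("div", class_="listing-title"):   # hypothetical selector
    rows.append([card.get_text(strip=True)])

with open("listings.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["title"])
    writer.writerows(rows)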
●Logistic regression is a data analysis technique that uses mathematics to find the relationships between two data factors. It then uses this relationship to predict the value of one of those factors based on the other. The prediction usually has a finite number of outcomes, like yes or no.
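A minimal sketch of logistic regression with scikit-learn on toy data (not the internship dataset): the model predicts a class label and the probability behind it.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression()
clf.fit(X_train, y_train)
print(clf.predict(X_test[:5]))               # predicted class labels (0 or 1)
print(clf.predict_proba(X_test[:5]))         # probabilities behind those labels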
●map() function:
map() in Python applies a function to every item of an iterable (tuple, list, etc.) and returns an iterator over the results.
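A quick example:
prices = [10.0, 12.5, 9.9]
doubled = list(map(lambda p: p * 2, prices))   # apply the lambda to every item
print(doubled)                                 # [20.0, 25.0, 19.8]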
●Bias-variance trade-off:
the property of a model whereby the variance of the parameter estimates across samples can be reduced by increasing the bias in the estimated parameters.
DECISION TREE: A decision tree is a tree-like structure that represents a series of decisions and their possible consequences. It is used in machine learning for classification and regression tasks. An example of a decision tree is a flowchart that helps a person decide what to wear based on the weather conditions.
-->Root node: the top node of the decision tree.
-->Decision node: a node in the tree that represents a decision (a test on a feature).
-->Leaf node: a terminal node that holds the final answer.
-->Feature: a variable used to make a decision.
-->Target: the variable the tree predicts; its value is read off at a leaf node.
Statistical terms:
1. Entropy --> disorder in the data.
Entropy = -Σ_{i=1..K} p_i · log(p_i)
where p_i is the probability of class i and K is the number of classes.
2. Information gain --> a measure of how much information is gained after splitting the dataset.
Information gain = Entropy(parent) - Σ_{i=1..k} (n_i / n) · Entropy(child_i)
3. Gini index --> a measure of impurity in a dataset; the Gini index should be low for a better split.
Gini index = 1 - Σ_{i=1..K} (p_i)^2
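A small sketch that computes these measures from class probabilities (the helper names are illustrative, not from the internship code).
import math

def entropy(probs):
    # disorder: 0 for a pure node, maximal when classes are equally likely
    return -sum(p * math.log2(p) for p in probs if p > 0)

def gini(probs):
    # impurity: 0 for a pure node
    return 1 - sum(p ** 2 for p in probs)

print(entropy([0.5, 0.5]))   # 1.0 -> maximum disorder for two classes
print(gini([0.5, 0.5]))      # 0.5
print(entropy([1.0]))        # 0.0 -> a pure node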
Parameters:
1. criterion --> the function used to measure the impurity/quality of a split (gini & entropy for classification; mse & mae for regression).
2. max_depth --> the maximum depth the decision tree can have (None or any integer).
3. min_samples_split --> the minimum number of samples required to split an internal node (default 2; any integer value).
4. min_samples_leaf --> the minimum number of samples required at a leaf node (default 1; any integer value).
5. max_features --> the number of features to consider when looking for the best split (None, any integer, "sqrt", or "log2").
Using these parameters we plotted a decision tree for the mushroom dataset; a sketch of that step is below.
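A hedged sketch of that step; the CSV file name, the 'class' target column, and the label-encoding approach are assumptions about the mushroom dataset, not the exact internship code.
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier, plot_tree

mushrooms = pd.read_csv("mushrooms.csv")                  # assumed file name
encoded = mushrooms.apply(LabelEncoder().fit_transform)   # encode every categorical column

X = encoded.drop("class", axis=1)        # 'class' assumed to be the edible/poisonous target
y = encoded["class"]

tree = DecisionTreeClassifier(criterion="entropy", max_depth=3)   # illustrative parameters
tree.fit(X, y)

plt.figure(figsize=(12, 6))
plot_tree(tree, feature_names=list(X.columns), filled=True)
plt.show()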
Learning & Exploring the new concepts is Exciting. Looking ahead for more to learn from the mentor?Saketh Kallepu?.
Code link: https://lnkd.in/ggs-pi7p
Mushroom dataset link: https://lnkd.in/gj8uptfZ
These are the things I have learnt so far in my 3rd week of the machine learning internship at Codegnan. I am grateful to our mentor Saketh Kallepu for providing me with this invaluable opportunity. I look forward to the remaining weeks of the internship, eager to learn, grow, and make the most of this remarkable learning experience.