Archive of Machine Learning Projects: Dr Ratika Datta
Dr. Ratika Datta
Financial Economist Doctorate | Expert Advisor and Project Manager, Advisory, Big 4 and Ministry | Authorship | Research Advisor with XIMB Executive MBA | Data Scientist | Jesus & Mary | Xaviers and DU-PU Alumni
Machine Learning Models Worked On in Python: Dr Ratika Datta
(Libraries common to all ML models: mainly NumPy, Pandas, Matplotlib and Scikit-learn, besides the others mentioned.)
1. Future Sales Prediction: Sales pitches are made through Television, Radio and Newspaper. The analysis identifies which medium has the strongest effect on sales.
Model Used: Linear Regression
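A minimal sketch of how such a comparison can be read off a fitted Linear Regression. The data below is synthetic and the coefficients in `true_coef` are assumed purely for illustration; they are not results from the actual sales dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the advertising data (NOT the real dataset).
# Columns: spend on TV, Radio and Newspaper.
rng = np.random.default_rng(0)
X = rng.uniform(0, 100, size=(200, 3))

# Assumed relationship, chosen only for this illustration.
true_coef = np.array([0.05, 0.20, 0.01])
y = 3.0 + X @ true_coef + rng.normal(0, 0.5, size=200)

model = LinearRegression().fit(X, y)

media = ["TV", "Radio", "Newspaper"]
for name, coef in zip(media, model.coef_):
    print(f"{name}: {coef:.3f} extra sales per unit of spend")

# The medium with the largest coefficient has the largest estimated effect.
strongest = media[int(np.argmax(model.coef_))]
print("Largest estimated effect:", strongest)
```

Comparing raw coefficients is only fair when the three spends are measured in the same units, as they are in this sketch.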
2. Fake News Detection: News features such as Title, Datetime span and Label (whether the news is marked Fake or Real) are listed. What is the aim? Based on the dataset features, the model predicts whether a new news item is Fake or Real, based on Adaptive Expectations.
Model Used: Naïve Bayes
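The idea can be sketched with a bag-of-words Naïve Bayes pipeline. The six titles below are invented toy examples, and using only the title text as the feature is an assumption made for brevity.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented toy titles (NOT the real dataset).
titles = [
    "Scientists confirm miracle pill cures everything overnight",
    "Shocking secret the government hides from you",
    "Aliens endorse miracle diet, shocking photos inside",
    "Central bank raises interest rate by 25 basis points",
    "City council approves budget for new school",
    "Parliament passes budget after interest rate debate",
]
labels = ["FAKE", "FAKE", "FAKE", "REAL", "REAL", "REAL"]

# Word counts feed a multinomial Naive Bayes classifier.
clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(titles, labels)

prediction = clf.predict(["Shocking miracle secret revealed"])[0]
print(prediction)
```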
3. Travel Insurance Prediction: Based on Age, type of employment, Gender, Income Level, dependants or family members, Graduate or not, frequent flyer or not, ever travelled abroad, etc., the dataset labels each person 1 if travel insurance was purchased and 0 if not. What is the purpose? Given the features mentioned, the model tells whether an additional person is the right one to pitch travel insurance to.
Model Used: Decision Tree Classifier
4. Uber Trip Analysis for New York City: The dataset includes the Latitude and Longitude of top regular clients, Gender, etc. From Datetime, the weekday, the hour and the time of the highest Uber trip cost are evaluated. Gender is used to categorise who uses Uber more. Based on weekday, the busiest day is analysed. Based on hour, the peak time of Uber usage is identified.
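A short sketch of the weekday and hour breakdown with Pandas. The timestamps are hypothetical and the column name `Date/Time` is an assumption about how the pickup time is stored.

```python
import pandas as pd

# Hypothetical pickup timestamps (NOT the real dataset).
trips = pd.DataFrame({
    "Date/Time": pd.to_datetime([
        "2014-09-01 08:15", "2014-09-01 17:40", "2014-09-02 17:05",
        "2014-09-05 17:30", "2014-09-05 18:10", "2014-09-05 23:45",
    ])
})

# Derive the weekday name and hour of day from the timestamp.
trips["weekday"] = trips["Date/Time"].dt.day_name()
trips["hour"] = trips["Date/Time"].dt.hour

# The most frequent weekday and hour are the busiest ones.
busiest_day = trips["weekday"].value_counts().idxmax()
busiest_hour = trips["hour"].value_counts().idxmax()
print(busiest_day, busiest_hour)
```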
5. Water Potability Analysis: 0 means the water is not potable and 1 means it is potable. Water quality is assessed across pH level, Hardness, Turbidity, Solids, Sulphates, Chloramines, Conductivity, Organic Carbon, Trihalomethanes, etc., and the water is categorised accordingly.
Model Used: Random Forest Classifier
6. Iris Flower Classification Analysis: Based on flower petal width, size and length, each flower is categorised into one of the three types present in the data.
Model Used: KNN (K-Nearest Neighbours) Classifier
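A minimal sketch using the copy of the Iris data bundled with Scikit-learn. Note that this bundled version uses four measurements (sepal and petal length and width); the split ratio and `n_neighbors=5` are assumptions, not the settings of the original project.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()

# Hold out 30% of the flowers for testing, keeping class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42, stratify=iris.target
)

# Each test flower gets the majority label of its 5 nearest neighbours.
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
accuracy = knn.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```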
7. Sentiment Analysis across Tweets and Word Cloud: Based on the rating or categorisation of tweets, any topic can be classed as positive, negative or neutral. The polarity of the tweet words is checked visually in a Word Cloud. A rating below 3 is negative, a rating equal to 3 is neutral, and any rating above 3 is positive. The Word Cloud is then used to cross-check the polarity in the tweets, and the overall score is checked for an effective analysis of sentiment and ratings.
Model Used: Decision Tree Classifier, NLTK (stopwords), Word Cloud, etc.
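The rating thresholds described above can be sketched as a small helper. The function name `rating_to_sentiment` and the sample ratings are hypothetical.

```python
from collections import Counter


def rating_to_sentiment(rating):
    """Map a rating to a label: below 3 negative, 3 neutral, above 3 positive."""
    if rating < 3:
        return "negative"
    if rating == 3:
        return "neutral"
    return "positive"


# Hypothetical ratings, for illustration only.
ratings = [1, 5, 3, 4, 2, 5]
sentiments = [rating_to_sentiment(r) for r in ratings]

# Overall tally of sentiment labels.
counts = Counter(sentiments)
print(counts)
```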
8. Language Detection: For example, a tweet is written in 22 different languages. The prediction for any other sentence gives the language in use.
Model Used: Natural Language Processing task, Multinomial Naïve Bayes
9. Covid-19 Vaccine Analysis: An analysis of which combinations of vaccines were effective in which countries.
10. Electricity Price Prediction: Based on Date, Time, Holiday or weekday, forecast Wind Speed, the CO2 intensity of electricity production, the actual and forecasted national load, and heat or temperature, the actual price of electricity is predicted, based on the reduction in the cost of production or break-even in pricing.
Model Used: Random Forest Regressor
11. Bank Churn Analysis: Based on CIBIL Score, Age, Employment, whether the client has a credit card, gender and geography, the analysis determines whether a client exited the bank, which feeds the bank churn calculation.
Model Used: Decision Tree, Random Forest Classifier, Gradient Boosting and Logistic Regression; the accuracy score of each is checked.
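A hedged sketch of comparing the four models by accuracy score. The data comes from `make_classification` as a synthetic stand-in for the churn features, and all hyperparameters are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data standing in for the churn features (NOT real data).
X, y = make_classification(
    n_samples=500, n_features=6, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "Logistic Regression": LogisticRegression(max_iter=1000),
}

# Fit every model on the same split and record its test accuracy.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {scores[name]:.3f}")

best = max(scores, key=scores.get)
print("Best by accuracy:", best)
```

When exits are rare relative to retained clients, accuracy alone can flatter a model, so a confusion matrix is a useful companion check.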
12. Google Word Search Analysis: For example, for the search term *Machine Learning*, which are the top 10 countries with the most Google searches for it? Using the library Pytrends, usage of the same term is also analysed across different time periods.
Library Used: Pytrends, with data visualisations
13. Car Price Prediction Based on Car Features:
Based on Horsepower, Engine Location, Engine Size, Door Number, Car Height, Car Weight, Car Length, Car Mileage, Car Name, Car Symboling and Car Fuel Economy. Car price is also binned via a histogram into Low, Medium and High price ranges. Fuel here is mainly Gas and Diesel. Stroke, Wheel RPM, Compression Ratio, etc. are the other variables analysed. The car price prediction model then predicts the car price for the test data set with a good fit.
Model Used: Decision Tree Classifier
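The Low, Medium and High binning can be sketched with `pd.cut`. The prices are hypothetical, and equal-width bins are an assumption, since the exact bin edges used in the project are not stated.

```python
import pandas as pd

# Hypothetical car prices (NOT the real dataset).
prices = pd.Series([5500, 7200, 9800, 13500, 16900, 22000, 31000, 45400])

# Split the price range into three equal-width bins and name them.
price_range = pd.cut(prices, bins=3, labels=["Low", "Medium", "High"])

print(pd.DataFrame({"price": prices, "range": price_range}))
```

The resulting category column can then serve as the target for a classifier such as a Decision Tree.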
14. Classification Algorithm:
The dataset can be classified using Logistic Regression, Decision Tree Classifier, Passive Aggressive Classifier and KNN Classifier. The accuracy score is checked to find the best classifier on the Social Purchased Data, across the variables.
15. Orders Prediction:
Based on Store Location, Region Code, Store Workday, Location Type, orders placed, sales made, date of purchase, discount or not, and holiday or not, the future order count is predicted. What's the aim? To predict orders from the features given.
Model Used: LightGBM
16. Student Grade Prediction at the Graduate Level:
Based on G1, G2, failures, absences, attendance and self-study time, the model can predict G3 on the test data with a good accuracy score.
Model Used: Linear Regression
17. Waiter's Tip Prediction:
Based on whether the order is at Lunch or Dinner, whether the customer is Male or Female, Smoker or non-smoker, the number of people at the table, the day of the week and the total bill, the waiter's tip can be predicted.
Model Used: Linear Regression
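A minimal sketch of how the categorical variables can be one-hot encoded before fitting a Linear Regression. The six rows are invented, with the tip set to exactly 15% of the bill for illustration, and the column names are assumptions.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Invented rows (NOT the real dataset); tip is exactly 15% of total_bill.
tips = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
    "size": [1, 2, 2, 3, 4, 4],
    "time": ["Lunch", "Dinner", "Lunch", "Dinner", "Dinner", "Lunch"],
    "smoker": ["No", "Yes", "No", "No", "Yes", "No"],
    "tip": [1.5, 3.0, 4.5, 6.0, 7.5, 9.0],
})

# Turn text categories into 0/1 columns, dropping one level per category.
X = pd.get_dummies(tips.drop(columns="tip"), drop_first=True, dtype=float)
y = tips["tip"]
model = LinearRegression().fit(X, y)

# Encode a new order the same way and align it to the training columns.
new_order = pd.DataFrame({
    "total_bill": [25.0], "size": [2], "time": ["Dinner"], "smoker": ["No"],
})
new_X = pd.get_dummies(new_order, dtype=float).reindex(
    columns=X.columns, fill_value=0
)
predicted_tip = model.predict(new_X)[0]
print(f"Predicted tip: {predicted_tip:.2f}")
```

The `reindex` step matters because a single new order rarely contains every category level seen during training.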
18. Unemployment Trend Analysis after COVID in India:
Unemployment is calculated across India's states immediately after Covid. The variables under study include Gender, Unemployment Rate (state-wise), Labour Force Participation Rate, Latitude and Longitude, and People Employed. A Sunburst diagram helps in reading the actual trend.
19. Computer Vision Library for Counting Images in JPEGs Dynamically:
A Computer Vision library is used to count images in JPEGs dynamically; the same approach is used in self-driving cars. Because it is a heavy library, understanding the code is essential, but avoiding a run on the local system is pivotal.
Work in Progress:
20. Image Classification with PyTorch
21. Spam Detection: Using Naïve Bayes classifiers, the model decides whether a particular SMS is spam or not. Based on label, datetime and title, and also a Word Cloud, spam detection is done for new SMS messages too, predicting spam or non-spam.
22. Image recognition with TensorFlow and Keras, based on a Deep Learning model.
*****************************************************************************
Self-drafted
Dr Ratika Datta
(Aiming at simplifying practicality across Economics, Finance, Analytics and Econometrics)