ADVANCED ANALYTICAL PREDICTION SURVEY ON BREATHING CAPACITY WITH IBM WATSON
Saravanakumar Sekar
Senior Data Scientist | Digital Transformation Expert | AI and Machine Learning | NLP | Ex-Allstate | Insurance | Auto Liability & Casualty | 3AI PINNACLE AWARDS - AI & Analytics Rising Star |
KEY WORDS: IBM Watson, Breathing Capacity Prediction, Machine Learning, Health Care
We performed Breathing Capacity prediction using IBM Watson for advanced analytics, aiming to build the best Machine Learning model with less data and higher accuracy.
I. Introduction
Here I build a Machine Learning model to predict Breathing Capacity and help reduce the problems faced by the Health Care sector, one of the most important domains in the world. Health issues affect about one third of the world's population. Health Care data is often scarce, so we need ways to solve small health care problems using only a small amount of data. That is why I chose IBM Watson for Breathing Capacity Prediction: very little data is available on Breathing Capacity, and IBM Watson can build machine learning models from a small amount of data while still giving high accuracy.
II. Objective
Our objective is to predict the Breathing Capacity of sportspeople, athletes, patients, and the general public, using data analysis techniques to build an accurate Machine Learning model in IBM Watson.
III. Walk Through With Machine Learning
Machine learning is a field of computer science that uses statistical techniques to give computer systems the ability to "learn" (that is, progressively improve performance on a specific task) from data, without being explicitly programmed. The name machine learning was coined in 1959 by Arthur Samuel, who was working at IBM at the time. Machine learning evolved from the study of pattern recognition and computational learning theory in artificial intelligence, and it explores the study and construction of algorithms that can learn from and make predictions on data.
IV. Walk Through With IBM Watson
i) What’s Watson?
IBM Watson brings together the latest innovations in machine learning, and it lets you learn more with less data. You can integrate AI into your most important business processes, informed by IBM's rich industry expertise. You can build models from scratch, or leverage IBM's APIs and pre-trained business solutions. No matter how you use Watson, your data and insights belong to you, and only you.
ii) Why Watson?
Only Watson gives you complete control of what's important to you. With Watson on the IBM Cloud, you maintain ownership of your data, insights, training, and IP. Your business processes get smarter with Watson. From healthcare and education to finance, transportation, and energy, Watson is trained by leading experts in your field, so you can quickly embed it into your existing workflows. Watson understands the language of your industry and taps into deep domain knowledge to help you make more informed decisions faster. Your data is valuable, no matter how much (or how little) you have. Watson can ingest, enrich, and normalize a wide variety of data types without any additional integration, allowing you to make use of data from a broad range of sources with ease.
iii) How Does It Work?
Create and train machine learning models with the best tools and the latest expertise in a social environment built by and for data scientists. IBM Watson Machine Learning is a full-service IBM Cloud offering that makes it easy for developers and data scientists to work together to integrate predictive capabilities with their applications. The Machine Learning service is a set of REST APIs that you can call from any programming language to develop applications that make smarter decisions, solve tough problems, and improve user outcomes. Use the command line interface and the Python client to manage your artifacts. Extend your application with artificial intelligence through the Watson Machine Learning REST API. Take advantage of machine learning model management (continuous learning system) and deployment (online, batch, streaming). Select any of the widely supported machine learning frameworks: TensorFlow, Keras, Caffe, PyTorch, Spark MLlib, scikit-learn, XGBoost, and SPSS.
V. Walk Through With Health Care
According to the World Health Organization, cancer of all types claims approximately 680,000 lives each year in India, making it the second leading cause of death in the country after heart disease (2). About 1 million new cancer cases are diagnosed every year in India, and this number is expected to rise 5-fold by 2020. More than 200,000 individuals receive cancer care at Manipal facilities in India each year. Manipal Hospitals adopted Watson for Oncology to help physicians identify options for individualized, evidence-based cancer care across India. IBM's stated purpose in health is to empower leaders, advocates, and influencers through support that helps them achieve remarkable outcomes, accelerate discovery, make essential connections, and gain confidence on their path to solving the world's biggest health challenges.
VI. Methodologies
Below I describe the methodologies and technical terms used in IBM Watson. Some of them may be new, but none of them are difficult.
VI.1 New Modeler Flow
Connect nodes to build a modeler flow that explores your data and trains machine learning models. In a modeler flow we can choose the runtime for our machine learning model. In this flow I chose Spark as the runtime, because Spark performs in-memory computation, which makes it fast for this kind of workload. In the modeler flow we can build the flow of our model with statistical techniques and machine learning models.
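To make the idea of a modeler flow concrete, here is a plain-Python analogy: each node is a function that takes a list of records and returns a new list, and the flow runs the nodes in the order they are connected. This is only an illustration under that assumption; it is not the Watson Studio or Spark API, and the node names and the `age` field are hypothetical.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Node = Callable[[List[Record]], List[Record]]


def run_flow(records: List[Record], nodes: List[Node]) -> List[Record]:
    """Pass the records through each node in order, like connected nodes in a flow."""
    for node in nodes:
        records = node(records)
    return records


def drop_missing(records: List[Record]) -> List[Record]:
    """Example node: keep only records where every field has a value."""
    return [r for r in records if all(v is not None for v in r.values())]


def sort_by_age(records: List[Record]) -> List[Record]:
    """Example node: sort records by the hypothetical 'age' field."""
    return sorted(records, key=lambda r: r["age"])


if __name__ == "__main__":
    data = [
        {"age": 12, "breathing_capacity": 30.0},
        {"age": None, "breathing_capacity": 25.0},
        {"age": 8, "breathing_capacity": 20.0},
    ]
    print(run_flow(data, [drop_missing, sort_by_age]))
```

Node order matters here just as it does in a flow: `drop_missing` runs before `sort_by_age`, so the sort never has to compare a missing age.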
VI.2 Create Notebook With ML
First we need to create a project in Watson Studio containing a Machine Learning model, a Modeler Flow, a Data Asset, and Python and R notebooks for working with other Machine Learning, Deep Learning, and Statistical techniques.
VI.3 Import Data
Importing data is a critical step in building a model; the data is the backbone of the model. The Breathing Capacity data set has 725 observations with 6 variables. The Breathing Capacity attribute holds each person's Breathing Capacity, measured in seconds. The Age attribute holds ages from 3 to 19. The Height attribute holds heights from 45 to 81 cm. Smoke is a categorical attribute with two factor levels, 'no' and 'yes'. Gender is a categorical attribute with two factor levels, 'male' and 'female'. Caesarean is also a categorical attribute with two factor levels, 'no' and 'yes'.
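Outside Watson, the same import step can be sketched with Python's standard `csv` module. The column names below (`breathing_capacity`, `age`, `height`, `smoke`, `gender`, `caesarean`) are hypothetical stand-ins for the six attributes described above, the sample rows are made up, and the sketch assumes well-formed rows.

```python
import csv
import io
from typing import Dict, List, Optional, Union

# Hypothetical column names for the six attributes.
NUMERIC = ("breathing_capacity", "age", "height")
CATEGORICAL = ("smoke", "gender", "caesarean")

Value = Optional[Union[float, str]]


def parse_value(name: str, raw: str) -> Value:
    """Convert one CSV cell: numbers to float, categories to lower-case text, blanks to None."""
    raw = raw.strip()
    if raw == "":
        return None
    return float(raw) if name in NUMERIC else raw.lower()


def load_records(text: str) -> List[Dict[str, Value]]:
    """Parse CSV text with a header row into a list of dictionaries."""
    reader = csv.DictReader(io.StringIO(text))
    return [{k: parse_value(k, v) for k, v in row.items()} for row in reader]


if __name__ == "__main__":
    sample = (
        "breathing_capacity,age,height,smoke,gender,caesarean\n"
        "30.5,12,62,no,male,no\n"
        "22.0,9,55,no,female,yes\n"
    )
    rows = load_records(sample)
    print(len(rows), rows[0])
```

To read a real file, pass the file's contents (for example `open(path).read()` with your own `path`) instead of the sample string. Blank cells become `None` so that later steps can find and remove them.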
VI.4 Refine Data
Refine Data is used to find the structure of the data set: it reports the attribute names, the attribute data types, and some statistical measures. It also cleans and shapes your data to prepare it for analysis; using Refine we can perform slicing, filtering, aggregate functions, and so on.
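A rough equivalent of the structure summary that Refine Data reports can be computed in plain Python. This sketch assumes the records are dictionaries with hypothetical field names, and that missing values are stored as `None`; it infers each column's type, counts missing values, and reports basic statistics for numeric columns and the factor levels for categorical ones.

```python
import statistics
from typing import Dict, List


def describe(records: List[dict]) -> Dict[str, dict]:
    """Summarize each column: inferred type, missing count, and basic statistics."""
    summary = {}
    for name in records[0]:
        values = [r[name] for r in records if r[name] is not None]
        info = {"missing": len(records) - len(values)}
        if values and all(isinstance(v, (int, float)) for v in values):
            info.update(type="numeric", min=min(values), max=max(values),
                        mean=statistics.mean(values), median=statistics.median(values))
        else:
            # Non-numeric columns are treated as categorical factors.
            info.update(type="categorical", levels=sorted(set(values)))
        summary[name] = info
    return summary


if __name__ == "__main__":
    sample = [{"age": 10.0, "smoke": "no"}, {"age": 14.0, "smoke": "yes"}]
    print(describe(sample))
```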
VI.5 Record Operations
Record Operations include Select, Sort, Aggregate, and Append. Using Record Operations I split my data set into four subsets: nonsmoker_male, nonsmoker_female, smoke_male, and smoke_female.
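The split into four subsets can be sketched as a simple group-by on two fields. This assumes hypothetical `smoke` values of "yes"/"no" and `gender` values of "male"/"female"; the group names follow the ones used above.

```python
from collections import defaultdict
from typing import Dict, List


def split_by_smoke_and_gender(records: List[dict]) -> Dict[str, List[dict]]:
    """Group records into nonsmoker_male, nonsmoker_female, smoke_male, smoke_female."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for r in records:
        prefix = "smoke" if r["smoke"] == "yes" else "nonsmoker"
        groups[f"{prefix}_{r['gender']}"].append(r)
    return dict(groups)


if __name__ == "__main__":
    sample = [
        {"smoke": "no", "gender": "male"},
        {"smoke": "yes", "gender": "female"},
    ]
    for name, rows in split_by_smoke_and_gender(sample).items():
        print(name, len(rows))
```

Groups with no matching records simply do not appear in the result.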
VI.6 Field Operations
Field Operations include Filter, Auto data pre-processing, Change data type, Reclassify, Binning, Partition, and Field Reorder. Auto data pre-processing is used to remove null values and normalize the data set.
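Removing nulls and normalizing can be sketched as two small functions. Min-max scaling to the range [0, 1] is one common normalization and is used here as an assumption; Watson's automatic pre-processing may apply a different transformation. The `age` field name is hypothetical.

```python
from typing import List, Sequence


def drop_incomplete(records: List[dict]) -> List[dict]:
    """Remove records that have any missing (None) field."""
    return [r for r in records if all(v is not None for v in r.values())]


def min_max_normalize(records: List[dict], fields: Sequence[str]) -> List[dict]:
    """Rescale each listed numeric field to [0, 1] using (x - min) / (max - min)."""
    out = [dict(r) for r in records]  # copy so the input records are not modified
    for f in fields:
        lo = min(r[f] for r in out)
        hi = max(r[f] for r in out)
        span = hi - lo
        for r in out:
            r[f] = (r[f] - lo) / span if span else 0.0  # constant column maps to 0.0
    return out


if __name__ == "__main__":
    clean = drop_incomplete([{"age": 3.0}, {"age": None}, {"age": 19.0}, {"age": 11.0}])
    print(min_max_normalize(clean, ["age"]))
```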
VI.7 Graphs
Data visualization is a way to show complex data in a graphical form that is easy to understand. Since a picture is worth a thousand words, plots and graphs can be very effective in conveying a clear description of the data: they are more effective, more attractive, and more impactful than raw numbers. Visualization is used to explore the insights of the data set, and it includes graphs, charts, plots, and so on.
Exploratory Data Analysis (EDA) is the main part of data analytics, because it is how we explore the data and find insights. IBM Watson offers more than 35 types of charts, graphs, plots, and maps. Here I use a histogram, a box plot, a scatter plot, and a heat map for Breathing Capacity Prediction. With IBM Watson we can make these charts in less time.
i) Histogram Analysis
A histogram is built from numeric values and shows the range, skewness, kurtosis, and distribution in visual form. In this chart I plotted a histogram of the Breathing Capacity attribute. Breathing Capacity is approximately normally distributed: its mean and median are close to each other, and its values range from 5 to 60. That means every person's Breathing Capacity lies between a minimum of 5 and a maximum of 60, with the lowest observed value at 5 and the highest at 60. Likewise, we can find other insights through the histogram.
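The quantities a histogram shows can also be computed directly. This sketch uses population moment formulas for skewness and excess kurtosis (both close to 0 for normally distributed data) and equal-width bins; the sample values are made up, and Watson may use a different binning rule.

```python
import statistics
from typing import Dict, List, Sequence


def shape_summary(values: Sequence[float]) -> Dict[str, float]:
    """Range, mean, median, skewness and excess kurtosis (population moment formulas)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "median": statistics.median(values),
        "skewness": m3 / m2 ** 1.5,  # 0 for a symmetric distribution
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,  # 0 for a normal distribution
    }


def histogram_counts(values: Sequence[float], bins: int) -> List[int]:
    """Count values in equal-width bins from min to max; the last bin includes the max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [len(values)] + [0] * (bins - 1)
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return counts


if __name__ == "__main__":
    sample = [5.0, 18.0, 25.0, 30.0, 31.0, 35.0, 42.0, 60.0]
    print(shape_summary(sample))
    print(histogram_counts(sample, 5))
```

A mean close to the median together with skewness near 0 is the numeric counterpart of the roughly symmetric shape described above.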
ii) Box Plot Analysis
A box plot is also built from numeric values and shows the minimum, maximum, median, 25th percentile (first quartile), 75th percentile (third quartile), range, and outliers. Here I plotted box plots for the Breathing Capacity and Age attributes. In the Breathing Capacity data set, most people are between 9 and 16 years old; insights like this guide the next steps of the analysis.
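The numbers behind a box plot can be sketched with `statistics.quantiles` from the Python standard library. This assumes the common 1.5 * IQR rule for outliers and the "inclusive" quartile method; Watson's chart may compute quartiles slightly differently. The sample ages are invented.

```python
import statistics
from typing import Dict, Sequence


def box_plot_stats(values: Sequence[float]) -> Dict[str, object]:
    """Quartiles, IQR, whisker ends and 1.5*IQR outliers: what a box plot draws."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": outliers,
    }


if __name__ == "__main__":
    ages = [3, 9, 10, 11, 12, 13, 14, 15, 16, 40]
    print(box_plot_stats(ages))
```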
iii) Scatter Plot Analysis
A scatter plot is mostly used to find the relationship between two attributes, the variation of data points, and outliers. Here I plotted Breathing Capacity against Age, with Age on the X axis and Breathing Capacity on the Y axis. As Age increases along the X axis, Breathing Capacity on the Y axis also increases. That means Breathing Capacity tends to increase with a person's age, and the relationship between the two is approximately linear.
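The linear trend seen in a scatter plot can be quantified with the Pearson correlation coefficient r, which is close to +1 when one attribute rises along a straight line with the other and close to -1 when it falls. A minimal sketch, with made-up inputs:

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    age = [4, 7, 10, 13, 16, 19]
    capacity = [12.0, 19.0, 27.0, 33.0, 42.0, 48.0]  # invented values
    print(round(pearson_r(age, capacity), 3))
```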
iv) Heat Map Analysis
A heat map shows a measure, such as a correlation or an average, across more than one attribute in visual form. Two colors are used here: blue means a low value and pink means a high value. Here I plotted a heat map of the Smoke attribute against Gender. The chart shows the average Breathing Capacity for women and men who do not smoke and for women and men who do smoke: women who do not smoke have the lowest average, and all the other groups have higher averages.
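The averages this heat map colors can be computed as a small pivot table of means. This sketch assumes hypothetical `smoke`, `gender`, and `breathing_capacity` fields and invented values; each (smoke, gender) cell holds the mean Breathing Capacity of that group.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def mean_table(records: List[dict], row: str, col: str, value: str) -> Dict[Tuple[str, str], float]:
    """Average of `value` for each (row, col) combination present in the data."""
    sums: Dict[Tuple[str, str], float] = defaultdict(float)
    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    for r in records:
        key = (r[row], r[col])
        sums[key] += r[value]
        counts[key] += 1
    return {k: sums[k] / counts[k] for k in sums}


if __name__ == "__main__":
    sample = [
        {"smoke": "no", "gender": "female", "breathing_capacity": 20.0},
        {"smoke": "no", "gender": "male", "breathing_capacity": 28.0},
        {"smoke": "yes", "gender": "female", "breathing_capacity": 31.0},
    ]
    for cell, avg in sorted(mean_table(sample, "smoke", "gender", "breathing_capacity").items()):
        print(cell, avg)
```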
VI.8 Model Selection
Model selection is one of the main steps in machine learning; choosing the right model is what makes our machine intelligent. Machine learning models fall into two broad families: linear models and non-linear models. IBM Watson provides the following machine learning algorithms: Auto Classifier, Auto Numeric, Bayes Net, C&R Tree, Random Trees, GLE, Linear, Linear-AS, Regression, LSVM, Logistic, Neural Network, KNN, PCA/Factor, Feature Selection, Association Rule Mining, Apriori, Sequence, Anomaly, K-Means, One Class SVM, XGBoost Linear, XGBoost Tree, and XGBoost-AS.
We need to choose our model based on the EDA insights and results. In the Breathing Capacity data set, Breathing Capacity is the dependent variable and all the other attributes are independent variables, because Breathing Capacity depends on them. The EDA showed approximately linear relationships between Breathing Capacity and the numeric attributes, so we go with a linear model. Here I used MLR (Multiple Linear Regression), one of the most popular linear models for prediction.
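Because a linear model needs numbers, the categorical attributes have to be encoded before fitting. A common approach, assumed here, is dummy coding: each yes/no or male/female field becomes a single 0/1 column. The field names, and which level is coded as 1, are hypothetical choices for illustration.

```python
from typing import List, Tuple

# Hypothetical field names; for each categorical field, the listed level is coded as 1.
NUMERIC_FEATURES = ("age", "height")
DUMMIES = {"smoke": "yes", "gender": "male", "caesarean": "yes"}


def design_matrix(records: List[dict]) -> Tuple[List[str], List[List[float]], List[float]]:
    """Return feature names, X (independent variables) and y (dependent variable)."""
    names = list(NUMERIC_FEATURES) + [f"{f}_{v}" for f, v in DUMMIES.items()]
    X, y = [], []
    for r in records:
        row = [float(r[f]) for f in NUMERIC_FEATURES]
        row += [1.0 if r[f] == v else 0.0 for f, v in DUMMIES.items()]
        X.append(row)
        y.append(float(r["breathing_capacity"]))
    return names, X, y


if __name__ == "__main__":
    person = {"age": 12, "height": 62, "smoke": "no", "gender": "male",
              "caesarean": "yes", "breathing_capacity": 30.0}
    print(design_matrix([person]))
```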
Multiple Linear Regression (MLR)
i) What's MLR?
Simple linear regression (SLR) uses one independent variable to make a prediction. MLR is used to explore the relationship between more than one predictor (X) and one target (Y) variable. Regression supports better decision making by quantifying how much each predictor contributes to the target.
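As a minimal sketch of how MLR estimates $B_0, B_1, \dots, B_n$ by least squares, the code below solves the normal equations $(Z^\top Z)B = Z^\top y$, where $Z$ is the predictor matrix with a column of 1s prepended for the intercept. It uses plain Python with Gaussian elimination, assumes $Z^\top Z$ is invertible, and is an illustration rather than what Watson runs internally.

```python
from typing import List, Sequence


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_mlr(X: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Least-squares fit; returns [B0, B1, ..., Bn]."""
    Z = [[1.0] + list(row) for row in X]  # column of 1s for the intercept B0
    p = len(Z[0])
    ZtZ = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]
    Zty = [sum(z[i] * t for z, t in zip(Z, y)) for i in range(p)]
    return solve(ZtZ, Zty)


if __name__ == "__main__":
    X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
    y = [1 + 2 * a + 3 * b for a, b in X]  # exact data from Y = 1 + 2*X1 + 3*X2
    print(fit_mlr(X, y))  # approximately [1.0, 2.0, 3.0]
```

For a real data set, a numerical library would normally do this step; the plain version only shows what the fit computes.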
ii) When?
Regression is one of the oldest and best-known techniques used in machine learning; Sir Francis Galton introduced the term in 1886. I built two models for prediction: the first uses the basic MLR method, and the second uses Ridge Regression. For the MLR method I split my data set randomly into training and test data, either 80:20 or with the k-fold method.
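The two splitting strategies can be sketched with the standard `random` module. Both functions work on row indices, so they apply to any list of records; the seed value is an arbitrary assumption that only makes the split repeatable. With 725 rows, an 80:20 split holds out 145 rows for testing.

```python
import random
from typing import List, Tuple


def train_test_split(n: int, test_ratio: float = 0.2, seed: int = 42) -> Tuple[List[int], List[int]]:
    """Shuffle row indices and hold out test_ratio of them (0.2 gives an 80:20 split)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_ratio)
    return idx[n_test:], idx[:n_test]


def k_fold(n: int, k: int = 5, seed: int = 42) -> List[Tuple[List[int], List[int]]]:
    """Return k (train, test) index pairs; every row is in exactly one test fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [([j for f in folds[:i] + folds[i + 1:] for j in f], folds[i]) for i in range(k)]


if __name__ == "__main__":
    train, test = train_test_split(725)
    print(len(train), len(test))
    print([len(te) for _, te in k_fold(725, k=5)])
```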
VI.9 Model Evaluation
Model evaluation has around ten to twelve metrics; the most popular are Mean Squared Error (MSE), R^2, and Root Mean Squared Error (RMSE). The metrics tell us how accurate the model is. IBM Watson gives five metric values in a single click.
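The three metrics named above can be computed from actual and predicted values with their standard definitions. This is a minimal sketch; the five metrics Watson reports may include others.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """MSE, RMSE and R^2 for a regression model's predictions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    sse = sum(e * e for e in errors)  # sum of squared errors
    mse = sse / n
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)  # total variation around the mean
    return {"MSE": mse, "RMSE": math.sqrt(mse), "R2": 1.0 - sse / ss_tot}


if __name__ == "__main__":
    print(regression_metrics([3.0, 5.0, 7.0], [2.0, 5.0, 8.0]))
```

R^2 is the share of the variation in the target that the model explains; 1.0 means perfect predictions and 0.0 means no better than always predicting the mean.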
If our model has high accuracy, then our model fits well. Here I got only 59% accuracy, so I needed to rebuild the model using Ridge Regression. Ridge Regression fits the regression with an extra penalty term, controlled by a hyper-parameter, that shrinks the coefficients; it is useful when MLR or SLR does not fit well. The hyper-parameter can be tuned (for example, by cross-validation), so the fit is adjusted automatically.
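Ridge regression adds the penalty $\lambda \sum_{j \ge 1} B_j^2$ to the least-squares error, where $\lambda$ is the hyper-parameter; a larger $\lambda$ shrinks the coefficients more. To keep the sketch short it uses a single predictor, where the solution has a closed form, and picks $\lambda$ from a small grid by validation error. With several predictors the same idea leads to the system $(X^\top X + \lambda I)B = X^\top y$, usually with standardized predictors and an unpenalized intercept. The grid and data are hypothetical, and this is not necessarily how Watson tunes the hyper-parameter.

```python
from typing import Sequence, Tuple


def fit_ridge_1d(x: Sequence[float], y: Sequence[float], lam: float) -> Tuple[float, float]:
    """Ridge regression with one predictor; the intercept B0 is not penalized.

    Minimizes sum((y - B0 - B1*x)^2) + lam * B1^2, which gives
    B1 = Sxy / (Sxx + lam) and B0 = mean(y) - B1 * mean(x).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    b1 = sxy / (sxx + lam)
    return my - b1 * mx, b1


def choose_lambda(x_train: Sequence[float], y_train: Sequence[float],
                  x_val: Sequence[float], y_val: Sequence[float],
                  grid: Sequence[float]) -> float:
    """Pick the lambda with the lowest mean squared error on held-out validation data."""
    def val_mse(lam: float) -> float:
        b0, b1 = fit_ridge_1d(x_train, y_train, lam)
        return sum((t - (b0 + b1 * a)) ** 2 for a, t in zip(x_val, y_val)) / len(y_val)
    return min(grid, key=val_mse)


if __name__ == "__main__":
    x, y = [1, 2, 3, 4], [2, 4, 6, 8]
    for lam in (0.0, 5.0):
        print(lam, fit_ridge_1d(x, y, lam))  # the slope shrinks as lam grows
```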
VI.10 Output
When the model has higher accuracy, we can predict the values more accurately. Using the intercept and coefficient values, I can predict the Breathing Capacity of any person:
$Y = B_0 + B_1X_1 + B_2X_2 + B_3X_3 + \dots + B_nX_n$
where $B_0$ is the intercept of the model, $B_1, \dots, B_n$ are the coefficients, $X_1, \dots, X_n$ are the values of the independent variables, and $Y$ is the value of the dependent variable.
Our model gives the minimum, average, and maximum Breathing Capacity in seconds.
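Given an intercept and coefficients, the prediction formula and the minimum, average, and maximum summary can be sketched as below. The coefficient values in the example are hypothetical placeholders, not the fitted values from this model, and each input row must list the predictors in the same order as the coefficients.

```python
from typing import Dict, Sequence


def predict(intercept: float, coefs: Sequence[float], x: Sequence[float]) -> float:
    """Y = B0 + B1*X1 + ... + Bn*Xn for one person."""
    return intercept + sum(b * v for b, v in zip(coefs, x))


def summarize_predictions(intercept: float, coefs: Sequence[float],
                          rows: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Minimum, average and maximum predicted value over many people."""
    preds = [predict(intercept, coefs, r) for r in rows]
    return {"min": min(preds), "average": sum(preds) / len(preds), "max": max(preds)}


if __name__ == "__main__":
    # Hypothetical coefficients for two predictors (for example, age and height).
    b0, coefs = 1.0, [2.0, 3.0]
    people = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
    print(summarize_predictions(b0, coefs, people))  # {'min': 1.0, 'average': 6.0, 'max': 11.0}
```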
VII. Conclusion
IBM Watson is a fantastic framework for Machine Learning and Deep Learning across all domains, and it helps Data Scientists. Breathing Capacity Prediction is an important area in Health Care. This Machine Learning model helps predict people's Breathing Capacity at zero cost (zero rupees), within seconds, and with more accuracy. Breathing Capacity Prediction is useful for athletes, sportspeople, patients, and everyone else who wants to check and improve their Breathing Capacity.
"Data Analytics is not only a new trend in which Data Analysts build intelligent Machine Learning models; Data Analysts also have a responsibility to solve the problems of people and businesses."
Thanks for reading my article.
If you need any clarification, feel free to comment anytime.
I still need to test my model in the real world; after that, I will upload the source code.