A Beginner's Guide to Data Mining: RapidMiner
Utkarsh Sharma
SME & Manager | SAP Certified Application Associate | Certified Data Scientist | Intel certified Machine Learning Instructor| Mentor
RapidMiner Studio is a data science and data mining platform that lets users extract, transform, and load data to draw insights. If you're in the market for a business intelligence solution, you'll want to consider it. As an ETL and predictive analytics platform, RapidMiner Studio lets you import data from local files and databases, then clean, transform, explore, and visualize it. Once you've imported your data, you can use the GUI editor to build custom processes and models for predictive analytics.
I'm going to provide a brief introduction to the RapidMiner Studio interface, and I'm also going to show how to import a data set and build a simple process. The version of RapidMiner Studio that I'm using is 9.10, and I'm running the software on a Windows machine. After you have downloaded, installed, and opened RapidMiner, the first screen you will see says "Welcome to RapidMiner Studio".
Now let's see how to perform basic data visualization in RapidMiner. Click the Import Data tab in the Repository section of the home screen. You can select one of the example data sets that ship with RapidMiner Studio, or you can import a file from your computer. When you import a data set from your computer, RapidMiner checks the data for errors and gives you the option to replace them with null values. It also shows a preview of your data set before the import; if everything looks fine, click Finish to complete the import. Once the data is loaded, a screen showing the contents of your data appears.
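To make the "replace errors with null values" option more concrete, here is a minimal Python sketch of the same idea outside RapidMiner. It is not RapidMiner's implementation: the column names, the sample rows, and the helper functions `parse_number` and `load_rows` are all hypothetical and only illustrate how unparseable cells can be turned into nulls during an import.

```python
import csv
import io


def parse_number(text):
    """Return the cell as a float, or None (null) if it cannot be parsed."""
    try:
        return float(text)
    except (ValueError, TypeError):
        return None


def load_rows(csv_text, numeric_columns):
    """Read CSV text and replace bad values in numeric columns with None."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for row in reader:
        for column in numeric_columns:
            row[column] = parse_number(row.get(column))
        rows.append(row)
    return rows


# Hypothetical sample: two cells ("n/a" and "abc") are not valid numbers.
sample = (
    "Price,Age,FuelType\n"
    "13500,23,Diesel\n"
    "n/a,24,Petrol\n"
    "4350,abc,CNG\n"
)

rows = load_rows(sample, ["Price", "Age"])
for row in rows:
    print(row)
```

The invalid cells come back as `None`, which plays the role of a missing value, while the valid cells are converted to numbers.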
If we want to look at some statistics for this data set, we can click the Statistics icon, which shows basic statistics for all the attributes. For example, for the price attribute we can easily see that the cheapest car in this data set sold for 4350 euros, the most expensive car sold for 32500 euros, and the average price of the cars is 11860 euros. The age of the cars, which is measured in months, ranges from a minimum of 1 to a maximum of 68, and the average age is about 48 months, which is four years.
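The minimum, maximum, and average reported for a numeric attribute can be reproduced by hand. The sketch below uses a small, hypothetical list of prices; only the smallest and largest values are taken from the figures above, so the average it prints is not the 11860 euros of the full data set. The `summarize` helper is an illustrative name, not part of RapidMiner.

```python
from statistics import mean


def summarize(values):
    """Return the minimum, maximum, and average of a numeric attribute."""
    return {"min": min(values), "max": max(values), "average": mean(values)}


# Hypothetical sample of car prices in euros (not the real data set).
prices = [4350, 8950, 11250, 13500, 32500]

summary = summarize(prices)
print(summary)
```

Running the same function on the full price column would give the values shown in the statistics view.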
For a categorical variable such as fuel type, the statistics also show how many cars of each type there are. For example, if you click Details, you can see that this data set contains three fuel types: 1264 cars run on petrol, 155 on diesel, and 17 on natural gas.
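For a categorical attribute, the statistics amount to a frequency count per category. As a rough sketch, the counts quoted above can be rebuilt and tallied with Python's `collections.Counter`; the label strings `"Petrol"`, `"Diesel"`, and `"CNG"` are assumptions about how the categories are spelled in the data.

```python
from collections import Counter

# Rebuild a fuel-type column from the counts quoted in the text.
# The label spellings are assumed for illustration.
fuel_types = ["Petrol"] * 1264 + ["Diesel"] * 155 + ["CNG"] * 17

counts = Counter(fuel_types)
total = len(fuel_types)

# List categories from most to least frequent, with their share of all cars.
for fuel, n in counts.most_common():
    print(f"{fuel}: {n} ({n / total:.1%})")
```

The three counts add up to 1436 cars, so petrol cars make up the large majority of the data set.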
You can use the visualization tools in RapidMiner Studio to create several different visualizations from your data. Navigating to the results window in the visual editor shows you the outcomes of the processes you've run, and RapidMiner will automatically suggest visualizations based on what it finds in your data.
From the Statistics tab, switch to the Visualizations tab. There you can edit your charts using the controls on the left side of the workspace: you can add new columns or change the visualization type, choosing from several chart types including histograms, scatter plots, bell curves, pie charts, and tree maps. When you're done, share your visualization by saving it to a PDF or JPEG file or by printing a hard copy.
You can create models from your data relatively quickly. RapidMiner also offers the RapidMiner Marketplace, where you can find hundreds of extensions that increase the platform's functionality.