How To Visualize A Decision Tree In 5 Steps
Tailored to corporate Windows environments
Decision trees are a very popular machine learning model. Their appeal comes from an easy-to-understand visualization and fast deployment into production.
Many articles have covered decision tree visualization, but most focus on Mac or Linux environments, and the same procedures do not apply to Windows systems. Yet data analysts and data scientists who work in large corporations often have to use Windows systems with restrictions on installing software. I ran into this situation myself and had to piece together a solution from several sources.
This article is therefore a step-by-step guide to visualizing a decision tree in Python on Windows. Just follow along and plot your first decision tree in Windows!
Step 1: Download and install Anaconda for Windows
Choose the Anaconda installer that matches your Python version and your Windows system. Anaconda is a common Python distribution that large corporations usually allow employees to download and install.
Step 2: Download GraphViz
GraphViz is open-source graph visualization software that is needed to plot decision trees.
It can be downloaded from the official GraphViz website, https://graphviz.org.
If you have limited software installation rights within your computer system, downloading the zip file is more convenient. After that, you can unzip the file onto your local drive (e.g., C:\graphviz).
Step 3: Set up your path for Graphviz
Open your local file that sets up the environment whenever Anaconda Prompt is executed. For example, mine is located at C:\Users\liann\Anaconda3\Scripts\activate.bat.
Add the line below to the end of the file.
@set path=C:\graphviz\release\bin\;%PATH%
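Before moving on, it can help to confirm that the GraphViz `dot` executable is actually reachable. The snippet below is a minimal sketch, not part of the original setup: the helper names `prepend_to_path` and `find_dot` are made up for illustration, and `GRAPHVIZ_BIN` assumes the same folder used in the line above. It only changes the PATH of the running Python process, so it is a session-level check rather than a replacement for editing activate.bat.

```python
import os
import shutil

# Assumed install location, matching the path added to activate.bat above.
# Adjust it to wherever you unzipped GraphViz.
GRAPHVIZ_BIN = r"C:\graphviz\release\bin"


def prepend_to_path(directory):
    """Put `directory` at the front of PATH for the current Python process only."""
    entries = [e for e in os.environ.get("PATH", "").split(os.pathsep) if e]
    if directory not in entries:
        os.environ["PATH"] = os.pathsep.join([directory] + entries)
    return os.environ["PATH"]


def find_dot():
    """Return the full path of the GraphViz `dot` executable, or None if it is not on PATH."""
    return shutil.which("dot")


prepend_to_path(GRAPHVIZ_BIN)
print("dot found at:", find_dot())
```

If this prints `None`, the folder probably does not contain `dot.exe`, so check where the zip file was actually extracted.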
Step 4: Run Anaconda Prompt and install software/packages
Open Anaconda Prompt and install the graphviz and pydotplus packages by typing the commands below into the prompt.
conda install graphviz
conda install pydotplus
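To check that the installation worked, you can ask Python whether the packages used in Step 5 can be imported. This is a small sketch using only the standard library; the helper name `missing_packages` is hypothetical, and the list of package names is my assumption based on the imports that appear later in this article.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Import names used by the code in Step 5 (assumed list, adjust as needed).
required = ["pydotplus", "sklearn", "pandas", "IPython"]
print("Missing packages:", missing_packages(required))
```

An empty list means every package was found. Anything listed still needs to be installed from Anaconda Prompt.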
Step 5: Create the decision tree and visualize it!
In your Python environment, copy and run the code below. I personally like using JupyterLab for its interactive features, and the final step displays the image inline in a notebook.
- Load the basic packages and read in the data. Breast cancer data is used here as an example.
import sklearn.datasets as datasets
import pandas as pd

breast_cancer = datasets.load_breast_cancer()
df = pd.DataFrame(breast_cancer.data, columns=breast_cancer.feature_names)
target = breast_cancer.target
- Load decision tree plot-related packages and set up the basics of the model.
from sklearn.tree import DecisionTreeClassifier

# max_depth is the maximum number of levels in the tree
clf = DecisionTreeClassifier(max_depth=3)
clf.fit(df, target)
- Plot the decision tree.
import sklearn.tree as tree
import pydotplus
from io import StringIO  # sklearn.externals.six is no longer available in recent scikit-learn releases
from IPython.display import Image

dot_data = StringIO()
tree.export_graphviz(
    clf,
    out_file=dot_data,
    class_names=breast_cancer.target_names,      # the target names
    feature_names=breast_cancer.feature_names,   # the feature names
    filled=True,                                 # whether to fill in the boxes with colours
    rounded=True,                                # whether to round the corners of the boxes
    special_characters=True,
)
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
Image(graph.create_png())
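If the image does not render, a quick way to tell whether the problem lies with the model or with GraphViz is to print the tree as plain text. The sketch below is an addition of mine rather than part of the original walkthrough. It refits the same kind of model from scratch so that it runs on its own, uses scikit-learn's `export_text`, and sets `random_state=0` purely as an assumption to make repeated runs give the same tree.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier, export_text

breast_cancer = load_breast_cancer()

# random_state is an added assumption so that repeated runs produce the same tree
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(breast_cancer.data, breast_cancer.target)

# Render the fitted tree as indented text; this step does not need GraphViz
tree_text = export_text(clf, feature_names=list(breast_cancer.feature_names))
print(tree_text)
```

If the text version prints correctly but the picture does not, the issue is most likely the GraphViz path from Step 3 rather than the model itself.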
For more articles related to data science, please visit https://medium.com/@liannewriting.