How To Visualize A Decision Tree In 5 Steps

Tailored to corporate Windows environments

Decision trees are a very popular machine learning model. Their appeal comes from an easy-to-understand visualization and fast deployment into production.

Many articles cover decision tree visualization, but most focus on Mac or Linux environments, and the same procedures do not apply to Windows. Yet data analysts and data scientists at large corporations often have to use Windows machines with restrictions on installing software. I ran into this situation myself and had to piece a solution together from several sources.

This article is therefore a step-by-step guide to visualizing a decision tree in Python on Windows. Just follow along and plot your first decision tree in Windows!


Step 1: Download and install Anaconda for Windows

Choose the Anaconda installer that matches your Python version and your computer. Anaconda is a common Python distribution that large corporations usually allow employees to download and install.

Step 2: Download GraphViz

GraphViz is open-source graph visualization software that is needed to plot decision trees.

It can be downloaded from the official GraphViz website.

If you have limited software installation rights on your computer, downloading the zip file is more convenient. You can then unzip it onto your local drive (e.g., C:\graphviz).

Step 3: Set up your path for Graphviz

Open the file that sets up the environment each time Anaconda Prompt starts. For example, mine is located at C:\Users\liann\Anaconda3\Scripts\activate.bat.

Add the line below to the end of the file.

@set path=C:\graphviz\release\bin\;%PATH%
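To confirm that the path edit took effect, one option is to ask Python whether the GraphViz `dot` executable can be found. This is only a minimal sketch: `find_dot` is a hypothetical helper name, and it assumes you run it from a Python session started through a freshly opened Anaconda Prompt.

```python
import shutil


def find_dot():
    """Return the full path of the GraphViz `dot` executable, or None if it is not on PATH."""
    return shutil.which("dot")


dot_path = find_dot()
if dot_path is None:
    print("dot was not found on PATH; re-check the line added to activate.bat")
else:
    print("dot found at:", dot_path)
```

If the message says `dot` was not found, close and reopen Anaconda Prompt first, since the path is only set when the prompt starts.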

Step 4: Run Anaconda Prompt and install software/packages

Open Anaconda Prompt and install the graphviz and pydotplus packages by typing the commands below into the prompt.

conda install graphviz

conda install pydotplus
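Before moving on, it may help to check that the Python packages used in the next step can be imported. The sketch below is an assumption about how you might do that, not part of the original setup; `check_packages` is a hypothetical helper. Note that it does not check the conda `graphviz` package, which provides the GraphViz program itself rather than something you import.

```python
import importlib.util


def check_packages(names=("pydotplus", "sklearn", "pandas")):
    """Map each package name to True if Python can locate it, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


for name, installed in check_packages().items():
    print(name, "OK" if installed else "MISSING")
```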

Step 5: Create the decision tree and visualize it!

In your Python environment, copy and run the code below. I personally like using JupyterLab for its interactive features.

  • Load the basic packages and read in the data. Breast cancer data is used here as an example.
import sklearn.datasets as datasets
import pandas as pd

breast_cancer = datasets.load_breast_cancer()
df = pd.DataFrame(breast_cancer.data,
                  columns=breast_cancer.feature_names)
target = breast_cancer.target
  • Load decision tree plot-related packages and set up the basics of the model.
from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier(max_depth=3)  # max_depth is the maximum number of levels in the tree
clf.fit(df, target)
  • Print the decision tree.
import sklearn.tree as tree
import pydotplus

from io import StringIO  # sklearn.externals.six was removed in newer scikit-learn versions
from IPython.display import Image

# Write the tree in GraphViz DOT format into an in-memory text buffer.
dot_data = StringIO()
tree.export_graphviz(clf,
                     out_file=dot_data,
                     class_names=breast_cancer.target_names,  # the target names
                     feature_names=breast_cancer.feature_names,  # the feature names
                     filled=True,  # fill the boxes with colours
                     rounded=True,  # round the corners of the boxes
                     special_characters=True)

# Convert the DOT text into a PNG and display it in the notebook.
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
Image(graph.create_png())
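If the image does not render, for example because GraphViz is not yet on the path, a text-only view of the tree can help tell a model problem apart from a setup problem. The sketch below uses scikit-learn's `export_text` and does not need GraphViz. It refits the same kind of model so that it runs on its own, and `random_state=0` is my own assumption, added only to make the result repeatable.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier, export_text

# Refit the same kind of model on the breast cancer data.
data = load_breast_cancer()
clf = DecisionTreeClassifier(max_depth=3, random_state=0)  # random_state is an assumption
clf.fit(data.data, data.target)

# The depth can never exceed max_depth.
print("depth:", clf.get_depth())
print("leaves:", clf.get_n_leaves())

# Print the split rules as indented plain text.
rules = export_text(clf, feature_names=list(data.feature_names))
print(rules)
```

The printed rules should describe the same splits as the picture, so this also works as a quick cross-check once the plot is rendering.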


For more articles related to data science, please visit https://medium.com/@liannewriting.
