10 Useful Jupyter Notebook Extensions for a Data Scientist
Data scientists spend much of their time on data visualization, preprocessing, and model tuning based on the results. These steps are demanding because a good model depends on getting all three right. Here are 10 Jupyter Notebook extensions that help with them.
1. Qgrid
Qgrid is a Jupyter notebook widget which uses SlickGrid to render pandas DataFrames within a Jupyter notebook. This allows you to explore your DataFrames with intuitive scrolling, sorting and filtering controls, as well as edit your DataFrames by double-clicking cells.
Installation:
pip install qgrid    # Installing with pip
conda install qgrid  # Installing with conda
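Usage
As a minimal sketch of how a DataFrame could be handed to Qgrid, the helper below wraps qgrid.show_grid. The sample rows and the names DEMO_ROWS and show_editable_grid are made up for illustration, and the function is meant to be called from a notebook cell with pandas and qgrid installed.

```python
# Made-up rows, used only for illustration.
DEMO_ROWS = [
    {"city": "Oslo", "temp_c": 4.5, "visited": True},
    {"city": "Lima", "temp_c": 19.0, "visited": False},
    {"city": "Pune", "temp_c": 27.5, "visited": True},
]


def show_editable_grid(rows):
    """Return a Qgrid widget for the given rows; call from a notebook cell."""
    # Imported inside the function so this file still loads where the
    # notebook-only dependencies are not installed.
    import pandas as pd
    import qgrid

    df = pd.DataFrame(rows)
    # show_toolbar=True adds buttons for adding and removing rows.
    return qgrid.show_grid(df, show_toolbar=True)
```

In a notebook, assign the result (grid = show_editable_grid(DEMO_ROWS)) and make grid the last expression of the cell to display it. After sorting, filtering or editing, grid.get_changed_df() should return a DataFrame that reflects those changes.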
2. itables
ITables turns pandas DataFrames and Series into interactive data tables in both your notebooks and their HTML representation. Early versions of ITables relied on plain JavaScript and worked only in Jupyter Notebook, not in JupyterLab; check the ITables documentation for the environments supported by the version you install.
Installation:
pip install itables
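Activate the interactive mode for all Series and DataFrames with:
from itables import init_notebook_mode

# Render every DataFrame and Series as an interactive table
init_notebook_mode(all_interactive=True)

# Example data (needs the separate world_bank_data package)
import world_bank_data as wb

df = wb.get_countries()
df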
3. Jupyter DataTables
Data scientists, and many developers, work with DataFrames daily to interpret and process data. The common workflow is to display the DataFrame, look at the data schema, and then produce multiple plots to check the distribution of the data and get a clearer picture, perhaps searching for some data in the table along the way.
What if those distribution plots were part of the standard DataFrame and we had the ability to quickly search through the table with minimal effort? What if it was the default representation?
jupyter-datatables uses jupyter-require to draw the table.
Installation:
pip install jupyter-datatables
Usage
from jupyter_datatables import init_datatables_mode

init_datatables_mode()
4. ipyvolume
ipyvolume provides 3D plotting for Python in the Jupyter notebook, based on IPython widgets and WebGL.
ipyvolume can currently:
- Do (multi) volume rendering.
- Create scatter plots (up to ~1 million glyphs).
- Create quiver plots (like scatter, but with an arrow pointing in a particular direction).
- Do lasso mouse selections.
- Render in stereo, for virtual reality with Google Cardboard.
- Animate in d3 style, for instance when the x coordinates or colour of a scatter plot change.
- Animate sequences: all scatter/quiver plot properties can be a list of arrays, which can represent time snapshots, for example.
Installation
$ pip install ipyvolume                   # Installing with pip
$ conda install -c conda-forge ipyvolume  # Installing with conda
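Usage
The scatter-plot feature listed above can be sketched roughly as follows. The helper names random_points and scatter_3d are illustrative, the points are random, and the plotting function assumes numpy and ipyvolume are installed and that it runs inside a notebook.

```python
import random


def random_points(n, seed=0):
    """Return three lists of n Gaussian-distributed coordinates."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [rng.gauss(0.0, 1.0) for _ in range(n)]
    zs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return xs, ys, zs


def scatter_3d(n=1000):
    """Return an ipyvolume scatter plot of n random points."""
    # Imported here so the data helper above works without the
    # notebook-only dependencies.
    import numpy as np
    import ipyvolume as ipv

    x, y, z = (np.array(axis) for axis in random_points(n))
    # quickscatter builds the figure and the scatter in a single call.
    return ipv.quickscatter(x, y, z, size=1, marker="sphere")
```

Calling scatter_3d() as the last expression of a notebook cell should display a 3D view that can be rotated and zoomed.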
5. bqplot
bqplot is a 2-D visualization system for Jupyter, based on the constructs of the Grammar of Graphics.
Goals
- provide a unified framework for 2-D visualizations with a Pythonic API.
- provide a sensible API for adding user interactions (panning, zooming, selection, etc.).
Two APIs are provided
- Users can build custom visualizations using the internal object model, which is inspired by the constructs of the Grammar of Graphics (figure, marks, axes, scales), and enrich their visualization with our Interaction Layer.
- Or they can use the context-based API similar to Matplotlib’s pyplot, which provides sensible default choices for most parameters.
Installation
$ pip install bqplot                   # Installing with pip
$ conda install -c conda-forge bqplot  # Installing with conda
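Usage
A minimal sketch of the pyplot-style API mentioned above, assuming numpy and bqplot are installed and the code runs in a notebook. The helper names sine_curve and plot_sine and the sine data are illustrative only.

```python
import math


def sine_curve(n=50):
    """Return n evenly spaced x values in [0, 2*pi] and their sines."""
    xs = [2 * math.pi * i / (n - 1) for i in range(n)]
    ys = [math.sin(x) for x in xs]
    return xs, ys


def plot_sine():
    """Return a bqplot figure with one line mark; call from a notebook."""
    # Imported here so sine_curve works without the plotting dependencies.
    import numpy as np
    from bqplot import pyplot as plt

    xs, ys = sine_curve()
    fig = plt.figure(title="Sine wave")
    # Scales and axes are created with default settings.
    plt.plot(np.array(xs), np.array(ys))
    return fig
```

The figure returned is a widget, so making plot_sine() the last expression of a cell should display it.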
6. livelossplot
Don’t train deep learning models blindfolded! Be impatient and look at each epoch of your training!
livelossplot provides a live training loss plot in Jupyter Notebook for Keras, PyTorch and other frameworks.
Installation
pip install livelossplot
Usage
from livelossplot import PlotLossesKeras

model.fit(X_train, Y_train,
          epochs=10,
          validation_data=(X_test, Y_test),
          callbacks=[PlotLossesKeras()],
          verbose=0)
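For a custom training loop, such as a PyTorch one, the generic PlotLosses class can be fed one dictionary of metrics per epoch. The sketch below uses made-up, decaying loss values in place of real training, and the names fake_losses and run_demo are illustrative. It assumes a recent livelossplot release, since older ones used draw() where this uses send().

```python
import math


def fake_losses(epoch):
    """Made-up, smoothly decaying losses standing in for real training."""
    train = math.exp(-0.3 * epoch)
    return {"loss": train, "val_loss": train + 0.05}


def run_demo(epochs=10):
    """Update a live loss plot once per epoch; call from a notebook cell."""
    # Imported here so fake_losses works without livelossplot installed.
    from livelossplot import PlotLosses

    plot = PlotLosses()
    for epoch in range(epochs):
        # The "val_" prefix marks the validation counterpart of "loss".
        plot.update(fake_losses(epoch))
        plot.send()  # redraw the chart with the metrics logged so far
```

In a real loop, the dictionary passed to update would hold the metrics computed for that epoch.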
7. TensorWatch
TensorWatch is a debugging and visualization tool designed for data science, deep learning and reinforcement learning from Microsoft Research. It works in Jupyter Notebook to show real-time visualizations of your machine learning training and perform several other key analysis tasks for your models and data.
Installation
pip install tensorwatch
8. Polyaxon
Polyaxon is a platform for building, training, and monitoring large-scale deep learning applications. It aims to solve reproducibility, automation, and scalability for machine learning applications. Polyaxon can be deployed into any data center or cloud provider, or be hosted and managed by Polyaxon, and it supports the major deep learning frameworks such as TensorFlow, MXNet, Caffe, and Torch.
Installation
$ pip install -U polyaxon
9. handcalcs
handcalcs is a library that automatically renders Python calculation code in LaTeX, in a manner that mimics how one might format a calculation written with a pencil: the symbolic formula, followed by numeric substitutions, and then the result.
Installation
pip install handcalcs
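To make the three-part layout concrete, here is a hand-written illustration for a hypothetical calculation with a = 2 and b = 3. It is not actual handcalcs output, only a sketch of the formula, substitution and result pattern described above.

```latex
\begin{aligned}
c &= \sqrt{a^{2} + b^{2}} = \sqrt{2^{2} + 3^{2}} = 3.606
\end{aligned}
```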
10. jupyternotify
jupyternotify provides a Jupyter notebook cell magic %%notify that notifies the user upon completion of a potentially long-running cell via a browser push notification. Use cases include long-running machine learning models, grid searches, or Spark computations. This magic allows you to navigate away to other work and still get a notification when your cell completes.
Installation
pip install jupyternotify
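Usage
The extension has to be loaded before the magic is available. Typing %load_ext jupyternotify in a cell is the usual way. The sketch below does the same thing programmatically through IPython's run_line_magic, and the helper name enable_notify is illustrative.

```python
def enable_notify():
    """Load jupyternotify in the running notebook; return False elsewhere."""
    try:
        from IPython import get_ipython
    except ImportError:
        return False  # IPython is not installed, so there is no notebook

    shell = get_ipython()
    if shell is None:
        return False  # running as a plain Python script

    # Same effect as typing "%load_ext jupyternotify" in a cell.
    shell.run_line_magic("load_ext", "jupyternotify")
    return True
```

Once the extension is loaded, start any long-running cell with %%notify on its first line to get a browser notification when the cell finishes.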
Thank you for reading!
Any feedback and comments are greatly appreciated!