Anaconda is bloated - Set up a lean, reliable data science environment with Miniconda
A fat conda, courtesy of MCsaurus (https://www.deviantart.com/mcsaurus)

Installing the Anaconda distribution is the norm for data science, but it comes with a lot of software and packages that you don't need to get started. Instead, I recommend installing Miniconda and adding only the packages that you need. Here are the steps:

  • Download and install Miniconda. It installs only Python, conda, and the small number of other packages that conda depends on.
  • After initialization, opening a new shell automatically activates the base environment. I believe it is better to form the habit of activating an environment deliberately. Turn this feature off with:
conda config --set auto_activate_base false
  • The base environment will still exist, but you'll have to activate it manually. I recommend creating a new environment that relies only on the conda-forge channel for packages, which leads to greater compatibility.
  • We now create a new environment with just pandas, scikit-learn, matplotlib, and Jupyter notebooks. Notice that we are not installing the jupyter package, as it also installs qtconsole, an enormous package that is unnecessary here. Instead, we specify just the notebook package.
conda create -n minimal_ds -c conda-forge --override-channels notebook pandas scikit-learn matplotlib
  • This creates a minimal data science environment with Jupyter notebooks available, in which all packages are downloaded from the conda-forge channel. Now activate this environment:
conda activate minimal_ds
  • By default, this environment is not yet set up to install future packages from conda-forge. To configure it, run:
conda config --env --add channels conda-forge

This creates a .condarc file within the current environment and gives conda-forge higher priority than the defaults channel.

  • To ensure that conda takes a package from conda-forge whenever it is available there, set strict channel priority:
conda config --env --set channel_priority strict
  • Verify that your .condarc file is correct by running:
conda config --show-sources
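
As a rough sketch of what the verification step is looking for: after the two conda config --env commands, the environment's .condarc should list conda-forge under channels and set channel_priority to strict. The helper below, check_condarc, is a hypothetical name (not part of conda), and it assumes the file uses the plain YAML layout that conda normally writes. It only checks that conda-forge is listed, not its position in the list.

```shell
# check_condarc is a hypothetical helper, not a conda command.
# It inspects a .condarc file and reports whether conda-forge is listed
# as a channel and whether channel_priority is set to strict.
check_condarc() {
    file="$1"
    if [ ! -f "$file" ]; then
        echo "missing: $file"
        return 1
    fi
    # Channel entries are assumed to be YAML list items, e.g. "  - conda-forge".
    if ! grep -Eq '^[[:space:]]*-[[:space:]]*conda-forge[[:space:]]*$' "$file"; then
        echo "conda-forge is not listed in $file"
        return 1
    fi
    if ! grep -Eq '^channel_priority:[[:space:]]*strict[[:space:]]*$' "$file"; then
        echo "channel_priority is not strict in $file"
        return 1
    fi
    echo "ok: $file"
}

# Usage, with the environment active (conda sets CONDA_PREFIX on activation):
# check_condarc "$CONDA_PREFIX/.condarc"
```

This is only a text check on the file; conda config --show-sources remains the authoritative view of what conda actually reads.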

The above steps complete the process of installing Python and creating an environment for data science with minimal bloat. They also force you to activate an environment before you can begin any data science work. Importantly, all of the packages will be downloaded from the conda-forge channel, which reduces compatibility issues.
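
One way to confirm that everything really came from conda-forge is to look at the channel column of conda list. The sketch below assumes that conda list --show-channel-urls prints the channel name in the fourth column, which is the usual layout; non_forge_packages is a hypothetical helper name, not a conda command.

```shell
# non_forge_packages is a hypothetical helper, not a conda command.
# It reads `conda list --show-channel-urls` output on stdin and prints the
# name of every package whose channel column is not conda-forge.
non_forge_packages() {
    # Skip header/comment lines (starting with '#') and blank lines.
    awk '!/^#/ && NF >= 3 && $4 != "conda-forge" { print $1 }'
}

# Usage (requires conda and the minimal_ds environment):
# conda list -n minimal_ds --show-channel-urls | non_forge_packages
```

If the environment was built as described, this should print nothing. Any names it does print are packages that came from another channel or from pip.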

If you have the full Anaconda installation, I would recommend removing it completely, starting with a fresh Miniconda install, and creating environments with just the pieces you need.
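
If you end up creating several such environments, the create command from earlier can be wrapped in a small function. This is a minimal sketch: create_forge_env and the DRY_RUN variable are names invented for illustration, and by default the function only prints the command so you can inspect it before anything is installed.

```shell
# create_forge_env is a hypothetical helper, not a conda command.
# It builds the conda command for a conda-forge-only environment, prints it,
# and runs it only when DRY_RUN=0 (DRY_RUN is invented for this sketch).
create_forge_env() {
    if [ "$#" -lt 2 ]; then
        echo "usage: create_forge_env ENV_NAME PACKAGE [PACKAGE ...]" >&2
        return 2
    fi
    env_name="$1"
    shift
    # Rebuild the argument list as the full conda command.
    set -- conda create -n "$env_name" -c conda-forge --override-channels "$@"
    echo "$*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# Dry run: prints the same command used earlier in this post.
create_forge_env minimal_ds notebook pandas scikit-learn matplotlib
```

Setting DRY_RUN=0 before the call runs the printed command for real, which requires conda to be on your PATH.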

A longer post with more details and a video are forthcoming.
