Issue 7 - Pythonic Poetry
Over the next few issues of this newsletter, together, we will build a fully functioning accounting automation system. From scratch. Using Python.
But first, we'll lay down some foundations.
Because the only right way to develop professional software is to set up and use a local development environment on your own computer, we'll spend a bit of time getting this set up. Remember, online (web-based or cloud-based) development environments like Jupyter notebooks, the Cloudera Data Science Workbench, etc., are OK for learning, exploring and sharing ideas, but are not nearly powerful enough to support the professional development of large, production-quality systems.
In this issue, we will talk about our tool of choice for managing 3rd party dependencies (external libraries) and the "virtual environment" for our project.
In addition, you'll need the Python interpreter itself (download and install it from https://www.python.org/downloads/). I recommend version 3.10 or newer.
You will also need a professional IDE (Integrated Development Environment) for Python - I recommend you get yourself a copy of PyCharm from here: https://www.jetbrains.com/pycharm/download/ (the free Community Edition is good enough for our purposes).
And, to circle back - you can read how to install the Poetry tool in the official documentation here: https://python-poetry.org/docs/
Poetry - the What and Why
If you are doing any kind of professional development with Python, your needs will quickly go beyond the built-in language features and the Standard Library, which comes bundled with it. You will need to install some 3rd party libraries and tools for various tasks, which you don't want to (and shouldn't) code from scratch. We call these your project's "dependencies".
When you install Python packages (e.g. using the official Python package manager `pip`), they are installed globally on your system by default. This can cause version conflicts when you're working on more than one project, because two projects may need different versions of the same package.
A "virtual environment" is like a separate Python instance just for one project. Packages can be installed in the virtual environment without affecting the rest of your system. Under the hood, a virtual environment is usually just a hidden project-specific folder somewhere on the filesystem, where these things are kept separate.
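To make the "just a folder" point concrete, here is a minimal sketch that uses the standard library's `venv` module (not Poetry) to create a bare environment and look inside it. The `.venv` folder name and the `create_demo_env` helper are assumptions made for this example; Poetry chooses its own location and uses its own machinery.

```python
import tempfile
import venv
from pathlib import Path


def create_demo_env(parent: Path) -> Path:
    """Create a bare virtual environment under `parent` and return its folder."""
    env_dir = parent / ".venv"  # hypothetical folder name, chosen for this example
    # with_pip=False keeps the demo fast; real environments usually include pip.
    venv.create(env_dir, with_pip=False)
    return env_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = create_demo_env(Path(tmp))
        # The environment is an ordinary directory tree plus a small config file.
        print(sorted(p.name for p in env_dir.iterdir()))
        print((env_dir / "pyvenv.cfg").read_text())
```

The `pyvenv.cfg` file is what tells the interpreter that it is running inside an environment rather than from the base installation.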
Python projects can quickly become disorganized as they grow larger. Keeping track of dependencies and virtual environments manually becomes difficult. This is where Poetry comes in.
Poetry is a tool which helps you manage Python projects and their dependencies. It creates an isolated virtual environment for each project and installs the packages you need, at the versions you need, into it.
Assuming you have Poetry installed, you can set up a new project in an existing folder using `poetry init`. This interactively creates a `pyproject.toml` file to store configuration (more about this file further down). The virtual environment itself is created the first time you add or install dependencies.
To add a dependency, use `poetry add [package]`. For example, to add the popular `requests` library (which allows you to access 3rd party APIs or other resources on the Internet via a URL), you can type `poetry add requests`. Poetry will install the package and automatically add it to pyproject.toml. You can also "pin" the dependency to a specific version, if that is important - by default `poetry` will install the latest stable version of the library.
By the way, if there is already a pyproject.toml file in the project, you can install all listed dependencies with `poetry install`. This will create a new virtual environment, if you don't have one for that project yet, and install the packages there.
The virtualenvs poetry creates keep dependencies separate between projects. This avoids version conflicts. Poetry also creates lock files (poetry.lock) for reproducing environments exactly - e.g. when you deploy your application, or when you run tests for it as part of a Continuous Integration pipeline - and you want to make sure it has the exact same versions you have installed locally.
Overall, poetry simplifies dependency and virtualenv management. It's an essential tool for organizing larger Python projects. It's a more modern solution than tools like Anaconda (or pip itself) - the configuration it uses is clearer and more flexible, and the tool supports more functionality.
Under the hood - more on the pyproject.toml file
The pyproject.toml file is at the heart of poetry. It keeps track of all the configuration for your project in an easy to read format.
When you first run `poetry init`, a basic pyproject.toml is created. This stores metadata like the project name and author.
The main part of pyproject.toml stores your project dependencies. Under `[tool.poetry.dependencies]`, each package installed with `poetry add` is listed with its version.
For example:
[tool.poetry.dependencies]
requests = "^2.24.0"
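This indicates that the requests package is required at version 2.24.0 or higher, but below 3.0.0. The caret (^) allows any update that does not change the leftmost non-zero part of the version number.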
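The caret rule can be sketched in a few lines of Python. This is an illustration only, not Poetry's own resolver: the function names `caret_range` and `satisfies` are made up for this example, and the sketch assumes plain MAJOR.MINOR.PATCH version numbers.

```python
def caret_range(constraint: str) -> tuple[str, str]:
    """Return (inclusive lower bound, exclusive upper bound) for "^X.Y.Z"."""
    if not constraint.startswith("^"):
        raise ValueError("expected a caret constraint such as '^2.24.0'")
    parts = [int(p) for p in constraint[1:].split(".")]
    if len(parts) != 3:
        raise ValueError("this sketch only handles MAJOR.MINOR.PATCH versions")
    major, minor, patch = parts
    # Bump the leftmost non-zero component and reset everything after it.
    if major > 0:
        upper = (major + 1, 0, 0)
    elif minor > 0:
        upper = (0, minor + 1, 0)
    else:
        upper = (0, 0, patch + 1)
    return ".".join(map(str, parts)), ".".join(map(str, upper))


def satisfies(version: str, constraint: str) -> bool:
    """Check whether a MAJOR.MINOR.PATCH version falls inside the caret range."""
    lower, upper = caret_range(constraint)

    def as_tuple(v: str) -> tuple[int, ...]:
        return tuple(int(p) for p in v.split("."))

    return as_tuple(lower) <= as_tuple(version) < as_tuple(upper)


if __name__ == "__main__":
    print(caret_range("^2.24.0"))          # ('2.24.0', '3.0.0')
    print(satisfies("2.31.0", "^2.24.0"))  # True
    print(satisfies("3.0.0", "^2.24.0"))   # False
```

Versions are compared as tuples of integers rather than as strings, so that 2.9.0 correctly sorts before 2.24.0.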
The pyproject.toml file acts like a manifest for your project's dependencies. With just this file, poetry knows exactly which packages and versions to install.
It can also contain configuration for other Python tools, which support the pyproject.toml format - such as `mypy` and the `ruff` linter.
Poetry also uses the virtualenvs for running commands. For example, if you have installed the `black` auto-formatter, you can run it on your codebase with `poetry run black .` Or, if the main entry point for your application is in `main.py`, you can run that with `poetry run python main.py`. This way, poetry will make sure the correct version of Python is being used and your code will execute inside the project's virtualenv.
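If you want to see for yourself which interpreter ends up running your code, a small `main.py` along these lines can help. It is only a sketch; the helper names `in_virtualenv` and `describe_interpreter` are made up for this example.

```python
import sys


def in_virtualenv() -> bool:
    """True when the running interpreter belongs to a virtual environment."""
    # Inside an environment, sys.prefix points at the environment folder,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def describe_interpreter() -> str:
    """Summarize which Python is running and where it lives."""
    kind = "virtual environment" if in_virtualenv() else "base installation"
    version = f"{sys.version_info.major}.{sys.version_info.minor}"
    return f"Python {version} from {sys.executable} ({kind})"


if __name__ == "__main__":
    print(describe_interpreter())
```

Running this file once with `python main.py` and once with `poetry run python main.py` should show whether the two commands pick up different interpreters on your machine.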
Finally, poetry can also be used to package and publish your project to a centralised registry - for example, the public PyPI.org registry, if you want to release your software as open-source. (Or a private PyPI registry, if you wish to release it for company-wide consumption by other teams, for example.)
Together with poetry commands, the pyproject.toml file makes dependency management simpler.
A crash course on Git and GitHub
It is beyond the scope of this newsletter to provide a tutorial on how to use Git and GitHub for revision control - there is plenty of material available online.
However, since we will be using these tools for our accounting automation software, I'll mention them here briefly.
Version control helps manage changes to the codebase over time. Git is probably the most popular version control system at the moment, used by many developers.
To start, install Git on your computer. If you have created a new project, which isn't yet under revision control, you can initialize a Git repository in your project folder with `git init`. This creates a hidden .git subdirectory to store revisions.
(If you have "cloned" an existing repository from a place like GitHub, you don't need to do `git init` - the downloaded codebase will already have a hidden .git folder. )
To save project changes, first `git add` the modified files. When you do this, we say that your updates are now "staged". Then you can run `git commit` to save the staged changes in a new commit. Commits let you revisit project states.
It is highly recommended to include a descriptive "commit message" with each of your commits, explaining the essence of the main changes you are committing. You can do this by using the `-m` parameter, like this: `git commit -m "Adds a new endpoint to the invoicing API to retrieve existing invoices by status code."`
The command `git log` shows a history of commits. Each entry shows the commit's ID (a hash that identifies it), its author, its date, and the message summarizing the changes.
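The init, add, commit and log cycle described above can also be scripted. The sketch below drives the `git` command line from Python with `subprocess`, in a throwaway temporary folder. It assumes Git is installed and on your PATH; the `git` and `commit_demo` helpers, the file name and the demo identity are all made up for this example.

```python
import subprocess
import tempfile
from pathlib import Path


def git(repo: Path, *args: str) -> str:
    """Run a git command inside `repo` and return its standard output."""
    result = subprocess.run(
        ["git", *args], cwd=repo, check=True, capture_output=True, text=True
    )
    return result.stdout


def commit_demo(repo: Path) -> str:
    """Create a repository with one commit and return the one-line log."""
    git(repo, "init")
    (repo / "main.py").write_text('print("hello")\n')
    git(repo, "add", "main.py")  # the change is now staged
    # The -c options set a throwaway identity for this single command only.
    git(
        repo,
        "-c", "user.name=Demo User",
        "-c", "user.email=demo@example.com",
        "commit", "-m", "Adds a hello-world entry point.",
    )
    return git(repo, "log", "--oneline")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(commit_demo(Path(tmp)))
```

In day-to-day work you would simply type these commands in a terminal; wrapping them in Python is mainly useful for automation and for experimenting safely in a scratch folder.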
Branching lets you diverge from the main line of code to experiment or to add new features in isolation. You can create a new branch and switch to it using a single command - `git checkout -b my-new-feature-branch`. You can later use `git switch main` and `git switch my-new-feature-branch` to go back and forth between the main branch and your feature branch. Once you are finished building your new feature, you can switch to the main branch and run `git merge my-new-feature-branch` to merge it back into the main code.
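Here is a rough, scripted version of that branch-and-merge workflow, again driving the `git` command line from Python in a scratch folder. It assumes a reasonably recent Git (2.23 or newer, for `git switch` and `git branch --show-current`); the helper names, file names and demo identity are hypothetical.

```python
import subprocess
import tempfile
from pathlib import Path

# Throwaway identity, passed per command so no global Git config is needed.
IDENTITY = ["-c", "user.name=Demo User", "-c", "user.email=demo@example.com"]


def git(repo: Path, *args: str) -> str:
    """Run a git command inside `repo` and return its standard output."""
    result = subprocess.run(
        ["git", *args], cwd=repo, check=True, capture_output=True, text=True
    )
    return result.stdout


def branch_and_merge_demo(repo: Path) -> list[str]:
    """Build a feature on a branch, merge it back, and list the tracked files."""
    git(repo, "init")
    (repo / "invoices.py").write_text("# invoicing code\n")
    git(repo, "add", "invoices.py")
    git(repo, *IDENTITY, "commit", "-m", "Initial commit.")
    # The default branch name depends on local configuration, so ask Git for it.
    main_branch = git(repo, "branch", "--show-current").strip()

    git(repo, "checkout", "-b", "my-new-feature-branch")
    (repo / "reports.py").write_text("# reporting code\n")
    git(repo, "add", "reports.py")
    git(repo, *IDENTITY, "commit", "-m", "Adds a reporting module.")

    git(repo, "switch", main_branch)  # reports.py disappears from the folder here
    git(repo, *IDENTITY, "merge", "my-new-feature-branch")  # and comes back here
    return sorted(p.name for p in repo.iterdir() if p.name != ".git")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(branch_and_merge_demo(Path(tmp)))
```

Because the main branch has no new commits of its own in this demo, the merge is a simple "fast-forward"; in a busy team repository Git would often create a separate merge commit instead.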
Remote repositories make collaboration easier, especially when you are working in a team. GitHub is probably the most popular platform for storing and browsing remote repositories. On GitHub, repositories can be shared publicly or with a team. Using the command `git push` you can update the remote repo with your local changes. Using `git pull` you can download new commits that others may have pushed to the remote repo.
With some basic commands, Git enables powerful collaboration and version control for Python projects. Staging lets you make focused, logical commits. Branching aids experimentation and changeset isolation. Remotes facilitate teamwork.