Hook, Line, and Cleaner
Jessica Rudd, PhD, MPH, PStat
Senior Data Engineer - Analytics Modernization Capability Lead @ Intuit Mailchimp | Pipelines and modeling
A primer for pristine python and precise pre-commit prowess
Happy Monday! Time to make your week awesome! Today’s Data Bytes calories: 2,160 words … 10 minutes.
Join the Atlanta Go User Group for tech talks (Debugging in Containers) and networking on Thursday, May 4, from 6:30 to 8:30 PM ET. RSVP is required for security at the new location at FullStory.
What I’m reading: Why Data Debt Is the Next Technical Debt You Need to Worry About. Data debt is the cost of avoiding or delaying investment in data quality, infrastructure, and processes. It can lead to poor decision-making, wasted resources, and missed opportunities.
What I’m working on: I ran the Eugene Marathon yesterday, drafted this newsletter, and went to bed.
One Big Thing: Python Clean Code Primer
Why did the python programmer get a broom?
…
To sweep away all the bugs and keep their code clean!
Code Cleaning Instructional
Clean, well-organized code is essential for collaboration, readability, and maintainability. This guide walks you through cleaning your code with tools such as Pylint and pre-commit.
Introduction to Code Cleaning
Code cleaning involves:
Using Pylint in VSCode
Pylint is a popular Python code analysis tool that checks for code quality, enforces coding standards, and identifies potential bugs.
Installing Pylint
To install Pylint, run the following command in your terminal or command prompt:
pip install pylint
Configuring Pylint in VSCode
1. Open VSCode and click on the extensions icon in the sidebar.
2. Search for "Python" in the extensions marketplace and install the official Python extension by Microsoft.
3. Search for "Pylint" in the extensions marketplace and install the official Pylint extension by Microsoft.
4. Open the settings by clicking on the gear icon in the lower left corner and selecting "Settings".
5. In the search bar, type "python.linting" to filter the settings related to linting.
6. Make sure the "Python > Linting: Enabled" checkbox is checked (it should be enabled automatically once the Pylint extension is installed).
7. In the "Python > Linting: Pylint Path" field, enter the path to your Pylint installation (e.g., `pylint` on Linux or `pylint.exe` on Windows; this should also be set automatically once the Pylint extension is installed).
8. In the "Python > Linting: Pylint Args" field, add any extra Pylint command-line arguments you need, such as `--max-line-length=100`. (In the standalone Pylint extension, the equivalent settings are `pylint.path` and `pylint.args`.)
Now, Pylint will automatically analyze your Python files in VSCode and display any issues it finds.
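Pylint also runs from the terminal (for example, `pylint my_module.py`), which is handy in scripts and CI. Its exit status is a bit mask: 1 = fatal, 2 = error, 4 = warning, 8 = refactor, 16 = convention, 32 = usage error. The sketch below decodes that mask; `decode_pylint_status` and `run_pylint` are hypothetical helper names, and `run_pylint` assumes Pylint is installed in the current Python interpreter.

```python
import subprocess
import sys

# Pylint's exit status is the bitwise OR of these flags (0 means no messages).
PYLINT_EXIT_FLAGS = {
    1: "fatal",
    2: "error",
    4: "warning",
    8: "refactor",
    16: "convention",
    32: "usage error",
}


def decode_pylint_status(status: int) -> list[str]:
    """Return the message categories encoded in a Pylint exit status."""
    return [name for bit, name in PYLINT_EXIT_FLAGS.items() if status & bit]


def run_pylint(path: str) -> list[str]:
    """Hypothetical helper: run Pylint on one file and decode its exit status."""
    result = subprocess.run(
        [sys.executable, "-m", "pylint", path],
        capture_output=True,
        text=True,
        check=False,  # a non-zero status is expected whenever Pylint reports issues
    )
    print(result.stdout)
    return decode_pylint_status(result.returncode)


# 20 = 16 | 4: Pylint emitted convention and warning messages
print(decode_pylint_status(20))  # ['warning', 'convention']
```

Decoding the bits, rather than treating any non-zero status as failure, lets a CI script tolerate convention messages while still failing on errors.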
Setting up Pre-commit
Pre-commit is a tool that manages and maintains pre-commit hooks. These hooks are actions that get executed automatically before each commit, ensuring code quality and consistency.
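Under the hood, a Git pre-commit hook is just an executable at `.git/hooks/pre-commit`; if it exits with a non-zero status, Git aborts the commit, and `pre-commit install` writes such a script for you. To make that contract concrete, here is a minimal hand-rolled hook check in Python that flags trailing whitespace. The function names are hypothetical, and in practice you would use pre-commit's own `trailing-whitespace` hook instead.

```python
import sys


def trailing_whitespace_lines(text: str) -> list[int]:
    """Return the 1-based numbers of lines that end in spaces or tabs."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if line != line.rstrip(" \t")
    ]


def main(paths: list[str]) -> int:
    """Check each file and return the exit status a hook would report."""
    failed = False
    for path in paths:
        with open(path, encoding="utf-8") as handle:
            bad_lines = trailing_whitespace_lines(handle.read())
        for number in bad_lines:
            print(f"{path}:{number}: trailing whitespace", file=sys.stderr)
        failed = failed or bool(bad_lines)
    # Git, and pre-commit, treat any non-zero status as "block this commit".
    return 1 if failed else 0
```

Saved as an executable script, it would finish with `sys.exit(main(sys.argv[1:]))`. pre-commit passes the matching filenames as arguments, which is why `main` takes a list of paths; a bare Git hook would have to collect the staged files itself (for example with `git diff --cached --name-only`).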
Why Use Pre-commit
Pre-commit helps you maintain a consistent codebase by automatically running checks and fixes before each commit. This ensures that only clean, well-organized code is committed, making it easier to collaborate with others and maintain the code in the long run. Some benefits of using pre-commit include:
Installing Pre-commit
To install pre-commit, run the following command:
pip install pre-commit
Configuring Pre-commit
1. Create a file named .pre-commit-config.yaml in the root of your repository.
2. Add the hooks you want to run. For example:
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.0.1
    hooks:
      - id: trailing-whitespace
      - id: check-ast
      - id: check-case-conflict
      - id: check-json
      - id: check-merge-conflict
        args: [--assume-in-merge]
3. Install the Git hook script by running:
pre-commit install
Now, pre-commit will run the specified hooks before each commit. If any of the hooks fail, the commit will be blocked.
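Each hook in the configuration is a small program that exits non-zero when it finds a problem. For intuition, here is a rough Python approximation of what `check-ast` verifies, namely that every file parses as valid Python. It is a sketch, not the hook's actual source, and `find_syntax_errors` is a hypothetical name.

```python
import ast


def find_syntax_errors(sources: dict[str, str]) -> dict[str, str]:
    """Map each filename whose source fails to parse to a short error message."""
    errors = {}
    for filename, source in sources.items():
        try:
            ast.parse(source, filename=filename)
        except SyntaxError as exc:
            errors[filename] = f"line {exc.lineno}: {exc.msg}"
    return errors


example = {
    "good.py": "def add(a, b):\n    return a + b\n",
    "bad.py": "def add(a, b)\n    return a + b\n",  # missing colon
}
print(find_syntax_errors(example))
```

The exact error wording varies between Python versions, which is one reason to let the real hook do this in a shared configuration.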
Using Pre-commit During Code Development
Adding More Hooks to Pre-commit
To add more hooks, list them in your .pre-commit-config.yaml: hooks from a repository you already use go under that repository's hooks list, and hooks from other tools get their own repo entry. For example, to add black, isort, flake8, a large-file check, and private key detection:
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.0.1
    hooks:
      # ... (other hooks)
      - id: check-added-large-files
      - id: detect-private-key
  - repo: https://github.com/psf/black
    rev: 21.9b0
    hooks:
      - id: black
        language_version: python3.10
  - repo: https://github.com/pycqa/isort
    rev: 5.9.3
    hooks:
      - id: isort
  - repo: https://github.com/pycqa/flake8
    rev: 3.9.2
    hooks:
      - id: flake8
Make sure to adjust the rev values to the latest versions of the respective tools.
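The two safety hooks added above are simple ideas: `detect-private-key` looks for private-key headers such as `BEGIN RSA PRIVATE KEY`, and `check-added-large-files` rejects newly added files above a size limit (500 KB by default, configurable with `args: [--maxkb=...]`). The sketch below approximates both checks; the marker list is an illustrative subset and the function names are hypothetical, not the hooks' real code.

```python
# Illustrative subset of the private-key headers that detect-private-key looks for.
PRIVATE_KEY_MARKERS = (
    "BEGIN RSA PRIVATE KEY",
    "BEGIN DSA PRIVATE KEY",
    "BEGIN EC PRIVATE KEY",
    "BEGIN OPENSSH PRIVATE KEY",
    "BEGIN PRIVATE KEY",
)

# check-added-large-files defaults to a 500 KB limit (configurable via --maxkb).
DEFAULT_MAX_KB = 500


def looks_like_private_key(content: str) -> bool:
    """Return True if the text contains a known private-key header."""
    return any(marker in content for marker in PRIVATE_KEY_MARKERS)


def is_too_large(size_in_bytes: int, max_kb: int = DEFAULT_MAX_KB) -> bool:
    """Return True if a file of this size would exceed the kilobyte limit."""
    return size_in_bytes > max_kb * 1024
```

A public key line such as `ssh-ed25519 AAAA... user@host` passes, while a pasted `-----BEGIN OPENSSH PRIVATE KEY-----` block is caught before it reaches the repository history.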
Running Pre-commit on All Files
To run pre-commit on all files in your repository (whether they are part of the current commit or not), use the following command:
pre-commit run --all-files
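To run the hooks on specific files only, pass them with --files:
pre-commit run --files <file1> <file2> ...
Replace <file1> <file2> ... with the paths of the files you want to check.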
Skipping Pre-commit During a Commit
If you need to bypass pre-commit checks for a specific commit, you can use the --no-verify flag when committing:
git commit --no-verify -m "Your commit message"
However, this should be used sparingly, as it defeats the purpose of having pre-commit checks in place. If you find yourself frequently skipping checks, consider adjusting the configuration to better suit your project's needs.
Automatically Update Hooks
To automatically use the most recent versions of the hooks, you can use the autoupdate feature of pre-commit. This will update the rev values in your .pre-commit-config.yaml file to the latest stable versions of the tools.
1. Make sure pre-commit is installed:
pip install pre-commit
2. In your project root, run:
pre-commit autoupdate
This command will update the rev values in your .pre-commit-config.yaml file to the latest stable versions available.
Note: This method does not guarantee that you will always have the most recent version of each hook as soon as it is released. You should still periodically run pre-commit autoupdate to keep your hooks up to date.
Automating Updates with a Scheduled Job
To further automate the process, you can set up a scheduled job (e.g., using cron on Linux or Task Scheduler on Windows) to periodically run pre-commit autoupdate. This will ensure your hooks are updated regularly without needing manual intervention.
For example, to create a cron job that updates your hooks every week, open your crontab:
crontab -e
Add the following entry to your crontab file:
0 0 * * 1 cd /path/to/your/project && /path/to/pre-commit autoupdate
Replace /path/to/your/project with the absolute path to your project directory and /path/to/pre-commit with the absolute path to your pre-commit executable (run `which pre-commit` to find it). Point cron at the executable itself, not at .git/hooks/pre-commit, which is only the Git hook script that pre-commit install generated. Absolute paths matter here because cron runs with a minimal PATH.
This cron job will run pre-commit autoupdate every Monday at midnight. Adjust the schedule according to your needs and preferences.
Important: Keep in mind that updating hooks automatically can sometimes introduce breaking changes or new issues. Always test your code after updating hooks to ensure everything still works as expected.
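One way to follow that advice in a scheduled job is to run the hooks right after updating them, so a breaking release shows up immediately. The sketch below wraps both steps; `update_and_verify` and `run_command` are hypothetical names, and the command runner is injectable so the logic can be tested without invoking pre-commit.

```python
import subprocess
from typing import Callable, Sequence

# A runner takes a command and returns its exit status.
Runner = Callable[[Sequence[str]], int]


def run_command(command: Sequence[str]) -> int:
    """Default runner: execute the command in the current directory."""
    return subprocess.run(list(command), check=False).returncode


def update_and_verify(run: Runner = run_command) -> bool:
    """Bump hook revisions, then re-run every hook to catch breaking changes."""
    if run(["pre-commit", "autoupdate"]) != 0:
        return False
    # autoupdate only rewrites rev values; this step exercises the new versions.
    return run(["pre-commit", "run", "--all-files"]) == 0
```

Run from the project root (for example by a cron entry that calls a script containing this function), a `False` result is the signal to look at the repository. Note that `pre-commit run --all-files` also exits non-zero when a hook rewrites files, such as Black reformatting code, so a failure does not always mean something is broken.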
Checking Code Complexity (optional)
Measuring code complexity can help identify areas of your code that may be difficult to understand, maintain, or test. One common metric is cyclomatic complexity, which counts the number of linearly independent paths through a program's source code.
In this bonus section, we'll demonstrate how to use radon, a Python library that computes various code complexity metrics, including Cyclomatic Complexity.
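Cyclomatic complexity is usually computed as one plus the number of decision points. To build intuition before reaching for radon, the sketch below approximates that count with Python's `ast` module: it adds one for each `if`/`elif`, loop, `except` clause, and conditional expression, plus one for each extra operand of `and`/`or`. Radon's real rules cover more constructs (comprehensions, for example), so treat the result as approximate; `approximate_complexity` is a hypothetical name.

```python
import ast

# Node types that each add one independent path.
BRANCH_NODES = (ast.If, ast.IfExp, ast.For, ast.AsyncFor, ast.While, ast.ExceptHandler)


def approximate_complexity(source: str) -> int:
    """Return 1 + the number of decision points found in the source."""
    complexity = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` adds two extra short-circuit paths
            complexity += len(node.values) - 1
    return complexity


code = '''
def classify(n):
    if n < 0 and n != -1:
        return "negative"
    elif n == 0:
        return "zero"
    for _ in range(3):
        pass
    return "positive"
'''
print(approximate_complexity(code))  # 5
```

The count for `classify` is 1 (base) + 1 (`if`) + 1 (`and`) + 1 (`elif`) + 1 (`for`) = 5.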
Installing Radon
To install radon, run the following command:
pip install radon
Using Radon to Check Code Complexity
To check the complexity of a single file, use the following command:
radon cc <file_path>
Replace <file_path> with the path to the Python file you want to analyze.
To check the complexity of all Python files in a directory and its subdirectories, use the following command:
radon cc <directory_path>
Replace <directory_path> with the path to the directory you want to analyze.
By default, radon displays a letter rank (A through F) for each function, method, and class in the analyzed files. You can customize the output with additional command-line options. For example, to display only the blocks ranked C or worse (a complexity of 11 or more), use the following command:
radon cc <directory_path> --min C
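Radon's ranks map onto complexity ranges: A is 1 to 5, B is 6 to 10, C is 11 to 20, D is 21 to 30, E is 31 to 40, and F is 41 or more. The sketch below mirrors that table to show what --min C keeps; `complexity_rank` and the sample scores are hypothetical, not radon's API or real output.

```python
# (upper bound, rank) pairs mirroring radon's cyclomatic complexity ranking table
RANK_THRESHOLDS = ((5, "A"), (10, "B"), (20, "C"), (30, "D"), (40, "E"))


def complexity_rank(score: int) -> str:
    """Map a cyclomatic complexity score to a radon-style letter rank."""
    for upper_bound, rank in RANK_THRESHOLDS:
        if score <= upper_bound:
            return rank
    return "F"


# Hypothetical scores; keep only blocks ranked C or worse, like `--min C`.
scores = {"parse_args": 3, "build_report": 14, "main": 27}
flagged = {name: complexity_rank(s) for name, s in scores.items() if complexity_rank(s) >= "C"}
print(flagged)  # {'build_report': 'C', 'main': 'D'}
```

Comparing the letters as strings works because later letters always mean higher complexity.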
For a complete list of command-line options, run:
radon cc --help
Integrating Radon with Pre-commit
To integrate radon with your pre-commit workflow, you'll need to create a custom pre-commit hook.
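1. Create a new file named radon-check.sh in your project's root directory with the following content. Because radon cc only reports complexity and does not fail on its own, the script fails whenever radon prints any offending blocks:
#!/bin/sh
set -e
# Run radon complexity check; with --min C it lists only blocks ranked C or worse
output=$(radon cc . --min C)
if [ -n "$output" ]; then
  echo "$output"
  echo "Complexity check failed: simplify the blocks listed above."
  exit 1
fi
echo "Complexity check passed!"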
2. Make the script executable by running:
chmod +x radon-check.sh
3. Add a new custom hook to your .pre-commit-config.yaml:
repos:
  # ... (other hooks)
  - repo: local
    hooks:
      - id: radon-complexity-check
        name: Radon Complexity Check
        entry: ./radon-check.sh
        language: script
        types: [python]
Now, the radon complexity check will be run as part of your pre-commit workflow. Adjust the --min option and other command-line arguments in the radon-check.sh script as needed to fit your project's requirements.
APPENDIX
Here is a brief overview of the hooks used in this document:
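1. Black: an opinionated code formatter that rewrites Python files in a consistent style.
2. isort: sorts imports alphabetically and groups them into sections (standard library, third party, local).
3. flake8: a linter that combines pycodestyle, pyflakes, and mccabe to flag style violations and likely bugs.
4. trailing-whitespace: trims trailing whitespace from the ends of lines.
5. check-ast: checks that Python files parse as valid Python.
6. check-case-conflict: checks for filenames that would conflict on a case-insensitive filesystem, such as the macOS or Windows defaults.
7. check-json: checks that JSON files parse.
8. check-merge-conflict: checks for files that still contain merge conflict markers.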
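To make the case-conflict idea concrete, here is a rough approximation: group paths by their lowercased form and report any group with more than one member. It is a sketch under that simplifying assumption (the real hook also considers parent directories), and `case_conflicts` is a hypothetical name.

```python
from collections import defaultdict


def case_conflicts(paths: list[str]) -> list[list[str]]:
    """Group paths that differ only by letter case."""
    groups = defaultdict(list)
    for path in paths:
        groups[path.lower()].append(path)
    return [sorted(group) for group in groups.values() if len(group) > 1]


print(case_conflicts(["README.md", "readme.md", "src/app.py", "src/App.py", "setup.py"]))
# [['README.md', 'readme.md'], ['src/App.py', 'src/app.py']]
```

Both pairs coexist happily on Linux but would overwrite each other when the repository is checked out on a case-insensitive filesystem.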
Complete .pre-commit-config.yaml used in this document:
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.4.0
    hooks:
      - id: trailing-whitespace
      - id: check-ast
      - id: check-case-conflict
      - id: check-json
      - id: check-merge-conflict
        args: [--assume-in-merge]
      - id: check-added-large-files
      - id: detect-private-key
  - repo: https://github.com/psf/black
    rev: 23.3.0
    hooks:
      - id: black
        language_version: python3.10
  - repo: https://github.com/pycqa/isort
    rev: 5.12.0
    hooks:
      - id: isort
  - repo: https://github.com/pycqa/flake8
    rev: 6.0.0
    hooks:
      - id: flake8
  - repo: local
    hooks:
      - id: radon-complexity-check
        name: Radon Complexity Check
        entry: ./radon-check.sh
        language: script
        types: [python]
Helpful Resources
Sweet & Sour Candy (this week’s good, bad, or weird of the tech world)
Exploring AI Ethics of ChatGPT: A Diagnostic Analysis - Recent breakthroughs in NLP have led to the development of powerful LLMs that can generate text, translate languages, and answer questions. However, these models can also exhibit social biases and toxicity, posing ethical and societal risks. This paper presents a qualitative study of the ethical risks of LLMs, using OpenAI's ChatGPT as an example. The study finds that a significant number of ethical risks cannot be addressed by existing benchmarks, and calls for the development of new benchmarks and design considerations for LLMs.
Large language models are biased. Can logic help save them? - Large language models are trained on massive amounts of text data, which can include biases. Researchers at MIT have developed a new method for mitigating these biases by incorporating logic into the training process. The new method, called Textual Entailment Logic, was effective in reducing bias in various tasks, including question answering and summarization.
One last bite
Badassery thrives in the incremental space between wussing out and being an idiot. ~ Des Linden
Thank you for reading Data Bytes. This post is public, so feel free to share it.