Hook, Line, and Cleaner

A primer for pristine python and precise pre-commit prowess

Happy Monday! Time to make your week awesome! Today’s Data Bytes calories: 2,160 words … 10 minutes.

Join the Atlanta Go User Group for tech talks (Debugging in Containers) and networking on Thursday, May 4, from 6:30 to 8:30 PM ET. RSVP is required for security at the new location at FullStory.

What I’m reading - Why Data Debt Is the Next Technical Debt You Need to Worry About - Data debt is the cost of avoiding or delaying investment in data quality, infrastructure, and processes. It can lead to poor decision-making, wasted resources, and missed opportunities.

What I’m working on - I ran the Eugene Marathon yesterday, drafted this newsletter, and went to bed.


One Big Thing: Python Clean Code Primer


Why did the python programmer get a broom?

To sweep away all the bugs and keep their code clean!

Code Cleaning Guide

Clean and well-organized code is essential for better collaboration, readability, and maintainability. This guide walks you through cleaning your code with the help of tools such as Pylint and pre-commit.

Introduction to Code Cleaning

Code cleaning involves:

  • Following consistent style and naming conventions
  • Removing unused code and imports
  • Identifying and fixing potential bugs and issues
  • Implementing best practices for performance and readability
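As a small illustration of those bullets, here is a hypothetical before/after pair (the names `AddItem` and `add_item` are made up for this example). The "before" version uses CamelCase names and a mutable default argument, which Pylint typically reports as `invalid-name` and `dangerous-default-value`; the "after" version follows PEP 8 naming and fixes the shared-list bug.

```python
# Before: CamelCase names and a mutable default argument. The default list is
# created once, when the function is defined, so every call shares it.
def AddItem(Item, Bucket=[]):
    Bucket.append(Item)
    return Bucket


# After: PEP 8 snake_case names, a docstring, and a fresh list per call.
def add_item(item, bucket=None):
    """Append item to bucket, creating a new list when none is supplied."""
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket
```

Calling `AddItem(1)` and then `AddItem(2)` returns `[1, 2]` the second time, because both calls share one list; `add_item(2)` returns `[2]`.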

Using Pylint in VSCode

Pylint is a popular Python code analysis tool that checks for code quality, enforces coding standards, and identifies potential bugs.

Installing Pylint

To install Pylint, run the following command in your terminal or command prompt:

pip install pylint

Configuring Pylint in VSCode

1. Open VSCode and click on the extensions icon in the sidebar.

2. Search for "Python" in the extensions marketplace and install the official Python extension by Microsoft.

3. Search for "Pylint" in the extensions marketplace and install the official Pylint extension by Microsoft.

4. Open the settings by clicking on the gear icon in the lower left corner and selecting "Settings".

5. In the search bar, type "pylint" to filter the settings contributed by the Pylint extension.

6. There is no separate checkbox to turn linting on: the Pylint extension lints your Python files as soon as it is installed. (The older "Python > Linting" settings, such as "Python > Linting: Enabled", configure the Python extension's legacy built-in linting and are not needed with the Pylint extension.)

7. By default, the extension runs the copy of Pylint bundled with it. To use the Pylint you installed with pip instead, set "Pylint: Import Strategy" to "fromEnvironment", or enter the path to your Pylint executable in "Pylint: Path" (e.g., `pylint` on Linux or `pylint.exe` on Windows).

8. In the "Pylint: Args" field, you can add any additional Pylint command-line arguments if needed (e.g., `--max-line-length=88`).

Now, Pylint will automatically analyze your Python files in VSCode and display any issues it finds.
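To get a feel for what a static analyzer does under the hood, here is a deliberately tiny stand-in for one Pylint check, unused-import (W0611), built on the standard library's `ast` module. It is a sketch written for this guide, not how Pylint is implemented, and `find_unused_imports` is a hypothetical helper name.

```python
import ast


def find_unused_imports(source: str) -> list[str]:
    """Return names bound by import statements but never referenced.

    A simplified stand-in for Pylint's unused-import check (W0611): it ignores
    __all__, names used only inside strings, and other cases Pylint handles.
    """
    tree = ast.parse(source)
    imported = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # "import os.path" binds "os"; "import numpy as np" binds "np"
                imported.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                if alias.name != "*":  # star imports bind unknown names; skip them
                    imported.add(alias.asname or alias.name)
    # Every bare name loaded or stored anywhere; "os.path.join" contributes "os".
    used = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
    return sorted(imported - used)
```

For example, `find_unused_imports(Path("module.py").read_text())` would list the unused imports of a (hypothetical) `module.py`. Pylint does far more, but the principle is the same: parse the code into a tree and inspect it without running it.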

Setting up Pre-commit

Pre-commit is a tool that manages and maintains pre-commit hooks. These hooks are actions that get executed automatically before each commit, ensuring code quality and consistency.

Why Use Pre-commit

Pre-commit helps you maintain a consistent codebase by automatically running checks and fixes before each commit. Code has to pass your configured checks before it can be committed, which makes it easier to collaborate with others and maintain the code in the long run. Some benefits of using pre-commit include:

  • Automatically enforcing code style and best practices
  • Catching potential bugs and issues before they make it into the repository
  • Preventing secrets, sensitive data, and large files from being accidentally committed
  • Freeing code reviews from style nitpicks and reducing the time spent fixing issues later

Installing Pre-commit

To install pre-commit, run the following command:

pip install pre-commit

Configuring Pre-commit

  1. In your project root, create a file named `.pre-commit-config.yaml` (note the leading dot).
  2. Add the following content to the file:

repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.4.0
    hooks:
      - id: trailing-whitespace
      - id: check-ast
      - id: check-case-conflict
      - id: check-json
      - id: check-merge-conflict
        args: [--assume-in-merge]

3. Run the following command to install the git hooks:

pre-commit install

Now, pre-commit will run the specified hooks before each commit. If any of the hooks fail, the commit will be blocked.
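Under the hood, Git runs the installed hook before it creates the commit and aborts the commit if the hook exits with a non-zero status. The sketch below models that gate in plain Python: each check (a hypothetical callable, not part of pre-commit's API) returns a list of problems, and any problem turns into exit code 1. It is a conceptual model, not pre-commit's implementation.

```python
from typing import Callable, Dict, List, Sequence

# A check receives the file paths to inspect and returns problem messages.
Check = Callable[[Sequence[str]], List[str]]


def run_hooks(checks: Dict[str, Check], files: Sequence[str]) -> int:
    """Run every check on the files and return an exit code (0 = commit may proceed).

    Conceptual model of a pre-commit gate: Git aborts the commit when the hook
    process exits non-zero. This is not pre-commit's actual implementation.
    """
    failed = False
    for hook_id, check in checks.items():
        problems = check(files)
        # Report in a dotted "name....status" layout.
        print(f"{hook_id:.<40}{'Failed' if problems else 'Passed'}")
        for message in problems:
            print(f"  - {message}")
        failed = failed or bool(problems)
    return 1 if failed else 0
```

A hand-written hook script built this way would end with `sys.exit(run_hooks(...))`, so that a single failing check blocks the commit, which is the behavior pre-commit gives you without writing any of this yourself.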

Using Pre-commit During Code Development

Adding More Hooks to Pre-commit

To add more hooks, list them under the `hooks` key of the repository that provides them, and add a new `repo` entry for each additional tool. For example, to add Black, isort, flake8, a large-file check, and a private-key check:

repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.4.0
    hooks:
      # ... (other hooks)
      - id: check-added-large-files
      - id: detect-private-key
  - repo: https://github.com/psf/black
    rev: 23.3.0
    hooks:
      - id: black
        language_version: python3.10
  - repo: https://github.com/pycqa/isort
    rev: 5.12.0
    hooks:
      - id: isort
  - repo: https://github.com/pycqa/flake8
    rev: 6.0.0
    hooks:
      - id: flake8

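Make sure to adjust the rev values to the latest versions of the respective tools. Also note that isort's default import style and flake8's default 79-character line limit can clash with Black; a common fix is to add `args: [--profile, black]` to the isort hook and to run flake8 with `--max-line-length=88` and `--extend-ignore=E203`.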
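The two hooks added from pre-commit-hooks boil down to simple file checks. The sketch below approximates them with the standard library: `oversized_files` flags files above a size limit (500 KB mirrors the default `--maxkb` of check-added-large-files), and `files_with_private_keys` searches for a hypothetical subset of PEM-style private-key headers. Both function names are our own, and the real hooks only inspect the files pre-commit passes them for the commit.

```python
from pathlib import Path

# Hypothetical subset of private-key headers; the real detect-private-key
# hook maintains its own list.
KEY_MARKERS = (
    b"BEGIN RSA PRIVATE KEY",
    b"BEGIN EC PRIVATE KEY",
    b"BEGIN OPENSSH PRIVATE KEY",
    b"BEGIN PRIVATE KEY",
)


def oversized_files(paths: list[Path], max_kb: int = 500) -> list[Path]:
    """Return the paths whose size exceeds max_kb kilobytes."""
    return [path for path in paths if path.stat().st_size > max_kb * 1024]


def files_with_private_keys(paths: list[Path]) -> list[Path]:
    """Return the paths whose contents include a private-key header."""
    flagged = []
    for path in paths:
        data = path.read_bytes()  # read each file once, then scan for markers
        if any(marker in data for marker in KEY_MARKERS):
            flagged.append(path)
    return flagged
```

Checks like these are cheap, which is why they are worth running on every commit: a leaked key or a multi-megabyte data file is much harder to remove once it is in the repository history.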

Running Pre-commit on All Files

To run pre-commit on all files in your repository (whether they are part of the current commit or not), use the following command:

pre-commit run --all-files

To run it only on specific files, pass them with the --files option:

pre-commit run --files <file1> <file2> ...

Replace <file1> <file2> ... with the paths of the files you want to check.

Skipping Pre-commit During a Commit

If you need to bypass pre-commit checks for a specific commit, you can use the --no-verify flag when committing:

git commit --no-verify -m "Your commit message"

However, this should be used sparingly, as it defeats the purpose of having pre-commit checks in place. If you find yourself frequently skipping checks, consider adjusting the configuration to better suit your project's needs.

Automatically Update Hooks

To move your hooks to their most recent releases, you can use pre-commit's autoupdate command.

1. First, make sure pre-commit is installed:

pip install pre-commit

2. In your project root, run the following command:

pre-commit autoupdate

This command updates each rev value in your .pre-commit-config.yaml file to the latest tagged release of that repository.

Note: This method does not guarantee that you will always have the most recent version of each hook as soon as it is released. You should still periodically run pre-commit autoupdate to keep your hooks up to date.

Automating Updates with a Scheduled Job

To further automate the process, you can set up a scheduled job (e.g., using cron on Linux or Task Scheduler on Windows) to periodically run pre-commit autoupdate. This will ensure your hooks are updated regularly without needing manual intervention.

For example, to create a cron job that updates your hooks every week, open your crontab:

crontab -e

Add the following entry to your crontab file:

0 0 * * 1 cd /path/to/your/project && /path/to/pre-commit autoupdate

Replace /path/to/your/project with the absolute path to your project directory and /path/to/pre-commit with the absolute path to your pre-commit executable (run `which pre-commit` to find it). Cron runs jobs with a minimal PATH, so an absolute path is the safest choice. Note that autoupdate only edits .pre-commit-config.yaml; you still need to review and commit the change yourself.

This cron job will run pre-commit autoupdate every Monday at midnight. Adjust the schedule according to your needs and preferences.
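To double-check what `0 0 * * 1` means: the five fields are minute, hour, day of month, month, and day of week, and cron numbers the days of the week from 0 (Sunday), so 1 is Monday. The sketch below uses a hypothetical helper that handles only `*` and single numbers to check a timestamp against such a schedule.

```python
from datetime import datetime


def matches_simple_cron(spec: str, when: datetime) -> bool:
    """Check a datetime against a 5-field cron spec (minute hour day month weekday).

    Simplified sketch: each field must be "*" or a single integer. Real cron also
    supports ranges, lists, steps, names, and an OR rule when both day fields are
    restricted; none of that is handled here.
    """
    minute, hour, day, month, weekday = spec.split()
    # cron counts weekdays 0-6 from Sunday; isoweekday() is Monday=1 ... Sunday=7
    actual = (when.minute, when.hour, when.day, when.month, when.isoweekday() % 7)
    for field, value in zip((minute, hour, day, month, weekday), actual):
        if field != "*" and int(field) != value:
            return False
    return True
```

For instance, `matches_simple_cron("0 0 * * 1", datetime(2023, 5, 8, 0, 0))` is true because May 8, 2023 was a Monday, while the same time on May 9 does not match.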

Important: Keep in mind that updating hooks automatically can sometimes introduce breaking changes or new issues. Always test your code after updating hooks to ensure everything still works as expected.

Checking Code Complexity (optional)

Measuring code complexity can help identify areas of your code that may be difficult to understand, maintain, or test. One common metric for assessing code complexity is the Cyclomatic Complexity, which counts the number of linearly independent paths through a program's source code.

In this bonus section, we'll demonstrate how to use radon, a Python library that computes various code complexity metrics, including Cyclomatic Complexity.

Installing Radon

To install radon, run the following command:

pip install radon

Using Radon to Check Code Complexity

To check the complexity of a single file, use the following command:

radon cc <file_path>

Replace <file_path> with the path to the Python file you want to analyze.

To check the complexity of all Python files in a directory and its subdirectories, use the following command:

radon cc <directory_path>

Replace <directory_path> with the path to the directory you want to analyze.

By default, radon displays a complexity rank (A to F) for each function, method, and class in the analyzed files; add -s to also show the numeric score. You can customize the output with additional command-line options. For example, to display only the blocks ranked C or worse (a complexity score of 11 or more), use the following command:

radon cc <directory_path> --min C
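The letter in --min C refers to radon's rank bands, which its documentation lists as 1-5 A, 6-10 B, 11-20 C, 21-30 D, 31-40 E, and 41 or more F. A minimal standard-library sketch of that mapping (the function name `cc_rank_sketch` is our own):

```python
# Upper bound (inclusive) of each rank band; anything above 40 is "F".
_RANK_BANDS = ((5, "A"), (10, "B"), (20, "C"), (30, "D"), (40, "E"))


def cc_rank_sketch(score: int) -> str:
    """Map a cyclomatic complexity score to a letter rank, A (simplest) to F."""
    for upper, rank in _RANK_BANDS:
        if score <= upper:
            return rank
    return "F"
```

So a function scoring 12 ranks C and would appear in the --min C output, while one scoring 7 (rank B) would not.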

For a complete list of command-line options, run:

radon cc --help

Integrating Radon with Pre-commit

To integrate radon with your pre-commit workflow, you'll need to create a custom pre-commit hook.

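1. Create a new file named radon-check.sh in your project's root directory with the following content:

#!/bin/sh
set -e

# Run radon's complexity check, capturing any blocks ranked C or worse.
# radon cc reports matching blocks but does not fail on them,
# so exit non-zero explicitly when it printed anything.
output=$(radon cc . --min C)
if [ -n "$output" ]; then
  echo "$output"
  echo "Complexity check failed: blocks ranked C or worse found."
  exit 1
fi

echo "Complexity check passed!"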

2. Make the script executable by running:

chmod +x radon-check.sh

3. Add a new custom hook to your .pre-commit-config.yaml:

repos:
  # ... (other hooks)
  - repo: local
    hooks:
      - id: radon-complexity-check
        name: Radon Complexity Check
        entry: ./radon-check.sh
        language: script
        types: [python]

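Now, the radon complexity check will be run as part of your pre-commit workflow. Because `language: script` runs the script in your own environment rather than an isolated one, radon must be installed wherever you run `git commit`. Adjust the --min option and other command-line arguments in the radon-check.sh script as needed to fit your project's requirements.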

APPENDIX

Here is a brief overview of the hooks used in this document:

1. Black:

  • Black is a code formatter for Python that enforces a consistent style across your codebase.
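  • It automatically reformats your code into a consistent style that is largely PEP 8 compliant, with some deliberate differences (for example, an 88-character default line length instead of PEP 8's 79).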
  • By using Black, you can reduce the time spent on discussing code style during code reviews and ensure a consistent look for your code.

2. isort:

  • isort is a Python utility that automatically sorts and organizes your imports.
  • It separates imports into sections, such as standard library, third-party libraries, and local project imports.
  • isort can be configured with various options to fit your preferred import style, and it helps keep your import statements clean and consistent.

3. flake8:

  • flake8 is a Python linting tool that checks your code for compliance with PEP 8 and other code quality issues.
  • It combines the functionality of multiple tools, such as PyFlakes, pycodestyle, and McCabe complexity.
  • flake8 helps identify potential issues like unused imports, undefined variables, and code that is too complex or doesn't follow best practices.

4. trailing-whitespace:

  • This hook removes any unnecessary whitespace characters at the end of lines in your code.
  • By removing trailing whitespace, you can keep your codebase cleaner and avoid potential issues caused by extra whitespace characters.

5. check-ast:

  • This hook checks if your Python code can be successfully parsed into an Abstract Syntax Tree (AST).
  • If your code has syntax errors, it cannot be parsed into an AST, and this hook will fail.
  • By using check-ast, you can catch syntax errors in your code before committing them to the repository.

6. check-case-conflict:

  • This hook checks for files with names that would conflict on case-insensitive filesystems (e.g., Windows and macOS).
  • File name conflicts can cause problems when collaborating with others or deploying your code on different platforms.
  • By using check-case-conflict, you can avoid issues caused by files with similar names but different casing.

7. check-json:

  • This hook checks if your JSON files are well-formed and can be parsed without errors.
  • Malformed JSON files can cause issues when they are read or processed by other tools or libraries.
  • By using check-json, you can catch JSON syntax errors before they cause problems in your application.

8. check-merge-conflict:

  • This hook checks for merge conflict markers (e.g., `<<<<<<<`, `=======`, and `>>>>>>>`) in your code.
  • Merge conflict markers are typically added by version control systems like Git when conflicts occur during a merge or rebase.
  • By using check-merge-conflict, you can ensure that you resolve all merge conflicts and remove any conflict markers before committing your changes.
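The five pre-commit-hooks checks in items 4 to 8 are simple enough to sketch with the standard library. The functions below are illustrative approximations written for this guide (their names are hypothetical), not the hooks' actual source, and they skip edge cases the real hooks handle; for example, this trailing-whitespace sketch assumes LF line endings.

```python
import ast
import json
from collections import defaultdict


def strip_trailing_whitespace(text: str) -> str:
    """trailing-whitespace sketch: trim spaces and tabs at line ends (assumes LF)."""
    return "\n".join(line.rstrip(" \t") for line in text.split("\n"))


def parses_as_python(source: str) -> bool:
    """check-ast sketch: True if the source parses into an abstract syntax tree."""
    try:
        ast.parse(source)
    except (SyntaxError, ValueError):  # ValueError: e.g. null bytes on older Pythons
        return False
    return True


def is_valid_json(text: str) -> bool:
    """check-json sketch: True if the text is well-formed JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


def has_conflict_markers(text: str) -> bool:
    """check-merge-conflict sketch: look for Git's conflict marker lines.

    The real hook, by default, only checks while a merge is in progress;
    --assume-in-merge (used in this document's config) makes it always check.
    """
    for line in text.splitlines():
        if line.startswith(("<<<<<<< ", ">>>>>>> ")) or line.rstrip() == "=======":
            return True
    return False


def case_conflicts(paths: list[str]) -> list[list[str]]:
    """check-case-conflict sketch: group paths that differ only by letter case."""
    groups = defaultdict(set)
    for path in paths:
        groups[path.lower()].add(path)
    return [sorted(names) for names in groups.values() if len(names) > 1]
```

For example, `case_conflicts(["README.md", "readme.md", "setup.py"])` returns `[["README.md", "readme.md"]]`: two files that would collide on a case-insensitive filesystem.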

Complete .pre-commit-config.yaml used in this document:

repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.4.0
    hooks:
      - id: trailing-whitespace
      - id: check-ast
      - id: check-case-conflict
      - id: check-json
      - id: check-merge-conflict
        args: [--assume-in-merge]
      - id: check-added-large-files
      - id: detect-private-key
  - repo: https://github.com/psf/black
    rev: 23.3.0
    hooks:
      - id: black
        language_version: python3.10
  - repo: https://github.com/pycqa/isort
    rev: 5.12.0
    hooks:
      - id: isort
  - repo: https://github.com/pycqa/flake8
    rev: 6.0.0
    hooks:
      - id: flake8
  - repo: local
    hooks:
      - id: radon-complexity-check
        name: Radon Complexity Check
        entry: ./radon-check.sh
        language: script
        types: [python]

Helpful Resources


Sweet & Sour Candy (this week’s good, bad, or weird of the tech world)

Exploring AI Ethics of ChatGPT: A Diagnostic Analysis - Recent breakthroughs in NLP have led to the development of powerful LLMs that can generate text, translate languages, and answer questions. However, these models can also exhibit social biases and toxicity, posing ethical and societal risks. This paper presents a qualitative study of the ethical risks of LLMs, using OpenAI's ChatGPT as an example. The study finds that a significant number of ethical risks cannot be addressed by existing benchmarks, and calls for the development of new benchmarks and design considerations for LLMs.

Large language models are biased. Can logic help save them? - Large language models are trained on massive amounts of text data, which can include biases. Researchers at MIT have developed a new method for mitigating these biases by incorporating logic into the training process. The new method, called Textual Entailment Logic, was effective in reducing bias in various tasks, including question answering and summarization.


One last bite

Badassery thrives in the incremental space between wussing out and being an idiot. ~ Des Linden

Thank you for reading Data Bytes. This post is public, so feel free to share it.


More articles by Jessica Rudd, PhD, MPH, PStat

  • Bridging the (Analytics) Culinary Gap - Part 3
  • Bridging the (Analytics) Culinary Gap - Part 2
  • Bridging the Culinary Gap: A Delicious Data Journey with ETL!
  • Why Heuristics Still Matter
  • Take the Reigns of the Cloud
  • One Mac, Many Gits to Rule Them All
  • Feeling the heat (of my first app development)
  • The Art of the Chill Week
  • Going Mad for March Madness
  • JavaScript vs. Python: The Data Engineering Duel
