THE ART OF CODING
Mustafa Saifee - PM Grad at CMU
Writing code is an art
This article is also available on my website: https://www.saifeemustafa.com/
The path to that art, as I have come to believe and have learned from various platforms, consists of:
- Writing clean and modular code
- Writing efficient code
- Code refactoring
- Adding meaningful documentation
- Using version control
Some terms that you need to familiarize yourself with:
- PRODUCTION CODE: software running on production servers to handle live users and data of the intended audience. Note this is different from production quality code, which describes code that meets expectations in reliability, efficiency, etc., for production. Ideally, all code in production meets these expectations, but this is not always the case.
- CLEAN: readable, simple, and concise. A characteristic of production quality code that is crucial for collaboration and maintainability in software development.
- MODULAR: logically broken up into functions and modules. This is an essential characteristic of production quality code that makes your code more organized, efficient, and reusable.
- MODULE: a file. Modules allow code to be reused by encapsulating it in files that can be imported into other files.
Writing Clean Code: Meaningful Names
Tip: Use meaningful names
- Be descriptive and imply type - E.g., for booleans, you can prefix with is_ or has_ to make it clear it is a condition. You can also use parts of speech to imply types, like verbs for functions and nouns for variables.
- Be consistent but clearly differentiate - E.g., age_list and age are easier to differentiate than ages and age.
- Avoid abbreviations and especially single letters - (Exception: counters and common math variables.) When to make these exceptions depends on the audience for your code. If you work with other data scientists, certain variables may be common knowledge, whereas if you work with full-stack engineers, it might be necessary to provide more descriptive names in these cases as well.
- Long names != descriptive names - You should be descriptive, but only with relevant information. E.g., good function names describe what they do well without including details about implementation or highly specific uses.
Try testing how effective your names are by asking a fellow programmer to guess the purpose of a function or variable based on its name, without looking at your code. Coming up with meaningful names often requires effort to get right.
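As a small illustration of these naming tips, the sketch below contrasts a vague version with a more descriptive one. The data and every name in it (s, s1, test_scores, has_passing_mean, and so on) are made up for this example, and the passing threshold of 60 is an assumption.

```python
import math

# Vague: single letters and numbered names hide what the values mean.
s = [88, 92, 79, 93, 85]
s1 = [math.sqrt(x) * 10 for x in s]
m = sum(s1) / len(s1)

# Clearer: nouns for data, names that say what each value represents.
test_scores = [88, 92, 79, 93, 85]
mean_test_score = sum(test_scores) / len(test_scores)
curved_test_scores = [math.sqrt(score) * 10 for score in test_scores]
mean_curved_score = sum(curved_test_scores) / len(curved_test_scores)

# The has_ prefix signals that this is a boolean condition.
has_passing_mean = mean_test_score >= 60
```

Both halves compute the same numbers; only the second can be read without tracing the arithmetic back to its source.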
Writing Clean Code: Nice Whitespace
Tip: Use whitespace properly
- Organize your code with consistent indentation - the standard is to use 4 spaces for each indent. You can make this a default in your text editor.
- Separate sections with blank lines to keep your code well organized and readable.
- Try to limit your lines to around 79 characters, which is the guideline given in the PEP 8 style guide. In many good text editors, there is a setting to display a subtle line that indicates where the 79 character limit is.
https://www.python.org/dev/peps/pep-0008/
Writing Modular Code
Tip: DRY (Don't Repeat Yourself)
Don't repeat yourself! Modularization allows you to reuse parts of your code. Generalize and consolidate repeated code in functions or loops.
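A minimal sketch of what this can look like in practice, using made-up price data and a hypothetical helper named mean_price_below: the filter-and-average logic is written once in a function and reused in a loop instead of being copied for each threshold.

```python
prices = [4.5, 12.0, 7.25, 30.0, 18.5]  # illustrative data

# Repetitive: the same logic is copied for every threshold.
under_10 = [price for price in prices if price < 10]
mean_under_10 = sum(under_10) / len(under_10)
under_20 = [price for price in prices if price < 20]
mean_under_20 = sum(under_20) / len(under_20)


# Consolidated: one function holds the logic, a loop supplies the thresholds.
def mean_price_below(prices, limit):
    """Return the mean of the prices strictly below limit."""
    selected = [price for price in prices if price < limit]
    return sum(selected) / len(selected)


for limit in (10, 20, 50):
    print(limit, mean_price_below(prices, limit))
```

Note that this sketch does not handle the case where no price falls below the limit; a real version would need to decide what to return then.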
Tip: Abstract out the logic to improve readability
Abstracting out code into a function makes it less repetitive and improves readability with descriptive function names. Although your code can become more readable when you abstract out logic into functions, it is possible to over-engineer this and have way too many modules, so use your judgment.
Tip: Minimize the number of entities (functions, classes, modules, etc.)
There are tradeoffs to having function calls instead of in-line logic. If you have broken up your code into an unnecessary number of functions and modules, you'll have to jump around everywhere to view the implementation details of something that may be too small to be worth it. Creating more modules doesn't necessarily result in effective modularization.
Tip: Functions should do one thing
Each function you write should be focused on doing one thing. If a function is doing multiple things, it becomes more difficult to generalize and reuse. Generally, if there's an "and" in your function name, consider refactoring.
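To make this concrete, here is a hedged sketch in which a hypothetical "parse and average" task is split into two focused functions. The names parse_ages and mean_age and the sample input are assumptions made for illustration.

```python
def parse_ages(raw_values):
    """Convert strings such as ' 42 ' to ints, skipping blank entries."""
    return [int(value) for value in raw_values if value.strip()]


def mean_age(ages):
    """Return the arithmetic mean of a non-empty list of ages."""
    return sum(ages) / len(ages)


raw_values = ["31", " 42 ", "", "27"]
ages = parse_ages(raw_values)
print(mean_age(ages))
```

Because parsing and averaging are separate, parse_ages can be reused wherever the raw strings appear, and mean_age works on any list of numbers.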
Tip: Arbitrary variable names can be more effective in certain functions
Arbitrary variable names in general functions can actually make the code more readable.
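For instance, in a general-purpose helper like the hypothetical element_wise_product below, short names such as x and y say all there is to say, because the function does not care what the numbers represent. The descriptive names belong at the call site.

```python
def element_wise_product(arr1, arr2):
    """Multiply two equal-length sequences element by element."""
    return [x * y for x, y in zip(arr1, arr2)]


# The caller knows the meaning of the data, so it uses descriptive names.
prices = [2.0, 3.5, 4.0]
quantities = [3, 2, 5]
line_totals = element_wise_product(prices, quantities)
```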
Tip: Try to use no more than three arguments per function
Try to use no more than three arguments when possible. This is not a hard rule, and there are times it is more appropriate to use many parameters. But in many cases, it's more effective to use fewer arguments. Remember, we are modularizing to simplify our code and make it more efficient to work with. If your function has a lot of parameters, you may want to rethink how you are splitting this up.
Efficient Code
Knowing how to write code that runs efficiently is another essential skill in software development. Optimizing code to be more efficient can mean making it:
- Execute faster
- Take up less space in memory/storage
The project you're working on determines which of these is more important to optimize for your company or product. When performing lots of different transformations on large amounts of data, efficient code can make orders of magnitude of difference in performance.
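One common way to make code execute faster is to pick a data structure that suits the operation. The sketch below, with made-up ID data and hypothetical function names, counts matches using a list and then a set. Membership tests on a list scan it element by element, while a set lookup takes roughly constant time on average. The timings depend on your machine, so none are quoted here.

```python
import timeit

known_ids = list(range(0, 4_000, 2))  # illustrative data: 2,000 even IDs
candidate_ids = list(range(2_000))


def count_matches_with_list(candidates, known):
    # `in` on a list compares against elements one by one.
    return sum(1 for item in candidates if item in known)


def count_matches_with_set(candidates, known):
    # Building the set once makes each later lookup roughly constant time.
    known_lookup = set(known)
    return sum(1 for item in candidates if item in known_lookup)


list_seconds = timeit.timeit(
    lambda: count_matches_with_list(candidate_ids, known_ids), number=3
)
set_seconds = timeit.timeit(
    lambda: count_matches_with_set(candidate_ids, known_ids), number=3
)
print(f"list: {list_seconds:.4f}s  set: {set_seconds:.4f}s")
```

Both functions return the same count; only the cost of each lookup differs, and the gap widens as the data grows.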
Refactoring Code
- REFACTORING: restructuring your code to improve its internal structure without changing its external functionality. This gives you a chance to clean and modularize your program after you've got it working.
- Since it isn't easy to write your best code while still trying to get it working, allocating time to do this is essential to producing high-quality code. Despite the initial time and effort required, this pays off by speeding up your development time in the long run.
- You become a much stronger programmer when you're constantly looking to improve your code. The more you refactor, the easier it will be to structure and write good code the first time.
Documentation
- DOCUMENTATION: additional text or illustrated information that comes with or is embedded in the code of the software.
- Helpful for clarifying complex parts of code, making your code easier to navigate, and quickly conveying how and why different components of your program are used.
- Several types of documentation can be added at different levels of your program:
- In-line Comments - line level
- Docstrings - module and function level
- Project Documentation - project level
In-line Comments
- In-line comments are text following hash symbols throughout your code. They are used to explain parts of your code, and really help future contributors understand your work.
- One way comments are used is to document the major steps of complex code to help readers follow. Then, you may not have to understand the code to follow what it does. However, others would argue that this is using comments to justify bad code and that if code requires comments to follow, it is a sign refactoring is needed.
- Comments are valuable for explaining where code cannot. For example, the history behind why a certain method was implemented in a specific way. Sometimes an unconventional or seemingly arbitrary approach may be applied because of some obscure external variable causing side effects. These things are difficult to explain with code.
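As an illustration of a comment that explains what the code itself cannot, consider the sketch below. The scenario (an upstream export that writes -1 for days a store was closed) and the function name mean_daily_sales are invented for this example.

```python
def mean_daily_sales(daily_sales):
    """Return the mean of daily_sales, ignoring closed-day placeholders."""
    # Why -1 is skipped: in this hypothetical scenario the upstream export
    # writes -1 for days the store was closed, so it is not a sales figure.
    open_day_sales = [amount for amount in daily_sales if amount != -1]
    return sum(open_day_sales) / len(open_day_sales)


print(mean_daily_sales([100, -1, 200]))
```

Without the comment, the check against -1 would look arbitrary; the code shows what is filtered, and only the comment can say why.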
Docstrings
Docstrings, or documentation strings, are valuable pieces of documentation that explain the functionality of any function or module in your code. Ideally, each of your functions should have a docstring.
Triple quotes surround docstrings. The first line of the docstring is a brief explanation of the function's purpose.
One line docstring
def population_density(population, land_area):
    """Calculate the population density of an area."""
    return population / land_area
If you think that the function is complicated enough to warrant a longer description, you can add a more thorough paragraph after the one-line summary.
Multi-line docstring
def population_density(population, land_area):
    """Calculate the population density of an area.

    Args:
        population: int. The population of the area
        land_area: int or float. This function is unit-agnostic. If you pass
            in values in terms of square km or square miles, the function will
            return a density in those units.

    Returns:
        population_density: population/land_area. The population density of a
            particular area.
    """
    return population / land_area
The next element of a docstring is an explanation of the function's arguments. Here you list the arguments, state their purpose, and state what types the arguments should be. Finally, it is common to provide some description of the output of the function. Every piece of the docstring is optional; however, docstrings are a part of good coding practice.
More resources:
https://www.python.org/dev/peps/pep-0257/
https://numpydoc.readthedocs.io/en/latest/format.html
Project Documentation
Project documentation is essential for getting others to understand why and how your code is relevant to them, whether they are potential users of your project or developers who may contribute to your code. A great first step in project documentation is your README file. It will often be the first interaction most users will have with your project.
Whether it's an application or a package, your project should absolutely come with a README file. At a minimum, this should explain what it does, list its dependencies, and provide sufficiently detailed instructions on how to use it. You want to make it as simple as possible for others to understand the purpose of your project, and quickly get something working.
Translating all your ideas and thoughts formally on paper can be a little difficult, but you'll get better over time, and it makes a significant difference in helping others realize the value of your project. Writing this documentation can also help you improve the design of your code, as you're forced to think through your design decisions more thoroughly. It also lets future contributors know how to follow your original intentions.
Questions to Ask Yourself When Conducting a Code Review
First, let's look over some of the questions we may ask ourselves while reviewing code.
Most of these come straight from the concepts covered above.
Is the code clean and modular?
- Can I understand the code easily?
- Does it use meaningful names and whitespace?
- Is there duplicated code?
- Can you provide another layer of abstraction?
- Is each function and module necessary?
- Is each function or module too long?
Is the code efficient?
- Are there loops or other steps we can vectorize?
- Can we use better data structures to optimize any steps?
- Can we shorten the number of calculations needed for any steps?
- Can we use generators or multiprocessing to optimize any steps?
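As one example of the last question, the sketch below compares a list-building function with a generator that yields the same values one at a time. The function names are hypothetical. The generator version avoids holding every value in memory at once, which matters more as the input grows.

```python
def squares_list(n):
    """Build the full list of squares in memory before returning it."""
    return [i * i for i in range(n)]


def squares_generator(n):
    """Yield squares one at a time, so only one value exists at any moment."""
    for i in range(n):
        yield i * i


total_from_list = sum(squares_list(1_000))
total_from_generator = sum(squares_generator(1_000))
```

The two totals are identical; the difference lies in how much memory is needed while computing them.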
Is documentation effective?
- Are in-line comments concise and meaningful?
- Is there complex code that's missing documentation?
- Does the function use effective docstrings?
- Is the necessary project documentation provided?
Is the code well tested?
- Does the code have high test coverage?
- Do tests check for interesting cases?
- Are the tests readable?
- Can the tests be made more efficient?
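To show what readable tests that check interesting cases might look like, here is a minimal sketch built around the population_density function from the docstring examples. The test names are assumptions, and the tests are written as plain functions with assert statements so they can be called directly or collected by a test runner such as pytest.

```python
def population_density(population, land_area):
    """Calculate the population density of an area."""
    return population / land_area


def test_density_of_typical_area():
    assert population_density(100, 4) == 25.0


def test_density_with_fractional_land_area():
    assert population_density(10, 0.5) == 20.0


def test_zero_land_area_raises():
    # An interesting edge case: dividing by zero should fail loudly.
    try:
        population_density(10, 0)
    except ZeroDivisionError:
        return
    raise AssertionError("expected ZeroDivisionError")


test_density_of_typical_area()
test_density_with_fractional_land_area()
test_zero_land_area_raises()
```

Each test name states the case it covers, so a failure report is readable without opening the test body.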
Is the logging effective?
- Are log messages clear, concise, and professional?
- Do they include all relevant and useful information?
- Do they use the appropriate logging level?
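A short sketch of these points using Python's standard logging module follows. The logger name sales_report, the function load_rows, and the row format are all assumptions made for illustration.

```python
import logging

logging.basicConfig(
    level=logging.INFO, format="%(levelname)s:%(name)s:%(message)s"
)
logger = logging.getLogger("sales_report")  # hypothetical logger name


def load_rows(rows):
    """Keep rows that have a price, logging what happened at suitable levels."""
    # DEBUG: detail that is only useful while diagnosing a problem.
    logger.debug("Received %d raw rows", len(rows))
    valid_rows = [row for row in rows if row.get("price") is not None]
    skipped = len(rows) - len(valid_rows)
    if skipped:
        # WARNING: something unexpected happened, but processing continues.
        logger.warning(
            "Skipped %d of %d rows with a missing price", skipped, len(rows)
        )
    # INFO: confirmation that things are working as expected.
    logger.info("Loaded %d rows", len(valid_rows))
    return valid_rows


loaded_rows = load_rows([{"price": 1.0}, {"price": None}, {}])
```

With the level set to INFO, the debug message is suppressed while the warning and info messages are emitted, each stating the counts a reader would need.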
Tips for Conducting a Code Review
Now that we know what we are looking for, let's go over some tips on how to actually write your code review. When your coworker finishes up some code that they want to merge into the team's codebase, they might send it to you for review. You provide feedback and suggestions, and then they may make changes and send it back to you. When you are happy with the code, you approve it and merge it into the team's codebase. As you may have noticed, with code reviews you are dealing with people, not just computers, so it's important to be thoughtful of their ideas and efforts. You are in a team, and there will be differences in preferences. The goal of code review isn't to make all code follow your personal preferences, but to make it meet a standard of quality for the whole team.
Tip: Use a code linter
This isn't a tip for code review itself, but it can save you lots of time during code review! A Python code linter like pylint can automatically check your code against coding standards and PEP 8 guidelines. It's also a good idea to agree on a style guide as a team to handle disagreements on code style, whether that's an existing style guide or one you create together incrementally as a team.
Tip: Explain issues and make suggestions
Rather than commanding people to change their code in a specific way because it's better, it will go a long way to explain to them the consequences of the current code and suggest changes to improve it. They will be much more receptive to your feedback if they understand your thought process and are accepting recommendations, rather than following commands. They also may have done it a certain way intentionally, and framing it as a suggestion promotes a constructive discussion, rather than opposition.
BAD: Make model evaluation code its own module - too repetitive.
BETTER: Make the model evaluation code its own module. This will simplify models.py to be less repetitive and focus primarily on building models.
GOOD: How about we consider making the model evaluation code its own module? This would simplify models.py only to include code for building models. Organizing these evaluation methods into separate functions would also allow us to reuse them with different models without repeating code.
Tip: Keep your comments objective
Try to avoid using the words "I" and "you" in your comments. You want to avoid comments that sound personal, so that the review's attention stays on the code and not on the people involved.
BAD: I wouldn't group by genre twice as you did here... Just compute it once and use that for your aggregations.
BAD: You create this groupby dataframe twice here. Just compute it once, save it as groupby_genre and then use that to get your average prices and views.
GOOD: Can we group by genre at the beginning of the function and then save that as a groupby object? We could then reference that object to get the average prices and views without computing groupby twice.
Tip: Provide code examples
When providing a code review, you can save the author time and make it easy for them to act on your feedback by writing out your code suggestions. This shows you are willing to spend some extra time to review their code and help them out. It can also just be much quicker for you to demonstrate concepts through code rather than explanations.
Let's say you were reviewing code that included the following lines:
first_names = []
last_names = []

for name in df.name:
    first, last = name.split(' ')
    first_names.append(first)
    last_names.append(last)

df['first_name'] = first_names
df['last_name'] = last_names
BAD: You can do this all in one step by using the pandas str.split method.
GOOD: We can actually simplify this step to the line below using the pandas str.split method. Found this on this Stack Overflow post: https://stackoverflow.com/questions/14745022/how-to-split-a-column-into-two-columns
df[['first_name', 'last_name']] = df['name'].str.split(' ', n=1, expand=True)