5 AI Studies Every Builder Must Know (but probably doesn’t)
Devansh
Chocolate Milk Cult Leader | Machine Learning Engineer | Writer | AI Researcher | Computational Math, Data Science, Software Engineering, Computer Science
Thank you Tampa for all the love. I couldn't find the time to meet all of you who reached out, but I really appreciate the cultists pulling up to make this a fantastic time. SF next (15th night to 21st early morning).
When people think about AI products, the focus is often on the best models or algorithms to solve problems. However, I think this approach is short-sighted, and it misses how tech products (and I would argue solutions anywhere) are deployed in practice. In a nutshell, this approach emphasizes individual aspects of the system while ignoring the larger system within which the solution operates. Such an approach leads to problems like-
all b/c you didn’t think about what you were building.
Put another way, building great AI isn't just about technical brilliance or chasing improvements in specific aspects of the product. It’s about recognizing crucial underlying systems, subtle patterns, and human factors that determine whether your AI thrives—or fails. Building solutions with those larger underlying currents in mind will ensure that the smaller pieces fall into place without effort.
Building based on techniques is like a counter-striker learning to win by focusing on their pull counter or their slip + liver shot. Building by thinking about systems is the counter-striker learning about foot placement, baits, and setups- getting their opponent to throw the shot that they most want. The former gets highlight reels; the latter puts opponents away.
Anime- How Heavy are the Dumbbells you Lift? Not great, but not a bad time kill.
In this article, I’ve personally picked out four powerful studies that anyone involved in AI should know. We’re not going to talk about specific models, techniques, or technical concepts, but instead about studies highlighting deeper principles and truths about building AI and software systems that will always be true, regardless of what technical developments happen.
Specifically, we will cover research that helps us answer:
If you want to know the answer to these questions, keep reading. One of the publications here is my favorite AI read EVER, so you absolutely should not miss that.
This article was originally published on my AI newsletter, AI Made Simple, on Substack over here. If you want to ensure that you get such high-quality analysis delivered to your inbox at no cost, sign up here. If you like my work and think it's valuable, please consider becoming a premium subscriber to the newsletter over here, for access to special articles, "inside" information from my network, and lots of style points.
I provide various consulting and advisory services. If you'd like to explore how we can work together, reach out to me through any of my socials over here or reply to this email.
Executive Highlights (TL;DR of the article)
We will focus on the following papers-
“Accounting for Variance in ML Benchmarks”: This study investigates how reliable machine learning benchmarks really are, given the many sources of random variation in model training. The researchers modeled the entire benchmarking process and found that variance from factors like data sampling, model weight initialization, and hyperparameter choices can dramatically affect reported performance. Teams had a strong propensity to underestimate this high variance and were often picking the wrong models due to incomplete evaluations. This paper is a must-read in the age of Generative AI, where the evaluation protocols are often a lot flimsier than those of even traditional ML pipelines (most people who end up building AI apps are not true AI people, but simply people who call AI APIs with Cursor, and thus overlook these foundational but non-obvious aspects of ML Engineering).
Another interesting bit from this study is their recommendation for how to run AI evaluations better. Their approach was counter-intuitive: adding more randomness to your evals gives you better estimates, “We analyze the predominant comparison methods used today in the light of this variance. We show a counter-intuitive result that adding more sources of variation to an imperfect estimator approaches better the ideal estimator at a 51 times reduction in compute cost.” Here’s my working hypothesis on why this happens: most AI optimizations happen within a set of narrow/similar configurations (this is the theory behind Bayesian Optimization, which uses Bayes’ theorem to better estimate hyperparameters). So we can reasonably assume that if a model does well on Task A, it will do well on tasks very similar to A (Adversarial Perturbation is an exception, but it isn’t as much of a concern in most commercial apps since users aren’t normally trying to provide AP-affected inputs). So instead of exhaustively searching the neighborhood of A, we can spend those resources zipping around the search space and looking at other possible input values (this is what a lot of very good human testers do as well). It’s kind of the inverse of a principle we discuss at length- ensemble models do better than big singular models, even normalizing for computing resources, since ensembles sample from a more diverse search space, ensuring broader coverage of possible behaviors.
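To make that concrete, here is a minimal sketch (my own toy setup with scikit-learn, not the paper's protocol) of the difference between the single fixed-seed run most leaderboards report and a handful of runs where the train/test split and the model's internal randomness are re-randomized each time:

```python
# A minimal sketch of "randomize the eval" vs. a single fixed run.
# Dataset, model, and split sizes are placeholders, not from the paper.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

def one_benchmark_run(seed: int) -> float:
    # Each run re-randomizes two sources of variance the paper highlights:
    # the train/test split and the model's internal randomness.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=seed)
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(X_tr, y_tr)
    return model.score(X_te, y_te)

single_run = one_benchmark_run(seed=42)  # what a single leaderboard entry reports
scores = np.array([one_benchmark_run(s) for s in range(10)])

print(f"single run: {single_run:.3f}")
print(f"10 runs:    {scores.mean():.3f} +/- {scores.std():.3f} "
      f"(min {scores.min():.3f}, max {scores.max():.3f})")
```

The spread across runs is the thing a single score hides, and it is often wide enough to swallow the small deltas teams use to declare a "winner".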
“System design and the cost of architectural complexity” looks at how expensive architectural complexity is. And it’s not pretty. “Measures of architectural complexity were taken from eight versions of their product using techniques recently developed by MacCormack, Baldwin, and Rusnak. Significant cost drivers including defect density, developer productivity, and staff turnover were measured as well. The link between cost and complexity was explored using a variety of statistical techniques. Within this research setting, we found that differences in architectural complexity could account for 50% drops in productivity, three-fold increases in defect density, and order-of-magnitude increases in staff turnover.”
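For context on what those “measures of architectural complexity” look like: MacCormack, Baldwin, and Rusnak's metrics are built on dependency matrices, and one of their headline metrics, propagation cost, asks how much of the system a change to one module can reach. Here is a rough sketch of that idea on a made-up four-module dependency matrix (an illustration of the concept, not their exact tooling):

```python
# A rough sketch of MacCormack/Baldwin/Rusnak-style "propagation cost":
# the share of the system that a change to one module can reach through
# the dependency graph. The 4-module dependency matrix below is made up.
import numpy as np

# deps[i, j] = 1 means module i depends directly on module j
deps = np.array([
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
])

def propagation_cost(deps: np.ndarray) -> float:
    n = deps.shape[0]
    # Visibility matrix: which modules each module can reach, directly or
    # indirectly (transitive closure of the dependency graph, incl. itself).
    visibility = (np.eye(n, dtype=int) + deps) > 0
    for _ in range(n):  # repeated squaring covers paths of any length
        visibility = (visibility.astype(int) @ visibility.astype(int)) > 0
    # Density of the visibility matrix = average reach of a single change.
    return visibility.sum() / (n * n)

print(f"propagation cost: {propagation_cost(deps):.3f}")  # 0.625 for this chain
```

The higher that number, the more of the codebase a "small" change can ripple through, which is what shows up later as defects, lost productivity, and turnover.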
The study was conducted within a successful software firm, which adds to its practicality. Another important point: I think this study underestimates the long-term impacts of complex architectures. Based on my own experiences and conversations, complex architectures tend to demotivate and push away the more enterprising engineers, much more than they push away engineers who don’t take personal ownership of things. This means that the long-term damage compounds and the culture can become incredibly stale, where nothing gets done b/c the “tenured engineers” on that team do nothing to reduce complexity and new hires either adopt that mentality or leave (creating what we call the “babu mentality” in India). All the while, the code base slowly gets worse and worse.
The above gets worse when we have out-of-touch upper management that pushes change recklessly, leaving unresolved threads and logic pieces in the code base, as the core team is constantly forced to accommodate the whims of overpaid executives who think they suddenly understand technology b/c they have ChatGPT explain very surface-level ideas to them and code basic demos.
Yes, I’m thinking of a specific company as I write this. Yes, you know the company, and even the product. I won’t say which one, but that company will probably see a high profile exit due to these tensions. Fun times. We may revisit this when it happens, depending on what other topics I will have in my pipeline then.
Lastly, I think this study becomes even more important with coding-based tools. None of the coding tools I’ve played with are great at generating code within very large existing code bases (Augment is by far the best at understanding code bases, so it’s my go-to, but even that falls apart very quickly if not guided). Applying AI to complex code bases will likely lead to a lot of conflicts, especially when we consider that AI-generated code tends to be more verbose, adding to the complexity already there. Teams should be vigilant about cutting down architectural complexity so that they don’t have to deal with this nightmare fuel.
“What Distinguishes Great Software Engineers?”: Through a large survey of 1,926 senior engineers and 77 follow-up interviews, the researchers pinpointed the five attributes most essential to engineering “greatness”. These top five traits were: writing good code, adjusting behaviors to account for future value and costs, practicing informed decision-making, avoiding making others’ jobs harder, and learning continuously. In plain terms, great engineers excel technically (code quality), think ahead about long-term implications, make thoughtful decisions, collaborate without creating friction, and constantly upgrade their skills.
Interestingly, the highest-ranked dimensions of greatness center on the ability to work with the team, demonstrating holistic and team-oriented skills (as opposed to the myth of the “genius hacker” who interacts with no one and does whatever they want).
The study also identifies two habits of bad coders. Both were very surprising, but made a ton of sense once I read them. The main section will cover them.
The next study is one of the richest studies on AI that I’ve ever come across. I won’t be able to add a deep dive here b/c that would make this article very long. But this paper has so much wisdom that I would hate for you to miss out on, so I’m going to give you a tl;dr of the most important ideas for now. We will do a separate deep dive on this piece in the future. I think everyone in AI should be familiar with this paper.
“Operationalizing Machine Learning: An Interview Study”: This is one of the most goated studies in AI, and it makes me sad that people don’t speak of it with the same reverence as the “Attention is All You Need” paper. Casuals doing Casual tings I guess. In fact, this is the only study I’ve ever covered where I had to do two different (very comprehensive) breakdowns just to do it some justice (this and this).
This interview study examined how organizations operationalize machine learning (ML), i.e. how they deploy and maintain ML pipelines in production settings. Through in-depth interviews with 18 ML engineers spanning domains like chatbots, autonomous vehicles, and finance, the researchers mapped out the end-to-end MLOps process and its pain points. They found that successful production ML systems revolve around three core capabilities, dubbed the “three Vs”: velocity, validation, and versioning.
The interviews revealed that many common issues in ML deployment arise from tension between these goals. For instance, pressure for fast iteration (velocity) can conflict with thorough testing (validation), leading to mismatches between development and production environments or bugs that slip through. Likewise, without good versioning, data errors can creep in or it becomes hard to reproduce results, undermining validation. Here is a list of the aforementioned errors-
Overall, the research paints a picture of MLOps as a continuous loop of data collection, experimentation, deployment, and monitoring – one that needs infrastructure and practices that address speed, quality assurance, and reproducibility in unison.
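To make the three Vs slightly more tangible, here is a minimal sketch of how they might show up at a single promotion step: versioning as a reproducible fingerprint tying model, data, and code together, validation as a gate, and velocity as the pressure that gate has to withstand. All names and thresholds below are hypothetical, not from the study:

```python
# A minimal sketch of the "three Vs" at a promotion step.
# All names, paths, and thresholds are hypothetical illustrations.
import hashlib
import json
from dataclasses import dataclass

@dataclass
class CandidateModel:
    model_path: str
    data_snapshot: str   # versioning: which data snapshot trained the model
    code_commit: str     # versioning: which code produced it
    offline_accuracy: float

def fingerprint(candidate: CandidateModel) -> str:
    # Versioning: a reproducible ID tying model, data, and code together,
    # so any production result can be traced back and reproduced.
    payload = json.dumps(candidate.__dict__, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()[:12]

def promote(candidate: CandidateModel, baseline_accuracy: float) -> bool:
    # Validation: a gate between fast iteration (velocity) and production.
    # Only promote if the candidate clearly beats the current baseline.
    if candidate.offline_accuracy <= baseline_accuracy + 0.01:
        print(f"rejected {fingerprint(candidate)}: no clear win over baseline")
        return False
    print(f"promoted {fingerprint(candidate)}")
    return True

promote(CandidateModel("models/v2.pkl", "data/2024-06-01", "abc1234", 0.91),
        baseline_accuracy=0.88)
```

The point isn't this specific gate; it's that velocity, validation, and versioning all have to be handled at the same step, which is exactly where the paper finds teams struggling.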
Let’s break each of these down in more detail.
1. Accounting for Variance in ML Benchmarks – How teams pick the wrong models
We begin by questioning a cornerstone of AI development: benchmarks. Benchmarks are far more variable and less reliable than we often assume. This has some very important implications for how we approach AI. For best results, it’s very important to change how we view benchmarks and the results of our AI evaluations.
The core idea, often overlooked in the daily rush of AI development, is that every benchmark score is not a fixed point, but rather a sample from a distribution influenced by numerous random factors. These factors spawn everywhere- from the seemingly innocuous act of splitting your data into training and test sets, to the random initialization of model weights during the learning process, to the “intelligent” choices we make during hyperparameter tuning (we’ll pretend there is a lot of thought required here so that you can keep playing ping pong while the model finds the best configurations) – randomness is deeply ingrained in the very fabric of evaluating AI models. This inherent randomness introduces variance, a significant factor that can dramatically skew our perception of model performance and lead us down the wrong path.
The various ways we could induce variance into our learning agents. The numbers can’t be ignored. The variance can literally change the results of your comparison.
This variance is not a two-bit statistical phenomenon that gets ignored like the “+C” during integration; it's a critical signal, telling us that a single benchmark score is merely one snapshot from a spectrum of possible outcomes. It speaks to the range of performance we can realistically expect from a model. And ignoring this can cause some major problems-
In case you missed that, let me reiterate- “Properly accounting for these factors may go as far as changing the conclusions for the comparison”. In other words, trained pros were picking the wrong models.
This can get worse when you start accounting for more nuanced eval protocols- accuracy + stability is a big one. Some models might have very high peaks, but also fail more spectacularly, while others are more stable. In this case, the latter model will be easier to build around, which creates a strong reason to pick it, one that can be overlooked if your evals don’t account for this angle. This isn't just a theoretical concern; it's a practical, costly error in AI development that I've witnessed time and again in the field.
The problem is amplified even further in Generative AI. Here, evaluation moves beyond the relatively structured world of classification accuracy and into the murkier waters of subjective assessments and less standardized metrics. Evaluation protocols for generative models are often far less rigorous and even more prone to variance than traditional ML pipelines. If benchmark variance is a problem in traditional ML, it's bordering on a crisis in Generative AI if left unaddressed. It’s no coincidence that multiple non-technical AI experts (investors looking at AI, policy makers, etc.) who have studied LLMs have all independently come to the conclusion that benchmarks aren’t very useful.
The paper proposes a solution that, at first glance, seems paradoxical: fight randomness with randomness; embrace more variation in evaluation. Instead of clinging to the illusion of precision offered by single benchmark runs, we need to intentionally inject more variation into our evaluation process. Run benchmarks multiple times – not just once or twice, but five, ten, twenty times, systematically varying random seeds, data splits, and even configurations. Think of it as stress-testing your model under a multitude of conditions, probing its performance limits, and mapping its true capabilities.
This "noisy" evaluation approach might feel less efficient, less precise on the surface. But it's precisely this seemingly counter-intuitive strategy that provides a far more robust and reliable understanding of a model's true potential. By testing across a wide set of configurations, we can get a strong understanding of the models behaviors and metrics w/o needing to run on every possible configuration. This is where the 51x reduction in cost that the paper described comes from- randomized sources of variance allow the evals to jump through the search space very easily, creating a good understanding of the kinds of performance your model might have.
Now because I love you, my dearest reader, I’ve studied the literature, thought back through my own experiences, and spoken to a lot of people to give you some steps that you can apply-
Even Deep Research (which is a great product that I’m paying for, FYI) asked very different follow-up questions for identical input prompts, leading to different outputs. Any AI builder should always have strong cross-validation in their evals.
Let’s move onto the next section. Once you have strong evals, you need to take a step back and think about your system. Specifically, how easy it is to deal with.
How Complexity comes with a Death Sentence
I don’t think I need to come in and tell you about why complexity is a bad thing. So I’m going to skip the repetitions and the stats and go straight to the solutions for this. This way we both save time-
To really make this work, you need to hire (or become) a great software engineer. This is where Microsoft’s research can be extremely insightful. Let’s see what we can learn from it.
What Defines Great Software Engineers?
We already know the traits from the tl;dr, but let’s redefine them here to keep things easy-
Nothing too shocking, but good to see them above other traits. Helps us focus our attention.
How can you develop these traits? Let’s do that next.
Cultivating these traits
Read more. YT videos on informative topics also count.
That’s really it. I could spin 20 stories, but this is what it boils down to.
We on the same page? Let’s cover the next part- what should you read?
There is a strong base of three types of reading material that will help you a ton. I share different examples of these 3 all the time. Any guesses for what they are?
Are you sure you have no guesses?
I’ll give you a cookie if you get it correct.
Our holy Trinity is the following-
Our self-guided learning plans have focused on the last 2 (reach out if you want a specific one), since I have the expertise to tell you how to learn from them. The first kind of source isn’t something I’ve studied formally, so I wouldn’t be the right person to tell you how to approach it beyond the general advice- be curious, ask questions about the source, and talk to people to see how the principles apply IRL. That’s what I’ve done. If you have something to add, please do share.
To this base, add whatever interests you. Depending on your interests, goals, and inclinations you will spend your time on each of these differently. That’s fine. In the end, that will help you develop skills and ideas that are unique to you. And that is where you will make amazing career leaps.
Reading more will help you make better decisions. It will help you foresee some challenges you might face, know how different people solved them in related (or different) domains, and see which engineering decisions will give you the highest ROI. It will also expose you to various best practices/design decisions that will ultimately help you create code/solutions that are functional, performant, and easy to work with/modify.
Now let’s get into the deadly programming sins that you should avoid.
The 2 Deadly Sins for Developers
We’ve already covered the 2 terrible sins. Let’s talk about why they’re bad and what you can do instead.
Hard work
This might come as a shock to most of you. Hustle Culture really loves Sigmas that love to grind. And it is important to put in the work. If you’re trying to run a marathon, you have to put in the miles. No way around it. So why is this considered a bad thing-
However, if a developer is consistently finding themselves working 8-hour+ days, they’re probably doing something wrong. As we’ve talked about many times, it’s not about doing more but doing the right things. This is why I emphasize the need to take a step back at times and analyze things as a whole. 1 High Impact Decision > 10 Medium Impact decisions.
…workload for a developer is a function of management and planning happening above that developer. Usually long working hours are needed, because the planning was not good, the decisions made during the project lifecycle were bad, the change management wasn’t “agile” enough
-From the paper
The key is in chilling out. Instead of rushing into a problem, spend your time thinking of the details (forecasting the future). Pick the most impactful, simple areas to tackle. Keep the end goal in mind as you proceed. Less is more. Great results can be attained by doing very little (comparatively).
This is not to say that you will never have to do these long days. Challenges crop up all the time. Just don’t make those long days the norm. Most of your time should be spent thinking, planning, and considering details. The grind should be a very rare event. Not something that happens on a monthly basis.
Moving on to the second sin.
Trading Favors
Imagine you helped someone fix something, and then you called upon them to help you. What’s wrong with this?
Nothing. Absolutely nothing.
The problem arises when people start forming clusters. You and your crew go to each other for help, and help mostly each other. Not because of drama or anything in particular. Humans are tribal creatures, so this is only natural. We will naturally gravitate to the people we are familiar with. So why is this a bad thing?
I covered Conway’s Law a while back. It shows that biases in team structures tend to propagate throughout the system. Trading favors amongst a group will add a layer of bias that will show up in the solutions.
Any organization that designs a system (defined broadly) will produce a design whose structure is a copy of the organization's communication structure.
-Melvin E. Conway
There is a simple tweak to solve this. Instead of helping/going to only a select few, reach out to more people. Actively pursue more and more people who might be able to help you. Post your challenges on the company boards/comm channels and work with a greater variety of people. This will allow you to avoid this problem while also allowing you to meet more people. Win-win.
I selected these papers b/c I believe that they form a strong foundation for every team, irrespective of what they’re doing. I’ve found that everyone I’ve ever worked with has benefitted from the ideas discussed in these papers, and I often find myself quoting these studies.
Are there any studies you would add to this list for a part 2? Would love to hear them.
Like this post? Please consider sharing it with someone you think will benefit from it. And if you want more of such posts, sign up for AI Made Simple over here and never miss an important update on AI.
Thank you for being here, and I hope you have a wonderful day.
Dev <3