I am in the process of experimenting with AI, specifically Copilot/GPT-4o, to see if it can really be useful and economical for programming real-world applications and systems. I emphasise "real-world" because engineering large-scale systems and applications involves many skills beyond knowing how to program standard algorithms and use existing, well-known technology.
As I see it, AI-assisted programming is so impressive at first sight that people are dazzled and brush aside the inherent risks. As a Software Engineer with 40 years of development behind me, I see my role as raising the flag of caution: don't assume that generating algorithms and even entire web applications can safely and reliably scale up to large, complex systems, especially if those systems carry mission-critical risks.
This is work in progress, but here are some of my thoughts on the risks we can't ignore, along with some ideas on how they might be mitigated.
One of the key properties of software is that it is deterministic. Unlike in other engineering disciplines, software engineers do not need to consider physical characteristics such as noise, tolerance, material strength, or manufacturing variances. The primary cause of non-determinism in software is bugs—whether in application code, libraries, infrastructure, or tools. These bugs are not inherent software characteristics — they are characteristics of the humans that specified, coded, and tested the software.
When we consider the use of AI to assist us with building applications where reliability is important, we must be aware that we are introducing a significant source of non-determinism into the software process. If we want robust, reliable software-based systems, and we want to use AI to generate the code, then we need to identify and mitigate this non-determinism.
Here is an initial analysis of the fundamental sources of non-determinism in AI-generated code, based on a few weeks of writing code using Copilot/GPT-4o.
- The AI is a black box. We don’t know what code the AI was trained on, so we have no idea if it has the expertise to produce the code necessary for our application. This means that at the start of the project, we have to evaluate if there is a need to invest in testing the AI’s knowledge of all the domains that we are relying on its expertise for. This adds effort, increases schedule overhead, and introduces risk since we may not know how to test it efficiently and effectively.
- AIs either don't know what they don't know, or prefer to generate something rather than admit their ignorance. The result is hallucinations that can only be discovered through more review and testing. (To be fair, humans find this hard as well.)
- As AI generates a greater proportion of the code, our role shifts toward specifying, reviewing, and validating the code. However, the more we depend on AI, the more likely it is that we are not experts in the AI-generated functionality—so reviewing and testing become more challenging, and bugs are more likely to be missed. I suspect that many programmers are much better at writing accurate code than reviewing code.
- We don't know what design decisions, trade-offs, and assumptions were made in the training code. For example, were the coders motivated by time to market (simplicity), or were they in an environment that demands future-proofed code? Is the code a proof of concept or industrial-grade, robust code? Is it optimised, and if so, for speed or for memory? These design decisions are typically not extractable from the code itself. So even if we tell the AI what our priorities are, it is not clear that an AI trained on existing code has the knowledge to match a code pattern to our needs.
- We don’t know the quality of the code it was trained on. Since the training code was written by imperfect people, it likely contains bugs, implicit assumptions, and bad habits. Unlike copying code from the internet, AI-generated code feels hand-crafted, making these flaws less obvious.
- It’s likely that the training code contained multiple ways of solving a problem. How does the AI choose which one to use? Who says that this is most suitable for our needs?
- By writing prompts in natural language instead of concise, formal programming languages, we widen the semantic gap between what we say we want and the code that is generated. Natural language is more likely to suffer from inaccuracy and ambiguity. It may also be more voluminous and complex, making it harder for human stakeholders to review for completeness, accuracy, and consistency. That is one of the reasons that in the 1990s "Agile" became popular instead of "Formal Specification," and formal documents were abandoned in favour of "many small iterations with lots of informal communication." While multiple small iterations suit the AI code-generation process, effective communication with an AI is currently a challenge (how this can be improved is discussed below); a small illustration of the ambiguity problem follows this list.
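To make the semantic gap concrete, here is a small, hypothetical illustration (the example and data are mine, not Copilot output): even an innocuous request such as "sort the users by name" under-specifies the behaviour, so two equally plausible implementations disagree.

```python
# Hypothetical illustration of prompt ambiguity: "sort the users by name"
# does not say how case should be handled, so two reasonable readings
# of the same prompt produce different results.

users = ["alice", "Bob", "ada", "Carol"]

# Reading 1: plain lexicographic sort (uppercase letters sort before lowercase)
case_sensitive = sorted(users)

# Reading 2: case-insensitive sort
case_insensitive = sorted(users, key=str.lower)

print(case_sensitive)    # ['Bob', 'Carol', 'ada', 'alice']
print(case_insensitive)  # ['ada', 'alice', 'Bob', 'Carol']
```

Neither reading is wrong; the prompt simply never said which one was wanted, and the difference only surfaces in review or testing.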
The bottom line is that any time advantage gained by AI code generation needs to be weighed against increased review and testing effort and increased schedule risk.
So how do we achieve deterministic software using AI? I think part of the solution is to move from a generic AI with free-style natural language prompts to one explicitly designed to support the software development process. Here are some possible directions to explore:
- Users should have access to clear specifications of the AI’s capabilities, including supported application domains and solution domains (e.g., languages, libraries, platforms, protocols), to help choose an appropriate AI tool.
- Natural language prompts should be augmented with more formal specification methods such as decision tables, state machines, diagrams, and other structured formats. An example domain where this is critical is a GUI specification, where a picture really is much more accurate than thousands of words.
- AIs should be less eager to please and more critical of the prompts they receive. Just as a compiler for a language with decent type checking can find many bugs by checking syntax, scope, and types, AI should be able to critically examine our prompts and ensure that what we expect of it is defined completely and consistently.
- Related to the previous point, the AI should offer a structured format and defined terminology for writing specification prompts. A formal, structured prompt lets the AI validate that, in addition to the main functionality, other aspects such as secondary use cases, edge cases, error handling, performance requirements, and other limitations are explicitly defined and identified (a small sketch of this idea follows this list).
- Since the AI isn't aware of all project requirements, limitations, trade-offs, and planned evolution, it should identify the assumptions it makes and annotate them with comments in the code. This would allow those assumptions to be reviewed and approved, and provide a way to undo all the code related to an assumption if requested (a sketch of such a convention also follows this list).
- The initial response to a prompt requesting code should be a “design.” This should describe how the requested changes are partitioned into new code and changes in existing files and functions. The design should include things such as side effects on existing functionality, assumptions, trade-off choices (simplicity/speed/memory use), and testability. Only after the design is approved should code be generated. Today it is possible to request this in the prompt, but my experience is that sometimes the AI generates code anyway.
- An organisation should be able to configure the AI with coding standards and similar guidelines. In Copilot you can write such a file and mention it in every prompt, but from personal experience it's easy to forget and annoying to have to add to every prompt. It is also a waste of time and processing for these guidelines to be re-processed with every prompt.
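As a sketch of what a more structured specification prompt could enable (the decision table, fields, and function below are invented for illustration, not an existing Copilot feature): once the expected behaviour is written as data rather than prose, missing cases can be detected mechanically, which is exactly the kind of push-back an AI assistant could give before generating any code.

```python
from itertools import product

# Hypothetical decision table for a login-throttling rule, expressed as data.
# Each key is (account_locked, too_many_failed_attempts).
decision_table = {
    (False, False): "allow",
    (False, True):  "require_captcha",
    (True,  False): "reject",
    # (True, True) deliberately omitted to show the completeness check firing
}

def missing_cases(table):
    """Return the input combinations that the specification does not cover."""
    return [case for case in product([False, True], repeat=2) if case not in table]

print(missing_cases(decision_table))  # [(True, True)] -> the spec has a hole
```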
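Similarly, for surfacing assumptions, one possible convention (again hypothetical, not something Copilot does today) is for every assumption in generated code to carry a fixed marker, so it can be listed for review and used to locate all the code that depends on a decision that later proves wrong.

```python
# Hypothetical convention: every assumption made during code generation is
# tagged with "ASSUMPTION:" so it can be found, reviewed, and revisited.

def monthly_total(transactions):
    # ASSUMPTION: all amounts are in a single currency; no conversion is needed.
    # ASSUMPTION: an empty month should return a total of 0 rather than raise an error.
    return sum(t["amount"] for t in transactions)

# Reviewing the assumptions across a project is then a one-liner, e.g.:
#   grep -rn "ASSUMPTION:" src/
```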
To summarise, AI has the benefit of vast knowledge and speed. But it is also a schedule and reliability risk, and significant time, effort, and expertise are required to prompt, review, and test the AI. At the same time, our knowledge of the system implementation is reduced, making whiteboard testing, debugging, and code reuse less effective. I believe the way forward is not increasingly smarter general-purpose AIs, but rather a more specialised AI designed specifically for Software Engineering support. Today's AI is a classic example of the old adage "jack of all trades, master of none."