Software 2.0 Design - From Rule-Based to Goal-Based Software Development
In my work with our clients, we often talk about a shift that machine learning is making possible. Machine learning (ML) is more accessible to software engineers today than ever before. AWS, Azure, and Google now all offer out-of-the-box ML services, ML tool suites such as SageMaker, AutoML for finding ML models, and specialized instance types for ML workloads.
For engineering teams, ML will become part of the software refactoring and design process, for example when re-architecting a system to a Software 2.0 design. In a recent paper, Google showed how a team migrated an information extraction system to such a design.
“Migrating a Privacy-Safe Information Extraction System to a Software 2.0 Design” - https://cidrdb.org/cidr2020/papers/p31-sheng-cidr20.pdf
The paper describes Google’s email information extraction system, which was initially built on hand-written rules for information retrieval. The system grew over time and became so complex that further meaningful development nearly stalled.
“A really interesting thing happens when you go from developing a Software 1.0 (i.e., traditional software) to a Software 2.0 system. In Software 1.0 we spend the majority of our effort on writing code, expressing how the system achieves its goals. Our whole tool chains are geared around the creation and validation of that logic. But in Software 2.0 the majority of our effort goes into curating training data, i.e., specification-by-example of what the system should do. We need a whole new tool chain geared around the creation/curation and validation of that data.” – writes Adrian Colyer - https://blog.acolyer.org/2020/02/17/software-20-migration/
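To make the quote concrete, here is a minimal sketch of my own; it is not code from the paper or from Google's system. It assumes a toy task (deciding whether an email subject is a shipping notification), and all names (`is_shipping_rule`, `TRAINING_DATA`, `train`, `predict`) as well as the example sentences are hypothetical. A tiny naive Bayes classifier stands in for whatever model a real Software 2.0 system would use.

```python
import math
from collections import Counter

# Software 1.0: the developer writes down HOW to decide (hand-written rule).
def is_shipping_rule(text: str) -> bool:
    lowered = text.lower()
    return "shipped" in lowered or "tracking number" in lowered

# Software 2.0: the developer curates labeled examples of WHAT is wanted.
# These examples are made up for illustration only.
TRAINING_DATA = [
    ("your order has shipped", True),
    ("tracking number for your parcel", True),
    ("your package is on its way", True),
    ("parcel dispatched and on its way", True),
    ("weekly newsletter and offers", False),
    ("reset your password now", False),
    ("your invoice for march", False),
    ("meeting moved to friday", False),
]

def train(examples):
    """Count words per label; this is all the 'model' consists of."""
    counts = {True: Counter(), False: Counter()}
    docs = Counter()
    for text, label in examples:
        counts[label].update(text.lower().split())
        docs[label] += 1
    vocab = set(counts[True]) | set(counts[False])
    return counts, docs, vocab

def predict(model, text: str) -> bool:
    """Naive Bayes with add-one smoothing; words never seen in training are ignored."""
    counts, docs, vocab = model
    scores = {}
    for label in (True, False):
        total = sum(counts[label].values())
        score = math.log(docs[label] / sum(docs.values()))
        for word in text.lower().split():
            if word in vocab:
                score += math.log((counts[label][word] + 1) / (total + len(vocab)))
        scores[label] = score
    return scores[True] > scores[False]

model = train(TRAINING_DATA)
subject = "your parcel is on its way"
print(is_shipping_rule(subject))  # the rule has no keyword for this phrasing
print(predict(model, subject))    # the model generalizes from similar examples
```

In this sketch, improving the rule-based version means editing code, while improving the learned version means adding or correcting rows in `TRAINING_DATA`. That is the shift in effort the quote describes, and it is why tooling for curating and validating data becomes so important.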
Google successfully redesigned their system to a Software 2.0 design and gained three benefits:
1. Precision: the new ML-based system quickly surpassed the heuristic-based one.
2. Less code: removing about 45k LOC reduced the size of the project that has to be maintained.
3. Maintainability: the new system is easier to maintain, because the rule-based system had become brittle.
In addition, Google unlocked new possibilities such as multi-language support; the original system was designed only for English. I recommend reading the paper, as it covers more details and offers a glimpse into Google’s internal tooling and development process.
Software 2.0 design is not only interesting for companies such as Google or AWS; it is just as important for companies in other industry sectors. For example, producers of programmable logic controllers (PLCs) should embrace such goal-based designs, unlocking previously untapped potential by enabling machine learning on the shop floor. The same holds as above: we need a whole new tool chain for the creation, curation, and validation of data to train those parts of the system where heuristics are just too complex to maintain.
Furthermore, even though it is called Software 2.0, it will not replace Software 1.0. The Software 1.0 parts of a system are, and will remain, important. Some problems in the cyber-physical world are harder to describe with explicit rules than to solve by gathering and labelling data to train machine-learning models. But many problems are still better solved with a clear rule written down.