Optimizing for performance
For many years, I've been deeply engaged in the study of Disruptive Innovation Theory, an extraordinary framework put forth by Harvard Professor Clayton M. Christensen.
As a consultant across various industries — from telecommunications, IT, and media to mobile technologies, automotive, and marine sectors — my ambition has consistently been to apply these models in my professional work. The theory's ability to anticipate disruptive innovations across such diverse sectors continues to impress me.
It seems only logical to consider Intelligent Virtual Assistants (IVAs), whether they're managing self-driving cars (Tesla FSD) or acting as copilots for coding (GitHub Copilot), within this context. IVAs fit squarely into Christensen's description of New-Market Disruption, which occurs when disruptors establish a new market and value network that incumbents have previously overlooked.
These disruptors create products or services that are simpler, more convenient, and more affordable, attracting a new demographic of customers who previously couldn't afford or access the existing market's offerings, such as a personal assistant, a driver, or a software developer. As these disruptors secure a foothold in the new market, they gradually refine their offerings, potentially attracting customers from incumbent markets.
A classic instance of new-market disruption is the advent of personal computers. Initially targeting home users — a segment ignored by mainframe manufacturers — personal computers improved in performance over time, began to serve business users, and ultimately disrupted the mainframe market.
It's only reasonable to ask: how would an IVA be any different? As they grow in sophistication, they may well unsettle established industries, much as personal computers disrupted the mainframe era.
It's key to understand that disruptive technology or products often start by underperforming compared to the existing dominant technology, especially in metrics meaningful to the most demanding customers. This underperformance tends to occur in a market segment currently underserved or not profitable for incumbents.
Yet this underperformance is often confined to traditional performance metrics. Disruptive technologies frequently introduce novel performance attributes that, while not valued by existing customers, resonate with a different, typically underserved, customer segment.
As the disruptive technology improves on traditional metrics to meet the needs of a wider customer base, not just early adopters, it can displace the existing technology and disrupt incumbents. This 'performance trajectory' is a common theme in disruptive innovation and is crucial for businesses to identify potential threats and opportunities.
Furthermore, successful disruptors often adopt an interdependent and vertically integrated architecture, especially in the early stages of disruption. This allows them to maintain control over critical product components, facilitate coordinated innovation, achieve cost efficiency, and ensure a superior customer experience.
Understanding these dynamics of disruption is critical to understanding the new technology stack, which Andrej Karpathy describes beautifully in his Software 2.0 post.
“To make the analogy explicit, in Software 1.0, human-engineered source code (e.g., some .cpp files) is compiled into a binary that does useful work. In Software 2.0 most often the source code comprises 1) the dataset that defines the desirable behavior and 2) the neural net architecture that gives the rough skeleton of the code, but with many details (the weights) to be filled in.
This is fundamentally altering the programming paradigm by which we iterate on our software, as the teams split in two: the 2.0 programmers (data labelers) edit and grow the datasets, while a few 1.0 programmers maintain and iterate on the surrounding training code infrastructure, analytics, visualizations, and labeling interfaces.”
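To make this concrete for myself, here is a minimal sketch of that split in PyTorch (my own toy illustration, not code from Karpathy's post): the "source code" is a dataset plus a network skeleton, and the optimizer fills in the program's details, the weights.

```python
import torch
import torch.nn as nn

# 1) The dataset that defines the desirable behavior (here: a toy XOR mapping).
X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = torch.tensor([[0.], [1.], [1.], [0.]])

# 2) The neural net architecture: the rough skeleton of the program.
model = nn.Sequential(nn.Linear(2, 8), nn.ReLU(), nn.Linear(8, 1))

# 3) Optimization fills in the details (the weights) instead of a human writing the logic.
optimizer = torch.optim.Adam(model.parameters(), lr=0.05)
loss_fn = nn.MSELoss()
for step in range(500):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()

print(model(X).detach().round())  # should approximate the XOR behavior defined by the data
```

In this framing, a "2.0 programmer" would edit and grow the dataset in step 1, while a "1.0 programmer" maintains the training loop and tooling around it.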
To see how this would work in practice, I need to map out the new target operating model for end-to-end operations: from Pre-production to Production to Post-production. Once the new technology stack for Software 2.0 is clear, I can begin to see how resources (the tangible and intangible assets a company can use to deliver value and drive innovation) come into play, and how an interdependent and vertically integrated architecture can improve performance.
Christensen's definition of performance is not restricted to the traditional, narrow view of technological performance or the functionality of a product. Instead, it is a broader, more comprehensive view that includes a variety of factors such as functionality, reliability, convenience, price, and even social and emotional factors.
Optimizing for performance
The performance gap, in this context, refers to the discrepancy between current computational capabilities and the growing demand for faster, more efficient, and more affordable solutions. Today we have powerful models capable of remarkable tasks; however, these models often require significant computational power, high throughput, low latency, and substantial financial resources for training and inference. The distance between those requirements and the resources actually available constitutes a considerable performance gap.
In examining each layer of the stack, that is, the end-to-end process from Pre-production to Production to Post-production, it becomes evident that applying an interdependent and vertically integrated architecture will enhance performance.
As Andrej Karpathy mentions in 'Software 2.0', a typical neural network primarily consists of two operations: matrix multiplication and thresholding at zero (ReLU). This is relatively simple compared to the more heterogeneous and complex instruction set of traditional software. As a result, it's easier to guarantee correctness and performance since you only need to provide a Software 1.0 implementation for a small number of core computational primitives, such as matrix multiplication. Therefore, because a neural network's instruction set is comparatively small, it's significantly simpler to implement these networks much closer to silicon.
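A toy NumPy sketch of that point (my illustration, with made-up shapes and random weights): the forward pass of a simple feed-forward network reduces to exactly those two primitives, matrix multiplication and thresholding at zero.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 784))      # a flattened example input
W1 = rng.standard_normal((784, 256))   # layer-1 weights
W2 = rng.standard_normal((256, 10))    # layer-2 weights

h = np.maximum(x @ W1, 0.0)            # matrix multiply, then threshold at zero (ReLU)
logits = h @ W2                        # matrix multiply again
print(logits.shape)                    # (1, 10)
```

Whatever the network is asked to do, the hardware only ever has to execute these few primitives well, which is why implementing them close to silicon is so attractive.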
The 'Full Stack Optimization of Transformer Inference: A Survey' corroborates this viewpoint. CPUs and GPUs, while commonly used in general-performance computing platforms and capable of supporting a wide array of workloads and operations, trade efficiency for flexibility. Deep learning models, composed of a small number of distinct operations that are repeated millions or billions of times, often do not require this high level of flexibility. Furthermore, while modern CPUs and GPUs can perform several operations concurrently, they do not leverage the extensive data reuse opportunities present in deep learning models.
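A toy illustration of the data-reuse point (mine, not from the survey): the same weight matrix is reused for every input in a batch, which is precisely the kind of reuse accelerators exploit by keeping operands in fast on-chip memory instead of re-reading them per input.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((1024, 1024))     # weights: fetched once, reused for every input
batch = rng.standard_normal((512, 1024))  # 512 independent inputs

# A single batched matmul reuses W across all 512 rows of the batch.
out = np.maximum(batch @ W, 0.0)
print(out.shape)  # (512, 1024)
```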
The need for fast, efficient computation, the use of a few distinct operations, and the opportunities for data reuse have all led to the use of hardware accelerators for deep learning. Accompanying the development of hardware accelerators, software frameworks and compilers for deploying various deep learning algorithms have also advanced and matured. These tools not only enable the execution of deep learning algorithms on accelerators but also carry out mapping optimizations to boost the performance and efficiency of the entire deep learning pipeline.
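As a hedged sketch of that framework-plus-compiler step, assuming a standard PyTorch 2.x setup, the same model can be handed to a graph compiler locally or exported to ONNX so an accelerator toolchain can apply its own mapping optimizations.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10)).eval()
example = torch.randn(1, 784)

# Graph-compile for the local backend (available in PyTorch 2.x).
compiled = torch.compile(model)
_ = compiled(example)

# Export the same graph to ONNX so downstream accelerator toolchains
# can ingest it and apply their own mapping optimizations.
torch.onnx.export(model, example, "model.onnx")
```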
The survey reviews different methods for efficient Transformer inference and concludes that a full-stack co-design approach combining these methods can yield up to an 88.7x speedup with minimal performance degradation for Transformer inference.
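As one concrete instance of the kind of technique that sits in this co-design space (my example, not code from the survey), post-training dynamic quantization stores Linear-layer weights in int8, cutting memory traffic at inference time with typically small accuracy loss.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 512)).eval()

# Replace Linear layers with int8 dynamically quantized equivalents.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
print(quantized)  # Linear modules are swapped for their dynamically quantized counterparts
```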
By examining each stage in the end-to-end process — from Pre-production to Production, to Post-production of Intelligent Virtual Assistants — numerous opportunities for performance optimization emerge. I plan to follow up this post with insights on areas with potential for performance enhancement, such as vector databases, model architectures, active learning, chiplets, and more :)