Why The Future Of Computing Is Heterogeneous
I was taking some older computers to the recycling center recently, big towers that had fallen victim to Moore's Law many years before, and it struck me as I looked at one of them just how ... primitive it was. There was a great big square chip on a motherboard, holding its position like the king of some medieval fiefdom, and a few secondary chips clipped in as memory on cards about the size of a modern cell phone. GPUs and DSPs each had their own card and components, and as I looked at it I realized that the latency between these systems must have been huge.
Multicore CPUs have been around for a while now, built on the idea that a multicore processor could do many things in parallel. The biggest obstacle to making multicore work came down to multi-threading, which was far from native even ten years ago. A multithreaded application is one in which different processes can run in parallel, and in general this worked best at a core system level, where you had multiple windows (or ports or application processes) that each had their own work context.
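To make that concrete, here is a minimal Python sketch (the window names and workloads are invented for illustration) of this coarse-grained style of multithreading, where each window-like unit runs on its own thread with its own private work context:

```python
from concurrent.futures import ThreadPoolExecutor

def handle_window(window_id, requests):
    """Each hypothetical window owns its own context; no shared mutable state."""
    context = {"window": window_id, "handled": []}
    for req in requests:
        context["handled"].append(f"{window_id}:{req}")
    return context

# Three independent "windows", each with its own invented work queue.
workloads = {
    "win-1": ["load page", "render"],
    "win-2": ["load page"],
    "win-3": ["load page", "render", "scroll"],
}

with ThreadPoolExecutor(max_workers=3) as pool:
    futures = {wid: pool.submit(handle_window, wid, reqs)
               for wid, reqs in workloads.items()}
    results = {wid: f.result() for wid, f in futures.items()}

for wid, ctx in results.items():
    print(wid, ctx["handled"])
```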
An example of this would be panes in a web browser. The browser itself manages the multiple threads associated with each pane or window, though famously a given pane usually had only one thread (or processor) associated with it. Inter-pane processing, consequently, usually required a separate thread, and thread contention, even with parallel processing, typically proved problematic. Where homogeneous threading did make sense was in areas like databases, where searches typically required a map/reduce-style approach of finding appropriate matches and then eliminating duplicates.
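A hedged sketch of that map/reduce pattern, with invented shard data and a made-up search term: the map step searches the shards in parallel, and the reduce step merges the hits and eliminates duplicates:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards of a record store; the data and the term are invented.
shards = [
    ["alpha", "beta", "gamma"],
    ["beta", "delta"],
    ["gamma", "beta", "epsilon"],
]

def search_shard(shard, term):
    """Map step: find matches within one shard."""
    return [record for record in shard if term in record]

with ThreadPoolExecutor() as pool:
    partial_hits = list(pool.map(search_shard, shards, ["beta"] * len(shards)))

# Reduce step: merge partial results and drop duplicates.
unique_hits = sorted(set(hit for hits in partial_hits for hit in hits))
print(unique_hits)   # ['beta']
```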
Graphics Processing Units (GPUs) emerged in the late 1990s, while Digital Signal Processors (DSPs) are somewhat older, dating back to the early 1980s. Until comparatively recently, most of the work of actually painting the screen with graphics (windows, buttons, text, etc.) was managed by the Central Processing Unit (CPU), but these kinds of operations usually took cycles away from other kinds of computation. The first graphics processing units were typically based upon the same kinds of architectures as CPUs, though with a few additional functions specialized for rendering basic two- and three-dimensional constructs more efficiently.
3D meshes drove the rise of the GPU, but the same tools that make it possible to depict objects in 3D space are also heavily used by AI recognition systems.
Over time, as specialized gaming systems became the rage and dedicated gaming consoles emerged, the ability to handle 3D processing grew significantly, largely by changing the architecture to better handle the processing of vectors and tensors (matrices). Most people today are at least dimly aware of 3D meshes - patterns of lines that are used to define a surface, giving a kind of wireframe look to 3D objects. Increasing the density of each face of these meshes makes the difference between something looking blocky and something looking curved and smooth. By applying orthogonal (or normal) vectors to each face and associating with them properties such as luminosity, texture mapping, shininess (specular surfaces), and transparency, you can make such meshes look more realistic, especially when they are passed through a kernel (a kind of filter) for managing what happens at points and edges.
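As a small illustration of how those normals feed into shading, here is a NumPy sketch (the single triangle and the light direction are invented): it computes a face normal as a cross product and then applies simple Lambertian shading:

```python
import numpy as np

# One invented triangular face of a mesh, as three 3D vertices.
v0, v1, v2 = (np.array([0.0, 0.0, 0.0]),
              np.array([1.0, 0.0, 0.0]),
              np.array([0.0, 1.0, 0.0]))

# The face normal is the normalized cross product of two edge vectors.
normal = np.cross(v1 - v0, v2 - v0)
normal = normal / np.linalg.norm(normal)

# Lambertian (diffuse) shading: brightness is the dot product of the
# normal with the direction toward the light, clamped at zero.
light_dir = np.array([0.0, 0.0, 1.0])
brightness = max(0.0, float(np.dot(normal, light_dir)))
print(normal, brightness)   # [0. 0. 1.] 1.0
```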
There's been an evolution of GPUs from Hollywood (or more precisely south of San Jose) to architectural firms and gaming companies to hand-held devices, with GPUs taking on more and more of the load of heavy-duty rendering. At each step, you're typically using higher-end GPUs that are then programmed with specific algorithms (more and more often machine-learning based) to prove out concepts, before those algorithms are eventually pushed into the GPU as firmware. This process of going from software to hardware-enabled functionality seems to take about eight years or so on average, which can be useful in predicting when those really cool Hollywood special effects will be showing up on your laptop.
Digital Signal Processors (DSPs) are another kind of specialized processor, one that primarily handles audio processing. Unlike GPUs, DSPs are typically designed to handle fast Fourier transforms, in essence converting the signals coming from microphones into frequency-domain representations that can then be processed. DSPs usually contain both analog-to-digital (A2D) and digital-to-analog (D2A) processing pipelines, the first typically used for sampling external sound and putting it into digital form, the second for converting digital signals back into the analog waveforms that drive speakers.
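A minimal sketch of that core operation, using NumPy's FFT on a synthetic 440 Hz tone (the sample rate and signal are invented for illustration):

```python
import numpy as np

# A synthetic "sampled microphone signal": a 440 Hz tone at 8 kHz.
sample_rate = 8000            # samples per second
t = np.arange(0, 1.0, 1.0 / sample_rate)
signal = np.sin(2 * np.pi * 440 * t)

# The FFT moves the signal from the time domain to the frequency domain,
# which is the kind of transform a DSP is built to accelerate.
spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(len(signal), 1.0 / sample_rate)

# The strongest frequency bin should sit at (or very near) 440 Hz.
peak_freq = freqs[np.argmax(np.abs(spectrum))]
print(peak_freq)   # 440.0
```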
DSPs increasingly figure heavily in the text-to-speech (TTS) and speech-to-text (STT) arenas, such as those employed by Siri, Alexa, Cortana, and other voice-agent systems. Not surprisingly, you're also seeing these chips equipped with AI-enabled systems for interpreting speech and making sense of it via natural language processing. By giving such chips significant memory allocations, computers, tablets, and smartphones are able to do a lot of this processing directly on the device without requiring external services.
These dedicated chips are also increasingly being run in parallel, in great part because many of the operations that GPUs and DSPs specialize in lend themselves to parallel processing. GPUs, in particular, are seeing adoption outside the media space because machine learning and deep learning systems tend to be heavily graph- and matrix-dependent. nVidia and AMD make the bulk of consumer-grade GPUs, but recently Tesla announced a custom chip designed for fully autonomous vehicles (not to be confused with the nVidia Tesla chip, which came out about twelve years ago). nVidia is also working on GPUs for autonomous vehicles. Graph databases, both property- and RDF-based, are another area where GPUs are finding homes, as graph search and traversal are, not surprisingly, things that GPUs do very well.
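A small sketch of why graph traversal maps so naturally onto matrix hardware (the four-node graph here is invented): a single breadth-first hop is just an adjacency-matrix product, which is exactly the kind of operation GPUs parallelize:

```python
import numpy as np

# Adjacency matrix of a small invented directed graph: 0->1, 0->2, 1->3, 2->3.
A = np.array([
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
], dtype=np.int64)

# Frontier vector: start the traversal at node 0.
frontier = np.array([1, 0, 0, 0], dtype=np.int64)

# Each matrix-vector product advances the frontier one hop; on a GPU the
# same product runs across thousands of nodes at once.
for step in range(1, 3):
    frontier = (A.T @ frontier > 0).astype(np.int64)
    print(f"reachable after {step} hop(s):", np.nonzero(frontier)[0])
```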
In a similar vein, DSPs are increasingly finding uses in the data analytics space, as both arenas concentrate on filtering out noise to better find signals. DSPs are frequently deployed as part of sensor arrays and array managers, where the fundamental problem is reducing the amount of spurious information in something approaching real time. DSPs are also increasingly paired with Charge-Coupled Devices (CCDs), which are at the heart of most modern video cameras, and autonomous vehicles depend upon CCDs, DSPs, and GPUs working in conjunction with one another, as do drones and robots.
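As a rough illustration of that filtering problem, here is a NumPy sketch with an invented noisy sensor trace: a simple moving-average (low-pass) filter of the sort a DSP would apply in real time:

```python
import numpy as np

# Invented sensor trace: a slow drift with random noise layered on top.
rng = np.random.default_rng(0)
t = np.linspace(0, 10, 500)
clean = np.sin(0.5 * t)
noisy = clean + rng.normal(0, 0.3, size=t.shape)

# A simple moving-average (low-pass) filter: each output sample is the
# mean of a short window of inputs, smoothing out high-frequency noise.
window = 25
kernel = np.ones(window) / window
filtered = np.convolve(noisy, kernel, mode="same")

# The filtered trace tracks the underlying signal much more closely.
print("noisy error:   ", float(np.mean(np.abs(noisy - clean))))
print("filtered error:", float(np.mean(np.abs(filtered - clean))))
```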
This brings us back to the notion of heterogeneous computing systems. Web services architectures are great when it comes to dealing with most human-scale web applications, but demand is rising for tightly coupled systems where latency becomes the primary limiting factor. It's likely that, over the course of the next few years, heterogeneous computing (HC) standards will become the norm. Such standards will make it possible to more easily create HC systems where different vendors' chips can be swapped in or out, and where CPUs, GPUs, DSPs, smart memory, and other types of chips can communicate directly ... or at worst, through a commonly addressed bridge.
We're really just at the beginning of the HC space, especially as it relates to other related areas such as edge computing. In essence, the edge computing paradigm tries to put as much computing power as possible along the edges, with centralized data repositories then collecting and collating the results of those operations and providing distributed data services. HC, in turn, provides the computational muscle at a deep level to eliminate the need for highly specialized (and comparatively slow) soft programming at each of these endpoints, in essence encoding soft algorithms in silicon and in-memory databases at the point of interaction.
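A hedged sketch of that division of labor, with invented node names and readings: each edge node reduces its raw samples to a compact summary locally, and only those summaries travel to the central repository, which collates them:

```python
import statistics

# Invented raw readings captured at three hypothetical edge nodes.
edge_readings = {
    "edge-a": [21.1, 21.3, 21.2, 21.4],
    "edge-b": [19.8, 19.9, 20.1],
    "edge-c": [22.5, 22.4, 22.6, 22.5, 22.7],
}

def summarize_locally(readings):
    """Runs on the edge node: reduce raw samples to a compact summary."""
    return {
        "count": len(readings),
        "mean": round(statistics.mean(readings), 2),
        "max": max(readings),
    }

# Only the small summaries cross the network; the central repository
# collates them into a fleet-wide view.
central_repository = {node: summarize_locally(r)
                      for node, r in edge_readings.items()}
print(central_repository)
```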
Overall, heterogeneous computing will likely be one of the defining traits of computing in the 2020s, along with a shift towards graph-based architectures, message-oriented systems, and immutable programming. The noosphere continues to grow.
Kurt Cagle is the editor of The Cagle Report, and a longtime blogger and writer focused on the field of information and knowledge management.
The Cagle Report is a daily update of what's happening in the Digital Workplace. He lives in Issaquah, Washington with his wife, kid, and cat. For more of the Cagle Report, please subscribe.