When it Comes to AI, Speed is The Name of The Game

“The good news is that it makes the model smaller, bad news is that I may be out of a job,” Song Han, assistant professor at MIT, said jokingly. The crowd laughed.

This line seemed to echo throughout Synopsys’ workshop at last week’s Embedded Vision Summit in Santa Clara, California. The Summit brought together computer vision specialists, corporate leaders, and innovative entrepreneurs to discuss challenges and opportunities facing AI development today.

Han, giving the first talk of the day, explained that as artificial intelligence becomes more commercialized, companies won’t be able to scale unless they address several big bottlenecks.

Research predicts that the AI industry will grow to over $190 billion by 2025. And on average, one AI company is launched every week in the U.K. alone.

As demand soars, engineers are racing the clock to deliver the highest performance at the lowest cost.

Sometimes even at the expense of automating themselves out of the process.

But even then, it’s not enough. There are some real constraints that keep us from scaling technology, like processing and storage, at the same rate as AI.

It seems the more we use AI, the less we are able to support it.

Hardware Can’t Keep Up

In 1965, Gordon Moore, Intel’s co-founder, noticed a pattern as technology advanced: the number of transistors on a computer chip doubles roughly every two years, bringing more processing power and lower energy consumption per transistor.

This became a general rule of thumb called Moore’s Law that would drive innovation in semiconductors and in the technology industry as a whole.
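
To get a feel for what that rule of thumb implies, here is a minimal sketch of the doubling rule. The 1971 baseline of roughly 2,300 transistors (the Intel 4004) is an illustrative assumption for intuition only, not a figure from the article.

```python
# Minimal sketch of Moore's Law as a strict doubling rule.
# The 1971 baseline (~2,300 transistors, Intel 4004) is illustrative.
def transistors(year, base_year=1971, base_count=2_300, doubling_period=2):
    """Project transistor count assuming a doubling every two years."""
    doublings = (year - base_year) / doubling_period
    return base_count * 2 ** doublings

for year in (1971, 1991, 2011, 2021):
    print(f"{year}: ~{transistors(year):,.0f} transistors per chip")
```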


Each generation of semiconductor technology has led to cheaper, faster, and smaller computer chips, and, correspondingly, devices. This kind of progress is largely what drove the evolution from desktops to laptops, from landlines to cell phones, and more.

But we are now reaching a point where innovation is bumping up against physical limits.

A recent study shows that we may even be reaching the end of Moore’s Law.

Computation per unit of cost and power is no longer increasing at the rate it once was. That is a challenge as technologies such as home assistants, self-driving cars, and VR ramp up in adoption.

The installed base of Internet of Things devices is expected to grow to over 30 billion units within just two years. And because smart devices observe, analyze, and deliver context-aware content in real time, they require significantly more computing and graphics processing power.

Just think about how a home assistant has to recognize your voice, understand your command or question, and associate context all within a matter of seconds before responding.
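
As a rough sketch of that pipeline, the snippet below strings placeholder stages together under a simple end-to-end latency budget. The stage functions, their outputs, and the two-second budget are assumptions for illustration, not any real assistant’s API.

```python
import time

LATENCY_BUDGET_S = 2.0  # assumed end-to-end response budget

def recognize_speech(audio):
    # Placeholder for on-device speech-to-text.
    return "what's the weather tomorrow"

def understand_command(text):
    # Placeholder for intent and entity extraction.
    return {"intent": "weather_forecast", "when": "tomorrow"}

def add_context(query, location="home"):
    # Placeholder for associating user context (location, history, ...).
    return {**query, "location": location}

start = time.monotonic()
query = add_context(understand_command(recognize_speech(audio=b"...")))
elapsed = time.monotonic() - start

print(query)
print(f"pipeline took {elapsed * 1000:.1f} ms of a "
      f"{LATENCY_BUDGET_S * 1000:.0f} ms budget")
```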

Let’s say a byte of data is a gallon of water. With today’s technology, the average household generates enough data to fill an average house every 10 seconds. By 2020, that will take only two seconds.

AI will continue to improve at the rate that training data is added, which means that as more consumers adopt the technology, the software will become more accurate. And accuracy will attract more users.

Demand feeds demand.

But if we’re unable to develop smaller, faster chips at the same rate, there won’t be any way to actually deliver this smarter AI technology.

Go Smaller or Go To The Edge

Rather than relying solely on hardware innovation to accelerate, we can address the root of the problem. By compressing the models themselves, or by doing a preliminary triage of data for saliency, we can streamline the process.

One method is model compression, which keeps neural networks compact so they can process data quickly without giving up accuracy. Deep compression narrows a network down to its most important parameters and encodes them compactly, without losing the meaning that the data represents.

Han spent over six years researching deep compression, looking into ways to build more efficient machine learning algorithms. He explained that deep compression is in part inspired by pruning, a natural process in the human brain. Pruning is the brain’s way of shedding synapses and neurons it rarely uses, keeping the connections that matter and making room for new ones to form.

In that same way, deep compression in AI networks can reduce memory bandwidth and offer significant speedups and energy savings.
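
For intuition, here is a minimal sketch of one ingredient of that idea, magnitude-based weight pruning, written with NumPy. The layer size and the 90% sparsity target are illustrative assumptions, not figures from Han’s talk.

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=(256, 256)).astype(np.float32)  # a dense layer

sparsity = 0.90                                    # prune 90% of the weights
threshold = np.quantile(np.abs(weights), sparsity)
mask = np.abs(weights) >= threshold                # keep only the largest weights
pruned = weights * mask                            # what a compressed layer stores (sparsely)

kept = int(mask.sum())
print(f"kept {kept}/{weights.size} weights ({kept / weights.size:.0%}); "
      "the zeros can be stored compactly and skipped at inference time")
```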

Many of the major tech players, like Facebook, Microsoft, Google, and Nvidia, use this solution to power software that we use on a daily basis.

Google Translate, for example, uses deep compression in Google Neural Machine Translation (GNMT), a system that translates whole sentences at a time rather than individual words. This allows the AI to get close to human accuracy with greater efficiency.

“If people use Google voice search for just three minutes a day and we ran deep neural nets for our speech recognition system on the processing units we were using, we would have had to double the number of Google data centers!” says Norm Jouppi, Hardware Engineer at Google.

Facebook uses similar deep compression techniques to enable their AR Camera to run computer vision seamlessly on mobile.


In the case of Facebook, this technique made the model size eight times smaller.
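
As a rough illustration of where such savings come from, the sketch below applies plain linear 8-bit quantization to a layer’s weights, cutting storage about 4x before any pruning or entropy coding. This is a simplified stand-in for the trained weight sharing that deep compression actually uses, and the numbers are illustrative rather than Facebook’s.

```python
import numpy as np

weights = np.random.default_rng(1).normal(size=(256, 256)).astype(np.float32)

scale = np.abs(weights).max() / 127.0               # map the weight range onto int8
quantized = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
dequantized = quantized.astype(np.float32) * scale  # reconstructed at inference time

print(f"float32: {weights.nbytes // 1024} KiB -> int8: {quantized.nbytes // 1024} KiB")
print(f"max absolute error: {np.abs(weights - dequantized).max():.4f}")
```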

And this use case is interesting because it enables computing not only in the cloud but also lighter-weight applications at the edge.

Edge computing, the practice of storing and processing data on endpoint devices close to where it is gathered, goes hand-in-hand with deep compression.

It shortens the path that traffic travels by triaging data and processing some of it locally, which reduces how much must be handled centrally and how much bandwidth is needed.
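
A hypothetical sketch of that triage might look like the following: a small on-device model handles what it is confident about and forwards only ambiguous data over the network. The threshold, helper functions, and sample format are assumptions for illustration, not any specific device’s API.

```python
CONFIDENCE_THRESHOLD = 0.8  # assumed cut-off for handling data locally

def local_model(sample):
    # Placeholder for a small on-device model; returns (label, confidence).
    return ("pedestrian", 0.93) if sample["motion"] else ("background", 0.55)

def send_to_cloud(sample):
    # Placeholder for uploading the sample for heavier, centralized analysis.
    print(f"forwarding sample {sample['id']} to the cloud")

def triage(sample):
    label, confidence = local_model(sample)
    if confidence >= CONFIDENCE_THRESHOLD:
        print(f"handled locally: sample {sample['id']} -> {label} ({confidence:.2f})")
    else:
        send_to_cloud(sample)  # only ambiguous data uses network bandwidth

for sample in ({"id": 1, "motion": True}, {"id": 2, "motion": False}):
    triage(sample)
```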

Self-driving cars are a great example of innovation with edge computing.

They produce large volumes of data from their numerous sensors and have to make many decisions in real time. Because autonomous cars share the road with other vehicles, pedestrians, and unpredictable obstacles, all of that data has to be processed quickly, and therefore locally, rather than being sent over the network, which takes time and adds latency.

Moving off the cloud and onto the edge reduces operating and communication costs since there is less data being processed over the network.

Computational demands will only continue to grow as AI gains popularity and changes not only consumers’ lives but also the way entire industries, like retail and security/surveillance, operate.

Whether it’s pruning data or moving off the cloud altogether, companies will have to continue exploring ways to grow at scale.

Keeping Up With The Speed of AI

When it comes to AI development, bandwidth is power.

Improving AI often requires more memory, more network bandwidth, and more engineering resources, but solving these challenges can be costly or require trade-offs. Which one should be prioritized? Does it have to be a zero-sum game? If so, what is a good balance?

These decisions aren’t easy because no one has the right AI formula just yet. Compression and intelligence at the edge are both steps in the right direction. The challenge for all of us is to keep up with the speed of change and the growing resource requirements of the latest AI technologies. But one thing’s for sure: if we do AI right, it has tremendous potential to speed up an organization’s throughput and revenue as well.

Ofer Rosenberg

System/Software Architect | Generative AI Software & Tools @ Qualcomm

6y

Great read. One comment though, regarding “go smaller or go to the edge” - do both and get efficient AI processing at the edge. :-)

Ed Nelson

Strategic Advisor

6y

This is a really great article!

Shah Hardik

Data Centre | IT Infrastructure | Colocation Service Provider | Global Switch | CloudEdge | Investor | Entrepreneur

6y

I am impressed with the IT research and knowledge gone into this piece. Great read.
