
Evaluating Hardware Transcoder Performance

NETINT Technologies Inc.

AV1, HEVC, H.264 ASIC-based video encoders for metaverse, cloud gaming & real-time streaming data center workloads.

Published: December 2, 2022


by Jan Ozer at NETINT Technologies Inc.

If you’ve ever benchmarked software codecs, you know the quality/throughput tradeoff; simply stated, the higher the quality, the lower the throughput. In contrast, for many first-generation hardware encoders, throughput was prioritized, but the quality was fixed; you got what you got.

Most next-gen hardware encoders offer presets or other switches to optimize quality at a cost to throughput that can be even more striking than with software. In comparing specifications for encoders, remember the quality/throughput tradeoff. And when you see quality stats, think, “hmm, at what throughput?” Or, if you see throughput stats, ask, “at what quality?”

Whenever you test a hardware encoder, you should start by identifying the configuration options that most impact quality and throughput and then test across a range of configurations to get a sense of the performance/quality tradeoff. If you plug in pricing and power consumption figures, you can also easily compute the cost per stream and watts per stream. This is the CAPEX and OPEX side of the equation.
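
Cost per stream and watts per stream are simple divisions once you know how many streams a configuration sustains. Below is a minimal sketch, assuming you have the unit price and the measured power draw under load. The `EncoderConfig` type and the helper names are hypothetical, and the $1,000 and 20 W figures in the example are placeholders, not NETINT pricing or power data.

```python
from dataclasses import dataclass


@dataclass
class EncoderConfig:
    """One tested encoder configuration (illustrative fields, not a vendor API)."""
    name: str
    streams: int           # concurrent 1080p streams sustained at this setting
    unit_price_usd: float  # hypothetical purchase price of the card or server
    watts: float           # hypothetical measured power draw under full load


def cost_per_stream(cfg: EncoderConfig) -> float:
    """CAPEX view: purchase price divided by the streams it sustains."""
    if cfg.streams <= 0:
        raise ValueError("streams must be positive")
    return cfg.unit_price_usd / cfg.streams


def watts_per_stream(cfg: EncoderConfig) -> float:
    """OPEX view: power draw divided by the streams it sustains."""
    if cfg.streams <= 0:
        raise ValueError("streams must be positive")
    return cfg.watts / cfg.streams


if __name__ == "__main__":
    # Stream counts (16 and 36) are the H.264 extremes reported in this article;
    # price and power are placeholders, so only the arithmetic is meaningful.
    for cfg in (EncoderConfig("highest quality", 16, 1000.0, 20.0),
                EncoderConfig("lowest quality", 36, 1000.0, 20.0)):
        print(f"{cfg.name}: ${cost_per_stream(cfg):.2f}/stream, "
              f"{watts_per_stream(cfg):.2f} W/stream")
```

Because price and power stay fixed while the stream count changes, every preset change shows up directly in both per-stream figures.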

Then you can choose the “operating point” that delivers the optimum blend of quality and throughput for your applications. When comparing multiple encoders, you should perform the same analysis for each to enable a complete apples-to-apples comparison.
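
Choosing an operating point can be framed as a constrained search. First discard configurations that miss your quality floor or latency ceiling, then take the one with the most streams. Below is a minimal sketch under those assumptions. `Result`, `choose_operating_point`, and the thresholds are hypothetical, and a real decision might also weigh cost and power.

```python
from typing import NamedTuple, Optional, Sequence


class Result(NamedTuple):
    label: str
    vmaf: float       # harmonic-mean VMAF measured for this configuration
    streams: int      # 1080p streams sustained at this configuration
    latency_s: float  # encoder latency added by this configuration, in seconds


def choose_operating_point(results: Sequence[Result],
                           min_vmaf: float,
                           max_latency_s: float) -> Optional[Result]:
    """Return the highest-throughput result that meets both constraints.

    Ties on stream count are broken in favor of higher quality.
    Returns None when no configuration satisfies the constraints.
    """
    eligible = [r for r in results
                if r.vmaf >= min_vmaf and r.latency_s <= max_latency_s]
    if not eligible:
        return None
    return max(eligible, key=lambda r: (r.streams, r.vmaf))
```

Running the same function over each encoder's results, with the same thresholds, keeps the comparison apples to apples.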

Recently, I benchmarked the performance of NETINT’s Quadra Video Processing Unit (VPU) against the NVIDIA T4. In this post, I’ll review the testing and the Quadra results to give you a feel for the hardware evaluation process. In a future post, I’ll review the NVIDIA findings and compare the two.

Benchmarking Quadra

Briefly, Quadra is NETINT’s newest ASIC-based transcoder. NETINT calls it a VPU because it has onboard decoding, scaling, encoding, and overlay, plus an 18 TOPS AI engine. The VPU can create encoded bitstreams in H.264, HEVC, and AV1.

Quadra has two major configuration options that impact quality: the lookahead buffer and rate-distortion optimization (RDO).

The lookahead buffer allows the encoder to examine frames ahead of the frame being encoded, so it knows what’s coming and can make more intelligent decisions. This improves encoding quality, particularly at and around scene changes, and it can improve bitrate efficiency. However, lookahead adds latency equal to the lookahead duration, and it can decrease throughput.
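
Because a lookahead buffer holds frames before encoding them, the latency it adds is the buffered frame count divided by the frame rate. Below is a minimal sketch, assuming 30 fps content. It counts only the buffering delay, not decode, network, or player latency.

```python
def lookahead_latency_s(lookahead_frames: int, fps: float) -> float:
    """Buffering delay of a lookahead: frames held back divided by frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return lookahead_frames / fps


# Assumes 30 fps source content.
for frames in (40, 20, 0):
    print(f"{frames:>2}-frame lookahead at 30 fps: "
          f"{lookahead_latency_s(frames, 30):.2f} s")
```

At 30 fps, this gives about 1.33 seconds for 40 frames and 0.67 seconds for 20, in line with the latency figures quoted in this article.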

Table 1 shows the impact of a 40-frame lookahead buffer when encoding to the H.264 format. Without the lookahead, the top-line harmonic mean VMAF score is 2.3 points lower, which is borderline significant. More telling, the low-frame differential of almost 16 points could predict transient problems that some viewers might notice. On the other hand, besides injecting 1.3 seconds of latency into the process, the lookahead cuts throughput by 33%, from 36 1080p streams to 24.

Table 1. Quality and performance impact of a 40-frame lookahead on Quadra H.264 encoding.
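
Both quality metrics cited above can be computed from per-frame VMAF scores. The harmonic mean penalizes low-scoring frames more heavily than the arithmetic mean does, and the low-frame score is the single worst frame. Below is a minimal sketch, assuming you have already extracted per-frame scores into a list. The +1 offset form of the harmonic mean is an assumption on my part; it avoids division by zero for frames that score 0. The last line reproduces the 33% throughput reduction from 36 to 24 streams.

```python
from typing import Sequence


def harmonic_mean_vmaf(scores: Sequence[float]) -> float:
    """Harmonic mean of per-frame VMAF scores.

    Assumption: uses a +1 offset (average 1/(s+1), invert, subtract 1)
    so that a frame scoring exactly 0 does not cause division by zero.
    """
    if not scores:
        raise ValueError("need at least one score")
    mean_inverse = sum(1.0 / (s + 1.0) for s in scores) / len(scores)
    return 1.0 / mean_inverse - 1.0


def low_frame_vmaf(scores: Sequence[float]) -> float:
    """Lowest per-frame score, a proxy for transient quality problems."""
    return min(scores)


def pct_change(before: float, after: float) -> float:
    """Percentage change from before to after (negative means a drop)."""
    return (after - before) / before * 100.0


# Throughput change when enabling the 40-frame lookahead (36 -> 24 streams).
print(f"{pct_change(36, 24):.1f}%")
```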

Rate-distortion optimization (RDO) functions like most presets, adjusting several parameters that impact both quality and throughput; higher values increase quality and reduce throughput. With H.264 output, Quadra offers one level of RDO, while with HEVC there are three levels: 1, 2, and 3.
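
In textbook terms, RDO chooses the coding option that minimizes a Lagrangian cost J = D + λR, where D is distortion and R is the bits spent. Higher RDO levels generally evaluate more candidates this way, which helps explain why they raise quality and cut throughput. The sketch below is a generic illustration, not Quadra’s implementation. The candidate modes, the `max_evaluated` knob standing in for an "RDO level," and the numbers are all hypothetical.

```python
from typing import NamedTuple, Sequence, Tuple


class Candidate(NamedTuple):
    mode: str          # hypothetical coding-mode label
    distortion: float  # e.g., sum of squared errors after reconstruction
    bits: float        # bits needed to code the block with this mode


def rd_select(candidates: Sequence[Candidate], lam: float,
              max_evaluated: int) -> Tuple[Candidate, int]:
    """Pick the candidate with minimum J = D + lam * R among the first max_evaluated.

    max_evaluated is a simplified stand-in for an RDO level: a larger search
    does more cost evaluations (lower throughput) but can find a cheaper option.
    Returns the chosen candidate and the number of evaluations performed.
    """
    pool = list(candidates[:max_evaluated])
    if not pool:
        raise ValueError("no candidates to evaluate")
    best = min(pool, key=lambda c: c.distortion + lam * c.bits)
    return best, len(pool)
```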

Table 2 shows the range of H.264 options tested during the recent benchmarking. LA is lookahead, and I tested three values: 40, 20, and 0. I also tested with RDO on and off. To put the quality in perspective, the x264 Quality Equivalent column shows the x264 preset that delivered comparable quality when encoding with the same parameters.
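
The H.264 and HEVC tables amount to a parameter sweep. Below is a minimal sketch that enumerates a full cross of the lookahead and RDO values described in this article. The article doesn’t state whether every combination was run, and the dictionary keys are illustrative labels, not Quadra parameter names.

```python
from itertools import product
from typing import Dict, List

# Values described in the article; keys are illustrative, not encoder flags.
LOOKAHEAD_FRAMES = (40, 20, 0)
RDO_LEVELS = {
    "h264": (1, 0),        # RDO on / off
    "hevc": (3, 2, 1, 0),  # RDO levels 1-3, with 0 meaning disabled
}


def build_sweep(codec: str) -> List[Dict[str, object]]:
    """Every lookahead x RDO combination to benchmark for one codec."""
    return [{"codec": codec, "lookahead": la, "rdo": rdo}
            for la, rdo in product(LOOKAHEAD_FRAMES, RDO_LEVELS[codec])]


for codec in RDO_LEVELS:
    print(codec, len(build_sweep(codec)), "configurations")
```

Each entry would then be encoded, scored, and timed, producing one row of quality and throughput data per configuration.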

At the highest quality setting, Quadra’s output quality slightly exceeded that of x264 using the slow preset, and the unit produced 16 1080p streams. You see that dropping the lookahead from 40 to 20 with RDO disabled had little impact on quality or throughput but cut latency by 0.66 sec, making that choice easy for latency-sensitive events.


At the lowest possible quality setting, Quadra’s quality dropped to slightly better than veryfast quality, which is often the x264 preset used for live applications to ensure at least nominal throughput with CPU-only transcoding.

At this quality level, the VPU outputs 36 1080p streams. Adding cost-per-stream and watts-per-stream data gives you a true feel for the comparative CAPEX and OPEX of every settings combination.

Table 3 shows the same data for HEVC transcoding using the same lookahead options and RDO at 1, 2, 3, and 0 (disabled). At the highest quality levels, output quality nearly matched x265 using the slow preset, but Quadra produced only four streams. At the other end of the spectrum, output quality nearly matched x265 using the veryfast preset, but Quadra produced 40 1080p30 streams, four more than with H.264.

Several new hardware encoders are coming, and their launches will be accompanied by aggressive claims about quality and throughput. My recommendation: don’t assume that the quality and throughput claims were measured using the same settings. In short, you’d better do your own testing. Trust but verify comes to mind.

When you perform your own testing, remember the methodology explained above:

Identify the most critical quality-related options for your specific application. All producers have different priorities, whether bitrate efficiency, absolute quality, ultra-low latency, density, power consumption, or cost per stream. You need to know your critical constraints to arrive at the best solution.
Test across a range of configurations from high quality/low throughput to low quality/high throughput. Increasingly, even for quality-driven use cases, imperceptible quality tradeoffs might be necessary to meet an operational cost or energy efficiency target. You should choose the operating point that delivers the optimum blend for your application.
Compute quality, cost per stream, and watts per stream at the operating point to compare against other technologies. Remember to factor in the CAPEX of the additional servers required to run a software encoding service. Our customers report reducing the number of machines needed by 90% or more, which can translate to tens or even hundreds of millions of dollars in savings, or to reclaimed CPUs that can be used in other parts of the operation.
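
The server-count side of that comparison is easy to estimate once you’ve measured per-server stream density for each approach. Below is a minimal sketch. The function names are hypothetical, and the inputs should be your own measured streams per server at your chosen operating point, not vendor claims.

```python
import math


def servers_needed(total_streams: int, streams_per_server: int) -> int:
    """Servers required to carry total_streams, rounding partial servers up."""
    if streams_per_server <= 0:
        raise ValueError("streams_per_server must be positive")
    return math.ceil(total_streams / streams_per_server)


def fleet_reduction_pct(software_servers: int, hardware_servers: int) -> float:
    """Percentage fewer machines after moving from software to hardware encoding."""
    if software_servers <= 0:
        raise ValueError("software_servers must be positive")
    return (software_servers - hardware_servers) * 100.0 / software_servers
```

Multiplying each server count by your per-server price and power draw turns the fleet estimate into the CAPEX and OPEX comparison described above.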

In the next post, we’ll share quality results from the NVIDIA T4 GPU and compare them to Quadra.
