From Hops to Monoliths: Crafting High-Performance Architecture in AdTech
Shreeniwas V Iyer
Harnessing Talent, Delivering Impact | Engineering Leader | Ex-CTO | Startup Founder
In my previous posts, Inside the World of Trillions: The Real-Time Ad Auctions Powering the Internet and Optimizing Networks for Billions: Scaling Efficiency and Speed in AdTech, I shared how, at Quantcast, we use a series of network optimizations to achieve scalability at low cost and discussed some of the trade-offs we make along the way. Here’s a recap: we process approximately 250 billion transactions daily, with responses required within 40-50 milliseconds. Of these, 220-230 billion are bidding endpoint requests, where we either bid or opt out. In this architecture, we prioritize scale, low latency, and low cost over absolute completeness.
Simplifying the Bidding Stack
To streamline our bidding stack, we reduce communication hops to the bare minimum. Our entire bidding infrastructure consists of only three components: a Layer 7 Load Balancer, a component called Mux, and another component called Bidder. Additionally, we use Aerospike as a distributed data store for fast access to critical information at bid time.
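For a flavor of what a critical-path lookup involves, here is a minimal sketch using the Aerospike Go client. The namespace, set, key, and the 5 ms timeout are assumptions for illustration, not details of Quantcast's actual setup; the point is that the lookup budget must be a small slice of the 40-50 ms response window.

```go
package main

import (
	"fmt"
	"log"
	"time"

	aero "github.com/aerospike/aerospike-client-go/v7"
)

func main() {
	// Connect to an Aerospike node (host and port are placeholders).
	client, err := aero.NewClient("127.0.0.1", 3000)
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// A tight total timeout keeps the lookup well inside the bid deadline.
	policy := aero.NewPolicy()
	policy.TotalTimeout = 5 * time.Millisecond

	// Key layout (namespace/set/user ID) is illustrative only.
	key, err := aero.NewKey("bidding", "profiles", "user-123")
	if err != nil {
		log.Fatal(err)
	}

	rec, err := client.Get(policy, key)
	if err != nil {
		// On a timeout or miss, a bidder would typically opt out rather than wait.
		log.Printf("lookup failed, opting out: %v", err)
		return
	}
	fmt.Println("profile bins:", rec.Bins)
}
```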
Given these trade-offs, we intentionally avoid a microservices architecture. We’ve learned the hard way that multiple service hops for discrete business-logic tasks aren’t worth the cost in our low-latency environment. A few years ago, we tried adding a third hop to the stack’s straight-line path, and it caused costly issues during trials. Maintaining a small number of monoliths has proven far more effective for us.
Why We Use Two Compute Systems: Mux and Bidder
So, why do we have two compute systems—Mux and Bidder? This setup is no accident and isn’t driven by legacy reasons. As an AI-powered company, every bid we make is driven by an AI model. These models are trained in the background but are tested live. By separating Mux and Bidder, we can run parallel bidding processes—one with production code and one with experimental code—to directly compare outcomes in real-world conditions.
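As a rough sketch of that fan-out pattern (the Bidder interface, field names, and deadline below are all hypothetical, not Quantcast's code), Mux can call both bidders concurrently under a single deadline and keep whichever answers arrive in time:

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Bidder abstracts a bidding engine; the interface is hypothetical.
type Bidder interface {
	Bid(ctx context.Context, req string) (float64, error)
}

// stubBidder stands in for a real model-backed bidder in this demo.
type stubBidder struct{ price float64 }

func (s stubBidder) Bid(ctx context.Context, req string) (float64, error) {
	return s.price, nil
}

type result struct {
	name  string
	price float64
	err   error
}

// fanOut sends the same request to both bidders concurrently under one
// shared deadline and returns whatever answers arrive in time.
func fanOut(ctx context.Context, req string, prod, exp Bidder) []result {
	ctx, cancel := context.WithTimeout(ctx, 40*time.Millisecond)
	defer cancel()

	ch := make(chan result, 2)
	go func() { p, err := prod.Bid(ctx, req); ch <- result{"production", p, err} }()
	go func() { p, err := exp.Bid(ctx, req); ch <- result{"experimental", p, err} }()

	var out []result
	for i := 0; i < 2; i++ {
		select {
		case r := <-ch:
			out = append(out, r)
		case <-ctx.Done():
			return out // deadline hit: keep whatever already arrived
		}
	}
	return out
}

func main() {
	results := fanOut(context.Background(), "bid-request-1",
		stubBidder{price: 1.25}, stubBidder{price: 1.40})
	for _, r := range results {
		fmt.Printf("%s bid: %.2f (err: %v)\n", r.name, r.price, r.err)
	}
}
```

Because both bidders see the exact same live request, any difference in outcome is attributable to the model, not to traffic skew.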
Dividing Responsibilities: Mux and Bidder
Mux handles everything that doesn’t require a data lookup, filtering out the category of requests we would never bid on anyway. Once it has gathered all preliminary data, it sends requests to the Bidder (both production and experimental versions), collects the results, and consolidates them into a final decision.
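A sketch of what that lookup-free pre-filter might look like follows; the request fields and rules are invented for illustration. Each check needs only the request itself, so Mux can opt out cheaply before any store lookup or Bidder round trip:

```go
package main

import "fmt"

// BidRequest holds the handful of fields the pre-filter inspects;
// the field names are illustrative, not an actual schema.
type BidRequest struct {
	AdFormat string
	Country  string
	Domain   string
}

var (
	supportedFormats = map[string]bool{"banner": true, "video": true}
	blockedDomains   = map[string]bool{"example-blocked.com": true}
)

// preFilter returns false for requests we would never bid on, so they
// can be rejected without a data lookup or a Bidder round trip.
func preFilter(r BidRequest) bool {
	if !supportedFormats[r.AdFormat] {
		return false
	}
	if blockedDomains[r.Domain] {
		return false
	}
	if r.Country == "" { // can't target without a geography
		return false
	}
	return true
}

func main() {
	fmt.Println(preFilter(BidRequest{AdFormat: "banner", Country: "US", Domain: "news.example"})) // true
	fmt.Println(preFilter(BidRequest{AdFormat: "audio", Country: "US", Domain: "news.example"}))  // false
}
```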
Data Storage Choices: Minimalism in the Critical Path
You may have noticed that our architecture barely mentions traditional databases, files, or other data stores. That’s because we don’t use them in the critical path. Instead, any necessary configuration is pre-loaded into memory and refreshed on a schedule, every few minutes or hours. We run machines with substantial memory and use purpose-built in-memory data structures tailored to each requirement, optimizing for both speed and cost.
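One common way to implement that pattern in Go is an atomically swapped, immutable snapshot refreshed by a background goroutine. This is a sketch under assumptions: loadConfig is a hypothetical stand-in for the real configuration source, and the fields are invented.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// Config is an immutable snapshot; readers never take a lock, they
// just dereference the current pointer. Fields are illustrative.
type Config struct {
	CampaignBudgets map[string]float64
}

var current atomic.Pointer[Config]

// loadConfig is a hypothetical stand-in for fetching fresh configuration
// from wherever it lives (object store, config service, etc.).
func loadConfig() *Config {
	return &Config{CampaignBudgets: map[string]float64{"campaign-1": 1000}}
}

// refreshEvery swaps in a new snapshot on a schedule; the hot path
// never sees a partially updated config.
func refreshEvery(interval time.Duration) {
	current.Store(loadConfig())
	go func() {
		for range time.Tick(interval) {
			current.Store(loadConfig())
		}
	}()
}

func main() {
	refreshEvery(5 * time.Minute)
	cfg := current.Load() // lock-free read on the critical path
	fmt.Println(cfg.CampaignBudgets["campaign-1"])
}
```

The trade-off is staleness: readers may work from a snapshot that is minutes old, which is exactly the completeness-for-latency trade described above.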
Logging and Batch Processing
We also avoid writing data in the critical path. Each bid generates a log entry, but even that is handled in the background: entries are batched into large Parquet files before being centralized. Any updates or insights derived from this data typically land 1-4 hours later.
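A minimal sketch of that kind of background batcher is below. Here writeParquetBatch is a placeholder for real Parquet serialization (e.g., via a library such as parquet-go), and the batch size and flush interval are assumptions:

```go
package main

import (
	"log"
	"time"
)

// BidLog is one bid's log entry; the fields are illustrative.
type BidLog struct {
	RequestID string
	Price     float64
}

// writeParquetBatch is a placeholder for serializing a batch to a
// Parquet file with a real library and shipping it off-box.
func writeParquetBatch(batch []BidLog) {
	log.Printf("flushed %d entries", len(batch))
}

// runLogger drains a channel in the background, flushing whenever the
// batch fills or the timer fires, so the bidding path never blocks on I/O.
func runLogger(in <-chan BidLog, maxBatch int, flushEvery time.Duration) {
	batch := make([]BidLog, 0, maxBatch)
	ticker := time.NewTicker(flushEvery)
	defer ticker.Stop()
	for {
		select {
		case e, ok := <-in:
			if !ok {
				writeParquetBatch(batch) // final flush on shutdown
				return
			}
			batch = append(batch, e)
			if len(batch) == maxBatch {
				writeParquetBatch(batch)
				batch = batch[:0]
			}
		case <-ticker.C:
			if len(batch) > 0 {
				writeParquetBatch(batch)
				batch = batch[:0]
			}
		}
	}
}

func main() {
	ch := make(chan BidLog, 1024) // buffered so the hot path never waits
	go runLogger(ch, 500, 10*time.Second)
	ch <- BidLog{RequestID: "req-1", Price: 1.25}
	close(ch)
	time.Sleep(100 * time.Millisecond) // let the final flush run in this demo
}
```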
Real-Time Input for Control Systems
The one system where real-time input is essential is our control system, which needs immediate data on supply volumes for each campaign. To balance memory and processing demands, we use reservoir sampling, which lets us push out updates in 30-second batches.
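Reservoir sampling keeps a uniform random sample of a stream in fixed memory, which is what makes those 30-second windows cheap. Here is a minimal sketch of the classic Algorithm R with a per-window flush; the sample size and item type are assumptions:

```go
package main

import (
	"fmt"
	"math/rand"
)

// Reservoir keeps a uniform random sample of at most k items from a
// stream of unknown length, using constant memory (Algorithm R).
type Reservoir struct {
	k     int
	seen  int
	items []string
}

func NewReservoir(k int) *Reservoir {
	return &Reservoir{k: k, items: make([]string, 0, k)}
}

// Add records one stream element. After n elements, each has
// probability k/n of being in the sample.
func (r *Reservoir) Add(item string) {
	r.seen++
	if len(r.items) < r.k {
		r.items = append(r.items, item)
		return
	}
	if j := rand.Intn(r.seen); j < r.k {
		r.items[j] = item
	}
}

// Flush returns the current sample plus the total count, then resets
// for the next 30-second window.
func (r *Reservoir) Flush() (sample []string, seen int) {
	sample, seen = r.items, r.seen
	r.items, r.seen = make([]string, 0, r.k), 0
	return sample, seen
}

func main() {
	res := NewReservoir(100)
	for i := 0; i < 10000; i++ {
		res.Add(fmt.Sprintf("impression-%d", i))
	}
	// In production a 30-second ticker would drive this flush.
	sample, seen := res.Flush()
	fmt.Printf("kept %d of %d impressions\n", len(sample), seen)
}
```

The sample plus the total count is enough for the control system to estimate supply volume per campaign without storing every impression.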
Observability: Real-Time Monitoring and Optimization
Observability is crucial to our architecture. We log extensive time-series data into observability systems like Datadog or Prometheus, allowing us to track system health, detect errors, and identify optimization opportunities—all in the background.
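On the Prometheus side, instrumenting a bidder with the official Go client looks roughly like this; the metric names, labels, and bucket boundaries are my own assumptions, chosen around the 40-50 ms deadline:

```go
package main

import (
	"log"
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
	// Counter of bid outcomes, labeled by decision; names are illustrative.
	bidDecisions = prometheus.NewCounterVec(
		prometheus.CounterOpts{
			Name: "bidder_decisions_total",
			Help: "Bid requests by outcome.",
		},
		[]string{"decision"}, // e.g. "bid" or "opt_out"
	)

	// Latency histogram with buckets clustered around the bid deadline.
	bidLatency = prometheus.NewHistogram(
		prometheus.HistogramOpts{
			Name:    "bidder_latency_seconds",
			Help:    "End-to-end bid handling latency.",
			Buckets: []float64{.005, .01, .02, .03, .04, .05, .1},
		},
	)
)

func main() {
	prometheus.MustRegister(bidDecisions, bidLatency)

	// Record one fake request so the metrics have data.
	start := time.Now()
	bidDecisions.WithLabelValues("bid").Inc()
	bidLatency.Observe(time.Since(start).Seconds())

	// Expose /metrics for Prometheus to scrape in the background.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":9090", nil))
}
```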
Our architecture is purpose-built from the ground up to optimize for latency, scalability, and cost. What are some techniques you use to achieve similar results?