The Impact of Events on Observability in Booking.com

Prabhash K.

Java | Spring Boot | Hibernate | React | Kafka | AWS | Docker

Published: October 21, 2024

Booking.com runs various distributed services across cloud and on-premises environments, each playing a different role. For instance, “Service X” might handle order processing, while “Service Y” manages inventory. All these services require monitoring to ensure they are performing well and are always available.

Booking.com relies on three key signals to monitor these systems: metrics, logs, and traces. To meet most of its observability needs, it uses an in-house system called Booking.com Events. This system helps generate traces, logs, and many metrics, and handles tens of millions of events every second.

What is an Event?

An Event is a set of key-value pairs that stores detailed information about a specific task or action. Events are at the core of Booking.com's observability system.

For example, an Event might represent an HTTP request and could include details like any errors or warnings that occurred during the request, how long it took to process, the number of database queries made, and their latency. It can also include information on A/B tests or other application-specific data.

An Event might look something like this:

{
  "availability_zone": "new_york",
  "created_epoch": "1724567890.1234",
  "service_name": "service_X",
  "git_commit_sha": "abc123xyz",
  …
}

At first glance, Events may seem similar to structured logs, but there are key differences. Logs tend to focus on individual error or status messages, possibly with some extra context. In contrast, Events gather data over time, pulling in information from various parts of the task they’re tracking.
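
The difference can be illustrated with a minimal sketch. It assumes a hypothetical `Event` class that collects fields while a task runs and is serialized once at the end. The class, its method names, and fields such as `db_query_count` are illustrative assumptions, not the API of Booking.com's Events library.

```python
import json
import time


class Event:
    """Hypothetical event that accumulates key-value pairs while a task runs."""

    def __init__(self, service_name):
        self._start = time.monotonic()
        self.fields = {
            "service_name": service_name,
            "created_epoch": time.time(),
            "db_query_count": 0,
            "db_latency_ms": 0.0,
            "warnings": [],
        }

    def set(self, key, value):
        # Attach any application-specific detail, e.g. an A/B test variant.
        self.fields[key] = value

    def record_db_query(self, latency_ms):
        # Aggregate database activity instead of logging each query separately.
        self.fields["db_query_count"] += 1
        self.fields["db_latency_ms"] += latency_ms

    def warn(self, message):
        self.fields["warnings"].append(message)

    def finish(self):
        # Serialize once, when the task is done, so one record covers the whole task.
        self.fields["duration_ms"] = (time.monotonic() - self._start) * 1000
        return json.dumps(self.fields)


event = Event("service_X")
event.set("http_status", 200)
event.record_db_query(3.5)
event.record_db_query(1.5)
event.warn("cache miss")
payload = json.loads(event.finish())
```

A log line would typically be written at each of these steps. Here the task produces a single record that carries all of them.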

The Role of Events

Booking.com uses Events because they provide a complete picture of what’s happening with a task, whether it’s an HTTP request, a scheduled job, or something else. Events capture everything from user inputs to performance details and the environment where the task is running. This data is then used to produce the traditional monitoring signals: metrics, logs, and traces. It also lets Booking.com run analytics on the Event data.

Events help answer complex questions that involve multiple systems. For example, if there’s an issue during the flight booking process, Events can help engineers figure out whether it’s only affecting certain users, whether bots are causing the problem, or whether it’s related to any ongoing experiments on the platform.

Because Events contain detailed information from various parts of the system, they make it possible to look at data that spans different software components at Booking.com.
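
As a rough illustration of this kind of question, the sketch below groups failed Events by a chosen field. The field names (`is_bot`, `experiment`, `errors`) and the sample data are assumptions made for the example, not the real Event schema.

```python
from collections import Counter


def failure_breakdown(events, key):
    """Count failed events, grouped by the value of one field."""
    return Counter(e.get(key) for e in events if e.get("errors"))


# Hypothetical flight-booking events.
sample = [
    {"is_bot": False, "experiment": "new_checkout", "errors": ["payment_failed"]},
    {"is_bot": False, "experiment": "new_checkout", "errors": ["payment_failed"]},
    {"is_bot": True, "experiment": None, "errors": []},
    {"is_bot": False, "experiment": None, "errors": []},
]

by_bot = failure_breakdown(sample, "is_bot")
by_experiment = failure_breakdown(sample, "experiment")
```

In this made-up sample, every failure comes from non-bot traffic enrolled in one experiment, which would point the investigation at that experiment rather than at bots.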

How Does Booking.com Use Events for Observability?

In Kubernetes, applications create Events using the “Events library.” These Events are then sent to the Event-proxy daemon running on the host machine. The Event-proxy performs three key tasks:

Adds Metadata: It enriches the Event with additional details, like the physical host where it was received.
Routes Events: It sends Events to specific Kafka topics based on custom rules. For example, Events from the order service go to the order-related Kafka topic.
Splits Messages: It breaks down a single Kafka message into smaller ones to make them easier to process.
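
The three tasks can be sketched roughly as follows. This is a simplified model, not the Event-proxy's actual code. It assumes that an incoming message is a JSON array of events, and the routing table, topic names, and the `proxy_host` field are hypothetical. Publishing to Kafka is left out; the function only returns the `(topic, payload)` pairs that would be sent.

```python
import json
import socket
from typing import Dict, List, Optional, Tuple

# Hypothetical routing rules: service name -> Kafka topic.
ROUTES: Dict[str, str] = {"order_service": "events.orders"}
DEFAULT_TOPIC = "events.default"


def enrich(event: dict, host: str) -> dict:
    """Add metadata: return a copy of the event with the receiving host."""
    return {**event, "proxy_host": host}


def route(event: dict) -> str:
    """Route: pick a topic from the rules, falling back to a default topic."""
    return ROUTES.get(event.get("service_name", ""), DEFAULT_TOPIC)


def split(message: bytes) -> List[dict]:
    """Split: turn one message holding several events into separate events."""
    return json.loads(message)


def process(message: bytes, host: Optional[str] = None) -> List[Tuple[str, bytes]]:
    host = host or socket.gethostname()
    outgoing = []
    for event in split(message):
        enriched = enrich(event, host)
        outgoing.append((route(enriched), json.dumps(enriched).encode()))
    return outgoing


batch = json.dumps(
    [{"service_name": "order_service"}, {"service_name": "inventory_service"}]
).encode()
result = process(batch, host="host-1")
```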

The process is similar for bare-metal servers. However, cloud-native platforms, like serverless environments (e.g., AWS Lambda), use different tools like OpenTelemetry and CloudWatch instead of the Events system.

Once the Event-proxy sends Events to Kafka clusters, several consumers start processing them for different purposes. Here are three important ones:

Distributed Tracing Consumer: This handles tracing for distributed systems and sends span data to Honeycomb.
APM Generator: It creates various application performance monitoring (APM) metrics and stores them in Graphite. For example, it tracks the number of actions for each app or failure rates.
Failed Event Processor: This focuses on Events that contain errors or warnings and writes them to Elasticsearch.
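
To make the last two consumers more concrete, here is a minimal in-memory sketch of what each might compute from a stream of Events. It assumes hypothetical `errors` and `warnings` fields and does not talk to Kafka, Graphite, or Elasticsearch. It only shows the shape of the derived data.

```python
from collections import defaultdict


def apm_metrics(events):
    """Derive per-service action counts and failure rates from events."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for event in events:
        name = event["service_name"]
        totals[name] += 1
        if event.get("errors"):
            failures[name] += 1
    return {
        name: {"count": totals[name], "failure_rate": failures[name] / totals[name]}
        for name in totals
    }


def failed_events(events):
    """Keep only the events that carry errors or warnings."""
    return [e for e in events if e.get("errors") or e.get("warnings")]


sample = [
    {"service_name": "service_X", "errors": []},
    {"service_name": "service_X", "errors": ["timeout"]},
    {"service_name": "service_Y", "warnings": ["slow query"]},
]

metrics = apm_metrics(sample)    # what an APM generator might store as metrics
to_index = failed_events(sample)  # what a failed-event processor might index
```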

References

Booking.com Engineering
