Building a Robust Error System in Go

Aditi Mishra

Senior Software Engineer @ Gojek | Ex - Adobe, Delhivery | Distributed Systems | Engineering Leader

Published: January 9, 2025

Go's philosophy of explicit error handling through return values is both a blessing and a curse. While it promotes clear error checking, organizations often struggle as their applications scale. Let's explore how to build a sophisticated error management system that maintains Go's simplicity while adding enterprise-grade capabilities.

The Growing Pains of Error Handling

Imagine you are building a house. At first, a hammer and some nails are enough. But as the structure grows, you need more specialized tools. The same holds for Go applications: as they scale, simple error returns become inadequate:

// Traditional approach - limited information  
func processOrder(id string) error {  
    if err := validateOrder(id); err != nil {  
        return fmt.Errorf("invalid order: %w", err)  
    }  
    return nil  
}

This approach leaves teams struggling with:

Inconsistent error messages across microservices
Limited error classification capabilities
Difficulty tracking error patterns
Difficulty attaching contextual information

Enter Domain-Driven Error Management

Instead of treating errors as mere strings, let's think of them as rich objects that carry domain meaning. We'll build a system that treats errors as first-class citizens:

// Domain error codes  
const (  
    OrderValidationFailed = "ORDER:VALIDATION:001"  
    PaymentDeclined      = "PAYMENT:PROCESS:001"  
    InventoryUnavailable = "INVENTORY:CHECK:001"  
)  

// Rich error type  
type BusinessError struct {  
    Code        string                 `json:"code"`  
    Message     string                 `json:"message"`  
    Details     map[string]interface{} `json:"details,omitempty"`  
    TraceID     string                 `json:"trace_id,omitempty"`  
    ServiceName string                 `json:"service_name,omitempty"`  
}  

func (e *BusinessError) Error() string {  
    return fmt.Sprintf("[%s] %s", e.Code, e.Message)  
}
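
To make the struct tags concrete, here is a minimal sketch of how such an error might look both as a log string and as a JSON payload. The describe helper is a hypothetical name introduced only for this illustration; the struct itself is copied from above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// BusinessError mirrors the struct defined in the article.
type BusinessError struct {
	Code        string                 `json:"code"`
	Message     string                 `json:"message"`
	Details     map[string]interface{} `json:"details,omitempty"`
	TraceID     string                 `json:"trace_id,omitempty"`
	ServiceName string                 `json:"service_name,omitempty"`
}

func (e *BusinessError) Error() string {
	return fmt.Sprintf("[%s] %s", e.Code, e.Message)
}

// describe (hypothetical helper) returns the log-style string and the
// JSON payload for the same error.
func describe(e *BusinessError) (string, string, error) {
	payload, err := json.Marshal(e)
	if err != nil {
		return "", "", err
	}
	return e.Error(), string(payload), nil
}

func main() {
	e := &BusinessError{Code: "ORDER:VALIDATION:001", Message: "Invalid order structure"}
	text, payload, err := describe(e)
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(text)    // [ORDER:VALIDATION:001] Invalid order structure
	fmt.Println(payload) // {"code":"ORDER:VALIDATION:001","message":"Invalid order structure"}
}
```

Because of omitempty, the empty Details, TraceID, and ServiceName fields are left out of the payload, which keeps responses small when no extra context is attached.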

Building Blocks of Modern Error Management

1. Domain Classification

Think of error codes like ZIP codes - they help route information efficiently:

func NewOrderError(code string, msg string) *BusinessError {  
    return &BusinessError{  
        Code:        code,  
        Message:     msg,  
        ServiceName: "order-service",  
        TraceID:     generateTraceID(),  
        Details:     make(map[string]interface{}),  
    }  
}  

// Usage (WithDetail is a chaining helper that adds one entry to Details)
if !isValid {
    return NewOrderError(OrderValidationFailed, "Invalid order structure").
        WithDetail("orderId", id).
        WithDetail("reasons", validationErrors)
}
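
The usage snippet relies on a WithDetail method that the article does not define. One possible shape, offered as a sketch rather than the author's implementation, is a method that stores the pair and returns the receiver so calls can be chained:

```go
package main

import "fmt"

// BusinessError is trimmed to the fields this sketch needs.
type BusinessError struct {
	Code    string
	Message string
	Details map[string]interface{}
}

func (e *BusinessError) Error() string {
	return fmt.Sprintf("[%s] %s", e.Code, e.Message)
}

// WithDetail is a hypothetical helper: it stores one key/value pair and
// returns the same error so that calls can be chained.
func (e *BusinessError) WithDetail(key string, value interface{}) *BusinessError {
	if e.Details == nil {
		// Tolerate errors that were built without a Details map.
		e.Details = make(map[string]interface{})
	}
	e.Details[key] = value
	return e
}

func main() {
	err := (&BusinessError{Code: "ORDER:VALIDATION:001", Message: "Invalid order structure"}).
		WithDetail("orderId", "ord-42").
		WithDetail("reasons", []string{"missing items"})
	fmt.Println(err.Error(), len(err.Details))
}
```

Note that a chained call has to break after the dot. Go inserts a semicolon at the end of a line that ends in a closing parenthesis, so a continuation line that starts with a dot does not compile.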

2. Context Preservation

Like a chain of evidence, errors should maintain their history:

type ErrorChain struct {  
    Current *BusinessError  
    Cause   error  
    Stack   []string  
}  

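// Error makes *ErrorChain satisfy the error interface so it can be returned as an error.
func (ec *ErrorChain) Error() string {
    return fmt.Sprintf("%s: %v", ec.Current.Error(), ec.Cause)
}

func (ec *ErrorChain) Unwrap() error {
    return ec.Cause
}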

func WrapError(err error, code string, msg string) *ErrorChain {  
    return &ErrorChain{  
        Current: NewOrderError(code, msg),  
        Cause:   err,  
        Stack:   captureStack(),  
    }  
}

3. Error Translation Layer

Create clean boundaries between technical and business errors:

func translateDatabaseError(err error) *BusinessError {  
    switch {  
    case errors.Is(err, sql.ErrNoRows):  
        return NewOrderError("DB:NOT_FOUND", "Resource not found")  
    case isDuplicateKey(err):  
        return NewOrderError("DB:DUPLICATE", "Resource already exists")  
    default:  
        return NewOrderError("DB:UNKNOWN", "Database operation failed")  
    }  
}

Practical Implementation Strategies

Error Factory Pattern

Create domain-specific error factories:

type OrderErrorFactory struct {  
    service string  
    env     string  
}  

func (f *OrderErrorFactory) ValidationFailed(orderId string) *BusinessError {
    return NewOrderError(OrderValidationFailed, "Order validation failed").
        WithDetail("orderId", orderId).
        WithDetail("service", f.service).
        WithDetail("environment", f.env)
}

Middleware Integration

Standardize error handling across your HTTP handlers:

func ErrorMiddleware(next http.Handler) http.Handler {  
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {  
        defer func() {  
            if err := recover(); err != nil {  
                logError(r.Context(), err)  
                respondWithError(w, NewOrderError("SYS:PANIC", "Internal server error"))  
            }  
        }()  
        
        next.ServeHTTP(w, r)  
    })  
}

Monitoring and Observability

Transform rich error data into actionable insights:

func logBusinessError(ctx context.Context, err *BusinessError) {  
    metrics.IncCounter(fmt.Sprintf("errors.%s", err.Code))  
    
    logger.WithFields(log.Fields{
        "error_code": err.Code,
        "service":    err.ServiceName,
        "trace_id":   err.TraceID,
        "details":    err.Details,
    }).Error(err.Message)
}

Best Practices

Domain First: Design error codes around business domains, not technical implementations
Context Rich: Include relevant debugging information without exposing sensitive data
Consistent Patterns: Establish standard error creation and handling patterns across teams
Clear Boundaries: Maintain clear separation between internal and external error representations
Gradual Migration: Implement the new system incrementally, starting with critical paths

Real-World Example: Order Processing

Here's how it all comes together:

func (s *OrderService) ProcessOrder(ctx context.Context, order Order) error {  
    // Create domain-specific error factory  
    ef := &OrderErrorFactory{service: "order-processor", env: s.env}  
    
    // Validate order  
    if err := s.validator.Validate(order); err != nil {  
        return ef.ValidationFailed(order.ID).  
            WithDetail("validation_errors", err.Error())  
    }  
    
    // Check inventory  
    available, err := s.inventory.Check(ctx, order.Items)  
    if err != nil {  
        // Translate technical error to business error  
        return WrapError(err, "INVENTORY:CHECK", "Failed to verify inventory")  
    }  
    
    if !available {  
        return ef.OutOfStock(order.Items)  
    }  
    
    // Process payment  
    if err := s.payment.Process(ctx, order.Payment); err != nil {  
        return ef.PaymentFailed(err)  
    }  
    
    return nil  
}

Conclusion

Building a robust error management system is about finding the sweet spot between simplicity and functionality. By treating errors as first-class citizens and incorporating domain-driven design principles, we can create a system that's both powerful and maintainable.

The key is to start small and evolve the system based on real needs rather than hypothetical scenarios. Focus on solving actual problems your team faces, and gradually expand the system's capabilities as new requirements emerge.

Remember: Good error handling isn't just about catching failures - it's about providing meaningful information that helps maintain and improve your system over time.

How do you handle errors in your Go applications? What challenges have you faced with error management at scale? Share your experiences in the comments below.

Utsav K.

Software Development Engineer 3

1 month ago

I find custom error types a cleaner way to handle errors.

Himanshu .

Ex Deliveroo| Ex Dotpe| Ex Samsung| DTU'19

1 month ago

We can use the error interface rather than explicitly passing custom types across layers, and every layer should have a translation mechanism. That way, repo/downstream errors can be handled with business context in the service layer.
