JSONB in PostgreSQL with EF Core - Part 2

Introduction

Welcome back to the second part of our series on using JSONB in PostgreSQL with EF Core. In our previous article, we explored the basic functionalities and benefits of JSONB in PostgreSQL. If you missed it, it is worth reading Part 1 first.

In this article, we delve into more advanced aspects of JSONB with EF Core, specifically focusing on:

  • Enhanced Queries and Operations: Leveraging the flexibility of JSONB to perform complex queries efficiently.
  • Performance Optimization: Techniques to enhance the speed and responsiveness of database operations involving JSONB data.
  • Migration Strategies: Best practices for evolving your database schema without downtime or data loss.
  • Schema Validation and Versioning: Approaches to implement robust schema validation and versioning to maintain data integrity.

Enhanced Queries and Operations

We'll first ensure that our orders table is well-defined, with a rich structure of nested JSONB data to simulate a complex real-world scenario:

DROP TABLE IF EXISTS orders;

CREATE TABLE orders (
    id serial primary key,
    order_info JSONB
);

INSERT INTO orders (order_info)
VALUES
('{
    "customer": {
        "name": "Alice",
        "email": "[email protected]"
    },
    "items": [
        {"name": "Laptop", "price": 1200, "quantity": 1},
        {"name": "Mouse", "price": 40, "quantity": 2}
    ],
    "payment": {
        "method": "Credit Card",
        "details": {
            "card_type": "Visa",
            "last_four": "1234"
        }
    }
}'::jsonb),
('{
    "customer": {
        "name": "Bob",
        "email": "[email protected]"
    },
    "items": [
        {"name": "Camera", "price": 850, "quantity": 1},
        {"name": "Tripod", "price": 120, "quantity": 1}
    ],
    "payment": {
        "method": "Paypal",
        "details": {
            "email": "[email protected]"
        }
    }
}'::jsonb);        

Querying Items in an Array

To query elements within an array, you use the jsonb_array_elements function to expand the array into a set of JSONB objects.

SELECT
    order_info->'customer'->>'name' AS customer_name,
    SUM((item->>'price')::int * (item->>'quantity')::int) AS total_spent
FROM
    orders,
    LATERAL jsonb_array_elements(order_info->'items') AS item
GROUP BY
    customer_name;        

This query calculates the total money spent by each customer, accounting for the quantity of each item purchased.

  • JSONB Navigation: order_info->'customer'->>'name' navigates through the JSONB document to extract the customer's name as text.
  • LATERAL and jsonb_array_elements(): The LATERAL keyword allows the jsonb_array_elements() function to refer to columns of the preceding FROM item (orders in this case). This function expands the JSONB array order_info->'items' into a set of JSONB values, each representing an item.
  • SUM() and Multiplication: For each item, it multiplies the 'price' by 'quantity', converting both to integers before multiplying, then sums these products to get the total spent per customer.
  • GROUP BY: Groups results by customer name to aggregate spending per customer.
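
The same aggregation can be executed from EF Core as raw SQL. A minimal sketch, assuming EF Core 8 or later (whose SqlQuery<T> can project into unmapped types) and an existing DbContext instance named dbContext:

public class CustomerSpend
{
    public string CustomerName { get; set; }
    public long TotalSpent { get; set; } // SUM over int columns yields bigint in PostgreSQL
}

var totals = await dbContext.Database
    .SqlQuery<CustomerSpend>($@"
        SELECT
            order_info->'customer'->>'name' AS ""CustomerName"",
            SUM((item->>'price')::int * (item->>'quantity')::int) AS ""TotalSpent""
        FROM orders,
        LATERAL jsonb_array_elements(order_info->'items') AS item
        GROUP BY ""CustomerName""")
    .ToListAsync();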

Adding New Items to an Array

Adds a new item, a Keyboard, to the 'items' array in the JSONB document for the order with id = 1.

UPDATE orders
SET order_info = jsonb_set(
    order_info,
    '{items}',
    order_info->'items' || '{"name": "Keyboard", "price": 100, "quantity": 1}'::jsonb
)
WHERE id = 1;        

  • jsonb_set(): This function is used to replace the 'items' array with a new array that includes the added item.
  • Array Concatenation (||): Concatenates the existing items array with a new JSONB object representing the keyboard. This is how the new item is added to the array.
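
From EF Core, the same statement can be issued with ExecuteSqlInterpolatedAsync, which passes interpolated values as parameters rather than concatenating them into the SQL. A minimal sketch; the dbContext and orderId variables are assumptions:

var newItem = @"{""name"": ""Keyboard"", ""price"": 100, ""quantity"": 1}";

await dbContext.Database.ExecuteSqlInterpolatedAsync($@"
    UPDATE orders
    SET order_info = jsonb_set(
        order_info,
        '{{items}}',
        order_info->'items' || {newItem}::jsonb)
    WHERE id = {orderId}");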

Updating Items in an Array

Updates the quantity of the item 'Mouse' in all orders where it exists.

UPDATE orders
SET order_info = jsonb_set(
    order_info,
    array['items', (sub.index - 1)::text, 'quantity'],
    '3'::jsonb
)
FROM (
    SELECT id, index
    FROM orders,
    LATERAL jsonb_array_elements(order_info->'items') WITH ORDINALITY arr(item, index)
    WHERE item->>'name' = 'Mouse'
) sub
WHERE orders.id = sub.id;

  • Subquery and WITH ORDINALITY: The subquery locates the position of the 'Mouse' item in each order's items array. WITH ORDINALITY numbers the expanded elements starting at 1, while JSONB arrays are indexed from 0, which is why the path uses (sub.index - 1).
  • jsonb_set() with a dynamic path: jsonb_set() expects a text[] path, so the computed index is cast with ::text. The new value '3'::jsonb keeps quantity numeric, matching the seeded data. When the document is mapped in EF Core, the same change is often simpler at the entity level, as sketched below.
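
If the orders are mapped in EF Core with the items as a JSON-owned collection (for example via ToJson()), the more idiomatic route is to load the affected entities, mutate the collection, and let change tracking rewrite the document. A minimal sketch under that assumption, using hypothetical Order and OrderItem types:

var orders = await dbContext.Orders
    .Where(o => o.Items.Any(i => i.Name == "Mouse")) // provider translation of JSON collection predicates varies; filter in memory if unsupported
    .ToListAsync();

foreach (var item in orders.SelectMany(o => o.Items).Where(i => i.Name == "Mouse"))
{
    item.Quantity = 3; // the change tracker serializes the updated JSON on save
}

await dbContext.SaveChangesAsync();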

Joins and Group By with JSONB Fields

Consider a scenario where an application tracks user session data within an audit_entries table and maintains basic user information in a users table. Each entry in audit_entries contains details about the session stored in a JSONB column named new_values. This JSONB column, among other things, records the user ID associated with each session. The users table stores user IDs, emails, and names, with the user ID stored as a GUID (UUID), a common format for uniquely identifying records.

CREATE TABLE users (
    id UUID PRIMARY KEY,
    email TEXT,
    first_name TEXT,
    last_name TEXT
);

CREATE TABLE audit_entries (
    id serial PRIMARY KEY,
    entity_name TEXT,
    new_values JSONB
);        

Extracts session-related user information by joining audit_entries with the users table, grouping by user details to count sessions per user.

SELECT
    ae.new_values->>'UserId' as user_id,
    u.email,
    u.first_name,
    u.last_name,
    COUNT(*)
FROM
    audit_entries as ae
JOIN
    users as u ON u.id::text = ae.new_values->>'UserId'
WHERE
    ae.entity_name = 'sessions'
    AND ae.new_values @> '{"OperationType": "Buy"}'
GROUP BY
    ae.new_values->>'UserId', u.email, u.first_name, u.last_name;        

  • Column Extraction from JSONB: ae.new_values->>'UserId' gets the UserId as a text string from the JSONB column.
  • Join Condition: Matches rows from audit_entries with rows from users by casting the UUID u.id to text, which aligns with the UserId extracted from JSONB.
  • Filtering with JSONB Contains Operator (@>): The query filters entries related to 'sessions' and includes only those where the JSONB data has an 'OperationType' of "Buy".
  • Aggregation: Uses GROUP BY to group results by UserId and user details, with COUNT(*) calculating the number of sessions per user.
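
For the containment filter, the Npgsql EF Core provider exposes EF.Functions.JsonContains, which translates to the @> operator. A hedged sketch of the filtering step, assuming an AuditEntry entity whose NewValues property is mapped to the jsonb column (the join and grouping remain most natural in SQL):

var buySessions = await dbContext.AuditEntries
    .Where(ae => ae.EntityName == "sessions"
              && EF.Functions.JsonContains(ae.NewValues, @"{""OperationType"": ""Buy""}"))
    .ToListAsync();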

Performance Optimization

PostgreSQL provides several indexing options for JSONB data, with the Generalized Inverted Index (GIN) being the most commonly used. GIN indexes are ideal for JSONB because they efficiently handle queries that involve key existence checks and value lookups within JSONB documents.

GIN Index Basics

A GIN index is essentially a collection of key-value pairs where the key is derived from the JSONB data structure, and the value is the location or "pointer" to the row containing the JSONB value. This type of index is particularly useful for operations like @>, which checks whether one JSONB document contains another.

JSONB Path Operations

For deeply nested JSONB structures, regular GIN indexing may still require significant processing time because it needs to flatten the JSONB data to create the index. To optimize for specific JSON paths and improve query performance on nested structures, PostgreSQL allows the creation of GIN indexes using jsonb_path_ops. This operator class focuses specifically on the paths within a JSONB document, rather than all keys and values, making it more efficient for queries that involve nested data access.

Benefits of Using jsonb_path_ops:

  • Performance: jsonb_path_ops creates a smaller and more efficient index for queries that target specific paths within a JSONB document.
  • Storage Efficiency: Indexes created with jsonb_path_ops are typically smaller than those created with the default GIN operator class, because each entry hashes an entire path-to-value chain instead of storing every key and value separately.
  • Query Optimization: These indexes are particularly effective for queries involving nested arrays and objects where access paths are well-defined. The trade-off is that jsonb_path_ops only supports containment-style operators such as @>, not key-existence operators like ?.

Example of Indexing for Complex JSONB Queries

Suppose we have an orders table with a JSONB column order_info that contains detailed nested information about items, including categories.

CREATE INDEX idx_gin_order_items ON orders USING gin ((order_info->'items') jsonb_path_ops);        

This index creation statement focuses on the items key within the order_info JSONB document. By using jsonb_path_ops, the index will be optimized for queries that specify paths leading to the items array.

Now, let's use this index in a query that filters orders based on specific item categories:

SELECT * FROM orders
WHERE order_info->'items' @> '[{"category": "electronics"}]';

  • Because the WHERE clause filters on the same expression the index was built on (order_info->'items'), PostgreSQL can use idx_gin_order_items to quickly find rows whose items array contains an object with the specified category. A containment check against the whole order_info column would not match this expression index.
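
EF Core has no first-class API for expression indexes with a custom operator class, so in an EF Core project this index is typically created in a migration with raw DDL. A minimal sketch:

protected override void Up(MigrationBuilder migrationBuilder)
{
    migrationBuilder.Sql(
        @"CREATE INDEX IF NOT EXISTS idx_gin_order_items
          ON orders USING gin ((order_info->'items') jsonb_path_ops);");
}

protected override void Down(MigrationBuilder migrationBuilder)
{
    migrationBuilder.Sql("DROP INDEX IF EXISTS idx_gin_order_items;");
}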

Migration Strategies

Let's dive directly into how to handle schema changes when working with EF Core, particularly when your database combines traditional relational columns with JSON data types in PostgreSQL.

The challenge of mixing relational and NoSQL data schemas includes:

  • Schema Management: JSON columns lack strict schema enforcement, which complicates data consistency.
  • Migration Complexity: Schema changes involving JSON require not only updates but also complex data transformations.
  • Query Performance: JSON data queries can be slower and more complicated than purely relational queries.
  • Tooling and Support: Tools for managing JSON within relational databases are often less developed.

Scenario

We have an application managing Product entities, with Specifications stored as JSON. The required schema changes are:

  1. Modify Specifications.Dimensions: Transition from a string to a List<string> to accommodate multiple dimensions.
  2. Rename Review.Context to Review.Text: Update the field name to better describe its content.
  3. Add Review.ModifiedAt: Introduce a timestamp field to track when reviews are modified.

Step 1: Update Entity Classes

First, let's redefine our entity models to reflect the new schema requirements:

public class Specifications
{
    public string Material { get; set; }
    public string Color { get; set; }
    public List<string> Dimensions { get; set; } = new List<string>();
}

public class Review
{
    public string User { get; set; }
    public string Text { get; set; }  // Updated from 'Context'
    public int Rating { get; set; }
    public DateTime ModifiedAt { get; set; } = DateTime.UtcNow;
}        
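
The Product entity itself is not shown here; a minimal shape consistent with the mappings in the next step might look like this (the Name and Translations members are assumptions):

public class Product
{
    public int Id { get; set; }
    public string Name { get; set; }
    public Specifications Specifications { get; set; }
    public List<Review> Reviews { get; set; } = new List<Review>();
    public string Translations { get; set; } // mapped to a jsonb column in Step 2
}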

Step 2: Update DbContext

Adjust the OnModelCreating method to ensure the JSON structures and field names are updated correctly:

protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    modelBuilder.Entity<Product>()
        .OwnsOne(p => p.Specifications, a => {
            a.ToJson();
        })
        .OwnsMany(p => p.Reviews, a => {
            a.ToJson();
            // Inside a ToJson() mapping, the JSON property name is configured
            // with HasJsonPropertyName (HasColumnName applies to relational columns).
            a.Property(r => r.Text).HasJsonPropertyName("Text");
        });

    modelBuilder.Entity<Product>()
        .Property(p => p.Translations)
        .HasColumnType("jsonb")
        .IsRequired();
}        

Step 3: Create Migration Script

Generate and modify the migration script to address the changes in the JSON data:

dotnet ef migrations add UpdateSpecificationsAndReviews        

Update the migration script:

protected override void Up(MigrationBuilder migrationBuilder)
{
    // Reviews live inside a JSON column on Products (see Step 2), so the
    // Context -> Text rename and the new ModifiedAt field are applied inside
    // each review object rather than as relational column changes. The
    // "Reviews" column name assumes the default ToJson() mapping.
    migrationBuilder.Sql(
        @"
        UPDATE ""Products""
        SET ""Reviews"" = (
            SELECT jsonb_agg(
                (review - 'Context')
                || jsonb_build_object('Text', review->'Context')
                || jsonb_build_object('ModifiedAt', CURRENT_TIMESTAMP))
            FROM jsonb_array_elements(""Reviews"") AS review)
        WHERE ""Reviews"" IS NOT NULL;
        ");

    // Wrap the old scalar Dimensions value in a one-element array;
    // jsonb_build_array keeps the value correctly quoted as JSON.
    migrationBuilder.Sql(
        @"
        UPDATE ""Products""
        SET ""Specifications"" = jsonb_set(
            ""Specifications"",
            '{Dimensions}',
            jsonb_build_array(""Specifications""->'Dimensions'))
        WHERE jsonb_typeof(""Specifications""->'Dimensions') = 'string';
        ");
}

protected override void Down(MigrationBuilder migrationBuilder)
{
    // Strip ModifiedAt and restore the Context key.
    migrationBuilder.Sql(
        @"
        UPDATE ""Products""
        SET ""Reviews"" = (
            SELECT jsonb_agg(
                (review - 'ModifiedAt' - 'Text')
                || jsonb_build_object('Context', review->'Text'))
            FROM jsonb_array_elements(""Reviews"") AS review)
        WHERE ""Reviews"" IS NOT NULL;
        ");

    // Reversing the Dimensions change would depend on the original string format.
}

Step 4: Apply and Test Migration

Apply the migration and conduct thorough testing:

dotnet ef database update        

Testing should ensure data integrity, check for performance issues, and verify overall application functionality.

Schema Validation

Following the migration steps outlined previously, ensuring the integrity and validity of JSON data becomes a critical concern. To tackle this, we integrate JSON schema validation directly into the Entity Framework Core DbContext and establish rigorous testing protocols. This approach not only strengthens our application's reliability but also automates error handling and validation processes.

Validating JSON Schema

To ensure that all JSON data meets specific schema requirements, we can combine EF Core with NJsonSchema, a third-party library that facilitates JSON Schema validation.

First, add NJsonSchema to your project from the command line:

dotnet add package NJsonSchema        

Define the JSON schema that describes the structure of your JSON data. Suppose you have a Specifications class with Material, Color, and Dimensions fields:

{
   "$schema": "https://json-schema.org/draft-07/schema#",
   "type": "object",
   "properties": {
     "Material": {"type": "string"},
     "Color": {"type": "string"},
     "Dimensions": {
       "type": "array",
       "items": {"type": "string"}
     }
   },
   "required": ["Material", "Color", "Dimensions"]
}        
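
Rather than hand-writing the schema, NJsonSchema can also generate a roughly equivalent one from the class itself, which keeps the schema and the model from drifting apart:

using NJsonSchema;

var schema = JsonSchema.FromType<Specifications>();
Console.WriteLine(schema.ToJson()); // emits a JSON Schema generated from the class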

Now, set up your application to validate JSON strings against the defined schema. This is done by creating a utility class that uses the schema to check if JSON strings are valid:

using NJsonSchema;
using Microsoft.Extensions.Logging;
using System;
using System.Threading.Tasks;

public class JsonSchemaValidator
{
    private readonly ILogger<JsonSchemaValidator> _logger;
    private readonly string schemaJson = @"{
      ""$schema"": ""https://json-schema.org/draft-07/schema#"",
      ""type"": ""object"",
      ""properties"": {
        ""Material"": {""type"": ""string""},
        ""Color"": {""type"": ""string""},
        ""Dimensions"": {""type"": ""array"", ""items"": {""type"": ""string""}}
      },
      ""required"": [""Material"", ""Color"", ""Dimensions""]
    }";

    public JsonSchemaValidator(ILogger<JsonSchemaValidator> logger)
    {
        _logger = logger;
    }

    public async Task<bool> IsValidJsonAsync(string jsonString)
    {
        var schema = await JsonSchema.FromJsonAsync(schemaJson);
        var errors = schema.Validate(jsonString);
        if (errors.Count > 0)
        {
            foreach (var error in errors)
            {
                _logger.LogError($"Validation error: {error}");
            }
            return false;
        }
        return true;
    }
}        

You can now integrate this validator into your application's workflow, particularly before database transactions that involve JSON data are committed:

var validator = new JsonSchemaValidator(NullLogger<JsonSchemaValidator>.Instance); // the constructor requires a logger
var jsonToValidate = @"{""Material"":""Wood"",""Color"":""Red"",""Dimensions"":[""100x200"",""200x300""]}";
bool isValid = await validator.IsValidJsonAsync(jsonToValidate);
Console.WriteLine($"Is JSON valid? {isValid}");

Integrating Validation into DbContext

To automate the validation process, extend your DbContext to include schema checks before saving changes to the database:

using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.Logging;
using System;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;

public class ProductContext : DbContext
{
    private readonly JsonSchemaValidator _validator;

    public DbSet<Product> Products { get; set; }

    public ProductContext(DbContextOptions<ProductContext> options, ILogger<JsonSchemaValidator> logger)
        : base(options)
    {
        _validator = new JsonSchemaValidator(logger);
    }

    public override int SaveChanges()
    {
        ValidateJson();
        return base.SaveChanges();
    }

    public override async Task<int> SaveChangesAsync(CancellationToken cancellationToken = default)
    {
        ValidateJson();
        return await base.SaveChangesAsync(cancellationToken);
    }

    private void ValidateJson()
    {
        foreach (var entry in ChangeTracker.Entries<Product>())
        {
            if (entry.State == EntityState.Added || entry.State == EntityState.Modified)
            {
                var jsonToValidate = JsonSerializer.Serialize(entry.Entity.Specifications);
                // Blocking on the async validator keeps the synchronous
                // SaveChanges path simple; SaveChangesAsync takes the same route.
                var isValid = _validator.IsValidJsonAsync(jsonToValidate).GetAwaiter().GetResult();
                if (!isValid)
                {
                    throw new InvalidOperationException("JSON validation error");
                }
            }
        }
    }
}        

Writing Validation and Integration Tests

To further ensure our application behaves as expected, especially after schema changes, we implement both unit and integration tests focusing on JSON data validation:

using Xunit;
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.Logging.Abstractions;
using System.Collections.Generic;
using System.Threading.Tasks;

public class DbContextTests
{
    [Fact]
    public async Task JsonDataConformsToSchemaAfterModification()
    {
        var options = new DbContextOptionsBuilder<ProductContext>()
            .UseInMemoryDatabase(databaseName: "TestDb").Options;

        var logger = NullLogger<JsonSchemaValidator>.Instance;
        using (var context = new ProductContext(options, logger))
        {
            var product = new Product
            {
                Specifications = new Specifications { Material = "Plastic", Color = "Blue", Dimensions = new List<string> { "100x200" } }
            };
            context.Products.Add(product);
            await context.SaveChangesAsync();

            // Modify the product in a way that should fail validation
            product.Specifications.Dimensions = null; // Invalid change according to schema
            await Assert.ThrowsAsync<InvalidOperationException>(async () => await context.SaveChangesAsync());
        }
    }
}        

Schema Versioning

Schema versioning is a crucial component of data governance, especially when handling JSON data, which is inherently schema-less in relational databases. Effective schema versioning helps manage changes over time, ensuring compatibility between different versions of applications and their data, and facilitating smooth transitions and updates.

Why Schema Versioning is Important

As your application evolves, so will its data requirements. New features might require changes to the data structure, or you may need to correct previously undiscovered issues in your data format. Without a robust versioning strategy, these changes can lead to data inconsistencies, break application functionality, or cause data loss.

Implementing Schema Versioning

Embedding Version Information: One straightforward method is to include a version number directly in the JSON data. This can be easily achieved by adding a version key in your JSON documents.

{
    "version": "1.0",
    "data": {
        "name": "Alice",
        "email": "[email protected]"
    }
}        

Handling Different Versions: Once versioning is in place, you can write functions or application logic that checks the version of each JSON document before processing it. This allows you to handle different versions appropriately, whether by transforming data to a new format or by triggering specific business logic. In the example below, process_version_1, process_version_2, and raise_error stand in for user-defined functions; they are not PostgreSQL built-ins.

SELECT
CASE
    WHEN json_data->>'version' = '1.0' THEN process_version_1(json_data)
    WHEN json_data->>'version' = '2.0' THEN process_version_2(json_data)
    ELSE raise_error('Unsupported version')
END
FROM orders;        

Migrating Data Between Versions: Create migration scripts that update the JSON structure in your database. These scripts can convert old versions to newer formats, ensuring that all data in the database conforms to the latest schema.

UPDATE orders
SET order_info = jsonb_set(order_info, '{version}', '"2.0"', true)
WHERE order_info->>'version' = '1.0';        

Automating Version Migration in EF Core

When storing or retrieving JSON data, include logic in your application to serialize and deserialize this data along with its version. I'll present two approaches: one using a string to store JSON data directly, and another using a class to encapsulate the JSON data. Both approaches offer robust mechanisms for handling JSON data schema versions within EF Core, with each serving different use cases. The first approach gives more control over the JSON serialization process and is useful when JSON data manipulation is required outside the database context. The second approach integrates seamlessly with EF Core's features, providing a cleaner and more intuitive way to handle complex JSON data structures directly within the entity models.

Approach 1: Using String for JSON Details

This approach involves manipulating JSON data stored directly as a string in your entity model. Here, we serialize and deserialize JSON data as strings for version handling.
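
The types used below are not defined in the article; assumed minimal shapes:

public class Customer
{
    public int Id { get; set; }
    public string JsonDetails { get; set; } // raw JSON string column
}

public class CustomerDetails
{
    public string Version { get; set; }
    public string Name { get; set; }
    public string Email { get; set; }
}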

Updating Customer Details to a New Schema Version:

public void UpdateCustomerDetailsToNewVersion()
{
    foreach (var customer in _dbContext.Customers)
    {
        var details = JsonSerializer.Deserialize<CustomerDetails>(customer.JsonDetails);
        if (details.Version == "1.0")
        {
            UpdateToVersion2(details);
            customer.JsonDetails = JsonSerializer.Serialize(details);
        }
    }
    _dbContext.SaveChanges();
}

private void UpdateToVersion2(CustomerDetails details)
{
    details.Version = "2.0"; // Update to the new version
    // Implement modifications for the new schema version, e.g., adding new fields
}        

Saving Customer Details with Version Information:

public void PersistCustomerDetails(Customer customer, CustomerDetails details)
{
    details.Version = "2.0"; // Explicitly setting the schema version
    customer.JsonDetails = JsonSerializer.Serialize(details);
    _dbContext.SaveChanges();
}        

Retrieving and Handling Customer Details Based on Version:

public CustomerDetails RetrieveCustomerDetails(string jsonDetails)
{
    var details = JsonSerializer.Deserialize<CustomerDetails>(jsonDetails);
    switch (details.Version)
    {
        case "1.0":
            HandleDetailsForVersion1(details);
            break;
        case "2.0":
            HandleDetailsForVersion2(details);
            break;
        default:
            throw new InvalidOperationException("Unsupported JSON schema version encountered.");
    }
    return details;
}

private void HandleDetailsForVersion1(CustomerDetails details)
{
    // Specific logic for handling version 1.0 details
}

private void HandleDetailsForVersion2(CustomerDetails details)
{
    // Specific logic for handling version 2.0 details
}        

Approach 2: Using Class for JSON Details

This approach involves using a class to manage JSON data, leveraging EF Core's capabilities to handle complex types more transparently.

Entity Model with JSON Details as a Complex Type:

public class Customer
{
    public int Id { get; set; }
    public JsonDetails Details { get; set; }
}

public class JsonDetails
{
    public string Version { get; set; }
    public string Name { get; set; }
    public string Email { get; set; }
}        

Serializing and Deserializing with EF Core's Value Conversion:

protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    modelBuilder.Entity<Customer>()
        .Property(e => e.Details)
        .HasConversion(
            v => JsonSerializer.Serialize(v, (JsonSerializerOptions?)null),
            v => JsonSerializer.Deserialize<JsonDetails>(v, (JsonSerializerOptions?)null));
}        
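
One caveat with this mapping: JsonDetails is a mutable reference type, and with a value conversion EF Core relies on snapshot comparison to detect changes, so in-place mutations (like the version bump in the next snippet) can go unnoticed without a ValueComparer. A hedged sketch that adds one, along with an explicit jsonb column type:

using Microsoft.EntityFrameworkCore.ChangeTracking;
using System.Text.Json;

modelBuilder.Entity<Customer>()
    .Property(e => e.Details)
    .HasColumnType("jsonb")
    .HasConversion(
        v => JsonSerializer.Serialize(v, (JsonSerializerOptions?)null),
        v => JsonSerializer.Deserialize<JsonDetails>(v, (JsonSerializerOptions?)null),
        new ValueComparer<JsonDetails>(
            (l, r) => JsonSerializer.Serialize(l, (JsonSerializerOptions?)null)
                   == JsonSerializer.Serialize(r, (JsonSerializerOptions?)null),
            v => JsonSerializer.Serialize(v, (JsonSerializerOptions?)null).GetHashCode(),
            v => JsonSerializer.Deserialize<JsonDetails>(
                     JsonSerializer.Serialize(v, (JsonSerializerOptions?)null),
                     (JsonSerializerOptions?)null)));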

Migration Logic Using Complex Types:

public void MigrateCustomerDetails()
{
    foreach (var customer in _dbContext.Customers)
    {
        if (customer.Details.Version == "1.0")
        {
            UpdateCustomerDetailsToVersion2(customer.Details);
        }
    }
    _dbContext.SaveChanges();
}

private void UpdateCustomerDetailsToVersion2(JsonDetails details)
{
    details.Version = "2.0"; // Update version
    // Modify additional fields as needed
}        

Summary

We explored advanced techniques for querying and updating JSONB data in PostgreSQL, covered strategic indexing to improve performance, and walked through practical strategies for migrating, validating, and versioning JSON schemas with Entity Framework Core. See you in future articles, where we will dig into further JSONB techniques.
