JSONB in PostgreSQL with EF Core - Part 2
Serhii Kokhan
Microsoft MVP | CTO & .NET/Azure Architect | Stripe Certified Professional Developer | Offering MVP Development, Legacy Migration, & Product Engineering | Expert in scalable solutions, high-load systems, and API integrations
Introduction
Welcome back to the second part of our series on using JSONB in PostgreSQL with EF Core. In our previous article, we explored the basic functionalities and benefits of JSONB in PostgreSQL. If you missed it, you can catch up here.
In this article, we delve into more advanced aspects of JSONB with EF Core, specifically focusing on enhanced queries and operations, performance optimization, migration strategies, schema validation, and schema versioning.
Enhanced Queries and Operations
We'll first ensure that our orders table is well-defined, with a rich structure of nested JSONB data to simulate a complex real-world scenario:
DROP TABLE IF EXISTS orders;
CREATE TABLE orders (
id serial primary key,
order_info JSONB
);
INSERT INTO orders (order_info)
VALUES
('{
"customer": {
"name": "Alice",
"email": "[email protected]"
},
"items": [
{"name": "Laptop", "price": 1200, "quantity": 1},
{"name": "Mouse", "price": 40, "quantity": 2}
],
"payment": {
"method": "Credit Card",
"details": {
"card_type": "Visa",
"last_four": "1234"
}
}
}'::jsonb),
('{
"customer": {
"name": "Bob",
"email": "[email protected]"
},
"items": [
{"name": "Camera", "price": 850, "quantity": 1},
{"name": "Tripod", "price": 120, "quantity": 1}
],
"payment": {
"method": "Paypal",
"details": {
"email": "[email protected]"
}
}
}'::jsonb);
Querying Items in an Array
To query elements within an array, you use the jsonb_array_elements function to expand the array into a set of JSONB objects.
SELECT
order_info->'customer'->>'name' AS customer_name,
SUM((item->>'price')::int * (item->>'quantity')::int) AS total_spent
FROM
orders,
LATERAL jsonb_array_elements(order_info->'items') AS item
GROUP BY
customer_name;
This query calculates the total money spent by each customer, accounting for the quantity of each item purchased.
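From EF Core, aggregations over jsonb_array_elements do not translate from LINQ, so the practical route is raw SQL. Here is a minimal sketch using EF Core 8's Database.SqlQuery<T> for unmapped result types; the CustomerSpend type and the _dbContext name are assumptions for illustration:
// Hypothetical projection type for the raw query results.
public class CustomerSpend
{
    public string CustomerName { get; set; }
    public decimal TotalSpent { get; set; }
}

// EF Core 8+: map rows of a raw JSONB aggregation onto an unmapped type.
var totals = await _dbContext.Database
    .SqlQuery<CustomerSpend>($@"
        SELECT
            order_info->'customer'->>'name' AS ""CustomerName"",
            SUM((item->>'price')::numeric * (item->>'quantity')::int) AS ""TotalSpent""
        FROM orders,
             LATERAL jsonb_array_elements(order_info->'items') AS item
        GROUP BY ""CustomerName""")
    .ToListAsync();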
Adding New Items to an Array
The following statement adds a new item, a keyboard, to the items array in the JSONB document for the order with id = 1.
UPDATE orders
SET order_info = jsonb_set(
order_info,
'{items}',
order_info->'items' || '{"name": "Keyboard", "price": 100, "quantity": 1}'::jsonb
)
WHERE id = 1;
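As a side note, jsonb_insert offers an equivalent append without concatenation; the special -1 index combined with the insert-after flag set to true places the new element after the last item:
-- Equivalent append using jsonb_insert instead of the || operator.
UPDATE orders
SET order_info = jsonb_insert(
    order_info,
    '{items, -1}',
    '{"name": "Keyboard", "price": 100, "quantity": 1}',
    true
)
WHERE id = 1;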
Updating Items in an Array
The following statement updates the quantity of the 'Mouse' item in every order where it appears.
UPDATE orders
SET order_info = jsonb_set(
    order_info,
    -- WITH ORDINALITY is 1-based, while JSONB array paths are 0-based, hence the -1.
    array['items', (sub.index - 1)::text, 'quantity'],
    '3'::jsonb
)
FROM (
    SELECT id, index
    FROM orders,
         jsonb_array_elements(order_info->'items') WITH ORDINALITY arr(item, index)
    WHERE item->>'name' = 'Mouse'
) sub
WHERE orders.id = sub.id;
Joins and Group By with JSONB Fields
Consider a scenario where an application tracks user session data within an audit_entries table and maintains basic user information in a users table. Each entry in audit_entries contains details about the session stored in a JSONB column named new_values. This JSONB column, among other things, records the user ID associated with each session. The users table stores user IDs, emails, and names, with the user ID stored as a GUID (UUID), a common format for uniquely identifying records.
CREATE TABLE users (
id UUID PRIMARY KEY,
email TEXT,
first_name TEXT,
last_name TEXT
);
CREATE TABLE audit_entries (
id serial PRIMARY KEY,
entity_name TEXT,
new_values JSONB
);
The following query extracts session-related user information by joining audit_entries with the users table, grouping by user details to count sessions per user.
SELECT
ae.new_values->>'UserId' as user_id,
u.email,
u.first_name,
u.last_name,
COUNT(*)
FROM
audit_entries as ae
JOIN
users as u ON u.id::text = ae.new_values->>'UserId'
WHERE
ae.entity_name = 'sessions'
AND ae.new_values @> '{"OperationType": "Buy"}'
GROUP BY
ae.new_values->>'UserId', u.email, u.first_name, u.last_name;
Performance Optimization
PostgreSQL provides several indexing options for JSONB data, with the Generalized Inverted Index (GIN) being the most commonly used. GIN indexes are ideal for JSONB because they efficiently handle queries that involve key existence checks and value lookups within JSONB documents.
GIN Index Basics
A GIN index is essentially a collection of key-value pairs where the key is derived from the JSONB data structure, and the value is the location or "pointer" to the row containing the JSONB value. This type of index is particularly useful for operations like @>, which checks whether one JSONB document contains another.
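As a quick illustration on our orders table, a default GIN index over the whole document looks like this, and containment queries can then use it:
-- Default GIN index (jsonb_ops) over the entire JSONB document.
CREATE INDEX idx_gin_order_info ON orders USING gin (order_info);

-- A containment query the index can serve:
SELECT id
FROM orders
WHERE order_info @> '{"payment": {"method": "Paypal"}}';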
JSONB Path Operations
For deeply nested JSONB structures, regular GIN indexing may still require significant processing time because it needs to flatten the JSONB data to create the index. To optimize for specific JSON paths and improve query performance on nested structures, PostgreSQL allows the creation of GIN indexes using jsonb_path_ops. This operator class focuses specifically on the paths within a JSONB document, rather than all keys and values, making it more efficient for queries that involve nested data access.
The main benefits of jsonb_path_ops are a smaller index and faster containment (@>) lookups; the trade-off is that it supports fewer operators than the default jsonb_ops operator class.
Example of Indexing for Complex JSONB Queries
Suppose we have an orders table with a JSONB column order_info that contains detailed nested information about items, including categories.
CREATE INDEX idx_gin_order_items ON orders USING gin ((order_info->'items') jsonb_path_ops);
This index creation statement focuses on the items key within the order_info JSONB document. By using jsonb_path_ops, the index will be optimized for queries that specify paths leading to the items array.
Now, let's use this index in a query that filters orders by item category. Note that the WHERE clause must reference the indexed expression (order_info->'items') for the planner to consider the index:
SELECT * FROM orders
WHERE order_info->'items' @> '[{"category": "electronics"}]';
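To confirm the planner actually picks up idx_gin_order_items, inspect the query plan. On a table as small as our sample the planner may still prefer a sequential scan, so treat this as a sanity check for realistic data volumes:
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE order_info->'items' @> '[{"category": "electronics"}]';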
Migration Strategies
Let's dive directly into how to handle schema changes when working with EF Core, particularly when your database combines traditional relational columns with JSON data types in PostgreSQL.
Mixing relational and NoSQL-style schemas is challenging because the database enforces no structure on JSON columns: the entity classes, the documents already stored, and the migration scripts that reshape them must all be kept in sync by hand.
Scenario
We have an application managing Product entities, with Specifications stored as JSON. The required schema changes are: renaming the Review field Context to Text, adding a ModifiedAt timestamp to reviews, and converting Specifications.Dimensions from a single string into a list of strings.
Step 1: Update Entity Classes
First, let's redefine our entity models to reflect the new schema requirements:
public class Specifications
{
public string Material { get; set; }
public string Color { get; set; }
public List<string> Dimensions { get; set; } = new List<string>();
}
public class Review
{
public string User { get; set; }
public string Text { get; set; } // Updated from 'Context'
public int Rating { get; set; }
public DateTime ModifiedAt { get; set; } = DateTime.UtcNow;
}
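The Product entity itself isn't shown; a plausible shape, inferred from the OnModelCreating configuration below (the Name property and the dictionary type for Translations are assumptions), would be:
public class Product
{
    public int Id { get; set; }
    public string Name { get; set; } // assumed scalar column
    public Specifications Specifications { get; set; }
    public List<Review> Reviews { get; set; } = new List<Review>();
    public Dictionary<string, string> Translations { get; set; } = new Dictionary<string, string>(); // mapped to jsonb below
}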
Step 2: Update DbContext
Adjust the OnModelCreating method to ensure the JSON structures and field names are updated correctly:
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    modelBuilder.Entity<Product>()
        .OwnsOne(p => p.Specifications, a =>
        {
            a.ToJson();
        })
        .OwnsMany(p => p.Reviews, a =>
        {
            a.ToJson();
            // EF Core 8+: pin the JSON key for the renamed CLR property (was 'Context').
            a.Property(r => r.Text).HasJsonPropertyName("Text");
        });
    modelBuilder.Entity<Product>()
        .Property(p => p.Translations)
        .HasColumnType("jsonb")
        .IsRequired();
}
Step 3: Create Migration Script
Generate and modify the migration script to address the changes in the JSON data:
dotnet ef migrations add UpdateSpecificationsAndReviews
Update the migration script:
protected override void Up(MigrationBuilder migrationBuilder)
{
    migrationBuilder.AddColumn<DateTime>(
        name: "ModifiedAt",
        table: "Reviews",
        nullable: false,
        defaultValueSql: "CURRENT_TIMESTAMP");
    // Wrap the old single-string Dimensions value in a JSON array.
    // jsonb_build_array preserves JSON quoting, so "100x200" becomes ["100x200"].
    migrationBuilder.Sql(
        @"
        UPDATE ""Products""
        SET ""Specifications"" = jsonb_set(""Specifications"", '{Dimensions}', jsonb_build_array(""Specifications""->'Dimensions'))
        WHERE jsonb_typeof(""Specifications""->'Dimensions') = 'string';
        "
    );
}
protected override void Down(MigrationBuilder migrationBuilder)
{
migrationBuilder.DropColumn(
name: "ModifiedAt",
table: "Reviews");
// Handling reversal would depend on the original format
}
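If you do need a concrete reversal, one option, assuming every converted Dimensions array holds exactly one value, is to unwrap it again:
-- Hypothetical reversal: unwrap single-element Dimensions arrays back to a plain string.
UPDATE "Products"
SET "Specifications" = jsonb_set(
    "Specifications",
    '{Dimensions}',
    "Specifications"->'Dimensions'->0
)
WHERE jsonb_typeof("Specifications"->'Dimensions') = 'array'
  AND jsonb_array_length("Specifications"->'Dimensions') = 1;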
Step 4: Apply and Test Migration
Apply the migration and conduct thorough testing:
dotnet ef database update
Testing should ensure data integrity, check for performance issues, and verify overall application functionality.
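A quick data-integrity spot check after the migration might verify that no Dimensions value is still a plain string:
-- Should return 0 once the migration has converted every document.
SELECT COUNT(*) AS remaining_string_dimensions
FROM "Products"
WHERE jsonb_typeof("Specifications"->'Dimensions') = 'string';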
Schema Validation
After the migration steps outlined previously, ensuring the integrity and validity of JSON data becomes a critical concern. To tackle this, we integrate JSON schema validation directly into the Entity Framework Core DbContext and establish rigorous testing protocols. This approach not only strengthens our application's reliability but also automates error handling and validation.
Validating JSON Schema
To ensure that all JSON data meets specific schema requirements, we pair EF Core with NJsonSchema, a third-party library that facilitates JSON schema validation.
First, add NJsonSchema to your project by running the following command in a terminal:
dotnet add package NJsonSchema
Define the JSON schema that describes the structure of your JSON data. Suppose you have a Specifications class with Material, Color, and Dimensions fields:
{
"$schema": "https://json-schema.org/draft-07/schema#",
"type": "object",
"properties": {
"Material": {"type": "string"},
"Color": {"type": "string"},
"Dimensions": {
"type": "array",
"items": {"type": "string"}
}
},
"required": ["Material", "Color", "Dimensions"]
}
Now, set up your application to validate JSON strings against the defined schema. This is done by creating a utility class that uses the schema to check if JSON strings are valid:
using NJsonSchema;
using Microsoft.Extensions.Logging;
using System;
using System.Threading.Tasks;
public class JsonSchemaValidator
{
private readonly ILogger<JsonSchemaValidator> _logger;
private readonly string schemaJson = @"{
'$schema': 'https://json-schema.org/draft-07/schema#',
'type': 'object',
'properties': {
'Material': {'type': 'string'},
'Color': {'type': 'string'},
'Dimensions': {'type': 'array', 'items': {'type': 'string'}}
},
'required': ['Material', 'Color', 'Dimensions']
}";
public JsonSchemaValidator(ILogger<JsonSchemaValidator> logger)
{
_logger = logger;
}
public async Task<bool> IsValidJsonAsync(string jsonString)
{
var schema = await JsonSchema.FromJsonAsync(schemaJson);
var errors = schema.Validate(jsonString);
if (errors.Count > 0)
{
foreach (var error in errors)
{
_logger.LogError($"Validation error: {error}");
}
return false;
}
return true;
}
}
You can now integrate this validator into your application's workflow, particularly before database transactions that involve JSON data are committed:
var validator = new JsonSchemaValidator(NullLogger<JsonSchemaValidator>.Instance); // requires Microsoft.Extensions.Logging.Abstractions; in an app, take ILogger<JsonSchemaValidator> from DI
var jsonToValidate = @"{""Material"":""Wood"",""Color"":""Red"",""Dimensions"":[""100x200"",""200x300""]}";
bool isValid = await validator.IsValidJsonAsync(jsonToValidate);
Console.WriteLine($"Is JSON valid? {isValid}");
Integrating Validation into DbContext
To automate the validation process, extend your DbContext to include schema checks before saving changes to the database:
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.Logging;
using System;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;
public class ProductContext : DbContext
{
private readonly JsonSchemaValidator _validator;
public DbSet<Product> Products { get; set; }
public ProductContext(DbContextOptions<ProductContext> options, ILogger<JsonSchemaValidator> logger)
: base(options)
{
_validator = new JsonSchemaValidator(logger);
}
public override int SaveChanges()
{
ValidateJson();
return base.SaveChanges();
}
public override async Task<int> SaveChangesAsync(CancellationToken cancellationToken = default)
{
ValidateJson();
return await base.SaveChangesAsync(cancellationToken);
}
private void ValidateJson()
{
foreach (var entry in ChangeTracker.Entries<Product>())
{
if (entry.State == EntityState.Added || entry.State == EntityState.Modified)
{
var jsonToValidate = JsonSerializer.Serialize(entry.Entity.Specifications);
var isValid = _validator.IsValidJsonAsync(jsonToValidate).GetAwaiter().GetResult();
if (!isValid)
{
throw new InvalidOperationException("JSON validation error");
}
}
}
}
}
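For completeness, here is a minimal sketch of wiring this context up in a typical ASP.NET Core host; the connection string name is an assumption, and AddDbContext resolves the ILogger<JsonSchemaValidator> constructor parameter from the logging services:
// Program.cs (sketch): register the context; DI supplies the logger automatically.
builder.Services.AddDbContext<ProductContext>(options =>
    options.UseNpgsql(builder.Configuration.GetConnectionString("ProductsDb")));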
Writing Validation and Integration Tests
To further ensure our application behaves as expected, especially after schema changes, we implement both unit and integration tests focusing on JSON data validation:
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.EntityFrameworkCore;
using Microsoft.Extensions.Logging.Abstractions;
using Xunit;
public class DbContextTests
{
[Fact]
public async Task JsonDataConformsToSchemaAfterModification()
{
var options = new DbContextOptionsBuilder<ProductContext>()
.UseInMemoryDatabase(databaseName: "TestDb").Options;
var logger = NullLogger<JsonSchemaValidator>.Instance;
using (var context = new ProductContext(options, logger))
{
var product = new Product
{
Specifications = new Specifications { Material = "Plastic", Color = "Blue", Dimensions = new List<string> { "100x200" } }
};
context.Products.Add(product);
await context.SaveChangesAsync();
// Modify the product in a way that should fail validation
product.Specifications.Dimensions = null; // Invalid change according to schema
await Assert.ThrowsAsync<InvalidOperationException>(async () => await context.SaveChangesAsync());
}
}
}
Schema Versioning
Schema versioning is a crucial component of data governance, especially when handling JSON data, which is inherently schema-less in relational databases. Effective schema versioning helps manage changes over time, ensuring compatibility between different versions of applications and their data, and facilitating smooth transitions and updates.
Why Schema Versioning is Important
As your application evolves, so will its data requirements. New features might require changes to the data structure, or you may need to correct previously undiscovered issues in your data format. Without a robust versioning strategy, these changes can lead to data inconsistencies, break application functionality, or cause data loss.
Implementing Schema Versioning
Embedding Version Information: One straightforward method is to include a version number directly in the JSON data. This can be easily achieved by adding a version key in your JSON documents.
{
"version": "1.0",
"data": {
"name": "Alice",
"email": "[email protected]"
}
}
Handling Different Versions: Once versioning is in place, you can write functions or application logic that checks the version of each JSON document before processing it. This allows you to handle different versions appropriately, whether by transforming data to a new format or by triggering specific business logic.
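-- Illustrative pseudocode: process_version_1, process_version_2 and raise_error are placeholder functions, not PostgreSQL built-ins.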
SELECT
CASE
WHEN json_data->>'version' = '1.0' THEN process_version_1(json_data)
WHEN json_data->>'version' = '2.0' THEN process_version_2(json_data)
ELSE raise_error('Unsupported version')
END
FROM orders;
Migrating Data Between Versions: Create migration scripts that update the JSON structure in your database. These scripts can convert old versions to newer formats, ensuring that all data in the database conforms to the latest schema.
UPDATE orders
SET order_info = jsonb_set(order_info, '{version}', '"2.0"', true)
WHERE order_info->>'version' = '1.0';
Automating Version Migration in EF Core
When storing or retrieving JSON data, include logic in your application to serialize and deserialize this data along with its version. I'll present two approaches: one using a string to store JSON data directly, and another using a class to encapsulate the JSON data. Both approaches offer robust mechanisms for handling JSON data schema versions within EF Core, with each serving different use cases. The first approach gives more control over the JSON serialization process and is useful when JSON data manipulation is required outside the database context. The second approach integrates seamlessly with EF Core's features, providing a cleaner and more intuitive way to handle complex JSON data structures directly within the entity models.
Approach 1: Using String for JSON Details
This approach involves manipulating JSON data stored directly as a string in your entity model. Here, we serialize and deserialize JSON data as strings for version handling.
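The snippets below assume a Customer entity with a string JsonDetails column and a versioned CustomerDetails POCO; roughly:
// Assumed shapes for Approach 1 (not shown in the original snippets).
public class Customer
{
    public int Id { get; set; }
    public string JsonDetails { get; set; } // raw JSON document stored as text/jsonb
}

public class CustomerDetails
{
    public string Version { get; set; }
    public string Name { get; set; }
    public string Email { get; set; }
}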
Updating Customer Details to a New Schema Version:
public void UpdateCustomerDetailsToNewVersion()
{
foreach (var customer in _dbContext.Customers)
{
var details = JsonSerializer.Deserialize<CustomerDetails>(customer.JsonDetails);
if (details.Version == "1.0")
{
UpdateToVersion2(details);
customer.JsonDetails = JsonSerializer.Serialize(details);
}
}
_dbContext.SaveChanges();
}
private void UpdateToVersion2(CustomerDetails details)
{
details.Version = "2.0"; // Update to the new version
// Implement modifications for the new schema version, e.g., adding new fields
}
Saving Customer Details with Version Information:
public void PersistCustomerDetails(Customer customer, CustomerDetails details)
{
details.Version = "2.0"; // Explicitly setting the schema version
customer.JsonDetails = JsonSerializer.Serialize(details);
_dbContext.SaveChanges();
}
Retrieving and Handling Customer Details Based on Version:
public CustomerDetails RetrieveCustomerDetails(string jsonDetails)
{
var details = JsonSerializer.Deserialize<CustomerDetails>(jsonDetails);
switch (details.Version)
{
case "1.0":
HandleDetailsForVersion1(details);
break;
case "2.0":
HandleDetailsForVersion2(details);
break;
default:
throw new InvalidOperationException("Unsupported JSON schema version encountered.");
}
return details;
}
private void HandleDetailsForVersion1(CustomerDetails details)
{
// Specific logic for handling version 1.0 details
}
private void HandleDetailsForVersion2(CustomerDetails details)
{
// Specific logic for handling version 2.0 details
}
Approach 2: Using Class for JSON Details
This approach involves using a class to manage JSON data, leveraging EF Core's capabilities to handle complex types more transparently.
Entity Model with JSON Details as a Complex Type:
public class Customer
{
public int Id { get; set; }
public JsonDetails Details { get; set; }
}
public class JsonDetails
{
public string Version { get; set; }
public string Name { get; set; }
public string Email { get; set; }
}
Serializing and Deserializing with EF Core's Value Conversion:
protected override void OnModelCreating(ModelBuilder modelBuilder)
{
    modelBuilder.Entity<Customer>()
        .Property(e => e.Details)
        .HasColumnType("jsonb") // store the converted value as jsonb in PostgreSQL
        .HasConversion(
            v => JsonSerializer.Serialize(v, (JsonSerializerOptions)null),
            v => JsonSerializer.Deserialize<JsonDetails>(v, (JsonSerializerOptions)null));
}
Migration Logic Using Complex Types:
public void MigrateCustomerDetails()
{
foreach (var customer in _dbContext.Customers)
{
if (customer.Details.Version == "1.0")
{
UpdateCustomerDetailsToVersion2(customer.Details);
}
}
_dbContext.SaveChanges();
}
private void UpdateCustomerDetailsToVersion2(JsonDetails details)
{
details.Version = "2.0"; // Update version
// Modify additional fields as needed
}
Summary
We explored advanced techniques for querying and updating JSONB in PostgreSQL, methods to improve performance through strategic indexing, and practical strategies for migrating, validating, and versioning JSON schemas with Entity Framework Core. See you in future articles, where we will dig into further techniques and advancements.