Spring Data JPA: The Save Method and Its Scope of Applicability

Allan Crowley

Software Engineer at Identiq

Published: August 17, 2024

Consider a simple Post entity. It has an id, which is generated by the database, and a title:

@Entity
@Table(name = "post")
@Setter
@Getter
public class Post {
    @Id
    @GeneratedValue(strategy = IDENTITY)
    private Long id;
    private String title;
}

We also have a PostService where we inject PostRepository, a standard Spring Data JPA repository. Within this service, there’s a changeTitle method that updates the Post title. This method is transactional, meaning the entire method operates within a single transaction. In it, we pass the id and a new title, then call the save method:

@Service
@RequiredArgsConstructor // Lombok: generates the constructor Spring uses to inject the repository
public class PostService {
    private final PostRepository postRepository;

    @Transactional
    public void changeTitle(Long postId, String title) {
        final var post = postRepository.findById(postId).orElseThrow();
        post.setTitle(title);
        postRepository.save(post);
    }
}

I’m sure you’ve encountered such an operation in various projects. Let’s run this code with SQL logging enabled. Hibernate finds the post and updates it, as expected. But what potential issues could arise?

Hibernate: select post0_.id as id1_0_0_, post0_.description as descript2_0_0_, post0_.title as title3_0_0_ from post post0_ where post0_.id=?
Hibernate: update post set description=?, title=? where id=?

Let’s dive into how the save method operates. EntityInformation.isNew determines if the entity is new. If so, it calls persist; otherwise, it calls merge:

@Transactional
@Override
public <S extends T> S save(S entity) {
    Assert.notNull(entity, "Entity must not be null.");
    if (entityInformation.isNew(entity)) {
        em.persist(entity);
        return entity;
    } else {
        return em.merge(entity);
    }
}

Entity Lifecycle in JPA

Let’s review the entity lifecycle in JPA:

When you create an entity through a constructor, it is in the Transient state, meaning Hibernate does not track it. The Managed state implies that all changes to an entity within a transaction are monitored by Hibernate and will be translated into appropriate SQL queries (update, insert, delete, etc.). An entity enters the Managed state if you persist it or find it by ID (find, getReference). It’s important to note that all of this happens within a transaction.

Flush runs before the commit and generates SQL statements based on which entities you touched and which fields you changed. There is also a Removed state: calling remove moves a managed entity into it, and a delete is generated when the transaction is flushed. Calling persist on a removed entity returns it to the Managed state.

Now, let’s discuss the Detached state. If your entity was in the Managed state and the transaction ends, the entity becomes Detached: it has an ID and was once tracked, but it no longer belongs to a persistence context. A simple example: you load a Post in one transaction, commit, and pass the object to another method. The original transaction no longer exists there, so the entity is Detached. An entity can also be detached explicitly by calling detach, though that is rarely needed. To go the other way, you call merge, for instance when you accept a Post as a method parameter and want the current transaction to track its state. Note that merge returns the managed instance; the object you passed in stays detached.
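
The transitions described above can be sketched as a small state machine. This is a toy model in plain Java, not Hibernate code: ToyState, ToyOperation and ToyLifecycle are names invented for this illustration, and the exception type is simplified (real providers throw different exceptions, sometimes only at flush time).

```java
import java.util.List;

// Toy model of the JPA entity states; these are not Hibernate's own types.
enum ToyState { TRANSIENT, MANAGED, DETACHED, REMOVED }

enum ToyOperation { PERSIST, MERGE, REMOVE, DETACH }

final class ToyLifecycle {
    private ToyLifecycle() {}

    /** Returns the state after the operation (for MERGE: the state of the returned instance). */
    static ToyState apply(ToyState state, ToyOperation op) {
        switch (op) {
            case PERSIST:
                if (state == ToyState.DETACHED) {
                    throw new IllegalStateException("persist is not allowed on a detached entity");
                }
                return ToyState.MANAGED; // transient, managed and removed all end up managed
            case MERGE:
                if (state == ToyState.REMOVED) {
                    throw new IllegalStateException("merge is not allowed on a removed entity");
                }
                return ToyState.MANAGED; // merge always hands back a managed instance
            case REMOVE:
                if (state == ToyState.DETACHED) {
                    throw new IllegalStateException("remove is not allowed on a detached entity");
                }
                // remove is ignored for transient entities
                return state == ToyState.TRANSIENT ? ToyState.TRANSIENT : ToyState.REMOVED;
            case DETACH:
                // detach is ignored for transient entities
                return state == ToyState.TRANSIENT ? ToyState.TRANSIENT : ToyState.DETACHED;
            default:
                throw new AssertionError(op);
        }
    }

    public static void main(String[] args) {
        ToyState state = ToyState.TRANSIENT;
        for (ToyOperation op : List.of(ToyOperation.PERSIST, ToyOperation.REMOVE,
                ToyOperation.PERSIST, ToyOperation.DETACH, ToyOperation.MERGE)) {
            state = apply(state, op);
            System.out.println(op + " -> " + state);
        }
    }
}
```

The point to take away is that merge is the only way back from Detached, and that persist is what undoes a pending remove.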

Key Takeaways from the Entity Lifecycle in JPA

Dirty Checking: At flush time Hibernate compares each managed entity with the state it was loaded with and generates the necessary queries, without requiring a call to save.
Merge Operation: For an entity that is already MANAGED, merge doesn’t change the final result. Yet merge is exactly what save invokes for any entity it doesn’t consider new.

Given this, with @Transactional, we can omit save in our service without altering the outcome, as the update will still be generated:

final var post = postRepository.findById(postId).orElseThrow();
post.setTitle(title);
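
To make the mechanism concrete, here is a minimal sketch of dirty checking in plain Java. It is a deliberately simplified model, not Hibernate’s implementation: ToyPost and ToyPersistenceContext are invented names, and the "SQL" is just a string. It keeps a snapshot of each loaded entity and compares against it on flush.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Toy entity and persistence context; illustrative names, not Hibernate classes.
final class ToyPost {
    final long id;
    String title;

    ToyPost(long id, String title) {
        this.id = id;
        this.title = title;
    }
}

final class ToyPersistenceContext {
    private final Map<Long, ToyPost> managed = new HashMap<>();
    private final Map<Long, String> loadedTitles = new HashMap<>(); // snapshot taken at load time

    /** Simulates findById: the returned entity becomes managed and its state is snapshotted. */
    ToyPost load(long id, String titleInDatabase) {
        ToyPost post = new ToyPost(id, titleInDatabase);
        managed.put(id, post);
        loadedTitles.put(id, titleInDatabase);
        return post;
    }

    /** Simulates flush: compares each managed entity with its snapshot and emits updates. */
    List<String> flush() {
        List<String> statements = new ArrayList<>();
        for (ToyPost post : managed.values()) {
            if (!Objects.equals(post.title, loadedTitles.get(post.id))) {
                statements.add("update post set title='" + post.title + "' where id=" + post.id);
                loadedTitles.put(post.id, post.title); // the snapshot now matches the database
            }
        }
        return statements;
    }

    public static void main(String[] args) {
        ToyPersistenceContext context = new ToyPersistenceContext();
        ToyPost post = context.load(1L, "old");
        post.title = "new";                  // no save call anywhere
        System.out.println(context.flush()); // the change is still detected
    }
}
```

Nothing in this sketch resembles save: the update is produced purely because the entity is managed and its state differs from the snapshot.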

The copyValues Method

Now, let’s delve into Hibernate’s source code:

protected void entityIsPersistent(MergeEvent event, Map copyCache) {
    LOG.trace( "Ignoring persistent instance" );

    //TODO: check that entry.getIdentifier().equals(requestedId)
    final Object entity = event.getEntity();
    final EventSource source = event.getSession();
    final EntityPersister persister = source.getEntityPersister( event.getEntityName(), entity);
    ( (MergeContext) copyCache ).put( entity, entity, true ); //before cascade!
    cascadeOnMerge( source, persister, entity, copyCache );
    copyValues( persister, entity, entity, source, copyCache );
    event.setResult( entity );
}

Notice the copyValues method. Hibernate takes the base attributes of an entity, copies them into its memory, and then inserts them into the same entity. If you have many attributes, this can be time-consuming. It seems like an odd operation that just wastes cycles without adding value.

When does merge become a problem for a MANAGED entity? If you update many records within a transaction, copyValues will be called for each entity, potentially extending the total transaction time. This is especially noticeable if the entity has many large attributes.

I wondered why copyValues is even necessary. After some research, I concluded that it might be a workaround that eventually became a feature. The idea is this: if you load a collection with one-to-many relationships, you might add child objects to it. If these entities are in the Detached state, calling copyValues replaces them with the same entity in the Managed state. This is how Hibernate handles entity replacements. However, it’s unclear why all attributes need to be copied—apparently, the developers found it acceptable.

A Practical Example with UUID

Let’s look at another example, still using Post, but with the id in the form of a UUID. We’ll generate the UUID client-side, meaning it won’t be generated by the database:

@Entity
@Table(name = "post")
@Getter
@Setter
public class Post {
    @Id
    private UUID id;
    private String title;

    public static Post newPost() {
        final var post = new Post();
        post.setId(UUID.randomUUID());
        return post;
    }
}

We want to create a new post, not update an existing one. The new post is in the Transient state, so Hibernate doesn’t track it. Therefore, you need to call save. What happens in the logs?

Hibernate: select post0_.id as id1_1_0_,
post0_.description as descript2_1_0_,
post0_.title as title3_1_0_
from post post0_ where post0_.id=?
Hibernate: insert into post (description, title, id) values (?, ?, ?)

We see a Select by the same ID that we used for the insert. That is, Hibernate performs a select and then an insert. If we set a breakpoint, we’ll see that merge was called, not persist.

How isNew Works

It’s crucial to understand how EntityInformation.isNew functions. First, Spring Data JPA checks whether your entity has a @Version attribute. If there is none, or if its type is primitive, the check is delegated to the superclass implementation. If the version attribute is present and has a wrapper type, the entity is considered new exactly when the version is null, that is, when it has not yet been saved to or loaded from the database.

Next, Spring Data JPA checks if the entity ID is a primitive:

public boolean isNew(T entity) {

    ID id = getId(entity);
    Class<ID> idType = getIdType();

    if (!idType.isPrimitive()) {
        return id == null;
    }

    if (id instanceof Number) {
        return ((Number) id).longValue() == 0L;
    }

    throw new IllegalArgumentException(
        String.format("Unsupported primitive id type %s", idType)
    );
}

If the ID type is not primitive, the entity is considered new when the ID is null. If it is a primitive number, the entity is new when the value is zero. Otherwise, an exception is thrown, as the remaining primitive types are char and boolean, neither of which makes sense as a primary key.

Let’s summarize the isNew algorithm in Spring Data JPA:

New if the @Version attribute is present and null.
New if the ID is null.
New if the ID is primitive and equals 0.
Otherwise, it’s not new.
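
The four rules can be condensed into one function. The following is a sketch in plain Java under my own simplifications, not the Spring Data source: IsNewSketch is a hypothetical name, and the entity is reduced to the type and value of its version and ID attributes.

```java
import java.util.UUID;

// Hypothetical condensation of the decision order; not Spring Data's actual class.
final class IsNewSketch {
    private IsNewSketch() {}

    /**
     * @param versionType type of the @Version attribute, or null if the entity has none
     * @param version     current value of the version attribute (ignored if versionType is null)
     * @param idType      declared type of the ID attribute
     * @param id          current ID value
     */
    static boolean isNew(Class<?> versionType, Object version, Class<?> idType, Object id) {
        // 1. A non-primitive version attribute decides on its own: null means new.
        if (versionType != null && !versionType.isPrimitive()) {
            return version == null;
        }
        // 2. Otherwise fall back to the ID: object IDs are new when null.
        if (!idType.isPrimitive()) {
            return id == null;
        }
        // 3. Primitive numeric IDs are new when they equal zero.
        if (id instanceof Number) {
            return ((Number) id).longValue() == 0L;
        }
        throw new IllegalArgumentException("Unsupported primitive id type " + idType);
    }

    public static void main(String[] args) {
        UUID assignedId = UUID.randomUUID();
        // Client-side UUID without a version attribute: not new, so save calls merge.
        System.out.println(isNew(null, null, UUID.class, assignedId));
        // Same ID, but with a Long version that is still null: new, so save calls persist.
        System.out.println(isNew(Long.class, null, UUID.class, assignedId));
    }
}
```

The second call in main hints at a consequence of rule one: an entity with a pre-assigned ID is still treated as new if it carries a wrapper-typed version attribute that has not been set yet.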

So, reaching the fourth point, we call the merge method. Here’s its source code:

EntityState es = null; // declared earlier in Hibernate's method; repeated here so the excerpt reads on its own
final PersistenceContext persistenceContext = source.getPersistenceContextInternal();
EntityEntry entry = persistenceContext.getEntry( ent );
if ( entry == null ) {
    EntityPersister persister = source.getEntityPersister( event.getEntityName(), ent );
    Serializable id = persister.getIdentifier( ent, source );
    if ( id != null ) {
        final EntityKey key = source.generateEntityKey( id, persister );
        final Object managedEntity = persistenceContext.getEntity( key );
        entry = persistenceContext.getEntry( managedEntity );
        if ( entry != null ) {
            es = EntityState.DETACHED;
        }
    }
}

if ( es == null ) {
    es = EntityState.getEntityState(ent, event.getEntityName(), entry, source, false);
}

Understanding Hibernate’s Operation

Hibernate attempts to determine the state of the entity you’re operating on, whether it needs to be updated, deleted, inserted, or something else. It first tries to find the entity by reference in the PersistenceContext. If no operations were performed, the context is empty, and nothing will be found. If the reference is not found, Hibernate looks it up by ID. If you created a new post, nothing will be found there either. The last resort is to call the static method getEntityState.

final Serializable clonedIdentifier =
  (Serializable) persister.getIdentifierType().deepCopy( id, source.getFactory());
final Object result = source.get( entityName, clonedIdentifier );
source.getLoadQueryInfluencers().setInternalFetchProfile( previousFetchProfile );

if ( result == null ) {
    entityIsTransient( event, copyCache );
}

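The snippet above is part of the extensive code that runs next. With an assigned ID and nothing else to go on, Hibernate cannot tell a new entity from a detached one, so it assumes Detached and tries to load the row. The line with source.get is where the additional select occurs. If no row comes back, the entity is treated as Transient after all and gets inserted.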

The Problem with Extra Selects

When might this extra select become problematic? For instance, in telemetry systems where writes far outnumber reads. There, metrics must be recorded quickly and frequently, and the select + insert combination doubles the number of queries per new row.
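
The doubling is easy to see in a toy model. The sketch below is plain Java with an in-memory map standing in for the table; ToyEntityManager is an invented name and its two methods only imitate the statement pattern of persist and of merge with an assigned ID (a real merge would also skip the update when nothing changed).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Toy stand-in for a table plus a statement log; not a real EntityManager.
final class ToyEntityManager {
    private final Map<UUID, String> table = new HashMap<>();
    final List<String> statements = new ArrayList<>();

    /** persist: the caller asserts the row is new, so only an insert is issued. */
    void persist(UUID id, String title) {
        statements.add("insert");
        table.put(id, title);
    }

    /** merge with an assigned id: look the row up first, then insert or update. */
    void merge(UUID id, String title) {
        statements.add("select");
        statements.add(table.containsKey(id) ? "update" : "insert");
        table.put(id, title);
    }

    public static void main(String[] args) {
        ToyEntityManager viaMerge = new ToyEntityManager();
        ToyEntityManager viaPersist = new ToyEntityManager();
        for (int i = 0; i < 1000; i++) {
            viaMerge.merge(UUID.randomUUID(), "metric " + i);
            viaPersist.persist(UUID.randomUUID(), "metric " + i);
        }
        System.out.println("merge:   " + viaMerge.statements.size() + " statements");
        System.out.println("persist: " + viaPersist.statements.size() + " statements");
    }
}
```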

One solution to this problem is to use a UUID Generator:

@Entity
@Table(name = "post_uuid")
@Getter
@Setter
public class PostWithUUID {
    @Id
    @GeneratedValue(generator = "UUID")
    @GenericGenerator(
        name = "UUID",
        strategy = "org.hibernate.id.UUIDGenerator"
    )
    private UUID id;

    private String title;

    public static PostWithUUID newPost() {
        return new PostWithUUID();
    }
}

Spring Data sees that id = null, considers the entity new, and calls persist. It works, but with caveats:

You lose the benefit of a predefined ID for equals/hashCode.
You cannot use @EmbeddedId.

I’ll explain the second point further:

@Entity
@Table(name = "post")
@Getter
@Setter
public class Post {
    @EmbeddedId
    private PostID id;
    private String title;

    public static Post newPost() {
        final var post = new Post();
        post.setId(new PostID(UUID.randomUUID()));
        return post;
    }

    @Embeddable
    @EqualsAndHashCode
    @Getter @AllArgsConstructor @NoArgsConstructor
    public static class PostID implements Serializable {

        @Column(name = "id")
        private UUID value;
    }
}

We have Post, declared the PostID class, and added the @EmbeddedId annotation. This allows us to declare the repository like this:

public interface PostRepo extends JpaRepository<Post, PostID> {
}

In HQL queries, you can operate with PostID, which has semantic meaning, rather than with an abstract id. If your application involves frequent operations with IDs, this approach can help reduce potential coding errors.

However, you can’t use a generator here—Hibernate won’t accept it. This can be circumvented using the Persistable interface. While not part of JPA, it is included in Spring Data. Essentially, it provides a custom implementation of the isNew method:

@Entity
@Table(name = "post")
@Getter
@Setter
public class Post implements Persistable<Post.PostID> {
    @Transient
    private transient boolean isNew;
    @EmbeddedId
    private PostID id;
    private String title;

    public static Post newPost() {
        final var post = new Post ();
        post.setId(new PostID(UUID.randomUUID()));
        post.setNew(true);
        return post;
    }

    @Override
    public boolean isNew() {
        return isNew;
    }

    @Embeddable
    @Getter @EqualsAndHashCode @AllArgsConstructor @NoArgsConstructor
    public static class PostID implements Serializable {
        @Column(name = "id")
        private UUID value;
    }

    @PostLoad
    @PrePersist
    void trackNotNew() {
        this.isNew = false;
    }
}

When an entity implements the Persistable interface, the standard logic no longer applies. Instead, Spring Data JPA simply asks the entity whether it’s new. If it is, persist is called; otherwise, merge. This puts the responsibility on you to determine whether the entity is new.

The isNew parameter is stored in @Transient because it’s not a column. We set the isNew field to true when creating a post, and it’s crucial to reset it to false afterward to avoid unintended effects.
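
Why the reset matters can be shown without JPA at all. In this plain-Java sketch, FlaggedPost and ToyRepository are hypothetical stand-ins: the first mimics an entity that answers isNew itself, the second mimics the dispatch inside save and calls the reset method where JPA would invoke the @PrePersist callback.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an entity that implements Persistable.
final class FlaggedPost {
    private boolean isNew;
    String title;

    static FlaggedPost newPost(String title) {
        FlaggedPost post = new FlaggedPost();
        post.title = title;
        post.isNew = true; // only the factory marks an instance as new
        return post;
    }

    boolean isNew() {
        return isNew;
    }

    /** Plays the role of the @PrePersist/@PostLoad callback. */
    void trackNotNew() {
        isNew = false;
    }
}

// Hypothetical stand-in for the isNew-based dispatch inside save.
final class ToyRepository {
    final List<String> calls = new ArrayList<>();

    void save(FlaggedPost post) {
        if (post.isNew()) {
            post.trackNotNew(); // JPA would trigger the @PrePersist callback at this point
            calls.add("persist");
        } else {
            calls.add("merge");
        }
    }

    public static void main(String[] args) {
        ToyRepository repository = new ToyRepository();
        FlaggedPost post = FlaggedPost.newPost("hello");
        repository.save(post);
        repository.save(post);
        System.out.println(repository.calls);
    }
}
```

If trackNotNew were never called, the second save would choose persist again for a row that already exists, which in a real application would fail instead of updating it.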

All Spring Data modules, including JPA, build on CrudRepository, which has a save method. In most of them this makes sense. JPA is somewhat of an exception, because managed entities are flushed automatically, but save still has a place there. Its logic is essentially that of an upsert: insert or update.

Imagine receiving a certain business object as a method parameter without knowing whether it already exists in the database. If it doesn’t exist, you need to insert it; if it does, you need to update it. Essentially, the code would need to do something like this:

if (select == null)
  insert
else
  update

The save method encapsulates this logic.

Hibernate Repository to Avoid Unnecessary Selects

What if the save method is causing issues? Let’s say you want to avoid unnecessary selects and copyValues to save time. One solution is to use a Hibernate Repository. This approach is implemented in the well-known library, Hibernate Types.

public interface HibernateRepository<T> {
    <S extends T> S persist(S entity);
    <S extends T> S merge(S entity);
}

We declare the interface. For simplicity, we’ll focus on two of the many available methods—persist and merge. Next, we implement the interface by delegating the calls to the EntityManager:

@Repository
class HibernateRepositoryImpl<T> implements HibernateRepository<T> {
    @PersistenceContext
    private EntityManager em;

    @Override
    public <S extends T> S persist(S entity) {
        em.persist(entity);
        return entity;
    }

    @Override
    public <S extends T> S merge(S entity) {
        return em.merge(entity);
    }
}

Then, we extend our PostRepository using the Hibernate repository:

public interface PostRepository extends JpaRepository<Post, Long>, HibernateRepository<Post> {
}

Now we can explicitly call persist and merge. Everything seems fine, but let’s consider a more realistic example. Suppose you need to track changes in the titles of posts and archive them in a separate table before committing. Analysts can then review how the post title has changed. In the case of a rollback, you need to notify the support team—if, for example, the user was unable to change the post title. After the commit, you also need to send a message to Kafka.

Domain events are excellent for handling these tasks—a domain-driven design pattern that Spring natively supports.

@Entity
@Table(name = "post")
@Setter
@Getter
public class Post extends AbstractAggregateRoot<Post> {
    @Id
    @GeneratedValue(strategy = IDENTITY)
    private Long id;
    private String title;

    public void changeTitle(String title) {
        this.title = title;
        registerEvent(new PostNameChanged(id));
    }
}

Our entity inherits from the AbstractAggregateRoot class, which has a registerEvent method.

public class AbstractAggregateRoot<A extends AbstractAggregateRoot<A>> {
    private transient final @Transient List<Object> domainEvents = new ArrayList<>();
    protected <T> T registerEvent(T event) {

        Assert.notNull(event, "Domain event must not be null");
        this.domainEvents.add(event);
        return event;
    }

    @AfterDomainEventPublication
    protected void clearDomainEvents() {
        this.domainEvents.clear();
    }

    @DomainEvents
    protected Collection<Object> domainEvents() {
        return Collections.unmodifiableList(domainEvents);
    }
}

AbstractAggregateRoot stores event objects in a list. Spring Data calls the method annotated with @DomainEvents to collect the events we’ve registered, and afterwards calls the method annotated with @AfterDomainEventPublication, which clears the list.

We can intercept events using TransactionalEventListener, a regular EventListener with the added capability of intercepting events at specific transaction stages—before and after the commit:

@TransactionalEventListener(phase = TransactionPhase.BEFORE_COMMIT)
public void archiveChanges(PostNameChanged postNameChanged) {
    // code to archive changes
}

@TransactionalEventListener(phase = TransactionPhase.AFTER_COMMIT)
public void sendMessageToKafka(PostNameChanged postNameChanged) {
    // code to send message to Kafka
}
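
The difference between the two phases can be imitated in a few lines of plain Java. This is only a mental model, not Spring’s transaction machinery: ToyTransaction and its methods are invented for the illustration, and listeners are ordinary Runnables.

```java
import java.util.ArrayList;
import java.util.List;

// Toy transaction with two listener phases; names are illustrative, not Spring's API.
final class ToyTransaction {
    private final List<Runnable> beforeCommit = new ArrayList<>();
    private final List<Runnable> afterCommit = new ArrayList<>();
    final List<String> log = new ArrayList<>();

    void onBeforeCommit(Runnable listener) {
        beforeCommit.add(listener);
    }

    void onAfterCommit(Runnable listener) {
        afterCommit.add(listener);
    }

    /** Returns true if the transaction committed, false if a before-commit listener failed. */
    boolean complete() {
        try {
            beforeCommit.forEach(Runnable::run); // still inside the transaction
        } catch (RuntimeException e) {
            log.add("rollback");                 // a failure here undoes everything
            return false;
        }
        log.add("commit");
        afterCommit.forEach(Runnable::run);      // runs only after a successful commit
        return true;
    }

    public static void main(String[] args) {
        ToyTransaction tx = new ToyTransaction();
        tx.onBeforeCommit(() -> tx.log.add("archive"));
        tx.onAfterCommit(() -> tx.log.add("kafka"));
        tx.complete();
        System.out.println(tx.log);
    }
}
```

In this model the archive step can veto the commit, while the Kafka step only ever sees committed changes, which matches the roles the two listeners play in the example above.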

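When intercepting before a commit, the transaction is still active. If an exception is thrown here, the entire transaction will be rolled back. However, Spring Data publishes the registered events only when a repository method such as save is called. If you rely on dirty checking alone, or call persist or merge directly, the events are never published and the listeners never fire: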

@Transactional
public void changeTitle(Long postId, String title) {
    final var post = postRepository.findById(postId).orElseThrow();
    post.changeTitle(title);
}

On the other hand, if you call a Spring Data repository method such as save, the events will be published and intercepted:

@Transactional
public void changeTitle(Long postId, String title) {
    final var post = postRepository.findById(postId).orElseThrow();
    post.changeTitle(title);
    postRepository.save(post);
}
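
The two service methods differ only in the save call, so it helps to model what that call adds. The sketch below is plain Java and only imitates the behavior: ToyAggregate stands in for an entity extending AbstractAggregateRoot, ToyEventRepository for the repository, and the event is a plain string.

```java
import java.util.ArrayList;
import java.util.List;

// Toy aggregate; it only imitates what AbstractAggregateRoot does.
final class ToyAggregate {
    private final List<Object> domainEvents = new ArrayList<>();
    String title;

    void changeTitle(String title) {
        this.title = title;
        domainEvents.add("PostNameChanged"); // registered, but not yet published
    }

    List<Object> domainEvents() {
        return List.copyOf(domainEvents);
    }

    void clearDomainEvents() {
        domainEvents.clear();
    }
}

// Toy repository; publishing happens as a side effect of save and nowhere else.
final class ToyEventRepository {
    final List<Object> published = new ArrayList<>();

    void save(ToyAggregate aggregate) {
        published.addAll(aggregate.domainEvents());
        aggregate.clearDomainEvents();
    }

    public static void main(String[] args) {
        ToyAggregate post = new ToyAggregate();
        ToyEventRepository repository = new ToyEventRepository();
        post.changeTitle("new title");
        System.out.println("before save: " + repository.published);
        repository.save(post);
        System.out.println("after save:  " + repository.published);
    }
}
```

Until save is called, the event just sits in the aggregate’s list. That is the gap the explicit persist and merge methods leave open.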

How to Integrate Hibernate with Domain Events

I know of four methods, but I’ll focus on the two most useful ones. The first involves passing ApplicationEventPublisher as a parameter:

@Entity
@Table(name = "post")
@Setter
@Getter
public class Post {
    @Id
    @GeneratedValue(strategy = IDENTITY)
    private Long id;
    private String title;

    public void changeTitle(String title, ApplicationEventPublisher eventPublisher) {
        this.title = title;
        eventPublisher.publishEvent(new PostNameChanged(id));
    }
}

The advantage of this approach is that you can write unit tests for the entity, creating a rich domain model following the principles of domain-driven design. The downside is that the service accessing the entity must always inject ApplicationEventPublisher.

The second method involves using a static DomainEventPublisher:

@Entity
@Table(name = "post")
@Setter
@Getter
public class Post {
    @Id
    @GeneratedValue(strategy = IDENTITY)
    private Long id;
    private String title;

    public void changeTitle(String title) {
        this.title = title;
        DomainEventPublisher.publish(new PostNameChanged(id));
    }
}

Here, we declare a standard Spring bean. In the setEventPublisher method, we write the ApplicationEventPublisher to a static variable, making it available through the static publish method:

@Component
public class DomainEventPublisher {
    private static volatile ApplicationEventPublisher publisher;

    @Autowired
    private void setEventPublisher(ApplicationEventPublisher eventPublisher) {
        publisher = eventPublisher;
    }

    public static void publish(Object event) {
        Assert.notNull(publisher, "ApplicationEventPublisher is null. Check the configuration");
        publisher.publishEvent(event);
    }
}

The advantage is that there’s no need to explicitly inject ApplicationEventPublisher because it’s encapsulated in one place. The downside is that this approach violates the Inversion of Control principle: you’re using Spring but then immediately bypassing it. Additionally, you won’t be able to test the entity separately from Spring, as the entity directly accesses DomainEventPublisher, which is tied to Spring.

Evaluating Both Options

The first method doesn’t suit our needs because injecting a bean into each service that works with an entity negates the benefits. The second method, while violating the Inversion of Control principle, has its justifications:

In the Spring + Hibernate combination, it’s unlikely that Spring will disappear.
ApplicationEventPublisher is part of the infrastructure where polymorphism isn’t required. Spring provides the implementation, and you simply use it.

Is It Worth Implementing All This?

This is the cornerstone question of the entire post. It’s worth it if you update many entities, or entities with many attributes, in a single transaction, where calling save could backfire due to copyValues. It’s also worth it if you’re working on a telemetry system with numerous inserts per second, where every save must translate into an insert without any additional queries.

Conclusion

The save method in Spring Data JPA can cause performance issues.
Abandoning it in favor of explicit persist/merge provides better control but adds complexity in the form of a low-level API.
There are several ways to use domain events when avoiding save; choose wisely.
Don’t blindly rely on abstractions—study the details “under the hood.”
In 99% of real projects, you won’t need this, as save covers all necessary cases.

Spring Data is a great framework, and save works well within it. But there are points to consider. If you notice extra selects in the logs or a transaction taking longer than expected, you’ll know where to dig, whether it’s worth the effort, or whether you can simply rely on abstractions.
