Spring Data JPA: The Save Method and Its Scope of Applicability
Consider a simple Post entity. It has an id, which is generated by the database, and a title:
@Entity
@Table(name = "post")
@Setter
@Getter
public class Post {
@Id
@GeneratedValue(strategy = IDENTITY)
private Long id;
private String title;
}
We also have a PostService where we inject PostRepository, a standard Spring Data JPA repository. Within this service, there’s a changeTitle method that updates the Post title. This method is transactional, meaning the entire method operates within a single transaction. In it, we pass the id and a new title, then call the save method:
@Service
@RequiredArgsConstructor // Lombok generates the constructor that injects postRepository
public class PostService {
private final PostRepository postRepository;
@Transactional
public void changeTitle(Long postId, String title) {
final var post = postRepository.findById(postId).orElseThrow();
post.setTitle(title);
postRepository.save(post);
}
}
I’m sure you’ve encountered such an operation in various projects. Let’s run this code with SQL logging enabled. Hibernate selects the post and then updates it, as expected. But what potential issues could arise?
Hibernate: select post0_.id as id1_0_0_, post0_.description as descript2_0_0_, post0_.title as title3_0_0_ from post post0_ where post0_.id=?
Hibernate: update post set description=?, title=? where id=?
Let’s dive into how the save method operates. EntityInformation.isNew determines if the entity is new. If so, it calls persist; otherwise, it calls merge:
@Transactional
@Override
public <S extends T> S save(S entity) {
Assert.notNull(entity, "Entity must not be null.");
if (entityInformation.isNew(entity)) {
em.persist(entity);
return entity;
} else {
return em.merge(entity);
}
}
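To make the branching concrete, here is a minimal plain-Java sketch of the same decision, with no JPA on the classpath. ToyPost, ToyEntityManager, and ToyRepository are hypothetical stand-ins that only record which operation was chosen; the real repository delegates to a real EntityManager, and its isNew check is more involved than the null test used here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an entity with a database-generated id.
class ToyPost {
    Long id;          // null until "the database" assigns it
    String title;
}

// Hypothetical stand-in for EntityManager: it only records the calls it receives.
class ToyEntityManager {
    final List<String> calls = new ArrayList<>();
    private long nextId = 1;

    void persist(ToyPost post) {
        post.id = nextId++;            // mimic IDENTITY generation
        calls.add("persist");
    }

    ToyPost merge(ToyPost post) {
        calls.add("merge");
        return post;
    }
}

// Mirrors the shape of save(): new entities are persisted, everything else is merged.
class ToyRepository {
    final ToyEntityManager em = new ToyEntityManager();

    ToyPost save(ToyPost post) {
        if (post.id == null) {         // simplified isNew check for a wrapper-type id
            em.persist(post);
            return post;
        }
        return em.merge(post);
    }
}
```

Saving the same ToyPost twice records persist for the first call and merge for the second, because the id is no longer null after the first save.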
Entity Lifecycle in JPA
Let’s review the entity lifecycle in JPA:
When you create an entity through a constructor, it is in the Transient state, meaning Hibernate does not track it. The Managed state means that Hibernate monitors all changes to the entity within a transaction and translates them into the appropriate SQL statements (insert, update, delete). An entity enters the Managed state when you persist it or load it by ID (find, getReference). Note that all of this happens within a transaction.
The flush method is called before a commit and generates statements based on which entities you’ve touched and which fields you’ve changed. There is also a Removed state: an entity in this state will be deleted when the transaction is flushed. An entity enters the Removed state via remove and can be returned to Managed via persist.
Now, let’s discuss the Detached state. If your entity was in the Managed state and the transaction ends, the entity becomes Detached: it has an ID and was once tracked, but it is no longer part of a persistence context. A simple example: you commit a transaction that loaded a Post and then pass the Post to another method. The transaction no longer exists there, so the entity is Detached. An entity also becomes Detached when the detach method is called, though that is rarely used. To go the other way, you call merge, for instance when you accept a Post as a method parameter and want it to be tracked by the current transaction.
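The transitions described above can be written down as a small state machine. This is only a sketch: LifecycleState, LifecycleOp, and EntityLifecycle are names I made up, only the transitions mentioned in the text are modeled, and everything else simply throws. One simplification to keep in mind: real merge returns a managed copy, while the instance you passed in stays detached.

```java
import java.util.EnumMap;
import java.util.Map;

// The four entity states discussed in the text.
enum LifecycleState { TRANSIENT, MANAGED, DETACHED, REMOVED }

// The operations that move an entity between states.
enum LifecycleOp { PERSIST, REMOVE, DETACH, MERGE }

final class EntityLifecycle {
    private static final Map<LifecycleState, Map<LifecycleOp, LifecycleState>> TRANSITIONS =
            new EnumMap<>(LifecycleState.class);

    private static void allow(LifecycleState from, LifecycleOp op, LifecycleState to) {
        TRANSITIONS.computeIfAbsent(from, s -> new EnumMap<>(LifecycleOp.class)).put(op, to);
    }

    static {
        allow(LifecycleState.TRANSIENT, LifecycleOp.PERSIST, LifecycleState.MANAGED);
        allow(LifecycleState.MANAGED, LifecycleOp.REMOVE, LifecycleState.REMOVED);
        allow(LifecycleState.REMOVED, LifecycleOp.PERSIST, LifecycleState.MANAGED);
        // detach() or the end of the transaction
        allow(LifecycleState.MANAGED, LifecycleOp.DETACH, LifecycleState.DETACHED);
        // models the state of the instance returned by merge
        allow(LifecycleState.DETACHED, LifecycleOp.MERGE, LifecycleState.MANAGED);
    }

    // Returns the next state, or throws for transitions this sketch does not model.
    static LifecycleState apply(LifecycleState from, LifecycleOp op) {
        LifecycleState to = TRANSITIONS.getOrDefault(from, Map.of()).get(op);
        if (to == null) {
            throw new IllegalStateException(op + " is not modeled for " + from);
        }
        return to;
    }
}
```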
Key Takeaways from the Entity Lifecycle in JPA
Given this, with @Transactional, we can omit save in our service without altering the outcome, as the update will still be generated:
final var post = postRepository.findById(postId).orElseThrow();
post.setTitle(title);
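Why does the update appear without save? Because at flush time the persistence context compares each managed entity with the state it had when it was loaded. Here is a deliberately tiny plain-Java sketch of that idea. ToyPersistenceContext and NotePost are made-up names, the "database" value is passed in by hand, and real Hibernate tracks far more than a single field.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

class NotePost {
    final long id;
    String title;

    NotePost(long id, String title) {
        this.id = id;
        this.title = title;
    }
}

// Hypothetical, heavily simplified persistence context: one entity type, one tracked field.
class ToyPersistenceContext {
    private final Map<Long, NotePost> managed = new HashMap<>();
    private final Map<Long, String> loadedTitles = new HashMap<>(); // snapshot taken at load time

    // Loading an entity makes it managed and records a snapshot of its state.
    NotePost find(long id, String titleInDatabase) {
        NotePost post = new NotePost(id, titleInDatabase);
        managed.put(id, post);
        loadedTitles.put(id, titleInDatabase);
        return post;
    }

    // At flush, every managed entity is compared with its snapshot.
    List<String> flush() {
        List<String> statements = new ArrayList<>();
        for (NotePost post : managed.values()) {
            if (!Objects.equals(post.title, loadedTitles.get(post.id))) {
                statements.add("update post set title='" + post.title + "' where id=" + post.id);
                loadedTitles.put(post.id, post.title); // the snapshot now matches the entity
            }
        }
        return statements;
    }
}
```

Changing the title of a loaded NotePost and then calling flush yields one update statement, with no save call anywhere; a second flush yields nothing because the snapshot has caught up.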
The copyValues Method
Now, let’s delve into Hibernate’s source code:
protected void entityIsPersistent(MergeEvent event, Map copyCache) {
LOG.trace( "Ignoring persistent instance" );
//TODO: check that entry.getIdentifier().equals(requestedId)
final Object entity = event.getEntity();
final EventSource source = event.getSession();
final EntityPersister persister = source.getEntityPersister( event.getEntityName(), entity );
( (MergeContext) copyCache ).put( entity, entity, true ); //before cascade!
cascadeOnMerge( source, persister, entity, copyCache );
copyValues( persister, entity, entity, source, copyCache );
event.setResult( entity );
}
Notice the copyValues method. Hibernate reads the attribute values of the entity, copies them, and then writes them back into the same entity. If you have many attributes, this can be time-consuming. For an entity that is already managed, the operation just burns cycles without adding value.
When does merge become a problem for a MANAGED entity? If you update many records within a transaction, copyValues will be called for each entity, potentially extending the total transaction time. This is especially noticeable if the entity has many large attributes.
I wondered why copyValues is even necessary. After some research, I concluded that it might be a workaround that eventually became a feature. The idea is this: if you load a collection with one-to-many relationships, you might add child objects to it. If these entities are in the Detached state, calling copyValues replaces them with the same entity in the Managed state. This is how Hibernate handles entity replacements. However, it’s unclear why all attributes need to be copied—apparently, the developers found it acceptable.
A Practical Example with UUID
Let’s look at another example, still using Post, but with the id in the form of a UUID. We’ll generate the UUID client-side, meaning it won’t be generated by the database:
@Entity
@Table(name = "post")
@Getter
@Setter
public class Post {
@Id
private UUID id;
private String title;
public static Post newPost() {
final var post = new Post();
post.setId(UUID.randomUUID());
return post;
}
}
We want to create a new post, not update an existing one. The new post is in the Transient state, so Hibernate doesn’t track it. Therefore, you need to call save. What happens in the logs?
Hibernate: select post0_.id as id1_1_0_,
post0_.description as descript2_1_0_,
post0_.title as title3_1_0_
from post post0_ where post0_.id=?
Hibernate: insert into post (description, title, id) values (?, ?, ?)
We see a select by the same ID that we used for the insert. That is, Hibernate performs a select and then an insert. If we set a breakpoint, we’ll see that merge was called, not persist.
How isNew Works
It’s crucial to understand how EntityInformation.isNew works. First, Spring Data JPA checks whether your entity has an attribute annotated with @Version. If there is none, or if it has a primitive type, the check falls back to the parent class implementation of isNew. If a non-primitive version attribute is present, Spring Data JPA checks whether it is null. If it is, the entity is considered new, that is, not yet saved to the database; otherwise it is considered existing.
Next, Spring Data JPA checks if the entity ID is a primitive:
public boolean isNew(T entity) {
ID id = getId(entity);
Class<ID> idType = getIdType();
if (!idType.isPrimitive()) {
return id == null;
}
if (id instanceof Number) {
return ((Number) id).longValue() == 0L;
}
throw new IllegalArgumentException(
String.format("Unsupported primitive id type %s", idType)
);
}
If the ID type is not a primitive, the entity is considered new when the ID is null. If it’s a primitive number, the entity is new when the value is zero. Otherwise, an exception is thrown, as the remaining primitive types are char and boolean, neither of which qualifies as a primary key.
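Putting the version check and the ID check together, the decision order can be restated as one dependency-free function. This is my own sketch, not Spring Data code: IsNewRules and its parameters are hypothetical, and a null versionType stands for "the entity has no version attribute".

```java
// Hypothetical, dependency-free restatement of the decision order described above.
final class IsNewRules {

    static boolean isNew(Object id, Class<?> idType, Class<?> versionType, Object version) {
        // A wrapper-type version attribute decides on its own: null means "never saved".
        if (versionType != null && !versionType.isPrimitive()) {
            return version == null;
        }
        // No version attribute, or a primitive one: fall back to the id-based check.
        if (!idType.isPrimitive()) {
            return id == null;
        }
        if (id instanceof Number) {
            return ((Number) id).longValue() == 0L;
        }
        throw new IllegalArgumentException("Unsupported primitive id type " + idType);
    }
}
```

With a client-generated UUID and no version attribute, the function returns false, which is exactly why save ended up in merge in the example above. With a wrapper-type version attribute that is still null, the same entity would be reported as new.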
Let’s summarize the isNew algorithm in Spring Data JPA:
So, reaching the fourth point, we call the merge method. Here’s its source code:
final PersistenceContext persistenceContext = source.getPersistenceContextInternal();
EntityEntry entry = persistenceContext.getEntry( ent );
if ( entry == null ) {
EntityPersister persister = source.getEntityPersister( event.getEntityName(), ent );
Serializable id = persister.getIdentifier( ent, source );
if ( id != null ) {
final EntityKey key = source.generateEntityKey( id, persister );
final Object managedEntity = persistenceContext.getEntity( key );
entry = persistenceContext.getEntry( managedEntity );
if ( entry != null ) {
es = EntityState.DETACHED;
}
}
}
if ( es == null ) {
es = EntityState.getEntityState(ent, event.getEntityName(), entry, source, false);
}
Understanding Hibernate’s Operation
Hibernate attempts to determine the state of the entity you’re operating on, whether it needs to be updated, deleted, inserted, or something else. It first tries to find the entity by reference in the PersistenceContext. If no operations were performed, the context is empty, and nothing will be found. If the reference is not found, Hibernate looks it up by ID. If you created a new post, nothing will be found there either. The last resort is to call the static method getEntityState.
final Serializable clonedIdentifier =
(Serializable) persister.getIdentifierType().deepCopy( id, source.getFactory());
final Object result = source.get( entityName, clonedIdentifier );
source.getLoadQueryInfluencers().setInternalFetchProfile( previousFetchProfile );
if ( result == null ) {
entityIsTransient( event, copyCache );
}
Here is a snippet of its extensive source code. The line with source.get is where the additional select occurs. This is how Hibernate checks whether the required row exists in the database. If it doesn’t, the entity is definitely Transient.
The Problem with Extra Selects
When might this extra select become problematic? For instance, in telemetry systems, where writes far outnumber reads. Metrics there must be recorded quickly and frequently, and the select + insert combination doubles the number of queries.
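To put a number on "doubles", here is a small plain-Java sketch that counts statements for both paths. ToyTable and ToyWriter are hypothetical: the table is an in-memory set that logs what it is asked to do, and the merge path is reduced to "look first, then write", which is all that matters for the count.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.UUID;

// Hypothetical in-memory "table" that logs every statement it receives.
class ToyTable {
    final Set<UUID> rows = new HashSet<>();
    final List<String> log = new ArrayList<>();

    boolean select(UUID id) {
        log.add("select");
        return rows.contains(id);
    }

    void insert(UUID id) {
        log.add("insert");
        rows.add(id);
    }

    void update(UUID id) {
        log.add("update");
    }
}

class ToyWriter {
    // merge path for an entity unknown to the persistence context: look first, then write.
    static void merge(ToyTable table, UUID id) {
        if (table.select(id)) {
            table.update(id);
        } else {
            table.insert(id);
        }
    }

    // persist path: the caller asserts the row is new, so no lookup is needed.
    static void persist(ToyTable table, UUID id) {
        table.insert(id);
    }
}
```

Writing three new rows through merge logs six statements (a select and an insert each), while persist logs three.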
One solution to this problem is to use a UUID Generator:
@Entity
@Table(name = "post_uuid")
@Getter
@Setter
public class PostWithUUID {
@Id
@GeneratedValue(generator = "UUID")
@GenericGenerator(
name = "UUID",
strategy = "org.hibernate.id.UUIDGenerator"
)
private UUID id;
private String title;
public static PostWithUUID newPost() {
return new PostWithUUID();
}
}
Spring Data sees that id = null, considers the entity new, and calls persist. It works, but with caveats:
I’ll explain the second point further:
@Entity
@Table(name = "post")
@Getter
@Setter
public class Post {
@EmbeddedId
private PostID id;
private String title;
public static Post newPost() {
final var post = new Post ();
post.setId(new PostID(UUID.randomUUID()));
return post;
}
@Embeddable
@EqualsAndHashCode
@Getter @AllArgsConstructor @NoArgsConstructor
public static class PostID implements Serializable {
@Column(name = "id")
private UUID value;
}
}
We have Post, declared the PostID class, and added the @EmbeddedId annotation. This allows us to declare the repository with PostID as the ID type:
public interface PostRepo extends JpaRepository<Post, PostID> {
}
In HQL queries, you can operate with PostID, which has semantic meaning, rather than with an abstract id. If your application involves frequent operations with IDs, this approach can help reduce potential coding errors.
However, you can’t use a generator here—Hibernate won’t accept it. This can be circumvented using the Persistable interface. While not part of JPA, it is included in Spring Data. Essentially, it provides a custom implementation of the isNew method:
@Entity
@Table(name = "post")
@Getter
@Setter
public class Post implements Persistable<Post.PostID> {
@Transient
private transient boolean isNew;
@EmbeddedId
private PostID id;
private String title;
public static Post newPost() {
final var post = new Post ();
post.setId(new PostID(UUID.randomUUID()));
post.setNew(true);
return post;
}
@Override
public boolean isNew() {
return isNew;
}
@Embeddable
@Getter @EqualsAndHashCode @AllArgsConstructor @NoArgsConstructor
public static class PostID implements Serializable {
@Column(name = "id")
private UUID value;
}
@PostLoad
@PrePersist
void trackNotNew() {
this.isNew = false;
}
}
When an entity implements the Persistable interface, the standard logic no longer applies. Instead, Spring Data JPA simply asks the entity whether it’s new. If it is, persist is called; otherwise, merge. This puts the responsibility on you to determine whether the entity is new.
The isNew field is marked @Transient because it’s not a column. We set it to true when creating a post, and it’s crucial to reset it to false afterward (here via the @PrePersist and @PostLoad callbacks) to avoid unintended effects.
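To see why resetting the flag matters, here is a plain-Java sketch of the flag’s lifecycle. ToyPersistable is a local stand-in I made up for the isNew part of the Persistable contract, and FlagAwareRepository calls trackNotNew by hand where the JPA provider would fire the @PrePersist callback.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical local stand-in for the isNew part of Spring Data's Persistable contract.
interface ToyPersistable {
    boolean isNew();
}

class FlaggedPost implements ToyPersistable {
    private boolean isNew = true;      // freshly constructed instances are new

    @Override
    public boolean isNew() {
        return isNew;
    }

    // Stand-in for the @PrePersist / @PostLoad callback.
    void trackNotNew() {
        this.isNew = false;
    }
}

class FlagAwareRepository {
    final List<String> calls = new ArrayList<>();

    void save(FlaggedPost post) {
        if (post.isNew()) {
            post.trackNotNew();        // the provider would fire @PrePersist here
            calls.add("persist");
        } else {
            calls.add("merge");
        }
    }
}
```

The first save records persist and flips the flag, so a second save of the same object records merge. Without the reset, the second save would attempt another persist of an entity that already exists.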
Every Spring Data module, JPA included, offers repositories that extend CrudRepository, which declares a save method. In most modules this is the natural way to write data. JPA is somewhat of an exception, but save still makes sense there: its logic is essentially that of an upsert.
Imagine receiving a certain business object as a method parameter without knowing whether it already exists in the database. If it doesn’t exist, you need to insert it; if it does, you need to update it. Essentially, the code would need to do something like this:
if (select == null)
    insert
else
    update
The save method encapsulates this logic.
Hibernate Repository to Avoid Unnecessary Selects
What if the save method is causing issues? Let’s say you want to avoid unnecessary selects and copyValues to save time. One solution is to use a Hibernate Repository. This approach is implemented in the well-known library, Hibernate Types.
public interface HibernateRepository<T> {
<S extends T> S persist(S entity);
<S extends T> S merge(S entity);
}
We declare the interface. For simplicity, we’ll focus on two of the many available methods—persist and merge. Next, we implement the interface by delegating the calls to the EntityManager:
@Repository
class HibernateRepositoryImpl<T> implements HibernateRepository<T> {
@PersistenceContext
private EntityManager em;
@Override
public <S extends T> S persist(S entity) {
em.persist(entity);
return entity;
}
@Override
public <S extends T> S merge(S entity) {
return em.merge(entity);
}
}
Then, we extend our PostRepository using the Hibernate repository:
public interface PostRepository extends JpaRepository<Post, Long>, HibernateRepository<Post> {
}
Now we can explicitly call persist and merge. Everything seems fine, but let’s consider a more realistic example. Suppose you need to track changes in the titles of posts and archive them in a separate table before committing. Analysts can then review how the post title has changed. In the case of a rollback, you need to notify the support team—if, for example, the user was unable to change the post title. After the commit, you also need to send a message to Kafka.
Domain events, a domain-driven design pattern that Spring supports natively, are well suited to these tasks.
@Entity
@Table(name = "post")
@Setter
@Getter
public class Post extends AbstractAggregateRoot<Post> {
@Id
@GeneratedValue(strategy = IDENTITY)
private Long id;
private String title;
public void changeTitle(String title) {
this.title = title;
registerEvent(new PostNameChanged(id));
}
}
Our entity inherits from the AbstractAggregateRoot class, which has a registerEvent method.
public class AbstractAggregateRoot<A extends AbstractAggregateRoot<A>> {
private transient final @Transient List<Object> domainEvents = new ArrayList<>();
protected <T> T registerEvent(T event) {
Assert.notNull(event, "Domain event must not be null");
this.domainEvents.add(event);
return event;
}
@AfterDomainEventPublication
protected void clearDomainEvents() {
this.domainEvents.clear();
}
@DomainEvents
protected Collection<Object> domainEvents() {
return Collections.unmodifiableList(domainEvents);
}
}
AbstractAggregateRoot stores event objects in a list. Spring calls the domainEvents method, annotated with @DomainEvents, to get the events we’ve registered, and then calls the method annotated with @AfterDomainEventPublication to clear the list.
We can intercept events using TransactionalEventListener, a regular EventListener with the added capability of intercepting events at specific transaction stages—before and after the commit:
@TransactionalEventListener(phase = TransactionPhase.BEFORE_COMMIT)
public void archiveChanges(PostNameChanged postNameChanged) {
// code to archive changes
}
@TransactionalEventListener(phase = TransactionPhase.AFTER_COMMIT)
public void sendMessageToKafka(PostNameChanged postNameChanged) {
// code to send message to Kafka
}
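When intercepting before a commit, the transaction is still active. If an exception is thrown here, the entire transaction will be rolled back. However, Spring Data publishes these events only when a repository method such as save is called. If you rely on dirty checking, or call the custom persist and merge methods instead, the events will not be published: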
@Transactional
public void changeTitle(Long postId, String title) {
final var post = postRepository.findById(postId).orElseThrow();
post.changeTitle(title);
}
On the other hand, if you call a repository method such as save, the events will be published and intercepted:
@Transactional
public void changeTitle(Long postId, String title) {
final var post = postRepository.findById(postId).orElseThrow();
post.changeTitle(title);
postRepository.save(post);
}
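The difference between the two versions of changeTitle comes down to who drains the event list. Here is a plain-Java sketch of that mechanism. TitledPost and PublishingRepository are hypothetical, a String stands in for the PostNameChanged event, and a Consumer plays the role of the event publisher; the real wiring lives inside Spring Data’s repository proxies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical aggregate that collects events the way AbstractAggregateRoot does.
class TitledPost {
    private final List<Object> domainEvents = new ArrayList<>();
    String title;

    void changeTitle(String title) {
        this.title = title;
        domainEvents.add("PostNameChanged");   // a String stands in for the event object
    }

    List<Object> domainEvents() {
        return List.copyOf(domainEvents);
    }

    void clearDomainEvents() {
        domainEvents.clear();
    }
}

// Hypothetical repository: events are drained only when save is called.
class PublishingRepository {
    private final Consumer<Object> publisher;

    PublishingRepository(Consumer<Object> publisher) {
        this.publisher = publisher;
    }

    TitledPost save(TitledPost post) {
        post.domainEvents().forEach(publisher); // publish everything registered so far
        post.clearDomainEvents();               // then empty the list, as after publication
        return post;
    }
}
```

Calling changeTitle alone leaves the event sitting in the aggregate. Only the save call hands it to the publisher, and a second save publishes nothing because the list was cleared.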
How to Integrate Hibernate with Domain Events
I know of four methods, but I’ll focus on the two most useful ones. The first involves passing ApplicationEventPublisher as a parameter:
@Entity
@Table(name = "post")
@Setter
@Getter
public class Post {
@Id
@GeneratedValue(strategy = IDENTITY)
private Long id;
private String title;
public void changeTitle(String title, ApplicationEventPublisher eventPublisher) {
this.title = title;
eventPublisher.publishEvent(new PostNameChanged(id));
}
}
The advantage of this approach is that you can write unit tests for the entity, creating a rich domain model following the principles of domain-driven design. The downside is that the service accessing the entity must always inject ApplicationEventPublisher.
The second method involves using a static DomainEventPublisher:
@Entity
@Table(name = "post")
@Setter
@Getter
public class Post {
@Id
@GeneratedValue(strategy = IDENTITY)
private Long id;
private String title;
public void changeTitle(String title) {
this.title = title;
DomainEventPublisher.publish(new PostNameChanged(id));
}
}
Here, we declare a standard Spring bean. In the setEventPublisher method, we write ApplicationEventPublisher to a static variable, making it available through the static publish method:
@Component
public class DomainEventPublisher {
private static volatile ApplicationEventPublisher publisher;
@Autowired
private void setEventPublisher(ApplicationEventPublisher eventPublisher) {
publisher = eventPublisher;
}
public static void publish(Object event) {
Assert.notNull(publisher, "ApplicationEventPublisher is null. Check the configuration");
publisher.publishEvent(event);
}
}
The advantage is that there’s no need to explicitly inject ApplicationEventPublisher because it’s encapsulated in one place. The downside is that this approach violates the Inversion of Control principle: you’re using Spring but then immediately bypassing it. Additionally, you won’t be able to test the entity separately from Spring, as the entity directly accesses DomainEventPublisher, which is tied to Spring.
Evaluating Both Options
The first method doesn’t suit our needs because injecting a bean into each service that works with an entity negates the benefits. The second method, while violating the Inversion of Control principle, has its justifications:
Is It Worth Implementing All This?
This is the central question of the entire post. It’s worth it if you update many entities, or entities with many attributes, in a single transaction, where calling save could backfire because of copyValues. It’s also worth it if you’re working on a telemetry system with numerous inserts per second, where every write must result in a single insert without any additional queries.
Conclusion
Spring Data is a great framework, and save works well within it. But there are points to consider. If you notice extra selects in the logs or a transaction taking longer than expected, you’ll know where to dig, whether it’s worth the effort, or whether you can simply rely on abstractions.