Introduction to Neo4j Graph Database
Aneshka Goyal
AWS Certified Solutions Architect | Software Development Engineer III at Egencia, An American Express Global Business Travel Company
What is a Graph Database?
As the name suggests, a graph database stores a graph. Data lives in nodes and relationships rather than in tables or documents, which is very close to the way we sketch graphs on a whiteboard: each node represents an entity, and we draw relationships to link these entities together.
When do we use Graph databases?
We live in a world full of linkages, and the same goes for the data around us. Sometimes we want to store, traverse and analyse these linkages. Relational databases can do this with JOINs, but JOINs become expensive as the linkages get deeper, and in such cases a graph database is a good choice. It stores data as a graph that can be optimised for quick traversals, which makes analysing deep linkages both simpler and faster. Thus graph databases come out as a great choice in the following scenarios:
Now that we have an idea of what a graph database is and when it is a good choice, let's move on to Neo4j, the database named in the title.
What is Neo4j?
Neo4j is an open source, NoSQL, graph-native database that provides an ACID-compliant transactional backend for our applications. It is open source because its source code, written in Java and Scala, is available on GitHub. It is NoSQL because it does not store data in a tabular format. It is graph native because it stores data the way we would draw a graph on a whiteboard instead of layering a graph abstraction over underlying tables, which gives it a performance edge over graph databases built on top of other storage models. Moreover, unlike many NoSQL databases, it provides ACID compliance, where ACID stands for Atomicity, Consistency, Isolation and Durability.
This was a very basic definition of Neo4j. The next step in understanding Neo4j is to understand the property graph model.
Neo4j's property graph model has nodes, relationships and properties as its building blocks. Nodes are the entities in our graph database. Each node can have zero or more labels, which group sets of nodes together for easy traversal, and any number of key-value pairs called properties. Labels are also what indexes and constraints on nodes are defined against; these constraints are comparable to the NOT NULL or uniqueness constraints familiar from SQL databases.
No graph is complete without relationships between its nodes, and in Neo4j's property graph model every relationship has exactly one type and a direction. Cypher requires the direction to be written when a relationship is created with Create; with Merge the direction may be omitted, in which case Neo4j picks one when it has to create the relationship. Besides its type, a relationship can carry key-value properties, just like a node. A node can have any number and any types of relationships, and although every relationship is stored with a direction, it can be traversed in either direction without compromising performance. Like nodes, relationships can also have constraints defined on them.
Let's see an example of the property graph model. Here we have nodes labeled User (in orange) and Booking (in purple), which represent the entities in our database. A User can have a TRAVELED_IN relationship with a Booking node, and a Booking node can be BOOKED_BY a User. A user can also be both traveler and booker for the same booking. We can see these relationships in action in the example below. Apart from nodes and relationships, each node has properties, such as name and age for a user and start date and end date for a booking, to mention just a few. Below we can see the properties defined for a Booking node.
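To make these building blocks concrete, here is a minimal plain-Java sketch of the same model held in memory. It is only an illustration: the Node and Relationship classes below are hypothetical helpers written for this example, not Neo4j APIs, and the property values are borrowed from the sample data used later in this article.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative in-memory model only; these classes are hypothetical and are not Neo4j APIs.
class PropertyGraphSketch {

    // A node carries zero or more labels and any number of key-value properties.
    static class Node {
        final List<String> labels = new ArrayList<>();
        final Map<String, Object> properties = new LinkedHashMap<>();

        Node(String label) {
            labels.add(label);
        }

        Node with(String key, Object value) {
            properties.put(key, value);
            return this;
        }
    }

    // A relationship has exactly one type, a direction (start -> end) and optional properties.
    static class Relationship {
        final Node start;
        final String type;
        final Node end;
        final Map<String, Object> properties = new LinkedHashMap<>();

        Relationship(Node start, String type, Node end) {
            this.start = start;
            this.type = type;
            this.end = end;
        }
    }

    // Builds the two-node example: a user who both traveled in and booked one booking.
    static List<Relationship> sampleGraph() {
        Node user = new Node("User").with("name", "ane").with("age", 23);
        Node booking = new Node("Booking").with("lob", "hotel").with("startDate", "28/05/2022");

        List<Relationship> relationships = new ArrayList<>();
        relationships.add(new Relationship(user, "TRAVELED_IN", booking));
        relationships.add(new Relationship(booking, "BOOKED_BY", user));
        return relationships;
    }

    public static void main(String[] args) {
        // Prints (User)-[:TRAVELED_IN]->(Booking) and then (Booking)-[:BOOKED_BY]->(User)
        for (Relationship r : sampleGraph()) {
            System.out.println("(" + r.start.labels.get(0) + ")-[:" + r.type + "]->(" + r.end.labels.get(0) + ")");
        }
    }
}
```

Note how the direction is just the order of start and end; nothing stops us from reading a relationship from either side, which is the in-memory equivalent of traversing it in both directions.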
One thing that comes to mind whenever we talk about a database is the query language used to perform CRUD operations on it, and how a user or service will interact with the database. Neo4j has its own counterpart to SQL, called Cypher.
Cypher is like SQL in that it is declarative: it lets us focus on what needs to be retrieved rather than on how to retrieve it. It is easy to learn because we write a query the way we would visualise a graph and its connections. It supports all of the CRUD operations. A simple Match clause looks as follows.
Match (u:User{name:"ane"})<-[r:BOOKED_BY]-(b:Booking)
Return u
Here we retrieve a user node whose name is ane and which has a BOOKED_BY relationship coming in from a booking node. User and Booking are node labels, BOOKED_BY is the relationship type, and u, r and b are variables that refer to the matched nodes and relationship. These variables can be used to retrieve properties; for example, u.age returns the age of such a user. The name property with the value "ane" acts like a where clause, filtering which user we get.
As we saw a simple Match clause, we will now take a glance at some commonly used Cypher clauses.
Merge - Creates a node or relationship idempotently, i.e. only if the node or relationship does not already exist. Its syntax is the same as that of the Match clause. We can combine it with the Set clause to set or update properties of a node or relationship, or labels on a node.
Create - Also creates new nodes and relationships, but it creates duplicates if run multiple times, so Merge is usually preferred when the data may already exist.
Delete - Deletes a node only if it has no relationships. It is often better to use Detach Delete, which first removes the node's relationships and then deletes the node. Delete can also be used to delete relationships, leaving the nodes unaffected.
Remove - Removes a property from a node or a relationship. We can also remove a property by simply setting it to null.
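The difference between Create and Merge is easiest to see by running the same operation twice. The following is a plain-Java analogy, assuming a hypothetical in-memory list of user names in place of the database; it mimics the observable behaviour of the clauses and says nothing about how Neo4j implements them.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java analogy only: a hypothetical stand-in for a set of User nodes identified by name.
class MergeVersusCreateSketch {

    private final List<String> userNames = new ArrayList<>();

    // Like Create: always adds a new entry, so repeated calls produce duplicates.
    void create(String name) {
        userNames.add(name);
    }

    // Like Merge: matches first and creates only when nothing matched.
    void merge(String name) {
        if (!userNames.contains(name)) {
            userNames.add(name);
        }
    }

    // Like Delete on nodes without relationships: removes every matching entry.
    void delete(String name) {
        userNames.removeIf(name::equals);
    }

    // Counts how many entries currently match the given name.
    int count(String name) {
        int matches = 0;
        for (String existing : userNames) {
            if (existing.equals(name)) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        MergeVersusCreateSketch graph = new MergeVersusCreateSketch();
        graph.merge("ane");
        graph.merge("ane");
        System.out.println(graph.count("ane")); // 1: the second merge found the existing entry

        graph.create("ane");
        graph.create("ane");
        System.out.println(graph.count("ane")); // 3: each create added a duplicate

        graph.delete("ane");
        System.out.println(graph.count("ane")); // 0
    }
}
```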
We are now familiar with what Neo4j is and how to interact with its data. But where do we execute these queries, and how do we connect to the database? Neo4j offers a desktop application and a browser interface where we can see the data and interact with it. It also lets us use the Cypher shell to interact with the database. In addition, Neo4j supports drivers for languages such as Java, Go, Python, .NET and JavaScript, so applications written in any of these languages can connect to it. We can connect to the database over the HTTP, HTTPS or Bolt protocols.
For our example, we will run Neo4j in Docker and access its browser interface on localhost:7474. We will develop the application in Java with the Spring Boot framework and use Maven for dependency management.
We generated a project using Spring Initializr, and the POM looks something like this:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>2.6.4</version>
<relativePath/> <!-- lookup parent from repository -->
</parent>
<groupId>com.example</groupId>
<artifactId>neo4j-spring-boot</artifactId>
<version>0.0.1-SNAPSHOT</version>
<name>neo4j-spring-boot</name>
<description>sample application to integrate neo4j with spring boot</description>
<properties>
<java.version>1.8</java.version>
</properties>
<dependencies>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-data-neo4j</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-test</artifactId>
<scope>test</scope>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-maven-plugin</artifactId>
</plugin>
</plugins>
</build>
</project>
Dependency for Neo4j is shown below.
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-data-neo4j</artifactId>
</dependency>
We are building a simple application that lets us save and retrieve booking information. Each booking has a line of business, a start date and an end date, and is associated with a booker and one or more travelers. Bookers and travelers are User nodes and carry information such as the user's name, age and id.
Let's first see what a booking entity looks like.
import java.util.List;

import org.springframework.data.neo4j.core.schema.Id;
import org.springframework.data.neo4j.core.schema.Node;
import org.springframework.data.neo4j.core.schema.Relationship;

// Maps to nodes labeled Booking.
@Node("Booking")
public class Booking {

    // Application-assigned identifier of the booking node.
    @Id
    private Long id;
    private String lob; // line of business, e.g. hotel or air
    private String startDate;
    private String endDate;

    // (User)-[:TRAVELED_IN]->(Booking)
    @Relationship(type = "TRAVELED_IN", direction = Relationship.Direction.INCOMING)
    private List<User> travelers;

    // (Booking)-[:BOOKED_BY]->(User)
    @Relationship(type = "BOOKED_BY", direction = Relationship.Direction.OUTGOING)
    private User booker;

    public Long getId() {
        return id;
    }

    public void setId(Long id) {
        this.id = id;
    }

    public String getLob() {
        return lob;
    }

    public void setLob(String lob) {
        this.lob = lob;
    }

    public String getStartDate() {
        return startDate;
    }

    public void setStartDate(String startDate) {
        this.startDate = startDate;
    }

    public String getEndDate() {
        return endDate;
    }

    public void setEndDate(String endDate) {
        this.endDate = endDate;
    }

    public List<User> getTravelers() {
        return travelers;
    }

    public void setTravelers(List<User> travelers) {
        this.travelers = travelers;
    }

    public User getBooker() {
        return booker;
    }

    public void setBooker(User booker) {
        this.booker = booker;
    }
}
Here @Node marks the class as a Neo4j node with the label Booking, and @Id marks the identifier of our graph node. We also define the two relationships that a booking node can have with a user node; each relationship is given a type and a direction. Apart from the relationships and the identifier, a booking has other properties: the start date, the end date and the line of business (lob, which can be hotel, air, etc.).
So a node labeled User has two types of relationships with a booking: the user can be a traveler on the booking or its booker. Let's also see what the User class looks like.
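import org.springframework.data.neo4j.core.schema.Id;
import org.springframework.data.neo4j.core.schema.Node;

// Maps to nodes labeled User; both bookers and travelers are User nodes.
@Node("User")
public class User {

    // Application-assigned identifier of the user node.
    @Id
    private Long tuid;
    private String name;
    private Integer age;

    public Long getTuid() {
        return tuid;
    }

    public void setTuid(Long tuid) {
        this.tuid = tuid;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public Integer getAge() {
        return age;
    }

    public void setAge(Integer age) {
        this.age = age;
    }
}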
As shown above, a User has a few properties: a name, an age and an identifier (tuid).
With the nodes, relationships and properties defined, our property graph model is ready. We still need a repository with methods to interact with the database. In our case we have one for bookings, which looks like the snippet below.
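import java.util.Collection;

import org.springframework.data.neo4j.repository.Neo4jRepository;
import org.springframework.data.neo4j.repository.query.Query;
import org.springframework.stereotype.Repository;

@Repository
public interface BookingsRepository extends Neo4jRepository<Booking, Long> {

    // Parameterised query: $lob is bound to the method argument.
    @Query("Match (b:Booking{lob: $lob}) return b")
    Collection<Booking> findBookingsByLob(String lob);

    // Bookings whose booker is also connected to them as a traveler.
    @Query("Match (u:User)<-[r:BOOKED_BY]-(b:Booking) with u,r,b Match (u)-[t:TRAVELED_IN]->(b) return u,r,t,b")
    Collection<Booking> BookingsWhereBookerIsAlsoOneOfTheTravelers();
}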
Declaring the interface is all we need: because it extends Neo4jRepository, parameterised with the entity type and the type of its id, Spring Data provides the implementation of basic methods such as save and findById. For our custom searches we specify the Cypher to execute with @Query. Note that the first query is parameterised, so that we can search bookings by line of business. The second query returns all bookings where the booker is also one of the travelers. The With clause passes the results of one Match clause on as the starting point of the next Match clause.
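To make the intent of the second query concrete, here is a minimal plain-Java sketch of the same two-step filter, run over in-memory objects instead of the database. The SimpleBooking class is a hypothetical stand-in for our entities, and the second booking (id 5678) is made-up data added only so that one booking is filtered out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical in-memory stand-in for the Cypher query; no database is involved.
class BookerAlsoTravelerSketch {

    static class SimpleBooking {
        final long id;
        final long bookerTuid;
        final List<Long> travelerTuids;

        SimpleBooking(long id, long bookerTuid, Long... travelerTuids) {
            this.id = id;
            this.bookerTuid = bookerTuid;
            this.travelerTuids = Arrays.asList(travelerTuids);
        }
    }

    // Step 1 mirrors the first Match (each booking paired with its booker);
    // step 2 mirrors the Match after With (keep it only if that same user also traveled).
    static List<Long> bookingsWhereBookerTravels(List<SimpleBooking> bookings) {
        List<Long> result = new ArrayList<>();
        for (SimpleBooking booking : bookings) {
            long booker = booking.bookerTuid;             // carried forward, like "with u,r,b"
            if (booking.travelerTuids.contains(booker)) { // the second pattern must match too
                result.add(booking.id);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<SimpleBooking> bookings = Arrays.asList(
                new SimpleBooking(1234L, 23L, 23L, 123L, 1245L), // booker 23 is also a traveler
                new SimpleBooking(5678L, 99L, 23L));             // made-up: booker 99 did not travel
        System.out.println(bookingsWhereBookerTravels(bookings)); // prints [1234]
    }
}
```

In Cypher the database does this matching for us; the sketch only shows why a booking has to satisfy both patterns, with the same user, to appear in the result.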
Next we need to specify the URL, username and password used to connect to our database instance. These go in application.properties or application.yml. With Spring Boot 2.4 and later (the POM above uses 2.6.4), the driver settings live under the spring.neo4j prefix. A snippet of our properties looks like this:
spring.neo4j.uri=bolt://localhost
spring.neo4j.authentication.username=neo4j
spring.neo4j.authentication.password=test
The Java driver uses the Bolt protocol to connect to Neo4j on the default port 7687, with the username and password specified here. We don't have to write any boilerplate code for this connection, as Spring Boot takes care of it.
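The default-port behaviour can be pictured with a few lines of plain Java. This is only a sketch of the rule described above, assuming a hypothetical effectivePort helper; it is not the driver's own code.

```java
import java.net.URI;

// Illustrative helper only; the real driver resolves the address internally.
class BoltAddressSketch {

    static final int DEFAULT_BOLT_PORT = 7687;

    // Returns the port written in the URI, or the Bolt default when none is given.
    static int effectivePort(String uri) {
        int port = URI.create(uri).getPort(); // -1 when the URI carries no explicit port
        return port == -1 ? DEFAULT_BOLT_PORT : port;
    }

    public static void main(String[] args) {
        System.out.println(effectivePort("bolt://localhost"));      // prints 7687
        System.out.println(effectivePort("bolt://localhost:7688")); // prints 7688
    }
}
```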
Finally, we expose the following endpoints, through which we can access our application.
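import java.util.Collection;

import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/v1/bookings")
public class BookingController {

    private final BookingService bookingService;

    public BookingController(BookingService bookingService) {
        this.bookingService = bookingService;
    }

    // POST /v1/bookings: creates a booking from the JSON request body.
    @PostMapping
    public void createBooking(@RequestBody Booking booking) {
        bookingService.createBooking(booking);
    }

    // GET /v1/bookings: bookings where the booker is also a traveler.
    @GetMapping
    public Collection<Booking> getBookingsWithBookerAsTraveler() {
        return bookingService.getBookingsWithBookerAsATraveler();
    }

    // GET /v1/bookings/{lob}: bookings for one line of business.
    @GetMapping("/{lob}")
    public Collection<Booking> getBookingsForLOB(@PathVariable("lob") String lob) {
        return bookingService.getBookingForAnLob(lob);
    }
}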
The first endpoint creates a booking, the second returns all bookings where the booker is also a traveler, and the last returns the bookings for a particular line of business (LOB).
Since we built a layered application, the controller depends on a service. The code of the service class is below.
import java.util.Collection;

import org.springframework.stereotype.Service;

@Service
public class BookingService {

    private final BookingsRepository bookingsRepository;

    public BookingService(BookingsRepository bookingsRepository) {
        this.bookingsRepository = bookingsRepository;
    }

    // Saves the booking together with its related user nodes.
    public void createBooking(Booking booking) {
        bookingsRepository.save(booking);
    }

    public Collection<Booking> getBookingsWithBookerAsATraveler() {
        return bookingsRepository.BookingsWhereBookerIsAlsoOneOfTheTravelers();
    }

    public Collection<Booking> getBookingForAnLob(String lob) {
        return bookingsRepository.findBookingsByLob(lob);
    }
}
Now the one thing left is a running instance of the Neo4j graph database for our application to connect to on localhost. As stated earlier, we run Neo4j in Docker, using the following command.
docker run -p7474:7474 -p7687:7687 --env NEO4J_AUTH=neo4j/test neo4j:latest
This runs the latest version of Neo4j and sets the auth credentials as shown. It also maps the container's ports 7474 and 7687 to the identical ports on localhost. Note that the credentials are the same as the ones we specified in the properties file of our Spring application. Recent Neo4j images may reject a password as short as test because they enforce a minimum password length; in that case, use a longer password in both places or pin an older image tag instead of latest.
Once everything is up and running, we can open localhost:7474/browser. Initially there are just two databases, system and neo4j. The system database stores metadata about the other databases, and neo4j is the default database, which is the one we will use. Note that we can create our own databases as well.
Let's try to create the following booking.
{
"id": 1234,
"lob": "hotel",
"startDate":"28/05/2022",
"endDate": "30/05/2022",
"travelers":[
{
"tuid":23,
"name":"aneshka",
"age": 23
},
{
"tuid":123,
"name":"ane",
"age": 23
},
{
"tuid":1245,
"name":"xyz",
"age": 23
}
],
"booker": {
"tuid":23,
"name":"aneshka",
"age": 23
}
}
The Neo4j browser looks something like this with the booking that we just created.
Now we are ready to execute the GET requests to obtain bookings by LOB or bookings where the booker is a traveler. We can create many more bookings and are good to explore!
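As one way to exercise the endpoints, the sketch below builds the three requests with the HTTP client that ships with Java 11 and later. It is a standalone client, separate from the Spring application, and the base URL assumes the application runs on Spring Boot's default port 8080; adjust it if your setup differs.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Standalone client sketch (Java 11+). BASE_URL is an assumption: Spring Boot's default port 8080.
class BookingRequestsSketch {

    static final String BASE_URL = "http://localhost:8080/v1/bookings";

    // POST /v1/bookings with the booking JSON as the request body.
    static HttpRequest createBooking(String json) {
        return HttpRequest.newBuilder(URI.create(BASE_URL))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    // GET /v1/bookings: bookings where the booker is also a traveler.
    static HttpRequest bookingsWhereBookerTravels() {
        return HttpRequest.newBuilder(URI.create(BASE_URL)).GET().build();
    }

    // GET /v1/bookings/{lob}: bookings for one line of business.
    static HttpRequest bookingsForLob(String lob) {
        return HttpRequest.newBuilder(URI.create(BASE_URL + "/" + lob)).GET().build();
    }

    public static void main(String[] args) {
        // Only builds and prints the requests; nothing is sent over the network here.
        HttpRequest[] requests = {
                createBooking("{\"id\": 1234, \"lob\": \"hotel\"}"),
                bookingsWhereBookerTravels(),
                bookingsForLob("hotel")
        };
        for (HttpRequest request : requests) {
            System.out.println(request.method() + " " + request.uri());
        }
    }
}
```

To actually send a request, pass it to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) while the application and the Neo4j container are running.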
In this article we built a basic understanding of the Neo4j graph database and saw it in action. We learnt about the property graph model and how to use Cypher to interact with the data.