Understanding Data and Databases 101
Alex Merced
Co-Author of “Apache Iceberg: The Definitive Guide” | Head of DevRel at Dremio | LinkedIn Learning Instructor | Tech Content Creator
No matter what kind of application you are working on, or which programming language you use, you will eventually want data that exists beyond a single run of the application. Data that doesn't disappear when a program stops running is referred to as "persistent". Databases are essentially programs that organize large amounts of data and write it to disk (storage such as hard drives) so that it persists.
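Here is a minimal sketch of persistence using Python's built-in sqlite3 module, which keeps an entire database in a single file. The file path and the `notes` table are hypothetical and chosen only for illustration. The data is written through one connection, that connection is closed, and the data is then read back through a brand-new connection, much as a restarted program would do.

```python
import os
import sqlite3
import tempfile

# Hypothetical location for the demo database file; any writable path works.
DB_PATH = os.path.join(tempfile.gettempdir(), "persistence_demo.db")


def save_note(text: str) -> None:
    """Write one row to the database file on disk, then close the connection."""
    conn = sqlite3.connect(DB_PATH)
    with conn:  # commits the transaction if the block succeeds
        conn.execute("CREATE TABLE IF NOT EXISTS notes (body TEXT)")
        conn.execute("INSERT INTO notes (body) VALUES (?)", (text,))
    conn.close()


def load_notes() -> list[str]:
    """Open a fresh connection and read back whatever was written earlier."""
    conn = sqlite3.connect(DB_PATH)
    try:
        conn.execute("CREATE TABLE IF NOT EXISTS notes (body TEXT)")
        rows = conn.execute("SELECT body FROM notes").fetchall()
    finally:
        conn.close()
    return [body for (body,) in rows]


if __name__ == "__main__":
    save_note("this survives after the connection closes")
    print(load_notes())
```

The rows live in the file, not in the program's memory. Running the script a second time therefore shows the notes from the first run as well.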
At a small scale, most databases work well for most use cases. With large data sets, though, how a database organizes and searches through its data can make a big difference in your application's performance.
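A common example of "how the database searches through the data" is whether it can use an index. The sketch below is illustrative and uses SQLite's `EXPLAIN QUERY PLAN`. The `applicants` table, its 10,000 fake rows, and the index name are all hypothetical. The plan's exact wording varies between SQLite versions, but without an index it reports a full SCAN of the table, and with one it reports a SEARCH that uses the index.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Return SQLite's description of how it intends to run a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each plan row is the human-readable detail text.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE applicants (name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO applicants (name, email) VALUES (?, ?)",
    [(f"applicant{i}", f"applicant{i}@example.com") for i in range(10_000)],
)

lookup = "SELECT name FROM applicants WHERE email = ?"
params = ("applicant9999@example.com",)

# Without an index, SQLite has to scan every row looking for a match.
before = query_plan(conn, lookup, params)

# With an index on email, SQLite can jump straight to the matching rows.
conn.execute("CREATE INDEX idx_applicants_email ON applicants (email)")
after = query_plan(conn, lookup, params)

print("before:", before)
print("after: ", after)
```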
Not only does the database platform make a difference, so does the design of your data. Thinking through your data models so that they minimize redundant data and structure it logically, in ways that are easy to look up, can make or break the scalability of your application.
Data Design
When designing how your database will structure your data, you first break the data up into different units called models. Each model describes what one record of that kind of data should look like.
For example, let's say you want to save mortgage applications in a database. The application a client fills out may include their name, phone number, email, current residence address, and the address of the property they want to buy.
We could just create one "Application" model holding all of this data. The problem is that two applicants might be applying to buy the same property, or might live at the same address. Also, while one applicant is applying to buy a house, the current owner of that property may be filling out another application for a property they are buying. Either way, the same address ends up stored in your data several times, and over time that redundancy can bloat and slow down your database.
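To make the redundancy concrete, here is a small sketch of the single flat "Application" model as a Python dataclass. The field names, people, and addresses are hypothetical. It models the scenario just described: two applicants share a residence and a target property, and the owner of that target property is applying to buy elsewhere.

```python
from dataclasses import dataclass


@dataclass
class Application:
    """One flat record holding every field, including both full addresses."""
    name: str
    phone: str
    email: str
    residence_street: str
    residence_state: str
    residence_zipcode: str
    target_street: str
    target_state: str
    target_zipcode: str


applications = [
    # Two applicants who live together and want to buy the same house.
    Application("Ana", "555-0100", "ana@example.com",
                "1 Oak St", "TX", "73301", "9 Elm St", "TX", "73301"),
    Application("Ben", "555-0101", "ben@example.com",
                "1 Oak St", "TX", "73301", "9 Elm St", "TX", "73301"),
    # The current owner of 9 Elm St, applying to buy a different property.
    Application("Cy", "555-0102", "cy@example.com",
                "9 Elm St", "TX", "73301", "4 Pine St", "TX", "73301"),
]

# Every address as it is written out in the records...
stored = [(a.residence_street, a.residence_state, a.residence_zipcode) for a in applications]
stored += [(a.target_street, a.target_state, a.target_zipcode) for a in applications]
# ...versus the distinct addresses actually involved.
distinct = set(stored)

print(len(stored), "address copies stored for", len(distinct), "distinct addresses")
# prints: 6 address copies stored for 3 distinct addresses
```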
A better way to structure the data:
Applicant Model
name: string
phone: string
email: string
residence: address
target_property: address
Address Model
street: string
state: string
zipcode: string
This way, an address can be saved once as an "Address" and then associated with different "Applicants" through the "residence" or "target_property" fields.
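One way to express the Applicant/Address design above in a relational database is sketched below with SQLite. The table and column names, including the `residence_id` and `target_property_id` foreign-key columns, are assumptions made for illustration. Each applicant row stores only the id of an address row. A JOIN puts the full picture back together when you need it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE address (
    id      INTEGER PRIMARY KEY,
    street  TEXT NOT NULL,
    state   TEXT NOT NULL,
    zipcode TEXT NOT NULL
);
CREATE TABLE applicant (
    id                 INTEGER PRIMARY KEY,
    name               TEXT NOT NULL,
    phone              TEXT,
    email              TEXT,
    residence_id       INTEGER REFERENCES address(id),
    target_property_id INTEGER REFERENCES address(id)
);
""")


def add_address(street: str, state: str, zipcode: str) -> int:
    """Store an address once and return its id for other rows to reference."""
    cur = conn.execute(
        "INSERT INTO address (street, state, zipcode) VALUES (?, ?, ?)",
        (street, state, zipcode))
    return cur.lastrowid


oak = add_address("1 Oak St", "TX", "73301")
elm = add_address("9 Elm St", "TX", "73301")

# Both applicants point at the same two address rows instead of copying them.
conn.executemany(
    "INSERT INTO applicant (name, phone, email, residence_id, target_property_id) "
    "VALUES (?, ?, ?, ?, ?)",
    [("Ana", "555-0100", "ana@example.com", oak, elm),
     ("Ben", "555-0101", "ben@example.com", oak, elm)])

# A JOIN reassembles each applicant with both of their addresses.
rows = conn.execute("""
SELECT applicant.name, home.street, target.street
FROM applicant
JOIN address AS home   ON home.id   = applicant.residence_id
JOIN address AS target ON target.id = applicant.target_property_id
ORDER BY applicant.name
""").fetchall()

print(rows)  # [('Ana', '1 Oak St', '9 Elm St'), ('Ben', '1 Oak St', '9 Elm St')]
print(conn.execute("SELECT COUNT(*) FROM address").fetchone()[0], "address rows stored")
```

If an address needs correcting, only one row has to change, and every applicant that references it sees the update.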
These are called relationships, and they usually come in three flavors: one-to-one, one-to-many, and many-to-many.
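In the applicant/address design, one address row can be referenced by many applicants, which makes it a one-to-many link. A many-to-many relationship is commonly stored with a separate "junction" table whose rows each pair one record from each side. The sketch below is a hypothetical schema, not taken from the original: any applicant may apply for several properties, and any property may receive applications from several applicants.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE applicant (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE property  (id INTEGER PRIMARY KEY, street TEXT NOT NULL);
-- Junction table: each row links one applicant to one property.
CREATE TABLE applied_for (
    applicant_id INTEGER NOT NULL REFERENCES applicant(id),
    property_id  INTEGER NOT NULL REFERENCES property(id),
    PRIMARY KEY (applicant_id, property_id)
);
INSERT INTO applicant (id, name) VALUES (1, 'Ana'), (2, 'Ben');
INSERT INTO property (id, street) VALUES (10, '9 Elm St'), (11, '4 Pine St');
-- Ana applied for both properties; Ben applied only for 9 Elm St.
INSERT INTO applied_for (applicant_id, property_id) VALUES (1, 10), (1, 11), (2, 10);
""")


def applicants_for(street: str) -> list[str]:
    """Walk from a property, through the junction table, to its applicants."""
    rows = conn.execute("""
        SELECT applicant.name
        FROM applied_for
        JOIN applicant ON applicant.id = applied_for.applicant_id
        JOIN property  ON property.id  = applied_for.property_id
        WHERE property.street = ?
        ORDER BY applicant.name
    """, (street,)).fetchall()
    return [name for (name,) in rows]


print(applicants_for("9 Elm St"))   # ['Ana', 'Ben']
print(applicants_for("4 Pine St"))  # ['Ana']
```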
Other Things to remember:
In Summary:
Types of Databases
(Steve) ---brotherOf---> (John)
(John) ---brotherOf---> (Steve)
Here we can see two edges (brotherOf) describing the relationship between the Steve and John nodes. Graph databases like Neo4j use the Cypher Query Language (CQL) to express queries against the database.
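The sketch below shows the underlying idea of nodes connected by labeled, directed edges, using a plain Python adjacency list. It is not how Neo4j stores data internally, and the helper names are hypothetical. In Cypher, a lookup like the final one might be written along the lines of `MATCH (s {name: 'Steve'})-[:brotherOf]->(b) RETURN b.name`.

```python
# Each node maps to its outgoing edges as (relationship, target) pairs,
# i.e. a directed graph whose edges carry a label.
edges: dict[str, list[tuple[str, str]]] = {}


def add_edge(source: str, relationship: str, target: str) -> None:
    """Record a directed, labeled edge from source to target."""
    edges.setdefault(source, []).append((relationship, target))


def related(node: str, relationship: str) -> list[str]:
    """Follow the outgoing edges with the given label from one node."""
    return [target for rel, target in edges.get(node, []) if rel == relationship]


# The two edges from the example: each brother points at the other.
add_edge("Steve", "brotherOf", "John")
add_edge("John", "brotherOf", "Steve")

print(related("Steve", "brotherOf"))  # ['John']
print(related("John", "brotherOf"))   # ['Steve']
```

Both edges are needed because edges are directed. Storing only one of them would let you answer "whose brother is Steve?" but not "whose brother is John?".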
There are several other types of databases:
Using a Database
For document and relational databases in particular, most programming languages have libraries called Object Document Mappers (ODMs) and Object Relational Mappers (ORMs). These libraries usually take a schema of your data models and bind each model to an object in the programming language, with several built-in methods for manipulating that particular data in the database.
So, for example, if my data model is Cat, the ORM/ODM will generate an object, which we might save in a variable called Cat, with functions like Cat.create or Cat.query for interacting with Cat data. The built-in methods differ from library to library, but this is the main way to connect your application to the database of your choice.
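To show what such a generated object does behind the scenes, here is a toy, hand-written stand-in built on sqlite3. It is not the API of any real ORM or ODM. The `Cat` class, its `create` and `query` methods, and the `CatRecord` type are hypothetical and only mirror the names in the example above. Real libraries generate this layer from your schema and offer far richer querying.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class CatRecord:
    """One row of cat data, returned to application code as a plain object."""
    id: int
    name: str
    age: int


class Cat:
    """Hypothetical stand-in for the object an ORM might generate for a Cat model."""

    # A private in-memory database, created once when the class is defined.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cat (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
    COLUMNS = {"id", "name", "age"}

    @classmethod
    def create(cls, name: str, age: int) -> CatRecord:
        """Insert a cat and return it with its database-assigned id."""
        cur = cls.conn.execute("INSERT INTO cat (name, age) VALUES (?, ?)", (name, age))
        return CatRecord(cur.lastrowid, name, age)

    @classmethod
    def query(cls, **filters: object) -> list[CatRecord]:
        """Return cats whose columns equal the given values, e.g. query(name="Tom")."""
        unknown = set(filters) - cls.COLUMNS
        if unknown:
            raise ValueError(f"unknown column(s): {sorted(unknown)}")
        # Column names come only from the fixed set above; values are bound as parameters.
        where = " AND ".join(f"{col} = ?" for col in filters) or "1 = 1"
        sql = f"SELECT id, name, age FROM cat WHERE {where} ORDER BY id"
        rows = cls.conn.execute(sql, tuple(filters.values())).fetchall()
        return [CatRecord(*row) for row in rows]


if __name__ == "__main__":
    Cat.create("Tom", 3)
    Cat.create("Luna", 5)
    print(Cat.query(name="Luna"))  # [CatRecord(id=2, name='Luna', age=5)]
```

The application code only ever calls `Cat.create` and `Cat.query`, and the SQL stays hidden inside the class. That convenience is the main reason ORMs and ODMs are popular.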