Understanding Data and Databases 101

Alex Merced

Co-Author of “Apache Iceberg: The Definitive Guide” | Head of DevRel at Dremio | LinkedIn Learning Instructor | Tech Content Creator

Published: August 4, 2021

No matter what kind of application you are working on, in any programming language, you eventually need data that exists beyond a single run of the application. Data that doesn't disappear when a program stops running is called "persistent". Databases are essentially programs that organize large amounts of data and write it to disk (hard drives or other durable storage) so it can persist.

At a small scale, most databases work well for most use cases. For large data sets, however, how the database organizes and searches through the data can make a big difference in your application's performance.

The database platform is not the only thing that makes a difference; the design of your data matters too. Thinking through your data models so that they minimize redundant data and structure it in logical, easy-to-look-up ways can make or break the scalability of your application.

Data Design

When designing how your database will structure your data, the first step is to break the data up into different units called models. Each model is a description of what one record of data should look like.

For example, let's say you want to save mortgage applications in a database. The application a client fills out may include the following:

Their name and personal info
Their current address
The address they are purchasing

We could just create one "Application" model with all this data. The problem is that two applicants may be applying to buy the same address or living at the same address. Also, while one applicant is applying to buy a house, the owner of that property may be filing another application for a property they are buying. The same address would then show up in your data several times, which over time can bloat your database and slow it down.

A better way to structure the data:

Applicant Model

name: string
phone: string
email: string
residence: address
target_property: address

Address Model

street: string
state: string
zipcode: string

This way an address can be saved once as an "Address" then associated with different "Applicants" under the "residence" or "target_property" fields.
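
As a rough illustration, the two models above could be sketched as Python dataclasses. The class and field names mirror the models; the sample people and addresses are made up for the example and are not from any real data set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Address:
    street: str
    state: str
    zipcode: str


@dataclass
class Applicant:
    name: str
    phone: str
    email: str
    residence: Address
    target_property: Address


# Hypothetical sample data: each address is created once and referenced twice.
shared_home = Address(street="12 Oak St", state="NY", zipcode="10001")
new_house = Address(street="98 Elm Ave", state="NJ", zipcode="07030")

alice = Applicant("Alice", "555-0100", "alice@example.com", shared_home, new_house)
bob = Applicant("Bob", "555-0101", "bob@example.com", shared_home, new_house)

# Both applicants point at the same Address object rather than holding a copy.
print(alice.residence is bob.residence)
```

In a real database the shared reference would usually be an identifier stored on the applicant record, but the idea is the same: the address lives in one place.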

These associations between models are called relationships, and they usually come in three flavors:

One to One: Every X has one Y, and every Y has one X
One to Many: Every X can have many Ys, and every Y has one X
Many to Many: Every X can have many Ys, and every Y can have many Xs
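
To make the one-to-many case concrete, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The table and column names (addresses, applicants, residence_id, target_property_id) and the sample rows are assumptions chosen for this example. One address row is referenced by many applicant rows through a foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript(
    """
    CREATE TABLE addresses (
        id INTEGER PRIMARY KEY,
        street TEXT NOT NULL,
        state TEXT NOT NULL,
        zipcode TEXT NOT NULL
    );
    CREATE TABLE applicants (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        residence_id INTEGER NOT NULL REFERENCES addresses(id),
        target_property_id INTEGER NOT NULL REFERENCES addresses(id)
    );
    """
)

# Hypothetical sample rows: each address is stored exactly once.
conn.executemany(
    "INSERT INTO addresses (id, street, state, zipcode) VALUES (?, ?, ?, ?)",
    [(1, "12 Oak St", "NY", "10001"), (2, "98 Elm Ave", "NJ", "07030")],
)
conn.executemany(
    "INSERT INTO applicants (name, residence_id, target_property_id) VALUES (?, ?, ?)",
    [("Alice", 1, 2), ("Bob", 1, 2)],
)
conn.commit()

# One address, many applicants: look up everyone living at a given street.
rows = conn.execute(
    """
    SELECT applicants.name
    FROM applicants
    JOIN addresses ON addresses.id = applicants.residence_id
    WHERE addresses.street = ?
    ORDER BY applicants.name
    """,
    ("12 Oak St",),
).fetchall()
residents = [name for (name,) in rows]
print(residents)
```

A many-to-many relationship is typically modeled with a third table that holds pairs of ids, one from each side.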

Other Things to remember:

By convention, model names are singular and capitalized (Cat), while a single record standing alone is lowercase (cat)
The collection or table of data is named with the lowercase plural form of the model (cats)

In Summary:

Cat: A description of what a cat is (The Model)
cat: Sniffles the cat who is 8 years old (instance/record/document)
cats: The collection of data on all items that are a Cat. (collection/table)
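
The same three names can be sketched in a few lines of Python. This only illustrates the naming convention and is not tied to any database library; the second cat, "Whiskers", is made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Cat:  # the model: a description of what a cat is
    name: str
    age: int


cat = Cat(name="Sniffles", age=8)  # one instance/record/document
cats = [cat, Cat(name="Whiskers", age=3)]  # the collection/table of all Cat records

print(len(cats))
```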

Types of Databases

Relational Databases (SQL Databases)
These kinds of databases structure data in set columns and rows, so each collection (table) must have a schema (list of columns) before saving individual units of data (records). Almost all relational databases use Structured Query Language (SQL) as the language for talking to the database to create, retrieve, update and delete data.
Document Databases (MongoDB, DynamoDB, CosmosDB, Firestore)
These kinds of databases save data as documents, typically in a JSON-like format, without requiring a fixed schema up front. The more flexible data shape lets records in the same collection differ in structure, but you don't get the same benefits of outlining relationships in your data that a relational database would give you. Document databases save data in documents (one cat) in a collection (cats). Different document databases have query syntax specific to their platform; there is no unifying language like SQL here.
Graph Databases (Neo4j)
Graph databases don't cluster data into collections; instead, every unit of data is a free-standing node. Instead of grouping data, relationships between individual nodes are detailed by creating edges. Let's say there is one node representing "John" and another representing "Steve"; you may see something like this:

(Steve) ---brotherOf--->  (John)
(John) ---brotherOf--->  (Steve)

Here we can see two edges (brotherOf) detailing the relationship between the Steve and John nodes. Graph databases like Neo4j use the Cypher query language to express queries to the database.
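
The node-and-edge idea can be sketched in plain Python as a list of labeled edges. This is only an illustration of the data shape, not how Neo4j or any other graph database stores or queries data; the helper name related is made up for the example.

```python
# Each edge is (source node, relationship label, target node).
edges = [
    ("Steve", "brotherOf", "John"),
    ("John", "brotherOf", "Steve"),
]


def related(node, relationship):
    """Return the nodes reached from `node` by following edges labeled `relationship`."""
    return [
        target
        for source, label, target in edges
        if source == node and label == relationship
    ]


print(related("Steve", "brotherOf"))
```

Answering a question then means walking edges from a starting node rather than scanning a table.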

There are several other kinds of databases:

In-memory databases (Redis)
Time-series databases
Key/Value Stores

For a broader overview, see Fireship: 7 Types of Databases

Using a Database

In particular for document and relational databases, most programming languages have libraries called Object Document Mappers (ODMs) and Object Relational Mappers (ORMs). These libraries usually take a schema of your data models and bind it to an object in the programming language with several built-in methods to manipulate that particular data in the database.

So for example, if my data model is Cat, the ORM/ODM will generate an object, which we save in a variable called Cat, that may have functions like Cat.create or Cat.query for interacting with Cat data. The built-in methods differ depending on the library, but this is a common way to connect your application to the desired database.
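
To give a feel for what such a library does behind the scenes, here is a hand-rolled sketch on top of Python's built-in sqlite3 module. It is not the API of any real ORM or ODM: the Cat.create and Cat.query methods, the min_age parameter, the cats table layout and the sample cats are all assumptions made for illustration.

```python
import sqlite3
from dataclasses import dataclass
from typing import List, Optional

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cats (id INTEGER PRIMARY KEY, name TEXT NOT NULL, age INTEGER NOT NULL)"
)


@dataclass
class Cat:
    name: str
    age: int
    id: Optional[int] = None

    @classmethod
    def create(cls, name: str, age: int) -> "Cat":
        """Insert one row into the cats table and return it as a Cat."""
        cursor = conn.execute(
            "INSERT INTO cats (name, age) VALUES (?, ?)", (name, age)
        )
        conn.commit()
        return cls(name=name, age=age, id=cursor.lastrowid)

    @classmethod
    def query(cls, min_age: int = 0) -> List["Cat"]:
        """Return every cat whose age is at least `min_age`, oldest row first."""
        rows = conn.execute(
            "SELECT id, name, age FROM cats WHERE age >= ? ORDER BY id", (min_age,)
        ).fetchall()
        return [cls(name=name, age=age, id=row_id) for row_id, name, age in rows]


Cat.create("Sniffles", 8)
Cat.create("Whiskers", 3)  # made-up second record
older_cats = Cat.query(min_age=5)
print([c.name for c in older_cats])
```

Real mapping libraries generate this kind of code from the schema for you and add much more, such as validation, relationships and migrations.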
