Data Models, Types, or Schemas?

This article was originally published in the API Changelog newsletter on February 14, 2025.

Naming things is hard. I've recently been involved in conversations about the definition of API input and output data. While the topic didn't seem too complicated, I immediately noticed an unpleasant struggle: different people were using different names to refer to the definition of data. To some people, "data model" was the correct term, while others used "data types" instead. There were even others who used just the word "schema." I'm one of them, and finding these alternatives intrigued me to the point of having to share my opinion here. Read on.

There is only one way to determine if something is correct: being pedantic about its meaning. At least, that's how I feel whenever I find myself in an argument about the definition of something. The other option is to defend your taste even if you know you're wrong in the first place. I prefer being pedantic, if you know what I mean. So, let's see what each of the options means from a purely dictionary perspective.

A data model is nothing more than a representation, often simplified, of the way particular information exists and circulates. Data models exist to help people and machines make sense of the way data is organized. One of the key elements of a data model is a definition of each of the properties it represents. For example, say you're creating a data model of a person. You'd want to define what a person's name looks like: how many characters it can have, what kind of words it can be made of, and so on. The same thing happens with, for instance, a person's age. You'd want to define how you represent age, whether in years, months, or something else. All these constraints and definitions constitute what a data model is. The bad thing about the word model is that it's very popular, just not in the way we wish it were. Almost all the time, people refer to a model in the context of AI, as in a large language model.
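To make the person example a bit more concrete, here is a minimal sketch in Python. The specific constraints (a 100-character limit, names made of letters separated by spaces, apostrophes, or hyphens, and age as whole years between 0 and 150) are assumptions picked only for illustration. They don't come from any standard.

```python
import re
from dataclasses import dataclass

# Hypothetical constraints, chosen only to illustrate the idea.
NAME_MAX_LENGTH = 100
# One or more words made of letters, separated by a space, apostrophe, or hyphen.
NAME_PATTERN = re.compile(r"[^\W\d_]+(?:[ '-][^\W\d_]+)*")
AGE_MAX_YEARS = 150


@dataclass(frozen=True)
class Person:
    """A tiny data model: each property carries its own definition."""

    name: str
    age_years: int  # the model decides that age is expressed in whole years

    def __post_init__(self):
        if not 1 <= len(self.name) <= NAME_MAX_LENGTH:
            raise ValueError("name must have between 1 and 100 characters")
        if NAME_PATTERN.fullmatch(self.name) is None:
            raise ValueError("name must be made of words containing only letters")
        # bool is a subclass of int in Python, so it is rejected explicitly.
        if isinstance(self.age_years, bool) or not isinstance(self.age_years, int):
            raise ValueError("age must be a whole number of years")
        if not 0 <= self.age_years <= AGE_MAX_YEARS:
            raise ValueError("age must be between 0 and 150 years")
```

With these assumptions, `Person("Ada Lovelace", 36)` is accepted, while `Person("R2-D2", 5)` and `Person("Ada", -1)` raise a `ValueError`.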

Data type is the second option we have. We're lucky because there's a direct definition of "data type" in the dictionary. It says that a data type is a specific kind of data item, one that holds the definition of the values it can store. In a way, data types are what we were referring to before when we were trying to come up with the properties of a person. You define each property of a person by a data type. A person's name, age, gender, and so on are defined by data types. I think it's safe to say we can discard this option. People—at least the ones I interact with the most—usually associate it with primitive data types such as numbers, strings, and booleans. Yes, API input and output data can be as simple as a string, but that's not the most common case. Input and output payloads are often objects formed by primitive data types and even by other objects and arrays.
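As a rough illustration of that last point, the sketch below builds a payload out of primitive values, a nested object, and an array. The class and field names (`PersonPayload`, `Address`, `nicknames`, and so on) are hypothetical and exist only for this example.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Address:
    city: str
    country: str


@dataclass
class PersonPayload:
    name: str          # primitive: string
    age: int           # primitive: number
    subscribed: bool   # primitive: boolean
    address: Address   # another object nested inside the payload
    nicknames: list[str] = field(default_factory=list)  # array of primitives


payload = PersonPayload(
    name="Ada",
    age=36,
    subscribed=True,
    address=Address(city="London", country="UK"),
    nicknames=["Countess"],
)

# asdict() converts nested dataclasses too, so the result serializes to JSON.
body = json.dumps(asdict(payload))
```

Calling this whole structure a "data type" is technically fine, but it's far from what most people picture when they hear the term.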

That leaves us with the third option: data schema. Schema is a word whose meaning is very similar to that of model. In fact, the two could be used interchangeably. The problem is that people use the word schema to refer to multiple things, so there can be confusion about what it really means. For instance, some people call an OpenAPI document a schema. To other people, a schema is a way to validate a JSON document (in reference to JSON Schema). However, schema is the word the OpenAPI specification officially uses to refer to how it lets you configure payloads. According to the OpenAPI specification version 3.1.1, a schema is "a formal description of syntax and structure." It's through its schema object that you can define input and output payloads. According to the specification, payloads can be "objects, but also primitives and arrays." The AsyncAPI specification version 3.0.0 follows a similar approach. According to its reference, a schema object "allows the definition of input and output data types." GraphQL also uses the word schema to refer to "what data can be queried from the API." Even gRPC follows a path that is somewhat similar to the other specifications. Protocol buffers have "a fully reflective schema that you can use to implement self-description."
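Here is a small sketch of what a schema describing a payload can look like. The keywords `type`, `properties`, `required`, and `items` are the ones JSON Schema uses, and OpenAPI 3.1 schema objects build on JSON Schema. The `conforms` function, however, is a simplified, hypothetical helper written for this example. It only understands those four keywords and is not a real JSON Schema validator.

```python
# A schema for a person payload, written as a plain Python dictionary.
PERSON_SCHEMA = {
    "type": "object",
    "required": ["name", "age"],
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
        "nicknames": {"type": "array", "items": {"type": "string"}},
    },
}

# Mapping from the schema type names used here to Python types.
_PYTHON_TYPES = {
    "object": dict,
    "array": list,
    "string": str,
    "integer": int,
    "boolean": bool,
}


def conforms(value, schema):
    """Return True if value matches the (very small) subset of schema rules."""
    expected = schema.get("type")
    if expected is not None:
        if not isinstance(value, _PYTHON_TYPES[expected]):
            return False
        # bool is a subclass of int in Python; don't accept it as an integer.
        if expected == "integer" and isinstance(value, bool):
            return False
    if isinstance(value, dict):
        # Every required property must be present.
        if any(key not in value for key in schema.get("required", [])):
            return False
        # Every present property that the schema describes must conform.
        for key, subschema in schema.get("properties", {}).items():
            if key in value and not conforms(value[key], subschema):
                return False
    if isinstance(value, list) and "items" in schema:
        return all(conforms(item, schema["items"]) for item in value)
    return True
```

The same function handles an object such as `{"name": "Ada", "age": 36}`, a primitive such as `"Ada"` checked against `{"type": "string"}`, and an array, which matches the idea that payloads can be objects, primitives, or arrays.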

It sure looks like "data schema" is the best way to refer to the definition of API input and output payloads. I'll stick to it from now on.

Johan Groenen

Managing partner at Tiltshift digital transformation, board member Code for NL and Open Nederland

1 month ago

I don't believe in "dictionary definitions" as the go-to for this type of discussion; different domains use different terminology, and people from these domains tend to move from one domain to another, or may have to work together on projects. Computer scientists use other words than programmers, who use other words than domain modelers, who use other words than architects, who use other words than data stewards, etc.
