What is the difference between a field, an attribute and a column?

What is the difference between a field, an attribute and a column?

A “field” is a land area filled with grass or crops. To understand why the word “field” is used to describe a column in a database table, we need to go back to the mainframe era in the 1960s-70s, even before that to paper-based form. That was the time when “field” was used, along with “record”, “type”, “length”, “screen” and “file”. The word “attribute” was invented in the relational database design in 1970s, along with “domain”, “entity”, “tuple”, “key” and “relation”. Whereas the word “column” was invented in modern RDBMS, along with “row”, “table”, “index” and “database”.

They said we must not forget where we came from. You might be starting your career directly into the data world. You play with data columns in tables, and never had to deal with “field” in your life, and never built any application that produces data. But, everyone in data should really understand where we came from. If you agree, read on and I’ll try my best to explain the difference between a field, an attribute and a column.

I’m 53 btw, born in 1970, so I did not experience the mainframe era in 1960s and 70s. But I did use IBM/390, PS/2, AS/400 and Cobol in 1990s. I used Microsoft Access in 1990s, as well as Foxbase, dBase III+ and Clipper. Hence the word “field” was in my vocabulary when I was a teenager and in my 20s when I used the above, as well as on Commodore 64, PC XT, AT and 386.

But in 1990s and 2000s I was so into data modelling, in particular relational databases and data warehousing. I designed applications, and built them. I did database normalisation and denormalisation, and did many entity relationship diagrams. That’s when the word “attribute” was in my daily vocabulary. And in the last 20 years I’m using modern databases, from Oracle and SQL Server to Sybase, Informix and Teradata, from Synapse to RDS, DynamoDB to CosmosDB, Redshift to Snowflake. And I deal with “columns”, rows and tables on daily basis.

That’s what sparked me into writing this today. I have experienced those 3 worlds. For young people age 20 something, like my children, nieces and nephews, they would not hear the word “field” apart from agriculture. They go straight into Python in university. Some of them even skipped SQL. “What? You did Python without doing SQL?” Well, the era has changed.

Alright, let’s start with “field”.

Data Field

Before PC and laptops, we use mainframes. A data field is a box on the screen that needs to be populated in a computer system. For example: the Title, the Author, the Publisher and the Number of pages on the “Library Inventory” application in IBM mainframe z/OS below:

Figure 1. Data field in IBM z/OS 3.1.0 (source: IBM, link)

It can be an input field (we need to enter data into it, like the Title field above) or an output field (it displays something to the screen, like the Publisher field above).

Those data fields on the screen is stored in memory of the computer, in something called “address space” and also written on disk so that it is retrievable (we can display it back to the screen after the computer is powered down and switched back on).

In Microsoft Access, a field is a box on the screen too, like below (called a “form”). But unlike in z/OS in Access, a field exists in both the screen/form, and in a table. We define the fields in a table, click a button, and voila, a form with those fields magically appears on the screen, ready for us to input the data. And for the form to display the data in that table.

Figure 2. Fields in a Microsoft Access form (source: Microsoft, link)

So that’s a field on a screen. It was called a “field” because before computer (say 1950s) we have a paper form. And the boxes on those forms are called “fields”. So that when in 1970s they have computer screen full of boxes, they also called it “fields”.

Now let’s get into “attributes”.

Data Attribute

Prior to 1970 data was not relational. Data was stored in files, not in tables. But in 1970s people started to use “attribute”, “tuple” and “relation” like below.

Figure 3. Attribute, tuple and relation (source: Wikipedia, link)

So a “relation” was what we now know as a “table”. Not how each table are related to each other, but the tables themselves. Whereas “attributes” are the columns and “tuples” are the rows in that table.

A key is an attribute that uniquely identifies each tuple. Foreign key is an attribute that corresponds to a key in another relation.

A relation often has anomalies, such as insertion anomalies (can’t insert a tuple), deletion anomalies (deleting a tuple results in unintentional loss of data) and update anomalies (modifying data results in inconsistencies).

To eliminate these anomalies, people started to do “normalisation”. And that’s how the first, second and third normal form started. And then BCNF (Boyce-Codd normal form). It’s for eliminating the relational anomalies.

And that was how the relational database world was formed. We request data by sending a query and the database returns a result set. What was previously stored in a big file now get broken down into tables, normalised, with foreign keys between them.

So that’s an attribute. Today, when designing an ERD (Entity Relationship Diagram), we identify the entity first, such as the business objects (stores, products, clients, employees, trucks, factories, shops, branches, securities, accounts, etc.) That’s called Conceptual Data Model. We identify the entities, and the relationship between entities (one to many, many to many, etc) but not the attributes within each entity.

We pay particular attention to the names of the entities, e.g. do we call it a store or a shop, a customer or a client (never a party), do we call it a security or an instrument. I said we, but what I meant was the business. What do the business people call it? That’s the name we put on the ERD. Not what we call them, but what the business call them.

Then we do LDM, Logical Data Modelling, whereby within each of the entities, we identify the data attributes. And that’s when the word “attributes” often spoken about. And the ERD is formed, complete with all the attributes inside each entity, like this:

Figure 4. Attributes in an Entity Relationships Diagram (source: Bina Nusantara, link)

And now let’s take a look at the last one: column.

Data Column

Every attribute in the ERD is implemented as a column in a table. That is called physical data model. We actually designed the columns in the table, with the data types and length, with the ordinal, and the partitioning. Which entities become views, which entities become tables. Which attributes become a physical column, which attributes become a calculated column. Which columns have a default value, which columns have unique constraint, not null constraint, etc. How do we store dates, is it datetime or timestamp, is it NTZ or LTZ.

Of course not everything is implemented as databases these days. With the arrival of Delta Lakes and Iceberg, those columns are actually stored as delta files and Iceberg files. With Spark catalogs configured on top, users can query those files as if they were database tables. Using Spark SQL. Or Python (PySpark).

In analytics databases, data is sometimes stored in columnar format. They are called columnar databases. Like Redshift, Snowflake, MariaDB, AlloyDB and CosmosDB. Also BigQuery and DuckDB. Data is stored in columns rather than in rows. That way data compression is improved, and query performance is higher. Better in dealing with sparse data too, and memory usage is more efficient.

In fact, Parquet files, the file structure behind Delta Lakes and Delta Tables, are columnar, or column-oriented. Iceberg however, is table format rather than file format, and it supports columnar file formats like Parquet and ORC.

So the word “column” is traditionally related to relational database (RDMBS) such as Oracle, SQL Server, etc. But in analytics world people often use columnar database. Do they still have “columns”? Yes of course, but not in the normal sense, not vertical slices of tables. Columns are stored linearly like below:

Figure 5. Columnar or column-oriented database (source: Wikipedia, link)

So that is the difference between a data field, a data attribute and a data column. I hope you enjoyed reading it. As usual I welcome any comments, feedback and corrections.

Paul Johnson

Founder @ VLDB | Software, Cloud & Data Analytics | MPP SQL Expert

4 个月

Field = file, column = database, attribute = logical model?

Mohamad Ikhsan Nurulloh

Data Engineer | Business Intelligence | Cloud Engineer| Data Analyst | Tableau | Power BI | GCP | Azure | AWS | Alibaba Cloud

4 个月

Thanks for sharing

Peter B.L. Rasmussen

Senior C#/BI/SQL developer

4 个月

A field is as you state an area of grass. A field was originally an area on screen where you can enter data.

要查看或添加评论,请登录

社区洞察

其他会员也浏览了