Friday Fun - adding columns to a table in Databricks

In the big data world, where we build structured views on top of flat files (in most cases), adding a column after the fact is a painful task. For row-based formats like CSV, Avro and JSON, it can mean creating a new structure (with the new column), copying the existing data into the new structure, removing the old data set and then saving the new structure.

With column-oriented storage like ORC and Parquet, this task is slightly easier. Adding a new column means allocating space for it and updating the schema. In most cases, the addition creates the column with null values for existing rows; after the addition, we update the column with proper values.

In Databricks, the syntax to add a column to a table is

alter table [table_name] add column [new_column_name] [data_type]        
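
As a minimal sketch, assuming a hypothetical Delta table named sales and a hypothetical column discount_pct, the statements below add the column at the end of the schema and then back-fill it, matching the null-then-update flow described above:

-- add the column; existing rows get null in discount_pct
alter table sales add column discount_pct double
-- back-fill the new column with a proper value
update sales set discount_pct = 0.0 where discount_pct is null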

But this adds the column at the end of the table definition. While that is not a problem in itself, the testing team can make life miserable when the mapping document shows the column after an existing column and not at the end. I believe Databricks went through this pain and has provided support for this situation, as shown below

alter table [table_name] add column [new_column_name] [data_type] after [existing_column_name]
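
For example, on the same hypothetical sales table with an existing order_date column, this places the new column immediately after order_date:

alter table sales add column discount_pct double after order_date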

What if the new column has to be the first column? This is also supported, as shown below

alter table [table_name] add column [new_column_name] [data_type] first        
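
Again as a hypothetical sketch on the sales table, this makes the new column the first column in the schema:

alter table sales add column record_id string first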

#databricks #parquet #column_format #bigdata #big_data #alter_table
