Implementing Hub-and-Spoke Architecture with Azure Databricks
Aritra Ghosh
Founder at Vidyutva | EV | Solutions Architect | Azure & AI Expert | Ex- Infosys | Passionate about innovating for a sustainable future in Electric Vehicle Ecosystem and AI
The Hub-and-Spoke model in Azure Databricks is designed to enhance security, governance, and scalability by separating centralized services (Hub) from workload execution (Spokes). This pattern is widely used for multi-team collaboration, data governance, and network segmentation in large-scale Azure environments.
Architecture Overview
Typical Components:
Implementation Steps
1. Set Up the Hub Virtual Network (VNet)
2. Create Spoke Virtual Networks for Databricks Workspaces
3. Enable Private Link for Secure Databricks Access
4. Configure Unity Catalog for Data Governance
5. Configure Secure Storage with ADLS (Hub)
6. Implement Network Security Policies
7. Deploy and Test Workload Execution
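Step 5 could look roughly like the following in Terraform. This is a minimal sketch, not a complete setup: it assumes the resource group hub-rg already exists and reuses the azurerm_virtual_network.hub_vnet resource from the script later in this article. The storage account name, the hub-private-endpoints subnet, and its address range are illustrative placeholders.

```hcl
# Hypothetical names throughout; adjust to your environment.
resource "azurerm_storage_account" "hub_datalake" {
  name                          = "hubdatalake001" # placeholder; must be globally unique
  resource_group_name           = "hub-rg"
  location                      = "East US"
  account_tier                  = "Standard"
  account_replication_type      = "LRS"
  account_kind                  = "StorageV2"
  is_hns_enabled                = true  # hierarchical namespace turns this into ADLS Gen2
  public_network_access_enabled = false # reachable only through private endpoints
}

# Dedicated hub subnet for private endpoints (address range is an assumption).
resource "azurerm_subnet" "hub_private_endpoints" {
  name                 = "hub-private-endpoints"
  resource_group_name  = "hub-rg"
  virtual_network_name = azurerm_virtual_network.hub_vnet.name
  address_prefixes     = ["10.0.2.0/24"]
}

# Private endpoint for the Data Lake (dfs) endpoint of the storage account.
resource "azurerm_private_endpoint" "datalake_pe" {
  name                = "hub-datalake-pe"
  location            = "East US"
  resource_group_name = "hub-rg"
  subnet_id           = azurerm_subnet.hub_private_endpoints.id

  private_service_connection {
    name                           = "hub-datalake-connection"
    private_connection_resource_id = azurerm_storage_account.hub_datalake.id
    subresource_names              = ["dfs"]
    is_manual_connection           = false
  }
}
```

With this layout, spoke workspaces would reach the storage account over VNet peering. Name resolution for the private endpoint usually also needs a private DNS zone (privatelink.dfs.core.windows.net), which is left out here for brevity.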
Benefits of Hub-and-Spoke in Databricks
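- Centralized Governance – Unity Catalog applies one set of access policies across all workspaces.
- Network Segmentation – Each workspace sits in its own spoke VNet, and because VNet peering is not transitive, spokes cannot reach each other unless traffic is routed through the hub.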
- Scalability – Add more Databricks workspaces (Spokes) without affecting the Hub.
- Cost Optimization – Shared infrastructure reduces duplicate resource costs.
- Enhanced Security – Private Link, NSGs, and Azure Firewall improve security posture.
Here is a Terraform template to automate the Hub-and-Spoke architecture setup for Azure Databricks. It includes:
- Hub Virtual Network with an Azure Firewall
- Spoke Virtual Network with a Databricks Workspace
- VNet Peering between Hub and Spoke
- Private Endpoint for Databricks
- Unity Catalog Integration (Commented for Future Use)
Terraform Code: Hub-and-Spoke for Azure Databricks
Here is a sample Terraform script.
# Assumes the resource groups "hub-rg" and "spoke-rg" already exist.
provider "azurerm" {
  features {}
}
# ---------------- HUB NETWORK ----------------
resource "azurerm_virtual_network" "hub_vnet" {
  name                = "hub-vnet"
  location            = "East US"
  resource_group_name = "hub-rg"
  address_space       = ["10.0.0.0/16"]
}
# Azure requires the firewall subnet to be named exactly "AzureFirewallSubnet".
resource "azurerm_subnet" "firewall_subnet" {
  name                 = "AzureFirewallSubnet"
  resource_group_name  = "hub-rg"
  virtual_network_name = azurerm_virtual_network.hub_vnet.name
  address_prefixes     = ["10.0.1.0/24"]
}
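# The firewall needs a Standard, static public IP for its IP configuration.
resource "azurerm_public_ip" "firewall_pip" {
  name                = "hub-firewall-pip"
  location            = "East US"
  resource_group_name = "hub-rg"
  allocation_method   = "Static"
  sku                 = "Standard"
}
resource "azurerm_firewall" "hub_firewall" {
  name                = "hub-firewall"
  location            = "East US"
  resource_group_name = "hub-rg"
  sku_name            = "AZFW_VNet"
  sku_tier            = "Standard" # required alongside sku_name
  # Attaches the firewall to AzureFirewallSubnet in the hub VNet.
  ip_configuration {
    name                 = "hub-firewall-ipconfig"
    subnet_id            = azurerm_subnet.firewall_subnet.id
    public_ip_address_id = azurerm_public_ip.firewall_pip.id
  }
}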
# ---------------- SPOKE NETWORK ----------------
resource "azurerm_virtual_network" "spoke_vnet" {
  name                = "spoke-vnet"
  location            = "East US"
  resource_group_name = "spoke-rg"
  address_space       = ["10.1.0.0/16"]
}
resource "azurerm_subnet" "databricks_subnet" {
  name                 = "databricks-subnet"
  resource_group_name  = "spoke-rg"
  virtual_network_name = azurerm_virtual_network.spoke_vnet.name
  address_prefixes     = ["10.1.1.0/24"]
}
# ---------------- VNET PEERING ----------------
# Peering must be created in both directions before traffic can flow.
resource "azurerm_virtual_network_peering" "hub_to_spoke" {
  name                      = "hub-to-spoke"
  resource_group_name       = "hub-rg"
  virtual_network_name      = azurerm_virtual_network.hub_vnet.name
  remote_virtual_network_id = azurerm_virtual_network.spoke_vnet.id
}
resource "azurerm_virtual_network_peering" "spoke_to_hub" {
  name                      = "spoke-to-hub"
  resource_group_name       = "spoke-rg"
  virtual_network_name      = azurerm_virtual_network.spoke_vnet.name
  remote_virtual_network_id = azurerm_virtual_network.hub_vnet.id
}
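# ---------------- DATABRICKS WORKSPACE ----------------
# Note: without a custom_parameters block (VNet injection), cluster nodes run in a
# Databricks-managed VNet rather than in spoke-vnet.
resource "azurerm_databricks_workspace" "databricks" {
  name                        = "databricks-ws"
  location                    = "East US"
  resource_group_name         = "spoke-rg"
  sku                         = "premium" # Private Link requires the premium tier
  managed_resource_group_name = "databricks-managed-rg"
}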
# ---------------- PRIVATE ENDPOINT FOR DATABRICKS ----------------
resource "azurerm_private_endpoint" "databricks_pe" {
  name                = "databricks-private-endpoint"
  location            = "East US"
  resource_group_name = "spoke-rg"
  subnet_id           = azurerm_subnet.databricks_subnet.id
  private_service_connection {
    name                           = "databricks-connection"
    private_connection_resource_id = azurerm_databricks_workspace.databricks.id
    subresource_names              = ["databricks_ui_api"]
    is_manual_connection           = false
  }
}
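# ---------------- (OPTIONAL) UNITY CATALOG SETUP ----------------
# Uncomment this when Unity Catalog is enabled in your Databricks account.
# It requires the databricks/databricks provider in addition to azurerm.
# resource "databricks_metastore" "unity_catalog" {
#   name   = "databricks-unity-catalog"
#   region = "eastus" # region short name rather than the display name "East US"
# }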
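The workspace resource in the script does not set custom_parameters, so its clusters would not actually run inside spoke-vnet. The following is a sketch of VNet injection, meant to be used in place of the workspace block above rather than alongside it. The subnet names, address ranges, and NSG name are assumptions for illustration; the delegation and custom_parameters arguments are those of the azurerm provider.

```hcl
# Sketch of VNet injection; subnet names and address ranges are illustrative.
locals {
  databricks_delegation_actions = [
    "Microsoft.Network/virtualNetworks/subnets/join/action",
    "Microsoft.Network/virtualNetworks/subnets/prepareNetworkPolicies/action",
    "Microsoft.Network/virtualNetworks/subnets/unprepareNetworkPolicies/action",
  ]
}

# Databricks manages the rules of this NSG once the subnets are delegated.
resource "azurerm_network_security_group" "databricks_nsg" {
  name                = "databricks-nsg"
  location            = "East US"
  resource_group_name = "spoke-rg"
}

# "Public" (host) subnet, delegated to Databricks.
resource "azurerm_subnet" "dbx_public" {
  name                 = "databricks-public"
  resource_group_name  = "spoke-rg"
  virtual_network_name = azurerm_virtual_network.spoke_vnet.name
  address_prefixes     = ["10.1.2.0/24"]

  delegation {
    name = "databricks"
    service_delegation {
      name    = "Microsoft.Databricks/workspaces"
      actions = local.databricks_delegation_actions
    }
  }
}

# "Private" (container) subnet, delegated to Databricks.
resource "azurerm_subnet" "dbx_private" {
  name                 = "databricks-private"
  resource_group_name  = "spoke-rg"
  virtual_network_name = azurerm_virtual_network.spoke_vnet.name
  address_prefixes     = ["10.1.3.0/24"]

  delegation {
    name = "databricks"
    service_delegation {
      name    = "Microsoft.Databricks/workspaces"
      actions = local.databricks_delegation_actions
    }
  }
}

resource "azurerm_subnet_network_security_group_association" "dbx_public" {
  subnet_id                 = azurerm_subnet.dbx_public.id
  network_security_group_id = azurerm_network_security_group.databricks_nsg.id
}

resource "azurerm_subnet_network_security_group_association" "dbx_private" {
  subnet_id                 = azurerm_subnet.dbx_private.id
  network_security_group_id = azurerm_network_security_group.databricks_nsg.id
}

# Replaces the workspace block from the script; do not declare both.
resource "azurerm_databricks_workspace" "databricks" {
  name                        = "databricks-ws"
  location                    = "East US"
  resource_group_name         = "spoke-rg"
  sku                         = "premium"
  managed_resource_group_name = "databricks-managed-rg"

  custom_parameters {
    no_public_ip        = true # secure cluster connectivity: nodes get no public IPs
    virtual_network_id  = azurerm_virtual_network.spoke_vnet.id
    public_subnet_name  = azurerm_subnet.dbx_public.name
    private_subnet_name = azurerm_subnet.dbx_private.name

    public_subnet_network_security_group_association_id  = azurerm_subnet_network_security_group_association.dbx_public.id
    private_subnet_network_security_group_association_id = azurerm_subnet_network_security_group_association.dbx_private.id
  }
}
```

The private endpoint can stay in databricks-subnet in this variant, because delegated subnets cannot host private endpoints and the two new subnets are reserved for cluster nodes.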
Explanation of the Terraform Script
1. Creates a Hub Virtual Network (hub-vnet) with an Azure Firewall
2. Creates a Spoke Virtual Network (spoke-vnet) for Databricks
3. Establishes VNet Peering between Hub and Spoke in both directions
4. Deploys an Azure Databricks Workspace (databricks-ws) in the spoke resource group; running its clusters inside spoke-vnet additionally requires VNet injection (a custom_parameters block with delegated subnets)
5. Sets up a Private Endpoint for the Databricks UI and API (databricks_ui_api), so users reach the workspace over a private IP
6. (Optional) Unity Catalog setup for centralized governance (commented out for now)
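A private endpoint is only useful if the workspace URL resolves to its private IP. A minimal sketch of the DNS side is shown below, assuming the zone lives in spoke-rg (many hub-and-spoke designs keep private DNS zones in the hub instead). The resource labels and link name are hypothetical.

```hcl
# Private DNS zone used by Azure Databricks Private Link.
resource "azurerm_private_dns_zone" "databricks" {
  name                = "privatelink.azuredatabricks.net"
  resource_group_name = "spoke-rg"
}

# Link the zone to the spoke VNet so resources in it resolve the private IP.
resource "azurerm_private_dns_zone_virtual_network_link" "spoke" {
  name                  = "databricks-dns-spoke-link"
  resource_group_name   = "spoke-rg"
  private_dns_zone_name = azurerm_private_dns_zone.databricks.name
  virtual_network_id    = azurerm_virtual_network.spoke_vnet.id
}

# To register the endpoint's A record automatically, add this block inside
# azurerm_private_endpoint.databricks_pe:
#
#   private_dns_zone_group {
#     name                 = "databricks-dns-group"
#     private_dns_zone_ids = [azurerm_private_dns_zone.databricks.id]
#   }
```

Clients in other VNets, such as the hub, would need their own virtual network link to the same zone.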
Next Steps
terraform init
terraform apply -auto-approve
Once deployed, you can enable the Unity Catalog block and assign the metastore to the workspace for centralized governance.
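The Unity Catalog step might look like the sketch below. It assumes the databricks/databricks provider, an account-level provider configuration with authentication already in place (for example through the Azure CLI), and a hypothetical variable databricks_account_id. Treat it as a starting point to check against the provider documentation for your version, not a tested configuration.

```hcl
terraform {
  required_providers {
    databricks = {
      source = "databricks/databricks"
    }
  }
}

# Hypothetical variable; supply your own Databricks account ID.
variable "databricks_account_id" {
  type        = string
  description = "Databricks account ID used for account-level resources"
}

# Account-level provider; credentials are assumed to come from the environment.
provider "databricks" {
  alias      = "account"
  host       = "https://accounts.azuredatabricks.net"
  account_id = var.databricks_account_id
}

# One metastore per region; no metastore-level storage root is set in this sketch.
resource "databricks_metastore" "unity_catalog" {
  provider      = databricks.account
  name          = "databricks-unity-catalog"
  region        = "eastus"
  force_destroy = false
}

# Attach the metastore to the workspace created by the script.
resource "databricks_metastore_assignment" "spoke_workspace" {
  provider     = databricks.account
  metastore_id = databricks_metastore.unity_catalog.id
  workspace_id = azurerm_databricks_workspace.databricks.workspace_id
}
```

Each additional spoke workspace in the same region would get its own databricks_metastore_assignment pointing at the same metastore, which is what makes governance centralized.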