Automating Azure Databricks Provisioning with Terraform and Establishing CI/CD Pipelines in Azure DevOps

Nam Phuong Tran

Introduction

Azure Databricks is a powerful analytics platform built on Apache Spark, tailor-made for Azure. It fosters collaboration between data engineers, data scientists, and machine learning experts, facilitating their work on large-scale data and advanced analytics projects. On the other hand, Terraform, an open-source Infrastructure as Code (IaC) tool developed by HashiCorp, empowers users to define and provision infrastructure resources using a declarative configuration language.

Objective

In this guide, I’ll walk through integrating these two technologies using the Databricks Terraform provider. After that, I will set up CI/CD by writing YAML pipeline files and then configuring Azure DevOps. This combination is the recommended approach for managing Databricks workspaces and their associated resources in Azure.
The Terraform code is written as modules, which is a common best practice. Below, I show how I provision Azure Databricks and the other resources it needs.

Let’s get started writing the Terraform scripts using modules

Setting Up Your Terraform Environment

Before we dive into the specifics, there are some prerequisites for using Terraform and the Databricks Terraform provider:
Azure Account: Ensure you have an Azure account.
Azure Admin User: You need to be an account-level admin user in your Azure account.
Development Machine Setup: On your local development machine, you should have the Terraform CLI and Azure CLI installed and configured. Make sure you are signed in via the az login command with a user that has Contributor or Owner rights to your subscription. Note that creating the role assignments in Step 7 requires Owner (or User Access Administrator) rights, because Contributor cannot assign roles.

Project Structure
Organize your project into a folder for your Terraform scripts; let’s call it “databricks.” We will create several configuration files to handle authentication and resource provisioning.
Version and Provider Configuration
In your Terraform project, create a versions.tf file to specify the Terraform version and required providers:

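terraform {
  required_version = ">= 1.2, < 1.5"
  required_providers {
    azurerm = {
      source = "hashicorp/azurerm"
    }
    # Needed by the Microsoft Entra ID module in Step 1 (azuread_* resources).
    # The 2.x constraint is an assumption: that module uses the 2.x argument names.
    azuread = {
      source  = "hashicorp/azuread"
      version = "~> 2.0"
    }
    # Needed by the time_rotating resource in Step 1.
    time = {
      source = "hashicorp/time"
    }
    databricks = {
      source  = "databricks/databricks"
      version = "1.28.0"
    }
  }
}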

Now, let’s define the providers in a providers.tf file:

provider "azurerm" {
  features {}
}

provider "databricks" {
  # We'll revisit this section later
}

We’ll return to the Databricks provider configuration shortly.
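One common way to fill in the Databricks provider block, sketched here as an assumption rather than the author's final configuration, is to point it at the workspace created by the databricks module. The output name databricks_workspace_url is hypothetical; databricks_workspace_id is the output that main.tf references in Step 8.

```hcl
# Sketch only: assumes the databricks module exposes these two outputs.
provider "databricks" {
  # Workspace URL taken from azurerm_databricks_workspace (hypothetical output name)
  host = module.dbrk.databricks_workspace_url

  # Azure resource ID of the workspace, used for Azure-native authentication
  azure_workspace_resource_id = module.dbrk.databricks_workspace_id
}
```

Because this provider configuration depends on a resource created in the same run, the first apply can be fragile; splitting workspace creation and workspace content into separate runs is a safer alternative if that becomes a problem.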
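Backend Configuration
The backend block is intentionally left empty (a partial configuration). The storage account details for the remote state are supplied when the pipeline runs terraform init.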

terraform {
  backend "azurerm" {    
  }
}
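For running terraform init outside the pipeline, the missing backend settings can be supplied from a separate file. The sketch below reuses the values that appear in deploy.yaml later in this article; the file name backend.hcl is an assumption.

```hcl
# backend.hcl (hypothetical file name)
# Usage: terraform init -backend-config=backend.hcl
resource_group_name  = "rg-terraform-non-prod-weu"
storage_account_name = "stteranonprodweu"
container_name       = "pipeline-databricks"
key                  = "terraform.tfstate"
```

Keeping these values out of the backend block lets the same code target a different state location per environment.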

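Getting started writing the Terraform script to build Azure Databricks infrastructure

First, I create a project structure with one folder per module under modules/ (entra-id, resource-group, virtual-network, storage-account, databricks, keyvault, and role-assigment), next to the root main.tf, versions.tf, and providers.tf files.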

Step 1: Write Microsoft Entra ID module

I need this module because it creates an app registration in Microsoft Entra ID along with a client secret, which we can expose as outputs for later use. For example, I store them in Azure Key Vault so that Databricks can retrieve them securely.

data "azurerm_client_config" "current" {
}

data "azuread_client_config" "current_ad" {

}

resource "azuread_application" "app" {
  display_name = join("-", [var.resource_type,var.application,var.application_environment,var.region_short])
  owners       = [data.azuread_client_config.current_ad.object_id]
}

resource "azuread_service_principal" "sp" {
  application_id = azuread_application.app.application_id
  use_existing   = true
}

resource "time_rotating" "tro" {
  rotation_days = var.rotation_days
}

resource "azuread_application_password" "pass" {
  application_object_id = azuread_application.app.object_id
  display_name = join("-", [var.resource_type,var.application,var.application_environment,var.region_short])
  rotate_when_changed = {
    rotation = time_rotating.tro.id
  }
}

This module creates an app registration, a client secret, and a service principal in Microsoft Entra ID. The identity that deploys it later needs the corresponding permissions in Entra ID. For details, see https://registry.terraform.io/providers/hashicorp/azuread/latest/docs/resources/application#api-permissions.
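The main.tf file in Step 8 reads several values from this module (tenant_id, subscription_id, object_id, client_id, service_principal_id, app_secret), so the module needs matching outputs. The article does not show them; the following outputs.tf is a sketch, and the mapping of each output to a source attribute is an assumption.

```hcl
# modules/entra-id/outputs.tf (sketch; attribute choices are assumptions)
output "tenant_id" {
  value = data.azurerm_client_config.current.tenant_id
}

output "subscription_id" {
  value = data.azurerm_client_config.current.subscription_id
}

# Object ID of the identity running Terraform
output "object_id" {
  value = data.azurerm_client_config.current.object_id
}

# Application (client) ID of the app registration (azuread 2.x attribute name)
output "client_id" {
  value = azuread_application.app.application_id
}

# Object ID of the service principal, used for access policies and role assignments
output "service_principal_id" {
  value = azuread_service_principal.sp.object_id
}

# Generated client secret; marked sensitive so it is hidden in plan output
output "app_secret" {
  value     = azuread_application_password.pass.value
  sensitive = true
}
```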

Step 2: Create a resource group

# Create a resource group
resource "azurerm_resource_group" "rg" {
  name     = join("-", [var.resource_type,var.application,var.application_environment,var.region_short])
  location = var.region
  tags = merge(var.default_tags,{
    Env = var.application_environment
  })
}
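The resource above relies on a handful of input variables, and the other modules read the resource group name back from it. A minimal sketch of the module's variables and output could look like this; the types and the empty default are assumptions.

```hcl
# modules/resource-group/variables.tf (sketch)
variable "resource_type" {
  type = string
}

variable "application" {
  type = string
}

variable "application_environment" {
  type = string
}

variable "region" {
  type = string
}

variable "region_short" {
  type = string
}

variable "default_tags" {
  type    = map(string)
  default = {}
}

# modules/resource-group/outputs.tf (sketch)
# Referenced in main.tf as module.rg.resource_group_name
output "resource_group_name" {
  value = azurerm_resource_group.rg.name
}
```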

Step 3: Create a Virtual Network (Vnet) – Optional, but Important

Whether to create a VNet depends on your specific use case. If you’re exclusively using Azure Databricks and don’t require outbound access or a high level of security, you can skip this step. However, if you need to interact with services outside of Azure, it’s advisable to create a VNet.
Consider the scenario where you want Azure Databricks to access MongoDB Atlas, which resides outside of Azure. MongoDB Atlas secures its infrastructure by allowing only specific IPs from an allowlist. Exposing Azure Databricks to the internet isn’t an ideal solution. Instead, you can create a VNet and set up peering or a private endpoint.
Note that you can’t add a VNet to an existing workspace. Once a workspace is created, its configuration is registered in the control plane and can’t be modified.
Within this VNet, we’ll create two subnets: a public subnet and a private subnet. We’ll also create a network security group (NSG) and associate it with both subnets.

resource "azurerm_virtual_network" "vnet" {
  name                     = join("-", [var.resource_type,var.application,var.application_environment,var.region_short])
  resource_group_name      = var.resource_group_name
  location                 = var.region
  address_space            = [var.cidr]

   tags = merge(var.default_tags,{
    Env                  = var.application_environment,    
    ApplicationName      = "Databricks"
  })
}

resource "azurerm_network_security_group" "nsg" {
  name                = join("-", ["nsg",var.application,var.application_environment,var.region_short])
  resource_group_name      = var.resource_group_name
  location                 = var.region
}

resource "azurerm_subnet" "public" {
  name                 = "subnet-public"
  resource_group_name  = var.resource_group_name
  virtual_network_name = azurerm_virtual_network.vnet.name
  address_prefixes     = [cidrsubnet(var.cidr, 3, 0)]

  delegation {
    name = "databricks"
    service_delegation {
      name = "Microsoft.Databricks/workspaces"
      actions = [
        "Microsoft.Network/virtualNetworks/subnets/join/action",
        "Microsoft.Network/virtualNetworks/subnets/prepareNetworkPolicies/action",
        "Microsoft.Network/virtualNetworks/subnets/unprepareNetworkPolicies/action"]
    }
  }
}

resource "azurerm_subnet_network_security_group_association" "public" {
  subnet_id                 = azurerm_subnet.public.id
  network_security_group_id = azurerm_network_security_group.nsg.id
}

resource "azurerm_subnet" "private" {
  name                 = "subnet-private"
  resource_group_name  = var.resource_group_name
  virtual_network_name = azurerm_virtual_network.vnet.name
  address_prefixes     = [cidrsubnet(var.cidr, 3, 1)]

  delegation {
    name = "databricks"
    service_delegation {
      name = "Microsoft.Databricks/workspaces"
      actions = [
        "Microsoft.Network/virtualNetworks/subnets/join/action",
        "Microsoft.Network/virtualNetworks/subnets/prepareNetworkPolicies/action",
        "Microsoft.Network/virtualNetworks/subnets/unprepareNetworkPolicies/action"]
    }
  }
}

resource "azurerm_subnet_network_security_group_association" "private" {
  subnet_id                 = azurerm_subnet.private.id
  network_security_group_id = azurerm_network_security_group.nsg.id
}
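The module call in Step 8 does not pass a cidr value, so var.cidr presumably has a default. The sketch below uses a placeholder address space to show what the two cidrsubnet calls produce, and adds the outputs that main.tf reads from this module. The default value and the attribute behind each output are assumptions.

```hcl
# modules/virtual-network/variables.tf (sketch)
variable "cidr" {
  type        = string
  description = "Address space for the VNet (placeholder default)."
  default     = "10.0.0.0/16"
}

# With the default above, adding 3 bits to the /16 prefix yields /19 subnets:
#   cidrsubnet("10.0.0.0/16", 3, 0) = "10.0.0.0/19"   (public subnet)
#   cidrsubnet("10.0.0.0/16", 3, 1) = "10.0.32.0/19"  (private subnet)

# modules/virtual-network/outputs.tf (sketch)
output "vnet_id" {
  value = azurerm_virtual_network.vnet.id
}

output "public_subnet_name" {
  value = azurerm_subnet.public.name
}

output "private_subnet_name" {
  value = azurerm_subnet.private.name
}

output "public_nsg_ass_id" {
  value = azurerm_subnet_network_security_group_association.public.id
}

output "private_nsg_ass_id" {
  value = azurerm_subnet_network_security_group_association.private.id
}
```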

Step 4: Create a Storage account and Data Lake

Azure has offered two Data Lake services, but Azure Data Lake Storage Gen2 is now the standard choice. Azure Data Lake Storage Gen2 is a set of capabilities dedicated to big data analytics, built on Azure Blob Storage. It makes Azure Storage the foundation for building enterprise data lakes on Azure. As a result, we only need one Storage Account module, and we can create either a plain Storage Account or a Data Lake by changing its parameters (is_hns_enabled turns on the hierarchical namespace).
Here is the module for the Storage Account or Data Lake:

# Create a Storage Account
resource "azurerm_storage_account" "st" {
  # Storage account names must be globally unique, 3-24 characters, lowercase letters and digits only,
  # so the name parts are joined without a separator.
  name                            = join("", [var.resource_type,var.application,var.application_environment,var.region_short])
  resource_group_name             = var.resource_group_name
  location                        = var.region
  account_tier                    = "Standard"
  account_replication_type        = "LRS"
  min_tls_version                 = "TLS1_2"
  allow_nested_items_to_be_public = false
  account_kind                    = "StorageV2"
  is_hns_enabled                  = var.is_hns_enabled
  
  static_website {
    error_404_document = "errors.html"
    index_document     = "index.html"
  }

  tags = merge(var.default_tags,{
    Env = var.application_environment
    ApplicationName = "Databricks"
  })
}

resource "azurerm_storage_container" "st_container" {
  name                  = var.container_name
  storage_account_name  = azurerm_storage_account.st.name
  container_access_type = "private"
}
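To let the same module serve both cases, is_hns_enabled needs a default of false, and main.tf reads the account ID and access key back from the module. A sketch of those pieces follows; the variable defaults and the choice of primary_access_key for the key output are assumptions.

```hcl
# modules/storage-account/variables.tf (sketch, only the module-specific inputs)
variable "is_hns_enabled" {
  type        = bool
  description = "Set to true to create a Data Lake Storage Gen2 account."
  default     = false
}

variable "container_name" {
  type = string
}

# modules/storage-account/outputs.tf (sketch)
# Referenced in main.tf as module.st.account_id / module.dl.account_id
output "account_id" {
  value = azurerm_storage_account.st.id
}

# Referenced in main.tf as module.st.account_access_key
output "account_access_key" {
  value     = azurerm_storage_account.st.primary_access_key
  sensitive = true
}
```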

Step 5: Create Azure Databricks service

resource "azurerm_databricks_workspace" "dbw" {
  name                          = join("-", [var.resource_type,var.application,var.application_environment,var.region_short])
  resource_group_name      = var.resource_group_name
  location                 = var.region
  sku                           = "premium"
  managed_resource_group_name   = join("-", ["rg","databrick-managed",var.application_environment,var.region_short])
  public_network_access_enabled = var.public_network_access_enabled
  custom_parameters {
    no_public_ip                                         = var.no_public_ip
    virtual_network_id                                   = var.vnet_id
    private_subnet_name                                  = var.private_subnet_name
    public_subnet_name                                   = var.public_subnet_name
    public_subnet_network_security_group_association_id  = var.public_nsg_ass_id
    private_subnet_network_security_group_association_id = var.private_nsg_ass_id
  }
  depends_on = [var.public_nsg_ass, var.private_nsg_ass ]

   tags = merge(var.default_tags,{
    Env = var.application_environment,    
    ApplicationName = upper(var.application)
  })
}

data "databricks_node_type" "dbr_node_type" {
  local_disk = true
  depends_on = [azurerm_databricks_workspace.dbw]
}

data "databricks_spark_version" "dbr_spark" {
  long_term_support = true
  depends_on        = [azurerm_databricks_workspace.dbw]
}

resource "databricks_instance_pool" "dbr_instance_pool" {
  instance_pool_name = join("-", ["pool",var.application,var.application_environment,var.region_short])
  min_idle_instances = 0
  max_capacity       = 5
  node_type_id       = data.databricks_node_type.dbr_node_type.id

  idle_instance_autotermination_minutes = 30

  azure_attributes {
    availability       = "ON_DEMAND_AZURE"
    spot_bid_max_price = -1
  }

  disk_spec {
    disk_type {
      azure_disk_volume_type = "PREMIUM_LRS"
    }
    disk_size  = 80
    disk_count = 1
  }
}

resource "databricks_cluster" "cluster" {
  cluster_name            = join("-", ["dbc",var.application,var.application_environment,var.region_short])
  spark_version           = data.databricks_spark_version.dbr_spark.id
  node_type_id            = data.databricks_node_type.dbr_node_type.id
  autotermination_minutes = 30

  autoscale {
    min_workers = 2
    max_workers = 5
  }

  spark_conf = {
    # Enables the disk cache; note that the setting name ends in "enabled"
    "spark.databricks.io.cache.enabled" = "true"
  }

  depends_on = [azurerm_databricks_workspace.dbw]
}

When an Azure Databricks service is created, Azure automatically provisions an additional managed resource group, so I customize the name of that managed resource group. In addition, I set some default configuration such as autoscaling, auto-termination, and the VNet and subnets with their network security group associations. Because every Databricks resource depends on the Azure Databricks workspace, depends_on is listed explicitly to make sure the workspace is created first.
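Other parts of the configuration need to refer to the workspace, for example the role assignment in Step 8 that uses module.dbrk.databricks_workspace_id. A sketch of the corresponding outputs is shown here; databricks_workspace_url is a hypothetical output name that is not used in the original code.

```hcl
# modules/databricks/outputs.tf (sketch)
# Azure resource ID of the workspace, used as a role assignment scope
output "databricks_workspace_id" {
  value = azurerm_databricks_workspace.dbw.id
}

# Workspace URL, useful for configuring the Databricks provider (hypothetical output)
output "databricks_workspace_url" {
  value = azurerm_databricks_workspace.dbw.workspace_url
}
```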

Step 6: Create Azure Key Vault

Key Vault stores the sensitive configuration data securely. Here I also grant the service principal created above permission to read the contents of the Key Vault.

resource "azurerm_key_vault" "kv" {  
  # The key vault name is globally unique across all Azure tenants
  name                     = join("-", [var.resource_type,var.application,var.application_environment,var.region_short])
  resource_group_name      = var.resource_group_name
  location                 = var.region
  enabled_for_disk_encryption = true
  tenant_id                   = var.tenant_id
  #soft_delete_enabled         = true       # no longer configurable, enabled by default
  soft_delete_retention_days  = 7
  purge_protection_enabled    = false

  sku_name = "standard"


  lifecycle {
    # prevent_destroy = true
  }

  access_policy {
    tenant_id = var.tenant_id
    object_id = var.object_id

    key_permissions = [
      "Get",
      "List",
      "Purge",
      "Create",
      "Update",
    ]

    ## this sets the permissions for the object
    secret_permissions = [
      "Get",
      "List",
      "Delete",
      "Set",
      "Purge",
    ]

    storage_permissions = [
      "Get",
      "List",
      "Set",
    ]

    certificate_permissions = [
      "Get",
      "List",
      "Create",
      "Purge",
      "Import",
      "Update",
    ]
  }

  ## Read-only access policy for the service principal, so the application can read the secrets/certs
  access_policy {
    tenant_id = var.tenant_id
    object_id = var.service_principal_id  ## id of the identity created for the application

    key_permissions = [
      "Get",
      "List",
    ]

    ## this sets the permissions for the object
    secret_permissions = [
      "Get",
      "List",
    ]

    storage_permissions = [
      "Get",
      "List",
    ]

    certificate_permissions = [
      "Get",
      "List",
    ]
  }
  
  tags = merge(var.default_tags,{
    Env = var.application_environment,    
    ApplicationName = upper(var.application)
  })
}
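The secrets created in Step 8 refer to module.kv.key_vault_id, so the module has to expose the vault ID. A minimal sketch follows; key_vault_uri is an extra, hypothetical output that is not used in the original code.

```hcl
# modules/keyvault/outputs.tf (sketch)
# Referenced in main.tf as module.kv.key_vault_id
output "key_vault_id" {
  value = azurerm_key_vault.kv.id
}

# DNS name of the vault (hypothetical output, handy when wiring up a secret scope)
output "key_vault_uri" {
  value = azurerm_key_vault.kv.vault_uri
}
```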

Step 7: Write role assignment module

We use this module a lot. It makes it easy to assign a role to a principal at the scope we want. For example, we need it so that Databricks can ingest data using Auto Loader with file notifications, or so that Databricks can access the Blob Storage account.

resource "azurerm_role_assignment" "role" {
  scope                         = var.scope
  role_definition_name          = var.role_definition_name
  principal_id                  = var.principal_id
}

Step 8: Call modules to main.tf file to complete the code

Let’s finish the code by calling all the modules from the main.tf file:

module "sp" {
  source                  = "./modules/entra-id"
  resource_type           = "sp"
  application             = var.application
  application_environment = var.workload_environments
  region                  = var.region
  region_short            = var.region_short
  rotation_days           = 365
  databricks_workspace    = module.dbrk.databricks_workspace
}

module "rg" {
  source                  = "./modules/resource-group"
  resource_type           = "rg"
  application             = var.application
  application_environment = var.workload_environments
  region                  = var.region
  region_short            = var.region_short
  default_tags            = var.default_tags
}

module "vnet" {
  source                  = "./modules/virtual-network"
  resource_type           = "vnet"
  application             = var.application
  application_environment = var.workload_environments
  region                  = var.region
  resource_group_name     = module.rg.resource_group_name
  region_short            = var.region_short
  default_tags            = var.default_tags
}

module "st" {  
  source                          = "./modules/storage-account"
  resource_type                   = "st"
  application                     = var.application
  application_environment         = var.workload_environments
  region                          = var.region
  resource_group_name             = module.rg.resource_group_name
  region_short                    = var.region_short
  default_tags                    = var.default_tags
  container_name                  = "insights-data"
}

module "dl" {  
  source                          = "./modules/storage-account"
  resource_type                   = "dl"
  application                     = var.application
  application_environment         = var.workload_environments
  region                          = var.region
  resource_group_name             = module.rg.resource_group_name
  region_short                    = var.region_short
  default_tags                    = var.default_tags
  is_hns_enabled                  = true
  container_name                  = "application-insights"
}

module "dbrk" {
  source                  = "./modules/databricks"
  resource_type           = "dbw"
  application             = var.application
  application_environment = var.workload_environments
  region                  = var.region
  resource_group_name     = module.rg.resource_group_name
  region_short            = var.region_short
  default_tags            = var.default_tags
  vnet_id                 = module.vnet.vnet_id
  private_subnet_name     = module.vnet.private_subnet_name
  public_subnet_name      = module.vnet.public_subnet_name
  public_nsg_ass_id       = module.vnet.public_nsg_ass_id
  private_nsg_ass_id      = module.vnet.private_nsg_ass_id
  public_nsg_ass          = module.vnet.public_nsg_ass
  private_nsg_ass         = module.vnet.private_nsg_ass
}

module "kv" {
  source                         = "./modules/keyvault"
  resource_type                  = "kv"
  application                    = var.application
  application_environment        = var.workload_environments
  region                         = var.region
  resource_group_name            = module.rg.resource_group_name
  region_short                   = var.region_short  
  tenant_id      = module.sp.tenant_id
  object_id      = module.sp.object_id
  default_tags = var.default_tags
  service_principal_id = module.sp.service_principal_id
}

resource "azurerm_key_vault_secret" "kv_client_secret" {
  name         = "client-secret"
  value        = module.sp.app_secret
  key_vault_id = module.kv.key_vault_id
}

resource "azurerm_key_vault_secret" "kv_client_id" {
  name         = "client-id"
  value        = module.sp.client_id
  key_vault_id = module.kv.key_vault_id
}

resource "azurerm_key_vault_secret" "kv_tenant_id" {
  name         = "tenant-id"
  value        = module.sp.tenant_id
  key_vault_id = module.kv.key_vault_id
}

resource "azurerm_key_vault_secret" "kv_subscription_id" {
  name         = "subscription-id"
  value        = module.sp.subscription_id
  key_vault_id = module.kv.key_vault_id
}

resource "azurerm_key_vault_secret" "kv_blob_key" {
  name         = "blob-key"
  value        = module.st.account_access_key
  key_vault_id = module.kv.key_vault_id
}


# Assign roles so that the Databricks workspace can work with storage and use Auto Loader file notifications
# https://docs.databricks.com/en/ingestion/auto-loader/file-notification-mode.html
# https://learn.microsoft.com/en-us/azure/databricks/getting-started/connect-to-azure-storage
# Grant permission to read blobs as cloud files on the Azure Storage Account

module "st_roles" {
  for_each = var.roles_definition_name
  source = "./modules/role-assigment"
  scope = module.st.account_id
  role_definition_name = each.value
  principal_id = module.sp.service_principal_id
}

# Grant permission to create blobs as external table on Data Lake
module "dl_roles" {
  source = "./modules/role-assigment"
  scope = module.dl.account_id
  role_definition_name ="Storage Blob Data Contributor"
  principal_id = module.sp.service_principal_id
}


module "dbw_roles" {
  for_each = var.databricks_administrators
  source = "./modules/role-assigment"
  scope = module.dbrk.databricks_workspace_id
  role_definition_name = "Contributor"
  principal_id = each.value
}
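The two for_each loops above need collections to iterate over. The article does not show these root variables, so the following is a sketch: the types, the example role names, and the empty default are assumptions, and the exact role set should be checked against the Auto Loader documentation linked in the comments above.

```hcl
# variables.tf in the root module (sketch)

# Roles granted to the service principal on the storage account.
# With a set of strings, each.key and each.value are the same role name.
variable "roles_definition_name" {
  type = set(string)
  default = [
    "Contributor",
    "Storage Blob Data Contributor",
    "Storage Queue Data Contributor",
  ]
}

# Object IDs of the users or groups that should administer the workspace.
variable "databricks_administrators" {
  type    = set(string)
  default = []
}
```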

One very important point: because the Terraform code is organized into modules, each module needs its own versions.tf file. If the required_providers block is not defined in every module that uses the Databricks Terraform provider, Terraform may fail with a "registry does not have a provider" error. See https://kb.databricks.com/en_US/terraform/terraform-registry-does-not-have-a-provider-error.
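As an illustration, a module-level versions.tf for the databricks module could look like the sketch below. The file path is an assumption based on the module source used in main.tf; the version is pinned only in the root module.

```hcl
# modules/databricks/versions.tf (sketch)
terraform {
  required_providers {
    # Without this block Terraform looks for "hashicorp/databricks", which does not exist
    databricks = {
      source = "databricks/databricks"
    }
  }
}
```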

Creating a YAML CI/CD Configuration

Writing a build.yaml file for the CI

parameters:
- name: buildTag # defaults for any parameters that aren't specified
  default: ''
- name: vmImage
  default: ''

stages:
- stage: CI
  pool:
    vmImage: ${{ parameters.vmImage }}
  jobs:
  - job: Build
    displayName: Build
    steps:
    - checkout: self
      clean: 'true'
      path: s

    - task: Bash@3
      inputs:
        targetType: 'inline'
        script: |
          GIT_COMMIT=$(git rev-parse --short HEAD)
          echo "GIT_COMMIT: ${GIT_COMMIT}"
          # set env variable to allow next task to consume
          echo "##vso[task.setvariable variable=GIT_COMMIT]${GIT_COMMIT}"

    - task: PublishBuildArtifacts@1
      displayName: 'Publish Artifact: drop'
      inputs:
        PathtoPublish: '$(Build.SourcesDirectory)'
        ArtifactName: 'drop-${{ parameters.buildTag }}'
        publishLocation: 'Container'

Writing a deploy.yaml file for the CD

Best practice for running Terraform in a pipeline is to use remote state. In this article I use a separate storage account to store the state. The agent that runs the pipeline installs Terraform and then initializes the backend and state. Next, terraform validate checks whether the configuration is syntactically valid and internally consistent, regardless of any provided variables or existing state. After that, terraform plan produces an execution plan, which lets you preview the changes Terraform will make to your infrastructure. I have also prepared a destroy task, which is disabled for now; you can enable it later if you need it. Finally, the apply step runs with the auto-approve option.

stages:
- stage: CD
  jobs:
  - job: Release
    displayName: Release
    continueOnError: false
    pool:
      vmImage: $(vmImageName)
    steps:
    - task: DownloadPipelineArtifact@2 # Use the Download Pipeline Artifact task
      displayName: 'Download artifact'
      inputs:
        buildType: 'current'
        targetPath: '$(Pipeline.Workspace)'
        artifactName: 'drop-$(tag)'

    - task: TerraformInstaller@1
      displayName: 'Install Terraform'
      inputs:
        terraformVersion: 1.3.7

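    # Lists the downloaded files. TF_LOG set here applies only to this step;
    # it does not carry over to the Terraform tasks below.
    - powershell: |
        $env:TF_LOG = "DEBUG"
        dir "$(Pipeline.Workspace)"
      enabled: true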

    - task: TerraformTaskV4@4
      displayName: 'Terraform : init'
      inputs:
        command: init
        provider: 'azurerm'
        workingDirectory: '$(Pipeline.Workspace)'
        backendServiceArm: 'databricks-nonprod'
        backendAzureRmResourceGroupName: 'rg-terraform-non-prod-weu'
        backendAzureRmStorageAccountName: 'stteranonprodweu'
        backendAzureRmContainerName: 'pipeline-databricks'
        backendAzureRmKey: 'terraform.tfstate'

    - task: TerraformTaskV4@4
      displayName: 'Terraform: validate - validation tf'
      inputs:
        command: validate
        workingDirectory: '$(Pipeline.Workspace)'
        provider: 'azurerm'
        environmentServiceNameAzureRM: 'databricks-nonprod'

    - task: TerraformTaskV4@4
      displayName: 'Terraform: plan - list out resources'
      inputs:
        command: plan
        workingDirectory: '$(Pipeline.Workspace)'
        commandOptions: '-no-color'
        provider: 'azurerm'
        environmentServiceNameAzureRM: 'databricks-nonprod'
      enabled: true

    - task: TerraformTaskV4@4
      displayName: 'Terraform: destroy - delete all resources'
      inputs:
        command: destroy
        workingDirectory: '$(Pipeline.Workspace)'
        provider: 'azurerm'
        environmentServiceNameAzureRM: 'databricks-nonprod'
      enabled: false

    - task: TerraformTaskV4@4
      displayName: 'Terraform : apply all tf files'
      inputs:
        command: apply
        workingDirectory: '$(Pipeline.Workspace)'
        commandOptions: '-no-color'
        provider: 'azurerm'
        environmentServiceNameAzureRM: 'databricks-nonprod'
      enabled: true

Create the Azure pipeline

I’ve crafted an azure-pipelines.yaml file designed to invoke the two YAML files mentioned above using templates. This approach offers a streamlined way to segment the code, making it simpler to handle and maintain.

# azure-pipelines.yaml
trigger:
- main

variables:
  tag: '$(Build.BuildNumber)-$(Build.SourceBranchName)-$(Build.BuildId)'
  vmImageName: 'ubuntu-latest'

stages:
- template: build.yaml # Template reference to the CI stage
  parameters:
    buildTag: $(tag)
    vmImage: $(vmImageName)
- template: deploy.yaml # Template reference to the CD stage

Setting up Azure DevOps

Create a service principal from Azure Portal

There are several ways to connect Azure DevOps to Azure so that the pipeline can create Azure resources. In this guide, I’ll demonstrate one of them.
Go to Microsoft Entra ID, choose App registrations, then create a new app named sp-databricks-pipeline.

Next, we need to generate a client secret, which is required later when we create the service connection in Azure DevOps.
Open the service principal, then select Certificates & secrets.

Copy the sensitive values (client secret, client ID, tenant ID, and subscription ID) to a secure place; they will be used later.

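Granting permissions to the service principal

Go to the subscription and choose Access control (IAM).

Then select Add role assignment.

Once the role assignment has been added, go back to the service principal to grant one more permission, because it needs to be able to create app registrations in Microsoft Entra ID. Select sp-databricks-pipeline, choose API permissions, then add Application.ReadWrite.All from Microsoft Graph with Delegated permissions. Note that the provider documentation linked in Step 1 describes this as an application role when Terraform authenticates as a service principal, so if the pipeline fails with an authorization error, add it as an Application permission and grant admin consent.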

Create a service connection

Go to Project settings and create a new service connection.

Fill in the form with the values saved in the previous steps. Once it is filled in, give the connection a name and verify it before saving.

Note the service connection name, because it is referenced in the CI/CD YAML files.

Install Terraform

If the Terraform extension is already installed in your Azure DevOps organization, you can skip this step. If not, you need to install it from the Marketplace. Here are the steps to follow:

Several publishers offer a Terraform extension, but I suggest using the one provided by Microsoft, which is the first result.

Terraform is now installed in your organization.

Create and run the pipeline

Go to Pipelines, then click the New pipeline button.

Select where your YAML file is stored. I use the first option, Azure Repos. After that, you are asked to select the repository and configure the pipeline. Since I have already written the YAML files above, I choose the option Existing Azure Pipelines YAML file.

Everything is in place. Now it’s time to test the Terraform code.

Conclusion

In today’s rapidly evolving tech landscape, efficiently deploying and managing cloud resources is paramount. In this article, we’ve explored a powerful approach to streamline the setup and management of Databricks infrastructures on Azure. By harnessing the versatility of Terraform modules, we’ve not only automated the provisioning process but also enhanced scalability and maintainability.
Additionally, we’ve delved into the world of CI/CD by utilizing YAML configuration files to orchestrate the continuous integration and delivery pipeline. This automation ensures that our Databricks environments are always up-to-date and in sync with our development efforts.
Lastly, we’ve seen how configuring Azure DevOps to collaborate seamlessly with the Azure portal is pivotal for a cohesive cloud development experience. This integration empowers teams to leverage the full potential of Azure services while maintaining a centralized and streamlined workflow.
In the dynamic realm of cloud computing, the ability to provision and manage resources efficiently, alongside automated CI/CD pipelines and robust collaboration tools, is a recipe for success. With these strategies in place, we are well-equipped to tackle the challenges of the Azure cloud with agility and precision.
