NashTech Insights

Building Snowflake CI/CD Pipeline using Azure DevOps

Vinh Phan Thanh

Key takeaways:

The CI/CD pipeline plays a crucial role in most software projects, and data projects are no exception. By automating the deployment of objects such as tables, views, tasks, stored procedures, and functions, it significantly reduces administrative burden and cycle times. In this blog, I will show you, step by step, how to build a Snowflake CI/CD pipeline using Azure DevOps in a seamless and effective manner.

I. What is a CI/CD Pipeline?

CI/CD stands for Continuous Integration and Continuous Deployment (or Continuous Delivery). It is a set of practices and methodologies used in software development to automate and streamline the process of building, testing, and deploying software changes.

Continuous Integration (CI) refers to the practice of frequently merging code changes from multiple developers into a shared repository. With CI, developers integrate their code changes into a central codebase multiple times a day. This allows for early detection of integration issues and helps maintain a stable and consistent codebase. Continuous Deployment (CD) or Continuous Delivery (CD) refers to the practice of automating the deployment of software changes to production environments. It involves automatically building, testing, and deploying software changes to production environments, ensuring that new features, bug fixes, and improvements are quickly and reliably delivered to end-users.

Some advantages of CI/CD Pipeline

There are many benefits that come with using CI/CD pipelines for data engineering teams besides increased productivity. Here are a few common advantages:

  • Version control with change logs
  • Quality, consistency, and security of code
  • Flexibility in the schedule for builds and deployments
  • Developer agreement and collaboration via approval and workflows
  • Dashboards and reports that provide insight into build or delivery errors
  • Stability in the environment with automated rollback features
  • A reliable and standardized build process

II. What is Azure DevOps?

Azure DevOps is a set of tools and services that help DevOps teams provision and manage production environments. It helps teams automate, orchestrate, and manage application and service delivery.

Azure DevOps optimizes source control, build, test, and release processes to enable continuous delivery. It also promotes the use of DevOps practices across cloud and on-premises environments.

III. Build a Snowflake CI/CD Pipeline using Azure DevOps

1. Architecture

Below is a diagram of how to build a Snowflake CI/CD pipeline using Azure DevOps.


How it works:

Step 1: Developers create scripts for objects such as tables, views, functions, and stored procedures, perform testing, and then deploy them to the Dev branch.

Step 2: Create a pull request to the QA branch and add TAs/Team Leads as reviewers.

Step 3: TAs/TLs review the code and approve it if no issues are detected.

Step 4: Trigger the pipeline to deploy the code to Snowflake (QA environment).

Step 5: Testers (QCs) perform testing on the QA environment.

Step 6: Testers verify the results and confirm that no issues were detected.

Step 7: A TA/Team Lead/DevOps engineer merges the code from the QA branch to the Main branch.

Step 8: Trigger the pipeline to deploy the code to Snowflake (Production environment).

2. Build a Snowflake CI/CD pipeline

2.1 Requirements
  • Azure DevOps environment is available
  • Git client installed on your local machine

Note: You can also use alternative tools, such as SourceTree or TortoiseGit, to manage and interact with the Git repository on your local machine.

2.2 Implementation

Step 1: Clone a new branch from the main branch in Azure DevOps (for instance: clone branch “sprint1” from the main branch)


Step 2: Get the source code from the new branch (sprint1) onto your local machine:

  • Create a folder on your local machine
  • Go to that folder
  • Use the “git clone” command to get the code as below:
             git clone -b branch_name repo_url

  • branch_name: name of the branch that we have just created
  • repo_url: URL of the Git repository in Azure DevOps (the -b flag checks out the given branch)
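Steps 2 and 3 can be sketched end to end with a throwaway local repository standing in for the Azure DevOps URL (the repository path and the branch name “sprint1” below are placeholders):

```shell
# Create a throwaway "remote" repo with a sprint1 branch (stand-in for Azure DevOps)
tmp=$(mktemp -d)
git init -q "$tmp/remote"
git -C "$tmp/remote" -c user.email=demo@example.com -c user.name=demo \
    commit -q --allow-empty -m "initial commit"
git -C "$tmp/remote" branch -m sprint1

# Step 2: clone the sprint1 branch (in a real project the URL is the Azure DevOps repo URL)
git clone -q -b sprint1 "$tmp/remote" "$tmp/work"

# Step 3: confirm we are standing on the correct branch
branch=$(git -C "$tmp/work" branch --show-current)
echo "$branch"   # -> sprint1
```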

Step 3: Use the “git branch” command to make sure that we are on the correct branch:

             git branch -v

Step 4 (optional): Go to the git folder and create a migrations folder if needed


Step 5: Put the scripts into the migrations folder


Note: Scripts MUST follow the conventions below (this is the schemachange naming convention):

  • Scripts to be run in Snowflake MUST have the “.sql” extension
  • The file name starts with “V” + version + “__” (two underscores) + name
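For illustration, a migrations folder following this convention might look like the listing below (the object names are hypothetical; the versions just need to increase from one script to the next):

```shell
mkdir -p migrations
# V<version>__<name>.sql — note the two underscores between version and name
touch migrations/V1.1.1__create_customer_table.sql
touch migrations/V1.1.2__create_customer_view.sql
ls migrations
```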

Step 6: Check in the code

  • git add * : adds all new files in the working folder to the git staging area
  • git pull : makes sure we have the latest code from the branch before pushing
  • git commit -m “description” : commits the code; the description can be anything that helps distinguish each commit
  • git push origin HEAD:sprint1 : pushes new or changed files to the branch; sprint1 is the name of the branch we want to push the source code to
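The check-in sequence above can be exercised against a throwaway local bare repository standing in for the Azure DevOps remote (the paths, file names, and the sprint1 branch are placeholders; git pull is omitted here because the stand-in remote starts out empty):

```shell
# Bare repo standing in for the Azure DevOps remote
tmp=$(mktemp -d)
git init -q --bare "$tmp/origin.git"
git clone -q "$tmp/origin.git" "$tmp/work" 2>/dev/null
cd "$tmp/work"
git checkout -q -b sprint1

# Add a migration script, then stage, commit, and push it
mkdir -p migrations
echo "CREATE TABLE DEMO (ID INT);" > migrations/V1.1.1__create_demo_table.sql
git add migrations
git -c user.email=demo@example.com -c user.name=demo commit -q -m "add demo migration"
git push -q origin HEAD:sprint1

# The remote now has the sprint1 branch with our commit
git ls-remote --heads origin
```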

Step 7: Verify that the code was pushed to the branch in Azure DevOps

In Azure DevOps, go to your branch and make sure the code has been pushed.


Step 8: Create pull request and merge code to main branch


Note: To simplify the process, I skip the steps that deploy to the QA and Staging environments. In real projects, we need to implement these steps to ensure there are no potential risks in the Production environment.

Step 9: Build a pipeline to deploy code to Snowflake database

From the repository, select the main branch, then click the “Set up build” button


Click Starter pipeline on the Configure your pipeline screen


Replace the generated code with the YAML below (the env block passes the SF_PASSWORD secret to schemachange, which reads it from the SNOWFLAKE_PASSWORD environment variable):

# Deploy database changes using schemachange
trigger:
  branches:
    include:
    - main
  paths:
    include:
    - /migrations

pool:
  vmImage: 'ubuntu-latest'

variables:
- group: demo-variables

steps:
- task: UsePythonVersion@0
  displayName: 'Use Python 3.8.x'
  inputs:
    versionSpec: '3.8.x'

- task: Bash@3
  inputs:
    targetType: 'inline'
    script: |
      echo 'Starting bash task'
      ls -la $(PROJECT_FOLDER)
      python --version
      echo 'Step 1: Installing schemachange'
      pip install schemachange --upgrade
      echo 'Step 2: Running schemachange'
      schemachange -f $(PROJECT_FOLDER)/migrations -a $(SF_ACCOUNT) -u $(SF_USERNAME) -r $(SF_ROLE) -w $(SF_WAREHOUSE) -d $(SF_DATABASE) -c $(SF_DATABASE).SCHEMACHANGE.CHANGE_HISTORY --create-change-history-table
  env:
    SNOWFLAKE_PASSWORD: $(SF_PASSWORD)


  • The trigger block in the script above runs the pipeline automatically whenever code in the migrations folder of the main branch is updated. It is optional, and you can remove it if you prefer to run the pipeline manually
  • $(PROJECT_FOLDER)/migrations: the pipeline only executes scripts in the migrations folder. Depending on which scripts need to be deployed, we can change this path
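As a concrete (hypothetical) example, a migration script that the pipeline would pick up from the migrations folder could look like this; the table and column names are placeholders:

```shell
mkdir -p migrations
cat > migrations/V1.2.1__create_orders_table.sql <<'SQL'
-- Hypothetical migration: schemachange runs it once, then records it
-- in the SCHEMACHANGE.CHANGE_HISTORY table so it is never re-run
CREATE TABLE IF NOT EXISTS ORDERS (
    ID       INTEGER,
    CUSTOMER VARCHAR,
    AMOUNT   NUMBER(10, 2)
);
SQL
cat migrations/V1.2.1__create_orders_table.sql
```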

Step 10: Variable configuration

In the code above, we also referenced a variable group “demo-variables”, which tells the pipeline about the deployment environment. To define it, select Pipelines -> Library in Azure DevOps as below:


In the Library window, click Variable group


Enter the variable group name and the values for the variables as below:



  • PROJECT_FOLDER: root folder, with default value “$(System.DefaultWorkingDirectory)”
  • SF_ACCOUNT: Snowflake account identifier
  • SF_DATABASE: Snowflake database the code is deployed to
  • SF_PASSWORD: password used to log in to the database (mark it as a secret)
  • SF_ROLE: role used to run the scripts (for instance: ACCOUNTADMIN)
  • SF_USERNAME: username used to log in to the database
  • SF_WAREHOUSE: Snowflake warehouse used to run the scripts

Step 11: Run the pipeline and make sure it completes successfully


Step 12: Verify results in Snowflake

Go to Snowflake, open the database, and verify that all scripts ran successfully. You will also see a SCHEMACHANGE.CHANGE_HISTORY table that records the files executed by the pipeline, which helps keep track of the history of every script we have run.


IV. Final thoughts

In this post, I walked through, step by step, how to build a Snowflake CI/CD pipeline using Azure DevOps. By implementing CI/CD practices for Snowflake, organizations can accelerate the development process, improve the quality of data solutions, and ensure a smooth and reliable deployment experience, ultimately leading to a more successful and efficient data management strategy.


Vinh Phan Thanh


Vinh Phan is a Data Architect based in Vietnam, specializing in Performance tuning, Data Migration, Data warehousing, BI solutions, and Advanced analytics utilizing AI/ML. During his free time, he dedicates himself to researching emerging technologies, enjoying quality time with his family, reading books, and exploring new travel destinations.
