NashTech Insights

Building Snowflake CI/CD Pipeline using Azure DevOps

Vinh Phan Thanh

Key takeaways:

The CI/CD pipeline plays a crucial role in most software projects, and data projects are no exception. By automating the deployment of objects such as tables, views, tasks, stored procedures, and functions, it significantly reduces administrative burden and cycle times. In this blog, I will show you, step by step, how to build a Snowflake CI/CD pipeline using Azure DevOps in a seamless and effective manner.

I. What is a CI/CD Pipeline?

CI/CD stands for Continuous Integration and Continuous Deployment (or Continuous Delivery). It is a set of practices and methodologies used in software development to automate and streamline the process of building, testing, and deploying software changes.

Continuous Integration (CI) refers to the practice of frequently merging code changes from multiple developers into a shared repository. With CI, developers integrate their code changes into a central codebase multiple times a day. This allows for early detection of integration issues and helps maintain a stable and consistent codebase. Continuous Deployment (CD) or Continuous Delivery (CD) refers to the practice of automating the deployment of software changes to production environments. It involves automatically building, testing, and deploying software changes to production environments, ensuring that new features, bug fixes, and improvements are quickly and reliably delivered to end-users.

Some advantages of CI/CD Pipeline

There are many benefits that come with using CI/CD pipelines for data engineering teams besides increased productivity. Here are a few common advantages:

  • Version control with change logs
  • Quality, consistency, and security of code
  • Flexibility in the schedule for builds and deployments
  • Developer agreement and collaboration via approval and workflows
  • Dashboards and reports that provide insight into build or delivery errors
  • Stability in the environment with automated rollback features
  • A reliable and standardized build process

II. What is Azure DevOps?

Azure DevOps is a set of tools and services that help DevOps teams provision and manage production environments. It helps teams automate, orchestrate, and manage application and service delivery.

Azure DevOps optimizes source control, build, test, and release processes to enable continuous delivery. It also promotes the use of DevOps practices across cloud and on-premises environments.

III. Build a Snowflake CI/CD Pipeline using Azure DevOps

1. Architecture

Below is a diagram of how to build a Snowflake CI/CD pipeline using Azure DevOps.


How it works:

Step 1: Developers create scripts for objects such as tables, views, functions, and stored procedures, perform testing, and then deploy them to the Dev branch.

Step 2: Create a pull request to the QA branch and add TAs/Team Leads as reviewers.

Step 3: TAs/TLs review the code and approve it if no issues are detected.

Step 4: Trigger the pipeline to deploy the code to Snowflake (QA environment).

Step 5: Testers (QCs) perform testing on the QA environment.

Step 6: Testers verify the results and confirm that no issues were detected.

Step 7: A TA/Team Lead/DevOps engineer merges the code from the QA branch to the Main branch.

Step 8: Trigger the pipeline to deploy the code to Snowflake (Production environment).

2. Build a Snowflake CI/CD pipeline

2.1 Requirements
  • Azure DevOps environment is available
  • Git client installed on your local machine

Note: You can also use alternative tools, such as SourceTree or TortoiseGit, to manage and interact with the Git repository on your local machine.

2.2 Implementation

Step 1: Clone a new branch from the main branch in Azure DevOps (for instance: clone branch “sprint1” from the main branch)


Step 2: Get the source code from the new branch (sprint1) onto your local machine:

  • Create a folder on your local machine
  • Go to that folder
  • Use the “git clone” command to get the code as below:
             git clone -b branch_name repo_url

  • branch_name: name of the branch that we have just created
  • repo_url: URL of the Git repository in Azure DevOps (the -b flag checks out the given branch)
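Steps 2 and 3 can be sketched end to end with a throwaway local repository standing in for the Azure DevOps URL (the repository path and the branch name “sprint1” below are placeholders):

```shell
# Create a throwaway "remote" repo with a sprint1 branch (stand-in for Azure DevOps)
tmp=$(mktemp -d)
git init -q "$tmp/remote"
git -C "$tmp/remote" -c user.email=demo@example.com -c user.name=demo \
    commit -q --allow-empty -m "initial commit"
git -C "$tmp/remote" branch -m sprint1

# Step 2: clone the sprint1 branch (in a real project the URL is the Azure DevOps repo URL)
git clone -q -b sprint1 "$tmp/remote" "$tmp/work"

# Step 3: confirm we are standing on the correct branch
branch=$(git -C "$tmp/work" branch --show-current)
echo "$branch"   # -> sprint1
```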

Step 3: Use the “git branch” command to make sure that we are on the correct branch:

             git branch -v

Step 4 (optional): Go to the git folder and create a migrations folder if needed


Step 5: Put the scripts into the migrations folder


Note: Scripts MUST follow the conventions below (this is the schemachange naming convention):

  • Scripts to be run in Snowflake MUST have the “.sql” extension
  • The file name starts with “V” + version + “__” (two underscores) + name
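For illustration, a migrations folder following this convention might look like the listing below (the object names are hypothetical; the versions just need to increase from one script to the next):

```shell
mkdir -p migrations
# V<version>__<name>.sql — note the two underscores between version and name
touch migrations/V1.1.1__create_customer_table.sql
touch migrations/V1.1.2__create_customer_view.sql
ls migrations
```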

Step 6: Check in the code

  • git add * : adds all new files in the working folder to the git staging area
  • git pull : makes sure we have the latest code from the branch before pushing
  • git commit -m “description” : commits the code; the description can be anything that helps distinguish each commit
  • git push origin HEAD:sprint1 : pushes new or changed files to the branch; sprint1 is the name of the branch we want to push the source code to
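The check-in sequence above can be exercised against a throwaway local bare repository standing in for the Azure DevOps remote (the paths, file names, and the sprint1 branch are placeholders; git pull is omitted here because the stand-in remote starts out empty):

```shell
# Bare repo standing in for the Azure DevOps remote
tmp=$(mktemp -d)
git init -q --bare "$tmp/origin.git"
git clone -q "$tmp/origin.git" "$tmp/work" 2>/dev/null
cd "$tmp/work"
git checkout -q -b sprint1

# Add a migration script, then stage, commit, and push it
mkdir -p migrations
echo "CREATE TABLE DEMO (ID INT);" > migrations/V1.1.1__create_demo_table.sql
git add migrations
git -c user.email=demo@example.com -c user.name=demo commit -q -m "add demo migration"
git push -q origin HEAD:sprint1

# The remote now has the sprint1 branch with our commit
git ls-remote --heads origin
```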

Step 7: Verify that the code was pushed to the branch in Azure DevOps

In Azure DevOps, go to your branch and make sure the code has been pushed.


Step 8: Create pull request and merge code to main branch


Note: To simplify the process, I skip the steps that deploy to the QA and Staging environments. In real projects, we need to implement these steps to ensure there are no potential risks in the Production environment.

Step 9: Build a pipeline to deploy code to Snowflake database

From the repository, select the main branch, then click the “Set up build” button


Click Starter pipeline on the Configure your pipeline screen


Replace the generated code with the YAML below (the env block passes the SF_PASSWORD secret to schemachange, which reads it from the SNOWFLAKE_PASSWORD environment variable):

# Deploy database changes using schemachange
trigger:
  branches:
    include:
    - main
  paths:
    include:
    - /migrations

pool:
  vmImage: 'ubuntu-latest'

variables:
- group: demo-variables

steps:
- task: UsePythonVersion@0
  displayName: 'Use Python 3.8.x'
  inputs:
    versionSpec: '3.8.x'

- task: Bash@3
  inputs:
    targetType: 'inline'
    script: |
      echo 'Starting bash task'
      ls -la $(PROJECT_FOLDER)
      python --version
      echo 'Step 1: Installing schemachange'
      pip install schemachange --upgrade
      echo 'Step 2: Running schemachange'
      schemachange -f $(PROJECT_FOLDER)/migrations -a $(SF_ACCOUNT) -u $(SF_USERNAME) -r $(SF_ROLE) -w $(SF_WAREHOUSE) -d $(SF_DATABASE) -c $(SF_DATABASE).SCHEMACHANGE.CHANGE_HISTORY --create-change-history-table
  env:
    SNOWFLAKE_PASSWORD: $(SF_PASSWORD)


  • The trigger block in the script above runs the pipeline automatically whenever code in the migrations folder of the main branch is updated. It is optional, and you can remove it if you prefer to run the pipeline manually
  • $(PROJECT_FOLDER)/migrations: the pipeline only executes scripts in the migrations folder. Depending on which scripts need to be deployed, we can change this path
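As a concrete (hypothetical) example, a migration script that the pipeline would pick up from the migrations folder could look like this; the table and column names are placeholders:

```shell
mkdir -p migrations
cat > migrations/V1.2.1__create_orders_table.sql <<'SQL'
-- Hypothetical migration: schemachange runs it once, then records it
-- in the SCHEMACHANGE.CHANGE_HISTORY table so it is never re-run
CREATE TABLE IF NOT EXISTS ORDERS (
    ID       INTEGER,
    CUSTOMER VARCHAR,
    AMOUNT   NUMBER(10, 2)
);
SQL
cat migrations/V1.2.1__create_orders_table.sql
```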

Step 10: Variable configuration

In the code above, we also referenced a variable group “demo-variables”, which tells the pipeline about the deployment environment. To define it, select Pipelines -> Library in Azure DevOps as below:


In the Library window, click Variable group


Enter the variable group name and the values for the variables as below:



  • PROJECT_FOLDER: root folder, with default value “$(System.DefaultWorkingDirectory)”
  • SF_ACCOUNT: Snowflake account identifier
  • SF_DATABASE: Snowflake database the code is deployed to
  • SF_PASSWORD: password used to log in to the database (mark it as a secret)
  • SF_ROLE: role used to run the scripts (for instance: ACCOUNTADMIN)
  • SF_USERNAME: username used to log in to the database
  • SF_WAREHOUSE: Snowflake warehouse used to run the scripts

Step 11: Run the pipeline and make sure it completes successfully


Step 12: Verify results in Snowflake

Go to Snowflake, open the database, and verify that all scripts ran successfully. You will also see a SCHEMACHANGE.CHANGE_HISTORY table that records the files executed by the pipeline, which helps keep track of the history of every script we have run.


IV. Final thoughts

In this post, I walked through, step by step, how to build a Snowflake CI/CD pipeline using Azure DevOps. By implementing CI/CD practices for Snowflake, organizations can accelerate the development process, improve the quality of data solutions, and ensure a smooth and reliable deployment experience, ultimately leading to a more successful and efficient data management strategy.


Vinh Phan Thanh


Vinh Phan is a Data Architect based in Vietnam, specializing in Performance tuning, Data Migration, Data warehousing, BI solutions, and Advanced analytics utilizing AI/ML. During his free time, he dedicates himself to researching emerging technologies, enjoying quality time with his family, reading books, and exploring new travel destinations.
