Key takeaways:
The CI/CD pipeline plays a crucial role in most software projects, and data projects are no exception. By automating the deployment of objects such as tables, views, tasks, stored procedures, and functions, it significantly reduces administrative burden and cycle times. In this blog, I will show you, step by step, how to build a Snowflake CI/CD pipeline using Azure DevOps in a seamless and effective manner.
I. What is a CI/CD Pipeline?
CI/CD stands for Continuous Integration and Continuous Deployment (or Continuous Delivery). It is a set of practices and methodologies used in software development to automate and streamline the process of building, testing, and deploying software changes.
Continuous Integration (CI) refers to the practice of frequently merging code changes from multiple developers into a shared repository. With CI, developers integrate their code changes into a central codebase multiple times a day. This allows for early detection of integration issues and helps maintain a stable and consistent codebase.
Continuous Deployment or Continuous Delivery (CD) refers to the practice of automating the deployment of software changes to production environments. It involves automatically building, testing, and deploying software changes, ensuring that new features, bug fixes, and improvements are quickly and reliably delivered to end users.
Some advantages of a CI/CD pipeline
CI/CD pipelines bring many benefits to data engineering teams beyond an increase in productivity. Here are a few common advantages:
- Version control with change logs
- Quality, consistency, and security of code
- Flexibility in the schedule for builds and deployments
- Developer agreement and collaboration via approvals and workflows
- Dashboards and reports that provide insight into build or delivery errors
- Stability in the environment with automated rollback features
- A reliable and standardized build process
II. What is Azure DevOps?
Azure DevOps is a set of tools and services that help DevOps teams provision and manage production environments. It helps teams automate, orchestrate, and manage application and service delivery.
Azure DevOps optimizes source control, build, test, and release processes to enable continuous delivery. It also promotes the use of DevOps practices across cloud and on-premises environments.
III. Build a Snowflake CI/CD Pipeline using Azure DevOps
1. Architecture
The diagram below shows how to build a Snowflake CI/CD pipeline using Azure DevOps.
How it works:
Step 1: Developers create scripts for objects such as tables, views, functions, stored procedures, …, test them, and then deploy them to the Dev branch
Step 2: Create a pull request to the QA branch and add TAs/Team Leads as reviewers
Step 3: TAs/TLs review the code and approve it if no issues are detected
Step 4: The pipeline is triggered to deploy the code to Snowflake (QA environment)
Step 5: Testers (QCs) perform testing in the QA environment
Step 6: Testers verify the results and confirm that no issues were detected
Step 7: TAs/Team Leads/DevOps merge the code from the QA branch to the Main branch
Step 8: The pipeline is triggered to deploy the code to Snowflake (Production environment)
2. Build a Snowflake CI/CD pipeline
2.1 Requirements
- An Azure DevOps environment is available
- Git client installed on your local machine
Note: You can also use alternative tools to manage and interact with the Git repository on your local machine, such as SourceTree or TortoiseGit
2.2 Implementation
Step 1: Create a new branch from the main branch in Azure DevOps (for instance, branch "sprint1" from the main branch)
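If you prefer the command line to the Azure DevOps UI, a hedged alternative (assuming you already have a local clone of the repository) is to create and publish the branch with git:
# Create sprint1 from main locally and publish it to Azure DevOps
git checkout main
git pull origin main
git checkout -b sprint1
git push -u origin sprint1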
Step 2: Get the source code from the new branch (sprint1) onto your local machine:
- Create a local folder
- Go to that folder
- Use the "git clone" command to get the code, as below:
git clone -b branch_name repository_url
Where:
- branch_name: the name of the branch we have just created (sprint1 in this example)
- repository_url: the clone URL of the repository in Azure DevOps; the -b flag checks out the given branch after cloning
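For example, with placeholder organization, project, and repository names (replace them with your own), the command could look like this:
git clone -b sprint1 https://dev.azure.com/<your-organization>/<your-project>/_git/<your-repository>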
Step 3: Use the "git branch" command to make sure we are on the correct branch:
git branch -v
Step 4 (optional): Go to the git folder and create a "migrations" folder if it does not already exist
Step 5: Put your scripts into the migrations folder
Note: Scripts MUST follow the conventions below:
- Scripts that run in Snowflake MUST have the ".sql" extension
- The file name follows schemachange's versioned-script pattern: "V" + version + "__" + name
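For illustration, a minimal versioned script that follows this convention could be created as below (the file name, schema, and table are hypothetical placeholders; replace them with your own DDL):
# Create a sample versioned migration script in the migrations folder
mkdir -p migrations
cat > migrations/V1.1.1__create_demo_table.sql <<'SQL'
-- Hypothetical example object; replace with your own DDL
CREATE TABLE IF NOT EXISTS DEMO_SCHEMA.DEMO_TABLE (
    ID   NUMBER,
    NAME VARCHAR
);
SQL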
Step 6: Check in the code
- git add * : adds all new files in the working folder to the git staging area
- git pull : makes sure we have the latest code from the branch before pushing
- git commit -m "description" : commits the code; the description can be anything that distinguishes each commit
- git push origin HEAD:sprint1 : pushes new or changed files to the branch; sprint1 is the branch we want to push the source code to
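Putting these commands together, a typical check-in for the sample script created earlier might look like this (the file name and commit message are only examples):
git add migrations/V1.1.1__create_demo_table.sql
git pull origin sprint1
git commit -m "Add V1.1.1 demo table migration"
git push origin HEAD:sprint1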
Step 7: Check that the code has been pushed to the branch in Azure DevOps
In Azure DevOps, go to your branch and make sure the code has been pushed
Step 8: Create a pull request and merge the code to the main branch
Note: To simplify the process, I skip the steps that deploy to the QA and Staging environments. In real projects, we need to implement these steps to avoid potential risks in the Production environment.
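If you prefer the command line, a hedged equivalent of creating the pull request (assuming the Azure DevOps CLI extension is installed and organization/project defaults are configured) is:
# Open a pull request from sprint1 to main
az repos pr create --source-branch sprint1 --target-branch main --title "Merge sprint1 into main"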
Step 9: Build a pipeline to deploy the code to the Snowflake database
From the repository -> select the main branch -> click the "Set up build" button
Click "Starter pipeline" in "Configure your pipeline"
Replace the code as below:
# Deploy database changes using schemachange
trigger:
  branches:
    include:
      - main
  paths:
    include:
      - /migrations

pool:
  vmImage: 'ubuntu-latest'

variables:
  - group: demo-variables

steps:
  - task: UsePythonVersion@0
    displayName: 'Use Python 3.8.x'
    inputs:
      versionSpec: '3.8.x'

  - task: Bash@3
    inputs:
      targetType: 'inline'
      script: |
        echo 'Starting bash task'
        echo "PROJECT_FOLDER $(PROJECT_FOLDER)"
        ls -la $(PROJECT_FOLDER)
        python --version
        echo 'Step 1: Installing schemachange'
        pip install schemachange --upgrade
        echo 'Step 2: Running schemachange'
        schemachange -f $(PROJECT_FOLDER)/migrations -a $(SF_ACCOUNT) -u $(SF_USERNAME) -r $(SF_ROLE) -w $(SF_WAREHOUSE) -d $(SF_DATABASE) -c $(SF_DATABASE).SCHEMACHANGE.CHANGE_HISTORY --create-change-history-table
    env:
      SNOWFLAKE_PASSWORD: $(SF_PASSWORD)
Note:
- The trigger block in the script above runs the pipeline automatically whenever code is updated in the migrations folder of the main branch. It is optional, and you can remove it if you prefer to run the pipeline manually
- $(PROJECT_FOLDER)/migrations: the pipeline only executes scripts in the migrations folder. Depending on which scripts need to be deployed, we can change this path
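Before relying on the pipeline, you can optionally run the same schemachange command locally. This is a hedged sketch with placeholder connection values; it assumes Python and pip are available and uses schemachange's --dry-run option so nothing is actually deployed:
# Install schemachange locally
pip install schemachange --upgrade
# The password is read from the SNOWFLAKE_PASSWORD environment variable
export SNOWFLAKE_PASSWORD='<your-password>'
# Dry run against the migrations folder (replace the placeholder values)
schemachange -f ./migrations -a <your-account> -u <your-user> -r <your-role> -w COMPUTE_WH -d <your-database> -c <your-database>.SCHEMACHANGE.CHANGE_HISTORY --create-change-history-table --dry-run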
Step 10: Variable configuration
In the code above, we also referenced a variable group "demo-variables", which tells the pipeline about the deployment environment. To define it, select Pipelines -> Library in Azure DevOps as below:
In the Library window, click Variable group
Enter the variable group name and the values for the variables as below:
Where:
- PROJECT_FOLDER: root folder, with default value "$(System.DefaultWorkingDirectory)"
- SF_ACCOUNT: Snowflake account (for instance: abc123.us-east-1)
- SF_DATABASE: Snowflake database to which the code is deployed
- SF_PASSWORD: password used to log in to Snowflake (mark this as a secret variable)
- SF_ROLE: role used to run the scripts (for instance: ACCOUNTADMIN)
- SF_USERNAME: username used to log in to Snowflake
- SF_WAREHOUSE: warehouse used to run the scripts (default: COMPUTE_WH)
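As an alternative to the UI, the variable group could also be created from the command line. The sketch below is a hedged example, assuming the Azure DevOps CLI extension is installed and organization/project defaults are configured; all values shown are placeholders:
# Create the demo-variables group with the non-secret variables
az pipelines variable-group create --name demo-variables --authorize true --variables PROJECT_FOLDER='$(System.DefaultWorkingDirectory)' SF_ACCOUNT=abc123.us-east-1 SF_DATABASE=DEMO_DB SF_ROLE=ACCOUNTADMIN SF_USERNAME=demo_user SF_WAREHOUSE=COMPUTE_WH
# Add the password separately and mark it as secret (replace <group-id> with the id returned above)
az pipelines variable-group variable create --group-id <group-id> --name SF_PASSWORD --value '<your-password>' --secret true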
Step 11: Run the pipeline to make sure it completes successfully
Step 12: Verify the results in Snowflake
Go to Snowflake, open the database, and verify that all scripts ran successfully. In addition, we can see a SCHEMACHANGE.CHANGE_HISTORY table that records the files run by the pipeline. This helps us keep track of the history of all the files we executed.
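You can also query the change history directly. Below is a hedged example using the SnowSQL CLI (assuming it is installed and configured; the connection values are placeholders, and the column names follow schemachange's documented CHANGE_HISTORY layout):
# List the most recently deployed scripts
snowsql -a <your-account> -u <your-user> -d <your-database> -w COMPUTE_WH -q "SELECT SCRIPT, STATUS, INSTALLED_ON FROM SCHEMACHANGE.CHANGE_HISTORY ORDER BY INSTALLED_ON DESC;"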
IV. Final thoughts
In this post, I walked through, step by step, how to build a Snowflake CI/CD pipeline using Azure DevOps. By implementing CI/CD practices for Snowflake, organizations can accelerate the development process, improve the quality of data solutions, and ensure a smooth and reliable deployment experience, ultimately leading to a more successful and efficient data management strategy.
References
- https://apisero.com/database-change-management-with-schemachange/
- https://quickstarts.snowflake.com/guide/devops_dcm_schemachange_azure_devops/index.html?index=..%2F..index#4
- https://www.youtube.com/watch?v=_WLI04DN08I
- https://github.com/Snowflake-Labs/schemachange