NashTech Blog

Testing in Snowflake: Ensuring Data Quality and Reliability in the Cloud


As data platforms evolve, Snowflake has emerged as a powerful cloud-based data warehouse solution, offering scalability, performance, and ease of use. But with great power comes great responsibility – especially for testers who ensure data integrity, performance, and security across pipelines and analytics. In this post, we will explore how to approach testing in Snowflake and the best practices that help build a robust testing strategy.

Why Testing in Snowflake Matters

Data testing focuses on the accuracy, completeness, and consistency of data. In Snowflake, this becomes even more important because:

  • Data is continuously ingested from multiple sources (e.g., APIs, ETL tools, data lakes…).
  • Transformations can easily break due to schema changes or logic errors.
  • Downstream dashboards or models depend on clean, validated data.

By implementing testing directly in Snowflake, we can detect data issues early — before they affect business decisions.

Common use cases for Snowflake include:

  • Data warehousing
  • ETL/ELT pipelines
  • Business intelligence
  • Machine learning

Types of Testing

Data Validation Testing

Data Validation Testing is one of the most critical aspects of testing in Snowflake. It ensures that data loaded into Snowflake is accurate, complete, and consistent with its source systems.
Since Snowflake is often used as the central data warehouse that aggregates information from multiple upstream systems (APIs, databases, data lakes, etc.), even small data discrepancies can lead to incorrect reports or analytics. This includes:

  • Row counts
  • Null checks
  • Data type consistency
  • Referential integrity

Example: Row Count Comparison

-- Compare record counts between source and target tables
SELECT
    (SELECT COUNT(*) FROM source_db.public.customers) AS source_count,
    (SELECT COUNT(*) FROM target_db.public.customers) AS snowflake_count;

If counts do not match, further investigation is needed to find missing or extra records.
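When the counts differ, a set difference narrows the investigation to the specific missing records. A minimal sketch, assuming the same `customers` tables from the example above and a `customer_id` key column:

```sql
-- List keys present in the source but missing from Snowflake
SELECT customer_id FROM source_db.public.customers
EXCEPT
SELECT customer_id FROM target_db.public.customers;
```

Swapping the two SELECTs finds extra records on the Snowflake side instead.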

Example: Data Type Validation

-- Check for invalid data formats (e.g., non-numeric values in numeric columns)
SELECT *
FROM target_db.public.sales
WHERE TRY_TO_NUMBER(amount) IS NULL AND amount IS NOT NULL;

This query flags records with incorrect data types that may have been loaded incorrectly.

Transformation Testing

Validate that transformations (e.g., joins, aggregations, filtering, calculations) produce the expected results. This ensures that business rules and logic applied during data processing are implemented correctly. Use SQL assertions or compare against known outputs or golden datasets.

Typical checks include:

  • Join accuracy: Verify that joins between tables produce the correct number of records and relationships.
  • Aggregation validation: Confirm that sums, averages, counts, or other aggregations match expected business results.
  • Logic verification: Ensure that case statements, mappings, and derived columns follow transformation logic correctly.
  • Performance impact: Validate that transformations are efficient and scalable within Snowflake’s virtual warehouse resources.
  • End-to-end consistency: Compare transformed data with predefined reference datasets or expected output from a trusted environment.

Example: Validate transformation logic

SELECT customer_id, SUM(amount) AS total_spent FROM transactions GROUP BY customer_id;
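To turn this into an actual test, the aggregated output can be compared against a trusted reference. A hedged sketch, where `expected_totals` is a hypothetical golden dataset prepared in advance:

```sql
-- Any rows returned indicate a transformation mismatch
SELECT t.customer_id, t.total_spent, e.expected_total
FROM (
    SELECT customer_id, SUM(amount) AS total_spent
    FROM transactions
    GROUP BY customer_id
) t
JOIN expected_totals e ON t.customer_id = e.customer_id
WHERE t.total_spent <> e.expected_total;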

Schema Testing

Ensure that the table structure (columns, data types, and constraints) hasn’t changed unexpectedly. Schema testing helps maintain compatibility between different data layers and prevents downstream failures in pipelines or reports.

Typical checks include:

  • Column existence: Verify that all expected columns are present and no unexpected columns were added or removed.
  • Data type validation: Ensure each column’s data type matches the defined schema to prevent type conversion errors.
  • Constraint verification: Check that primary keys, unique constraints, and not-null rules remain intact.
  • Schema drift detection: Compare current schema metadata with a baseline schema stored in a control table or metadata repository.
  • Automated alerts: Implement automated jobs or stored procedures to detect and log schema changes.

Example: SHOW COLUMNS IN TABLE my_database.my_schema.my_table;
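Schema drift detection can be automated by diffing live column metadata against a stored baseline. A sketch, assuming a hypothetical `schema_baseline` control table that holds the approved column list:

```sql
-- Columns that exist now but are not in the baseline (or have changed type)
SELECT column_name, data_type
FROM my_database.information_schema.columns
WHERE table_schema = 'MY_SCHEMA'
  AND table_name   = 'MY_TABLE'
EXCEPT
SELECT column_name, data_type
FROM my_database.my_schema.schema_baseline;
```

An empty result means the schema matches the baseline; running the same query with the operands swapped catches dropped columns.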

Performance Testing

Measure query performance and cost efficiency to ensure that workloads run optimally in Snowflake. Performance testing helps identify slow queries, optimize warehouse size, and balance cost versus speed.

Typical goals include:

  • Query response time: Measure how long key analytical or ETL queries take to execute under different warehouse sizes.
  • Concurrency testing: Evaluate how multiple users or jobs running simultaneously affect performance.
  • Warehouse scaling: Test different warehouse tiers (e.g., X-SMALL, MEDIUM, LARGE) to find the most cost-effective configuration.
  • Caching behavior: Assess how Snowflake’s result caching and micro-partition pruning improve performance.
  • Query optimization: Analyze query profiles using the Query History or Query Profile view to detect bottlenecks such as inefficient joins, missing clustering, or excessive data scanning.

Example:

-- Find queries that ran longer than 5 seconds (EXECUTION_TIME is in milliseconds)
SELECT QUERY_TEXT, EXECUTION_TIME
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE EXECUTION_TIME > 5000;
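The same view also supports cost-oriented checks. A sketch that breaks down recent latency and data scanned per warehouse, useful when comparing warehouse sizes:

```sql
-- Average elapsed time and total bytes scanned per warehouse, last 7 days
SELECT warehouse_name,
       AVG(total_elapsed_time) AS avg_elapsed_ms,
       SUM(bytes_scanned)      AS total_bytes_scanned
FROM snowflake.account_usage.query_history
WHERE start_time > DATEADD('day', -7, CURRENT_TIMESTAMP())
GROUP BY warehouse_name
ORDER BY avg_elapsed_ms DESC;
```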

Security and Access Testing

Ensure that users and applications have the correct access permissions in Snowflake by validating roles, privileges, and data access boundaries. This testing guarantees compliance, data protection, and adherence to the principle of least privilege.

Typical checks include:

  • Role-based access control (RBAC): Verify that each user or group has only the required privileges (e.g., SELECT, INSERT, USAGE) for their role.
  • Grant validation: Check that roles are correctly assigned using SHOW GRANTS TO ROLE or SHOW GRANTS TO USER.
  • Object-level security: Ensure that access to databases, schemas, and tables is limited to authorized roles.
  • Data masking and row access policies: Test whether sensitive data is properly masked or filtered using Snowflake’s Dynamic Data Masking and Row Access Policies.
  • Audit and monitoring: Review Snowflake’s ACCESS_HISTORY and LOGIN_HISTORY views to detect unauthorized access or suspicious activity.
  • Integration testing with external identity providers: Validate authentication and role mapping via Okta, Azure AD, or other SSO mechanisms.

Example: SHOW GRANTS TO ROLE analyst_role;
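Masking policies can be verified by querying sensitive columns under a restricted role and confirming the output is masked. A hedged sketch, reusing `analyst_role` from the example above; the `customers.email` column is an assumption:

```sql
-- Run the same query under a restricted role; if a Dynamic Data Masking
-- policy is attached to email, the values should come back redacted
USE ROLE analyst_role;
SELECT email FROM target_db.public.customers LIMIT 10;
```

Repeating the query under a privileged role and comparing the two results confirms the policy fires only for the intended roles.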

Tools

SQL-Based Testing

SQL-based testing is one of the most direct and powerful approaches when working with Snowflake, since every operation—from ingestion to transformation and loading—relies on SQL.
By writing SQL assertions and validation queries, testers can verify data accuracy, consistency, and completeness across different stages of the data pipeline.
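A common pattern is the "zero-row assertion": a query written so that any returned row is a failure. A minimal sketch, assuming a hypothetical `orders` table:

```sql
-- Assertion: no orders without a customer; a non-zero count fails the test
SELECT COUNT(*) AS null_customer_ids
FROM target_db.public.orders
WHERE customer_id IS NULL;
```

These assertions are easy to schedule and to wire into CI, since the pass/fail condition is simply "count equals zero".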

dbt (Data Build Tool)

dbt (Data Build Tool) is one of the most popular frameworks used for managing data transformations and testing in modern data warehouses, including Snowflake.
It allows data teams to write modular SQL models, apply version control, and embed testing directly into the transformation workflow. dbt is particularly popular for transformation testing and supports:

  • Unit tests
  • Schema tests
  • Documentation
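In dbt, many tests are declared alongside the model rather than written as SQL. A sketch of a `schema.yml` for a hypothetical `customers` model, using dbt's built-in `not_null`, `unique`, and `accepted_values` tests:

```yaml
# models/schema.yml -- declarative tests run with `dbt test`
version: 2
models:
  - name: customers
    columns:
      - name: customer_id
        tests:
          - not_null
          - unique
      - name: region
        tests:
          - accepted_values:
              values: ['NA', 'EMEA', 'APAC']
```

Each declaration compiles to a zero-row assertion query that dbt executes against Snowflake.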

Great Expectations

Great Expectations (GE) is an open-source data quality framework that helps us validate, document, and profile data through reusable “expectations.”
It integrates smoothly with Snowflake, making it an excellent tool for automated data validation in pipelines.

Instead of writing ad-hoc SQL checks, we define data quality rules (“expectations”) in Python or YAML — and Great Expectations automatically checks our Snowflake data against them. It integrates with Snowflake via Python or dbt, allowing us to define expectations such as:

  • “No nulls in customer_id.”
  • “Order dates should not be in the future.”
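Persisted expectations are stored as a suite. A hedged sketch of what the two rules above might look like as an expectation suite, using the real `expect_column_values_to_not_be_null` and `expect_column_values_to_be_between` expectation types (table and column names are assumptions):

```json
{
  "expectation_suite_name": "customers_suite",
  "expectations": [
    {
      "expectation_type": "expect_column_values_to_not_be_null",
      "kwargs": { "column": "customer_id" }
    },
    {
      "expectation_type": "expect_column_values_to_be_between",
      "kwargs": { "column": "order_date", "max_value": "2025-12-31" }
    }
  ]
}
```

When a checkpoint runs, Great Expectations translates each expectation into queries against the configured Snowflake datasource and produces a pass/fail validation report.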

Snowflake’s Native Testing Options

Snowflake provides several built-in capabilities that allow us to implement data quality checks directly within the platform, without relying on external tools.

By using SQL, stored procedures, and tasks, we can automate data validation, error logging, and continuous monitoring as part of our data ingestion and transformation workflows.
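The pieces fit together as a stored procedure that records failures and a task that schedules it. A sketch in Snowflake Scripting; the `data_quality_log` table, `test_wh` warehouse, and table names are assumptions carried over from earlier examples:

```sql
-- Log a row-count mismatch between source and target, if any
CREATE OR REPLACE PROCEDURE run_row_count_check()
RETURNS STRING
LANGUAGE SQL
AS
$$
DECLARE
    src_count INTEGER;
    tgt_count INTEGER;
BEGIN
    SELECT COUNT(*) INTO :src_count FROM source_db.public.customers;
    SELECT COUNT(*) INTO :tgt_count FROM target_db.public.customers;
    IF (src_count <> tgt_count) THEN
        INSERT INTO data_quality_log (check_name, failed_at)
        VALUES ('row_count_mismatch', CURRENT_TIMESTAMP());
    END IF;
    RETURN 'done';
END;
$$;

-- Run the check every morning at 06:00 UTC
CREATE OR REPLACE TASK daily_dq_check
    WAREHOUSE = test_wh
    SCHEDULE  = 'USING CRON 0 6 * * * UTC'
AS
    CALL run_row_count_check();
```

Note that a newly created task is suspended until it is resumed with `ALTER TASK daily_dq_check RESUME;`.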

Best Practices

Test Early, Test Often: Validate data at every stage – ingestion, transformation, and output.

Keep Tests Version-Controlled: Store SQL test scripts or dbt tests in version control alongside the rest of the codebase.

Monitor and Log Failures: Store failed test results in a data_quality_log table.

Use Small Test Warehouses for validation jobs to optimize cost.

Document Everything: Maintain clear documentation for test cases, expected results, and data lineage.

Conclusion

In today’s data-driven world, organizations rely heavily on platforms like Snowflake to power analytics, reporting, and decision-making. But the value of data is only as strong as its quality, reliability, and security—and that’s where testing plays a critical role.

Testing in Snowflake is not just a technical task; it’s a strategic practice that ensures our data ecosystem remains trustworthy and performant. Whether we are validating ETL pipelines, verifying schema changes, or monitoring query performance, a well-defined testing strategy helps prevent costly errors, improves stakeholder confidence, and supports compliance with data governance standards.


Doan Luong

I am responsible not only for overseeing and coordinating all testing activities but also for actively participating in hands-on testing efforts. I lead the test team in the project to ensure high-quality deliverables, efficient execution of test plans, and alignment with the project's overall objectives.
