
Creating Your First Quality Checks
Basic Column Validations
Start with fundamental checks at the column level:
Completeness Check: Ensure required fields contain values
sql
CREATE OR REPLACE PROCEDURE check_email_completeness()
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
DECLARE
null_percentage FLOAT;
BEGIN
SELECT (COUNT_IF(email IS NULL) * 100.0 / COUNT(*))
INTO :null_percentage
FROM customers;
IF (null_percentage > 5) THEN
RETURN 'FAIL: ' || null_percentage || '% of email addresses are missing';
ELSE
RETURN 'PASS: Email completeness meets standards';
END IF;
END;
$$;
Format Validation: Verify data matches expected patterns
sql
CREATE OR REPLACE PROCEDURE validate_phone_format()
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
DECLARE
result VARCHAR;
BEGIN
-- A SQL procedure body must be a Snowflake Scripting block; capture the result, then RETURN it
SELECT
CASE
WHEN COUNT(*) = 0 THEN 'PASS: All phone numbers match expected format'
ELSE 'FAIL: ' || COUNT(*) || ' phone numbers have invalid format'
END
INTO :result
FROM customers
WHERE NOT REGEXP_LIKE(phone, '\\d{3}-\\d{3}-\\d{4}');
RETURN result;
END;
$$;
Range Check: Ensure numeric values fall within acceptable ranges
sql
CREATE OR REPLACE PROCEDURE validate_price_ranges()
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
DECLARE
result VARCHAR;
BEGIN
SELECT
CASE
WHEN COUNT(*) = 0 THEN 'PASS: All prices within valid range'
ELSE 'FAIL: ' || COUNT(*) || ' products have invalid prices'
END
INTO :result
FROM products
WHERE price <= 0 OR price > 10000;
RETURN result;
END;
$$;
Table-Level Quality Rules
Move beyond columns to assess entire tables:
Referential Integrity: Ensure foreign keys have matching primary keys
sql
CREATE OR REPLACE PROCEDURE check_order_integrity()
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
DECLARE
orphaned_records INTEGER;
BEGIN
SELECT COUNT(*) INTO :orphaned_records
FROM orders o
LEFT JOIN customers c ON o.customer_id = c.customer_id
WHERE c.customer_id IS NULL;
IF (orphaned_records > 0) THEN
RETURN 'FAIL: ' || orphaned_records || ' orders reference non-existent customers';
ELSE
RETURN 'PASS: All orders have valid customer references';
END IF;
END;
$$;
Uniqueness Verification: Check for unexpected duplicates
sql
CREATE OR REPLACE PROCEDURE check_product_uniqueness()
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
DECLARE
duplicate_count INTEGER;
BEGIN
SELECT COUNT(*) INTO :duplicate_count
FROM (
SELECT sku, COUNT(*) as count
FROM products
GROUP BY sku
HAVING COUNT(*) > 1
);
IF (duplicate_count > 0) THEN
RETURN 'FAIL: ' || duplicate_count || ' duplicate product SKUs found';
ELSE
RETURN 'PASS: All product SKUs are unique';
END IF;
END;
$$;
Data Balance: Verify expected proportions in categorical data
sql
CREATE OR REPLACE PROCEDURE check_status_distribution()
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
DECLARE
cancelled_percentage FLOAT;
BEGIN
SELECT (COUNT_IF(status = 'Cancelled') * 100.0 / COUNT(*))
INTO :cancelled_percentage
FROM orders;
IF (cancelled_percentage > 10) THEN
RETURN 'WARN: Cancelled orders (' || cancelled_percentage || '%) exceed expected threshold';
ELSE
RETURN 'PASS: Order status distribution within expected parameters';
END IF;
END;
$$;
Simple Data Testing Patterns
Implement these common testing approaches:
Year-over-Year Comparison: Check for unexpected trend changes
sql
CREATE OR REPLACE PROCEDURE check_sales_trends()
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
DECLARE
yoy_change FLOAT;
BEGIN
SELECT ((current_year_sales - previous_year_sales) * 100.0 / previous_year_sales)
INTO :yoy_change
FROM (
SELECT
SUM(CASE WHEN YEAR(order_date) = YEAR(CURRENT_DATE()) THEN amount ELSE 0 END) as current_year_sales,
SUM(CASE WHEN YEAR(order_date) = YEAR(CURRENT_DATE()) - 1 THEN amount ELSE 0 END) as previous_year_sales
FROM orders
);
IF (ABS(yoy_change) > 25) THEN
RETURN 'ALERT: Sales changed by ' || yoy_change || '% year-over-year';
ELSE
RETURN 'PASS: Sales trends within expected range';
END IF;
END;
$$;
Statistical Outlier Detection: Identify unusual data points
sql
CREATE OR REPLACE PROCEDURE detect_order_outliers()
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
DECLARE
outlier_count INTEGER;
avg_order FLOAT;
stddev_order FLOAT;
BEGIN
SELECT AVG(amount), STDDEV(amount)
INTO :avg_order, :stddev_order
FROM orders;
SELECT COUNT(*)
INTO :outlier_count
FROM orders
WHERE amount > :avg_order + (3 * :stddev_order);
RETURN 'Found ' || outlier_count || ' order amount outliers (>3 std deviations)';
END;
$$;
Change Detection: Monitor for unexpected data modifications
sql
CREATE OR REPLACE PROCEDURE monitor_customer_changes()
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
DECLARE
change_percentage FLOAT;
BEGIN
SELECT (COUNT(*) * 100.0 / (SELECT COUNT(*) FROM customers))
INTO :change_percentage
FROM customers_history
WHERE modified_date > DATEADD(hour, -24, CURRENT_TIMESTAMP());
IF (change_percentage > 5) THEN
RETURN 'ALERT: ' || change_percentage || '% of customer records modified in last 24 hours';
ELSE
RETURN 'PASS: Customer change rate within normal limits';
END IF;
END;
$$;
Using Snowflake's Quality Dashboard
Accessing the Dashboard
Navigate to the quality dashboard through:
- The Snowflake web interface via Governance → Data Quality
- Custom Snowsight dashboards pointing to quality metrics
- External BI tools connected to your quality results tables
Understanding Key Metrics
The dashboard provides several essential metrics:
- Quality Score: The aggregated measure of overall data quality (0-100)
- Failure Rate: Percentage of quality checks that fail (a sample query follows this list)
- Issue Trends: Patterns in quality problems over time
- Impact Assessment: Severity classification of detected issues
- Resolution Time: How quickly quality issues are addressed
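If your checks log their outcomes to the validation_results table used later in this guide (with result strings beginning 'PASS', 'FAIL', or 'WARN'), the failure rate can be computed directly. A minimal sketch, assuming those column names:
sql
-- Sketch: daily failure rate from the assumed validation_results log table
SELECT
DATE_TRUNC('day', check_date) AS check_day,
COUNT(*) AS total_checks,
COUNT_IF(result LIKE 'FAIL%') AS failed_checks,
ROUND(COUNT_IF(result LIKE 'FAIL%') * 100.0 / COUNT(*), 2) AS failure_rate_pct
FROM data_governance.quality.validation_results
GROUP BY check_day
ORDER BY check_day DESC;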
Customizing Your View
Tailor the dashboard to meet your specific needs:
Create Custom Views: Focus on specific datasets or quality dimensions
sql
CREATE OR REPLACE VIEW data_governance.quality.financial_data_quality AS
SELECT
check_date,
dataset_name,
check_name,
result,
impact_level
FROM data_governance.quality.validation_results
WHERE dataset_name IN ('orders', 'invoices', 'payments')
ORDER BY check_date DESC;
- Set Priority Indicators: Highlight critical quality metrics
- Configure Time Windows: Adjust the analysis period for trend detection
- Define Custom Thresholds: Set organization-specific quality standards (see the sketch after this list)
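To avoid hard-coding limits such as the 5% used in check_email_completeness, thresholds can live in a small reference table that procedures read at run time. A minimal sketch, using a hypothetical check_thresholds table:
sql
-- Hypothetical threshold reference table; names and values are illustrative
CREATE TABLE IF NOT EXISTS data_governance.quality.check_thresholds (
check_name VARCHAR,
max_failure_pct FLOAT
);
INSERT INTO data_governance.quality.check_thresholds VALUES
('email_completeness', 5.0),
('cancelled_order_rate', 10.0);
A check procedure can then SELECT its limit INTO a variable instead of comparing against a literal.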
Automating Quality Monitoring
Setting Up Scheduled Checks
Implement regular quality verification:
1. Create Task Schedules: Automate quality checks at appropriate intervals
sql
CREATE OR REPLACE TASK daily_customer_validation
WAREHOUSE = quality_monitoring_wh
SCHEDULE = 'USING CRON 0 7 * * * America/New_York'
AS
CALL validate_customer_data();
2. Cascade Dependencies: Chain related checks together
3. Conditional Execution: Run deeper checks only when basic validation passes
sql
CREATE OR REPLACE TASK verify_order_data
WAREHOUSE = quality_monitoring_wh
AFTER daily_customer_validation
AS
CALL validate_orders_completeness();
sql
CREATE OR REPLACE PROCEDURE conditional_product_validation()
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
DECLARE
basic_check_result VARCHAR;
BEGIN
CALL check_product_uniqueness() INTO :basic_check_result;
IF (STARTSWITH(basic_check_result, 'PASS')) THEN
CALL validate_price_ranges();
END IF;
RETURN 'Validation sequence completed';
END;
$$;
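Note that newly created tasks are suspended by default, so the schedule only takes effect once the tasks are resumed (dependent tasks before the root):
sql
-- Tasks are created in a suspended state; resume dependents first, then the root task
ALTER TASK verify_order_data RESUME;
ALTER TASK daily_customer_validation RESUME;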
Creating Basic Alerts
Establish notification systems:
1. Configure Alert Thresholds: Define when notifications should trigger
sql
CREATE OR REPLACE PROCEDURE email_on_failure(check_name STRING, result STRING)
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
BEGIN
IF (STARTSWITH(result, 'FAIL')) THEN
-- SYSTEM$SEND_EMAIL takes: integration name, recipient list, subject, body
CALL SYSTEM$SEND_EMAIL(
'data_alerts',
'data.team@example.com',  -- placeholder recipient; must be allowed by the integration
'Quality Check Failure: ' || :check_name,
'The following quality check failed: ' || :check_name || '\n\nResult: ' || :result
);
END IF;
RETURN 'Alert processing complete';
END;
$$;
2. Set Up Notification Channels: Connect to email, Slack, or other systems (a sketch follows this list).
3. Implement Escalation Logic: Increase alert severity based on issue persistence.
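SYSTEM$SEND_EMAIL requires an email notification integration. A minimal sketch is shown below, where the integration name matches the one passed in email_on_failure and the recipient address is a placeholder (allowed recipients must be verified email addresses of users in your account):
sql
-- Placeholder recipient; only addresses listed here can receive alerts
CREATE NOTIFICATION INTEGRATION data_alerts
TYPE = EMAIL
ENABLED = TRUE
ALLOWED_RECIPIENTS = ('data.team@example.com');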
Responding to Quality Issues
Develop a structured response framework:
- Automated Quarantine: Isolate potentially problematic data
sql
CREATE OR REPLACE PROCEDURE quarantine_invalid_orders()
RETURNS VARCHAR
LANGUAGE SQL
AS
$$
BEGIN
INSERT INTO data_governance.quarantine.orders
SELECT * FROM orders
WHERE amount < 0 OR customer_id IS NULL;
DELETE FROM orders
WHERE amount < 0 OR customer_id IS NULL;
RETURN 'Invalid orders quarantined';
END;
$$;
- Issue Categorisation: Classify problems by type and severity
- Remediation Workflows: Standardised processes for fixing common issues
- Documentation Requirements: Record quality incidents and resolutions (see the sketch after this list)
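A lightweight way to meet the documentation requirement is an incident log that remediation procedures write to. A minimal sketch, using a hypothetical incident_log table:
sql
-- Hypothetical incident log; column names are illustrative
CREATE TABLE IF NOT EXISTS data_governance.quality.incident_log (
incident_id INTEGER AUTOINCREMENT,
detected_at TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP(),
check_name VARCHAR,
severity VARCHAR,
result VARCHAR,
resolution VARCHAR,
resolved_at TIMESTAMP_NTZ
);
The dashboard's Resolution Time metric can then be derived with DATEDIFF between detected_at and resolved_at.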
Building a Data Quality Framework
Defining Quality Standards
Establish clear metrics for data quality (a combined measurement example follows this list):
- Completeness: Percentage of required data that is present
- Accuracy: Correctness of data values
- Consistency: Uniformity across related data elements
- Timeliness: Currency and availability of data
- Validity: Conformance to defined formats and rules
- Uniqueness: Absence of unintended duplication
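Several of these dimensions can be measured with a single query per table. A minimal sketch for the customers table used earlier, reusing the phone pattern from the format check (the percentages correspond to completeness, validity, and uniqueness):
sql
-- Sketch: completeness, validity, and uniqueness metrics for the customers table
SELECT
ROUND(COUNT_IF(email IS NOT NULL) * 100.0 / COUNT(*), 2) AS email_completeness_pct,
ROUND(COUNT_IF(REGEXP_LIKE(phone, '\\d{3}-\\d{3}-\\d{4}')) * 100.0 / COUNT(*), 2) AS phone_validity_pct,
ROUND(COUNT(DISTINCT customer_id) * 100.0 / COUNT(*), 2) AS customer_id_uniqueness_pct
FROM customers;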
Implementing Team Workflows
Create effective quality management processes:
- Assign Clear Responsibilities: Define quality roles within the organization
- Establish Review Procedures: Regular evaluation of quality metrics
- Create Escalation Paths: Clear routes for addressing critical issues
- Document Standards: Maintain accessible quality requirements
- Conduct Training: Ensure team members understand quality protocols
Measuring Success
Track the effectiveness of your quality initiatives:
- Reduction in Incidents: Fewer data-related issues reported (a tracking query sketch follows this list)
- Faster Resolution: Decreased time to fix quality problems
- Improved Trust: Greater confidence in data-driven decisions
- Efficient Operations: Reduced rework due to data issues
- Compliance Success: Better regulatory audit outcomes
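The first two measures can be tracked from the same validation_results table assumed earlier. A minimal sketch counting failed checks per month:
sql
-- Monthly count of failed checks; a downward trend indicates fewer incidents over time
SELECT
DATE_TRUNC('month', check_date) AS check_month,
COUNT_IF(result LIKE 'FAIL%') AS failed_checks
FROM data_governance.quality.validation_results
GROUP BY check_month
ORDER BY check_month;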
Tips for Success and Common Mistakes to Avoid
Best Practices
- Start Small: Begin with critical datasets rather than attempting to cover everything.
- Prioritise Impact: Focus on quality dimensions that most affect business outcomes.
- Automate Intelligently: Balance automated checks with manual oversight.
- Involve Business Users: Align quality rules with actual business needs.
- Document Everything: Maintain precise records of quality standards and procedures.
Common Pitfalls
- Excessive Alerts: Creating too many alerts leads to alarm fatigue
- Rigid Thresholds: Setting unrealistic quality expectations
- Neglecting Root Causes: Fixing symptoms instead of underlying issues
- Siloed Approach: Failing to coordinate quality efforts across teams
- Static Rules: Not evolving quality checks as data and business needs change
By leveraging Snowflake's quality management tools and following these guidelines, organizations can establish robust data governance practices that ensure reliable, trustworthy data for decision-making while maintaining compliance with regulatory requirements. Integrating quality management into everyday data operations transforms governance from a burdensome obligation into a strategic advantage that delivers tangible business value.