Link:
https://github.com/AxelThevenot/dbt-assertions
Overview
The dbt-assertions package's primary purpose is to give more functionality and flexibility in how failing tests are handled.
Assertions are written into a table's YAML properties file in place of tests:
models:
  - name: customer_orders
    tests:
      - dbt_assertions.generic_assertions:
          from_column: assertion_errors
    columns:
      - name: order_id
        description: Id of order
      - name: customer_id
        description: Id of customer
      - name: assertion_errors
        assertions:
          __unique__:
            - customer_id
            - order_status
          customer_id_less_than_50:
            description: "Custom assertion that customer_id must be < 50"
            expression: "customer_id < 50"
Any failed assertions are recorded in an array column (assertion_errors in this example) in the table itself. The failures could also be split across multiple columns if you wanted to separate them out.
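To make the array column concrete, here is a minimal Python sketch of the idea. It is not the package's implementation, since dbt-assertions works in SQL. It assumes rows are plain dicts and that a Python predicate can stand in for each custom assertion's SQL expression. The function name add_assertion_errors and the sample rows are hypothetical.

```python
from typing import Any, Callable

Row = dict[str, Any]

# Hypothetical stand-in for the YAML "expression" entries: each assertion
# name maps to a predicate that a row must satisfy to pass.
ASSERTIONS: dict[str, Callable[[Row], bool]] = {
    "customer_id_less_than_50": lambda row: row["customer_id"] < 50,
}


def add_assertion_errors(rows: list[Row], column: str = "assertion_errors") -> list[Row]:
    """Return copies of rows with a list of the names of failed assertions."""
    result = []
    for row in rows:
        failed = [name for name, check in ASSERTIONS.items() if not check(row)]
        # An empty list means the row passed every assertion.
        result.append({**row, column: failed})
    return result


orders = [
    {"order_id": 1, "customer_id": 12, "order_status": "shipped"},
    {"order_id": 2, "customer_id": 75, "order_status": "pending"},
]
for row in add_assertion_errors(orders):
    print(row["order_id"], row["assertion_errors"])
# 1 []
# 2 ['customer_id_less_than_50']
```

Each row keeps its original fields and gains one list of failure names. This is what lets you investigate failing rows in place instead of only getting a pass/fail test result.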
The assertion results can then be used in two ways. They can feed ordinary dbt tests, by creating a test on the added column. They can also filter which rows downstream models pull through.
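The two uses can be sketched in Python, assuming rows already carry an assertion_errors list like the one in the YAML example. The helpers failing_rows and passing_rows are hypothetical illustrations, not package macros. The test-style helper follows the general dbt convention that a test fails when its query returns any rows.

```python
from typing import Any

Row = dict[str, Any]

# Hypothetical rows that already carry an assertion_errors list.
orders: list[Row] = [
    {"order_id": 1, "customer_id": 12, "assertion_errors": []},
    {"order_id": 2, "customer_id": 75, "assertion_errors": ["customer_id_less_than_50"]},
]


def failing_rows(rows: list[Row], column: str = "assertion_errors") -> list[Row]:
    """Test-style use: return rows with at least one failed assertion.

    Mirrors the dbt convention that a test fails when its query returns rows.
    """
    return [row for row in rows if row[column]]


def passing_rows(rows: list[Row], column: str = "assertion_errors") -> list[Row]:
    """Filter-style use: keep only rows with no failed assertions."""
    return [row for row in rows if not row[column]]


print("test fails:", bool(failing_rows(orders)))
print("rows passed downstream:", [row["order_id"] for row in passing_rows(orders)])
# test fails: True
# rows passed downstream: [1]
```

Both uses read the same column. The difference is only whether a non-empty list is treated as a failure to report or as a row to drop.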
Another feature of the package is the __unique__ assertion, visible in the YAML above. It enables uniqueness checks on a composite key, because it tests that the combination of all the listed columns (here customer_id and order_status) is unique.
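A composite-key check can be sketched in Python by counting key tuples. Assume rows are dicts and a duplicated key should add "__unique__" to each affected row's error list. Whether the package flags every row in a duplicate group or all but one is an assumption here, and this sketch flags every row. The function name flag_composite_duplicates is hypothetical.

```python
from collections import Counter
from typing import Any, Sequence

Row = dict[str, Any]


def flag_composite_duplicates(
    rows: list[Row],
    key_columns: Sequence[str],
    column: str = "assertion_errors",
    name: str = "__unique__",
) -> list[Row]:
    """Append `name` to the error list of every row whose composite key repeats."""
    keys = [tuple(row[c] for c in key_columns) for row in rows]
    counts = Counter(keys)
    result = []
    for row, key in zip(rows, keys):
        errors = list(row.get(column, []))
        if counts[key] > 1:  # the combination of columns is not unique
            errors.append(name)
        result.append({**row, column: errors})
    return result


orders = [
    {"order_id": 1, "customer_id": 12, "order_status": "shipped"},
    {"order_id": 2, "customer_id": 12, "order_status": "pending"},
    {"order_id": 3, "customer_id": 12, "order_status": "shipped"},
]
for row in flag_composite_duplicates(orders, ["customer_id", "order_status"]):
    print(row["order_id"], row["assertion_errors"])
# 1 ['__unique__']
# 2 []
# 3 ['__unique__']
```

Note that customer_id alone repeats in all three rows, yet order 2 passes. Only the combination of the listed columns is checked.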
Final Thoughts
It has some potential use early in a project, while data exploration is still ongoing, because it helps you investigate the rows that fail data quality checks. However, I don't think it would be that hard to achieve a similar outcome ad hoc in plain SQL.
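As a rough illustration of the ad hoc route, the query below builds the same kind of error information in plain SQL. It runs against an in-memory SQLite database from Python's standard sqlite3 module. The table contents and the function name ad_hoc_assertion_errors are hypothetical. SQLite has no array type, so this sketch returns a comma-separated string rather than an array, and it uses a correlated subquery for the composite-key check.

```python
import sqlite3

# Plain SQL equivalent of the two assertions in the YAML example.
# SQLite has no arrays, so failures are collected into a comma-separated string.
QUERY = """
SELECT
    o.order_id,
    TRIM(
        CASE WHEN o.customer_id >= 50 THEN 'customer_id_less_than_50,' ELSE '' END ||
        CASE WHEN (SELECT COUNT(*) FROM customer_orders AS d
                   WHERE d.customer_id = o.customer_id
                     AND d.order_status = o.order_status) > 1
             THEN '__unique__,' ELSE '' END,
        ','
    ) AS assertion_errors
FROM customer_orders AS o
ORDER BY o.order_id
"""


def ad_hoc_assertion_errors(conn: sqlite3.Connection) -> list[tuple[int, str]]:
    """Return (order_id, comma-separated failed assertions) for every row."""
    return conn.execute(QUERY).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_orders (order_id INTEGER, customer_id INTEGER, order_status TEXT)"
)
conn.executemany(
    "INSERT INTO customer_orders VALUES (?, ?, ?)",
    [(1, 12, "shipped"), (2, 75, "pending"), (3, 12, "shipped")],
)
for order_id, errors in ad_hoc_assertion_errors(conn):
    print(order_id, errors.split(",") if errors else [])
# 1 ['__unique__']
# 2 ['customer_id_less_than_50']
# 3 ['__unique__']
```

The query is short, but every new assertion means editing the CASE chain by hand. Declaring the assertions in YAML keeps that bookkeeping out of the model SQL.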
A dbt project that supports ML work might be an ideal candidate for this package, since data quality there needs to be monitored and the team's SQL knowledge may be lower.