dbt assertions package

Link:

https://github.com/AxelThevenot/dbt-assertions

Overview

The primary use of the dbt-assertions package is to give more functionality and flexibility in how failing tests are handled.

Assertions are written into the YAML for a table, in place of tests:

models:
    - name: customer_orders
      tests:
        - dbt_assertions.generic_assertions:
            from_column: assertion_errors

      columns:
        - name: order_id
          description: Id of order

        - name: customer_id
          description: Id of Customer

        - name: assertion_errors
          assertions:
            __unique__:
              - customer_id
              - order_status

            customer_id_less_than_50:
              description: "Custom assertion that customer_id must be < 50"
              expression: "customer_id < 50"

Any assertion failures are stored in an array column in the table itself (assertion_errors in the example above). They could also be split across multiple columns if you wanted to separate them out.

The assertion results can then be used for normal dbt tests by creating a test on the added column, or they can be used to filter the data pulled through by downstream models.
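
As a rough illustration of the downstream filtering described above, a downstream model could keep only the rows whose error array is empty. This is a minimal sketch: the model name customer_orders_clean is hypothetical, ARRAY_LENGTH is BigQuery syntax (other warehouses name this function differently), and it assumes rows with no failures carry an empty array rather than NULL.

```sql
-- models/customer_orders_clean.sql (hypothetical downstream model)
SELECT
    order_id,
    customer_id
FROM {{ ref('customer_orders') }}
-- Keep only rows with no recorded assertion failures.
-- Assumption: BigQuery's ARRAY_LENGTH; an empty array means the row passed.
WHERE ARRAY_LENGTH(assertion_errors) = 0
```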

Another piece of functionality in the package is the __unique__ assertion, which appears in the YAML above. It allows uniqueness tests on a composite key, as it tests for uniqueness across all the columns provided (customer_id and order_status in the example).
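
To make the composite-key idea concrete, the check is roughly equivalent to looking for (customer_id, order_status) pairs that occur more than once. The query below is a plain-SQL sketch of that logic, not the package's implementation; it assumes the customer_orders table from the YAML example also has an order_status column.

```sql
-- List every (customer_id, order_status) pair that appears on more than one row.
-- An empty result means the composite key is unique.
SELECT
    customer_id,
    order_status,
    COUNT(*) AS row_count
FROM customer_orders
GROUP BY customer_id, order_status
HAVING COUNT(*) > 1
```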

Final Thoughts

It has some potential use early on in a project where data exploration is still ongoing, as it will help you investigate the rows that are failing data quality checks. However, I don't think it would be that hard to create a similar outcome in an ad hoc way using SQL.
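
For comparison, an ad hoc version of the same checks might look something like the query below. It is only a sketch under a few assumptions: it queries the customer_orders table directly, takes the custom rule to be customer_id < 50 as stated in the assertion's description, and returns one flag column per check instead of the array column the package maintains.

```sql
WITH checked AS (
    SELECT
        order_id,
        customer_id,
        order_status,
        -- Number of rows sharing this (customer_id, order_status) pair
        COUNT(*) OVER (PARTITION BY customer_id, order_status) AS key_count
    FROM customer_orders
)
SELECT
    order_id,
    customer_id,
    order_status,
    CASE WHEN key_count > 1 THEN 1 ELSE 0 END AS fails_unique,
    CASE WHEN customer_id >= 50 THEN 1 ELSE 0 END AS fails_customer_id_less_than_50
FROM checked
-- Return only the rows that break at least one rule.
-- Rows with a NULL customer_id are not flagged by the second rule.
WHERE key_count > 1
   OR customer_id >= 50
```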

An ML dbt project, where data quality needs to be monitored and SQL knowledge may be lower, might be an ideal candidate for this package.