Vertex AI and dbt

For years, our data science and data engineering teams lived in different worlds.

The data scientists would be in Python notebooks, doing complex feature engineering on CSV extracts. We, in data engineering, would be building beautiful, aggregated data models in dbt and SQL.

Then came the inevitable, painful “productionisation” process where we’d try to translate their Python logic into performant SQL. It was slow, error-prone, and always led to the cardinal sin of MLOps: training-serving skew. The model that got deployed was never quite the same as the one they’d trained.

The Vertex AI Feature Store, combined with dbt, became our peace treaty. It created a bridge between our two worlds and allowed each team to do what they do best, while ensuring the final result was perfectly consistent. We realized that 90% of feature engineering is just complex, time-aware aggregation—exactly what analytics engineers excel at with dbt.

Our workflow is now a collaboration. The analytics engineer builds a dbt model called customer_features. They write SQL to calculate metrics like orders_last_30d or avg_order_value_90d, with dbt's full tooling available to test and document that logic.

The output is a clean, wide table in BigQuery, one row per customer, refreshed daily.
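To make that concrete: the model itself is ordinary SQL living in the dbt project. Here is a minimal sketch of the underlying query, wrapped in a Python BigQuery client call that mimics dbt's daily table materialization. All project, dataset, and source column names (my-project, shop.orders, order_ts, order_value) are illustrative placeholders, not our real schema.

```python
from google.cloud import bigquery

# The heart of the customer_features model: time-windowed aggregations
# per customer. In dbt this is plain SQL, with {{ ref('orders') }} in
# place of the hard-coded table name below.
FEATURE_SQL = """
SELECT
  customer_id,
  COUNTIF(order_ts >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY))
    AS orders_last_30d,
  AVG(IF(order_ts >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 90 DAY),
         order_value, NULL)) AS avg_order_value_90d,
  CURRENT_TIMESTAMP() AS feature_timestamp  -- event time, needed later for ingestion
FROM `my-project.shop.orders`
GROUP BY customer_id
"""

client = bigquery.Client()

# Overwrite the wide feature table on each daily run, just as dbt's
# table materialization would.
job = client.query(
    FEATURE_SQL,
    job_config=bigquery.QueryJobConfig(
        destination="my-project.shop.customer_features",
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    ),
)
job.result()  # block until the refresh completes
```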

Once the dbt job finishes, a Cloud Composer DAG picks up the baton. A Python script, usually written by the data science team, takes that dbt-produced table and ingests it directly into our Vertex AI Feature Store.
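That ingestion script is a thin wrapper around the Vertex AI SDK. A minimal sketch, assuming a feature store and a customer entity type were created up front, with feature IDs registered to match the dbt column names (all resource names here are placeholders):

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="europe-west1")

# Look up the existing feature store and entity type; both are created
# once, up front, not on every run.
fs = aiplatform.Featurestore(featurestore_name="customer_featurestore")
customer = fs.get_entity_type(entity_type_id="customer")

# Ingest the dbt-produced table. Column names in BigQuery must match
# the feature IDs registered on the entity type.
customer.ingest_from_bq(
    feature_ids=["orders_last_30d", "avg_order_value_90d"],
    feature_time="feature_timestamp",  # event-time column from the dbt model
    bq_source_uri="bq://my-project.shop.customer_features",
    entity_id_field="customer_id",
    sync=True,  # block so the Composer task fails loudly if ingestion does
)
```

In the DAG, this runs as a task downstream of the dbt job, so ingestion only happens after the models build cleanly.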

That’s it. From that moment, the feature is production-ready. Data scientists can use the Feature Store’s batch API to pull millions of rows for training, and the production application can use the low-latency online serving API to pull the exact same features for a single customer during a live prediction.
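Both reads go through the same resources. A sketch of the two access paths, reusing the placeholder names from above; the read-instances table for batch serving is an assumed table of customer IDs and label timestamps:

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="europe-west1")
fs = aiplatform.Featurestore(featurestore_name="customer_featurestore")

# Training path: batch-export point-in-time-correct features to BigQuery.
# ml.training_instances is a table with a `customer` column of entity IDs
# and a `timestamp` column saying when each label was observed.
fs.batch_serve_to_bq(
    bq_destination_output_uri="bq://my-project.ml.training_features",
    serving_feature_ids={"customer": ["orders_last_30d", "avg_order_value_90d"]},
    read_instances_uri="bq://my-project.ml.training_instances",
)

# Serving path: low-latency read of the latest values for one customer
# during a live prediction. Returns a one-row pandas DataFrame.
customer = fs.get_entity_type(entity_type_id="customer")
features = customer.read(
    entity_ids=["customer_1234"],
    feature_ids=["orders_last_30d", "avg_order_value_90d"],
)
```

Because batch serving respects the timestamps in the read-instances table, the training data is point-in-time correct; and because both paths read the values the dbt model produced, there is no second implementation of the logic left to drift.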

We stopped translating logic and started sharing results. Training-serving skew vanished overnight.