You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
When one step in a makefile fails, subsequent steps fail (by default)
Dependencies between queries must be declared in the makefile as well as the query
No column-level lineage, so it is hard to tell the effect of changing one column of one query
No local testing - can only be tested in BigQuery itself.
Alternatives
DBT: what most people use
Ugly jinja templating
No column-level lineage
SQLMesh: a new product
Implements everything as a view into a physical table in a different dataset.
Specifics of SQLMesh
Is pricing affected by views being defined as SELECT *? No. BigQuery is smart enough to only use the data that is actually used.
Permissions? SQLMesh doesn't manage them. In general, SQLMesh assumes that there is only one BigQuery project, whereas we have separate ones for prod, staging and dev. However, we could choose a gateway when invoking a sqlmesh command, to target the project that we need.
Speed
SQLMesh is slow if it stores its state in BigQuery, because BigQuery writes are slow. SQLMesh can be configured to use a different, faster database for its state, e.g. a tiny postgres instance, but that would require a lot more configuration, so we can put up with the slowness until it becomes annoying enough.
Datasets and permissions
We have some existing datasets. In the simple case:
publishing-api
private
public
SQLMesh can be configured to know the schemas of tables in the publishing-api dataset.
When we define a "model" (SQL query) to derive data from the publishing-api dataset and to put the results into the private and public datasets, SQLMesh will automatically create two more datasets, with oddly named tables. For example, if we create a table called public.current_editions, we will end up with:
public
current_editions: a view defined as SELECT * FROM sqlmesh__public.public__current_editions_123456789
sqlmesh__public
public__current_editions__123456789: a physical table
public__current_editions__123456789__temp: another physical table (empty?)
If we are disciplined about only using SQLMesh for the production 'environment', then it will only create one dataset per dataset that we use, and they will have predictable names, so we could use terraform to authorise all the views in the public dataset to access data in the sqlmesh__public dataset.
If we are ill-disciplined, and use SQLMesh to create other 'environments', then it will create other datasets such as public__dev and public__staging, which will lack permissions to access the data in the dataset sqlmesh__public.
The text was updated successfully, but these errors were encountered:
Currently we manage SQL queries with Makefiles.
Alternatives
DBT: what most people use
SQLMesh: a new product
Specifics of SQLMesh
SELECT *
? No. BigQuery is smart enough to only use the data that is actually used.gateway
when invoking asqlmesh
command, to target the project that we need.Speed
SQLMesh is slow if it stores its state in BigQuery, because BigQuery writes are slow. SQLMesh can be configured to use a different, faster database for its state, e.g. a tiny postgres instance, but that would require a lot more configuration, so we can put up with the slowness until it becomes annoying enough.
Datasets and permissions
We have some existing datasets. In the simple case:
publishing-api
private
public
SQLMesh can be configured to know the schemas of tables in the
publishing-api
dataset.When we define a "model" (SQL query) to derive data from the
publishing-api
dataset and to put the results into theprivate
andpublic
datasets, SQLMesh will automatically create two more datasets, with oddly named tables. For example, if we create a table calledpublic.current_editions
, we will end up with:public
current_editions
: a view defined asSELECT * FROM sqlmesh__public.public__current_editions_123456789
sqlmesh__public
public__current_editions__123456789
: a physical tablepublic__current_editions__123456789__temp
: another physical table (empty?)If we are disciplined about only using SQLMesh for the production 'environment', then it will only create one dataset per dataset that we use, and they will have predictable names, so we could use terraform to authorise all the views in the
public
dataset to access data in thesqlmesh__public
dataset.If we are ill-disciplined, and use SQLMesh to create other 'environments', then it will create other datasets such as
public__dev
andpublic__staging
, which will lack permissions to access the data in the datasetsqlmesh__public
.The text was updated successfully, but these errors were encountered: