NOTE: THIS DRIVER/CONNECTOR IS NOT OFFICIALLY AFFILIATED WITH SNOWFLAKE, NOR IS IT OFFICIALLY SUPPORTED BY THEM.
An Elixir driver for Snowflake, the cloud data platform.
It supports both a pure-Elixir JSON path and decoding Arrow streaming files via snowflake_arrow_elixir, a Rust-backed library that decodes the Arrow streaming format into Elixir terms.
Mix.install([
{:req_snowflake, github: "joshuataylor/req_snowflake"}
])
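If you also want Arrow decoding (see below), snowflake_arrow needs to be in your dependencies as well; a sketch, with the package coordinates assumed here rather than taken from this README:
Mix.install([
  {:req_snowflake, github: "joshuataylor/req_snowflake"},
  # assumed coordinates for the Arrow decoder; check the snowflake_arrow project for the published ones
  {:snowflake_arrow, github: "joshuataylor/snowflake_arrow"}
])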
# With plain string query
Req.new()
|> ReqSnowflake.attach(
username: "rosebud",
password: "hunter2",
account_name: "foobar",
region: "us-east-1",
warehouse: "compute_wh", # optional
role: "myrole", # optional
database: "mydb", # optional
schema: "myschema" # optional,
session_parameters: %{} # Passing in session parameters from
parallel_downloads: 10 # optional, but recommended. Defaults to 5 (what the other connectors default to).
)
|> Req.post!(snowflake_query: "select L_ORDERKEY, L_PARTKEY from snowflake_sample_data.tpch_sf1.lineitem limit 2").body
#=>
# %ReqSnowflake.Result{
# columns: ["L_ORDERKEY", "L_PARTKEY"],
# total_rows: 2,
# rows: [[3_000_001, 14406], [3_000_002, 34422]],
# success: true
# }
# With query parameters for inserting
Req.new()
|> ReqSnowflake.attach(
username: "rosebud",
password: "hunter2",
account_name: "foobar",
region: "us-east-1",
warehouse: "compute_wh", # optional
role: "myrole", # optional
database: "mydb", # optional
schema: "myschema" # optional
)
|> Req.post!(
query: "INSERT INTO \"foo\".\"bar\".\"baz\" (\"hello\") VALUES (?)",
bindings: %{"1" => %{type: "TEXT", value: "xxx"}}
)
#=>
# %ReqSnowflake.Result{
# columns: ["number of rows inserted"],
# total_rows: 1,
# rows: [[1]],
# success: true
# }
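Multiple placeholders use the same positional bindings map, keyed by 1-based position as strings. A minimal sketch reusing the connection options above (the two-column table here is assumed):
Req.new()
|> ReqSnowflake.attach(
  # your connection details here
)
|> Req.post!(
  snowflake_query: "INSERT INTO \"foo\".\"bar\".\"baz\" (\"hello\", \"world\") VALUES (?, ?)",
  bindings: %{
    "1" => %{type: "TEXT", value: "xxx"},
    "2" => %{type: "TEXT", value: "yyy"}
  }
)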
It uses the Snowflake REST API to communicate with Snowflake, advertising an earlier client version so that results are returned as JSON (with Arrow supported when using snowflake_arrow). This is the same REST API used by the Python, Golang, NodeJS and other official connectors to send requests to Snowflake, so it is stable and changes are communicated.
This library does not use the Snowflake SQL API, which is limited in its implementation and features.
Right now the library doesn't support MFA or private key auth (both are on the TODO list below), so you'll need to connect using a username and password. Private key auth is highly recommended once it lands, as it makes rotating credentials easier.
One of the major notes when using Ecto is that you will need to enable Snowflake's QUOTED_IDENTIFIERS_IGNORE_CASE
setting, documented here: https://docs.snowflake.com/en/sql-reference/identifiers-syntax.html#third-party-tools-and-case-insensitive-identifier-resolution
Note that this can be set at the account level or, if needed, at the session level, as shown below.
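For example, a sketch of enabling it for a single session via session_parameters, reusing the example credentials above:
Req.new()
|> ReqSnowflake.attach(
  username: "rosebud",
  password: "hunter2",
  account_name: "foobar",
  region: "us-east-1",
  # applies the identifier setting to this session only
  session_parameters: %{QUOTED_IDENTIFIERS_IGNORE_CASE: true}
)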
- Running queries and inserts.
- JSON row decoding, parallelised using Flow, which makes decoding much faster. Using jiffy is also recommended; benchmarking shows a 2x speedup due to the size of the JSON files Snowflake returns.
- Arrow row decoding, using snowflake_arrow (see the sketch after this list).
- Running async queries and getting back a query ID, or polling for results.
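A sketch of opting into Arrow decoding, reusing the example credentials and assuming snowflake_arrow is in your dependencies:
Req.new()
|> ReqSnowflake.attach(
  username: "rosebud",
  password: "hunter2",
  account_name: "foobar",
  region: "us-east-1",
  # requires snowflake_arrow in your project (see the arrow option below)
  arrow: true
)
|> Req.post!(snowflake_query: "select L_ORDERKEY, L_PARTKEY from snowflake_sample_data.tpch_sf1.lineitem limit 2").body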
There are a lot of options you can pass in, including Snowflake session parameters, as documented below.
- snowflake_query (string, required): Your Snowflake query. Example: select L_ORDERKEY, L_PARTKEY from snowflake_sample_data.tpch_sf1.lineitem limit 2
- username (string, required): Your Snowflake username.
- password (string, required): Your Snowflake password.
- account_name (string, required): Your account name, found before the region name in the URL. For https://abc1234.us-east-1.snowflakecomputing.com it would be abc1234.
- region (string, required): Your Snowflake region, found after your account name in the URL (us-east-1 in the example above).
- arrow (boolean, optional): Whether or not to use Arrow. You must have snowflake_arrow included in your project for this to work.
- cache_token (optional): Cache the login token between queries, for up to 10 minutes (the standard login token lifetime for Snowflake). If you change any parameter other than the query, you will be logged in again.
- warehouse (optional): The warehouse to use. If none is provided, the user's default warehouse is used.
- role (optional): The role to use. If none is provided, the user's default role is used.
- database (optional): The database to use. If none is provided, the user's default database is used.
- schema (string, optional): The schema to use. If none is provided, the user's default schema is used.
- application_name (string, optional): Application name to pass. By default no application name is sent.
- bindings (map, optional): Any bindings to pass.
- session_parameters (map, optional): Any session parameters from https://docs.snowflake.com/en/sql-reference/parameters.html. Example: session_parameters: %{ROWS_PER_RESULTSET: 50} will return only 50 results.
- parallel_downloads (integer, optional): How many parallel downloads to perform from S3. Defaults to 5, which is the default for the other connectors.
- async (boolean, optional): Runs the query in async mode, returning the query ID.
- async_poll (boolean, optional): Runs the query in async mode, then polls every 5000ms (unless async_poll_interval is set) for the result (see the sketch after this list).
- async_poll_interval (integer, optional): Runs the query in async mode, then polls every interval milliseconds.
- async_poll_timeout (integer, optional): How many times to poll for the result before giving up.
- download_chunks (boolean, optional): Whether to download the chunks or just return the base64.
- table (boolean, optional): If true, returns the results with the expectation that you will use Table to process them. Defaults to false; kino_db sets this to true as it uses Table by default. If you are dealing with a lot of data, it's highly recommended to use Table, as you can stream the chunks out as you need them, saving memory, CPU and bandwidth.
- cache_results (boolean, optional): Whether to cache the downloaded chunks. If you re-iterate over the same dataset it will be cached locally so you don't have to redownload the data. False by default, but enabled for kino_db so that paging doesn't redownload chunks.
- json_library (module, optional): The JSON library used for decoding. jiffy has shown to be 2x faster and to use less memory than Jason for larger JSON blobs, as Snowflake can send large JSON files. Examples: json_library: Jason or json_library: :jiffy (:jiffy is an atom because it's an Erlang library). Defaults to Jason.
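A sketch combining a few of these options, following the pattern above of passing per-query options to Req.post! and connection-level options to attach (the interval value is only illustrative):
Req.new()
|> ReqSnowflake.attach(
  username: "rosebud",
  password: "hunter2",
  account_name: "foobar",
  region: "us-east-1",
  json_library: :jiffy,  # assumes jiffy is in your deps
  parallel_downloads: 10
)
|> Req.post!(
  snowflake_query: "select * from snowflake_sample_data.tpch_sf1.lineitem",
  async_poll: true,            # run async, then poll for the result
  async_poll_interval: 2_000   # illustrative: poll every 2 seconds instead of the 5000ms default
)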
Using Table makes it incredibly easy to manipulate and stream over the data. If you download all the chunks at once for millions of rows, you can end up using a lot of memory.
With Table, you can enumerate over the data and only the relevant chunks will be downloaded and decoded.
To get the first 200 rows, you could do the following:
Req.new()
|> ReqSnowflake.attach(
# your details here!
)
|> Req.post!(
snowflake_query: "select * from foo.bar.baz"
)
|> Map.get(:body)
|> Table.to_rows()
|> Enum.slice(0, 200)
Which results in the following:
[
%{
"ROW_NUMBER" => 688156,
"SF_ARRAY" => nil,
"SF_BOOLEAN" => false,
"SF_DATE" => ~D[2024-04-19],
"SF_DECIMAL_38_2" => 2146.89,
"SF_FLOAT" => nil,
"SF_FLOAT_TWO_PRECISION" => nil,
"SF_INTEGER" => 8562093803,
"SF_OBJECT" => nil,
"SF_TIMESTAMP" => ~N[2025-04-20 07:31:58.064000],
"SF_TIMESTAMP_LTZ" => nil,
"SF_TIMESTAMP_NTZ" => nil,
"SF_VARCHAR" => "ylPPdH0leSF3f5lK9kEC",
"SF_VARIANT_JSON" => "{\n \"key_eV6kL4tVYXLFrEyyDNol\": true\n}"
},
...
]
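If you prefer columnar access, the same response body works with Table.to_columns/1 from the table library; a sketch:
Req.new()
|> ReqSnowflake.attach(
  # your details here!
)
|> Req.post!(snowflake_query: "select * from foo.bar.baz")
|> Map.get(:body)
|> Table.to_columns()
|> Map.get("SF_INTEGER")
|> Enum.take(5)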
kino_db integration is supported out of the box, which allows you to connect to an account, create queries and slice the data. It will only download the chunks that it needs, thanks to slicing. This allows you to see results in the table extremely quickly.
- Add this to db_connection_snowflake as a generic Snowflake library for db_connection
- Add this to ecto_snowflake as a Snowflake library for Ecto
- Add support for MFA
- Add support for private key auth
- Add support for telemetry
I just want to thank the open-source community, especially db_connection/ecto/ecto_sql/postgrex for being amazing and for letting me copy most of the decoding code from there. I also want to thank @wojtekmach for the awesome work on req, and for the help converting this to req.
Thanks to @lewisvrobinson for the logo.
Copyright (C) 2022 Josh Taylor
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at [http://www.apache.org/licenses/LICENSE-2.0](http://www.apache.org/licenses/LICENSE-2.0)
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.