Continuously ingesting data from S3 into ClickHouse

In this recipe, we'll learn how to ingest data continuously from an S3 bucket into ClickHouse.
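
The bucket is expected to contain newline-delimited JSON files, one order per line. A purely illustrative record (the values below are made up; the field names match the schema defined next) might look like this:

{"orderDate":"2024-03-15 10:24:03","gender":"female","customerId":"0b2f7d8e-4c1a-4f3b-9a6d-2e5c8f1a7b3d","cost":24.99,"name":"Jane Doe","creditCardNumber":"4111111111111111","address":"1 Example Street","orderId":"5a9e3c7d-1f2b-4e8a-b6c4-9d0f7a2e5b1c"}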

Download ClickHouse

curl https://clickhouse.com/ | sh

ClickHouse Server

mkdir clickhouse-server
cp clickhouse clickhouse-server/
cd clickhouse-server

Start the server

./clickhouse server

ClickHouse Client
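
The remaining statements are run from the ClickHouse client. Start it in a second terminal (the server keeps running in the foreground in the first one):

./clickhouse client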

Create a table with the S3Queue table engine

CREATE TABLE ordersQueue (
    orderDate DateTime, 
    gender String,
    customerId UUID,
    cost Float32,
    name String,
    creditCardNumber String,
    address String,
    orderId UUID
)
ENGINE = S3Queue(
    'https://s3queue.clickhouse.com.s3.eu-north-1.amazonaws.com/data/*.json',
    JSONEachRow
)
SETTINGS 
    mode = 'ordered', 
    s3queue_enable_logging_to_s3queue_log = 1;
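
Because s3queue_enable_logging_to_s3queue_log is set, per-file processing events are recorded in the system.s3queue_log system table. Once ingestion starts, a query roughly like the one below (a sketch; exact column names can vary between ClickHouse versions) shows which files have been picked up:

SELECT event_time, file_name, rows_processed, status
FROM system.s3queue_log
ORDER BY event_time DESC
LIMIT 10;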

Create a table with the MergeTree table engine

CREATE TABLE orders (
    orderDate DateTime, 
    gender String,
    customerId UUID,
    cost Float32,
    name String,
    creditCardNumber String,
    address String,
    orderId UUID
)
ENGINE = MergeTree 
ORDER BY (customerId, orderDate);

Create a materialized view that reads data from S3 and writes it into the orders table

CREATE MATERIALIZED VIEW ordersConsumer TO orders AS 
SELECT * 
FROM ordersQueue;
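
Optionally, you can sanity-check that the queue table, the target table, and the materialized view were all created:

SELECT name, engine
FROM system.tables
WHERE database = currentDatabase()
  AND name IN ('ordersQueue', 'orders', 'ordersConsumer');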

Querying the data

We should now see data coming into the orders table. We can check on the ingestion progress by running the following query:

FROM orders
SELECT count(),
       formatReadableQuantity(count()) AS countFriendly,
       now()
FORMAT PrettyNoEscapes;
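
To explore the ingested data itself, an aggregation along these lines (a sketch based on the schema above) breaks the orders down by hour:

SELECT toStartOfHour(orderDate) AS hour,
       count() AS orders,
       round(sum(cost), 2) AS totalCost
FROM orders
GROUP BY hour
ORDER BY hour
FORMAT PrettyNoEscapes;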