Continuously ingesting data from S3 into ClickHouse

In this recipe, we'll learn how to ingest data continuously from an S3 bucket into ClickHouse.
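
The bucket is expected to contain newline-delimited JSON files, one order per line. A purely illustrative record (the values below are made up; the field names match the schema defined next) might look like this:

{"orderDate":"2024-03-15 10:24:03","gender":"female","customerId":"0b2f7d8e-4c1a-4f3b-9a6d-2e5c8f1a7b3d","cost":24.99,"name":"Jane Doe","creditCardNumber":"4111111111111111","address":"1 Example Street","orderId":"5a9e3c7d-1f2b-4e8a-b6c4-9d0f7a2e5b1c"}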

Download ClickHouse

curl https://clickhouse.com/ | sh

ClickHouse Server

mkdir clickhouse-server
cp clickhouse clickhouse-server/
cd clickhouse-server

Start the server

./clickhouse server

ClickHouse Client
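
The remaining statements are run from the ClickHouse client. Start it in a second terminal (the server keeps running in the foreground in the first one):

./clickhouse client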

Create a table with the S3Queue table engine

CREATE TABLE ordersQueue (
    orderDate DateTime, 
    gender String,
    customerId UUID,
    cost Float32,
    name String,
    creditCardNumber String,
    address String,
    orderId UUID
)
ENGINE = S3Queue(
    'https://s3queue.clickhouse.com.s3.eu-north-1.amazonaws.com/data/*.json',
    JSONEachRow
)
SETTINGS 
    mode = 'ordered', 
    s3queue_enable_logging_to_s3queue_log = 1;
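
Because s3queue_enable_logging_to_s3queue_log is set, per-file processing events are recorded in the system.s3queue_log system table. Once ingestion starts, a query roughly like the one below (a sketch; exact column names can vary between ClickHouse versions) shows which files have been picked up:

SELECT event_time, file_name, rows_processed, status
FROM system.s3queue_log
ORDER BY event_time DESC
LIMIT 10;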

Create a table with the MergeTree table engine

CREATE TABLE orders (
    orderDate DateTime, 
    gender String,
    customerId UUID,
    cost Float32,
    name String,
    creditCardNumber String,
    address String,
    orderId UUID
)
ENGINE = MergeTree 
ORDER BY (customerId, orderDate);

Create a materialized view that reads data from S3 and writes it into the orders table

CREATE MATERIALIZED VIEW ordersConsumer TO orders AS 
SELECT * 
FROM ordersQueue;
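
Optionally, you can sanity-check that the queue table, the target table, and the materialized view were all created:

SELECT name, engine
FROM system.tables
WHERE database = currentDatabase()
  AND name IN ('ordersQueue', 'orders', 'ordersConsumer');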

Querying the data

We should now see data coming into the orders table. We can check on the ingestion progress by running the following query:

FROM orders
SELECT count(),
       formatReadableQuantity(count()) AS countFriendly,
       now()
FORMAT PrettyNoEscapes;
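
To explore the ingested data itself, an aggregation along these lines (a sketch based on the schema above) breaks the orders down by hour:

SELECT toStartOfHour(orderDate) AS hour,
       count() AS orders,
       round(sum(cost), 2) AS totalCost
FROM orders
GROUP BY hour
ORDER BY hour
FORMAT PrettyNoEscapes;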