Commit 7410e90: Blog ClickHouse
1 parent 99ea50e

1 file changed: 183 additions, 0 deletions

---
slug: 2026-03-04-sea-clickhouse
title: 'ClickHouse meets SeaORM: Arrow-powered data pipeline'
author: Chris Tsang
author_title: SeaQL Team
author_url: https://github.com/SeaQL
author_image_url: https://www.sea-ql.org/blog/img/SeaQL.png
image: https://www.sea-ql.org/blog/img/SeaORM%202.0%20Banner.png
tags: [news]
---

<img alt="SeaORM 2.0 Banner" src="/blog/img/SeaORM%202.0%20Banner.png"/>

[`sea-clickhouse`](https://github.com/SeaQL/clickhouse-rs) is a ClickHouse client that integrates with the SeaQL ecosystem. It is a soft fork of [`clickhouse-rs`](https://github.com/ClickHouse/clickhouse-rs): 100% compatible with all upstream features, with SeaQL's dynamic type and Arrow support added on top. The fork is continually rebased on upstream.

In this blog post we cover:

- **Dynamic rows with `try_get`**: fetch query results without defining any schema struct
- **Arrow RecordBatch streaming**: stream query results as `RecordBatch`es and insert them back into ClickHouse
- **SeaORM to ClickHouse**: convert SeaORM entities to Arrow and insert into ClickHouse
- **Schema DDL from Arrow**: derive `CREATE TABLE` DDL from an Arrow schema, no hand-written SQL

```toml
[dependencies]
sea-clickhouse = { version = "0.14", features = ["arrow"] }
```

## The Problem with Typed Rows

The native `clickhouse-rs` client requires you to define a `#[derive(Row)]` struct whose field types match the query output exactly:

```rust
#[derive(Row, Deserialize)]
struct Reading {
    #[serde(with = "special_deserializer")] // or use a custom deserializer?
    temperature: f64, // f32? f64? depends on the SQL expression
}

let mut cursor = client.query("SELECT ...").fetch::<Reading>()?;
```

If the struct says `f32` but the server returns `Float64`, you get a runtime deserialization error. For computed columns, ad-hoc queries, or `SELECT *` on evolving tables, maintaining these structs is fragile.

[`sea-clickhouse`](https://docs.rs/sea-clickhouse) reads the column types from the response metadata and maps them to [`sea_query::Value`](https://docs.rs/sea-orm/2.0.0-rc.36/sea_orm/enum.Value.html) automatically:

```rust
let mut cursor = client.query("SELECT 1::UInt32 + 1::Float32 AS value").fetch_rows()?;
let row = cursor.next().await?.expect("one row");

// UInt32 + Float32 -> Float64
assert_eq!(row.try_get::<f64, _>(0)?, 2.0); // by index
assert_eq!(row.try_get::<f32, _>(0)?, 2.0); // narrower type also works
assert_eq!(row.try_get::<Decimal, _>("value")?, 2.into()); // by column name
```

[`try_get`](https://docs.rs/sea-clickhouse/latest/clickhouse/struct.DataRow.html#method.try_get) coerces at runtime: access by index or column name, request the type you need, and it converts where possible.

## Arrow RecordBatch Streaming

[`next_arrow_batch(chunk_size)`](https://docs.rs/sea-clickhouse/latest/clickhouse/query/struct.DataRowCursor.html#method.next_arrow_batch) streams query results as Arrow `RecordBatch`es, ready for DataFusion, Polars, Parquet export, or any Arrow consumer.

```rust
use clickhouse::Client;
use sea_orm_arrow::arrow::{array::RecordBatch, util::pretty};

let client = Client::default().with_url("http://localhost:18123");

let sql = r#"
    SELECT
        toUInt64(number) + 1 AS id,
        toDateTime64('2026-01-01', 6)
            + toIntervalSecond(rand() % 86400)
            + toIntervalMillisecond(rand() % 1000) AS recorded_at,
        toInt32(100 + rand() % 10) AS sensor_id,
        -10.0 + randUniform(0.0, 50.0) AS temperature,
        toDecimal128(3.0 + toFloat64(rand() % 5000) / 10000.0, 4) AS voltage
    FROM system.numbers
    LIMIT 20
"#;

let mut cursor = client.query(sql).fetch_rows()?;
let mut batches: Vec<RecordBatch> = Vec::new();
while let Some(batch) = cursor.next_arrow_batch(10).await? {
    batches.push(batch);
}
pretty::print_batches(&batches).unwrap();
```

```text
+----+-------------------------+-----------+----------------------+---------+
| id | recorded_at             | sensor_id | temperature          | voltage |
+----+-------------------------+-----------+----------------------+---------+
| 1  | 2026-01-01T13:35:36.736 | 106       | 36.345616831016436   | 3.2736  |
| 2  | 2026-01-01T10:07:38.458 | 108       | 10.122001773336567   | 3.3458  |
| 3  | 2026-01-01T01:15:18.518 | 108       | 35.21406789966149    | 3.1518  |
| 4  | 2026-01-01T05:36:57.017 | 107       | 22.92828141235666    | 3.2016  |
| 5  | 2026-01-01T13:17:36.056 | 106       | -2.082591477369223   | 3.0056  |
| ...                                                                       |
+----+-------------------------+-----------+----------------------+---------+
```

Those batches can be inserted back into ClickHouse directly:

```rust
let mut insert = client.insert_arrow("sensor_data", &batches[0].schema()).await?;
for batch in &batches {
    insert.write_batch(batch).await?;
}
insert.end().await?;
```

No `#[derive(Row)]` macros. The Arrow schema carries the type information end-to-end.

## SeaORM to ClickHouse

SeaORM started as an OLTP toolkit, but with Arrow and now ClickHouse support, SeaQL bridges data engineering with data science: your OLTP entities become the source of truth for OLAP pipelines too.

```rust
use sea_orm::entity::prelude::*;

#[sea_orm::model]
#[derive(Clone, Debug, PartialEq, DeriveEntityModel)]
#[sea_orm(table_name = "measurement", arrow_schema)]
pub struct Model {
    #[sea_orm(primary_key)]
    pub id: i32,
    pub recorded_at: ChronoDateTime,
    pub sensor_id: i32,
    pub temperature: f64,
    #[sea_orm(column_type = "Decimal(Some((38, 4)))")]
    pub voltage: Decimal,
}
```

Build models, convert to a `RecordBatch`, and insert:

```rust
use sea_orm::ArrowSchema;

let models: Vec<measurement::ActiveModel> = vec![..];

let schema = measurement::Entity::arrow_schema();
let batch = measurement::ActiveModel::to_arrow(&models, &schema)?;

let mut insert = client.insert_arrow("measurement", &schema).await?;
insert.write_batch(&batch).await?;
insert.end().await?;
```

[`arrow_schema`](https://docs.rs/sea-orm/2.0.0-rc.36/sea_orm/entity/trait.ArrowSchema.html#tymethod.arrow_schema) on the entity [derives the Arrow schema](https://www.sea-ql.org/blog/2026-02-22-sea-orm-arrow/) at compile time. `to_arrow` converts a slice of `ActiveModel`s into a `RecordBatch`. From there, [`insert_arrow`](https://docs.rs/sea-clickhouse/latest/clickhouse/struct.Client.html#method.insert_arrow) streams the batch into ClickHouse over HTTP. See the full [working example](https://github.com/SeaQL/clickhouse-rs/blob/main/sea-orm-arrow-example/src/main.rs).

## Arrow Schema to ClickHouse DDL

[`ClickHouseSchema::from_arrow`](https://docs.rs/sea-clickhouse/latest/clickhouse/schema/struct.ClickHouseSchema.html#method.from_arrow) derives a full `CREATE TABLE` statement from any Arrow schema, and exposes a SeaQuery-like fluent API to configure ClickHouse-specific attributes:

```rust
use clickhouse::schema::{ClickHouseSchema, Engine};

let mut schema = ClickHouseSchema::from_arrow(&batch.schema());
schema
    .table_name("sensor_readings")
    .engine(Engine::ReplacingMergeTree)
    .primary_key(["recorded_at", "device"]);
schema.find_column_mut("device").set_low_cardinality(true);

let ddl = schema.to_string();
client.query(&ddl).execute().await?;
```

Generated DDL:

```sql
CREATE TABLE sensor_readings (
    id UInt64,
    recorded_at DateTime64(6),
    device LowCardinality(String),
    temperature Nullable(Float64),
    voltage Decimal(38, 4)
) ENGINE = ReplacingMergeTree()
PRIMARY KEY (recorded_at, device)
```

The full workflow: Arrow -> derive DDL -> create table -> insert batches. Zero hand-written ClickHouse DDL. All examples shown here are available as [runnable examples](https://github.com/SeaQL/clickhouse-rs/tree/main?tab=readme-ov-file#examples) in the repository.
