---
slug: 2026-03-04-sea-clickhouse
title: 'ClickHouse meets SeaORM: Arrow-powered data pipeline'
author: Chris Tsang
author_title: SeaQL Team
author_url: https://github.com/SeaQL
author_image_url: https://www.sea-ql.org/blog/img/SeaQL.png
image: https://www.sea-ql.org/blog/img/SeaORM%202.0%20Banner.png
tags: [news]
---

<img alt="SeaORM 2.0 Banner" src="/blog/img/SeaORM%202.0%20Banner.png"/>

[`sea-clickhouse`](https://github.com/SeaQL/clickhouse-rs) is a ClickHouse client that integrates with the SeaQL ecosystem. It is a soft fork of [`clickhouse-rs`](https://github.com/ClickHouse/clickhouse-rs): 100% compatible with all upstream features, with SeaQL's dynamic type and Arrow support added on top. The fork is continually rebased on upstream.

In this blog post we cover:

- **Dynamic rows with `try_get`**: fetch query results without defining any schema struct
- **Arrow RecordBatch streaming**: stream query results as `RecordBatch`es and insert them back into ClickHouse
- **SeaORM to ClickHouse**: convert SeaORM entities to Arrow and insert into ClickHouse
- **Schema DDL from Arrow**: derive `CREATE TABLE` DDL from an Arrow schema, no hand-written SQL

```toml
[dependencies]
sea-clickhouse = { version = "0.14", features = ["arrow"] }
```

## The Problem with Typed Rows

The native `clickhouse-rs` client requires you to define a `#[derive(Row)]` struct whose field types match the query output exactly:

```rust
#[derive(Row, Deserialize)]
struct Reading {
    #[serde(with = "special_deserializer")] // or use a custom deserializer?
    temperature: f64, // f32? f64? depends on the SQL expression
}

let mut cursor = client.query("SELECT ...").fetch::<Reading>()?;
```

If the struct says `f32` but the server returns `Float64`, you get a runtime deserialization error. For computed columns, ad-hoc queries, or `SELECT *` on evolving tables, maintaining these structs is fragile.

[`sea-clickhouse`](https://docs.rs/sea-clickhouse) reads the column types from the response metadata and maps them to [`sea_query::Value`](https://docs.rs/sea-orm/2.0.0-rc.36/sea_orm/enum.Value.html) automatically:

```rust
let mut cursor = client.query("SELECT 1::UInt32 + 1::Float32 AS value").fetch_rows()?;
let row = cursor.next().await?.expect("one row");

// UInt32 + Float32 -> Float64
assert_eq!(row.try_get::<f64, _>(0)?, 2.0); // by index
assert_eq!(row.try_get::<f32, _>(0)?, 2.0); // narrower type also works
assert_eq!(row.try_get::<Decimal, _>("value")?, 2.into()); // by column name
```

[`try_get`](https://docs.rs/sea-clickhouse/latest/clickhouse/struct.DataRow.html#method.try_get) coerces at runtime: access by index or column name, request the type you need, and it converts where possible.
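The idea behind this coercion can be illustrated with a self-contained sketch. This is not sea-clickhouse's actual implementation — the `Value` enum and conversion methods below are hypothetical stand-ins — but it shows the pattern: the server's column type decides the stored variant, and the caller's requested type drives a fallible conversion.

```rust
// Hypothetical sketch of runtime coercion -- NOT sea-clickhouse's
// implementation, just the idea behind try_get.
#[derive(Debug, Clone)]
enum Value {
    UInt32(u32),
    Float64(f64),
}

impl Value {
    // The caller asks for f64; widening conversions always succeed.
    fn try_as_f64(&self) -> Result<f64, String> {
        match self {
            Value::UInt32(v) => Ok(f64::from(*v)),
            Value::Float64(v) => Ok(*v),
        }
    }

    // The caller asks for u32; lossy conversions are rejected.
    fn try_as_u32(&self) -> Result<u32, String> {
        match self {
            Value::UInt32(v) => Ok(*v),
            Value::Float64(v) if v.fract() == 0.0 && *v >= 0.0 && *v <= u32::MAX as f64 => {
                Ok(*v as u32)
            }
            other => Err(format!("cannot convert {other:?} to u32")),
        }
    }
}

fn main() {
    // The server computed UInt32 + Float32 as Float64 ...
    let cell = Value::Float64(2.0);
    // ... but the client can still serve it as the type you request.
    assert_eq!(cell.try_as_f64().unwrap(), 2.0);
    assert_eq!(cell.try_as_u32().unwrap(), 2);
    // A genuinely lossy request fails instead of silently truncating.
    assert!(Value::Float64(2.5).try_as_u32().is_err());
}
```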

## Arrow RecordBatch Streaming

[`next_arrow_batch(chunk_size)`](https://docs.rs/sea-clickhouse/latest/clickhouse/query/struct.DataRowCursor.html#method.next_arrow_batch) streams query results as Arrow `RecordBatch`es, ready for DataFusion, Polars, Parquet export, or any Arrow consumer.

```rust
use clickhouse::Client;
use sea_orm_arrow::arrow::{array::RecordBatch, util::pretty};

let client = Client::default().with_url("http://localhost:18123");

let sql = r#"
    SELECT
        toUInt64(number) + 1 AS id,
        toDateTime64('2026-01-01', 6)
            + toIntervalSecond(rand() % 86400)
            + toIntervalMillisecond(rand() % 1000) AS recorded_at,
        toInt32(100 + rand() % 10) AS sensor_id,
        -10.0 + randUniform(0.0, 50.0) AS temperature,
        toDecimal128(3.0 + toFloat64(rand() % 5000) / 10000.0, 4) AS voltage
    FROM system.numbers
    LIMIT 20
"#;

let mut cursor = client.query(sql).fetch_rows()?;
let mut batches: Vec<RecordBatch> = Vec::new();
while let Some(batch) = cursor.next_arrow_batch(10).await? {
    batches.push(batch);
}
pretty::print_batches(&batches).unwrap();
```

```text
+----+-------------------------+-----------+----------------------+---------+
| id | recorded_at             | sensor_id | temperature          | voltage |
+----+-------------------------+-----------+----------------------+---------+
| 1  | 2026-01-01T13:35:36.736 | 106       | 36.345616831016436   | 3.2736  |
| 2  | 2026-01-01T10:07:38.458 | 108       | 10.122001773336567   | 3.3458  |
| 3  | 2026-01-01T01:15:18.518 | 108       | 35.21406789966149    | 3.1518  |
| 4  | 2026-01-01T05:36:57.017 | 107       | 22.92828141235666    | 3.2016  |
| 5  | 2026-01-01T13:17:36.056 | 106       | -2.082591477369223   | 3.0056  |
| ...                                                                      |
+----+-------------------------+-----------+----------------------+---------+
```

Those batches can be inserted back into ClickHouse directly:

```rust
let mut insert = client.insert_arrow("sensor_data", &batches[0].schema()).await?;
for batch in &batches {
    insert.write_batch(batch).await?;
}
insert.end().await?;
```

No `#[derive(Row)]` macros. The Arrow schema carries the type information end-to-end.

## SeaORM to ClickHouse

SeaORM started as an OLTP toolkit, but with Arrow and now ClickHouse support, SeaQL bridges data engineering with data science: your OLTP entities become the source of truth for OLAP pipelines too.

```rust
use sea_orm::entity::prelude::*;

#[sea_orm::model]
#[derive(Clone, Debug, PartialEq, DeriveEntityModel)]
#[sea_orm(table_name = "measurement", arrow_schema)]
pub struct Model {
    #[sea_orm(primary_key)]
    pub id: i32,
    pub recorded_at: ChronoDateTime,
    pub sensor_id: i32,
    pub temperature: f64,
    #[sea_orm(column_type = "Decimal(Some((38, 4)))")]
    pub voltage: Decimal,
}
```

Build models, convert to a `RecordBatch`, and insert:

```rust
use sea_orm::ArrowSchema;

let models: Vec<measurement::ActiveModel> = vec![..];

let schema = measurement::Entity::arrow_schema();
let batch = measurement::ActiveModel::to_arrow(&models, &schema)?;

let mut insert = client.insert_arrow("measurement", &schema).await?;
insert.write_batch(&batch).await?;
insert.end().await?;
```

[`arrow_schema`](https://docs.rs/sea-orm/2.0.0-rc.36/sea_orm/entity/trait.ArrowSchema.html#tymethod.arrow_schema) on the entity [derives the Arrow schema](https://www.sea-ql.org/blog/2026-02-22-sea-orm-arrow/) at compile time. `to_arrow` converts a slice of `ActiveModel`s into a `RecordBatch`. From there, [`insert_arrow`](https://docs.rs/sea-clickhouse/latest/clickhouse/struct.Client.html#method.insert_arrow) streams the batch into ClickHouse over HTTP. See the full [working example](https://github.com/SeaQL/clickhouse-rs/blob/main/sea-orm-arrow-example/src/main.rs).

## Arrow Schema to ClickHouse DDL

[`ClickHouseSchema::from_arrow`](https://docs.rs/sea-clickhouse/latest/clickhouse/schema/struct.ClickHouseSchema.html#method.from_arrow) derives a full `CREATE TABLE` statement from any Arrow schema, and exposes a SeaQuery-like fluent API to configure ClickHouse-specific attributes:

```rust
use clickhouse::schema::{ClickHouseSchema, Engine};

let mut schema = ClickHouseSchema::from_arrow(&batch.schema());
schema
    .table_name("sensor_readings")
    .engine(Engine::ReplacingMergeTree)
    .primary_key(["recorded_at", "device"]);
schema.find_column_mut("device").set_low_cardinality(true);

let ddl = schema.to_string();
client.query(&ddl).execute().await?;
```

Generated DDL:

```sql
CREATE TABLE sensor_readings (
    id UInt64,
    recorded_at DateTime64(6),
    device LowCardinality(String),
    temperature Nullable(Float64),
    voltage Decimal(38, 4)
) ENGINE = ReplacingMergeTree()
PRIMARY KEY (recorded_at, device)
```

The full workflow: Arrow -> derive DDL -> create table -> insert batches. Zero hand-written ClickHouse DDL. All examples shown here are available as [runnable examples](https://github.com/SeaQL/clickhouse-rs/tree/main?tab=readme-ov-file#examples) in the repository.