
@Blargian (Member) commented Nov 4, 2025

Summary

Adds a guide that shows how to use ClickHouse Cloud, chDB, and scikit-learn together to train a model and run inference.

Covers:

  • How to query data from ClickHouse Cloud with chDB, using Arrow for efficient transfer
  • How to use chDB to move back and forth between familiar DataFrames and processing in ClickHouse
  • How to train a binary classifier on a subset of the UK property prices dataset (predicting whether a property is a flat or a detached house)
  • How to use that model in chDB to run inference
  • How to use that model in ClickHouse via UDFs
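A minimal sketch of the first two items, under stated assumptions: the data sits in a ClickHouse Cloud table reachable on the secure native port 9440 and follows the schema of the public `uk_price_paid` example dataset (`price`, `date`, `type`, `is_new`, `duration`, ...). The host, table name, and the helpers `build_query` and `fetch_training_data` are hypothetical, not taken from the guide. chDB evaluates the `remoteSecure` table function locally, the `ArrowTable` output format hands the result back as a PyArrow table, and `to_pandas()` turns it into a DataFrame.

```python
"""Sketch: pull a labelled subset of UK price-paid data from ClickHouse Cloud via chDB."""

# Hypothetical connection details; replace with your own service.
CLOUD_HOST = "your-service.clickhouse.cloud:9440"
CLOUD_TABLE = "default.uk_price_paid"  # assumed table name and schema


def build_query(host: str, table: str, user: str, password: str, limit: int = 100_000) -> str:
    """Return SQL that reads flats and detached houses through remoteSecure().

    label is 1 for 'flat' and 0 for 'detached', which makes this a binary
    classification problem. Credentials are interpolated without escaping,
    so this is for illustration only.
    """
    return f"""
        SELECT
            price,
            toYear(date)  AS year,
            toMonth(date) AS month,
            is_new,
            duration = 'freehold' AS is_freehold,
            type = 'flat' AS label
        FROM remoteSecure('{host}', {table}, '{user}', '{password}')
        WHERE type IN ('flat', 'detached')
        LIMIT {limit}
    """


def fetch_training_data(user: str, password: str, limit: int = 100_000):
    """Run the query with chDB and return a pandas DataFrame."""
    import chdb  # imported lazily so build_query() works without chDB installed

    sql = build_query(CLOUD_HOST, CLOUD_TABLE, user, password, limit)
    arrow_table = chdb.query(sql, "ArrowTable")  # columnar result as a pyarrow.Table
    return arrow_table.to_pandas()
```

Keeping the feature engineering (`toYear`, enum comparisons) in SQL means ClickHouse does that work before any rows cross the wire; `LIMIT` without `ORDER BY` returns an arbitrary subset, which is acceptable for a sketch but not for a reproducible training set.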

To do:

  • How to run inference in ClickHouse Cloud (executable UDFs are not yet GA)
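ClickHouse executable UDFs start an external program, write the function arguments to its stdin in the configured `format`, and read one result per input row from its stdout. A minimal sketch of such a script, assuming `TabSeparated` format, numeric arguments only, and a classifier saved with `joblib` to a hypothetical path under the server's `user_scripts` directory. The names `predict_lines`, `serve`, and `load_model_predictor` are illustrative, not from the guide.

```python
#!/usr/bin/env python3
"""Sketch of an executable UDF script: ClickHouse streams feature rows to stdin
as TabSeparated and reads one prediction per line from stdout."""
import sys
from typing import Callable, Iterable, Iterator, Sequence

Predictor = Callable[[Sequence[float]], int]


def predict_lines(predict: Predictor, lines: Iterable[str]) -> Iterator[str]:
    """Turn TabSeparated feature rows into newline-terminated predictions."""
    for line in lines:
        features = [float(value) for value in line.rstrip("\n").split("\t")]
        yield f"{predict(features)}\n"


def serve(predict: Predictor) -> None:
    """Stream stdin to stdout, flushing after every row so ClickHouse is not left waiting."""
    for result in predict_lines(predict, sys.stdin):
        sys.stdout.write(result)
        sys.stdout.flush()


def load_model_predictor(path: str) -> Predictor:
    """Wrap a pickled scikit-learn classifier (assumption: saved with joblib.dump)."""
    import joblib

    model = joblib.load(path)
    return lambda features: int(model.predict([features])[0])


if __name__ == "__main__":
    # Hypothetical model location inside the server's user_scripts directory.
    serve(load_model_predictor("/var/lib/clickhouse/user_scripts/property_model.joblib"))
```

The script would be registered in the server's executable UDF configuration with `type` set to `executable`, `format` set to `TabSeparated`, `command` naming the script, and a numeric `return_type` such as `UInt8`. Predicting one row at a time keeps the protocol simple but is slow for a Random Forest; reading a whole block of rows and calling `model.predict` once is the natural optimization.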



vercel bot commented Nov 4, 2025

@Blargian is attempting to deploy a commit to the ClickHouse Team on Vercel.

A member of the Team first needs to authorize it.


vercel bot commented Nov 5, 2025

The latest updates on your projects.

| Project | Deployment | Preview | Updated (UTC) |
| --- | --- | --- | --- |
| clickhouse-docs | Ready | Preview | Nov 5, 2025 9:42am |

import confusion_matrix from '@site/static/images/use-cases/AI_ML/Scikit/confusion_matrix.png';

# Classifying UK property types with chDB and scikit-learn

Collaborator

Suggested change
:::note [TL;DR]
This guide demonstrates how chDB complements scikit-learn for ML workflows by building a binary classifier that predicts UK property types. You'll learn how to:

- Use chDB for fast feature engineering on 11.8M records from ClickHouse Cloud
- Build and train a Random Forest classifier achieving ~87% accuracy
- Deploy the model back to ClickHouse via UDFs for real-time inference

The pattern shown here applies to any binary classification problem where you need efficient data preprocessing at scale.

Time required: 45-60 minutes
:::

