Student Guide: Build a Cheap Analytics Stack with ClickHouse on Raspberry Pi

2026-02-18
10 min read

Build a low-cost ClickHouse analytics lab on Raspberry Pi for student projects: cluster setup, ETL scripts, OLAP queries, and Grafana dashboards.

Build a cheap, classroom-ready analytics stack with ClickHouse on a Raspberry Pi mini-cluster

If your class projects stall because cloud credits run out or student laptops choke on OLAP workloads, you can build a low-cost analytics lab that runs locally and teaches the same concepts as production OLAP systems. This guide shows how to run a small ClickHouse cluster on Raspberry Pi hardware, ingest sample data, run fast OLAP queries, and visualize results with Grafana, all at student-friendly cost.

In 2026, ClickHouse remains a top choice for fast analytical workloads (the company drew major investment in 2025), and Raspberry Pi hardware is more powerful and affordable than ever. This tutorial walks you through the exact hardware, OS configuration, ClickHouse cluster topology, sample ETL, query patterns, and visualization steps I use when mentoring student teams.

Why this matters in 2026 (short version)

  • Real-world OLAP experience: Students learn partitioning, replication, and distributed queries on hardware they can touch.
  • Low cost and repeatable: A 2–4 node Pi cluster costs under $400 (depending on storage) and runs on local networks in labs or dorms.
  • Modern stack parity: ClickHouse features (materialized views, MergeTree families, ClickHouse Keeper) mirror cloud OLAP systems — you teach production concepts without needing cloud budgets.

ClickHouse has seen major investment and rapid feature development through 2025–2026, making it a practical choice for student labs that mirror modern OLAP deployments.

What you'll build (quick overview)

  1. Small Raspberry Pi cluster (2–4 nodes) running a 64-bit OS with NVMe storage.
  2. ClickHouse installed on each node with a simple replicated topology and ClickHouse Keeper for consensus.
  3. ETL pipeline to ingest event data (Python script) into ClickHouse.
  4. OLAP queries (group by, time-series rollups, approximate uniques) to analyze activity.
  5. Visualization using Grafana connected to ClickHouse.

Hardware & cost (student-focused)

Target minimum configuration (recommended for Pi 4 / Pi 5):

  • 2–4 x Raspberry Pi 4 (4GB/8GB) or Raspberry Pi 5 (recommended). Pi 5 gives better CPU & NVMe support.
  • One USB-to-NVMe adapter + NVMe drive per node (recommended) or high-end microSD (not ideal for heavy writes).
  • Gigabit switch and Ethernet cables for stable networking.
  • Quality 5V/5A power supply or powered USB-C hub.

Rough price: Pi 5 + NVMe per node ~ $150–200. A 2-node lab can be under $400. For classroom scale, reuse hardware or use a single stronger Pi for demos.

Software choices & 2026 context

In 2026 the stack I recommend:

  • OS: Raspberry Pi OS 64-bit (bullseye/bookworm or Debian 12/13 64-bit) or Ubuntu Server 22.04/24.04 ARM64.
  • ClickHouse: Official ClickHouse ARM64 packages or Docker images. ClickHouse Keeper replaces ZooKeeper in many modern setups; use Keeper for small clusters.
  • ETL: Python (pandas + clickhouse-driver) for synthetic datasets and classroom scripts.
  • Visualization: Grafana with the ClickHouse plugin (stable in 2025–2026), or Apache Superset for richer dashboards.

Step 1 — Prepare the Raspberry Pis

Install a 64-bit OS and enable SSH

Use a 64-bit image so ClickHouse and packages run natively on ARM64. I prefer Raspberry Pi OS 64-bit or Ubuntu Server ARM64.

# Example: flash Raspberry Pi OS 64-bit or Ubuntu Server ARM64 to an SD card or SSD
# (use Raspberry Pi Imager, balenaEtcher, or dd on macOS/Linux)
# On Raspberry Pi OS, enable SSH by creating an empty file named 'ssh' in the boot partition;
# Raspberry Pi Imager and the Ubuntu Server image can also preconfigure SSH during flashing

Basic configuration (on each node)

sudo apt update && sudo apt upgrade -y
sudo timedatectl set-ntp true
sudo hostnamectl set-hostname pi-node-1
# set static IP via /etc/netplan or router DHCP reservation

Tip: Use static IPs or reserved DHCP so ClickHouse nodes can reference each other reliably.

Step 2 — Install ClickHouse (ARM64)

You have two options: install native packages or use Docker. Native packages give better I/O performance; Docker is simpler to reset between classroom sessions.

# On each Pi node (apt-key is deprecated; verify the current key and repo in the official ClickHouse install docs)
sudo apt install -y apt-transport-https ca-certificates curl gnupg
sudo gpg --no-default-keyring --keyring /usr/share/keyrings/clickhouse-keyring.gpg \
  --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 8919F6BD2B48D754
echo "deb [signed-by=/usr/share/keyrings/clickhouse-keyring.gpg] https://packages.clickhouse.com/deb stable main" \
  | sudo tee /etc/apt/sources.list.d/clickhouse.list
sudo apt update
sudo apt install -y clickhouse-server clickhouse-client clickhouse-common-static
sudo systemctl enable --now clickhouse-server

Check server status with:

sudo systemctl status clickhouse-server
clickhouse-client --query "SELECT version()"

Docker alternative

Use a multi-arch ClickHouse image for ARM64. Docker simplifies cleanup for students, but watch I/O limits on SD cards.

docker run -d --name clickhouse-server --ulimit nofile=262144:262144 \
  -p 9000:9000 -p 8123:8123 -v /path/to/data:/var/lib/clickhouse \
  clickhouse/clickhouse-server:latest

Step 3 — Configure a simple ClickHouse mini-cluster

We'll set up a 2-node replicated cluster using ClickHouse Keeper (recommended over ZooKeeper for modern ClickHouse as of 2025–2026). The steps below show the important config fragments — use them in /etc/clickhouse-server/config.d/ as small files to avoid overwriting defaults.

Enable Keeper on each node

# /etc/clickhouse-server/config.d/keeper.xml
<clickhouse>
  <keeper_server>
    <tcp_port>9181</tcp_port>
    <server_id>1</server_id>
    <coordination_settings>
      <session_timeout_ms>60000</session_timeout_ms>
    </coordination_settings>
  </keeper_server>
</clickhouse>

Give each node a unique server_id (for example 1 on the first node, 2 on the second). Then configure remote_servers to tell ClickHouse about the cluster:

# /etc/clickhouse-server/config.d/cluster.xml
<clickhouse>
  <remote_servers>
    <classroom_cluster>
      <shard>
        <replica>
          <host>10.0.0.11</host>
          <port>9000</port>
        </replica>
        <replica>
          <host>10.0.0.12</host>
          <port>9000</port>
        </replica>
      </shard>
    </classroom_cluster>
  </remote_servers>
</clickhouse>

Restart ClickHouse on each node after config changes: sudo systemctl restart clickhouse-server.
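
A few more small fragments are needed before the replicated tables in the next step will work: Keeper needs a raft_configuration block (inside keeper_server) listing every Keeper participant, and ClickHouse itself needs a zookeeper section pointing at the Keeper endpoints plus macros defining what {shard} and {replica} expand to. The sketch below shows one way to lay this out; the file name, the 9234 raft port, the storage paths, and the macro values are conventional examples rather than fixed requirements, so adjust them per node and verify against the current ClickHouse docs.

# Add inside <keeper_server> in keeper.xml (ids must match each node's server_id)
<log_storage_path>/var/lib/clickhouse/coordination/log</log_storage_path>
<snapshot_storage_path>/var/lib/clickhouse/coordination/snapshots</snapshot_storage_path>
<raft_configuration>
  <server><id>1</id><hostname>10.0.0.11</hostname><port>9234</port></server>
  <server><id>2</id><hostname>10.0.0.12</hostname><port>9234</port></server>
</raft_configuration>

# /etc/clickhouse-server/config.d/use-keeper.xml (same on every node, except <macros>)
<clickhouse>
  <zookeeper>
    <node><host>10.0.0.11</host><port>9181</port></node>
    <node><host>10.0.0.12</host><port>9181</port></node>
  </zookeeper>
  <macros>
    <shard>01</shard>
    <replica>pi-node-1</replica>  <!-- use pi-node-2 on the second node -->
  </macros>
</clickhouse>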

Create ReplicatedMergeTree tables

Create a replicated table and a distributed table. Run the statements on every node, or append ON CLUSTER classroom_cluster to each CREATE so they are executed across the whole cluster at once:

CREATE DATABASE IF NOT EXISTS class_db;

CREATE TABLE class_db.events_local
(
  event_date Date,
  event_time DateTime,
  user_id UInt64,
  event_type String,
  amount Float32,
  properties String
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/events', '{replica}')
PARTITION BY toYYYYMM(event_date)
ORDER BY (event_date, user_id)
SETTINGS index_granularity = 8192;

CREATE TABLE class_db.events_dist AS class_db.events_local
ENGINE = Distributed(classroom_cluster, class_db, events_local);

Key concepts: ReplicatedMergeTree provides data replication across replicas. The Distributed engine routes queries to the appropriate shards and merges results.
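
A quick sanity check that the cluster definition was picked up is to query system.clusters from any node; it should list both replicas of the shard:

SELECT cluster, shard_num, replica_num, host_name, port
FROM system.clusters
WHERE cluster = 'classroom_cluster';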

Step 4 — Ingest sample data (ETL)

For classroom work, synthetic event streams are ideal. We'll use Python to generate events and ingest via clickhouse-driver for speed. This ETL pattern is similar to production: batch writes, partitioning by date, and optional materialized views for rollups.

Python example (generate + insert)

from clickhouse_driver import Client
import random
import datetime

# Native-protocol connection to one of the ClickHouse nodes (port 9000)
client = Client(host='10.0.0.11')

def gen_event():
    """Generate one synthetic event row matching class_db.events_local."""
    now = datetime.datetime.utcnow()
    return (
        now.date(),                        # event_date
        now.replace(microsecond=0),        # event_time
        random.randint(1, 1000),           # user_id
        random.choice(['page_view', 'signup', 'purchase']),  # event_type
        round(random.random() * 100, 2),   # amount
        '{}'                               # properties (empty JSON string)
    )

# Batched inserts are much faster in ClickHouse than row-by-row writes
batch = [gen_event() for _ in range(1000)]
client.execute(
    'INSERT INTO class_db.events_local '
    '(event_date, event_time, user_id, event_type, amount, properties) VALUES',
    batch
)

For streaming-style ingestion, accumulate events and insert a batch every N records, running the script in the background. You can also load CSV files with clickhouse-client for quick imports.
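
For example, a one-off import from a local CSV file (events.csv is a hypothetical file whose columns match the table definition, in order) could look like:

clickhouse-client --host 10.0.0.11 \
  --query "INSERT INTO class_db.events_local FORMAT CSV" < events.csv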

Optional: Use materialized views for pre-aggregations
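
The materialized view below writes into a separate storage table named in its TO clause, so create that table first. A minimal sketch, assuming a SummingMergeTree engine for the additive counters (the engine choice and column types are assumptions you can adapt, e.g. to a Replicated variant on the cluster):

CREATE TABLE class_db.events_hourly_store
(
  hour DateTime,
  event_type String,
  cnt UInt64,
  total_amount Float64
)
ENGINE = SummingMergeTree()
PARTITION BY toYYYYMM(hour)
ORDER BY (hour, event_type);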

CREATE MATERIALIZED VIEW class_db.events_hourly
TO class_db.events_hourly_store
AS
SELECT
  toStartOfHour(event_time) AS hour,
  event_type,
  count() AS cnt,
  sum(amount) AS total_amount
FROM class_db.events_local
GROUP BY hour, event_type;

Materialized views let you pre-compute rollups and teach students about trade-offs between storage and query latency. Note that a materialized view fires on inserts into the table it selects FROM, which is why it reads from events_local (where the ETL writes) rather than the Distributed table.

Step 5 — Run OLAP queries students will love

Run classic analytical patterns to demonstrate ClickHouse strengths:

Time-series & group-by

SELECT toStartOfHour(event_time) AS hour, event_type, count() AS cnt
FROM class_db.events_dist
WHERE event_date >= today() - 7
GROUP BY hour, event_type
ORDER BY hour ASC
LIMIT 1000;

Top users by spend

SELECT user_id, sum(amount) AS spend
FROM class_db.events_dist
WHERE event_type = 'purchase' AND event_date >= today() - 30
GROUP BY user_id
ORDER BY spend DESC
LIMIT 20;

Approximate uniques (fast, low-memory)

SELECT event_type, uniqExact(user_id) AS exact_u, uniqCombined(user_id) AS approx_u
FROM class_db.events_dist
GROUP BY event_type;

Teaching note: Compare uniqExact (accurate, heavier) to uniqCombined (approximate, faster) to show design trade-offs.

Step 6 — Visualize with Grafana

Grafana offers a ClickHouse datasource plugin and is lightweight to run on a Pi or a dedicated laptop. In 2025–2026 the plugin matured and is widely used.

Install Grafana (Docker example)

docker run -d --name=grafana -p 3000:3000 \
  -v grafana-storage:/var/lib/grafana grafana/grafana-oss:latest

Then install the ClickHouse data source plugin from the Grafana plugins catalog. Configure the data source to point at one of the ClickHouse nodes; depending on the plugin you use, it connects over the native protocol (port 9000) or the HTTP interface (port 8123).
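
If you run Grafana in Docker as above, one low-friction option is to let the container install the plugin at startup via the GF_INSTALL_PLUGINS environment variable (grafana-clickhouse-datasource is the plugin id of the official Grafana Labs data source; confirm the id in the plugins catalog):

docker run -d --name=grafana -p 3000:3000 \
  -e GF_INSTALL_PLUGINS=grafana-clickhouse-datasource \
  -v grafana-storage:/var/lib/grafana grafana/grafana-oss:latest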

Create a basic dashboard

  1. Add a panel with a time-series query like the hourly events query above (a Grafana-ready variant is sketched after this list).
  2. Create a table panel for top users by spend.
  3. Use variables for date ranges and event_type to make dashboards interactive.
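
For the time-series panel, a dashboard-friendly variant of the hourly query uses the plugin's time-range macro so the Grafana time picker controls the window ($__timeFilter is the macro name used by the ClickHouse data source plugin; macro names can differ between plugin versions, so check the plugin docs):

SELECT toStartOfHour(event_time) AS hour, event_type, count() AS cnt
FROM class_db.events_dist
WHERE $__timeFilter(event_time)
GROUP BY hour, event_type
ORDER BY hour ASC;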

Grafana also supports alerting if you want to turn this into an operations demo — e.g., alert if ingest rate falls below expected levels.

Operational tips & classroom best practices

  • Use SSDs: microSD cards wear out. For repeated lab runs, use USB SSDs or NVMe on Pi 5.
  • Limit shards for small clusters: 2 nodes are easier to manage. For replication demos, 3 nodes allow true quorum in Keeper.
  • Backups: Teach students to snapshot/export data via clickhouse-client or backup the /var/lib/clickhouse directory. Consider local policies and data sovereignty when sharing datasets.
  • Monitoring: Expose ClickHouse metrics via its Prometheus endpoint (see the config fragment after this list) and monitor with Grafana to teach SRE practices.
  • Power & cooling: Stable power and heat dissipation prevent flakiness during long lab sessions.
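
For the monitoring point above, ClickHouse can expose a Prometheus scrape endpoint directly from the server config; a minimal fragment (port 9363 is the conventional choice) looks like this:

# /etc/clickhouse-server/config.d/prometheus.xml
<clickhouse>
  <prometheus>
    <endpoint>/metrics</endpoint>
    <port>9363</port>
    <metrics>true</metrics>
    <events>true</events>
    <asynchronous_metrics>true</asynchronous_metrics>
  </prometheus>
</clickhouse>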

Example class project ideas

  1. Behavior analytics: Ingest simulated website events, compute cohorts, and visualize retention curves.
  2. IoT telemetry: Stream sensor readings to ClickHouse and build real-time dashboards.
  3. Benchmarking exercise: Compare query latencies between single-node and distributed setups, or with/without materialized views.

Recent trends through late 2025 and into 2026 make this approach especially relevant:

  • ClickHouse investment & momentum: Major funding in 2025 accelerated development (better ARM builds, Keeper support, cloud connectors), so students learn a fast-evolving OLAP platform used in production.
  • Edge-capable hardware: Pi 5 and add-ons (like AI HATs released in 2024–2025) bring more CPU and NVMe capability to edge devices, enabling heavier analytical workloads locally.
  • Open-source observability: Grafana + Prometheus + ClickHouse has become mainstream for analytics and metrics, so skills transfer to industry tools.

Troubleshooting common issues

High write latency

Causes: slow microSD cards, tiny insert batches, or heavy background merges. Fixes: use an SSD, increase batch sizes (every small insert creates a new part that must later be merged, so fewer, larger inserts reduce merge pressure), tune MergeTree settings, and run OPTIMIZE TABLE off-hours if you need to force merges.
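
A quick way to see whether tiny inserts are creating too many parts is to check system.parts:

SELECT table, count() AS active_parts, sum(rows) AS total_rows
FROM system.parts
WHERE active AND database = 'class_db'
GROUP BY table;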

Replication errors

Ensure ClickHouse Keeper is reachable on 9181 and server_id values are unique. Check logs under /var/log/clickhouse-server/ for zookeeper/keeper related messages.
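
The system.replicas table summarizes replication health per table and is a good first stop, for example:

SELECT database, table, is_readonly, queue_size, absolute_delay
FROM system.replicas
WHERE database = 'class_db';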

Out-of-memory queries

Use LIMIT, run GROUP BY with smaller key sets, or teach approximate algorithms (uniqCombined, quantilesTDigest).
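
You can also cap memory per query so a heavy GROUP BY fails fast (or spills to disk) instead of destabilizing a Pi; the thresholds below are illustrative values for a 4 GB node:

SELECT user_id, count() AS events
FROM class_db.events_dist
GROUP BY user_id
ORDER BY events DESC
LIMIT 100
SETTINGS max_memory_usage = 1000000000,
         max_bytes_before_external_group_by = 500000000;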

Actionable checklist (one-page summary)

  1. Pick hardware: 2–3 Pi 5 (or Pi 4) + SSDs.
  2. Install 64-bit OS and configure static IPs.
  3. Install ClickHouse (apt or Docker). Verify version.
  4. Configure ClickHouse Keeper + remote_servers and restart services.
  5. Create ReplicatedMergeTree and Distributed tables.
  6. Run Python ETL to insert sample events.
  7. Execute OLAP queries; compare speeds and behaviors.
  8. Install Grafana and connect to ClickHouse for dashboards.
  9. Document experiments and share dashboards with classmates.

Further reading & resources

  • ClickHouse docs (architecture, replication, Keeper) — check official docs for the current 2026 config formats.
  • Grafana ClickHouse plugin docs for the latest datasource setup.
  • Data sovereignty and local policies for classroom datasets.

Actionable takeaways

Build once, reuse forever: A Pi mini-cluster gives every student team a portable OLAP sandbox. You can iterate on queries, ETL, and dashboards — and reuse the same setup for multiple courses (databases, backend, data science).

Teach production patterns locally: Replication, distributed queries, materialized views, and monitoring are tangible on a small cluster, and they translate directly to cloud or enterprise environments.

Call to action

Ready to try it? Clone the companion GitHub repo with config templates, Python ETL scripts, and Grafana dashboards (link in the course materials). Start with a 2-node cluster today: set up the first Pi, install ClickHouse, and run the sample ETL in under an hour. Share your dashboard screenshots in class and turn this into your next project or demo.

Next step: Set up the first node now — follow the checklist above and post questions to the class forum. If you want a step-by-step walk-through with one-click scripts for classroom labs, check out the course resources or reach out to contribute a lab exercise.
