Classroom Lab: Teach On-Device ML by Porting a Tiny Model to Mobile Browsers

2026-02-26

Teacher-ready lab: port a tiny model to mobile browsers, teach quantization, bundling, and on-device ML with starter code and assessments.

Why teachers need an on-device ML lab now

Students want projects they can run on their phones. Teachers need labs that fit one or two class periods, avoid heavy server setup, and teach practical trade-offs like size vs. accuracy. In 2026, with WebGPU and WebNN support maturing and mobile browsers encouraging local AI (Puma-inspired local-AI browsers are a recent example), running models entirely in a mobile browser is not only possible — it's a powerful teaching moment for privacy, performance, and real-world constraints.

What this lesson pack delivers

This teacher-friendly lesson plan and lab sequence helps you move a tiny model from Python to a mobile browser. You’ll get:

  • Clear learning objectives and timing for a 2–4 class lab unit
  • Step-by-step exercises: quantization, bundling, and running on-device in mobile browsers
  • Starter code (browser + conversion scripts) and commands
  • Assessment ideas and rubrics that focus on engineering trade-offs
  • Community challenge and mentorship suggestions to scale the activity

Why teach on-device ML in 2026?

Recent runtime and browser developments in late 2024–2026 mean students can learn modern ML deployment without specialized hardware:

  • WebGPU is widely available on Android and iOS builds of major browsers, offering GPU acceleration in the browser for models that can leverage it.
  • WebNN and improved WebAssembly toolchains enable faster, lower-latency inference and access to device accelerators where supported.
  • WASM SIMD + threads and the rise of optimized runtimes (TensorFlow Lite Web, ONNX Runtime Web) make tiny models practical for mobile phones.
  • Privacy-first “local AI” browsers and apps (Puma-inspired) have popularized on-device inference as a concrete privacy and UX advantage.

Classroom overview (90–180 minutes or 2–4 classes)

Learning objectives

  • Explain the trade-offs between model size, precision, latency, and accuracy.
  • Convert a small model into a browser-friendly format and apply post-training quantization.
  • Bundle the model into a lightweight web app and run inference on a mobile browser.
  • Measure performance and iterate on optimizations.

Prerequisites

  • Basic Python and JavaScript experience (loops, functions, npm)
  • Familiarity with Keras/TensorFlow or PyTorch (for conversion options)
  • Student devices (phones or tablets) and Wi‑Fi; optional: GitHub Classroom

Materials and tooling (teacher checklist)

  • Node.js (v18+ recommended), npm or yarn
  • Python 3.9+ with tensorflow (or torch/onnx for alternate paths)
  • tensorflowjs_converter, tflite-runtime or onnxruntime-tools
  • Starter repository template (provided in starter code block below)

Lesson flow: three hands-on labs

Lab 1 — Convert and quantize a tiny model (45–60 min)

Goal: Move a small image or text model from Python to a format a browser runtime can load. Focus: post-training quantization.

Teacher notes

Use a tiny model: a simple digit classifier (MNIST) or small MobileNetV2 with reduced input size (96x96). For classroom speed, provide a pre-trained SavedModel so students run conversion steps instead of training.

Steps and commands (TensorFlow path)

  1. Install required tools (one-time):
    python -m pip install tensorflow tensorflowjs
  2. Start from a SavedModel directory (teacher provides):
    saved_model/
  3. Convert and quantize to TF.js format (weights quantized to 1 byte where possible):
    tensorflowjs_converter \
          --input_format=tf_saved_model \
          --output_format=tfjs_graph_model \
          --skip_op_check \
          --quantization_bytes=1 \
          saved_model/ web_model/

    Notes: --quantization_bytes quantizes weights to 1 or 2 bytes (4 means unquantized float32), reducing model size accordingly. With 1-byte quantization an accuracy drop is possible; if your runtime supports it, try float16 quantization instead (depending on your converter version, via --quantize_float16 or --quantization_dtype_map=float16=*).

  4. Alternatively, convert to TFLite and apply post-training int8 quantization for TFLite Web runtimes:
    import tensorflow as tf
    import numpy as np

    converter = tf.lite.TFLiteConverter.from_saved_model('saved_model')
    converter.optimizations = [tf.lite.Optimize.DEFAULT]

    # Representative dataset: a few batches of typical inputs so the converter
    # can calibrate int8 ranges. Random data is a placeholder — use real
    # samples (with the model's input shape) for meaningful calibration.
    def representative_data_gen():
        for _ in range(100):
            yield [np.random.rand(1, 96, 96, 3).astype(np.float32)]

    converter.representative_dataset = representative_data_gen
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.uint8
    converter.inference_output_type = tf.uint8
    tflite_model = converter.convert()
    with open('model_quant.tflite', 'wb') as f:
        f.write(tflite_model)
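To make the int8 trade-off concrete for students, here is a small classroom illustration of affine int8 quantization — weights are mapped to 8-bit integers with a scale and zero point, and the round trip introduces a bounded error. This is a teaching sketch, not what any particular converter does internally:

```javascript
// Illustrative affine int8 quantization of a weight array:
// q = round(w / scale) + zeroPoint, clamped to [-128, 127]
function quantizeInt8(weights) {
  const min = Math.min(...weights);
  const max = Math.max(...weights);
  const scale = (max - min) / 255 || 1; // avoid divide-by-zero for constant weights
  const zeroPoint = Math.round(-128 - min / scale);
  const q = weights.map(w =>
    Math.max(-128, Math.min(127, Math.round(w / scale) + zeroPoint))
  );
  return { q, scale, zeroPoint };
}

function dequantizeInt8(q, scale, zeroPoint) {
  return q.map(v => (v - zeroPoint) * scale);
}

const weights = [-0.9, -0.1, 0.0, 0.25, 0.8];
const { q, scale, zeroPoint } = quantizeInt8(weights);
const restored = dequantizeInt8(q, scale, zeroPoint);

// Each restored value differs from the original by at most scale/2
const maxError = Math.max(...restored.map((v, i) => Math.abs(v - weights[i])));
console.log(maxError <= scale / 2 + 1e-9); // true
```

A nice discussion point: the error bound shrinks with the weight range, which is why per-channel quantization (a narrower range per output channel) usually loses less accuracy than one scale for the whole tensor.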

Teaching tip

Have students measure final artifact size (in KB) and run a single inference locally on their laptop to confirm conversions before moving to browser bundling.

Lab 2 — Bundle the model and build a tiny web app (45–60 min)

Goal: Create a compact web app that loads the quantized model and runs inference in the mobile browser. Focus: bundling, lazy-loading, and progressive UX.

Starter project structure

starter-web/
├─ public/
│  └─ index.html
├─ src/
│  ├─ app.js
│  └─ model-loader.js
├─ web_model/    # output from converter
├─ package.json
└─ vite.config.js

Key implementation points

  • Use a modern dev server like Vite for fast iteration and small production bundles.
  • Lazy-load the model with dynamic import so the initial page is lightweight.
  • Use a Service Worker (or Workbox) to cache the model for offline runs — very impactful on mobile.

Minimal model loader (TF.js GraphModel example)

/* src/model-loader.js */
import * as tf from '@tensorflow/tfjs';

export async function loadModel() {
  // Prefer WebGPU where the browser exposes it; fall back to WebGL,
  // which is available on virtually all mobile browsers and far faster than CPU
  if (navigator.gpu) {
    await import('@tensorflow/tfjs-backend-webgpu');
    await tf.setBackend('webgpu');
  } else {
    await tf.setBackend('webgl');
  }
  await tf.ready();
  const model = await tf.loadGraphModel('/web_model/model.json');
  return model;
}

App entry (simplified)

/* src/app.js */
import * as tf from '@tensorflow/tfjs';
import { loadModel } from './model-loader';

const loadBtn = document.querySelector('#load');
loadBtn.addEventListener('click', async () => {
  const model = await loadModel();
  document.querySelector('#status').textContent = 'Model loaded';
  // Run a dummy inference to warm up the backend (shader compilation, etc.)
  const dummy = tf.zeros([1, 96, 96, 3]);
  const out = model.predict(dummy);
  await out.data();
  document.querySelector('#status').textContent = 'Warm';
  dummy.dispose();
  out.dispose();
});

Mobile-specific packaging tips

  • Compress model assets (gzip or brotli) and configure server to serve compressed .bin files.
  • Use range requests for very large models; chunked loading or sharding lets the app start before full download.
  • Provide a small visual progress bar and fallback message for low memory devices.

Lab 3 — Optimize, measure, and iterate (45–60 min)

Goal: Evaluate latency, memory, and accuracy on student devices; iterate on quantization and input size.

Suggested experiments

  1. Compare float16 vs int8 quantized models for size and accuracy.
  2. Measure cold start (first inference) vs warmed inference timings using performance.now().
  3. Try WebGPU vs CPU backend on supported devices and note battery/heat implications.

Measurement snippet

const t0 = performance.now();
const y = model.predict(inputTensor);
await y.data();
const t1 = performance.now();
console.log('Inference ms:', t1 - t0);
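Single measurements are noisy on phones (thermal throttling, background tabs), so have students collect several timings and report robust statistics. A small helper, with example numbers showing a cold first run followed by warmed runs:

```javascript
// Summarize repeated inference timings: median and p95 are more
// robust than a single measurement on mobile devices.
function summarize(timingsMs) {
  const sorted = [...timingsMs].sort((a, b) => a - b);
  const pick = p => sorted[Math.min(sorted.length - 1, Math.floor(p * sorted.length))];
  return { median: pick(0.5), p95: pick(0.95) };
}

// Example: a cold first inference (180 ms) followed by warmed runs
const timings = [180, 24, 22, 25, 23, 21, 26, 22, 24, 23];
console.log(summarize(timings)); // { median: 24, p95: 180 }
```

The cold run dominating p95 is itself a talking point: warm-up matters, and students should report cold and warm numbers separately.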

Teacher-led discussion prompts

  • Why did int8 reduce size but sometimes hurt accuracy?
  • How do browser backends affect latency and battery usage?
  • What are the privacy advantages of local inference? What are the trade-offs?

Starter code bundle (copy-paste friendly)

Below is a minimal package.json to quickly serve the app with Vite. Teachers can paste it into the starter repo.

{
  "name": "on-device-ml-lab",
  "version": "0.1.0",
  "scripts": {
    "dev": "vite",
    "build": "vite build",
    "preview": "vite preview"
  },
  "dependencies": {
    "@tensorflow/tfjs": "^4.0.0"
  },
  "devDependencies": {
    "vite": "^5.0.0"
  }
}
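The vite.config.js referenced in the project structure can stay minimal. A sketch (the raised chunk-size warning limit is just one knob you may want, since model assets are intentionally large):

```javascript
// vite.config.js — minimal config for the starter app.
// Files in public/ (and the copied web_model/) are served as-is.
import { defineConfig } from 'vite';

export default defineConfig({
  build: {
    // Model shards are large by design; raise the warning threshold (in kB)
    chunkSizeWarningLimit: 1024,
  },
});
```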

Assessment: rubrics and ideas

Assessments should reward engineering reasoning, not just a working demo. Here are practical, teacher-friendly rubrics and quick checks.

Rubric: Small project deliverable (50 points)

  • Functionality (20 pts): App loads on a mobile browser and runs inference without crashing (10), model is quantized and smaller than baseline (10).
  • Optimization & Analysis (15 pts): Students measured time and size, reported trade-offs, and justified a quantization choice (15).
  • UX & Robustness (10 pts): Good progress feedback, handles load errors, caches model (10).
  • Documentation & Code (5 pts): README explains how to run and reproduce results (5).

Quick formative checks

  • Can the group explain why they chose int8 vs float16?
  • Show one measured inference time on a real phone.
  • Demonstrate the app runs offline after caching (Service Worker evidence).

Advanced extensions and community challenge ideas

Turn this into a class or community challenge to increase engagement and mentorship opportunities.

  • Host a 48-hour "Tiny Model Hackathon" with categories: smallest working model, fastest cold inference, most creative UX.
  • Use GitHub Classroom to assign starter repos and enable peer code review — pair students as mentors and mentees.
  • Introduce a leaderboard that weights on-device privacy features (no remote callouts) and inclusive performance (runs on older phones).

Common pitfalls and how to recover

  • Model too large to download on weak connections: shard the model or switch to a smaller architecture (e.g. SqueezeNet or MobileNetV3-Small).
  • Device runs out of memory: reduce batch size, lower input resolution, or use streaming pipelines (process patches).
  • Unexpected accuracy drop: check that your representative dataset reflects real inputs during quantization, or fall back to post-training float16 instead of int8.
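For the sharding recovery above, the client side is mostly just reassembling downloaded shards in order. A minimal sketch — in the browser the shards would come from fetch(...).arrayBuffer(); here plain Uint8Arrays keep the logic easy to test:

```javascript
// Reassemble a model binary from ordered shard buffers.
function concatShards(shards) {
  const total = shards.reduce((sum, s) => sum + s.byteLength, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const shard of shards) {
    out.set(new Uint8Array(shard.buffer ?? shard), offset);
    offset += shard.byteLength;
  }
  return out;
}

// Example: three tiny "shards" reassembled into one buffer
const shards = [new Uint8Array([1, 2]), new Uint8Array([3]), new Uint8Array([4, 5, 6])];
const merged = concatShards(shards);
console.log(merged.length); // 6
```

TF.js sharded models handle this for you via model.json; the manual version is useful when students roll their own loading with progress reporting.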

Teacher tip: Always provide a “known good” demo build. Students learn best by changing something that already works.

Privacy, ethics, and classroom discussion prompts

On-device ML introduces important conversations you can embed in the lab:

  • Local inference reduces data exposure — when does this matter most?
  • Are smaller models biased in different ways? Ask students to test misclassifications.
  • Discuss energy consumption — is local inference greener than cloud inference in all cases?

As of early 2026, browsers and runtimes are converging around these principles:

  • Progressive local AI: More mobile browsers are shipping local AI features; expect privacy-first UX patterns to become standard teaching examples.
  • Model modularity: Sharded and streamed models (tiny cores + optional plugins) will be common for constrained devices.
  • Tooling democratization: Better converters and on-device debuggers will shrink the friction between model design and deployment — teachers should re-run this lab each semester to include tooling updates.

Starter assessment rubric — example checklist for teachers

  1. Does the app load on student device? (Yes/No)
  2. Does it run inference within user-acceptable latency (<500ms for simple classifications)? (Yes/No)
  3. If quantized, what is the model size reduction? (Provide %)
  4. Student reflection: explain one trade-off they made (200–400 words).

Wrap-up: What students will have learned

By the end of this lab sequence, students will be able to:

  • Apply conversion and quantization techniques to create browser-friendly models.
  • Build a minimal web app that performs on-device inference on mobile browsers.
  • Measure and report engineering trade-offs in model deployment.
  • Participate in community challenges and mentor peers through code reviews and demos.

Resources and further reading (teacher-curated)

  • Browsers' WebGPU and WebNN release notes (follow vendor docs for platform-specific tuning)
  • TensorFlow.js and TensorFlow Lite Web runtimes — official docs for conversion and backends
  • ONNX Runtime Web for models converted from PyTorch/ONNX if you prefer that ecosystem

Final classroom challenge (project prompt)

“Ship a privacy-first mobile demo: pick any tiny model (vision, speech keyword spotting, or text classification), make it run entirely in a mobile browser, and show its latency and size. Include a one-page reflection that explains two trade-offs you made. Bonus: cache your model to make it usable offline and show a peer review.”

Call-to-action

Ready to run this lab? Clone the starter repo, pre-provision the SavedModel for your class, and launch a 2-class mini-hackathon. Invite other teachers to share starter models and rubrics — great community challenges start in classrooms. Pair the lab with a printable one-page student worksheet sized to your class period, and pick the conversion path (Keras or PyTorch) that matches your curriculum.
