Atome LM v2 (SuperESP): 12 on-device AI apps on a $5 ESP32, universal installer, all offline
New: now running on a physical ESP32, with coherent text, fully offline, ~1 tok/s

Atome · lm · 2026

No cloud. No internet. No GPU.

The model is the firmware.

A language model in 271 KB.

A language model small enough to live inside a $2 chip, the kind already in your thermostat, a kid's toy, a hearing aid.

A 944K-parameter ternary language model with bit-exact Python ↔ C99 ↔ Cortex-M3 parity. It compiles to a 2.6 KB engine plus a 271 KB model blob and runs on a $2 microcontroller with no heap, no syscalls, and no network.

What it is

An AI for things, not chatbots.

The AI we use every day lives in giant datacenters. The little chips that already run your world (the kettle, the car key, your child's nightlight) get none of it. Atome lm changes that. It runs without the internet, gives the same answer on every device, and ships as part of the firmware, not as a cloud service.

Most published tiny-LM work targets the smartphone class: 100M+ parameters, 4-to-8-bit weights, GPU-friendly. The MCU class, with 20 KB SRAM (Blue Pill), 264 KB (Pico), or 512 KB (ESP32-S3), has a small handful of peers: llama2.c on MCU, TinyMaix, esp32-llm. Atome lm's specific shape: ternary weights, a zero-heap pure-C99 engine, and bit-exact Python ↔ C parity verified under QEMU.

944K parameters at 1.58 bits per weight pack to 271 KB on disk. The same blob loads into a pure-C99 engine cross-compiled to Cortex-M0/M3/M4/M4F/M7 in 2.6–2.8 KB of .text. Output is bit-exact across Python, C, and emulated silicon to FP32 epsilon.
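The size figure is easier to follow with the packing arithmetic spelled out. The following is a minimal sketch, assuming a base-3 scheme that stores five ternary weights per byte (3^5 = 243 fits in 256 values, so 1.6 bits per weight). The actual blob layout is not described on this page, and `pack_trits`, `unpack_trits`, and `packed_size_bytes` are hypothetical names.

```python
def pack_trits(weights):
    """Pack ternary weights (-1, 0, +1) five to a byte using base-3 digits."""
    out = bytearray()
    for i in range(0, len(weights), 5):
        group = weights[i:i + 5]
        value = 0
        # Most significant trit first; map -1/0/+1 to digits 0/1/2.
        for w in group:
            value = value * 3 + (w + 1)
        # Pad a short final group with zero weights (digit 1).
        for _ in range(5 - len(group)):
            value = value * 3 + 1
        out.append(value)  # at most 3**5 - 1 = 242, so it fits in one byte
    return bytes(out)


def unpack_trits(blob, count):
    """Inverse of pack_trits: recover `count` ternary weights."""
    weights = []
    for byte in blob:
        digits = []
        for _ in range(5):
            digits.append(byte % 3 - 1)  # least significant trit first
            byte //= 3
        weights.extend(reversed(digits))
    return weights[:count]


def packed_size_bytes(n_weights):
    """Bytes needed for n_weights ternary values at five per byte."""
    return (n_weights + 4) // 5
```

Under this assumed packing, 944,000 ternary weights would occupy 188,800 bytes. The remainder of the blob would be whatever is stored at higher precision, such as the FP32 SSM parameters and per-tensor scales mentioned elsewhere on the page; the exact split is not stated here.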

The pillars

Four things that hold together.

Most tiny LMs hit one or two of these. Atome lm's claim is the combination — each pillar is verifiable from the repo, not a marketing line.

01 · Provable

Bit-exact across the whole stack

Python on a laptop, C on a server, firmware on an emulated chip — every layer produces the same answer, byte for byte.

Max |Δ| = 3.7×10⁻⁷
48/48 on 60K · 16/16 on 944K

02 · Tiny

Fits the firmware budget that exists

The engine compiles to 2.6 kilobytes — smaller than this paragraph.

2.6–2.8 KB .text
across Cortex-M0/M3/M4/M4F/M7

03 · Sealed

Zero heap · zero syscall · zero network

No memory allocator, no network calls, no telemetry. Air-gappable by construction.

No malloc · socket · fopen
provable absence of egress
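One way to check a claim like this is to list the symbols a compiled object still needs from outside and look for allocator, file, or socket calls. The following is a minimal sketch, assuming you have saved the text printed by `nm -u` (undefined symbols only) for the engine object. The forbidden list, the sample listings, and the function name are illustrative and are not taken from the repo's tests.

```python
FORBIDDEN = {"malloc", "calloc", "realloc", "free",
             "socket", "connect", "fopen", "open"}


def forbidden_symbols(nm_undefined_output, forbidden=FORBIDDEN):
    """Return the forbidden names that appear as undefined symbols.

    Expects the text printed by `nm -u <object>`, where each line ends
    with a symbol name, for example "         U memcpy".
    """
    found = set()
    for line in nm_undefined_output.splitlines():
        parts = line.split()
        if not parts:
            continue
        # Drop symbol-version suffixes such as "fopen@GLIBC_2.2.5".
        name = parts[-1].split("@")[0]
        if name in forbidden:
            found.add(name)
    return sorted(found)


# Made-up listings, for illustration only.
clean = "         U memcpy\n         U __aeabi_fmul\n"
leaky = "         U malloc\n         U memcpy\n         U fopen@GLIBC_2.2.5\n"
print(forbidden_symbols(clean))  # []
print(forbidden_symbols(leaky))  # ['fopen', 'malloc']
```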

04 · Reproducible

Every step measured, not estimated

From training the model to running it on an emulated chip — every measurement on this page comes from a script.

146 / 146 tests
every number on this page is script-generated

How it works

Three small specialists. One traffic cop.

Most AI uses one mechanism for everything. Atome lm uses three smaller specialists and a tiny switch that picks which one to use for each character. Three small specialists cost less memory than one big generalist at the same quality; that's where the architecture earns its place at MCU scale.

Each block runs three structurally different operations in parallel (a 5-tap depthwise causal conv, a diagonal SSM, and a top-k=4 sparse attention), combined by a per-token softmax router. All projections are ternary (-1/0/+1) with a per-tensor scale.
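The two ideas in the paragraph above, ternary weights with one scale per tensor and a softmax router that mixes three pathway outputs, can be sketched in a few lines. This is an illustration under stated assumptions, not the repo's code: the absmean quantization rule is borrowed from the BitNet b1.58 recipe (the page only says "ternary, per-tensor scale"), and all function names are hypothetical.

```python
import math


def ternarize(weights, eps=1e-8):
    """Quantize a flat list of floats to -1/0/+1 with one per-tensor scale.

    Uses the absmean rule (scale = mean absolute value). This rule is an
    assumption; the page does not say how the scale is chosen.
    """
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    trits = [max(-1, min(1, round(w / scale))) for w in weights]
    return trits, scale


def softmax(logits):
    """Numerically stable softmax over a short list."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(router_logits, conv_out, ssm_out, attn_out):
    """Mix the three pathway outputs for one token with softmax weights."""
    g = softmax(router_logits)
    return [g[0] * c + g[1] * s + g[2] * a
            for c, s, a in zip(conv_out, ssm_out, attn_out)]
```

At inference, a dequantized weight would be `trit * scale`, so a ternary matrix-vector product needs only additions and subtractions plus one multiply by the scale at the end.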

01 · Local

Sees the last 5 letters
Depthwise causal conv, k=5

A short-range filter for the small patterns between adjacent letters. Ternary kernel, no bias. O(L·k) per token.
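A minimal sketch of what a depthwise causal convolution with a ternary kernel computes, assuming a plain list-of-lists layout and leaving out the per-tensor scale. The function name and data layout are hypothetical.

```python
def depthwise_causal_conv(x, kernels):
    """Depthwise causal convolution with ternary kernels and no bias.

    x: list of T tokens, each a list of C channel values.
    kernels: C kernels of length k; kernels[c][j] weights the input
    j steps back in time on channel c. Returns T x C outputs.
    """
    T = len(x)
    C = len(kernels)
    y = [[0.0] * C for _ in range(T)]
    for t in range(T):
        for c in range(C):
            acc = 0.0
            for j, w in enumerate(kernels[c]):
                if w != 0 and t - j >= 0:  # causal: never look ahead
                    # w is +1 or -1, so the multiply is an add or subtract
                    acc += x[t - j][c] if w > 0 else -x[t - j][c]
            y[t][c] = acc
    return y
```

Each channel is filtered independently (that is the "depthwise" part), and position t only reads positions t, t-1, ..., t-4.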

02 · State

Remembers the whole sentence
Diagonal SSM

A long-term memory pathway. Carries information from the beginning of the sentence forward. FP32 per-channel a, b, c_out. Recurrent at inference, O(1) per token.
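A minimal sketch of a diagonal state-space recurrence using the three per-channel parameters named above. The exact update rule is an assumption (h = a·h + b·x, output = c_out·h); the page names the parameters but does not give the equation, and the class name is hypothetical.

```python
class DiagonalSSM:
    """Per-channel linear recurrence with a fixed-size state."""

    def __init__(self, a, b, c_out):
        self.a, self.b, self.c_out = a, b, c_out
        self.h = [0.0] * len(a)  # one state value per channel

    def step(self, x):
        """Consume one token's channel vector and return the output vector."""
        self.h = [a * h + b * xi
                  for a, h, b, xi in zip(self.a, self.h, self.b, x)]
        return [c * h for c, h in zip(self.c_out, self.h)]
```

The state has the same size no matter how long the input is, which is why the cost per token stays constant and no history buffer is needed.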

03 · Sparse

Points back to one earlier word
Top-k=4 causal attention

For when a word depends on a specific earlier word. Ternary Q/K/V projections. Softmax over the top-4 keys per query.
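A minimal sketch of top-k causal attention for a single query, assuming the usual dot-product scores scaled by the square root of the dimension (the scaling is an assumption; the page only specifies a softmax over the top-4 keys). The function name is hypothetical.

```python
import math


def topk_causal_attention(q, keys, values, k=4):
    """Attend from one query to the k best-scoring positions so far.

    q: query vector for the current token.
    keys, values: vectors for positions 0..t (current token included),
    so causality holds by construction.
    """
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, key)) / math.sqrt(d)
              for key in keys]
    # Indices of the k highest scores (fewer if the context is short).
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    m = max(scores[i] for i in top)
    exps = {i: math.exp(scores[i] - m) for i in top}
    total = sum(exps.values())
    out = [0.0] * len(values[0])
    for i in top:
        w = exps[i] / total
        for j, v in enumerate(values[i]):
            out[j] += w * v
    return out
```

Positions outside the top k get exactly zero weight, so the weighted sum touches at most four value vectors per query.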

Applications

What it's built for.

Atome lm is not a general-purpose chatbot. It's a narrow specialist that you fine-tune on the data your product cares about — then it ships inside the firmware.

Smart lightbulbs

Local voice commands

Kids' toys & dolls

No recording, ever

Bedtime story devices

Offline generation

Pet feeders & litter boxes

Friendly status messages

Automobiles

Voice-intent detection

Watches & wearables

Text comprehension at the wrist

Agriculture

Field-sensor pattern recognition

Medical wearables

ECG / vitals classification

Industrial sensors

Anomaly detection

Energy & utility meters

On-meter reading parser

Hearing aids

On-device sentence completion

Disaster-relief radios

Field text-help, off-grid

Working prototypes

Three things it does today.

Beyond writing stories, the engine runs as a narrow text classifier — the kind a real embedded product ships. Three internal prototypes, trained, exported, and run through the Cortex-M3 emulator. The training scripts for these tasks are not in the public kit; the engine path for running them is.

Atome LM v2 (SuperESP) extends this into 12 on-device AI apps — agriculture, voice, anomaly, air-quality, power and more — verified on a physical ESP32. Read the v2 release →

01 · Wake-word / command intent

Picks the right command from text variants

6-class classifier on byte-tokenized phrase strings, 1800 synthetic samples with simple lexical variations.

held-out accuracy 100 %

02 · Anomaly flag

Spots bad sensor-reading strings

Binary classifier on synthetic sensor strings (NaN, out-of-range, garbage spikes). 1000 samples, 30 epochs.

held-out accuracy 91.7 %

03 · Intent bucket

Sorts a sentence into one of five buckets

5-way classifier (command / question / status / alert / greeting), 1500 samples, 40 epochs.

held-out accuracy 100 %

Where it fits

Real numbers, real chips.

Peak RAM = .bss + measured stack high-water from a real Cortex-M3 build under QEMU MPS2-AN385. Reproducer: python3 scripts/measure_ram.py --markdown.
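Peak RAM as defined above is a simple sum once the stack high-water mark is known. The following is a minimal sketch of one common way to obtain that mark, stack painting: fill the stack region with a sentinel word before the run, then see how much of it was overwritten. Whether scripts/measure_ram.py works this way is not stated here; the sentinel value and function names are illustrative.

```python
SENTINEL = 0xDEADBEEF


def stack_high_water_bytes(stack_words, sentinel=SENTINEL, word_size=4):
    """Return bytes of stack used, given a post-run dump of the stack region.

    stack_words[0] is the lowest address. A descending stack (as on
    Cortex-M) grows from the end of the list toward index 0, so the first
    overwritten word seen from the low end marks the deepest point reached.
    """
    for index, word in enumerate(stack_words):
        if word != sentinel:
            return (len(stack_words) - index) * word_size
    return 0  # sentinel intact everywhere: the stack was never touched


def peak_ram_bytes(bss_bytes, stack_words):
    """Peak RAM = .bss + measured stack high-water."""
    return bss_bytes + stack_high_water_bytes(stack_words)
```

The estimate is conservative in one direction only: a frame that reserves stack space without writing to it can leave sentinel words intact below the true high-water mark.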

Chips compared (price): STM32F103 ($2-4) · RP2040 ($4) · STM32F411 ($15) · STM32F7 ($15-30) · ESP32-S3 ($5-10)

Model size · Used for · RAM · Chip markers
nano · Proves the engine fits the smallest chips · 14.5 KB
small · Short keyword routing · 27.5 KB · no RAM
classifier · Narrow on-device classification heads · 52 KB · no
tinystories · Children's-story-shaped writing · 104 KB · no RAM
mid · Mid-range writing for a specific topic · 205 KB · no, no RAM
prod (944K) · Full coherent prose (the model writing live above) · 411 KB · no, no RAM, no

The proof

Three environments. One answer.

The same model, run on a laptop, on a server, and on an emulated microcontroller, returns the same words byte-for-byte. Not "close." Identical to a single-precision computation. That's what makes the model auditable for any product that has to be certified or formally reviewed.

Python · PyTorch
Reference forward pass
C99 · zero-heap
Inference engine
Cortex-M3 · QEMU MPS2-AN385
Emulated silicon
3.7×10⁻⁷
Max |Δ| across all 3 stages
48 / 48
Multi-token parity, 60K demo
16 / 16
Multi-token parity, 944K
146 / 146
pytest, green at HEAD

Verified by tests/test_parity_with_c.py + tests/test_parity_multitoken.py — every run reproducible from a cold checkout.
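A parity check of this kind comes down to two comparisons: the largest elementwise difference between two output vectors, and exact equality of the generated token ids. The following is a minimal sketch of both, not the contents of the test files named above; the function names are hypothetical.

```python
def max_abs_delta(reference, candidate):
    """Largest elementwise |reference - candidate| between two logit vectors."""
    if len(reference) != len(candidate):
        raise ValueError("outputs differ in length")
    return max(abs(r - c) for r, c in zip(reference, candidate))


def same_tokens(reference_ids, candidate_ids):
    """Multi-token parity: both runs must emit identical token ids."""
    return list(reference_ids) == list(candidate_ids)
```

A run would count as passing when `max_abs_delta` stays under a chosen FP32-scale tolerance and `same_tokens` is true for every generated sequence.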

The result

A measured win, and a measured loss.

On TinyStories, 3,000 steps, single seed: Atome's routed-ternary block reaches 6.31 ppl vs 8.12 for vanilla GPT-FP32 at fixed params; 6.31 vs 13.10 at fixed flash. The result reverses at 944K parameters where vanilla wins by ~11%. Atome's bet is deliberately the sub-1M, MCU-class regime.

Lower perplexity is better · green = Atome wins · red = Atome loses

60K · param-fair
−22 %
Atome 6.31 ppl · vanilla 8.12 ppl
60K · flash-fair
−52 %
Atome 6.31 ppl · vanilla-6K 13.10 ppl
944K · param-fair
+11 %
vanilla 2.54 ppl · Atome 2.87 ppl
944K · disk
≈14×
Atome 271 KB · vanilla 3.7 MB

Single seed; multi-seed run pending. Full reading: HONEST_RESULTS.md.
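The percentages follow from the perplexities listed with them. A minimal sketch of the arithmetic, using only the numbers printed on this page; the function name is hypothetical.

```python
def relative_change(value, base):
    """Signed change of `value` relative to `base`."""
    return (value - base) / base


print(f"{relative_change(6.31, 8.12):+.1%}")   # 60K param-fair, base = vanilla
print(f"{relative_change(6.31, 13.10):+.1%}")  # 60K flash-fair, base = vanilla-6K
print(f"{relative_change(2.87, 2.54):+.1%}")   # 944K, base = vanilla
print(f"{relative_change(2.54, 2.87):+.1%}")   # 944K, base = Atome
```

The two 60K results take the vanilla model as the base (about −22 % and −52 %). At 944K the choice of base matters: vanilla's perplexity is about 11.5 % lower than Atome's, which is the same gap as Atome's being about 13 % higher than vanilla's.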

Versus the field

A narrower but verifiable claim.

Other MCU-class LMs exist. llama2.c compiles for microcontrollers; TinyMaix and esp32-llm run small models on ESP32-S3. Atome lm's tighter combination: ternary weights, a zero-heap pure-C99 engine, and bit-exact parity verified under QEMU.

BitNet b1.58

Smallest
700M – 3B
Weights
ternary
Runs on
server / phone

llama2.c · Stories260K

Smallest
260K – 110M
Weights
FP32 / Q8
Runs on
MCU class possible

TinyMaix · esp32-llm

Smallest
~15M
Weights
Q8 / FP32
Runs on
ESP32-S3 (PSRAM)

Atome lm

Size
60K · 944K
Weights
ternary, zero-heap
Verified
Cortex-M3 (QEMU) · bit-exact

Not yet: we haven't flashed onto a physical chip and measured Joules per token. That's the next milestone — until it's done, the "boots on silicon" claim stays on QEMU.

What you can build

A building block, not a product.

The public kit ships the architecture + engine + a 944K TinyStories LM as a research-demo artifact. Three classes of task fit the engine shape:

Tiny narrow LM

Train on a single domain (FAQ, command-line help, embedded-system Q&A) and the model speaks fluently inside that scope. Going wide at this size produces incoherent output — capacity limit, not architecture.

On-device text classifiers

A classification head plugs onto the 3-pathway backbone. Internal text classifiers we've trained reach high held-out accuracy on synthetic distributions; the training scripts for those are not in the public kit — the engine path for running them is.

Per-token router signal

The router's entropy is observable for free at every position. In V2 production it tracks out-of-domain inputs and correlates with per-token loss; at 60K scale on a single corpus the signal is exposed identically — calibration as an uncertainty estimator at that scale has not been measured here.
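The signal in question is the Shannon entropy of the router's softmax output at each position. A minimal sketch, assuming the three gate probabilities for a token are available as a list; the function name is hypothetical.

```python
import math


def router_entropy(gate_probs):
    """Shannon entropy (in nats) of one token's router distribution."""
    return sum(-p * math.log(p) for p in gate_probs if p > 0.0)
```

With three pathways the value ranges from 0 (the router commits fully to one specialist) to log(3) ≈ 1.10 (the router is split evenly), so a run of high values marks positions where no single pathway dominates.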

Lineage

Carved from internal research.

Atome lm is the embedded face of a larger research project. The 3-pathway architecture and the C99 engine are public on GitHub under Apache 2.0 (TilelliLab/atome-lm). More elaborate variants — extended pathways, retrieval and memory, multi-bank weight schemes, an internal regression-prevention gate — remain internal. For production integration (silicon bring-up, the Atome Secure Boot Pack, per-platform hardening): hello@atomelm.com.

What people say

It went out into the wild.

Here's what some people think of it — real reactions from the r/esp32 community, linked to the source.

u/aquoad ran it on their own board

This is pretty cool. The full model runs (but comically slowly) on a 6 yr old ESP32-S2 board with external SPI PSRAM, though I did have to turn off the idle task watchdog.

==================== ATOME on SILICON ====================
chip   : ESP32-S2 rev v0.0   cores=1   flash : 4 MB
PSRAM  : 2048 KB (detected)
model  : 276655 bytes embedded in flash
config : d=256 layers=8 head=64 seq=128 state=811 KB
---------------------------------------------------------
prompt: Once  >>> upon a time, there was a little girl named Lily
average: 0.1 tok/s | heap low-water: 243 KB internal
independent reproduction · serial log posted

u/IcestormsEd · r/esp32

Definitely gonna try this. Thanks much.

24

u/urgeekydude · r/esp32

As an embedded systems student, I would like to say thank you for your good work and keep up 🔥

15

u/MossiGuy · r/esp32

wow i need to try this, honestly it's pretty impressive how an LLM can run on a microcontroller

8

u/Wemos_D1 · r/esp32

The fact it can detect bad sensor readings is a cool use case for a model this small. Nice use of it — good job guys :p

6

u/shisohan · r/esp32

Imagine a beowulf cluster of these. 64 ESP32's and you got your 60 t/s for $300!

3

Quotes are from the public r/esp32 thread, shown with each commenter's handle and score. Read every comment at the source ↓