Atome · lm · 2026

No cloud. No internet. No GPU.

The model is the firmware.

A language model in 271 KB.

A language model small enough to live inside a $2 chip, the kind already in your thermostat, a kid's toy, a hearing aid. A 944K-parameter ternary language model with bit-exact Python ↔ C99 ↔ Cortex-M3 parity. Compiles to a 2.6 KB engine plus a 271 KB model blob. Runs on a $2 microcontroller: no heap, no syscalls, no network.


What it is

An AI for things, not chatbots.

The AI we use every day lives in giant datacenters. The little chips that already run your world (the kettle, the car key, your child's nightlight) get none of it. Atome lm changes that. It runs without the internet, gives the same answer on every device, and ships as part of the firmware, not as a cloud service.

Most published tiny-LM work targets the smartphone class: 100M+ parameters, 4-to-8-bit weights, GPU-friendly. The MCU class, with 20 KB of SRAM (Blue Pill), 264 KB (Pico), or 512 KB (ESP32-S3), has only a handful of peers: llama2.c on MCU, TinyMaix, esp32-llm. Atome lm's specific shape: ternary weights, a zero-heap pure-C99 engine, and bit-exact Python ↔ C parity verified under QEMU.

944K parameters at 1.58 bits per weight pack to 271 KB on disk. The same blob loads into a pure-C99 engine cross-compiled to Cortex-M0/M3/M4/M4F/M7 in 2.6–2.8 KB of .text. Output matches across Python, C, and emulated silicon to within FP32 rounding (max |Δ| = 3.7×10⁻⁷), and the generated tokens are identical.
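The size arithmetic can be checked by hand. The sketch below rests on assumptions not stated on this page: 1.58 bits is log2(3), the information content of one ternary weight, and a practical packing stores either 5 ternary digits per byte (3^5 = 243 ≤ 256, so 1.6 bits per weight) or 4 per byte (2 bits per weight). The function names and the base-3 layout are illustrative only and are not the repo's blob format.

```python
import math

def pack_trits(weights):
    """Pack ternary weights (-1, 0, +1) five per byte as base-3 digits."""
    out = bytearray()
    for i in range(0, len(weights), 5):
        chunk = weights[i:i + 5]
        value = 0
        # Most significant digit first; map -1/0/+1 to digits 0/1/2.
        for w in chunk:
            value = value * 3 + (w + 1)
        # Pad a short final chunk with zero weights (digit 1).
        for _ in range(5 - len(chunk)):
            value = value * 3 + 1
        out.append(value)  # at most 242, so it always fits in one byte
    return bytes(out)

def unpack_trits(blob, count):
    """Inverse of pack_trits; `count` drops the padding of the last byte."""
    weights = []
    for byte in blob:
        digits = []
        for _ in range(5):
            digits.append(byte % 3 - 1)  # least significant digit first
            byte //= 3
        weights.extend(reversed(digits))
    return weights[:count]

def packed_kb(n_weights, bits_per_weight):
    """Size in KB (1 KB = 1000 bytes) of n_weights at a given bit width."""
    return n_weights * bits_per_weight / 8 / 1000

n = 944_000  # "944K" taken as a round number (assumption)
print(packed_kb(n, math.log2(3)))  # information-theoretic floor
print(packed_kb(n, 1.6))           # 5 weights per byte
print(packed_kb(n, 2.0))           # 4 weights per byte
```

All three estimates land below the 271 KB quoted here. This page does not break the blob down, so the remainder is not explained by this sketch; non-ternary tensors such as the FP32 SSM coefficients and per-tensor scales are one plausible contributor.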

The pillars

Four things that hold together.

Most tiny LMs hit one or two of these. Atome lm's claim is the combination: each pillar is verifiable from the repo, not a marketing line.

01 · Provable

Bit-exact across the whole stack

Python on a laptop, C on a server, firmware on an emulated chip: every layer produces the same answer, byte for byte.

Max |Δ| = 3.7×10⁻⁷
48/48 on 60K · 16/16 on 944K
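For scale, 3.7×10⁻⁷ is roughly three units of FP32 machine epsilon (2⁻²³ ≈ 1.19×10⁻⁷), assuming the compared values are of order one. The sketch below is not taken from the repo. It rounds to single precision with the standard library and shows one common way two correct FP32 implementations can differ by a few epsilons while still choosing the same token: the order in which terms are accumulated.

```python
import struct

def f32(x):
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

FP32_EPS = 2.0 ** -23  # gap between 1.0 and the next representable FP32 value

def f32_sum(values):
    """Accumulate in FP32, rounding after every addition."""
    total = 0.0
    for v in values:
        total = f32(total + f32(v))
    return total

values = [1.0, 1e-8, 1e-8, 1e-8, 1e-8] * 25
forward = f32_sum(values)
backward = f32_sum(list(reversed(values)))

print(3.7e-7 / FP32_EPS)        # how many epsilons the reported max |Δ| spans
print(abs(forward - backward))  # difference caused only by summation order
```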

02 · Tiny

Fits the firmware budget that exists

The engine compiles to 2.6 kilobytes of code.

2.6–2.8 KB .text
across Cortex-M0/M3/M4/M4F/M7

03 · Sealed

Zero heap · zero syscall · zero network

No memory allocator, no network calls, no telemetry. Air-gappable by construction.

No malloc · socket · fopen
provable absence of egress
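One way such an absence could be checked, sketched here as an assumption and not as the repo's own procedure: list the undefined symbols of the compiled engine with `nm -u` and confirm that none of the allocator, file, or socket entry points appear. The forbidden list and the helper name are hypothetical.

```python
# Hypothetical deny-list; the project's real criteria may differ.
FORBIDDEN = {"malloc", "calloc", "realloc", "free", "socket", "connect", "fopen"}

def forbidden_symbols(nm_output, forbidden=FORBIDDEN):
    """Return the forbidden names found among undefined symbols in `nm -u` text."""
    found = set()
    for line in nm_output.splitlines():
        parts = line.split()
        # Undefined symbols print as "U name"; an address column may be absent.
        if len(parts) >= 2 and parts[-2] == "U":
            name = parts[-1].split("@")[0]  # drop any symbol-version suffix
            if name in forbidden:
                found.add(name)
    return found

clean = "         U memcpy\n         U __aeabi_fadd\n"
dirty = clean + "         U malloc\n         U fopen@GLIBC_2.2.5\n"
print(forbidden_symbols(clean))
print(sorted(forbidden_symbols(dirty)))
```

In practice the input text would come from running the toolchain's `nm -u` on the engine's object file, for example through `subprocess.run`.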

04 · Reproducible

Every step measured, not estimated

From training the model to running it on an emulated chip, every measurement on this page comes from a script.

146 / 146 tests
every number on this page is script-generated

How it works

Three small specialists. One traffic cop.

Most AI uses one mechanism for everything. Atome lm uses three smaller specialists and a tiny switch that picks which one to use for each character. Three small specialists cost less memory than one big generalist at the same quality, which is where the architecture earns its place at MCU scale. Each block runs three structurally different operations in parallel (a 5-tap depthwise causal conv, a diagonal SSM, and a top-k=4 sparse attention), combined by a per-token softmax router. All projections are ternary (-1/0/+1) with a per-tensor scale.
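The two ideas in this paragraph, ternary projections and a per-token softmax router, can be sketched in a few lines. This is an illustration under assumptions, not the engine's code: the function names are invented, and how the router logits are produced (presumably a small projection of the token's hidden state) is not specified on this page.

```python
import math

def ternary_matvec(W, scale, x):
    """y = scale * (W @ x), where every entry of W is -1, 0, or +1.
    The inner loop needs only additions and subtractions; the single
    per-tensor scale is applied once per output element."""
    y = []
    for row in W:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi
            elif w == -1:
                acc -= xi
        y.append(scale * acc)
    return y

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route(pathway_outputs, router_logits):
    """Mix the three pathway output vectors for one token with softmax gates."""
    gates = softmax(router_logits)
    dim = len(pathway_outputs[0])
    return [sum(g * out[i] for g, out in zip(gates, pathway_outputs))
            for i in range(dim)]

conv_out, ssm_out, attn_out = [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]
print(route([conv_out, ssm_out, attn_out], [2.0, 0.0, -1.0]))
```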

01 · Local

Sees the last 5 letters · Depthwise causal conv, k=5

A short-range filter for the small patterns between adjacent letters. Ternary kernel, no bias. O(k) per token per channel, O(L·k) over a sequence of length L.
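A minimal sketch of a depthwise causal convolution, assuming the usual definition: each channel has its own k-tap kernel, and the output at position t sees only positions t, t-1, ..., t-k+1. The tap ordering and the data layout are choices made for this example.

```python
def depthwise_causal_conv(x, kernels):
    """x: list of T time steps, each a list of C channel values.
    kernels: list of C kernels, each a list of k ternary taps;
    tap j multiplies the input j steps in the past, x[t - j]."""
    T, C = len(x), len(kernels)
    out = [[0.0] * C for _ in range(T)]
    for t in range(T):
        for c in range(C):
            acc = 0.0
            for j, w in enumerate(kernels[c]):
                # Positions before the start of the sequence count as zero.
                if t - j >= 0 and w != 0:
                    acc += w * x[t - j][c]
            out[t][c] = acc
    return out

x = [[1.0], [2.0], [3.0]]
kernels = [[1, -1, 0, 0, 0]]  # one channel, k=5: current minus previous
print(depthwise_causal_conv(x, kernels))
```

Because the kernel is per channel, channels never mix in this pathway; mixing is left to the projections around it.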

02 · State

Remembers the whole sentence · Diagonal SSM

A long-term memory pathway. Carries information from the beginning of the sentence forward. FP32 per-channel a, b, c_out. Recurrent at inference, O(1) per token.
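The names a, b, and c_out suggest a per-channel linear recurrence. The exact form is not given on this page, so the sketch below assumes the simplest one, h_t = a·h_{t-1} + b·x_t and y_t = c_out·h_t, applied independently to each channel. The state is one number per channel, which is why the cost per token does not grow with sentence length.

```python
def ssm_step(h, x_t, a, b, c_out):
    """One recurrent step: h' = a*h + b*x and y = c_out*h', per channel."""
    h_new = [ai * hi + bi * xi for ai, hi, bi, xi in zip(a, h, b, x_t)]
    y = [ci * hi for ci, hi in zip(c_out, h_new)]
    return h_new, y

def ssm_run(xs, a, b, c_out):
    """Run the recurrence over a sequence, starting from a zero state."""
    h = [0.0] * len(a)
    ys = []
    for x_t in xs:
        h, y = ssm_step(h, x_t, a, b, c_out)
        ys.append(y)
    return ys

# One channel: an impulse at t=0 decays by a factor a=0.5 at each step.
print(ssm_run([[1.0], [0.0], [0.0]], a=[0.5], b=[1.0], c_out=[2.0]))
# prints [[2.0], [1.0], [0.5]]
```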

03 · Sparse

Points back to one earlier word · Top-k=4 causal attention

For when a word depends on a specific earlier word. Ternary Q/K/V projections. Softmax over the top-4 keys per query.
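A sketch of top-k causal attention for a single query, under assumptions: scores are scaled dot products (the 1/√d factor is a common convention, not confirmed here), the caller passes only keys at or before the query position, and the softmax is taken over the k best scores only. The ternary Q/K/V projections are assumed to have been applied already.

```python
import math

def topk_causal_attention(q, keys, values, k=4):
    """Attend from one query to at most k best-scoring keys.
    `keys` and `values` must contain only positions <= the query position."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, key)) for key in keys]
    # Indices of the k highest scores (fewer early in the sequence).
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    m = max(scores[i] for i in top)
    weights = [math.exp(scores[i] - m) for i in top]
    total = sum(weights)
    out = [0.0] * len(values[0])
    for w, i in zip(weights, top):
        for d in range(len(out)):
            out[d] += (w / total) * values[i][d]
    return out, sorted(top)

keys = [[5.0, 0.0], [4.0, 0.0], [3.0, 0.0], [2.0, 0.0], [1.0, 0.0], [0.0, 0.0]]
values = [[float(i)] for i in range(len(keys))]
out, used = topk_causal_attention([1.0, 0.0], keys, values)
print(used)  # positions that received any weight
```

Keys outside the top k receive exactly zero weight, so the work after scoring is bounded by k regardless of context length.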

Applications

What it's built for.

Atome lm is not a general-purpose chatbot. It's a narrow specialist that you fine-tune on the data your product cares about; then it ships inside the firmware.

Smart lightbulbs · Local voice commands
Kids' toys & dolls · No recording, ever
Bedtime story devices · Offline generation
Pet feeders & litter boxes · Friendly status messages
Automobiles · Voice-intent detection
Watches & wearables · Text comprehension at the wrist
Agriculture · Field-sensor pattern recognition
Medical wearables · ECG / vitals classification
Industrial sensors · Anomaly detection
Energy & utility meters · On-meter reading parser
Hearing aids · On-device sentence completion
Disaster-relief radios · Field text-help, off-grid

Working prototypes

Three things it does today.

Beyond writing stories, the engine runs as a narrow text classifier, the kind a real embedded product ships. Three internal prototypes were trained, exported, and run through the Cortex-M3 emulator. The training scripts for these tasks are not in the public kit; the engine path for running them is.

01 · Wake-word / command intent

Picks the right command from text variants

6-class classifier on byte-tokenized phrase strings, 1800 synthetic samples with simple lexical variations.

Held-out accuracy: 100 %
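Byte tokenization means the vocabulary is just the 256 possible byte values, so no tokenizer table has to ship in firmware. A minimal sketch follows; the fixed length of 32, the pad id of 0, and the helper names are assumptions for illustration, since the internal training scripts are not public.

```python
def byte_tokenize(text, max_len=32, pad_id=0):
    """UTF-8 bytes as token ids (0..255), truncated or right-padded to max_len.
    Using 0 as the pad id is an assumption; it collides with a literal NUL byte."""
    ids = list(text.encode("utf-8"))[:max_len]
    return ids + [pad_id] * (max_len - len(ids))

def argmax_class(logits):
    """Index of the highest-scoring class."""
    return max(range(len(logits)), key=lambda i: logits[i])

def accuracy(predicted, expected):
    """Fraction of held-out samples whose predicted class matches the label."""
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)

print(byte_tokenize("on", max_len=4))  # prints [111, 110, 0, 0]
print(argmax_class([0.1, 2.3, -0.7]))  # prints 1
```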

02 · Anomaly flag

Spots bad sensor-reading strings

Binary classifier on synthetic sensor strings (NaN, out-of-range, garbage spikes). 1000 samples, 30 epochs.

Held-out accuracy: 91.7 %

03 · Intent bucket

Sorts a sentence into one of five buckets

5-way classifier (command / question / status / alert / greeting), 1500 samples, 40 epochs.

Held-out accuracy: 100 %

Where it fits

Real numbers, real chips.

Peak RAM = .bss + measured stack high-water mark from a real Cortex-M3 build under QEMU MPS2-AN385. Reproducer: python3 scripts/measure_ram.py --markdown.

Model size	Used for	Peak RAM	STM32F103 ($2-4)	RP2040 ($4)	STM32F411 ($15)	STM32F7 ($15-30)	ESP32-S3 ($5-10)
nano	Proves the engine fits the smallest chips	14.5 KB	✓	✓	✓	✓	✓
small	Short keyword routing	27.5 KB	no RAM	✓	✓	✓	✓
classifier	Narrow on-device classification heads	52 KB	no	✓	✓	✓	✓
tinystories	Children's-story-shaped writing	104 KB	no RAM	✓	✓	✓	✓
mid	Mid-range writing for a specific topic	205 KB	no	✓	no RAM	✓	✓
prod (944K)	Full coherent prose, the model writing in the live demo	411 KB	no	no RAM	no	✓	✓
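The fit column can be reproduced with a simple comparison of peak RAM against on-chip SRAM. The peak figures below are copied from the table; the SRAM sizes are typical datasheet values supplied here as assumptions (the STM32F7 family in particular ranges up to 512 KB only on its larger parts). The sketch ignores flash capacity and any RAM the rest of the firmware needs.

```python
# Assumed SRAM per chip in KB; not taken from this page.
SRAM_KB = {
    "STM32F103": 20, "RP2040": 264, "STM32F411": 128,
    "STM32F7": 512, "ESP32-S3": 512,
}
# Peak RAM per model in KB, as listed in the table.
PEAK_RAM_KB = {
    "nano": 14.5, "small": 27.5, "classifier": 52,
    "tinystories": 104, "mid": 205, "prod": 411,
}

def peak_ram_kb(bss_bytes, stack_high_water_bytes):
    """Peak RAM as defined above: static data plus the deepest stack use."""
    return (bss_bytes + stack_high_water_bytes) / 1024

def fits(model, chip):
    """True when the model's peak RAM is within the chip's assumed SRAM."""
    return PEAK_RAM_KB[model] <= SRAM_KB[chip]

for model in PEAK_RAM_KB:
    row = ["yes" if fits(model, chip) else "no" for chip in SRAM_KB]
    print(model, row)
```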

The proof

Three environments. One answer.

The same model, run on a laptop, on a server, and on an emulated microcontroller, returns the same words byte for byte. Not merely close: the generated text is identical, and the underlying values agree to within single-precision rounding. That's what makes the model auditable for any product that has to be certified or formally reviewed.

Python · PyTorch (reference forward pass) → C99 · zero-heap (inference engine) → Cortex-M3 · QEMU MPS2-AN385 (emulated silicon)

3.7×10⁻⁷ · Max |Δ| across all 3 stages
48 / 48 · Multi-token parity, 60K demo
16 / 16 · Multi-token parity, 944K
146 / 146 · pytest, green at HEAD

Verified by tests/test_parity_with_c.py + tests/test_parity_multitoken.py; every run is reproducible from a cold checkout.
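The shape of such a parity check can be sketched as follows. This is not the content of the test files named above; it only illustrates the two quantities reported here, a count of prompts whose generated token sequences match exactly and the largest absolute difference between corresponding output values. The data layout is hypothetical.

```python
def max_abs_delta(a, b):
    """Largest absolute element-wise difference between two equal-length lists."""
    return max(abs(x - y) for x, y in zip(a, b))

def parity_report(reference_runs, candidate_runs):
    """Each run is a (token_ids, logits) pair for one prompt.
    Returns (matching_prompts, total_prompts, worst_delta)."""
    matching = 0
    worst = 0.0
    for (ref_tokens, ref_logits), (cand_tokens, cand_logits) in zip(
            reference_runs, candidate_runs):
        if ref_tokens == cand_tokens:
            matching += 1
        worst = max(worst, max_abs_delta(ref_logits, cand_logits))
    return matching, len(reference_runs), worst

# Toy numbers, invented purely to exercise the function.
python_runs = [([72, 105], [0.25, 1.5]), ([79, 107], [2.0, -1.0])]
c_runs = [([72, 105], [0.25, 1.5]), ([79, 107], [2.0, -1.0000001])]
print(parity_report(python_runs, c_runs))
```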

The result

A measured win, and a measured loss.

On TinyStories, 3,000 steps, single seed: at 60K parameters Atome's routed-ternary block reaches 6.31 ppl vs 8.12 for vanilla GPT-FP32 at fixed params, and 6.31 vs 13.10 at fixed flash. The result reverses at 944K parameters, where vanilla wins by ~11% (2.54 vs 2.87 ppl). Atome's bet is deliberately the sub-1M, MCU-class regime.

Lower perplexity is better · green = Atome wins · red = Atome loses

60K · param-fair

−22 %

Atome 6.31 ppl · vanilla 8.12 ppl

60K · flash-fair

−52 %

Atome 6.31 ppl · vanilla-6K 13.10 ppl

944K · param-fair

+11 %

vanilla 2.54 ppl · Atome 2.87 ppl

944K · disk

≈14×

Atome 271 KB · vanilla 3.7 MB

Single seed; multi-seed run pending. Full reading: HONEST_RESULTS.md.
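The headline percentages follow directly from the perplexities quoted above. One detail is worth making explicit: the 60K figures measure Atome's change relative to the vanilla baseline, while the ~11% at 944K corresponds to the gap measured relative to Atome's own perplexity; relative to vanilla the same gap is about 13%. The disk ratio computed from the two quoted file sizes comes to roughly 14×, whereas 32-bit versus 1.58-bit weights alone would give about 20×. The helper name is illustrative, and 1 MB is taken as 1000 KB.

```python
def relative_change(atome_ppl, baseline_ppl):
    """Signed change of Atome's perplexity relative to the baseline, in percent."""
    return 100.0 * (atome_ppl - baseline_ppl) / baseline_ppl

print(round(relative_change(6.31, 8.12)))    # -22  (60K, param-fair)
print(round(relative_change(6.31, 13.10)))   # -52  (60K, flash-fair)
print(round(relative_change(2.87, 2.54)))    # 13   (944K, relative to vanilla)
print(round(100.0 * (2.87 - 2.54) / 2.87))   # 11   (944K, relative to Atome)
print(round(3.7 * 1000 / 271, 1))            # 13.7 (disk ratio, MB over KB)
print(round(32 / 1.58, 1))                   # 20.3 (bits per weight only)
```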

Versus the field

A narrower but verifiable claim.

Other MCU-class LMs exist. llama2.c compiles for microcontrollers; TinyMaix and esp32-llm run small models on ESP32-S3. Atome lm's tighter combination: ternary weights, a zero-heap pure-C99 engine, and bit-exact parity verified under QEMU.

Project	Size	Weights	Runs on / verified
BitNet b1.58	700M – 3B (smallest)	ternary	server / phone
llama2.c (Stories260K)	260K – 110M (smallest)	FP32 / Q8	MCU class possible
TinyMaix / esp32-llm	~15M (smallest)	Q8 / FP32	ESP32-S3 (PSRAM)
Atome lm	60K · 944K	ternary, zero-heap	Cortex-M3 (QEMU) · bit-exact

Not yet: we haven't flashed onto a physical chip and measured Joules per token. That's the next milestone. Until it's done, the "boots on silicon" claim stays on QEMU.

What you can build

A building block, not a product.

The public kit ships the architecture + engine + a 944K TinyStories LM as a research-demo artifact. Three classes of task fit the engine shape:

Tiny narrow LM

Train on a single domain (FAQ, command-line help, embedded-system Q&A) and the model speaks fluently inside that scope. Going wide at this size produces incoherent output: a capacity limit, not an architectural one.

On-device text classifiers

A classification head plugs onto the 3-pathway backbone. Internal text classifiers we've trained reach high held-out accuracy on synthetic distributions; the training scripts for those are not in the public kit, but the engine path for running them is.

Per-token router signal

The router's entropy is observable for free at every position. In V2 production it tracks out-of-domain inputs and correlates with per-token loss. At 60K scale on a single corpus the signal is exposed identically, but its calibration as an uncertainty estimator at that scale has not been measured here.
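Reading that signal takes only the three gate values the router already computes for each token. A minimal sketch, assuming the standard Shannon entropy in nats over the three pathway weights; the function name is illustrative. The value is 0 when one pathway takes all the weight and ln 3 ≈ 1.099 when the router is maximally undecided.

```python
import math

def router_entropy(gates):
    """Shannon entropy (in nats) of one token's router distribution."""
    return -sum(g * math.log(g) for g in gates if g > 0.0)

MAX_ENTROPY = math.log(3)  # uniform weights over the three pathways

confident = [0.98, 0.01, 0.01]
undecided = [1 / 3, 1 / 3, 1 / 3]
print(router_entropy(confident) / MAX_ENTROPY)  # normalized to 0..1
print(router_entropy(undecided) / MAX_ENTROPY)
```

Whether a high value indicates an out-of-domain input at a given model size is an empirical question that this computation alone does not settle.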

Lineage

Carved from internal research.

Atome lm is the embedded face of a larger research project. The 3-pathway architecture and the C99 engine are public on GitHub under Apache 2.0 (TilelliLab/atome-lm). More elaborate variants (extended pathways, retrieval and memory, multi-bank weight schemes, an internal regression-prevention gate) remain internal. For production integration (silicon bring-up, the Atome Secure Boot Pack, per-platform hardening): hello@atomelm.com.