From day one

Every experiment.

Chronological log: what got built, what got measured, what's still open. Where something is broken, in flight, or out of scope, that's said directly.

Origin · April 2026
A C99 inference engine, looking for a model

The engine came first: a tiny program that runs ternary AI on bare-metal microcontrollers. No OS, no internet, no allocator. Pure C99, zero heap, zero floats in the matmul, zero external dependencies. 42 tests. Bit-exact Python ↔ C parity at FP32 epsilon. The static atome_block_t is fixed at three pathways, and that structural constraint drives every architecture decision on the model side.
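
To illustrate what "zero floats in the matmul" can mean for ternary weights, here is a minimal Python sketch. It assumes weights restricted to -1, 0, +1 and integer activations, so every product reduces to an add, a subtract, or a skip. The function name `ternary_matvec` and the data layout are hypothetical and are not taken from the engine's source.

```python
def ternary_matvec(weights, x):
    """Multiply a ternary matrix by an integer vector using only add/subtract.

    weights: list of rows, each entry in {-1, 0, +1} (assumed layout).
    x: list of integer activations.
    """
    out = []
    for row in weights:
        acc = 0  # integer accumulator, no floating point involved
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi
            elif w == -1:
                acc -= xi
            # w == 0 contributes nothing, so it is skipped
        out.append(acc)
    return out


W = [[1, 0, -1],
     [-1, -1, 1]]
x = [5, 7, 2]
y = ternary_matvec(W, x)  # [3, -10]
```

Because the inner loop never multiplies, the same structure maps onto cores without a hardware multiplier or FPU, which is presumably part of the appeal on Cortex-M0 class parts.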

Engine ready · 3-pathway lock-in
2026-05-03 · Day 1
atome lm is born: three pathways, by necessity

The original 4-pathway sketch was retired in favour of strict alignment with atome_block_t: local depthwise causal conv + diagonal SSM + top-k sparse attention. Anything wider would break the bit-exact-parity contract the MCU claim rests on. First commit: 42 tests, byte tokenizer, ATOME01 exporter. ~60K params packing to ~20 KB on disk.

42 tests · ~20 KB blob · 3-pathway block
2026-05-03 · Evening
Cortex-M3 firmware boots in QEMU, parity holds

Cross-compile sweep across Cortex-M0 / M3 / M4 / M4F / M7 at -Os. Engine code: 2.6–2.8 KB on all five. The full QEMU MPS2-AN385 firmware boots and runs a forward pass under semihosting. End-to-end Python ↔ Cortex-M3 parity: max |Δ| = 3.7×10⁻⁷.
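
The max |Δ| figure is the largest element-wise absolute difference between the two sets of outputs. A minimal sketch of such a check follows. The logit values and the tolerance are made up for illustration, and `max_abs_diff` is a hypothetical helper, not a function from the project.

```python
def max_abs_diff(reference, candidate):
    """Return the largest element-wise absolute difference between two sequences."""
    if len(reference) != len(candidate):
        raise ValueError("sequences must have the same length")
    return max(abs(r - c) for r, c in zip(reference, candidate))


# Illustrative values only: stand-ins for Python-side and firmware-side logits.
python_logits = [0.125, -1.5, 2.25]
device_logits = [0.125, -1.5000002, 2.25]

TOLERANCE = 1e-6  # hypothetical threshold, not the project's actual bound
delta = max_abs_diff(python_logits, device_logits)
parity_ok = delta <= TOLERANCE
```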

QEMU verified · 45 KB firmware · max |Δ| 3.7e-7
2026-05-09 · Morning
Sampling, REPL, first trained checkpoint

Added temperature / top_p / top_k sampling and a seeded generator. The default temperature=0 preserves bit-exact parity with the C engine's argmax. Also added a REPL with per-layer router-entropy bars, a CPU benchmark, and a held-out bits-per-byte (bpb) evaluator. First trained checkpoint: 800 steps on TinyStories, bpb 3.48, perplexity 11.16.
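
One way a sampler can keep temperature=0 deterministic is to short-circuit to argmax before any random draw. The sketch below assumes that design. The function `sample_next`, its argument defaults, and the lowest-index tie-break are illustrative assumptions rather than the project's actual API.

```python
import math
import random


def sample_next(logits, temperature=0.0, top_k=0, top_p=1.0, rng=None):
    """Pick the next token id from raw logits.

    temperature == 0 returns the argmax (ties go to the lowest index, which is
    an assumption about how the C engine breaks ties). Otherwise apply
    temperature, top-k, then nucleus (top-p) filtering and draw from rng.
    """
    if temperature == 0.0:
        return max(range(len(logits)), key=lambda i: logits[i])

    rng = rng or random.Random()
    scaled = [l / temperature for l in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]  # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Candidate ids sorted by descending probability.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:
        order = order[:top_k]

    # Keep the smallest prefix whose cumulative mass reaches top_p.
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


greedy = sample_next([0.1, 2.0, -1.0])  # index 1, no randomness involved
seeded = sample_next([0.1, 2.0, -1.0], temperature=0.8, top_k=2, top_p=0.9,
                     rng=random.Random(1234))
```

Passing an explicit `random.Random(seed)` keeps stochastic runs reproducible without touching the global generator.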

Sampling · REPL · First checkpoint · ppl 11.16
2026-05-09 · Afternoon
Frontier finding: A/B against vanilla GPT at 60K

Built a minimal vanilla decoder-only transformer at 60.8K params (param-fair, matched on parameter count) and 6K params (flash-fair, matched on flash footprint). Same recipe across all three. Three-seed median at 3,000 steps: atome 6.31 ppl vs vanilla-60K 8.12 ppl vs vanilla-6K 13.10 ppl. That is 22% lower perplexity param-fair and 52% lower flash-fair. First apples-to-apples evidence at MCU scale that the routed pathway architecture is the source of the win.
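
The two percentages can be reproduced from the medians quoted above, assuming they are perplexity reductions measured relative to each vanilla baseline. `relative_gain` is a hypothetical helper name.

```python
def relative_gain(baseline_ppl, model_ppl):
    """Fractional perplexity reduction of the model relative to the baseline."""
    return (baseline_ppl - model_ppl) / baseline_ppl


# Three-seed medians at 3,000 steps, as reported in the log.
ATOME = 6.31
VANILLA_60K = 8.12
VANILLA_6K = 13.10

param_fair = relative_gain(VANILLA_60K, ATOME)  # about 0.223
flash_fair = relative_gain(VANILLA_6K, ATOME)   # about 0.518
```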

+22% param-fair · +52% flash-fair · 3 seeds
2026-05-09 · Late
Per-pathway ablation: the conv carries the win

Drop each pathway in turn. Dropping local-conv costs +20% ppl (largest hit), dropping SSM +6%, and dropping sparse-attn +4% (smallest). At 60K params on TinyStories, the conv pathway is doing most of the work; attention is the least useful of the three. That ordering shifts at larger scale, but at MCU scale the ranking is clear.

Conv +20% · SSM +6% · Attn +4%
2026-05-10 · Morning
QEMU emulator wired for the trained model

Built scripts/run_qemu_tinystories.py: loads the trained checkpoint, exports to .atome, bakes via xxd, builds the cortex-m3-gen firmware, runs under QEMU, decodes the UTF-8 byte stream. First emulated run: text out, but only 23 of 48 tokens matched Python. The drift after token 23 surfaced a multi-token bug.
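
A parity count like "23 of 48" can be read as the length of the matching prefix between the two token streams. The sketch below assumes that reading and a byte-level tokenizer where each token is one byte. The strings are invented for illustration and `matching_prefix_length` is a hypothetical helper, not part of run_qemu_tinystories.py.

```python
def matching_prefix_length(reference, candidate):
    """Count how many leading tokens agree before the first mismatch."""
    count = 0
    for r, c in zip(reference, candidate):
        if r != c:
            break
        count += 1
    return count


# With a byte tokenizer, token ids are simply the UTF-8 bytes of the text.
reference = list("Once upon a time".encode("utf-8"))
candidate = list("Once upon a tome".encode("utf-8"))

n = matching_prefix_length(reference, candidate)
# Decode only the agreeing part; errors="replace" guards against a prefix
# that happens to cut a multi-byte UTF-8 character in half.
agreed_text = bytes(candidate[:n]).decode("utf-8", errors="replace")
```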

Emulator pipeline · 23/48 parity · bug surfaced
2026-05-10 · Midday
SSM-state bug fixed: 48/48 bit-exact

The 23-vs-48 drift was a dormant bug documented since the 2026-05-03 audit: atome_predict_next reprocessed the full token list on each call but never reset state->ssm_h. The fix is four lines (a memset at the top of predict_next, atome.c:294-300). Multi-token QEMU parity jumped from 23/48 to 48/48 bit-exact. The previously xfailed multi-token parity test is now a regular passing test. 140 / 140 tests green.
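
The mechanism behind this class of bug can be shown with a toy diagonal recurrence in Python. This is not the engine's code: the `State` class, the `DECAY` values, and the scalar "logit" are all hypothetical, chosen only to show why reprocessing the full token list without clearing the recurrent state gives a different answer. It illustrates the mechanism, not the exact point at which the real drift appeared.

```python
class State:
    """Toy stand-in for the engine state; ssm_h is the recurrent SSM vector."""

    def __init__(self, width):
        self.ssm_h = [0.0] * width


DECAY = [0.5, 0.25]  # hypothetical per-channel decay of a diagonal SSM


def predict_next(tokens, state, reset=True):
    """Reprocess the whole token list and return a scalar summary of the state.

    reset=True mirrors the fix (a memset of ssm_h at the top of the call).
    reset=False mirrors the bug: the scan starts from whatever the previous
    call left behind.
    """
    if reset:
        state.ssm_h = [0.0] * len(state.ssm_h)
    for t in tokens:
        state.ssm_h = [a * h + t for a, h in zip(DECAY, state.ssm_h)]
    return sum(state.ssm_h)


buggy, fixed = State(2), State(2)

# The first call starts from a zeroed state either way, so both agree.
first_buggy = predict_next([1, 2], buggy, reset=False)
first_fixed = predict_next([1, 2], fixed, reset=True)

# The second call reprocesses the grown list; stale state now leaks in.
second_buggy = predict_next([1, 2, 3], buggy, reset=False)
second_fixed = predict_next([1, 2, 3], fixed, reset=True)
```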

4-line fix · 48/48 multi-token parity · 140/140 tests
2026-05-10 · Afternoon
944K-param model trained: coherent TinyStories prose

Provisioned a community-cloud RunPod A6000. Trained 944,640 params (d=256, 8 layers) on the full TinyStories corpus (~1.9 GB raw, 6.7M chunks of 256 bytes). 30,000 steps, effective batch 256, BF16, cosine LR 3e-4 → 3e-5. Best val loss 1.0545, perplexity 2.87 at step 29,000. Three hours twenty minutes of wall-clock time, roughly $2 of cloud spend. Output: "Once upon a time, there was a little girl named Lily..." 16/16 QEMU ↔ Python bit-exact on the new checkpoint.
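
The loss, bits-per-byte, and perplexity figures in this log are related by simple conversions. The sketch below assumes the validation loss is a mean cross-entropy in nats per byte token, which is consistent with exp(1.0545) ≈ 2.87 here and with 2^3.48 ≈ 11.16 for the first checkpoint. The helper names are hypothetical.

```python
import math


def ppl_from_nats(loss_nats):
    """Perplexity from a mean cross-entropy expressed in nats per token."""
    return math.exp(loss_nats)


def ppl_from_bits(bits_per_byte):
    """Perplexity from bits per byte (one token per byte with a byte tokenizer)."""
    return 2.0 ** bits_per_byte


def bits_from_nats(loss_nats):
    """Convert nats to bits by dividing by ln 2."""
    return loss_nats / math.log(2)


ppl_944k = ppl_from_nats(1.0545)     # about 2.87
ppl_first = ppl_from_bits(3.48)      # about 11.16
bpb_944k = bits_from_nats(1.0545)    # about 1.52 bits per byte
```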

944K params · ppl 2.87 · 3 h 20 · ~$2 · 16/16 parity
2026-05-11 · Morning
944K vanilla A/B: the headline flips at scale

Same recipe, same val slice, vanilla GPT FP32 at 950K params: val loss 0.9337, perplexity 2.54. That beats atome ternary 944K by 11.4% loss / 11.5% ppl. Single seed. Fairness verified bit-exact (atome re-eval on the pod returned 1.0545, diff +0.0000). Implication: the architecture's bet is the sub-1M regime; above ~1M the inductive bias becomes a constraint rather than a substitute for capacity. Website hero corrected to say so honestly. Multi-seed run pending (~$0.60 vast).

Vanilla wins +11.4% at 944K · Honest pivot · Multi-seed pending
2026-05-11 · Evening
Three working prototypes re-validated · 944K QEMU re-verified

Pre-launch verification pass. Re-trained and re-evaluated all three task classifiers end-to-end: wake-word 100% · anomaly 91.7% · intent 100%, with C-engine accuracy matching Python. Each binary fits in 20.2 KB, total state RAM 52 KB. Re-ran the 944K trained checkpoint through the Cortex-M3 emulator with a fresh "Once upon a time" prompt: 4 / 4 bit-exact ("C continuation = Python continuation = ', th'"). All 140 pytest tests still green. The moat is reproducible from a cold checkout.

3 prototypes · 100 / 91.7 / 100 · 4/4 944K QEMU · 140 / 140 tests
2026-05-11 · Afternoon
Domain locked, site refresh, naming pivot

atomelm.com registered. hello@atomelm.com + bonjour@atomelm.com for inbound, surfaced based on browser locale. Full site rewrite to a mono / editorial design system (white paper, black ink, single electric accent). Plain English / For developers toggle wired throughout. Comprehensive applications grid by sector. Naming pivot from "Atome LLM" to "atome lm": the "L" in LLM is technically wrong below 1B params and was hurting the pitch.

atomelm.com · EN/FR · mono editorial · naming corrected
In flight
What's next

Real silicon. Flash firmware onto a Nucleo-F411RE or RP2040 and report tokens/sec + Joules/token. The number that turns "frontier paper" into "frontier chip."

Multi-seed at 944K. Three more training runs to pin down the perplexity range and confirm the vanilla crossover.

Q15 inference path. Halves BSS, multiplies M0/M3 speed by 5–10×, brings RP2040 back into scope for the trained model.

ATOME02 C decoder. Move the 14% disk savings from Python-only into the runtime.

Narrow-domain distillation. Use a strong teacher to generate curated training data for a target deployment domain.
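
For the Q15 item in the list above: Q15 is the signed 16-bit fixed-point format with 15 fractional bits, covering [-1, 1). The sketch below models the usual rounding, saturating multiply in Python integers. It only illustrates the number format; the planned inference path has not been published, and the helper names are hypothetical.

```python
Q15_ONE = 1 << 15          # scale factor: 1.0 is represented as 32768
Q15_MAX = Q15_ONE - 1      # largest representable value, just under 1.0
Q15_MIN = -Q15_ONE         # smallest representable value, exactly -1.0


def saturate(value):
    """Clamp an integer into the signed 16-bit Q15 range."""
    return max(Q15_MIN, min(Q15_MAX, value))


def to_q15(x):
    """Quantize a real number in [-1, 1) to Q15, saturating at the ends."""
    return saturate(int(round(x * Q15_ONE)))


def from_q15(q):
    """Convert a Q15 integer back to a Python float."""
    return q / Q15_ONE


def q15_mul(a, b):
    """Multiply two Q15 values: widen, round to nearest, shift, saturate."""
    product = (a * b + (1 << 14)) >> 15
    return saturate(product)


half = to_q15(0.5)              # 16384
quarter = q15_mul(half, half)   # 8192, i.e. 0.25
```

Storing state as 16-bit integers instead of 32-bit floats is what would halve a buffer's footprint, which is presumably the source of the BSS saving mentioned in the list.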
