⚡ FlashLM v5 "Thunderbolt" Demo

MatMul-free Language Model with Parallel Gated Recurrence

This is a demo of FlashLM v5 "Thunderbolt", a 29.7M-parameter language model trained entirely on CPU, with no GPUs.

Model Details:

  • Parameters: 29.7M (26.5M ternary / 3.2M float)
  • Architecture: ParallelGatedRecurrence with BitLinear (ternary weights)
  • Training: ~40 hours on an AMD Ryzen 9 7950X3D
  • Dataset: TinyStories (~1B tokens)
  • Final PPL: 1.36 (beats the TinyStories-1M baseline!)
  • Final BPC: 0.44
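The two headline metrics are linked: log2 of a perplexity gives the average number of bits per prediction, and log2(1.36) ≈ 0.444, which rounds to the reported BPC of 0.44. The sketch below shows that conversion. The helper names are hypothetical. The per-token variant is there only because the figures above do not say which unit the perplexity is measured over: if PPL is per token, the bits have to be spread over characters using the token-to-character ratio.

```python
import math

def bits_per_unit(ppl: float) -> float:
    """Average bits per prediction implied by a perplexity: log2(PPL)."""
    return math.log2(ppl)

def bpc_from_token_ppl(token_ppl: float, tokens: int, chars: int) -> float:
    """Convert a per-token perplexity into bits per character.

    Total bits over the text = tokens * log2(PPL); dividing by the
    character count spreads those bits over characters.
    """
    return bits_per_unit(token_ppl) * tokens / chars

# log2(1.36) is about 0.4436, i.e. 0.44 after rounding.
print(round(bits_per_unit(1.36), 2))
```

If the text averaged, say, four characters per token, the same per-token perplexity would correspond to a quarter of the bits per character.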

Tips:

  • Lower temperature = more focused/deterministic
  • Higher temperature = more creative/diverse
  • Top-K keeps only the K most likely tokens at each step, filtering out low-probability ones
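These tips describe the usual decoding recipe. Divide the logits by the temperature, optionally keep only the K highest-scoring tokens, and then sample from the softmax over what remains. Below is a minimal standard-library sketch of that recipe. It is not the demo's actual sampling code. The name `sample_next_token` and the convention that `top_k=0` disables filtering are both assumptions.

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=0, rng=random):
    """Pick a token id from raw logits using temperature and top-k.

    Assumption: top_k=0 means "no top-k filter" (all tokens eligible).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Temperature: <1 sharpens the distribution, >1 flattens it.
    scaled = [l / temperature for l in logits]
    # Top-k: keep only the k highest-scoring token ids.
    ids = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    if top_k > 0:
        ids = ids[:top_k]
    # Softmax over the survivors (subtract the max for numerical stability).
    m = max(scaled[i] for i in ids)
    weights = [math.exp(scaled[i] - m) for i in ids]
    # random.choices normalizes the weights itself.
    return rng.choices(ids, weights=weights, k=1)[0]
```

With `top_k=1` the result is always the argmax, which is the fully deterministic limit. Raising the temperature shifts probability toward the tail, and top-K then trims that tail.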

Architecture:

FlashLM v5 uses ParallelGatedRecurrence, a matmul-free architecture with the following features:

  • Ternary weights (BitLinear) reduce weight memory by up to 16x compared with 32-bit floats
  • Gated recurrence with learned decay gates
  • No attention mechanism: pure recurrence!
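Here is a toy, pure-Python sketch of the two ingredients listed above. It assumes a BitNet b1.58-style absmean quantizer for BitLinear and a common sigmoid-gated linear recurrence, h_t = a_t * h_{t-1} + (1 - a_t) * v_t. The source specifies neither formula, so FlashLM v5's actual layers may differ, and every function name here is hypothetical. The sketch shows two things. First, why ternary weights make the model "matmul-free": each dot product becomes additions and subtractions. Second, why the recurrence can be "parallel": each h_t unrolls into a closed-form sum over past inputs that no longer depends on h_{t-1}.

```python
import math

def ternarize(weights):
    """Absmean ternary quantization (assumed BitNet b1.58-style scheme).

    Returns (matrix with entries in {-1, 0, +1}, float scale).
    """
    flat = [abs(w) for row in weights for w in row]
    scale = sum(flat) / len(flat) or 1.0  # fall back to 1.0 if all weights are 0
    q = [[max(-1, min(1, round(w / scale))) for w in row] for row in weights]
    return q, scale

def bitlinear(x, q, scale):
    """y = scale * (Q @ x) using only adds/subtracts per weight.

    The single multiply per output row is the float rescale.
    """
    out = []
    for row in q:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi
            elif w == -1:
                acc -= xi
        out.append(acc * scale)
    return out

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gated_recurrence_sequential(values, gate_logits):
    """h_t = a_t * h_{t-1} + (1 - a_t) * v_t, with decay a_t = sigmoid(gate_t).

    values, gate_logits: lists of equal-length per-step vectors; h starts at 0.
    """
    h = [0.0] * len(values[0])
    outs = []
    for v, g in zip(values, gate_logits):
        a = [sigmoid(gi) for gi in g]
        h = [ai * hi + (1.0 - ai) * vi for ai, hi, vi in zip(a, h, v)]
        outs.append(h)
    return outs

def gated_recurrence_unrolled(values, gate_logits):
    """Same outputs as h_t = sum_{s<=t} (prod_{r=s+1..t} a_r) * (1 - a_s) * v_s.

    Each h_t depends only on the inputs, not on h_{t-1}, so all steps can be
    computed independently. This loop is deliberately naive.
    """
    T, d = len(values), len(values[0])
    a = [[sigmoid(gi) for gi in g] for g in gate_logits]
    outs = []
    for t in range(T):
        h = [0.0] * d
        for s in range(t + 1):
            for c in range(d):
                decay = 1.0
                for r in range(s + 1, t + 1):
                    decay *= a[r][c]
                h[c] += decay * (1.0 - a[s][c]) * values[s][c]
        outs.append(h)
    return outs
```

For example, `ternarize([[0.9, -0.05, -1.2], [0.3, 0.4, -0.6]])` yields the ternary matrix `[[1, 0, -1], [1, 1, -1]]` with a scale of about 0.575. Both recurrence functions return the same outputs for the same inputs. The unrolled version recomputes every decay product and exists only to make the closed form visible. An efficient implementation would evaluate the same sum with a parallel (associative) scan.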

🎉 HUGE THANKS TO arki05!!! 🎉

arki05 provided the AMD Ryzen 9 7950X3D used to train this model!

Without arki05's generous contribution, this project would NOT have been possible!

THANK YOU ARKI05!!! 🙏⚡


FlashLM: Democratizing Language Model Research