⚡ FlashLM v5 "Thunderbolt" Demo
MatMul-free Language Model with Parallel Gated Recurrence
This is a demo of FlashLM v5 "Thunderbolt", a 29.7M-parameter language model trained entirely on CPU, with no GPUs.
Model Details:
- Parameters: 29.7M (26.5M ternary / 3.2M float)
- Architecture: ParallelGatedRecurrence with BitLinear (ternary weights)
- Training: ~40 hours on AMD Ryzen 7950X3D
- Dataset: TinyStories (~1B tokens)
- Final PPL: 1.36 (beats TinyStories-1M baseline!)
- Final BPC: 0.44
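Perplexity (PPL) and bits per character (BPC) are both derived from the model's cross-entropy loss. The sketch below shows the standard conversions. It assumes the loss is measured in nats (natural log), which is what PyTorch's cross-entropy reports. The function names and the evaluation numbers are hypothetical and are not taken from the FlashLM code. PPL is normalized per token and BPC per character, so how the two figures relate depends on the tokenizer's average number of characters per token.

```python
import math

def perplexity(mean_nats_per_token: float) -> float:
    """Perplexity = exp(mean cross-entropy per token, in nats)."""
    return math.exp(mean_nats_per_token)

def bits_per_char(total_nats: float, num_chars: int) -> float:
    """BPC = total cross-entropy converted from nats to bits, divided by characters."""
    return total_nats / (math.log(2) * num_chars)

# Hypothetical evaluation numbers, for illustration only.
num_tokens = 1_000
num_chars = 4_000
mean_loss = 0.5  # nats per token

total_nats = mean_loss * num_tokens
print(f"PPL: {perplexity(mean_loss):.3f}")
print(f"BPC: {bits_per_char(total_nats, num_chars):.3f}")
```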
Tips:
- Lower temperature = more focused/deterministic
- Higher temperature = more creative/diverse
- Top-K keeps only the K most likely tokens at each step and filters out the rest
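Temperature and Top-K interact as follows. Temperature rescales the logits before the softmax, and Top-K then removes every token outside the K highest-scoring ones. Below is a minimal sketch of that sampling step in pure Python. It is not the demo's actual generation code. The function name `sample_next_token` is hypothetical, and treating `top_k=0` as "no filtering" is an assumed convention, although it is a common one.

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=0, rng=random):
    """Sample a token index from raw logits (hypothetical helper).

    temperature: divides the logits; < 1 sharpens, > 1 flattens the distribution.
    top_k: if > 0, keep only the k highest-scoring tokens (0 = no filtering, assumed).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]

    # Top-K: mask every logit outside the k largest with -inf.
    if 0 < top_k < len(scaled):
        ranked = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
        keep = set(ranked[:top_k])
        scaled = [x if i in keep else -math.inf for i, x in enumerate(scaled)]

    # Numerically stable softmax (exp(-inf) == 0.0 for masked tokens).
    m = max(scaled)
    weights = [math.exp(x - m) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Usage: a low temperature plus top_k=2 can only ever return index 0 or 1 here.
rng = random.Random(0)
print(sample_next_token([2.0, 1.0, 0.1, -1.0], temperature=0.7, top_k=2, rng=rng))
```

With `top_k=1` the sampler becomes greedy decoding, which is the most deterministic setting regardless of temperature.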
Architecture:
FlashLM v5 uses ParallelGatedRecurrence - a matmul-free architecture where:
- Ternary weights (BitLinear) reduce weight memory by 16x compared with 32-bit floats (2 bits per weight instead of 32)
- Gated recurrence with learned decay gates
- No attention mechanism - pure recurrence!
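The FlashLM v5 layer code is not shown here, so the sketch below only illustrates the general ideas in pure Python, under explicit assumptions. Weights are quantized to {-1, 0, +1} with a per-tensor absmean scale. This is the BitNet b1.58-style scheme and is assumed here rather than confirmed as FlashLM's. Once weights are ternary, a dot product reduces to additions and subtractions. The recurrence uses a hypothetical elementwise update h_t = a_t * h_{t-1} + (1 - a_t) * x_t with a sigmoid decay gate a_t. Because that update is linear in h, it can in principle be evaluated with a parallel scan over the sequence. The loop below is the sequential reference form. `pack_ternary` shows where the 16x figure comes from: four 2-bit weights fit in one byte, versus 4 bytes for a single float32. All function names are illustrative.

```python
import math

def ternary_quantize(weights, eps=1e-8):
    """Absmean quantization to {-1, 0, +1} (BitNet b1.58-style; assumed here)."""
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    q = [max(-1, min(1, round(w / scale))) for w in weights]
    return q, scale

def ternary_dot(q_row, x, scale):
    """Dot product with ternary weights: only additions and subtractions."""
    acc = 0.0
    for w, xi in zip(q_row, x):
        if w == 1:
            acc += xi
        elif w == -1:
            acc -= xi
    return acc * scale  # one float multiply per output, not per weight

def pack_ternary(q):
    """Store each ternary weight in 2 bits, i.e. 4 weights per byte."""
    codes = {-1: 0b00, 0: 0b01, 1: 0b10}
    out = bytearray((len(q) + 3) // 4)
    for i, w in enumerate(q):
        out[i // 4] |= codes[w] << (2 * (i % 4))
    return bytes(out)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gated_recurrence(xs, gate_logits, h0=0.0):
    """Hypothetical gated update: h_t = a_t * h_{t-1} + (1 - a_t) * x_t, a_t = sigmoid(g_t)."""
    h, hs = h0, []
    for x, g in zip(xs, gate_logits):
        a = sigmoid(g)  # learned decay gate in (0, 1)
        h = a * h + (1.0 - a) * x
        hs.append(h)
    return hs

weights = [0.42, -0.07, -0.91, 0.15, 0.66, -0.3, 0.0, 1.2]
q, s = ternary_quantize(weights)
packed = pack_ternary(q)
# 8 weights -> 2 bytes packed, vs 32 bytes as float32 (16x).
print(q, len(packed), "bytes vs", 4 * len(weights), "bytes as float32")
print(gated_recurrence([1.0, 0.0, 0.0], [0.0, 0.0, 0.0]))  # [0.5, 0.25, 0.125]
```

A gate near 1 preserves the previous state (slow decay), while a gate near 0 overwrites it with the current input. Because the gate is learned, each position can choose how much history to keep without any attention computation.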
🎉 HUGE THANKS TO arki05!!! 🎉
arki05 provided the AMD Ryzen 7950X3D used to train this model!
Without arki05's generous contribution, this project would NOT have been possible!
THANK YOU ARKI05!!! 🙏⚡
FlashLM: Democratizing Language Model Research