⚡ FlashLM v5 "Thunderbolt" Demo

MatMul-free Language Model with Parallel Gated Recurrence

This is a demo of FlashLM v5 "Thunderbolt", a 29.7M-parameter language model trained entirely on CPU, with no GPU involved.

Model Details:

Parameters: 29.7M (26.5M ternary / 3.2M float)

Architecture: ParallelGatedRecurrence with BitLinear (ternary weights)

Training: ~40 hours on an AMD Ryzen 9 7950X3D

Dataset: TinyStories (~1B tokens)

Final PPL: 1.36 (beats TinyStories-1M baseline!)

Final BPC: 0.44
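The two final metrics above are consistent with each other if the reported PPL is a per-character perplexity, which is an assumption here since the original does not say how PPL was computed. Under that assumption, perplexity is 2 raised to the bits-per-character value. The helper names below are hypothetical and only illustrate the arithmetic:

```python
import math


def bpc_to_perplexity(bpc: float) -> float:
    """Convert bits-per-character to per-character perplexity (PPL = 2 ** BPC)."""
    return 2.0 ** bpc


def perplexity_to_bpc(ppl: float) -> float:
    """Inverse conversion: BPC = log2(PPL)."""
    return math.log2(ppl)


# Values reported in the model details above.
reported_bpc = 0.44
reported_ppl = 1.36

print(round(bpc_to_perplexity(reported_bpc), 2))  # 1.36
print(round(perplexity_to_bpc(reported_ppl), 2))  # 0.44
```

A token-level perplexity would be a different, larger number, so the 1.36 figure should not be compared directly against token-level PPL values from other models.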

FlashLM v5 uses ParallelGatedRecurrence, a matmul-free architecture built on BitLinear layers with ternary weights.
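To illustrate why ternary weights allow a layer to avoid multiplications, here is a minimal pure-Python sketch. It assumes an absmean quantization scheme in the style of BitNet b1.58 (scale by the mean absolute weight, round, clip to -1, 0, +1). Whether FlashLM's BitLinear works exactly this way is not stated in the original, and the function names `ternarize` and `ternary_matvec` are hypothetical.

```python
def ternarize(weights):
    """Quantize a 2D list of floats to {-1, 0, +1} plus one float scale.

    Assumed scheme: absmean scaling, then round and clip.
    """
    flat = [abs(w) for row in weights for w in row]
    scale = sum(flat) / len(flat)
    if scale == 0.0:
        scale = 1.0  # all-zero matrix: any scale works, avoid dividing by zero
    ternary = [
        [max(-1, min(1, round(w / scale))) for w in row]
        for row in weights
    ]
    return ternary, scale


def ternary_matvec(ternary, scale, x):
    """Compute (scale * T) @ x using only additions and subtractions per weight."""
    out = []
    for row in ternary:
        acc = 0.0
        for t, xi in zip(row, x):
            if t == 1:
                acc += xi      # weight +1: add the input
            elif t == -1:
                acc -= xi      # weight -1: subtract the input
            # weight 0: skip the input entirely
        out.append(acc * scale)  # one multiply per output row, not per weight
    return out


weights = [[0.9, -0.05, -1.1],
           [0.02, 0.6, -0.4]]
ternary, scale = ternarize(weights)
print(ternary)  # [[1, 0, -1], [0, 1, -1]]
print(ternary_matvec(ternary, scale, [1.0, 2.0, 3.0]))
```

The inner loop contains no weight multiplications, which is the property that makes this kind of layer cheap on a CPU. A real implementation would pack the ternary values and vectorize the accumulation rather than loop in Python.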

🎉 HUGE THANKS TO arki05!!! 🎉

arki05 provided the AMD Ryzen 9 7950X3D used to train this model!

Without arki05's generous contribution, this project would NOT have been possible!

THANK YOU ARKI05!!! 🙏⚡

FlashLM: Democratizing Language Model Research