
Benchmarking LFM2.5-Thinking on GSM8k (early result)
I have a secret passion for LFM2.5-Thinking. It's a tiny 1.2B reasoning model, it's fast, and it's good. Really good. My tests are still in progress, so all I can do is share some early results. I use the public GSM8k dataset, but with my own benchmarking scripts.

What is the GSM8k benchmark? Grade School Math 8K is a dataset of 8.5K high-quality, linguistically diverse grade school math word problems requiring multi-step reasoning and elementary arithmetic operations. (See the dataset and the paper.)

Some public benchmark results: the top-10 leaderboard in 2026 goes up to 97%. Take note of the massive context sizes. For comparison, this is what "state of the art" looked like in 2021: barely 35%.

Some early results:

Questions: 1319 (test)
Context sizes to test: [1000, 2000, 3000, 4000, 5000, 6000, 7000]
Endpoint: http://192.168.1.110:8000 / lfm2.5-thinking

=== max_tokens=1000 ===
[200/1319] acc=135/200 (67.5%) rate=3.9q/s
[400/1319] acc=251/400 (62.8%) rate=4.6q/s
[600/1319] acc=387/600 (64.5%) rate=5.0q/s
[8
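For the curious, grading GSM8k is mostly about comparing final numbers. The post doesn't show the actual benchmarking scripts, so here is a minimal, hypothetical sketch of the grading side: GSM8k reference solutions end with a `#### <answer>` line, and a common convention is to take the last number in the model's reply as its final answer. The function names and the sample strings are my own assumptions, not the author's code; the real script would additionally POST each question to the OpenAI-compatible endpoint and feed the reply into `is_correct`.

```python
# Hypothetical sketch of GSM8k grading (not the author's actual script).
# The real benchmark loop would send each question to the endpoint
# (e.g. http://192.168.1.110:8000) and grade the reply as below.
import re

def gold_answer(solution: str) -> str:
    """GSM8k reference solutions end with '#### <answer>'; extract it."""
    return solution.rsplit("####", 1)[-1].strip().replace(",", "")

def model_answer(reply: str):
    """Common heuristic: the last number in the reply is the final answer."""
    nums = re.findall(r"-?\d+(?:\.\d+)?", reply.replace(",", ""))
    return nums[-1] if nums else None

def is_correct(reply: str, solution: str) -> bool:
    pred = model_answer(reply)
    return pred is not None and float(pred) == float(gold_answer(solution))

# Tiny self-check on a made-up sample (not a real GSM8k item):
solution = "Natalia sold 48 / 2 = 24 clips in May. 48 + 24 = 72.\n#### 72"
print(is_correct("... so the answer is 72.", solution))  # True
print(is_correct("... so the answer is 70.", solution))  # False
```

Accuracy at each `max_tokens` setting is then just the fraction of questions where `is_correct` returns `True`, which is what the `acc=135/200 (67.5%)` lines report.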



