Show HN: Duplicate 3 layers in a 24B LLM, logical deduction .22→.76. No training

I replicated David Ng's RYS method (https://dnhkng.github.io/posts/rys/) on consumer AMD GPUs (RX 7900 XT + RX 6950 XT) and found something I didn't expect.

Transformers appear to have discrete "reasoning circuits": contiguous blocks of 3-4 layers that act as indivisible cognitive units. Duplicate the right block and the model runs its reasoning pipeline twice. No weights change. No training. The model just thinks longer.

The results on standard benchmarks (lm-evaluation-harness, n=50):

Devstral-24B, layers 12-14 duplicated once:
- BBH Logical Deduction: 0.22 → 0.76
- GSM8K (strict): 0.48 → 0.64
- MBPP (code gen): 0.72 → 0.78
- Nothing degraded

Qwen2.5-Coder-32B, layers 7-9 duplicated once:
- Reasoning probe: 76% → 94%

The weird part: different duplication patterns create different cognitive "modes" from the same weights. Double-pass boosts math. Triple-pass boosts emotional reasoning. Interleaved doubling (13,13,14,14,15,15,16) creates a pure math specialist. Same model, same VRAM,
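
To make "layers 12-14 duplicated once" concrete, here is a minimal sketch of the layer execution order it implies. This is an illustration, not the code used for the results above: the function names `repeat_block_order` and `reorder` are hypothetical, the block range is assumed to be inclusive, and `n_layers=40` is only a placeholder (check the model config for the real depth).

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def repeat_block_order(n_layers: int, start: int, end: int, passes: int = 2) -> List[int]:
    """Layer indices with the block [start, end] (assumed inclusive) run `passes` times."""
    if not (0 <= start <= end < n_layers):
        raise ValueError("block must lie inside the layer stack")
    if passes < 1:
        raise ValueError("passes must be >= 1")
    block = list(range(start, end + 1))
    # Everything before the block, the block repeated, then the rest of the stack.
    return list(range(start)) + block * passes + list(range(end + 1, n_layers))


def reorder(layers: Sequence[T], order: Sequence[int]) -> List[T]:
    """Build the new stack by reference: repeated entries are the same objects."""
    return [layers[i] for i in order]


# Placeholder depth of 40; block 12-14 run twice ("duplicated once").
order = repeat_block_order(n_layers=40, start=12, end=14, passes=2)
print(order[10:20])  # [10, 11, 12, 13, 14, 12, 13, 14, 15, 16]
print(len(order))    # 43
```

Because `reorder` reuses the original layer objects instead of copying them, the repeated block shares its weights with the first pass, which is the sense in which no weights change. Setting `passes=3` gives the triple-pass variant.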

