
I Put GPT-4 and Claude in the Same Repo and Made Them Review Each Other's PRs. It Got Weird.
I built a tool called model-diff that puts LLM outputs side by side so you can see exactly where they agree, where they diverge, and how confident each one sounds while being completely wrong. Then one day I had a thought I probably should have dismissed: what if the models reviewed each other's code instead of mine? Reader, I did not dismiss it.

The Setup

I needed a real feature, so I picked a retry mechanism with exponential backoff: boring enough to be useful, just complex enough to have real architectural opinions about. I asked Claude to implement it first:

```python
import time
import random
from typing import Any, Callable


def retry_with_backoff(
    func: Callable,
    max_retries: int = 3,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    jitter: bool = True,
) -> Any:
    last_exception = None
    for attempt in range(max_retries):
        try:
            return func()
        except Exception as e:
            last_exception = e
            if attempt == max_retries - 1:
                break
            # Exponential backoff, capped at max_delay
            delay = min(base_delay * (2 ** attempt), max_delay)
            if jitter:
                # Full jitter: randomize within [0, delay] to spread out retries
                delay = random.uniform(0, delay)
            time.sleep(delay)
    raise last_exception
```
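The delay formula is the heart of the backoff: each failed attempt doubles the wait, capped at `max_delay`. A quick standalone sketch of the schedule it produces with the default parameters above (before jitter is applied):

```python
# Exponential backoff schedule: delay doubles per attempt, capped at max_delay.
base_delay, max_delay = 1.0, 30.0

delays = [min(base_delay * (2 ** attempt), max_delay) for attempt in range(6)]
print(delays)  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0]
```

The cap matters: uncapped, attempt 10 would wait over 17 minutes. Jitter then randomizes each delay so many clients retrying at once don't hit the server in lockstep.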