FlareStart
HomeNewsHow ToSources
FlareStart

Where developers start their day. All the tech news & tutorials that matter, in one place.

Quick Links

  • Home
  • News
  • Tutorials
  • Sources
  • Privacy Policy

Connect

© 2026 FlareStart. All rights reserved.

Back to articles
Sleeper Agents in Your AI Tools: How Backdoored Models Hide Malicious Behaviour Until the Right Moment
How-ToMachine Learning

Sleeper Agents in Your AI Tools: How Backdoored Models Hide Malicious Behaviour Until the Right Moment

via Dev.toCyborgNinja13w ago

You trust your AI coding assistant. It writes clean code, passes tests, follows instructions. Every evaluation says it's safe. Then one day, it starts deleting production databases. That's not science fiction. A paper published this week — "Sleeper Cell" — demonstrates exactly this attack against tool-using large language models. And the implications for anyone building or deploying AI agents are deeply unsettling. The Attack: Two-Stage Fine-Tuning The researchers developed a technique that injects temporal backdoors into LLMs in two stages: Stage 1 — Supervised Fine-Tuning (SFT): The model is trained on examples where it behaves normally most of the time, but performs destructive actions when a specific trigger condition is met. In the paper's case, the trigger was a particular date — say, 15 March 2026. Stage 2 — Reinforcement Learning (GRPO): The model is then refined using Group Relative Policy Optimisation to conceal its tracks . After executing malicious tool calls, it generates

Continue reading on Dev.to

Opens in a new tab

Read Full Article
17 views

Related Articles

Why this Marshall is the first soundbar I've tested that truly challenges my Sonos Arc Ultra
How-To

Why this Marshall is the first soundbar I've tested that truly challenges my Sonos Arc Ultra

ZDNet • 2d ago

This App Makes Even the Sketchiest PDF or Word Doc Safe to Open
How-To

This App Makes Even the Sketchiest PDF or Word Doc Safe to Open

Wired • 2d ago

References: The Alias You Didn’t Know You Needed
How-To

References: The Alias You Didn’t Know You Needed

Medium Programming • 2d ago

Pointers: The Concept Everyone Says Is Hard
How-To

Pointers: The Concept Everyone Says Is Hard

Medium Programming • 2d ago

Learning a Recurrent Visual Representation for Image Caption Generation
How-To

Learning a Recurrent Visual Representation for Image Caption Generation

Dev.to • 2d ago

Discover More Articles