
LLMs Generate Vulnerable C/C++ Code: Self-Review Fails to Mitigate Security Flaws
Introduction

Large Language Models (LLMs) exhibit a systemic propensity to generate C/C++ code that, while syntactically valid, is insecure. A rigorous analysis employing formal verification with the Z3 SMT solver exposes a critical failure mode: 55.8% of LLM-generated C/C++ code harbors verifiable security vulnerabilities. Compounding the problem, 97.8% of these flaws evade detection by industry-standard static analysis tools such as CodeQL, Semgrep, and Cppcheck. Paradoxically, the models identify 78.7% of their own bugs during introspective review, yet this capability fails to translate into vulnerability prevention during code generation. The study validates these findings empirically across 3,500 code artifacts produced by leading LLMs (GPT-4o, Claude, Gemini, Llama, Mistral), identifying 1,055 concrete exploitation witnesses. GPT-4o exhibited the highest vulnerability rate at 62.4%, while every model exceeded a 48% baseline.
Continue reading on Dev.to
