A landmark benchmark publishes today: nine AI models, 200 programs, 0% fully solved. The pessimists call it proof. The optimists call it a buried headline. Both are reading the wrong number.
ProgramBench, released today by Meta FAIR, Stanford, and Harvard, gave nine frontier AI models a deceptively clean task: given only a compiled binary and its documentation, rebuild the program from scratch. No source code. No internet. No hints. Zero models fully succeeded. But Claude Opus 4.7 reconstructed FFmpeg to near-behavioral equivalence, in a different programming language, passing 95% of the tests; it reached that level on just 3% of the programs. The pessimists headline the 0%. The optimists point to the buried 70%.
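How does a rebuild in a different language "pass tests" at all? Because the benchmark scores behavior, not source: the reconstructed binary is run against the original on the same inputs and judged on what it does. ProgramBench's actual harness is not reproduced here; the sketch below is a minimal illustration of that kind of differential scoring, and every name, path, and field in it is hypothetical.

```python
import subprocess

def run(binary: str, args: list[str], stdin: bytes) -> tuple[int, bytes]:
    """Run one binary on one test input, capturing exit code and stdout."""
    proc = subprocess.run(
        [binary, *args],
        input=stdin,
        capture_output=True,
        timeout=30,
    )
    return proc.returncode, proc.stdout

def equivalence_score(reference: str, candidate: str, tests: list[dict]) -> float:
    """Fraction of tests where the rebuilt binary matches the reference's
    observable behavior. Note what is NOT compared: source code. A rebuild
    in a different language can still score 1.0 on this metric."""
    passed = 0
    for t in tests:  # each hypothetical test: {"args": [...], "stdin": b"..."}
        try:
            if run(reference, t["args"], t["stdin"]) == run(candidate, t["args"], t["stdin"]):
                passed += 1
        except subprocess.TimeoutExpired:
            pass  # a hang or runaway counts as a failed test
    return passed / len(tests)
```

Comparing only exit codes and stdout is deliberately coarse; a fuller harness would also diff output files and stderr. But the principle is the point: equivalence is defined by observable behavior, which is exactly why a cross-language FFmpeg rebuild can register a score at all.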
The real signal isn't about this benchmark. It's about the shape of AI capability everywhere: impressive, accelerating, partial. The database that gets deleted in nine seconds. The investor deck that gets misread. The robot swarm that hallucinates under pressure. The enterprise that deploys agents before its architecture is ready. In every domain, AI is achieving the kind of partial performance that used to require years of human mastery, then failing at the final 30% with the confidence of a system that doesn't know what it doesn't know. The innovation leader who mistakes 70% for 100% pays the price. So does the one who dismisses 70% as "just search and RPA." The gap is the signal.