AI Weekly Digest: May 22–28, 2026 — Math Breakthroughs, Self-Improving Machines & the Cybersecurity Reckoning

OpenAI cracks an 80-year combinatorics problem, Anthropic's Mythos exposes bugs that survived 27 years undetected, and 150,000 LLM-hallucinated citations contaminate the scientific record. Here is everything that mattered in AI this week.

Compiled from live news data by NewzAI · May 28, 2026

—

"Standing in the Foothills of the Singularity"

Google I/O 2026 closed with a line that drew audible gasps. Demis Hassabis, CEO of Google DeepMind, told developers they were "standing in the foothills of the singularity" — a phrase the industry usually reserves for podcasts, not main-stage keynotes. Speaking to Axios the next day, he doubled down: with agentic AI systems now shipping, "we can start feeling it now."

Hassabis used "singularity" in a narrower sense than Kurzweil, roughly synonymous with full AGI, which he pegs at a 50% likelihood by 2030. His evidence was unusually personal: he described building complete mini video games overnight using AI agents, work that would have taken months of engineering a few years ago. The keynote also unveiled the Gemini 3.5 model family and Gemini for Science, a research platform aimed at accelerating drug discovery.

The structural shift in Google's own framing is notable. For years, the company described AI in measured, product-shaped terms. Handing the closing keynote slot to Hassabis rather than Sundar Pichai signals something deliberate about how Google wants to position itself as the frontier race enters its next phase.

OpenAI Cracks an 80-Year Combinatorics Problem

On May 21, OpenAI announced that one of its reasoning models had discovered a new family of mathematical constructions that outperform traditional grid-based arrangements in a combinatorics problem that had been open for roughly 80 years. The model found a more efficient way to arrange points in a plane than experts had previously believed possible.

Image: OpenAI AI model solves 80-year-old math problem (visualization)

Image credit: The Indian Express

Fields medalist Tim Gowers wrote in a companion paper that the result is "a milestone in AI mathematics." Thomas Bloom called it evidence that "AI is helping us to more fully explore the cathedral of mathematics." OpenAI framed the achievement as an early proof of a capability that transfers beyond maths: if a model can sustain a complex argument, connect ideas across distant knowledge domains, and produce work that survives expert scrutiny, those abilities apply in biology, physics, materials science, and medicine.

Context matters here. In 2024, Google DeepMind's AlphaGeometry matched International Mathematical Olympiad contestants on geometry problems. In 2025, OpenAI and DeepMind models competed at the actual IMO. Progress on mathematical benchmarks is moving faster than most researchers predicted even two years ago.

GPT-4b Micro Meets Protein Engineering

Sam Altman has put approximately $180 million of personal capital into Retro Biosciences, a company pursuing partial cellular reprogramming — attempting to reset biological aging markers without pushing cells into the unstable, risk-prone stem-cell state that full reprogramming produces.

The technical detail that drew attention this week: OpenAI built a specialist model, GPT-4b micro, trained specifically on biological sequences and molecular behavior rather than general text. The model's role is to narrow the experimental search space — predicting which protein structures are likely to function before researchers commit to physical lab experiments. It is AI applied not as a writing assistant but as a computational collaborator in wet-lab science.

The broader signal is about the architecture of AI-accelerated research. The assumption has been that general-purpose foundation models would diffuse into science. The GPT-4b micro story suggests a parallel track: domain-specific variants, trained on narrow but high-signal corpora, are already outperforming general-purpose models on targeted biological tasks.

The $445K Question: OpenAI Bets on Recursive Self-Improvement

OpenAI's Preparedness team posted a role this week with a pay band of $295,000–$445,000, senior ML researcher compensation at a frontier lab. The job description asks for someone both technically strong and "tasteful and strategic," because the work involves modelling risks that don't yet exist: specifically, recursive self-improvement, an AI that researches, designs, and trains better versions of itself with decreasing human input per cycle.

The role's brief includes automated red-teaming, biological and chemical risk assessment, and — notably — "tracking progress toward automation of technical staff," which means measuring how much of OpenAI's own engineering work AI is already performing.

Altman has been explicit about target timelines: an "automated AI research intern" running on hundreds of thousands of chips by September 2026, and a "true automated AI researcher" by March 2028. He is not alone. Anthropic policy head Jack Clark put the odds of AI conducting R&D without human oversight by end-2028 at roughly 60%. Research by METR found that the maximum task length frontier models can handle autonomously doubles every seven months.
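
To get a feel for what a seven-month doubling time implies, the sketch below compounds a task horizon forward in time. It is only an illustration: the function name, the one-hour starting horizon, and the assumption that the trend continues as clean exponential growth are choices made here, not figures from METR.

```python
def projected_horizon(start_hours: float, months_ahead: float,
                      doubling_months: float = 7.0) -> float:
    """Project an autonomous task horizon forward in time.

    Assumes clean exponential growth with a fixed doubling time,
    which is an extrapolation and not a guarantee.
    """
    return start_hours * 2 ** (months_ahead / doubling_months)


# Hypothetical starting point: a model that can handle 1-hour tasks today.
for months in (7, 14, 28):
    horizon = projected_horizon(1.0, months)
    print(f"after {months} months: {horizon:.0f} hours")
```

Under these assumptions, four doublings (28 months) turn a one-hour horizon into a sixteen-hour one, which is why small changes in the doubling time matter so much for timeline forecasts.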

Claude Mythos and the Cybersecurity Hiring Boom

Image: AI cybersecurity hiring boom driven by Claude Mythos and GPT-5.4-Cyber

Image credit: NDTV

Anthropic's Claude Mythos, which has not been released to the public, demonstrated a capability that immediately triggered global preparedness responses: the model found zero-day vulnerabilities, some of which had gone undetected in software for up to 27 years, and formulated exploits for them. OpenAI followed with GPT-5.4-Cyber, released to a limited group of security partners for testing. Both models were made available only under controlled access.

The UK's AI Security Institute, funded with £360 million ($480M) and staffed by roughly 100 people drawn from GCHQ, academia, and tech companies, was the only non-American government organization to receive access to Mythos for safety evaluation. The institute has found major safety gaps in every leading AI model it has tested, including ChatGPT, Claude, and Gemini. In recent weeks it found that AI models from Anthropic and OpenAI could complete a complex, 32-step corporate network attack in significantly less time than a skilled human hacker.

The labour market consequence was immediate. Cybersecurity job postings rose 11% year-over-year in Q1 2026 (Glassdoor). LinkedIn named "AI Engineer" the fastest-growing job title for recent college graduates. For senior security executives, $7–8 million compensation packages are no longer unusual. "We're going to need people to deal with the bug-pocalypse," said Lea Kissner, Chief Information Security Officer at LinkedIn.

150,000 Fake Citations: LLM Hallucinations Enter the Scientific Record

A study by researchers at Cornell, UCLA, and UC Berkeley analyzed 111 million citations across 2.5 million research papers published between 2020 and 2025 on arXiv, bioRxiv, SSRN, and PubMed Central. The finding: roughly 150,000 fabricated references entered the scientific record in 2025 alone, most of them generated by LLMs hallucinating plausible-sounding sources. The study is titled "LLM hallucinations in the wild."

The contamination mechanics are worse than they appear. About 78.8% of fake citations passed arXiv moderation. Among bioRxiv preprints later published in PubMed Central-indexed journals, 85.3% of hallucinated references made it into the final published versions. The rate of contamination has accelerated sharply: from 1 in 2,828 papers in 2023 to 1 in 277 papers by early 2026.
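
The two "1 in N" figures are easier to compare when converted to a common scale. The short sketch below does that arithmetic; the helper names are made up for illustration, and the only inputs are the two rates quoted above.

```python
def per_10k(one_in_n: float) -> float:
    """Convert a '1 in N papers' rate to affected papers per 10,000."""
    return 10_000 / one_in_n


def fold_increase(earlier_one_in_n: float, later_one_in_n: float) -> float:
    """How many times more frequent the later rate is than the earlier one."""
    return earlier_one_in_n / later_one_in_n


print(f"2023:       {per_10k(2828):.1f} per 10,000 papers")
print(f"early 2026: {per_10k(277):.1f} per 10,000 papers")
print(f"increase:   {fold_increase(2828, 277):.1f}x")
```

On the quoted figures, that works out to roughly a tenfold rise in a little over two years.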

A separate Lancet audit of 2.5 million biomedical papers found over 4,000 fabricated references in 2,810 peer-reviewed papers — including one 2025 oncology paper where 18 of 30 verified references (60%) were fabricated. Researchers warn the problem is now self-reinforcing: AI models trained on contaminated open-access corpora risk absorbing and reproducing the same hallucinations, compounding the damage with each training cycle.

YouTube Deploys Automatic AI Content Detection at Scale

YouTube announced this week that it will begin automatically flagging content it determines to be "photorealistically AI-generated", regardless of whether creators self-disclose. The rollout begins in May 2026 and uses internal detection signals rather than relying on creator honesty.

The label positioning also changed to improve visibility. For long-form videos, the disclosure label now appears directly below the video player, above the description. For Shorts, it appears as an overlay on the video itself. Content created using YouTube's own generative tools — Veo or Dream Screen — or carrying C2PA metadata indicating full AI generation will receive a permanent, non-appealable label. Other AI-detection labels can be challenged if creators believe their content was misclassified. YouTube confirmed that disclosure labels have no effect on recommendation ranking or monetization eligibility.
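
The labelling rules described above can be summarised as a small decision function. This is a reader's sketch of the policy as reported here, not YouTube's implementation: every name in it (`Label`, `disclosure_label`, the boolean flags) is hypothetical, and creator self-disclosure is deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Label:
    placement: str    # where the disclosure label is shown
    appealable: bool  # whether the creator can challenge it


def disclosure_label(is_short: bool,
                     made_with_youtube_ai_tools: bool,
                     c2pa_full_ai: bool,
                     auto_detected: bool) -> Optional[Label]:
    """Return the label a video would get under the reported rules, or None."""
    permanent = made_with_youtube_ai_tools or c2pa_full_ai
    if not (permanent or auto_detected):
        return None  # nothing triggers a label in this simplified model

    # Shorts get an overlay; long-form videos get a label under the player.
    placement = ("overlay on the video" if is_short
                 else "below the player, above the description")
    # Labels from YouTube's own tools or C2PA metadata cannot be appealed.
    return Label(placement=placement, appealable=not permanent)


print(disclosure_label(is_short=True, made_with_youtube_ai_tools=False,
                       c2pa_full_ai=False, auto_detected=True))
```

The one asymmetry worth noticing is that only detection-based labels are appealable; provenance from Veo, Dream Screen, or C2PA metadata is treated as settled.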

All Three Flagship LLMs Clear India's UPSC Civil Services Prelims

A systematic benchmark test administered the actual UPSC Civil Services Preliminary GS Paper 1 (May 2025 paper, 100 questions, official answer key, no web access) to GPT-5, Gemini 2.5 Pro, and Claude Sonnet 4.5. UPSC's marking scheme was applied: +2 for correct, −0.67 for incorrect. All three cleared the general category cutoff of 92.66 marks.
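
The marking scheme is simple enough to express directly. The sketch below applies the +2 / −0.67 rule and the 92.66 cutoff quoted above; the answer counts are made-up inputs chosen for illustration, not any model's actual tally, and the four-option guessing check assumes each question has four choices.

```python
MARKS_CORRECT = 2.0
PENALTY_WRONG = 0.67
GENERAL_CUTOFF = 92.66


def prelims_score(correct: int, wrong: int) -> float:
    """Score under the quoted scheme; unattempted questions score zero."""
    return MARKS_CORRECT * correct - PENALTY_WRONG * wrong


def expected_guess_value(p_correct: float) -> float:
    """Expected marks from answering one question with success probability p."""
    return MARKS_CORRECT * p_correct - PENALTY_WRONG * (1 - p_correct)


# Hypothetical attempt: 70 right, 20 wrong, 10 left blank.
score = prelims_score(correct=70, wrong=20)
print(f"score: {score:.2f}, clears cutoff: {score >= GENERAL_CUTOFF}")

# A blind guess among four options (assumed) has slightly negative expectation.
print(f"expected value of a blind guess: {expected_guess_value(0.25):+.4f}")
```

Because a blind guess is worth slightly less than zero, skipping a question can beat answering it, so a reported accuracy figure and a final mark need not map onto each other in an obvious way.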

Gemini 2.5 Pro scored highest (~122 marks, 76% accuracy), strongest in History and Culture (87%) and weakest in Science & Technology (67%). GPT-5 scored ~118 marks (73%), most consistent across subjects, weakest on current affairs (57%). Claude Sonnet 4.5 scored ~112 marks (68%), the lowest of the three, but notably strongest on structured logical-reasoning formats — the "Statement I / Statement II" question type that UPSC uses to punish guessing.

The consistent structural weakness across all three: current affairs questions that depend on very recent institutional developments. Questions about which fund a specific multilateral institution launched in late 2024, or the precise habitat status of an obscure Indian species, exposed training-cutoff blind spots in every model. The benchmark makes a useful distinction: AI models have cleared the Prelims bar, but the UPSC's Mains (descriptive, 200-word analytical answers) and Personality Test are qualitatively different stages on which no current language model can be meaningfully evaluated.
