#228 - GPT 5.2, Scaling Agents, Weird Generalization - Last Week in AI

Our 228th episode with a summary and discussion of last week's big AI news! Recorded on 12/12/2025

Hosted by Andrey Kurenkov and Jeremie Harris

Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai

Read our text newsletter and comment on the podcast at https://lastweekin.ai/

In this episode:

OpenAI's latest model GPT-5.2 demonstrates improved performance and enhanced multi-modal capabilities but comes with increased costs and a different knowledge cutoff date.

Disney invests $1 billion in OpenAI to generate Disney character content, creating unique licensing agreements across characters from Marvel, Pixar, and Star Wars franchises.

The U.S. government imposes new AI chip export rules involving security reviews, while simultaneously moving to prevent states from independently regulating AI.

DeepMind releases a paper outlining the challenges and findings in scaling multi-agent systems, highlighting the complexities of tool coordination and task performance.

Timestamps:

(00:00:00) Intro / Banter
(00:01:19) News Preview

Tools & Apps
(00:01:58) GPT-5.2 is OpenAI’s latest move in the agentic AI battle | The Verge
(00:08:48) Runway releases its first world model, adds native audio to latest video model | TechCrunch
(00:11:51) Google says it will link to more sources in AI Mode | The Verge
(00:12:24) ChatGPT can now use Adobe apps to edit your photos and PDFs for free | The Verge
(00:13:05) Tencent releases Hunyuan 2.0 with 406B parameters

Applications & Business
(00:16:15) China set to limit access to Nvidia’s H200 chips despite Trump export approval
(00:21:02) Disney investing $1 billion in OpenAI, will allow characters on Sora
(00:24:48) Unconventional AI confirms its massive $475M seed round
(00:29:06) Slack CEO Denise Dresser to join OpenAI as chief revenue officer | TechCrunch
(00:31:18) The state of enterprise AI

Projects & Open Source
(00:33:49) [2512.10791] The FACTS Leaderboard: A Comprehensive Benchmark for Large Language Model Factuality
(00:36:27) Claude 4.5 Opus' Soul Document

Research & Advancements
(00:43:49) [2512.08296] Towards a Science of Scaling Agent Systems
(00:48:43) Evaluating Gemini Robotics Policies in a Veo World Simulator
(00:52:10) Guided Self-Evolving LLMs with Minimal Human Supervision
(00:56:08) Martingale Score: An Unsupervised Metric for Bayesian Rationality in LLM Reasoning
(01:00:39) [2512.07783] On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models
(01:04:42) Stabilizing Reinforcement Learning with LLMs: Formulation and Practices
(01:09:42) Google’s AI unit DeepMind announces UK 'automated research lab'

Policy & Safety
(01:10:28) Trump Moves to Stop States From Regulating AI With a New Executive Order - The New York Times
(01:13:54) [2512.09742] Weird Generalization and Inductive Backdoors: New Ways to Corrupt LLMs
(01:17:57) Forecasting AI Time Horizon Under Compute Slowdowns
(01:20:46) AI Security Institute focuses on AI measurements and evaluations
(01:21:16) Nvidia AI Chips to Undergo Unusual U.S. Security Review Before Export to China
(01:22:01) U.S. Authorities Shut Down Major China-Linked AI Tech Smuggling Network

Synthetic Media & Art
(01:24:01) RSL 1.0 has arrived, allowing publishers to ask AI companies pay to scrape content | The Verge

See Privacy Policy at https://art19.com/privacy and California Privacy Notice at https://art19.com/privacy#do-not-sell-my-info.

About this episode