UpNext AI

UpNext Labs · 58 episodes

News · Technology

Daily AI news and research, distilled. UpNext AI breaks down the most important developments in artificial intelligence—from major industry moves to cutting-edge papers.

Episodes

9 min

Jul 21, 2026 · Episode 58

Inference Infrastructure, Synthetic Insider Threats, and Clinical AI Scorecards | UpNext AI – July 21, 2026

A concise catch-up on today’s most important AI stories: a new funding signal in inference infrastructure, a rising corporate security risk from AI-enabled “synthetic insiders,” a research paper showing that clinical AI safety gains can depend heavily on who is judging them, and three shorter headlines on agent self-reflection, OpenAI’s long-horizon safety lessons, and the policy debate around Chinese models.

Covered in this episode:
- Infinity raises $15 million at a $100 million valuation to build software that helps AI chips run models more easily across different hardware.
- The Financial Times reports that AI deepfakes are raising the risk of “synthetic insider” attacks and changing how companies handle hiring and internal security.
- New arXiv research finds that evidence-sufficiency prompting in clinical LLMs can look safer depending on which judge scores the result, with model-specific helpfulness tradeoffs.
- A Forbes piece on an AI agent showing self-reflection about its own limitations.
- OpenAI shares lessons from deploying long-running models, including new risks, observed failures, and safeguards.
- Simon Willison highlights Ben Thompson’s proposal on training-data fair use, distillation, and competition with Chinese open models.

Sources:
- https://techcrunch.com/2026/07/20/inference-startup-infinity-raises-15m-from-touring-capital-openai-and-athropic-researchers/
- https://www.ft.com/content/67fe2b44-2041-4ee1-b606-5def4d717407?syn-25a6b1a6=1
- https://arxiv.org/abs/2607.18086v1
- https://www.forbes.com/sites/johnwerner/2026/07/21/ai-agents-get-honest-about-their-own-work/
- https://openai.com/index/safety-alignment-long-horizon-models
- https://simonwillison.net/2026/Jul/20/afraid-of-chinese-models/#atom-everything

8 min

Jul 20, 2026 · Episode 57

Moonshot’s Kimi Shock, AI Licensing on the Open Web, and a Mosquito-Lab Test for ML | UpNext AI – July 20, 2026

A quick catch-up on the AI stories shaping the week: Moonshot’s new Kimi release is fueling fresh debate about China’s place at the frontier, a newly published licensing framework tries to put stricter terms around AI training on open-web content, and a niche but useful research paper shows where machine learning may genuinely help in scientific workflows.

Covered in this episode:
- Moonshot AI’s latest Kimi release sparks debate over Chinese open-weight models, competitiveness, and policy risk
- A new “Master Ledger” licensing framework proposes a handshake-based system for AI operators using open-web content
- Researchers test machine learning for automated scoring of mosquito electropenetrography waveform data
- The Verge reports that Moonshot and Alibaba say their new models can compete with top U.S. systems at lower cost
- Reuters reports Apple briefly overtook Nvidia as investors reassessed AI bets
- A newly surfaced 2022 Sam Altman email shows OpenAI had discussed releasing a GPT-3-class local model
- OpenAI publishes a company scorecard for measuring AI ROI through useful work, cost per successful task, dependability, and return on compute
- Anthropic says Claude Fable 5 becomes permanent in Max and Team Premium plans starting July 20, with Pro and Team Standard continuing through usage credits

Source links:
- https://techcrunch.com/2026/07/18/kimi-threat-or-menace/
- https://doi.org/10.5281/zenodo.19432977
- https://www.nature.com/articles/s41598-026-57373-w
- https://www.theverge.com/ai-artificial-intelligence/967781/chinese-ai-models-open-source-moonshot-kimi-k3-alibaba-qwen
- https://www.reuters.com/video/watch/idRW881717072026RP1/
- https://simonwillison.net/2026/Jul/20/sam-altman/#atom-everything
- https://openai.com/index/a-scorecard-for-the-ai-age
- https://simonwillison.net/2026/Jul/18/claude-make-fable-5-permanent/#atom-everything

7 min

Jul 17, 2026 · Episode 56

Kimi K3, AI Travel’s Unicorn Moment, and the Reliability Problem in AI Benchmarks | UpNext AI – July 17, 2026

A quick end-of-week catch-up on the AI stories that matter most. Today: Moonshot AI’s new Kimi K3 model makes a big open-model play on size, price, and coding performance; AI-powered travel startup Fora hits unicorn status with a fresh round; and a new research paper questions whether a popular benchmark scoring method can really be trusted.

Covered in this episode:
- Kimi K3 launches as Moonshot AI’s most capable model to date, with 2.8 trillion parameters and an open-weight release promised by July 27
- AI-powered travel agency Fora raises a $60 million Series D at a $1 billion valuation
- New arXiv research asks whether item response theory is reliable for ranking models and interpreting AI benchmarks
- Google renames NotebookLM to Gemini Notebook and adds code execution for data analysis
- Thinking Machines Lab releases Inkling, its first open-weights model
- Netflix says around 300 titles on its platform used generative AI, mostly in post-production
- The EU orders Google to share search data and open up AI on Android under the Digital Markets Act

Source links:
- https://simonwillison.net/2026/Jul/16/kimi-k3/#atom-everything
- https://techcrunch.com/2026/07/16/ai-powered-travel-agency-fora-hits-unicorn-status-raises-60m/
- https://arxiv.org/abs/2607.15190v1
- https://techcrunch.com/2026/07/16/google-continues-its-renaming-streak-by-turning-notebooklm-to-gemini-notebook/
- https://simonwillison.net/2026/Jul/16/inkling/#atom-everything
- https://www.theverge.com/streaming/966633/netflix-ai-titles-q2-2026-earnings
- https://arstechnica.com/gadgets/2026/07/its-official-eu-will-force-google-to-share-search-data-and-open-up-ai-on-android/

9 min

Jul 16, 2026 · Episode 55

Microsoft’s AI Patch Surge, Inkling’s Open-Model Bet, and a Reality Check for Agent Benchmarks | UpNext AI – July 16, 2026

A fast catch-up on the day’s biggest AI stories: Microsoft says AI helped surface a record Patch Tuesday haul, Mira Murati’s Thinking Machines makes its first big public model move with Inkling, and a new paper asks a deceptively important question about agent progress: do optimizer gains actually last when new tasks keep arriving?

Covered in this episode:
- Microsoft patches 570 security flaws and says AI helped uncover more vulnerabilities
- Thinking Machines launches Inkling, its first open-weight model, and leans hard into customizable AI
- Research: a continual-learning test of whether agent optimizers really compound over time on Terminal-Bench 2.0
- OpenAI releases a $230 Codex keyboard amid its hardware dispute with Apple
- Moonshot’s upcoming Kimi K3 is reported to challenge Anthropic’s Claude Opus 4.8
- A Claude web_fetch loophole enabled data exfiltration through nested links
- xAI open-sources grok-build after backlash over directory uploads

Source links:
- Microsoft patches record number of security vulnerabilities, citing its use of AI: https://techcrunch.com/2026/07/15/microsoft-patches-record-number-of-security-vulnerabilities-citing-its-use-of-ai/
- Thinking Machines amps up its bet against one-size-fits-all AI with its first open model, Inkling: https://techcrunch.com/2026/07/15/thinking-machines-amps-up-its-bet-against-one-size-fits-all-ai-with-its-first-open-model-inkling/
- Do Agent Optimizers Compound? A Continual-Learning Evaluation on Terminal-Bench 2.0: https://arxiv.org/abs/2607.14004v1
- Amid hardware legal battle, OpenAI releases a $230 keyboard for Codex: https://techcrunch.com/2026/07/15/amid-hardware-legal-battle-openai-releases-a-230-keyboard-for-codex/
- Chinese AI start-up Moonshot to launch model challenging Anthropic’s lead: https://www.ft.com/content/c6ecd8ce-c441-4d7c-aea6-fae3e28fb6ff
- How I tricked Claude into leaking your deepest, darkest secrets: https://simonwillison.net/2026/Jul/15/claude-web-fetch-exfiltration/#atom-everything
- xai-org/grok-build, now open source: https://simonwillison.net/2026/Jul/15/grok-build/#atom-everything

6 min

Jul 15, 2026 · Episode 54

Compute Hunger, ChatGPT Workflows, and a New Way to Test AI Safety | UpNext AI – July 15, 2026

Today on UpNext AI: a billion-dollar compute deal shows how intense the infrastructure race has become, OpenAI pitches ChatGPT Work for data-science teams, and a new research paper argues safety evals should measure whether a model recognizes danger before it ever speaks.

Covered in this episode:
- Reflection AI signs a reported $1 billion compute deal with Nebius
- OpenAI publishes a guide for how data science teams can use ChatGPT Work
- New research on danger recognition and jailbreak evaluation
- Simon Willison spots customizable animated “pets” in Codex Desktop
- Bloomberg report via The Verge says OpenAI may announce a screenless ChatGPT speaker this year
- Daniel Ek’s Neko Health pushes into the US after raising $700 million

Source links:
- https://techcrunch.com/2026/07/14/reflection-inks-1b-compute-deal-with-nebius/
- https://openai.com/academy/codex-for-work/how-data-science-teams-use-codex
- https://arxiv.org/abs/2607.12792v1
- https://simonwillison.net/2026/Jul/14/pedalican/#atom-everything
- https://www.theverge.com/ai-artificial-intelligence/965670/openai-chatgpt-ai-smart-speaker-hardware-device
- https://www.theverge.com/science/965849/spotify-founder-ek-startup-neko-health-scanner-us-push

8 min

Jul 14, 2026 · Episode 53

AI Drug Discovery Meets Quantum Computing, Open Models Under Pressure, and Tool-Using Agent Benchmarks | UpNext AI – July 14, 2026

Today on UpNext AI: a high-upside science story on using AI plus quantum computing to generate new peptides for drug discovery, a sharp practitioner debate over whether open models are entering a make-or-break six-month stretch, and a new benchmark asking whether visual agents can actually use software tools reliably.

Covered stories:
- Researchers used a hybrid AI and quantum computing workflow to generate novel peptides, with reported lab validation and a focus on rare diseases and underserved populations.
- Interconnects argued that open-weight models face their most serious viability test yet over the next six months, driven by policy pressure and capability thresholds.
- A new paper, MM-ToolSandBox, introduced a benchmark for visually grounded tool-calling agents across 500-plus tools and 16 application domains.
- Anthropic extended Claude Fable 5 access on paid plans through July 19 and kept Claude Code weekly rate limits 50 percent higher for now.
- Apple said a former employee exploited a rare bug to download confidential files after leaving for OpenAI.
- Ars Technica looked at the promise and limits of so-called world models.

Source links:
- https://www.wired.com/story/scientists-using-ai-and-quantum-computing-to-generate-new-peptides/
- https://www.interconnects.ai/p/6-months-to-live-for-open-models
- https://arxiv.org/abs/2607.11818v1
- https://simonwillison.net/2026/Jul/12/bump/#atom-everything
- https://techcrunch.com/2026/07/13/apple-says-former-employee-exploited-rare-bug-to-download-confidential-files-after-leaving-for-openai/
- https://arstechnica.com/ai/2026/07/simulating-everything-sort-of-the-promise-and-limits-of-world-models/

7 min

Jul 13, 2026 · Episode 52

The First AI Agent Phone, Peer-Review Prompt Injection, and AI Triage Benchmarks | UpNext AI – July 13, 2026

A lighter but still revealing AI news day: we look at Nubia’s claim that it is launching the world’s first AI agent smartphone, a new paper on how hidden prompts can manipulate AI-assisted peer review, and research on which large language models held up best in emergency department triage tests.

Covered in this episode:
- Nubia says it will unveil a smartphone it calls the world’s first AI agent phone at WAIC in Shanghai
- Researchers test prompt injection attacks against AI-assisted peer review and find very high success rates
- A triage benchmark compares 15 models on pediatric emergency department scenarios
- Autonomous shopping agents raise a practical question: when should software buy on your behalf?
- Google adds Gemini-powered features to Waze for more conversational reporting and destination search
- A Singapore research role points to growing interest in privacy-preserving federated causal inference
- Apple has sued OpenAI and its hardware chief over alleged theft of tech secrets

Source links:
- https://en.tempo.co/read/2113440/nubia-to-launch-worlds-first-ai-agent-smartphone
- https://doi.org/10.1007/s11192-026-05695-x
- https://doi.org/10.1007/s43678-026-01214-2
- https://www.businesstimes.com.sg/opinion-features/when-should-we-let-autonomous-ai-agent-do-work
- https://www.theverge.com/transportation/964132/waze-gemini-ai-voice-commands-less-chatty
- https://www.timeshighereducation.com/unijobs/listing/413293/research-engineer-federated-causal-inference-in-heterogeneous-data-environments-up/
- https://www.lokmattimes.com/business/apple-sues-openai-over-allegedly-stealing-tech-secrets

5 min

Jul 10, 2026 · Episode 51

OpenAI and Microsoft Recommit, GPT-5.6 Lands, and Google Labels AI Ads | UpNext AI – July 10, 2026

Today on UpNext AI: OpenAI used its GPT-5.6 launch to signal that its models will remain central to Microsoft 365 Copilot, even as questions swirl about the companies’ evolving relationship. We also look at what OpenAI says is new in the broader GPT-5.6 family, a new clinical-reasoning paper for liver cancer treatment guidance, and three quick headlines on Google ad labeling, Meta’s unwound Manus deal, and OpenAI’s new long-running work tool.

Covered in this episode:
- OpenAI says GPT-5.6 is the preferred model for Microsoft 365 Copilot
- OpenAI launches the GPT-5.6 model family
- Research: HCC-STAR for hepatocellular carcinoma risk stratification and treatment guidance
- Google adds AI-made ad labels in My Ad Center
- Tencent leads a deal to unwind Meta’s $2 billion Manus acquisition
- OpenAI’s new ChatGPT Work / rebranded Codex push for longer-running workflows

Source links:
- https://techcrunch.com/2026/07/09/openai-says-gpt-5-6-is-the-preferred-model-for-microsoft-copilot-amid-breakup-chatter/
- https://techcrunch.com/2026/07/09/openai-launches-its-new-family-of-models-with-gpt-5-6/
- https://arxiv.org/abs/2607.08602v1
- https://www.theverge.com/ai-artificial-intelligence/963628/google-ai-generated-ads-label
- https://www.ft.com/content/0d04378d-d71b-4225-b31a-70504e358480
- https://arstechnica.com/ai/2026/07/openai-wants-its-new-tool-to-do-your-work-for-you-and-with-you/

7 min

Jul 9, 2026 · Episode 50

Grok 4.5, Open Models at ICML, and Why Deployment Rules Matter | UpNext AI – July 9, 2026

Today on UpNext AI: xAI rolls out Grok 4.5 with a cost-and-efficiency pitch, Nvidia argues open models and open infrastructure are becoming core to mainstream AI research at ICML 2026, and a new paper says safety outcomes in multi-agent systems can shift dramatically based on deployment rules, not just the model itself.

Covered in this episode:
- xAI releases Grok 4.5 and positions it as a faster, lower-cost “Opus-class” model
- Nvidia says open models and open infrastructure are showing up across ICML 2026 research
- A new arXiv paper proposes “institutional red-teaming” for testing deployment rules in multi-agent AI
- OpenAI publishes its approach to government and national security partnerships
- Paradigm raises a new $1.2 billion fund with AI and robotics in scope
- Ashley Smith closes a second $25 million fund for Vermilion Cliffs Ventures

Source links:
- https://techcrunch.com/2026/07/08/spacexai-releases-grok-4-5-which-elon-describes-as-an-opus-class-model/
- https://blogs.nvidia.com/blog/open-models-icml-2026/
- https://arxiv.org/abs/2607.07695v1
- https://openai.com/index/government-national-security-partnerships
- https://techcrunch.com/2026/07/08/crypto-vc-firm-paradigm-raises-1-2b-to-invest-in-technical-frontier-startups/
- https://techcrunch.com/2026/07/08/solo-gp-ashley-smith-announces-25m-close-of-second-fund/

6 min

Jul 8, 2026 · Episode 49

Meta’s AI Image Opt-Out, Cross-Chip Inference, and Biomedical Agent Collaboration | UpNext AI – July 8, 2026

A quick catch-up on today’s AI news: Meta changes the default rules for how public Instagram photos can be used in AI image generation, French startup ZML launches a new inference server aimed at running models across a wide range of chips, and a new biomedical QA paper shows how different agent-style workflows can help on different question types. We also hit a few shorter headlines on OpenAI, AI security, and payments.

Covered in this episode:
- Meta’s Muse Image rollout and the opt-out policy for public Instagram content
- ZML/LLMD and the push to make AI inference cheaper across Nvidia, AMD, Google TPU, Apple Metal, and Intel Arc
- BioASQ 14b research on answer-type-aware LLM pipelines for biomedical question answering
- OpenAI’s reported GPT-5.6 launch after an earlier government delay
- Ars Technica on “HalluSquatting” and AI-assisted botnet assembly
- Australian Payments Plus using ChatGPT Enterprise and Codex

Source links:
- https://www.wired.com/story/meta-now-lets-anyone-use-your-instagram-photos-in-ai-images-unless-you-opt-out/
- https://techcrunch.com/2026/07/08/hot-french-startup-zml-releases-free-product-to-speed-inference-across-lots-of-ai-chips/
- https://arxiv.org/abs/2607.06452v1
- https://the-decoder.com/openais-gpt-5-6-launches-thursday-after-a-delay-forced-by-the-u-s-government/
- https://arstechnica.com/security/2026/07/hackers-can-use-9-of-the-most-popular-ai-tools-to-assemble-massive-botnets/
- https://openai.com/index/australian-payments-plus

6 min

Jul 7, 2026 · Episode 48

Orbit Labs, Fusion Funding, and Medical AI Model Fixes | UpNext AI – July 7, 2026

A catch-up on a lighter but still revealing AI news day: space-based protein research, fusion money tied to AI-era energy demand, a new benchmark for fixing medical vision-language models, and three quick headlines on model churn, Anthropic privacy backlash, and Tencent’s latest open model.

Covered in this episode:
- A British startup launches an orbital lab to gather microgravity data for AI models studying disease-linked proteins
- Google backs Proxima Fusion in a €400 million round that values the company at €2.4 billion
- New research on whether medical vision-language models can be edited after deployment without breaking other behavior
- GPT-4’s unusually long run at the top of Epoch AI’s capabilities index, with leadership changing hands 17 times since
- Anthropic removes hidden tracking code from Claude Code after backlash
- Tencent’s Apache 2.0 licensed Hy3 model arrives with a 295B-parameter MoE design and 256K context

Source links:
- WIRED: https://www.wired.com/story/british-space-startup-launches-longevity-lab-into-orbit/
- Financial Times: https://www.ft.com/content/3b1665f4-9a48-4ec1-a0bc-528c528db96e
- arXiv (Medical VLM editing paper): https://arxiv.org/abs/2607.05310v1
- The Decoder (Epoch Capabilities Index): https://the-decoder.com/gpt-4s-dominance-lasted-a-year-while-todays-top-models-barely-survive-seven-weeks-at-the-top/
- Ars Technica (Claude tracker story): https://arstechnica.com/tech-policy/2026/07/anthropic-outed-for-claude-tracker-that-secretly-monitored-chinese-users/
- Simon Willison on Tencent Hy3: https://simonwillison.net/2026/Jul/6/hy3/#atom-everything

8 min

Jul 6, 2026 · Episode 47

Open-Source AI’s Gap Map, ECG Explainability, and Mistral’s Rise | UpNext AI – July 6, 2026

Today on UpNext AI: a new open-source AI "gap map" tries to measure what a public-option AI stack actually looks like, a medical AI paper tests whether common explanation tools are reliable enough to trust, and we round out the show with headlines on Mistral, AI-assisted software shipping, Indian IT dealmaking, and AI schooling for wealthy families.

Covered stories:
- Current AI launches its Open Source AI Gap Map, indexing open-source AI tools, models, datasets, and hardware projects.
- A Scientific Reports paper evaluates feature selection methods and SHAP-based interpretability for arrhythmia-analysis models.
- Theoria proposes a verification approach for checking AI reasoning states with explicit justifications.
- TechCrunch profiles Mistral AI as an OpenAI competitor with open-source models and major funding momentum.
- Simon Willison describes shipping sqlite-utils 4.0rc2 with heavy help from Claude Fable.
- Livemint reports AI pressure is pushing Indian IT firms toward acquisitions.
- The Verge reports some wealthy families are turning to AI-driven schooling through companies including Forge Prep.

Source links:
- https://simonwillison.net/2026/Jul/3/open-source-ai-gap-map/#atom-everything
- https://www.nature.com/articles/s41598-026-59984-9
- https://arxiv.org/abs/2607.01223v1
- https://techcrunch.com/2026/07/04/what-is-mistral-ai-everything-to-know-about-the-openai-competitor/
- https://simonwillison.net/2026/Jul/5/sqlite-utils-fable/#atom-everything
- https://www.livemint.com/industry/infotech/can-billion-dollar-acquisitions-help-indian-it-firms-in-the-ai-era-ai-is-pushing-indian-it-companies-11783079757185.html
- https://www.theverge.com/ai-artificial-intelligence/961505/wealthy-ai-schools-alpha-forge-prep

8 min

Jul 3, 2026 · Episode 46

Anthropic’s Washington Reset, Custom AI Chips, and the Research-Idea Gap | UpNext AI – July 3, 2026

A quick catch-up on the AI stories shaping infrastructure, policy, and how people work with models. Today: Anthropic gets restrictions lifted with added safeguards, a reported Samsung chip discussion highlights the hardware race, a new paper asks whether model-generated research ideas really differ from human ones, and a few headlines on market reaction, Meta’s latest experiment, agent tooling, and synthetic political video.

Covered stories:
- Anthropic regains access after new security safeguards, according to WIRED
- Anthropic is discussing a custom chip with Samsung, according to TechCrunch
- New arXiv paper on measuring the gap between human and LLM-generated research ideas
- India IT shares slide after OpenAI’s new venture, via Reuters
- Meta quietly launches Pocket, an AI app for prompt-made mini games, via TechCrunch
- Simon Willison releases llm-coding-agent 0.1a0 as a coding-agent experiment
- Fast Company on AI-generated astroturfing videos
- Brief note: TechTarget item on HPE and Intel AI/ML positioning

Source links:
- https://www.wired.com/story/anthropic-added-a-new-security-measure-to-get-back-into-the-trump-administrations-good-graces/
- https://techcrunch.com/2026/07/02/anthropic-is-discussing-a-new-custom-chip-with-samsung/
- https://arxiv.org/abs/2607.01233v1
- https://www.reuters.com/markets/companies/HCLT.NS/
- https://techcrunch.com/2026/07/02/meta-quietly-launches-vibe-coded-gaming-app-pocket/
- https://simonwillison.net/2026/Jul/2/llm-coding-agent/#atom-everything
- https://www.fastcompany.com/91564409/ai-astroturfing-videos-are-here?utm_source=postup&utm_medium=email&utm_campaign=technology&position=2&partner=newsletter&campaign_date=07032026
- https://www.techtarget.com/searchdatacenter/?x=&x%5B%5D=

4 min

Jul 2, 2026 · Episode 45

Claude on Blackwell in Azure, AI Infrastructure Money, and the Limits of LLM Medical Judges | UpNext AI – July 2, 2026

A lighter but still meaningful AI news day: today we look at Anthropic’s Claude models going generally available on NVIDIA’s GB300 systems in Microsoft Azure, a notable shift in where AI venture money may be heading next, and new research on why LLMs that grade medical answers may look aligned with doctors without showing the same caution.

Covered in this episode:
- Anthropic’s Claude models are now generally available in Microsoft Foundry on Microsoft Azure, running on NVIDIA GB300 Blackwell Ultra GPUs
- Ashton Kutcher is leaving Sound Ventures to launch a new VC firm with Morgan Beller focused on AI infrastructure, energy, and deep tech
- A new arXiv paper tests whether LLM evaluators for medical AI actually mirror clinician judgment and caution
- Cloudflare is giving AI companies until September 15 to separate search crawlers from training and agent crawlers or risk default blocks on publisher sites
- The U.S. has lifted curbs on Anthropic’s advanced Fable and Mythos models, according to Ars Technica

Source links:
- NVIDIA on Claude in Microsoft Foundry on Azure: https://blogs.nvidia.com/blog/anthropic-nvidia-gb300-blackwell-ultra-microsoft-azure/
- TechCrunch on Ashton Kutcher and Morgan Beller’s new VC firm: https://techcrunch.com/2026/07/01/ashton-kutcher-leaving-sound-ventures-to-launch-new-vc-firm-with-morgan-beller/
- arXiv paper, "Clinician-Level Agreement Without Clinical Caution: LLM Evaluator Limits in Medical AI Benchmarking": https://arxiv.org/abs/2607.01103v1
- TechCrunch on Cloudflare’s publisher policy: https://techcrunch.com/2026/07/01/cloudflares-new-policy-pushes-ai-companies-to-pay-for-publishers-content/
- Ars Technica on Anthropic model curbs being lifted: https://arstechnica.com/tech-policy/2026/07/after-spooking-trump-into-safety-testing-anthropic-ai-models-get-global-release/

7 min

Jul 1, 2026 · Episode 44

Anthropic’s Policy Reversal, Claude Science, and Agentic Persuasion Tests | UpNext AI – July 1, 2026

A compact midweek catch-up on the AI stories that matter most: the U.S. lifts export restrictions that had cut off access to Anthropic’s top models, Anthropic pushes deeper into scientific workflow software with Claude Science, and a new paper argues we need better tests for whether autonomous agents can shape beliefs through planning and action.

Covered in this episode:
- The U.S. lifts restrictions on Anthropic’s Mythos and Fable models, reopening access and underscoring continuing policy uncertainty
- Anthropic launches Claude Science, a scientist-focused workbench built around workflow rather than a new model
- New research tests whether LLMs can induce belief states through action and planning, not just conversation
- Wayve launches an $85 million employee tender at an $8.5 billion valuation
- Meta adds usage limits and a soft paywall to AI features on its smart glasses

Source links:
- https://techcrunch.com/2026/06/30/trump-drops-restrictions-on-anthropics-mythos-and-fable-models/
- https://techcrunch.com/2026/06/30/anthropics-claude-science-bets-on-workflow-not-a-new-model-to-win-over-scientists/
- https://arxiv.org/abs/2606.31916v1
- https://techcrunch.com/2026/06/30/wayve-launches-85m-employee-tender-offer-at-8-5b-valuation/
- https://www.theverge.com/gadgets/959899/meta-ai-glasses-paywall-rate-limit

8 min

Jun 30, 2026 · Episode 43

Anthropic’s Mythos Access, Base44’s Vertical Bet, and a More Realistic Coding-Agent Test | UpNext AI – June 30, 2026

Today on UpNext AI: the White House loosens access restrictions on Anthropic’s most advanced model for a limited set of U.S. organizations, Base44 rolls out its own model as vibe-coding startups push for defensibility, and a new paper argues coding agents should be judged in back-and-forth workflows instead of tidy one-shot tasks.

Covered stories:
- Anthropic allowed to restore Mythos access to a select group of U.S. companies and government agencies
- Wix-owned Base44 starts rolling out its own model, Base1, as it tries to own more of the stack
- SWE-INTERACT proposes a multi-turn benchmark for coding agents with changing requirements and user feedback
- Google says EU competition remedies could force search-data sharing and broader Android AI access with privacy risks
- Palantir brings NVIDIA Nemotron open models into air-gapped environments for U.S. agencies
- Researchers say a compromised GitHub repo can cause Claude Code to run hidden malware without verification

Source links:
- https://www.wired.com/story/anthropic-restores-access-to-mythos/
- https://techcrunch.com/2026/06/29/vibe-coding-platform-base44-launches-own-model-as-ai-startups-seek-defensibility/
- https://arxiv.org/abs/2606.30573v1
- https://arstechnica.com/gadgets/2026/06/google-warns-eus-plans-to-weaken-its-monopoly-could-expose-user-data/
- https://blogs.nvidia.com/blog/palantir-secure-ai-us-agencies-nemotron-open-models/
- https://the-decoder.com/claude-code-runs-a-github-repos-hidden-malware-without-verification-giving-attackers-full-control/

9 min

Jun 29, 2026 · Episode 42

Europe’s AI Sovereignty Push, Asia’s Export-Control Opening, and Faster AI Bug Hunting | UpNext AI – June 29, 2026

A quick catch-up on the AI stories shaping strategy, markets, and security to start the week. Today: Europe’s push to build more sovereign AI capacity, Asian model makers using export-control uncertainty as an opening, a research paper on using LLMs to find business-logic vulnerabilities much faster, and three notable headlines on OpenAI’s GPT-5.6 lineup, the widening open-model ecosystem, and an AI assistant hacking challenge.

Covered in this episode:
- Europe’s new urgency around AI sovereignty and why leaders there no longer want to rely on American models
- Asian startups launching Mythos-like alternatives while U.S. export restrictions reshape the market
- A research paper on LLM-driven discovery of business-logic bugs in power-system microservice APIs
- OpenAI’s limited preview of GPT-5.6 Sol, Terra, and Luna
- A new roundup arguing the open-model ecosystem is broadening across companies and regions
- What happened when 2,000 people tried to hack an AI assistant by email

Source links:
- WIRED: https://www.wired.com/story/europe-is-fed-up-and-wants-its-own-ai/
- TechCrunch: https://techcrunch.com/2026/06/27/asian-ai-startups-launch-mythos-like-models-as-anthropics-export-ban-drags-on/
- DOI research paper: https://doi.org/10.1186/s44147-026-01100-9
- Simon Willison on GPT-5.6: https://simonwillison.net/2026/Jun/26/openai/#atom-everything
- Interconnects open artifacts #22: https://www.interconnects.ai/p/artifacts-22-zyphra-cohere-and-poolside
- Simon Willison on the AI assistant hack challenge: https://simonwillison.net/2026/Jun/26/hack-my-ai-assistant/#atom-everything

6 min

Jun 26, 2026 · Episode 41

OpenAI’s Slower GPT-5.6 Rollout, Amazon’s $13B India Buildout, and Harmful Video Benchmarks | UpNext AI – June 26, 2026

UpNext AI for June 26, 2026: today we look at reported U.S. government pressure on OpenAI’s GPT-5.6 rollout, Amazon’s fresh multibillion-dollar AI infrastructure push in India, and a new benchmark for testing whether multimodal models can actually understand harmful video content.

Covered stories:
- OpenAI reportedly slows GPT-5.6 rollout after White House safety concerns
- Amazon says it will invest another $13 billion to expand AI and cloud infrastructure in India through 2030
- HarmVideoBench introduces a 1,379-video benchmark for harmful video understanding in large multimodal models
- A related update says GPT-5.6 access may be approved customer by customer during a preview period
- Notion says it will shut down Notion Mail on September 22 and lean further into AI agents for inbox workflows
- A Forbes Council post argues the next bottleneck for enterprise AI is agent infrastructure and operational control

Source links:
- https://techcrunch.com/2026/06/25/the-white-house-is-asking-openai-to-slow-roll-the-release-of-its-new-model-over-safety-concerns/
- https://techcrunch.com/2026/06/25/amazon-ups-india-bet-with-fresh-13b-ai-infrastructure-investment/
- https://arxiv.org/abs/2606.27187v1
- https://the-decoder.com/openais-gpt-5-6-rollout-now-requires-us-government-approval-on-a-customer-by-customer-basis/
- https://arstechnica.com/gadgets/2026/06/notion-killing-skiff-influenced-email-app-since-most-users-use-ai-agents-instead/
- https://www.forbes.com/councils/forbestechcouncil/2026/06/25/future-of-ai-depends-on-agent-infrastructure/

8 min

Jun 25, 2026 · Episode 40

Google DeepMind’s Hollywood Bet, AI Poisoning Defenses, and OpenAI’s Inference Chip | UpNext AI – June 25, 2026

A quick catch-up on the biggest AI stories for June 25, 2026: Google DeepMind moves deeper into Hollywood with a $75 million A24 partnership, researchers propose a way to detect and undo poisoned summarization models, and a new medical benchmark shows how cancer-imaging AI can break across patient groups and scan settings.

Covered in this episode:
- Google DeepMind invests $75 million in A24 as AI companies push further into Hollywood
- New research on detecting, unlearning, and restoring text summarization models after training-time data poisoning
- BenchX tests cancer-detection AI for demographic and imaging-protocol bias across real clinical variation
- OpenAI and Broadcom unveil Jalapeño, a custom chip for LLM inference
- Bloomberg reports two senior Google AI researchers are set to leave for Anthropic
- Simon Willison builds a browser-compatibility database tool inspired by Mozilla’s new MDN MCP service

Source links:
- WIRED: https://www.wired.com/story/a24-knows-youre-mad-about-the-google-ai-collab/
- arXiv (Detect, Unlearn, Restore): https://arxiv.org/abs/2606.26036v1
- BenchX paper: https://doi.org/10.48550/arxiv.2606.24883
- OpenAI on Jalapeño: https://openai.com/index/openai-broadcom-jalapeno-inference-chip
- Bloomberg on Google/Anthropic talent moves: https://www.bloomberg.com/news/articles/2026-06-24/google-poised-to-lose-two-more-high-profile-ai-staffers-to-anthropic
- Simon Willison post: https://simonwillison.net/2026/Jun/24/browser-compat-db/#atom-everything

7 min

Jun 24, 2026Episode 39

OpenAI’s Cybersecurity Push, AI Agents for Marketing, and Better Speech Benchmarks | UpNext AI – June 24, 2026

A quick catch-up on the biggest AI stories for June 24, 2026: OpenAI broadens its cybersecurity push with a new bug-fixing initiative, MoEngage bets that customer marketing will be run by AI agents, and a new research paper questions whether AI judges are actually good at evaluating subtle speech differences.Covered in this episode:- OpenAI unveils an improved GPT-5.5-Cyber model and its Patch the Planet effort for open-source security work- MoEngage acquires Aampe to push toward customer-by-customer AI agent marketing- New research: ParaPairAudioBench tests whether audio-language models can judge subtle speech differences the way humans do- Anthropic launches Claude Tag in research preview inside Slack- OpenAI says GPT-5 Pro helped immunologist Derya Unutmaz with a three-year-old T cell mystery- Prime Day brings broad discounts on robot vacuums from brands including Roborock, Dreame, and SharkSource links:- https://www.wired.com/story/openai-launches-full-scale-effort-to-patch-open-source-bugs-as-it-takes-on-anthropics-mythos/- https://techcrunch.com/2026/06/23/indias-moengage-bets-marketings-future-on-millions-of-ai-agents/- https://arxiv.org/abs/2606.24648v1- https://www.reuters.com/technology/anthropic-launches-claude-tag-research-preview-slack-users-2026-06-23- https://openai.com/index/gpt-5-immunology-mystery- https://www.theverge.com/gadgets/951081/robot-vacuum-mop-deals-amazon-prime-day-2026

8 min

Jun 23, 2026Episode 38

AI’s Energy Constraint, a Big New Compute Deal, and Benchmark Blind Spots | UpNext AI – June 23, 2026

Today on UpNext AI, we look at a bigger theme now shaping the industry: AI is no longer just a compute story, it is increasingly an energy story. We also cover a major new compute deal tied to Nvidia’s latest chips, a fresh research warning about safety benchmarks, and several fast headlines across chips, cybersecurity, browser AI, and power infrastructure.Covered in this episode:- Nvidia spotlights Eco Wave Power, arguing AI growth will be constrained as much by energy as by compute- Reflection AI signs a massive compute deal with SpaceX for access to GB300 systems at Colossus 2- New research argues models may detect when they are being evaluated, creating a gap between benchmark scores and real-world behavior- Groq confirms a $650 million raise after Nvidia’s earlier $20 billion not-acqui-hire deal- OpenAI launches a new initiative to help open-source maintainers find and patch bugs- Simon Willison documents porting the Moebius 0.2B image inpainting model to run in the browser- OpenAI introduces Daybreak tools including Codex Security and GPT-5.5-Cyber- The Financial Times reports Chevron is moving into power production tied to a Microsoft AI data center dealSources:- Nvidia: https://blogs.nvidia.com/blog/eco-wave-power-ai-digital-twins/- TechCrunch on Reflection AI and SpaceX: https://techcrunch.com/2026/06/22/spacex-inks-compute-deal-with-reflection-ai-an-open-source-ai-lab/- arXiv paper: https://arxiv.org/abs/2606.23583v1- TechCrunch on Groq: https://techcrunch.com/2026/06/22/ai-chipmaker-groq-confirms-650m-raise-re-staffs-after-nvidias-20b-not-acqui-hire-deal/- TechCrunch on OpenAI Patch the Planet: https://techcrunch.com/2026/06/22/openai-launches-new-initiative-to-help-find-and-patch-open-source-bugs/- Simon Willison on Moebius in the browser: https://simonwillison.net/2026/Jun/22/porting-moebius/#atom-everything- OpenAI Daybreak: https://openai.com/index/daybreak-securing-the-world- Financial Times on Chevron and Microsoft: https://www.ft.com/content/57cc533b-08c3-419b-919c-23bec3f248f4

6 min

Jun 22, 2026Episode 37

Samsung’s Global OpenAI Rollout, Anthropic’s Government Ban, and AWS on Agent Security | UpNext AI – June 22, 2026

A quick Monday briefing on enterprise AI adoption, model governance, and a handful of lighter headlines. Today we look at Samsung’s worldwide rollout of ChatGPT Enterprise and Codex, the U.S. government action that forced Anthropic to pull two new models, and AWS’s push to give AI agents more business context and security.Covered in this episode:- Samsung Electronics deploys ChatGPT Enterprise and Codex to employees worldwide, in what OpenAI describes as one of its largest enterprise AI rollouts.- The U.S. government forced Anthropic to pull Fable 5 and Mythos 5 after reported guardrail concerns, with debate continuing over the security rationale and market impact.- AWS says AI agents still lack business context and security, and introduced two new services at its New York summit aimed at those gaps.- In the Weights launches as an AI-centric vanity search that tries to measure whether a person is “in the weights” of major models.- Tesla files a trademark application for Megapod, described as modular AI data-center hardware.- An op-ed from Nathan Lambert and Kevin Xu argues that banning open-source AI would be a mistake.- AgentX appears on Product Hunt as a multi-agent build-and-eval framework.Source links:- Samsung Electronics brings ChatGPT and Codex to employees: https://openai.com/index/samsung-electronics-chatgpt-codex-deployment- Is the US government’s Anthropic ban accidentally helping the brand?: https://techcrunch.com/video/is-the-us-governments-anthropic-ban-accidentally-helping-the-brand/- Youth safeguarding Public Benefit program proposal: https://doi.org/10.5281/zenodo.20779039- AWS says AI agents lack business context and security, launches two services to patch the gaps: https://the-decoder.com/aws-says-ai-agents-lack-business-context-and-security-launches-two-services-to-patch-the-gaps/- In the Weights is your new AI-centric vanity search: https://techcrunch.com/2026/06/20/in-the-weights-is-your-new-ai-centric-vanity-search/- What is Tesla's 'Megapod' AI hardware project?: https://www.newsbytesapp.com/news/science/tesla-plans-to-sell-megapod-modular-ai-data-center-hardware/story- Banning Open Source AI Would Be A Mistake: https://www.interconnects.ai/p/banning-open-source-ai-would-be-a

7 min

Jun 19, 2026Episode 36

France’s AI Buildout, Enterprise AI Spend Controls, and Agent Safety Under Attack | UpNext AI – June 19, 2026

A quick Friday catch-up on the biggest AI stories we could support cleanly from today’s packet: France’s AI infrastructure push with Nvidia, OpenAI’s new enterprise spend controls, a new paper on how LLM agents fail under sustained attack, and two concise headlines on agent insurance and OpenAI safety training.Covered in this episode:- France’s AI buildout with Nvidia, including AI factories, national compute, open models, and industrial deployment- OpenAI adds usage analytics and updated spend controls to ChatGPT Enterprise- New research on multi-turn red-teaming of LLM agents in a simulated safety-critical control room- AIUC’s push to create insurance standards for AI agent providers- Reported OpenAI research on training for traits like truthfulness and corrigibility- Taiwan’s drone production ramp and possible spillover into overseas and U.S. demandSource links:- https://blogs.nvidia.com/blog/france-advances-europes-ai-future/- https://openai.com/index/chatgpt-enterprise-spend-controls- https://arxiv.org/abs/2606.20408v1- https://www.fastcompany.com/91550776/rajiv-dattani-is-bringing-insurance-to-the-ai-agent-boom- https://the-decoder.com/openai-researchers-show-small-doses-of-beneficial-trait-training-make-ai-models-broadly-safer-and-harder-to-manipulate/- https://arstechnica.com/ai/2026/06/as-china-looms-taiwan-makes-more-drones-for-defense-and-the-us-military/

6 min

Jun 18, 2026Episode 35

The White House’s Anthropic Pressure, Odyssey’s $1.45 Billion Bet, and AI Drug Discovery Benchmarks | UpNext AI – June 18, 2026

UpNext AI for June 18, 2026: today we’re tracking a reported clash between the White House and Anthropic over jailbreak-proofing a model rerelease, a big funding signal for world models as Odyssey hits a $1.45 billion valuation with Amazon among its backers, and a new benchmark testing whether AI agents can actually make useful preclinical pharmacology decisions. We also round out the show with quick headlines on OpenAI’s pre-launch failure prediction work, an AI chemist result from OpenAI and Molecule.one, and Google’s latest AMIE medical study.Covered in this episode:- The White House reportedly wants Anthropic to make Fable 5’s guardrails impossible to circumvent before any rerelease- Odyssey reaches a $1.45 billion valuation in a Series B round with Amazon among the backers- TxBench-PP tests AI agents on realistic small-molecule preclinical pharmacology decisions- OpenAI researchers propose a way to predict how often models may fail before launch- OpenAI and Molecule.one say a near-autonomous AI chemist improved a challenging medicinal chemistry reaction- Google says new Nature research shows AMIE matched primary care physicians in complex disease managementSource links:- WIRED: https://www.wired.com/story/the-white-house-wants-anthropic-to-block-all-jailbreaks-that-may-not-be-possible/- TechCrunch: https://techcrunch.com/2026/06/17/world-model-maker-odyssey-nabs-1-45b-valuation-backed-by-amazon-and-other-big-names/- arXiv (TxBench-PP): https://arxiv.org/abs/2606.19245v1- The Decoder: https://the-decoder.com/openai-researchers-want-to-predict-how-often-ai-models-will-fail-before-launch/- OpenAI: https://openai.com/index/ai-chemist-improves-reaction- Google: https://blog.google/innovation-and-ai/models-and-research/google-research/amie-for-disease-management-in-nature/

9 min

Jun 17, 2026Episode 34

Android 17, AI’s Optical Backbone, and Long-Conversation Safety Gaps | UpNext AI – June 17, 2026

A quick catch-up on today’s AI news: Google rolls out Android 17 and Wear OS 7 with a Pixel Drop full of new Gemini features, Coherent expands a Texas optics facility that feeds the AI infrastructure boom, and a new paper argues that chatbot safety can degrade over the course of long, emotionally sensitive conversations.Covered in this episode:- Google releases Android 17 and Wear OS 7, alongside a Pixel Drop with new Gemini-powered features for Pixel devices- Coherent breaks ground on an expanded Sherman, Texas facility to scale optical components used in AI systems- New research on “cognitive atrophy” in LLM behavior and why short safety tests can miss long-run conversational drift- A governance commentary arguing the industry has entered a new AGI-era policy phase- Amazon joins Nvidia and AMD investment arms in a $310 million round for Odyssey ML- TechCrunch reports SpaceX plans to acquire Cursor in a $60 billion stock deal tied to its AI ambitions- The Verge reports on Bloomberg’s latest Apple hardware rumors, including camera-equipped AirPods aimed at AI use casesSource links:- https://techcrunch.com/2026/06/16/android-17-launches-with-new-multitasking-tools-as-google-expands-gemini-features/- https://blogs.nvidia.com/blog/coherent-texas-ai-optical/- https://arxiv.org/abs/2606.18129v1- https://www.interconnects.ai/p/welcome-to-the-agi-era-of-ai-governance- https://www.ft.com/content/1e0365db-a363-4d73-9960-23d25420e9f5- https://techcrunch.com/2026/06/16/spacex-to-acquire-cursor-for-60b-in-stock-days-after-blockbuster-ipo/- https://www.theverge.com/tech/950826/apple-airpod-camera-ai-foldable-iphone-rumor

6 min

Jun 16, 2026Episode 33

AI Agents Get Identities, Anthropic’s Export-Control Fight, and a Better Way to Judge Coding Agents | UpNext AI – June 16, 2026

A concise catch-up on the day in AI: a new enterprise security startup bets companies will need to manage AI agents like employees, Anthropic’s clash with the U.S. government over model restrictions keeps widening, and a fresh research paper argues we should judge coding agents by how they work, not just whether they finish.Covered stories:- NewCore emerges with $66 million to manage AI agents as enterprise identities- Katie Moussouris says Anthropic shared a White House report on the Fable jailbreak for her appraisal- Research: agent trajectories as programs for fingerprinting coding-agent behavior- Headline: Katie Moussouris argues Fable 5 export controls could hurt U.S. cyber defense- Headline: The Verge reports Anthropic received a directive to suspend Mythos 5 and Fable 5 access for foreign nationals- Headline: AWS DevOps Agent adds custom SRE agents plus MCP and A2A accessSource links:- https://techcrunch.com/2026/06/15/ai-agents-are-becoming-employees-newcore-emerges-with-66m-to-give-them-identities/- https://simonwillison.net/2026/Jun/16/matteo-wong-the-atlantic/#atom-everything- https://arxiv.org/abs/2606.16988v1- https://simonwillison.net/2026/Jun/16/fable-5-export-controls/#atom-everything- https://www.theverge.com/ai-artificial-intelligence/950412/anthropic-trump-adminstration-claude-mythos-fable-5-export-controls- https://aws.amazon.com/about-aws/whats-new/2026/06/aws-devops-agent-custom-agents/

7 min

Jun 15, 2026Episode 32

Anthropic’s Access Shock, Dynamic Agent Memory, and New AI Rules for Finance | UpNext AI – June 15, 2026

A fast catch-up on the biggest AI stories heading into the week: the reported Amazon-Anthropic dispute behind a government-triggered model cutoff, a second look at what the Anthropic restrictions actually mean, a new benchmark for testing agent memory in changing environments, and a handful of notable headlines in finance, policy, and developer tooling.Covered in this episode:- TechCrunch reports Amazon CEO Andy Jassy may have raised security concerns that led Anthropic to cut off access to two models- The Financial Times reports the Trump administration directed Anthropic to limit access to its latest models for foreign nationals on national security grounds- EvoArena proposes a way to test whether LLM agents keep their memory and behavior aligned as environments change over time- The Financial Stability Board releases an AI governance framework for financial services- Türkiye announces a new national AI Action Plan with infrastructure, training, and literacy goals- Pyodide 314.0 opens the door to publishing WASM wheels to PyPI for in-browser Python useSource links:- https://techcrunch.com/2026/06/13/amazon-ceo-reportedly-raised-anthropic-model-concerns-before-government-crackdown/- https://www.ft.com/content/2a27300a-b90d-4649-8c09-f7e7cd426dbb- https://arxiv.org/abs/2606.13681v1- https://www.forbes.com/sites/mayrarodriguezvalladares/2026/06/13/the-ai-rulebook-banks-cannot-afford-to-ignore---or-trust-blindly/- https://www.aa.com.tr/en/turkiye/turkiyes-president-erdogan-announces-countrys-new-ai-action-plan-/3966062- https://simonwillison.net/2026/Jun/13/publishing-wasm-wheels/#atom-everything

7 min

Jun 12, 2026Episode 31

Avataar’s Low-Cost Video AI, OpenAI’s Ona Deal, and Verifiable Science Agents | UpNext AI – June 12, 2026

A quick catch-up on today’s AI news: a new India-focused video model pushing generation costs sharply lower, OpenAI’s planned Ona acquisition to support longer-running enterprise agents, and a research benchmark that tests whether science agents can actually make verifiable workflow decisions.Covered in this episode:- Avataar AI launches Varya, a low-cost video model built for India’s scale and local context- OpenAI plans to acquire Ona to bring secure, persistent cloud environments into Codex- EpiBench proposes a verifiable benchmark for AI agents working on epigenomics analysis- Anthropic partners with TCS to scale enterprise deployments- Google releases DiffusionGemma, an open model that generates text from noise rather than token by token- Amazon updates Echo Hub and adds Ring AI features- Jeff Bezos’ AI startup Prometheus reportedly closes a $12 billion round at a $41 billion valuationSource links:- https://techcrunch.com/2026/06/11/cheaper-faster-and-culturally-aware-avataars-video-ai-is-built-for-indias-scale/- https://openai.com/index/openai-to-acquire-ona- https://arxiv.org/abs/2606.13602v1- https://techcrunch.com/2026/06/11/anthropic-taps-tcs-to-scale-its-enterprise-ai-deployments/- https://the-decoder.com/googles-new-open-model-diffusiongemma-generates-text-from-noise-instead-of-word-by-word/- https://www.theverge.com/tech/948814/amazon-echo-hub-homescreen-redesign- https://the-decoder.com/jeff-bezos-ai-startup-prometheus-closes-12-billion-round-at-a-41-billion-valuation/

8 min

Jun 11, 2026Episode 30

Anthropic’s Guardrail Backlash, AI Memory Risks, and Coding-Agent Benchmarks | UpNext AI – June 11, 2026

A quick catch-up on the AI stories that matter most today: backlash over Anthropic’s Fable guardrails, new research on how memory can make models worse, and a practical benchmark for coding-agent harnesses. We also hit headlines on AI shopping agents, Warner Music’s attribution play, Anthropic’s policy reversal, and OpenAI’s Oracle Cloud push.Covered in this episode:- Anthropic’s Fable faces criticism from cybersecurity researchers who say the model’s guardrails are too restrictive for legitimate security work.- New research reported by TechCrunch suggests memory systems can make models more sycophantic and less accurate.- A new paper, Claw-SWE-Bench, argues that agent harness design can dramatically change coding benchmark results.- Bloomberg reports OpenAI and Visa are enabling AI agents to make purchases online with user permission.- Warner Music is acquiring Sureel AI to better track artist work used in AI-generated content or model training.- Anthropic says it is changing Fable 5 safeguards for frontier LLM development to make them visible after backlash.- OpenAI says customers can access its models and Codex through Oracle Cloud using existing cloud commitments.Sources:- TechCrunch: https://techcrunch.com/2026/06/10/cybersecurity-researchers-arent-happy-about-the-guardrails-on-anthropics-fable/- TechCrunch: https://techcrunch.com/2026/06/10/how-memory-tools-can-make-ai-models-worse/- arXiv: https://arxiv.org/abs/2606.12344v1- Bloomberg: https://www.bloomberg.com/news/articles/2026-06-10/openai-visa-team-up-to-let-ai-agents-make-purchases-online- Simon Willison citing WIRED and Anthropic statement: https://simonwillison.net/2026/Jun/11/anthropic-walks-back-policy/#atom-everything- TechCrunch: https://techcrunch.com/2026/06/10/warner-music-acquires-ai-attribution-startup-sureel-ai/- OpenAI: https://openai.com/index/openai-on-oracle-cloud

7 min

Jun 10, 2026Episode 29

Waymo’s Robotaxi Safety Benchmark, WhatsApp’s AI Access Order, and a New Test for Real-World Agents | UpNext AI – June 10, 2026

A concise catch-up on today’s AI news: Waymo rolls out a new benchmark for comparing robotaxi behavior to human drivers, the EU orders WhatsApp to reopen access for rival AI assistants while an antitrust probe continues, and a new research benchmark tries to measure whether agents can actually handle messy real-world work.Covered in this episode:- Waymo says it built a new benchmark to compare robotaxis with human drivers in crash scenarios- The European Commission orders WhatsApp to restore free access for rival AI chatbots during an ongoing antitrust investigation- T1-Bench proposes a more realistic benchmark for multi-scenario AI agents- Anthropic releases Claude Fable 5, a public Mythos-class model with high-risk guardrails- NVIDIA says its confidential computing GPUs are being used in Apple’s Private Cloud Compute expansion to Google Cloud- GM links EV batteries, energy storage, and vehicle-to-grid plans to rising power demand from AI data centersSource links:- https://techcrunch.com/2026/06/10/waymo-says-it-built-a-better-benchmark-for-comparing-robotaxis-to-humans/- https://www.theverge.com/tech/947516/meta-whatsapp-eu-third-party-ai-chatbot-ban-order- https://arxiv.org/abs/2606.11070v1- https://techcrunch.com/2026/06/09/anthropics-claude-fable-5-is-a-version-of-mythos-the-public-can-access-today/- https://blogs.nvidia.com/blog/nvidia-confidential-computing-apple-private-cloud-compute/- https://www.theverge.com/transportation/946820/gm-energy-ev-v2g-storage-sodium-ion

5 min

Jun 9, 2026Episode 28

Apple’s Siri AI Push, a New Benchmark for Game Agents, and What Deep Research Agents Really Learn | UpNext AI – June 9, 2026

Today on UpNext AI, we lead with Apple’s WWDC 2026 AI push around Siri AI and Apple Intelligence, then look at a new benchmark for vision-language game agents, and close with a research paper testing whether deep research agents actually improve when you give them process-level feedback.Covered stories:- Apple’s WWDC 2026 announcements center on Siri AI, iOS 27, and Apple Intelligence- OmniGameArena introduces a UE5 benchmark for vision-language game agents and tracks how they improve across rounds- New research tests whether deep research agents get better with process-level feedback- Simon Willison urges a wait-and-see stance on Apple’s new AI promises- Ars Technica reports Apple’s Siri AI is due this fall with a more conversational experience and Google-powered model changes- OpenAI confirms a confidential S-1 submission to the SECSource links:- https://techcrunch.com/2026/06/08/wwdc-2026-everything-announced-on-siri-ai-os-27-apple-intelligence-and-more/- https://arxiv.org/abs/2606.09826v1- https://arxiv.org/abs/2606.09748v1- https://simonwillison.net/2026/Jun/8/wwdc/#atom-everything- https://arstechnica.com/apple/2026/06/say-hi-to-siri-ai-apple-announces-new-more-conversational-voice-assistant/- https://openai.com/index/openai-submits-confidential-s-1

8 min

Jun 8, 2026Episode 27

South Korea’s AI Buildout, Google’s SpaceX Compute Deal, and Multimodal Lifelong Learning | UpNext AI – June 8, 2026

A quick catch-up on the AI stories shaping infrastructure, platforms, and real-world deployment. Today: Nvidia uses its Seoul trip to spotlight South Korea’s role in sovereign AI and robotics, TechCrunch reports Google is paying SpaceX $920 million per month for bridge compute capacity, and we look at a new paper on helping multimodal models learn new skills over time without full retraining.Covered stories:- Nvidia spotlights South Korea as a center of sovereign AI infrastructure, robotics, and AI factory buildout- TechCrunch reports Google will pay SpaceX $920 million per month for compute amid stronger-than-expected demand for AI products- Research: ProtoAda and the push to help multimodal models keep learning new vision-language skills over time- Simon Willison releases a MicroPython plus WebAssembly sandbox for running Python code more safely- Microsoft’s Xbox showcase mixes game announcements with more ambiguity around exclusives- A school shooting survivor sues an AI gun-detection firm after a system allegedly failed to detect a weapon- Perplexity unveils “Search as Code,” aiming to let models compose their own search pipelines instead of relying on fixed APIsSource links:- https://blogs.nvidia.com/blog/korea-ecosystem-2026/- https://techcrunch.com/2026/06/05/google-will-pay-spacex-920m-per-month-for-compute/- https://arxiv.org/abs/2606.02576v1- https://simonwillison.net/2026/Jun/6/micropython-in-a-sandbox/#atom-everything- https://www.theverge.com/entertainment/944191/xbox-games-showcase-2026-news-trailers- https://arstechnica.com/tech-policy/2026/06/school-shooting-survivor-sues-ai-gun-detection-firm-after-system-failed-to-spot-weapon/- https://the-decoder.com/perplexitys-search-as-code-lets-ai-models-write-their-own-search-pipelines-instead-of-calling-fixed-apis/

5 min

Jun 5, 2026Episode 26

Anthropic’s IPO Math, the NSA’s Mythos Use, and a Better Test for Medical Agents | UpNext AI – June 5, 2026

A quick Friday briefing on the AI stories that mattered most: Anthropic’s eye-popping revenue growth ahead of an IPO, a reported government cyber-use case involving Anthropic’s Mythos model, and a research paper arguing that medical agents need to be tested in step-by-step clinical environments instead of static quizzes.Covered in this episode:- Anthropic says annualized revenue crossed $47 billion in May, up from roughly $9 billion at the end of 2025, as it moves toward an IPO- The Financial Times reports the US National Security Agency is using Anthropic’s Mythos for cyber attacks- A new arXiv paper, ClinEnv, proposes a long-horizon electronic health record environment for evaluating medical agents- OpenAI publishes its public policy agenda focused on safety, youth protection, workforce transition, and global standards- StrictlyVC Los Angeles will spotlight defense tech, AI, and fundraising on June 18 at The Aerospace Corporation Campus- OpenAI also calls for global action on youth AI safety and proposes an international instituteSource links:- Anthropic / TechCrunch: https://techcrunch.com/2026/06/04/ahead-of-its-ipo-anthropics-daniela-amodei-shrugs-off-doubts-about-ais-returns/- FT on NSA and Anthropic Mythos: https://www.ft.com/content/d02d91b3-2636-454e-9442-dc7e69f51815- ClinEnv paper: https://arxiv.org/abs/2606.02568v1- OpenAI public policy agenda: https://openai.com/index/public-policy-agenda- StrictlyVC Los Angeles: https://techcrunch.com/2026/06/04/defense-tech-ai-and-fundraising-take-center-stage-at-strictlyvc-los-angeles-on-june-18/- OpenAI youth safety post: https://openai.com/index/advancing-youth-safety-and-opportunity-through-global-leadership

5 min

Jun 4, 2026Episode 25

Lovable’s 5x Google Cloud Deal, Anthropic’s IPO Move, and New Rules for AI Search | UpNext AI – June 4, 2026

A quick catch-up on today’s AI storylines: a reported infrastructure-and-model-access expansion between Lovable and Google Cloud, Anthropic’s confidential IPO filing, a research paper on AI for industrial reliability, and a set of policy and enterprise headlines that show how AI distribution and oversight keep widening.Covered in this episode:- Lovable reportedly signs an expanded multiyear deal with Google Cloud, growing its footprint 5x and widening access to Anthropic Claude and Google Gemini- Anthropic confidentially files for a U.S. IPO- A PLOS One paper looks at using AI to assess the reliability of coal-gasification equipment- U.K. regulators push Google to give publishers an AI Search opt-out- Endava says it is redesigning software delivery around AI agents, ChatGPT Enterprise, and Codex- Critics say a new Trump AI testing order leans too heavily on voluntary reviews- The U.K. banking regulator warns AI cyber risk is now near the top of the threat list for lendersSource links:- https://techcrunch.com/2026/06/03/lovable-signs-multi-year-deal-with-google-cloud-to-up-usage-5x-source-says/- https://www.reuters.com/video/watch/idRW517501062026RP1/- https://journals.plos.org/plosone/article?id=10.1371%2Fjournal.pone.0350454- https://techcrunch.com/2026/06/03/publishers-will-be-able-to-opt-out-of-ai-search-thanks-to-new-regulation/- https://openai.com/index/endava-frontiers- https://arstechnica.com/tech-policy/2026/06/trumps-ai-executive-order-may-not-prevent-dangerous-deployments/- https://www.ft.com/content/c5f7a9f0-d3d1-499c-aafb-ad03c85730bd

6 min

Jun 3, 2026Episode 24

Amazon’s MCP Gateway Push, Microsoft’s Agent Guardrails, and Google’s UK Search Ruling | UpNext AI – June 3, 2026

A quick catch-up on the day in AI: Amazon is expanding Bedrock AgentCore Gateway for enterprise MCP deployments, Microsoft is pushing a more portable way to govern agent behavior, and UK regulators are forcing Google to give publishers more control over AI Search.Covered stories:- Amazon extends MCP support in Bedrock AgentCore Gateway with dynamic listing, streaming, sessions, and delegated authentication- Microsoft introduces the Agent Control Specification for portable agent policy files and multi-step governance checks- Google must let publishers opt out of AI Search features under a UK CMA rule- Microsoft previews Project Solara, an Android-based OS concept built for agents instead of apps- Researchers propose neural safety filters for interactive robotics under uncertaintySource links:- Amazon Bedrock AgentCore Gateway: https://aws.amazon.com/blogs/machine-learning/extending-mcp-support-for-amazon-bedrock-agentcore-gateway-2/- Microsoft Agent Control Specification: https://techcrunch.com/2026/06/02/microsoft-offers-devs-a-better-way-to-control-ai-agent-behavior/- Google UK AI Search ruling: https://www.theverge.com/tech/942302/google-search-ai-overviews-uk-cma-publisher-opt-out- Microsoft Project Solara: https://arstechnica.com/gadgets/2026/06/microsofts-project-solara-is-an-android-os-designed-for-agents-instead-of-apps/- Robotics safety paper: https://arxiv.org/abs/2606.02562v1

5 min

Jun 2, 2026Episode 23

DuckDuckGo’s No-AI Search Push, Alphabet’s $80 Billion AI Buildout, and Bias in Multimodal AI Judges | UpNext AI – June 2, 2026

Today on UpNext AI: DuckDuckGo leans into demand for AI-free search, Alphabet moves to raise $80 billion for AI infrastructure, and a new paper examines how multimodal AI judges can get distracted by the wrong cues.Covered in this episode:- DuckDuckGo launches Chrome and Firefox extensions to make its no-AI search experience easier to set as default, as TechCrunch reports traffic to that experience is rising.- Alphabet says it plans to raise $80 billion to fund AI infrastructure and global compute, with demand reportedly exceeding available supply.- Researchers propose a way to reduce perceptual judgment bias in multimodal LLM-as-a-judge systems when images and text conflict.- GlobalData says autonomous AI agents are exposing the limits of traditional GUI-driven software workflows.- Google details how it used Gemini and other AI tools to help produce Google I/O 2026.- Nvidia used GTC Taipei to introduce new physical-AI offerings for robots, autonomous vehicles, and video systems.- Reporting highlighted by Simon Willison says attackers were able to use Meta’s AI support flow in an Instagram account takeover scenario.Sources:- https://techcrunch.com/2026/06/01/duckduckgo-makes-its-no-ai-search-engine-easier-to-access-as-its-traffic-booms/- https://techcrunch.com/2026/06/01/alphabet-plans-to-raise-80-billion-to-pay-for-ai-buildout/- https://arxiv.org/abs/2606.02578v1- https://fudzilla.com/116615-2- https://blog.google/innovation-and-ai/technology/ai/io-2026-google-ai/- https://the-decoder.com/nvidia-bets-big-on-physical-ai-at-gtc-taipei-with-a-new-world-model-driving-brain-and-open-humanoid-robot/- https://simonwillison.net/2026/Jun/1/hackers-simply-asked-meta-ai/#atom-everything

6 min

Jun 1, 2026Episode 22

Nvidia’s Local Agent Push, Intel’s Inference Chip Plan, and Neuromorphic AI Benchmarks | UpNext AI – June 1, 2026

Today on UpNext AI: Nvidia makes a broad push to bring personal AI agents onto RTX PCs and DGX Spark systems, Intel says it is targeting a new AI data-centre inference chip by year-end, and we dig into a research paper on benchmark datasets for spiking graph neural networks on neuromorphic hardware.Covered in this episode:- Nvidia unveils RTX Spark and expands local AI agent tooling across RTX PCs and DGX systems- Intel targets a new AI data-centre inference GPU by the end of the year- New npj Unconventional Computing paper builds smaller citation-network benchmarks for spiking graph neural networks on neuromorphic hardware- Anthropic details how it contains Claude across products- Report says AI search agents can confirm prior assumptions instead of actually researching the web- Financial Times reports Western AI models are helping sharpen Iran’s cyber operations- A broader jobs warning around the rise of AI agentsSources:- Nvidia: https://blogs.nvidia.com/blog/rtx-ai-garage-computex-spark-local-agents/- Financial Times on Intel: https://www.ft.com/content/3ca15070-c1c7-4ec2-9598-e36b7de47bc0- npj Unconventional Computing paper: https://www.nature.com/articles/s44335-026-00068-2- Simon Willison on Anthropic containment: https://simonwillison.net/2026/May/30/how-we-contain-claude/#atom-everything- The Decoder on AI search agents: https://the-decoder.com/ai-search-agents-often-confirm-what-they-already-know-instead-of-actually-researching-the-web/- Financial Times on Iran and ChatGPT: https://www.ft.com/content/4f18256e-a58f-4411-97e4-ac5e5eb055aa- Times Now on AI agents and jobs: https://www.timesnownews.com/technology-science/big-techs-ai-agent-dream-could-come-at-the-expense-of-millions-of-jobs-article-154432517

5 min

May 29, 2026Episode 21

H1’s Healthcare Data Bet, Anthropic’s $65bn Raise, and Self-Evolving Agent Skills | UpNext AI – May 29, 2026

A lighter but still revealing day in AI: healthcare data startup H1 lands fresh backing from CVS, Anthropic reportedly finalizes another massive funding round, and a new paper looks at how agents might get better by building and reusing their own skills over time.
In this episode:
- H1 secures $40 million from CVS Health Ventures, with CEO Ariel Katz arguing that unique doctor data is harder for AI to replicate than workflow SaaS.
- The Financial Times reports Anthropic finalized a $65 billion funding deal valuing the company at $965 billion including the new money.
- Earlier-this-week research: MUSE-Autoskill proposes a way for LLM agents to create, store, manage, and evaluate reusable skills instead of treating each task as a one-off.
- Headlines: Anthropic’s Opus 4.8 adds Dynamic Workflows; OpenAI publishes a Frontier Governance Framework; Simon Willison ships llm-anthropic 0.25.1 with Claude Opus 4.8 support; and The New York Times’ The Daily examines whether AI companions like ElliQ can help reduce loneliness.
Sources:
- TechCrunch on H1 and CVS: https://techcrunch.com/2026/05/28/h1-secures-40m-from-cvs-proving-saas-startups-can-still-attract-investment/
- Financial Times on Anthropic funding: https://www.ft.com/content/fd0aec4a-50d1-4594-b489-7420bd0b4268
- arXiv paper, MUSE-Autoskill: https://arxiv.org/abs/2605.27366v1
- TechCrunch on Anthropic Opus 4.8: https://techcrunch.com/2026/05/28/anthropic-releases-opus-4-8-with-new-dynamic-workflow-tool/
- OpenAI Frontier Governance Framework: https://openai.com/index/openai-frontier-governance-framework
- Simon Willison on llm-anthropic 0.25.1: https://simonwillison.net/2026/May/28/llm-anthropic/#atom-everything
- NYT The Daily, “Can A.I. Make People Feel Less Lonely?”: https://www.nytimes.com/2026/05/28/podcasts/the-daily/ai-robot-elderly-loneliness.html?

8 min

May 28, 2026Episode 20

Meta’s Subscription Push, Google’s AI Search Shift, and GPT-4 as a Deployment Blueprint | UpNext AI – May 28, 2026

Meta is rolling out paid subscriptions across Instagram, Facebook, and WhatsApp, while Google’s AI-first search experience is forcing brands to rethink visibility online. We also look at what a GPT-4 technical review still tells us about how frontier AI moved from research demo to real-world platform.
In this episode:
- Meta launches global subscription plans for Instagram, Facebook, and WhatsApp, and says more Meta One offerings are coming, including AI plans.
- Google’s AI-generated answers are now front and center in search, changing how brands get discovered.
- A review of the GPT-4 technical report highlights the shift from raw model scaling to reliability, safety, multimodal inputs, and deployment.
- Simon Willison argues Anthropic and OpenAI may have found product-market fit as enterprise AI bills rise.
- ElevenLabs releases Music v2, aimed at smoother genre shifts inside a single song.
- MarsLab outlines a Singapore-based AI inference infrastructure roadmap for enterprise and edge deployment.
- Ruanyun Edai introduces YeeZo, a platform aimed at lower-cost AI content production for creators, education, and short drama workflows.
Sources:
- Meta / TechCrunch: https://techcrunch.com/2026/05/27/meta-officially-launches-instagram-facebook-and-whatsapp-subscriptions-with-more-to-come-including-ai-plans/
- Google search shift / TechCrunch: https://techcrunch.com/video/google-just-broke-seo-heres-what-replaces-it/
- GPT-4 technical report review / freeCodeCamp: https://www.freecodecamp.org/news/ai-paper-review-gpt-4-technical-report/
- Simon Willison on product-market fit: https://simonwillison.net/2026/May/27/product-market-fit/#atom-everything
- ElevenLabs Music v2 / The Decoder: https://the-decoder.com/elevenlabs-music-v2-promises-opera-to-metal-transitions-without-losing-musical-coherence/
- MarsLab roadmap: https://sloveniatimes.com/47746/marslab-introduces-singapore-based-ai-inference-infrastructure-roadmap-for-enterprise-and-edge-deployment
- YeeZo platform: http

7 min

May 27, 2026Episode 19

AI Infra Decacorns, Search Backlash, and Better Ways to Grade AI Text | UpNext AI – May 27, 2026

A funding wave in AI infrastructure is turning the routing and inference layer into a story of its own, while users push back on Google’s AI-first vision for Search. Plus, a new paper argues many of the metrics we use to judge AI text can miss outright contradictions.
In this episode:
- AI infrastructure funding gets the spotlight as Latent Space frames Fireworks, Baseten, and OpenRouter as part of a new decacorn moment
- DuckDuckGo says installs jumped after Google’s AI Search overhaul, suggesting some users want more control over how much AI shows up in search
- A new arXiv paper, MATCHA, proposes a better way to evaluate model-generated text by rewarding semantic agreement and penalizing contradictions
- Forbes examines Anthropic’s publicly available Claude system prompt for handling mental health chats
- Simon Willison highlights Daniel Stenberg’s warning that curl is facing a surge of credible AI-assisted security reports
- The Financial Times reports that UK law firm Pinsent Masons was reprimanded by a court over an AI-related error
Sources:
- Latent Space: https://www.latent.space/p/ainews-new-ai-infra-decacorns-fireworks
- TechCrunch: https://techcrunch.com/2026/05/26/duckduckgo-installs-are-up-30-as-users-reject-being-force-fed-googles-ai-search/
- arXiv (MATCHA): https://arxiv.org/abs/2605.27345v1
- Forbes: https://www.forbes.com/sites/lanceeliot/2026/05/27/analysis-of-anthropic-claude-system-prompt-instruction-that-shapes-the-handling-of-ai-mental-health-chats/
- Simon Willison: https://simonwillison.net/2026/May/26/the-pressure/#atom-everything
- Financial Times: https://www.ft.com/content/5ba4690b-8b98-43b3-ba0b-f2ec5591a572

7 min

May 26, 2026Episode 18

All Model Labs Become Agent Labs, National-Security Oversight, and What Weak Supervision Really Buys You | UpNext AI – May 26, 2026

A catch-up edition after the long weekend: today we look at the industry shift from standalone models to full agent products, a governance proposal aimed at companies with national-security implications, a new research benchmark for weakly supervised anomaly detection, and a few headlines spanning the Vatican, Anthropic, and OpenAI’s Brazil news push.
Covered in this episode:
- Latent Space’s argument that model labs are becoming agent labs, with product value moving toward the model-plus-harness stack
- Financial Times reporting on a proposal for formal board-level oversight at companies such as Anthropic and SpaceX on national-security grounds
- A new arXiv benchmark, WSADBench, testing weakly supervised anomaly detection across multiple settings, modalities, and 36 algorithms
- Christopher Olah of Anthropic speaking at the launch of Pope Leo XIV’s AI encyclical
- Ars Technica reporting on Pope Leo’s call to “disarm” AI
- OpenAI’s content partnership with Grupo Folha and Grupo UOL to bring Brazilian journalism into ChatGPT with attribution and transparency
Sources:
- Latent Space: https://www.latent.space/p/ainews-all-model-labs-are-now-agent
- Financial Times: https://www.ft.com/content/b5dfdd31-ccc3-4f49-a166-3aa9f8621f12
- arXiv WSADBench paper: https://arxiv.org/abs/2605.26068v1
- The Decoder: https://the-decoder.com/at-the-launch-of-pope-leo-xivs-encyclical-anthropic-co-founder-says-ai-models-show-signs-of-introspection/
- Ars Technica: https://arstechnica.com/tech-policy/2026/05/citing-gandalf-pope-leo-says-we-must-disarm-ai/
- OpenAI: https://openai.com/index/grupo-folha-grupo-uol-partnership

21 min

May 23, 2026Episode 17

UpNext AI Deep Dive: World Models, Spatial Intelligence, and the Race to Teach AI Reality

In this deep-dive episode of UpNext AI, we explore the growing debate around world models: AI systems designed to predict and reason about how the world changes over time. Large language models made AI useful as a software and knowledge interface, but researchers like Yann LeCun and Fei-Fei Li argue that acting in the physical world requires something more: spatial understanding, prediction, planning, and a model of consequences.
We break down why world models are attracting major investment, how they differ from traditional robotics, why video models changed the conversation, and what recent research papers suggest about the path from passive observation to real-world action. We also look at the risks: unclear architectures, expensive data, reliability gaps, and the challenge of turning compelling research into durable businesses.
Sources and further reading
Interviews
Fei-Fei Li interview: https://youtu.be/wDeXfFQcJxk?si=9oxB3NWXZiqeuj1K
Yann LeCun interview: https://youtu.be/_PioN-CpOP0?si=K7RRD7BtfKpQ9cCI
Company and funding context
Reuters – Fei-Fei Li’s World Labs raises $1 billion in funding: https://www.reuters.com/business/ai-pioneer-fei-fei-lis-world-labs-raises-1-billion-funding-2026-02-18/
World Labs – funding announcement: https://www.worldlabs.ai/blog/funding-2026
TechCrunch – Yann LeCun’s AMI Labs raises $1.03 billion to build world models: https://techcrunch.com/2026/03/09/yann-lecuns-ami-labs-raises-1-03-billion-to-build-world-models/
Research papers
V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning: https://arxiv.org/abs/2506.09985
Humanoid World Models: Open World Foundation Models for Humanoid Robotics: https://arxiv.org/abs/2506.01182
GenCast: Probabilistic Weather Forecasting with Machine Learning: https://www.nature.com/articles/s41586-024-08252-9
WorldSimBench / Towards Video Generation Models as World Simulators: https://openreview.net/forum?id=ejGAytoWoe
Video models and robotics context
OpenAI – Video generation models as world simulators: https://openai.com/index/video-generation-models-as-world-simulators/
OpenAI – Sora: Creating video from text: https://openai.com/index/sora/
Boston Dynamics – Large Behavior Models and Atlas Find New Footing: https://bostondynamics.com/blog/large-behavior-models-atlas-find-new-footing/
Toyota Research Institute – AI-Powered Robot by Boston Dynamics and TRI takes key step toward general-purpose humanoids: https://www.tri.global/news/ai-powered-robot-boston-dynamics-and-toyota-research-institute-takes-key-step-towards-general
IEEE Spectrum – Boston Dynamics Atlas Learns From Large Behavior Models: https://spectrum.ieee.org/b

7 min

May 22, 2026Episode 16

The Pentagon’s AI Bake-Off, Agent-Scale Computing, and Safety-First AI Models | UpNext AI – May 22, 2026

The U.S. Department of Defense is reportedly testing competing frontier AI models as it evaluates alternatives to Anthropic’s Claude. Bloomberg reports that a group of Pentagon “power users” is comparing models in real operational workflows, highlighting a broader shift from benchmark-driven competition to real-world evaluation focused on reliability, mission fit, security, and deployment requirements. For AI vendors, winning enterprise and government adoption increasingly depends on performance in production environments rather than leaderboard rankings alone. Meanwhile, agent infrastructure startup Daytona argues that AI agents need something beyond model APIs: actual computers to operate. In a Latent Space interview, CEO Ivan Burazin said the company has experienced rapid growth as coding agents, evaluation systems, and reinforcement learning workloads increasingly require isolated, stateful environments. The broader trend is clear: a new infrastructure layer is emerging between foundation models and applications, designed specifically for autonomous agents and long-running workflows.
In research, we examine a study in Scientific Reports exploring AI-based safety forecasting for extreme cold exposure. Researchers developed an LSTM model to predict toe skin temperature in mountaineering conditions and introduced a metric called Duration of Safe Exposure. Rather than optimizing only for prediction accuracy, the system was designed to minimize dangerous forecasting errors where risk could be underestimated. The work highlights a growing theme across applied AI: success is increasingly measured by safety and decision quality, not just average model performance.
In the headlines: President Trump delays an executive order that would have expanded government evaluation of advanced AI models before release, Amazon Bedrock adds request-level AI usage attribution for enterprise cost tracking and governance, Google continues rolling out Gemini, Search, and smart-glasses initiatives following I/O 2026, and Anker introduces its first earbuds powered by an in-house AI audio chip for enhanced noise reduction and voice processing.
Sources
Bloomberg – Pentagon tests rival AI models as alternatives to Anthropic https://www.bloomberg.com/news/articles/2026-05-21/pentagon-tests-rival-ai-models-in-race-to-replace-anthropic
Latent Space – Giving Agents Computers (Ivan Burazin, Daytona) https://www.latent.space/p/daytona
Nature Scientific Reports – LSTM-based safety-oriented prediction of toe skin temperature in extreme cold conditions https://www.nature.com/articles/s41598-026-52990-x
TechCrunch – Trump delays AI security executive order https://techcrunch.com/2026/05/21/trump-delays-ai-security-executive-order-i-dont-want-to-get-in-the-way-of-that-leading/
AWS – Amazon Bedrock request-level usage attribution https://aws.amazon.com/about-aws/whats-new/2026/0

8 min

May 21, 2026Episode 15

SpaceX’s $2.8B AI Power Bet, OpenAI’s Math Breakthrough, and the Rise of Medical AI Agents | UpNext AI – May 21, 2026

The AI race is increasingly becoming an infrastructure race. WIRED reports that SpaceX has committed more than $2.8 billion toward gas turbines to power AI data centers supporting Elon Musk’s xAI ambitions. According to the report, the company is rapidly expanding capacity as demand for AI compute collides with power grid constraints, highlighting that access to electricity may be as important as access to GPUs in the next phase of AI competition. Meanwhile, OpenAI claims one of its reasoning models has produced a proof that disproves a geometry conjecture dating back to 1946. TechCrunch reports that mathematicians who previously criticized OpenAI’s earlier math-related claims now support the validity of the new result, potentially marking one of the strongest demonstrations yet of AI reasoning on open-ended scientific and mathematical problems.
In research, we examine a paper in Eye exploring whether AI agents could transform ophthalmology. Rather than replacing clinicians, the authors argue that agent-based systems may help integrate patient history, imaging, diagnostic information, and clinical workflows into a more coordinated decision-support process. The paper highlights a growing trend in healthcare AI: using agents to orchestrate complex information rather than simply generate answers.
In the headlines: TechCrunch reports that Anthropic will pay xAI approximately $1.25 billion per month for compute capacity under a multi-year agreement, Forbes argues that enterprises should focus on the cost of completed work rather than token pricing alone, Bloomberg Opinion examines how the AI boom is reshaping elite computer science culture, and Stability AI launches Stable Audio 3.0 with open weights and support for audio generation up to six minutes in length.
Sources
WIRED – SpaceX spending billions on AI data center power infrastructure https://www.wired.com/story/elon-musk-spacex-spending-gas-turbines-grok/
TechCrunch – OpenAI claims AI solved an 80-year-old math problem https://techcrunch.com/2026/05/20/openai-claims-it-solved-an-80-year-old-math-problem-for-real-this-time/
Nature Eye – AI agents in ophthalmology https://www.nature.com/articles/s41433-026-04543-9
TechCrunch – Anthropic to pay xAI for compute capacity https://techcrunch.com/2026/05/20/anthropic-will-pay-xai-1-25-billion-per-month-for-compute/
Bloomberg Opinion – The AI boom and Stanford culture https://www.bloomberg.com/opinion/articles/2026-05-20/how-to-rule-the-world-book-says-stanford-rewards-tech-s-worst-instincts
Forbes – Tokenomics and the cost of AI work https://www.forbes.com/sites/sanjaysrivastava/2026/05/20/tokenomics-101-cost-of-getting-work-done-not-the-cost-of-tokens/
The Decoder – Stability AI launches Stable Audio 3.0 https://the-decoder.com/stability-ai-launches-stable-audio-3-0-with-up-t

8 min

May 20, 2026Episode 14

Google’s Gemini Wearables Push, OpenAI’s National AI Strategy, and AI Research Agents | UpNext AI – May 20, 2026

Google is reportedly preparing new smart glasses and deeper AI agent integration inside Search as it pushes Gemini into core consumer products. The Financial Times reports Sundar Pichai framed the effort as part of Google’s broader competition with OpenAI and Anthropic. The larger takeaway is that Google increasingly sees AI not as a standalone chatbot product, but as a layer spanning search, wearables, and everyday computing workflows. Meanwhile, OpenAI announced “OpenAI for Singapore,” a multi-year partnership focused on AI deployment, workforce development, and public-sector integration. The move reflects a broader industry trend: frontier AI companies are increasingly competing to become embedded at the national infrastructure level, not just through APIs and consumer apps.
In research, we look at Robin, a multi-agent scientific discovery system published in Nature. The researchers describe a coordinated AI workflow capable of literature review, hypothesis generation, experiment planning, and result interpretation. In experimental biology applications, the system identified potential therapeutic candidates for dry age-related macular degeneration and proposed follow-up experimental directions. The broader implication is that AI systems are beginning to function less like isolated copilots and more like coordinated research collaborators.
In the headlines: OpenAI expands its Education for Countries initiative, TechCrunch argues Google Search is evolving from a list of links into an AI-native interface, and SandboxAQ partners with Anthropic to bring scientific reasoning systems into Claude for drug discovery and materials science workflows.
Sources
Financial Times – Google smart glasses and AI search agents https://www.ft.com/content/c47ab51e-2521-4ccb-9de5-a2b03791981a
OpenAI – OpenAI for Singapore https://openai.com/index/introducing-openai-for-singapore
Nature – A multi-agent system for automating scientific discovery https://www.nature.com/articles/s41586-026-10652-y
OpenAI – Education for Countries https://openai.com/index/the-next-phase-of-education-for-countries
TechCrunch – Google Search as you know it is over https://techcrunch.com/2026/05/19/google-search-as-you-know-it-is-over/
India Today – SandboxAQ and Anthropic partnership https://www.indiatoday.in/technology/news/story/after-mythos-claude-enters-drug-discovery-race-with-ex-google-ceo-startup-help-2913863-2026-05-19

6 min

May 19, 2026Episode 13

Google’s AI Data Center Expansion, OpenAI’s Legal Win, and Cheaper Medical AI Benchmarking | UpNext AI – May 19, 2026

Google is reportedly deepening its AI infrastructure push through a partnership tied to a Blackstone-backed cloud group and a planned $5 billion investment expected to bring 500 megawatts of new data center capacity online next year. The story highlights how the frontier AI race is increasingly constrained not just by models, but by physical infrastructure: power, chips, and large-scale compute deployment. Meanwhile, Elon Musk lost his lawsuit against OpenAI after a jury unanimously concluded he waited too long to bring the case. Ars Technica reports the suit accused OpenAI and Sam Altman of abandoning the organization’s original nonprofit mission, but the court ruled the claims fell outside the statute of limitations. Musk plans to appeal.
In research, we examine a new paper in npj Digital Medicine exploring adaptive testing methods for evaluating large language models in healthcare. The researchers found they could preserve benchmark rankings while dramatically reducing evaluation cost, runtime, and token usage, potentially making continuous evaluation much more practical for regulated AI systems.
In the headlines: Forbes examines the benefits and risks of AI-powered cybersecurity systems, and Anthropic’s reported acquisition of Stainless points to a growing battle over AI infrastructure tooling and developer ecosystem control.
Sources
Financial Times – Google AI infrastructure expansion https://www.ft.com/content/5730b605-8fb2-4973-a188-b4a587ce3580
Ars Technica – Elon Musk loses OpenAI lawsuit https://arstechnica.com/tech-policy/2026/05/elon-musk-loses-trial-accusing-sam-altman-openai-of-stealing-a-charity/
Nature – Adaptive LLM evaluation in healthcare https://www.nature.com/articles/s41746-026-02671-w
Forbes – AI cybersecurity risks and benefits https://www.forbes.com/sites/chuckbrooks/2026/05/18/5-benefits-and-risks-of-using-ai-for-cybersecurity/
Forbes – Anthropic and Stainless https://www.forbes.com/sites/sandycarter/2026/05/18/anthropic-buys-stainless-to-cut-off-openai-and-google-sdk-access/

7 min

May 18, 2026Episode 12

OpenAI’s Personal Finance Platform, Nectar Social’s $30M Round, and Portable Enterprise AI | UpNext AI – May 18, 2026

OpenAI is expanding ChatGPT into personal finance, launching tools that let U.S. Pro users connect bank and financial accounts through Plaid. The company says users will be able to view portfolio performance, subscriptions, spending activity, and upcoming payments directly inside ChatGPT, another step toward AI systems acting less like standalone chatbots and more like operational control panels for everyday workflows. Meanwhile, AI-powered marketing platform Nectar Social has raised a $30 million Series A led by Menlo Ventures and the Anthology Fund created alongside Anthropic. The company positions itself as an “agentic operating system” for marketing teams, combining moderation, creator workflows, commerce conversations, and competitive intelligence into a unified AI workflow platform.
In research and infrastructure, we look at Giotto.ai’s push for portable enterprise AI reasoning systems. The company says its platform can run advanced AI workloads across cloud, workstation, and on-premise environments, including single-GPU deployments. The broader trend is increasingly clear: enterprise buyers are starting to prioritize control, sovereignty, latency, and deployment flexibility alongside raw model capability.
In the headlines: Microsoft retires Teams’ Together Mode, the UK Government Digital Service weighs in on the NHS open-source debate, and Simon Willison highlights a new Datasette plugin for enforcing per-user LLM spending limits.
Sources
TechCrunch – ChatGPT personal finance tools https://techcrunch.com/2026/05/15/openai-launches-chatgpt-for-personal-finance-will-let-you-connect-bank-accounts/
TechCrunch – Nectar Social funding round https://techcrunch.com/2026/05/16/marketing-operating-system-nectar-social-raises-30m-series-a-in-round-led-by-menlo/
FinanzNachrichten – Giotto.ai portable enterprise AI https://www.finanznachrichten.de/nachrichten-2026-05/68520127-dynamics-group-ag-giotto-ai-launches-portable-ai-for-enterprises-advanced-reasoning-from-cloud-to-workstation-023.htm
The Verge – Microsoft retires Together Mode https://www.theverge.com/tech/932215/microsoft-teams-together-mode
Simon Willison – NHS open-source discussion https://simonwillison.net/2026/May/17/gds-weighs-in/#atom-everything
Simon Willison – datasette-llm-limits https://simonwillison.net/2026/May/15/datasette-llm-limits/#atom-everything

7 min

May 15, 2026Episode 11

OpenAI’s Supply-Chain Security Scare, Anthropic’s Mega-Round, and AI Attack Coverage Gaps | UpNext AI – May 15, 2026

OpenAI says hackers accessed some internal data following a code security incident tied to the open-source software supply chain. The company told TechCrunch the breach was limited to employee devices and a small subset of internal repositories, with no impact on production systems, user data, or model intellectual property. The incident is another reminder that frontier AI labs remain deeply dependent on conventional software infrastructure and operational security. Meanwhile, the Financial Times reports Anthropic has agreed terms on a $30 billion funding round at a $900 billion valuation. If finalized, the deal would further reinforce how aggressively investors continue backing frontier AI companies as infrastructure-scale platform businesses rather than traditional software startups.
In research, we look at Talk is (Not) Cheap, a new paper examining whether existing LLM attack benchmarks actually cover the broader model threat landscape. The authors argue many popular evaluation frameworks repeatedly test similar failure modes while leaving major categories of attacks only weakly explored or completely untested.
In the headlines: Martha Stewart launches an AI-powered home management startup, OpenAI brings Codex into the ChatGPT mobile app, and AWS adds new agentic coding and lightweight reasoning models to SageMaker JumpStart.
Sources
TechCrunch – OpenAI security incident https://techcrunch.com/2026/05/14/openai-says-hackers-stole-some-data-after-latest-code-security-issue/
Financial Times – Anthropic funding round https://www.ft.com/content/9deae3c6-716d-4f4d-8b09-434d8519f847
arXiv – Talk is (Not) Cheap https://arxiv.org/abs/2605.15118v1
Fast Company – Martha Stewart AI startup https://www.fastcompany.com/91542596/martha-stewarts-new-ai-startup-a-good-thing?utm_source=postup&utm_medium=email&utm_campaign=technology&position=1&partner=newsletter&campaign_date=05152026
The Verge – Codex in ChatGPT mobile app https://www.theverge.com/ai-artificial-intelligence/930763/openai-codex-chatgpt-ios-android-app-preview
AWS – New models in SageMaker JumpStart https://aws.amazon.com/about-aws/whats-new/2026/05/agentic-reasoning-models-on-sagemaker-jumpstart/

7 min

May 14, 2026Episode 10

Mistral’s Cybersecurity Model, Microsoft’s Grid-Scale AI, and the Next LLM Bottleneck | UpNext AI – May 14, 2026

Mistral is reportedly developing a cybersecurity-focused AI model for European banks, positioning it as an alternative to Anthropic’s restricted-access Mythos system. The story highlights a growing shift in AI infrastructure markets: access, regional control, and deployment flexibility are becoming strategic differentiators alongside raw model capability. Meanwhile, Microsoft Research introduced GridSFM, a foundation model for electric grid optimization designed to predict AC optimal power flow in milliseconds. Microsoft says grid congestion and dispatch inefficiencies can contribute to as much as $20 billion annually in congestion-related costs, underscoring how AI is increasingly moving into critical physical infrastructure and industrial systems.
In research, we look at KVServe, a new system for compressing KV cache traffic in distributed LLM serving environments. The paper focuses on one of the biggest practical bottlenecks in modern AI infrastructure: efficiently moving inference state across large-scale production systems. The broader takeaway is that a growing share of AI progress now comes from systems engineering and serving efficiency, not just larger models.
In the headlines: Anthropic launches Claude for Small Business with workflow integrations for tools like QuickBooks and PayPal, AWS expands native Claude Platform availability through AWS accounts, and Simon Willison highlights growing skepticism around vague “AI agent” marketing language.
Sources
Bloomberg – Mistral cybersecurity model for banks https://www.bloomberg.com/news/articles/2026-05-13/mistral-developing-new-ai-model-for-banks-lacking-mythos-access
Microsoft Research – GridSFM https://www.microsoft.com/en-us/research/blog/gridsfm-a-new-small-foundation-model-for-the-electric-grid/
arXiv – KVServe https://arxiv.org/abs/2605.13734v1
The Decoder – Claude for Small Business https://the-decoder.com/anthropic-launches-claude-for-small-business-to-embed-ai-into-the-tools-you-forgot-you-pay-for/
Simon Willison – “11 AI agents” commentary https://simonwillison.net/2026/May/13/boris-mann/#atom-everything
AWS – Claude Platform on AWS https://aws.amazon.com/blogs/machine-learning/introducing-claude-platform-on-aws-anthropics-native-platform-through-your-aws-account/

8 min

May 13, 2026Episode 9

AI Funding Momentum, Materials Science Models, and Persistent Agent Memory | UpNext AI – May 13, 2026

Kevin Hartz’s venture firm A* has closed a new $450 million fund, reinforcing that major venture capital continues flowing into AI startups despite broader uncertainty around model cycles and platform competition. The firm says it plans to back companies across AI applications, infrastructure, healthcare, fintech, and security. Meanwhile, Microsoft Research published a major update to MatterSim, its AI system for materials science. The company says the platform now supports faster simulation, experimental synthesis validation, and new multi-task modeling capabilities designed to move AI-assisted scientific discovery closer to practical research workflows.
In research, we look at MEME (Multi-entity & Evolving Memory Evaluation), a new benchmark examining whether AI agents can reliably remember, update, and reason across long-running interactions. The results suggest current agent memory systems remain fragile, especially when facts evolve or depend on one another over time.
In the headlines: Meta tests deeper AI integration inside Threads, OpenAI highlights AI-assisted research workflows through Parameter Golf, Simon Willison explores new OpenAI reasoning APIs and secure sandbox tooling, and Amazon continues to leave the door open to future AI-focused hardware experiments.
Sources
TechCrunch – A* closes $450M fund https://techcrunch.com/2026/05/12/kevin-hartzs-a-just-closed-its-third-fund-with-450-million/
Microsoft Research – MatterSim update https://www.microsoft.com/en-us/research/blog/advancing-ai-for-materials-with-mattersim-experimental-synthesis-faster-simulation-and-multi-task-models/
arXiv – MEME benchmark https://arxiv.org/abs/2605.12477v1
The Verge – Meta AI on Threads https://www.theverge.com/tech/929091/meta-ai-threads-account-block
Simon Willison – LLM 0.32a2 / OpenAI responses API https://simonwillison.net/2026/May/12/llm/#atom-everything
OpenAI – Parameter Golf https://openai.com/index/what-parameter-golf-taught-us
Simon Willison – CSP Allow-list Experiment https://simonwillison.net/2026/May/13/csp-allow/#atom-everything
The Verge – Amazon AI phone rumors https://www.theverge.com/tech/929412/amazon-panos-panay-interview-phone-transformer