Justin Brodley, Jonathan Baker, Ryan Lucas and Matt Kohn
The Cloud Pod is your one-stop shop for all things public, hybrid, multi-cloud, and private cloud. Cloud providers continue to accelerate with new features, capabilities, and changes to their APIs. Let Justin, Jonathan, Ryan, and Matt help you navigate this changing cloud landscape via our weekly podcast.
16h ago
Welcome to episode 334 of The Cloud Pod, where the forecast is always cloudy! This week, we’re bringing you a jam-packed recap of re:Invent! We’ve got all the news, from keynotes to announcements. Whether you were there live or catching up on all the news, Justin, Matt, and Ryan are here to break it all down. Let’s get started!

Titles we almost went with this week
- EKS Gets Chatty: Natural Language Replaces Command Line Nightmares
- Harvest Now, Decrypt Later: Why Your RSA Keys Need a Quantum Makeover Before 2026
- NAT So Fast: AWS Helps You Find Gateways Doing Absolutely Nothing
- AWS Finally Admits You Have Too Many Log Buckets
- AWS Finally Lets You Log In Like a Normal Human
- Lambda Gets a Memory: Checkpoint Your Way to Multi-Step Workflows
- Step Functions at Home: Lambda Durable Functions Let You Write Workflows in Actual Code
- No More Bucket List: S3 Public Access Gets Organization-Wide Lockdown
- AWS Hits Ctrl-Z on CodeCommit Deprecation
- AWS Puts a Cap on CloudFront: Unlimited Traffic, Limited Anxiety
- AWS Tells SQL Server to Take a Thread Off: Optimize CPU Cuts Costs by 55%
- Amazon Bedrock Gets a Bouncer: AgentCore Identity Checks IDs at the Door
- AI Brings on the Developer Renaissance

Follow Up

01:27 re:Invent
Matt Garman – his 14th re:Invent, which is weird, since we’ve been doing cloud stuff for 87 years…
Werner – “Open mind for a different view” and a “Nothing Else Matters” T-shirt.

02:59 re:Invent predictions
Jonathan
- Serverless GPU support (an extension to Lambda or a different service); it’s about time we have a serverless GPU/inference capability. – It was talked about in the DeSantis keynote.
- An AI agent with a goal/instructions that can run when needed, periodically, or always, and perform an action (an agentic platform that runs agents). – Garman announced Bedrock AgentCore and the Kiro Autonomous Agent.
- Werner will announce this is his last keynote and he will retire. – He retired from re:Invent presentations.

Ryan
- New Trainium 3, Inferentia, and Graviton chips. – Garman announced Trainium 3 UltraServers. They brought the rack.
- Expand the number of models in (or via) Bedrock. – Doubled the number of models and announced Gemma, MiniMax M2, NVIDIA Nemotron, Mistral Large 3, and Ministral 3.
- Refresh to AWS Organizations.

Justin
- New Nova model & Sonic with multi-modal. – Garman announced Nova 2 – Lite, Pro, and Sonic (the lack of a Sonic the Hedgehog/Sega reference is a shame) – plus Nova 2 Omni.
- Announce a partnership with OpenAI (likely on stage). – Not announced as new, but they said OpenAI is running on AWS and that EC2 UltraServers are in use.
- Advanced agentic AI capabilities for Security Hub (automate the SOC teams). – Garman announced advanced agentic AI capabilities for Security Hub, with the NEW AWS Security Agent.

Matt
- A model router to route LLM queries to different AI models.
- Well-Architected Framework expansion.
- End-user authentication that doesn’t suck (not current Cognito).

Tie Breaker – How many times will they say AI or Artificial Intelligence?
Guesses – Matt: 200, Justin: 160, Ryan: 99, Jonathan: 1.
Actual – Matt Garman’s keynote: 77, DeSantis’ keynote: 31, Swami’s: 44, Werner’s: 31. Total: 183.
This means Justin wins this year!
10:05 Honorable Mentions:
- Mathematical proof that one of Amazon’s models has output that can be verified with math
- Marketplace for AI work
- New device to go along with the Nova models
- Cost savings for networking
- FinOps AI recommender for model usage
- Savings Plans for AI/Bedrock models
- S3 Vectors with Bedrock integration
- FinOps Kubernetes service
- Q Developer with autonomous agents
- Next-generation silicon for a combined TPU competitor, i.e. GPU/Graviton/training
- Bedrock model marketplace with revenue share for fine-tuned models (Ryan)
- Sustainability dashboard
- AI features for Aurora/DSQL

AWS

11:59 re:Invent Keynote Recap
- Matt – started the weekend strong, although we struggled with his keynotes. (Sounds like he could use a good copywriter to help with his speeches.)
- Swami – Solid B from us, but that’s because we’re not super interested in his topics. Sorry.
- Peter – we enjoyed this one more. Cool tech, lots of mentions, and one of the better presenters. An A for him.
- Werner – Great intro video. Welcome to the Renaissance Coder.

15:00 A Quick Recap Look
We know you care about non-AI things (and so do we), so we’re going to do 25 exciting new announcements in 10 minutes: X8aedz (“Elon”) instances, C8a and C8i instances, M8azn instances, M3 and M4 Max Macs, Lambda Durable Functions, 50 TB S3 objects, S3 Batch Operations 10x faster, Intelligent-Tiering for S3 Tables, automatic replication for S3 Tables, S3 Access Points for FSx for NetApp ONTAP, S3 Vectors, GPU index for Amazon OpenSearch, Amazon EMR Serverless with no storage provisioning, GuardDuty for ECS & EC2, Security Hub GA, unified data store in CloudWatch, increased storage for SQL Server and Oracle RDS, Optimize CPUs for RDS for SQL Server, SQL Server Developer Edition support, and Database Savings Plans. Two hours on AI… when we would have been really happy with all of THIS as the keynote.
26:08 AI/ML & Amazon Bedrock
- Bedrock Service Tiers (Priority/Standard/Flex) – Match AI workload performance with cost
- Bedrock Reserved Service Tier – Pre-purchase guaranteed tokens-per-minute capacity with 99.5% SLA
- Bedrock AgentCore – Policy controls, evaluations, episodic memory for AI agents
- Bedrock Reinforcement Fine-tuning – RLVR and RLAIF for model customization
- Amazon Nova 2 Lite – Fast, cost-effective reasoning model with configurable thinking
- Nova Forge – Build your own foundational models
- 18 New Open Weight Models – Mistral Large 3, Ministral 3 variants, others
- Amazon Q Developer Cost Management – Natural language queries for AWS spending analysis
- SageMaker Serverless Customization – Automated infrastructure for fine-tuning
- SageMaker HyperPod – Checkpointless and elastic training capabilities
- AWS Clean Rooms ML – Privacy-enhancing synthetic dataset generation
- AgentCore Evaluations – Continuously inspect agent quality based on real-world behavior

29:09 Ryan – “I do agree with you that no one should be building their own foundational models unless it’s really, truly built on a data set that’s unique, but I do think that everyone should go through the exercise of building a model to understand how AI works.”

30:58 Compute (EC2 & Lambda)
- EC2 P6-B300 Instances – NVIDIA Blackwell Ultra GPUs, 6.4 Tbps networking
- EC2 X8aedz Instances – AMD EPYC 5 GHz, memory-optimized for EDA/databases (X Æ A-Xii Musk)
- EC2 C8a Instances – AMD EPYC Turin, 30% higher compute performance
- EC2 M9g Instances – Graviton5 powered, 25% better than Graviton4
- Graviton5 Processor – 192 cores, 5x larger cache
- Lambda Tenant Isolation Mode – Built-in multi-tenant separation
- Lambda Managed Instances – Run Lambda on your EC2 with AWS management
- Lambda Durable Functions – Multi-step workflows with automatic state management
- AWS AI Factories – Cloud-scale AI infrastructure in customer data centers

33:46 Matt – “I feel like we should have seen this coming, given that they just released the ECS management system a couple of months ago, and it feels like the next step.”

42:24 Containers (EKS & ECS)
- EKS Capabilities – Managed Argo CD, ACK, KRO in AWS-owned infrastructure
- EKS MCP Server – Natural language Kubernetes management (preview)
- EKS Container Network Observability – Service maps, flow tables, performance metrics
- EKS/ECS Amazon Q Troubleshooting – AI-powered console diagnostics
- ECS Express Mode – Simplified deployment with automatic ALB, domains, HTTPS

43:36 Ryan – “I think this is what I’ve always wanted Beanstalk and Lightsail to be, is this service. This, for me, feels like the best of both worlds.”

45:34 Networking & Content Delivery
- CloudFront Flat-Rate Pricing – Bundled delivery, WAF, DDoS protection ($0-$1K/month tiers)
- VPN Concentrator – 25-100 low-bandwidth sites via a single Transit Gateway attachment
- Route 53 Accelerated Recovery – 60-minute RTO for DNS during regional outages
- Route 53 Global Resolver (preview) – Anycast DNS for remote/distributed clients
- NAT Gateway Regional Availability – Auto-scale across AZs, simplified management
- VPC Encryption Controls – Enforce encryption in transit within/across VPCs
- Network Firewall Proxy (preview) – Explicit proxy for outbound traffic filtering

50:29 Ryan – “If you’ve ever had to do any kind of compliance evidence, that’s the reason why this exists and that’s why I love it so much.
The song and dance that you have to do to illustrate your use of encryption across your environment is painful.”

53:14 Storage (S3 & FSx)
- S3 Vectors GA – Native vector support, 2B vectors/index, 20T vectors/bucket
- S3 Tables Replication & Intelligent-Tiering – Cross-region/account Iceberg replication
- S3 Storage Lens Enhancements – Performance metrics, billions of prefixes, S3 Tables export
- S3 Encryption Controls – Bucket-level encryption type enforcement
- S3 Block Public Access – Organization-level enforcement
- S3 50 TB Object Size – 10x increase from the previous 5 TB limit
- FSx for NetApp ONTAP S3 Access Points – Access file data via the S3 API

54:38 Matt – “This is just a nice quality of life improvement.”

58:24 Databases
- Aurora DSQL Cost Estimates – Statement-level DPU usage in query plans
- Aurora PostgreSQL Dynamic Data Masking – pg_columnmask extension
- OpenSearch 3.3 – Agentic search, semantic highlighter improvements
- OpenSearch GPU Acceleration – 6-14x faster vector indexing
- RDS SQL Server/Oracle Optimizations – Free Developer Edition, 256 TiB storage, CPU optimization
- RDS SQL Server Resource Governor – Workload resource control
- Database Savings Plans – Up to 35% savings across 9 database services

1:01:01 Justin – “This is quite nice, and quite broad, so they definitely heard all of the community saying please bring us database savings plans.”

1:03:33 Security & Identity
- Security Hub GA – Near real-time analytics, risk prioritization, Trends feature
- Secrets Manager External Secrets – Managed rotation for Salesforce, Snowflake, BigID
- IAM Outbound Identity Federation – Short-lived JWTs for external service authentication
- AWS login CLI Command – Eliminate long-term access keys with OAuth 2.0
- WAF Web Bot Auth – Cryptographic signature verification for legitimate AI agents
- AgentCore Identity
- GuardDuty Extended Threat Detection – EC2/ECS multistage attack correlation
- AWS Security Agent (preview) – AI-powered security reviews, code scanning, pen testing
- IAM Policy Autopilot – Open source MCP server for generating IAM policies from code

1:08:18 Matt – “…it’s definitely competing with Azure releasing the same thing during their conference. The piece I like about this is the pen test piece because it now lives in your source code, which you probably already have in SCA or a static code analysis tool.”

1:11:46 Cost Management & FinOps
- Cost Explorer 18-Month Forecasting – Extended from 12 months to 18 months, explainable with AI (in preview)
- Cost Efficiency Metric – Single percentage score combining optimization opportunities
- AWS Data Exports FOCUS 1.2 – Standardized multi-cloud billing format
- Billing Transfer – Centralized billing across multiple Organizations
- Compute Optimizer NAT Gateway Recommendations – Identify unused NAT Gateways

1:14:09 Developer Tools & Modernization
- Step Functions Local Testing – TestState API with mocking support
- AWS Transform Custom – AI-powered code modernization (Java, Node.js, Python)
- AWS Transform Mainframe – COBOL to microservices with automated testing
- API Gateway Developer Portals – Native API discovery and documentation
- CodeCommit Restored to GA – Git LFS (Q1 2026), regional expansion (Q3 2026)
- AWS Transform Windows – Full-stack .NET/SQL Server modernization
1:18:09 Matt – “I mean, I hope all customers have some sort of plan, knowing that I’ve seen many companies say ‘we got this notice six months ago, we’ll deal with it in six months’ and now it’s three weeks and six days, and it expires tomorrow…there’s probably a lot of customers still there.”

1:20:58 Observability & Monitoring
- CloudWatch Unified Data Management – Consolidated ops/security/compliance logs
- CloudWatch Deletion Protection – Prevent accidental log group removal
- CloudWatch Network Flow Monitor – Container network observability for EKS

1:21:39 Governance & Management
- Control Tower Controls Dedicated – Use managed controls without a full landing zone
- Service Quotas Automatic Management – Auto-adjust limits based on usage
- Supplementary Packages for Amazon Linux – Pre-built EPEL9 packages
- AMI Ancestry – Automatic lineage tracking for AMIs

1:23:05 Matt – “I’ve built three different ways to do this in my career. You always want to know where it came from, so if there’s a vulnerability, you know where to start patching and go up from there…but if you have multiple teams, it’s hard to track. So knowing I can track it is a godsend.”

1:25:35 DevOps & Operations
- AWS DevOps Agent (preview) – Autonomous incident investigation and root cause analysis
- AWS Support Plan Restructure – Business Support+ ($29/mo), Enterprise ($5K/mo), Unified Ops ($50K/mo)

1:26:41 Ryan – “I hope this ends up being a decent service, but in my head I’m thinking they’re lowering the cost because they’re getting rid of all their support staff.”

1:29:29 Marketplace & Partner
- Partner Central in Console – Unified customer/partner experience
- Multi-Product Solutions – Bundled offerings from multiple vendors
- CrowdStrike Falcon Integration – Automated SIEM setup wizard

1:30:15 Connectivity & Contact Center
- Amazon Connect Predictive Insights (preview) – AI-powered recommendations
- Amazon Connect MCP Support – Standardized tools for AI agents

Notable Announcements We Didn’t Cover in the Show:
- AWS announces flat-rate pricing plans for website delivery and security
- Accelerate workflow development with enhanced local testing in AWS Step Functions
- Streamlined multi-tenant application development with tenant isolation mode in AWS Lambda
- AWS Control Tower introduces a Controls Dedicated experience
- Monitor network performance and traffic across your EKS clusters with Container Network Observability
- New AWS Billing Transfer for centrally managing AWS billing and costs across multiple organizations
- AWS Cost Explorer now provides 18-month forecasting and explainable AI-powered forecasts
- Announcing enhanced cost management capabilities in Amazon Q Developer
- Simplify access to external services using AWS IAM Outbound Identity Federation
- Introducing AWS Glue 5.1
- Tech predictions for 2026 and beyond | All Things Distributed
- Introducing multi-product solutions in AWS Marketplace

Closing
And that is the week in the cloud! Visit our website, the home of The Cloud Pod, where you can join our newsletter, Slack team, send feedback, or ask questions at theCloudPod.net or tweet at us with the hashtag #theCloudPod
Dec 10
Welcome to episode 333 of The Cloud Pod, where the forecast is always cloudy! Justin, Ryan, and Matt are taking a quick break from re:Invent festivities to bring you the latest and greatest in Cloud and AI news. This week, we discuss NORAD and Anthropic teaming up to bring you Christmas cheer. Wait, is that right? Huh. We also have undersea cables, some Turkish region delight, and a LOT of Opus 4.5 news. Let’s get into it!

Titles we almost went with this week
- Boring Error Pages Not Found
- Claude Goes Native in Snowflake: Finally, AI That Stays Where Your Data Lives
- Cross-Cloud Romance: AWS and Google Make It Official with Interconnect
- Google Gemini Puts OpenAI in Code Red: The Tables Have Turned
- Azure NAT Gateway V2: Now With More Zones Than a Parking Lot
- From ChatGPT to Chat-Uh-Oh: OpenAI Sounds the Alarm as Gemini Steals 200 Million Users
- Anthropic Scheduled Actions: Because Your VMs Need a Work-Life Balance Too
- Finally, Your 500 Errors Can Look as Good as Your Homepage
- Foundry Model Router: Because Choosing Between 47 AI Models is Nobody’s Idea of Fun
- Google Takes the Scenic Route: New Cable Avoids the Sunda Strait Traffic Jam
- Azure Application Gateway Gets Its TCP/IP Diploma
- Google Cloud Gets Its Türkiye Dinner: 2 Billion Dollar Cloud Feast Coming Soon
- Microsoft Foundry: Turning AI Chaos into Compliance Gold

AI Is Going Great, or How ML Makes Money

02:59 Nano Banana Pro available for enterprise
Google launches Nano Banana Pro (Gemini 3 Pro Image) in general availability on Vertex AI and Google Workspace, with Gemini Enterprise support coming soon. The model supports up to 14 reference images for style consistency and generates 4K-resolution outputs with multilingual text rendering capabilities. It includes Google Search grounding for factual accuracy in generated infographics and diagrams, plus built-in SynthID watermarking for transparency. Copyright indemnification will be available at general availability under Google’s shared responsibility framework. Enterprise integrations are live with Adobe Firefly, Photoshop, Canva, and Figma, enabling production-grade creative workflows. Major retailers, including Klarna, Shopify, and Wayfair, report using the model for product visualization and marketing asset generation at scale. Developers can access Nano Banana Pro through Vertex AI with Provisioned Throughput and Pay As You Go pricing options, plus advanced safety filters. Business users get access through Google Workspace apps, including Slides, Vids, and NotebookLM, starting today. The model handles complex editing tasks like translating text within images while preserving visual elements, and maintains character and brand consistency across multiple generated assets. This addresses a key enterprise challenge of maintaining creative control when using AI for production assets.

03:59 Justin – “The thing that’s the most important about this is when Nano Banana messes up the text (which it doesn’t do as often), you can now edit it without generating a whole completely different image.”

05:58 Introducing Claude Opus 4.5
Claude Opus 4.5 is now generally available across Anthropic’s API, apps, and all three major cloud platforms at $5 per million input tokens and $25 per million output tokens. This represents a substantial price reduction that makes Opus-level capabilities more accessible. Developers can access it via the claude-opus-4-5-20251101 model identifier.
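A minimal sketch of that access path using Anthropic’s official Python SDK; the model ID comes from the announcement, while the commented-out effort setting reflects the new parameter described below, so its exact name and accepted values should be confirmed against the API docs:

```python
# Minimal example: call Claude Opus 4.5 via the anthropic Python SDK.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-5-20251101",  # dated identifier from the announcement
    max_tokens=1024,
    # effort="medium",  # new speed/capability tradeoff knob (assumed name/shape)
    messages=[{"role": "user", "content": "Summarize this week's cloud news."}],
)
print(response.content[0].text)
```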
The model achieves state-of-the-art performance on software engineering benchmarks, leading SWE-bench Verified and scoring higher than any human candidate has on Anthropic’s internal performance engineering exam (under its 2-hour time limit). At medium effort, it matches Sonnet 4.5’s best SWE-bench score while using 76% fewer output tokens; at highest effort, it exceeds that score by 4.3 percentage points while still using 48% fewer tokens. Anthropic introduces a new effort parameter in the API that lets developers control the tradeoff between speed and capability, allowing optimization for either minimal time and cost or maximum performance depending on the task requirements. This combines with new context management and memory capabilities to boost performance on agentic tasks by nearly 15 percentage points in testing.

Claude Code gains Plan Mode, which builds a user-editable plan.md file before execution, and is now available in the desktop app for running multiple parallel sessions. The consumer apps remove message limits for Opus 4.5 through automatic context summarization, and Claude for Chrome and Claude for Excel expand to all Max, Team, and Enterprise users. The model demonstrates improved robustness against prompt injection attacks compared to other frontier models and is described as the most robustly aligned model Anthropic has released. It shows better performance across vision, reasoning, and mathematics tasks while using dramatically fewer tokens than predecessors, reaching similar or better outcomes.

08:01 Justin – “The most important part of the whole announcement is the cheaper context input and output tokens.”

09:58 Announcing Claude Opus 4.5 on Snowflake Cortex AI
Snowflake Cortex AI now offers Claude Opus 4.5 and Claude Sonnet 4.5 in general availability, bringing Anthropic’s latest models directly into Snowflake’s data platform. Users can access these models through SQL, Python, or REST APIs without moving data outside their Snowflake environment. Claude Opus 4.5 delivers improved performance on complex reasoning tasks, coding, and multilingual capabilities compared to previous versions, while Claude Sonnet 4.5 provides a balanced option for speed and intelligence. Both models support 200K token context windows and can process text and images natively within Snowflake queries. The integration enables enterprises to build AI applications using their Snowflake data with built-in governance and security controls, eliminating the need to export sensitive data to external AI services. Pricing follows Snowflake’s credit-based model, with costs varying by model and token usage. Developers can combine Claude models with other Cortex AI features like vector search, document understanding, and fine-tuning capabilities to create end-to-end AI workflows. This allows for use cases ranging from customer service automation to financial analysis and code generation, all within the Snowflake ecosystem.

11:03 OpenAI CEO declares “code red” as Gemini gains 200 million users in 3 months
Oh, how the turn tables have turned… OpenAI CEO Sam Altman issued an internal code red memo to refocus the company on improving ChatGPT after Google’s Gemini 3 model topped the LMArena leaderboard and gained 200 million users in three months. The directive delays planned features, including advertising integration, AI agents for health and shopping, and the Pulse personal assistant feature.
Google’s Gemini 3 model, released in mid-November, has outperformed ChatGPT on industry benchmark tests and attracted high-profile users like Salesforce CEO Marc Benioff, who publicly announced switching from ChatGPT after three years. The model’s performance represents a significant shift in the competitive landscape since OpenAI’s initial ChatGPT launch in December 2022. The situation mirrors December 2022, when Google declared its own code red after ChatGPT’s rapid adoption, with CEO Sundar Pichai reassigning teams to develop competing AI products. This role reversal demonstrates how quickly competitive positions can shift in the AI model space, particularly around user experience and benchmark performance. OpenAI is implementing daily calls for teams responsible for ChatGPT improvements and encouraging temporary team transfers to address the competitive pressure. The company’s response indicates that maintaining market leadership in conversational AI requires continuous iteration, even for established products with large user bases.

13:11 Ryan – “I started on ChatGPT and tried to use it after adopting Claude, and I try to go back every once in a while – especially when they would announce a new model, but I always end up going back to one of the Anthropic models.”

GCP

15:19 New Google Cloud region coming to Türkiye
Google Cloud is launching a new region in Türkiye as part of a 2 billion dollar investment over 10 years, partnering with local telecom provider Turkcell, which will invest an additional 1 billion dollars in data centers and cloud infrastructure. This brings Google Cloud’s global footprint to 43 regions and 127 zones, with Türkiye serving as a strategic hub for EMEA customers. The region targets three key verticals already committed as customers: financial services with Garanti BBVA and Yapi Kredi Bank modernizing core banking systems, airlines with Turkish Airlines improving flight operations and passenger systems, and government entities focused on digital sovereignty. The local presence addresses data residency requirements and provides low-latency access for organizations that need to keep data within national borders. Technical capabilities include standard Google Cloud services for data analytics, AI, and cybersecurity, with data encryption at rest and in transit, granular access controls, and threat detection systems meeting international security standards. The region will serve both Türkiye and neighboring countries with reduced latency compared to existing European regions. The announcement emphasizes digital sovereignty as a primary driver, with government officials highlighting the importance of local infrastructure for maintaining control over national data while accessing hyperscale cloud capabilities. This follows a pattern of Google Cloud expanding into regions where data localization requirements create demand for in-country infrastructure. No specific pricing details were provided for the Türkiye region, though standard Google Cloud pricing models based on compute, storage, and network usage will apply once the region launches. The timeline for when the region will be operational was not disclosed in the announcement.

Show note editor Heather’s note: If you enjoy history, you need to travel to Türkiye immediately!

17:03 Introducing BigQuery Agent Analytics
Google launches BigQuery Agent Analytics, a new plugin for their Agent Development Kit that streams AI agent interaction data directly to BigQuery with a single line of code.
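Here’s a hedged sketch of what that one-line setup might look like in a Python ADK app; the plugin’s import path, class name, and constructor arguments are assumptions for illustration (the announcement doesn’t spell them out), so check google.github.io/adk-docs for the real API:

```python
# Sketch: register a BigQuery analytics plugin with an ADK runner so agent
# interaction events stream to BigQuery. Plugin names below are illustrative.
from google.adk.agents import Agent
from google.adk.runners import InMemoryRunner

# Hypothetical import path for the plugin described in the announcement.
from google.adk.plugins.bigquery_agent_analytics import BigQueryAgentAnalyticsPlugin

agent = Agent(
    model="gemini-2.5-flash",
    name="support_agent",
    instruction="Answer customer questions.",
)

runner = InMemoryRunner(
    agent=agent,
    # The advertised "single line": stream latency, token, and tool-use
    # events to a BigQuery dataset via the Storage Write API.
    plugins=[BigQueryAgentAnalyticsPlugin(project="my-project", dataset="agent_logs")],
)
```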
The plugin captures metrics like latency, token consumption, tool usage, and user interactions in real time using the BigQuery Storage Write API, enabling developers to analyze agent performance and optimize costs without complex instrumentation. The integration allows developers to leverage BigQuery’s advanced capabilities, including generative AI functions, vector search, and embedding generation, to perform sophisticated analysis on agent conversations. Teams can cluster similar interactions, identify failure patterns, and join agent data with business metrics like CSAT scores to measure real-world impact, going beyond basic operational metrics to quality analysis. The plugin includes three core components: an ADK plugin that requires minimal code changes, a predefined optimized BigQuery schema for storing interaction data, and low-cost streaming via the BigQuery Storage Write API. Developers maintain full control over what data gets streamed and can customize pre-processing, such as redacting sensitive information before logging. Currently available in preview for ADK users, with support for other agent frameworks like LangGraph coming soon. The feature addresses a critical gap in agentic AI development, where understanding user interaction patterns and agent performance is essential for refinement, particularly as organizations move from building agents to optimizing them at scale. Pricing follows standard BigQuery costs for storage and queries, with the Storage Write API offering cost-effective real-time streaming compared to traditional batch loading methods. Documentation and a hands-on codelab are available at google.github.io/adk-docs for developers ready to implement agent analytics.

18:16 Ryan – “This is an interesting model; providing both the schema and the already instrumented integration. I feel like a lot of times with other types of development, you’re left to your own devices, and so this is a neat thing. As you’re developing an agent, everyone is instrumenting these things in odd ways, and it’s very difficult to compile the data in a way where you get usable queries out of it. So it’s kind of an interesting concept.”

19:35 TalayLink subsea cable to connect Australia and Thailand
You know how much we love a good undersea cable… Google announces TalayLink, a new subsea cable connecting Australia and Thailand via the Indian Ocean, taking a western route around the Sunda Strait to avoid congestion from existing cable paths. This cable extends the Interlink system from the Australia Connect initiative and will directly connect to Google’s planned Thailand cloud region and data centers. The project includes two new connectivity hubs in Mandurah, Western Australia, and South Thailand, providing diverse landing points away from existing cable concentrations in Perth and enabling cable switching, content caching, and colocation capabilities. Google is partnering with AIS for the South Thailand hub to leverage existing infrastructure. TalayLink forms part of a broader Indian Ocean connectivity strategy, linking with previously announced hubs in the Maldives and Christmas Island to create redundant paths connecting Australia, Southeast Asia, Africa, and the Middle East. This routing diversity aims to improve network resilience across multiple regions. The infrastructure supports Thailand’s digital economy transformation goals and Western Australia’s digital future roadmap, with the Thailand Board of Investment actively backing the project.
No pricing or specific completion timeline was disclosed in the announcement.

20:34 Matt – “It’s amazing…subsea cable congestion. How many cables can be there that there’s congestion?”

23:16 Claude Opus 4.5 on Vertex AI
Claude Opus 4.5 is now generally available on Vertex AI, delivering Anthropic’s most advanced model at one-third the cost of its predecessor, Opus 4.1. The model excels in coding tasks that can compress multi-day development projects into hours, agentic workflows with dynamic tool discovery from hundreds of tools without context window bloat, and office productivity tasks with improved memory for maintaining consistency across documents. Google is positioning Vertex AI as a unified platform for deploying Claude with enterprise features, including global endpoints for reduced latency, provisioned throughput for dedicated capacity at fixed costs, and prompt caching with a flexible Time To Live of up to one hour. The platform integrates with Google’s Agent Builder stack, including the open Agent Development Kit, Agent2Agent protocol, and fully managed Agent Engine for moving multi-step workflows from prototype to production. Security and governance capabilities include Google Cloud’s foundational security controls, data residency options, and Model Armor protection against AI-specific threats like prompt injection and tool poisoning through Security Command Center. Customers like Palo Alto Networks report 20-30 percent increases in code development velocity when using Claude on Vertex AI. The model supports a 1 million token context window, batch predictions for cost efficiency, and web search capabilities in preview. Regional availability and specific pricing details are available in the Vertex AI documentation, with the model accessible through both the Model Garden and Google Cloud Marketplace.

23:58 Registration is live for Google Cloud Next 2026 in Las Vegas
Google Cloud Next 2026 takes place April 22-24 in Las Vegas, with registration now open at an early bird price of $999 for a limited time. This represents the standard pricing structure for Google’s flagship annual conference following their record-breaking attendance in 2025. The conference focuses heavily on AI agent development and implementation, featuring interactive demos, hackathons, and workshops designed to help attendees build intelligent agents. Organizations can learn from real-world case studies of companies deploying AI solutions at scale. Next 2026 offers hands-on technical training through deep-dive sessions, keynotes, and practical labs aimed at developers and technical practitioners. The format emphasizes actionable learning with direct access to Google engineers and product experts. The event serves as a networking hub for cloud practitioners to connect with peers facing similar technical challenges and to provide feedback that influences Google Cloud’s product roadmap. This direct line to product teams can be valuable for organizations planning their cloud strategy. Ready to register? You can do that here.

27:19 VPC Flow Logs for Cross-Cloud Network
VPC Flow Logs now support Cloud VPN tunnels and VLAN attachments for Cloud Interconnect and Cross-Cloud Interconnect, extending visibility beyond traditional VPC subnet traffic to hybrid and multi-cloud connections.
This addresses a critical gap for organizations running Cross-Cloud Network architectures that previously lacked detailed telemetry on traffic flowing between Google Cloud, on-premises infrastructure, and other cloud providers. The feature provides 5-tuple granularity logging (source/destination IPs and ports, plus protocol) with new gateway annotations that identify traffic direction and context through reporter and gateway object fields. Flow Analyzer integration eliminates the need for complex SQL queries, offering built-in analysis capabilities, including Gemini-powered natural language queries and in-context Connectivity Tests to correlate flow data with firewall policies and network configurations. Primary use cases include identifying elephant flows that congest specific tunnels or attachments, auditing Shared VPC bandwidth consumption by service projects, and troubleshooting connectivity issues by verifying whether traffic reaches Google Cloud gateways. Organizations can also validate DSCP markings for application-aware Cloud Interconnect policy configurations, which is particularly valuable for enterprises with quality-of-service requirements. The feature is available now for both new and existing deployments through Console, CLI, API, and Terraform, with Flow Analyzer providing no-cost analysis of logs stored in Cloud Logging. This capability is particularly relevant for financial services, healthcare, and enterprises with strict compliance requirements that need comprehensive audit trails of cross-cloud and hybrid network traffic.

28:37 Ryan – “The controls say that you have to have logging, not what the logging is – and so very frequently it is sort of ‘turn it on and sort of forget it’. I do think this is great, but it is sort of, they say the five-tuple granularity will help you measure congestion, but I don’t see them actually producing any sort of bandwidth or request size metrics. So it is sort of an interesting thing, but it’s at least better than the nothing that we had before. So I’ll take it.”

30:35 AWS and Google Cloud collaborate on multicloud networking
AWS and Google Cloud jointly engineered a multicloud networking solution that eliminates the need for manual physical infrastructure setup between their platforms. Customers can now provision dedicated bandwidth and establish connectivity in minutes instead of weeks through either cloud console or API. The solution uses AWS Interconnect multicloud and Google Cloud Cross-Cloud Interconnect with quad-redundancy across physically separate facilities and MACsec encryption between edge routers. Both providers published open API specifications on GitHub for other cloud providers to adopt the same standard. Previously, connecting AWS and Google Cloud required customers to manually coordinate physical connections, equipment, and multiple teams over weeks or months. This new managed service abstracts away physical connectivity, network addressing, and routing policy complexity into a cloud-native experience. Salesforce is using this capability to connect its Data 360 platform across clouds using pre-built capacity pools and familiar AWS tooling. The integration allows them to ground AI and analytics in trusted data regardless of which cloud it resides in. The collaboration represents a shift toward cloud provider interoperability through open standards rather than proprietary solutions. The published specifications enable any cloud provider or partner to implement compatible multicloud connectivity using the same framework.
31:38 Justin – “I do want you guys to check the weather. Do you see pigs flying or anything crazy?”

Azure

33:17 Generally Available: TLS and TCP termination on Azure Application Gateway
Azure Application Gateway now supports TLS and TCP protocol termination at general availability, expanding beyond its traditional HTTP/HTTPS load balancing capabilities. This allows customers to use Application Gateway for non-web workloads like database connections, message queuing systems, and other TCP-based applications that previously required separate load balancing solutions. The feature consolidates infrastructure by letting organizations use a single gateway service for both web and non-web traffic, reducing the need to deploy and manage multiple load balancers. This is particularly useful for enterprises running mixed workloads that include legacy applications, databases like SQL Server or PostgreSQL, and custom TCP services alongside modern web applications. Application Gateway’s existing features, like Web Application Firewall, autoscaling, and zone redundancy, now extend to TCP and TLS traffic, providing consistent security and availability across all application types. The pricing model follows Application Gateway’s standard consumption-based structure, with charges for gateway hours and data processing, though specific costs for TCP/TLS termination were not detailed in the announcement. Common use cases include load balancing for database clusters, securing MQTT or AMQP message broker connections, and providing SSL offloading for legacy applications that don’t natively support modern TLS versions. This positions Application Gateway as a more versatile Layer 4-7 load balancing solution competing with dedicated TCP load balancers and third-party appliances.

33:38 Justin – “Thank you for developing network load balancers.”

34:48 Generally Available: Azure Application Gateway mTLS passthrough support
Want to make your life even more complicated? Well, it’s GOOD NEWS! Azure Application Gateway now supports mutual TLS passthrough in general availability, allowing backend applications to validate client certificates and authorization headers directly while still benefiting from Web Application Firewall inspection. This addresses a specific compliance requirement where organizations need end-to-end certificate validation but cannot terminate TLS at the gateway layer. The feature enables scenarios where backend services must verify client identity through certificates for regulatory compliance or zero-trust architectures, particularly relevant for financial services, healthcare, and government workloads. Previously, customers had to choose between WAF protection or backend certificate validation, creating security or compliance gaps. Application Gateway continues to inspect traffic through WAF rules even as the mTLS connection passes through to the backend, maintaining protection against common web exploits and OWASP vulnerabilities. This dual-layer approach means organizations can enforce both perimeter security policies and application-level authentication without architectural compromises. The capability is available across all Azure regions where the Application Gateway v2 SKU operates, with standard Application Gateway pricing applying based on capacity units consumed. No additional charges exist specifically for the mTLS passthrough feature itself, though backend certificate validation may increase processing overhead slightly.
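To make the passthrough flow concrete, here’s a minimal Python client sketch: because the gateway forwards the TLS stream rather than terminating it, the client certificate presented below is validated by the backend, not by Application Gateway. The hostname and certificate paths are placeholders:

```python
# Sketch: a client connecting through an mTLS-passthrough listener.
# The backend (not the gateway) verifies client.pem; the client verifies
# the backend's chain against backend-ca.pem.
import socket
import ssl

ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile="backend-ca.pem")
ctx.load_cert_chain(certfile="client.pem", keyfile="client.key")  # client identity

with socket.create_connection(("gateway.example.com", 443)) as sock:
    with ctx.wrap_socket(sock, server_hostname="gateway.example.com") as tls:
        tls.sendall(b"GET /health HTTP/1.1\r\nHost: gateway.example.com\r\n\r\n")
        print(tls.recv(4096).decode(errors="replace"))
```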
36:30 Matt – “I did stunnel and MongoDB because it didn’t support encryption for the longest time…that was a fun one.”

36:50 Public Preview: Azure API Management adds support for A2A Agent APIs
Azure API Management now supports Agent-to-Agent (A2A) APIs in public preview, allowing organizations to manage AI agent APIs alongside traditional REST APIs, AI model APIs, and Model Context Protocol tools within a single governance framework. This addresses the growing need to standardize how autonomous agents communicate and interact across enterprise systems. The feature enables centralized management of agent interactions, which is particularly relevant as organizations deploy multiple AI agents that need to coordinate tasks and share information. API Management can now apply consistent security policies, rate limiting, and monitoring across all agent communications, reducing the operational complexity of multi-agent architectures. This capability positions Azure API Management as a unified control plane for the full spectrum of API types emerging in AI-driven applications. Organizations already using API Management for traditional APIs can extend their existing governance practices to cover agent-based workflows without deploying separate infrastructure. The preview is available in Azure regions where API Management is currently supported, though specific pricing for A2A API features has not been disclosed separately from standard API Management tiers. Organizations should evaluate this against their existing API Management costs, which start at approximately $50 per month for the Developer tier.

38:13 Introducing Claude Opus 4.5 in Microsoft Foundry
Claude Opus 4.5 is now available in public preview on Microsoft Foundry, GitHub Copilot paid plans, and Microsoft Copilot Studio, expanding Azure’s frontier model portfolio following the Microsoft-Anthropic partnership announced at Ignite. The model achieves 80.9% on the SWE-bench software engineering benchmark and is priced at one-third the cost of previous Opus-class models, making advanced AI capabilities more accessible for enterprise workloads. The model introduces three key developer features on Foundry: an Effort Parameter in beta that lets teams control computational allocation across thinking and tool calls, Compaction Control for managing context in long-running agentic tasks, and enhanced programmatic tool calling with dynamic tool discovery that doesn’t consume context window space. These capabilities enable sophisticated multi-tool workflows across cybersecurity, financial modeling, and full-stack development. Opus 4.5 serves as Anthropic’s strongest vision model and delivers improved computer use performance for automating desktop tasks, particularly for creating spreadsheets, presentations, and documents with professional polish. The model maintains context across complex projects using memory features, making it suitable for precision-critical verticals like finance and legal, where consistency matters. Microsoft Foundry’s rapid integration strategy gives Azure customers immediate access to the latest frontier models while maintaining centralized governance, security, and observability at scale. This positions Azure as offering the widest selection of advanced AI models among cloud providers, with Opus 4.5 available now through the Foundry portal and coming soon to Visual Studio Code via the Foundry extension.
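For a rough idea of what calling a Foundry-hosted Claude deployment might look like from Python, here’s a hedged sketch using the azure-ai-inference client; the endpoint URL, deployment name, and the idea of passing the beta Effort Parameter via model_extras are all assumptions, so verify against the Foundry docs before relying on any of it:

```python
# Sketch: chat completion against a Microsoft Foundry model deployment.
# Endpoint, deployment name, and the "effort" extra are illustrative guesses.
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import UserMessage
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint="https://my-foundry-resource.services.ai.azure.com/models",  # placeholder
    credential=AzureKeyCredential("<api-key>"),
)

response = client.complete(
    model="claude-opus-4-5",  # hypothetical deployment name
    messages=[UserMessage(content="Draft a runbook for a failed deployment.")],
    model_extras={"effort": "medium"},  # assumed passthrough for the beta parameter
)
print(response.choices[0].message.content)
```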
38:37 Justin – “Cool, it’s in Foundry – hooray!”

40:21 Generally Available: DNS security policy Threat Intelligence feed
Azure DNS security policy now includes a managed Threat Intelligence feed that blocks queries to known malicious domains. This feature addresses a common attack vector, since nearly all cyber attacks begin with a DNS query, and provides an additional layer of protection at the DNS resolution level. The service integrates with Azure’s existing DNS infrastructure and uses Microsoft’s threat intelligence data to automatically update the list of malicious domains. Organizations can enable this protection without managing their own threat feeds or maintaining blocklists, reducing operational overhead for security teams. This capability is particularly relevant for enterprises looking to implement defense-in-depth strategies, as it stops threats before they can establish connections to command and control servers or phishing sites. The feature works alongside existing Azure Firewall and network security tools to provide comprehensive protection. General availability means the service is now production-ready with full SLA support across Azure regions. Pricing details were not specified in the announcement, so customers should check Azure pricing documentation for DNS security policy costs.

41:28 Ryan – “It is something, being able to automatically take the results of a feed, I will do any day just because these things are updated by many more parties and faster than I can ever react to, and you know, our own threat intelligence. So that’s pretty great. I like it.”

42:46 Public Preview: StandardV2 NAT Gateway and StandardV2 Public IPs
Azure introduces the StandardV2 NAT Gateway in public preview, adding zone redundancy for high availability in regions with availability zones. This upgrade addresses a key limitation of the original NAT Gateway by ensuring outbound connectivity survives zone failures, which matters for enterprises running mission-critical workloads that require consistent internet egress. The StandardV2 SKU includes matching StandardV2 Public IPs that work together with the new NAT Gateway tier. Organizations using the original Standard SKU will need to evaluate migration paths, since zone redundancy represents a fundamental architectural change requiring new resource types rather than an in-place upgrade. This release targets customers who previously had to architect complex workarounds for zone-resilient outbound connectivity, particularly those running multi-zone deployments of containerized applications or database clusters. The preview allows testing of failover scenarios before production deployment. The announcement lacks specific pricing details for the StandardV2 tier, though NAT Gateway typically charges based on hourly resource fees plus data processing costs. Customers should monitor Azure pricing pages as the preview progresses toward general availability for cost comparisons against the Standard SKU.

43:48 Justin – “The fact that this is not an upgrade that I can just check, and I have to redeploy a whole new thing, annoys the crap out of me.”

46:51 Generally Available: Custom error pages on Azure App Service
Custom error pages on Azure App Service have moved to general availability, allowing developers to replace default HTTP error pages with branded or customized alternatives. This addresses a common requirement for production applications where maintaining a consistent user experience during errors is important for brand identity and user trust.
The feature integrates directly into App Service configuration without requiring additional Azure services or third-party tools. Developers can specify custom HTML pages for different HTTP error codes like 404 or 500, which App Service will serve automatically when those errors occur. This capability is particularly relevant for customer-facing web applications, e-commerce sites, and SaaS platforms where error handling needs to align with corporate branding guidelines. The feature works across all App Service tiers that support custom domains and SSL certificates. No additional cost is associated with custom error pages beyond standard App Service hosting fees, which start at approximately $13 per month for the Basic tier. Implementation requires uploading error page files to the app’s file system and updating configuration settings through the Azure Portal or deployment templates. The general availability status means the feature is now production-ready with full support coverage, moving beyond the preview phase where it was available for testing. Documentation is available in the Azure App Service custom error pages guide.

48:17 Matt – “It’s crazy that this wasn’t already there. The workarounds you had to do to make your own error page were messy at best.”

49:01 Generally Available: Streamline IT governance, security, and cost management experiences with Microsoft Foundry
Microsoft Foundry reaches general availability as an enterprise AI governance platform that consolidates security, compliance, and cost management controls for IT administrators deploying AI solutions. The platform addresses the growing need for centralized oversight as organizations scale their AI initiatives across Azure infrastructure. The service integrates with existing Azure management tools to provide unified visibility and control over AI workloads, allowing IT teams to enforce policies and monitor resource usage from a single interface. This reduces the operational overhead of managing disparate AI projects while maintaining enterprise security standards. Foundry targets large enterprises and regulated industries that require strict governance frameworks for AI deployment, particularly organizations balancing innovation speed with compliance requirements. The platform helps bridge the gap between data science teams pushing for rapid AI adoption and IT departments responsible for risk management. The general availability announcement indicates Microsoft is positioning Azure as the enterprise-ready AI cloud, competing directly with AWS and Google Cloud for organizations prioritizing governance alongside AI capabilities. Specific pricing details were not disclosed in the announcement, suggesting costs likely vary based on usage and existing Azure commitments.

50:22 Justin – “It’s like a combination of SageMaker and Vertex married Databricks and then had a baby – plus a report interface.”

52:44 Generally Available: Model Router in Microsoft Foundry
Microsoft Foundry’s Model Router is now generally available as an AI orchestration layer that automatically selects the optimal language model for each prompt based on factors like complexity, cost, and performance requirements. This eliminates the need for developers to manually choose between different AI models for each use case. The service supports an expanded range of models, including the GPT-4 family, GPT-5 family, GPT-oss, and DeepSeek models, giving organizations flexibility to balance performance needs against cost considerations.
The router can dynamically switch between models within a single application based on prompt characteristics. This addresses a practical challenge for enterprises deploying multiple AI models, where different tasks require different model capabilities. For example, simple queries could route to smaller, less expensive models while complex reasoning tasks automatically use more capable models. The orchestration layer integrates with Microsoft Foundry’s broader AI infrastructure, allowing customers to manage multiple model deployments through a single interface rather than building custom routing logic. This reduces operational complexity for teams managing diverse AI workloads across their organization. No specific pricing details are provided in the announcement, though costs will likely vary based on the underlying models selected by the router and usage patterns. Organizations should evaluate potential cost savings from routing simpler queries to less expensive models versus always using premium models.

54:50 Generally Available: Scheduled Actions
Azure’s Scheduled Actions feature is now generally available, providing automated VM lifecycle management at scale with built-in handling of subscription throttling and transient error retries. This eliminates the need for custom scripting or third-party tools to start, stop, or deallocate VMs on a recurring schedule. The feature addresses common cost optimization scenarios where organizations need to automatically shut down development and test environments during off-hours or scale down non-production workloads on weekends. This can reduce compute costs by 40-70% for environments that don’t require 24/7 availability. Scheduled Actions integrates directly with Azure Resource Manager and works across VM scale sets, making it suitable for both individual VMs and large-scale deployments. The automatic retry logic and throttling management mean operations complete reliably even when managing hundreds or thousands of VMs simultaneously. The service is available in all Azure public cloud regions where VMs are supported, with no additional cost beyond standard VM compute charges. Organizations pay only for the time VMs are running, so automated shutdown schedules directly translate to reduced monthly bills.

55:31 Justin – “Thank you for copying every other cloud that’s had this forever…”

After Show

51:46 OpenAI and NORAD team up to bring new magic to “NORAD Tracks Santa”
OpenAI partnered with NORAD to add AI-powered holiday tools to the annual Santa tracking tradition, creating three ChatGPT-based features that turn kids’ photos into elf portraits, generate custom toy coloring pages, and build personalized Christmas stories. This represents a consumer-friendly application of generative AI that demonstrates how large language models can be packaged for mainstream family use during the holidays. The collaboration shows OpenAI pursuing brand-building partnerships with trusted institutions like NORAD to normalize AI tools in everyday contexts. By embedding ChatGPT features into a 68-year-old military tradition that reaches millions of families, OpenAI gains exposure to non-technical users who might otherwise be hesitant about AI adoption. From a technical perspective, these tools showcase practical implementations of image generation and text-to-image capabilities that parents can use without understanding the underlying models.
The focus on simple, single-purpose GPTs rather than complex interfaces suggests OpenAI is testing how to make their technology more accessible to casual users. The partnership raises interesting questions about AI companies seeking legitimacy through associations with government organizations and cultural traditions. While the tools are harmless holiday fun, they demonstrate how AI providers are moving beyond enterprise sales to embed their technology into cultural moments and family activities. This is essentially a marketing play disguised as holiday cheer, but it does illustrate how cloud-based AI services are becoming infrastructure for consumer experiences rather than just backend business tools. The real story is about distribution strategy and making AI feel safe and familiar to mainstream audiences. The Cloud Pod has one message: keep Skynet out of Christmas!

Closing
And that is the week in the cloud! Visit our website, the home of The Cloud Pod, where you can join our newsletter, Slack team, send feedback, or ask questions at theCloudPod.net or tweet at us with the hashtag #theCloudPod
Nov 28
Welcome to episode 332 of The Cloud Pod – where the forecast is always cloudy! It’s Thanksgiving week, which can only mean one thing: AWS Re:Invent predictions! In this special episode, Justin, Jonathan, Ryan, and Matt engage in the annual tradition of drafting their best guesses for what AWS will announce at the biggest cloud conference of the year. Justin is the reigning champion (probably because he actually reads the show notes), but with a reverse snake draft order determined by dice roll, anything could happen. Will Werner announce his retirement? Is Cognito finally getting a much-needed overhaul? And just how many times will “AI” be uttered on stage? Grab your turkey and let’s get predicting!

Titles we almost went with this week:
- Roll For Initiative: The Re:Invent Prediction Draft
- Justin’s Winning Streak: A Study in Actually Doing Your Homework
- Serverless GPUs and Broken Dreams: Our Re:Invent Wishlist
- Shooting in the Dark: AWS Predictions Edition
- We’re Never Good at This, But Here We Go Again
- Vegas Odds: What Happens at Re:Invent, Gets Predicted Wrong

AWS Re:Invent Predictions 2025
The annual prediction draft is here! Draft order was determined by dice roll: Jonathan first, followed by Ryan, Justin, and Matt in last position. As always, it’s a reverse-order format, with points awarded for each correct prediction announced during the Tuesday, Wednesday, and Thursday keynotes.

Jonathan’s Predictions
- Serverless GPU Support – An extension to Lambda or a different service that provides on-demand serverless GPU/inference capability. Likely with requirements for pre-warmed provisioned instances.
- Agentic Platform for Continuous AI Agents – A service that allows agents to run continuously with goals or instructions, performing actions periodically or on demand in the real world. Think: running agents on a schedule that can check conditions and take automated actions.
- Werner Vogels Retirement Announcement – Werner will announce that this is his last Re:Invent keynote and that he is retiring.

Ryan’s Predictions
- New Trainium 3 Chips, Inferentia, and Graviton Chips – New generation of AWS custom silicon across training, inference, and general compute.
- Expanded Model Availability in Bedrock – AWS will significantly expand the number of models available in Bedrock, potentially via partnerships or integrations with additional providers.
- Major Refresh to AWS Organizations – UI-based or functionality refresh providing better visibility into SCPs, OU mappings, and stack sets across organizations.

Justin’s Predictions
- New Nova Model with Multi-modal Support – Launch of Nova Premier or Nova Sonic with multi-modal capabilities, bringing Amazon’s foundational model to the next level.
- OpenAI Partnership Announcement – AWS and OpenAI will announce a strategic partnership, potentially bringing OpenAI models to Bedrock (likely announced on stage).
- Advanced Agentic AI Capabilities for Security Hub – Enhanced features for Security Hub adding agentic AI to help automate SOC team operations.

Matt’s Predictions
- Model Router for Bedrock – A service to route LLM queries to different AI models, simplifying the process of testing and selecting models for different use cases.
- Well-Architected Framework Expansion – New lenses or significant updates to the Well-Architected Framework beyond the existing Generative AI and Sustainability lenses.
- End User Authentication That Doesn’t Suck – A new or significantly revamped end-user authentication service (essentially Cognito 2.0) that actually works well for client portals.
Tiebreaker: How Many Times Will “AI” or “Artificial Intelligence” Be Said On Stage?
If we end in a tie (or nobody gets any predictions correct, which is historically possible), we go to the tiebreaker!
Matt: 200
Justin: 160
Ryan: 99
Jonathan: 1

Honorable Mentions
Ideas that didn’t make the cut but might just surprise us:
Jonathan: Mathematical proof/verification that text was generated by Amazon’s LLMs (watermarking for AI output); a marketplace for AI work – publish and monetize AI-based tools with Amazon handling billing; a new consumer device to accompany Nova models (smarter Alexa replacement with local inference)
Ryan: FinOps AI recommender for model usage and cost optimization; savings plans or committed-use discounts for Bedrock use cases
Matt: Sustainability/green dashboard improvements; AI-specific features for Aurora or DSQL
Justin: Big S3 Vectors announcement and integration with Bedrock; a FinOps service for Kubernetes; Amazon Q Developer with autonomous coding agents; a new GPU architecture combining training/inference/Graviton capabilities; an Amazon Bedrock model marketplace with revenue share on fine-tuned models

Quick Hits From the Episode
00:02 – Is it really Re:Invent already? The existential crisis begins.
01:44 – Jonathan reveals why Justin always wins: “Because you read the notes.”
02:54 – Matt hasn’t been to a Re:Invent session since Image Builder launched… eight years ago.
05:03 – Jonathan comes in hot with the serverless GPU support prediction.
06:57 – The inference vs. training cost debate – where’s the real ROI?
09:30 – Matt’s picks get systematically destroyed by earlier drafters.
14:09 – The OpenAI partnership prediction causes draft chaos.
16:24 – Jonathan drops the Werner retirement bombshell.
19:12 – Justin’s Security Hub prediction: “Please automate the SOC teams.”
19:46 – Everyone hates Cognito. Matt’s prediction resonates with the universe.
21:47 – Tiebreaker time: Jonathan goes with 1 out of pure spite.
24:08 – Honorable mentions include mathematical AI verification and a marketplace for AI work.

Re:Invent Tips (From People Who Aren’t Going)
Since none of us are attending this year, here’s what we remember from the good old days:
Chalk Talks remain highly respected and valuable for deep technical content.
Labs and hands-on sessions are worth your time more than keynotes you can watch online.
Networking on the expo floor and in hallways is where the real value happens.
Don’t try to see everything – focus on what matters to your work.
Stay hydrated – Vegas is dry and conferences are exhausting.

Closing
And that is the week in the cloud! We’re taking Thanksgiving week off, so there won’t be an episode during Re:Invent. We’ll record late that week and have a dedicated Re:Invent recap episode the following week. If you’re heading to Las Vegas, have a great time and let us know how it goes! Visit our website, the home of the Cloud Pod, where you can join our newsletter, Slack team, send feedback, or ask questions at theCloudPod.net or tweet at us with the hashtag #theCloudPod
Nov 27
Welcome to episode 331 of The Cloud Pod, where the forecast is always cloudy! Jonathan, Ryan, Matt, and Justin (for a little bit, anyway) are in the studio today to bring you all the latest in cloud and AI news. This week, we’re looking at our Ignite predictions (that side gig as internet psychics isn’t looking too good), undersea cables (our fave!), plus datacenters and more. Plus, Claude and Azure make a 30 billion dollar deal! Take a break from turkey and avoiding politics, and let’s take a trip into the clouds!

Titles we almost went with this week
GPT-5.1 Gets a Shell Tool Because Apparently We Haven’t Learned Anything From Sci-Fi Movies
The Great Ingress Egress: NGINX Controller Waves Goodbye After Years of Volunteer Burnout
Queue the Applause: Lambda SQS Mapping Gets a Serious Speed Boost
SELECT * FROM future WHERE SQL meets AI without the prompt drama
MFA or GTFO: Microsoft’s 99.6% Phishing-Resistant Authentication Achievement
JWT Another Thing ALB Can Do: OAuth Validation Moves to the Load Balancer
Google’s Emerging Threats Center: Because Manually Checking 12 Months of Logs Sounds Terrible
EventBridge Gets a Drag-and-Drop Makeover: No More Schema Drama
Permission Denied: How Granting Access Took Down the Internet

Follow Up
00:51 Ignite Predictions – The Results
Matt (who is in charge of sound effects, so be aware)
ACM Competitor – True SSL competitive product
AI announcement in Security
AI Agent (Copilot for Sentinel) – sort of (½)
Azure DevOps Announcement
Justin
New Cobalt and Maia Gen 2 or similar – Check
Price Reduction on OpenAI & Significant Prompt Caching
Microsoft Foundational LLM to compete with OpenAI
Jonathan
The general availability of new, smaller, and more power-efficient Azure Local hardware form factors
Declarative AI on Fabric: a move towards a declarative model, where users state the desired outcome and the AI agent system determines the steps needed to achieve it within the Fabric ecosystem
Advanced Cost Management: granular dashboards to track token and compute consumption per agent or per transaction, enabling businesses to forecast costs and set budgets for their agent workforce
How many times will they say Copilot? The word “Copilot” is mentioned 46 to 71 times in the video.
Jonathan: 45
Justin: 35
Matt: 40

General News
05:13 Cloudflare outage on November 18, 2025
Cloudflare experienced its worst outage since 2019 on November 18, 2025, lasting approximately three hours and affecting core traffic routing across its entire network. The incident was triggered by a database permissions change that caused a Bot Management feature file to double in size, exceeding hardcoded limits in their proxy software and causing system panics that resulted in 5xx errors for customers. The root cause reveals a cascading failure pattern: a ClickHouse database query began returning duplicate column metadata after the permission change, inflating the feature file from approximately 60 features to over 200 and blowing past the preallocated limit of 200 features in their Rust-based FL2 proxy code. The team initially suspected a DDoS attack due to fluctuating symptoms caused by the bad configuration file being regenerated every five minutes as the database cluster was gradually updated. The outage impacted multiple Cloudflare services, including their CDN, Workers KV, Access, and even their own dashboard login system through Turnstile dependencies.
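The failure mode is easy to picture in miniature. Here’s a schematic Python reconstruction of what the post-mortem describes (names, shapes, and limits are illustrative; the real proxy is Rust, not Python): a preallocated feature limit, plus duplicated metadata that blows past it.

```python
# Schematic reconstruction of the failure mode; all names are illustrative.
FEATURE_LIMIT = 200  # capacity preallocated in the proxy

def load_feature_file(rows):
    features = [row["feature_name"] for row in rows]
    if len(features) > FEATURE_LIMIT:
        # In the real FL2 proxy this was a panic rather than a handled
        # error, which surfaced to customers as 5xx responses.
        raise RuntimeError(f"{len(features)} features exceeds limit of {FEATURE_LIMIT}")
    return features

normal = [{"feature_name": f"f{i}"} for i in range(60)]
duplicated = normal * 4  # duplicate column metadata inflates the file

load_feature_file(normal)      # fine
try:
    load_feature_file(duplicated)  # the November 18 path
except RuntimeError as err:
    print("panic:", err)
```

The point of the sketch: the crash path was input-dependent, which is why Cloudflare’s fix list starts with validating internal config files as strictly as user input.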
Customers on the older FL proxy engine did not see errors but received incorrect bot scores of zero, potentially causing false positives for those using bot-blocking rules. Cloudflare’s remediation plan includes treating internal configuration files with the same validation rigor as user input, implementing more global kill switches for features, and preventing error reporting systems from consuming excessive resources during incidents. The company acknowledged this as unacceptable for their position in the Internet ecosystem and committed to architectural improvements to prevent similar failures.

06:41 Justin – “Definitely a bad outage, but I appreciate that they owned it, and owned it hard… especially considering they were front page news.”

AI Is Going Great, or How ML Makes Money
07:27 Introducing GPT-5.1 for developers | OpenAI
OpenAI has released GPT-5.1 on its API platform with adaptive reasoning that dynamically adjusts thinking time based on task complexity, resulting in 2-3x faster performance on simple tasks while maintaining frontier intelligence. The model includes a new “no reasoning” mode (reasoning_effort set to ‘none’) that delivers 20% better low-latency tool-calling performance compared to GPT-5 minimal reasoning, making it suitable for latency-sensitive applications while supporting web search and improved parallel tool calling. GPT-5.1 introduces extended prompt caching with 24-hour retention (up from minutes), maintaining the existing 90% cost reduction for cached tokens with no additional storage charges. Early adopters report the model uses approximately half the tokens of competitors at similar quality levels, with companies like Balyasny Asset Management seeing agents run 50% faster while exceeding GPT-5 accuracy. The release includes two new developer tools in the Responses API: apply_patch for structured code editing using diffs without JSON escaping, and a shell tool that allows the model to propose and execute command-line operations in a controlled plan-execute loop. GPT-5.1 achieves 76.3% on SWE-bench Verified and shows a 7% improvement on diff-editing benchmarks, according to early testing partners like Cline and Augment Code. OpenAI is also releasing specialized gpt-5.1-codex and gpt-5.1-codex-mini models optimized specifically for long-running agentic coding tasks, while maintaining the same pricing and rate limits as GPT-5. If you didn’t catch it in the podcast, Justin HATES this. Hates. It. All the hate. The company has committed to not deprecating GPT-5 in the API and will provide advance notice if deprecation plans change.

9:31 Ryan – “I didn’t really like GPT-5, so I don’t have high expectations, but as these things enhance, I’ve found using different models for different use cases has some advantages, so maybe I’ll find the case for this one.”
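For the latency-sensitive path, the new mode is a one-parameter change. A minimal sketch, assuming the parameter spelling given in the announcement and an OPENAI_API_KEY in the environment; verify the model ID and parameter against the current API reference:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.chat.completions.create(
    model="gpt-5.1",
    reasoning_effort="none",  # new low-latency mode; skips extended thinking
    messages=[{"role": "user", "content": "Classify this ticket: 'login page 500s'"}],
)
print(resp.choices[0].message.content)
```

The caching win comes from structuring prompts with stable prefixes, since cached tokens are discounted automatically on repeated prefixes; check the docs for how the extended 24-hour retention is requested.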
11:31 Piloting group chats in ChatGPT | OpenAI
OpenAI is piloting group chat functionality in ChatGPT, starting with users in Japan, New Zealand, South Korea, and Taiwan across all subscription tiers (Free, Go, Plus, and Pro). The feature allows up to 20 people to collaborate in a shared conversation with ChatGPT, with responses powered by GPT-5.1 Auto, which selects the optimal model based on the prompt and the user’s subscription level. ChatGPT has been trained with new social behaviors for group contexts, including deciding when to respond or stay quiet based on conversation flow, reacting with emojis, and referencing profile photos for personalized image generation. Users can mention “ChatGPT” explicitly to trigger a response, and custom instructions can be set per group chat to control tone and personality. Privacy controls separate group chats from personal conversations, with personal ChatGPT memory not shared or used in group contexts. Users must accept invitations to join, can see all participants, and can leave at any time, with group creators having special removal privileges. The feature includes safeguards for users under 18, automatically reducing sensitive content exposure for all group members when a minor is present. Parents can disable group chats entirely through parental controls, providing additional oversight for younger users. Rate limits apply only to ChatGPT responses (not user-to-user messages) and count against the subscription tier of the person ChatGPT is responding to. The feature supports search, image and file uploads, image generation, and dictation, making it functional for both personal planning and workplace collaboration scenarios.

12:41 Jonathan – “I’d rather actually have group chats enabled if kids are going to use it, because at least you have witnesses to the conversation at that point.”

16:38 Gemini 3: Introducing the latest Gemini AI model from Google
Google launches Gemini 3 Pro in preview across its product suite, including the Gemini app, AI Studio, Vertex AI, and a new AI Mode in Search with generative UI capabilities. The model achieves a 1501 Elo score on the LMArena leaderboard and demonstrates 91.9% on GPQA Diamond, with a 1 million token context window for processing multimodal inputs including text, images, video, audio, and code. Gemini 3 Deep Think mode offers enhanced reasoning performance, scoring 41.0% on Humanity’s Last Exam and 45.1% on ARC-AGI-2 with code execution. Google is providing early access to safety testers before rolling out to Google AI Ultra subscribers in the coming weeks, following comprehensive safety evaluations per their Frontier Safety Framework. Google also introduces Antigravity, a new agentic development platform that integrates Gemini 3 Pro with Gemini 2.5 Computer Use for browser control and Gemini 2.5 Image for editing. The platform enables autonomous agent workflows with direct access to editor, terminal, and browser, scoring 54.2% on Terminal-Bench 2.0 and 76.2% on SWE-bench Verified for coding-agent capabilities. The model shows improved long-horizon planning by topping the Vending-Bench 2 leaderboard and delivers enhanced agentic capabilities through Gemini Agent for Google AI Ultra subscribers. Gemini 3 demonstrates 72.1% on SimpleQA Verified for factual accuracy and 1487 Elo on WebDev Arena for web development tasks, with availability in third-party platforms including Cursor, GitHub, JetBrains, and Replit.

18:24 Ryan – “I look forward to trying this. My initial attempts with Gemini 2.5 did not go well, but I found a sort of sweet spot in using it for planning and documentation. It’s still much better at coding than any other model that I’ve used. So cool, I look forward to using this.”
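If you’d rather kick the tires from code than from the app, here’s a minimal sketch with the google-genai SDK. The preview model ID is our assumption from the announcement, and the client expects a GEMINI_API_KEY or GOOGLE_API_KEY in the environment:

```python
from google import genai

client = genai.Client()  # picks up GEMINI_API_KEY / GOOGLE_API_KEY from env

resp = client.models.generate_content(
    model="gemini-3-pro-preview",  # assumed preview ID; confirm in AI Studio
    contents="Summarize the trade-offs of a 1M-token context window.",
)
print(resp.text)
```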
19:14 Microsoft, NVIDIA, and Anthropic announce strategic partnerships – The Official Microsoft Blog
Continuing the messy breakups… Anthropic commits to $30 billion in Azure compute capacity, with up to one gigawatt of additional capacity, making this one of the largest cloud infrastructure commitments in AI history. This positions Azure as Anthropic’s primary scaling platform for Claude models. NVIDIA and Anthropic are establishing their first deep technology partnership focused on co-design and engineering optimization: Anthropic will optimize Claude models for NVIDIA Grace Blackwell and Vera Rubin systems, while NVIDIA will tune future architectures specifically for Anthropic workloads to improve performance, efficiency, and total cost of ownership. Claude models, including Sonnet 4.5, Opus 4.1, and Haiku 4.5, are now available through Microsoft Foundry on Azure, making Claude the only frontier model accessible across all three major cloud platforms (AWS, Azure, GCP). Azure enterprise customers gain expanded model choice beyond OpenAI offerings. Microsoft commits to maintaining Claude integration across its entire Copilot family, including GitHub Copilot, Microsoft 365 Copilot, and Copilot Studio. This ensures developers and enterprise users can leverage Claude capabilities within existing Microsoft productivity and development workflows. NVIDIA and Microsoft are investing up to $10 billion and $5 billion, respectively, in Anthropic as part of the partnership. So yes, that’s a lot of money going back and forth. The combined $15 billion investment represents substantial backing for Anthropic’s continued development and positions all three companies to benefit from Claude’s growth trajectory.

21:57 Jonathan – “I’m wondering what Anthropic’s plan is – what they’re working on in the background – because they have just taken a huge amount of capacity from AWS and their new data center in Northern Indiana, and now another 30 billion in Azure compute? I guess they’re still building models every day… that’s a lot of money flying around.”

Cloud Tools
23:17 Ingress NGINX Retirement: What You Need to Know | Kubernetes Contributors
Ingress NGINX, one of the most popular Kubernetes ingress controllers, having powered billions of requests worldwide, is being retired in March 2026 due to unsustainable maintenance burden and mounting technical debt. The project has struggled for years with only one or two volunteer maintainers working after hours, and despite its widespread use in hosted platforms and enterprise clusters, efforts to find additional support have failed. The retirement stems from security concerns around features that were once considered flexible but are now viewed as vulnerabilities, particularly the snippets annotations that allowed arbitrary NGINX configuration. The Kubernetes Security Response Committee and SIG Network exhausted all options to make the project sustainable before making this difficult decision to prioritize user safety over continuing an undermaintained critical infrastructure component. Users should immediately begin migrating to Gateway API, the modern replacement for Ingress that addresses many of the architectural issues that plagued Ingress NGINX. Existing deployments will continue to function and installation artifacts will remain available, but after March 2026 there will be zero security patches, bug fixes, or updates of any kind.
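Step one of any migration plan is confirming you actually run it. A quick exposure check with the official Kubernetes Python client, using the commonly documented ingress-nginx label (adjust the selector if your install customized labels):

```python
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a pod
v1 = client.CoreV1Api()

pods = v1.list_pod_for_all_namespaces(
    label_selector="app.kubernetes.io/name=ingress-nginx"
)
for pod in pods.items:
    print(pod.metadata.namespace, pod.metadata.name)

if not pods.items:
    print("No ingress-nginx pods found - likely not affected.")
```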
Alternative ingress controllers are plentiful and listed in the Kubernetes documentation, including cloud-provider-specific options and vendor-supported solutions. Users can check whether they are affected by running a simple kubectl command (or the API equivalent sketched above) to look for pods with the ingress-nginx selector across all namespaces. This retirement highlights a critical open source sustainability problem, where massively popular infrastructure projects can fail despite widespread adoption when companies benefit from the software but do not contribute maintainer resources back to the community.

24:39 Justin – “I’m actually surprised NGINX didn’t want to pick this up; it seems like an obvious move for F5 to pick up and maintain the Ingress NGINX controller. But what do I know?”

25:46 Replicate is joining Cloudflare
Cloudflare acquires Replicate, bringing its 50,000-plus model catalog and fine-tuning capabilities to Workers AI. This consolidates model discovery, deployment, and inference into a single platform backed by Cloudflare’s global network. The acquisition addresses the operational complexity of running AI models by combining Replicate’s Cog containerization tool with Cloudflare’s serverless infrastructure; developers can now deploy custom models and fine-tune without managing GPU hardware or dependencies. Existing Replicate APIs will continue functioning without interruption while gaining Cloudflare’s network performance. Workers AI users get access to proprietary models like GPT-5 and Claude Sonnet through Replicate’s unified API alongside open-source options. The integration extends beyond inference to include AI Gateway for observability and cost analytics, plus native connections to Cloudflare’s data stack, including R2 storage and the Vectorize database. This creates an end-to-end platform for building AI applications with state management and real-time capabilities. Replicate’s community features for sharing models, publishing fine-tunes, and experimentation will remain central to the platform. The acquisition positions Cloudflare to compete more directly with hyperscaler AI offerings by combining model variety with edge deployment.

27:09 Ryan – “Cloudflare has been doing kind of amazing things at the edge, which is kind of neat. We’ve had serverless and functions for a while, and definitely options out there that provide much better performance. It’s kind of neat. They’re well-positioned to do that.”
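The “unified API” being preserved is the one-call-per-model pattern Replicate users already know. A minimal sketch; the model slug is illustrative, and a REPLICATE_API_TOKEN is assumed in the environment:

```python
import replicate  # pip install replicate; reads REPLICATE_API_TOKEN from env

output = replicate.run(
    "black-forest-labs/flux-schnell",  # any catalog model slug works here
    input={"prompt": "a subsea cable landing station at dawn"},
)
print(output)
```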
28:02 KubeCon NA 2025 Recap: The Dawn of the AI Native Era | Blog
KubeCon 2025 marked the industry shift from cloud native to AI native, with the CNCF launching the Kubernetes AI Conformance Program to standardize how AI and ML workloads run across clouds and hardware accelerators like GPUs and TPUs. A live demo showed Dynamic Resource Allocation making accelerators first-class citizens in Kubernetes, signaling that AI infrastructure standardization is now a community priority. Harness showcased Agentic AI capabilities that transform traditional CI/CD pipelines into intelligent, adaptive systems that learn and optimize delivery automatically. Their booth demonstrated 17 integrated products spanning CI, CD, IDP, IaCM, security, testing, and FinOps, with particular emphasis on AI-powered pipeline creation and visual workflow design that caught significant attendee interest. Security emerged as a critical theme, with demonstrations of zero-CVE malware attacks that bypass traditional vulnerability scanners by compromising the build chain itself. The solution path involves supply chain attestation using SLSA, policy-as-code enforcement, and artifact signing with Sigstore, which Harness demonstrated as native capabilities in their platform. Apple introduced Apple Containerization, a framework running Linux containers directly on macOS using lightweight microVMs that boot minimal Linux kernels in under a second. This combines VM-level security with container speed, creating safer local development environments that could reshape how developers work on Mac hardware. The conference emphasized that AI native infrastructure requires intelligent scheduling, deeper observability, and verified agent identity using SPIFFE/SPIRE, with multiple sessions showing practical implementations at scale from companies like Yahoo, managing 8,000 nodes, and Spotify, handling a million infrastructure resources.

29:51 Justin – “Everyone has moved on from Kubernetes as the hotness; now it’s all AI, so what are people working on in the AI space?”

AWS
30:27 AWS Lambda enhances event processing with provisioned mode for SQS event-source mapping
AWS Lambda now offers provisioned mode for SQS event source mappings, providing 3x faster scaling and 16x higher concurrency (up to 20,000 concurrent executions) compared to the standard polling mode. This addresses customer demands for better control over event processing during traffic spikes, particularly for financial services and gaming companies requiring sub-second latency. The new provisioned mode uses dedicated event pollers that customers can configure with minimum and maximum values, where each poller handles up to 1 MB/sec of throughput, 10 concurrent invokes, or 10 SQS API calls per second. Setting a minimum number of pollers maintains baseline capacity for immediate response to traffic surges, while the maximum prevents downstream system overload. Pricing is based on Event Poller Units (EPUs), charged for the number of pollers provisioned and their duration, with a minimum of 2 event pollers required per event source mapping. Each EPU supports up to 1 MB/sec of throughput capacity, though AWS has not published specific per-EPU pricing in the announcement. The feature is available now in all commercial AWS Regions and can be configured through the AWS Console, CLI, or SDKs. Monitoring is handled through CloudWatch metrics, specifically the ProvisionedPollers metric, which tracks active event pollers in one-minute windows. This capability enables applications to handle up to 2 GB/sec of aggregate traffic while automatically scaling down to the configured minimum during low-traffic periods for cost optimization. The enhanced scaling detects growing backlogs within seconds and adjusts poller count dynamically between the configured limits.

31:36 Ryan – “Where was this 5 years ago when we were maintaining a logging platform? This would have been very nice!”
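Provisioned mode is configured per event source mapping. A sketch with boto3: ProvisionedPollerConfig is the field boto3 already exposes for provisioned-mode event source mappings, but treat the exact shape for SQS as an assumption until you check the current docs (the UUID is a placeholder):

```python
import boto3

lam = boto3.client("lambda")

lam.update_event_source_mapping(
    UUID="<your-esm-uuid>",          # the SQS event source mapping to tune
    ProvisionedPollerConfig={
        "MinimumPollers": 2,         # floor; the feature requires at least 2
        "MaximumPollers": 200,       # cap to protect downstream systems
    },
)
```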
33:30 Amazon EventBridge introduces enhanced visual rule builder
EventBridge launches a new visual rule builder that integrates the Schema Registry with a drag-and-drop canvas, allowing developers to discover and subscribe to events from over 200 AWS services and custom applications without referencing individual service documentation. The schema-aware interface helps reduce syntax errors when creating event filter patterns and rules. The enhanced builder includes a comprehensive event catalog with readily available sample payloads and schemas, eliminating the need to hunt through documentation for event structures. This addresses a common pain point: developers previously had to manually locate and understand event formats across different AWS services. It is available now in all regions where the Schema Registry has launched, at no additional cost beyond standard EventBridge usage charges. The feature is accessible through the EventBridge console and aims to reduce development time for event-driven architectures. The visual builder particularly benefits teams building complex event-driven applications that need to filter and route events from multiple sources. By providing schema validation upfront, it helps catch configuration errors before deployment rather than at runtime.

34:46 Matt – “I definitely – back in the day – had lots of fun with EventBridge, and trying to make sure I got the schemas right for every frame when you’re trying to trigger one thing from another. So not having to deal with that mess is exponentially better. You know, at this point, though, I feel like I would just tell AI to tell me what the schema was and solve the problem that way.”

35:43 Application Load Balancer supports client credentials flow with JWT verification
ALB now handles JWT token verification natively at the load balancer layer, eliminating the need for custom authentication code in backend applications. This offloads OAuth 2.0 token validation, including signature verification, expiration checks, and claims validation, directly to the load balancer, reducing complexity in microservices architectures. The feature supports the Client Credentials Flow and other OAuth 2.0 flows, making it particularly useful for machine-to-machine and service-to-service authentication scenarios. Organizations can now centralize token validation at the edge rather than implementing it repeatedly across multiple backend services. The capability is available immediately in all AWS regions where ALB operates, with no additional feature charges beyond standard load balancer pricing: customers pay only the existing ALB hourly rates and Load Balancer Capacity Units (LCUs) consumed. The implementation reads JWTs from request headers and validates them against configured JSON Web Key Set (JWKS) endpoints, supporting integration with identity providers like Auth0, Okta, and AWS Cognito. Failed validation results in configurable HTTP error responses before requests reach backend targets. This addresses a common pain point in API gateway and microservices deployments, where each service previously needed its own token validation logic. The centralized approach reduces code duplication and potential security inconsistencies across service boundaries.

38:40 Jonathan – “Maybe this is kind of a sign that Cognito is not gaining the popularity they wanted. Because effectively, you could re-spin this announcement as Auth0 and Okta are now first-class citizens when it comes to authentication through API Gateway and ALB.”
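For a sense of what’s being offloaded, this is the sort of per-service JWKS validation boilerplate (sketched here with PyJWT; the endpoint and audience are placeholders) that can now live at the listener instead of in every service:

```python
import jwt
from jwt import PyJWKClient

JWKS_URL = "https://idp.example.com/.well-known/jwks.json"  # your IdP's JWKS

def validate(token: str) -> dict:
    signing_key = PyJWKClient(JWKS_URL).get_signing_key_from_jwt(token)
    # Signature, expiry, and audience checks: the work ALB now performs
    # before a request ever reaches the target group.
    return jwt.decode(
        token,
        signing_key.key,
        algorithms=["RS256"],
        audience="my-api",
    )
```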
GCP
39:10 How Protective ReRoute improves network resilience | Google Cloud Blog
Google Cloud’s Protective ReRoute (PRR) shifts network failure recovery from centralized routers to distributed endpoints, allowing hosts to detect packet loss and immediately reroute traffic to alternate paths. This host-based approach has reduced inter-datacenter outages from slow network convergence by up to 84 percent since deployment five years ago, with recovery times measured in single-digit multiples of round-trip time rather than seconds or minutes. PRR works by having hosts continuously monitor path health using TCP retransmission timeouts, then modifying IPv6 flow-label headers to signal the network to use alternate paths when failures occur. Google contributed this IPv6 flow-label modification mechanism to Linux kernel 4.20 and later, making it available as open source technology for the broader community. The feature is particularly critical for AI and ML training workloads, where even brief network interruptions can cause expensive job failures and restarts costing millions in compute time. Large-scale distributed training across multiple GPUs and TPUs requires the ultra-reliable data distribution that PRR provides to prevent communication-pattern disruptions. Google Cloud customers can use PRR in two modes: hypervisor mode, which automatically protects cross-datacenter traffic without guest OS changes, or guest mode for the fastest recovery, which requires Linux kernel 4.20+, TCP applications, and IPv6 traffic (or the gVNIC driver for IPv4). Documentation is available at cloud.google.com/compute/docs/networking for enabling guest-mode PRR on critical workloads. The architecture treats the network as a highly parallel system where reliability increases exponentially with available paths rather than degrading serially through forwarding stages. This approach capitalizes on Google’s network path diversity to protect real-time applications, frequent short-lived connections, and data-integrity scenarios where packet loss causes corruption beyond just throughput reduction.

40:57 Ryan – “I was trying to think how I would even implement something like this in guest mode because it breaks my head. It seems pretty cool, and I’m sure that from an underlying technology at the infrastructure level, from the Google network, it sounds pretty neat. But the coordination of that failover seems very complex. And I would worry.”

41:54 Introducing the Emerging Threats Center in Google Security Operations | Google Cloud Blog
Google Security Operations launches the Emerging Threats Center, a Gemini-powered detection engineering system that automatically generates security rules when new threat campaigns emerge from Google Threat Intelligence, Mandiant, and VirusTotal. The system addresses a key pain point: 59% of security leaders report difficulty deriving actionable intelligence from threat data, typically requiring days or weeks of manual work to assess organizational exposure. The platform provides two critical capabilities for security teams during major threat events: it automatically searches the previous 12 months of security telemetry for campaign-related indicators of compromise and detection rule matches, while also confirming active protection through campaign-specific detections. This eliminates the manual cross-referencing process that traditionally occurs when zero-day vulnerabilities emerge. Under the hood, the system uses an agentic workflow in which Gemini ingests threat intelligence from Mandiant incident response and Google’s global visibility, generates synthetic event data mimicking adversary tactics, tests existing detection rules for coverage gaps, and automatically drafts new rules when gaps are found. Human security analysts maintain final approval before deployment, transforming detection engineering from a best-effort manual process into a systematic automated workflow.
The Emerging Threats Center is available today for licensed Google Security Operations customers, though specific pricing details were not disclosed in the announcement. Organizations with high-volume security operations like Fiserv are already using the behavioral detection capabilities to move beyond single indicators toward systematic adversary behavior detection.

44:40 Jonathan – “I see this as very much a CrowdStrike-type AI solution for Google Cloud, in a way. Looking at the data, you’re identifying emerging threats, which is what CrowdStrike’s sales point really is, and then implementing controls to help quench that.”

47:56 Introducing Dhivaru and two new connectivity hubs | Google Cloud Blog
Google is investing in Dhivaru, a new trans-Indian Ocean subsea cable connecting the Maldives, Christmas Island, and Oman, extending the Australia Connect initiative to improve regional connectivity. The cable system aims to support growing demand for AI services like Gemini 2.5 Flash and Vertex AI by providing resilient infrastructure across the Indian Ocean region. The announcement includes two new connectivity hubs in the Maldives and Christmas Island that will provide three core capabilities: cable switching for automatic traffic rerouting during faults, content caching to reduce latency by storing popular content locally, and colocation services offering rack space to carriers and local companies. These hubs are positioned to serve Africa, the Middle East, South Asia, and Oceania with improved reliability. Google emphasizes the energy efficiency of subsea cables compared to traditional data centers, noting that connectivity hubs require significantly less power since they focus on networking and localized storage rather than compute-intensive AI and cloud workloads. The company is exploring ways to use power demand from these hubs to accelerate local investment in sustainable energy generation in smaller locations. The connectivity hubs will provide strategic benefits by minimizing the distance data travels before switching paths, which improves resilience and reduces downtime for services across the region. This infrastructure investment aims to strengthen local economies while supporting Google’s objective of serving content from locations closer to users and customers. The project represents Google’s continued infrastructure expansion to meet long-term demand driven by AI adoption rates that are outpacing predictions, with partnerships including Ooredoo Maldives and Dhiraagu supporting the Maldives hub deployment.

49:38 Matthew – “I had to look up one connectivity hub, which is literally just a small little data center that just kind of handles basic networking and storage – and nothing fancy, which is interesting. They’re dropping these hubs where all their cables terminate, so they are able to cache stuff at each location, which is always interesting.”

Azure
51:46 Infinite scale: The architecture behind the Azure AI superfactory – The Official Microsoft Blog
Microsoft announces its second Fairwater datacenter in Atlanta, connecting it to the Wisconsin site and existing Azure infrastructure to create what they call a planet-scale AI superfactory. The facility uses a flat network architecture to integrate hundreds of thousands of NVIDIA GB200 and GB300 GPUs into a unified supercomputer for training frontier AI models.
The datacenter achieves 140kW per-rack power density through closed-loop liquid cooling that uses water equivalent to just 20 homes’ annual consumption and is designed to last 6+ years without replacement. The two-story building design minimizes cable lengths between GPUs to reduce latency, while the site secures four-nines-availability power at three-nines cost by relying on resilient grid power instead of traditional backup systems. Each rack houses up to 72 NVIDIA Blackwell GPUs connected via NVLink, with 1.8TB/s of GPU-to-GPU bandwidth and roughly 14TB of pooled memory per rack. The facility uses a two-tier, Ethernet-based backend network with 800Gbps GPU-to-GPU connectivity running on SONiC to avoid vendor lock-in and reduce costs compared to proprietary solutions. Microsoft deployed a dedicated AI WAN backbone with over 120,000 new fiber miles across the US last year to connect Fairwater sites and other Azure datacenters. This allows workloads to span multiple geographic locations and enables dynamic allocation between training, fine-tuning, reinforcement learning, and synthetic data generation based on specific requirements. The architecture addresses the challenge that large training jobs now exceed single-facility power and space constraints by creating fungibility across sites. Customers can segment traffic across scale-up networks within sites and scale-out networks between sites, maximizing GPU utilization across the combined system rather than being limited to a single datacenter.

55:25 Private Preview: Azure HorizonDB
Azure HorizonDB for PostgreSQL enters private preview as Microsoft’s performance-focused database offering, featuring autoscaling storage up to 128 TB and compute scaling to 3,072 vCores. The service claims up to 3 times faster performance compared to open-source PostgreSQL, positioning it as a competitor to AWS Aurora and Google Cloud AlloyDB in the managed PostgreSQL space. The 128 TB storage ceiling represents a substantial increase over Azure’s existing PostgreSQL offerings, addressing enterprise workloads that previously required sharding or migration to other platforms. This storage capacity, combined with the high vCore count, targets large-scale OLTP and analytical workloads that need both horizontal and vertical scaling options. Microsoft appears to be building HorizonDB as a separate service line rather than an upgrade to the existing Azure Database for PostgreSQL Flexible Server, suggesting different architecture and pricing models. Organizations currently using Azure Database for PostgreSQL will need to evaluate migration paths and cost implications when the service reaches general availability. The private preview status means limited customer access and no published pricing information yet. Enterprises interested in testing HorizonDB should expect typical private preview constraints, including potential feature changes, regional limitations, and SLA restrictions before general availability.

57:35 Jonathan – “So it sounds like they’ve pretty much built what Amazon did with Aurora, separating the storage from the compute to let them scale independently.”

59:10 Public Preview: Microsoft Defender for Cloud + GitHub Advanced Security
Microsoft Defender for Cloud now integrates natively with GitHub Advanced Security in public preview, creating a unified security workflow that spans from source code repositories through production cloud environments.
This integration allows security teams and developers to work within a single platform rather than switching between separate tools for code scanning and cloud protection. The solution addresses the full application-lifecycle security challenge by connecting GitHub’s code-level vulnerability detection with Defender for Cloud’s runtime protection capabilities. Organizations using both GitHub and Azure can now correlate security findings from development through deployment, reducing the gap between DevOps and SecOps teams. The preview targets cloud-native application teams who need consistent security policies across their CI/CD pipeline and production workloads. The integration is particularly relevant for organizations already invested in the Microsoft and GitHub ecosystem, as it leverages existing tooling rather than requiring additional third-party solutions. The announcement provides limited details on pricing, though organizations should expect costs to align with existing Defender for Cloud and GitHub Advanced Security licensing models. Specific regional availability and rollout timeline details were not included in the brief announcement.

1:00:35 Matthew – “It seems like it has a lot of potential, but without the pricing, and with Defender as a CSPM, I feel like – for me – it lacks some features when I’ve tried to use it. They’re going in the right direction; I don’t think they’re at the end product yet.”

1:03:05 Public Preview: Smart Tier account-level tiering (Azure Blob Storage and ADLS Gen2)
Azure introduces Smart Tier for Blob Storage and ADLS Gen2, which automatically moves data between hot, cool, and archive tiers based on access patterns without manual intervention. This eliminates the need for lifecycle management policies and reduces the operational overhead of managing storage costs across large data estates. The feature works at the account level rather than requiring per-container or per-blob configuration, making it simpler to deploy across entire storage accounts. Organizations with unpredictable access patterns or mixed workloads will benefit most, as the system continuously optimizes placement without predefined rules. Smart Tier monitors blob access patterns and automatically transitions objects to lower-cost tiers when appropriate, then moves them back to hot storage when access frequency increases. This differs from traditional lifecycle policies, which rely on age-based rules and cannot respond dynamically to actual usage. The public preview allows customers to test the automated tiering without committing production workloads, though specific pricing for the Smart Tier feature itself was not disclosed in the announcement. Standard Azure Blob Storage tier pricing applies, with the hot tier being the most expensive and the archive tier offering the lowest storage costs but higher retrieval fees. This capability targets customers managing large volumes of data with variable access patterns, particularly in analytics, backup, and archival scenarios where manual tier management becomes impractical at scale. The integration with ADLS Gen2 makes it relevant for big data and analytics workloads running on Azure.

1:05:18 Jonathan – “So they’ve always had the tiering, but now they’re providing an easy button for you based on access patterns.”
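For contrast, this is the kind of static, age-based lifecycle rule Smart Tier is meant to make unnecessary. A sketch with azure-mgmt-storage, where the day thresholds are guesses that cannot react to actual access patterns (subscription, group, and account names are placeholders; confirm the exact policy shape against the SDK docs):

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient

client = StorageManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Age-based rules: cool after 30 days, archive after 180, whether or not
# the data is still being read. Smart Tier's pitch is replacing these
# static thresholds with observed access patterns.
client.management_policies.create_or_update(
    "my-resource-group",
    "mystorageaccount",
    "default",
    {
        "policy": {
            "rules": [{
                "enabled": True,
                "name": "age-based-cooling",
                "type": "Lifecycle",
                "definition": {
                    "filters": {"blob_types": ["blockBlob"]},
                    "actions": {"base_blob": {
                        "tier_to_cool": {"days_after_modification_greater_than": 30},
                        "tier_to_archive": {"days_after_modification_greater_than": 180},
                    }},
                },
            }]
        }
    },
)
```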
1:13:04 From idea to deployment: The complete lifecycle of AI on display at Ignite 2025 – The Official Microsoft Blog
Microsoft Ignite 2025 introduces three intelligence layers for AI development: Work IQ connects Microsoft 365 data and user patterns, Fabric IQ unifies analytical and operational data under a shared business model, and Foundry IQ provides a managed knowledge system routing across multiple data sources. These layers work together to give AI agents business context rather than requiring custom integrations for each data source. Microsoft Agent Factory offers a single metered plan for building and deploying agents across Microsoft 365 Copilot and Copilot Studio without upfront licensing requirements. The program includes access to AI Forward Deployed Engineers and role-based training, targeting organizations that want to build custom agents but lack internal AI expertise or want to avoid complex provisioning processes. Microsoft Agent 365 provides centralized observability, management, and security for AI agents, regardless of whether they were built with Microsoft platforms, open-source frameworks, or third-party tools. With IDC projecting 1.3 billion AI agents by 2028, this addresses the governance gap where unmanaged agents become shadow IT, integrating Defender, Entra, Purview, and the Microsoft 365 admin center for agent lifecycle management. Work IQ now exposes APIs for developers to build custom agents that leverage the intelligence layer’s understanding of user workflows, relationships, and content patterns. This allows organizations to extend Microsoft 365 Copilot capabilities into their own applications while maintaining the native integration advantages rather than relying on third-party connectors. The announcements position Microsoft as providing end-to-end AI infrastructure from the datacenter to the application layer, with particular emphasis on making agent development accessible to frontline workers rather than limiting it to specialized AI teams. No specific pricing details were provided for the new services beyond the mention of metered plans for Agent Factory.

Closing
And that is the week in the cloud! Visit our website, the home of the Cloud Pod, where you can join our newsletter, Slack team, send feedback, or ask questions at theCloudPod.net or tweet at us with the hashtag #theCloudPod
Nov 21
Welcome to episode 329 of The Cloud Pod, where the forecast is always cloudy (and if you’re in California, rainy too)! Justin and Matt have taken a break from Ark-building activities to bring you this week’s episode, packed with all the latest in cloud and AI news, including undersea cables (our favorite!), FinOps, Ignite predictions, and so much more! Grab your umbrellas and let’s get started!

Titles we almost went with this week
Fastnet and Furious: AWS Lays 320 Terabits of Cable Across the Atlantic
No More kubectl apply –pray: AWS Backup Takes the Stress Out of EKS Recovery
AWS Gets Swift with Lambda: No Taylor Version Required
Breaking Up Is Hard to Do: Microsoft Splits Teams from Office
FinOps and Behold: Google Automates Your Cloud Budget Nightmares
AMD Turin Around GCP’s Price-Performance with N4D VMs
Azure Gets Territorial: Your Data Stays Put Whether It Likes It or Not
AWS Finally Answers “Is It Available in My Region?” Before You Build It
Getting to the Bare Metal of Things: Google’s Axion Goes Commando
Azure Ultra Disk Gets Ultra Serious About Latency
Container Size Matters: Azure Expands ACI to 240 GB Memory
Google Containerises Chaos: Agent Sandbox Keeps Your AI from Going Rogue
AWS Prints Money While Amazon Prints Pink Slips: Q3 Earnings Beat

Follow Up
02:08 Microsoft sidesteps hefty EU fine with Teams unbundling deal
Microsoft avoids a potentially substantial EU antitrust fine by agreeing to unbundle Teams from the Office 365 and Microsoft 365 suites for a period of seven years. The settlement follows a 2023 complaint from Salesforce-owned Slack alleging anticompetitive bundling practices that harmed rival collaboration tools. The commitments require Microsoft to offer Office and Microsoft 365 suites without Teams at reduced prices, with a 50 percent larger price difference between bundled and unbundled versions. Customers with long-term licenses can switch to Teams-free suites, addressing concerns about forced adoption of the collaboration platform. Microsoft must provide interoperability between competing collaboration tools and its products, plus enable data portability from Teams to rival services. These technical requirements aim to level the playing field for competitors like Slack and Zoom in the European enterprise collaboration market. The settlement applies specifically to the European Union market and stems from Microsoft’s dominant position in productivity software. Organizations using Microsoft 365 in the EU will now have a genuine choice in selecting collaboration tools without being locked into Teams through bundling. This decision sets a precedent for how cloud software vendors can package integrated services, particularly when holding dominant market positions. The seven-year commitment period and mandatory interoperability requirements could influence how Microsoft and competitors structure product offerings globally.

General News
08:30 It’s Earnings Time! (Warning: turn down your volume)
Amazon’s stock soars on earnings, revenue beat, spending guidance
Yes, we know there’s a little delay in our reporting here, but it’s still important! (To Justin, anyway.) AWS grew revenue 20% year-over-year to $33 billion in Q3, generating $11.4 billion in operating income, which represents two-thirds of Amazon’s total operating profit. While this growth trails Google Cloud’s 34% and Azure’s 40%, AWS maintains its position as the leading cloud infrastructure provider.
Amazon increased its 2025 capital expenditure forecast to $125 billion, up from $118 billion, with CFO Brian Olsavsky indicating further increases are expected in 2026. This spending exceeds Google, Meta, and Microsoft’s capex guidance and signals Amazon’s commitment to AI infrastructure despite concerns about missing out on high-profile AI cloud deals. Amazon’s Q4 revenue guidance of $206-213 billion (midpoint $209.5 billion) exceeded analyst expectations of $208 billion, driven by strong performance in both AWS and the digital advertising business, which grew 24% to $17.7 billion. The company’s overall revenue reached $180.17 billion, beating estimates of $177.8 billion. The company also announced 14,000 corporate layoffs this week, which CEO Andy Jassy attributed to organizational culture and reducing bureaucratic layers rather than financial pressures or AI automation. Amazon’s total workforce stands at 1.58 million employees, representing a 2% year-over-year increase despite the cuts.

06:14 Justin – “There’s a lot of investors starting to question some of the dollars being spent on (AI). It’s feeling very .com boom-y. Let’s not do that again.”

06:46 Alphabet stock jumps 4% after strong earnings results, boost in AI spend
Alphabet increased AI infrastructure spending guidance to $91-93 billion for the year, up from $85 billion previously, driven by strong Google Cloud demand. CEO Sundar Pichai reported a $155 billion backlog for Google Cloud at quarter’s end, with the CFO signaling significant capex increases expected in 2026. Google Cloud contributed to Alphabet’s first-ever $100 billion revenue quarter, with total Q3 revenue reaching $102.35 billion and beating analyst expectations by $2.5 billion. The company’s earnings of $3.10 per share significantly exceeded the $2.33 analyst consensus. Google Search revenue grew 15% year-over-year to $56.56 billion, indicating that AI integration in search is proving to be an opportunity rather than a threat to the core business. Analysts noted this addresses previous concerns about AI disrupting Google’s search dominance. Wall Street analysts raised price targets substantially following the results, with Goldman Sachs increasing from $288 to $330 and JPMorgan raising from $300 to $340. Deutsche Bank characterized the earnings as having virtually no negative aspects across any business segment.

08:03 Matt – “The 15% of revenue for Google Search year over year feels like massive growth, but I still don’t really understand how they track that. It’s not like there’s 15% more people using Google than before, but that’s the piece I don’t really understand still.”

08:27 Microsoft (MSFT) Q1 2026 earnings report
Microsoft Azure revenue grew 40% year-over-year in Q1 fiscal 2026, beating analyst expectations of 38.2% growth and driving the Intelligent Cloud segment to $30.9 billion in total revenue. The company’s AI infrastructure investments continue to pay off as Azure cloud services reached over $75 billion in annual revenue for fiscal 2025. Microsoft took a $3.1 billion accounting hit to net income this quarter related to its OpenAI investment, equivalent to a 41-cent-per-share impact on earnings. Despite this, the company still beat earnings expectations at $3.72 per share versus the expected $3.67, with overall revenue reaching $77.67 billion. Capital expenditure came in at $34.9 billion for the quarter, and CFO Amy Hood indicated that capex growth will accelerate throughout fiscal 2026 rather than slow down as previously suggested.
This aggressive infrastructure spending caused the stock to drop 4% in after-hours trading despite the strong revenue performance. Microsoft now holds a 27% stake in OpenAI’s for-profit entity worth approximately $135 billion, following the company’s restructuring announcement. This formalized partnership structure clarifies the relationship between the two companies as Azure continues to serve as the primary infrastructure platform for OpenAI’s services. The quarter’s results were overshadowed by a significant Azure and Microsoft 365 outage that occurred on the same day as earnings, affecting various websites and gaming services for several hours. Microsoft expected full recovery by that evening, but the timing highlights ongoing reliability concerns as the company scales its cloud infrastructure.

09:27 Azure Front Door RCA
What happened: Azure Front Door and CDN experienced an 8+ hour outage (Oct 29-30, 2025), causing connection timeouts and DNS failures across numerous Azure and Microsoft services, including the Azure Portal, Microsoft 365, Entra ID, and many others.
Root cause: A valid customer configuration change exposed a latent bug when processed across different control plane versions, creating incompatible metadata that crashed data plane services. The crash occurred asynchronously (~5 minutes delayed), allowing it to pass through safety checks undetected.
Why it spread globally: The defective configuration propagated to all edge sites within 4 minutes (15:39 UTC) and was mistakenly saved as the “Last Known Good” snapshot before crashes began appearing at 15:41 UTC, making rollback impossible.
Recovery approach: Rather than reverting to the corrupted LKG, Microsoft manually removed the problematic configurations and performed a careful phased redeployment across all edge sites, completing full mitigation by 00:05 UTC (~8.5 hours total).
Prevention measures: Microsoft has completed synchronous config processing, added pre-canary validation stages, reduced recovery time from 4.5 hours to 1 hour, and is working on traffic isolation and further improvements through mid-2026.
Are you interested in the video version of this information? You can find that here.

14:23 PREDICTIONS FOR IGNITE
Matt
ACM Competitor – True SSL competitive product
AI announcement in Security
AI Agent (Copilot for Sentinel)
Azure DevOps Announcement
Justin
New Cobalt and Maia Gen 2 or similar
Price Reduction on OpenAI & Significant Prompt Caching
Microsoft Foundational LLM to compete with OpenAI
Jonathan (who isn’t here)
The general availability of new, smaller, and more power-efficient Azure Local hardware form factors
Declarative AI on Fabric: a move towards a declarative model, where users state the desired outcome and the AI agent system determines the steps needed to achieve it within the Fabric ecosystem
Advanced Cost Management: granular dashboards to track token and compute consumption per agent or per transaction, enabling businesses to forecast costs and set budgets for their agent workforce
How many times will they say Copilot?
Jonathan: 45
Justin: 35
Matt: 40
Honorable mentions: Claude for Azure; AI Autonomous Agent Platform

Cloud Tools
26:47 Apptio expands its FinOps tools for cloud cost control – SiliconANGLE
IBM-owned Apptio launches Cloudability Governance with Terraform integration to provide real-time cost estimation and policy compliance at deployment time.
Platform engineers can now see cost impacts before deploying infrastructure through version control systems like GitHub, addressing the problem where 55% of business leaders lack adequate visibility into technology-spending ROI. Kubecost 3.0 adds GPU-specific monitoring capabilities through Nvidia’s Data Center GPU Manager exporter, providing the utilization and memory metrics critical for AI workloads. The container-agnostic platform works across on-premises and cloud Kubernetes environments, with bidirectional integration into Cloudability’s FinOps suite for unified cost visibility. The platform addresses common tagging blind spots by automatically identifying resource initiators and applying ownership tags when teams forget. It also supports synthetic tags that map to business units, processing trillions of rows of cost data monthly to detect over-provisioning and committed-instance discount opportunities. AI workload acceleration has increased the velocity of cloud spending rather than creating new blind spots, with GPU costs potentially reaching thousands of dollars per hour. Real-time visibility becomes essential when infrastructure costs can scale this rapidly, making proactive cost governance more important than reactive monitoring. The Terraform integration positions Apptio to intercept infrastructure deployments before they happen, shifting FinOps from reactive cost analysis to proactive cost prevention. This represents a meaningful evolution in cloud cost management by embedding financial controls directly into the infrastructure provisioning workflow.

33:03 Matt – “I’ve set these up in my pipelines before… It’s always nice to see, and it’s good if you’re launching net new, but for general PRs, it’s just more noise. It kind of needed these tools.”

AWS
28:44 AWS rolls out Fastnet subsea cable connecting the U.S. and Ireland
AWS announces Fastnet, a dedicated transatlantic subsea cable connecting Maryland to County Cork, Ireland, with 320+ terabits per second of capacity when operational in 2028. The system uses unique landing points away from traditional cable corridors to provide route diversity and network resilience for AWS customers running cloud and AI workloads. The cable features advanced optical-switching branching unit technology that allows future topology changes and can redirect data to new landing points as network demands evolve. This architecture specifically targets growing AI traffic loads and integrates directly with AWS services like CloudFront and Global Accelerator for rapid data rerouting. AWS’s centralized traffic monitoring system provides complete visibility across the global network and implements millions of daily optimizations to route customer traffic along the most performant paths. This differs from public internet routing, where individual devices make decisions with limited network visibility, helping avoid congestion before it impacts applications. The infrastructure investment includes Community Benefit Funds for both Maryland’s Eastern Shore and County Cork to support local initiatives, including STEM education, workforce development, and sustainability programs. AWS worked with local organizations and residents from project inception to align the deployment with community priorities. With this addition, AWS’s global fiber network now spans over 9 million kilometers of terrestrial and subsea cabling across 38 regions and 120 availability zones.
The automated network management tools resolve 96 percent of network events without human intervention through services like Elastic Load Balancing and CloudWatch.

29:24 Matt – “The speed of this is ridiculous. 320-plus terabits per second – that is a lot of data to go at once!”

30:20 Introducing AWS Capabilities by Region for easier Regional planning and faster global deployments | AWS News Blog
AWS launched Capabilities by Region, a new planning tool that lets you compare service availability, API operations, CloudFormation resources, and EC2 instance types across multiple AWS Regions simultaneously. The tool addresses a common customer pain point by providing visibility into which AWS features are available in different Regions, and it includes forward-looking roadmap information showing planned launch quarters. The tool helps solve practical deployment challenges like ensuring compliance with data residency requirements, planning disaster recovery architectures, and avoiding costly rework from discovering Regional limitations mid-project. You can filter results to show only common features available across all selected Regions, making it easier to design portable architectures. Beyond the web interface, AWS made the Regional capability data accessible through the AWS Knowledge MCP Server, enabling automation of Region expansion planning and integration into CI/CD pipelines. The MCP server is publicly accessible at no cost without requiring an AWS account, though it is subject to rate limits. The tool provides detailed visibility into infrastructure components, including specific EC2 instance types like Graviton-based and GPU-enabled variants, helping you verify whether specialized compute resources are available in target Regions before committing to an architecture. This level of granularity extends to CloudFormation resource types and individual API operations for services like DynamoDB and API Gateway.

30:36 Justin – “Thank you. I’ve wanted this for a long time. You put it in a really weird UI choice, but I do appreciate that it’s there.”

32:10 Secure EKS clusters with the new support for Amazon EKS in AWS Backup | AWS News Blog
AWS Backup now supports Amazon EKS clusters, providing centralized backup and restore capabilities for both Kubernetes configurations and persistent data stored in EBS, EFS, and S3. This eliminates the need for custom scripts or third-party tools that previously required complex maintenance across multiple clusters. The service includes policy-based automation for protecting single or multiple EKS clusters with immutable backups to meet compliance requirements. During restore operations, AWS Backup can now provision a new EKS cluster automatically based on previous configuration settings, removing the requirement to pre-provision target infrastructure. Restore operations are non-destructive, meaning they apply only the delta between backup and source rather than overwriting existing data or Kubernetes versions. Customers can restore full clusters, individual namespaces to existing clusters, or specific persistent storage resources if partial backup failures occur. The feature is available in all AWS commercial regions except China, and in AWS GovCloud (US) where both AWS Backup and Amazon EKS are supported. Pricing follows standard AWS Backup rates based on backup storage consumed and data transfer, with costs varying by region and storage tier.
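Beyond scheduled backup plans, you can kick off an on-demand job per cluster. A sketch with boto3: start_backup_job is the standard AWS Backup call, and passing an EKS cluster ARN follows the pattern of other supported resources (names and ARNs are placeholders; confirm the resource-type specifics in the EKS backup docs):

```python
import boto3

backup = boto3.client("backup")

job = backup.start_backup_job(
    BackupVaultName="eks-vault",
    ResourceArn="arn:aws:eks:us-east-1:123456789012:cluster/prod",
    IamRoleArn="arn:aws:iam::123456789012:role/aws-backup-default-role",
)
print("started backup job:", job["BackupJobId"])
```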
Salesforce highlighted the business impact, noting that losing a Kubernetes control plane due to software bugs or accidental deletion can be catastrophic without proper backup capabilities. This native integration addresses a critical resiliency gap for organizations running production EKS workloads at scale.

33:07 Matt – “It’s the namespace level that they can deploy or backup and restore to that, to me, is great. I could see this being a SaaS company that runs their application in Kubernetes, and they have a namespace per customer, and having that ability to have that single customer backed up and be able to restore that is fantastic. So while it sounds like a minor release, if you’re in the Kubernetes ecosystem, it will just make your life better.”

33:53 Jupyter Deploy: Create a JupyterLab application with real-time collaboration in the cloud in minutes | AWS Open Source Blog

Jupyter Deploy is an open source CLI tool from AWS that lets small teams and startups deploy a fully configured JupyterLab environment to the cloud in minutes, solving the problem of expensive enterprise deployment frameworks. The tool automatically sets up EC2 instances with HTTPS encryption, GitHub OAuth authentication, real-time collaboration features, and a custom domain without requiring manual console configuration.

The CLI uses infrastructure-as-code templates with Terraform to provision AWS resources, making it simple to upgrade instance types for GPU workloads, add storage volumes, or manage team access through a single command. Users can easily scale from a basic t3.medium instance to GPU-accelerated instances when they need more compute power for deep learning tasks.

Real-time collaboration is a key differentiator, allowing multiple team members to work simultaneously in the same JupyterLab environment after authenticating through GitHub, eliminating the security and access limitations of running Jupyter locally on laptops. The tool includes cost management features like the ability to stop instances when not in use while preserving state and file systems.

The project is vendor-neutral and extensible, with AWS planning to add Kubernetes templates for Amazon EKS and welcoming community contributions for other cloud providers, OAuth providers, and deployment patterns. Templates are distributed as Python libraries that the CLI automatically discovers, making it easy for the community to create and share new deployment configurations.

34:51 Justin – “A lot of people, especially in their AI workloads, they don’t want to use SageMaker for that necessarily; they want their own deployment of a cluster. And so there was just some undifferentiated heavy lifting that was happening, and so I think this helps address some of that.”

GCP

35:09 Agentic AI on Kubernetes and GKE | Google Cloud Blog

Agent Sandbox is a new Kubernetes primitive designed specifically for running AI agents that need to execute code or use computer interfaces, providing kernel-level isolation through gVisor and Kata Containers. This addresses the security challenge of AI agents making autonomous decisions about tool usage, where traditional application security models fall short.

On GKE, Agent Sandbox delivers sub-second latency for isolated agent workloads through pre-warmed sandbox pools, representing up to 90% improvement over cold starts. The managed implementation leverages GKE Sandbox and container-optimized compute for horizontal scaling of thousands of ephemeral sandbox environments.
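Because a sandbox is ultimately a Kubernetes custom resource, creating one from Python is just a dynamic-client call. A rough sketch is below; the API group, version, kind, and spec fields are assumptions for illustration only, so check agent-sandbox.sigs.k8s.io for the real schema.

```python
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

# Hypothetical Sandbox resource: group/version/kind and spec fields are
# assumed here, not taken from the project's actual CRD.
sandbox = {
    "apiVersion": "agents.x-k8s.io/v1alpha1",
    "kind": "Sandbox",
    "metadata": {"name": "codegen-run-42"},
    "spec": {
        "runtimeClassName": "gvisor",  # kernel-level isolation via gVisor
        "podTemplate": {
            "spec": {
                "containers": [{
                    "name": "runner",
                    "image": "python:3.12-slim",
                    "command": ["python", "-c", "print('agent code runs here')"],
                }],
            },
        },
    },
}

api.create_namespaced_custom_object(
    group="agents.x-k8s.io",
    version="v1alpha1",
    namespace="default",
    plural="sandboxes",
    body=sandbox,
)
```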
Pod Snapshots is a GKE-exclusive feature in limited preview that enables checkpoint and restore of running pods, reducing startup times from minutes to seconds for both CPU and GPU workloads. This allows teams to snapshot idle sandboxes and suspend them to save compute costs while maintaining the ability to quickly restore them to a specific state.

The project includes a Python SDK designed for AI engineers to manage sandbox lifecycles without requiring deep infrastructure expertise, while still providing Kubernetes administrators with operational control. Agent Sandbox is available as an open source CNCF project and can be deployed on GKE today, with documentation at agent-sandbox.sigs.k8s.io.

Primary use cases include agentic AI systems that need to execute generated code safely, reinforcement learning environments requiring rapid provisioning of isolated compute, and computer use scenarios where agents interact with terminals or browsers. The isolation model prevents potential data exfiltration or damage to production systems from non-deterministic agent behavior.

36:49 Matt – “Anything that can make these environments, especially if they are ephemeral, scale up and down better so you’re not burning time and capacity on your GPUs – that are not cheap – is definitely useful. So it’d be a nice little money saver along the way.”

37:09 Ironwood TPUs and new Axion-based VMs for your AI workloads | Google Cloud Blog

Google announces Ironwood, its seventh-generation TPU, delivering 10X peak performance improvement over TPU v5p and 4X better performance per chip than TPU v6e for both training and inference workloads. The system scales up to 9,216 chips in a superpod with 9.6 Tb/s interconnect speeds and 1.77 petabytes of shared HBM, featuring Optical Circuit Switching for automatic failover. Anthropic plans to access up to 1 million TPUs and reports that the performance gains will help scale Claude efficiently.

New Axion-based N4A instances enter preview, offering up to 2X better price-performance than comparable x86 VMs for general-purpose workloads like microservices, databases, and data preparation. C4A metal, Google’s first Arm-based bare metal instance, will launch in preview soon for specialized workloads requiring dedicated physical servers. Early customers report 30% performance improvements for video transcoding at Vimeo and 60% better price-performance for data processing at ZoomInfo.

Google positions Ironwood and Axion as complementary solutions for the age of inference, where agentic workflows require coordination between ML acceleration and general-purpose compute. The AI Hypercomputer platform integrates both with enhanced software, including GKE Cluster Director for TPU fleet management, MaxText improvements for training optimization, and vLLM support for switching between GPUs and TPUs. According to IDC, AI Hypercomputer customers achieved 353% three-year ROI and 28% lower IT costs on average.

The announcement emphasizes system-level co-design across hardware, networking, and software, building on Google’s custom silicon history, including TPUs that enabled the Transformer architecture eight years ago. Ironwood uses advanced liquid cooling deployed at a gigawatt scale with 99.999% fleet-wide uptime since 2020, while the Jupiter data center network connects multiple superpods into clusters of hundreds of thousands of TPUs. Customers can sign up for Ironwood, N4A, and C4A metal preview access through Google Cloud forms.
38:57 Automate financial governance policies using Workload Manager | Google Cloud Blog

Google has enhanced Workload Manager to automate FinOps cost governance policies across GCP organizations, allowing teams to codify financial rules using Open Policy Agent Rego and run continuous compliance scans. The tool includes predefined rules for common cost management scenarios like enforcing resource labels, lifecycle policies on Cloud Storage buckets, and data retention settings, with results exportable to BigQuery for analysis and visualization in Looker Studio.

The pricing update is significant, with Google reducing Workload Manager costs by up to 95 percent for certain scenarios and introducing a small free tier for testing. This makes large-scale automated policy scanning more economical compared to manual auditing processes that can take weeks or months while costs accumulate.

The automation addresses configuration drift where systems deviate from established cost policies, enabling teams to define rules once and scan entire organizations, specific folders, or individual projects on schedules ranging from hourly to monthly. Integration with notification channels, including email, Slack, and PagerDuty, ensures policy violations reach the appropriate teams for remediation.

Organizations can use custom rules from the GitHub repository or leverage hundreds of Google-authored best practice rules covering FinOps, security, reliability, and operations. The BigQuery export capability provides historical compliance tracking and supports showback reporting for cost allocation across teams and business units.

40:06 Matt – “Having that very quick, rapid response to know that something changed and you need to go look at it before you get a 10 million dollar bill is critical.”

Azure

41:50 Generally Available: Azure MCP Server

Azure MCP Server provides a standardized way for AI agents and developers to interact with Azure services through the Model Context Protocol. This creates a consistent interface layer across services like AKS, Azure Container Apps, App Service, Cosmos DB, SQL Database, and AI Foundry, reducing the need to learn individual service APIs.

The MCP implementation allows developers to build AI agents that can programmatically manage and query Azure resources using natural language or structured commands. This bridges the gap between conversational AI interfaces and cloud infrastructure management, enabling scenarios like automated resource provisioning or intelligent troubleshooting assistants.

The server architecture provides secure, authenticated access to Azure services while maintaining standard Azure RBAC controls. This means AI agents operate within existing security boundaries and permissions frameworks rather than requiring separate authentication mechanisms.

Primary use cases include DevOps automation, intelligent cloud management tools, and AI-powered development assistants that need direct Azure service integration. Organizations building copilots or agent-based workflows can now connect to Azure infrastructure without custom API integration work for each service.

The feature is generally available across Azure regions where the underlying services operate. Pricing follows standard Azure service consumption models for the resources accessed through MCP, with no additional charge for the MCP Server interface itself.

42:50 Matt – “So I like the idea of this, and I like it for troubleshooting and stuff like this, but the idea of using it to provision resources terrifies me.
Maybe in development environments, ‘Hey, I’m setting up a three-tier web application, spin me up what I need.’ But if you’re doing this for a company, I really worry about speaking in natural language, and consistently getting the same result to spin up resources.”

45:50 A new era and new features in Azure Ultra Disk

Azure Ultra Disk receives substantial performance and cost optimization updates focused on mission-critical workloads. The service now delivers an 80% reduction in P99.9 and outlier latency, plus a 30% improvement in average latency, making it suitable for transaction logs and I/O-intensive applications that previously required local SSDs or Write Accelerator.

A new flexible provisioning model enables significant cost savings: workloads on small disks can save up to 50%, and those on large disks up to 25%. Customers can now independently adjust capacity, IOPS, and throughput with more granular control, allowing a financial database example to reduce Ultra Disk spending by 22% while maintaining required performance levels.

The Instant Access Snapshot feature enters public preview for Ultra Disk and Premium SSD v2, eliminating traditional wait times for snapshot readiness. New disks created from these snapshots hydrate up to 10x faster with minimal read latency impact during hydration, enabling rapid recovery and replication for business continuity scenarios.

Ultra Disk now supports Azure Boost VMs, including the Ebdsv5 series (GA with up to 400,000 IOPS and 10 GB/s) and the Memory Optimized Mbv3 VM Standard_M416bs_v3 (GA with up to 550,000 IOPS and 10 GB/s). Additional Azure Boost VM announcements are planned for Ignite 2025 with further performance improvements for remote block storage.

Recent feature additions include live resize capability, encryption at host support, Azure Site Recovery and VM Backup integration, and shared disk capability for SCSI Persistent Reservations. Third-party backup and disaster recovery services now support Ultra Disk for customers with existing tooling preferences.

47:38 Matt – “There wasn’t any encryption at the host level? Clearly I make bad life choices being in Azure, but not THAT bad of choices.”

48:21 Announcing General Availability of Larger Container Sizes on Azure Container Instances | Microsoft Community Hub

Azure Container Instances now supports container sizes up to 31 vCPUs and 240 GB of memory for standard containers, expanding from the previous 4 vCPU and 16 GB limits. This applies across standard containers, confidential containers, virtual network-enabled containers, and AKS virtual nodes, though confidential containers max out at 180 GB of memory.

The larger sizes target data-intensive workloads like real-time fraud detection, predictive maintenance, collaborative analytics in healthcare, and high-performance computing tasks such as climate modeling and genomic research. Organizations can now run fewer, larger containers instead of managing multiple smaller instances, simplifying scaling operations.

Customers must request quota approval through Azure Support before deploying containers exceeding 4 vCPUs and 16 GB, then can deploy via the Azure Portal, CLI, PowerShell, ARM templates, or Bicep. The serverless nature maintains ACI’s pay-per-use pricing model, though specific costs for larger SKUs are not detailed in the announcement. This positions ACI as a more viable alternative to managed Kubernetes for workloads that need substantial compute resources but don’t require full orchestration complexity.
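For scale, here’s roughly what asking for one of the new big sizes looks like with the Azure Python SDK. The resource group, name, and image are placeholders, and anything above 4 vCPUs / 16 GB still needs that quota approval first.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.containerinstance import ContainerInstanceManagementClient
from azure.mgmt.containerinstance.models import (
    Container, ContainerGroup, ResourceRequests, ResourceRequirements,
)

client = ContainerInstanceManagementClient(
    DefaultAzureCredential(), subscription_id="<subscription-id>"
)

# Request a container well past the old 4 vCPU / 16 GB ceiling.
group = ContainerGroup(
    location="eastus",
    os_type="Linux",
    containers=[Container(
        name="fraud-scoring",
        image="myregistry.azurecr.io/fraud-scoring:latest",
        resources=ResourceRequirements(
            requests=ResourceRequests(cpu=24, memory_in_gb=192)
        ),
    )],
)

client.container_groups.begin_create_or_update(
    "my-rg", "big-aci-group", group
).result()
```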
The enhancement particularly benefits scenarios where confidential computing is required, as those containers can now scale to 31 vCPUs with 180 GB of memory while maintaining security boundaries.

49:40 Generally Available: Geo/Object Priority Replication for Azure Blob

Geo Priority Replication is now generally available for Azure Blob Storage, providing accelerated data replication between primary and secondary regions for GRS and GZRS storage accounts with an SLA-backed guarantee. This addresses a longstanding customer request for predictable replication timing in geo-redundant storage scenarios.

The feature specifically targets customers with compliance requirements or business continuity needs that demand faster recovery point objectives (RPO) for their geo-replicated data. Organizations in regulated industries like finance and healthcare can now better meet data availability requirements with measurable replication performance.

This enhancement works within the existing GRS and GZRS storage account types, meaning customers can enable it on current deployments without migrating to new account types. The SLA backing represents a shift from best-effort replication to guaranteed performance metrics for secondary region data synchronization.

The announcement appears truncated with incomplete SLA details, but the core value proposition centers on reducing the uncertainty around when data becomes available in secondary regions during normal operations. This matters for disaster recovery planning, where organizations need to calculate realistic RPO values rather than relying on variable replication times.

Pricing details were not included in the announcement, though this feature likely carries additional costs beyond standard GRS or GZRS storage rates, given the performance guarantees involved. Customers should review Azure pricing documentation for specific cost implications before enabling geo priority replication.

Closing

And that is the week in the cloud! Visit our website, the home of the Cloud Pod, where you can join our newsletter, Slack team, send feedback, or ask questions at theCloudPod.net or tweet at us with the hashtag #theCloudPod
Nov 12
Welcome to episode 329 of The Cloud Pod, where the forecast is always cloudy! Justin, Jonathan, and special guest Elise are in the studio to bring you all the latest in AI and cloud news, including – you guessed it – more outages, and more OpenAI team-ups. We’ve also got GPUs, K8s news, and Cursor updates. Let’s get started!

Titles we almost went with this week

Azure Front Door: Please Use the Side Entrance – el, jb
Azure and NVIDIA: A Match Made in GPU Heaven – mk
Azure Goes Down Under the Weight of Its Own Configuration – el
GitHub Turns Your Copilot Subscription Into an All-You-Can-Eat Agent Buffet – mk, el
Microsoft Goes Full Blackwell: No Regrets, Just GPUs
Jules Verne Would Be Proud: Google’s CLI Goes 20,000 Bugs Under the Codebase
RAG to Riches: AWS Makes Retrieval Augmented Generation Turnkey
Kubectl Gets a Gemini Twin: Google Teaches AI to Speak Kubernetes
I’m Not a Robot: Azure WAF Finally Learns to Ask the Important Questions
OpenAI Puts 38 Billion Eggs in Amazon’s Basket: Multi-Cloud Gets Complicated
The Root Cause They’ll Never Root Out: Why Attrition Stays Off the RCA
Google’s New Extension Lets You Deploy Kubernetes by Just Asking Nicely
Cursor 2.0: Now With More Agents Than a Hollywood Talent Agency

Follow Up

04:46 Massive Azure outage is over, but problems linger – here’s what happened | ZDNET

Azure experienced a global outage on October 29, affecting all regions simultaneously, unlike the recent AWS outage that was limited to a single region. The incident lasted approximately eight hours from noon to 8 PM ET, impacting major services including Microsoft 365, Teams, Xbox Live, and critical infrastructure for Alaska Airlines, Vodafone UK, and Heathrow Airport, among others.

The root cause was an inadvertent tenant configuration change in Azure Front Door that bypassed safety validations due to a software defect. Microsoft’s protection mechanisms failed to catch the erroneous deployment, allowing invalid configurations to propagate across the global fleet and cause HTTP timeouts, server errors, and elevated packet loss at network edges.

Recovery required rolling back to the last known good configuration and gradually rebalancing traffic across nodes to prevent overload conditions. Some customers experienced lingering issues even after the official recovery time, with Microsoft temporarily blocking configuration changes to Azure Front Door while completing the restoration process.

The incident highlights concentration risk in cloud infrastructure, as this marks the second major cloud provider outage in October 2025. Despite Azure revenue growing 40 percent in the latest quarterly report, Microsoft’s stock declined in after-hours trading as the company acknowledged capacity constraints in meeting AI and cloud demands.

Affected Azure services included App Service, Azure SQL Database, Microsoft Entra ID, Container Registry, Azure Databricks, and approximately 15 other core platform services. Microsoft has implemented additional validation and rollback controls to prevent similar configuration deployment failures, though the full post-incident report remains pending.
07:06 Matt – “The fact that you’re plus one week and still can’t actually make changes or even do simple things like purge a cache makes me think this is a lot bigger on the backend than they let on at the beginning.”

AI Is Going Great – Or How ML Makes Money

08:30 AWS and OpenAI announce multi-year strategic partnership | OpenAI

AWS and OpenAI formalized a $38 billion multi-year partnership providing OpenAI immediate access to hundreds of thousands of NVIDIA GPUs (GB200s and GB300s) clustered via Amazon EC2 UltraServers, with capacity deployment targeted by the end of 2026. The infrastructure supports both ChatGPT inference serving and next-generation model training, with the ability to scale to tens of millions of CPUs for agentic workloads.

The partnership builds on existing integration where OpenAI’s open weight foundation models became available on Amazon Bedrock earlier this year, making OpenAI one of the most popular model providers on the platform. Thousands of customers, including Thomson Reuters, Peloton, and Verana Health, are already using these models for agentic workflows, coding, and scientific analysis.

AWS positions this as validation of their large-scale AI infrastructure capabilities, noting they have experience running clusters exceeding 500,000 chips with the security, reliability, and scale required for frontier model development. The low-latency network architecture of EC2 UltraServers enables optimal performance for interconnected GPU systems.

This represents a significant shift in OpenAI’s infrastructure strategy, moving substantial compute workloads to AWS while maintaining its existing Microsoft Azure relationship. The seven-year commitment timeline with continued growth provisions indicates long-term capacity planning for increasingly compute-intensive AI model development.

09:53 Elise – “It sort of feels like OpenAI has a strategic partnership with everyone right now, so I’m sure this will help them, just like everything else that they have done will help them. We’re banking a lot on OpenAI being very successful.”

17:11 Google removes Gemma models from AI Studio after GOP senator’s complaint – Ars Technica

Google removed its open Gemma AI models from AI Studio following a complaint from Senator Marsha Blackburn, who reported the model hallucinated false sexual misconduct allegations against her when prompted with leading questions. The model allegedly fabricated detailed false claims and generated fake news article links, demonstrating the persistent hallucination problem across generative AI systems.

The removal only affects non-developer access through AI Studio’s user interface, where model behavior tweaking tools could increase hallucination likelihood. Developers can still access Gemma through the API and download models for local development, suggesting Google is limiting casual experimentation rather than pulling the model entirely.

This incident highlights the ongoing challenge of AI hallucinations in production systems, which no AI firm has successfully eliminated despite mitigation efforts. Google’s response indicates a shift toward restricting open model access when inflammatory outputs could result from user prompting, potentially setting a precedent for how cloud providers handle politically sensitive AI failures.

The timing follows congressional hearings where Google defended its hallucination mitigation practices, with the company’s representative acknowledging these issues are widespread across the industry.
This creates a tension between open model availability and liability concerns when models generate defamatory content, particularly affecting cloud-based AI platforms.

23:00 Matt – “That’s everything on the internet, though. When Wikipedia first came out and you started using it, we were told you can’t reference Wikipedia, because who knows what was put on there…you can’t blindly trust.”

Cloud Tools

26:53 Introducing Agent HQ: Any agent, any way you work – The GitHub Blog

GitHub launches Agent HQ as a unified platform to orchestrate multiple AI coding agents from Anthropic, OpenAI, Google, Cognition, and xAI directly within GitHub and VS Code, all included in paid Copilot subscriptions. This eliminates the fragmented experience of juggling different AI tools across separate interfaces and subscriptions.

Mission Control provides a single command center across GitHub, VS Code, mobile, and CLI to assign work to different agents in parallel, track their progress, and manage agent identities and permissions just like human team members. The system maintains familiar Git primitives like pull requests and issues while adding granular controls over when CI runs on agent-generated code.

VS Code gets Plan Mode for building step-by-step task approaches with clarifying questions before code generation, plus AGENTS.md files for creating custom agents with specific rules like preferred logging frameworks or testing patterns. It’s the only editor supporting the full Model Context Protocol specification, with one-click access to the GitHub MCP Registry for integrating tools like Stripe, Figma, and Sentry.

GitHub Code Quality in public preview now provides org-wide visibility into code maintainability and reliability, with Copilot automatically reviewing its own generated code before developers see it to catch technical debt early. Enterprise admins get a new control plane for governing AI access, setting security policies, and viewing Copilot usage metrics across the organization.

The platform keeps developers on GitHub’s existing compute infrastructure, whether using GitHub Actions or self-hosted runners, avoiding vendor lock-in, while OpenAI Codex becomes available this week in VS Code Insiders for Copilot Pro+ users as the first partner agent.

27:20 Jonathan – “I like the different interfaces; they all bring something a little different.”

31:55 Cursor introduces its coding model alongside multi-agent interface – Ars Technica

Cursor launches version 2.0 of its IDE with Composer, its first competitive in-house coding model, built using reinforcement learning and a mixture-of-experts architecture. The company claims Composer is 4x faster than similarly intelligent models while maintaining competitive intelligence levels with frontier models from OpenAI, Google, and Anthropic.

The new multi-agent interface in Cursor 2.0 allows developers to run multiple AI agents in parallel for coding tasks, expanding beyond the single-agent workflow that has been standard in AI-assisted development environments. This represents a shift toward more complex, distributed AI assistance within the IDE.

Cursor’s internal benchmarking shows Composer prioritizes speed over raw intelligence, outperforming competitors significantly in tokens per second while slightly underperforming the best frontier models in intelligence metrics. This positions it as a practical option for developers who need faster code generation and iteration cycles.
The IDE maintains its Visual Studio Code foundation while deepening LLM integration for what Cursor calls vibe coding, where AI assistance is more directly embedded in the development workflow. Previously, Cursor relied entirely on third-party models, making this its first attempt at vertical integration in the AI coding assistant space.

33:03 Elise – “Cursor had an agent built, and I thought it was ok, but it was wrong a lot. The 2.0 agent seems fabulous, comparatively, and a lot faster.”

AWS

43:25 The Model Context Protocol (MCP) Proxy for AWS is now generally available

AWS has released the Model Context Protocol (MCP) Proxy for AWS, a client-side proxy that enables MCP clients to connect to remote AWS-hosted MCP servers using AWS SigV4 authentication. The proxy works with popular AI development tools like Amazon Q Developer CLI, Cursor, and Kiro, allowing developers to integrate AWS service interactions directly into their agentic AI workflows.

The proxy enables developers to access AWS resources like S3 buckets and RDS tables through MCP servers while maintaining AWS security standards through SigV4 authentication. It includes built-in safety controls such as read-only mode to prevent accidental changes, configurable retry logic for reliability, and logging capabilities for troubleshooting issues.

The MCP Proxy bridges the gap between local AI development tools and AWS-hosted MCP servers, particularly those built on Amazon Bedrock AgentCore Gateway or Runtime. This allows AI agents and developers to extend their workflows to include AWS service interactions without manually handling authentication and protocol communications.

Installation options are flexible, supporting deployment from source, Python package managers, or containers, making it straightforward to integrate with existing MCP-supported development environments. The proxy is open source and available now through the AWS GitHub repository at https://github.com/aws/mcp-proxy-for-aws with no additional cost beyond standard AWS service usage.

44:10 Matt – “This is a nice little tool to help with production…and easier stepping stone than having to build all this stuff yourself.”

47:07 Amazon ECS now supports built-in Linear and Canary deployments

Amazon ECS now includes native linear and canary deployment strategies alongside existing blue/green deployments, eliminating the need for external tools like AWS CodeDeploy for gradual traffic shifting. Linear deployments shift traffic in equal percentage increments with configurable step sizes and bake times, while canary deployments route a small percentage to the new version before completing the shift.

The feature integrates with CloudWatch alarms for automatic rollback detection and supports deployment lifecycle hooks for custom validation steps. Both strategies include a post-deployment bake time that keeps the old revision running after the full traffic shift, enabling quick rollback without downtime if issues emerge.

Available now in all commercial AWS regions where ECS operates, the deployment strategies work with Application Load Balancer and ECS Service Connect configurations. Customers can implement these strategies through the Console, SDK, CLI, CloudFormation, CDK, and Terraform for both new and existing ECS services without additional cost beyond standard ECS pricing. This brings ECS deployment capabilities closer to parity with Kubernetes-native deployment options and reduces dependency on CodeDeploy for teams running containerized workloads.
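A rough boto3 sketch of opting a service into a canary strategy with alarm-driven rollback is below. The alarms block is the long-standing ECS deployment-alarms shape; the strategy and bake-time field names are assumptions modeled on the earlier blue/green variant, so confirm them against the current UpdateService reference before using this.

```python
import boto3

ecs = boto3.client("ecs")

ecs.update_service(
    cluster="prod",
    service="web",
    deploymentConfiguration={
        "strategy": "CANARY",        # assumed enum value for the new strategy
        "bakeTimeInMinutes": 15,     # soak time before old tasks are retired
        "alarms": {
            "alarmNames": ["web-5xx-rate"],
            "enable": True,
            "rollback": True,        # roll back automatically if the alarm fires
        },
    },
)
```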
The built-in approach simplifies deployment pipelines for organizations that previously needed separate deployment orchestration tools.

48:45 Jonathan – “I always wonder why they haven’t built these things previously, and I guess it was possible through CodeDeploy, but if it was possible through CodeDeploy, then why add it to ECS now? I feel like we kind of get this weird sprawl.”

50:35 Amazon Route 53 Resolver now supports AWS PrivateLink

Route 53 Resolver now supports AWS PrivateLink, allowing customers to manage DNS resolution features entirely over Amazon’s private network without traversing the public internet. This includes all Resolver capabilities like endpoints, DNS Firewall, Query Logging, and Outposts integration.

The integration addresses security and compliance requirements for organizations that need to keep all AWS API calls within private networks. Operations like creating, deleting, and editing Resolver configurations can now be performed through VPC endpoints instead of public endpoints.

Available immediately in all regions where Route 53 Resolver operates, including the AWS GovCloud (US) regions. No pricing changes were announced, so standard Route 53 Resolver pricing applies, plus PrivateLink endpoint costs (typically $0.01 per hour per AZ plus data processing charges).

Primary use case targets enterprises with strict network isolation policies, particularly in regulated industries like finance and healthcare, where DNS management traffic must remain on private networks. This complements existing hybrid DNS architectures using Resolver endpoints for on-premises connectivity.

51:04 Jonathan – “Good for anyone who wanted this!”

54:05 Mountpoint for Amazon S3 and Mountpoint for Amazon S3 CSI driver add monitoring capability

Mountpoint for Amazon S3 now emits near real-time metrics using the OpenTelemetry Protocol, allowing customers to monitor operations through CloudWatch, Prometheus, and Grafana instead of parsing log files manually. This addresses a significant operational gap for teams running data-intensive workloads that mount S3 buckets as file systems on EC2 instances or Kubernetes clusters.

The new monitoring capability provides granular metrics, including request counts, latency, and error types at the EC2 instance level, enabling proactive troubleshooting of issues like permission errors or performance bottlenecks. Customers can now set up alerts and dashboards using standard observability tools rather than building custom log parsing solutions.

Integration works through the CloudWatch agent or an OpenTelemetry collector, making it compatible with existing monitoring infrastructure that many organizations already have deployed. The feature is available immediately for both the standalone Mountpoint client and the Mountpoint for Amazon S3 CSI driver used in Kubernetes environments.

This update is particularly relevant for machine learning workloads, data analytics pipelines, and containerized applications that treat S3 as a file system and need visibility into storage layer performance. Setup instructions are available in the Mountpoint GitHub repository with configuration examples for common observability platforms.

GCP

58:31 New Log Analytics query builder simplifies writing SQL code | Google Cloud Blog

Google Cloud has released the Log Analytics query builder to general availability, providing a UI-based interface that generates SQL queries automatically for users who need to analyze logs without deep SQL expertise.
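The sketch below shows the flavor of SQL the builder writes for you (and that you would otherwise write by hand), run through the BigQuery client against a linked Log Analytics dataset; the project and dataset path are placeholders.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Pull a status code out of a nested JSON payload with JSON_VALUE -- the
# exact kind of query the builder generates from a few clicks.
sql = """
SELECT
  JSON_VALUE(json_payload, '$.httpRequest.status') AS status,
  COUNT(*) AS hits
FROM `my-project.my_log_dataset._AllLogs`
WHERE timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
GROUP BY status
ORDER BY hits DESC
"""

for row in client.query(sql).result():
    print(row.status, row.hits)
```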
The tool addresses the common challenge of extracting insights from nested JSON payloads in log data, which typically requires complex SQL functions like JSON_VALUE and JSON_EXTRACT that many DevOps engineers and SREs find time-consuming to write.

The query builder includes intelligent schema discovery that automatically detects and suggests JSON fields and values from your datasets, along with a real-time SQL preview so users can see the generated code and switch to manual editing when needed. Key capabilities include search across all fields, automatic aggregations and grouping, and one-click visualization to dashboards, making it practical for incident troubleshooting and root cause analysis workflows.

Google plans to expand the feature with cross-project log scopes, trace data integration for joining logs and traces, query saving and history, and natural language to SQL conversion using Gemini AI. The query builder works with existing Log Analytics pricing, which is based on the amount of data scanned during queries, similar to BigQuery’s on-demand pricing model.

The tool integrates directly with Google Cloud’s observability stack, allowing users to query logs alongside BigQuery datasets and other telemetry types in a single interface. This consolidation reduces context switching for teams managing complex distributed systems across multiple GCP services and projects.

1:00:01 Jonathan – “I think this is where everything is going. Why spend half an hour crafting a perfect SQL query…when you can have it figure it all out for you.”

1:01:12 GKE and Gemini CLI work better together | Google Cloud Blog

Google has open-sourced a GKE extension for Gemini CLI that integrates Kubernetes Engine operations directly into the command-line AI agent. The extension works as both a Gemini CLI extension and a Model Context Protocol server compatible with any MCP client, allowing developers to manage GKE clusters using natural language commands instead of verbose kubectl syntax.

The integration provides three main capabilities: GKE-specific context resources for more natural prompting, pre-built slash command prompts for complex workflows, and direct access to GKE tools, including Cloud Observability integration. Installation requires a single command for Gemini CLI users, with separate instructions available for other MCP clients.

The primary use case targets ML engineers deploying inference models on GKE who need help selecting appropriate models and accelerators based on business requirements like latency targets. Gemini CLI can automatically discover compatible models, recommend accelerators, and generate deployable Kubernetes manifests through conversational interaction rather than manual configuration.

This builds on Gemini CLI’s extension architecture, which bundles MCP servers, context files, and custom commands into packages that teach the AI agent how to use specific tools. The GKE extension represents Google’s effort to make Kubernetes operations more accessible through AI assistance, particularly for teams managing AI workload deployments.

The announcement includes no pricing details, as both Gemini CLI and the GKE extension are open source projects, though standard GKE cluster costs and any Gemini API usage charges would still apply during operation.
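Since the extension doubles as an MCP server, any MCP client can talk to it, not just Gemini CLI. A minimal sketch with the Python MCP SDK is below; the launch command is a placeholder, so use whatever the extension’s install instructions actually give you.

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Placeholder command -- substitute the real server binary from the
# extension's README.
server = StdioServerParameters(command="gke-mcp", args=[])

async def main() -> None:
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # List the GKE tools the server exposes to the agent.
            tools = await session.list_tools()
            for tool in tools.tools:
                print(tool.name, "-", tool.description)

asyncio.run(main())
```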
1:02:10 Matt – “Anything to make Kubernetes easier to manage, I’m on board for it.”

1:05:06 Master multi-tasking with the Jules extension for Gemini CLI | Google Cloud Blog

Google has launched the Jules extension for Gemini CLI, which acts as an autonomous coding assistant that handles background tasks like bug fixes, security patches, and dependency updates while developers focus on primary work. Jules operates asynchronously using the /jules command, working in isolated environments to address multiple issues in parallel and creating branches for review.

The extension integrates with other Gemini CLI extensions to create automated workflows, including the Security extension for vulnerability analysis and remediation, and the Observability extension for crash investigation and automated unit test generation. This modular approach allows developers to chain together different capabilities for comprehensive task automation.

Jules addresses common developer productivity drains by handling routine maintenance tasks that typically interrupt deep work sessions. The tool can process multiple GitHub issues simultaneously, each in its own environment, and prepares fixes for human review rather than automatically committing changes.

The extension is available now as an open source project on GitHub at github.com/gemini-cli-extensions/jules, with no pricing information provided, as it appears to be a free developer tool. Google is building an ecosystem of Gemini CLI extensions that can be combined with Jules for various development workflows.

1:06:16 Jonathan – “Google obviously listens to their customers because it was only half an hour ago when I said something like this would be pretty useful.”

1:11:36 Announcing GA of Cost Anomaly Detection | Google Cloud Blog

Google’s Cost Anomaly Detection has reached general availability, with AI-powered alerts now enabled by default for all GCP customers across all projects, including new ones. The service automatically monitors spending patterns and sends alerts to Billing Administrators when unusual cost spikes are detected, with no configuration required.

The GA release introduces AI-generated anomaly thresholds that adapt to each customer’s historical spending patterns, reducing alert noise by flagging only significant, unexpected deviations. Customers can override these intelligent baselines with custom values if needed, and the system now supports both absolute-dollar thresholds and percentage-based deviation filters to accommodate projects of different sizes and sensitivities.

The improved algorithm solves the cold start problem that previously required six months of spending history, now providing immediate anomaly protection for brand new accounts and projects from day one. This addresses a key limitation from the public preview phase and ensures comprehensive cost monitoring regardless of account age.

Cost Anomaly Detection remains free as part of GCP’s cost management toolkit and integrates with Cloud Budgets to create a layered approach for preventing, detecting, and containing runaway cloud spending. The anomaly dashboard provides root cause analysis to help teams quickly understand and address cost spikes when they occur. Interested in pricing details? Check out the billing console here.

1:14:01 Elise – “I just wonder, there’s so many third-party companies that specialize in this kind of thing.
So I wonder if they realized that they could just do a little bit better.”

Azure

1:16:37 Building the future together: Microsoft and NVIDIA announce AI advancements at GTC DC | Microsoft Azure Blog

Microsoft and NVIDIA are expanding their AI partnership with several infrastructure and model updates. Azure Local now supports NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs, enabling organizations to run AI workloads at the edge with cloud-like management through Azure Arc, targeting healthcare, retail, manufacturing, and government sectors requiring data residency and low-latency processing.

Azure AI Foundry adds NVIDIA Nemotron models for agentic AI and enterprise reasoning, plus NVIDIA Cosmos models for physical AI applications like robotics and autonomous vehicles. Microsoft also introduced TRELLIS for 3D asset generation, all deployable as NVIDIA NIM microservices with enterprise-grade security and scalability.

Microsoft deployed the first production-scale cluster of NVIDIA GB300 NVL72 systems with over 4,600 Blackwell Ultra GPUs in the new NDv6 GB300 VM series. Each rack delivers 130 TB/s of NVLink bandwidth and up to 136 kW of compute power, designed for training and deploying frontier models with integrated liquid cooling and Azure Boost for accelerated I/O.

Also, NVIDIA Run:ai is now available on Azure Marketplace, providing GPU orchestration and workload management across Azure NC and ND series instances. The platform integrates with AKS, Azure Machine Learning, and Azure AI Foundry to help enterprises dynamically allocate GPU resources, reduce costs, and improve utilization across teams.

Azure Kubernetes Service now supports the NVIDIA Dynamo framework on ND GB200-v6 VMs, demonstrating 1.2 million tokens per second with the gpt-oss 120b model. Microsoft reports up to 15x throughput improvement over the Hopper generation for reasoning models, with deployment guides available for production implementations.

1:21:53 Jonathan – “That’s a really good salesy number to quote, though, 1.2 million tokens a second – that’s great, but that’s not an individual user. One individual user will not get 1.2 million tokens a second out of any model. That is, at full capacity with as many users running inference as possible on that cluster. The total generation output might be 1.2 million tokens a second, which is still phenomenal, but as far as the actual user experience, you know, if you were a business that wanted really fast inference, you’re not going to get 1.2 million tokens a second.”

1:23:26 Public Preview: Azure Functions zero-downtime deployments with Rolling Updates in Flex Consumption

Azure Functions in the Flex Consumption plan now supports rolling updates for zero-downtime deployments through a simple configuration change. This eliminates the need for forceful instance restarts during code or configuration updates, allowing the platform to gracefully transition workloads across instances.

Rolling updates work by gradually replacing old instances with new ones while maintaining active request handling, similar to deployment strategies used in container orchestration platforms. This brings enterprise-grade deployment capabilities to serverless functions without requiring additional infrastructure management.

The capability is currently in public preview for the Flex Consumption plan specifically, which is Azure’s newer consumption-based pricing model that offers more flexibility than the traditional Consumption plan.
Pricing follows the standard Flex Consumption model based on execution time and memory usage, with no additional cost for the rolling update feature itself.

1:24:42 Matt – “It’s a nice quality of life feature that they’re adding to everything. It’s in preview, though, so don’t deploy production workloads leveraging this.”

1:25:06 The Azure PAYG API Shift: What’s Actually Changing (and Why It Matters)

Microsoft is deprecating the legacy Consumption API for Azure Pay-As-You-Go cost data retrieval and replacing it with two modern approaches: the Cost Details API for Enterprise and Microsoft Customer Agreement subscriptions, and the Exports API for PAYG and Visual Studio subscriptions. This shifts from a pull model, where teams constantly query APIs, to a subscribe model where Azure delivers cost data directly to Azure Storage Accounts as CSV files.

The change addresses significant scalability and consistency issues with the old API, which struggled with throttling, inconsistent schemas across different subscription types, and handling large enterprise-scale datasets. The new APIs support FOCUS-compliant schemas, include reservations and savings plans data in single exports, and integrate better with Power BI and Azure Data Factory for FinOps automation.

FinOps teams need to audit existing scripts that call the Microsoft.Commerce/UsageAggregates endpoint and migrate to storage-based data ingestion instead of direct API calls. While the legacy endpoint remains live but unsupported, Microsoft strongly recommends immediate migration, though the deprecation timeline may extend based on customer adoption rates.

The practical impact for cloud teams is more reliable cost data pipelines with fewer failed jobs, predictable scheduled exports eliminating API throttling issues, and consistent field mappings across all subscription types. Teams should review Microsoft’s field mapping reference documentation, as column names have changed between the old and new APIs.

PAYG customers currently must use the Exports API with storage-based retrieval, though Microsoft plans to eventually extend Cost Details API support to PAYG subscriptions. The transition requires updating data flow architecture but provides an opportunity to standardize FinOps processes across different Azure billing models.

1:27:12 Matt – “A year or two ago, we did an analysis at my day job, and we were trying to figure out the savings plan’s amount if we buy X amount, how much do we need to buy everything along those lines. And we definitely ran into like throttling issues, and it was just bombing out on us at a few points, and a lot of weird loops we had to do because the format just didn’t make sense with moderate stuff. It’s a great way. I would suggest you move not because they’re trying to get rid of it, but because it will make your life better.”

1:28:05 Generally Available: Azure WAF CAPTCHA Challenge for Azure Front Door

Azure WAF now includes CAPTCHA challenge capabilities for Front Door deployments, allowing organizations to distinguish between legitimate users and automated bot traffic. This addresses common threats like credential stuffing, web scraping, and DDoS attacks that traditional WAF rules may miss.

The CAPTCHA feature integrates directly into Azure Front Door’s WAF policy engine, enabling administrators to trigger challenges based on custom rules, rate limits, or anomaly detection patterns. Organizations can configure CAPTCHA thresholds and exemptions without requiring changes to backend application code.
This capability targets e-commerce sites, financial services, and any web application experiencing bot-driven abuse or account takeover attempts. The CAPTCHA challenge adds a human verification layer that complements existing WAF protections like OWASP rule sets and custom security policies.

Pricing follows the standard Azure Front Door WAF model with per-policy charges plus request-based fees, though specific CAPTCHA-related costs were not detailed in the announcement. Organizations already using Front Door Premium can enable this feature through policy configuration updates.

The general availability means this protection is now production-ready across all Azure regions where Front Door operates, removing the need for third-party CAPTCHA services or custom bot mitigation solutions for many Azure customers. We just wonder what we’re going to replace reCAPTCHA with when AI can click the button like a human can.

1:31:04 Public Preview: Instant Access Snapshots for Azure Premium SSD v2 and Ultra Disk Storage

Azure now offers Instant Access Snapshots in public preview for Premium SSD v2 and Ultra Disks, eliminating the traditional wait time for snapshot restoration. Previously, customers had to wait for snapshots to fully hydrate before using restored disks, but this feature allows immediate disk restoration with high performance right after snapshot creation.

This capability addresses a critical operational need for enterprises running high-performance workloads on Azure’s fastest storage tiers. Premium SSD v2 and Ultra Disks are typically used for mission-critical databases, SAP HANA, and other latency-sensitive applications where downtime during recovery operations directly impacts business operations.

The feature reduces recovery time objectives for disaster recovery and backup scenarios, particularly valuable for customers who need rapid failover capabilities. Organizations can now create point-in-time copies and immediately spin up test environments or recover from failures without the performance penalty of background hydration processes.

This positions Azure’s premium storage offerings more competitively against AWS’s EBS snapshots with fast snapshot restore and Google Cloud’s instant snapshots. The preview status means customers should test thoroughly before production use, and Microsoft has not yet announced general availability timing or any pricing changes specific to this snapshot capability.

Closing

And that is the week in the cloud! Visit our website, the home of the Cloud Pod, where you can join our newsletter, Slack team, send feedback, or ask questions at theCloudPod.net or tweet at us with the hashtag #theCloudPod
Nov 5
Welcome to episode 328 of The Cloud Pod, where the forecast is always cloudy! Justin, Ryan, and Matt are on board today to bring you all the latest news in cloud and AI, including secret regions (this one has the aliens), ongoing discussions between Microsoft and OpenAI, and updates to Nova, SQL, and OneLake – and even the latest installment of Cloud Journeys. Let’s get started!

Titles we almost went with this week

CloudWatch’s New Feature: Because Nobody Likes Writing Incident Reports at 3 AM
DNS: Did Not Survive – The Great US-EAST-1 Outage of 2025
404 DevOps Not Found: The AWS Automation Adventure – mk
When Your DevOps Team Gets Replaced by AI and Then Everything Crashes
Database Migrations Get the ChatGPT Treatment: Just Vibe Your Schema Changes
AWS DevOps Team Gets the AI Treatment: 40% Fewer Humans, 100% More Questions
Breaking Up is Hard to Compute: Microsoft and OpenAI Redefine Their Relationship
AWS Goes Full Scope: Now Tracking Your Cloud’s Carbon from Cradle to Gate
Platform Engineering: When Your Golden Path Leads to a Dead End
DynamoDB’s DNS Disaster: How a Race Condition Raced Through AWS
AI Takes Over AWS DevOps Jobs, Servers Take Unscheduled Vacation
PostgreSQL Scaling Gets a 30-Second Makeover While AWS Takes a Coffee Break
The Domino Effect: When DynamoDB Drops, Everything Drops
RAG to Riches: Amazon Nova Learns to Cite Its Sources
AWS Finally Tells You When Your EC2 Instance Can’t Keep Up With Your Storage Ambitions
AWS Nova Gets Grounded: No More Hallucinating About Reality
One API to Rule Them All: OneLake’s Storage Compatibility Play
OpenAI gets to pay Alimony
Database schema deployments are totally a vibe
AWS will tell you how not green you are today, now in 3 scopes

General News

02:00 DDoS in September | Fastly

Fastly’s September DDoS report reveals a notable 15.5 million requests per second attack that lasted over an hour, demonstrating how modern application-layer attacks can sustain extreme throughput with real HTTP requests rather than simple pings or amplification techniques.

Attack volume in September dropped to 61% of August levels, with data suggesting a correlation between school schedules and attack frequency: lower volumes coincide with school breaks, while higher volumes occur when schools are in session. Media & Entertainment companies faced the highest median attack sizes, followed by Education and High Technology sectors, with 71% of September’s peak attack day attributed to a single enterprise media company.

The sustained 15 million RPS attack originated from a single cloud-provider ASN, using sophisticated daemons that mimicked browser behavior, making detection more challenging than typical DDoS patterns. Organizations should evaluate whether their incident response runbooks can handle hour-long attacks at 15+ million RPS, as these sustained high-throughput attacks require automated mitigation rather than manual intervention.

Listen, we’re not inviting a DDoS attack, but also…we’ll just turn off the website, so there’s that.

AI Is Going Great – Or How ML Makes Money

04:41 Google AI Studio updates: More control, less friction

Google AI Studio introduces “vibe coding” – a new AI-powered development experience that generates working multi-modal apps from natural language prompts without requiring API key management or manual service integration.
The platform now automatically connects appropriate models and APIs based on app descriptions, supporting capabilities like Veo for video generation, Nano Banana for image editing, and Google Search for source verification.

The new Annotation Mode enables visual app modifications by highlighting UI elements and describing changes in plain language rather than editing code directly. The updated App Gallery provides visual examples of Gemini-powered applications with instant preview, starter code access, and remix capabilities for rapid prototyping. Users can add personal API keys to continue development when free-tier quotas are exhausted, with automatic switching back to the free tier upon renewal.

Are you a visual learner? You can check out their YouTube tutorial playlist here.

05:39 Justin – “So, there are still API keys – they made it sound like there wasn’t, but there is. You just don’t have to manage them until you’ve consumed your free tier.”

09:35 OpenAI takes aim at Microsoft 365 Copilot • The Register

OpenAI launched “company knowledge” for ChatGPT Business, Enterprise, and Edu plans, enabling direct integration with corporate data sources, including Slack, SharePoint, Google Drive, Teams, and Outlook; notably excluding OneDrive, which could impact Microsoft-heavy organizations.

The feature requires manual activation for each conversation and lacks capabilities like web search, image generation, or graph creation when enabled, unlike Microsoft 365 Copilot’s deeper integration across Office applications.

ChatGPT Business pricing at $25/user/month undercuts Microsoft 365 Copilot’s $30/month fee, potentially offering a more cost-effective enterprise AI assistant option with stronger brand recognition. (5 bucks is 5 bucks, right?)

Security implementation includes individual authentication per connector, encryption of all data, no training on corporate data, and an Enterprise Compliance API for conversation log review and regulatory reporting. Data residency and processing locations vary by connector, with no clear documentation from OpenAI, requiring organizations to verify compliance requirements before deployment.

We kind of think we’ve heard of this before…

11:05 Ryan – “And it’s a huge problem. It’s been a huge problem that people have been trying to solve for a long time.”

14:23 The next chapter of the Microsoft–OpenAI partnership – The Official Microsoft Blog

Welp, the divorce has reached a (sort of) amicable alimony agreement. Microsoft and OpenAI have restructured their partnership, with Microsoft now holding an approximately 27% stake in OpenAI’s new public benefit corporation, a position now valued at roughly $135 billion, while maintaining exclusive Azure API access and IP rights until AGI is achieved.

The agreement introduces an independent expert panel to verify AGI declarations and extends Microsoft’s IP rights for models and products through 2032, including post-AGI models with safety guardrails, though research IP expires by 2030 or AGI verification.

OpenAI gains significant operational flexibility, including the ability to develop non-API products with third parties on any cloud provider, release open weight models meeting capability criteria, and serve US government national security customers on any cloud infrastructure. Microsoft can now independently pursue AGI development alone or with partners, and if using OpenAI’s IP pre-AGI, must adhere to compute thresholds significantly larger than current leading model training systems.
OpenAI has committed to purchasing $250 billion in Azure services while Microsoft loses its right of first refusal as OpenAI’s compute provider, signaling a shift toward more independent operations for both companies.

Cont’d: The next chapter of the Microsoft–OpenAI partnership | OpenAI

Microsoft’s investment in OpenAI is now valued at approximately $135 billion, representing roughly 27% ownership on a diluted basis, while OpenAI transitions to a public benefit corporation structure. The partnership introduces an independent expert panel to verify when OpenAI achieves AGI, with Microsoft’s IP rights for models and products extended through 2032, including post-AGI models with safety guardrails.

OpenAI gains significant flexibility, including the ability to develop non-API products with third parties on any cloud provider, release open weight models meeting capability criteria, and provide API access to US government national security customers on any cloud. Microsoft can now independently pursue AGI development alone or with partners, while OpenAI has committed to purchasing an additional $250 billion in Azure services, but Microsoft no longer has the right of first refusal as a compute provider.

The revenue-sharing agreement continues until AGI verification, but payments will be distributed over a longer timeframe, while Microsoft retains exclusive rights to OpenAI’s frontier models and Azure API exclusivity until AGI is achieved.

15:59 Justin – “Once AGI is achieved is an interesting choice… I wonder how Microsoft believes that’s gonna happen very soon, and OpenAI doesn’t, that’s why they’re willing to agree on that term; it’s interesting. Again, it has to be independently verified by a partner, so OpenAI can’t just come out and say, ‘we’ve created AGI,’ then, into a legal dispute – it has to be agreed upon by others. So that’s all very interesting.”

17:45 Build more accurate AI applications with Amazon Nova Web Grounding | AWS News Blog

AWS announces general availability of Web Grounding for Amazon Nova Premier, a built-in RAG tool that automatically retrieves and cites current web information during inference. The feature eliminates the need to build custom RAG pipelines while reducing hallucinations through automatic source attribution and verification.

Web Grounding operates as a system tool within the Bedrock Converse API, allowing Nova models to intelligently determine when to query external sources based on prompt context. Developers simply add nova_grounding to the toolConfig parameter, and the model handles retrieval, integration, and citation of public web sources automatically.

The feature is currently available only in US East (N. Virginia) for Nova Premier, with the Ohio and Oregon regions coming soon, and support for other Nova models planned. Additional costs apply beyond standard model inference pricing, detailed on the Amazon Bedrock pricing page.

Primary use cases include knowledge-based chat assistants requiring current information, content generation tools needing fact-checking, research applications synthesizing multiple sources, and customer support where accuracy and verifiable citations are essential. The reasoning traces in responses allow developers to follow the model’s decision-making process.
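Here’s roughly what that looks like against the Converse API with boto3; the systemTool shape follows AWS’s launch example, but treat the exact toolConfig schema (and the inference profile ID) as something to verify against the Bedrock docs.

```python
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

response = bedrock.converse(
    modelId="us.amazon.nova-premier-v1:0",  # Nova Premier inference profile
    messages=[{
        "role": "user",
        "content": [{"text": "What changed in the latest Kubernetes release?"}],
    }],
    # Turn on Web Grounding as a system tool (shape per the launch post).
    toolConfig={"tools": [{"systemTool": {"name": "nova_grounding"}}]},
)

# Grounded answers come back with citations and reasoning traces alongside
# the text content blocks.
for block in response["output"]["message"]["content"]:
    if "text" in block:
        print(block["text"])
```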
Cloud Tools 19:34 Introducing AI-powered database migration authoring Harness introduces AI-powered database migration authoring that lets developers describe schema changes in plain English, like “create a table named animals with columns for genus_species,” and automatically generates production-ready SQL migrations with rollback scripts and Git integration. The tool addresses the “AI Velocity Paradox” – where 63% of organizations ship code faster with AI, but 72% have suffered production incidents from AI-generated code – by extending AI automation to database changes, which remain a manual bottleneck in most CI/CD pipelines. Built on Harness’s Software Delivery Knowledge Graph and MCP Server, it analyzes current schemas, generates backward-compatible migrations, validates for compliance, and integrates with existing policy-as-code governance – making it more than just a generic SQL generator. Database DevOps is one of Harness’s fastest-growing modules, with customers like Athenahealth reporting they saved months of engineering effort compared to Liquibase Pro or homegrown solutions while getting better governance and visibility. This positions databases as first-class citizens in CI/CD pipelines rather than the traditional midnight deployment bottleneck, allowing DBAs to maintain oversight through automated approvals while developers can finally move database changes at DevOps speed. 20:44 Ryan – “Given how hard this is for humans to do, I look forward to AI doing this better.” AWS 21:38 Amazon Allegedly Replaced 40% of AWS DevOps With AI Days Before Crash An unverified report claims Amazon replaced 40% of AWS DevOps staff with AI systems capable of automatically fixing IAM permissions, rebuilding VPC configurations, and rolling back failed Lambda deployments, just days before their widely reported crash. AWS has not confirmed this, however, and skepticism remains high. The timing coincides with a recent AWS outage that impacted major services, including Snapchat, McDonald’s app, Roblox, and Fortnite, raising questions about automation’s role in system reliability and incident response. AWS officially laid off hundreds of employees in July 2025 (and more just recently), but the alleged 40% DevOps reduction would represent a significant shift toward AI-driven infrastructure management if true. The incident highlights growing concerns about cloud service concentration risk, as both this AWS outage and the 2024 CrowdStrike incident demonstrate how single points of failure can impact thousands of businesses globally. For AWS customers, this raises practical questions about the balance between automation efficiency and human oversight in critical infrastructure operations, particularly for disaster recovery and complex troubleshooting scenarios. 22:19 Justin – “In general, Amazon has been doing a lot of layoffs. There’s been a lot of brain drain.
I don’t know that they’ve automated 40% of the DevOps staff with AI systems…so this one seems a little rumor-y and speculative, but I did find it fun that people were trying to blame AI for Amazon’s woes last week.” 24:41 Summary of the Amazon DynamoDB Service Disruption in Northern Virginia (US-EAST-1) Region DynamoDB experienced a 2.5-hour outage in US-EAST-1 due to a race condition in its DNS management system that resulted in empty DNS records, affecting all services dependent on DynamoDB, including EC2, Lambda, and Redshift. The cascading failure pattern showed how tightly coupled AWS services are – EC2 instance launches failed for 14 hours because DynamoDB’s outage prevented lease renewals between EC2’s DropletWorkflow Manager and physical servers. Network Load Balancers experienced connection errors from 5:30 AM to 2:09 PM due to health check failures caused by EC2’s network state propagation delays, demonstrating how infrastructure dependencies can create extended recovery times. AWS has disabled the automated DNS management system globally and will implement velocity controls and improved throttling mechanisms before re-enabling it, highlighting the challenge of balancing automation with resilience. The incident reveals architectural vulnerabilities in multi-service dependencies – services like Redshift in all regions failed IAM authentication due to hardcoded dependencies on US-EAST-1, suggesting the need for better regional isolation. 26:31 Matt – “It’s a good write-up to show that look, even these large cloud providers that have these massive systems and have redundancy upon redundancy upon redundancy – it’s all software under the hood. Software will eventually have a bug in it. And this just happens to be a really bad bug that took down half the internet.”
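To make the failure mode concrete, here’s a toy, purely illustrative check-then-act race in the spirit of the postmortem – this is emphatically not AWS’s actual planner/enactor code. A slow actor applies a stale DNS plan over a newer one, and a cleanup pass then deletes the (now live) stale plan’s records entirely:

```python
# Illustrative only: a stale "enactor" overwrites a newer DNS plan, then a
# cleanup pass deletes records belonging to old plans -- leaving an empty
# record set, which is the symptom the postmortem describes.
import time
import threading

dns = {}          # the live record set for the endpoint
active_plan = {}  # which plan the live records came from

def enact(plan_id: int, records: list, delay: float):
    time.sleep(delay)                      # the stale enactor runs late
    dns["dynamodb"] = records              # no check that a newer plan already won
    active_plan["dynamodb"] = plan_id

def cleanup(latest_plan_id: int):
    # Deletes records from plans older than the latest -- but the stale
    # enactor has just made an old plan the *live* one.
    if active_plan.get("dynamodb", latest_plan_id) < latest_plan_id:
        dns.pop("dynamodb", None)          # empty record set: the outage

new = threading.Thread(target=enact, args=(2, ["10.0.0.2"], 0.0))
old = threading.Thread(target=enact, args=(1, ["10.0.0.1"], 0.1))
new.start(); old.start(); new.join(); old.join()
cleanup(latest_plan_id=2)
print(dns)  # {} -- resolvers get nothing back until a human intervenes
```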
28:30 Amazon CloudWatch introduces interactive incident reporting CloudWatch now automatically generates post-incident analysis reports by correlating telemetry data, investigation inputs, and actions taken during an investigation, reducing report creation time from hours to minutes. Reports include executive summaries, event timelines, impact assessments, and actionable recommendations, helping teams identify patterns and implement preventive measures for better operational resilience. The feature integrates directly with CloudWatch investigations, capturing operational telemetry and service configurations automatically without manual data collection or correlation. Currently available in 12 AWS regions, including US East, Europe, and Asia Pacific, with no specific pricing mentioned – likely included in existing CloudWatch investigation costs. This addresses a common pain point where teams spend significant time manually creating incident reports instead of focusing on root cause analysis and prevention strategies. 31:00 Customer Carbon Footprint Tool Expands: Additional emissions categories including Scope 3 are now available | AWS News Blog AWS Customer Carbon Footprint Tool now includes Scope 3 emissions data covering fuel/energy-related activities, IT hardware lifecycle emissions, and building/equipment impacts, giving customers a complete view of their carbon footprint beyond just direct operational emissions. The tool provides both location-based and market-based emission calculations with 38 months of historical data recalculated using the new methodology, accessible through the AWS Billing console with CSV export and integration options for QuickSight visualization. Scope 3 emissions are amortized over asset lifecycles (6 years for IT hardware, 50 years for buildings) to fairly distribute embodied carbon across operational lifetime, with all calculations independently verified following GHG Protocol standards. Early access customers like Salesforce, SAP, and Pinterest report that the granular regional data and Scope 3 visibility help them move beyond industry averages to make targeted carbon reduction decisions based on actual infrastructure emissions. The tool remains free to use within the AWS Billing and Cost Management console, providing emissions data in metric tons of CO2 equivalent (MTCO2e) to help organizations track progress toward sustainability goals and compliance reporting requirements. 32:45 Matt – “This is a difficult problem to solve. Once you have scope three, it’s all your indirect costs. So, I think if I remember correctly, scope one is your actual server, scope two is power, and then scope three is all the things that have to get included to generate your power and your servers, which includes shipping, et cetera. So getting all that, it’s not an easy task to do. Even when I look at the numbers, I don’t know what these mean half the time when I have to look at them. I’m like, we’re going down. That seems positive.” 33:59 AWS Secret-West Region is now available AWS launches Secret-West, its second region capable of handling Secret-level U.S. classified workloads, expanding beyond the existing Secret-East region to provide geographic redundancy for intelligence and defense agencies operating in the western United States. The region meets stringent Intelligence Community Directive (ICD) 503 and DoD Security Requirements Guide Impact Level 6 requirements, enabling government agencies to process and analyze classified data with multiple Availability Zones for high availability and disaster recovery. This expansion allows agencies to deploy latency-sensitive classified workloads closer to western U.S. operations while maintaining multi-region resiliency, addressing a critical gap in classified cloud infrastructure outside the eastern United States. AWS continues to operate in a specialized market segment with limited competition, as few cloud providers can meet the security clearance and infrastructure requirements necessary for Secret-level classification hosting. Pricing information is not publicly available due to the classified nature of the service; interested government agencies must contact AWS directly through their secure channels to discuss access and costs. Agent Coulson – “Welcome to level 7.” 38:24 AWS Transfer Family now supports changing identity provider type on a server AWS Transfer Family now allows changing identity provider types (service managed, Active Directory, or custom IdP) on existing SFTP, FTPS, and FTP servers without service interruption, eliminating the need to recreate servers during authentication migrations. This feature enables zero-downtime authentication migrations for organizations transitioning between identity providers or consolidating authentication systems, particularly useful for companies undergoing mergers or updating compliance requirements. The capability is available across all AWS regions where Transfer Family operates, with no additional pricing beyond standard Transfer Family costs, which start at $0.30 per protocol per hour. Organizations can now adapt their file transfer authentication methods dynamically as business needs evolve, such as switching from basic service-managed users to enterprise Active Directory integration without disrupting ongoing file transfers. Implementation details and migration procedures are documented in the Transfer Family User Guide here. 39:26 Ryan – “Any kind of configuration change that requires you to destroy and recreate isn’t fun. I do believe that we should architect for such things and be able to redirect things with DNS traffic (which never goes wrong, and never causes anyone any problems). But it is terrible when that happens, because even when it works, you’re sort of nervously doing it the entire time.”
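A minimal sketch of what the switch looks like with boto3, assuming the UpdateServer API now accepts IdentityProviderType as this announcement implies – the server ID and directory ID are placeholders, so check the Transfer Family User Guide before relying on it:

```python
# Sketch: flipping an existing Transfer Family server from service-managed
# users to AWS Managed Microsoft AD authentication, in place.
# IdentityProviderType on update_server is our assumption from the
# announcement -- verify against the current API reference.
import boto3

transfer = boto3.client("transfer")

transfer.update_server(
    ServerId="s-1234567890abcdef0",        # hypothetical server ID
    IdentityProviderType="AWS_DIRECTORY_SERVICE",
    IdentityProviderDetails={
        "DirectoryId": "d-1234567890",     # hypothetical Managed AD directory
    },
)
# In-flight SFTP/FTPS/FTP transfers continue; previously this change meant
# tearing the server down and recreating it.
```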
40:24 New Amazon CloudWatch metrics to monitor EC2 instances exceeding I/O performance AWS introduces Instance EBS IOPS Exceeded Check and Instance EBS Throughput Exceeded Check metrics that return binary values (0 or 1) to indicate when EC2 instances exceed their EBS-optimized performance limits, helping identify bottlenecks without manual calculation. These metrics enable automated responses through CloudWatch alarms, such as triggering instance resizing or type changes when I/O limits are exceeded, reducing manual intervention for performance optimization. Available at no additional cost with 1-minute granularity for all Nitro-based EC2 instances with attached EBS volumes across all commercial AWS regions, including GovCloud and China. This addresses a common blind spot where applications experience degraded performance because they exceed instance-level I/O limits rather than volume-level limits, something many users overlook when troubleshooting. (Yes, we’re all guilty of this.) It is particularly useful for database workloads and high-throughput applications where understanding whether the bottleneck is at the instance or volume level is critical for right-sizing decisions. 41:20 Matt – “This would have solved a lot of headaches when GP3 came out…”
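Wiring one of these into an alarm is straightforward. A sketch with boto3 – note the metric name below is our assumption from the announcement’s wording (“Instance EBS IOPS Exceeded Check”), so confirm the exact name and namespace in the CloudWatch console first:

```python
# Sketch: alarm when an instance has exceeded its EBS-optimized IOPS limit
# for five consecutive minutes. Metric name, instance ID, and SNS topic are
# assumptions/placeholders.
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="i-0abc-ebs-iops-exceeded",
    Namespace="AWS/EC2",
    MetricName="InstanceEBSIOPSExceededCheck",   # assumed name -- verify in console
    Dimensions=[{"Name": "InstanceId", "Value": "i-0abcdef1234567890"}],
    Statistic="Maximum",
    Period=60,                 # the metrics arrive at 1-minute granularity
    EvaluationPeriods=5,       # sustained, not a transient spike
    Threshold=1,               # the metric is binary: 1 means "limit exceeded"
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ops-alerts"],  # hypothetical
)
```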
GCP 43:53 A practical guide to Google Cloud’s Parameter Manager | Google Cloud Blog Google Cloud Parameter Manager provides centralized configuration management that separates application settings from code, supporting JSON, YAML, and unformatted data with built-in format validation for JSON and YAML types. The service integrates with Secret Manager through a __REF__ syntax that allows parameters to securely reference secrets like API keys and passwords, with regional compliance enforcement ensuring secrets can only be referenced by parameters in the same region. Parameter Manager uses versioning for configuration snapshots, enabling safe rollbacks and preventing unintended breaking changes to deployed applications while supporting use cases like A/B testing, feature flags, and regional configurations. Both Parameter Manager and Secret Manager offer monthly free tiers, though specific pricing details aren’t provided in the announcement; the service requires granting IAM permissions for parameters to access referenced secrets. Key benefits include eliminating hard-coded configurations, supporting multi-region deployments with region-specific settings, and enabling dynamic configuration updates without code changes for applications across various industries. 44:22 Justin – “I’m a very heavy user of parameter store on AWS. I love it, and you should all use it for any of your dynamic configuration, especially if you’re moving containers between environments. This is the bee’s knees in my opinion.”
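Here’s a sketch of what that __REF__ syntax looks like in practice – the resource names are placeholders, and the gcloud invocation in the comments should be checked against the Parameter Manager docs for your gcloud version:

```python
# Sketch: a YAML parameter version whose api_key is pulled from Secret
# Manager at render time via the __REF__ syntax described in the post.
# Project, parameter, and secret names are all hypothetical.
payload = """
database:
  host: db.internal.example.com
  port: 5432
api_key: __REF__("//secretmanager.googleapis.com/projects/my-proj/secrets/api-key/versions/latest")
"""

# Created with something like (verify flags against your gcloud release):
#   gcloud parametermanager parameters versions create v1 \
#       --parameter=app-config --location=global \
#       --payload-data-from-file=config.yaml
# The parameter's service identity needs secretmanager access on the
# referenced secret; rendering the version then substitutes the real value.
print(payload)
```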
49:39 Cross-Site Interconnect, now GA, simplifies L2 connectivity | Google Cloud Blog Cross-Site Interconnect is now GA, providing managed Layer 2 connectivity between data centers using Google’s global network infrastructure, eliminating the need for complex multi-vendor setups and reducing capital expenditures for WAN connectivity. The service offers consumption-based pricing with no setup fees or long-term commitments, allowing customers to scale bandwidth dynamically and pay only for what they use, though specific pricing details weren’t provided in the announcement. Built on Google’s 3.2 million kilometers of fiber and 34 subsea cables (and you know how much we love a good undersea cable), Cross-Site Interconnect provides a 99.95% SLA that includes protection against cable cuts and maintenance windows, with automatic failover and proactive monitoring across hundreds of Cloud Interconnect PoPs. Financial services and telecommunications providers are early adopters, with Citadel reporting stable performance during their pilot program, highlighting use cases for low-latency trading, disaster recovery, and dynamic bandwidth augmentation for AI/ML workloads. As a transparent Layer 2 service, it enables MACsec encryption between remote routers with customer-controlled keys, while providing programmable APIs for infrastructure-as-code workflows and real-time monitoring of latency, packet loss, and bandwidth utilization. 50:57 Ryan – “I mean, I like this just because of the heavy use of infrastructure as code availability. Some of these deep-down network services across the clouds don’t really provide that; it’s all just sort of click ops or a support case. So this is kind of neat. And I do like that you can dynamically configure this and stand it up / turn it down pretty quickly.” 53:12 Introducing Bigtable tiered storage | Google Cloud Blog Bigtable introduces tiered storage that automatically moves data older than a configurable threshold from SSD to infrequent access storage, reducing storage costs by up to 85% while maintaining API compatibility and data accessibility through the same interface. The infrequent access tier provides 540% more storage capacity per node compared to SSD-only nodes, enabling customers to retain historical data for compliance and analytics without manual archiving or separate systems. Time-series workloads from manufacturing, automotive, and IoT benefit most – sensor data, EV battery telemetry, and factory equipment logs can keep recent data on SSD for real-time operations while moving older data to cheaper storage automatically based on age policies. Integration with Bigtable SQL allows querying across both tiers, and logical views enable controlled access to historical data for reporting without full table permissions, simplifying data governance for large datasets. Currently in preview with pricing at approximately $0.026/GB/month for infrequent access storage compared to $0.17/GB/month for SSD storage, representing significant savings for organizations storing hundreds of terabytes of historical operational data. 54:31 Ryan – “To illustrate that I’m still a cloud guy at heart, whenever I’m in an application and I’m loading data and I go back – like I want to see a year’s data – and it takes that extra 30 seconds to load, I actually get happy, because I know what they’re doing on the backend.” 56:05 Now Shipping A4X Max, Vertex AI Training and more | Google Cloud Blog Google launches A4X Max instances powered by NVIDIA GB300 NVL72 with 72 Blackwell Ultra GPUs and 36 Grace CPUs, delivering 2x network bandwidth compared to A4X and 4x better LLM training performance versus A3 H100-based VMs. The system features 1.4 exaflops per NVL72 system and can scale to clusters twice as large as A4X deployments. GKE now supports DRANET (Dynamic Resource Allocation Kubernetes Network Driver) in production, starting with A4X Max, providing topology-aware scheduling of GPUs and RDMA network cards to boost bus bandwidth for distributed AI workloads. This improves cost efficiency through better VM utilization by optimizing connectivity between RDMA devices and GPUs. GKE Inference Gateway integrates with NVIDIA NeMo Guardrails to add safety controls for production AI deployments, preventing models from engaging in undesirable topics or responding to malicious prompts. The integration combines model-aware routing and autoscaling with enterprise-grade security features. Vertex AI Model Garden will support NVIDIA Nemotron models as NIM microservices, starting with Llama Nemotron Super v1.5, allowing developers to deploy open-weight models with granular control over machine types, regions, and VPC security policies. Vertex AI Training now includes curated recipes built on the NVIDIA NeMo Framework and NeMo-RL with managed Slurm environments and automated resiliency features for large-scale model development. A4X Max is available in preview through Google Cloud sales representatives and leverages Cluster Director for lifecycle management, topology-aware placement, and integration with Managed Lustre storage. Pricing details were not disclosed in the announcement. 57:41 Justin – “That’s a lot of cool hardware stuff that I do not understand.” Azure 58:38 NVIDIA GB300 NVL72: Next-generation AI infrastructure at scale | Microsoft Azure Blog Microsoft deployed the first production cluster with over 4,600 NVIDIA GB300 NVL72 systems featuring Blackwell Ultra GPUs, enabling AI model training in weeks instead of months and supporting models with hundreds of trillions of parameters. The ND GB300 v6 VMs deliver 1,440 petaflops of FP4 performance per rack with 72 GPUs, 37TB of fast memory, and 130TB/second NVLink bandwidth, specifically optimized for reasoning models, agentic AI, and multimodal generative AI workloads. Azure implemented 800 Gbps NVIDIA Quantum-X800 InfiniBand networking with full fat-tree architecture and SHARP acceleration, doubling effective bandwidth by performing computations in-switch for improved large-scale training efficiency. The infrastructure uses standalone heat exchanger units and new power distribution models to handle high-density GPU clusters, with Microsoft planning to scale to hundreds of thousands of Blackwell Ultra GPUs across global datacenters. OpenAI and Microsoft are already using these clusters for frontier model development, with the platform becoming the standard for organizations requiring supercomputing-scale AI infrastructure (pricing is not specified in the announcement).
59:55 Ryan – “Companies looking for scale – companies with a boatload of money.” 1:00:23 Generally Available: Near-zero downtime scaling for HA-enabled Azure Database for PostgreSQL servers Azure Database for PostgreSQL servers with high availability can now scale in under 30 seconds compared to the previous 2-10 minute window, reducing downtime by over 90% for database scaling operations. This feature targets production workloads that require continuous availability during infrastructure changes, particularly benefiting e-commerce platforms, financial services, and SaaS applications that cannot afford extended maintenance windows. The near-zero downtime scaling works specifically with HA-enabled PostgreSQL instances, leveraging Azure’s high availability architecture to perform seamless compute and storage scaling without disrupting active connections. While pricing remains unchanged from standard PostgreSQL rates, the reduced downtime translates to lower operational costs by minimizing revenue loss during scaling events and reducing the need for complex maintenance scheduling. This enhancement positions Azure PostgreSQL competitively against AWS RDS and Google Cloud SQL, which still require longer downtime windows for similar scaling operations on their managed PostgreSQL offerings. 1:01:16 Matt – “They’ve had this for forever on Azure SQL, which is their Microsoft SQL platform, so it doesn’t surprise me. It surprised me more that this was already a two-to-10-minute window to scale. Seems crazy for a production HA service.” 1:02:10 OneLake APIs: Bring your apps and build new ones with familiar Blob and ADLS APIs | Microsoft Fabric Blog | Microsoft Fabric OneLake now supports Azure Blob Storage and ADLS APIs, allowing existing applications to connect to Microsoft Fabric’s unified data lake without code changes – just swap endpoints to onelake.dfs.fabric.microsoft.com or onelake.blob.fabric.microsoft.com. What could go wrong? This API compatibility eliminates migration barriers for organizations with existing Azure Storage investments, enabling immediate use of tools like Azure Storage Explorer with OneLake while preserving existing scripts and workflows. The feature targets enterprises looking to consolidate data lakes without rewriting applications, particularly those using C# SDKs or requiring DFS operations for hierarchical data management. Microsoft provides an end-to-end guide demonstrating open mirroring to replicate on-premises data to OneLake Delta tables, positioning this as a bridge between traditional storage and Fabric’s analytics ecosystem. No specific pricing is mentioned for OneLake API access – costs likely follow the standard Fabric capacity pricing model based on compute and storage consumption.
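The endpoint swap really is the whole trick. A minimal sketch with the standard Azure Blob SDK, where the workspace and lakehouse names are hypothetical (in OneLake, the Fabric workspace plays the role of the container):

```python
# Sketch: pointing the ordinary Azure Blob SDK at OneLake instead of a
# storage account. Uses Entra ID auth -- storage account keys don't apply.
from azure.identity import DefaultAzureCredential
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient(
    account_url="https://onelake.blob.fabric.microsoft.com",
    credential=DefaultAzureCredential(),
)

# List files under a lakehouse exactly as you would blobs in a container.
workspace = service.get_container_client("MyWorkspace")          # hypothetical workspace
for blob in workspace.list_blobs(name_starts_with="MyLakehouse.Lakehouse/Files/"):
    print(blob.name)
```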
Cloud Journey 1:03:47 8 platform engineering anti-patterns | InfoWorld Platform engineering initiatives are failing at an alarming rate because teams treat the visual portal as the entire platform rather than building solid backend APIs and orchestration first. The 2024 DORA Report found that dedicated platform engineering teams actually decreased throughput by 8% and change stability by 14%, showing that implementation mistakes have serious downstream consequences. The biggest mistake organizations make is copying approaches from large companies like Spotify without considering ROI for their scale. Mid-size companies invest the same effort as enterprises with thousands of developers but see minimal returns, making reference architectures often impractical for solving real infrastructure abstraction challenges. Successful platform adoption requires shared ownership, where developers can contribute plugins and customizations, rather than top-down mandates. Spotify achieves 100% employee adoption of its internal Backstage by allowing engineers to build their own plugins like Soundcheck, proving that developer autonomy drives platform usage. Organizations must survey specific user subsets, because Java developers, QA testers, and SREs have completely different requirements for an internal developer platform. Tracking surface metrics like onboarded users misses the point when platforms should measurably improve time to market, reduce costs, and increase innovation rather than just showing DORA metrics. Simply rebranding operations teams as platform engineering without a cultural shift and product mindset creates more toil than it reduces. Platforms need to be treated as products requiring continuous improvement, user research, internal marketing, and incremental development, starting with basic CI/CD touchpoints rather than attempting to solve every problem on day one. Closing And that is the week in the cloud! Visit our website, the home of the Cloud Pod, where you can join our newsletter, Slack team, send feedback, or ask questions at theCloudPod.net or tweet at us with the hashtag #theCloudPod
Oct 30
Welcome to episode 327 of The Cloud Pod, where the forecast is always cloudy! Justin, Matt, and Ryan are here to bring you all the latest news (and a few rants) in the worlds of Cloud and AI. I’m sure all our readers are aware of the AWS outage last week, as it was in all the news everywhere. But we’ve also got some new AI models (including Sora, in case you’re low on really crappy videos the youths might like), plus EKS, Kubernetes, Vertex AI, and more. Let’s get started! Titles we almost went with this week Oracle and Azure Walk Into a Cloud Bar: Nobody Gets ETL’d When DNS Goes Down, So Does Your Monday: AWS Takes Half the Internet on a Coffee Break 404 Cloud Not Found: AWS Proves Even the Internet’s Phone Book Can Get Lost DNS: Definitely Not Staffed – How AWS Lost Its Way When It Lost Its People When Larry Met Satya: A Cloud Love Story Azure Finally Answers ‘Dude, Where’s My Data?’ with Storage Discovery Breaking: Microsoft Discovers AI Training Uses More Power Than a Small Country 404 Engineers Not Found – AWS Learns the Hard Way That People Are Its Most Critical Infrastructure Azure Storage Discovery: Finding Your Data Needles in the Cloud Haystack EKS Auto Mode: Because Even Your Clusters Deserve Cruise Control Azure Gets Reel: Microsoft Adds Video Generation to AI Foundry The Great Token Heist: Vertex AI Steals 90% Off Your Gemini Bills Cache Me If You Can: Vertex AI’s Token-Saving Feature IaC Just Got a Manager – And It’s Not Your Boss From Musk to Microsoft: Grok 4 Makes the Great Cloud Migration No Harness… You Are Not Going to Make IaCM Happen Microsoft Drafts a Solution to Container Creation Chaos PowerShell to the People: Azure Simplifies the Great Gateway Migration IP There Yet? Azure’s Scripts Keep Your Address While You Upgrade Follow Up 00:53 Glacier Deprecation Email Standalone Amazon Glacier service (vault-based with separate APIs) will stop accepting new customers as of December 15, 2025. S3 Glacier storage classes (Instant Retrieval, Flexible Retrieval, Deep Archive) are completely unaffected and continue normally. Existing Glacier customers can keep using it forever – no forced migration required. AWS is essentially consolidating around S3 as the unified storage platform, rather than maintaining two separate archival services. The standalone service will enter maintenance mode, meaning there will be no new features, but the service will remain operational. Migration to S3 Glacier is optional but recommended for better integration, lower costs, and more features. (Justin assures us it is actually slightly cheaper, so there’s that.) General News 02:24 F5 discloses major security breach linked to nation-state hackers – GeekWire F5 disclosed that nation-state hackers maintained persistent access to their internal systems over the summer of 2024, stealing portions of BIG-IP source code and vulnerability details before containment in August. The breach compromised product development and engineering systems, but did not affect customer CRM data, financial systems, or F5’s software supply chain, according to independent security audits. F5 has released security patches for BIG-IP, F5OS, and BIG-IP Next products and is providing threat-hunting guides to help customers monitor for suspicious activity. This represents the first publicly disclosed breach of F5’s internal systems, notable given that F5 handles traffic for 80% of Fortune Global 500 companies through its load-balancing and security services.
The incident highlights supply chain security concerns, as attackers targeted source code and vulnerability information rather than customer data, potentially seeking ways to exploit F5 products deployed across enterprise networks. 03:12 Justin – “A little concerning on this one, mostly because F5 is EVERYWHERE.” AI is Going Great – Or How ML Makes Money 04:55 Claude Code gets a web version—but it’s the new sandboxing that really matters – Ars Technica Anthropic launched web and mobile interfaces for Claude Code, their CLI-based AI coding assistant, with the web version supporting direct access to GitHub repositories and the ability to process general instructions, such as “add real-time inventory tracking to the dashboard.” The web interface introduces multi-session support, allowing developers to run and switch between multiple coding sessions simultaneously through a left-side panel, plus the ability to provide mid-task corrections without canceling and restarting. A new sandboxing runtime has been implemented to improve security and reduce friction, moving away from the previous approach where Claude Code required permission for most changes and steps during execution. The mobile version is currently limited to iOS and is in an earlier development stage compared to the web interface, indicating a phased rollout approach. This positions Claude Code as a more accessible alternative to traditional CLI-only AI coding tools, potentially expanding its reach to developers who prefer web-based interfaces over command-line environments. 05:51 Ryan – “I haven’t had a chance to play with the web version, but I am interested in it just because I found the terminal interface limiting, but I also feel like a lot of the value is in that local sort of execution and not in the sandbox. A lot of the tasks I do are internal and require access to either company resources or private networks, or the kind of thing where you’re not going to get that from a publicly hosted sandbox environment.” 08:36 Open Source: Containerization Assist MCP Server Containerization Assist automates the tedious process of creating Dockerfiles and Kubernetes manifests, eliminating manual errors that plague developers during the containerization process. Built on AKS Draft’s proven foundation, this open-source tool goes beyond basic AI coding assistants by providing a complete containerization platform rather than just code suggestions. The tool addresses a critical pain point where developers waste hours writing boilerplate container configurations and debugging deployment issues caused by manual mistakes. (Listener beware, Justin mini rant here.) As an open-source MCP (Model Context Protocol) server, it integrates seamlessly with existing development workflows while leveraging Microsoft’s containerization expertise from Azure Kubernetes Service. (Expertise is a stretch.) This launch signals Microsoft’s commitment to simplifying Kubernetes adoption by removing the steep learning curve associated with container orchestration and manifest creation – or you could just use a PaaS. 09:47 Matt – “The piece I did like about this is that it integrated in as an optional feature, kind of the Trivy and the security thing. So it’s not just setting it up, but they integrated the next steps of security code scanning. It’s not Microsoft saying, you know, hey, it’s standard … they are building security in, hopefully.” Cloud Tools 33:09 IaC is Great, But Have You Met IaCM?
IaCM (Infrastructure as Code Management) extends traditional IaC by adding lifecycle management capabilities, including state management, policy enforcement, and drift detection to handle the complexity of infrastructure at scale. Key features include centralized state file management with version control, module and provider registries for reusable components, and automated policy enforcement to ensure compliance without slowing down teams. The platform integrates directly into CI/CD workflows with visual PR insights showing cost estimates and infrastructure changes before deployment, solving the problem of unexpected costs and configuration conflicts. IaCM addresses critical pain points like configuration drift, secret exposure in state files, and resource conflicts when multiple teams work on the same infrastructure simultaneously. Harness IaCM specifically supports OpenTofu and Terraform with features like Variable Sets, Workspace Templates, and Default Pipelines to standardize infrastructure delivery across organizations. 13:04 Justin – “So let me boil this down for you. We created our own Terraform Enterprise or Terraform Cloud, but we can’t use that name because it’s copyrighted. So we’re going to try to create a new thing and pretend we invented this – and then try to sell it to you as our new Terraform or OpenTofu replacement for your management tier.” HugOps Corner – Previously Known as AWS 41:08 AWS outage hits major apps and services, resurfacing old questions about cloud redundancy – GeekWire AWS US-EAST-1 experienced a major outage starting after midnight Pacific on Monday, caused by DNS resolution issues with DynamoDB that prevented proper address lookup for database services, impacting thousands of applications, including Facebook, Snapchat, Coinbase, ChatGPT, and Amazon’s own services. The outage highlighted ongoing redundancy concerns, as many organizations failed to implement proper failover to other regions or cloud providers, despite similar incidents in US-EAST-1 in 2017, 2021, and 2023, raising questions about single-region dependency for critical infrastructure. AWS identified the root cause as an internal subsystem responsible for monitoring network load balancer health, with core DNS issues resolved by 3:35 AM Pacific, though Lambda backlog processing and EC2 instance launch errors persisted through the morning recovery period. Real-world impacts included LaGuardia Airport check-in kiosk failures, causing passenger lines, widespread disruption to financial services (Venmo, Robinhood), gaming platforms (Roblox, Fortnite), and productivity tools (Slack, Canva), demonstrating the cascading effects of cloud provider outages. The incident underscores the importance of multi-region deployment strategies and proper disaster recovery planning for AWS customers, particularly those using US-EAST-1 as their primary region due to its status as AWS’s oldest and largest data center location. We have a couple of observations: this one took a LONG time to resolve, including hours before the DNS was restored. Maybe they’re out of practice? Maybe it’s a people problem? Hopefully, this isn’t the new norm as some of the talent have been let go/moved on. 17:53 Ryan – “If it’s a DNS resolution issue that’s causing a global outage, that’s not exactly straightforward.
It’s not just a bug, you know, or a function returning the wrong value – you’re looking at global propagation, you’re looking at clients in different places resolving different things, at the base parts of the internet for functionality. And so it does take a pretty experienced engineer to sort of have that in their heads conceptually in order to troubleshoot. I wonder if that’s really the cause, where they’re not able to recover as fast. But I also feel like cloud computing has come a long way, and the impact was very widely felt because a lot more people are using AWS as their hosting provider than I think have been in the past. A little bit of everything, I think.” AWS outage was not due to a cyberattack — but shows potential for ‘far worse’ damage – GeekWire AWS’s US-EAST-1 region experienced an outage due to an internal monitoring subsystem failure affecting network load balancers, impacting major services including Facebook, Coinbase, and LaGuardia Airport check-in systems. The issue was related to DNS resolution problems with DynamoDB, not a cyberattack. The incident highlights ongoing single-region dependency issues, as US-EAST-1 remains AWS’s largest region and has caused similar widespread disruptions in 2017, 2021, and 2023. Many organizations still lack proper multi-region failover despite repeated outages from this location. Industry experts warn that the outage demonstrates vulnerability to potential targeted attacks on cloud infrastructure monoculture. The concentration of services on single providers creates systemic risk similar to agricultural monoculture, where one failure can cascade widely. The failure occurred at the control-plane level, suggesting AWS should implement more aggressive isolation of critical networking components. This may accelerate enterprise adoption of multi-cloud and multi-region architectures as baseline resilience requirements. AWS resolved the issue within hours, but the incident reinforces that even major cloud providers remain vulnerable to cascading failures when core monitoring and health check systems malfunction, affecting downstream services across their infrastructure. Today is when Amazon’s brain drain finally caught up with AWS • The Register AWS experienced a major outage on October 20, 2025, in the US-EAST-1 region, caused by DNS resolution failures for DynamoDB endpoints, taking 75 minutes just to identify the root cause and impacting banking, gaming, social media, and government services across much of the internet. The incident highlights concerns about AWS’s talent retention, with 27,000+ Amazon layoffs between 2022 and 2025 and internal documents showing 69-81% regretted attrition, suggesting a loss of senior engineers who understood complex failure modes and had institutional knowledge of AWS systems. DynamoDB’s role as a foundational service meant the DNS failure created cascading impacts across multiple AWS services, demonstrating the risk of centralized dependencies in cloud architectures and the importance of regional redundancy for critical workloads. AWS’s status page showed “all is well” for the first 75 minutes of the outage, continuing a pattern of slow incident communication that AWS has acknowledged as needing improvement in multiple previous post-mortems from 2011, 2012, and 2015.
The article suggests this may be a tipping point where the loss of experienced staff who built these systems is beginning to impact AWS’s legendary operational excellence, with predictions that similar incidents may become more frequent as institutional knowledge continues to leave. -And that’s the end of HugOps. Moving on to the rest of AWS- 23:58 Monitor, analyze, and manage capacity usage from a single interface with Amazon EC2 Capacity Manager | AWS News Blog EC2 Capacity Manager provides a single dashboard to monitor and manage EC2 capacity across all accounts and regions, eliminating the need to collect data from multiple AWS services like Cost and Usage Reports, CloudWatch, and EC2 APIs. Available at no additional cost in all commercial AWS regions. The service aggregates capacity data with hourly refresh rates for On-Demand Instances, Spot Instances, and Capacity Reservations, displaying utilization metrics by vCPUs, instance counts, or estimated costs based on published On-Demand rates. Key features include automated identification of underutilized Capacity Reservations with specific utilization percentages by instance type and AZ, plus direct modification capabilities for ODCRs within the same account. Data exports to S3 extend analytics beyond the 90-day console retention period, enabling long-term capacity trend analysis and integration with existing BI tools or custom reporting systems. Organizations can enable cross-account visibility through AWS Organizations integration, helping identify optimization opportunities like redistributing reservations between development accounts showing 30% utilization and production accounts exceeding 95%. 25:45 Ryan – “This is kind of nice to have it built in and just have it be plug and play – especially when it’s at no cost.” 26:21 New Amazon EKS Auto Mode features for enhanced security, network control, and performance | Containers EKS Auto Mode now supports EC2 On-Demand Capacity Reservations and Capacity Blocks for ML, allowing customers to target pre-purchased capacity for AI/ML workloads requiring guaranteed access to specialized instances like P5s. This addresses the challenge of GPU availability for training jobs without over-provisioning. New networking capabilities include separate pod subnets for isolating infrastructure and application traffic, explicit public IP control for enterprise security compliance, and forward proxy support with custom certificate bundles. These features enable integration with existing enterprise network architectures without complex CNI customizations. Complete AWS KMS encryption now covers both ephemeral storage and root volumes using customer-managed keys, addressing security audit findings that previously flagged unencrypted storage. This eliminates the need for custom AMIs or manual certificate distribution. Performance improvements include multi-threaded node filtering and intelligent capacity management that can automatically relax instance diversity constraints during capacity shortages. These optimizations particularly benefit time-sensitive applications and AI/ML workloads requiring rapid scaling. EKS Auto Mode is available for new clusters or can be enabled on existing EKS clusters running Kubernetes 1.29+, with migration guides available for teams moving from Managed node groups, Karpenter, or Fargate. Pricing follows standard EKS pricing at $0.10 per cluster per hour plus EC2 instance costs.
27:33 Ryan – “This just highlights how terrible it was before.” 29:33 Amazon EC2 now supports Optimize CPUs for license-included instances EC2 now lets customers reduce vCPU counts and disable hyperthreading on Windows Server and SQL Server license-included instances, enabling up to 50% savings on vCPU-based licensing costs while maintaining full memory and IOPS performance. This feature targets database workloads that need high memory and IOPS but fewer vCPUs – for example, an r7i.8xlarge instance can be reduced from 32 to 16 vCPUs while keeping its 256 GiB memory and 40,000 IOPS. The CPU optimization extends EC2’s existing Optimize CPUs feature to license-included instances, addressing a common pain point where customers overpay for Microsoft licensing due to fixed vCPU counts. Available now in all commercial AWS regions and GovCloud regions, with no additional charges beyond the adjusted licensing costs based on the modified vCPU count. This positions AWS competitively against Azure for SQL Server workloads by offering more granular control over licensing costs, particularly important as organizations migrate legacy database workloads to the cloud. Interested in CPU options? Check those out here. 30:20 Justin – “This is a little weird to me, because I thought this already existed.”
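A sketch of the launch call with boto3, mirroring the r7i.8xlarge example above – CpuOptions is the existing EC2 mechanism this announcement extends, and the AMI ID is a placeholder for a license-included Windows/SQL image:

```python
# Sketch: launch a license-included SQL Server instance with only 16 active
# vCPUs (16 cores, hyperthreading off) on an r7i.8xlarge, which normally
# exposes 32 vCPUs -- halving vCPU-based licensing while keeping full
# memory and EBS performance.
import boto3

ec2 = boto3.client("ec2")

ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # hypothetical Windows/SQL license-included AMI
    InstanceType="r7i.8xlarge",
    MinCount=1,
    MaxCount=1,
    CpuOptions={
        "CoreCount": 16,        # fewer billable vCPUs for licensing
        "ThreadsPerCore": 1,    # disable hyperthreading
    },
)
```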
31:46 AWS Systems Manager Patch Manager launches security updates notification for Windows AWS Systems Manager Patch Manager now includes an “AvailableSecurityUpdate” state that identifies Windows security patches available but not yet approved by patch baseline rules, helping prevent accidental exposure from delayed patch approvals. The feature addresses a specific operational risk where administrators using ApprovalDelay with extended timeframes could unknowingly leave systems vulnerable, with instances marked as Non-Compliant by default when security updates are pending. Available across all AWS Systems Manager regions with no additional charges beyond standard pricing, the feature integrates directly into existing patch baseline configurations through the console at https://console.aws.amazon.com/systems-manager/patch-manager. Organizations can customize compliance reporting behavior to maintain existing workflows while gaining visibility into security patch availability across their Windows fleet, particularly useful for enterprises with complex patch approval processes. The update provides a practical solution for balancing security requirements with operational stability, allowing teams to maintain patch deployment schedules while staying informed about critical security updates awaiting approval. 30:20 Ryan – “It sounds like just a quality of life improvement, but it’s something that should be so basic, but isn’t there, right? Which is like Windows patch management is cobbled together and not really managed well, and so you could have a patch available, but the only way to find out that it was available previously to this was to actually go ahead and patch it and then see if it did something. And so now, at least you have a signal on that; you can apply your patches in a way that’s not going to take down your entire service if a patch goes wrong. So this is very nice. I think for people using the Systems Manager patch management, they’re going to be very happy with this.” 35:26 Introducing CLI Agent Orchestrator: Transforming Developer CLI Tools into a Multi-Agent Powerhouse | AWS Open Source Blog AWS introduces CLI Agent Orchestrator (CAO), an open source framework that enables multiple AI-powered CLI tools like Amazon Q CLI and Claude Code to work together as specialized agents under a supervisor agent, addressing limitations of single-agent approaches for complex enterprise development projects. CAO uses hierarchical orchestration with tmux session isolation and Model Context Protocol servers to coordinate specialized agents – for example, orchestrating Architecture, Security, Performance, and Test agents simultaneously during mainframe modernization projects. The framework supports three orchestration patterns (Handoff for synchronous transfers, Assign for parallel execution, Send Message for direct communication) plus scheduled runs using cron-like automation, with all processing occurring locally for security and privacy. Currently supports Amazon Q Developer CLI and Claude Code, with planned expansion to OpenAI Codex CLI, Gemini CLI, Qwen CLI, and Aiden – no pricing mentioned, as it’s open source, available at github.com/awslabs/cli-agent-orchestrator. Key use cases include multi-service architecture development, enterprise migrations requiring parallel implementation, comprehensive research workflows, and multi-stage quality assurance processes that benefit from coordinated specialist agents. We definitely appreciate another tool in the Agent Orchestration world. 37:46 Amazon ECS now publishes AWS CloudTrail data events for insight into API activities Amazon ECS now publishes CloudTrail data events for ECS Agent API activities, enabling detailed monitoring of container instance operations, including polling (ecs:Poll), telemetry sessions (ecs:StartTelemetrySession), and managed instance logging (ecs:PutSystemLogEvents). Security and operations teams gain comprehensive audit trails to detect unusual access patterns, troubleshoot agent communication issues, and understand how container instance roles are utilized for compliance requirements. The feature uses the new data event resource type AWS::ECS::ContainerInstance and is available for ECS on EC2 in all AWS regions, with ECS Managed Instances supported in select regions. Standard CloudTrail data event charges apply – typically $0.10 per 100,000 events recorded, making this a cost-effective solution for organizations needing detailed container instance monitoring. This addresses a previous visibility gap in ECS operations, as teams can now track agent-level activities that were previously opaque, improving debugging capabilities and security posture for containerized workloads. 39:33 Ryan – “This is definitely something I would use sparingly, because the ECS agent API is chatty. So this seems like it would be very expensive, very fast.”
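If you do want to turn this on, it’s a standard advanced event selector on an existing trail, scoped to the AWS::ECS::ContainerInstance resource type named in the announcement. A sketch with boto3 (the trail name is hypothetical):

```python
# Sketch: enable ECS Agent API data events on an existing CloudTrail trail
# using advanced event selectors.
import boto3

cloudtrail = boto3.client("cloudtrail")

cloudtrail.put_event_selectors(
    TrailName="my-org-trail",                  # hypothetical trail
    AdvancedEventSelectors=[{
        "Name": "ECS agent data events",
        "FieldSelectors": [
            {"Field": "eventCategory", "Equals": ["Data"]},
            {"Field": "resources.type", "Equals": ["AWS::ECS::ContainerInstance"]},
        ],
    }],
)
# Remember these bill as data events (~$0.10 per 100k recorded), and agent
# polling is chatty -- heed Ryan's warning and scope carefully.
```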
GCP 41:22 G4 VMs powered by NVIDIA RTX 6000 Blackwell GPUs are GA | Google Cloud Blog Google Cloud launches G4 VMs with NVIDIA RTX 6000 Blackwell GPUs, offering up to 9x throughput improvement over G2 instances and supporting workloads from AI inference to digital twin simulations with configurations of 1, 2, 4, or 8 GPUs. The G4 VMs feature enhanced PCIe-based peer-to-peer data paths that deliver up to 168% throughput gains and 41% lower latency for multi-GPU workloads, addressing the bottleneck issues common in serving large generative AI models that exceed single GPU memory limits. Each GPU provides 96GB of GDDR7 memory (up to 768GB total), native FP4 precision support, and Multi-Instance GPU capability that allows partitioning into 4 isolated instances, enabling efficient serving of models from under 30B to over 100B parameters. NVIDIA Omniverse and Isaac Sim are now available on Google Cloud Marketplace as turnkey solutions for G4 VMs, enabling immediate deployment of industrial digital twin and robotics simulation applications with full integration across GKE, Vertex AI, Dataproc, and Cloud Run. G4 VMs are available immediately with broader regional availability than previous GPU offerings, though specific pricing details were not provided in the announcement – customers should contact Google Cloud sales for cost information. (AKA $$$$.) 43:03 Dataproc 2.3 on Google Compute Engine | Google Cloud Blog Dataproc 2.3 introduces a lightweight, FedRAMP High-compliant image that contains only essential Spark and Hadoop components, reducing CVE exposure and meeting strict security requirements for organizations handling sensitive data. Optional components like Flink, Hive WebHCat, and Ranger are now deployed on-demand during cluster creation rather than pre-packaged, keeping clusters lean by default while maintaining full functionality when needed. Custom images allow pre-installation of required components to reduce cluster provisioning time while maintaining the security benefits of the lightweight base image. The image supports multiple operating systems, including Debian 12, Ubuntu 22, and Rocky 9, with deployment as simple as specifying version 2.3 when creating clusters via the gcloud CLI. Google employs automated CVE scanning and patching, combined with manual intervention for complex vulnerabilities, to maintain compliance standards and security posture. 44:14 Ryan – “But on the contrary, like FedRAMP has such tight SLAs for vulnerability management that you don’t have to carry this risk or request an exception because of Google not patching Flink as fast as you would like them to. At least this puts the control at the end user, where they can say, well, I’m not going to use it.” 44:45 BigQuery Studio gets improved console interface | Google Cloud Blog BigQuery Studio’s new interface introduces an expanded Explorer view that allows users to filter resources by project and type, with a dedicated search function that spans across all BigQuery resources within an organization – addressing the common pain point of navigating through large-scale data projects. The Reference panel provides context-aware information about tables and schemas directly within the code editor, eliminating the need to switch between tabs or run exploratory queries just to check column names or data types – particularly useful for data analysts writing complex SQL queries. Google has streamlined the workspace by moving job history to a dedicated tab accessible from the Explorer pane and removing the bottom panel clutter, while also allowing users to control tab behavior with double-click functionality to prevent unwanted tab replacements.
The update includes code generation capabilities where clicking on table elements in the Reference panel automatically inserts query snippets or field names into the editor, reducing manual typing errors and speeding up query development workflows. This interface refresh targets data analysts, data engineers, and data scientists who need efficient navigation across multiple BigQuery projects and datasets – no pricing changes mentioned, as this appears to be a UI update to the existing BigQuery Studio service. 46:00 Ryan – “Although I’m a little nervous about having all the BigQuery resources across an organization available on a single console, just because it sounds like a permissions nightmare.” 47:10 Manage your prompts using Vertex SDK | Google Cloud Blog Google launches GA of Prompt Management in the Vertex AI SDK, enabling developers to create, version, and manage prompts programmatically through Python code rather than tracking them in spreadsheets or text files. The feature provides seamless integration between Vertex AI Studio’s visual interface for prompt design and the SDK for programmatic management, with prompts stored as centralized resources within Google Cloud projects for team collaboration. Enterprise security features include Customer-Managed Encryption Keys (CMEK) and VPC Service Controls (VPCSC) support, addressing compliance requirements for organizations handling sensitive data in their AI applications. Key use cases include teams building production generative AI applications that need version control, consistent prompt deployment across environments, and the ability to programmatically update prompts without manual code changes. Pricing follows standard Vertex AI model usage rates with no additional charges for prompt management itself; documentation is available at cloud.google.com/vertex-ai/generative-ai/docs/model-reference/prompt-classes. 47:43 Justin – “If your prompt has sensitive data in it, I have questions already.”
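A rough sketch of the SDK flow, based on our reading of the docs – the module paths, class, and parameters here are assumptions that may shift as this moves out of preview, so treat it as directional rather than copy-paste ready:

```python
# Sketch: create and version a prompt as a managed Vertex AI resource.
# Module path (vertexai.preview) and parameter names are assumptions from
# the docs at time of writing -- verify against the current SDK reference.
import vertexai
from vertexai.preview import prompts
from vertexai.preview.prompts import Prompt

vertexai.init(project="my-proj", location="us-central1")  # hypothetical project

prompt = Prompt(
    prompt_name="support-triage",
    prompt_data="Classify the urgency of this ticket: {ticket}",
    model_name="gemini-2.0-flash-001",   # illustrative model
)

# Saving creates a versioned, centrally stored resource the whole team can
# load by name -- no more prompts-in-a-spreadsheet.
saved = prompts.create_version(prompt=prompt)
print(saved)
```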
49:05 Gemini Code Assist in GitHub for Enterprises | Google Cloud Blog Google launches Gemini Code Assist for GitHub Enterprise, bringing AI-powered code reviews to enterprise customers using GitHub Enterprise Cloud and on-premises GitHub Enterprise Server. This addresses the bottleneck where 60.2% of organizations take over a day for code changes to reach production due to manual review processes. The service provides organization-level controls, including centralized custom style guides and org-wide configuration settings, allowing platform teams to enforce coding standards automatically across all repositories. Individual teams can still customize repo-level settings while maintaining organizational baselines. Built under the Google Cloud Terms of Service, the enterprise version ensures code prompts and model responses are stateless and not stored, with Google committing not to use customer data for model training without permission. This addresses enterprise security and compliance requirements for AI-assisted development. Currently in public preview with access through the Google Cloud Console, the service includes a higher pull request quota than the individual developer tier. Google is developing additional features, including agentic loop capabilities for automated issue resolution and bug fixing. This release complements the recently launched Code Review Gemini CLI Extension for terminal-based AI assistance and represents part of Google’s broader strategy to provide AI assistance across the entire software development lifecycle. Pricing details are not specified in the announcement. 51:08 Ryan – “The ability to sort of do organization-wide things is super powerful for these tools, and I’m just sort of surprised that GitHub allows that. It seems like they would have to develop API hooks and externalize that.” 53:19 Vertex AI context caching | Google Cloud Blog Vertex AI context caching reduces costs by 90% for repeated content in Gemini models by storing precomputed tokens – implicit caching happens automatically, while explicit caching gives developers control over what content to cache for predictable savings. The feature supports caching from 2,048 tokens up to Gemini 2.5 Pro’s 1 million token context window across all modalities (text, PDF, image, audio, video), with both global and regional endpoint support. Key use cases include document processing for financial analysis, customer support chatbots with detailed system instructions, codebase Q&A for development teams, and enterprise knowledge base queries. Implicit caching is enabled by default with no code changes required and clears within 24 hours, while explicit caching charges standard input token rates for initial caching, then a 90% discount on reuse, plus hourly storage fees based on TTL. Integration with Provisioned Throughput ensures production workloads benefit from caching, and explicit caches support Customer Managed Encryption Keys (CMEK) for additional security compliance. 54:18 Ryan – “This is awesome. If you have a workload where you’re gonna have very similar queries or prompts and have it return similar data, this is definitely nicer than having to regenerate that every time. They’ve been moving more and more towards this. And I like to see it sort of more at a platform level now, whereas you could sort of implement this – in a weird way – directly in a model, like in a notebook or something. This is more of a ‘turn it on and it works’.”
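A sketch of explicit caching with the google-genai SDK pointed at Vertex AI – the model, TTL, and content are illustrative, and the 2,048-token minimum from the announcement applies, so the cached contents need to be genuinely large:

```python
# Sketch: cache a large document once, then reuse it across requests at the
# discounted input rate. Project, model, and TTL values are illustrative.
from google import genai
from google.genai import types

client = genai.Client(vertexai=True, project="my-proj", location="us-central1")

cache = client.caches.create(
    model="gemini-2.5-pro",
    config=types.CreateCachedContentConfig(
        system_instruction="You are a contracts analyst. Answer only from the filing.",
        contents=["<several hundred pages of 10-K text>"],  # must exceed 2,048 tokens
        ttl="3600s",  # storage is billed hourly against this TTL
    ),
)

# Subsequent calls reference the cache instead of resending the document.
response = client.models.generate_content(
    model="gemini-2.5-pro",
    contents="Summarize the risk factors section.",
    config=types.GenerateContentConfig(cached_content=cache.name),
)
print(response.text)
```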
55:30 Cloud Armor named Strong Performer in Forrester WAVE, new features launched
Cloud Armor introduces hierarchical security policies (GA) that enable WAF and DDoS protection at the organization, folder, and project levels, allowing centralized security management across large GCP deployments with consistent policy enforcement.
Enhanced WAF inspection capability (preview) expands request body inspection from 8KB to 64KB for all preconfigured rules, improving detection of malicious content hidden in larger payloads while maintaining performance.
JA4 network fingerprinting support (GA) provides advanced SSL/TLS client identification beyond JA3, offering deeper behavioral insights for threat hunting and distinguishing legitimate traffic from malicious actors.
Organization-scoped address groups (GA) enable IP range list management across multiple security policies and products like Cloud Next Generation Firewall, reducing configuration complexity and duplicate rules.
Cloud Armor now protects Media CDN with Network Threat Intelligence and ASN blocking capabilities (GA), defending media assets at the network edge against known malicious IPs and traffic patterns.
56:59 Ryan – “These are some pretty advanced features for a cloud-platform-provided WAF. It’s pretty cool.”
Azure
58:44 Generally Available: Observed capacity metric in Azure Firewall
Azure Firewall’s new observed capacity metric provides real-time visibility into capacity unit utilization, helping administrators track actual scaling behavior versus provisioned capacity for better resource optimization and cost management.
This observability enhancement addresses a common blind spot where teams over-provision firewall capacity due to uncertainty about actual usage patterns, potentially reducing unnecessary Azure spending on unused capacity units.
The metric integrates with Azure Monitor and existing alerting systems, enabling proactive capacity planning and automated scaling decisions based on historical utilization trends rather than guesswork.
Target customers include enterprises with variable traffic patterns and managed service providers who need granular visibility into firewall performance across multiple client deployments to optimize resource allocation.
While pricing remains unchanged for Azure Firewall itself (starting at $1.25/hour plus $0.016/GB processed), the metric helps justify right-sizing decisions that could significantly impact monthly costs for organizations running multiple firewall instances.
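Since the metric surfaces through Azure Monitor, pulling it programmatically should look something like the sketch below using the azure-monitor-query library. Note that the metric name “ObservedCapacity” and the resource ID format are our assumptions, not confirmed by the announcement:

```python
# Sketch only: querying the firewall capacity metric via azure-monitor-query.
# The metric name and resource path are assumptions; verify against the portal.
from datetime import timedelta
from azure.identity import DefaultAzureCredential
from azure.monitor.query import MetricsQueryClient

client = MetricsQueryClient(DefaultAzureCredential())

firewall_id = (
    "/subscriptions/<sub-id>/resourceGroups/<rg>"
    "/providers/Microsoft.Network/azureFirewalls/<firewall-name>"
)

# Compare observed capacity units against what you provisioned over a week.
result = client.query_resource(
    firewall_id,
    metric_names=["ObservedCapacity"],  # assumed metric name
    timespan=timedelta(days=7),
    granularity=timedelta(hours=1),
    aggregations=["Average", "Maximum"],
)

for metric in result.metrics:
    for series in metric.timeseries:
        for point in series.data:
            print(point.timestamp, point.average, point.maximum)
```

If the hourly maximum never approaches your provisioned units, that’s the right-sizing signal the announcement is talking about.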
Generally Available: Prescaling in Azure Firewall
Azure Firewall prescaling allows administrators to reserve capacity units in advance for predictable traffic spikes like holiday shopping seasons or product launches, eliminating the lag time typically associated with auto-scaling firewall resources.
This feature addresses a common pain point where Azure Firewall’s auto-scaling couldn’t respond quickly enough to sudden traffic surges, potentially causing performance degradation during critical business events.
Prescaling integrates with Azure’s existing capacity planning tools and can be configured through the Azure Portal, PowerShell, or ARM templates, making it accessible for both manual and automated deployment scenarios.
Target customers include e-commerce platforms, streaming services, and any organization with predictable traffic patterns that requires guaranteed firewall throughput during peak periods.
While specific pricing wasn’t detailed in the announcement, prescaling will likely follow Azure Firewall’s existing pricing model, where customers pay for provisioned capacity units, with costs varying by region and SKU tier.
When you combine these two announcements, they’re pretty good!
1:01:35 Public Preview: Environmental sustainability features in Azure API Management
Azure API Management introduces carbon-aware capabilities that allow organizations to route API traffic and adjust policy behavior based on carbon intensity data, helping reduce the environmental impact of API infrastructure operations.
The feature enables developers to implement sustainability-focused policies such as throttling non-critical API calls during high carbon intensity periods or routing traffic to regions with cleaner energy grids.
This aligns with Microsoft’s broader commitment to be carbon negative by 2030 and provides enterprises with tools to measure and reduce the carbon footprint of their digital services at the API layer.
Target customers include organizations with ESG commitments and sustainability reporting requirements who need granular control over their cloud infrastructure’s environmental impact.
Pricing details are not yet available for the preview, but the feature integrates with existing API Management tiers and will likely follow consumption-based pricing models when generally available.
1:02:44 Matt – “So APIMs are, one, stupidly expensive. If you have to be on the premium tier, it’s like $2,700 a month. And then if you want HA, you have to have two of them. So whatever they’re doing under the hood is stupidly expensive. If you’ve ever had to deal with SharePoint, they definitely use them, because I’ve hit the same error codes that we provide to customers. On the second side, when you do scale them, you can scale them to be multi-region APIMs using the paired-region concept. So in theory, based on this, you could route to a cheaper or more environmentally efficient paired region and have the traffic come in that way.”
1:06:09 Unlock insights about your data using Azure Storage Discovery
Azure Storage Discovery is now generally available as a fully managed service that provides enterprise-wide visibility into data estates across Azure Blob Storage and Data Lake Storage, helping organizations optimize costs, ensure security compliance, and improve operational efficiency across multiple subscriptions and regions.
The service integrates Microsoft Copilot in Azure to enable natural language queries for storage insights, allowing non-technical users to ask questions like “Show me storage accounts with default access tier as Hot above 1TiB with least transactions” and receive actionable visualizations without coding skills. Because a non-technical person is asking this question. In the ever-wise words of Marcia Brady, “Sure, Jan.”
Key capabilities include 18-month data retention for trend analysis and insights across capacity, activity, security configurations, and errors, with deployment taking less than 24 hours to generate initial insights from 15 days of historical data.
Pricing includes a free tier with basic capacity and configuration insights retained for 15 days, while the standard plan adds advanced activity, error, and security insights with 18-month retention; specific pricing varies by region at azure.microsoft.com/pricing/details/azure-storage-discovery.
Target use cases include identifying cost optimization opportunities through access tier analysis, ensuring security best practices by highlighting accounts still using shared access keys, and managing data redundancy requirements across global storage estates.
1:08:35 Ryan – “Well, I’ll tell you, when I was looking for this report, I had a lot of natural language – and I was shouting it at my computer.”
1:09:52 Sora 2 in Azure AI Foundry: Create videos with responsible AI | Microsoft Azure Blog
Azure AI Foundry now offers OpenAI’s Sora 2 video generation model in public preview, enabling developers to create videos from text, images, and existing video inputs with synchronized audio in multiple languages.
The platform provides a unified environment combining Sora 2 with other generative models like GPT-image-1 and Black Forest Labs’ Flux 1.1, all backed by Azure’s enterprise security and content filtering for both inputs and outputs.
Key capabilities include realistic physics simulation, detailed camera control, and creative features for marketers, retailers, educators, and creative directors to rapidly prototype and produce video content within existing business workflows.
Sora 2 is currently available via API through Standard Global deployment in Azure AI Foundry, with pricing details available on the Azure AI Foundry Models page.
Microsoft positions this as part of their responsible AI approach, embedding safety controls and compliance frameworks to help organizations innovate while maintaining governance over generated content.
We’re not big fans of this one.
1:10:12 Grok 4 is now available in Microsoft Azure AI Foundry | Microsoft Azure Blog
Microsoft brings xAI’s Grok 4 model to Azure AI Foundry, featuring a 128K-token context window, native tool use, and integrated web search capabilities. The model emphasizes first-principles reasoning with a “think mode” that breaks down complex problems step by step, particularly excelling at math, science, and logic puzzles.
Grok 4’s extended context window allows processing of entire code repositories, lengthy research papers, or hundreds of pages of documents in a single query. This eliminates the need to manually chunk large inputs and enables comprehensive analysis across massive datasets without losing context.
Azure AI Content Safety is enabled by default for Grok 4, addressing enterprise concerns about responsible AI deployment. Microsoft and xAI conducted extensive safety testing and compliance checks over the past month to ensure business-ready protection layers.
Pricing starts at $2 per million input tokens and $10 per million output tokens for Grok 4, with faster variants available at lower cost. The family includes Grok 4 Fast Reasoning for analytical tasks, Grok 4 Fast Non-Reasoning for lightweight operations, and Grok Code Fast 1 specifically for programming workflows.
The model’s real-time data integration allows it to retrieve and incorporate external information beyond its training data, functioning as an autonomous research assistant. This capability is particularly valuable for tasks requiring current information, like market analysis or regulatory updates.
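To put those list prices in perspective, here’s some back-of-envelope math; the workload numbers are invented purely for illustration:

```python
# Back-of-envelope cost math for Grok 4 using the rates quoted above
# ($2 per 1M input tokens, $10 per 1M output tokens). Workload is hypothetical.
INPUT_RATE = 2.00 / 1_000_000    # USD per input token
OUTPUT_RATE = 10.00 / 1_000_000  # USD per output token

def grok4_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate a single request's cost at the published list rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# e.g., stuffing ~120K tokens of a repo into the 128K context window
# and getting back a ~2K-token review:
per_request = grok4_cost(120_000, 2_000)
print(f"per request: ${per_request:.4f}")            # ~$0.26
print(f"1,000 requests/day: ${per_request * 1000:.2f}/day")
```

In other words, filling that big context window on every call is where the money goes, which is exactly the workload where the cheaper Fast variants start to matter.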
1:11:04 Generally Available: Enhanced cloning and Public IP retention scripts for Azure Application Gateway migration
Azure releases PowerShell scripts to help customers migrate from Application Gateway V1 to V2 before the April 2026 retirement deadline, addressing a critical infrastructure transition need.
The enhanced cloning script preserves configurations during migration, while the Public IP retention script ensures customers can keep their existing IP addresses, minimizing disruption to production workloads.
This migration tooling targets enterprises running legacy Application Gateway Standard or WAF SKUs who need to upgrade to Standard_V2 or WAF_V2 for continued support and access to newer features.
The scripts automate what would otherwise be a complex manual migration process, reducing the risk of configuration errors and downtime during the transition.
Customers should begin planning migrations now as the 2026 deadline approaches, with these scripts providing a standardized path forward for maintaining application delivery infrastructure.
You know what would be even easier than PowerShell? How about just doing it for them? Too easy? (Listener alert: This time it’s a Matt rant.)
Oracle
1:14:59 Oracle Expands AI Agent Studio for Fusion Applications with New Marketplace, LLMs, and Vast Partner Network
Oracle AI Agent Studio expands with new marketplace LLMs and partner integrations for Fusion Applications, allowing customers to build AI agents using models from Anthropic, Cohere, Meta, and others alongside Oracle’s own models.
The platform enables the creation of AI agents that can automate tasks across Oracle Fusion Cloud Applications, including ERP, HCM, and CX, with pre-built templates and low-code development tools for business users.
Oracle is partnering with major consulting firms like Accenture, Deloitte, and Infosys to help customers implement AI agents, though this likely means significant professional services costs for most deployments.
The AI agents can handle tasks like expense report processing, supplier onboarding, and customer service inquiries, with Oracle claiming up to a 50% reduction in manual work in some use cases.
Pricing details remain unclear, but the service requires Oracle Fusion Applications subscriptions and likely additional fees for LLM usage and agent deployment, based on Oracle’s typical pricing model.
1:15:45 Ryan – “They’re partnering with these giant firms that will come in with armies of engineers who will build you a thing – and hopefully document it before running away.”
Closing
And that is the week in the cloud! Visit our website, the home of The Cloud Pod, where you can join our newsletter and Slack team, send feedback, or ask questions at theCloudPod.net, or tweet at us with the hashtag #theCloudPod