Welcome to episode 331 of The Cloud Pod, where the forecast is always cloudy! Jonathan, Ryan, Matt, and Justin (for a little bit, anyway) are in the studio today to bring you all the latest in cloud and AI news. This week, we’re looking at our Ignite predictions (that side gig as internet psychics isn’t looking too good), undersea cables (our fave!), datacenters, and more. Plus, Anthropic and Azure make a 30 billion dollar deal! Take a break from turkey and avoiding politics, and let’s take a trip into the clouds!
Titles we almost went with this week
- GPT-5.1 Gets a Shell Tool Because Apparently We Haven’t Learned Anything From Sci-Fi Movies
- The Great Ingress Egress: NGINX Controller Waves Goodbye After Years of Volunteer Burnout
- Queue the Applause: Lambda SQS Mapping Gets a Serious Speed Boost
- SELECT * FROM future WHERE SQL meets AI without the prompt drama
- MFA or GTFO: Microsoft’s 99.6% Phishing-Resistant Authentication Achievement
- JWT Another Thing ALB Can Do: OAuth Validation Moves to the Load Balancer
- Google’s Emerging Threats Center: Because Manually Checking 12 Months of Logs Sounds Terrible
- EventBridge Gets a Drag-and-Drop Makeover: No More Schema Drama
- Permission Denied: How Granting Access Took Down the Internet
Follow Up
00:51 Ignite Predictions – The Results
Matt (who is in charge of sound effects, so be aware)
- ACM Competitor – True SSL competitive product
- AI announcement in Security AI Agent (Copilot for Sentinel) – sort of (½)
- Azure DevOps Announcement
Justin
- New Cobalt and Maia Gen 2 or similar – Check
- Price Reduction on OpenAI & Significant Prompt Caching
- Microsoft Foundational LLM to compete with OpenAI –
Jonathan
- The general availability of new, smaller, and more power-efficient Azure Local hardware form factors
- Declarative AI on Fabric: This represents a move towards a declarative model, where users state the desired outcome, and the AI agent system determines the steps needed to achieve it within the Fabric ecosystem.
- Advanced Cost Management: Granular dashboards to track the token and compute consumption per agent or per transaction, enabling businesses to forecast costs and set budgets for their agent workforce.
How many times will they say Copilot:
The word “Copilot” is mentioned 46 to 71 times in the video.
Jonathan: 45
Justin: 35
Matt: 40
General News
05:13 Cloudflare outage on November 18, 2025
- Cloudflare experienced its worst outage since 2019 on November 18, 2025, lasting approximately three hours and affecting core traffic routing across its entire network.
- The incident was triggered by a database permissions change that caused a Bot Management feature file to double in size, exceeding hardcoded limits in their proxy software and causing system panics that resulted in 5xx errors for customers.
- The root cause reveals a cascading failure pattern, where a ClickHouse database query began returning duplicate column metadata after permission changes.
- This resulted in a significant increase in the feature file, from approximately 60 features to over 200, which exceeded the preallocated memory limit of 200 features in their Rust-based FL2 proxy code.
- The team initially suspected a DDoS attack due to fluctuating symptoms caused by the bad configuration file being generated every five minutes as the database cluster was gradually updated.
- The outage impacted multiple Cloudflare services, including their CDN, Workers KV, Access, and even their own dashboard login system through Turnstile dependencies.
- Customers on the older FL proxy engine did not see errors but received incorrect bot scores of zero, potentially causing false positives for those using bot blocking rules.
- Cloudflare’s remediation plan includes treating internal configuration files with the same validation rigor as user input, implementing more global kill switches for features, and preventing error reporting systems from consuming excessive resources during incidents.
- The company acknowledged this as unacceptable for their position in the Internet ecosystem and committed to architectural improvements to prevent similar failures.
06:41 Justin – “Definitely a bad outage, but I appreciate that they owned it, and owned it hard… especially considering they were front page news.”
AI Is Going Great, or How ML Makes Money
07:27 Introducing GPT-5.1 for developers | OpenAI
- OpenAI has released GPT-5.1 in their API platform with adaptive reasoning that dynamically adjusts thinking time based on task complexity, resulting in 2-3x faster performance on simple tasks while maintaining frontier intelligence.
- The model includes a new “no reasoning” mode (reasoning_effort set to ‘none’) that delivers 20% better low-latency tool calling performance compared to GPT-5 minimal reasoning, making it suitable for latency-sensitive applications while supporting web search and improved parallel tool calling (see the sketch after this list).
- GPT-5.1 introduces extended prompt caching with 24-hour retention (up from minutes), maintaining the existing 90% cost reduction for cached tokens with no additional storage charges.
- Early adopters report the model uses approximately half the tokens of competitors at similar quality levels, with companies like Balyasny Asset Management seeing agents run 50% faster while exceeding GPT-5 accuracy.
- The release includes two new developer tools in the Responses API: apply_patch for structured code editing using diffs without JSON escaping, and a shell tool that allows the model to propose and execute command-line operations in a controlled plan-execute loop. GPT-5.1 achieves 76.3% on SWE-bench Verified and shows 7% improvement on diff editing benchmarks according to early testing partners like Cline and Augment Code.
- OpenAI is also releasing specialized gpt-5.1-codex and gpt-5.1-codex-mini models optimized specifically for long-running agentic coding tasks, while maintaining the same pricing and rate limits as GPT-5.
- If you didn’t catch it in the podcast, Justin HATES this. Hates. It. All the hate.
- The company has committed to not deprecating GPT-5 in the API and will provide advanced notice if deprecation plans change.
- Pricing and rate limits are the same as GPT-5.
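For the curious, here’s a minimal sketch of what the low-latency mode might look like with the OpenAI Python SDK, assuming the existing Responses API shape; the exact GPT-5.1 parameter names and tool types are assumptions, so check OpenAI’s docs before relying on them.

```python
# Hypothetical sketch: a latency-sensitive GPT-5.1 call with reasoning disabled.
# Parameter names follow the existing Responses API; the "web_search" tool type
# and "none" effort value are assumptions based on the announcement.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.responses.create(
    model="gpt-5.1",
    reasoning={"effort": "none"},    # new no-reasoning mode for fast tool calls
    tools=[{"type": "web_search"}],  # web search remains available in this mode
    input="Summarize today's AWS Lambda announcements in two sentences.",
)

print(response.output_text)
```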
9:31 Ryan – “I didn’t really like GPT-5, so I don’t have high expectations, but as these things enhance, I’ve found using different models for different use cases has some advantages, so maybe I’ll find the case for this one.”
11:31 Piloting group chats in ChatGPT | OpenAI
- OpenAI is piloting group chat functionality in ChatGPT, starting with users in Japan, New Zealand, South Korea, and Taiwan across all subscription tiers (Free, Go, Plus, and Pro).
- The feature allows up to 20 people to collaborate in a shared conversation with ChatGPT, with responses powered by GPT-5.1 Auto that selects the optimal model based on the prompt and the user’s subscription level.
- ChatGPT has been trained with new social behaviors for group contexts, including deciding when to respond or stay quiet based on conversation flow, reacting with emojis, and referencing profile photos for personalized image generation.
- Users can mention “ChatGPT” explicitly to trigger a response, and custom instructions can be set per group chat to control tone and personality.
- Privacy controls separate group chats from personal conversations, with personal ChatGPT memory not shared or used in group contexts.
- Users must accept invitations to join, can see all participants, and can leave at any time, with group creators having special removal privileges.
- The feature includes safeguards for users under 18, automatically reducing sensitive content exposure for all group members when a minor is present.
- Parents can disable group chats entirely through parental controls, providing additional oversight for younger users.
- Rate limits apply only to ChatGPT responses (not user-to-user messages) and count against the subscription tier of the person ChatGPT is responding to.
- The feature supports search, image and file uploads, image generation, and dictation, making it functional for both personal planning and workplace collaboration scenarios.
12:41 Jonathan – “I’d rather actually have group chats enabled if kids are going to use it because at least you have witnesses to the conversation at that point.”
16:38 Gemini 3: Introducing the latest Gemini AI model from Google
- Google launches Gemini 3 Pro in preview across its product suite, including the Gemini app, AI Studio, Vertex AI, and a new AI Mode in Search with generative UI capabilities (a quick API sketch follows this list).
- The model achieves a 1501 Elo score on LMArena leaderboard and demonstrates 91.9% on GPQA Diamond, with a 1 million token context window for processing multimodal inputs including text, images, video, audio and code.
- Gemini 3 Deep Think mode offers enhanced reasoning performance, scoring 41.0% on Humanity’s Last Exam and 45.1% on ARC-AGI-2 with code execution.
- Google is providing early access to safety testers before rolling out to Google AI Ultra subscribers in the coming weeks, following comprehensive safety evaluations per their Frontier Safety Framework.
- Google introduces Antigravity, a new agentic development platform that integrates Gemini 3 Pro with Gemini 2.5 Computer Use for browser control and Gemini 2.5 Image for editing.
- The platform enables autonomous agent workflows with direct access to editor, terminal, and browser, scoring 54.2% on Terminal-Bench 2.0 and 76.2% on SWE-bench Verified for coding agent capabilities.
- The model shows improved long-horizon planning by topping Vending-Bench 2 leaderboard and delivers enhanced agentic capabilities through Gemini Agent for Google AI Ultra subscribers.
- Gemini 3 demonstrates 72.1% on SimpleQA Verified for factual accuracy and 1487 Elo on WebDev Arena for web development tasks, with availability in third-party platforms including Cursor, GitHub, JetBrains, and Replit.
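If you want to poke at it yourself, a minimal call through the google-genai Python SDK might look like the sketch below; the preview model ID is an assumption based on the announcement’s naming, so confirm it in AI Studio or the Vertex AI model garden.

```python
# Hypothetical sketch: calling Gemini 3 Pro through the google-genai SDK.
# The model ID "gemini-3-pro-preview" is assumed from the preview naming.
from google import genai

client = genai.Client()  # uses GEMINI_API_KEY, or Vertex AI when configured

response = client.models.generate_content(
    model="gemini-3-pro-preview",
    contents="Explain what a 1M-token context window lets a coding agent do.",
)

print(response.text)
```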
18:24 Ryan – “I look forward to trying this. My initial attempts with Gemini 2.5 did not go well, but I found a sort of sweet spot in using it for planning and documentation. It’s still much better at coding than any other model that I’ve used. So cool, I look forward to using this.”
19:14 Microsoft, NVIDIA, and Anthropic announce strategic partnerships – The Official Microsoft Blog
- Continuing the messy breakups…
- Anthropic commits to $30 billion in Azure compute capacity, and up to one gigawatt of additional capacity, making this one of the largest cloud infrastructure commitments in AI history.
- This positions Azure as Anthropic’s primary scaling platform for Claude models.
- NVIDIA and Anthropic are establishing their first deep technology partnership focused on co-design and engineering optimization.
- Anthropic will optimize Claude models for NVIDIA Grace Blackwell and Vera Rubin systems, while NVIDIA will tune future architectures specifically for Anthropic workloads to improve performance, efficiency, and total cost of ownership.
- Claude models, including Sonnet 4.5, Opus 4.1, and Haiku 4.5, are now available through Microsoft Foundry on Azure, making Claude the only frontier model accessible across all three major cloud platforms (AWS, Azure, GCP).
- Azure enterprise customers gain expanded model choice beyond OpenAI offerings.
- Microsoft commits to maintaining Claude integration across its entire Copilot family, including GitHub Copilot, Microsoft 365 Copilot, and Copilot Studio.
- This ensures developers and enterprise users can leverage Claude capabilities within existing Microsoft productivity and development workflows.
- NVIDIA and Microsoft are investing up to $10 billion and $5 billion, respectively, in Anthropic as part of the partnership. So yes, that’s a lot of money going back and forth.
- The combined $15 billion investment represents substantial backing for Anthropic’s continued development and positions all three companies to benefit from Claude’s growth trajectory.
21:57 Jonathan – “I’m wondering what Anthropic’s plan is – what they’re working on in the background – because they have just taken a huge amount of capacity from AWS and their new data center in Northern Indiana, and now another 30 billion in Azure Compute? I guess they’re still building models every day… that’s a lot of money flying around.”
Cloud Tools
23:17 Ingress NGINX Retirement: What You Need to Know | Kubernetes Contributors
- Ingress NGINX, one of the most popular Kubernetes ingress controllers that has powered billions of requests worldwide, is being retired in March 2026 due to unsustainable maintenance burden and mounting technical debt.
- The project has struggled for years with only one or two volunteer maintainers working after hours, and despite its widespread use in hosted platforms and enterprise clusters, efforts to find additional support have failed.
- The retirement stems from security concerns around features that were once considered flexible but are now viewed as vulnerabilities, particularly the snippets annotations that allowed arbitrary NGINX configuration.
- The Kubernetes Security Response Committee and SIG Network exhausted all options to make the project sustainable before making this difficult decision to prioritize user safety over continuing an undermaintained critical infrastructure component.
- Users should immediately begin migrating to Gateway API, the modern replacement for Ingress that addresses many of the architectural issues that plagued Ingress NGINX. Existing deployments will continue to function and installation artefacts will remain available, but after March 2026, there will be zero security patches, bug fixes, or updates of any kind.
- Alternative ingress controllers are plentiful and listed in Kubernetes documentation, including cloud-provider-specific options and vendor-supported solutions.
- Users can check whether they are affected by running a simple kubectl command that looks for pods with the ingress-nginx selector across all namespaces (a Python-client equivalent is sketched after this list).
- This retirement highlights a critical open source sustainability problem where massively popular infrastructure projects can fail despite widespread adoption when companies benefit from the software but do not contribute maintainer resources back to the community.
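The announcement’s check is a one-line kubectl command; the equivalent with the official Kubernetes Python client is sketched below, assuming the chart’s standard app.kubernetes.io/name=ingress-nginx label.

```python
# Sketch: find ingress-nginx controller pods in every namespace, roughly the
# Python-client equivalent of the kubectl check in the retirement notice.
# The label selector is an assumption based on the chart's standard labels.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() inside a pod

pods = client.CoreV1Api().list_pod_for_all_namespaces(
    label_selector="app.kubernetes.io/name=ingress-nginx"
)

for pod in pods.items:
    print(f"{pod.metadata.namespace}/{pod.metadata.name}")

if not pods.items:
    print("No ingress-nginx pods found; this cluster is likely unaffected.")
```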
24:39 Justin – “I’m actually surprised NGINX didn’t want to pick this up; it seems like an obvious move for F5 to pick up and maintain the Ingress NGINX controller. But what do I know?”
25:46 Replicate is joining Cloudflare
- Cloudflare acquires Replicate, bringing its 50,000-plus model catalog and fine-tuning capabilities to Workers AI.
- This consolidates model discovery, deployment, and inference into a single platform backed by Cloudflare’s global network.
- The acquisition addresses the operational complexity of running AI models by combining Replicate’s Cog containerization tool with Cloudflare’s serverless infrastructure.
- Developers can now deploy custom models and fine-tune without managing GPU hardware or dependencies.
- Existing Replicate APIs will continue functioning without interruption while gaining Cloudflare’s network performance (see the sketch after this list).
- Workers AI users get access to proprietary models like GPT-5 and Claude Sonnet through Replicate’s unified API alongside open-source options.
- The integration extends beyond inference to include AI Gateway for observability and cost analytics, plus native connections to Cloudflare’s data stack, including R2 storage and Vectorize database.
- This creates an end-to-end platform for building AI applications with state management and real-time capabilities.
- Replicate’s community features for sharing models, publishing fine-tunes, and experimentation will remain central to the platform.
- The acquisition positions Cloudflare to compete more directly with hyperscaler AI offerings by combining model variety with edge deployment.
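Since existing Replicate APIs are staying put, a call like this sketch should keep working unchanged after the acquisition; the model slug is purely illustrative.

```python
# Sketch: running a model through Replicate's existing Python client, which
# the announcement says continues to work unchanged under Cloudflare.
# The model slug below is illustrative, not a recommendation.
import replicate

output = replicate.run(
    "black-forest-labs/flux-schnell",  # any public model slug works here
    input={"prompt": "a watercolor of an undersea fiber-optic cable"},
)

print(output)  # typically a URL (or list of URLs/files) for the generated asset
```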
27:09 Ryan – “Cloudflare has been doing kind of amazing things at the edge, which is kind of neat. We’ve had serverless and functions for a while, and definitely options out there that provide much better performance. It’s kind of neat. They’re well-positioned to do that.”
28:02 KubeCon NA 2025 Recap: The Dawn of the AI Native Era | Blog
- KubeCon 2025 marked the industry shift from cloud native to AI native, with CNCF launching the Kubernetes AI Conformance Program to standardize how AI and ML workloads run across clouds and hardware accelerators like GPUs and TPUs.
- The live demo showed Dynamic Resource Allocation making accelerators first-class citizens in Kubernetes, signaling that AI infrastructure standardization is now a community priority.
- Harness showcased Agentic AI capabilities that transform traditional CI/CD pipelines into intelligent, adaptive systems that learn and optimize delivery automatically.
- Their booth demonstrated 17 integrated products spanning CI, CD, IDP, IaCM, security, testing, and FinOps, with particular emphasis on AI-powered pipeline creation and visual workflow design that caught significant attendee interest.
- Security emerged as a critical theme with demonstrations of zero-CVE malware attacks that bypass traditional vulnerability scanners by compromising the build chain itself.
- The solution path involves supply chain attestation using SLSA, policy-as-code enforcement, and artifact signing with Sigstore, which Harness demonstrated as native capabilities in their platform.
- Apple introduced Apple Containerization, a framework running Linux containers directly on macOS using lightweight microVMs that boot minimal Linux kernels in under a second.
- This combines VM-level security with container speed, creating safer local development environments that could reshape how developers work on Mac hardware.
- The conference emphasized that AI native infrastructure requires intelligent scheduling, deeper observability, and verified agent identity using SPIFFE/SPIRE, with multiple sessions showing practical implementations at scale from companies like Yahoo, managing 8,000 nodes, and Spotify handling a million infrastructure resources.
29:51 Justin – “Everyone has moved on from Kubernetes as the hotness; now it’s all AI, so what are people working on in the AI space?”
AWS
30:27 AWS Lambda enhances event processing with provisioned mode for SQS event-source mapping
- AWS Lambda now offers provisioned mode for SQS event source mapping, providing 3x faster scaling and 16x higher concurrency (up to 20,000 concurrent executions) compared to the standard polling mode.
- This addresses customer demands for better control over event processing during traffic spikes, particularly for financial services and gaming companies requiring sub-second latency.
- The new provisioned mode uses dedicated event pollers that customers can configure with minimum and maximum values, where each poller handles up to 1 MB/sec throughput, 10 concurrent invokes, or 10 SQS API calls per second.
- Setting a minimum number of pollers maintains baseline capacity for immediate response to traffic surges, while the maximum prevents downstream system overload (a configuration sketch follows this list).
- Pricing is based on Event Poller Units (EPUs) charged for the number of pollers provisioned and their duration, with a minimum of 2 event pollers required per event source mapping.
- Each EPU supports up to 1 MB per second throughput capacity, though AWS has not published specific per-EPU pricing on the announcement.
- The feature is available now in all commercial AWS Regions and can be configured through the AWS Console, CLI, or SDKs.
- Monitoring is handled through CloudWatch metrics, specifically the ProvisionedPollers metric that tracks active event pollers in one-minute windows.
- This capability enables applications to handle up to 2 GB/s of aggregate traffic while automatically scaling down to the configured minimum during low-traffic periods for cost optimization.
- The enhanced scaling detects growing backlogs within seconds and adjusts poller count dynamically between configured limits.
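Here’s a configuration sketch with boto3, assuming the ProvisionedPollerConfig shape Lambda already uses for Kafka event source mappings carries over to SQS; confirm field names against the current API reference.

```python
# Sketch: enabling provisioned mode on an existing SQS event source mapping.
# ProvisionedPollerConfig is assumed to match the shape Lambda already uses
# for Kafka event source mappings; the UUID is a placeholder.
import boto3

lambda_client = boto3.client("lambda")

lambda_client.update_event_source_mapping(
    UUID="11111111-2222-3333-4444-555555555555",  # your mapping's UUID
    ProvisionedPollerConfig={
        "MinimumPollers": 5,    # baseline capacity kept warm for spikes
        "MaximumPollers": 200,  # cap to protect downstream systems
    },
)
```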
31:36 Ryan – “Where was this 5 years ago when we were maintaining a logging platform? This would have been very nice!”
33:30 Amazon EventBridge introduces enhanced visual rule builder
- EventBridge launches a new visual rule builder that integrates the Schema Registry with a drag-and-drop canvas, allowing developers to discover and subscribe to events from over 200 AWS services and custom applications without referencing individual service documentation.
- The schema-aware interface helps reduce syntax errors when creating event filter patterns and rules (for reference, a hand-written equivalent is sketched after this list).
- The enhanced builder includes a comprehensive event catalog with readily available sample payloads and schemas, eliminating the need to hunt through documentation for event structures.
- This addresses a common pain point: developers previously had to manually locate and understand event formats across different AWS services.
- Available now in all regions where the Schema Registry is available, at no additional cost beyond standard EventBridge usage charges.
- The feature is accessible through the EventBridge console and aims to reduce development time for event-driven architectures.
- The visual builder particularly benefits teams building complex event-driven applications that need to filter and route events from multiple sources.
- By providing schema validation upfront, it helps catch configuration errors before deployment rather than during runtime.
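Under the hood, the builder still emits an ordinary event pattern; for reference, here’s roughly what the hand-written version looks like with boto3, using the standard EC2 state-change example.

```python
# Sketch: the kind of event filter the visual builder generates, created here
# by hand with boto3. The EC2 state-change pattern is a standard AWS example,
# used purely for illustration.
import json
import boto3

events = boto3.client("events")

events.put_rule(
    Name="stopped-instances",
    EventBusName="default",
    EventPattern=json.dumps({
        "source": ["aws.ec2"],
        "detail-type": ["EC2 Instance State-change Notification"],
        "detail": {"state": ["stopped"]},
    }),
)
```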
34:46 Matt – “I definitely – back in the day – had lots of fun with EventBridge, and trying to make sure I got the schemas right for every frame when you’re trying to trigger one thing from another. So not having to deal with that mess is exponentially better. You know, at this point, though, I feel like I would just tell AI to tell me what the scheme was and solve the problem that way.”
35:43 Application Load Balancer supports client credentials flow with JWT verification
- ALB now handles JWT token verification natively at the load balancer layer, eliminating the need for custom authentication code in backend applications. This offloads OAuth 2.0 token validation, including signature verification, expiration checks, and claims validation, directly to the load balancer, reducing complexity in microservices architectures (a sketch of the validation work being offloaded follows this list).
- The feature supports Client Credentials Flow and other OAuth 2.0 flows, making it particularly useful for machine-to-machine and service-to-service authentication scenarios. Organizations can now centralize token validation at the edge rather than implementing it repeatedly across multiple backend services.
- This capability is available immediately in all AWS regions where ALB operates, with no additional ALB feature charges beyond standard load balancer pricing. Customers pay only for the existing ALB hourly rates and Load Balancer Capacity Units (LCUs) consumed.
- The implementation reads JWTs from request headers and validates against configured JSON Web Key Sets (JWKS) endpoints, supporting integration with identity providers like Auth0, Okta, and AWS Cognito.
- Failed validation results in configurable HTTP error responses before requests reach backend targets.
- This addresses a common pain point in API gateway and microservices deployments, where each service previously needed its own token validation logic.
- The centralized approach reduces code duplication and potential security inconsistencies across service boundaries.
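For a sense of what ALB is now doing on your behalf, this is the sort of JWKS-based validation backends previously had to implement themselves, sketched with PyJWT; the issuer and audience values are placeholders.

```python
# Sketch: the JWT validation work (signature, expiry, claims) that backends
# previously implemented and that ALB can now perform at the edge.
# Issuer and audience values are placeholders for your identity provider.
import jwt  # PyJWT
from jwt import PyJWKClient

ISSUER = "https://example-tenant.us.auth0.com/"   # placeholder issuer
JWKS_URL = ISSUER + ".well-known/jwks.json"
AUDIENCE = "https://api.example.com"              # placeholder audience

def validate(token: str) -> dict:
    signing_key = PyJWKClient(JWKS_URL).get_signing_key_from_jwt(token)
    return jwt.decode(
        token,
        signing_key.key,
        algorithms=["RS256"],
        audience=AUDIENCE,
        issuer=ISSUER,  # raises on bad signature, expiry, or claim mismatch
    )
```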
38:40 Jonathan – “Maybe this is kind of a sign that Cognito is not gaining the popularity they wanted. Because effectively, you could re-spin this announcement as Auth0 and Okta are now first-class citizens when it comes to authentication through API Gateway and ALB.”
GCP
39:10 How Protective ReRoute improves network resilience | Google Cloud Blog
- Google Cloud’s Protective ReRoute (PRR) shifts network failure recovery from centralized routers to distributed endpoints, allowing hosts to detect packet loss and immediately reroute traffic to alternate paths.
- This host-based approach has reduced inter-datacenter outages from slow network convergence by up to 84 percent since deployment five years ago, with recovery times measured in single-digit multiples of round-trip time rather than seconds or minutes.
- PRR works by having hosts continuously monitor path health using TCP retransmission timeouts, then modifying IPv6 flow-label headers to signal the network to use alternate paths when failures occur. Google contributed this IPv6 flow-label modification mechanism to the Linux kernel version 4.20 and later, making it available as open source technology for the broader community.
- The feature is particularly critical for AI and ML training workloads, where even brief network interruptions can cause expensive job failures and restarts costing millions in compute time.
- Large-scale distributed training across multiple GPUs and TPUs requires the ultra-reliable data distribution that PRR provides to prevent communication pattern disruptions.
- Google Cloud customers can use PRR in two modes: hypervisor mode, which automatically protects cross-datacenter traffic without guest OS changes, or guest mode for the fastest recovery, which requires Linux kernel 4.20+, TCP applications, and IPv6 traffic (or the gVNIC driver for IPv4).
- Documentation is available at cloud.google.com/compute/docs/networking for enabling guest-mode PRR on critical workloads.
- The architecture treats the network as a highly parallel system where reliability increases exponentially with available paths rather than degrading serially through forwarding stages.
- This approach capitalizes on Google’s network path diversity to protect real-time applications, frequent short-lived connections, and data integrity scenarios where packet loss causes corruption beyond just throughput reduction.
40:57 Ryan – “I was trying to think how I would even implement something like this in guest mode because it breaks my head. It seems pretty cool, and I’m sure that from an underlying technology at the infrastructure level, from the Google network, it sounds pretty neat. But it’s also the coordination of that failover seems very complex. And I would worry.”
41:54 Introducing the Emerging Threats Center in Google Security Operations | Google Cloud Blog
- Google Security Operations launches the Emerging Threats Center, a Gemini-powered detection engineering system that automatically generates security rules when new threat campaigns emerge from Google Threat Intelligence, Mandiant, and VirusTotal.
- The system addresses a key pain point where 59% of security leaders report difficulty deriving actionable intelligence from threat data, typically requiring days or weeks of manual work to assess organizational exposure.
- The platform provides two critical capabilities for security teams during major threat events: it automatically searches the previous 12 months of security telemetry for campaign-related indicators of compromise and detection rule matches, while also confirming active protection through campaign-specific detections.
- This eliminates the manual cross-referencing process that traditionally occurs when zero-day vulnerabilities emerge.
- Under the hood, the system uses an agentic workflow where Gemini ingests threat intelligence from Mandiant incident response and Google’s global visibility, generates synthetic event data mimicking adversary tactics, tests existing detection rules for coverage gaps, and automatically drafts new rules when gaps are found. Human security analysts maintain final approval before deployment, transforming detection engineering from a best-effort manual process into a systematic automated workflow.
- The Emerging Threats Center is available today for licensed Google Security Operations customers, though specific pricing details were not disclosed in the announcement.
- Organizations with high-volume security operations like Fiserv are already using the behavioral detection capabilities to move beyond single indicators toward systematic adversary behavior detection.
44:40 Jonathan – “I see this as very much a CrowdStrike-type AI solution for Google Cloud, in a way. Looking at the data, you’re identifying emerging threats, which is what CrowdStrike’s sales point really is, and then implementing controls to help quench that.”
47:56 Introducing Dhivaru and two new connectivity hubs | Google Cloud Blog
- Google is investing in Dhivaru, a new Trans-Indian Ocean subsea cable connecting the Maldives, Christmas Island, and Oman, extending the Australia Connect initiative to improve regional connectivity.
- The cable system aims to support growing demand for AI services like Gemini 2.5 Flash and Vertex AI by providing resilient infrastructure across the Indian Ocean region.
- The announcement includes two new connectivity hubs in the Maldives and Christmas Island that will provide three core capabilities: cable switching for automatic traffic rerouting during faults, content caching to reduce latency by storing popular content locally, and colocation services offering rack space to carriers and local companies.
- These hubs are positioned to serve Africa, the Middle East, South Asia, and Oceania with improved reliability.
- Google emphasizes the energy efficiency of subsea cables compared to traditional data centers, noting that connectivity hubs require significantly less power since they focus on networking and localized storage rather than compute-intensive AI and cloud workloads.
- The company is exploring ways to use power demand from these hubs to accelerate local investment in sustainable energy generation in smaller locations.
- The connectivity hubs will provide strategic benefits by minimizing the distance data travels before switching paths, which improves resilience and reduces downtime for services across the region.
- This infrastructure investment aims to strengthen local economies while supporting Google’s objective of serving content from locations closer to users and customers.
- The project represents Google’s continued infrastructure expansion to meet long-term demand driven by AI adoption rates that are outpacing predictions, with partnerships including Ooredoo Maldives and Dhiraagu supporting the Maldives hub deployment.
49:38 Matthew – “I had to look up one connectivity hub, which is literally just a small little data center that just kind of handles basic networking and storage – and nothing fancy, which is interesting that they’re putting the two connectivity hubs. They’re dropping these hubs where all their cables terminate. So they are able to cache stuff at each location, which is always interesting.”
Azure
51:46 Infinite scale: The architecture behind the Azure AI superfactory – The Official Microsoft Blog
- Microsoft announces its second Fairwater datacenter in Atlanta, connecting it to the Wisconsin site and existing Azure infrastructure to create what they call a planet-scale AI superfactory.
- The facility uses a flat network architecture to integrate hundreds of thousands of NVIDIA GB200 and GB300 GPUs into a unified supercomputer for training frontier AI models.
- The datacenter achieves 140kW per rack power density through closed-loop liquid cooling that uses an amount of water equivalent to what 20 homes consume in a year, with the loop designed to run for 6-plus years without replacement.
- The two-story building design minimizes cable lengths between GPUs to reduce latency, while the site secures 4×9 availability power at 3×9 cost by relying on resilient grid power instead of traditional backup systems.
- Each rack houses up to 72 NVIDIA Blackwell GPUs connected via NVLink, with 1.8 TB/s of GPU-to-GPU bandwidth and 14 TB of pooled memory accessible to every GPU.
- The facility uses a two-tier Ethernet-based backend network with 800Gbps GPU-to-GPU connectivity running on SONiC to avoid vendor lock-in and reduce costs compared to proprietary solutions.
- Microsoft deployed a dedicated AI WAN backbone with over 120,000 new fiber miles across the US last year to connect Fairwater sites and other Azure datacenters.
- This allows workloads to span multiple geographic locations and enables dynamic allocation between training, fine-tuning, reinforcement learning, and synthetic data generation based on specific requirements.
- The architecture addresses the challenge that large training jobs now exceed single-facility power and space constraints by creating fungibility across sites.
- Customers can segment traffic across scale-up networks within sites and scale-out networks between sites, maximizing GPU utilization across the combined system rather than being limited to a single datacenter.
55:25 Private Preview: Azure HorizonDB
- Azure HorizonDB for PostgreSQL enters private preview as Microsoft’s performance-focused database offering, featuring autoscaling storage up to 128 TB and compute scaling to 3,072 vCores.
- The service claims up to 3 times faster performance compared to open-source PostgreSQL, positioning it as a competitor to AWS Aurora and Google Cloud AlloyDB in the managed PostgreSQL space.
- The 128 TB storage ceiling represents a substantial increase over Azure’s existing PostgreSQL offerings, addressing enterprise workloads that previously required sharding or migration to other platforms.
- This storage capacity combined with the high vCore count targets large-scale OLTP and analytical workloads that need both horizontal and vertical scaling options.
- Microsoft appears to be building HorizonDB as a separate service line rather than an upgrade to existing Azure Database for PostgreSQL Flexible Server, suggesting different architecture and pricing models.
- Organizations currently using Azure Database for PostgreSQL will need to evaluate migration paths and cost implications when the service reaches general availability.
- The private preview status means limited customer access and no published pricing information yet.
- Enterprises interested in testing HorizonDB should expect typical private preview constraints, including potential feature changes, regional limitations, and SLA restrictions before general availability.
57:35 Jonathan – “So it sounds like they’ve pretty much built what Amazon did with the Aurora, separating the storage from the compute to let them scale independently.”
59:10 Public Preview: Microsoft Defender for Cloud + GitHub Advanced Security
- Microsoft Defender for Cloud now integrates natively with GitHub Advanced Security in public preview, creating a unified security workflow that spans from source code repositories through production cloud environments.
- This integration allows security teams and developers to work within a single platform rather than switching between separate tools for code scanning and cloud protection.
- The solution addresses the full application lifecycle security challenge by connecting GitHub’s code-level vulnerability detection with Defender for Cloud’s runtime protection capabilities.
- Organizations using both GitHub and Azure can now correlate security findings from development through deployment, reducing the gap between DevOps and SecOps teams.
- This preview targets cloud-native application teams who need consistent security policies across their CI/CD pipeline and production workloads. The integration is particularly relevant for organizations already invested in the Microsoft and GitHub ecosystem, as it leverages existing tooling rather than requiring additional third-party solutions.
- The announcement provides limited details on pricing structure, though organizations should expect costs to align with existing Defender for Cloud and GitHub Advanced Security licensing models.
- Specific regional availability and rollout timeline details were not included in the brief announcement.
1:00:35 Matthew – “It seems like it has a lot of potential, but without the pricing, and Defender for Cloud as a CSPM, I feel like – for me – it lacks some features when I’ve tried to use it. They’re going in the right direction; I don’t think they’re there at the end product yet.”
1:03:05 Public Preview: Smart Tier account level tiering (Azure Blob Storage and ADLS Gen2)
- Azure introduces Smart Tier for Blob Storage and ADLS Gen2, which automatically moves data between hot, cool, and archive tiers based on access patterns without manual intervention.
- This eliminates the need for lifecycle management policies and reduces the operational overhead of managing storage costs across large data estates.
- The feature works at the account level rather than requiring per-container or per-blob configuration, making it simpler to deploy across entire storage accounts. Organizations with unpredictable access patterns or mixed workloads will benefit most, as the system continuously optimizes placement without predefined rules.
- Smart Tier monitors blob access patterns and automatically transitions objects to lower-cost tiers when appropriate, then moves them back to hot storage when access frequency increases.
- This differs from traditional lifecycle policies that rely on age-based rules and cannot respond dynamically to actual usage.
- The public preview allows customers to test the automated tiering without committing to production workloads, though specific pricing details for the Smart Tier feature itself were not disclosed in the announcement. Standard Azure Blob Storage tier pricing applies, with the hot tier being the most expensive and the archive tier offering the lowest storage costs but higher retrieval fees.
- This capability targets customers managing large volumes of data with variable access patterns, particularly those in analytics, backup, and archival scenarios where manual tier management becomes impractical at scale.
- The integration with ADLS Gen2 makes it relevant for big data and analytics workloads running on Azure.
1:05:18 Jonathan – “So they’ve always had the tiering, but now they’re providing an easy button for you based on access patterns.”
1:13:04 From idea to deployment: The complete lifecycle of AI on display at Ignite 2025 – The Official Microsoft Blog
- Microsoft Ignite 2025 introduces three intelligence layers for AI development: Work IQ connects Microsoft 365 data and user patterns, Fabric IQ unifies analytical and operational data under a shared business model, and Foundry IQ provides a managed knowledge system routing across multiple data sources.
- These layers work together to give AI agents business context rather than requiring custom integrations for each data source.
- Microsoft Agent Factory offers a single metered plan for building and deploying agents across Microsoft 365 Copilot and Copilot Studio without upfront licensing requirements.
- The program includes access to AI Forward Deployed Engineers and role-based training, targeting organizations that want to build custom agents but lack internal AI expertise or want to avoid complex provisioning processes.
- Microsoft Agent 365 provides centralized observability, management, and security for AI agents regardless of whether they were built with Microsoft platforms, open-source frameworks, or third-party tools. With IDC projecting 1.3 billion AI agents by 2028, this addresses the governance gap where unmanaged agents become shadow IT, integrating Defender, Entra, Purview, and Microsoft 365 admin center for agent lifecycle management.
- Work IQ now exposes APIs for developers to build custom agents that leverage the intelligence layer’s understanding of user workflows, relationships, and content patterns. This allows organizations to extend Microsoft 365 Copilot capabilities into their own applications while maintaining the native integration advantages rather than relying on third-party connectors.
- The announcements position Microsoft as providing end-to-end AI infrastructure from the datacenter to the application layer, with particular emphasis on making agent development accessible to frontline workers rather than limiting it to specialized AI teams. No specific pricing details were provided for the new services beyond the mention of metered plans for Agent Factory.
Closing
And that is the week in the cloud! Visit our website, theCloudPod.net, the home of The Cloud Pod, where you can join our newsletter and Slack team, send feedback, or ask questions – or tweet at us with the hashtag #theCloudPod.