Content provided by TCP.FM, Justin Brodley, Jonathan Baker, Ryan Lucas, and Matt Kohn. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by TCP.FM, Justin Brodley, Jonathan Baker, Ryan Lucas, and Matt Kohn or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://staging.podcastplayer.com/legal.

Welcome to episode 322 of The Cloud Pod, where the forecast is always cloudy! We have BIG NEWS – Jonathan is back! He’s joined in the studio by Justin and Ryan to bring you all the latest in cloud and AI news, including the ongoing Microsoft/OpenAI drama, saying goodbye to data transfer fees (in the EU), M4 Power, and more. Let’s get started!

Titles we almost went with this week

  • EU Later, Egress Fees: Google’s Brexit from Data Transfer Charges
  • The Keys to the Cosmos: Azure Unlocks Customer Control
  • Breaking Up is Hard to Do: Google Splits LLM Inference for Better Performance
  • OpenAI and Microsoft: From Exclusive to It’s Complicated
  • Google’s New Model Has Trust Issues (And That’s a Good Thing)
  • Mac to the Future: AWS Brings M4 Power to the Cloud
  • Oracle’s Cloud Nine: Stock Soars on Half-Trillion Dollar Dreams
  • ChatGPT: From Chat Bot to Hat Bot (Everyone’s Wearing Different Professional Hats)
  • Five Billion Reasons to Love British AI
  • NVMe Gonna Give You Up: AWS Delivers the Storage Metrics You’ve Been Missing
  • Tea and AI: OpenAI Crosses the Pond
  • The Norway Bug Strikes Back: A New YAML Hope

A big thanks to this week’s sponsor:

We’re sponsorless! Want to get your brand, company, or service in front of a very enthusiastic group of cloud news seekers? You’ve come to the right place! Send us an email or hit us up on our Slack channel for more info.

AI Is Going Great – Or How ML Makes Money

01:33 Microsoft and OpenAI make a deal: Reading between the lines of their secretive new agreement – GeekWire

  • Microsoft and OpenAI have signed a non-binding memorandum of understanding that will restructure their partnership, with OpenAI’s nonprofit entity receiving an equity stake exceeding $100 billion in a new public benefit corporation where Microsoft will play a major role.
  • The deal addresses the AGI clause that previously allowed OpenAI to unilaterally dissolve the partnership upon achieving artificial general intelligence, which had been a significant risk for Microsoft’s multi-billion-dollar investment.
  • Both companies are diversifying their partnerships – Microsoft is now using Anthropic’s technology for some Office 365 AI features, while OpenAI has signed a $300 billion computing contract with Oracle over five years.
  • Microsoft’s exclusivity on OpenAI cloud workloads has been replaced with a right of first refusal, enabling OpenAI to participate in the $500 billion Stargate AI project with Oracle and other partners.
  • The restructuring allows OpenAI to raise capital for its mission while ensuring the nonprofit’s resources grow proportionally, with plans to use funds for community impact, including a recently launched $50 million grant program.

ALSO:

OpenAI and Microsoft sign preliminary deal to revise partnership terms – Ars Technica

  • OpenAI and Microsoft signed a non-binding memorandum of understanding to revise their partnership terms, requiring formal contract finalization as OpenAI transitions from nonprofit to for-profit structure, with Microsoft holding over $13 billion in investments.
  • The partnership revision addresses growing competition between the companies for AI customers and OpenAI’s need for compute capacity beyond what Microsoft Azure can currently provide, leading OpenAI to explore additional cloud partnerships.
  • Contract complications include provisions that would restrict Microsoft’s access to OpenAI technology once AGI is achieved, now defined by both companies as AI systems generating at least $100 billion in profit rather than technical capabilities.
  • OpenAI abandoned its original full for-profit conversion plan after regulatory pressure and lawsuits from Elon Musk, who argues the shift violates OpenAI’s founding nonprofit mission to benefit humanity.
  • This restructuring impacts cloud infrastructure planning as hyperscalers must balance exclusive partnerships against the reality that leading AI companies need multi-cloud strategies to meet their massive compute demands.

02:59 Justin – “I’m not convinced that we can get to true AGI with the way that we’re building these models. I think there’s things that could lead us to breakthroughs that would get us to AGI, but the transformer model, and the way we do this, and predictive text, is not AGI. As good as you can be at predicting things, doesn’t mean you can have conscious thought.”

07:45 Introducing Upgrades to Codex

  • OpenAI upgraded Codex to better translate natural language into code with improvements in handling complex programming tasks, edge cases, and expanded multi-language support.
  • This enhances developer productivity in cloud-native applications where rapid prototyping and automation are essential.
  • The architecture changes and training data updates enable more accurate code generation, which could reduce development time for cloud infrastructure automation scripts, API integrations, and serverless function creation.
  • Enhanced Codex capabilities directly benefit cloud developers by automating repetitive coding tasks like writing boilerplate code for cloud service integrations, database queries, and deployment configurations.
  • The improved edge case handling makes Codex more reliable for production use cases, potentially enabling automated code generation for cloud monitoring scripts, data pipeline creation, and infrastructure-as-code templates.
  • These upgrades position Codex as a practical tool for accelerating cloud application development, particularly for teams building microservices, implementing CI/CD pipelines, or managing multi-cloud deployments.

10:14 Jonathan – “I think Codex is probably better at some classes of coding. I think it’s great at React; you want to build a UI, use Codex and use OpenAI stuff. You want to build a backend app written in C or Python or something else? I’d use Claude Code. There seem to be different focuses.”

13:24 How people are using ChatGPT

  • OpenAI’s analysis reveals ChatGPT usage patterns across diverse professional domains, with significant adoption in software development, content creation, education, and business operations, demonstrating the technology’s broad applicability beyond initial expectations.
  • The data shows developers using ChatGPT for code generation, debugging, and documentation tasks, while educators leverage it for lesson planning and personalized learning experiences, indicating practical integration into existing cloud-based workflows.
  • Business users report productivity gains through automated report generation, data analysis assistance, and customer service applications, suggesting potential for deeper integration with cloud platforms and enterprise systems.
  • Usage patterns highlight the need for cloud providers to optimize infrastructure for conversational AI workloads, including considerations for API rate limits, response latency, and cost management for high-volume applications.
  • The findings underscore growing demand for AI-powered tools in cloud environments, with implications for platform providers to develop specialized services for LLM deployment, fine-tuning, and integration with existing cloud services.

14:51 Jonathan – “I wish it was more detailed; like how many people are talking to it like it’s a person? How many people are doing nonsense (like on) Reddit?”

17:42 Introducing Stargate UK

  • OpenAI’s Stargate UK appears to be a regional deployment or infrastructure expansion focused on the UK market, potentially offering localized AI services with reduced latency and compliance with UK data sovereignty requirements.
  • This development suggests OpenAI is building dedicated cloud infrastructure in the UK, which could enable faster API response times for European customers and address GDPR compliance needs for AI workloads.
  • The UK-specific deployment may include region-locked models or features tailored to British English and UK-specific use cases, similar to how cloud providers offer region-specific services.
  • For businesses, this could mean the ability to keep AI processing and data within UK borders, addressing regulatory requirements for financial services, healthcare, and government sectors that require data localization.
  • The move indicates a broader trend of AI companies following traditional cloud provider patterns by establishing regional presence to meet performance, compliance, and data residency demands.

18:19 Justin – “I mean, we already have a GPU shortage, so to now make a regionalized need for AI is going to further strain the GPU capacity issues, and so I should probably buy some Nvidia stuff.”

AWS

19:37 Announcing Amazon EC2 M4 and M4 Pro Mac instances | AWS News Blog

  • AWS launches EC2 M4 and M4 Pro Mac instances built on Apple M4 Mac mini hardware, offering up to 20% better build performance than M2 instances with 24GB unified memory for standard M4 and 48GB for M4 Pro variants.
  • Each instance includes 2TB of local SSD storage for improved caching and build performance, though this storage is ephemeral and tied to the instance lifecycle rather than the dedicated host.
  • The instances integrate with AWS services like CodeBuild, CodePipeline, and Secrets Manager for CI/CD workflows, while supporting macOS Sequoia 15.6 and later with up to 10 Gbps VPC and 8 Gbps EBS bandwidth through Thunderbolt connections.
  • Pricing follows the standard EC2 Mac model with a 24-hour minimum allocation period on dedicated hosts, available through On-Demand and Savings Plans in US East and US West regions initially.
  • Beyond iOS/macOS development, the 16-core Neural Engine makes these instances suitable for ML inference workloads, expanding their use cases beyond traditional Apple platform development.

22:00 Accelerate serverless testing with LocalStack integration in VS Code IDE | AWS News Blog

  • AWS Toolkit for VS Code now integrates with LocalStack, enabling developers to test serverless applications locally without switching between tools or managing complex configurations.
  • The integration allows direct connection to LocalStack endpoints for emulating services like Lambda, SQS, EventBridge, and DynamoDB.
  • This addresses a key gap in serverless development workflows where AWS SAM CLI handles unit testing well, but developers need better solutions for local integration testing of multi-service architectures. Previously, LocalStack required standalone management and manual endpoint configuration.
  • The integration provides a tiered testing approach: LocalStack for early development without IAM/VPC complexity, then transition to cloud-based testing with remote debugging when needed. Developers can deploy stacks locally using familiar sam deploy commands with a LocalStack profile.
  • The integration is available in AWS Toolkit v3.74.0 across all commercial AWS Regions. The LocalStack Free tier covers core services with no additional AWS costs, while paid LocalStack tiers offer expanded service coverage for teams needing broader emulation capabilities.
  • The feature continues AWS’s push to make VS Code the primary serverless development environment, building on recent console-to-IDE integration and remote debugging capabilities launched in July 2025.

23:05 Ryan – “It’s interesting; it’s one of those things where I’ve been able to deal with the complexity, so didn’t realize the size of the gap, but I can see how a developer, without infrastructure knowledge, might struggle a little bit.”

26:38 Amazon EC2 supports detailed performance stats on all NVMe local volumes

  • EC2 now provides 11 detailed performance metrics for instance store NVMe volumes at one-second granularity, including IOPS, throughput, queue length, and latency histograms broken down by IO size – matching the monitoring capabilities previously only available for EBS volumes.
  • This feature addresses a significant monitoring gap for workloads using local NVMe storage on Nitro-based instances, enabling teams to troubleshoot performance issues and optimize IO patterns without additional tooling or cost.
  • The latency histograms by IO size provide granular insights that help identify whether performance bottlenecks are related to small random reads, large sequential writes, or specific IO patterns in database and analytics workloads.
  • Available by default on all Nitro-based EC2 instances with local NVMe storage across all AWS regions at no additional charge, making it immediately accessible for existing deployments.
  • This brings feature parity between ephemeral instance store and persistent EBS storage monitoring, simplifying operations for hybrid storage architectures that use both storage types.
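
To make the latency histogram idea above concrete, here is a minimal Python sketch that estimates a percentile from bucketed latency counts. The bucket layout (upper bound in microseconds paired with an IO count) and the sample numbers are assumptions for illustration only; the announcement summary does not describe the actual output format of the EC2 stats.

```python
def percentile_from_histogram(buckets, pct):
    """Return the upper bound of the bucket that contains the pct-th percentile.

    buckets: list of (upper_bound_us, count) tuples sorted by upper bound.
    This is a coarse estimate: the true value lies somewhere inside the bucket.
    """
    total = sum(count for _, count in buckets)
    if total == 0:
        raise ValueError("histogram is empty")
    threshold = total * pct / 100.0
    running = 0
    for upper_bound_us, count in buckets:
        running += count
        if running >= threshold:
            return upper_bound_us
    return buckets[-1][0]


# Hypothetical sample: (upper bound in microseconds, number of read IOs).
sample_read_latency = [(64, 900), (128, 80), (256, 15), (1024, 5)]

p50 = percentile_from_histogram(sample_read_latency, 50)
p99 = percentile_from_histogram(sample_read_latency, 99)
```

Comparing a median like `p50` against a tail value like `p99` per IO size is one way to tell a generally slow volume from one with occasional outliers.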

New EFA metrics for improved observability of AWS networking

  • AWS adds five new Elastic Fabric Adapter metrics to help diagnose network performance issues in AI/ML and HPC workloads by tracking retransmitted packets, timeout events, and unresponsive connections.
  • The metrics are stored as counters in the sys filesystem and can be integrated with Prometheus and Grafana for monitoring dashboards and alerting, addressing the observability gap for high-performance networking workloads.
  • Available only on Nitro v4 and later instances with EFA installer 1.43.0+, this targets customers running distributed training or tightly-coupled HPC applications where network performance directly impacts job completion times.
  • These device-level counters help identify whether performance degradation stems from network congestion or instance misconfiguration, enabling faster troubleshooting for workloads that can cost thousands per hour.
  • The feature arrives as AWS faces increased competition in AI infrastructure from specialized providers, making network observability critical for customers deciding between cloud and on-premises deployments for large-scale training.
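
Because these metrics are plain counters in the sys filesystem, turning them into something alertable mostly means sampling twice and subtracting. The sketch below is a rough illustration in Python, not AWS tooling: the directory is passed in as a parameter because the exact sysfs path is not given here, and the counter names used in the demo (`retrans_pkts`, `timeout_events`) are placeholders rather than confirmed metric names.

```python
import os
import tempfile


def read_counters(directory):
    """Read every regular file in `directory` as an integer counter."""
    counters = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        with open(path) as handle:
            text = handle.read().strip()
        if text.isdigit():
            counters[name] = int(text)
    return counters


def counter_deltas(previous, current):
    """Return the increase of each counter between two samples.

    Assumes counters only grow; a reset (for example after a reboot)
    would show up as a negative delta and is not handled here.
    """
    return {name: current[name] - previous.get(name, 0) for name in current}


def _write_counter(directory, name, value):
    with open(os.path.join(directory, name), "w") as handle:
        handle.write(f"{value}\n")


# Demo against a temporary directory standing in for the real counter path.
with tempfile.TemporaryDirectory() as demo_dir:
    _write_counter(demo_dir, "retrans_pkts", 10)    # placeholder name
    _write_counter(demo_dir, "timeout_events", 2)   # placeholder name
    first_sample = read_counters(demo_dir)
    _write_counter(demo_dir, "retrans_pkts", 25)
    second_sample = read_counters(demo_dir)
    demo_deltas = counter_deltas(first_sample, second_sample)
```

A Prometheus exporter would do essentially the same read step on a schedule and leave the rate calculation to the query layer.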

27:37 Jonathan – “That’s cool, it’s great that it’s local and it’s not through CloudWatch at .50 cents a metric per however long.”

28:19 Now generally available: Amazon EC2 R8gn instances

  • AWS launches R8gn instances powered by Graviton4 processors, delivering 30% better compute performance than Graviton3 and featuring up to 600 Gbps network bandwidth – the highest among network-optimized EC2 instances.
  • These memory-optimized instances scale up to 48xlarge with 1,536 GiB RAM and 60 Gbps EBS bandwidth, targeting network-intensive workloads like SQL/NoSQL databases and in-memory computing applications.
  • R8gn instances support Elastic Fabric Adapter (EFA) on larger sizes (16xlarge and up), enabling lower latency for tightly coupled HPC clusters and distributed computing workloads.
  • Currently available only in US East (N. Virginia) and US West (Oregon) regions, with metal sizes restricted to N. Virginia – suggesting a phased rollout approach for this new instance family.
  • The combination of Graviton4 processors and 6th-generation Nitro Cards positions R8gn as AWS’s premium offering for customers needing both high memory capacity and extreme network performance in a single instance type.

29:18 Jonathan – “That’s what you need for VLM clustering across multiple machines. That’s fantastic.”

29:55 Introducing AWS CDK Refactor (Preview)

  • AWS CDK now includes a ‘cdk refactor’ command in preview that enables safe infrastructure reorganization by preserving deployed resource states when renaming constructs or moving resources between stacks. This addresses a long-standing pain point where code restructuring could accidentally trigger resource replacement and potential downtime.
  • The feature leverages AWS CloudFormation’s refactor capabilities with automated mapping computation to maintain logical ID consistency during architectural changes. This allows teams to break down monolithic stacks, implement inheritance patterns, or upgrade to higher-level constructs without complex migration procedures.
  • Real-world impact includes enabling continuous infrastructure code evolution for production environments without service disruption. Teams can now confidently refactor their CDK applications to improve maintainability and adopt best practices without risking stateful resources like databases or S3 buckets.
  • The feature is available in all AWS regions where CDK is supported, with no additional cost beyond standard CloudFormation usage. Documentation and a detailed walkthrough are available at docs.aws.amazon.com/cdk/v2/guide/refactor.html.
  • This development matters for AWS customers managing complex infrastructure as code deployments who previously had to choose between maintaining technical debt or risking production stability during refactoring operations.

30:56 Ryan – “It’s interesting, I want to see – because how it works is key, right? Because in Terraform, you can do this, it’s just clunky and hard. And so I’m hoping that this is a little smoother. I don’t use CDK enough to really know how it structures.”

31:36 AWS launches CloudTrail MCP Server for enhanced security analysis

  • AWS introduces a Model Context Protocol (MCP) server for CloudTrail that enables AI agents to analyze security events and user activities through natural language queries instead of traditional API calls.
  • The MCP server provides access to 90-day management event histories via LookupEvents API and up to 10 years of data through CloudTrail Lake using Trino SQL queries, streamlining security investigations and compliance workflows.
  • This open-source integration (available at github.com/awslabs/mcp/tree/main/src/cloudtrail-mcp-server) allows organizations to leverage existing AI assistants for security analysis without building custom API integrations.
  • The service is available in all regions supporting CloudTrail LookupEvents API or CloudTrail Lake, with costs based on standard CloudTrail pricing for event lookups and Lake queries.
  • Key use cases include automated security incident investigation, compliance auditing through conversational interfaces, and simplified access to CloudTrail data for teams without deep AWS API knowledge.

32:23 Ryan – “This is fantastic, just because it’s so tricky to sort of structure queries in whatever SQL language to get the data you want. And being able to phrase things in natural language has really made security operations just completely simpler.”

GCP

36:35 New for the U.K. and EU: No-cost, multicloud Data Transfer Essentials | Google Cloud Blog

  • Google Cloud launches Data Transfer Essentials, a no-cost service for EU and UK customers to transfer data between Google Cloud and other cloud providers for multicloud workloads.
  • The service meets EU Data Act requirements for cloud interoperability, while Google chooses not to pass on costs to customers, despite the Act allowing it.
  • Data Transfer Essentials targets organizations running parallel workloads across multiple clouds, enabling them to process data without incurring Google Cloud egress fees.
  • Customers must opt in and configure their multicloud traffic, which will appear as zero-charge line items on bills while non-qualifying traffic continues at standard Network Service Tier rates.
  • This positions Google Cloud ahead of competitors on multicloud data transfer costs, as AWS and Azure still charge significant egress fees for cross-cloud transfers.
  • The service builds on Google’s previous moves, like waiving exit fees entirely and launching BigQuery Omni for multicloud data warehousing.
  • Key use cases include distributed analytics workloads, multi-region disaster recovery setups, and organizations using best-of-breed services across different clouds.
  • Financial services and healthcare companies with strict data residency requirements could benefit from cost-free data movement between clouds.
  • The service requires manual configuration through Google’s guide to designate qualifying multicloud traffic, adding operational overhead compared to standard networking.
  • Organizations must ensure traffic genuinely serves multicloud workloads to be eligible for zero-cost transfers.

41:13 Kubernetes 1.34 is available on GKE! | Google Open Source Blog

  • Kubernetes 1.34 brings Dynamic Resource Allocation (DRA) to GA, finally giving production-ready support for better GPU, TPU, and specialized hardware management – a critical feature for AI/ML workloads that need precise resource allocation and sharing.
  • The introduction of KYAML addresses the infamous “Norway Bug” and YAML’s whitespace nightmares by enforcing stricter parsing rules while remaining compatible with existing parsers – just set KUBECTL_KYAML=true to avoid those frustrating debugging sessions from stray spaces.
  • Pod-level resource limits (now beta) simplify multi-container resource management by letting you set a total resource budget for the entire pod instead of juggling individual container limits, with pod-level settings taking precedence when both are defined.
  • Several stability improvements landed, including ordered namespace deletion for security (preventing NetworkPolicy removal before pods), streaming LIST responses to reduce API server memory pressure in large clusters, and resilient watch cache initialization to prevent thundering herd scenarios.
  • GKE’s rapid channel delivered this release just 5 days after the OSS release, showcasing Google’s commitment to keeping its managed Kubernetes service current with upstream developments.
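
For anyone who has not been bitten by the Norway Bug, the Python sketch below mimics how YAML 1.1 resolves unquoted scalars to booleans, which is why the country code NO can silently become false. It is a simplified stand-in for a real parser, using only the boolean word list from the YAML 1.1 type definitions. The quoted-scalar helper reflects our understanding that KYAML always double-quotes string values; treat that as an assumption and check the upstream docs.

```python
import re

# Plain (unquoted) scalars that YAML 1.1 resolves to booleans.
YAML_1_1_BOOL = re.compile(
    r"^(?:y|Y|yes|Yes|YES|n|N|no|No|NO"
    r"|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF)$"
)
TRUTHY = {"y", "yes", "true", "on"}


def resolve_plain_scalar(text):
    """Mimic YAML 1.1 boolean resolution for an unquoted scalar."""
    if YAML_1_1_BOOL.match(text):
        return text.lower() in TRUTHY
    return text


def resolve_quoted_scalar(text):
    """A quoted scalar always stays a string, so "NO" remains "NO"."""
    return text


country_codes = ["SE", "DK", "NO"]
unquoted = [resolve_plain_scalar(code) for code in country_codes]
quoted = [resolve_quoted_scalar(code) for code in country_codes]
```

With unquoted values, Norway turns into a boolean while Sweden and Denmark stay strings; quoting every string removes the ambiguity.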

42:57 Jonathan – “I like to think of it as fixing a problem with JSON, rather than fixing a problem with YAML, because what it looks like is JSON, but now you can have comments – inline comments, like you could always do with YAML.”

45:22 AI Inference recipe using NVIDIA Dynamo with AI Hypercomputer | Google Cloud Blog

  • Google Cloud introduces a new recipe for disaggregated AI inference using NVIDIA Dynamo on AI Hypercomputer, which physically separates the prefill (prompt processing) and decode (token generation) phases of LLM inference across different GPU pools to improve performance and reduce costs.
  • The solution leverages A3 Ultra instances with NVIDIA H200 GPUs orchestrated by GKE, with NVIDIA Dynamo acting as the inference server that intelligently routes workloads between specialized GPU pools – one optimized for compute-heavy prefill tasks and another for memory-bound decode operations.
  • This architecture addresses a fundamental inefficiency in traditional GPU serving, where both inference phases compete for the same resources, causing bottlenecks when long prefill operations block rapid token generation, leading to poor GPU utilization and higher costs.
  • The recipe supports popular inference engines, including vLLM, SGLang, and TensorRT-LLM, with initial configurations available for single-node (4 GPUs prefill, 4 GPUs decode) and multi-node deployments for models like Llama-3.3-70B-Instruct, available at github.com/AI-Hypercomputer/gpu-recipes.
  • While AWS and Azure offer various inference optimization techniques, Google’s approach of physically disaggregating inference phases with dedicated GPU pools and intelligent routing represents a distinct architectural approach to solving the compute vs memory bandwidth challenge in LLM serving.
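
The blocking problem described above can be shown with a toy queueing model in Python. This is not how NVIDIA Dynamo is implemented: it assumes every task arrives at time zero, each pool handles one task at a time in arrival order, and it ignores the cost of handing the prompt state from the prefill pool to the decode pool. The durations are made-up units.

```python
def decode_wait_times(tasks, disaggregated):
    """Return how long each decode task waits before it starts.

    tasks: list of (kind, duration) in arrival order, kind is "prefill" or "decode".
    disaggregated: if True, prefill and decode run on separate pools;
                   if False, both share a single pool.
    """
    busy_until = {"prefill": 0.0, "decode": 0.0, "shared": 0.0}
    waits = []
    for kind, duration in tasks:
        pool = kind if disaggregated else "shared"
        if kind == "decode":
            waits.append(busy_until[pool])
        busy_until[pool] += duration
    return waits


# One long prompt followed by two short token-generation steps (made-up units).
workload = [("prefill", 10), ("decode", 1), ("decode", 1)]

shared_waits = decode_wait_times(workload, disaggregated=False)
split_waits = decode_wait_times(workload, disaggregated=True)
```

In the shared case both decode steps sit behind the long prefill; with separate pools they only queue behind each other.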

46:52 Jonathan – “It’s just like any app, any monolith, where different parts of the monolith get used at different rates, or have different resource requirements. Do you scale the entire monolith up and then have wasted CPU or RAM on some of them? Or do you break it up into different components and optimize for each particular task? And that’s all they’re doing. It’s a pretty good idea.”

47:56 Data Science Agent now supports BigQuery ML, DataFrames, and Spark | Google Cloud Blog

  • Google’s Data Science Agent now generates code for BigQuery ML, BigQuery DataFrames, and Apache Spark, enabling users to scale data processing and ML workflows directly on BigQuery infrastructure or distributed Spark clusters by simply including keywords like “BQML”, “BigFrames”, or “PySpark” in prompts.
  • The agent introduces @ mentions for BigQuery table discovery within the current project and automatic metadata retrieval, allowing users to reference tables directly in prompts without manual navigation – though cross-project searches still require the traditional “+” button interface.
  • This positions GCP competitively against AWS SageMaker’s code generation features and Azure’s Copilot integrations by offering native BigQuery scaling advantages, particularly for organizations already invested in BigQuery’s ecosystem for data warehousing and analytics.
  • The key limitation is that the agent currently generates only Spark 4.0 code, which may require organizations on earlier Spark versions to upgrade or avoid using the agent for PySpark workflows until backward compatibility is added.
  • The feature targets data scientists and analysts working with large-scale datasets that exceed single-machine memory limits, with practical applications in forecasting, customer segmentation, and predictive modeling using serverless infrastructure to minimize operational overhead.

48:52 Ryan – “This kind of makes me wonder what the data science agent did before this announcement…”

50:18 Introducing DNS Armor to mitigate domain name system risks | Google Cloud Blog

  • Google Cloud launches DNS Armor in preview, partnering with Infoblox to provide DNS-based threat detection that catches malicious domains 68 days earlier than traditional security tools by analyzing over 70 billion DNS events daily.
  • The service detects command and control server connections, DNS tunneling for data exfiltration, and malware distribution sites using both feed-based detection for known threats and machine learning algorithms for emerging attack patterns.
  • DNS Armor operates as a fully managed service requiring no VMs, integrates with Cloud Logging and Security Command Center, and can be enabled at the project level across VPCs with no performance impact on Cloud DNS.
  • This positions GCP competitively against AWS Route 53 Resolver DNS Firewall and Azure DNS Private Resolver, offering similar DNS security capabilities but with Infoblox’s threat intelligence that adds 4 million new threat indicators monthly.
  • Enterprise customers running workloads in GCP gain an additional security layer that addresses the fact that 92% of malware uses DNS for command and control, making this particularly valuable for financial services, healthcare, and other regulated industries.

51:16 Ryan – “This is cool. This is one of the harder problems to solve in security is just that there’s so many services where you have to populate DNS entries and then to route traffic to them. And then it can basically be abandoned over time in bit rot. And so then, it can be snatched up by someone else and then abused; this will help you detect that scenario.”

53:13 Announcing Agent Payments Protocol (AP2) | Google Cloud Blog

  • Google announced Agent Payments Protocol (AP2), an open protocol for secure AI agent-led payments that works with A2A and Model Context Protocol, addressing critical gaps in authorization, authenticity, and accountability when AI agents make purchases on behalf of users.
  • The protocol uses cryptographically signed “Mandates” as tamper-proof digital contracts that create verifiable audit trails for both real-time purchases (human present) and delegated tasks (human not present), solving the trust problem when agents transact autonomously.
  • AP2 supports multiple payment types, including credit cards, stablecoins, and cryptocurrencies, with the A2A x402 extension already providing production-ready crypto payment capabilities in collaboration with Coinbase and Ethereum Foundation.
  • Over 60 major organizations are participating, including American Express, Mastercard, PayPal, Salesforce, and ServiceNow, positioning this as an industry-wide initiative rather than a Google-only solution.
  • The protocol enables new commerce models like automated price monitoring and purchasing, personalized merchant offers through agent-to-agent communication, and coordinated multi-vendor transactions within budget constraints.
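
To illustrate why a signed mandate is tamper-evident, here is a small Python sketch. It is not the AP2 wire format: the mandate fields are hypothetical, and HMAC-SHA256 with a shared secret stands in for whatever signature scheme the protocol actually specifies, simply because it ships with the standard library.

```python
import hashlib
import hmac
import json


def sign_mandate(mandate, key):
    """Return a hex HMAC-SHA256 tag over a canonical JSON encoding of the mandate."""
    payload = json.dumps(mandate, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_mandate(mandate, tag, key):
    """Check the tag in constant time; any change to the mandate invalidates it."""
    return hmac.compare_digest(sign_mandate(mandate, key), tag)


demo_key = b"demo-only-secret"  # hypothetical key, never hard-code real secrets

# Hypothetical mandate: the user authorizes an agent to spend up to a limit.
mandate = {"user": "alice", "item": "concert ticket", "max_price_usd": 120}
tag = sign_mandate(mandate, demo_key)

# An agent (or attacker) quietly raising the limit breaks verification.
tampered = dict(mandate, max_price_usd=1200)

original_ok = verify_mandate(mandate, tag, demo_key)
tampered_ok = verify_mandate(tampered, tag, demo_key)
```

The stored mandate plus its tag is what gives the audit trail its value: anyone holding the verification key can later confirm what the user actually approved.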

54:26 Jonathan – “This may be the path to the micro payments thing that people have been trying to get off the ground for years. You run a blog or something, and something like this could actually get you the half cent per view that would cover the cost of the server or something.”

55:56 C4A Axion processors for AlloyDB now GA | Google Cloud Blog

  • AlloyDB on C4A Axion processors delivers up to 45% better price-performance than N-series VMs for transactional workloads and achieves 3 million transactions per minute, with the new 1 vCPU option cutting entry costs by 50% for development environments.
  • Google’s custom ARM-based Axion processors outperform Amazon’s Graviton4 offerings by 2x in throughput and 3x in price-performance for PostgreSQL workloads, according to independent Gigaom testing, positioning GCP competitively in the ARM database market.
  • The addition of a 1 vCPU/8GB memory configuration addresses developer needs for cost-effective sandbox environments, though it lacks uptime SLAs even in HA configurations, while production workloads can scale up to 72 vCPUs with a new 48 vCPU intermediate option.
  • C4A instances are priced identically to N2 VMs while delivering superior performance, making migration a straightforward cost optimization opportunity for existing AlloyDB customers without pricing penalties.
  • Limited regional availability in select Google Cloud regions may impact adoption timing, but the GA status signals production readiness for customers already testing in preview who cited both performance gains and cost reductions.

58:04 OpenTelemetry now in Google Cloud Observability | Google Cloud Blog

  • Google Cloud Trace now supports OpenTelemetry Protocol (OTLP) for trace data ingestion via telemetry.googleapis.com, enabling vendor-agnostic telemetry pipelines that eliminate the need for Google-specific exporters and preserve the OTel data model during transmission.
  • The new OTLP endpoint significantly increases storage limits: attribute keys expand from 128 to 512 bytes, values from 256 bytes to 64 KiB, span names from 128 to 1024 bytes, and attributes per span from 32 to 1024, addressing previous limitations for high-volume trace data users.
  • Cloud Trace’s internal storage now natively utilizes the OpenTelemetry data model and leverages OTel semantic conventions, such as service.name and span status, in the Trace Explorer UI, thereby improving the user experience for filtering and analyzing traces.
  • Google positions this as the first step in a broader strategy to support OTLP across all telemetry types (traces, metrics, and logs), with future plans for server-side processing, flexible routing, and unified telemetry management across environments.
  • Organizations using multi-cloud or hybrid environments benefit from reduced client-side complexity and the ability to easily send telemetry to multiple observability backends without additional exporters or format conversions.
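
The limit changes listed above are easy to sanity-check against your own spans. The Python sketch below encodes the old and new numbers from the announcement and reports which limits a span would exceed. The function and dictionary names are our own, and the byte-length checks are a simplification; Cloud Trace may measure or truncate values differently.

```python
# Limits quoted in the announcement: previous values vs. the new OTLP endpoint.
LEGACY_LIMITS = {
    "attribute_key_bytes": 128,
    "attribute_value_bytes": 256,
    "span_name_bytes": 128,
    "attributes_per_span": 32,
}
OTLP_LIMITS = {
    "attribute_key_bytes": 512,
    "attribute_value_bytes": 64 * 1024,
    "span_name_bytes": 1024,
    "attributes_per_span": 1024,
}


def limit_violations(span_name, attributes, limits):
    """Return a list of human-readable limit violations for one span."""
    violations = []
    if len(span_name.encode("utf-8")) > limits["span_name_bytes"]:
        violations.append("span name too long")
    if len(attributes) > limits["attributes_per_span"]:
        violations.append("too many attributes")
    for key, value in attributes.items():
        if len(key.encode("utf-8")) > limits["attribute_key_bytes"]:
            violations.append(f"attribute key too long: {key[:20]}")
        if len(str(value).encode("utf-8")) > limits["attribute_value_bytes"]:
            violations.append(f"attribute value too long for key: {key[:20]}")
    return violations


# A span carrying a 300-byte URL attribute: over the old limit, fine on the new one.
span_attributes = {"http.url": "x" * 300}
legacy_result = limit_violations("GET /checkout", span_attributes, LEGACY_LIMITS)
otlp_result = limit_violations("GET /checkout", span_attributes, OTLP_LIMITS)
```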

1:00:41 Our new Waltham Cross data center is part of our two-year, £5 billion investment to help power the UK’s AI economy.

  • Google is investing £5 billion over two years in UK infrastructure, including a new data center in Waltham Cross, Hertfordshire, to support growing demand for AI services like Google Cloud, Search, and Maps.
  • The investment encompasses capital expenditure, R&D, and engineering resources, with projections to support 8,250 jobs annually in the UK while strengthening the country’s AI economy.
  • Google partnered with Shell to manage its UK carbon-free energy portfolio and deploy battery technology that stores surplus clean energy and feeds it back to the grid during peak demand.
  • This expansion positions Google to compete more effectively with AWS and Azure in the UK market by providing local infrastructure for AI workloads and reducing latency for UK customers.
  • The data center will support Google DeepMind’s AI research in science and healthcare, offering UK enterprises and researchers improved access to Google’s AI capabilities and cloud services.

1:01:31 Justin – “The DeepMind AI research is the most obvious reason why they did this.”

1:02:22 Announcing the new Practical Guide to Data Science on Google Cloud | Google Cloud Blog

  • Google released a new ebook called A Practical Guide to Data Science with Google Cloud that demonstrates how to use BigQuery, Vertex AI, and Serverless Spark together for modern data science workflows.
  • The guide emphasizes unified workflows through Colab Enterprise notebooks that blend SQL, Python, and Spark code in one place, with AI assistive features that generate multi-step plans and code from high-level goals.
  • Google’s approach allows data scientists to manage structured and unstructured data in one foundation, using familiar SQL syntax to process documents or analyze images directly through BigQuery.
  • The ebook includes real-world use cases like retail demand forecasting and agricultural risk assessment, with each example linking to executable notebooks for immediate hands-on practice.
  • This positions Google Cloud as offering more integrated data science tooling compared to AWS SageMaker or Azure ML, particularly with the SQL-based approach to unstructured data analysis through BigQuery.

1:04:29 Google releases VaultGemma, its first privacy-preserving LLM – Ars Technica

  • Google Research has developed VaultGemma, its first large language model implementing differential privacy techniques that prevent the model from memorizing and potentially exposing sensitive training data by introducing calibrated noise during training.
  • The research establishes new scaling laws for private LLMs, demonstrating that increased privacy (more noise) requires either higher compute budgets measured in FLOPs or larger data budgets measured in tokens to maintain model performance.
  • This addresses a critical challenge as tech companies increasingly rely on potentially sensitive user data for training, with the noise-batch ratio serving as the key parameter for balancing privacy protection against model accuracy.
  • For cloud providers and enterprises, this technology enables the deployment of LLMs that can train on proprietary or regulated data without risk of exposing that information through model outputs, opening new use cases in healthcare, finance, and other privacy-sensitive domains.
  • The approach provides a mathematical framework for developers to calculate the optimal trade-offs between privacy guarantees, computational costs, and model performance when building privacy-preserving AI systems.
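
To make "calibrated noise during training" a little more concrete, here is a minimal Python sketch of one common way to do it: clip each example's gradient, sum the clipped gradients, add Gaussian noise scaled to the clipping bound, and average. This is a generic DP-SGD-style step written for illustration; the notes do not confirm that it matches VaultGemma's exact training recipe, and the function names and parameters (`max_norm`, `noise_multiplier`) are assumptions.

```python
import math
import random


def clip(gradient, max_norm):
    """Scale one example's gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in gradient))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in gradient]


def private_mean_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip each gradient, sum, add Gaussian noise, then average over the batch."""
    batch_size = len(per_example_grads)
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        for i, g in enumerate(clip(grad, max_norm)):
            total[i] += g
    # The noise standard deviation is tied to the clipping bound, so no single
    # example can move the result by more than the noise is designed to hide.
    sigma = noise_multiplier * max_norm
    return [(t + rng.gauss(0.0, sigma)) / batch_size for t in total]


if __name__ == "__main__":
    rng = random.Random(0)
    grads = [[3.0, 4.0], [0.3, 0.4]]  # toy per-example gradients
    print(private_mean_gradient(grads, max_norm=1.0, noise_multiplier=1.0, rng=rng))
```

Because the noise is added once per batch and then divided by the batch size, the noise in the averaged gradient shrinks as the batch grows. That is one way to see why the ratio of noise to batch size ends up being the parameter that trades privacy against accuracy.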

1:05:36 Justin – “You want to train a model based off of sensitive data, and then you want to offer the output of that model through a chatbot or whatever it is publicly. And it’s terrifying, as a security professional, because you don’t know what data is going to be spit out, and you can’t predict it, and it’s very hard to analyze within the model what’s in there… And so if solutions like this, where you can sort of have mathematical guarantees – or at least something you can point at, that would go a long way in making those workloads a reality, which is fantastic.”

Azure

1:08:20 Generally Available: Azure Cosmos DB for MongoDB (vCore) encryption with customer-managed key

  • Azure Cosmos DB for MongoDB vCore now supports customer-managed keys (CMK) in addition to the default service-managed encryption, providing enterprises with full control over their encryption keys through Azure Key Vault integration.
  • This dual-layer encryption approach aligns Azure with AWS DocumentDB and MongoDB Atlas encryption capabilities, addressing compliance requirements for regulated industries like healthcare and finance that mandate customer-controlled encryption.
  • The feature enables key rotation, revocation, and audit logging through Azure Key Vault, though customers should note potential performance impacts and additional Key Vault costs beyond standard Cosmos DB pricing.
  • Organizations can implement bring-your-own-key (BYOK) scenarios for multi-cloud deployments or maintain encryption key consistency across hybrid environments, particularly useful for migrations from on-premises MongoDB.
  • The vCore deployment model already differentiates from Cosmos DB’s RU-based pricing by offering predictable compute-based costs, and CMK support strengthens its appeal for traditional MongoDB workloads requiring familiar operational patterns.

1:09:31 Ryan – “I do like these models, but I do think it should be used sparingly – because I don’t think there’s a whole lot of advantage of bringing your own key… because you can revoke the key and then Azure can’t edit your data, and it feels like an unwarranted layer of protection.”

1:14:57 Introducing Logic Apps MCP servers (Public Preview) | Microsoft Community Hub

  • Azure Logic Apps now supports Model Context Protocol (MCP) servers in public preview, allowing developers to transform Logic Apps connectors into reusable MCP tools for building AI agents, with two deployment options: registering connectors through Azure API Center or enabling existing Logic Apps as remote MCP servers.
  • The API Center integration provides automated workflow creation and Easy Auth configuration in minutes, while also registering MCP servers in a centralized enterprise catalog for discovery and management across organizations.
  • This positions Azure against AWS’s agent-building capabilities by leveraging Logic Apps’ extensive connector ecosystem (over 1,000 connectors) as pre-built tools for AI agents, reducing development overhead compared to building custom integrations from scratch.
  • Target customers include enterprises building AI agents that need to integrate with multiple systems – the MCP approach allows modular composition of capabilities like data access, messaging, and workflow orchestration without extensive custom coding.
  • Implementation requires Logic Apps Standard tier (consumption-based pricing starting at $0.000025 per action), Microsoft Entra app registration for authentication, and HTTP Request/Response triggers with proper schema descriptions for tool discovery.

1:16:04 Ryan – “For me, the real value in this is that central catalog. The minute MCP was out there, people were standing up their own MCP servers and building their own agents, and then it was duplicative, and so you’ve got every team basically running their own server doing the exact same thing. And now you get the efficiency of centralizing that through a catalog. Also, you don’t have to redo all the work that’s involved with that. There’s efficiency there as well.”

1:17:13 Accelerating AI and databases with Azure Container Storage, now 7 times faster and open source | Microsoft Azure Blog

  • Azure Container Storage v2.0.0 delivers 7x higher IOPS and 4x lower latency for Kubernetes workloads using local NVMe drives, with PostgreSQL showing 60% better transaction throughput.
  • The service is now completely free with no per-GB fees, making it cost-competitive against AWS EBS and Google Persistent Disk, which charge for management overhead.
  • Microsoft open-sourced the entire platform at github.com/Azure/local-csi-driver, allowing deployment on any Kubernetes cluster beyond AKS. This positions Azure as more open than competitors while maintaining feature parity between managed and self-hosted versions.
  • The new architecture reduces CPU consumption to less than 12.5% of node resources (down from up to 50% previously) while delivering better performance. This efficiency gain directly translates to cost savings since customers can run more workloads on the same infrastructure.
  • Integration with KAITO (Kubernetes AI Toolchain Operator) enables 5x faster AI model loading for inference workloads on GPU-enabled VMs with local NVMe. This targets the growing market of organizations running LLMs and AI workloads on Kubernetes, competing with AWS SageMaker and GCP Vertex AI.
  • Single-node deployment support removes the previous 3-node minimum requirement, making it practical for edge computing, development environments, and cost-conscious deployments. This flexibility addresses a key limitation compared to traditional SAN-based storage solutions.
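
As a back-of-the-envelope illustration of the CPU claim, the short Python sketch below compares the storage overhead on a single node before and after. The 16-vCPU node size is a hypothetical example, and because the notes say "up to 50%" and "less than 12.5%", both figures are treated as upper bounds rather than measurements.

```python
def storage_cpu_overhead(node_vcpus: float, overhead_fraction: float) -> float:
    """vCPUs consumed by the storage layer at a given fraction of the node."""
    return node_vcpus * overhead_fraction


if __name__ == "__main__":
    node_vcpus = 16  # hypothetical node size, not from the announcement
    before = storage_cpu_overhead(node_vcpus, 0.50)   # "up to 50%" previously
    after = storage_cpu_overhead(node_vcpus, 0.125)   # "less than 12.5%" in v2.0.0
    print(f"worst-case overhead before: {before} vCPUs")
    print(f"worst-case overhead after:  {after} vCPUs")
    print(f"vCPUs returned to workloads: {before - after}")
```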

1:19:17 Microsoft leads shift beyond data unification to organization, delivering next-gen AI readiness with new Microsoft Fabric capabilities

  • Microsoft Fabric introduces Graph and Maps capabilities to help organizations structure data for AI agents, moving beyond simple data unification to create contextualized, relationship-aware data foundations that AI systems can reason over effectively.
  • The new Graph in Fabric feature uses LinkedIn’s graph design principles to visualize and query relationships across enterprise data like customers, partners, and supply chains, while Maps in Fabric adds geospatial analytics for location-based decision making.
  • OneLake, Fabric’s unified data lake, now supports mirroring from Oracle and Google BigQuery, plus new shortcuts to Azure Blob Storage, allowing organizations to access all their data regardless of location while maintaining governance through new security controls.
  • Microsoft is integrating Fabric with Azure AI Foundry to create a complete data-to-AI pipeline, where Fabric provides the structured data foundation and AI Foundry enables developers to build and scale AI applications using familiar tools like GitHub and Visual Studio.
  • The platform targets enterprises ready to move from AI experimentation to production deployment, with over 50,000 Fabric certifications already achieved by users preparing for these new AI-ready data capabilities.

1:20:35 Justin – “The Fabric stuff is interesting because it’s basically just a ton of stuff, like Power BI and the Data Lake and stuff, shoved into one unified platform, which is nice, and it makes it easier to do data processes. So I don’t expect it to be a major cost increase for customers who are already using Fabric.”

Oracle

1:21:40 Oracle’s stock makes biggest single-day gain in 26 years on huge cloud revenue projections – SiliconANGLE

  • Oracle’s stock jumped 36% after announcing projected cloud infrastructure revenue of $144 billion by fiscal 2030, with RPO (remaining performance obligations) hitting $455 billion – a 359% year-over-year increase driven by four multibillion-dollar contracts signed this quarter.
  • Oracle’s projected $18 billion in OCI revenue for the current fiscal year still trails AWS ($112B) and Azure ($75B), but its aggressive growth trajectory suggests it is positioning itself as a legitimate third hyperscaler option, particularly for enterprises already invested in Oracle databases.
  • The upcoming Oracle AI Database service (launching October) will allow customers to run LLMs from OpenAI, Anthropic, and others directly against Oracle database data – a differentiator from AWS/Azure, which lack native database integration at this level.
  • Oracle’s partnership strategy with AWS, Microsoft, and Google to provide data center infrastructure creates an unusual dynamic where competitor growth actually benefits Oracle, while its 4.5 GW data center expansion with OpenAI shows it is securing critical AI infrastructure capacity.
  • The market’s enthusiasm appears driven more by Oracle’s confidence in projecting 5-year revenue forecasts (unusual in cloud infrastructure) than actual Q1 results, which missed both earnings ($1.47 vs $1.48 expected) and revenue ($14.93B vs $15.04B expected) targets.
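
The figures above are easier to compare with a little arithmetic. The Python sketch below derives the RPO implied a year earlier from the 359% increase and sizes the two Q1 misses in percentage terms. It uses only the numbers quoted in these notes; the function names are illustrative.

```python
def prior_value(current: float, pct_increase: float) -> float:
    """Value one period earlier, given the current value and a percent increase."""
    return current / (1 + pct_increase / 100)


def pct_miss(actual: float, expected: float) -> float:
    """How far the actual figure fell short of the expectation, in percent."""
    return (expected - actual) / expected * 100


if __name__ == "__main__":
    # $455B RPO after a 359% year-over-year increase
    print(f"implied prior-year RPO: ${prior_value(455, 359):.1f}B")
    # Q1 earnings per share: $1.47 actual vs $1.48 expected
    print(f"EPS miss: {pct_miss(1.47, 1.48):.2f}%")
    # Q1 revenue: $14.93B actual vs $15.04B expected
    print(f"revenue miss: {pct_miss(14.93, 15.04):.2f}%")
```

Both misses come out under one percent, which fits the reading that the stock move was driven by the forward projections rather than the quarter itself.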

Closing

And that is the week in the cloud! Visit our website, the home of the Cloud Pod, where you can join our newsletter or Slack team, send feedback, or ask questions at theCloudPod.net, or tweet at us with the hashtag #theCloudPod.
