288: You Might Be Able to Retrain Notebook LM Hosts to be Less Annoyed, But Not Your Cloud Pod Hosts
Welcome to episode 288 of The Cloud Pod – where the forecast is always cloudy! Justin, Ryan, and Jonathan are your hosts as we make our way through this week’s cloud and AI news, including back to Vertex AI, Project Digits, Notebook LM, and some major improvements to AI image generation.
Titles we almost went with this week:
- Digits… I’ll show you 5 digits…
- The only digit the AWS local zone in New York shows me is the middle one
- Keep one eye open near Mercedes with Agentic AI
A big thanks to this week’s sponsor:
We’re sponsorless! Want to get your brand, company, or service in front of a very enthusiastic group of cloud news seekers? You’ve come to the right place! Send us an email or hit us up on our Slack channel for more info.
General News
01:59 Nvidia announces $3,000 personal AI supercomputer called Digits
- If you don’t want to hand over all your money to the cloud providers, you will be able to hand over $3,000 to Nvidia… for a computer that is probably going to be obsolete in <12 months. That’s fun!
- The new personal AI supercomputer, called Project Digits, will launch in May.
- The heart of Digits is the new GB10 Grace Blackwell Superchip, which packs enough processing power to run sophisticated AI models, while being compact enough to fit on a desk and run from a standard power outlet.
- Digits can handle AI models with up to 200 billion parameters, and looks very similar to a Mac Mini.
- “AI will be mainstream in every application for every industry. With Project Digits, the Grace Blackwell Superchip comes to millions of developers,” Nvidia CEO Jensen Huang said in a press release. “Placing an AI supercomputer on the desks of every data scientist, AI researcher, and student empowers them to engage and shape the age of AI.”
- The Digits system comes with 128GB of unified coherent memory and up to 4TB of NVMe storage. For even more demanding apps, two Digits systems can be linked together to handle models with up to 405 billion parameters.
- The GB10 chip delivers up to 1 petaflop of AI performance, meaning it can perform 1 quadrillion AI calculations per second.
- If you plunk down the money for Digits, you will also get access to Nvidia’s AI software library, including development kits, orchestration tools, and pre-trained models available through the Nvidia NGC catalog.
- The system runs on the Linux-based Nvidia DGX OS, and supports popular tools like PyTorch, Python, and Jupyter notebooks.
09:25 Jonathan – “The Blackwell is pretty recent, it’s the one that had a lot of problems with yield. And I kind of suspect that they’re sort of packaging this up and selling some of the chips which didn’t pass all the tests for the commercial products. And so they’re enabling whatever cores they can in these things to sell to consumers… Having all the memory is really great for the big models. It’s not going to be particularly performant now. I think the spec I saw was like one teraflop at quite low precision – like FP4 precision – which is quite low, and I think you’d be better off, if you’re really interested, buying some like 3090s or 5090s or something like that. Obviously you don’t get the memory, but far better performance for the price.”
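To put the memory numbers in context, here is a rough back-of-the-envelope sketch. It assumes the 200 billion and 405 billion parameter figures refer to weights stored at 4 bits each (FP4), and it counts only the weights, ignoring KV cache, activations, and OS overhead. The function names are ours, not Nvidia’s.

```python
def weights_gb(params_billions: float, bits_per_param: int) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    # billions of params * bits per param / 8 bits per byte = billions of bytes
    return params_billions * bits_per_param / 8


def fits(params_billions: float, bits_per_param: int,
         units: int = 1, memory_gb_per_unit: int = 128) -> bool:
    """True if the weights alone fit in the combined unified memory.

    Assumption: linking units simply pools their memory, which is a
    simplification of however the two systems are actually linked.
    """
    return weights_gb(params_billions, bits_per_param) <= units * memory_gb_per_unit


if __name__ == "__main__":
    for params, bits, units in [(200, 4, 1), (200, 16, 1), (405, 4, 1), (405, 4, 2)]:
        size = weights_gb(params, bits)
        verdict = "fits" if fits(params, bits, units) else "does not fit"
        print(f"{params}B params at {bits}-bit on {units} unit(s): {size:.1f} GB, {verdict}")
```

Under those assumptions, the quoted limits only work at low precision, which lines up with Jonathan’s point about the headline performance figure being a low-precision number.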
06:46 Nvidia’s Jensen Huang hints at ‘plans’ for its own desktop CPU
- It’s long been rumored that Nvidia is planning to break into the consumer CPU market in 2025, and we finally got some insight into those plans.
- Nvidia CEO Jensen Huang said there are bigger plans for the Arm-based CPU within the GB10 chip introduced in the Digits computer, which was co-developed with MediaTek.
- Huang told investors that they obviously have plans, and they can’t wait to tell us – or sell us – more.
07:22 Justin – “It’s interesting to see the dominance of Intel fall to the dominance of Nvidia and Nvidia just basically repeating the whole set of stuff all over again.”
AI Is Going Great – Or, How ML Makes All its Money
08:23 Build RAG and Agent-based AI Apps with Anthropic’s Claude 3.5 Sonnet in Snowflake Cortex AI
- Snowflake is announcing the GA of Claude 3.5 Sonnet as the first Anthropic foundation model available in Snowflake Cortex AI.
- Customers can now access the most intelligent model in the Claude model family from Anthropic using familiar SQL, Python and REST API interfaces, within the Snowflake security perimeter.
16:43 Justin – “That’s actually nice. I didn’t realize that Snowflake was going to be making Claude available. Missed the EA, but glad to see my favorite model is at least available there.”
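For a feel of what the SQL interface looks like from Python, here is a minimal sketch. It assumes you already have a DB-API cursor from the Snowflake Python connector, that the connector is using its default `%s` parameter style, and that the model is exposed under the identifier `claude-3-5-sonnet`. Check the Cortex documentation for the exact model name and region availability; `ask_cortex` is our own helper name, not part of any Snowflake library.

```python
# SNOWFLAKE.CORTEX.COMPLETE(model, prompt) is Cortex's SQL completion function.
CORTEX_SQL = "SELECT SNOWFLAKE.CORTEX.COMPLETE(%s, %s)"


def ask_cortex(cursor, prompt: str, model: str = "claude-3-5-sonnet"):
    """Send one prompt through Cortex and return the text of the reply.

    `cursor` is any DB-API style cursor connected to Snowflake.
    The default model identifier is an assumption; verify it in your account.
    """
    cursor.execute(CORTEX_SQL, (model, prompt))
    row = cursor.fetchone()
    return row[0] if row else None
```

Usage would look something like `ask_cortex(conn.cursor(), "Summarize last week's support tickets")`, with the data never leaving the Snowflake security perimeter.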
AWS
09:33 AWS Compute Optimizer now expands idle and rightsizing recommendations for Amazon EC2 Auto Scaling groups
- Compute Optimizer now extends its idle and rightsizing recommendations to ASGs with scaling policies and multiple instance types.
- With the new recommendations, you can take actions to optimize cost and performance for these groups without requiring specialized knowledge or engineering resources to analyze them.
09:56 Ryan – “Well, this is long overdue, because you’ve always had, or for a long time anyway, you’ve had optimizations for standalone EC2 instances. But ASGs have always been ignored. And a huge amount of waste of people that set a minimum scale level for these things. And they’re just sitting there, burning through coal, but not taking any requests. So I’m glad to see these making the list.”
12:37 Announcing the general availability of a new AWS Local Zone in New York City
- AWS is announcing the GA of an AWS Local Zone in New York City. It supports a wide range of workloads, with C7i, R7i, M6i, and M6in EC2 instances, EBS volumes, ECS, EKS, ALB, and AWS Direct Connect all available in the local zone.
13:42 Why CEO Matt Garman is willing to bet AWS on AI
- The excellent Decoder podcast with Nilay Patel recently invited Matt Garman on to talk about stepping into the AWS CEO role.
- Matt hits on the same talking points you’ve heard in the past, that most companies are still barely in the cloud, there is a huge market, etc.
- Matt talks about reorienting the computing infrastructure to support the evolving world of Generative AI.
- It’s clear from listening to the interview that Amazon is thinking about AI beyond just the model, but the monetization of the service around the model, etc.
- They touch on several other interesting topics like AGI, Netflix as a customer, etc., and it’s worth a listen.
15:51 Justin – “I mean, basically building infrastructure services that support the needs of AI driven worlds. And we’ll talk about a little bit later in an Azure story, it will come up about AI first apps and what that’s going to mean and kind of some of those things. But I think that’s what he was referring to basically without using as catchy a phrase as Microsoft came up with.”
16:32 Now open — AWS Mexico (Central) Region
- In February 2024, AWS announced its plan to expand into Mexico.
- Today, 11 months later, they are excited to announce the GA of the AWS Mexico (Central) Region with three AZs and API code mx-central-1.
18:14 AWS CDK is splitting Construct Library and CLI
- AWS CDK is a software development framework for defining cloud infrastructure in code and provisioning it through AWS CloudFormation.
- It consists of two primary components: the Construct Library, which you use in a programming language to model your AWS app, and a CLI.
- The Construct Library synthesizes a model of your application to a directory on disk, and the CLI reads that directory to deploy your application on AWS.
- Starting in Feb 2025, the CDK CLI and CDK Construct Library will no longer be released in lockstep. Instead, they will both have their own independent release cadence, which means their version numbers are going to diverge. There will be no impact to the CDK API or User Experience.
- They are doing this because, as the library has matured, they have found that changes to the different components proceed at different paces and require different testing strategies. This change lets them adjust the release cadence of one subproject without affecting the other, giving the entire project more agility.
19:42 Ryan – “I’ve really tried over and over and over to get into the CDK model, and it just doesn’t work for me. And I think I wonder if it’s just because I was sort of a sysadmin that turned into a programmer over time, if it came from that direction, or if it’s just my utter hatred of TypeScript.”
GCP
22:08 Get ready for a unique, immersive security experience at Next ‘25
- Google Next is shockingly just around the corner (at the beginning of April), and Google is getting ready by telling you about all the great things you can look forward to. This week they highlight what to look forward to as a security person:
- Access to a security lounge, a dedicated area in the expo where you can meet security leaders engineering Google Cloud’s secure by design platform and products.
- Interactive Security Operations Center to see Google Secops from the eyes of both the defender and adversary.
- Mandiant Threatspace, where you’ll learn from frontline defenders and incident responders.
- Overviews on Securing your AI Experience
- Capture the flag challenge, where you can test and hone your cybersecurity skills. Real-world data, random notes, and information from the dark web simulate a real-world threat hunt.
- Security tabletop exercises where you can role-play and analyze aspects of hypothetical but realistic security incidents.
- Birds of a Feather sessions.
- Plus over 40 security breakout sessions.
 
- For CISOs, there is a dedicated programming track to equip them and other security leaders with the insights and strategies they need to navigate the evolving threat landscape.
- Want to register? You can do that here.
24:25 Introducing Vertex AI RAG Engine: Scale your Vertex AI RAG pipeline with confidence
- Google is announcing the General Availability of Vertex AI RAG Engine, a fully managed service that helps you build and deploy RAG implementations with your data and methods.
- Vertex AI RAG Engine allows you to:
- Adapt to any architecture: choose the models, vector databases, and data sources that work for your use case.
- Evolve with your use case: add new data sources, update models, or adjust retrieval parameters through simple configuration changes.
- Evaluate in simple steps with different configurations to find what works best for your use case
 
- Feature set of the RAG Engine:
- DIY capabilities: DIY RAG empowers users to tailor their solutions by mixing and matching different components. It works great for low to medium complexity use cases with an easy-to-get-started API, enabling fast experimentation, proofs of concept, and RAG-based applications with a few clicks.
- Search functionality: Vertex AI Search stands out as a robust, fully managed solution. It supports a wide variety of use cases, from simple to complex, with high out-of-the-box quality, ease of getting started, and minimal maintenance.
- Connectors: A rapidly growing list of connectors helps you quickly connect to various data sources, including Cloud Storage, Google Drive, Jira, Slack, or local files. RAG Engine handles the ingestion process (even for multiple sources) through an intuitive interface.
- Enhanced performance and scalability: Vertex AI Search is designed to handle large volumes of data with exceptionally low latency. This translates to faster response times and improved performance for your RAG applications, especially when dealing with complex or extensive knowledge bases.
- Simplified data management: Import your data from various sources, such as websites, BigQuery datasets, and Cloud Storage buckets, to streamline your data ingestion process.
- Improved LLM output quality: By using the retrieval capabilities of Vertex AI Search, you can help to ensure that your RAG application retrieves the most relevant information from your corpus, which leads to more accurate and informative LLM-generated outputs.
 
- And it is customizable:
- Parsing and retrieval customizations.
 
26:22 Jonathan – “It must be really tough, I think, being a service provider in this industry right now, because things are changing so quickly. It’s like, well, do we launch this Vertex AI rag product, or do we wait three months and this paper we just wrote about Titans, which is kind of like a slightly modified architecture that sort of separates episodic memory, like specific facts that you must remember as facts in themselves from the general training sort of pool of the network. And so that will help address hallucinations.”
32:07 Google Cloud’s Automotive AI Agent arrives for Mercedes-Benz.
- Google is unveiling the Automotive AI Agent, a new way for automakers to create helpful generative AI experiences.
- Built using Gemini with Vertex AI, the Automotive AI Agent is specially tuned to allow automakers to create highly personalized and intuitive in-car agents that go beyond vehicle voice control.
- This will allow you to ask questions in natural conversation like “Is there an Italian restaurant nearby?” as well as follow-up questions like “Does it have good reviews? What’s the most popular dish?”
- Mercedes-Benz is among the first to implement the Automotive AI Agent in its MBUX virtual assistant, coming to the new Mercedes-Benz CLA later this year.
32:49 Ryan – “Well, I keep thinking about the manufacturer-specific GPS interfaces. That was a terrible choice, because it was immediately out of date and not getting updates. And then everything just shifted to a mobile device that you can keep up to date. And this is going to be no different. Why? This is not a good idea.”
36:26 State-of-the-art video and image generation with Veo 2 and Imagen 3
- Last year Google released Veo and Imagen 3, and creators have brought their ideas to life with the help of these models.
- Now they are introducing the latest version of Veo, Veo 2, and the latest version of Imagen 3, both of which achieve state-of-the-art results. These models are now available in VideoFX, ImageFX, and their latest experiment, Whisk.
- Veo 2 can create high-quality video in a wide range of subjects and styles. In head-to-head comparisons judged by human raters, Veo 2 achieved state-of-the-art results against leading models.
- Veo 2 will deliver resolution up to 4k, and be extended to minutes in length. You can specify things like the lens to use, blur out background or focus on a subject by putting a shallow depth of field into the prompt.
- While many video models hallucinate unwanted details like extra fingers or unexpected objects, Veo 2 produces these less frequently, making the outputs more realistic.
- Imagen 3 is improving and includes brighter, better-composed images.
- It can now render more diverse art styles more accurately, from photo realism to impressionism, from abstract to anime.
- Whisk is their newest experiment. It lets you input or create images that convey the subject, scene, and style you have in mind. You can bring them together and remix them to create something uniquely your own, from a digital plushie to an enamel pin or sticker.
- Whisk combines Imagen 3 with Gemini’s visual understanding and description capabilities.
36:41 Justin – “I tried to try Whisk 3 or Whisk here with Imagen 3, cause I was curious. And it only can make digital plushies, enamel pins or stickers. So literally choose one of those three things and then what image would you like to use? And then here, here’s your result, which I thought was sort of… I’m like, well, that’s not really helpful.”
40:49 The CMA’s assessment of Google Search
- The UK CMA has announced it will assess whether Google Search has “Strategic Market Status” (SMS) under the new digital markets, competition and consumer regime, and what new requirements Google Search may need to follow.
- Google plans to engage constructively to lay out how services benefit UK consumers and businesses, as well as trade-offs of new regulations.
- We will keep an eye on this one.
41:21 Justin – “We’ll keep an eye on this one. This would be probably a fun story because what Google wants and what the UK wants are probably completely different things; and this will probably eventually turn into an EU issue as well.”
42:02 Google’s NotebookLM had to teach its AI podcast hosts not to act annoyed at humans
- TechCrunch has an article about Google’s NotebookLM, and apparently the team had to teach its AI podcast hosts not to be annoyed.
- In December 2024, they added the ability to call in to the podcast and ask questions, essentially interrupting the AI hosts.
- When the features were first rolled out, the AI hosts seemed annoyed at such interruptions, and would occasionally give snippy comments to human callers like “I was getting to that” or “as I was about to say” which felt adversarial.
- NotebookLM’s team decided to do some friendliness tuning.
- They posted on X… that friendliness tuning was in the “things i never thought would be my job, but are” category.
- They tested a variety of different prompts, and landed on a new prompt that is more friendly and engaging.
- TechCrunch tested the fix and said that it is working, and the hosts even expressed surprise, exclaiming “Woah” before politely asking the human to chime in.
43:09 Justin – “Maybe we can have NotebookLM call in to us and ask us questions!”
43:54 Google Cloud could overtake Microsoft’s No. 2 cloud position this year
- First let me tell you my opinion… “yeah right”
- Analyst Jack Gold attempted to zero in on cloud hosting revenue for the big three hyperscalers, and he concluded that Google Cloud’s pure cloud hosting revenue is likely much closer to Azure’s than Microsoft wants it to be. In fact, he estimates it to be within $1 billion.
- At current growth rates, he projects that Google Cloud’s revenue will be 55% greater than Azure’s.
45:25 Ryan – “I disagree with the time scale. And if you extend the time scale out too much longer, you just have to assume everything sort of stays the same. And there’s so many things that can change things. You know, like there was a, I’m sure there was a huge bump from AI for Microsoft, you know, a little while ago. Has that been really spread across the other cloud providers? I don’t really know if they caught up.”
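Ryan’s objection is easy to see with a little compounding math. The sketch below is ours, not Jack Gold’s model: it assumes both providers keep growing at a fixed annual rate forever, which is exactly the assumption Ryan is skeptical of. The revenue and growth figures in the usage example are placeholders for illustration only, not numbers from the analysis.

```python
import math


def years_to_overtake(leader: float, challenger: float,
                      leader_growth: float, challenger_growth: float):
    """Years until the challenger's revenue matches the leader's.

    Assumes constant compound annual growth for both, e.g. 0.30 means 30%.
    Returns 0.0 if the challenger is already ahead, and None if it never
    catches up at these rates.
    """
    if challenger >= leader:
        return 0.0
    if challenger_growth <= leader_growth:
        return None
    # Solve leader * (1 + gl)^n == challenger * (1 + gc)^n for n.
    return math.log(leader / challenger) / math.log(
        (1 + challenger_growth) / (1 + leader_growth)
    )


if __name__ == "__main__":
    # Placeholder inputs: a $1B gap and made-up growth rates.
    print(years_to_overtake(leader=25.0, challenger=24.0,
                            leader_growth=0.20, challenger_growth=0.30))
```

Small changes to either growth rate move the crossover date a lot, or remove it entirely, which is why the projection is so sensitive to the time scale.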
Azure
47:36 Introducing CoreAI – Platform and Tools
- Satya Nadella comes to us with an update he sent to Microsoft employees and is sharing publicly (I mean it would have been leaked anyways)
- Satya indicates that we are heading into the next era of the AI platform shift. 2025 will be about model-forward applications that reshape all application categories. Unlike previous platform shifts, this will impact every layer of the application stack. It is akin to GUI, internet servers, and cloud-native databases all arriving at once… 30 years of change compressed into 3 years.
- He says they will build agentic applications with memory, entitlements and action space that will inherit powerful model capabilities. And will adapt those capabilities for enhanced performance and safety across roles, business processes and industry domains.
- This will lead to what he calls the AI-first App stack, one with new UI/UX patterns, runtimes to build with agents, orchestrate multiple agents, and a reimagined management and observability layer.
- So it is imperative that Azure become the infrastructure for AI, while they build AI platforms and developer tools spanning Azure AI Foundry, GitHub, and VS Code on top of it.
- The good news, per Satya, is that they have been working on it for 2 years already and have learned a lot in terms of the systems, app platform, and tools required for the AI era.
- To further advance the roadmap across the layers, they are creating a new engineering organization: CoreAI – Platform and Tools.
- The new division will bring together Dev Div, AI Platform, and some key teams from the Office of the CTO, including AI Supercomputer, AI Agentic Runtimes, and Engineering Thrive, with the mission to build the end-to-end Copilot & AI stack for both first-party and third-party customers to build and run AI apps and agents.
- This group will also build out GitHub Copilot, thus having a tight feedback loop between the leading AI-first product and the AI platform to motivate the stack and its roadmap.
- The new CoreAI team will be led by EVP Jay Parikh.
51:02 Justin – “I mean, it’s kind of neat though. Like if you think about that and then they put that with the AI agentic team and that like, could be really, cause I mean, it is, that is my day to day life. Like it’s my challenge. How do I get AI here? And there’s so many hurdles to make it happen.”
Oracle
52:44 Oracle Supercharges Retail Operations with New POS
- Justin is a child and will never not laugh at Point of Sale being POS… so here’s a fun story to round out today’s show.
- There is nothing here really cloud related… we just wanted to snicker about it.
- Ok they do pitch you on using OCI and OCI Container instances to speed up your implementation, and a plug for OCI Roving Edge infrastructure for your store to run Xstore. So there – it’s cloud related.
Closing
And that is the week in the cloud! Visit our website, the home of The Cloud Pod, where you can join our newsletter and Slack team, send feedback, or ask questions at theCloudPod.net, or tweet at us with hashtag #theCloudPod