Content provided by Keith Bourne. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by Keith Bourne or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://staging.podcastplayer.com/legal.

Semantic caches are transforming how AI systems handle costly reasoning by intelligently reusing prior agent workflows to slash latency and inference costs. In this episode, we unpack Chapter 15 of Keith Bourne's "Unlocking Data with Generative AI and RAG," exploring the architectures, trade-offs, and practical engineering of semantic caches for production AI.

In this episode:

- What semantic caches are and why they reduce AI inference latency by up to 100x

- Core techniques: vector embeddings, entity masking, and CrossEncoder verification

- Comparing semantic cache variants and fallback strategies for robust performance

- Under-the-hood implementation details using ChromaDB, sentence-transformers, and CrossEncoder

- Real-world use cases across finance, customer support, and enterprise AI assistants

- Key challenges: tuning thresholds, cache eviction, and maintaining precision in production
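To make the core lookup concrete, here is a minimal, dependency-free sketch of a semantic cache with a similarity threshold and LRU-style eviction. It assumes a toy bag-of-words vector with cosine similarity as a stand-in for a sentence-transformers embedding such as all-mpnet-base-v2, and a linear scan in place of a ChromaDB query. The `SemanticCache` class, its `threshold` and `max_entries` defaults, and the helper names are hypothetical illustration choices, not the book's implementation.

```python
import math
import re
from collections import Counter, OrderedDict
from typing import Callable, Optional


def embed(text: str) -> Counter:
    # Toy stand-in for a sentence-transformers model (assumption): bag-of-words counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    """Hypothetical semantic cache: query text -> (vector, cached answer)."""

    def __init__(self, threshold: float = 0.8, max_entries: int = 1000):
        self.threshold = threshold      # minimum similarity that counts as a hit
        self.max_entries = max_entries  # capacity before eviction kicks in
        self.entries = OrderedDict()    # insertion/recency order drives eviction

    def lookup(self, query: str) -> Optional[str]:
        # Find the most similar stored query (a vector DB would do this step).
        q = embed(query)
        best_key, best_score = None, 0.0
        for key, (vec, _) in self.entries.items():
            score = cosine(q, vec)
            if score > best_score:
                best_key, best_score = key, score
        if best_key is not None and best_score >= self.threshold:
            self.entries.move_to_end(best_key)  # mark as recently used
            return self.entries[best_key][1]
        return None

    def store(self, query: str, answer: str) -> None:
        self.entries[query] = (embed(query), answer)
        self.entries.move_to_end(query)
        if len(self.entries) > self.max_entries:
            self.entries.popitem(last=False)  # evict the least recently used entry

    def get_or_compute(self, query: str, compute: Callable[[str], str]) -> str:
        # Fallback strategy: on a miss, run the expensive path and cache its result.
        cached = self.lookup(query)
        if cached is not None:
            return cached
        answer = compute(query)
        self.store(query, answer)
        return answer
```

On a miss, `get_or_compute` calls the expensive function (an LLM call or agent workflow) and stores its result; on a hit at or above the threshold, it returns the stored answer without calling it. A purely lexical similarity like this one can also score highly for queries that differ only in a key entity (for example, two stock tickers), which is the kind of false hit that entity masking and CrossEncoder verification are meant to catch.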

Key tools and technologies mentioned:

- ChromaDB vector database

- Sentence-transformers embedding models (e.g., all-mpnet-base-v2)

- CrossEncoder models for verification

- Regex-based entity masking

- Adaptive similarity thresholding
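Regex-based entity masking can be sketched as follows: entity spans are replaced with placeholders before embedding, so queries that differ only in their entities map to the same cache key, while the extracted values are kept so a reused workflow can be re-run with them. The patterns, placeholder names, and the `mask_entities` function are hypothetical illustrations; the episode notes do not specify which patterns the book uses.

```python
import re

# Hypothetical entity patterns (assumption); a real system would tune these per domain.
# Order matters: dates are masked before bare numbers.
ENTITY_PATTERNS = [
    ("date", re.compile(r"\b\d{4}-\d{2}-\d{2}\b")),
    ("ticker", re.compile(r"\b[A-Z]{2,5}\b")),
    ("number", re.compile(r"\b\d+(?:\.\d+)?\b")),
]


def mask_entities(text: str) -> tuple[str, dict[str, list[str]]]:
    """Replace entity spans with lowercase placeholders and return the extracted values."""
    found: dict[str, list[str]] = {}
    for label, pattern in ENTITY_PATTERNS:
        found[label] = pattern.findall(text)
        # Lowercase placeholders such as "<date>" cannot be re-matched by the
        # uppercase ticker pattern or the digit pattern that run afterwards.
        text = pattern.sub(f"<{label}>", text)
    return text, found


masked, entities = mask_entities("What was AAPL's closing price on 2024-01-05?")
print(masked)    # What was <ticker>'s closing price on <date>?
print(entities)  # {'date': ['2024-01-05'], 'ticker': ['AAPL'], 'number': []}
```

The masked string is what would be embedded and looked up, so "What was MSFT's closing price on 2024-02-10?" lands on the same key. A pattern this naive also flags ordinary uppercase words such as "USD" or "CEO" as tickers, which is one reason thresholds and verification still need tuning in production.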

Timestamps:

00:00 - Introduction and episode overview

02:30 - What are semantic caches and why now?

06:15 - Core architecture: embedding, masking, and verification

10:00 - Semantic cache variants and fallback approaches

13:30 - Implementation walkthrough using Python and ChromaDB

16:00 - Real-world applications and performance metrics

18:30 - Open problems and engineering challenges

19:30 - Final thoughts and book spotlight

Resources:

- "Unlocking Data with Generative AI and RAG" by Keith Bourne - Search for 'Keith Bourne' on Amazon and grab the 2nd edition

- Memriq AI: https://Memriq.ai
