Content provided by Keith Bourne. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by Keith Bourne or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://staging.podcastplayer.com/legal.

Learn how to evaluate Retrieval-Augmented Generation (RAG) pipelines effectively and efficiently with ragas, an open-source evaluation framework for AI quality assurance. In this episode, we explore how to implement reference-free evaluation, integrate continuous monitoring into your AI workflows, and optimize for production scale, drawing on Chapter 9 of Keith Bourne's book "Unlocking Data with Generative AI and RAG".

In this episode:

- Overview of ragas and its reference-free metrics that achieve 95% human agreement on faithfulness scoring

- Implementation patterns and code walkthroughs for integrating ragas with LangChain, LlamaIndex, and CI/CD pipelines

- Production monitoring architecture: sampling, async evaluation, aggregation, and alerting

- Comparison of ragas with other evaluation frameworks like DeepEval and TruLens

- Strategies for cost optimization and asynchronous evaluation at scale

- Advanced features: custom domain-specific metrics with AspectCritic and multi-turn evaluation support
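The production monitoring flow listed above (sampling, async evaluation, aggregation, alerting) can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the episode's or the book's code: `score_faithfulness` is a hypothetical word-overlap stand-in for an LLM-judged metric such as ragas faithfulness, and the record fields `answer` and `contexts`, the sampling rate, and the alert threshold are all made up for the example.

```python
import asyncio
import random
import statistics


def should_sample(rng: random.Random, rate: float) -> bool:
    """Sampling: keep only a fraction of production traffic for evaluation."""
    return rng.random() < rate


async def score_faithfulness(record: dict) -> float:
    """Hypothetical placeholder scorer, NOT the ragas metric.

    Returns the fraction of answer words that also occur in the retrieved
    contexts. A real pipeline would call an LLM-judged metric here.
    """
    await asyncio.sleep(0)  # yield control, as a real network call would
    context_words = set(" ".join(record["contexts"]).lower().split())
    answer_words = record["answer"].lower().split()
    if not answer_words:
        return 0.0
    return sum(w in context_words for w in answer_words) / len(answer_words)


async def evaluate_sampled(records, rate, threshold, seed=0, max_concurrency=4):
    """Sample records, score them concurrently, aggregate, and flag an alert."""
    rng = random.Random(seed)
    sampled = [r for r in records if should_sample(rng, rate)]

    # Async evaluation: cap concurrent scorer calls to limit cost and rate limits.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(record):
        async with semaphore:
            return await score_faithfulness(record)

    scores = await asyncio.gather(*(bounded(r) for r in sampled))

    # Aggregation and alerting: compare the mean score with a threshold.
    mean = statistics.fmean(scores) if scores else None
    return {
        "sampled": len(sampled),
        "mean_faithfulness": mean,
        "alert": mean is not None and mean < threshold,
    }


# Illustrative records; the field names are assumptions for this sketch.
records = [
    {"answer": "paris is the capital", "contexts": ["Paris is the capital of France"]},
    {"answer": "berlin is huge", "contexts": ["Paris is the capital of France"]},
]
report = asyncio.run(evaluate_sampled(records, rate=1.0, threshold=0.8))
print(report)
```

Keeping the scorer behind a single async function means the placeholder can later be swapped for a real metric call without touching the sampling, aggregation, or alerting logic.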

Key tools and technologies mentioned:

- ragas (Retrieval Augmented Generation Assessment System)

- LangChain, LlamaIndex

- LangSmith, LangFuse (observability and evaluation tools)

- OpenAI GPT-4o, GPT-3.5-turbo, Anthropic Claude, Google Gemini, Ollama

- Python datasets library

Timestamps:

00:00 - Introduction and overview with Keith Bourne

03:00 - Why reference-free evaluation matters and ragas’s approach

06:30 - Core metrics: faithfulness, answer relevancy, context precision & recall

09:00 - Code walkthrough: installation, dataset structure, evaluation calls

12:00 - Integrations with LangChain, LlamaIndex, and CI/CD workflows

14:30 - Production monitoring architecture and cost considerations

17:00 - Advanced metrics and custom domain-specific evaluations

19:00 - Common pitfalls and testing strategies

20:30 - Closing thoughts and next steps

Resources:

- "Unlocking Data with Generative AI and RAG" by Keith Bourne - Search for 'Keith Bourne' on Amazon and grab the 2nd edition

- Memriq AI: https://Memriq.ai

- ragas website: https://www.ragas.io/

- ragas GitHub repository: https://github.com/vibrantlabsai/ragas (for direct access to code and docs)

Tune in to build more reliable, scalable, and maintainable RAG systems with confidence using open-source evaluation best practices.
