ElevenLabs V3 and the Future of Text to Speech
In Episode 2 of Creative Flux, Pierson Marks (@piersonmarks) and Bilal Tahir (@deepwhitman) explore recent text-to-speech advancements from model providers like ElevenLabs, with its V3 model, and OpenAI. They discuss where generative voice stands today, where it is headed, and how the industry is adopting the technology.
Chapters
00:00 Introduction to Creative Flux
03:05 Exploring Text-to-Speech Technology
06:08 The Evolution of Speech Recognition
08:45 Advancements in Text-to-Speech Models
11:52 Comparing Text-to-Speech Providers
15:09 Voice Agents vs. Creative Applications
17:52 The Future of Conversational AI
23:33 Exploring API Access and New Features
24:38 Innovations in Audio Inpainting
28:06 Gemini TTS and AI Studio Overview
30:53 The Role of AI Studio for Developers
34:22 The Future of Local Models and Open Source
39:25 The Need for Abstraction Layers in TTS
44:02 The Rapid Evolution of Media Generation
Links:
- ElevenLabs: https://elevenlabs.io/app/speech-synthesis/text-to-speech
- ElevenLabs V3 intro: https://elevenlabs.io/docs/models#eleven-v3-alpha
- OpenAI.fm: https://www.openai.fm/
- PlayAI inpaint & dialog: https://fal.ai/models/fal-ai/playai/inpaint/diffusion
- Kokoro: https://fal.ai/models/fal-ai/kokoro/american-english
- Google AI Studio: https://aistudio.google.com/prompts/new_chat
- Google AI Studio for TTS: https://aistudio.google.com/generate-speech
- MiniMax Hailuo 02: https://fal.ai/models/fal-ai/minimax/hailuo-02/standard/image-to-video
- Seedance Lite: https://fal.ai/models/fal-ai/bytedance/seedance/v1/lite/text-to-video