Content provided by Mike Breault. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by Mike Breault or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://staging.podcastplayer.com/legal.
We explore DeepSeek AI's groundbreaking idea of turning long documents into dense visual tokens to bypass transformer context limits. DeepSeek-OCR uses a two-path encoder (an 80M-parameter SAM-based local reader and a 300M-parameter CLIP-based global model) joined by a 16x convolutional compressor, which feeds a 570M-parameter MoE decoder. At 10x–20x compression it maintains high OCR accuracy on the Fox benchmark, outperforms rivals while using far fewer tokens, and scales to industrial volumes (200k pages per day on a single A100). We discuss the implications for model memory and potentially unlimited-context architectures, and note that the project is open-sourced for researchers and educators alike.
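To make the compression figures concrete, here is a small sketch of the token arithmetic behind optical compression. The specific image resolution, patch size, and per-page text-token count are illustrative assumptions for this example, not the model's exact configuration:

```python
# Illustrative arithmetic for optical compression of document pages.
# The concrete sizes (1024px image, 16px patches, 2560 text tokens per page)
# are assumptions for this sketch, not DeepSeek-OCR's exact settings.

def vision_tokens(image_px: int, patch_px: int, compress_factor: int) -> int:
    """Number of vision tokens after patching a square image and then
    shrinking the patch grid with a convolutional compressor."""
    patches = (image_px // patch_px) ** 2   # e.g. (1024 // 16)^2 = 4096 patches
    return patches // compress_factor       # 16x compressor -> 256 tokens

def compression_ratio(text_tokens: int, vis_tokens: int) -> float:
    """How many text tokens each vision token stands in for."""
    return text_tokens / vis_tokens

tokens = vision_tokens(1024, 16, 16)        # 256 vision tokens per page
ratio = compression_ratio(2560, tokens)     # a 2560-token page -> 10x compression
print(tokens, ratio)
```

Under these assumed numbers, a page that would cost 2,560 text tokens is represented by 256 vision tokens, i.e. the 10x end of the 10x–20x range mentioned above.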

Note: This podcast was AI-generated, and sometimes AI can make mistakes. Please double-check any critical information.

Sponsored by Embersilk LLC

