What if the biggest breakthrough in pathology AI isn’t a new algorithm—but finally sharing the data we already have?
In this episode, I’m joined by Jeroen van der Laak and Julie Boisclair from the IMI BigPicture consortium, a European public-private initiative building one of the world’s largest digital pathology image repositories. The goal isn’t to create a single AI model—but to enable thousands by making high-quality, legally compliant data accessible at scale.
We unpack what it really takes to build a 3-million-slide repository across 44 partners, why GDPR and data-sharing agreements delayed progress by 18 months, and how sustainability, trust, and collaboration are just as critical as technology. This conversation is about the unglamorous—but essential—work of building infrastructure that will shape pathology AI for decades.
⏱️ Highlights with Timestamps
- [00:00–01:40] Why BigPicture focuses on data—not algorithms
- [01:40–03:16] Scope of the project: 44 partners, 15–18 countries, 3M images
- [03:16–06:20] The 18-month delay caused by legal frameworks and GDPR
- [06:20–11:52] Extracting data from heterogeneous lab infrastructures
- [11:52–13:38] Current status: 115,000 slides uploaded and growing
- [13:38–18:39] Why LLMs and foundation models make curated data more valuable than ever
- [18:39–23:49] Industry collaboration and shared negotiating power
- [23:49–28:06] Data access models and governance after project independence
- [28:06–31:59] Sustainability plans and a nonprofit foundation structure
- [37:02–43:18] Tools developed: DICOMizer, artifact detection AI, image registration
📚 Resources from This Episode
- IMI BigPicture Consortium
- GDPR & Data Sharing Agreements (DSA)
- DICOMizer & SEND metadata tools
- Artifact detection AI for slide QC
- European AI Factories initiative