On The Bike Shed, hosts Joël Quenneville and Stephanie Minn discuss development experiences and challenges at thoughtbot with Ruby, Rails, JavaScript, and whatever else is drawing their attention, admiration, or ire this week.
…
continue reading
Content provided by Adam Bien. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by Adam Bien or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://staging.podcastplayer.com/legal.
Player FM - Podcast App
Go offline with the Player FM app!
Go offline with the Player FM app!
Accelerating LLMs with TornadoVM: From GPU Kernels to Model Inference
MP3•Episode home
Manage episode 483481420 series 2469611
Content provided by Adam Bien. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by Adam Bien or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://staging.podcastplayer.com/legal.
An airhacks.fm conversation with Juan Fumero (@snatverk) about:
…
continue reading
tornadovm as a Java parallel framework for accelerating data parallelization on GPUs and other hardware, first GPU experiences with ELSA Winner and Voodoo cards, explanation of TornadoVM as a plugin to existing JDKs that uses Graal as a library, TornadoVM's programming model with @parallel and @reduce annotations for parallelizable code, introduction of kernel API for lower-level GPU programming, TornadoVM's ability to dynamically reconfigure and select the best hardware for workloads, implementation of LLM inference acceleration with TornadoVM, challenges in accelerating Llama models on GPUs, introduction of tensor types in TornadoVM to support FP8 and FP16 operations, shared buffer capabilities for GPU memory management, comparison of Java Vector API performance versus GPU acceleration, discussion of model quantization as a potential use case for TornadoVM, exploration of Deep Java Library (DJL) and its ND array implementation, potential standardization of tensor types in Java, integration possibilities with Project Babylon and its Code Reflection capabilities, TornadoVM's execution plans and task graphs for defining accelerated workloads, ability to run on multiple GPUs with different backends simultaneously, potential enterprise applications for LLMs in Java including model distillation for domain-specific models, discussion of Foreign Function & Memory API integration in TornadoVM, performance comparison between different GPU backends like OpenCL and CUDA, collaboration with Intel Level Zero oneAPI and integrated graphics support, future plans for RISC-V support in TornadoVM
Juan Fumero on twitter: @snatverk
347 episodes
MP3•Episode home
Manage episode 483481420 series 2469611
Content provided by Adam Bien. All podcast content including episodes, graphics, and podcast descriptions are uploaded and provided directly by Adam Bien or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here https://staging.podcastplayer.com/legal.
An airhacks.fm conversation with Juan Fumero (@snatverk) about:
…
continue reading
tornadovm as a Java parallel framework for accelerating data parallelization on GPUs and other hardware, first GPU experiences with ELSA Winner and Voodoo cards, explanation of TornadoVM as a plugin to existing JDKs that uses Graal as a library, TornadoVM's programming model with @parallel and @reduce annotations for parallelizable code, introduction of kernel API for lower-level GPU programming, TornadoVM's ability to dynamically reconfigure and select the best hardware for workloads, implementation of LLM inference acceleration with TornadoVM, challenges in accelerating Llama models on GPUs, introduction of tensor types in TornadoVM to support FP8 and FP16 operations, shared buffer capabilities for GPU memory management, comparison of Java Vector API performance versus GPU acceleration, discussion of model quantization as a potential use case for TornadoVM, exploration of Deep Java Library (DJL) and its ND array implementation, potential standardization of tensor types in Java, integration possibilities with Project Babylon and its Code Reflection capabilities, TornadoVM's execution plans and task graphs for defining accelerated workloads, ability to run on multiple GPUs with different backends simultaneously, potential enterprise applications for LLMs in Java including model distillation for domain-specific models, discussion of Foreign Function & Memory API integration in TornadoVM, performance comparison between different GPU backends like OpenCL and CUDA, collaboration with Intel Level Zero oneAPI and integrated graphics support, future plans for RISC-V support in TornadoVM
Juan Fumero on twitter: @snatverk
347 episodes
All episodes
×Welcome to Player FM!
Player FM is scanning the web for high-quality podcasts for you to enjoy right now. It's the best podcast app and works on Android, iPhone, and the web. Signup to sync subscriptions across devices.