TornadoVM: Supercharge your Java with GPU acceleration - Christos Kotselidis, Mary Xekalaki
Summary
TornadoVM is a JDK plugin that lets Java applications target GPUs with minimal code changes. The talk explains why GPU programming is different from CPU-centric Java, then introduces TornadoVM’s execution model: loop-level parallelism, explicit kernel programming, task graphs for dispatch, and execution plans for launching work on the device. The presenters show how TornadoVM integrates with the Java ecosystem through SDKMAN, Maven Central, IntelliJ, GraalVM, Quarkus, and LangChain4j. Demos include matrix multiplication using both the loop-parallel API and the lower-level kernel API, plus GPU Llama 3 inference and a Quarkus-based summarization example. The talk also covers Tornado Inside, an IntelliJ plugin for static checks and debugging support, and TornadoVM’s production deployment in the Gaia mission at the European Space Agency.
Key Takeaways
- TornadoVM is a plugin for JDK distributions that brings GPU acceleration to Java applications.
- It supports OpenCL, NVIDIA CUDA, and SPIR-V, with Metal support in progress for Apple devices.
- The programming model separates host code, kernels, task graphs, and execution plans.
- Loop parallel annotations offer a simple path to GPU offload, while the kernel API exposes lower-level control and optimization.
- TornadoVM uses off-heap data types and task graphs to manage GPU memory transfers more explicitly and efficiently.
- Tornado Inside provides IntelliJ integration for code analysis, debugging, and compile-time feedback on TornadoVM code.
- GPU Llama 3 demonstrates full Java GPU inference and integration with LangChain4j and Quarkus.
- TornadoVM is already used in production for Gaia data processing, where it helped accelerate some workloads by over 500x.
Sections
What TornadoVM Is
TornadoVM is described as a plugin for JDK distributions that adds APIs and a runtime for GPU execution. It is open source, has multiple releases, and is positioned as part of the Java ecosystem rather than a separate programming stack. The talk emphasizes that the same Java application can be run across different environments while TornadoVM handles GPU compilation, offload, and data movement.
Why Java Developers Need a Different GPU Model
The presenters explain that CPU-oriented Java features such as streams, the Vector API, and virtual threads do not map cleanly to GPU execution. GPU programming requires a different mental model: work is distributed across a mesh of threads, memory copies must often be managed explicitly, and developers must decide what to offload. TornadoVM exists to hide that complexity behind a higher-level Java API.
Programming Model: Loop API, Kernel API, Task Graphs, and Execution Plans
TornadoVM exposes two main ways to express GPU work. The loop-parallel API uses annotations such as `@Parallel` and `@Reduce` to offload loops and reductions. The kernel API is lower level and gives more direct control over thread-level work, synchronization, and GPU-specific optimization. To launch workloads, developers build task graphs that describe a pipeline of tasks and data transfers, then snapshot the graph into an immutable execution plan and execute it on a target device.
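The "build a graph, snapshot it, execute the plan" flow can be illustrated with a small CPU-only toy. This is a sketch of the design idea only, not TornadoVM's API: the class names `ToyTaskGraph` and `ToyPlan` are hypothetical, tasks are plain `Runnable`s, and nothing runs on a GPU. It shows why the snapshot step matters: the plan captures the task list at one point in time, so later changes to the builder do not affect what is executed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: these classes are hypothetical and are NOT TornadoVM's API.
public class SnapshotSketch {

    /** Mutable builder: tasks can be appended until a snapshot is taken. */
    static final class ToyTaskGraph {
        private final List<Runnable> tasks = new ArrayList<>();

        ToyTaskGraph task(Runnable work) {
            tasks.add(work);
            return this; // fluent style, so tasks chain like a pipeline
        }

        /** Copies the current task list into an immutable plan. */
        ToyPlan snapshot() {
            return new ToyPlan(List.copyOf(tasks));
        }
    }

    /** Immutable plan: runs exactly the tasks captured at snapshot time. */
    static final class ToyPlan {
        private final List<Runnable> tasks;

        ToyPlan(List<Runnable> tasks) {
            this.tasks = tasks;
        }

        void execute() {
            tasks.forEach(Runnable::run);
        }
    }

    /** Builds a two-step pipeline, snapshots it, then mutates the builder. */
    static int[] demo() {
        int[] data = {1, 2, 3};
        ToyTaskGraph graph = new ToyTaskGraph()
                .task(() -> { for (int i = 0; i < data.length; i++) data[i] *= 2; })
                .task(() -> { for (int i = 0; i < data.length; i++) data[i] += 1; });
        ToyPlan plan = graph.snapshot();
        graph.task(() -> data[0] = -1); // added after the snapshot, so not in the plan
        plan.execute();
        return data;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo()));
    }
}
```

In this toy, the third task never runs because it was added after `snapshot()`. A real runtime can exploit the same immutability to compile and optimize a plan once and reuse it across executions.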
Installation and Runtime Integration
The talk highlights installation as a major barrier for GPU tooling and shows several options: installing via SDKMAN, using a repository clone and build script, or consuming artifacts from Maven Central. TornadoVM can be invoked through a `tornado` wrapper that hides JVM flags and runtime setup. This makes it easier to integrate into existing Java builds and deployment workflows.
Memory Management and Off-Heap Data
A major challenge in GPU execution is data movement, especially when data lives on the JVM heap under the control of the garbage collector. TornadoVM addresses this by supporting off-heap data structures, including arrays backed by the Foreign Memory API from Project Panama. Because off-heap memory is not relocated by the garbage collector, it allows GPU-friendly data layouts and controlled transfers between host and device.
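To make the off-heap idea concrete, here is a minimal sketch that uses only the JDK's own Foreign Function and Memory API (`java.lang.foreign`), assuming JDK 22 or later where that API is final. It does not use TornadoVM's array types; the class and method names are illustrative. It allocates a float buffer outside the Java heap, writes to it, reads it back, and releases it when the arena closes.

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

// Plain JDK sketch (assumes JDK 22+); not TornadoVM's own off-heap types.
public class OffHeapSketch {

    /** Fills an off-heap float buffer with 0..n-1 and returns the sum. */
    static float fillAndSum(int n) {
        // A confined arena frees its native memory when the try block exits,
        // independent of garbage-collection cycles.
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment buffer = arena.allocate(ValueLayout.JAVA_FLOAT, n);

            for (int i = 0; i < n; i++) {
                buffer.setAtIndex(ValueLayout.JAVA_FLOAT, i, (float) i);
            }

            float sum = 0f;
            for (int i = 0; i < n; i++) {
                sum += buffer.getAtIndex(ValueLayout.JAVA_FLOAT, i);
            }
            return sum;
        }
    }

    public static void main(String[] args) {
        System.out.println(fillAndSum(4)); // 0 + 1 + 2 + 3
    }
}
```

A contiguous native buffer like this has a fixed address and a flat layout, which is the property that makes it a convenient source or destination for host-to-device copies.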
Tooling: Tornado Inside for IntelliJ
Tornado Inside is an IntelliJ plugin that understands TornadoVM APIs. It can detect TornadoVM methods, perform static analysis to flag unsupported constructs, and help developers debug whether a method can be compiled and executed on the GPU. The demo shows how it removes unsupported tasks and provides feedback directly in the editor.
Demos: Matrix Multiplication and GPU Llama 3
The presenters demonstrate matrix multiplication using both the simple loop-parallel API and a more optimized kernel API implementation that mirrors CUDA/OpenCL style code. They also show GPU Llama 3, a Java-based LLM inference engine that runs on the GPU through TornadoVM and supports models such as Llama 3, Mistral, Phi-3, Qwen, IBM Granite, and Gemma. The demo includes Quarkus and LangChain4j integration for a text summarization workflow.
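The two styles of matrix multiplication can be contrasted in plain Java. The sketch below is not the code from the demo and runs entirely on the CPU, with no TornadoVM dependency; the class and method names are illustrative. The loop form has the shape that a loop-parallel API targets, where the two outer loops are independent and could be marked for parallel execution (with TornadoVM, the assumption here is that `@Parallel` would go on their loop variables). The kernel form expresses the work of one logical thread, as in CUDA/OpenCL-style code, and a small host loop stands in for the GPU launch.

```java
public class MatMulSketch {

    /** Loop form: C = A * B for n x n row-major matrices. */
    static void matMulLoops(float[] a, float[] b, float[] c, int n) {
        // The i and j iterations are independent of each other, so these are
        // the loops a loop-parallel API would distribute across GPU threads.
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                float sum = 0f;
                for (int k = 0; k < n; k++) { // stays sequential per output cell
                    sum += a[i * n + k] * b[k * n + j];
                }
                c[i * n + j] = sum;
            }
        }
    }

    /** Kernel form: the work done by one logical thread at (row, col). */
    static void matMulThread(int row, int col, float[] a, float[] b, float[] c, int n) {
        float sum = 0f;
        for (int k = 0; k < n; k++) {
            sum += a[row * n + k] * b[k * n + col];
        }
        c[row * n + col] = sum;
    }

    /** CPU stand-in for a 2D launch: one call per (row, col) thread index. */
    static void launchGrid(float[] a, float[] b, float[] c, int n) {
        for (int row = 0; row < n; row++) {
            for (int col = 0; col < n; col++) {
                matMulThread(row, col, a, b, c, n);
            }
        }
    }

    public static void main(String[] args) {
        float[] a = {1f, 2f, 3f, 4f};
        float[] b = {5f, 6f, 7f, 8f};
        float[] c = new float[4];
        matMulLoops(a, b, c, 2);
        System.out.println(java.util.Arrays.toString(c));
    }
}
```

Both forms compute the same result. The kernel form makes the thread index explicit, which is what opens the door to lower-level optimizations such as tiling or explicit synchronization on a real device.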
Production Use and Roadmap
TornadoVM is already in production in the Gaia mission at the European Space Agency, where it provides GPU acceleration for scientific data-processing pipelines. The speakers report large speedups and reduced end-to-end processing time for certain workloads. Future work includes broader platform support, including AMD and Apple silicon, deeper compiler and runtime optimizations for AI workloads, and alignment with OpenJDK projects such as Panama, Babylon, Valhalla, and Leyden.
Keywords: tornadovm, java gpu acceleration, gpu programming in java, jdk plugin, opencl backend, nvidia cuda backend, spir-v backend, metal backend, loop parallel api, kernel api, task graph, execution plan, off-heap memory, project panama foreign memory api, intellij plugin, tornado inside, quarkus integration, langchain4j integration, gpu llama 3, llm inference on gpu, matrix multiplication gpu, gpu data movement, java gpu runtime, european space agency gaia mission, scientific computing gpu