cotalks.dev

Lightning Talk: Decoding and Taming the Costs of Serving Large Language Models - Yuan Chen, NVIDIA
