cotalks.dev

Scale to 0 LLM inference: Cost efficient open model deployment on serverless GPUs by Wietse Venema

Channel: Devoxx