My 2 cents:
Besides picking the "model of the moment", there is also the "GPU cloud provider" angle: shopping around for the cheapest vendor (hosting your LLM on a private GPU server to escape per-token pricing in favor of a more predictable per-server cost).
At the moment, what I have been using as a reference is:
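The per-token vs. per-server trade-off comes down to a simple break-even calculation. A minimal sketch, where all dollar figures are hypothetical placeholders rather than quotes from any provider below:

```python
# Break-even sketch: flat-rate GPU server vs. per-token API pricing.
# The prices used here are hypothetical placeholders -- plug in real quotes.

def break_even_tokens(server_monthly_usd: float, api_usd_per_1k_tokens: float) -> float:
    """Monthly token volume above which a dedicated server is cheaper than the API."""
    return server_monthly_usd / api_usd_per_1k_tokens * 1000

# Example: a $1,500/month GPU server vs. $0.002 per 1K tokens via an API.
tokens = break_even_tokens(1500.0, 0.002)
print(f"Break-even: {tokens:,.0f} tokens/month")  # Break-even: 750,000,000 tokens/month
```

Below that volume the per-token API is cheaper; above it, the fixed server wins (ignoring ops overhead, utilization, and model-quality differences).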
Provider lists:
https://devinschumacher.github.io/cloud-gpu-servers-services-providers/
https://gist.github.com/devinschumacher/87dd5b87234f2d0e5dba56503bfba533
https://research.aimultiple.com/cloud-gpu-providers/
https://research.aimultiple.com/cloud-gpu/
Some of them:
https://www.vultr.com/pricing/#cloud-gpu
https://www.hetzner.com/dedicated-rootserver/matrix-gpu/
https://lambda.ai/service/gpu-cloud#pricing
https://www.liquidweb.com/gpu-hosting/
https://www.interserver.net/dedicated/gpu.html
https://www.runpod.io/pricing
https://www.cherryservers.com/dedicated-gpu-servers
https://gthost.com/gpu-dedicated-servers