Use this to size inference and training workloads with more realistic assumptions around precision, concurrency, KV cache, throughput, and deployment model.
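As a rough illustration of how precision and concurrency drive inference memory, here is a minimal back-of-the-envelope sketch. It assumes a standard decoder-only transformer whose KV cache stores one key and one value vector per layer, per KV head, per token. The model shape, the `ModelShape` type, and the helper functions are hypothetical and for illustration only. This is not the method this tool uses internally.

```python
from dataclasses import dataclass

# Bytes per stored element for common precisions (int4 packs two values per byte).
BYTES_PER_ELEMENT = {"fp32": 4.0, "fp16": 2.0, "bf16": 2.0, "fp8": 1.0, "int8": 1.0, "int4": 0.5}
GIB = 2**30


@dataclass
class ModelShape:
    """Hypothetical model dimensions; substitute the values for your model."""
    num_params: int
    num_layers: int
    num_kv_heads: int
    head_dim: int


def weight_bytes(model: ModelShape, precision: str) -> float:
    # Weights: one stored element per parameter.
    return model.num_params * BYTES_PER_ELEMENT[precision]


def kv_cache_bytes(model: ModelShape, context_len: int, concurrency: int, precision: str) -> float:
    # Factor of 2: each layer caches a key and a value vector per KV head per token.
    per_token = 2 * model.num_layers * model.num_kv_heads * model.head_dim * BYTES_PER_ELEMENT[precision]
    # Every concurrent sequence holds its own cache for its full context.
    return per_token * context_len * concurrency


def estimate_memory_gib(model: ModelShape, weight_precision: str, kv_precision: str,
                        context_len: int, concurrency: int) -> dict:
    weights = weight_bytes(model, weight_precision) / GIB
    kv = kv_cache_bytes(model, context_len, concurrency, kv_precision) / GIB
    return {"weights_gib": weights, "kv_cache_gib": kv, "total_gib": weights + kv}


if __name__ == "__main__":
    # Hypothetical 8B-parameter model with grouped-query attention (8 KV heads).
    model = ModelShape(num_params=8_000_000_000, num_layers=32, num_kv_heads=8, head_dim=128)
    estimate = estimate_memory_gib(model, "bf16", "fp16", context_len=8192, concurrency=16)
    for name, value in estimate.items():
        print(f"{name}: {value:.2f}")
```

With these assumed inputs, the script prints `weights_gib: 14.90`, `kv_cache_gib: 16.00`, and `total_gib: 30.90`. At 16 concurrent 8K-token sequences, the KV cache already outweighs the bf16 weights. The sketch ignores activations, runtime and framework overhead, allocator fragmentation, and throughput, so treat its output as a lower bound rather than a sizing answer.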