00:00
2023-07-20
cursor.com
large-language-models
Inference Characteristics of Llama-2
The article analyzes the cost and latency trade-offs of serving Llama-2-70B compared to GPT-3.5-turbo, concluding that Llama-2 is over 3x cheaper for prompt tokens but more expensive for completion to…