How to Evaluate an LLM: Benchmarks, Metrics, and Practical Workflows
machinelearningplus.com
25 min
NLP
Evaluate your LLM using MMLU, MT-Bench, LLM-as-judge, and ROUGE. Covers lm-evaluation-harness, fine-tuned model comparison, and common evaluation pitfalls, with code.
