Most current LLM evaluation benchmarks focus on short, factual answers and neglect instructional responses, where the model provides step-by-step guidance. This is a crucial gap, especially for enterprise RAG systems, where the accuracy of instructions is paramount. Are we evaluating the right things?
Posted on 28 Sep 2024