Comparing LLMs for Enterprise: Interpreting 0.7% vs 20.2% and What Really Matters
https://rowansbrilliantblog.theburnward.com/comparing-incompatible-test-methodologies-what-actually-matters-in-production
Vendor numbers are tempting: "0.7% hallucination on basic summarization" or "20.2% hallucination rate." Those figures matter, but not in the way many product decks imply.