What to Measure When Evaluating Model Refresh, Benchmarks, and Production Switching
Between Jan 10 and Feb 28, 2024, I ran a focused evaluation across 40 publicly available and vendor-hosted model endpoints.