ai-research

Here are 2 public repositories matching this topic...

Model evaluation harness for standardized benchmarking: comprehensive metrics (F1, BLEU, ROUGE, METEOR, BERTScore, pass@k), statistical analysis (confidence intervals, effect sizes, bootstrap CIs, ANOVA), multi-model comparison, and report generation. Research-grade evaluation for LLM and ML experiments.

  • Updated Dec 29, 2025
  • Elixir
