Before / After Benchmark

Baseline: meta/llama-3.3-70b-instruct vs Llama-3.1-8B + GRPO (LoRA)
