Before / After Benchmark
Baseline: meta/llama-3.3-70b-instruct vs Llama-3.1-8B + GRPO (LoRA)