1/ Can codebase-specific RL push the frontier for code LLMs? At @cgftlabs, we helped a client RL-tune Qwen-2.5-7B on their internal codebase for unit test creation, with coverage-guided GRPO. The result? It beats o4-mini & o3. Here’s how it works (link to full blog in bio) 🧵