On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification