Three eras of teaching a model to use a tool — prompt the loop, filter the data on its own loss, or reward the policy.
A clear, side-by-side comparison with examples — part of Rudrite Research.
Three eras of teaching a model to use a tool — prompt the loop, filter the data on its own loss, or reward the policy.
A clear, side-by-side comparison with examples — part of Rudrite Research.