Discussion about this post

James

With regard to modern-day LLMs, this means that the pretraining data is your dataset. The post-training stack (response tuning and reinforcement learning) represents the organization of the dataset. Optimal search means exploiting how that post-training interacts with the dataset.
