TRL

API

  • Model Classes
  • Trainer Classes
  • Reward Model Training
  • Supervised Fine-Tuning
  • PPO Trainer
  • Best of N Sampling
  • DPO Trainer
  • Denoising Diffusion Policy Optimization
  • Text Environments