API: Trainer Configs
====================

RapidFire AI's core APIs for trainer specifications are all thin wrappers around the corresponding APIs of Hugging Face's `TRL `__ library.

RFSFTConfig
-----------

This is a wrapper around :class:`SFTConfig` in HF TRL. The full signature and list of arguments are available on `this page `__.

The only difference here is that the individual arguments (knobs) can be :class:`List` valued or :class:`Range` valued in :class:`RFSFTConfig`. That is how you can specify a base set of knob combinations from which a config group can be produced. Also read :doc:`the Multi-Config Specification page`.

Other than the multi-config specification, this class preserves all semantics of Hugging Face's SFT trainer under the hood.

**Examples:**

.. code-block:: python

    # From the SFT tutorial notebook
    RFSFTConfig(
        learning_rate=2e-4,
        lr_scheduler_type="linear",
        per_device_train_batch_size=4,
        per_device_eval_batch_size=8,
        gradient_accumulation_steps=4,
        num_train_epochs=2,
        logging_steps=5,
        eval_strategy="steps",
        eval_steps=25,
        fp16=True,
        save_strategy="epoch",
    )

    # Two knobs have list values here
    RFSFTConfig(
        learning_rate=List([2e-4, 1e-5]),
        lr_scheduler_type=List(["linear", "cosine"]),
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,
        per_device_eval_batch_size=8,
        num_train_epochs=2,
        logging_steps=5,
        eval_strategy="steps",
        eval_steps=25,
        fp16=True,
        save_strategy="epoch",
    )

RFDPOConfig
-----------

This is a wrapper around :class:`DPOConfig` in HF TRL. The full signature and list of arguments are available on `this page `__.

Again, the only difference here is that the individual arguments (knobs) can be :class:`List` valued or :class:`Range` valued in :class:`RFDPOConfig`. That is how you can specify a base set of knob combinations from which a config group can be produced. Also read :doc:`the Multi-Config Specification page`.

Other than the multi-config specification, this class preserves all semantics of Hugging Face's DPO trainer under the hood.

**Example:**

.. code-block:: python

    # Based on the DPO tutorial notebook; one knob has a list of values
    base_dpo_config = RFDPOConfig(
        model_adapter_name="default",
        ref_adapter_name="reference",
        force_use_ref_model=False,
        loss_type="sigmoid",
        beta=List([0.1, 0.001]),
        max_prompt_length=1024,
        max_completion_length=1024,
        max_length=2048,
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        learning_rate=5e-6,
        warmup_ratio=0.1,
        weight_decay=0,
        lr_scheduler_type="linear",
        optim="adamw_8bit",
        num_train_epochs=1,
        logging_strategy="steps",
        logging_steps=1,
        bf16=True,
        save_strategy="epoch",
    )
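The examples above all use explicit :class:`List` values. :class:`Range` values can be supplied the same way for continuous knobs. The sketch below is illustrative only: it assumes :class:`Range` is imported alongside :class:`List` and that its constructor takes lower and upper bounds; consult :doc:`the Multi-Config Specification page` for the exact signature.

.. code-block:: python

    # Illustrative sketch only: the Range arguments shown here (lower and
    # upper bounds) are assumptions; see the Multi-Config Specification
    # page for the exact constructor signature.
    RFDPOConfig(
        beta=Range(0.01, 0.5),            # assumed: (lower, upper) bounds
        learning_rate=Range(1e-6, 1e-5),  # assumed: (lower, upper) bounds
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        num_train_epochs=1,
        bf16=True,
        save_strategy="epoch",
    )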
RFGRPOConfig
------------

This is a wrapper around :class:`GRPOConfig` in HF TRL. The full signature and list of arguments are available on `this page `__.

Again, the only difference here is that the individual arguments (knobs) can be :class:`List` valued or :class:`Range` valued in :class:`RFGRPOConfig`. That is how you can specify a base set of knob combinations from which a config group can be produced. Also read :doc:`the Multi-Config Specification page`.

Other than the multi-config specification, this class preserves all semantics of Hugging Face's GRPO trainer under the hood.

**Example:**

.. code-block:: python

    # Based on the GRPO tutorial notebook
    RFGRPOConfig(
        learning_rate=5e-6,
        warmup_ratio=0.1,
        weight_decay=0.1,
        max_grad_norm=0.1,
        adam_beta1=0.9,
        adam_beta2=0.99,
        lr_scheduler_type="linear",
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,
        num_generations=8,
        optim="adamw_8bit",
        num_train_epochs=2,
        max_prompt_length=1024,
        max_completion_length=1024,
        logging_steps=2,
        eval_steps=5,
    )
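As with the other config classes, any of these knobs can take :class:`List` values to define a base set of knob combinations. The following is an illustrative variation of the config above, not taken from the tutorial notebook:

.. code-block:: python

    # Illustrative variation (not from the tutorial notebook): two knobs
    # are List-valued, yielding multiple knob combinations for a config group.
    RFGRPOConfig(
        learning_rate=List([5e-6, 1e-5]),
        num_generations=List([4, 8]),
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,
        num_train_epochs=2,
        max_prompt_length=1024,
        max_completion_length=1024,
        logging_steps=2,
        eval_steps=5,
    )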