API: Trainer Configs

RapidFire AI’s core APIs for trainer specifications are all thin wrappers around the corresponding APIs of Hugging Face’s TRL library.

RFSFTConfig

This is a wrapper around SFTConfig in HF TRL; see the TRL documentation for the full signature and list of arguments.

The only difference is that the individual arguments (knobs) can be List-valued or Range-valued in RFSFTConfig. That is how you specify a base set of knob combinations from which a config group can be produced; see also the Multi-Config Specification page. Apart from the multi-config specification, this class preserves all semantics of Hugging Face’s SFT trainer under the hood.

Examples:

# From the SFT tutorial notebook
RFSFTConfig(
        learning_rate=2e-4,
        lr_scheduler_type="linear",
        per_device_train_batch_size=4,
        per_device_eval_batch_size=8,
        gradient_accumulation_steps=4,
        num_train_epochs=2,
        logging_steps=5,
        eval_strategy="steps",
        eval_steps=25,
        fp16=True,
        save_strategy="epoch",
)

# Two knobs have list values here
RFSFTConfig(
        learning_rate=List([2e-4, 1e-5]),
        lr_scheduler_type=List(["linear", "cosine"]),
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,
        per_device_eval_batch_size=8,
        num_train_epochs=2,
        logging_steps=5,
        eval_strategy="steps",
        eval_steps=25,
        fp16=True,
        save_strategy="epoch",
)
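
A Range-valued knob can be used in the same way to sweep a knob over a numeric range. The sketch below is only illustrative: the exact arguments accepted by Range are an assumption here, so consult the Multi-Config Specification page for its actual constructor and sampling behavior.

# Illustrative sketch only: the Range(...) arguments shown are assumed,
# not the confirmed signature; see the Multi-Config Specification page
RFSFTConfig(
        learning_rate=Range(1e-5, 2e-4),  # assumed (min, max) form
        lr_scheduler_type="linear",
        per_device_train_batch_size=4,
        per_device_eval_batch_size=8,
        gradient_accumulation_steps=4,
        num_train_epochs=2,
        eval_strategy="steps",
        eval_steps=25,
        fp16=True,
        save_strategy="epoch",
)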

RFDPOConfig

This is a wrapper around DPOConfig in HF TRL; see the TRL documentation for the full signature and list of arguments.

Again, the only difference is that the individual arguments (knobs) can be List-valued or Range-valued in RFDPOConfig. That is how you specify a base set of knob combinations from which a config group can be produced; see also the Multi-Config Specification page. Apart from the multi-config specification, this class preserves all semantics of Hugging Face’s DPO trainer under the hood.

Example:

# Based on the DPO tutorial notebook; one knob has list of values
base_dpo_config = RFDPOConfig(
        model_adapter_name="default",
        ref_adapter_name="reference",
        force_use_ref_model=False,
        loss_type="sigmoid",
        beta=List([0.1, 0.001]),
        max_prompt_length=1024,
        max_completion_length=1024,
        max_length=2048,
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        learning_rate=5e-6,
        warmup_ratio=0.1,
        weight_decay=0,
        lr_scheduler_type="linear",
        optim="adamw_8bit",
        num_train_epochs=1,
        logging_strategy="steps",
        logging_steps=1,
        bf16=True,
        save_strategy="epoch",
)
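
Since beta is List-valued with two values above, this base config specifies two knob combinations from which a config group can be produced; how the group is generated from the base set is covered on the Multi-Config Specification page.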

RFGRPOConfig

This is a wrapper around GRPOConfig in HF TRL; see the TRL documentation for the full signature and list of arguments.

Again, the only difference is that the individual arguments (knobs) can be List-valued or Range-valued in RFGRPOConfig. That is how you specify a base set of knob combinations from which a config group can be produced; see also the Multi-Config Specification page. Apart from the multi-config specification, this class preserves all semantics of Hugging Face’s GRPO trainer under the hood.

Example:

# Based on the GRPO tutorial notebook
RFGRPOConfig(
        learning_rate=5e-6,
        warmup_ratio=0.1,
        weight_decay=0.1,
        max_grad_norm=0.1,
        adam_beta1=0.9,
        adam_beta2=0.99,
        lr_scheduler_type="linear",
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,
        num_generations=8,
        optim="adamw_8bit",
        num_train_epochs=2,
        max_prompt_length=1024,
        max_completion_length=1024,
        logging_steps=2,
        eval_steps=5,
)
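
As with the other config classes, any of these knobs can also be List-valued to define a base set of knob combinations. The variant below is illustrative only; the knob values are not from the tutorial notebook.

# Illustrative variant with two List-valued knobs
RFGRPOConfig(
        learning_rate=List([5e-6, 1e-5]),
        lr_scheduler_type=List(["linear", "cosine"]),
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,
        num_generations=8,
        max_prompt_length=1024,
        max_completion_length=1024,
        num_train_epochs=2,
        logging_steps=2,
)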