API: Trainer Configs
RapidFire AI’s core APIs for trainer specifications are all thin wrappers around the corresponding APIs of Hugging Face’s TRL library.
RFSFTConfig
This is a wrapper around SFTConfig in HF TRL. The full signature and list of arguments are available on this page. The only difference here is that the individual arguments (knobs) can be List-valued or Range-valued in RFSFTConfig. That is how you can specify a base set of knob combinations from which a config group can be produced. Also read the Multi-Config Specification page.
Other than the multi-config specification, this class preserves all semantics of Hugging Face’s SFT trainer under the hood.
Examples:
# From the SFT tutorial notebook
RFSFTConfig(
    learning_rate=2e-4,
    lr_scheduler_type="linear",
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,
    num_train_epochs=2,
    logging_steps=5,
    eval_strategy="steps",
    eval_steps=25,
    fp16=True,
    save_strategy="epoch",
)
# Two knobs have list values here
RFSFTConfig(
    learning_rate=List([2e-4, 1e-5]),
    lr_scheduler_type=List(["linear", "cosine"]),
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    per_device_eval_batch_size=8,
    num_train_epochs=2,
    logging_steps=5,
    eval_strategy="steps",
    eval_steps=25,
    fp16=True,
    save_strategy="epoch",
)
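In the second example above, the two list-valued knobs together define a base set of 2 × 2 = 4 knob combinations if the lists are crossed (a grid over the listed values); the Multi-Config Specification page describes exactly how the config group is produced from them. The sketch below swaps one List for a Range to show where a range-valued knob would go; the lower/upper-bound form of Range used here is only an assumption for illustration, so check that page for the exact signature.
# Hypothetical sketch only: assumes Range takes a lower and an upper bound
RFSFTConfig(
    learning_rate=Range(1e-5, 2e-4),  # assumed signature; see the Multi-Config Specification page
    lr_scheduler_type=List(["linear", "cosine"]),
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    num_train_epochs=2,
    eval_strategy="steps",
    eval_steps=25,
    fp16=True,
    save_strategy="epoch",
)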
RFDPOConfig
This is a wrapper around DPOConfig in HF TRL. The full signature and list of arguments are available on this page. Again, the only difference here is that the individual arguments (knobs) can be List-valued or Range-valued in RFDPOConfig. That is how you can specify a base set of knob combinations from which a config group can be produced. Also read the Multi-Config Specification page.
Other than the multi-config specification, this class preserves all semantics of Hugging Face’s DPO trainer under the hood.
Example:
# Based on the DPO tutorial notebook; one knob has a list of values
base_dpo_config = RFDPOConfig(
    model_adapter_name="default",
    ref_adapter_name="reference",
    force_use_ref_model=False,
    loss_type="sigmoid",
    beta=List([0.1, 0.001]),
    max_prompt_length=1024,
    max_completion_length=1024,
    max_length=2048,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    learning_rate=5e-6,
    warmup_ratio=0.1,
    weight_decay=0,
    lr_scheduler_type="linear",
    optim="adamw_8bit",
    num_train_epochs=1,
    logging_strategy="steps",
    logging_steps=1,
    bf16=True,
    save_strategy="epoch",
)
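Because beta is the only list-valued knob here, this base specification corresponds to two knob combinations (beta=0.1 and beta=0.001), with every other knob held fixed across them.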
RFGRPOConfig
This is a wrapper around GRPOConfig in HF TRL. The full signature and list of arguments are available on this page. Again, the only difference here is that the individual arguments (knobs) can be List-valued or Range-valued in RFGRPOConfig. That is how you can specify a base set of knob combinations from which a config group can be produced. Also read the Multi-Config Specification page.
Other than the multi-config specification, this class preserves all semantics of Hugging Face’s GRPO trainer under the hood.
Example:
# Based on the GRPO tutorial notebook
RFGRPOConfig(
    learning_rate=5e-6,
    warmup_ratio=0.1,
    weight_decay=0.1,
    max_grad_norm=0.1,
    adam_beta1=0.9,
    adam_beta2=0.99,
    lr_scheduler_type="linear",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    num_generations=8,
    optim="adamw_8bit",
    num_train_epochs=2,
    max_prompt_length=1024,
    max_completion_length=1024,
    logging_steps=2,
    eval_steps=5,
)
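Unlike the earlier examples, every knob in this config is single-valued, so it specifies just one knob combination; wrapping any argument in List or Range would expand it into a config group as described above.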