Install and Get Started
Step 1: Install dependencies and package
Obtain the RapidFire AI OSS package from PyPI (it includes all dependencies) and verify that it installed correctly.
Important
Requires Python 3.12+. Ensure that python3 resolves to Python 3.12 before creating the venv.
python3 --version # must be 3.12.x
python3 -m venv .venv
source .venv/bin/activate
pip install rapidfireai
pip show rapidfireai
# Verify it prints the following:
# Name: rapidfireai
# Version: 0.10.1
# ...
export PATH="$HOME/.local/bin:$PATH"
rapidfireai --version
# Verify it prints the following:
# RapidFire AI 0.10.1
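If you prefer to check the interpreter version from Python itself rather than the shell, a minimal sketch (the helper name is illustrative, not part of RapidFire AI):

```python
import sys

def meets_requirement(version_info, minimum=(3, 12)):
    """Return True if the (major, minor) of version_info satisfies the minimum."""
    return tuple(version_info[:2]) >= minimum

print(meets_requirement(sys.version_info))  # should print True inside the 3.12+ venv
```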
Provide your Hugging Face account token to access the gated Llama and Mistral models showcased in the tutorial notebooks. If you do not have such a token, you have two options:
- Switch the model_name in the tutorial notebook to a non-gated model from Hugging Face. Then proceed to Step 2.
- Create a Hugging Face token as explained here. Then request access on the following gated models’ Hugging Face pages:
Heads up: the approval for the Llama models may take a few hours. Then provide your HF token in the same venv.
source .venv/bin/activate
pip install huggingface-hub
huggingface-cli login # Provide your HF token
Feel free to ask us on Discord if you need any help with accessing gated Hugging Face models. Unfortunately, we are not allowed to provide a publicly visible token here for your use due to Hugging Face’s policies.
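If you go the non-gated route, the swap is a one-line change in the notebook; the model id below is just an example of a non-gated model, not a recommendation:

```python
# Replace the gated model id in the notebook with a non-gated one, e.g.:
model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # illustrative non-gated model id
```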
Step 2: Start RapidFire AI server
Run the following command to initialize rapidfireai to use the correct dependencies:
rapidfireai init
# It will install specific dependencies and initialize rapidfireai
Note
You need to run init only once for a new venv or when switching GPU(s) on your machine. You do NOT need to run it after a reboot, start/stop of rapidfireai, or for a new terminal.
Next, start the RapidFire AI services: the frontend with the ML metrics dashboard and the API server. The frontend URL shown below can be opened in your local browser.
rapidfireai start
# It should print about 50 lines, including the following:
# ...
# RapidFire Frontend is ready
# Open your browser and navigate to: http://0.0.0.0:3000
# ...
# Press Ctrl+C to stop all services
Important
Do NOT proceed until the start succeeds and “Available endpoints” is printed. Leave this terminal running while you work through the tutorial notebooks.
If you close the terminal in which you started rapidfireai, or if you reboot your machine, just start rapidfireai again with the above command.
If the start command fails for any reason, wait about half a minute and rerun it. For diagnostics and common fixes (including Linux/macOS and Windows steps), see Troubleshooting.
Step 3: Download the tutorial notebooks
Only after completing Step 2, download the example tutorial notebooks (explained further here: Example Use Case Tutorials). If your GPU has < 80 GB HBM, use the “lite” versions of these notebooks; the only difference is that they showcase smaller LLMs and finish faster. Right-click on a notebook’s GitHub link to save the file locally.
SFT for customer support Q&A chatbot: View on GitHub
Lite version: View on GitHub
DPO for alignment: View on GitHub
Lite version: View on GitHub
GRPO for math reasoning: View on GitHub
Lite version: View on GitHub
Quickstart Video (2.5min)
Full Usage Walkthrough Video (12min)
Step 4: Run the notebook cells
Run the cells one by one as shown in the above videos. Wait for a cell to finish before running the next.
Imports
Load dataset and specify train and eval partitions
If you want to run the notebook faster for demo purposes, downsample the data further as you wish. Here are some suggested reductions. You can also reduce the effective batch size by reducing either or both of per_device_train_batch_size and gradient_accumulation_steps in the trainer configs.
- SFT notebook:
train_dataset = dataset["train"].select(range(128))        # 128 instead of 5000
eval_dataset = dataset["train"].select(range(5000, 5032))  # 5032 instead of 5200
- DPO notebook:
select(range(128)) # 128 instead of 500
- GRPO notebook:
train_dataset = get_gsm8k_questions(split="train").select(range(128))  # 128 instead of 5000
eval_dataset = get_gsm8k_questions(split="test").select(range(32))     # 32 instead of 100
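To reason about the two batch-size knobs mentioned above: the effective (global) batch size per optimizer step is their product, times the number of GPUs. A quick arithmetic sketch (the helper is illustrative, not a RapidFire AI API):

```python
def effective_batch_size(per_device_train_batch_size,
                         gradient_accumulation_steps,
                         num_gpus=1):
    """Effective (global) batch size the optimizer sees per update step."""
    return per_device_train_batch_size * gradient_accumulation_steps * num_gpus

# Halving either knob halves the effective batch size:
print(effective_batch_size(4, 8))  # 32
print(effective_batch_size(4, 4))  # 16
print(effective_batch_size(2, 8))  # 16
```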
Define example processing function
Create named RF experiment
Define custom eval metrics function
Define multi-config knobs for model, LoRA, and SFT Trainer using RapidFire AI wrapper APIs
Define model creation function for all model types across configs
Generate config group you want to compare in one go
Launch multi-config training; adjust num_chunks as per desired concurrency (see Run Fit for details).
# Launch training of all configs in the config_group with swap granularity of 4 chunks
experiment.run_fit(config_group, sample_create_model, train_dataset, eval_dataset, num_chunks=4, seed=42)
Note that in the same experiment, you can run as many run_fit() calls as you want. All their runs will be superimposed on the same plots on the dashboard.
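Conceptually, num_chunks sets the swap granularity: the training data is divided into that many chunks, and configs are swapped at chunk boundaries so all of them make progress concurrently. A rough sketch of how such a chunk split could look (an illustration under that assumption, not the library’s internal scheduler):

```python
def chunk_bounds(n_examples, num_chunks):
    """Split n_examples into num_chunks near-equal contiguous (start, end) ranges."""
    base, rem = divmod(n_examples, num_chunks)
    bounds, start = [], 0
    for i in range(num_chunks):
        size = base + (1 if i < rem else 0)
        bounds.append((start, start + size))
        start += size
    return bounds

# e.g., 5000 training examples with num_chunks=4:
print(chunk_bounds(5000, 4))  # [(0, 1250), (1250, 2500), (2500, 3750), (3750, 5000)]
```

A larger num_chunks means more frequent swap points between configs (finer-grained concurrency) at the cost of more swaps.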
Step 5: Monitor training behaviors on ML metrics dashboard

Step 6: Interactive Control (IC) Ops: Stop, Clone-Modify; check their results



Step 7: End experiment; stop server when done
Run the cell to end the experiment when you are done with it.
experiment.end()
You can then move on to another (named) experiment in the same session. Run as many experiments as you like; each will have its plots appear on the dashboard under its name. All experiment artifacts (metrics files, logs, checkpoints, etc.) are persisted on your machine in the same location as your notebook.
When you are done overall, gracefully stop the RapidFire AI session and free the ports it used in one of two ways:
- Press Ctrl+C in the terminal where rapidfireai start was run. Wait for all services to finish cleanly.
- In a separate terminal tab, run the stop command as follows and wait for it to finish fully. If you ran the start command as a background process, feel free to run the stop command in the same terminal tab.
rapidfireai stop
Important
If you kill the rapidfireai server forcibly instead of stopping it gracefully as above, you might lose some experiment artifacts and/or metadata.
Step 8: Venture Beyond!
After trying out the tutorial notebooks, explore the rest of this docs website, especially the API and dashboard pages. Play around more with IC Ops and/or run more experiments as you wish, including changing the datasets, models, config knobs, and code for the functions and rewards.
You are now up to speed! Enjoy the power of rapid AI customization with RapidFire AI!