Overview of RapidFire AI Package
RapidFire AI is a new experiment execution framework that transforms your LLM customization from slow, sequential processes into rapid, intelligent workflows with hyperparallelized training, dynamic real-time experiment control, and automatic multi-GPU system orchestration.
The Pain of LLM Fine-Tuning and Post-Training
Fine-tuning and post-training of open LLMs such as Llama, Mistral, Qwen, and DeepSeek are increasingly popular. Teams perform such customization for their AI use cases for many reasons:
Domain Adaptation: Rapidly customize a model to your application’s specialized jargon, tone, and compliance requirements.
Boost Performance: More easily align models to application-specific metrics and reduce hallucinations.
Cost Efficiency: More effectively distill to smaller models, reduce inference costs, maximize GPU utilization.
But the traditional workflow of LLM fine-tuning/post-training is often painfully slow and expensive. GPU, time, and/or cost constraints often dissuade AI developers from even experimenting, potentially leaving much of AI's impact on their application on the table. Current workflows often face key limitations:
Configuration Complexity: Multiple knobs affect LLM behavior: base model architecture, prompt structure, PEFT settings, hyperparameters, quantization schemes, reward functions, etc. Proper customization requires systematic experimentation.
Sequential Burden: High GPU cost and the large size of LLMs often force sequential exploration of only a few configurations using all GPUs at a time, causing slow overall progress and underutilizing GPUs during and between trials.
Lack of Controllable Dynamism: In general, it is impossible to tell what will work best upfront, but existing approaches make it hard to adapt experiments on the fly with dynamic control based on actual results.
Our Solution: RapidFire AI
RapidFire AI tackles the above issues by designing for rapid experimentation when customizing LLMs. It builds on top of three popular open source tools: PyTorch, Hugging Face, and MLflow. (But MLflow can be substituted with other ML metrics dashboards as well.) There are three pillars to our approach:

What Types of Configuration Knobs?
Customizing an LLM requires you to decide on numerous knobs that can affect quality and eval metrics. With RapidFire AI, you can more quickly experiment across all types of config knobs (see the sketch after this list):
Data-Related: Prompt structure, example preprocessing, truncation length, etc.
Model-Related: Base architecture, LoRA specifics such as rank and target modules, quantization scheme, etc.
Trainer-Related: Learning hyperparameters such as learning rate or lr scheduler, batch size, reward functions, etc.
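To make the knob space concrete, here is a minimal sketch of such a grid using plain Python dictionaries; the knob names and values are illustrative only and are not the RapidFire AI config objects themselves:

```python
from itertools import product

# Illustrative knob grid spanning the three categories above (not the
# RapidFire AI API; just plain Python to show the combinatorial space).
data_knobs    = {"max_seq_length": [512, 1024], "prompt_template": ["chat", "instruct"]}
model_knobs   = {"lora_rank": [8, 32], "quantization": ["4bit", "none"]}
trainer_knobs = {"learning_rate": [2e-4, 5e-5], "per_device_batch_size": [4, 8]}

all_knobs = {**data_knobs, **model_knobs, **trainer_knobs}

# Expand into the individual configurations one would want to compare.
configs = [dict(zip(all_knobs, values)) for values in product(*all_knobs.values())]
print(f"{len(configs)} candidate configurations")  # 2^6 = 64 combinations here
```

Even this small grid yields dozens of candidates, which is exactly why sequential exploration becomes the bottleneck.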
Key Benefits for your Application:
Faster Results: Achieve better eval metrics and alignment much faster through hyperparallelized experimentation.
Higher Productivity: Focus on model learning behavior and AI insights, not wrestling with GPUs and infrastructure.
Lower Costs: Higher GPU utilization + lower time to effective results = substantial cost savings.
Overview of RapidFire AI Usage
Just pip install the RapidFire AI OSS package. It works on both single-GPU and multi-GPU machines. First launch our server from the command line. Then import it like any other Python package in your notebook/script. Use our API to define and launch the configurations to compare in one go, as sketched below.
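A minimal sketch of that flow follows. The shell command and Python names are placeholders to convey the shape of the workflow, not a verbatim copy of the RapidFire AI API; consult the package documentation for the exact names and signatures.

```python
# Shell (one-time setup; exact command names may differ -- check the docs):
#   pip install rapidfireai
#   rapidfireai start          # launch the RapidFire AI server and dashboard

# Notebook/script (illustrative placeholders, not the verbatim API):
configs = [
    {"lora_rank": 8,  "learning_rate": 2e-4},
    {"lora_rank": 32, "learning_rate": 5e-5},
]
# from rapidfireai import Experiment                 # hypothetical import path
# exp = Experiment(name="sft-compare")               # hypothetical experiment handle
# exp.run_fit(configs, train_dataset, num_chunks=4)  # launch all configs in one go
print(f"Defined {len(configs)} configs to launch together")
```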
Metrics plots are automatically visualized in the ML metrics dashboard. From there you can dynamically control running models: stop, resume, clone, and modify them as you wish.

RapidFire AI is the first system of its kind to establish live three-way communication between the IDE where the experiment is launched, a metrics display/control dashboard, and a multi-GPU execution backend.
Also read the step-by-step walkthrough page and watch the usage video for details.
What Makes RapidFire AI Different?
The crux of RapidFire AI’s difference is in its adaptive execution engine: it enables “interruptible” execution of configurations across GPUs. To do so, it first shards the data randomly into “chunks.” Then instead of waiting for a run to see the whole dataset for all epochs, RapidFire AI schedules runs on one chunk at a time, cycling through all chunks.
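Here is a toy sketch of that scheduling idea in plain Python (not the actual RapidFire AI scheduler): runs cycle through chunks so every run shows learning behavior early, instead of each run consuming the whole dataset before the next one starts.

```python
import random

def make_chunks(dataset, num_chunks, seed=0):
    """Randomly shard the dataset into roughly equal chunks."""
    examples = list(dataset)
    random.Random(seed).shuffle(examples)
    return [examples[i::num_chunks] for i in range(num_chunks)]

def round_robin_schedule(run_ids, num_chunks, epochs=1):
    """Yield (run, chunk) steps: every run trains on chunk 0 before any run
    has seen the full dataset, so all runs report progress early."""
    for _ in range(epochs):
        for chunk_idx in range(num_chunks):
            for run in run_ids:
                # In the real system this is where the run's adapter (and base
                # model, if needed) is swapped onto the GPU and trained on the chunk.
                yield run, chunk_idx

dataset = range(1000)                      # stand-in for a tokenized training set
chunks = make_chunks(dataset, num_chunks=4)
for run, chunk_idx in round_robin_schedule(["M1", "M2", "M3"], num_chunks=4):
    print(f"train {run} on chunk {chunk_idx} ({len(chunks[chunk_idx])} examples)")
```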
Suppose you have only 1 GPU, say an A100 or H100. Current tools force you to run one configuration after another sequentially as shown in the (simplified) illustration below. In contrast, by operating on chunks, RapidFire AI offers a far more concurrent learning experience by automatically swapping adapters (and base models, if needed) across GPU(s) and DRAM. It does this via efficient shared memory-based caching mechanisms that can spill to disk when needed.

In the above figure, all 3 models are shown for 1 epoch. RapidFire AI is set to use 4 chunks. So, before model 3 (M3) even starts in the sequential approach, RapidFire AI has already shown you the learning behaviors of all 3 models on the first 2-3 chunks. The overhead of swapping, represented by the thin gray box, is minimal: less than 5% of the runtime in our measurements, thanks to our efficient memory management techniques.
The Power of Dynamic Real-Time Control
Our adaptive execution engine also enables a powerful new capability: dynamic real-time control over runs in flight. We call this Interactive Control Operations, or IC Ops for short.
Stop non-promising runs at any point; they will be put on a wait queue. Resume any of them later if you want to revisit them. Clone high-performing runs from the dashboard and Modify their configuration knobs as you see fit to try new variations. Warm start a clone's parameters from its parent's to give it a head start in learning behavior. Under the hood, RapidFire AI automatically manages how runs and chunks are placed on GPUs, freeing you to focus fully on the logic of your AI experiment rather than wrestling with low-level systems issues to parallelize your work.
As the above figure shows, with suitable IC Ops based on the runs’ learning behaviors, you are able to compare 9 configs in roughly the same time it took to compare 3 sequentially! Read more about IC Ops here.
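Conceptually, a Clone-Modify with warm start amounts to copying the parent's current adapter state, tweaking a knob, and letting the clone re-enter the schedule. A toy sketch follows (not the RapidFire AI internals; the field names are made up for illustration):

```python
import copy

# Toy representation of a run in flight: its knobs plus current adapter weights.
parent = {
    "config": {"lora_rank": 16, "learning_rate": 2e-4},
    "adapter_state": {"lora_A": [0.1, 0.2], "lora_B": [0.3, 0.4]},  # stand-in weights
}

# Clone-Modify: copy the parent's config and change a knob to try a new variation.
clone = {"config": copy.deepcopy(parent["config"])}
clone["config"]["learning_rate"] = 5e-5

# Warm start: initialize the clone from the parent's current weights so it gets
# a head start instead of learning from scratch.
clone["adapter_state"] = copy.deepcopy(parent["adapter_state"])
print(clone["config"])
```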

The second example above shows how you can be even more aggressive with your exploration thanks to RapidFire AI: launching 8 configs together even on just 1 GPU. And with multiple Stop and Clone-Modify operations, you can get a feel for even 14 configs on 1-2 chunks each in roughly the time it would take to compare just 2 configs on the full data! All the while, you are free to continue training whichever configs still look promising, resume those that you had stopped earlier, clone the clones further, and so on.
RapidFire AI supports multi-GPU setups natively. Here is a (simplified) illustration of sequential execution with Data Parallelism (say, with DDP or FSDP) vs. Task Parallelism (say, with Weights & Biases) vs. RapidFire AI, both without and with IC Ops. Our scheduler navigates multiple GPUs automatically so that you need not worry about any GPU being underutilized, as in the case of Task Parallelism for the workload shown below.

Why Not Just Downsample Data?
At first glance, one might consider running multi-config comparisons by downsampling data for quick estimates, then running promising configs on full data. While common, this approach is often misleading and cumbersome.
A single downsample introduces variance from one static snapshot, potentially leading to wrong conclusions, especially with overfitting-prone LLMs/DL models. It also requires manual checkpoint management, adding tedious file work. And you do not get dynamic control (stop, resume, clone-modify) unless you reimplement such tricky operations yourself, taking time away from your AI application work.
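A quick simulation illustrates the variance point: with a small but real quality gap between two configs, a single static 5% downsample ranks them the wrong way around a non-trivial fraction of the time. The numbers below are made up purely to demonstrate the effect.

```python
import random
import statistics

random.seed(0)
N = 10_000
# Simulated per-example eval losses for two configs; A is truly slightly better.
loss_a = [random.gauss(1.00, 0.5) for _ in range(N)]
loss_b = [random.gauss(1.03, 0.5) for _ in range(N)]

def subsample_mean(losses, frac=0.05, seed=0):
    """Mean loss estimated from one static random downsample."""
    rng = random.Random(seed)
    idx = rng.sample(range(len(losses)), int(frac * len(losses)))
    return statistics.mean(losses[i] for i in idx)

# How often does a single 5% downsample rank the configs the wrong way round?
wrong = sum(subsample_mean(loss_a, seed=s) > subsample_mean(loss_b, seed=s)
            for s in range(200))
print(f"Wrong ranking in {wrong} of 200 simulated downsamples")
```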
RapidFire AI takes such practical heuristics to their logical conclusion with chunk-based adaptive multi-config execution and dynamic experiment control. This offers you maximum power and flexibility for your AI development without adding DevOps grunt work, i.e., rapid experimentation.
That said, note that downsampling is complementary to rapid experimentation; feel free to do both! The adaptive execution can operate on your downsampled dataset all the same.