Glossary of Key Terms and Concepts
Artifacts
All files related to an experiment, saved in a local folder whose path is specified by experiments_path in the experiment’s constructor. Includes the final (and possibly all epoch-level) model checkpoints of all runs across all run_fit() invocations in that experiment, as well as all associated metrics files.
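For illustration, a minimal sketch of setting the artifacts location; the import path and constructor keywords are assumptions, not the confirmed signature:

```python
from rapidfireai import Experiment  # import path assumed

# Checkpoints and metrics files for all runs of this experiment are
# saved under the folder given by experiments_path (keywords assumed).
exp = Experiment(
    experiment_name="glossary-demo",      # hypothetical name
    experiments_path="./rf-experiments",  # artifacts root folder
)
```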
Read more here: API: Experiment.
Config Dictionary
A dictionary of key-value pairs that specifies a full model training configuration. A knob’s value can be a singleton or set-valued (List or Range). A dictionary with set-valued knobs can be fed to a config-group generator method. A single combination of knob values in this dictionary is called a “leaf” config. RapidFire AI instantiates one run per leaf config, and its values are injected via the model_config argument to your create_model_fn() function.
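A minimal sketch of a config dictionary with both singleton and set-valued knobs; the knob names and the import path are illustrative assumptions:

```python
from rapidfireai.automl import List, Range  # import path assumed

# Hypothetical knobs: two set-valued, one singleton.
config_dict = {
    "learning_rate": Range(1e-5, 1e-3),  # set-valued: a numeric range
    "batch_size": List([16, 32, 64]),    # set-valued: explicit choices
    "num_epochs": 2,                     # singleton: same for every leaf
}

# Each run receives one leaf config through model_config, e.g.
# {"learning_rate": 3e-4, "batch_size": 32, "num_epochs": 2}.
def create_model_fn(model_config):
    lr = model_config["learning_rate"]
    ...
```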
Read more here: API: Multi-Config Specification.
Config Group
A set of config dictionary instances, produced in bulk by providing a config dictionary with set-valued knobs to a config-group generator method (see below). A config group can also be written directly as a Python list of individual config dictionaries and/or config-group generators, nested recursively, as sketched below.
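A sketch of forming a config group directly as a list; whether run_fit() accepts exactly this mixed form, and the generator’s signature, are assumptions based on the description above:

```python
from rapidfireai.automl import List, RFGridSearch  # import path assumed

single = {"learning_rate": 3e-4, "batch_size": 32}   # one individual config dictionary
grid = RFGridSearch({"batch_size": List([16, 64])})  # a generator (signature assumed)

# A config group as a plain Python list mixing both forms.
config_group = [single, grid]
```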
Read more here: API: Multi-Config Specification.
Config Group Generator
A method to generate a group of config dictionaries in one go based on an input config dictionary with set-valued knobs (List or Range). Currently supported generator methods are grid search (RFGridSearch) and random search (RFRandomSearch). Support for AutoML heuristics coming soon.
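A sketch of both generators; the import path, signatures, and the num_samples keyword are assumptions:

```python
from rapidfireai.automl import List, Range, RFGridSearch, RFRandomSearch  # assumed

# Grid search enumerates every combination of the listed values: 2 x 3 = 6 configs.
grid_group = RFGridSearch({
    "learning_rate": List([1e-4, 3e-4]),
    "batch_size": List([16, 32, 64]),
})

# Random search samples combinations, so Range knobs fit naturally
# (the num_samples keyword is an assumption).
random_group = RFRandomSearch({
    "learning_rate": Range(1e-5, 1e-3),
    "batch_size": List([16, 32, 64]),
}, num_samples=8)
```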
Read more here: API: Multi-Config Specification.
Experiment
A core concept in the RapidFire AI API that groups together the training and evaluation operations performed. Each experiment is assigned a unique name that is used both for displaying plots on the ML metrics dashboard and for artifact tracking. At any point in time, only one experiment can be alive.
Read more here: API: Experiment.
Experiment Ops
Computation methods associated with the Experiment class of RapidFire AI: run_fit(), end(), and the constructor. Also includes two informational methods: get_runs_info() and get_results(). We will expand this API with more operations based on feedback, e.g., for batch testing and inference/generation.
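A minimal end-to-end sketch of the ops listed above, assuming a config_group, create_model_fn, and datasets defined as in the earlier sketches; run_fit()’s argument list is an assumption:

```python
from rapidfireai import Experiment  # import path assumed

exp = Experiment(
    experiment_name="glossary-demo",      # hypothetical name
    experiments_path="./rf-experiments",  # where artifacts are saved
)

# Train one run per leaf config (argument list assumed; config_group,
# create_model_fn, and the datasets are placeholders defined elsewhere).
exp.run_fit(config_group, create_model_fn, train_dataset, eval_dataset)

print(exp.get_runs_info())  # metadata for every run launched so far
print(exp.get_results())    # DataFrame of all loss/eval metrics so far

exp.end()                   # close out the experiment
```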
Read more here: API: Experiment.
Interactive Control Ops (IC Ops)
Operations to control runs in flight during a run_fit().
RapidFire AI automatically reapportions GPU resources across runs under the hood.
We currently support 4 IC Ops: Stop, Resume, Clone-Modify, and Delete.
Read more here: Dashboard: Interactive Control (IC) Ops.
Knob
A single entry in the config dictionary given for experimentation. A knob’s value can be a singleton or set-valued (List or Range).
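For example (knob names illustrative, import path assumed):

```python
from rapidfireai.automl import List, Range  # import path assumed

singleton_knob = {"num_epochs": 2}                 # one fixed value
list_knob = {"batch_size": List([16, 32])}         # set-valued: explicit choices
range_knob = {"learning_rate": Range(1e-5, 1e-3)}  # set-valued: numeric range
```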
Read more here: API: Multi-Config Specification.
Logs
Files with entries about all operations run by RapidFire AI, including IC Ops, to aid monitoring and debugging of experiment behavior. The whole experiment log is displayed in the app under the “Experiment Log” tab. Likewise, all IC Ops are displayed under their own tab next to it.
Read more here: ML Metrics Dashboard.
ML Metrics Dashboard
A dashboard to display plots of all ML metrics (loss and eval metrics) of all runs and experiments, overlay IC Ops functionality, and display informative logs. RapidFire AI’s current default dashboard is a fork of the popular OSS tool MLflow.
Read more here: ML Metrics Dashboard.
Results
A single DataFrame containing all loss and eval metric values of all runs, across all epochs and all run_fit() invocations in this experiment so far. Returned by get_results().
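A sketch of inspecting the results, continuing the experiment sketch above; the column names used for filtering are assumptions about the DataFrame’s schema:

```python
results = exp.get_results()  # pandas DataFrame of all metrics so far

# Hypothetical columns: filter one run's per-epoch eval loss.
run_3 = results[results["run_id"] == 3]
print(run_3[["epoch", "eval_loss"]])
```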
Read more here: API: Experiment.
Run
A central concept in RapidFire AI representing a single combination of configuration knob values for a model trained with run_fit(). It is the same concept as in ML metrics dashboards such as MLflow and Weights & Biases. RapidFire AI assigns each run a unique integer run_id within an experiment.
Read more here: API: Experiment and ML Metrics Dashboard.