Glossary of Key Terms and Concepts

Artifacts

All files related to an experiment, saved in a local folder whose path is specified by experiments_path in the experiment’s constructor. Includes the final (and possibly all epoch-level) model checkpoints of all runs across all run_fit() invocations in that experiment, as well as all associated metrics files.
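
For illustration, a minimal sketch of setting the artifacts folder. The import path and the experiment_name argument are assumptions; only experiments_path is named in this glossary, so check the API reference for exact signatures:

```python
from rapidfireai import Experiment  # assumed import path

# Checkpoints and metrics files for every run in this experiment
# are saved under ./rf_artifacts (argument names are assumptions).
exp = Experiment(
    experiment_name="sentiment-sweep",  # hypothetical argument name
    experiments_path="./rf_artifacts",  # root folder for all artifacts
)
```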

Read more here: API: Experiment.

Config Dictionary

A dictionary of key-value pairs that specify a full model training configuration. A knob’s value can be a singleton or set-valued (List or Range). A dictionary with set-valued knobs be fed to a config-group generator method. A single combination of knob values in this dictionary is called a “leaf” config. RapidFire AI instantiates one run per leaf config, and its values are injected via the model_config argument to your create_model_fn() function.
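
A hedged sketch of a config dictionary with both singleton and set-valued knobs, and of how each leaf config reaches create_model_fn(). Knob names and the import path are hypothetical:

```python
from rapidfireai.automl import List, Range  # assumed import path

config = {
    "model_name": "distilbert-base-uncased",  # singleton knob
    "learning_rate": Range(1e-5, 1e-3),       # set-valued knob (Range)
    "batch_size": List([16, 32]),             # set-valued knob (List)
}

# RapidFire AI calls your create_model_fn() once per leaf config,
# passing one concrete combination of knob values as model_config.
def create_model_fn(model_config):
    lr = model_config["learning_rate"]  # a single selected value
    bs = model_config["batch_size"]     # e.g., 16 or 32
    ...  # build and return your model here
```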

Read more here: API: Multi-Config Specification.

Config Group

A set of config dictionary instances, produced in bulk by feeding a config dictionary with set-valued knobs to a config-group generator method (see below). A config group can also be a Python list that mixes individual config dictionaries and config-group generators, nested recursively, as sketched below.
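
A sketch of recursive composition; the generator’s exact signature is an assumption:

```python
from rapidfireai.automl import RFGridSearch, List  # assumed import path

config_group = [
    {"learning_rate": 3e-4, "batch_size": 32},  # one individual leaf config
    RFGridSearch({                              # expands into 2 x 2 = 4 configs
        "learning_rate": List([1e-4, 1e-3]),
        "batch_size": List([16, 32]),
    }),
]
```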

Read more here: API: Multi-Config Specification.

Config Group Generator

A method to generate a group of config dictionaries in one go based on an input config dictionary with set-valued knobs (List or Range). Currently supported generator methods are grid search (RFGridSearch) and random search (RFRandomSearch); support for AutoML heuristics is coming soon.
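
A sketch of the two supported generators. Argument names such as num_samples are hypothetical; check the API reference for exact signatures:

```python
from rapidfireai.automl import RFGridSearch, RFRandomSearch, List, Range  # assumed

# Grid search enumerates the cross product of List-valued knobs.
grid_group = RFGridSearch({
    "learning_rate": List([1e-4, 1e-3]),
    "batch_size": List([16, 32]),
})  # -> 4 leaf configs

# Random search draws a fixed number of leaf configs, e.g. sampling
# continuous values from a Range.
random_group = RFRandomSearch({
    "learning_rate": Range(1e-5, 1e-3),
    "batch_size": List([16, 32]),
}, num_samples=8)  # hypothetical argument name
```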

Read more here: API: Multi-Config Specification.

Experiment

A core concept in the RapidFire AI API that defines a collection of training and evaluation operations. Each experiment is assigned a unique name that is used both for displaying plots on the ML metrics dashboard and for artifact tracking. At any point in time, only one experiment can be alive.

Read more here: API: Experiment.

Experiment Ops

Computation methods associated with the Experiment class of RapidFire AI: run_fit(), end(), and the constructor. Also includes two informational methods: get_runs_info() and get_results(). We will expand this API with more operations based on feedback, e.g., for batch testing and inference/generation.
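
Putting the ops together, a typical lifecycle might look like the sketch below. The methods are the ones named above, but their arguments are assumptions; run_fit() likely also needs your data and create_model_fn():

```python
from rapidfireai import Experiment  # assumed import path

exp = Experiment(experiment_name="sweep-v1")  # hypothetical argument name

config_group = [{"learning_rate": 3e-4}]  # see "Config Group" above

# Train all runs in the config group (real signature likely takes
# more arguments; see the API reference).
exp.run_fit(config_group)

print(exp.get_runs_info())   # informational: one entry per run
results = exp.get_results()  # informational: all metrics as a DataFrame

exp.end()  # close the experiment; only one can be alive at a time
```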

Read more here: API: Experiment.

Interactive Control Ops (IC Ops)

Operations to control runs in flight during a run_fit(). RapidFire AI automatically reapportions GPU resources across runs under the hood. We currently support 4 IC Ops: Stop, Resume, Clone-Modify, and Delete.

Read more here: Dashboard: Interactive Control (IC) Ops.

Knob

A single entry in the config dictionary given for experimentation. A knob’s value can be a singleton or set-valued (List or Range).
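
The three knob value forms, sketched (import path assumed):

```python
from rapidfireai.automl import List, Range  # assumed import path

config = {
    "num_epochs": 3,                     # singleton: one fixed value
    "batch_size": List([16, 32, 64]),    # set-valued: explicit choices
    "learning_rate": Range(1e-5, 1e-3),  # set-valued: a range to sample from
}
```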

Read more here: API: Multi-Config Specification.

Logs

Files with entries about all operations run by RapidFire AI, including IC Ops, to aid monitoring and debugging of experiment behaviors. The whole experiment log is displayed in the app under the “Experiment Log” tab. Likewise, all IC Ops are displayed under their own tab next to it.

Read more here: ML Metrics Dashboard.

ML Metrics Dashboard

A dashboard to display plots of all ML metrics (loss and eval metrics) of all runs and experiments, overlay IC Ops functionality, and display informative logs. RapidFire AI’s current default dashboard is a fork of the popular OSS tool MLflow.

Read more here: ML Metrics Dashboard.

Results

A single DataFrame containing all loss and eval metric values of all runs across all epochs across all run_fit() invocations in the experiment so far. Returned by get_results().
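
A sketch of typical usage, given an Experiment instance exp (see Experiment Ops above); the column names below are guesses, so inspect results.columns first:

```python
results = exp.get_results()  # one DataFrame for the whole experiment

# For example, find the best run by validation loss
# (hypothetical column names).
best = results.sort_values("eval_loss").head(1)
print(best)
```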

Read more here: API: Experiment.

Run

A central concept in RapidFire AI representing a single combination of configuration knob values for a model trained with run_fit(). It is the same concept as in ML metrics dashboards such as MLflow and Weights & Biases. RapidFire AI assigns each run a unique integer run_id within an experiment.

Read more here: API: Experiment and ML Metrics Dashboard.