ML Metrics Dashboard

RapidFire AI offers a browser-based dashboard that automatically visualizes all ML metrics and lets you control runs on the fly. Our current default dashboard is a fork of the popular OSS tool MLflow, and it inherits many of MLflow’s native features.

As of this writing, we support only MLflow for visualization and run metadata management. But the execution engine and API are not tied to MLflow; they can easily be extended to support other dashboards such as Trackio, TensorBoard, and Weights & Biases. We will add such support in due course, and we also welcome community contributions on this front.

Tabs in the Dashboard

The main “Experiments” page on the dashboard has four tabs:

  • Table

  • Chart

  • Experiment Log

  • Interactive Control (IC) Log

The screenshot below shows the “Table” view of an experiment with all its runs. Each run represents one model with one set of config knob values, matching standard dashboard semantics.

Table view of runs metadata

Metrics Plots

The screenshot below shows the “Chart” view of an experiment with all its runs. Each plot corresponds to a metric, spanning loss on the training and evaluation sets, as well as all named metrics returned by your compute_metrics() function in the trainer config.
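For reference, every key in the dictionary returned by compute_metrics() becomes its own named plot on the “Chart” tab. Below is a minimal sketch of such a callback; the (logits, labels) input format is an assumption in the style of HuggingFace-like trainers, so check your trainer config for the exact signature:

    import numpy as np

    # Minimal sketch of a compute_metrics() callback. The eval_pred input
    # is assumed here to be a (logits, labels) pair; your trainer config
    # may pass a different structure.
    def compute_metrics(eval_pred):
        logits, labels = eval_pred
        preds = np.argmax(logits, axis=-1)
        accuracy = float((preds == labels).mean())
        # Each key returned here appears as a separate metric plot
        # on the "Chart" tab.
        return {"accuracy": accuracy}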

We call attention to three key aspects of the visualizations here:

  • The x-axis “Step” for the minibatch-level plots represents the absolute number of minibatches seen by that run. So, if batch_size differs across runs in your experiment, the runs will take different numbers of steps and their curves will not line up at the end (see the worked example after this list). This is not a bug but the expected correct behavior.

  • The x-axis “Step” for the epoch-level plots represents the absolute number of epochs seen by that run. So, if the number of epochs differs across runs in your experiment, they will again take different numbers of steps, as above.

  • Refresh the browser page to have RapidFire AI’s metrics reader pull the latest entries from the metrics files.
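To make the first point concrete, here is a quick back-of-the-envelope computation; the dataset size and knob values are hypothetical:

    # Hypothetical example: 10,000 training examples, 2 epochs.
    num_examples, epochs = 10_000, 2

    for batch_size in (32, 64):
        steps_per_epoch = -(-num_examples // batch_size)  # ceiling division
        total_steps = steps_per_epoch * epochs
        print(f"batch_size={batch_size}: {total_steps} total steps")

    # batch_size=32: 626 total steps
    # batch_size=64: 314 total steps

The run with batch_size=64 finishes at step 314 while the run with batch_size=32 continues to step 626, so their curves end at different points on the x-axis.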

Metrics plots on dashboard

The dashboard assigns default colors to all runs, but you can change them by clicking the “color circle” next to the run number in the “Run Name” column. A color palette will pop up, as shown in the screenshot below.

Change colors of runs

Message Logs

There are two continually appended message logs on the third and fourth tabs: “Experiment Log” and “Interactive Control Log”, respectively. All operations you run with RapidFire AI’s API are displayed in the former. The latter specifically displays the Interactive Control (IC) operations you perform via the IC Ops panels, as shown in the screenshot below.

Logs on dashboard

The full experiment log is also available as a text file saved in your local directory under the name “rapidfire.log”.
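If you want to follow that file programmatically as it grows, here is a minimal sketch using only the Python standard library; it assumes the log sits in your current working directory, so adjust the path as needed:

    from pathlib import Path
    import time

    # Follow rapidfire.log as it grows, similar to `tail -f`.
    # The current working directory is an assumption; point this at
    # wherever your local directory actually is.
    log_path = Path("rapidfire.log")

    with log_path.open() as f:
        f.seek(0, 2)  # start at the end of the file
        while True:
            line = f.readline()
            if line:
                print(line, end="")
            else:
                time.sleep(1.0)  # wait for new entries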