SFT for Q&A Chatbot

Please check out the tutorial notebooks on the links below. Right click on the GitHub link to save that file locally.

SFT for customer support Q&A chatbot: View on GitHub. Use this version if your GPU has >= 80 GB HBM.

Lite version: View on GitHub. Use this version if your GPU has < 80 GB HBM; it just uses smaller LLMs and finishes faster.

Or run this pre-configured Google Colab notebook on your browser; no installation required on your machine: RapidFire AI on Google Colab

Task, Dataset, and Prompt

This tutorial shows Supervised Fine-Tuning (SFT) for creating a customer support Q&A chatbot.

It uses the “Bitext customer support” dataset; see its details on Hugging Face. We use a sample of 5,000 training examples and 200 evaluation examples for tractable demo runtimes.

The prompt format includes a system message defining the assistant as “helpful and friendly customer support” with user instructions and assistant responses

Model, Adapter, and Trainer Knobs

We compare 2 base model architectures: Llama-3.1-8B-Instruct and Mistral-7B-Instruct-v0.3. The lite version uses only one: TinyLlama-1.1B-Chat-v1.0.

There are 2 different LoRA adapter configurations: a low-capacity adapter (rank 16; 8 for lite) targeting only 2 modules and a high-capacity adapter (rank 128; 32 for lite) targeting 4 modules.

All other knobs are fixed across all configs. Thus, there are a total of 4 combinations, all launched with a simple grid search.