FiQA: RAG for Financial Opinion Q&A Chatbot
Please check out the tutorial notebook at the link below. To save the notebook locally, right-click the GitHub link and save the linked file.
RAG for financial opinion Q&A chatbot: View on GitHub.
This use case notebook features a fully self-hosted, open-model workflow, with Hugging Face models for both embedding and generation.
Task, Dataset, and Prompt
This tutorial shows Retrieval-Augmented Generation (RAG) for creating a financial opinion Q&A chatbot.
It uses the “FiQA” dataset from the BEIR benchmark; see its details here. The dataset contains financial questions and a corpus of documents for retrieval.
The prompt format includes system instructions defining the assistant as a financial advisor and incorporates retrieved context along with user queries.
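A prompt along these lines can be assembled as sketched below. The system instruction wording and the helper name `build_prompt` are illustrative assumptions, not the notebook's verbatim template.

```python
def build_prompt(context_chunks, question):
    """Assemble a RAG prompt: system instructions, retrieved context, then the user query."""
    # Hypothetical system instruction; the notebook's exact wording may differ.
    system = (
        "You are a helpful financial advisor. Answer the question using only "
        "the provided context."
    )
    context = "\n\n".join(context_chunks)
    return (
        f"{system}\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

prompt = build_prompt(
    ["Index funds offer broad diversification at low cost."],
    "Are index funds a good choice for beginners?",
)
```

The retrieved chunks are simply concatenated ahead of the question, so the generator sees both in a single prompt.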
Model, RAG Components, and Configuration Knobs
We compare 2 generator model sizes: Qwen2.5-0.5B-Instruct and Qwen2.5-3B-Instruct.
We also try two chunking strategies: 256-token and 128-token chunks, both with a 32-token overlap, produced by recursive character splitting with tiktoken encoding.
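The overlapping-chunk idea can be illustrated with a minimal sliding-window splitter. This is a simplified stand-in, not the recursive tiktoken-based splitter the notebook uses; synthetic string "tokens" replace real tiktoken token IDs.

```python
def chunk_tokens(tokens, chunk_size=256, overlap=32):
    """Split a token sequence into overlapping chunks (stride = chunk_size - overlap)."""
    stride = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the final chunk already reaches the end of the document
    return chunks

# Synthetic "tokens" standing in for tiktoken token IDs.
tokens = [f"t{i}" for i in range(600)]
chunks = chunk_tokens(tokens, chunk_size=256, overlap=32)
```

With a 32-token overlap, the last 32 tokens of each chunk reappear at the start of the next, so sentences near a boundary are never split across retrieval units.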
The RAG pipeline uses:
Embeddings: sentence-transformers/all-MiniLM-L6-v2 with GPU acceleration.
Vector Store: FAISS with GPU-based exact search, i.e., no ANN approximation.
Retrieval: Top-15 similarity search.
Reranking: cross-encoder/ms-marco-MiniLM-L6-v2 with 2 different top-n values: 2 and 5.
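The retrieve-then-rerank flow can be sketched without the actual models. In the notebook, embeddings come from all-MiniLM-L6-v2, exact search runs in FAISS, and reranking uses the ms-marco cross-encoder; here random unit vectors and a reused similarity score stand in for all three, purely to show the two-stage shape.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for all-MiniLM-L6-v2 embeddings (384-dim), L2-normalized.
corpus_emb = rng.normal(size=(100, 384))
corpus_emb /= np.linalg.norm(corpus_emb, axis=1, keepdims=True)
# Make document 7 the best match by placing the query next to it.
query_emb = corpus_emb[7] + 0.01 * rng.normal(size=384)
query_emb /= np.linalg.norm(query_emb)

# Stage 1: exact top-15 inner-product search (what a flat FAISS index computes).
scores = corpus_emb @ query_emb
top15 = np.argsort(-scores)[:15]

# Stage 2: rerank the 15 candidates and keep top-n (2 or 5 in the grid).
def rerank(candidates, top_n):
    # A cross-encoder would rescore (query, doc) pairs; we reuse the
    # similarity score as a stand-in to keep the sketch self-contained.
    ranked = sorted(candidates, key=lambda i: -scores[i])
    return ranked[:top_n]

kept = rerank(list(top15), top_n=5)
```

The exact first stage means recall is limited only by the embedding quality, not by an ANN index; the cross-encoder then trades latency for precision on the 15 survivors.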
All other knobs are fixed across configs, so a simple grid search launches 8 combinations in total: 2 generator models × 2 chunk sizes × 2 reranking top-n values.
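Enumerating the grid is a one-liner with `itertools.product`; the dictionary keys below are illustrative names, not the notebook's config schema.

```python
from itertools import product

generators = ["Qwen2.5-0.5B-Instruct", "Qwen2.5-3B-Instruct"]
chunk_sizes = [256, 128]   # tokens, both with a 32-token overlap
rerank_top_ns = [2, 5]

# 2 x 2 x 2 = 8 configurations, one launch per combination.
configs = [
    {"generator": g, "chunk_size": c, "rerank_top_n": n}
    for g, c, n in product(generators, chunk_sizes, rerank_top_ns)
]
```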