FiQA: RAG for Financial Opinion Q&A Chatbot

Please check out the tutorial notebook at the link below. Right-click the GitHub link to save the file locally.

RAG for financial opinion Q&A chatbot: View on GitHub.

This use case notebook features an all-self-hosted open model workflow, with models from Hugging Face for both embedding and generation.

Task, Dataset, and Prompt

This tutorial shows Retrieval-Augmented Generation (RAG) for creating a financial opinion Q&A chatbot.

It uses the “FiQA” dataset from the BEIR benchmark; see its details here. The dataset contains financial questions and a corpus of documents for retrieval.

The prompt format includes system instructions defining the assistant as a financial advisor and incorporates retrieved context along with user queries.
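A minimal sketch of that prompt assembly, in plain Python. The system text, message layout, and helper name here are illustrative assumptions; the notebook's exact wording may differ.

```python
# Hypothetical system instruction framing the assistant as a financial advisor;
# the notebook's actual wording may differ.
SYSTEM_PROMPT = (
    "You are a helpful financial advisor. Answer the user's question "
    "using only the provided context."
)

def build_prompt(question: str, retrieved_chunks: list) -> list:
    """Assemble a chat-style prompt from retrieved context plus the user query."""
    context = "\n\n".join(retrieved_chunks)
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ]

messages = build_prompt(
    "Is dollar-cost averaging a good strategy?",
    ["Chunk about dollar-cost averaging...", "Chunk about lump-sum investing..."],
)
```

A message list in this shape can be passed to a Hugging Face tokenizer's `apply_chat_template` before generation.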

Model, RAG Components, and Configuration Knobs

We compare 2 generator models of different sizes: Qwen2.5-0.5B-Instruct and Qwen2.5-3B-Instruct.

We test 2 different chunking strategies: 256-token chunks and 128-token chunks, both with a 32-token overlap, using recursive character splitting with tiktoken encoding.
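To make the size/overlap arithmetic concrete, here is a simplified sliding-window chunker over a token sequence. The notebook uses recursive character splitting with tiktoken encoding; this pure-Python sketch only illustrates how a 32-token overlap ties consecutive chunks together.

```python
def chunk_tokens(tokens: list, chunk_size: int = 256, overlap: int = 32) -> list:
    """Sliding-window chunking: each chunk shares `overlap` tokens with the previous one."""
    step = chunk_size - overlap  # advance by 224 tokens for 256/32
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # last chunk reached the end of the sequence
    return chunks

# 600 dummy "tokens" -> chunks covering 0-255, 224-479, 448-599
chunks = chunk_tokens(list(range(600)), chunk_size=256, overlap=32)
```

Halving the chunk size to 128 (overlap unchanged) roughly doubles the number of chunks, which trades retrieval granularity against index size.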

The RAG pipeline uses:

  • Embeddings: sentence-transformers/all-MiniLM-L6-v2 with GPU acceleration.

  • Vector Store: FAISS with GPU-based exact search, i.e., no ANN approximation.

  • Retrieval: Top-15 similarity search.

  • Reranking: cross-encoder/ms-marco-MiniLM-L6-v2 with 2 different top-n values: 2 and 5.
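The "exact search, no ANN approximation" point can be shown with a small NumPy sketch. The real pipeline uses sentence-transformers embeddings in a FAISS flat index; this stand-in only demonstrates what exhaustive top-k retrieval means, using random unit vectors in place of embeddings.

```python
import numpy as np

def exact_top_k(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 15) -> np.ndarray:
    """Exhaustive similarity search (what a FAISS flat index does):
    score every document against the query and return the k best indices."""
    scores = doc_vecs @ query_vec        # inner-product similarity
    return np.argsort(-scores)[:k]

rng = np.random.default_rng(0)
docs = rng.normal(size=(100, 32))
docs /= np.linalg.norm(docs, axis=1, keepdims=True)  # unit norm -> cosine similarity
query = docs[42] + 0.01 * rng.normal(size=32)        # query close to document 42
top15 = exact_top_k(query, docs, k=15)               # doc 42 should be retrieved
```

In the actual pipeline these top-15 candidates are then re-scored by the cross-encoder reranker, which keeps only the top-n (2 or 5) for the generation prompt.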

All other knobs are fixed across configurations. A simple grid search thus launches a total of 8 combinations: 2 generator models x 2 chunk sizes x 2 reranking top-n values.
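The grid above can be enumerated in a few lines; the variable names here are hypothetical, and the fixed knobs (embedding model, retrieval depth, overlap) are omitted for brevity.

```python
from itertools import product

generator_models = ["Qwen/Qwen2.5-0.5B-Instruct", "Qwen/Qwen2.5-3B-Instruct"]
chunk_sizes = [256, 128]   # tokens; overlap fixed at 32
rerank_top_n = [2, 5]      # chunks kept after cross-encoder reranking

# Cartesian product over the three swept knobs -> 2 x 2 x 2 = 8 configurations
configs = [
    {"generator": m, "chunk_size": c, "rerank_top_n": n}
    for m, c, n in product(generator_models, chunk_sizes, rerank_top_n)
]
```

Each of the 8 config dicts corresponds to one end-to-end run of the RAG pipeline.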