FiQA: RAG for Financial Opinion Q&A Chatbot
===========================================

Please check out the tutorial notebook at the link below. Right-click the GitHub link to save the file locally.

RAG for financial opinion Q&A chatbot: `View on GitHub `__.

This use case notebook features an all-self-hosted open-model workflow, with models from Hugging Face for both embedding and generation.

Task, Dataset, and Prompt
-------------------------

This tutorial demonstrates Retrieval-Augmented Generation (RAG) for building a financial opinion Q&A chatbot. It uses the FiQA dataset from the BEIR benchmark; `see its details here `__. The dataset contains financial questions and a corpus of documents for retrieval.

The prompt format includes system instructions that define the assistant as a financial advisor, along with the retrieved context and the user's query.

Model, RAG Components, and Configuration Knobs
----------------------------------------------

We compare two generator model sizes: Qwen2.5-0.5B-Instruct and Qwen2.5-3B-Instruct.

There are two chunking strategies: 256-token chunks and 128-token chunks, both with a 32-token overlap, using recursive character splitting with tiktoken encoding.

The RAG pipeline uses:

- **Embeddings**: sentence-transformers/all-MiniLM-L6-v2 with GPU acceleration.
- **Vector Store**: FAISS with GPU-based exact search, i.e., no ANN approximation.
- **Retrieval**: top-15 similarity search.
- **Reranking**: cross-encoder/ms-marco-MiniLM-L6-v2 with two different top-n values: 2 and 5.

All other knobs are fixed across all configs. In total, 8 combinations are launched with a simple grid search: 2 generator models x 2 chunk sizes x 2 reranking top-n values.
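The chunk-size/overlap scheme described above can be illustrated with a short sketch. This is a simplified stand-in, not the notebook's code: the notebook uses recursive character splitting with tiktoken token counts, while this hypothetical ``chunk_tokens`` helper just slides a fixed window over an already-tokenized list.

```python
def chunk_tokens(tokens, chunk_size, overlap):
    """Split a token list into overlapping chunks.

    Simplified stand-in for the notebook's recursive character
    splitter: a fixed window of `chunk_size` tokens, advanced by
    `chunk_size - overlap` tokens per step.
    """
    step = chunk_size - overlap
    return [tokens[i:i + chunk_size]
            for i in range(0, max(len(tokens) - overlap, 1), step)]

# A 300-token document with 256-token chunks and 32-token overlap
# yields two chunks; each chunk shares its last 32 tokens with the
# start of the next one.
chunks = chunk_tokens(list(range(300)), chunk_size=256, overlap=32)
```

Swapping ``chunk_size=256`` for ``chunk_size=128`` reproduces the second chunking strategy; the 32-token overlap is held fixed in both.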
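The 2 x 2 x 2 grid can be enumerated with a few lines of standard-library Python. The dictionary keys below are illustrative names, not the notebook's actual config schema.

```python
from itertools import product

generator_models = ["Qwen/Qwen2.5-0.5B-Instruct", "Qwen/Qwen2.5-3B-Instruct"]
chunk_sizes = [256, 128]   # tokens per chunk; the 32-token overlap stays fixed
rerank_top_ns = [2, 5]     # passages kept after cross-encoder reranking

# Illustrative config dicts; the key names are hypothetical.
configs = [
    {"generator": m, "chunk_size": c, "rerank_top_n": n}
    for m, c, n in product(generator_models, chunk_sizes, rerank_top_ns)
]
print(len(configs))  # 8 configurations to launch
```

Because every other knob is held fixed, each of these 8 configurations differs in exactly one or more of the three swept parameters, which keeps the grid-search results directly comparable.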