API: LangChain RAG Spec

RapidFire AI’s core API for defining the stages of a RAG pipeline that precede the generator is a wrapper around the corresponding LangChain APIs. This class specifies the following stages: data loading, chunking, embedding, indexing, retrieval, and reranking. Many of these stages are optional.

Some of the arguments (knobs) here can also be List valued or Range valued, depending on their data types, as explained below. Together they form the base set of knob combinations from which a config group can be produced. See also the Multi-Config Specification page.

class RFLangChainRagSpec

__init__(document_loader: BaseLoader, text_splitter: TextSplitter, embedding_cls: type[Embeddings] = None, embedding_kwargs: dict[str, Any] = None, vector_store: VectorStore = None, retriever: BaseRetriever = None, search_type: str = 'similarity', search_kwargs: dict = None, reranker_cls: type[BaseDocumentCompressor] = None, reranker_kwargs: dict[str, Any] = None, enable_gpu_search: bool = False, document_template: Callable[[Document], str] = None)

Initialize the RAG specification with document loading, chunking, embedding, indexing, retrieval, and reranking configurations.

Parameters:

document_loader (BaseLoader) – The loader for source documents from various sources (files, directories, databases, etc.). Must be a LangChain BaseLoader implementation.
text_splitter (TextSplitter) – The text splitter for chunking documents for RAG purposes. Controls chunk size, overlap, and splitting strategy. Must be a LangChain TextSplitter.
embedding_cls (type[Embeddings], optional) – Optional embedding class to convert a chunk or query into a vector. Options include HuggingFaceEmbeddings, OpenAIEmbeddings, etc. Pass the class itself, not an instance.
embedding_kwargs (dict[str, Any], optional) – Dictionary containing all parameters needed to initialize the embedding class above. Required parameters vary by embedding class. For example, HuggingFaceEmbeddings takes model_name and model_kwargs (which can carry the device), and optionally encode_kwargs.
vector_store (VectorStore, optional) – Optional vector store for storing and possibly indexing over embedding vectors. If not provided, a default FAISS flat vector store will be created automatically. Must be a LangChain VectorStore implementation.
retriever (BaseRetriever, optional) – Optional custom retriever for chunk retrieval. If not provided, a default retriever over the FAISS vector store will be created automatically using the search configuration specified below. Must be a LangChain BaseRetriever implementation.
search_type (str) –
The search algorithm type for retrieval. Must be one of the following three options. Default is "similarity".
- "similarity": Standard cosine similarity search.
- "similarity_score_threshold": Similarity search with minimum score threshold (SST).
- "mmr": Maximum Marginal Relevance (MMR) search for diversity.
search_kwargs (dict, optional) –
Additional parameters for search configuration. The keys can include the following:
- "k": Number of documents to retrieve. Default is 5.
- "filter": Optional filter criteria function for search results.
- "score_threshold": Only for SST. Minimum similarity score threshold.
- "fetch_k": Only for MMR. Number of documents to fetch before MMR reranking. Default is 20.
- "lambda_mult": Only for MMR. Diversity parameter for MMR balancing relevance vs. diversity. Default is 0.5.
reranker_cls (type[BaseDocumentCompressor], optional) – Optional reranker class for reordering retrieved chunks by relevance. Options include CrossEncoderReranker from langchain.retrievers.document_compressors. The instantiated reranker is applied to each query’s results individually. Pass the class itself, not an instance.
reranker_kwargs (dict[str, Any], optional) – Dictionary containing all parameters needed to initialize the reranker class above. Required parameters vary by reranker class. For example, CrossEncoderReranker needs model_name, model_kwargs and top_n.
enable_gpu_search (bool, optional) – If True, uses GPU-accelerated FAISS (IndexFlatL2 on GPU), which performs exact search via matrix multiplication. Otherwise, uses a CPU-based FAISS HNSW index (IndexHNSWFlat) for approximate search. GPU mode requires the faiss-gpu package and a CUDA-compatible GPU. Default is False.
document_template (Callable[[Document], str], optional) – Optional function to format chunks for display or downstream processing. It should accept a single LangChain Document object and return a formatted string. If not provided, a default template with the format "metadata:\ncontent" is used. Multiple documents are separated by double newlines.
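
To illustrate how the MMR knobs in search_kwargs above (k, fetch_k, lambda_mult) interact, here is a minimal pure-Python sketch of a Maximum Marginal Relevance selection loop. It is not RapidFire AI's or LangChain's implementation; the helper names cosine and mmr_select and the toy 2-D embeddings are hypothetical, and the scoring formula is the commonly used one (relevance weighted by lambda_mult, redundancy weighted by 1 - lambda_mult).

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_select(query_vec, doc_vecs, k=5, fetch_k=20, lambda_mult=0.5):
    """Return indices of up to k documents chosen by a simple MMR loop."""
    # Step 1: shortlist the fetch_k documents most similar to the query.
    relevance = [cosine(query_vec, d) for d in doc_vecs]
    ranked = sorted(range(len(doc_vecs)), key=lambda i: relevance[i], reverse=True)
    candidates = ranked[:fetch_k]

    # Step 2: greedily pick the candidate with the best relevance/diversity trade-off.
    selected = []
    while candidates and len(selected) < k:

        def mmr_score(i):
            # Redundancy is the highest similarity to anything already picked.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance[i] - (1 - lambda_mult) * redundancy

        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy embeddings: documents 0 and 1 are near-duplicates, document 2 differs.
docs = [[1.0, 0.0], [0.99, 0.05], [0.6, 0.8]]
query = [1.0, 0.2]

relevance_only = mmr_select(query, docs, k=2, lambda_mult=1.0)
balanced = mmr_select(query, docs, k=2, lambda_mult=0.5)
```

With lambda_mult=1.0 the loop reduces to plain similarity ranking and returns both near-duplicates; with lambda_mult=0.5 the second pick shifts to the dissimilar document.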

serialize_documents(batch_docs: list[list[Document]]) → list[str]

Serialize a batch of context document chunks into formatted strings for context injection.

Parameters:

batch_docs (list[list[Document]]) – List of Document lists, where each inner list contains Documents for one query.

Returns:

List of formatted document chunk strings, one per query, with different document chunks separated by double newlines.

Return type:

list[str]
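
The serialization described above, combined with a custom document_template, can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the Document dataclass is a hypothetical stand-in for LangChain's Document (only page_content and metadata are modeled), and corpus_id_template and serialize_batch are made-up names.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for LangChain's Document (page_content + metadata)."""

    page_content: str
    metadata: dict = field(default_factory=dict)


def corpus_id_template(doc: Document) -> str:
    """Custom template: prefix each chunk with its corpus_id metadata."""
    return f"[{doc.metadata.get('corpus_id', 'unknown')}] {doc.page_content.strip()}"


def serialize_batch(batch_docs, template):
    """Format every chunk, then join each query's chunks with a blank line."""
    return ["\n\n".join(template(doc) for doc in docs) for docs in batch_docs]


batch_docs = [
    [  # chunks retrieved for query 0
        Document("Bonds pay fixed interest.", {"corpus_id": 3}),
        Document("Stocks can pay dividends.", {"corpus_id": 8}),
    ],
    [  # chunks retrieved for query 1
        Document("Index funds hold many stocks.", {"corpus_id": 5}),
    ],
]
serialized = serialize_batch(batch_docs, corpus_id_template)
```

A function shaped like corpus_id_template is what the document_template argument of __init__ expects: one Document in, one string out.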

get_context(batch_queries: list[str], use_reranker: bool = True, serialize: bool = True) → list[str] | list[list[Document]]

Convenience function to retrieve, and optionally serialize, relevant context document chunks for a batch of queries. By default, a reranker provided in the RAG spec is applied.

Parameters:

batch_queries (list[str]) – List of query strings to retrieve context for.
use_reranker (bool, optional) – Whether to apply reranking if a reranker is provided. Default is True. Set to False to skip reranking.
serialize (bool, optional) – Whether to serialize documents into strings. If False, returns raw Document objects. Default is True.

Returns:

List of formatted context strings (if serialize=True) or list of Document lists (if serialize=False), one per query.

Return type:

list[str] | list[list[Document]]

Raises:

ValueError – If the retriever is not configured in the RAG spec (the internal method build_index() will fail).
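
The effect of the use_reranker and serialize flags on the return shape can be sketched with toy components. Everything here is hypothetical (get_context_sketch, the keyword-overlap retrieve and rerank functions, and the use of plain strings in place of Document objects); it only mirrors the control flow described above and is not the library's implementation.

```python
CORPUS = [
    "bonds pay fixed interest",
    "stocks can pay dividends",
    "index funds hold many stocks",
]


def retrieve(query):
    """Toy retriever: keep chunks sharing at least one word with the query."""
    words = set(query.lower().split())
    return [chunk for chunk in CORPUS if words & set(chunk.split())]


def rerank(query, chunks):
    """Toy reranker: keep the single chunk with the largest word overlap."""
    words = set(query.lower().split())
    ranked = sorted(chunks, key=lambda c: len(words & set(c.split())), reverse=True)
    return ranked[:1]


def get_context_sketch(batch_queries, use_reranker=True, serialize=True):
    """Retrieve per query, optionally rerank, optionally join into one string."""
    results = []
    for query in batch_queries:
        chunks = retrieve(query)
        if use_reranker:
            chunks = rerank(query, chunks)
        results.append("\n\n".join(chunks) if serialize else chunks)
    return results


queries = ["do stocks pay dividends"]
as_strings = get_context_sketch(queries)  # list[str], reranked
as_lists = get_context_sketch(queries, use_reranker=False, serialize=False)
```

In both cases the outer list has one entry per query; only the type of each entry changes.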

Example:

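# Based on the FiQA tutorial notebook
from langchain.retrievers.document_compressors import CrossEncoderReranker
from langchain_community.document_loaders import DirectoryLoader, JSONLoader
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Assumed import path for the RapidFire AI names; check your installed version.
from rapidfireai.automl import List, RFLangChainRagSpec

batch_size = 32  # illustrative value; the embedding batch size is not given here
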
rag_gpu = RFLangChainRagSpec(
    document_loader=DirectoryLoader(
        path="data/fiqa/",
        glob="corpus.jsonl",
        loader_cls=JSONLoader,
        loader_kwargs={
            "jq_schema": ".",
            "content_key": "text",
            "metadata_func": lambda record, metadata: {
                "corpus_id": int(record.get("_id"))
            },  # store the document id
            "json_lines": True,
            "text_content": False,
        },
        sample_seed=42,
    ),
    # 2 chunking strategies with different chunk sizes
    text_splitter=List(
        [
            RecursiveCharacterTextSplitter.from_tiktoken_encoder(
                encoding_name="gpt2", chunk_size=256, chunk_overlap=32
            ),
            RecursiveCharacterTextSplitter.from_tiktoken_encoder(
                encoding_name="gpt2", chunk_size=128, chunk_overlap=32
            ),
        ]
    ),
    embedding_cls=HuggingFaceEmbeddings,
    embedding_kwargs={
        "model_name": "sentence-transformers/all-MiniLM-L6-v2",
        "model_kwargs": {"device": "cuda:0"},
        "encode_kwargs": {"normalize_embeddings": True, "batch_size": batch_size},
    },
    vector_store=None,  # uses FAISS by default
    search_type="similarity",
    search_kwargs={"k": 15},
    # 2 reranking strategies with different top-n values
    reranker_cls=CrossEncoderReranker,
    reranker_kwargs={
        "model_name": "cross-encoder/ms-marco-MiniLM-L6-v2",
        "model_kwargs": {"device": "cuda:0"},
        "top_n": List([2, 5]),
    },
    enable_gpu_search=True,  # GPU-based exact search instead of ANN index
)

Notes:

One RFLangChainRagSpec object can have only one document_loader, which specifies the base data. You can, however, specify a List or Range (when applicable) for all the other values in a multi-config specification. For instance, the example above showcases two text splitters and two rerankers with different hyperparameters.
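
As a rough illustration of how List-valued knobs multiply, the sketch below enumerates the combinations implied by the example above, assuming a grid-search style expansion (a Cartesian product over every List). The dictionary keys are shorthand for the two varied knobs, not RapidFire AI argument names, and the number of configs actually produced depends on the search method chosen in the multi-config specification.

```python
from itertools import product

# Shorthand for the two List-valued knobs in the example above.
knobs = {
    "chunk_size": [256, 128],  # the two text splitters
    "top_n": [2, 5],           # List([2, 5]) in reranker_kwargs
}

# Cartesian product: one dict per combination of knob values.
names = list(knobs)
configs = [dict(zip(names, values)) for values in product(*knobs.values())]

for config in configs:
    print(config)
```

Under this assumption the two splitters and two top_n values yield four combinations.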