API: Prompt Manager and Other Eval Config Knobs

RFPromptManager

This class wraps several LangChain APIs to manage dynamic few-shot example selection. It selects examples by semantic similarity, so that each prompt is built from the examples most relevant to the input query.
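
To make "semantic similarity-based example selection" concrete, here is a minimal pure-Python sketch of the idea: embed each example and the query, score them by cosine similarity, and keep the top k. It is an illustration only, not RFPromptManager's code. The toy bag-of-words `embed` function is an assumption that stands in for a real embedding model, and `select_similar` is a hypothetical helper name.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_similar(query, examples, k=3, key="question"):
    """Return the k examples whose `key` text is most similar to the query."""
    q = embed(query)
    ranked = sorted(examples, key=lambda ex: cosine(q, embed(ex[key])), reverse=True)
    return ranked[:k]


examples = [
    {"question": "How many apples are left after eating 2 of 5?", "answer": "3"},
    {"question": "What is the capital of France?", "answer": "Paris"},
    {"question": "How many pens are left after losing 4 of 10?", "answer": "6"},
]
top = select_similar("How many cookies are left after eating 3 of 7?", examples, k=2)
print([ex["answer"] for ex in top])  # ['3', '6']
```

A real selector such as LangChain's SemanticSimilarityExampleSelector keeps the example embeddings in a vector store and runs a nearest-neighbor search instead of re-embedding every example per query.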

Each individual argument (knob) of an RFPromptManager can be List-valued or Range-valued. This is how you specify a base set of knob combinations from which a config group can be produced. See also the Multi-Config Specification page.
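
As a rough illustration of how List-valued knobs define a set of knob combinations, the sketch below expands them into their Cartesian product (a grid). This is an assumption for illustration only: plain Python lists stand in for RapidFire's List(), `expand_list_knobs` is a hypothetical helper, and Range-valued (continuous) knobs are left out because they cannot be enumerated this way. The Multi-Config Specification page describes how config groups are actually produced.

```python
from itertools import product
from typing import Any


def expand_list_knobs(knobs: dict[str, Any]) -> list[dict[str, Any]]:
    """Expand list-valued knobs into every combination (hypothetical helper).

    Plain lists stand in for RapidFire's List(); any non-list value is fixed.
    """
    names = list(knobs)
    choices = [v if isinstance(v, list) else [v] for v in knobs.values()]
    return [dict(zip(names, combo)) for combo in product(*choices)]


configs = expand_list_knobs({"k": [3, 5], "selector": ["similarity", "mmr"], "batch_size": 8})
print(len(configs))  # 4
for cfg in configs:
    print(cfg)
```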

class RFPromptManager
Parameters:
  • instructions (str, optional) – The main instructions for the prompt that guide the generator’s behavior. This sets the overall task description and role for the assistant. Either this or instructions_file_path must be provided.

  • instructions_file_path (str, optional) – Path to a file containing the instructions. Use this as an alternative to the instructions parameter to load the instructions from a file, for example when they are very long.

  • examples (list[dict[str, str]], optional) – A list of example dictionaries for few-shot learning. Each example should be a dictionary with keys matching the expected input-output format (e.g., “question” and “answer”).

  • embedding_cls (type[Embeddings], optional) – The embedding class to use for computing semantic similarity between examples and queries. Options include HuggingFaceEmbeddings and OpenAIEmbeddings. Pass the class itself, not an instance.

  • embedding_kwargs (dict[str, Any], optional) – Dictionary containing all parameters needed to initialize the embedding class above. Required parameters vary by embedding class: HuggingFaceEmbeddings needs "model_name" and "model_kwargs" (which is where the device is typically set), while OpenAIEmbeddings needs "model" and "api_key".

  • example_selector_cls (type[MaxMarginalRelevanceExampleSelector | SemanticSimilarityExampleSelector], optional) – The example selector class that determines how relevant examples are chosen for each input query. Must be one of LangChain's SemanticSimilarityExampleSelector (picks the most similar examples) or MaxMarginalRelevanceExampleSelector (picks relevant examples that are also diverse).

  • example_prompt_template (PromptTemplate, optional) – A LangChain PromptTemplate that defines how each example is formatted. It should specify input_variables and a template string whose placeholders match the keys of the example dictionaries.

  • k (int, optional) – Number of examples (the most similar, or the most relevant yet diverse, depending on example_selector_cls) to retrieve and include in the prompt for each query. Default is 3.
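
The difference between the two selector classes can be sketched with greedy maximal marginal relevance (MMR): each step picks the candidate that maximizes `lambda_mult * relevance - (1 - lambda_mult) * redundancy`, where redundancy is the similarity to the closest already-picked example. This is a minimal, hedged sketch of the technique, not LangChain's implementation; `mmr_select` is a hypothetical name, `lambda_mult` is borrowed from LangChain's vector-store MMR search, and the 2-D vectors are made-up stand-ins for embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(query, candidates, k=3, lambda_mult=0.5):
    """Greedy MMR: indices of k candidates balancing relevance and diversity."""
    selected = []
    remaining = list(range(len(candidates)))

    def score(i):
        relevance = cosine(query, candidates[i])
        # Penalize similarity to the closest already-selected candidate.
        redundancy = max((cosine(candidates[i], candidates[j]) for j in selected), default=0.0)
        return lambda_mult * relevance - (1 - lambda_mult) * redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


query = [1.0, 0.0]
candidates = [[1.0, 0.1], [1.0, 0.12], [1.0, -0.8]]  # 0 and 1 are near-duplicates
by_similarity = sorted(range(3), key=lambda i: cosine(query, candidates[i]), reverse=True)[:2]
print(by_similarity)                       # [0, 1]
print(mmr_select(query, candidates, k=2))  # [0, 2]
```

Pure similarity returns the two near-duplicates, while MMR trades a little relevance for a second example that adds new information. Setting `lambda_mult=1.0` reduces MMR to plain similarity ranking.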

Example:

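# Based on the GSM8K chatbot tutorial notebook. Define INSTRUCTIONS, examples,
# and OPENAI_API_KEY beforehand, and import RFPromptManager from RapidFire AI
# as shown in that notebook.
from langchain_core.example_selectors import SemanticSimilarityExampleSelector
from langchain_core.prompts import PromptTemplate
from langchain_openai import OpenAIEmbeddings

fewshot_prompt_manager = RFPromptManager(
    instructions=INSTRUCTIONS,
    examples=examples,  # list of {"question": ..., "answer": ...} dicts
    embedding_cls=OpenAIEmbeddings,  # pass the class itself, not an instance
    embedding_kwargs={"model": "text-embedding-3-small", "api_key": OPENAI_API_KEY},
    example_selector_cls=SemanticSimilarityExampleSelector,
    example_prompt_template=PromptTemplate(
        input_variables=["question", "answer"],
        template="Question: {question}\nAnswer: {answer}",
    ),
    k=5,  # include the 5 most similar examples for each query
)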
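
The example_prompt_template above controls how each selected example is rendered. PromptTemplate's default f-string format fills placeholders much like Python's `str.format`, so the sketch below uses that to show the effect. The overall layout (instructions, then the formatted examples, then the new question) and the helper name `build_fewshot_prompt` are assumptions for illustration, not necessarily how RFPromptManager assembles its final prompt.

```python
EXAMPLE_TEMPLATE = "Question: {question}\nAnswer: {answer}"


def build_fewshot_prompt(instructions, selected_examples, query, template=EXAMPLE_TEMPLATE):
    """Hypothetical layout: instructions, formatted examples, then the new query."""
    formatted = [template.format(**ex) for ex in selected_examples]
    query_block = f"Question: {query}\nAnswer:"
    return "\n\n".join([instructions, *formatted, query_block])


prompt = build_fewshot_prompt(
    "Solve the math word problem.",
    [{"question": "What is 2 + 3?", "answer": "5"}],
    "What is 4 + 4?",
)
print(prompt)
```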

Other Eval Config Knobs

Finally, apart from the Generator, the following knobs can also be included in your eval config dictionary. Each of them can also be a knob set generator, namely List() for discrete knobs and Range() for continuous knobs.

For more details on the four user-given functions listed below, see the API: User-Provided Functions for Run Evals page.

For more details on the semantics of the online aggregation strategy arguments listed below, see the Online Aggregation for Evals page.

batch_size (int)

Number of examples to process in one batch, for GPU efficiency (if applicable).

preprocess_fn (Callable)

User-given function to preprocess a batch of examples. The system passes in the eval config's RagSpec and PromptManager.

postprocess_fn (Callable, optional)

User-given function to postprocess a batch of examples and their generations. The system passes in a single eval config (cfg).

compute_metrics_fn (Callable)

User-given evaluation function that computes eval metrics for each batch.

accumulate_metrics_fn (Callable, optional)

User-given evaluation function that aggregates algebraic eval metrics across batches. If it is not given, all metrics returned by compute_metrics_fn are assumed to be distributive.
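
In the usual data-cube terminology, a distributive metric (a sum, count, min, or max) can be combined directly from per-batch values, while an algebraic metric (such as a mean or accuracy) must be derived from a fixed number of distributive parts, such as a sum and a count. The sketch below shows why the distinction matters; the per-batch dictionary shape is hypothetical and is not the signature RapidFire expects from compute_metrics_fn or accumulate_metrics_fn.

```python
# Hypothetical per-batch output: number of correct answers and batch size.
batch_metrics = [
    {"correct": 9, "count": 10},
    {"correct": 1, "count": 2},
]

# Distributive parts (sums, counts) combine across batches by simple addition.
total_correct = sum(m["correct"] for m in batch_metrics)
total_count = sum(m["count"] for m in batch_metrics)

# Accuracy is algebraic: derive it from the combined distributive parts.
accuracy = total_correct / total_count  # 10 / 12, about 0.833

# Averaging the per-batch accuracies instead is wrong when batch sizes differ.
naive = sum(m["correct"] / m["count"] for m in batch_metrics) / len(batch_metrics)  # 0.7

print(accuracy, naive)
```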

online_strategy_kwargs (dict[str, Any], optional)

Parameters for the online aggregation strategy for evals. The dictionary accepts the following keys:

  • "strategy_name" (str) - Must be "normal", "wilson", or "hoeffding".

  • "confidence_level" (float) - Confidence level for the confidence intervals on metrics. Must be in (0, 1). Default is 0.95 (95%).

  • "use_fpc" (bool) - Whether to apply finite population correction. Default is True.
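
The three strategy names correspond to standard ways of putting a confidence interval around a running metric estimate. The following is a hedged sketch using the textbook formulas for a proportion such as accuracy, after n of N rows have been evaluated. It is not RapidFire's code: the function names are hypothetical, and applying the finite population correction only to the normal interval is a simplification made here.

```python
import math
from statistics import NormalDist
from typing import Optional, Tuple

Interval = Tuple[float, float]


def z_value(confidence: float) -> float:
    """Two-sided standard normal critical value, about 1.96 for 0.95."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def normal_interval(p_hat: float, n: int, confidence: float = 0.95,
                    population: Optional[int] = None) -> Interval:
    """Normal-approximation interval for a proportion, optionally with FPC."""
    half = z_value(confidence) * math.sqrt(p_hat * (1 - p_hat) / n)
    if population is not None and population > 1:
        # Finite population correction: shrinks to zero once all N rows are seen.
        half *= math.sqrt((population - n) / (population - 1))
    return p_hat - half, p_hat + half


def wilson_interval(p_hat: float, n: int, confidence: float = 0.95) -> Interval:
    """Wilson score interval for a proportion; better behaved near 0 or 1."""
    z = z_value(confidence)
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return center - half, center + half


def hoeffding_interval(mean: float, n: int, confidence: float = 0.95,
                       value_range: float = 1.0) -> Interval:
    """Distribution-free interval for a mean of values bounded in a range of width value_range."""
    delta = 1 - confidence
    half = value_range * math.sqrt(math.log(2 / delta) / (2 * n))
    return mean - half, mean + half


# Example: 50% accuracy observed on the first 100 of 1,000 evaluation rows.
print(normal_interval(0.5, 100))
print(normal_interval(0.5, 100, population=1000))
print(wilson_interval(0.5, 100))
print(hoeffding_interval(0.5, 100))
```

Hoeffding assumes nothing beyond bounded metric values, so its interval is noticeably wider than the normal one at the same n. Wilson is usually preferred over the plain normal interval for proportions close to 0 or 1.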