GSM8K: Context Engineering for Math Reasoning

Please check out the tutorial notebook on the link below. Right click on the GitHub link to save that file locally.

Context engineering with few-shot prompting for GSM8K math reasoning: View on GitHub.

This use case notebook features an hybrid workflow spanning a self-hosted open LLM for embeddings and an Open AI call for generation.

Task, Dataset, and Prompt

This tutorial shows few-shot prompting as part of context engineering for solving grade school math word problems.

It uses the “GSM8K” dataset; see its details here. The dataset contains grade school math word problems requiring multi-step reasoning.

The prompt format includes system instructions defining the assistant as a math problem solver, semantically selected few-shot examples, and the target question to solve.

Model, Few-Shot Selection, and Configuration Knobs

We compare 2 generator models via OpenAI API: gpt-5-mini and gpt-4o.

There are 2 different reasoning effort levels for each model: medium and high.

The few-shot prompting pipeline uses:

  • Example Selection: Semantic similarity-based selection using sentence-transformers/all-MiniLM-L6-v2 embeddings.

  • Example Pool: 10 hand-crafted examples covering diverse problem types.

  • Few-Shot k Values: 2 different values: 3 and 5 examples per prompt.

  • Prompt Template: Chain-of-thought style with step-by-step reasoning and final answer after “####”.

All other knobs are fixed across all configs. Thus, there are a total of 8 combinations launched with a simple grid search: 2 generator models x 2 reasoning effort levels x 2 few-shot k values.