GSM8K: Context Engineering for Math Reasoning
Check out the tutorial notebook at the link below; right-click the GitHub link to save the file locally.
Context engineering with few-shot prompting for GSM8K math reasoning: View on GitHub.
This use case notebook features a hybrid workflow spanning a self-hosted open LLM for embeddings and an OpenAI call for generation.
Task, Dataset, and Prompt
This tutorial shows few-shot prompting as part of context engineering for solving grade school math word problems.
It uses the GSM8K dataset (see its details here), which contains grade school math word problems requiring multi-step reasoning.
The prompt format includes system instructions defining the assistant as a math problem solver, semantically selected few-shot examples, and the target question to solve.
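As a rough sketch of that prompt layout (the helper name and example fields here are hypothetical, not the notebook's actual code), the pieces can be assembled like this:

```python
def build_prompt(question: str, examples: list[dict]) -> str:
    """Assemble a few-shot prompt: system instructions, selected examples
    in chain-of-thought format, then the target question."""
    system = (
        "You are a math problem solver. Reason step by step and "
        "write the final answer after '####'."
    )
    # Each example shows worked reasoning followed by '#### <answer>'.
    shots = "\n\n".join(
        f"Q: {ex['question']}\nA: {ex['reasoning']}\n#### {ex['answer']}"
        for ex in examples
    )
    return f"{system}\n\n{shots}\n\nQ: {question}\nA:"

examples = [
    {"question": "What is 2 + 3?", "reasoning": "2 plus 3 is 5.", "answer": "5"},
]
print(build_prompt("What is 4 + 4?", examples))
```

The `####` delimiter makes the final answer easy to extract from the model's chain-of-thought output with a simple string split.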
Model, Few-Shot Selection, and Configuration Knobs
We compare two generator models via the OpenAI API: gpt-5-mini and gpt-4o.
Each model is run at two reasoning effort levels: medium and high.
The few-shot prompting pipeline uses:
Example Selection: Semantic similarity-based selection using sentence-transformers/all-MiniLM-L6-v2 embeddings.
Example Pool: 10 hand-crafted examples covering diverse problem types.
Few-Shot k Values: two settings, 3 and 5 examples per prompt.
Prompt Template: Chain-of-thought style with step-by-step reasoning and final answer after “####”.
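The example-selection step above can be sketched with plain cosine similarity. This is a minimal illustration with toy 3-d vectors standing in for the 384-d all-MiniLM-L6-v2 embeddings; the function and field names are hypothetical, and in the real pipeline the embeddings would come from sentence-transformers:

```python
import math

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def select_examples(query_emb: list[float], pool: list[dict], k: int) -> list[dict]:
    """Return the k pool examples whose embeddings are most similar to the query."""
    ranked = sorted(pool, key=lambda ex: cosine(query_emb, ex["embedding"]), reverse=True)
    return ranked[:k]

# Toy embeddings; real vectors would be produced by the embedding model.
pool = [
    {"id": "rate",    "embedding": [1.0, 0.0, 0.0]},
    {"id": "money",   "embedding": [0.0, 1.0, 0.0]},
    {"id": "mixture", "embedding": [0.7, 0.7, 0.0]},
]
picked = select_examples([0.9, 0.1, 0.0], pool, k=2)
print([ex["id"] for ex in picked])  # → ['rate', 'mixture']
```

Selecting by embedding similarity, rather than taking a fixed set of shots, biases the prompt toward worked examples whose problem structure resembles the target question.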
All other knobs are fixed across configs, yielding a total of 8 combinations launched as a simple grid search: 2 generator models × 2 reasoning effort levels × 2 few-shot k values.
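The grid of 8 configurations can be enumerated with a Cartesian product; the dictionary keys below are illustrative, not the notebook's config schema:

```python
from itertools import product

models = ["gpt-5-mini", "gpt-4o"]
efforts = ["medium", "high"]
ks = [3, 5]

# Cartesian product of the three varied knobs: 2 x 2 x 2 = 8 configs.
configs = [
    {"model": m, "reasoning_effort": e, "num_fewshot": k}
    for m, e, k in product(models, efforts, ks)
]
print(len(configs))  # → 8
```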