Overview

Automatic prompt optimization (APO) is a family of techniques for automatically refining prompts to improve the task performance of AI language models. Dozens of APO algorithms exist, but most of them follow five general steps (Ramnath et al., 2025):

  1. Seed Prompt Initialization - Start from manually created prompts or from instructions induced via LLMs
  2. Candidate Prompt Generation - Generate new instruction prompts based on the previous generation of prompts
  3. Inference Evaluation & Feedback - Evaluate the performance of the new prompts on a validation set and provide feedback to the APO algorithm
  4. Filter & Retain Promising Prompts - Select the prompts that will seed the next generation
  5. Repeat steps 2-4 until the exit criteria are met
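The five steps above can be sketched as a single optimization loop. This is an illustrative outline, not any package's actual API: score and generate_candidates are hypothetical placeholders for the LLM calls a real implementation would make.

```python
def score(prompt, validation_set):
    # Placeholder metric: fraction of task keywords that appear in the prompt.
    # A real implementation would run inference and compare outputs to labels.
    return sum(kw in prompt for kw, _ in validation_set) / len(validation_set)

def generate_candidates(parents):
    # Placeholder mutation: append a task keyword to each parent prompt.
    # A real implementation would ask an LLM to rewrite the parents.
    return [p + " " + kw for p in parents for kw in ("translate", "summarize")]

def optimize(seed_prompts, validation_set, threshold=0.9, max_iters=10, k=2):
    population = list(seed_prompts)                              # Step 1: seed initialization
    best = max(((p, score(p, validation_set)) for p in population),
               key=lambda t: t[1])
    for _ in range(max_iters):                                   # Step 5: repeat
        candidates = generate_candidates(population)             # Step 2: candidate generation
        scored = sorted(((p, score(p, validation_set)) for p in candidates),
                        key=lambda t: t[1], reverse=True)        # Step 3: evaluation
        population = [p for p, _ in scored[:k]]                  # Step 4: filter & retain
        if scored[0][1] > best[1]:
            best = scored[0]
        if best[1] >= threshold:                                 # exit criteria
            break
    return best
```

The individual algorithms below differ mainly in how they implement the seed, generate, and filter steps of this loop.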

Each algorithm implements these five steps differently. For example, while most algorithms use a user-provided seed prompt in the first step, APE generates seed prompts by inferring instructions from task input-output pairs. This makes it useful for instances where the task may be unknown or hard to describe, but it struggles when there are non-obvious conditions or constraints on the prompt output.
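Instruction induction of this kind is typically done with a meta-prompt that shows the LLM the input-output pairs and asks it to guess the instruction. A minimal sketch, with template wording that paraphrases APE's idea rather than quoting its exact prompt:

```python
def induction_prompt(pairs):
    """Build a meta-prompt asking an LLM to infer the instruction
    that maps the given inputs to the given outputs."""
    demos = "\n".join(f"Input: {x}\nOutput: {y}" for x, y in pairs)
    return (
        "I gave a friend an instruction. Based on the instruction, "
        "they produced the following input-output pairs:\n\n"
        f"{demos}\n\nThe instruction was:"
    )
```

The LLM's completion of this meta-prompt becomes a seed prompt for the optimization loop.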

As another example, while some algorithms like APE and OPRO use random input-output pairs from the validation set to generate new prompts, algorithms like PromptAgent and ProTeGi sample from the input-output pairs that the prompt failed on when generating new prompts. This means the prompts from these algorithms should get progressively better with each iteration as they learn from the mistakes they made.
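The failure-driven sampling used by ProTeGi and PromptAgent can be sketched as below. The function names are illustrative, and run_model stands in for an LLM inference call:

```python
import random

def collect_errors(prompt, validation_set, run_model, sample_size=3, rng_seed=0):
    """Collect (input, expected, predicted) triples the current prompt got wrong,
    then sample a handful to feed into feedback generation."""
    errors = []
    for x, expected in validation_set:
        predicted = run_model(prompt, x)
        if predicted != expected:
            errors.append((x, expected, predicted))
    random.Random(rng_seed).shuffle(errors)
    return errors[:sample_size]
```

A feedback-generation step would then show these failures to an LLM and ask how the prompt should change to fix them.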

The next section covers these differences in more detail for the algorithms implemented in this package.

Comparison of Algorithms

Seed Prompt Initialization
  • APE: Generate multiple seed prompts using samples from the validation set
  • OPRO: Use a user-provided seed prompt
  • ProTeGi: Use a user-provided seed prompt and collect inference errors
  • PromptAgent: Use a user-provided seed prompt and collect inference errors

Candidate Prompt Generation
  • APE: Generate variations of previous prompts
  • OPRO: Generate new prompts using scored prompt candidates and a random sample from the validation set; scored prompts are sorted by score to demonstrate a prompt trajectory
  • ProTeGi: Generate error feedback, then generate new prompts using that feedback
  • PromptAgent: Generate error feedback, get the prompt trajectory along the branch, then generate new prompts using the trajectory and feedback

Inference Evaluation & Feedback
  • APE: Score the new prompts against the validation set
  • OPRO: Score the new prompts against the validation set
  • ProTeGi: Score the new prompts against the validation set and collect inference errors
  • PromptAgent: Score the new prompts against the validation set and collect inference errors

Filter & Retain Promising Prompts
  • APE: Select and keep the top k_percent of prompts for the next generation
  • OPRO: Keep all prompts for the next generation
  • ProTeGi: Beam search keeps only the best prompt; greedy search keeps all prompts
  • PromptAgent: Beam search keeps the best prompt from each branch; greedy search keeps all prompts

Exit Criteria
  • All four algorithms: stop when the score exceeds the score threshold or the maximum number of iterations is reached
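The filtering strategies in the comparison above can be contrasted in a few lines. This is a hedged sketch with illustrative names; scored is a list of (prompt, score) pairs and branches is a list of such lists, one per search branch:

```python
def filter_top_percent(scored, k_percent):
    # APE: keep the top k_percent of candidates for the next generation.
    k = max(1, int(len(scored) * k_percent))
    return sorted(scored, key=lambda t: t[1], reverse=True)[:k]

def filter_beam(branches):
    # ProTeGi / PromptAgent beam search: keep the best prompt from each branch.
    return [max(branch, key=lambda t: t[1]) for branch in branches]

def filter_greedy(scored):
    # Greedy variants (and OPRO): carry all prompts into the next generation.
    return scored
```

The trade-off is breadth versus cost: keeping everything explores more of the prompt space but multiplies evaluation calls, while beam or top-k filtering bounds the population size at each iteration.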