Autonomous Prompt Refinement for Discrete, Rule-Governed Tasks
Faculty Mentor
Sanmeet Kaur
Presentation Type
Poster
Start Date
4-14-2026 2:00 PM
End Date
4-14-2026 4:00 PM
Location
PUB NCR
Primary Discipline of Presentation
Computer Science
Abstract
Prompt engineering has emerged as an important mechanism for improving large language model (LLM) behavior. While frontier LLMs demonstrate strong performance on natural language tasks, prior work has identified persistent limitations in accuracy and consistency on tasks with strict output requirements. This has motivated recent, popular approaches that hybridize LLM systems by connecting them to external solvers or supplying structured feedback. However, these systems do not quantify the performance ceilings, or the extent of improvement possible, under fully autonomous prompt refinement. This project systematically evaluates this performance in contemporary LLMs using three prompt refinement paradigms across six rule-governed task families drawn from canonical problem classes prominent in the literature. A DSPy-based refiner iteratively proposes and evaluates rewritten policies; a population-based evolutionary refiner searches over prompt variants using mutation and selection; and a deterministic heuristic refiner revises prompts by applying a fixed set of rule-based edits. The task families include two from the domain of propositional logic, two representing constraint satisfaction problems, and two representing syntax-constrained generation, reflecting widely studied benchmarks in symbolic reasoning and formal verification. Each task family is evaluated using programmatic validators that enforce exact output contracts. The refiners operate under identical compute budgets, and fitness is evaluated as scalar pass/fail accuracy. By isolating prompt refinement as a constrained optimization problem, this work quantifies how much improvement is achievable through autonomous prompt search alone and identifies conditions under which refinement meaningfully compensates for model limitations. The results provide empirical bounds on prompt-based adaptation absent symbolic augmentation.
Recommended Citation
Locke, Robert, "Autonomous Prompt Refinement for Discrete, Rule-Governed Tasks" (2026). 2026 Symposium. 30.
https://dc.ewu.edu/srcw_2026/ps_2026/p3_2026/30
Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.