Benchmarking and Improving Rust Code Generation for Windows API Tasks Using Verified Feedback

Faculty Mentor

Sanmeet Kaur

Presentation Type

Poster

Start Date

4-14-2026 11:30 AM

End Date

4-14-2026 1:30 PM

Location

PUB NCR

Primary Discipline of Presentation

Computer Science

Abstract

Large Language Models (LLMs) have made significant advances in code generation but still struggle to generate Rust code that compiles and passes tests for low-resource systems tasks, such as Windows API programming. This project evaluates whether unit-test-guided supervision can improve the Rust code generation of an open-weight LLM, with a focus on low-resource tasks. Rust is particularly well suited to execution-based evaluation because the compiler, borrow checker, and linters all provide strong signals about type safety, lifetime validity, and idiomatic correctness before runtime. We construct a benchmark of Rust problems with automated tests covering single-operation Windows API tasks, multi-API tasks that require correct control flow and memory management, and general Rust problem solving drawn from established execution-based evaluation benchmarks to assess generalization. Baseline performance will be measured using compile success, lint thresholds, and test pass rates (e.g., pass@k). An agent with documentation access and iterative compile/test feedback generates a verified subset of correct solutions, which is used to fine-tune the base model. We then re-evaluate on a held-out split using the same metrics to determine whether verified supervision improves correctness on low-resource Rust tasks without reducing performance on general Rust problems.
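The pass@k metric mentioned above is conventionally computed with the unbiased estimator from the HumanEval line of work: given n generated samples per problem, of which c pass all tests, it estimates the probability that at least one of k randomly drawn samples is correct. A minimal sketch in Rust (the abstract does not specify the project's exact implementation; this is the standard numerically stable form):

```rust
// Unbiased pass@k estimator: n samples generated, c of them correct.
// Computes 1 - C(n-c, k) / C(n, k) as a running product to avoid
// overflow in the binomial coefficients.
fn pass_at_k(n: u64, c: u64, k: u64) -> f64 {
    if n - c < k {
        // Every size-k draw must contain at least one correct sample.
        return 1.0;
    }
    let mut all_fail = 1.0_f64;
    for i in (n - c + 1)..=n {
        all_fail *= 1.0 - k as f64 / i as f64;
    }
    1.0 - all_fail
}

fn main() {
    // Hypothetical numbers: 10 samples per problem, 3 pass the tests.
    println!("pass@1 = {:.3}", pass_at_k(10, 3, 1)); // prints 0.300
    println!("pass@5 = {:.3}", pass_at_k(10, 3, 5)); // prints 0.917
}
```

The per-problem estimates are then averaged over the benchmark to report an overall pass@k.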
