Human-Readable Decompilation

Faculty Mentor

Antonio Espinoza

Presentation Type

Poster

Start Date

4-14-2026 9:00 AM

End Date

4-14-2026 11:00 AM

Location

PUB NCR

Primary Discipline of Presentation

Computer Science

Abstract

Decompilation is an essential process in reverse engineering, malware analysis, security research, software preservation, and software repair. However, existing decompilers prioritize functional correctness over readability, routinely emitting C code with excessive nesting, tangled control flow, and structures that are difficult for human analysts to follow. To address this problem, this project presents a post-processing static analysis tool that targets decompiled C output from Ghidra, improving its readability without altering program logic. Using ANTLR4, the tool constructs a concrete syntax tree from the raw decompiler output. It uses a two-pass refactoring flow that decouples pattern identification from code modification. A series of static code transformations is then applied to the abstract syntax tree to eliminate machine-generated artifacts and recover clean, idiomatic C code. The tool was evaluated against 13 binaries from the DARPA Cyber Grand Challenge dataset, applying 555 transformations in total. The results demonstrate that static analysis can simplify complex control flow while preserving program logic, reducing the cognitive load for human analysts.
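The two-pass flow described above can be illustrated with a minimal sketch. It is not the project's implementation: it assumes a toy in-memory tree in place of the ANTLR4 parse tree, and every name in it (`Stmt`, `If`, `find_nested_ifs`, `apply_merges`, `render`) is hypothetical. The one transformation shown, merging an `if` whose body is a single nested `if` into one `&&` condition, is chosen only as a plausible example of reducing nesting; the abstract does not list the tool's actual transformations.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Stmt:
    text: str  # an opaque C statement such as "f();"


@dataclass
class If:
    cond: str            # condition text, without the outer parentheses
    body: List["Node"]   # statements inside the braces (no else branch modeled)


Node = Union[Stmt, If]


def find_nested_ifs(nodes, matches=None):
    """Pass 1 (read-only): collect every `if` whose body is exactly one `if`."""
    if matches is None:
        matches = []
    for node in nodes:
        if isinstance(node, If):
            if len(node.body) == 1 and isinstance(node.body[0], If):
                matches.append(node)
            find_nested_ifs(node.body, matches)
    return matches


def apply_merges(matches):
    """Pass 2 (mutating): rewrite each recorded pair as `if (a && b)`.

    Matches were recorded outermost first, so they are processed in reverse.
    That way a chain of three or more nested ifs collapses into one.
    C's short-circuit `&&` keeps the inner condition from being evaluated
    unless the outer one holds, which matches the nested form.
    """
    for outer in reversed(matches):
        inner = outer.body[0]
        # Parenthesize both sides so operator precedence cannot change meaning.
        outer.cond = f"({outer.cond}) && ({inner.cond})"
        outer.body = inner.body
    return len(matches)


def render(nodes, indent=0):
    """Pretty-print the toy tree back to C-like source lines."""
    pad = "    " * indent
    lines = []
    for node in nodes:
        if isinstance(node, If):
            lines.append(f"{pad}if ({node.cond}) {{")
            lines.extend(render(node.body, indent + 1))
            lines.append(f"{pad}}}")
        else:
            lines.append(pad + node.text)
    return lines


if __name__ == "__main__":
    tree = [If("a > 0", [If("b != 0", [Stmt("f();")])]), Stmt("return 0;")]
    found = find_nested_ifs(tree)   # identification only, tree untouched
    apply_merges(found)             # modification driven by recorded matches
    print("\n".join(render(tree)))
```

Keeping identification separate from modification means the first pass can walk the tree without worrying about nodes changing underneath it, and the list of matches doubles as a count of applied transformations.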

