Evaluating Small, Task-Specific LLMs for Reconnaissance in IoT Penetration Testing
Faculty Mentor
Sanmeet Kaur
Presentation Type
Oral Presentation
Start Date
4-14-2026 9:40 AM
End Date
4-14-2026 10:00 AM
Location
PUB 321
Primary Discipline of Presentation
Cybersecurity
Abstract
Vulnerability identification during penetration testing traditionally relies on rigid string-matching to map network scan data to Common Platform Enumeration (CPE) identifiers and Common Vulnerabilities and Exposures (CVEs). This approach frequently fails on physical Internet of Things (IoT) devices, which produce non-standard, irregular service banners that resist deterministic parsing. Large Language Models (LLMs) can reason through these fuzzy associations, but deploying cloud-based models introduces cost, latency, and operational security concerns, particularly when processing reconnaissance data from live networks. This research investigates whether small, locally hosted open-source LLMs running on consumer-grade hardware can effectively perform this task. I present a modular three-stage pipeline that processes scan data by isolating device fingerprinting, CPE production, and CVE association, and compare LLM performance against traditional regex-based baselines. Using a ground-truth dataset built from physical commercial IoT devices, I measure accuracy, precision, recall, and F1 scores to quantify where these models succeed, where they hallucinate, and how model size affects reliability. Preliminary results identify specific subtasks where small local models match or exceed baseline methods, as well as failure modes that highlight current limitations in LLM-driven vulnerability reasoning.
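The evaluation described above can be illustrated with a minimal sketch of set-based CVE scoring. This is a hypothetical example, not the study's actual scoring code: it assumes predicted and ground-truth CVEs are compared as unordered sets per device, so hallucinated CVE IDs lower precision and missed CVEs lower recall.

```python
# Hypothetical sketch: scoring one device's predicted CVE list against
# a ground-truth CVE list, as the abstract's metrics suggest.
# Set-based matching is an assumption; the study's exact protocol may differ.

def score_cves(predicted, ground_truth):
    """Return (precision, recall, f1) for predicted vs. true CVE IDs."""
    pred, truth = set(predicted), set(ground_truth)
    tp = len(pred & truth)                       # correctly identified CVEs
    precision = tp / len(pred) if pred else 0.0  # hallucinations reduce this
    recall = tp / len(truth) if truth else 0.0   # misses reduce this
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# One hallucinated CVE ("CVE-2099-0001") among three predictions:
p, r, f1 = score_cves(
    ["CVE-2017-17215", "CVE-2021-36260", "CVE-2099-0001"],
    ["CVE-2017-17215", "CVE-2021-36260"],
)
# precision = 2/3, recall = 1.0, f1 = 0.8
```

Under this scheme, a model that pads its answer with plausible-looking but incorrect CVE IDs is penalized in precision even when recall is perfect, which is one way the abstract's "hallucination" failure mode becomes measurable.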
Recommended Citation
Davisson, Christopher, "Evaluating Small, Task-Specific LLMs for Reconnaissance in IoT Penetration Testing" (2026). 2026 Symposium. 3.
https://dc.ewu.edu/srcw_2026/op_2026/o3_2026/3
Creative Commons License

This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.