Building an LLM Agent to empower Malware

Type: MA
Status: Open
Published: 31 March 2026
Supervisors: Francisco Enguix, Alberto Huertas
Email: enguix@ifi.uzh.ch, alberto.huertas@um.es

In autonomous reinforcement learning (RL), high-level decisions are often not sufficient on their own. Once a promising direction has been identified, procedures may need to be adapted to the current context, previous outcomes, and constraints of the environment.

This thesis investigates how a large language model (LLM) agent can support context-aware adaptation of procedures (RL actions) in a Cybersecurity Offensive AI system. The goal is to study how procedural variants of scripts can be generated or refined in a structured way so that they better match the current experimental situation. The thesis is part of a broader Cybersecurity Offensive AI research line at the intersection of intelligent agents, multi-agent systems, reinforcement learning, large language models, and controlled cyber experimentation. Strong results may contribute to a scientific publication.
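To make the idea of context-aware procedure refinement concrete, the sketch below shows one possible shape for it. Everything here is an illustrative assumption, not part of the thesis specification: the `Context` record, the `BASE_SCAN` template, and the `refine_procedure` function are hypothetical names, and a small rule-based stub stands in for a real LLM call so the example stays self-contained.

```python
from dataclasses import dataclass

@dataclass
class Context:
    """Experimental context observed before refining a procedure."""
    target_os: str
    last_outcome: str  # e.g. "success", "timeout", "blocked"

# Hypothetical base procedure: an RL action realised as a script template.
BASE_SCAN = "nmap -sS {host}"

def refine_procedure(template: str, ctx: Context, llm=None) -> str:
    """Ask an LLM (stubbed here) for a context-aware variant of a procedure.

    `llm` is any callable mapping a prompt string to a refined procedure
    string; the default stub mimics the kind of adaptation an LLM agent
    might propose, keyed on the previous outcome.
    """
    if llm is None:
        def llm(prompt: str) -> str:
            if "blocked" in prompt:
                # Evade detection: slower timing, fragmented packets.
                return template.replace("-sS", "-sS -T2 -f")
            if "timeout" in prompt:
                # Fail faster on unresponsive hosts.
                return template.replace("-sS", "-sS --max-retries 1")
            return template  # nothing to adapt
    prompt = (
        f"Procedure: {template}\n"
        f"Context: os={ctx.target_os}, last_outcome={ctx.last_outcome}\n"
        "Propose an adapted variant of the procedure."
    )
    return llm(prompt)

variant = refine_procedure(BASE_SCAN, Context("linux", "blocked"))
print(variant)  # → nmap -sS -T2 -f {host}
```

In a full system the stub would be replaced by a real model call, and the refined procedure would be validated in a controlled environment before execution; the structured prompt (procedure plus context) is the piece the thesis would study.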

Sources:

[1]    X. Shen et al., ‘PentestAgent: Incorporating LLM Agents to Automated Penetration Testing’, in Proceedings of the 20th ACM Asia Conference on Computer and Communications Security, 2025, pp. 375–391.
[2]    J. Palanca, A. Terrasa, V. Julian, and C. Carrascosa, ‘SPADE 3: Supporting the New Generation of Multi-Agent Systems’, IEEE Access, vol. 8, pp. 182537–182549, 2020.
[3]    H. van Hasselt, A. Guez, and D. Silver, ‘Deep reinforcement learning with double Q-Learning’, in Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, 2016, pp. 2094–2100.

Prerequisites

  • Good Python programming skills.
  • Prior coursework or experience in machine learning and NLP.
  • Familiarity with large language model (LLM) workflows.
  • Basic understanding of reinforcement learning (RL).
  • Comfort with experimentation and result interpretation.