Deep Q-Networks (DQN) or Proximal Policy Optimization (PPO) algorithms are commonly used to learn a policy that maximizes cumulative reward over an episode (e.g., a timed penetration test). The "deep" aspect, a neural network that approximates the value function or the policy, allows the agent to abstract high-level strategies from raw network data, such as recognizing that discovering a web server often precedes SQL injection attempts.
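To make "a policy that maximizes cumulative reward over an episode" concrete, here is a minimal sketch using tabular Q-learning. This is the update rule that DQN approximates with a neural network in place of the table; PPO instead optimizes the policy directly and is not shown. The three-state attack graph, the action names, the rewards, and the 10-step episode limit are all hypothetical assumptions for illustration, not autopentest-drl's actual environment.

```python
import random

# Hypothetical toy attack graph: state -> {action: (next_state, reward)}.
# States, actions and rewards are illustrative assumptions only.
TRANSITIONS = {
    "start": {"port_scan": ("web_server_found", -1.0),
              "sql_injection": ("start", -5.0)},  # blind injection fails
    "web_server_found": {"port_scan": ("web_server_found", -1.0),
                         "sql_injection": ("db_compromised", 10.0)},
}
TERMINAL = {"db_compromised"}
MAX_STEPS = 10  # episode time limit, standing in for a timed penetration test


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn Q(s, a) with epsilon-greedy tabular Q-learning."""
    rng = random.Random(seed)
    q = {s: {a: 0.0 for a in acts} for s, acts in TRANSITIONS.items()}
    for _ in range(episodes):
        state = "start"
        for _ in range(MAX_STEPS):
            actions = list(q[state])
            # Explore with probability epsilon, otherwise act greedily.
            if rng.random() < epsilon:
                action = rng.choice(actions)
            else:
                action = max(actions, key=q[state].get)
            next_state, reward = TRANSITIONS[state][action]
            # Bootstrapped target; terminal states have zero future value.
            future = 0.0 if next_state in TERMINAL else max(q[next_state].values())
            q[state][action] += alpha * (reward + gamma * future - q[state][action])
            state = next_state
            if state in TERMINAL:
                break
    return q


def greedy_policy(q):
    """Pick the highest-valued action in each state."""
    return {s: max(acts, key=acts.get) for s, acts in q.items()}


if __name__ == "__main__":
    print(greedy_policy(q_learning()))
```

The learned policy scans first and only then attempts SQL injection, because the discounted return of "scan, then inject" beats injecting blindly. This mirrors, in miniature, the kind of ordering the paragraph above describes.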
autopentest-drl includes a topology generator for training the agent on varied network layouts, which improves its ability to handle complex environments.
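The text does not describe how the topology generator works, so the following is only a hypothetical sketch of the general idea. It samples a random but always-connected set of subnets, fills each with hosts running a random subset of services, and is seeded so that training runs are reproducible. The `generate_topology` function, the `Host`/`Topology` classes, and the service list are illustrative assumptions, not autopentest-drl's API.

```python
import random
from dataclasses import dataclass, field

# Hypothetical service catalogue used to populate hosts.
SERVICES = ["ssh", "http", "mysql", "smb", "ftp"]


@dataclass
class Host:
    name: str
    subnet: int
    services: list


@dataclass
class Topology:
    hosts: list = field(default_factory=list)
    # Undirected links between subnet indices, stored as (lower, higher).
    links: set = field(default_factory=set)


def generate_topology(num_subnets, max_hosts_per_subnet, seed=None):
    """Return a random, connected subnet layout populated with hosts."""
    rng = random.Random(seed)
    topo = Topology()
    for subnet in range(num_subnets):
        for i in range(rng.randint(1, max_hosts_per_subnet)):
            services = rng.sample(SERVICES, k=rng.randint(1, 3))
            topo.hosts.append(Host(f"s{subnet}-h{i}", subnet, services))
        if subnet > 0:
            # Attach each new subnet to a random earlier one; this builds a
            # random tree, so every subnet is reachable from subnet 0.
            topo.links.add((rng.randrange(subnet), subnet))
    # One optional cross-link varies the number of possible attack paths.
    if num_subnets > 2:
        a, b = sorted(rng.sample(range(num_subnets), 2))
        topo.links.add((a, b))
    return topo


if __name__ == "__main__":
    # A training loop could draw a fresh layout for every episode.
    for episode in range(3):
        topo = generate_topology(num_subnets=4, max_hosts_per_subnet=3, seed=episode)
        print(episode, len(topo.hosts), sorted(topo.links))
```

Building the subnet graph as a random tree before adding extra links is one simple way to guarantee that a path to every host exists. That matters if an episode is only solvable when the target is reachable.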
Once the DRL engine identifies a path, the framework uses Metasploit (via the pymetasploit3 library) …