Define reward:
-
Gathering new information:
- Learning about a new machine: 5 reward
- Checking an exploit and learning that it is valuable: 1 reward
- Gathering loot: 1 reward (5 reward for password hashes)
=> These rewards are given only for new information. "Loot" means valuable documents or information that an attacker can obtain from a compromised system.
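The "new information only" rule can be sketched as a small tracker that pays out a reward the first time something is learned and nothing afterwards. This is a minimal sketch under assumptions. The environment is assumed to report each discovery as a host name, a (host, exploit) pair, or a (host, item) pair. `KnowledgeTracker` and its methods are hypothetical names, not part of any pentest library. Only the reward values come from the list above.

```python
from dataclasses import dataclass, field

# Reward values from the list above.
NEW_MACHINE_REWARD = 5
VALUABLE_EXPLOIT_REWARD = 1
LOOT_REWARD = 1
PASSWORD_HASH_REWARD = 5


@dataclass
class KnowledgeTracker:
    """Hypothetical helper that remembers what the agent already knows."""
    machines: set = field(default_factory=set)
    exploits: set = field(default_factory=set)
    loot: set = field(default_factory=set)

    def discover_machine(self, host: str) -> int:
        # A machine earns a reward only the first time it is seen.
        if host in self.machines:
            return 0
        self.machines.add(host)
        return NEW_MACHINE_REWARD

    def confirm_exploit(self, host: str, exploit: str, valuable: bool) -> int:
        # Reward only exploits the check shows to be valuable,
        # and only once per (host, exploit) pair.
        key = (host, exploit)
        if not valuable or key in self.exploits:
            return 0
        self.exploits.add(key)
        return VALUABLE_EXPLOIT_REWARD

    def gather_loot(self, host: str, item: str, is_password_hash: bool = False) -> int:
        # Each piece of loot pays once; password hashes pay more than other loot.
        key = (host, item)
        if key in self.loot:
            return 0
        self.loot.add(key)
        return PASSWORD_HASH_REWARD if is_password_hash else LOOT_REWARD


tracker = KnowledgeTracker()
print(tracker.discover_machine("10.0.0.5"))  # 5
print(tracker.discover_machine("10.0.0.5"))  # 0 (already known)
print(tracker.gather_loot("10.0.0.5", "password hashes", is_password_hash=True))  # 5
```

Keeping the "already seen" sets inside the reward logic stops the agent from farming reward by repeating the same scan or loot action.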
-
Successfully exploiting vulnerabilities:
- Successful exploit: 10 reward
- Gaining superuser privileges (Administrator): 20 reward
-
Failed movement: -1 reward
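The action-outcome rewards above can be sketched as a lookup table that is summed over an episode. This rests on several assumptions. The `Outcome` labels and the `step_reward` and `episode_return` helpers are hypothetical. Gaining Administrator is treated as its own step, because the list does not say whether it stacks with the exploit reward. Outcomes not listed score 0.

```python
from enum import Enum


class Outcome(Enum):
    # Hypothetical outcome labels for a single action step.
    EXPLOIT_SUCCESS = "exploit_success"
    ADMIN_GAINED = "admin_gained"
    MOVE_FAILED = "move_failed"
    OTHER = "other"


# Reward values from the list above; unlisted outcomes score 0.
OUTCOME_REWARD = {
    Outcome.EXPLOIT_SUCCESS: 10,
    Outcome.ADMIN_GAINED: 20,
    Outcome.MOVE_FAILED: -1,
}


def step_reward(outcome: Outcome) -> int:
    return OUTCOME_REWARD.get(outcome, 0)


def episode_return(outcomes) -> int:
    # Undiscounted sum of the per-step rewards.
    return sum(step_reward(o) for o in outcomes)


trace = [Outcome.MOVE_FAILED, Outcome.EXPLOIT_SUCCESS, Outcome.ADMIN_GAINED]
print(episode_return(trace))  # 29
```

The small negative reward for a failed move is an action cost. Without it, the agent has no reason to avoid blind attempts. The large positive rewards still dominate, so the agent keeps trying to reach its goal.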
FlatActionSpace:
Meaning in a pentest context:
Each action is a specific task that a learning agent can perform.
For example:
- Action 1: "Exploit Vulnerability A"
- Action 2: "Scan service B"
- Action 3: "Elevate privileges"
In other words, each action corresponds to one specific job that the agent can choose.
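A flat action space is a fixed list of actions, and the agent picks one of them by index. The sketch below uses the example actions above. A random choice stands in for a learned policy. `FLAT_ACTIONS` and `choose_flat_action` are hypothetical names used only for illustration.

```python
import random

# Hypothetical flat action list mirroring the examples above.
FLAT_ACTIONS = [
    "Exploit Vulnerability A",
    "Scan service B",
    "Elevate privileges",
]


def choose_flat_action(rng: random.Random) -> int:
    # The whole space is just the indices 0..n-1; a random policy stands in for the agent.
    return rng.randrange(len(FLAT_ACTIONS))


rng = random.Random(0)
index = choose_flat_action(rng)
print(index, FLAT_ACTIONS[index])
```

Gym-style RL libraries usually represent this kind of space as a single discrete space of size n. Any specific pentest environment may wrap it differently.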
ParameterizedActionSpace:
Meaning in a pentest context:
Each action can be performed with different parameters, creating multiple variations of an action type.
For example:
- Action: "Exploit Vulnerability"
- Parameter 1: Vulnerability type (A, B, C)
- Parameter 2: Target (Server 1, Server 2)
- Parameter 3: Privilege level (User, Root)
In this way, the agent can perform the "Exploit Vulnerability" action in many different variations by changing the parameters.
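The parameterized "Exploit Vulnerability" action can be sketched as one action type plus a tuple of parameter values. Every combination is a distinct variation: 3 vulnerability types x 2 targets x 2 privilege levels gives 12. `ExploitAction` is a hypothetical name. The parameter values are copied from the example above.

```python
from dataclasses import dataclass
from itertools import product

# Parameter values copied from the example above.
VULN_TYPES = ("A", "B", "C")
TARGETS = ("Server 1", "Server 2")
PRIVILEGES = ("User", "Root")


@dataclass(frozen=True)
class ExploitAction:
    """One variation of the hypothetical "Exploit Vulnerability" action type."""
    vuln_type: str
    target: str
    privilege: str


# Every combination of parameters is a distinct variation of the same action type.
variations = [ExploitAction(v, t, p) for v, t, p in product(VULN_TYPES, TARGETS, PRIVILEGES)]
print(len(variations))  # 12
print(variations[0])    # ExploitAction(vuln_type='A', target='Server 1', privilege='User')
```

The agent chooses one value per parameter instead of one index out of 12. Gym-style libraries often express this as a multi-discrete space with one dimension per parameter (here sizes 3, 2 and 2).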
Why are there only two action spaces?
These two action spaces are provided to accommodate the diversity of tasks and learning requirements. This flexibility helps the learning agent learn how to interact with the environment and carry out penetration testing tasks effectively.
[1] "Training an Autonomous Pentester with Deep RL" by Shane Caldwell. Link: "https://www.thestrangeloop.com/2021/training-an-autonomous-pentester-with-deep-rl.html"
...