Lab 1.4 - AI-Assisted Incident Handling

AI for Security Operations | SEC504 | May 2026

Drove a self-hosted gpt-4.1 (OpenWebUI via goaichat at localhost:8080) through three incident-handling jobs. First, uploaded an obfuscated analytics-backup.bat and prompted the model to deobfuscate the variable-fragmented commands step by step, reconstruct the hidden PowerShell, and extract IOCs (genusight.net/collect, genusight.s3.amazonaws.com/XhXrnSbE.exe, %TEMP%\bitsadmin.exe, a Startup-folder copy, %USERPROFILE%\.azure\accessTokens.json credential theft, and an HKCU Run\BITSAdmin key). Second, had it author BaselineCollector.ps1, a PowerShell 5.1 tool that snapshots services, tasks, users, firewall rules, ports, Run keys, and WMI subscriptions to JSON for Compare-Object diffing, plus usage documentation. Third, seeded it with an expert-IR system prompt and an Event of Interest (a CEO-workstation breach) to generate a MITRE ATT&CK-mapped response playbook.

Tools: gpt-4.1, OpenWebUI, goaichat, Docker, PowerShell 5.1, MITRE ATT&CK

Commands

1. Start the local AI stack

Ran the SEC504 goaichat helper, which starts the Docker service and brings up OpenWebUI (serving gpt-4.1) at http://localhost:8080. Running the model locally is the entire point: malware samples, IOCs, and internal details get pasted into prompts, and a self-hosted model keeps all of that inside the analysis environment instead of shipping it to a third-party API.

goaichat

2. Review the raw obfuscated sample

Looked at analytics-backup.bat first with cat. It is deliberately unreadable: dozens of single-purpose environment variables (set EUJZ=hell, set RBVJ="%TEMP%\bitsadmin.exe", set KQOT=BITSAdmin, ...) that get concatenated later to assemble the real commands. Reading the analyst's own eyes over the raw file first means you can sanity-check whatever the model claims it says.

cat ~/labs/falsimentis/analytics-backup.bat

3. Prompt for deobfuscation

Attached analytics-backup.bat and primed the model with a role and a tight task: 'You are an expert in Windows malware analysis. Analyze the attached script file. Deobfuscate the script as needed to understand the functionality.' gpt-4.1 identified it as a dropper, explained the variable-fragmentation evasion technique, and began a step-by-step breakdown. Role priming plus a concrete task is what gets a usable answer instead of a hedge.

4. Decode the variables, one command per line

Followed up: 'deobfuscate the script, decoding the variables. Show the commands in the script in deobfuscated form, one command per line.' The model substituted the fragmented variables back into their assembled commands and printed them as discrete lines, which is the form a human can actually reason about and copy into an IOC report.

5. Deobfuscate the PowerShell step by step

Narrowed in on the payload: 'Deobfuscate the PowerShell portion of the script. Show the PowerShell commands in their entirety in deobfuscated form. Slow down and think step-by-step.' The model identified the assembled %QMZA% line and walked the substitution: %BSML%=po, %AMBE%=wers, %EUJZ%=hell → powershell, %UEAI%=-c. Asking it to slow down and decompose is a reliable way to cut confident-but-wrong shortcuts on a long obfuscated string.

6. Extract the IOC list

Asked for a structured deliverable: 'Extract Indicators of compromise from the deobfuscated PowerShell commands and the other batch script commands. Provide the IOCs in a list format.' gpt-4.1 returned Network IOCs (http://genusight.net/collect?th=..., https://genusight.s3.amazonaws.com/XhXrnSbE.exe), File IOCs (%TEMP%\bitsadmin.exe, a Startup-folder copy for persistence, %USERPROFILE%\.azure\accessTokens.json targeted for credential theft), and a Registry IOC (HKCU\...\Run\BITSAdmin). Every one of these still needs analyst verification, but as a starting IOC set it is minutes of work instead of an hour.

7. Prompt for a baseline-collection tool

Switched from analysis to tooling: 'You are an expert PowerShell programmer ... Write a PowerShell script that collects baseline information on the configuration of a Windows host ... output configuration details in multiple files so that later use of the script on systems under investigation can reveal differences ... compared using the PowerShell compare-object command. Do you have any questions for me?' Ending with an explicit invitation for questions turns a one-shot generation into a scoped design conversation.

8. Answer the model's clarifying questions

gpt-4.1 asked the right questions before writing code: which configuration areas to cover (services, users/groups, scheduled tasks, listening ports, firewall rules, RDP, Run/RunOnce keys, installed software) and what output format. A model that asks before generating is far more useful than one that guesses, and it mirrors how a competent engineer would respond to the same request.

9. Scope the script in the reply

Answered with the full scope: cover running services, scheduled tasks, local user/group accounts, enabled firewall rules, listening ports, startup registry keys, installed programs, autoruns, remote-desktop status, and WMI subscriptions; use Compare-Object to diff; JSON output is fine; baseline once on the gold image, collect again on the host under investigation, and compare on an analyst workstation; target PowerShell 5.1. This is the same baseline-and-diff philosophy used manually in the PowerShell live-investigation lab, now codified into a reusable tool.

10. Review the generated tool

The model produced BaselineCollector.ps1: a parameterized script (param OutputFolder), a Save-Json helper wrapping ConvertTo-Json with UTF-8 output, and per-area collection (Get-Service projected to Name/DisplayName/Status/StartType, and so on) written to one JSON file per area under a per-hostname folder. The output is reviewed, not trusted blindly, but it is a working first draft that would have taken real time to write by hand.

11. Request usage documentation

Asked the model to 'Generate documentation on how to use the script ... Show sample usage for collecting data from a baseline system, and for a system under investigation. Show sample commands for comparing the results.' It produced a clean usage guide distinguishing the Baseline (gold image) and Investigation (suspect host) scenarios and showing the Compare-Object commands to diff the two JSON sets.

12. Set the expert-IR system prompt

For the third task, prepared a system prompt in a text file (IRplaybook.txt) that casts the model as an expert-level incident-response analyst whose job is to take an Event of Interest and produce a usable investigation playbook, with references to SANS incident-handling guidance and MITRE ATT&CK (e.g., T1110 Brute Force) and a version-control table. A strong, reusable system prompt is what makes the model's output consistent across investigations.

gedit ~/labs/falsimentis/IRplaybook.txt

13. Load the playbook system prompt

Loaded the IR system prompt into a fresh gpt-4.1 conversation. The prompt instructs the model to slow down, think step-by-step about what a responder actually needs, map techniques to MITRE ATT&CK, and maintain a versioned playbook document.

14. Provide the Event of Interest

The model asked the right scoping question back ('describe the Event of Interest you would like to focus on'), then was given the EOI: multiple IOCs in a breach investigation centered on the CEO workstation, with a malicious batch script and Network IOCs genusight.net and genusight.s3.amazonaws.com/XhXrnSbE.exe, the same indicators recovered in the deobfuscation task. Feeding the model real, structured EOI context is what turns a generic template into a playbook tailored to this incident.

Key Findings

gpt-4.1 deobfuscated analytics-backup.bat's variable fragmentation and reconstructed the hidden PowerShell (%QMZA% → powershell -c ...)
Extracted IOCs: genusight.net/collect, genusight.s3.amazonaws.com/XhXrnSbE.exe, %TEMP%\bitsadmin.exe, Startup-folder persistence, %USERPROFILE%\.azure\accessTokens.json, HKCU Run\BITSAdmin
Generated BaselineCollector.ps1 (PowerShell 5.1, JSON output) for Compare-Object live-response diffing, plus usage docs
Drafted a MITRE ATT&CK-mapped IR playbook (incl. T1110 Brute Force) from a CEO-workstation Event of Interest
Entire workflow run on a local gpt-4.1 (OpenWebUI/Docker) so malware and IOCs never left the environment

Security Controls

Self-hosted / data-isolated LLM for any sensitive-data workflow
Policy prohibiting upload of malware or IOCs to consumer AI services
Mandatory human review of LLM-generated code before execution
IOC verification against the source sample and threat intel before action
Versioned, vetted prompt library for repeatable analysis
Logging of AI prompts/outputs as investigation artifacts