SEC504 - Hacker Tools, Techniques, and Incident Handling
Lab 1.4 - AI-Assisted Incident Handling
Solo, Lab
Focus: AI for Security Operations
Level: SEC504
Date: May 2026
Artifacts: Sanitized screenshots of a self-hosted gpt-4.1 (OpenWebUI) session across three incident-handling use cases
TL;DR
- •Used a self-hosted gpt-4.1 to deobfuscate a variable-fragmented malicious batch script and extract its full IOC set
- •Had the model generate BaselineCollector.ps1 (PowerShell 5.1, JSON output for Compare-Object) plus usage docs for live-response baselining
- •Seeded an expert-IR system prompt with an Event of Interest to draft a MITRE ATT&CK-mapped incident-response playbook, all on a local model so malware and IOCs never left the environment
Skills demonstrated
Note: Course-provided PCAPs and lab instructions are not shared. Only my own captures and sanitized notes are published.
Why this matters
LLMs are now part of the incident-handler's toolkit whether teams plan for it or not. Used well, a model collapses an hour of manual batch-script deobfuscation into minutes and drafts tooling and playbooks an analyst can refine. Used carelessly, it leaks the very malware and IOCs an investigation is trying to contain into a third-party service, and it hands over confident-but-wrong answers that go unverified. This lab practices the good version: a self-hosted model, deliberate step-by-step prompting, and treating every output as a lead to verify rather than a verdict.
Context
This lab uses a locally hosted LLM (gpt-4.1 served through OpenWebUI, started with the SEC504 goaichat helper) as a force-multiplier across three distinct incident-handling tasks: deobfuscating a heavily obfuscated malicious batch script and extracting its IOCs, generating a PowerShell baseline-collection tool for live response, and drafting a structured incident-response playbook from an Event of Interest. The emphasis is on prompt construction and verification, and on doing all of it against a self-hosted model so sensitive malware and IOCs never leave the analyst's environment.
Tools used
Steps taken
1Start the local AI stack
Ran the SEC504 goaichat helper, which starts the Docker service and brings up OpenWebUI (serving gpt-4.1) at http://localhost:8080. Running the model locally is the entire point: malware samples, IOCs, and internal details get pasted into prompts, and a self-hosted model keeps all of that inside the analysis environment instead of shipping it to a third-party API.
$ goaichat2Review the raw obfuscated sample
Looked at analytics-backup.bat first with cat. It is deliberately unreadable: dozens of single-purpose environment variables (set EUJZ=hell, set RBVJ="%TEMP%\bitsadmin.exe", set KQOT=BITSAdmin, ...) that get concatenated later to assemble the real commands. Reading the analyst's own eyes over the raw file first means you can sanity-check whatever the model claims it says.
$ cat ~/labs/falsimentis/analytics-backup.bat3Prompt for deobfuscation
Attached analytics-backup.bat and primed the model with a role and a tight task: 'You are an expert in Windows malware analysis. Analyze the attached script file. Deobfuscate the script as needed to understand the functionality.' gpt-4.1 identified it as a dropper, explained the variable-fragmentation evasion technique, and began a step-by-step breakdown. Role priming plus a concrete task is what gets a usable answer instead of a hedge.
4Decode the variables, one command per line
Followed up: 'deobfuscate the script, decoding the variables. Show the commands in the script in deobfuscated form, one command per line.' The model substituted the fragmented variables back into their assembled commands and printed them as discrete lines, which is the form a human can actually reason about and copy into an IOC report.
5Deobfuscate the PowerShell step by step
Narrowed in on the payload: 'Deobfuscate the PowerShell portion of the script. Show the PowerShell commands in their entirety in deobfuscated form. Slow down and think step-by-step.' The model identified the assembled %QMZA% line and walked the substitution: %BSML%=po, %AMBE%=wers, %EUJZ%=hell → powershell, %UEAI%=-c. Asking it to slow down and decompose is a reliable way to cut confident-but-wrong shortcuts on a long obfuscated string.
6Extract the IOC list
Asked for a structured deliverable: 'Extract Indicators of compromise from the deobfuscated PowerShell commands and the other batch script commands. Provide the IOCs in a list format.' gpt-4.1 returned Network IOCs (http://genusight.net/collect?th=..., https://genusight.s3.amazonaws.com/XhXrnSbE.exe), File IOCs (%TEMP%\bitsadmin.exe, a Startup-folder copy for persistence, %USERPROFILE%\.azure\accessTokens.json targeted for credential theft), and a Registry IOC (HKCU\...\Run\BITSAdmin). Every one of these still needs analyst verification, but as a starting IOC set it is minutes of work instead of an hour.
7Prompt for a baseline-collection tool
Switched from analysis to tooling: 'You are an expert PowerShell programmer ... Write a PowerShell script that collects baseline information on the configuration of a Windows host ... output configuration details in multiple files so that later use of the script on systems under investigation can reveal differences ... compared using the PowerShell compare-object command. Do you have any questions for me?' Ending with an explicit invitation for questions turns a one-shot generation into a scoped design conversation.
8Answer the model's clarifying questions
gpt-4.1 asked the right questions before writing code: which configuration areas to cover (services, users/groups, scheduled tasks, listening ports, firewall rules, RDP, Run/RunOnce keys, installed software) and what output format. A model that asks before generating is far more useful than one that guesses, and it mirrors how a competent engineer would respond to the same request.
9Scope the script in the reply
Answered with the full scope: cover running services, scheduled tasks, local user/group accounts, enabled firewall rules, listening ports, startup registry keys, installed programs, autoruns, remote-desktop status, and WMI subscriptions; use Compare-Object to diff; JSON output is fine; baseline once on the gold image, collect again on the host under investigation, and compare on an analyst workstation; target PowerShell 5.1. This is the same baseline-and-diff philosophy used manually in the PowerShell live-investigation lab, now codified into a reusable tool.
10Review the generated tool
The model produced BaselineCollector.ps1: a parameterized script (param OutputFolder), a Save-Json helper wrapping ConvertTo-Json with UTF-8 output, and per-area collection (Get-Service projected to Name/DisplayName/Status/StartType, and so on) written to one JSON file per area under a per-hostname folder. The output is reviewed, not trusted blindly, but it is a working first draft that would have taken real time to write by hand.
11Request usage documentation
Asked the model to 'Generate documentation on how to use the script ... Show sample usage for collecting data from a baseline system, and for a system under investigation. Show sample commands for comparing the results.' It produced a clean usage guide distinguishing the Baseline (gold image) and Investigation (suspect host) scenarios and showing the Compare-Object commands to diff the two JSON sets.
12Set the expert-IR system prompt
For the third task, prepared a system prompt in a text file (IRplaybook.txt) that casts the model as an expert-level incident-response analyst whose job is to take an Event of Interest and produce a usable investigation playbook, with references to SANS incident-handling guidance and MITRE ATT&CK (e.g., T1110 Brute Force) and a version-control table. A strong, reusable system prompt is what makes the model's output consistent across investigations.
$ gedit ~/labs/falsimentis/IRplaybook.txt13Load the playbook system prompt
Loaded the IR system prompt into a fresh gpt-4.1 conversation. The prompt instructs the model to slow down, think step-by-step about what a responder actually needs, map techniques to MITRE ATT&CK, and maintain a versioned playbook document.
14Provide the Event of Interest
The model asked the right scoping question back ('describe the Event of Interest you would like to focus on'), then was given the EOI: multiple IOCs in a breach investigation centered on the CEO workstation, with a malicious batch script and Network IOCs genusight.net and genusight.s3.amazonaws.com/XhXrnSbE.exe, the same indicators recovered in the deobfuscation task. Feeding the model real, structured EOI context is what turns a generic template into a playbook tailored to this incident.
Key findings
Outcome / Lessons learned
Ran a self-hosted gpt-4.1 across the three places an LLM genuinely helps an incident handler: it deobfuscated a variable-fragmented batch dropper and produced a verifiable IOC list in minutes, authored a working BaselineCollector.ps1 plus usage docs for live-response diffing, and drafted a MITRE ATT&CK-mapped response playbook from a CEO-workstation Event of Interest. Every output was treated as a reviewed first draft, and the whole workflow stayed on a local model so the malware and IOCs never left the environment.
Standardize on a self-hosted or contractually-isolated model for anything touching malware, IOCs, or internal data, and document that policy so analysts are not pasting samples into consumer chatbots. Keep a versioned library of vetted system prompts (malware analyst, PowerShell tooling, IR playbook author) so output is consistent and reviewable. Treat every model output as a lead: verify extracted IOCs against the actual sample and threat intel, and code-review generated scripts before running them on production hosts. Capture prompts and outputs as investigation artifacts for repeatability and audit.
Security controls relevant
- Self-hosted / data-isolated LLM for any sensitive-data workflow
- Policy prohibiting upload of malware or IOCs to consumer AI services
- Mandatory human review of LLM-generated code before execution
- IOC verification against the source sample and threat intel before action
- Versioned, vetted prompt library for repeatable analysis
- Logging of AI prompts/outputs as investigation artifacts
What I took away from this
The biggest decision in this lab is invisible in the output: it runs on a self-hosted model. The moment an analyst pastes a malware sample or an internal IOC into a consumer chatbot, that data has left the investigation and may be retained, logged, or trained on. Self-hosting gpt-4.1 in Docker is what makes AI-assisted analysis defensible rather than a data-exfiltration incident of your own making. Capability is the easy part; the operational-security choice is the part that separates a professional workflow from a liability.
Prompting for security work is a real skill and the lab demonstrates the two highest-value techniques. Role priming ('you are an expert in Windows malware analysis') sets the model's frame, and step-by-step decomposition ('slow down, think step-by-step, one command per line') stops it from taking confident shortcuts on long obfuscated strings. The same model that would hand-wave a single sloppy prompt produces precise variable substitutions when the task is decomposed properly.
AI is an accelerator, not an oracle, and the discipline is verification. The model's IOC list and generated PowerShell were excellent starting points, but the IOCs still have to be checked against the actual sample and the script still has to be code-reviewed before it runs on a production host. The win is real (an hour of deobfuscation becomes minutes), but it is a win precisely because a skilled analyst is in the loop to catch the cases where the model is confidently wrong.