SEC504 - Hacker Tools, Techniques, and Incident Handling

Lab 1.4 - AI-Assisted Incident Handling

Solo, Lab

Focus: AI for Security Operations

Level: SEC504

Date: May 2026

Artifacts: Sanitized screenshots of a self-hosted gpt-4.1 (OpenWebUI) session across three incident-handling use cases

TL;DR

•Used a self-hosted gpt-4.1 to deobfuscate a variable-fragmented malicious batch script and extract its full IOC set
•Had the model generate BaselineCollector.ps1 (PowerShell 5.1, JSON output for Compare-Object) plus usage docs for live-response baselining
•Seeded an expert-IR system prompt with an Event of Interest to draft a MITRE ATT&CK-mapped incident-response playbook, all on a local model so malware and IOCs never left the environment

Skills demonstrated

AI-assisted malware deobfuscation and IOC extractionPrompt engineering for security tasks (role priming, step-by-step decomposition)LLM-generated tooling review (PowerShell baseline collector)Incident-response playbook development with MITRE ATT&CK mappingOperational security of self-hosted LLMs for sensitive data

Note: Course-provided PCAPs and lab instructions are not shared. Only my own captures and sanitized notes are published.

Why this matters

LLMs are now part of the incident-handler's toolkit whether teams plan for it or not. Used well, a model collapses an hour of manual batch-script deobfuscation into minutes and drafts tooling and playbooks an analyst can refine. Used carelessly, it leaks the very malware and IOCs an investigation is trying to contain into a third-party service, and it hands over confident-but-wrong answers that go unverified. This lab practices the good version: a self-hosted model, deliberate step-by-step prompting, and treating every output as a lead to verify rather than a verdict.

Context

This lab uses a locally hosted LLM (gpt-4.1 served through OpenWebUI, started with the SEC504 goaichat helper) as a force-multiplier across three distinct incident-handling tasks: deobfuscating a heavily obfuscated malicious batch script and extracting its IOCs, generating a PowerShell baseline-collection tool for live response, and drafting a structured incident-response playbook from an Event of Interest. The emphasis is on prompt construction and verification, and on doing all of it against a self-hosted model so sensitive malware and IOCs never leave the analyst's environment.

Tools used

gpt-4.1OpenWebUIgoaichatDockerPowerShell 5.1MITRE ATT&CK

Steps taken

1Start the local AI stack

Ran the SEC504 goaichat helper, which starts the Docker service and brings up OpenWebUI (serving gpt-4.1) at http://localhost:8080. Running the model locally is the entire point: malware samples, IOCs, and internal details get pasted into prompts, and a self-hosted model keeps all of that inside the analysis environment instead of shipping it to a third-party API.

$ goaichat

2Review the raw obfuscated sample

Looked at analytics-backup.bat first with cat. It is deliberately unreadable: dozens of single-purpose environment variables (set EUJZ=hell, set RBVJ="%TEMP%\bitsadmin.exe", set KQOT=BITSAdmin, ...) that get concatenated later to assemble the real commands. Reading the analyst's own eyes over the raw file first means you can sanity-check whatever the model claims it says.

$ cat ~/labs/falsimentis/analytics-backup.bat

3Prompt for deobfuscation

Attached analytics-backup.bat and primed the model with a role and a tight task: 'You are an expert in Windows malware analysis. Analyze the attached script file. Deobfuscate the script as needed to understand the functionality.' gpt-4.1 identified it as a dropper, explained the variable-fragmentation evasion technique, and began a step-by-step breakdown. Role priming plus a concrete task is what gets a usable answer instead of a hedge.

4Decode the variables, one command per line

Followed up: 'deobfuscate the script, decoding the variables. Show the commands in the script in deobfuscated form, one command per line.' The model substituted the fragmented variables back into their assembled commands and printed them as discrete lines, which is the form a human can actually reason about and copy into an IOC report.

5Deobfuscate the PowerShell step by step

Narrowed in on the payload: 'Deobfuscate the PowerShell portion of the script. Show the PowerShell commands in their entirety in deobfuscated form. Slow down and think step-by-step.' The model identified the assembled %QMZA% line and walked the substitution: %BSML%=po, %AMBE%=wers, %EUJZ%=hell → powershell, %UEAI%=-c. Asking it to slow down and decompose is a reliable way to cut confident-but-wrong shortcuts on a long obfuscated string.

6Extract the IOC list

Asked for a structured deliverable: 'Extract Indicators of compromise from the deobfuscated PowerShell commands and the other batch script commands. Provide the IOCs in a list format.' gpt-4.1 returned Network IOCs (http://genusight.net/collect?th=..., https://genusight.s3.amazonaws.com/XhXrnSbE.exe), File IOCs (%TEMP%\bitsadmin.exe, a Startup-folder copy for persistence, %USERPROFILE%\.azure\accessTokens.json targeted for credential theft), and a Registry IOC (HKCU\...\Run\BITSAdmin). Every one of these still needs analyst verification, but as a starting IOC set it is minutes of work instead of an hour.

7Prompt for a baseline-collection tool

Switched from analysis to tooling: 'You are an expert PowerShell programmer ... Write a PowerShell script that collects baseline information on the configuration of a Windows host ... output configuration details in multiple files so that later use of the script on systems under investigation can reveal differences ... compared using the PowerShell compare-object command. Do you have any questions for me?' Ending with an explicit invitation for questions turns a one-shot generation into a scoped design conversation.

8Answer the model's clarifying questions

gpt-4.1 asked the right questions before writing code: which configuration areas to cover (services, users/groups, scheduled tasks, listening ports, firewall rules, RDP, Run/RunOnce keys, installed software) and what output format. A model that asks before generating is far more useful than one that guesses, and it mirrors how a competent engineer would respond to the same request.

9Scope the script in the reply

Answered with the full scope: cover running services, scheduled tasks, local user/group accounts, enabled firewall rules, listening ports, startup registry keys, installed programs, autoruns, remote-desktop status, and WMI subscriptions; use Compare-Object to diff; JSON output is fine; baseline once on the gold image, collect again on the host under investigation, and compare on an analyst workstation; target PowerShell 5.1. This is the same baseline-and-diff philosophy used manually in the PowerShell live-investigation lab, now codified into a reusable tool.

10Review the generated tool

The model produced BaselineCollector.ps1: a parameterized script (param OutputFolder), a Save-Json helper wrapping ConvertTo-Json with UTF-8 output, and per-area collection (Get-Service projected to Name/DisplayName/Status/StartType, and so on) written to one JSON file per area under a per-hostname folder. The output is reviewed, not trusted blindly, but it is a working first draft that would have taken real time to write by hand.

11Request usage documentation

Asked the model to 'Generate documentation on how to use the script ... Show sample usage for collecting data from a baseline system, and for a system under investigation. Show sample commands for comparing the results.' It produced a clean usage guide distinguishing the Baseline (gold image) and Investigation (suspect host) scenarios and showing the Compare-Object commands to diff the two JSON sets.

12Set the expert-IR system prompt

For the third task, prepared a system prompt in a text file (IRplaybook.txt) that casts the model as an expert-level incident-response analyst whose job is to take an Event of Interest and produce a usable investigation playbook, with references to SANS incident-handling guidance and MITRE ATT&CK (e.g., T1110 Brute Force) and a version-control table. A strong, reusable system prompt is what makes the model's output consistent across investigations.

$ gedit ~/labs/falsimentis/IRplaybook.txt

13Load the playbook system prompt

Loaded the IR system prompt into a fresh gpt-4.1 conversation. The prompt instructs the model to slow down, think step-by-step about what a responder actually needs, map techniques to MITRE ATT&CK, and maintain a versioned playbook document.

14Provide the Event of Interest

The model asked the right scoping question back ('describe the Event of Interest you would like to focus on'), then was given the EOI: multiple IOCs in a breach investigation centered on the CEO workstation, with a malicious batch script and Network IOCs genusight.net and genusight.s3.amazonaws.com/XhXrnSbE.exe, the same indicators recovered in the deobfuscation task. Feeding the model real, structured EOI context is what turns a generic template into a playbook tailored to this incident.

Key findings

gpt-4.1 deobfuscated analytics-backup.bat's variable fragmentation and reconstructed the hidden PowerShell (%QMZA% → powershell -c ...)

Extracted IOCs: genusight.net/collect, genusight.s3.amazonaws.com/XhXrnSbE.exe, %TEMP%\bitsadmin.exe, Startup-folder persistence, %USERPROFILE%\.azure\accessTokens.json, HKCU Run\BITSAdmin

Generated BaselineCollector.ps1 (PowerShell 5.1, JSON output) for Compare-Object live-response diffing, plus usage docs

Drafted a MITRE ATT&CK-mapped IR playbook (incl. T1110 Brute Force) from a CEO-workstation Event of Interest

Entire workflow run on a local gpt-4.1 (OpenWebUI/Docker) so malware and IOCs never left the environment

Outcome / Lessons learned

Ran a self-hosted gpt-4.1 across the three places an LLM genuinely helps an incident handler: it deobfuscated a variable-fragmented batch dropper and produced a verifiable IOC list in minutes, authored a working BaselineCollector.ps1 plus usage docs for live-response diffing, and drafted a MITRE ATT&CK-mapped response playbook from a CEO-workstation Event of Interest. Every output was treated as a reviewed first draft, and the whole workflow stayed on a local model so the malware and IOCs never left the environment.

Standardize on a self-hosted or contractually-isolated model for anything touching malware, IOCs, or internal data, and document that policy so analysts are not pasting samples into consumer chatbots. Keep a versioned library of vetted system prompts (malware analyst, PowerShell tooling, IR playbook author) so output is consistent and reviewable. Treat every model output as a lead: verify extracted IOCs against the actual sample and threat intel, and code-review generated scripts before running them on production hosts. Capture prompts and outputs as investigation artifacts for repeatability and audit.

Security controls relevant

Self-hosted / data-isolated LLM for any sensitive-data workflow
Policy prohibiting upload of malware or IOCs to consumer AI services
Mandatory human review of LLM-generated code before execution
IOC verification against the source sample and threat intel before action
Versioned, vetted prompt library for repeatable analysis
Logging of AI prompts/outputs as investigation artifacts

What I took away from this

The biggest decision in this lab is invisible in the output: it runs on a self-hosted model. The moment an analyst pastes a malware sample or an internal IOC into a consumer chatbot, that data has left the investigation and may be retained, logged, or trained on. Self-hosting gpt-4.1 in Docker is what makes AI-assisted analysis defensible rather than a data-exfiltration incident of your own making. Capability is the easy part; the operational-security choice is the part that separates a professional workflow from a liability.

Prompting for security work is a real skill and the lab demonstrates the two highest-value techniques. Role priming ('you are an expert in Windows malware analysis') sets the model's frame, and step-by-step decomposition ('slow down, think step-by-step, one command per line') stops it from taking confident shortcuts on long obfuscated strings. The same model that would hand-wave a single sloppy prompt produces precise variable substitutions when the task is decomposed properly.

AI is an accelerator, not an oracle, and the discipline is verification. The model's IOC list and generated PowerShell were excellent starting points, but the IOCs still have to be checked against the actual sample and the script still has to be code-reviewed before it runs on a production host. The win is real (an hour of deobfuscation becomes minutes), but it is a win precisely because a skilled analyst is in the loop to catch the cases where the model is confidently wrong.