/ ai-playground — llm red team lab

Prompt injection research
against production LLMs.

A reproducible red-team study of prompt-injection techniques mapped to the OWASP LLM Top 10 and MITRE ATLAS, tested across frontier and budget-tier models via Vercel AI Gateway. Each attack ships with a defensive mitigation and every result is reproducible from a pinned model ID and the prompt committed to source.

attacks: 3
owasp categories: 3 / 10
models evaluated: 6
status: week 1 / scaffold

lab notebook

Seed attacks cover 3 of 10 OWASP categories. The matrix UI, filters, slide-over transcripts, and the live sandbox land in weeks 2-4. Full scope: llm-redteam-brief.

seeded attacks

LLM01Prompt Injection
Direct instruction override against a customer-support assistant
instruction-override · high · AML.T0051.000
LLM02Sensitive Information Disclosure
Indirect PII exfiltration by embedding the request in a fictional narrative
role-play-jailbreak · high · AML.T0057
LLM07System Prompt Leakage
System-prompt extraction by requesting an 'accessibility summary' of prior context
system-prompt-leak · high · AML.T0055

Prompt injection research against production LLMs.

Prompt injection research
against production LLMs.