/ ai-playground — llm red team lab
Prompt injection research
against production LLMs.
A reproducible red-team study of prompt-injection techniques mapped to the OWASP LLM Top 10 and MITRE ATLAS, tested across frontier and budget-tier models via Vercel AI Gateway. Each attack ships with a defensive mitigation and every result is reproducible from a pinned model ID and the prompt committed to source.
- attacks
- 3
- owasp categories
- 3 / 10
- models evaluated
- 6
- status
- week 1 / scaffold
lab notebook
Seed attacks cover 3 of 10 OWASP categories. The matrix UI, filters, slide-over transcripts, and the live sandbox land in weeks 2-4. Full scope: llm-redteam-brief.
seeded attacks
- LLM01Prompt InjectionDirect instruction override against a customer-support assistantinstruction-override · high · AML.T0051.000
- LLM02Sensitive Information DisclosureIndirect PII exfiltration by embedding the request in a fictional narrativerole-play-jailbreak · high · AML.T0057
- LLM07System Prompt LeakageSystem-prompt extraction by requesting an 'accessibility summary' of prior contextsystem-prompt-leak · high · AML.T0055