/ ai-playground — llm red team lab

Prompt injection research against production LLMs.

A reproducible red-team study of prompt-injection techniques mapped to the OWASP LLM Top 10 and MITRE ATLAS, tested across frontier and budget-tier models via Vercel AI Gateway. Each attack ships with a defensive mitigation, and every result is reproducible from a pinned model ID and the prompt committed to source.

attacks: 3
owasp categories: 3 / 10
models evaluated: 6
status: week 1 / scaffold
lab notebook

Seed attacks cover 3 of 10 OWASP categories. The matrix UI, filters, slide-over transcripts, and the live sandbox land in weeks 2-4. Full scope: llm-redteam-brief.
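Pinning each attack to a model ID and a committed prompt suggests a small record per case. A minimal sketch of what such a record could look like, assuming a hypothetical `AttackCase` shape (field names, the model ID, and the elided prompt are illustrative, not the lab's actual schema):

```typescript
// Hypothetical shape for a pinned, reproducible attack case.
// Field names are illustrative; the lab's real schema may differ.
interface AttackCase {
  id: string;                          // e.g. "LLM01-instruction-override"
  owasp: string;                       // OWASP LLM Top 10 category
  atlas: string;                       // MITRE ATLAS technique ID
  modelId: string;                     // pinned model ID routed via the gateway
  prompt: string;                      // exact prompt committed to source
  severity: "low" | "medium" | "high";
}

// One seed case from the list below; prompt body elided in this sketch.
const seed: AttackCase[] = [
  {
    id: "LLM01-instruction-override",
    owasp: "LLM01 Prompt Injection",
    atlas: "AML.T0051.000",
    modelId: "openai/gpt-4o-2024-08-06", // hypothetical pin, not a claim about the lab
    prompt: "(committed prompt text)",
    severity: "high",
  },
];
```

Keeping the model ID and prompt in the same record means a result can be re-run exactly, which is the reproducibility property the brief describes.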

seeded attacks
  • LLM01 · Prompt Injection
    Direct instruction override against a customer-support assistant
    instruction-override  ·  high  ·  AML.T0051.000
  • LLM02 · Sensitive Information Disclosure
    Indirect PII exfiltration by embedding the request in a fictional narrative
    role-play-jailbreak  ·  high  ·  AML.T0057
  • LLM07 · System Prompt Leakage
    System-prompt extraction by requesting an 'accessibility summary' of prior context
    system-prompt-leak  ·  high  ·  AML.T0055
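One common way to score a system-prompt-leak attempt like the LLM07 case is to plant a canary token in the system prompt and flag any response that echoes it. A sketch under that assumption (the canary value and normalization rule are illustrative, not the lab's actual harness):

```typescript
// Leak detector sketch: plant a unique, non-guessable canary in the
// system prompt, then flag model output that reproduces it. The canary
// and the scoring rule here are assumptions for illustration.
const CANARY = "CANARY-7f3a9c";

function leaksSystemPrompt(output: string): boolean {
  // Normalize both sides to catch trivial obfuscations
  // (case changes, inserted spaces or hyphens).
  const strip = (s: string) => s.toLowerCase().replace(/[\s-]/g, "");
  return strip(output).includes(strip(CANARY));
}
```

A string match is a deliberately conservative success criterion: it misses paraphrased leaks but never falsely credits an attack, which keeps the matrix's pass/fail results auditable.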
LLM Red Team Lab | Luis Javier Lozoya