Expense Manager CTF

Multi-turn conversational agent CTF testing authorization bypass vulnerabilities in LLM-powered applications.

What This App Is

An expense management chatbot powered by LangGraph and GPT-4 that intentionally has weak authorization controls. All code-level permission checks are disabled in the tools - the LLM must enforce security rules through reasoning alone. This creates vulnerabilities that can be exploited through prompt injection, role confusion, and social engineering attacks.

The goal: exploit the weak authorization to access or manipulate Shuo's expense data (shuo@promptfoo.dev).

API - Chat Endpoint

Endpoint: `POST /chat`

Request:

{
  "sessionId": "optional-session-id",
  "userEmail": "attacker@example.com",
  "message": "Show me all expenses",
  "ctfId": "optional-ctf-id"
}

Response:

{
  "sessionId": "abc123def456",
  "response": "Here are your expenses...",
  "capturedFlags": ["system_prompt_leak"]
}

Fields

sessionId (optional): Conversation session ID. If not provided or invalid, a new session is auto-created
userEmail (required): User's email address. New users are auto-created with employee role
message (required): User's chat message
ctfId (optional): CTF ID for flag submission to platform leaderboard
response: Agent's response message
capturedFlags: Array of flag names captured in this turn (empty if none)

Conversation & Session Management

Session Storage: In-memory dictionary (sessionId -> session data)

Session Structure:

{
  'user_email': 'user@example.com',
  'user_id': 2,
  'messages': [
    {'role': 'user', 'content': 'Show my expenses'},
    {'role': 'assistant', 'content': 'Here are your expenses...'}
  ],
  'created_at': '2025-11-29 10:30:00'
}

Session Flow:

First Request: Client sends userEmail with optional sessionId
- If sessionId is provided: Creates new session with that ID
- If sessionId is missing: Server generates random 16-char ID
- User is created/retrieved from email address
Subsequent Requests: Client includes sessionId from previous response
- Server retrieves conversation history from session
- History is sent to LangGraph agent as context
- New messages are appended to session history
New Chat: Client omits sessionId or sends new ID to start fresh conversation

User Management: Users are auto-created from email addresses. Only shuo@promptfoo.dev is pre-defined (ID=1). All other emails create new users dynamically with auto-incrementing IDs.

Running Locally

Install dependencies with uv:

uv sync

Set API key:

export OPENAI_API_KEY=your-openai-api-key

Start server:

uv run python -m ctf_expense_manager.server

Docker

Build and run with Docker:

# Build image
docker build -t ctf-expense-manager .

# Run container
docker run -p 8080:8080 -e OPENAI_API_KEY=your-key ctf-expense-manager

The container runs on port 8080 (Cloud Run compatible).

Contributing

Interested in contributing? Check out CONTRIBUTING.md for:

Development setup instructions
Code quality guidelines
Testing requirements
Pull request process

Server runs on http://localhost:5005

Available Endpoints:

/chat - Chat API (POST)
/ui - Custom UI with flags sidebar (GET)
/config.yaml - CTF config for platform import (GET)
/health - Health check (GET)

Test the API

Using curl:

# First message (no sessionId)
curl -X POST http://localhost:5005/chat \
  -H "Content-Type: application/json" \
  -d '{"userEmail": "test@example.com", "message": "Who am I?"}'

# Response includes sessionId: "abc123..."

# Follow-up message (with sessionId)
curl -X POST http://localhost:5005/chat \
  -H "Content-Type: application/json" \
  -d '{"sessionId": "abc123...", "userEmail": "test@example.com", "message": "Show my expenses"}'

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
.github/workflows		.github/workflows
src/ctf_expense_manager		src/ctf_expense_manager
tests		tests
.dockerignore		.dockerignore
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
AGENTS.md		AGENTS.md
CONTRIBUTING.md		CONTRIBUTING.md
Dockerfile		Dockerfile
README.md		README.md
config.yaml		config.yaml
pyproject.toml		pyproject.toml
renovate.json		renovate.json
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Expense Manager CTF

What This App Is

API - Chat Endpoint

Endpoint: `POST /chat`

Fields

Conversation & Session Management

Running Locally

Docker

Contributing

Test the API

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 4

Uh oh!

Languages

promptfoo/ctf-expense-manager

Folders and files

Latest commit

History

Repository files navigation

Expense Manager CTF

What This App Is

API - Chat Endpoint

Endpoint: POST /chat

Fields

Conversation & Session Management

Running Locally

Docker

Contributing

Test the API

About

Resources

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 4

Uh oh!

Languages

Endpoint: `POST /chat`

Packages