FFmpeg-RTMP is a distributed video transcoding system that documents architectural patterns, design invariants, and failure semantics observed under real load. While the system runs in production and is available for reuse, its primary goal is to communicate design tradeoffs and operational lessons rather than to serve as a general-purpose or commercially supported platform.
This reference implementation demonstrates:
- Architectural patterns: Pull-based coordination, state machine guarantees, idempotent operations
- Design invariants: What never changes, even under failure conditions
- Failure semantics: Explicit documentation of retry boundaries and terminal states
- Operational tradeoffs: Why certain design choices were made over alternatives
- Performance characteristics: Measured behavior under realistic workloads (45,000+ jobs tested)
- State Machine Correctness: FSM with validated transitions and row-level locking prevents race conditions
- Failure Mode Documentation: Explicit boundaries between transient (retry) and terminal (fail) errors
- Graceful Degradation: Heartbeat-based failure detection with configurable recovery semantics
- Production Patterns: Exponential backoff, connection pooling, graceful shutdown demonstrated at scale
- Transparency: Design decisions documented with rationale and alternatives considered
- Not a commercial platform: No support, SLAs, or stability guarantees across versions
- Not general-purpose: Optimized for batch transcoding workloads, not real-time streaming
- Not plug-and-play: Requires understanding of distributed systems concepts for deployment
- Not feature-complete: Focuses on core patterns; many production features deliberately omitted
- Systems researchers studying distributed coordination patterns
- Engineers evaluating architectural approaches for similar problems
- Students learning production distributed systems design
- Teams seeking a reference implementation to adapt for specific use cases
This is a teaching tool backed by real operational data, not a turnkey solution.
This reference implementation is organized to clearly separate concerns:
- master/ - Orchestration: job scheduling, failure detection, state management
- worker/ - Execution: job processing, FFmpeg integration, metrics collection
- shared/ - Common libraries: FSM, retry semantics, database abstractions
See ARCHITECTURE.md for detailed design discussion and CODE_VERIFICATION_REPORT.md for implementation validation.
For studying the system behavior locally:
# One-command setup: builds, runs, and verifies everything
./scripts/run_local_stack.sh

See docs/LOCAL_STACK_GUIDE.md for details.
The reference implementation can be deployed across multiple nodes to study distributed behavior patterns.
- Go 1.24+ (for building binaries)
- Python 3.10+ (optional, for analysis scripts)
- FFmpeg (for transcoding)
- Linux with kernel 4.15+ (for RAPL power monitoring)
# Clone and build
git clone https://github.com/psantana5/ffmpeg-rtmp.git
cd ffmpeg-rtmp
make build-master
# Set API key for authentication
export MASTER_API_KEY=$(openssl rand -base64 32)
# Start master service
# - TLS enabled by default (auto-generates self-signed cert)
# - SQLite persistence (master.db)
# - Job retry (3 attempts default)
# - Prometheus metrics on port 9090
./bin/master --port 8080 --api-key "$MASTER_API_KEY"
# Optional: Start monitoring stack (VictoriaMetrics + Grafana)
make vm-up-build

# On worker node(s)
git clone https://github.com/psantana5/ffmpeg-rtmp.git
cd ffmpeg-rtmp
make build-agent
# Set same API key as master
export MASTER_API_KEY="<same-key-as-master>"
# Register and start agent
# Concurrency settings affect failure mode behavior
./bin/agent \
--register \
--master https://MASTER_IP:8080 \
--api-key "$MASTER_API_KEY" \
--max-concurrent-jobs 4 \
--poll-interval 3s \
--insecure-skip-verify
# Note: --insecure-skip-verify only for self-signed certs in research environments

# Submit via CLI
./bin/ffrtmp jobs submit \
--master https://MASTER_IP:8080 \
--scenario 1080p-h264 \
--bitrate 5M \
--duration 300
# Or via REST API
curl -X POST https://MASTER_IP:8080/jobs \
-H "Authorization: Bearer $MASTER_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"scenario": "1080p-h264",
"confidence": "auto",
"parameters": {"duration": 300, "bitrate": "5M"}
}'
# Workers poll master and execute jobs
# Observe state transitions and failure recovery patterns
# Monitor progress at https://MASTER_IP:8080/jobs

- Master API: https://MASTER_IP:8080/nodes (registered nodes, health status)
- Prometheus Metrics: http://MASTER_IP:9090/metrics
- Grafana (optional): http://MASTER_IP:3000 (admin/admin)
- VictoriaMetrics (optional): http://MASTER_IP:8428
For systemd service configuration, see deployment/README.md.
The Edge Workload Wrapper demonstrates OS-level resource constraint patterns for compute workloads. This experimental component explores non-owning governance models where workloads survive wrapper crashes.
- Non-owning supervision: Workloads run independently of wrapper lifecycle
- Attach semantics: Govern already-running processes without restart
- Graceful fallback: OS-level constraints degrade gracefully without root/cgroups
- Exit tracking: Capture exit codes, reasons, and execution duration
# Run FFmpeg with resource constraints
ffrtmp run \
--job-id transcode-001 \
--sla-eligible \
--cpu-quota 200 \
--memory-limit 4096 \
-- ffmpeg -i input.mp4 -c:v h264_nvenc output.mp4
# Attach to existing process (demonstrates attach semantics)
ffrtmp attach \
--pid 12345 \
--job-id existing-job-042 \
--cpu-weight 150 \
--nice -5
# Auto-discovery watch daemon (NEW!)
ffrtmp watch \
--scan-interval 10s \
--enable-state \
--enable-retry \
--watch-config /etc/ffrtmp/watch-config.yaml

Demonstrates automatic process discovery and governance patterns. Explores techniques for:
- Non-intrusive process discovery via /proc scanning
- State persistence across daemon restarts
- Configuration-driven process filtering and governance
Example deployment:
# Install experimental daemon
sudo ./deployment/install-edge.sh
# Configure discovery rules
sudo nano /etc/ffrtmp/watch-config.yaml
# Start service
sudo systemctl start ffrtmp-watch

See deployment/WATCH_DEPLOYMENT.md for implementation details.
- Wrapper Architecture - Design patterns and philosophy
- Wrapper Examples - Usage demonstrations
The reference implementation demonstrates several resource management approaches:
Running with privileged access (research/production):
# Full cgroup support for resource isolation
sudo ./bin/agent \
--register \
--master https://MASTER_IP:8080 \
--api-key "$MASTER_API_KEY" \
--max-concurrent-jobs 4 \
--poll-interval 3s

Benefits of privileged execution:
- Strict CPU quotas via cgroups (v1/v2)
- Hard memory limits with OOM protection
- Complete process isolation per job
- Resource exhaustion prevention
Graceful degradation without privileges:
- Disk space monitoring (always enforced)
- Timeout enforcement (always enforced)
- Process priority control via nice
- CPU/memory limits disabled (monitoring only)
Jobs support configurable limits for studying resource contention:
{
"scenario": "1080p-h264",
"parameters": {
"bitrate": "4M",
"duration": 300
},
"resource_limits": {
"max_cpu_percent": 200, // 200% = 2 CPU cores
"max_memory_mb": 2048, // 2GB memory limit
"max_disk_mb": 5000, // 5GB temp space required
"timeout_sec": 600 // 10 minute timeout
}
}

Default constraints:
- CPU: All available cores (numCPU × 100%)
- Memory: 2048 MB (2GB)
- Disk: 5000 MB (5GB)
- Timeout: 3600 seconds (1 hour)
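Expressed in the job-submission format shown above, those defaults correspond to a resource_limits block like this (illustrative; the CPU value depends on the worker, e.g. numCPU × 100% = 800 on an 8-core machine):

```
"resource_limits": {
    "max_cpu_percent": 800,    // numCPU × 100% (example: 8 cores)
    "max_memory_mb": 2048,     // 2GB
    "max_disk_mb": 5000,       // 5GB
    "timeout_sec": 3600        // 1 hour
}
```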
1. CPU Limits (cgroup-based)
- Demonstrates per-job CPU percentage allocation (100% = 1 core)
- Supports cgroup v1 and v2
- Fallback to nice priority without root
2. Memory Limits (cgroup-based)
- Hard memory caps via Linux cgroups
- OOM (Out of Memory) protection
- Automatic process termination if limits exceeded
- Requires root for enforcement
3. Disk Space Monitoring
- Pre-job validation (reject at 95% usage)
- Always enforced (no privileges required)
- Configurable cleanup policies for temporary files
4. Timeout Enforcement
- Per-job timeout with context-based cancellation
- SIGTERM → SIGKILL escalation
- Process group cleanup
5. Process Priority
- Nice value = 10 (lower than system services)
- Always enforced (no privileges required)
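The SIGTERM → SIGKILL escalation in item 4 can be reproduced outside the agent with coreutils timeout, which sends SIGTERM at the deadline and SIGKILL after a grace period. The wrapper function and the 30s grace value below are illustrative, not the agent's actual implementation:

```shell
#!/bin/sh
# Wrap any command with a deadline: SIGTERM at expiry,
# SIGKILL 30 seconds later if the process ignores SIGTERM.
# coreutils timeout exits with 124 when the deadline is hit.
run_with_timeout() {
  deadline="$1"; shift
  timeout --signal=TERM --kill-after=30 "$deadline" "$@"
}

# Illustrative usage (the ffmpeg invocation is an example):
# run_with_timeout 600 ffmpeg -i input.mp4 -c:v libx264 output.mp4
```

The agent implements the same policy internally with context-based cancellation; this standalone form is handy for experimenting with how a given workload reacts to SIGTERM.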
The system exports Prometheus metrics demonstrating:
- Resource usage patterns: CPU, memory, GPU utilization per job
- Job lifecycle: Active jobs, completion rates, latency distribution
- Hardware monitoring: GPU power, temperature (NVIDIA)
- Encoder availability: NVENC, QSV, VAAPI runtime detection
- Bandwidth tracking: Input/output bytes, compression ratios
- SLA classification: Intelligent job categorization (production vs test/debug)
Metrics endpoint: http://worker:9091/metrics
Documentation:
- Auto-Attach Documentation - Process discovery patterns
- Bandwidth Metrics Guide - Bandwidth tracking implementation
- SLA Tracking Guide - Service level monitoring approach
- SLA Classification Guide - Job classification methodology (99.8% compliance with 45K+ jobs)
- Alerting Guide - Prometheus alert configuration
Test Results (45,000+ jobs across 31 scenarios):
- 99.8% SLA compliance observed
- Automatic retry recovers transient failures (network errors, node failures)
- FFmpeg failures terminal (codec errors, format issues)
- Heartbeat-based failure detection (90s timeout, 3 missed heartbeats)
See CODE_VERIFICATION_REPORT.md for implementation validation and docs/SLA_CLASSIFICATION.md for complete testing methodology.
720p Fast Encoding:
"resource_limits": {
"max_cpu_percent": 150, // 1.5 cores
"max_memory_mb": 1024, // 1GB
"timeout_sec": 300 // 5 minutes
}

1080p Standard Encoding:
"resource_limits": {
"max_cpu_percent": 300, // 3 cores
"max_memory_mb": 2048, // 2GB
"timeout_sec": 900 // 15 minutes
}

4K High Quality Encoding:
"resource_limits": {
"max_cpu_percent": 600, // 6 cores
"max_memory_mb": 4096, // 4GB
"timeout_sec": 3600 // 1 hour
}

System requirements:

Minimum (without root):
- Linux kernel 3.10+
- /tmp with 10GB+ free space
- 2GB+ RAM per worker
Recommended (with privileged access):
- Linux kernel 4.5+ (cgroup v2 support)
- /tmp with 50GB+ free space
- 8GB+ RAM per worker
- SSD storage for /tmp
Additional documentation:
- Resource Limits Guide - Configuration reference
- Production Features - Additional hardening patterns
- Troubleshooting - Common issues
For development and experimentation, Docker Compose provides a single-machine setup:
# Clone and start
git clone https://github.com/psantana5/ffmpeg-rtmp.git
cd ffmpeg-rtmp
make up-build
# Submit test jobs
make build-cli
./bin/ffrtmp jobs submit --scenario 1080p-h264 --bitrate 5M --duration 60
# View metrics at http://localhost:3000

Note: Docker Compose is for local testing only. For distributed deployment, see above.
See shared/docs/DEPLOYMENT_MODES.md for deployment comparisons.
Retry Semantics:
- Transport-layer retry only (HTTP requests, heartbeats, polling)
- Exponential backoff: 1s → 30s, max 3 retries
- Context-aware (respects cancellation)
- Job execution never retried (FFmpeg failures terminal)
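The transport-layer policy above can be sketched as a small wrapper: retry on failure with delays of 1s, 2s, 4s, …, capped at 30s, giving up after 3 retries. This is a hedged illustration of the policy, not the Go client's actual code:

```shell
#!/bin/sh
# Retry a command with exponential backoff: 1s initial delay,
# doubling per attempt, capped at 30s, max 3 retries after
# the initial try.
retry_with_backoff() {
  max_retries=3 delay=1 cap=30 attempt=0
  until "$@"; do
    attempt=$((attempt + 1))
    if [ "$attempt" -gt "$max_retries" ]; then
      echo "giving up after $max_retries retries" >&2
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))
    if [ "$delay" -gt "$cap" ]; then delay="$cap"; fi
  done
}

# Illustrative usage: wrap a heartbeat POST (endpoint path assumed)
# retry_with_backoff curl -sf -X POST "https://MASTER_IP:8080/heartbeat" \
#   -H "Authorization: Bearer $MASTER_API_KEY"
```

Note that only transport operations go through this path; a failed FFmpeg run is terminal and is never re-executed.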
Graceful Shutdown:
- Worker: Stop accepting jobs, drain current jobs (30s timeout)
- Master: LIFO shutdown order (HTTP → metrics → scheduler → DB → logger)
- No workload interruption (jobs complete naturally or timeout)
- Async coordination via shutdown.Done() channel
Readiness Checks:
- FFmpeg validation before accepting work
- Disk space verification
- Master connectivity check
- HTTP 200 only when truly ready (Kubernetes-friendly)
Centralized Logging:
- Structured directory: /var/log/ffrtmp/<component>/<subcomponent>.log
- Multi-writer: file + stdout (systemd journald compatible)
- Automatic fallback to ./logs/ without privileges
Documentation:
- Production Readiness Guide - Complete pattern documentation
- Security Review - Security audit
- Audit Summary - Technical debt elimination
Concurrency:
- Workers process multiple jobs simultaneously (--max-concurrent-jobs)
- Hardware-aware configuration tool: ffrtmp config recommend
Reliability:
- TLS/HTTPS enabled by default (auto-generated certificates)
- API authentication via MASTER_API_KEY
- SQLite persistence (jobs survive restarts)
- Automatic retry with exponential backoff
Observability:
- Built-in Prometheus metrics (port 9090)
- Dual engine support (FFmpeg/GStreamer)
See docs/README.md for comprehensive documentation.
Failure Detection:
- Heartbeat-based (90s timeout, 3 missed heartbeats)
- Identifies dead nodes and orphaned jobs
Automatic Reassignment:
- Jobs from failed workers automatically reassigned
- Smart retry for transient failures (network errors, timeouts)
- FFmpeg failures terminal (not retried)
- Max 3 retry attempts with exponential backoff
Stale Job Handling:
- Batch jobs timeout after 30min
- Live jobs timeout after 5min inactivity
Multi-level priorities: Live > High > Medium > Low > Batch
Queue-based scheduling: live, default, batch queues with different SLAs
FIFO within priority: Fair scheduling for same-priority jobs
- TLS/mTLS between master and workers
- API key authentication required
- Certificate auto-generation support
# Example: Submit high-priority job
./bin/ffrtmp jobs submit \
--scenario live-4k \
--queue live \
--priority high \
--duration 3600
# Configure master
./bin/master \
--port 8080 \
--max-retries 5 \
--scheduler-interval 10s \
--api-key "$MASTER_API_KEY"
# Configure worker
./bin/agent \
--master https://MASTER_IP:8080 \
--max-concurrent-jobs 4 \
--poll-interval 3s \
--heartbeat-interval 30s

See docs/README.md for complete implementation details.
Demonstrates engine selection patterns for different workload characteristics:
- FFmpeg (default): General-purpose file transcoding
- GStreamer: Low-latency live streaming
- Auto-selection: System chooses based on workload type
- Hardware acceleration: NVENC, QSV, VAAPI support for both
# Auto-select engine (default)
./bin/ffrtmp jobs submit --scenario live-stream --engine auto
# Force specific engine
./bin/ffrtmp jobs submit --scenario transcode --engine ffmpeg
./bin/ffrtmp jobs submit --scenario live-rtmp --engine gstreamer

Auto-selection logic:
- LIVE queue → GStreamer (low latency)
- FILE/batch → FFmpeg (better for offline)
- RTMP streaming → GStreamer
- GPU+NVENC+streaming → GStreamer
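Most of those rules reduce to a simple decision table on queue and output type; a shell sketch of that logic (illustrative only — the real selection happens inside the Go agent, and the GPU+NVENC rule is omitted here for brevity):

```shell
#!/bin/sh
# Decision-table sketch of engine auto-selection:
# LIVE queue or RTMP output -> GStreamer (low latency),
# everything else (file/batch) -> FFmpeg (better offline).
select_engine() {
  queue="$1" output="$2"
  case "$queue:$output" in
    live:*) echo gstreamer ;;   # LIVE queue -> GStreamer
    *:rtmp) echo gstreamer ;;   # RTMP streaming -> GStreamer
    *)      echo ffmpeg ;;      # FILE/batch -> FFmpeg
  esac
}
```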
See docs/DUAL_ENGINE_SUPPORT.md for details.
This reference system can be used to study:
- Distributed coordination: Master-worker patterns, state machine guarantees, failure detection
- Resource management: CPU/memory limits, cgroup isolation, graceful degradation
- Retry semantics: Transient vs terminal failures, exponential backoff, idempotent operations
- Observability patterns: Metrics collection, distributed tracing, structured logging
- Energy efficiency: Power consumption during video transcoding (Intel RAPL)
- Workload scaling: Performance characteristics across multiple nodes
Master-worker architecture demonstrating coordination patterns:
- Master Node: Job orchestration, failure detection, metrics aggregation
- HTTP API (Go)
- VictoriaMetrics (30-day retention)
- Grafana (visualization)
- Worker Nodes: Job execution, resource monitoring, heartbeat reporting
- Hardware auto-detection
- Pull-based job polling
- Local metrics collection
- Result reporting
Docker Compose stack for experimentation:
- Nginx RTMP (streaming server)
- VictoriaMetrics (time-series database)
- Grafana (dashboards)
- Go Exporters (CPU/GPU metrics via RAPL/NVML)
- Python Exporters (QoE metrics, analysis)
- Alertmanager (alert routing)
See shared/docs/DEPLOYMENT_MODES.md for architecture diagrams.
Primary documentation: docs/README.md - Complete reference guide
- Configuration Tool - Hardware-aware worker configuration
- Concurrent Jobs Guide - Parallel job processing
- Job Launcher Script - Batch job submission
- Deployment Success Report - Real-world deployment case study
- Dual Engine Support - FFmpeg + GStreamer selection patterns
- Production Features - Reliability patterns (TLS, auth, retry, metrics)
- Deployment Modes - Architecture comparison
- Internal Architecture - Runtime model and operations
- Distributed Architecture - Master-worker coordination
- Production Deployment - Systemd service configuration
- Getting Started Guide - Initial setup
- Running Tests - Test scenarios and execution
- Go Exporters Quick Start - Metrics collection setup
- Troubleshooting - Common issues
- Architecture Overview - System design and data flow
- Exporters Quick Reference - Metrics collection patterns
- Exporters Overview - Master-side metrics
- Master Exporters Deployment - Master metrics setup
- Worker Exporters - Worker-side metrics
- Worker Exporters Deployment - Worker metrics setup
- Energy Advisor - ML-based efficiency analysis
- Documentation Index - Complete technical documentation
# Build components
make build-master # Build master node binary
make build-agent # Build worker agent binary
make build-cli # Build ffrtmp CLI tool
make build-distributed # Build all
# Get hardware-aware configuration
./bin/ffrtmp config recommend --environment production --output text
# Run services
./bin/master --port 8080 --api-key "$MASTER_API_KEY"
./bin/agent --register --master https://MASTER_IP:8080 \
--api-key "$MASTER_API_KEY" \
--max-concurrent-jobs 4 \
--insecure-skip-verify
# Submit and manage jobs
./bin/ffrtmp jobs submit --scenario 1080p-h264 --bitrate 5M --duration 300
./bin/ffrtmp jobs status <job-id>
./bin/ffrtmp nodes list
# Systemd service management
sudo systemctl start ffmpeg-master
sudo systemctl start ffmpeg-agent
sudo systemctl status ffmpeg-master
# Monitor and observe
curl -k https://localhost:8080/nodes # List registered workers
curl -k https://localhost:8080/jobs # List jobs
curl http://localhost:9090/metrics # Prometheus metrics
journalctl -u ffmpeg-master -f # View master logs
journalctl -u ffmpeg-agent -f # View worker logs

# Stack management
make up-build # Start Docker Compose stack
make down # Stop stack
make ps # Show container status
make logs SERVICE=victoriametrics # View service logs
# Testing scenarios
make test-single # Run single stream test
make test-batch # Run batch test matrix
make run-benchmarks # Run benchmark suite
make analyze # Analyze results
# Development tools
make lint # Run linting
make format # Format code
make test # Run test suite

Observe job reassignment after worker failure:
# Submit long-running jobs
./bin/ffrtmp jobs submit --scenario 4K-h265 --bitrate 15M --duration 3600
./bin/ffrtmp jobs submit --scenario 1080p-h264 --bitrate 5M --duration 1800
# Monitor initial assignment
curl -k https://master:8080/jobs
# Kill a worker mid-job (simulate failure)
sudo systemctl stop ffmpeg-agent # On worker node
# Observe master detecting failure (90s timeout)
# Watch job reassignment to healthy workers
curl -k https://master:8080/jobs # Check job state transitions
# Analyze recovery time and behavior
journalctl -u ffmpeg-master -f

Observations to study:
- Heartbeat failure detection timing (3 × 30s = 90s)
- Job state transitions (running → failed → queued)
- Reassignment latency
- Worker re-registration behavior
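One way to watch those state transitions is to poll /jobs and reduce each entry to an id/state pair. A jq-based sketch — the field names id and state are assumptions about the response's JSON shape, so adjust them to the actual API:

```shell
#!/bin/sh
# Read a /jobs JSON array on stdin and print one "id state" line
# per job. Field names (id, state) are assumed, not the documented
# schema.
summarize_jobs() {
  jq -r '.[] | "\(.id) \(.state)"'
}

# Illustrative poll loop against a live master:
# while true; do
#   curl -sk -H "Authorization: Bearer $MASTER_API_KEY" \
#     https://MASTER_IP:8080/jobs | summarize_jobs
#   sleep 5
# done
```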
Test cgroup-based resource limits under contention:
# Submit multiple jobs with different CPU limits
./bin/ffrtmp jobs submit --scenario 1080p-h264 --duration 600 \
--cpu-limit 200 # 2 cores
./bin/ffrtmp jobs submit --scenario 1080p-h264 --duration 600 \
--cpu-limit 100 # 1 core
# Monitor actual CPU usage via Prometheus metrics
curl http://worker:9091/metrics | grep cpu_usage
# Compare observed vs requested CPU allocation
# Study cgroup enforcement effectiveness

Compare codec energy consumption patterns:
# Start local development stack
make up-build && make build-cli
# Test H.264 codec
./bin/ffrtmp jobs submit --scenario 4K60-h264 --bitrate 10M --duration 120
./bin/ffrtmp jobs submit --scenario 1080p60-h264 --bitrate 5M --duration 60
# Test H.265 codec
./bin/ffrtmp jobs submit --scenario 4K60-h265 --bitrate 10M --duration 120
./bin/ffrtmp jobs submit --scenario 1080p60-h265 --bitrate 5M --duration 60
# Analyze energy consumption via RAPL metrics
python3 scripts/analyze_results.py
# View power consumption dashboards
# Open Grafana at http://localhost:3000

Deploy distributed mode with agents on your build servers:
# CI/CD pipeline submits jobs to master after each release
curl -X POST https://master:8080/jobs \
-H "Authorization: Bearer $MASTER_API_KEY" \
-H "Content-Type: application/json" \
-d @benchmark_config.json
# Results automatically aggregated and visualized
# Alerts fire if performance regressions detected

Contributions are welcome! See the detailed documentation for development guidelines.
See LICENSE file for details.
The project includes comprehensive test coverage for critical components:
# Run all tests with race detector
cd shared/pkg
go test -v -race ./...
# Run tests with coverage report
go test -v -coverprofile=coverage.out ./models ./scheduler ./store
go tool cover -html=coverage.out

Test Coverage:
- models: 85% (FSM state machine fully tested)
- scheduler: 53% (priority queues, recovery logic)
- store: Comprehensive database operations tests
- agent: Engine selection, optimizers, encoders
CI/CD:
- Automated testing on every push
- Race condition detection
- Multi-architecture builds (amd64, arm64)
- Binary artifacts for master, worker, and CLI
See CONTRIBUTING.md for testing guidelines.
Core documentation has been streamlined for clarity:
- docs/README.md - Complete system documentation (NEW)
- docs/CONFIGURATION_TOOL.md - Hardware-aware config tool
- CONCURRENT_JOBS_IMPLEMENTATION.md - Parallel processing guide
- QUICKSTART.md - Get started in 5 minutes
- docs/ARCHITECTURE.md - System design and architecture
- DEPLOYMENT.md - Production deployment guide
- CONTRIBUTING.md - Contribution guidelines
- docs/LOCAL_STACK_GUIDE.md - Local development setup
- CHANGELOG.md - Version history
Additional technical documentation is available in docs/archive/ for reference.