Scaling & performance

Jutsu scales each tier independently: stateless APIs scale on CPU utilization under request load, agent workers scale on queue depth, and OpenSearch absorbs alert volume through per-organization weekly indices. This page explains how each tier scales and where the practical limits sit.

How each tier scales

The architecture separates request handling from event processing, so the two scale on different signals.

Tier	Scaling signal	Mechanism
Platform & ingest APIs	CPU utilization	Stateless — add replicas behind the load balancer (HorizontalPodAutoscaler).
Agent workers	RabbitMQ queue depth	KEDA adds replicas as a stage's queue grows, then removes them as it drains.
OpenSearch	Alert volume	Per-organization weekly indices spread load; size the cluster to ingest rate and retention.
PostgreSQL	Transactional load	Vertical sizing plus read replicas as the dashboard fan-out grows.

Because the APIs hold no per-request state, scaling them out is safe and immediate. Because workers scale on queue depth rather than CPU, the pipeline self-balances: a stage that falls behind grows its queue, which adds workers until the backlog drains.
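The self-balancing behaviour can be illustrated with the arithmetic a queue-length scaler typically applies: divide the backlog by a per-replica target and clamp the result. This is a minimal sketch, not Jutsu's configuration. The function name, the target of 50 messages per replica, and the replica bounds are all assumptions chosen for illustration.

```python
import math


def desired_replicas(queue_depth: int, target_per_replica: int,
                     min_replicas: int = 1, max_replicas: int = 20) -> int:
    """Replica count a queue-length scaler would aim for (illustrative)."""
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    # One replica per `target_per_replica` queued messages, rounded up.
    wanted = math.ceil(queue_depth / target_per_replica)
    # Clamp to the configured bounds (hypothetical values here).
    return max(min_replicas, min(max_replicas, wanted))


if __name__ == "__main__":
    # A backlog builds and then drains: replicas follow the queue, not CPU.
    for depth in (0, 40, 450, 5000, 120):
        print(depth, desired_replicas(depth, target_per_replica=50))
```

With these assumed values, a backlog of 450 messages asks for 9 replicas, and a spike to 5000 is capped at the maximum of 20 until the queue drains.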

Weekly index rollover

Each organization's events roll over into a fresh OpenSearch index every week (org-{organizationId}-{year}-w{week}-events). Rollover keeps any single index bounded, so query latency stays predictable as total data grows, and aging out an old week is an index drop rather than a bulk delete. Reads span an organization's weeks through a wildcard pattern.
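As a rough illustration of the naming scheme, the sketch below builds an index name and the matching read pattern. It assumes ISO-8601 week numbering and an unpadded week number; neither is confirmed by this page, so check both against the indices in your own cluster. The helper names are hypothetical.

```python
from datetime import date


def events_index(organization_id: str, day: date) -> str:
    """Index name for the week containing `day` (ISO week numbering assumed)."""
    iso = day.isocalendar()
    iso_year, iso_week = iso[0], iso[1]
    # Week number is left unpadded here; that is an assumption.
    return f"org-{organization_id}-{iso_year}-w{iso_week}-events"


def events_pattern(organization_id: str) -> str:
    """Wildcard pattern covering all of an organization's weekly indices."""
    return f"org-{organization_id}-*-events"


if __name__ == "__main__":
    print(events_index("acme", date(2025, 3, 12)))
    print(events_pattern("acme"))
```

Note that a pattern like this would also match another organization whose ID begins with the same string followed by a hyphen, so it is only safe when IDs cannot be prefixes of one another (fixed-length UUIDs, for example).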

Rate limiting

The platform API applies a per-IP rate limit sized for normal dashboard polling, with tighter limits on credential-bearing auth endpoints. Service-to-service traffic and high-volume webhook ingest are intentionally exempt: Jutsu does not rate-limit SIEM forwarders, so ingest is bounded by pipeline throughput rather than by a request cap.

Rate-limit windows and per-IP thresholds are configurable per deployment. Confirm the limits and their backing store against your configuration.
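To make the policy concrete, here is a minimal sketch of a per-IP fixed-window limiter with a tighter limit for auth endpoints and a bypass for exempt traffic. It is not Jutsu's implementation: the algorithm, the in-memory store, the thresholds, and the `/auth/` path prefix are all assumptions for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowLimiter:
    """Per-IP fixed-window counter (illustrative; in-memory only)."""

    def __init__(self, limit: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self._windows: Dict[str, Tuple[float, int]] = {}  # ip -> (start, count)

    def allow(self, ip: str) -> bool:
        now = self.clock()
        entry = self._windows.get(ip)
        # Start a fresh window on first sight or once the old one has expired.
        if entry is None or now - entry[0] >= self.window:
            entry = (now, 0)
        start, count = entry
        if count >= self.limit:
            return False
        self._windows[ip] = (start, count + 1)
        return True


# Hypothetical thresholds: general dashboard traffic vs. auth endpoints.
general = FixedWindowLimiter(limit=120, window_seconds=60)
auth = FixedWindowLimiter(limit=10, window_seconds=60)


def is_allowed(ip: str, path: str, exempt: bool = False) -> bool:
    """Exempt traffic (service-to-service, webhook ingest) skips the limiter."""
    if exempt:
        return True
    limiter = auth if path.startswith("/auth/") else general
    return limiter.allow(ip)
```

An in-memory counter only limits per process. With several API replicas behind a load balancer, the counters would need a shared backing store to enforce a single per-IP limit.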

Where bottlenecks appear

Capacity pressure tends to surface in two places before anywhere else.

Ingest throughput. The ingest API and normalizer set the ceiling on events per second. If forwarders outrun the normalizer, the normalizer queue grows — the signal that drives worker autoscaling. Watch queue depth here first.
LLM latency and cost. Triage, enrichment, correlation, and reporting call external LLM providers, so per-alert latency and spend track model choice and call volume. Cached step results (Redis, with an in-process fallback) cut repeat calls; provider rate limits (HTTP 429) are handled with exponential-backoff retries.
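The caching and retry behaviour described for LLM calls can be sketched as follows. This is an illustration under stated assumptions, not the pipeline's code: `RateLimited` stands in for a provider HTTP 429, a plain dict stands in for the Redis or in-process cache, and the delay values are hypothetical.

```python
import time
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Stand-in for a provider response with HTTP 429."""


def call_with_backoff(call: Callable[[], T], max_attempts: int = 5,
                      base_delay: float = 0.5, max_delay: float = 30.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry `call` on RateLimited, doubling the wait after each failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(min(max_delay, base_delay * 2 ** attempt))
    raise AssertionError("unreachable")


def cached_step(cache: Dict[str, T], key: str, compute: Callable[[], T]) -> T:
    """Return a cached step result, computing and storing it on a miss."""
    if key in cache:
        return cache[key]
    value = compute()
    cache[key] = value
    return value
```

Wrapping the provider call as `cached_step(cache, key, lambda: call_with_backoff(provider_call))` means a cache hit skips both the call and any retry delay. Production retry loops often add random jitter to the delay so that many workers do not retry in lockstep; it is omitted here to keep the sketch deterministic.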

OpenSearch query latency is the next thing to watch as retention grows, which is what weekly rollover is designed to contain.

Sizing guidance

Start from your event rate and retention window, then size each tier to it.

APIs: scale on observed CPU; they are cheap to replicate. Leave headroom for dashboard polling bursts.
Workers: let KEDA set replica counts from queue depth rather than pinning them. Tune the per-replica queue-length target if a stage chronically lags or over-provisions.
OpenSearch: size storage to events/day × average event size × retention (plus replica copies), and size shards to weekly index volume.
LLM budget: model per-alert cost against expected alert volume; lean on caching and pick smaller models for high-volume stages.
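The sizing rules above reduce to a few lines of arithmetic. The sketch below is a back-of-envelope aid, not a capacity model. Every input is an assumption to replace with your own measurements: average event size, replica count, the 10% indexing overhead, the 30 GB shard-size target, and the per-call LLM cost.

```python
import math


def storage_gb(events_per_day: float, avg_event_bytes: float,
               retention_days: int, replicas: int = 1,
               overhead: float = 1.1) -> float:
    """Approximate OpenSearch storage: primaries plus replicas plus overhead."""
    primary_bytes = events_per_day * avg_event_bytes * retention_days
    return primary_bytes * (1 + replicas) * overhead / 1e9


def weekly_index_gb(events_per_day: float, avg_event_bytes: float) -> float:
    """Primary size of one weekly index."""
    return events_per_day * 7 * avg_event_bytes / 1e9


def shards_for(weekly_gb: float, target_shard_gb: float = 30.0) -> int:
    """Primary shards so each stays near the (assumed) target shard size."""
    return max(1, math.ceil(weekly_gb / target_shard_gb))


def llm_daily_cost(alerts_per_day: float, calls_per_alert: float,
                   cost_per_call: float, cache_hit_rate: float = 0.0) -> float:
    """Daily LLM spend after discounting calls served from cache."""
    return alerts_per_day * calls_per_alert * (1 - cache_hit_rate) * cost_per_call


if __name__ == "__main__":
    # Hypothetical workload: 5M events/day at ~1.5 KB each, 90-day retention.
    print(round(storage_gb(5_000_000, 1500, 90)), "GB total")
    week = weekly_index_gb(5_000_000, 1500)
    print(round(week, 1), "GB per weekly index,", shards_for(week), "shards")
    print(round(llm_daily_cost(10_000, 4, 0.002, cache_hit_rate=0.25), 2), "per day")
```

Because indices are per organization, run the weekly-index and shard figures per organization rather than on the fleet total. Many small tenants can produce far more shards than the aggregate volume suggests.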

Jutsu does not publish fixed throughput numbers — capacity depends on your event mix, model choices, and managed-service tiers. Load-test with representative traffic and confirm against your deployment.
