Scaling & performance
Jutsu scales each tier independently: stateless APIs scale out on CPU utilization as request load rises, agent workers scale on queue depth, and OpenSearch absorbs alert volume through per-organization weekly indices. This page explains how each tier scales and where the practical limits sit.
How each tier scales
The architecture separates request handling from event processing, so the two scale on different signals.
| Tier | Scaling signal | Mechanism |
|---|---|---|
| Platform & ingest APIs | CPU utilization | Stateless — add replicas behind the load balancer (HorizontalPodAutoscaler). |
| Agent workers | RabbitMQ queue depth | KEDA adds replicas as a stage's queue grows, then removes them as it drains. |
| OpenSearch | Alert volume | Per-organization weekly indices spread load; size the cluster to ingest rate and retention. |
| PostgreSQL | Transactional load | Vertical sizing plus read replicas as the dashboard fan-out grows. |
Because the APIs hold no per-request state, scaling them out is safe and immediate. Because workers scale on queue depth rather than CPU, the pipeline self-balances: a stage that falls behind grows its queue, which adds workers until the backlog drains.
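The two scaling rules in the table can be sketched as plain arithmetic. For CPU, Kubernetes' HorizontalPodAutoscaler computes roughly `ceil(currentReplicas × currentMetric / targetMetric)`; for queue depth, KEDA feeds the queue length to an HPA, which works out to roughly `ceil(queueLength / perReplicaTarget)`. The sketch below is illustrative only: the function names, the replica bounds, and the per-replica target of 100 messages are hypothetical, and the real controllers add tolerance bands and stabilization windows that are omitted here.

```python
import math


def hpa_cpu_replicas(current_replicas: int, current_cpu_pct: float,
                     target_cpu_pct: float, min_replicas: int = 1,
                     max_replicas: int = 20) -> int:
    """CPU rule for the stateless APIs: ceil(current * observed / target), clamped.

    Core HorizontalPodAutoscaler ratio only; tolerance and stabilization are omitted.
    Replica bounds are hypothetical defaults.
    """
    desired = math.ceil(current_replicas * current_cpu_pct / target_cpu_pct)
    return max(min_replicas, min(max_replicas, desired))


def queue_depth_replicas(queue_length: int, target_per_replica: int,
                         min_replicas: int = 0, max_replicas: int = 50) -> int:
    """Queue rule for agent workers: enough replicas that each owns roughly
    `target_per_replica` waiting messages, clamped to hypothetical bounds."""
    desired = math.ceil(queue_length / target_per_replica)
    return max(min_replicas, min(max_replicas, desired))


# A stage that falls behind grows its queue, which raises its replica count;
# as the backlog drains, the same rule shrinks the stage back down.
for backlog in (0, 150, 1_200, 100_000, 40):
    print(backlog, "->", queue_depth_replicas(backlog, target_per_replica=100))
```

Raising `target_per_replica` makes a stage scale more conservatively; lowering it adds workers sooner. This is the same trade-off as tuning the per-replica queue-length target under sizing guidance.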
Weekly index rollover
Each organization's events roll over into a fresh OpenSearch index every week (`org-{organizationId}-{year}-w{week}-events`). Rollover keeps any single index bounded, so query latency stays predictable as total data grows, and aging out an old week is an index drop rather than a bulk delete. Reads span an organization's weeks through a wildcard pattern.
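A minimal sketch of the naming and retention logic follows. It assumes that `{year}` and `{week}` are ISO-8601 year and week numbers with an unpadded week, and that the read wildcard has the form `org-{organizationId}-*-events`; the page states neither, so treat both as assumptions. The helper names are hypothetical.

```python
import re
from datetime import date, timedelta
from typing import List


def weekly_index(org_id: str, day: date) -> str:
    """Index holding an organization's events for the week containing `day`.

    Assumes ISO-8601 week numbering and an unpadded week number.
    """
    iso_year, iso_week, _ = day.isocalendar()
    return f"org-{org_id}-{iso_year}-w{iso_week}-events"


def read_pattern(org_id: str) -> str:
    """Wildcard spanning every weekly index of one organization (assumed form)."""
    return f"org-{org_id}-*-events"


def expired_indices(org_id: str, index_names: List[str], today: date,
                    retention_weeks: int) -> List[str]:
    """Weekly indices whose entire week ends before the retention cutoff.

    Each of these can be dropped whole instead of bulk-deleting documents.
    """
    cutoff = today - timedelta(weeks=retention_weeks)
    pattern = re.compile(rf"org-{re.escape(org_id)}-(\d{{4}})-w(\d{{1,2}})-events")
    expired = []
    for name in index_names:
        match = pattern.fullmatch(name)
        if match is None:
            continue  # another organization's index, or an unrelated name
        year, week = int(match.group(1)), int(match.group(2))
        week_start = date.fromisocalendar(year, week, 1)  # Monday of that ISO week
        if week_start + timedelta(days=7) <= cutoff:
            expired.append(name)
    return expired


print(weekly_index("42", date(2024, 12, 30)))  # org-42-2025-w1-events
```

The last line shows why the ISO year matters: 2024-12-30 falls in ISO week 1 of 2025, so an index named with the calendar year instead would mix two different weeks under one name.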
Rate limiting
The platform API applies a per-IP rate limit sized for normal dashboard polling, with tighter limits on credential-bearing auth endpoints. Service-to-service traffic and high-volume webhook ingest are intentionally exempt: Jutsu does not rate-limit SIEM forwarders, so ingest capacity is set by throughput and autoscaling rather than by request caps.
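The page does not say which algorithm enforces the limit. The sketch below assumes a token bucket per client IP and tier, with a tighter tier for auth routes and exempt routes that bypass the limiter. The limits, route prefixes, and class name are all hypothetical.

```python
import time
from typing import Callable, Dict, Optional, Tuple

# Hypothetical limits per tier: (burst capacity, tokens refilled per second).
LIMITS: Dict[str, Tuple[float, float]] = {
    "default": (60.0, 2.0),  # dashboard polling
    "auth": (5.0, 0.1),      # credential-bearing endpoints
}
# Hypothetical route prefixes that bypass the limiter entirely.
EXEMPT_PREFIXES = ("/ingest/", "/internal/")
AUTH_PREFIX = "/auth/"


class PerIpRateLimiter:
    """Token bucket per (client IP, tier); state is local to this process."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # (ip, tier) -> (tokens remaining, time of last refill)
        self._buckets: Dict[Tuple[str, str], Tuple[float, float]] = {}

    @staticmethod
    def tier_for(path: str) -> Optional[str]:
        if path.startswith(EXEMPT_PREFIXES):
            return None  # service-to-service and webhook ingest are never throttled
        if path.startswith(AUTH_PREFIX):
            return "auth"
        return "default"

    def allow(self, ip: str, path: str) -> bool:
        tier = self.tier_for(path)
        if tier is None:
            return True
        capacity, refill_per_sec = LIMITS[tier]
        now = self._clock()
        tokens, last = self._buckets.get((ip, tier), (capacity, now))
        # Refill for the time elapsed since the last request, up to the burst cap.
        tokens = min(capacity, tokens + (now - last) * refill_per_sec)
        allowed = tokens >= 1.0
        if allowed:
            tokens -= 1.0
        self._buckets[(ip, tier)] = (tokens, now)
        return allowed
```

Because this state lives in one process, each API replica would keep its own buckets, and the effective per-IP limit would grow with the replica count. Keeping the counters in a shared store avoids that.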
Where bottlenecks appear
Capacity pressure tends to surface in two places before anywhere else.
- Ingest throughput. The ingest API and normalizer set the ceiling on events per second. If forwarders outrun the normalizer, the normalizer queue grows — the signal that drives worker autoscaling. Watch queue depth here first.
- LLM latency and cost. Triage, enrichment, correlation, and reporting call external LLM providers, so per-alert latency and spend track model choice and call volume. Cached step results (Redis, with an in-process fallback) cut repeat calls; provider rate limits (HTTP 429) are handled with exponential-backoff retries.
As retention grows, OpenSearch query latency is the next thing to watch; weekly rollover is designed to contain it.
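The caching and retry behavior described for LLM calls can be sketched as follows. This is not Jutsu's implementation. `RateLimited` stands in for whatever error a provider client raises on HTTP 429, and `RemoteCache` is a hypothetical adapter around a Redis client that raises the built-in `ConnectionError` when Redis is unreachable. The backoff uses exponential delays with full jitter, and its parameter values are assumptions.

```python
import random
import time
from typing import Callable, Dict, Optional, Protocol, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error an LLM client wrapper raises on HTTP 429."""


class RemoteCache(Protocol):
    """Hypothetical adapter around a Redis client; raises ConnectionError when unreachable."""

    def get(self, key: str) -> Optional[str]: ...
    def set(self, key: str, value: str) -> None: ...


class StepCache:
    """Caches step results remotely, falling back to an in-process dict."""

    def __init__(self, remote: Optional[RemoteCache] = None) -> None:
        self._remote = remote
        self._local: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        if self._remote is not None:
            try:
                return self._remote.get(key)
            except ConnectionError:
                pass  # remote cache down: use the local fallback
        return self._local.get(key)

    def set(self, key: str, value: str) -> None:
        if self._remote is not None:
            try:
                self._remote.set(key, value)
                return
            except ConnectionError:
                pass
        self._local[key] = value


def call_with_backoff(call: Callable[[], T], max_retries: int = 5,
                      base_delay: float = 0.5, max_delay: float = 30.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry `call` on RateLimited, sleeping a random time up to base_delay * 2**attempt."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RateLimited:
            if attempt == max_retries:
                raise  # out of retries: surface the 429 to the caller
            sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))
    raise AssertionError("unreachable")


def cached_step(cache: StepCache, key: str, call: Callable[[], str]) -> str:
    """Return a cached step result, or call the provider (with retries) and cache it."""
    hit = cache.get(key)
    if hit is not None:
        return hit
    result = call_with_backoff(call)
    cache.set(key, result)
    return result
```

Jitter spreads retries from many workers over time. Without it, workers that hit a provider rate limit together would retry together and hit it again.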
Sizing guidance
Start from your event rate and retention window, then size each tier to it.
- APIs — scale on observed CPU; they are cheap to replicate. Leave headroom for dashboard polling bursts.
- Workers — let KEDA set replica counts from queue depth rather than pinning them. Tune the per-replica queue-length target if a stage chronically lags or over-provisions.
- OpenSearch — size storage to `events/day × retention` and shards to weekly index volume.
- LLM budget — model per-alert cost against expected alert volume; lean on caching and pick smaller models for high-volume stages.
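The sizing rules above can be turned into rough arithmetic. The sketch multiplies `events/day × retention` by an assumed average stored event size and by the number of full copies (one primary plus the replica count). It divides a week of primary data by an assumed target shard size of 30 GB, and it models LLM spend as calls per alert, reduced by the cache hit rate. All inputs, defaults, and function names are hypothetical planning values. The sketch ignores indexing overhead and free-disk headroom, so add margin for both.

```python
import math


def opensearch_storage_gb(events_per_day: int, avg_event_bytes: int,
                          retention_days: int, replicas: int = 1) -> float:
    """events/day x retention x event size, times (1 + replicas) full copies."""
    primary_bytes = events_per_day * retention_days * avg_event_bytes
    return primary_bytes * (1 + replicas) / 1e9


def primary_shards_per_weekly_index(events_per_day: int, avg_event_bytes: int,
                                    target_shard_gb: float = 30.0) -> int:
    """Primary shards needed so one week of data stays near the target shard size."""
    weekly_gb = events_per_day * 7 * avg_event_bytes / 1e9
    return max(1, math.ceil(weekly_gb / target_shard_gb))


def llm_cost_per_day(alerts_per_day: int, calls_per_alert: float,
                     cost_per_call: float, cache_hit_rate: float = 0.0) -> float:
    """Expected daily LLM spend; cache hits skip the provider call."""
    return alerts_per_day * calls_per_alert * (1 - cache_hit_rate) * cost_per_call


# Hypothetical workload: 50M events/day at ~1 KB each, kept for 90 days.
print(opensearch_storage_gb(50_000_000, 1_000, 90))           # 9000.0 GB with one replica
print(primary_shards_per_weekly_index(50_000_000, 1_000))     # 12 (350 GB per week / 30 GB)
print(llm_cost_per_day(2_000, 4, 0.002, cache_hit_rate=0.25))
```

Because each week gets its own index, the shard count is set from one week of volume rather than from total retention. Longer retention adds indices instead of enlarging shards.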