Under the hood

What reasons, what doesn't, and what gets gated.

"AI-powered" tells you nothing. The question that matters is which parts of the system are allowed to improvise. Here is our answer, published, because technical evaluators deserve better than a diagram with clouds on it.

The principle

Creative where judgment helps. Boring where it doesn't.

Investigation benefits from reasoning; execution must not. Soarcery splits every run across three layers with a hard line between them.

Deterministic

The parts that must never improvise

Pure logic with no model in the loop: same inputs, same outputs, every time.

  • Parsing your plain-language instructions into ordered, numbered steps.
  • Resolving which tool connections a step may touch, with least privilege.
  • Correlating alerts to open inquiries by asset, before any investigation.
  • Verifying that tool calls actually succeeded: an agent cannot report success over a failed call; the run is marked errored instead.
  • Loop and recursion guards, cancellation, and timeouts on everything.
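
As an illustration of least-privilege resolution, here is a minimal sketch, not our implementation. The `Step` type, the `allowed_tools` field, and the connection names are all hypothetical; the point is only that a step's allowlist is intersected with the configured connections by plain code, with no model involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    number: int
    instruction: str
    allowed_tools: frozenset[str]  # hypothetical per-step allowlist


def resolve_tools(step: Step, connections: dict[str, str]) -> dict[str, str]:
    """Return only the connections this step is allowed to touch."""
    # Sorted iteration keeps the result order stable from run to run.
    return {
        name: connections[name]
        for name in sorted(step.allowed_tools)
        if name in connections
    }


# Hypothetical connection registry, for illustration only.
connections = {
    "threat_intel": "conn-ti",
    "siem": "conn-siem",
    "edr": "conn-edr",
}
step = Step(3, "Look up the sender domain", frozenset({"threat_intel"}))
granted = resolve_tools(step, connections)
```

In this sketch, `granted` holds the `threat_intel` connection and nothing else, so the agent working step 3 never sees the `siem` or `edr` connections.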

Reasoning

The part that thinks

A frontier-model agent works each step with the tools that step allows, and nothing else.

  • Reads the alert, the step instruction, your runbooks, and prior step results.
  • Chooses which lookups to run and weighs where sources disagree, not just the score.
  • Writes conclusions in plain language with confidence stated and every claim cited to a tool result.
  • Its full output, token usage, and per-call cost are recorded on the run. Nothing it says is unattributable.
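
One way to picture "every claim cited to a tool result" is a check like the one below. It is a sketch under assumptions: the `Claim` type, the citation ids, and the example claims are invented for illustration and do not describe our actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    text: str
    citations: list[str] = field(default_factory=list)  # ids of tool results


def uncited_claims(claims: list[Claim], tool_result_ids: set[str]) -> list[Claim]:
    """Return claims that cite nothing, or cite a result that is not on the run."""
    return [
        c for c in claims
        if not c.citations
        or any(cid not in tool_result_ids for cid in c.citations)
    ]


# Illustrative data only.
results_on_run = {"call-1", "call-2"}
claims = [
    Claim("Sender domain was registered recently.", ["call-1"]),
    Claim("The attachment is malicious."),            # no citation
    Claim("The user clicked the link.", ["call-9"]),  # cites a missing result
]
rejected = uncited_claims(claims, results_on_run)
```

Only the first claim survives here; the other two are returned as unattributable.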

Gated

The part you control

Consequential actions stop at the approval queue. The line between autonomous and gated is yours, per use case.

  • Risk-rated approvals with the evidence, blast radius, and expiry attached.
  • Approve and reject are both first-class outcomes; both are recorded with identity.
  • Reversible actions preferred by design; irreversible ones labelled and held to a higher bar.
  • Mid-run questions pause the run until a human answers; the run then resumes exactly where it stopped.
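
To make the gate concrete, here is a minimal sketch of an approval request that carries a risk rating and expiry and records who decided. The `Approval` class, its field names, and the risk scale are assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Approval:
    action: str
    risk: str                # hypothetical scale: "low", "medium", "high"
    reversible: bool
    expires_at: datetime
    decision: Optional[str] = None       # "approved" or "rejected"
    decided_by: Optional[str] = None
    decided_at: Optional[datetime] = None

    def decide(self, decision: str, identity: str, now: datetime) -> None:
        """Record an approve or reject decision together with who made it."""
        if decision not in ("approved", "rejected"):
            raise ValueError("decision must be 'approved' or 'rejected'")
        if self.decision is not None:
            raise RuntimeError("already decided")
        if now >= self.expires_at:
            raise TimeoutError("approval request has expired")
        self.decision, self.decided_by, self.decided_at = decision, identity, now

    def may_execute(self) -> bool:
        # Only an explicit approval unlocks the action.
        return self.decision == "approved"


now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
request = Approval("isolate host", "high", True, now + timedelta(hours=1))
request.decide("rejected", "analyst@example.com", now)
```

A rejection is stored exactly like an approval, identity and timestamp included, and an undecided or expired request never executes.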

What we log

If it happened, it's on the trail.

Every run produces a complete, replayable record. This is not a debug feature. It is the product's spine: the same trail the agent builds while working is the one you hand to an auditor.

  • Every step: input, output, status, and timing.
  • Every tool call: parameters, result, latency, and cost.
  • Every model interaction: tokens in, tokens out, accounted per run.
  • Every human decision: who approved or rejected what, and when.

run      01J9Z3 · spell: phishing-triage
step 3   threat_intel.lookup · 640ms · $0.0003 · ok
step 4   siem.search · 1.2s · ok
finding  confidence: high · evidence: 4 citations
gate     AP-2031 · approved · identity recorded
actions  3 · reversible: 3 · status: complete
tokens   2,118 · cost accounted · trail exportable

A run record, abridged. The tour shows the full version.
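
For evaluators who think in data structures, the tool-call lines of a record like this could be modeled roughly as follows. This is a sketch: the `ToolCallRecord` fields and the JSON Lines export are assumptions, and only the two tool calls shown in the abridged record are used as sample values.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ToolCallRecord:
    step: int
    tool: str
    latency_ms: int
    cost_usd: Optional[float]  # None when the abridged record shows no cost
    status: str


def export_trail(records: list[ToolCallRecord]) -> str:
    """Serialize the trail as one JSON object per line, with stable key order."""
    return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in records)


def total_cost(records: list[ToolCallRecord]) -> float:
    """Sum the recorded per-call cost, treating a missing cost as zero."""
    return sum(r.cost_usd or 0.0 for r in records)


records = [
    ToolCallRecord(3, "threat_intel.lookup", 640, 0.0003, "ok"),
    ToolCallRecord(4, "siem.search", 1200, None, "ok"),
]
trail = export_trail(records)
```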

Failure modes

What happens when things go wrong.

Any vendor page that skips this section is telling you something.

A tool is down

The step errors honestly and the run says so. The deterministic layer checks every call's real result, so an agent cannot paper over a dead integration with confident prose.
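
The rule can be sketched in a few lines. The `ToolResult` type and `verified_status` function are hypothetical names for this example; what matters is that the recorded status is derived from the calls' real results, and the agent's own report can only make it worse, never better.

```python
from dataclasses import dataclass


@dataclass
class ToolResult:
    tool: str
    ok: bool
    detail: str = ""


def verified_status(agent_reported_ok: bool, results: list[ToolResult]) -> str:
    """Derive the step status from real call results, not from the agent's prose."""
    if any(not r.ok for r in results):
        # A failed call always wins over a claimed success.
        return "errored"
    return "ok" if agent_reported_ok else "errored"


down = [ToolResult("siem.search", False, "connection refused")]
status = verified_status(agent_reported_ok=True, results=down)
```

Here `status` is `"errored"` even though the agent reported success.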

The agent is uncertain

Uncertainty is a first-class verdict. Contested evidence routes to a human with the disagreement intact instead of being averaged into a guess.
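
A minimal sketch of that routing, assuming a hypothetical `Evidence` type with a per-source verdict: when sources disagree, the result is `uncertain` and each source's verdict is passed along untouched, with no averaging.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    source: str
    verdict: str  # e.g. "malicious" or "benign" (illustrative labels)


def route(evidence: list[Evidence]) -> dict:
    """Continue on agreement; send contested or missing evidence to a human."""
    verdicts = {e.verdict for e in evidence}
    if len(verdicts) == 1:
        return {"verdict": verdicts.pop(), "route": "continue"}
    # Disagreement (or no evidence at all) is reported as-is, never averaged.
    return {
        "verdict": "uncertain",
        "route": "human",
        "disagreement": {e.source: e.verdict for e in evidence},
    }


contested = route([
    Evidence("feed_a", "malicious"),
    Evidence("feed_b", "benign"),
])
```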

A run misbehaves

Every run has hard timeouts, recursion depth caps, and cancellation that actually interrupts in-flight work. And because actions prefer reversibility, the undo is part of the design, not an apology.
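
As a sketch of what a hard timeout with real cancellation can look like, the example below uses Python's standard `asyncio.wait_for`, which cancels the in-flight task when the deadline passes. The `run_step` wrapper, `MAX_DEPTH` value, and `slow_tool` coroutine are assumptions for illustration, not our runtime.

```python
import asyncio

MAX_DEPTH = 3  # hypothetical recursion depth cap

cancelled: list[str] = []  # records which tools were actually interrupted


async def slow_tool() -> None:
    try:
        await asyncio.sleep(10)  # stands in for a hung integration
    except asyncio.CancelledError:
        cancelled.append("slow_tool")
        raise  # let the cancellation propagate


async def run_step(tool, timeout_s: float, depth: int = 0) -> str:
    if depth > MAX_DEPTH:
        # Refuse before any work starts.
        return "errored: depth cap"
    try:
        # wait_for cancels the in-flight call when the timeout expires.
        await asyncio.wait_for(tool(), timeout=timeout_s)
        return "ok"
    except asyncio.TimeoutError:
        return "errored: timeout"


status = asyncio.run(run_step(slow_tool, timeout_s=0.05))
```

The `cancelled` list shows that the hung call was interrupted rather than left running in the background.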

Kick the tires

Architecture pages are claims too. Here's the evidence.

Walk a real case and check this page against what you see.