Hermes Katana

Inner function map

Hermes Katana Function Map

As Above So Below

One visual board

Release, runtime, and install path in one frame.

The top row is release readiness. The middle row is runtime enforcement. The bottom row is evidence, verification, and user setup. The rule is simple: GitHub code, GitHub Pages, and Hugging Face artifacts should become public together.

Figure: Hermes Katana pipeline board showing source of truth, package build, Hugging Face artifacts, public launch, ingress, scan, policy, dispatch, evidence, verification, and user install path.

Runtime path

A tool call becomes a security decision.

Hermes Katana does not try to make prompts magically safe. It puts a decision point around tool dispatch, then feeds that decision point with provenance, scanner findings, model signals, policy rules, and audit evidence.

Ingress

User text, tool output, web/file content, MCP schemas, retrieved documents, and model output enter with origin context where possible.

taint/labels.py
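A minimal sketch of what carrying origin context can look like. The `Origin` enum, the `Labeled` dataclass, and the helper functions are hypothetical names chosen for illustration; they are not taken from taint/labels.py, and the choice of which origins count as trusted is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Origin(Enum):
    # Hypothetical origin categories, loosely mirroring the ingress sources above.
    USER = "user"
    TOOL_OUTPUT = "tool_output"
    WEB = "web"
    RETRIEVED = "retrieved"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class Labeled:
    text: str
    origins: frozenset  # every Origin that contributed to this text


def label(text, origin=Origin.UNKNOWN):
    # Text with no known source is labeled UNKNOWN rather than trusted.
    return Labeled(text, frozenset({origin}))


def combine(a, b, sep="\n"):
    # Joined text inherits the origins of both parts, so labels are never lost.
    return Labeled(a.text + sep + b.text, a.origins | b.origins)


def is_tainted(item, trusted=frozenset({Origin.USER})):
    # Tainted if any contributing origin falls outside the trusted set.
    return not item.origins <= trusted
```

With labels like these, the same string can be treated differently depending on whether it came from the user or from a fetched web page.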

Normalize

Hidden payloads are decoded before scanning: base64, URL encoding, HTML entities, Unicode spoofing, document metadata, and embedded carriers.

scanner/decoder.py
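The decode-before-scan idea can be sketched with the Python standard library alone. The function name `normalize`, the 16-character base64 threshold, and the three-round limit are assumptions for this example, not the behavior of scanner/decoder.py, and document metadata and embedded carriers are out of scope here.

```python
import base64
import binascii
import html
import re
import unicodedata
from urllib.parse import unquote

# Runs that look like base64; the minimum length of 16 is an arbitrary choice.
_B64_RUN = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def _try_base64(match):
    raw = match.group(0)
    try:
        decoded = base64.b64decode(raw, validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return raw  # not valid base64 or not text: leave the run untouched
    return decoded if decoded.isprintable() else raw


def normalize(text, max_rounds=3):
    # Repeat so that layered encodings (for example %253C) are peeled off.
    for _ in range(max_rounds):
        previous = text
        text = unquote(text)                        # URL encoding
        text = html.unescape(text)                  # HTML entities
        text = unicodedata.normalize("NFKC", text)  # fullwidth and compatibility forms
        text = _B64_RUN.sub(_try_base64, text)      # embedded base64 text
        if text == previous:
            break
    return text
```

A scanner that runs on the output of `normalize` sees the decoded instruction instead of an opaque blob. A sketch this small can also decode ordinary long words that happen to be valid base64, so a real decoder needs stricter checks.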

Classify and Scan

Scabbard, ProtectAI, heuristic scanners, structural scanners, RAG/MCP detectors, and multi-turn checks add risk signals.

middleware/integration.py

Decide

Policy resolves the current tool call to allow, deny, escalate, or log-only. Audit records the path even when execution is blocked.

policy/engine.py

Default middleware order

The chain is ordered by priority.

Higher-priority middleware runs first. A deny short-circuits execution. Escalations do not stop the chain: later layers keep collecting context so the caller can decide whether human review is needed.

Dispatch stack

  • 100 Taint: source labels and flow checks
  • 90 Scabbard: primary multi-signal classifier
  • 88 ProtectAI: binary injection gate
  • 85 Scabbard secondary: second classifier pass
  • 80 Scanner: injection, secrets, unicode, content, commands
  • 78 MCP: tool schema and server poisoning
  • 76 Multi-turn: conversation-level escalation
  • 74 RAG injection: retrieved-document attacks
  • 70 Structural: HTML, PDF, Markdown, bloom signals
  • 65 Behavioral: post-dispatch sequence observer
  • 60 Policy: YAML allow, deny, escalate, log-only
  • 20 Audit: decision trail and diagnostics
create_default_chain()
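The ordering rules above can be illustrated with a small priority-sorted chain. This is a sketch under assumptions: `run_chain`, `Result`, and the `(priority, name, fn)` tuple shape are hypothetical and do not describe what create_default_chain() returns. How the real chain lets Audit record a blocked call is not shown in the source, so the `notes` list stands in for the trail here.

```python
from dataclasses import dataclass, field


@dataclass
class Result:
    action: str = "allow"  # "allow", "deny", or "escalate"
    notes: list = field(default_factory=list)


def run_chain(middlewares, call):
    """Run (priority, name, fn) entries from highest priority to lowest.

    Each fn(call) returns "deny", "escalate", or None for no objection.
    """
    result = Result()
    ordered = sorted(middlewares, key=lambda m: m[0], reverse=True)
    for _priority, name, fn in ordered:
        verdict = fn(call)
        if verdict == "deny":
            # A deny short-circuits: lower-priority layers never run.
            result.action = "deny"
            result.notes.append(f"{name} denied")
            return result
        if verdict == "escalate":
            # An escalation is recorded, but the chain keeps collecting context.
            result.action = "escalate"
            result.notes.append(f"{name} escalated")
    return result
```

Registering layers in any order gives the same execution order, because the sort key is the priority number alone.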

Why Scabbard comes early

Scabbard normalizes text, extracts multi-signal features, and classifies before pattern scanners run. A high-confidence block can stop the tool call before later layers spend time on detailed checks.

scabbard/scabbard.py

Why policy comes late

Policy needs context from earlier layers: taint sources, command safety, scanner findings, route decisions, and risk scores. It is the resolver, not the first sensor.

policy/engine.py

Scanner system

Heuristics cover the surfaces models miss.

The scanner layer is deliberately broad. It handles text, commands, output, binary file bytes, document metadata, URLs, encoded carriers, and multimodal payloads without requiring a remote API.

Prompt injection

Instruction override, role hijack, delimiter escape, prompt leak attempts, encoded payloads, multilingual forms, and persona manipulation.

scanner/injection.py

Command risk

Dangerous terminal intent: destructive deletes, reverse shells, pipe-to-shell, privilege escalation, container escape, mining, and network scanning.

scanner/commands.py
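Pattern-based command checks of this kind can be sketched as a small rule table. The rule names and regular expressions below are illustrative assumptions covering three of the listed categories; they are far narrower than a production scanner and are not copied from scanner/commands.py.

```python
import re

# Hypothetical rule table: (finding name, compiled pattern).
RULES = [
    # Recursive delete aimed at the filesystem root or the home directory.
    ("destructive_delete", re.compile(r"\brm\s+-\w*[rR]\w*\s+(/|~|\$HOME)(\s|$)")),
    # Downloaded content piped straight into a shell.
    ("pipe_to_shell", re.compile(r"\b(curl|wget)\b[^|;]*\|\s*(sudo\s+)?(ba|z|da)?sh\b")),
    # Common reverse-shell building blocks.
    ("reverse_shell", re.compile(r"/dev/tcp/|\bnc\b.*\s-e\s")),
]


def scan_command(command):
    # Return the name of every rule whose pattern appears in the command.
    return [name for name, pattern in RULES if pattern.search(command)]
```

Returning finding names rather than a yes/no answer lets a later policy step decide which findings block and which only log.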

Secrets

API keys, JWTs, private keys, database URLs, high-entropy blobs, and encoded or chunked secret material.

scanner/secrets.py
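High-entropy detection is commonly done with Shannon entropy over candidate tokens. The sketch below assumes a 20-character minimum token length and a threshold of 4.0 bits per character; both values and both function names are illustrative, not settings from scanner/secrets.py.

```python
import math
import re
from collections import Counter

# Candidate tokens: long runs of characters typical of keys and encoded blobs.
_TOKEN = re.compile(r"[A-Za-z0-9+/=_\-]{20,}")


def shannon_entropy(s):
    # Bits per character, computed from the character frequency distribution.
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def high_entropy_tokens(text, threshold=4.0):
    # Keep only candidate tokens whose entropy meets the threshold.
    return [t for t in _TOKEN.findall(text) if shannon_entropy(t) >= threshold]
```

Entropy alone flags hashes and identifiers as well as secrets, so it is best treated as one risk signal among several.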

Structural documents

HTML hidden text, Markdown tricks, PDF layers and JavaScript, OOXML metadata, SVG payloads, image text chunks, and steganography indicators.

scanner/structural.py

RAG and MCP

Retrieved-document prompt injection, context manipulation, tool hijack, poisoned embedding text, MCP schema drift, and hidden tool instructions.

scanner/rag_injection.py

Unicode and content

Bidi overrides, zero-width text, homoglyphs, mixed scripts, ANSI injection, homograph URLs, markdown exfil paths, and suspicious browser content.

scanner/unicode.py
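Three of these checks (bidi overrides, zero-width text, mixed scripts) can be approximated with the standard `unicodedata` module. The function name and flag names are hypothetical. Taking the first word of a character's Unicode name as its script is a crude heuristic used for this sketch, not the method in scanner/unicode.py.

```python
import unicodedata

BIDI_CONTROLS = {
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",  # embeddings and overrides
    "\u2066", "\u2067", "\u2068", "\u2069",            # isolates
}
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}


def unicode_flags(text):
    flags = set()
    scripts = set()
    for ch in text:
        if ch in BIDI_CONTROLS:
            flags.add("bidi_control")
        elif ch in ZERO_WIDTH:
            flags.add("zero_width")
        elif ch.isalpha():
            # "LATIN SMALL LETTER A" -> "LATIN"; unnamed characters are skipped.
            name = unicodedata.name(ch, "")
            if name:
                scripts.add(name.split(" ")[0])
    if len(scripts) > 1:
        flags.add("mixed_script")
    return flags
```

A string such as "p\u0430ypal", where the second letter is Cyrillic, looks like a Latin word on screen but is flagged as mixed-script.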

Scabbard classifier

The ML path has fallbacks by design.

Public checkouts work without bundled private model files. When optional artifacts are present, Scabbard can use them. When they are missing, it still falls back to lightweight signals instead of making the whole runtime unusable.

Normalize

Transforms suspicious text into a consistent view and records anomaly flags that can boost downstream risk.

scabbard/normalizer.py

Extract features

Combines intent divergence, centroid similarity, perplexity shifts, n-gram signals, encoding flags, and optional retrieval features.

scabbard/feature_extractor.py

Classify

Uses the configured Katana classifier artifact, legacy DeBERTa, or lightweight fusion depending on runtime profile.

scabbard/fusion.py
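The fallback behavior can be sketched as a simple preference walk over optional artifact directories. The directory names `katana` and `deberta`, the return values, and the function `choose_backend` are assumptions for illustration; the source does not describe the real artifact layout or profile names.

```python
from pathlib import Path


def choose_backend(artifact_root, preference=("katana", "deberta")):
    """Pick the first classifier whose artifact directory exists.

    Falls back to "lightweight" so a checkout without model files still runs.
    """
    root = Path(artifact_root)
    for name in preference:
        if (root / name).is_dir():
            return name
    return "lightweight"
```

The key design point is that a missing artifact changes which classifier runs rather than raising an error at startup.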

Policy resolver

YAML decides what the tool can do.

Built-in presets come from policies/*.yaml. The engine sorts policies by priority, matches the tool name by glob, evaluates every condition in a rule, and returns the first matching action.

Action | Runtime effect | Typical reason
allow | The middleware chain may continue to execution. | Clean route, trusted or acceptable source, and no blocking scanner/policy condition.
deny | Execution stops before the tool runs. | Dangerous command, secret exfil path, tainted critical sink, or explicit max/balanced rule.
escalate | The call is marked for review instead of silently trusted. | Unknown or sensitive route where the right answer depends on operator intent.
log_only | The event is recorded while execution is allowed. | Permissive workflows where visibility is wanted but blocking would be too disruptive.
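The resolution steps described above (sort by priority, glob-match the tool name, require every condition, return the first match) can be sketched as follows. The dictionary layout, the `when` key, the highest-priority-first ordering, and the `escalate` default for unmatched calls are all assumptions; the real schema lives in policies/*.yaml and is not reproduced here.

```python
from fnmatch import fnmatchcase


def resolve(policies, tool, context, default="escalate"):
    """Return the action of the first rule that matches the tool call.

    policies: list of {"priority": int, "rules": [...]} (hypothetical layout).
    Each rule has a "tool" glob, an "action", and optional "when" predicates.
    """
    for policy in sorted(policies, key=lambda p: p["priority"], reverse=True):
        for rule in policy["rules"]:
            if not fnmatchcase(tool, rule["tool"]):
                continue
            # Every condition must hold; a rule with no conditions always matches.
            if all(cond(context) for cond in rule.get("when", [])):
                return rule["action"]
    return default
```

For example, a high-priority rule with tool glob `shell.*` and a "tainted" condition denies a tainted `shell.exec` call, while an untainted one falls through to a lower-priority catch-all rule.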

Commands and pipeline reference

Run the system from the outside in.

These commands are the public control surface. The notes under each command explain which part of the runtime pipeline it exercises, so a new user can connect CLI behavior to the internals shown above.

Install and inspect

  • pip install hermes-katana: installs the base package without large local model artifacts.
  • katana doctor: checks Python, optional dependencies, paths, policy files, and runtime readiness.
  • katana status: shows the local security configuration and active environment state.
  • katana version: prints the installed package version for support and troubleshooting.

Policy and scanner gates

  • katana policy list: shows available presets and the currently selected policy set.
  • katana policy use balanced: selects a runtime preset from the YAML policy source of truth.
  • katana scan TEXT: runs prompt-injection, secret, Unicode, content, and structural checks on text.
  • katana scan-command CMD: evaluates terminal intent before a command reaches execution.

Secrets, audit, and egress

  • katana vault set NAME: stores a secret handle outside prompt text.
  • katana vault list: confirms which handles exist without printing secret values.
  • katana audit show: reviews allow, deny, escalate, and log-only decisions.
  • katana proxy start: starts optional HTTPS egress inspection when proxy dependencies are installed.

Artifacts and Hermes integration

  • katana artifacts status --all: reports which optional local model artifacts are present.
  • katana setup --yes: downloads the default fast CPU artifact when wanted.
  • katana install --target PATH: patches a compatible Hermes checkout.
  • katana run --target PATH -- ...: runs Hermes through the Katana-protected dispatch path.

Text path

Input enters through CLI, plugin, tool output, file, web, MCP, or retrieved content. Taint labels preserve source context so the same text can be treated differently by origin.

Decision path

Normalizers expose hidden payloads, Scabbard and the scanners add risk signals, policy resolves the action, and middleware then blocks, escalates, logs, or allows dispatch.

Evidence path

Vault keeps secrets out of prompts, proxy can inspect egress, audit records decisions, and optional artifacts add local classifier capability without bloating the base install.