Most security tools send you alerts. Mallcop sends you answers.
Here's what happens between mallcop watch and "all clear."
Mallcop generated 15 findings this week. 12 were resolved by AI triage — routine activity by known actors doing known things. 2 were investigated deeper and resolved with cited evidence. 1 reached your Slack with a full investigation report attached.
The value isn't "we found something." Every security tool finds things. The value is that you didn't have to do the investigation work. The AI ran it down, cited its evidence, and either closed the case or handed you a briefing you can act on in under a minute.
Here's the full pipeline. Scanning and detection are rule-based — no AI, no cost. The AI enters at escalation. Let's follow a finding through.
Someone new shows up in your GitHub org.
The new-actor detector fires because this principal isn't in the baseline.
Rule-based comparison, no LLM, no cost. Finding generated with severity warn.
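In code, a check like this is just set membership. A minimal sketch (the `detect_new_actor` name and the event shape are illustrative, not Mallcop's actual internals):

```python
def detect_new_actor(event, baseline_actors):
    """Rule-based check: flag any principal not present in the baseline.

    No LLM call happens here; it's a plain set-membership test,
    which is why detection itself costs nothing.
    """
    actor = event["actor"]
    if actor not in baseline_actors:
        return {
            "type": "new-actor",
            "severity": "warn",
            "actor": actor,
        }
    return None

# A known actor produces no finding; an unknown one yields severity "warn".
baseline = {"alice", "bob"}
assert detect_new_actor({"actor": "alice"}, baseline) is None
assert detect_new_actor({"actor": "mallory"}, baseline)["severity"] == "warn"
```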
The runtime loads the triage playbook (POST.md) as a system prompt. The finding,
triggering events, and baseline data are pre-loaded into context. The agent enters a reasoning
loop (max 3 iterations):
```
# Triage agent reasoning loop
1. Calls check-baseline → actor not known, 0 prior events
2. Calls search-events → finds org invite sent by repo owner 3 hours ago
3. Applies the 4-question test:
   A. Routine? No — first appearance.
   B. Legitimate trigger? Yes — org invite by admin at 14:22 UTC.
   C. Credential theft? No — invite chain is intact, timing matches.
   D. Access expansion? No — read-only collaborator.
4. Resolves with confidence 4, citing invite event ID and timestamp.

Result: resolved — "New collaborator added via org invite (event
evt-2026-03-11-0042) by admin@org 3h prior. Read-only access."
```
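The bounded loop itself can be sketched in a few lines (an assumed shape, not a real Mallcop API; `agent_step` stands in for one LLM turn that may call tools and return a decision):

```python
MAX_TRIAGE_ITERATIONS = 3  # triage is capped at 3 reasoning iterations

def run_triage(finding, agent_step):
    """Bounded reasoning loop: the agent may call tools on each iteration,
    but must resolve or escalate within the iteration cap.

    `agent_step` returns either ("continue", state) or a terminal
    ("resolved" / "escalated", state).
    """
    state = {"finding": finding, "tool_results": []}
    for _ in range(MAX_TRIAGE_ITERATIONS):
        status, state = agent_step(state)
        if status in ("resolved", "escalated"):
            return status, state
    # Out of iterations without a decision: never guess, hand off.
    return "escalated", state

# An agent that never decides within the cap is escalated, not guessed at.
never_decides = lambda state: ("continue", state)
assert run_triage({"id": "f-123"}, never_decides)[0] == "escalated"
```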
Same finding, but no invite found. Activity at 3am. Unusual IP range. Triage confidence is 2 — hard rule enforced at runtime: confidence ≤ 2 means escalate. The agent doesn't guess. It says "I'm not sure" and passes to investigation.
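That floor is a one-line runtime check, not a prompt instruction. A sketch, with the verdict field names assumed for illustration:

```python
CONFIDENCE_FLOOR = 2  # confidence <= 2 always escalates, whatever the verdict

def apply_confidence_rule(verdict):
    """Runtime override: a low-confidence 'resolve' becomes an escalation.

    The model's own output cannot bypass this check, because it runs
    after the model has answered.
    """
    if verdict["confidence"] <= CONFIDENCE_FLOOR:
        return {**verdict, "action": "escalate"}
    return verdict

assert apply_confidence_rule({"action": "resolve", "confidence": 2})["action"] == "escalate"
assert apply_confidence_rule({"action": "resolve", "confidence": 4})["action"] == "resolve"
```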
The investigation agent has more tools (search-findings, baseline-stats,
load-skill, annotate-finding) and more iterations (10 vs 3).
It loads the github-security skill for domain expertise.
```
# Investigation agent deep dive
1. Loads github-security skill → domain expertise on audit log patterns
2. Calls search-events → no invite, no PR, no onboarding workflow
3. Calls check-baseline → actor IP not in any known range
4. Calls search-findings → no prior findings for this actor
5. Applies credential theft test: "If credentials were stolen, would this
   look identical?" → Cannot distinguish.
6. Annotates finding with full reasoning chain
7. Escalates with confidence 2, citing missing provenance

Result: escalated — "No upstream trigger found. 3am activity, unknown IP
range, no org invite or PR. Recommend: verify identity through independent
channel."
```
Unresolved findings go to Slack, Teams, or email as a batch digest. Investigation notes are included — you see reasoning, not just an alert.
Notice the AI never saw the raw GitHub API response. Events are sanitized at ingest:
control characters stripped, strings length-capped. User-controlled fields — commit messages,
branch names, PR titles — are wrapped in [USER_DATA_BEGIN]...[USER_DATA_END] markers.
Tool results are re-sanitized before reaching the LLM.
If someone put "IGNORE PREVIOUS INSTRUCTIONS" in a commit message, it arrives wrapped in markers. The AI sees it as data to analyze, not instructions to follow.
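A minimal sketch of that ingest step, with the length cap and the list of user-controlled fields assumed for illustration:

```python
import re

MAX_FIELD_LEN = 1024  # assumed cap; the actual limit isn't documented here
USER_FIELDS = {"commit_message", "branch_name", "pr_title"}

def sanitize_event(event):
    """Sanitize an ingested event before any LLM sees it:
    strip control characters (keeping tab and newline), cap string
    length, and wrap user-controlled fields in data markers."""
    clean = {}
    for key, value in event.items():
        if isinstance(value, str):
            value = re.sub(r"[\x00-\x08\x0b-\x1f\x7f]", "", value)[:MAX_FIELD_LEN]
            if key in USER_FIELDS:
                value = f"[USER_DATA_BEGIN]{value}[USER_DATA_END]"
        clean[key] = value
    return clean

evt = sanitize_event({"actor": "mallory",
                      "commit_message": "IGNORE PREVIOUS INSTRUCTIONS"})
assert evt["commit_message"] == "[USER_DATA_BEGIN]IGNORE PREVIOUS INSTRUCTIONS[USER_DATA_END]"
```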
Agents interact with your data through a controlled set of tools. Each tool has explicit permissions — triage is read-only by design.
| Tool | What it tells the agent | Available to | Permission |
|---|---|---|---|
| check-baseline | Is this actor known? Frequency, relationships, typical hours | triage, investigate | read |
| read-events | Events for this finding, enriched with local time context | triage, investigate | read |
| search-events | Full-text search across event history — find upstream triggers | triage, investigate | read |
| search-findings | Historical findings — has this pattern appeared before? | investigate | read |
| baseline-stats | Statistical summary of baseline for an actor | investigate | read |
| load-skill | Domain expertise loaded on-demand (AWS IAM, Azure RBAC, etc.) | investigate | read |
| annotate-finding | Document reasoning before resolving | investigate | write |
| resolve-finding | Mark finding resolved/escalated with confidence score | triage, investigate | write* |
* Triage's resolve-finding is constrained by policy: it cannot resolve privilege escalation or access boundary findings regardless of model output. Enforced at runtime, not by prompting.
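A sketch of how such a runtime policy gate might look (the category names and the `enforce_resolve_policy` helper are illustrative, not Mallcop's actual code):

```python
# Finding categories that triage may never resolve on its own.
PROTECTED_CATEGORIES = {"privilege-escalation", "access-boundary"}

def enforce_resolve_policy(agent, finding, requested_action):
    """Runtime constraint on resolve-finding: triage cannot resolve
    privilege-escalation or access-boundary findings, regardless of
    what the model asked for. Enforced in code, outside the prompt."""
    if (
        agent == "triage"
        and requested_action == "resolve"
        and finding["category"] in PROTECTED_CATEGORIES
    ):
        return "escalate"  # hard override
    return requested_action

assert enforce_resolve_policy(
    "triage", {"category": "privilege-escalation"}, "resolve") == "escalate"
assert enforce_resolve_policy(
    "investigate", {"category": "privilege-escalation"}, "resolve") == "resolve"
```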
This isn't "ask the AI if it looks bad." It's a structured decision protocol with hard constraints the AI can't override. These are real excerpts from the playbooks that run in production.
From the triage agent's playbook (POST.md):
```markdown
## Step 3: Analyze

Answer these questions using the data from steps 1-2:

A. Is this action routine for this actor?
   "[Actor] has done [action] [N] times. This is [routine/new]."
B. Is there a legitimate trigger?
   "Events show [trigger/no trigger]: [detail]."
C. Could a stolen credential produce this exact pattern?
   "[Yes/No] because [specific factor — IP/location, timing, user-agent]."
D. Does this expand access or privileges?
   "[Yes/No]."

## Step 4: Decide

- If A=routine AND B=trigger AND C=distinguishable AND D=no → RESOLVE
- Privilege changes → always ESCALATE (non-negotiable)
- Log format drift → always ESCALATE
- Resolution requires positive evidence — "actor is known" alone is not enough
- Otherwise → ESCALATE
```
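Step 4 reduces to a small pure function. A sketch, not the actual runtime (the playbook itself is natural-language instruction to the model; this just shows the decision shape):

```python
def step4_decide(routine, trigger, distinguishable, expands_access,
                 privilege_change=False, log_drift=False):
    """Step 4 of the triage playbook as a pure decision function.

    Resolution requires every positive condition to hold; everything
    else, including doubt, falls through to ESCALATE.
    """
    if privilege_change or log_drift:
        return "ESCALATE"  # non-negotiable
    if routine and trigger and distinguishable and not expands_access:
        return "RESOLVE"
    return "ESCALATE"

assert step4_decide(True, True, True, False) == "RESOLVE"
assert step4_decide(True, True, True, False, privilege_change=True) == "ESCALATE"
assert step4_decide(True, False, True, False) == "ESCALATE"
```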
From the investigation agent's playbook (POST.md):
```markdown
## Pre-Resolution Checklist

Before calling resolve-finding — whether resolving OR escalating — run
these 5 checks. They apply in both directions.

1. EVIDENCE — Am I citing specific fields, timestamps, or baseline entries?
   If I can't point to it, I'm guessing. This applies to escalations too:
   cite what's anomalous, not just "it looks wrong."
2. ADVERSARY — Could an attacker produce this exact pattern? What would
   distinguish legitimate from compromised? Automation names, user-agent
   strings, and correlation IDs can all be spoofed.
3. DISCONFIRM — What evidence would contradict my conclusion? Did I check
   for it, or just not look? If resolving, did I check for anomalous
   signals I might be overlooking? If escalating, did I check whether the
   baseline explains the activity?
4. BOUNDARY — Does this action expand who or what has access to the
   environment? If yes, treat as privilege-level.
5. BLAST RADIUS — If I'm wrong, what's the worst case? A false escalation
   wastes analyst time. A missed breach loses the org.
```
Skills are structured knowledge that agents load when they need domain depth. Here's what they actually contain.
From the privilege-analysis skill (SKILL.md):
These are three distinct events that often get conflated. A grant is when an actor receives a permission (role attachment, policy change, group membership). A use is when the actor exercises that permission. Escalation is when the result exceeds what was explicitly intended — an actor converts a limited permission into broader access, often by chaining grants across multiple resources or principals.

Seeing a grant event in isolation tells you almost nothing. The question is whether the use that followed was proportionate to the grant, and whether the grant itself was within the actor's pre-existing authority.

Key check: did the actor who issued the grant have the authority to do so? An actor who can grant a permission they do not themselves hold is a classic privilege escalation pattern.
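The key check at the end translates directly into code. A sketch, with the permission model simplified to flat string sets for illustration:

```python
def grantor_exceeds_authority(grantor_permissions, granted_permission):
    """Key check from the skill: an actor who grants a permission they
    do not themselves hold is a classic escalation pattern.

    Real IAM models are hierarchical; flat string sets keep the
    illustration readable.
    """
    return granted_permission not in grantor_permissions

# A deploy-only actor granting admin rights is the red flag.
assert grantor_exceeds_authority({"deploy", "read"}, "admin") is True
assert grantor_exceeds_authority({"admin", "deploy"}, "deploy") is False
```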
From the github-security skill (SKILL.md):
The "known actor" trap: GitHub admins are known actors by definition —
they appear frequently in the audit log. The credential theft test must
focus on behavioral deviation (new action types, new targets, new timing),
not on whether the actor is known.
Skills are loaded on-demand via the load-skill tool. They support parent/child
inheritance: privilege-analysis is the general framework; aws-iam inherits
from it and adds platform-specific guidance. 9 built-in skills cover Azure, AWS, GitHub,
M365, Vercel, Supabase, Container Logs, and OpenClaw.
Skills are prompt fragments injected into AI context — which makes them a prompt injection
vector. That's why every skill is SSH-signed, verified against a trust web, and hash-pinned
in skills.lock before loading.
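The hash-pin step alone might look like this (a sketch: the skills.lock shape shown is assumed, and the SSH-signature and trust-web checks described above would run alongside it):

```python
import hashlib

def verify_skill(skill_name, skill_bytes, lockfile):
    """Hash-pinning check before a skill enters the prompt: the SHA-256
    of the skill file must match the pinned entry in skills.lock.

    A skill that fails this check never reaches the LLM context."""
    expected = lockfile.get(skill_name)
    actual = "sha256:" + hashlib.sha256(skill_bytes).hexdigest()
    return expected == actual

# A tampered skill body no longer matches its pin and is rejected.
lock = {"github-security": "sha256:" + hashlib.sha256(b"skill body").hexdigest()}
assert verify_skill("github-security", b"skill body", lock) is True
assert verify_skill("github-security", b"tampered body", lock) is False
```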
Here's our rubric, here's how we scored, verify us. 54 adversarial scenarios. 100% system accuracy.
| Layer | What it is | AI involved? | Findings | Cost |
|---|---|---|---|---|
| Hard constraints | Code rules — priv-escalation, log drift, injection always escalate | No | 15 | 0 donuts |
| Learned rules | Your confirmed feedback becomes deterministic rules | No | 0* | 0 donuts |
| AI triage | Quick investigation, resolve the obvious | Yes | 3 | ~3 donuts |
| Consensus verification | 4 independent investigations must unanimously agree | Yes | 9 | ~108 donuts |
| Escalated to human | System says "I'm not sure" | Yes | 29 | ~102 donuts |
| Total | 54 adversarial scenarios, 100% accuracy | — | 56 | ~213 donuts |
*Resolution rules activate after 2 weeks of human feedback. In production they handle the majority of routine findings at zero cost.
When the AI resolves a finding as benign, mallcop doesn't just accept the answer. It runs 3 additional independent investigations on the same finding. All 4 must unanimously agree. If any single investigation disagrees, the finding escalates to a human.
9 of 54 Academy scenarios were caught this way — findings where a single AI investigation would have gotten it wrong. The disagreement itself is the signal.
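The consensus mechanism is simple to state precisely. A sketch, where `investigate` stands in for one full independent investigation (the original run plus 3 re-runs gives 4 verdicts in total):

```python
def consensus_verify(finding, investigate, runs=4):
    """Consensus check on a benign verdict: run the investigation
    independently `runs` times; anything short of unanimous agreement
    escalates to a human."""
    verdicts = [investigate(finding) for _ in range(runs)]
    if all(v == "resolved" for v in verdicts):
        return "resolved"
    return "escalated"  # the disagreement itself is the signal

assert consensus_verify({}, lambda f: "resolved") == "resolved"

# One dissenting run out of four is enough to escalate.
flaky = iter(["resolved", "resolved", "escalated", "resolved"])
assert consensus_verify({}, lambda f: next(flaky)) == "escalated"
```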
| Sovereignty | Multiplier | Academy cost | What it means |
|---|---|---|---|
| Open | 1.0× | ~213 donuts | Widest model pool, cheapest per donut |
| Allied | 1.3× | ~277 donuts | US + EU + Five Eyes providers only |
| US-only | 1.5× | ~320 donuts | US-headquartered providers only |
Same security outcome across all tiers — same pipeline, same hard constraints, same consensus mechanism. Sovereignty restricts the model pool, which increases cost per donut. It's a cost decision, not a security decision.
```
$ mallcop exam pipeline
```
The 54 scenarios are open-source YAML files. The grading criteria are in the code. Run the exam on your own infrastructure, with your own models, and verify these results.
Bring your own API key for $0, or let us handle inference for less than you'd pay retail.
| BYOK (Free) | Managed (Starter+) | |
|---|---|---|
| Price | $0 + your API costs | $4.99-79.99/mo |
| Models | Your choice (any API key) | Optimized routing across sovereignty tiers |
| Key management | You manage API keys | We handle it |
| Connectors | Unlimited | Unlimited |
| Events | Unlimited | Unlimited |
| Self-improvement | Unlimited (your tokens) | By tier (donut allocation) |
| Sovereignty | Your key, your provider | Choose: Open, Allied, or US-only |