Most security tools send you alerts. Mallcop sends you answers.
Here's what happens between mallcop watch and "all clear."
Mallcop generated 15 findings this week. 12 were resolved by AI triage — routine activity by known actors doing known things. 2 were investigated deeper and resolved with cited evidence. 1 reached your Slack with a full investigation report attached.
The value isn't "we found something." Every security tool finds things. The value is that you didn't have to do the investigation work. The AI ran it down, cited its evidence, and either closed the case or handed you a briefing you can act on in under a minute.
Here's the full pipeline. Scanning and detection are rule-based — no AI, no cost. The AI enters at escalation. Let's follow a finding through.
Someone new shows up in your GitHub org.
The new-actor detector fires because this principal isn't in the baseline.
Rule-based comparison, no LLM, no cost. Finding generated with severity warn.
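In code, a check like this is just set membership. A minimal sketch (the `detect_new_actor` name and the event shape are illustrative, not Mallcop's actual internals):

```python
def detect_new_actor(event, baseline_actors):
    """Rule-based check: flag any principal not present in the baseline.

    No LLM call happens here; it's a plain set-membership test,
    which is why detection itself costs nothing.
    """
    actor = event["actor"]
    if actor not in baseline_actors:
        return {
            "type": "new-actor",
            "severity": "warn",
            "actor": actor,
        }
    return None

# A known actor produces no finding; an unknown one yields severity "warn".
baseline = {"alice", "bob"}
assert detect_new_actor({"actor": "alice"}, baseline) is None
assert detect_new_actor({"actor": "mallory"}, baseline)["severity"] == "warn"
```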
The runtime loads the triage playbook (POST.md) as a system prompt. The finding,
triggering events, and baseline data are pre-loaded into context. The agent enters a reasoning
loop (max 3 iterations):
```
# Triage agent reasoning loop
1. Calls check-baseline → actor not known, 0 prior events
2. Calls search-events → finds org invite sent by repo owner 3 hours ago
3. Applies the 4-question test:
   A. Routine? No — first appearance.
   B. Legitimate trigger? Yes — org invite by admin at 14:22 UTC.
   C. Credential theft? No — invite chain is intact, timing matches.
   D. Access expansion? No — read-only collaborator.
4. Resolves with confidence 4, citing invite event ID and timestamp.

Result: resolved — "New collaborator added via org invite (event
evt-2026-03-11-0042) by admin@org 3h prior. Read-only access."
```
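The bounded loop itself can be sketched in a few lines (an assumed shape, not a real Mallcop API; `agent_step` stands in for one LLM turn that may call tools and return a decision):

```python
MAX_TRIAGE_ITERATIONS = 3  # triage is capped at 3 reasoning iterations

def run_triage(finding, agent_step):
    """Bounded reasoning loop: the agent may call tools on each iteration,
    but must resolve or escalate within the iteration cap.

    `agent_step` returns either ("continue", state) or a terminal
    ("resolved" / "escalated", state).
    """
    state = {"finding": finding, "tool_results": []}
    for _ in range(MAX_TRIAGE_ITERATIONS):
        status, state = agent_step(state)
        if status in ("resolved", "escalated"):
            return status, state
    # Out of iterations without a decision: never guess, hand off.
    return "escalated", state

# An agent that never decides within the cap is escalated, not guessed at.
never_decides = lambda state: ("continue", state)
assert run_triage({"id": "f-123"}, never_decides)[0] == "escalated"
```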
Same finding, but no invite found. Activity at 3am. Unusual IP range. Triage confidence is 2 — hard rule enforced at runtime: confidence ≤ 2 means escalate. The agent doesn't guess. It says "I'm not sure" and passes to investigation.
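That floor is a one-line runtime check, not a prompt instruction. A sketch, with the verdict field names assumed for illustration:

```python
CONFIDENCE_FLOOR = 2  # confidence <= 2 always escalates, whatever the verdict

def apply_confidence_rule(verdict):
    """Runtime override: a low-confidence 'resolve' becomes an escalation.

    The model's own output cannot bypass this check, because it runs
    after the model has answered.
    """
    if verdict["confidence"] <= CONFIDENCE_FLOOR:
        return {**verdict, "action": "escalate"}
    return verdict

assert apply_confidence_rule({"action": "resolve", "confidence": 2})["action"] == "escalate"
assert apply_confidence_rule({"action": "resolve", "confidence": 4})["action"] == "resolve"
```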
The investigation agent has more tools (search-findings, baseline-stats,
load-skill, annotate-finding) and more iterations (10 vs 3).
It loads the github-security skill for domain expertise.
```
# Investigation agent deep dive
1. Loads github-security skill → domain expertise on audit log patterns
2. Calls search-events → no invite, no PR, no onboarding workflow
3. Calls check-baseline → actor IP not in any known range
4. Calls search-findings → no prior findings for this actor
5. Applies credential theft test: "If credentials were stolen, would this
   look identical?" → Cannot distinguish.
6. Annotates finding with full reasoning chain
7. Escalates with confidence 2, citing missing provenance

Result: escalated — "No upstream trigger found. 3am activity, unknown IP
range, no org invite or PR. Recommend: verify identity through independent
channel."
```
Unresolved findings go to Slack, Teams, or email as a batch digest. Investigation notes are included — you see reasoning, not just an alert.
Notice the AI never saw the raw GitHub API response. Events are sanitized at ingest:
control characters stripped, strings length-capped. User-controlled fields — commit messages,
branch names, PR titles — are wrapped in [USER_DATA_BEGIN]...[USER_DATA_END] markers.
Tool results are re-sanitized before reaching the LLM.
If someone put "IGNORE PREVIOUS INSTRUCTIONS" in a commit message, it arrives wrapped in markers. The AI sees it as data to analyze, not instructions to follow.
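A minimal sketch of that ingest step, with the length cap and the list of user-controlled fields assumed for illustration:

```python
import re

MAX_FIELD_LEN = 1024  # assumed cap; the actual limit isn't documented here
USER_FIELDS = {"commit_message", "branch_name", "pr_title"}

def sanitize_event(event):
    """Sanitize an ingested event before any LLM sees it:
    strip control characters (keeping tab and newline), cap string
    length, and wrap user-controlled fields in data markers."""
    clean = {}
    for key, value in event.items():
        if isinstance(value, str):
            value = re.sub(r"[\x00-\x08\x0b-\x1f\x7f]", "", value)[:MAX_FIELD_LEN]
            if key in USER_FIELDS:
                value = f"[USER_DATA_BEGIN]{value}[USER_DATA_END]"
        clean[key] = value
    return clean

evt = sanitize_event({"actor": "mallory",
                      "commit_message": "IGNORE PREVIOUS INSTRUCTIONS"})
assert evt["commit_message"] == "[USER_DATA_BEGIN]IGNORE PREVIOUS INSTRUCTIONS[USER_DATA_END]"
```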
Agents interact with your data through a controlled set of tools. Each tool has explicit permissions — triage is read-only by design.
| Tool | What it tells the agent | Available to | Permission |
|---|---|---|---|
| check-baseline | Is this actor known? Frequency, relationships, typical hours | triage, investigate | read |
| read-events | Events for this finding, enriched with local time context | triage, investigate | read |
| search-events | Full-text search across event history — find upstream triggers | triage, investigate | read |
| search-findings | Historical findings — has this pattern appeared before? | investigate | read |
| baseline-stats | Statistical summary of baseline for an actor | investigate | read |
| load-skill | Domain expertise loaded on-demand (AWS IAM, Azure RBAC, etc.) | investigate | read |
| annotate-finding | Document reasoning before resolving | investigate | write |
| resolve-finding | Mark finding resolved/escalated with confidence score | triage, investigate | write* |
* Triage's resolve-finding is constrained by policy: it cannot resolve privilege escalation or access boundary findings regardless of model output. Enforced at runtime, not by prompting.
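A sketch of how such a runtime policy gate might look (the category names and the `enforce_resolve_policy` helper are illustrative, not Mallcop's actual code):

```python
# Finding categories that triage may never resolve on its own.
PROTECTED_CATEGORIES = {"privilege-escalation", "access-boundary"}

def enforce_resolve_policy(agent, finding, requested_action):
    """Runtime constraint on resolve-finding: triage cannot resolve
    privilege-escalation or access-boundary findings, regardless of
    what the model asked for. Enforced in code, outside the prompt."""
    if (
        agent == "triage"
        and requested_action == "resolve"
        and finding["category"] in PROTECTED_CATEGORIES
    ):
        return "escalate"  # hard override
    return requested_action

assert enforce_resolve_policy(
    "triage", {"category": "privilege-escalation"}, "resolve") == "escalate"
assert enforce_resolve_policy(
    "investigate", {"category": "privilege-escalation"}, "resolve") == "resolve"
```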
This isn't "ask the AI if it looks bad." It's a structured decision protocol with hard constraints the AI can't override. These are real excerpts from the playbooks that run in production.
From the triage agent's playbook (POST.md):
```markdown
## Step 3: Analyze

Answer these questions using the data from steps 1-2:

A. Is this action routine for this actor?
   "[Actor] has done [action] [N] times. This is [routine/new]."
B. Is there a legitimate trigger?
   "Events show [trigger/no trigger]: [detail]."
C. Could a stolen credential produce this exact pattern?
   "[Yes/No] because [specific factor — IP/location, timing, user-agent]."
D. Does this expand access or privileges?
   "[Yes/No]."

## Step 4: Decide

- If A=routine AND B=trigger AND C=distinguishable AND D=no → RESOLVE
- Privilege changes → always ESCALATE (non-negotiable)
- Log format drift → always ESCALATE
- Resolution requires positive evidence — "actor is known" alone is not enough
- Otherwise → ESCALATE
```
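Step 4 reduces to a small pure function. A sketch, not the actual runtime (the playbook itself is natural-language instruction to the model; this just shows the decision shape):

```python
def step4_decide(routine, trigger, distinguishable, expands_access,
                 privilege_change=False, log_drift=False):
    """Step 4 of the triage playbook as a pure decision function.

    Resolution requires every positive condition to hold; everything
    else, including doubt, falls through to ESCALATE.
    """
    if privilege_change or log_drift:
        return "ESCALATE"  # non-negotiable
    if routine and trigger and distinguishable and not expands_access:
        return "RESOLVE"
    return "ESCALATE"

assert step4_decide(True, True, True, False) == "RESOLVE"
assert step4_decide(True, True, True, False, privilege_change=True) == "ESCALATE"
assert step4_decide(True, False, True, False) == "ESCALATE"
```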
From the investigation agent's playbook (POST.md):
```markdown
## Pre-Resolution Checklist

Before calling resolve-finding — whether resolving OR escalating — run
these 5 checks. They apply in both directions.

1. EVIDENCE — Am I citing specific fields, timestamps, or baseline entries?
   If I can't point to it, I'm guessing. This applies to escalations too:
   cite what's anomalous, not just "it looks wrong."
2. ADVERSARY — Could an attacker produce this exact pattern? What would
   distinguish legitimate from compromised? Automation names, user-agent
   strings, and correlation IDs can all be spoofed.
3. DISCONFIRM — What evidence would contradict my conclusion? Did I check
   for it, or just not look? If resolving, did I check for anomalous
   signals I might be overlooking? If escalating, did I check whether the
   baseline explains the activity?
4. BOUNDARY — Does this action expand who or what has access to the
   environment? If yes, treat as privilege-level.
5. BLAST RADIUS — If I'm wrong, what's the worst case? A false escalation
   wastes analyst time. A missed breach loses the org.
```
Skills are structured knowledge that agents load when they need domain depth. Here's what they actually contain.
From the privilege-analysis skill (SKILL.md):
These are three distinct events that often get conflated. A grant is when an actor receives a permission (role attachment, policy change, group membership). A use is when the actor exercises that permission. Escalation is when the result exceeds what was explicitly intended — an actor converts a limited permission into broader access, often by chaining grants across multiple resources or principals.

Seeing a grant event in isolation tells you almost nothing. The question is whether the use that followed was proportionate to the grant, and whether the grant itself was within the actor's pre-existing authority.

Key check: did the actor who issued the grant have the authority to do so? An actor who can grant a permission they do not themselves hold is a classic privilege escalation pattern.
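The key check at the end translates directly into code. A sketch, with the permission model simplified to flat string sets for illustration:

```python
def grantor_exceeds_authority(grantor_permissions, granted_permission):
    """Key check from the skill: an actor who grants a permission they
    do not themselves hold is a classic escalation pattern.

    Real IAM models are hierarchical; flat string sets keep the
    illustration readable.
    """
    return granted_permission not in grantor_permissions

# A deploy-only actor granting admin rights is the red flag.
assert grantor_exceeds_authority({"deploy", "read"}, "admin") is True
assert grantor_exceeds_authority({"admin", "deploy"}, "deploy") is False
```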
From the github-security skill (SKILL.md):
The "known actor" trap: GitHub admins are known actors by definition —
they appear frequently in the audit log. The credential theft test must
focus on behavioral deviation (new action types, new targets, new timing),
not on whether the actor is known.
Skills are loaded on-demand via the load-skill tool. They support parent/child
inheritance: privilege-analysis is the general framework; aws-iam inherits
from it and adds platform-specific guidance. 9 built-in skills cover Azure, AWS, GitHub,
M365, Vercel, Supabase, Container Logs, and OpenClaw.
Skills are prompt fragments injected into AI context — which makes them a prompt injection
vector. That's why every skill is SSH-signed, verified against a trust web, and hash-pinned
in skills.lock before loading.
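The hash-pin step alone might look like this (a sketch: the skills.lock shape shown is assumed, and the SSH-signature and trust-web checks described above would run alongside it):

```python
import hashlib

def verify_skill(skill_name, skill_bytes, lockfile):
    """Hash-pinning check before a skill enters the prompt: the SHA-256
    of the skill file must match the pinned entry in skills.lock.

    A skill that fails this check never reaches the LLM context."""
    expected = lockfile.get(skill_name)
    actual = "sha256:" + hashlib.sha256(skill_bytes).hexdigest()
    return expected == actual

# A tampered skill body no longer matches its pin and is rejected.
lock = {"github-security": "sha256:" + hashlib.sha256(b"skill body").hexdigest()}
assert verify_skill("github-security", b"skill body", lock) is True
assert verify_skill("github-security", b"tampered body", lock) is False
```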
Here's our rubric, here's how we scored, verify us. 54 adversarial scenarios. 100% system accuracy.
| Layer | What it is | AI involved? | Findings | Cost |
|---|---|---|---|---|
| Hard constraints | Code rules — priv-escalation, log drift, injection always escalate | No | 15 | 0 donuts |
| Learned rules | Your confirmed feedback becomes deterministic rules | No | 0* | 0 donuts |
| AI triage | Quick investigation, resolve the obvious | Yes | 3 | ~3 donuts |
| Consensus verification | 4 independent investigations must unanimously agree | Yes | 9 | ~108 donuts |
| Escalated to human | System says "I'm not sure" | Yes | 29 | ~102 donuts |
| Total | 54 adversarial scenarios, 100% accuracy | — | 56 | ~213 donuts |
*Resolution rules activate after 2 weeks of human feedback. In production they handle the majority of routine findings at zero cost.
When the AI resolves a finding as benign, mallcop doesn't just accept the answer. It runs 3 additional independent investigations on the same finding. All 4 must unanimously agree. If any single investigation disagrees, the finding escalates to a human.
9 of 54 Academy scenarios were caught this way — findings where a single AI investigation would have gotten it wrong. The disagreement itself is the signal.
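The consensus mechanism is simple to state precisely. A sketch, where `investigate` stands in for one full independent investigation (the original run plus 3 re-runs gives 4 verdicts in total):

```python
def consensus_verify(finding, investigate, runs=4):
    """Consensus check on a benign verdict: run the investigation
    independently `runs` times; anything short of unanimous agreement
    escalates to a human."""
    verdicts = [investigate(finding) for _ in range(runs)]
    if all(v == "resolved" for v in verdicts):
        return "resolved"
    return "escalated"  # the disagreement itself is the signal

assert consensus_verify({}, lambda f: "resolved") == "resolved"

# One dissenting run out of four is enough to escalate.
flaky = iter(["resolved", "resolved", "escalated", "resolved"])
assert consensus_verify({}, lambda f: next(flaky)) == "escalated"
```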
| Sovereignty | Multiplier | Academy cost | What it means |
|---|---|---|---|
| Open | 1.0× | ~213 donuts | Widest model pool, cheapest per donut |
| Allied | 1.3× | ~277 donuts | US + EU + Five Eyes providers only |
| US-only | 1.5× | ~320 donuts | US-headquartered providers only |
Same security outcome across all tiers — same pipeline, same hard constraints, same consensus mechanism. Sovereignty restricts the model pool, which increases cost per donut. It's a cost decision, not a security decision.
```
$ mallcop exam pipeline
```
The 54 scenarios are open-source YAML files. The grading criteria are in the code. Run the exam on your own infrastructure, with your own models, and verify these results.
Bring your own API key for $0, or let us handle inference for less than you'd pay retail.
| BYOK (Free) | Managed (Starter+) | |
|---|---|---|
| Price | $0 + your API costs | $4.99-79.99/mo |
| Models | Your choice (any API key) | Optimized routing across sovereignty tiers |
| Key management | You manage API keys | We handle it |
| Connectors | Unlimited | Unlimited |
| Events | Unlimited | Unlimited |
| Self-improvement | Unlimited (your tokens) | By tier (donut allocation) |
| Sovereignty | Your key, your provider | Choose: Open, Allied, or US-only |