The Problem: Siloed Findings, Missed Chains
Every security team runs multiple tools. A network scanner finds open ports and internal services. A DAST scanner finds injection points in web parameters. A SAST tool flags dangerous sinks in source code. A proxy captures live HTTP traffic with session tokens and cookie flags.
Each tool produces a list of findings. Each finding exists in isolation. And isolation is where severity miscalculation lives.
Consider a concrete scenario: your DAST scanner reports a Medium-severity SSRF in a
url query parameter. It can make outbound HTTP requests. Medium seems right —
it is a server-side request forgery with limited observed impact. The finding goes into the
backlog behind twenty other items.
Meanwhile, your network scanner found Redis on port 6379, two internal services on 10.x addresses, and an AWS EC2 metadata endpoint responding on 169.254.169.254. These are separate findings in a separate report. Nobody connects them to the SSRF.
That SSRF is not Medium. It is a Critical path to the cloud metadata service, internal Redis (likely unauthenticated), and lateral movement across the internal network. The attack chain is obvious to a senior penetration tester who reads both reports. It is invisible to every tool that produced them.
This is the problem an attack chain engine solves: correlating findings across tool boundaries to construct the exploitation narrative that individual tools cannot see.
Architecture: A Shared Context Store
The key architectural insight is that correlation is not a feature you bolt onto individual scanners. It requires a shared data layer where every tool deposits structured context about a target, and a correlation engine that joins that context against incoming findings.
Data Flow: Four Products → One Context Store → Correlated Findings
- Network scanner → ports, services, IPs
- SAST scanner → sinks, data flows, files
- DAST scanner → URLs, params, WAF
- Traffic proxy → cookies, auth, headers

All four products feed the context store (ChromaDB + metadata), and the correlation engine (rules + chain builder) joins that store against incoming findings.
Each tool contributes a different context type:
- Network context: Open ports, running services (with versions), internal IP addresses (RFC 1918, link-local, localhost), detected OS and technology stack.
- Surface context: Discovered URLs, mapped parameters (name, type, location), WAF detection status, web technologies, and response headers.
- Code context: Vulnerable file paths, dangerous sinks (exec, eval, cursor.execute, requests.get), taint data flows from source to sink, and absence of sanitizers.
- Traffic context: Observed parameters in live requests, cookie attributes (HttpOnly, Secure flags), authentication patterns (Bearer tokens, session cookies), and request patterns.
The context store is a vector database (ChromaDB) with heavy metadata usage. Each ingested
document carries a target_id, context_type, evidence_type,
and source tag. The target_id is the join key — a normalized
version of the target hostname or IP that strips protocol, default ports, paths, and www prefixes.
Target Normalization: Getting the Join Key Right
The correlation engine is only as good as its ability to recognize that
https://www.Example.COM:443/api/proxy and example.com are the same
target. Normalization strips protocol prefixes, removes default ports (80 for HTTP, 443 for HTTPS),
lowercases the hostname, strips www., and discards paths. Non-default ports are
preserved: http://10.0.0.5:8080/ normalizes to 10.0.0.5:8080.
This is deceptively important. A single normalization bug means an entire product's context
fails to join with findings from other products. Every ingestion call and every correlation
query passes through the same normalize_target() function to guarantee consistency.
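A minimal sketch of what `normalize_target()` might look like. The function name comes from the text; the exact mechanics here (scheme-aware default ports, `www.` stripping via `urlparse`) are assumptions consistent with the rules described above:

```python
from urllib.parse import urlparse

def normalize_target(raw: str) -> str:
    """Normalize a URL or bare host into the correlation join key (sketch)."""
    raw = raw.strip()
    if "://" not in raw:
        # urlparse only populates hostname/port when a scheme is present
        raw = "http://" + raw
    parsed = urlparse(raw)
    host = (parsed.hostname or "").lower()   # hostname, lowercased
    if host.startswith("www."):
        host = host[4:]                      # strip the www. prefix
    default = 443 if parsed.scheme == "https" else 80
    if parsed.port and parsed.port != default:
        return f"{host}:{parsed.port}"       # preserve non-default ports
    return host                              # drop default port and path
```

Under these rules, `https://www.Example.COM:443/api/proxy` and `example.com` both normalize to `example.com`, while `http://10.0.0.5:8080/` keeps its non-default port as `10.0.0.5:8080`.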
Idempotent Ingestion
Security tools run repeatedly against the same targets. The context store must handle re-ingestion without creating duplicates. Each context document gets a deterministic ID computed from its content:
doc_id = f"ctx-{target_id}-{context_type}-{evidence_type}-{sha256(data)[:12]}"
Ingestion uses upsert semantics. If the same tool reports the same data for the same target, the existing document is updated rather than duplicated. If the data changes (a new port opens, a new parameter is discovered), the SHA256 suffix produces a different ID and a new document is created alongside the existing one. The context store grows monotonically with new intelligence, never with redundant repetition.
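The deterministic ID can be sketched directly from the formula above. The canonical-JSON step (sorting keys before hashing) is an assumption about how `data` is serialized:

```python
import hashlib
import json

def context_doc_id(target_id: str, context_type: str,
                   evidence_type: str, data: dict) -> str:
    """Deterministic document ID: identical content hashes to the same ID."""
    # Sort keys so dict ordering cannot change the hash
    blob = json.dumps(data, sort_keys=True).encode("utf-8")
    digest = hashlib.sha256(blob).hexdigest()[:12]
    return f"ctx-{target_id}-{context_type}-{evidence_type}-{digest}"
```

Re-ingesting the same port scan produces the same ID, so the upsert overwrites in place; a newly opened port changes the digest and creates a sibling document.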
The Correlation Algorithm
When a finding arrives for correlation, the engine executes a seven-step pipeline.
Step 1: Fetch All Context for the Target
A metadata-filtered query retrieves every context document for the normalized target ID. This is not a vector similarity search — it is a direct metadata lookup. We want everything known about this target, regardless of semantic similarity to the finding's description.
# Metadata filter, not vector search
docs = store.list_by_metadata(
    where={"target_id": target_id},
    limit=500
)
In parallel, a semantic similarity search retrieves relevant general knowledge base documents (CVE records, exploit code, CWE references) for optional PoC generation:
# Vector similarity for KB enrichment
kb_docs = store.query(
    f"{title} {vuln_type} {cwe} {parameter}",
    n_results=10
)
Step 2: Group by Context Type
The retrieved documents are grouped into buckets by their context_type metadata:
network, surface, code, traffic. Each
bucket represents a different tool's perspective on the target.
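Assuming each retrieved document carries its metadata dict, the grouping step is a small bucketing pass (the document shape here is illustrative):

```python
from collections import defaultdict

def group_by_context_type(docs: list) -> dict:
    """Bucket retrieved context docs by their context_type metadata field."""
    buckets = defaultdict(list)
    for doc in docs:
        buckets[doc["metadata"]["context_type"]].append(doc)
    return dict(buckets)
```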
Step 3: Extract Structured Evidence
Each context type has a dedicated extractor that pulls structured data from the raw document text using regex patterns and metadata fields:
- Network extractor: Port numbers, service names, internal IPs (RFC 1918 ranges, 169.254.x.x, 127.0.0.1), technologies.
- Surface extractor: URLs, parameter names, WAF status (special-case detection for "no waf" / "waf: none"), web frameworks.
- Code extractor: A hardcoded list of dangerous sinks (exec, eval, system, subprocess, requests.get, urllib, os.popen, cursor.execute), taint flow indicators, file paths.
- Traffic extractor: Parameters from metadata, cookie names and flag absence (HttpOnly, Secure), authentication patterns (Bearer, session cookie), request method/path pairs.
The extractors do not need to understand every document perfectly. They look for the specific evidence patterns that the escalation rules need. A network extractor that misses a service name is fine; one that misses a 169.254.169.254 address is a severity escalation failure.
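A network extractor along these lines might look like the following sketch. The exact regexes are assumptions, but they cover the RFC 1918, link-local, and loopback ranges the text calls out:

```python
import re

# Internal address ranges the escalation rules care about
INTERNAL_IP_RE = re.compile(
    r"\b(?:10\.\d{1,3}\.\d{1,3}\.\d{1,3}"             # RFC 1918: 10/8
    r"|192\.168\.\d{1,3}\.\d{1,3}"                    # RFC 1918: 192.168/16
    r"|172\.(?:1[6-9]|2\d|3[01])\.\d{1,3}\.\d{1,3}"   # RFC 1918: 172.16/12
    r"|169\.254\.\d{1,3}\.\d{1,3}"                    # link-local / cloud metadata
    r"|127\.0\.0\.1)\b"
)
PORT_RE = re.compile(r"\bport\s*(\d{2,5})\b")

def extract_network_evidence(text: str) -> dict:
    """Pull only the structured evidence the escalation rules need."""
    return {
        "internal_ips": sorted(set(INTERNAL_IP_RE.findall(text))),
        "ports": sorted({int(p) for p in PORT_RE.findall(text)}),
    }
```

The point is asymmetric precision: missing a service name is tolerable, but the 169.254 branch must never miss, because the cloud-metadata escalation rule depends on it.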
Step 4: Evaluate Escalation Rules
This is where the correlation becomes actionable. The engine evaluates a set of declarative rules, each of which encodes a specific escalation pattern:
@dataclass
class CorrelationRule:
    name: str
    description: str
    vuln_types: list[str]         # Finding must match one of these
    required_context: list[str]   # These context types must exist
    evidence_patterns: list[str]  # Regex patterns to find in context docs
    escalated_severity: str
    confidence_boost: float
    reason_template: str
A rule fires when three conditions are met simultaneously:
- Vulnerability match: The finding's vulnerability type or CWE matches one of the rule's trigger types (case-insensitive substring match).
- Context presence: All required context types have at least one document in the grouped results.
- Evidence match: At least one of the rule's regex patterns matches in the raw text of the required context documents.
When a rule fires, it contributes an escalated severity level and a confidence boost. Severity escalation is monotonic — once a finding is escalated to Critical, no subsequent rule can downgrade it. Confidence boosts accumulate additively, capped at 1.0.
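The three-condition match, monotonic escalation, and capped boost can be sketched together. This uses a trimmed version of the rule dataclass above; the bucket format (lists of raw document strings keyed by context type) is an assumption:

```python
import re
from dataclasses import dataclass

SEVERITY_RANK = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3, "Info": 4}
SEVERITY_SEEDS = {"Critical": 0.50, "High": 0.40, "Medium": 0.30, "Low": 0.20, "Info": 0.10}

@dataclass
class Rule:  # trimmed CorrelationRule
    name: str
    vuln_types: list
    required_context: list
    evidence_patterns: list
    escalated_severity: str
    confidence_boost: float

def evaluate_rules(finding: dict, buckets: dict, rules: list):
    severity = finding["severity"]
    confidence = SEVERITY_SEEDS[severity]
    matched = []
    needle = f"{finding.get('vuln_type', '')} {finding.get('cwe', '')}".lower()
    for rule in rules:
        # 1. Vulnerability match: case-insensitive substring on type/CWE
        if not any(v.lower() in needle for v in rule.vuln_types):
            continue
        # 2. Context presence: every required type has at least one doc
        if not all(buckets.get(ct) for ct in rule.required_context):
            continue
        # 3. Evidence match: any regex hits the required context text
        text = " ".join(d for ct in rule.required_context for d in buckets[ct]).lower()
        if not any(re.search(p, text) for p in rule.evidence_patterns):
            continue
        matched.append(rule.name)
        # Monotonic escalation: only move toward the more severe rank
        if SEVERITY_RANK[rule.escalated_severity] < SEVERITY_RANK[severity]:
            severity = rule.escalated_severity
        confidence = min(1.0, confidence + rule.confidence_boost)
    return severity, round(confidence, 2), matched
```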
The Ten Escalation Rules
Each rule represents a real-world exploitation pattern where cross-tool context changes the severity calculus. These are not theoretical — they come from patterns observed in penetration testing engagements.
| Rule | Trigger | Required Context | Evidence Patterns | Escalation |
|---|---|---|---|---|
| SSRF + Internal Services | SSRF, CWE-918 | network | redis, memcache, elasticsearch, internal IPs | Critical (+0.30) |
| SSRF/RCE + Cloud Metadata | SSRF, RCE, CWE-918, CWE-78 | network | 169.254.169.254, metadata.google, iam.*credential | Critical (+0.35) |
| SQLi + Traffic Confirmed | SQLi, CWE-89 | traffic | parameter, user input, query string, form data | High (+0.25) |
| SQLi + DB Credentials in Code | SQLi, CWE-89 | code | password, db_pass, connection string, credential | Critical (+0.30) |
| RCE + No WAF | RCE, command/code injection, CWE-78, CWE-94 | surface | no waf, waf none, waf not detected, unprotected | Critical (+0.30) |
| Auth Bypass + Admin Panels | Auth bypass, IDOR, CWE-287, CWE-306, CWE-639 | surface | /admin, /dashboard, /manage, /console, privileged | Critical (+0.30) |
| XSS + Session Theft | XSS, CWE-79 | traffic | httponly false, no httponly, session token, set-cookie without httponly | High (+0.20) |
| File Upload + Server-Side Execution | File upload, CWE-434 | code | exec, eval, system, subprocess, include, .php, .jsp | Critical (+0.30) |
| LFI + Sensitive Files | LFI, path traversal, CWE-22, CWE-98 | network | /etc/passwd, .env, config, .git, wp-config | Critical (+0.25) |
| Multi-Product Corroboration | (any vulnerability type) | (2+ distinct sources) | (none — source count is the evidence) | unchanged (+0.20) |
Rule 10: Why Multi-Product Corroboration Matters
The tenth rule is structurally different from the others. It does not match on vulnerability
type at all. Instead, it counts the distinct source values across all context
documents for the target. When two or more different tools have contributed context, this rule
fires and adds a +0.20 confidence boost.
The reasoning is simple: if a network scanner, a DAST tool, and a SAST tool all have observations about the same target, the aggregate picture is more reliable than any single tool's perspective. False positives from one tool are unlikely to be corroborated by independent analysis from another. Multi-source convergence is, in practice, one of the strongest indicators that a finding is real.
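Rule 10 reduces to counting distinct `source` tags across the target's context documents (the document shape here is illustrative):

```python
def corroboration_boost(docs: list, threshold: int = 2, boost: float = 0.20) -> float:
    """+0.20 confidence when two or more distinct tools contributed context."""
    sources = {doc["metadata"]["source"] for doc in docs}
    return boost if len(sources) >= threshold else 0.0
```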
Confidence Scoring
Confidence starts from a severity-derived seed:
SEVERITY_SEEDS = {
    "Critical": 0.50,
    "High": 0.40,
    "Medium": 0.30,
    "Low": 0.20,
    "Info": 0.10,
}
Each matched rule adds its confidence_boost, with the total capped at 1.0. A Medium SSRF (seed 0.30) that triggers both ssrf_internal_services (+0.30) and ssrf_cloud_metadata (+0.35) with multi-product corroboration (+0.20) sums to 1.15 and caps at a confidence of 1.0: a fully corroborated Critical finding with evidence from multiple independent sources.
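The worked example above, as code. Whether the cap is applied per step or once at the end is an assumption; both give 1.0 here:

```python
SEVERITY_SEEDS = {"Critical": 0.50, "High": 0.40, "Medium": 0.30, "Low": 0.20, "Info": 0.10}

def final_confidence(severity: str, boosts: list) -> float:
    """Seed from severity, add each matched rule's boost, cap at 1.0."""
    conf = SEVERITY_SEEDS[severity]
    for boost in boosts:
        conf = min(1.0, conf + boost)
    return round(conf, 2)

# Medium SSRF: both SSRF rules plus corroboration overshoot to 1.15, capped
final_confidence("Medium", [0.30, 0.35, 0.20])  # 1.0
```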
Attack Chain Construction
After rule evaluation, the engine builds an ordered attack chain — a narrative describing how the vulnerability would be exploited in stages, mapping each step to the tool that provided the evidence.
Attack Chain: SSRF → Cloud Metadata → Lateral Movement
1. Exploit the SSRF (DAST finding)
2. Code path confirmed (SAST: requests.get(url))
3. Surface mapped (DAST: no WAF detected)
4. Pivot to internal (Netscan: Redis :6379, 10.x addresses)
5. Traffic validation (Proxy: Bearer token)
The chain is built from five potential steps, each gated on whether the corresponding context type has data:
- Exploit the vulnerability (always present) — the finding itself, attributed to the scanner that reported it.
- Code path confirmed (if code context exists) — the specific vulnerable sink and file path from static analysis.
- Attack surface mapped (if surface context exists) — discovered parameters, URLs, and WAF status from web reconnaissance.
- Pivot to internal infrastructure (if network context exists) — internal services and IPs reachable via the vulnerability.
- Validate in live traffic (if traffic context exists) — observed authentication patterns and cookie configurations.
A finding that only has DAST data produces a single-step chain. A finding with context from all four tool categories produces a five-step exploitation narrative. The chain length itself is a signal — longer chains with more corroborating evidence represent higher-confidence, higher-impact findings.
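The gating logic described above is a short conditional walk over the four context buckets (step wording abbreviated from the list; the finding/bucket shapes are illustrative):

```python
def build_attack_chain(finding: dict, buckets: dict) -> list:
    """Build the ordered chain; each optional step is gated on its context type."""
    chain = [f"1. Exploit the {finding['vuln_type']} (reported by {finding['source']})"]
    gated_steps = [
        ("code", "Code path confirmed (static analysis)"),
        ("surface", "Attack surface mapped (web recon)"),
        ("network", "Pivot to internal infrastructure"),
        ("traffic", "Validate in live traffic"),
    ]
    for context_type, step in gated_steps:
        if buckets.get(context_type):  # only if that tool contributed data
            chain.append(f"{len(chain) + 1}. {step}")
    return chain
```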
PoC Generation from Correlated Evidence
With the attack chain built and all corroborating evidence assembled, the engine can optionally generate a proof-of-concept exploitation script. This is where the correlation data becomes operationally dangerous (and operationally useful for defenders).
The PoC context includes every concrete detail extracted during correlation:
{
  "target": "example.com",
  "vuln_type": "ssrf",
  "parameter": "url",
  "internal_ips": ["10.0.0.5", "10.0.0.12"],
  "services": [
    {"name": "redis", "port": 6379},
    {"name": "elasticsearch", "port": 9200}
  ],
  "open_ports": [22, 80, 443, 6379, 9200],
  "discovered_urls": ["/api/proxy?url=", "/admin/config"],
  "mapped_parameters": ["url", "callback", "redirect"],
  "waf": "none",
  "vulnerable_files": ["proxy.py"],
  "vulnerable_sinks": ["requests.get"],
  "auth_type": "bearer_token",
  "request_patterns": ["GET /api/proxy"]
}
The LLM receives this structured context with instructions to produce a staged exploitation script that uses only the actual discovered IPs, services, and parameters — not hypothetical or placeholder values. Each line in the PoC is commented with the data source that justified it.
Why this matters for defenders: A PoC generated from correlated evidence across four tools is fundamentally different from one generated by a single scanner. It demonstrates the full exploitation path — from initial entry point through lateral movement — using real infrastructure details. This is the artifact that turns a security report into an executive conversation about risk.
The API Surface
The correlation engine exposes three endpoints that form a complete ingest-correlate-query cycle.
Ingest Context
POST /ingest/context
{
  "target": "https://example.com",
  "context_type": "network",
  "evidence_type": "port_scan",
  "source": "sn1per",
  "data": {
    "ports": [22, 80, 443, 6379],
    "services": [{"name": "redis", "port": 6379}],
    "internal_ips": ["10.0.0.5"]
  }
}
Each tool calls this endpoint after completing its scan phase. Bulk ingestion is supported
via POST /ingest/context/bulk for tools that produce many context items at once.
Correlate a Finding
POST /correlate
{
  "target": "https://example.com",
  "vuln_type": "ssrf",
  "title": "Server-Side Request Forgery in proxy endpoint",
  "severity": "Medium",
  "cwe": "CWE-918",
  "parameter": "url",
  "generate_poc": true
}
Returns the full correlation result: original and escalated severity, matched rules with human-readable reasons, confidence score, attack chain steps, corroborating evidence summaries, and the optional generated PoC.
Query Target Context
GET /correlate/context/example.com
{
  "target_id": "example.com",
  "context_types": {"network": 3, "surface": 2, "code": 1, "traffic": 1},
  "total_docs": 7,
  "context": {
    "network": [{"source": "sn1per", "evidence_type": "port_scan", ...}],
    "surface": [{"source": "silentchain-enterprise", ...}],
    ...
  }
}
This diagnostic endpoint shows everything the engine knows about a target before correlation runs. Useful for debugging why a rule did or did not fire.
Production Patterns and Lessons
Rule Design: Specificity Over Coverage
Early versions of the rule set tried to be comprehensive — a rule for every possible combination of vulnerability type and context. This produced noisy escalations. A "SQL injection + any network context" rule fired constantly because network context exists for almost every target. The escalations were technically defensible but practically useless.
The current ten rules are specific. sqli_db_credentials does not fire on
"SQL injection + code context exists." It fires on "SQL injection + code context contains
strings matching password|db_pass|connection.*string|credential." The rule
encodes a specific threat model: SQL injection is Critical when the code base contains
database credentials that the injection could exfiltrate.
Monotonic Severity Escalation
Severity only goes up, never down. If rule A escalates a finding to Critical and rule B
would suggest High, the finding stays Critical. This is a deliberate design choice:
correlation adds evidence, and evidence of exploitability should never reduce
the assessed risk. In a ranked severity space (Critical=0, High=1, Medium=2,
Low=3, Info=4), the engine always picks the lower ordinal.
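In code, picking the lower ordinal is a one-liner over the ranked space:

```python
SEVERITY_RANK = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3, "Info": 4}

def escalate(current: str, proposed: str) -> str:
    """Monotonic: severity only moves toward the lower ordinal (more severe)."""
    return min(current, proposed, key=SEVERITY_RANK.__getitem__)
```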
Metadata Filtering vs. Vector Search
The correlation engine deliberately avoids vector similarity search for context retrieval. When correlating an SSRF finding, we do not want "documents semantically similar to SSRF" — we want every document about this target, regardless of topic. A Redis service on port 6379 is not semantically similar to "server-side request forgery," but it is critical evidence for the SSRF + internal services escalation rule.
Vector search is reserved for KB enrichment (pulling relevant CVEs and exploit references),
where semantic relevance is the right retrieval criterion. Context retrieval uses pure
metadata filtering on target_id.
Cross-Product Trust and the Corroboration Signal
The multi-product corroboration rule addresses a fundamental problem in automated scanning: false positive rates. A single tool reporting a vulnerability has a false positive rate determined by that tool's accuracy. Two independent tools reporting observations about the same target create a corroboration signal that is stronger than either tool alone.
This is not the same as two tools finding the same vulnerability. It is two tools providing different types of evidence about the same target — one finding the vulnerability, another confirming the infrastructure conditions that make it exploitable. The correlation engine exploits the independence of different tool types to increase confidence without reducing coverage.
What This Changes for Security Teams
Without correlation, security teams triage findings in isolation. A backlog of 200 Medium findings produces analysis paralysis. With correlation, those 200 findings are automatically re-evaluated against infrastructure, code, and traffic context. The twelve findings that have corroborating evidence from multiple sources and match known escalation patterns surface at the top as Critical or High, with attack chains explaining exactly why.
The attack chain is the deliverable. It is not "we found an SSRF." It is "we found an SSRF
in the url parameter of /api/proxy, confirmed the code path
(requests.get(url) at line 42 in proxy.py with no sanitization),
verified no WAF is protecting the endpoint, and confirmed that Redis on 10.0.0.5:6379
and the AWS metadata service at 169.254.169.254 are reachable from the server. Here is
the three-stage PoC."
That finding gets fixed before lunch.
Cross-Product Correlation, Built In
The SILENTCHAIN platform includes the correlation engine described in this post, wired across DAST (Enterprise), SAST (SOURCE), network reconnaissance (Sn1per), and traffic analysis (Pro). Every finding is automatically correlated, escalated, and chain-mapped against all available context. No integration work required.