fix: XSS context analyzer edge cases for issue #7086#7101
fix: XSS context analyzer edge cases for issue #7086#7101a638011 wants to merge 8 commits intoprojectdiscovery:devfrom
Conversation
Implements an XSS context analyzer that integrates with nuclei's fuzzing pipeline. The analyzer: - Injects a canary string with XSS-critical characters (<>"'/) to detect reflection and character survival in HTTP responses - Uses golang.org/x/net/html tokenizer to classify reflection context into 8 types: HTMLText, Attribute, AttributeUnquoted, Script, ScriptString, Style, HTMLComment, and None - Selects context-appropriate XSS payloads filtered by which special characters survive server-side encoding - Replays payloads and verifies unencoded reflection to confirm XSS - Reports CSP header presence as a note on exploitability New files: pkg/fuzz/analyzers/xss/types.go - context types and constants pkg/fuzz/analyzers/xss/context.go - HTML tokenizer context detection pkg/fuzz/analyzers/xss/analyzer.go - main analyzer with replay logic pkg/fuzz/analyzers/xss/context_test.go - comprehensive tests Modified files: pkg/fuzz/analyzers/analyzers.go - add response fields to Options pkg/protocols/http/http.go - register xss analyzer via import pkg/protocols/http/request.go - pass response data to analyzer pkg/protocols/http/request_fuzz.go - init nil Parameters map Closes projectdiscovery#5838
- Detect javascript: URI in attributes as ContextScript - Treat srcdoc attributes as HTML injection context (ContextHTMLText) - These fixes address issue projectdiscovery#7086
Neo - PR Security ReviewNo security issues found Highlights
Comment |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds a new xss_context analyzer with canary-based reflection detection, HTML-context parsing, payload selection and replay verification; registers it for HTTP fuzzing. Also makes RNG thread-safe (mutex-protected), exports Changes
Sequence DiagramsequenceDiagram
participant Fuzzer as Fuzzer Engine
participant XSS as XSS Analyzer
participant Parser as HTML Parser
participant Selector as Payload Selector
participant Replayer as Request Replayer
participant Server as HTTP Server
Fuzzer->>XSS: ApplyInitialTransformation(payload, params)
XSS->>XSS: generateCanary() / store xss_canary
XSS-->>Fuzzer: transformed payload
Fuzzer->>Server: Send fuzzed request with canary
Server-->>Fuzzer: Return response (body, headers, status)
Fuzzer->>XSS: Analyze(options with ResponseBody/Headers/Status)
XSS->>Parser: DetectReflections(body, canary)
Parser-->>XSS: ReflectionInfo[]
XSS->>XSS: detectCharacterSurvival(body, canary)
XSS->>Selector: selectPayloads(reflection, chars)
Selector-->>XSS: payloads[]
loop replay payloads
XSS->>Replayer: replayAndVerify(payload, reflection)
Replayer->>Fuzzer: update component & rebuild request
Replayer->>Server: Send replay request
Server-->>Replayer: Return response
Replayer-->>XSS: result (found, details)
end
alt Vulnerability found
XSS-->>Fuzzer: Return (true, details)
else
XSS-->>Fuzzer: Return (false, "", nil)
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/fuzz/analyzers/xss/analyzer.go`:
- Around line 60-64: The analyzer reads the canary from shared
AnalyzerParameters which can desync; instead extract it per-request from
options.FuzzGenerated.Value. Implement a small helper (e.g., extractCanary(value
string) string) that parses the final fuzzed reflection string for the canary
pattern (e.g., "nuclei" + 8 alpha chars) and replace the current lookup in
analyzer.go (the block using options.AnalyzerParameters["xss_canary"]) to call
extractCanary(options.FuzzGenerated.Value) and use that returned string for
downstream XSS detection logic.
- Around line 131-143: The component state is being reset only after Rebuild, so
if Rebuild fails the mutated value leaks; move the restore defer to immediately
after calling gr.Component.SetValue(gr.Key, payload) so the original is
guaranteed to be restored (use the existing gr.OriginalValue), then call
gr.Component.Rebuild(); also remove the later deferred restore after Rebuild to
avoid double-restores. Ensure references: gr.Component.SetValue, gr.Key,
gr.OriginalValue, and gr.Component.Rebuild.
In `@pkg/fuzz/analyzers/xss/context_test.go`:
- Around line 124-140: Add two explicit regression tests to cover the `#7086` edge
cases: create TestDetectReflections_JavascriptURI and
TestDetectReflections_Srcdoc that call DetectReflections with inputs containing
a javascript: URI and a srcdoc HTML fragment respectively (reuse testMarker like
TestDetectReflections_EventHandler), then assert that at least one reflection is
returned and that the reflection.Context equals the appropriate context enum
(e.g., ContextURI for javascript: and ContextSrcdoc for srcdoc) and any relevant
fields (e.g., AttrName or URL) match expected values to ensure the scanner
classifies these contexts correctly.
In `@pkg/fuzz/analyzers/xss/context.go`:
- Around line 81-94: The checks for "javascript:" URIs and "srcdoc" are nested
inside the isEventHandler(attrName) branch making them unreachable for non-event
attributes; refactor the logic in the function handling attrName/attrVal so
that: first, if isEventHandler(attrName) then set ctx = ContextScript; next,
separately check if
strings.HasPrefix(strings.ToLower(strings.TrimSpace(attrVal)), "javascript:")
and set ctx = ContextScript for attributes like href; and separately check if
attrName == "srcdoc" and set ctx = ContextHTMLText; ensure you update the same
ctx variable and remove the unreachable duplicate ctx assignments so
isEventHandler, the javascript: check, and the srcdoc check are independent.
ℹ️ Review info
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 66c8751e-7e23-4470-92aa-dd0cb94bb726
📒 Files selected for processing (8)
pkg/fuzz/analyzers/analyzers.gopkg/fuzz/analyzers/xss/analyzer.gopkg/fuzz/analyzers/xss/context.gopkg/fuzz/analyzers/xss/context_test.gopkg/fuzz/analyzers/xss/types.gopkg/protocols/http/http.gopkg/protocols/http/request.gopkg/protocols/http/request_fuzz.go
The javascript: URI and srcdoc checks were incorrectly nested inside the isEventHandler block, causing them to never execute for non-event handler attributes. Fixes: - Move javascript: URI check to check URL-context attributes (href, src, action, etc.) - Move srcdoc check outside isEventHandler block - Restructure event handler logic to only apply to actual event handlers Addresses Neo security review feedback.
There was a problem hiding this comment.
🧹 Nitpick comments (2)
pkg/fuzz/analyzers/xss/context.go (2)
248-254: Whitespace around=in attributes not handled.HTML permits whitespace around the equals sign (
class = "foo"). The current search forattrName + "="(line 249) would miss these cases, falling back to the default double-quoted assumption.This is a minor robustness issue since:
- Such whitespace is uncommon in practice
- The reflection is still detected; only the quote character metadata may be inaccurate
♻️ Optional: Handle whitespace around equals
func detectAttrQuoting(rawToken, attrName string) (byte, bool) { - attrAssign := attrName + "=" rawLower := strings.ToLower(rawToken) - idx := strings.Index(rawLower, attrAssign) + // Find attribute name, then skip optional whitespace to find = + idx := strings.Index(rawLower, attrName) if idx < 0 { return '"', false // default to double-quoted } - afterEq := idx + len(attrAssign) + // Skip past attribute name and any whitespace to find = + pos := idx + len(attrName) + for pos < len(rawToken) && (rawToken[pos] == ' ' || rawToken[pos] == '\t') { + pos++ + } + if pos >= len(rawToken) || rawToken[pos] != '=' { + return '"', false + } + // Skip = and any whitespace after it + pos++ + for pos < len(rawToken) && (rawToken[pos] == ' ' || rawToken[pos] == '\t') { + pos++ + } + afterEq := pos if afterEq >= len(rawToken) { return '"', false }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/fuzz/analyzers/xss/context.go` around lines 248 - 254, detectAttrQuoting currently searches for attrName+"=" and thus misses attributes with spaces around '='; modify detectAttrQuoting to locate attrName (use strings.Index on a lower-cased rawToken or regex) then advance past the attribute name, skip any intervening whitespace, verify the next character is '=', skip any whitespace after '=', and finally inspect the next character to determine the quote byte and return true; update references inside detectAttrQuoting to use this whitespace-tolerant logic so attributes like `class = "foo"` are handled correctly.
233-236: Template literal interpolation not handled (edge case).The backtick string detection works for simple cases, but doesn't account for
${...}interpolations where the marker would be in expression context rather than string context:`Hello ${MARKER}` // MARKER is in expression context, not stringThis is a minor edge case that could result in
ContextScriptStringinstead ofContextScriptfor markers inside template interpolations.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/fuzz/analyzers/xss/context.go` around lines 233 - 236, The backtick handling misses template-expression contexts like `${...}` so markers inside interpolations are misclassified; update the logic around the backtick case in context.go to track template expression state (e.g., add an inTemplateExpr boolean and a templateExprDepth counter), so that when inBacktick and you see the sequence "${" you set inTemplateExpr=true and increment depth and treat subsequent characters as expression context (count '{' and '}' to decrement depth and clear inTemplateExpr when depth hits 0), and ensure the existing inBacktick toggle logic (case '`') ignores backtick toggling while inTemplateExpr is true so markers inside `${...}` are classified as ContextScript rather than ContextScriptString.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@pkg/fuzz/analyzers/xss/context.go`:
- Around line 248-254: detectAttrQuoting currently searches for attrName+"=" and
thus misses attributes with spaces around '='; modify detectAttrQuoting to
locate attrName (use strings.Index on a lower-cased rawToken or regex) then
advance past the attribute name, skip any intervening whitespace, verify the
next character is '=', skip any whitespace after '=', and finally inspect the
next character to determine the quote byte and return true; update references
inside detectAttrQuoting to use this whitespace-tolerant logic so attributes
like `class = "foo"` are handled correctly.
- Around line 233-236: The backtick handling misses template-expression contexts
like `${...}` so markers inside interpolations are misclassified; update the
logic around the backtick case in context.go to track template expression state
(e.g., add an inTemplateExpr boolean and a templateExprDepth counter), so that
when inBacktick and you see the sequence "${" you set inTemplateExpr=true and
increment depth and treat subsequent characters as expression context (count '{'
and '}' to decrement depth and clear inTemplateExpr when depth hits 0), and
ensure the existing inBacktick toggle logic (case '`') ignores backtick toggling
while inTemplateExpr is true so markers inside `${...}` are classified as
ContextScript rather than ContextScriptString.
ℹ️ Review info
Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: e167771d-ca4e-4f09-8e05-d743dc72145f
📒 Files selected for processing (1)
pkg/fuzz/analyzers/xss/context.go
|
Addressed follow-up review items in 46a0a5e:\n\n- Extract canary from (regex ) to avoid analyzer parameter desync.\n- Move component-state restore to immediately after , so original value is restored even if fails.\n- Added explicit regression tests:\n - \n - \n\nI can’t run in this environment because is not installed, but changes are intentionally scoped and compile-safe in the touched files. Happy to adjust quickly if CI flags anything. |
|
Follow-up update (the previous comment formatting got mangled by shell escaping): Addressed the review items in commit
I couldn’t run |
|
Implemented the two remaining CodeRabbit nits in
Added tests:
(Still can’t run |
There was a problem hiding this comment.
🧹 Nitpick comments (2)
pkg/fuzz/analyzers/xss/context_test.go (2)
142-176: Harden thejavascript:/srcdocregression tests with explicit “not old context” checks.Line 142 and Line 160 currently assert only that the new context exists. These tests can still pass if
ContextAttributeis also returned for the same attribute, which weakens regression protection.Proposed test hardening
func TestDetectReflections_JavascriptURI(t *testing.T) { body := `<html><body><a href="javascript:nucleiXSScanary">x</a></body></html>` reflections := DetectReflections(body, testMarker) if len(reflections) == 0 { t.Fatal("expected at least one reflection in javascript: URI") } - found := false + foundScript := false + foundLegacyAttr := false for _, r := range reflections { if r.Context == ContextScript && r.AttrName == "href" { - found = true - break + foundScript = true + } + if r.Context == ContextAttribute && r.AttrName == "href" { + foundLegacyAttr = true } } - if !found { + if !foundScript { t.Fatalf("expected ContextScript for javascript: href, got %v", reflections) } + if foundLegacyAttr { + t.Fatalf("unexpected ContextAttribute for javascript: href, got %v", reflections) + } } func TestDetectReflections_Srcdoc(t *testing.T) { body := `<html><body><iframe srcdoc="<p>nucleiXSScanary</p>"></iframe></body></html>` reflections := DetectReflections(body, testMarker) if len(reflections) == 0 { t.Fatal("expected at least one reflection in srcdoc attribute") } - found := false + foundHTMLText := false + foundLegacyAttr := false for _, r := range reflections { if r.Context == ContextHTMLText && r.AttrName == "srcdoc" { - found = true - break + foundHTMLText = true + } + if r.Context == ContextAttribute && r.AttrName == "srcdoc" { + foundLegacyAttr = true } } - if !found { + if !foundHTMLText { t.Fatalf("expected ContextHTMLText for srcdoc, got %v", reflections) } + if foundLegacyAttr { + t.Fatalf("unexpected ContextAttribute for srcdoc, got %v", reflections) + } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/fuzz/analyzers/xss/context_test.go` around lines 142 - 176, Tests DetectReflections_JavascriptURI and DetectReflections_Srcdoc only assert presence of the new contexts (ContextScript and ContextHTMLText) but don't assert the old ContextAttribute is absent; update these tests (targeting DetectReflections output) to additionally verify that no reflection exists with AttrName "href" or "srcdoc" that has ContextAttribute, i.e., ensure each reflection with AttrName == "href" (for JavascriptURI) or == "srcdoc" (for Srcdoc) does not have Context == ContextAttribute while still asserting the expected new context (ContextScript / ContextHTMLText) is present.
178-184: Add one exact-canary negative regression for encoded marker forms.Current suite has generic no-reflection coverage, but not an explicit encoded-marker case. Adding this protects the intended “exact canary reflection” behavior from drifting.
Proposed regression test
func TestDetectReflections_NoReflection(t *testing.T) { body := `<html><body><p>Hello world</p></body></html>` reflections := DetectReflections(body, testMarker) if len(reflections) != 0 { t.Fatalf("expected no reflections, got %d", len(reflections)) } } + +func TestDetectReflections_NoReflection_EncodedMarker(t *testing.T) { + body := `<html><body><p>%6e%75%63%6c%65%69%58%53%53%63%61%6e%61%72%79</p></body></html>` + reflections := DetectReflections(body, testMarker) + if len(reflections) != 0 { + t.Fatalf("expected no reflections for encoded marker, got %d", len(reflections)) + } +}Based on learnings: “preserve the design intent: use an exact-canary search during detection… implementation should follow the two-phase approach.”
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/fuzz/analyzers/xss/context_test.go` around lines 178 - 184, Add a negative regression test that verifies DetectReflections does not treat encoded forms of the canary as a reflection: create a new test (e.g., TestDetectReflections_EncodedMarker_NoReflection) that constructs a body containing URL-encoded and HTML-encoded variants of testMarker and calls DetectReflections(body, testMarker), then assert len(reflections) == 0; this ensures DetectReflections uses an exact-canary search (two-phase approach) rather than matching encoded equivalents.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@pkg/fuzz/analyzers/xss/context_test.go`:
- Around line 142-176: Tests DetectReflections_JavascriptURI and
DetectReflections_Srcdoc only assert presence of the new contexts (ContextScript
and ContextHTMLText) but don't assert the old ContextAttribute is absent; update
these tests (targeting DetectReflections output) to additionally verify that no
reflection exists with AttrName "href" or "srcdoc" that has ContextAttribute,
i.e., ensure each reflection with AttrName == "href" (for JavascriptURI) or ==
"srcdoc" (for Srcdoc) does not have Context == ContextAttribute while still
asserting the expected new context (ContextScript / ContextHTMLText) is present.
- Around line 178-184: Add a negative regression test that verifies
DetectReflections does not treat encoded forms of the canary as a reflection:
create a new test (e.g., TestDetectReflections_EncodedMarker_NoReflection) that
constructs a body containing URL-encoded and HTML-encoded variants of testMarker
and calls DetectReflections(body, testMarker), then assert len(reflections) ==
0; this ensures DetectReflections uses an exact-canary search (two-phase
approach) rather than matching encoded equivalents.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: e570235f-4248-436f-afbc-ebcd8aba470d
📒 Files selected for processing (2)
pkg/fuzz/analyzers/xss/context.gopkg/fuzz/analyzers/xss/context_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/fuzz/analyzers/xss/context.go
|
Applied the latest test-hardening suggestions in
(Still unable to run |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/fuzz/analyzers/xss/context_test.go`:
- Around line 251-257: The test TestDetectReflections_CaseInsensitive encodes a
scenario that never occurs because Analyzer.Analyze first does an exact
strings.Contains(body, canary) check before calling DetectReflections; update
the test to reflect real analyzer flow by either (a) changing the test input
body to use the exact-case canary string so the Analyzer path is valid (use same
casing as the canary passed to DetectReflections), or (b) if you intend this to
be a unit-only helper contract for DetectReflections, rename the test to
indicate it's a helper-only test (e.g.,
TestDetectReflections_Helper_CaseInsensitive) and move it to a helper-focused
test suite; reference Analyzer.Analyze and DetectReflections when making the
change so behavior matches the two-phase detect→analyze flow.
- Around line 351-363: Add a negative regression test to ensure
detectAttrQuoting does not match partial attribute names: create tests (e.g.,
TestDetectAttrQuoting_PartialMatch_DataClass and
TestDetectAttrQuoting_PartialMatch_Classname) that call detectAttrQuoting with
the target "class" against inputs like `<input data-class="nucleiXSScanary">`
and `<input classname="nucleiXSScanary">` and assert that the function does not
report a match (i.e., it should not return quote == '"' nor unquoted == true),
thereby verifying the boundary check rejects partial-name matches.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: af48ca87-6274-4510-a8fa-d6e818c38c3a
📒 Files selected for processing (1)
pkg/fuzz/analyzers/xss/context_test.go
- Resolved conflict in pkg/protocols/http/request.go - Applied XSS analyzer response data extraction to current dev version - Preserved all XSS analyzer functionality while adopting latest dev changes
|
Friendly ping! This PR is ready for review. All checks have passed (Neo ✓, CodeRabbit ✓). Happy to address any feedback! |
Fix XSS context analyzer edge cases for issue #7086
/claim #7086
Summary by CodeRabbit
New Features
Tests
Chores