feat(html-report): add flaky test detection and summary section#5498
feat(html-report): add flaky test detection and summary section#5498
Conversation
Up to standards ✅🟢 Issues
|
| Metric | Results |
|---|---|
| Complexity | 4 |
TIP This summary will be updated as you push new changes. Give us feedback
Add dedicated flaky test support to the HTML report: - Flaky detection: tests with status 'passed' and retryAttempt > 0 - New collapsible 'Flaky Tests' quick-access section with retry counts - Orange 'Flaky' filter pill in the status filter bar (hidden when no flaky tests) - Flaky count indicator in the summary dashboard - Orange color theme variables and badge styling for flaky status - Exclude flaky tests from the 'Passed' filter for clearer separation Closes #5487
a352a23 to
8e14a19
Compare
There was a problem hiding this comment.
Code Review: feat(html-report): flaky test detection and summary section
Overall this is a well-executed, self-contained feature. The implementation is clean and consistent with existing patterns in the file. A few things worth flagging:
Bug: "Passed" pill count includes flaky tests, but the filter excludes them
The server-side summary.Passed value (rendered into the HTML at generation time) counts all passed tests, including flaky ones. The JS filter for passed correctly excludes flaky tests — but the count shown in the pill doesn't match what's visible after filtering.
// AppendSearchAndFilters — Passed pill count is baked in server-side
sb.Append(summary.Passed); // includes flaky tests// JS filter correctly excludes flaky from the Passed view
} else if (activeFilter === 'passed') {
if (t.status !== 'passed' || isFlaky(t)) return false;
}This creates a mismatch: the "Passed" badge says e.g. 42, but clicking it only shows 39 tests. The fix is to subtract the flaky count from the displayed Passed count once it's computed client-side:
// After computing flaky.length in renderFlakySection:
const passedPillCount = document.querySelector('[data-filter="passed"] .pill-count');
if (passedPillCount) {
const originalPassed = parseInt(passedPillCount.dataset.total || passedPillCount.textContent);
passedPillCount.dataset.total = originalPassed; // preserve original for idempotency
passedPillCount.textContent = originalPassed - flaky.length;
}Minor: ReportSummary has no Flaky property
The flaky count is computed entirely client-side in JS, meaning the serialized JSON summary embedded in the report doesn't carry a flaky count. If consumers ever parse the raw reportData JSON (e.g. for CI tooling or other downstream use), the flaky count won't be available.
Adding a Flaky property to ReportSummary (populated server-side based on retry semantics) would make the data model complete and consistent with Passed, Failed, etc. This isn't blocking for the HTML report use case, but worth considering for forward compatibility.
Minor: pill visibility uses inline style vs. class toggle (inconsistency)
The flakyPill uses pill.style.display = ... while flakyIndicator uses .classList.add('visible'). Both work correctly, but the file establishes the class-based pattern for toggling visibility — using it consistently for the pill too would be cleaner.
Strengths
isFlaky()extracted as a named helper — readable and reusable- Reuses existing
.retry-tag,.qa-section,.qa-itemCSS — no style duplication esc()used on all user-supplied strings — XSS safe- Sort by retry count descending is the right default (most unstable tests first)
- Dark/light mode theming via CSS variables follows established patterns
aria-labelon the new section region — good accessibility
- Fix Passed pill count mismatch by subtracting flaky count server-side - Add Flaky property to ReportSummary for forward compatibility - Use class-based visibility toggle for flaky pill instead of inline style
There was a problem hiding this comment.
Code Review: feat(html-report): flaky test detection and summary section
Previous Review — All Addressed ✓
All three issues from the prior review have been resolved:
- Passed pill count: now correctly renders
summary.Passed - summary.Flakyserver-side ✓ ReportSummary.Flakyproperty: added with[JsonPropertyName("flaky")]and populated inAccumulateStatus✓- Pill visibility: uses
classList.add/remove('hidden')correctly (class-based, not inline style) ✓
New Issue: Flaky tests render a green "passed" badge in the main test list
When a user clicks the "Flaky" filter pill, matchesFilter correctly shows only flaky tests — but the main test row renderer uses t.status for the badge class:
// HtmlReportGenerator.cs:1886
html += '<span class="t-badge '+t.status+'">'+esc(t.status)+'</span>';Since t.status === 'passed' for flaky tests, each row gets a green "passed" badge — inconsistent with the orange "flaky" badge shown in the quick-access section above. A user filtering by Flaky will see a page full of green "passed" badges, which undermines the visual distinction this PR is building.
Suggested fix:
const badgeClass = isFlaky(t) ? 'flaky' : t.status;
html += '<span class="t-badge ' + badgeClass + '">' + esc(badgeClass) + '</span>';This also applies to the failure-clusters renderer at line 2282, which has the same pattern:
h += '<span class="t-badge '+safeClass(f.t.status)+'">'+esc(f.t.status)+'</span>';(Less critical there since failure clusters only show failed tests, but worth keeping consistent.)
Minor: Pass rate / ring chart inconsistency with the Passed pill
The ring chart and pass rate percentage are computed from summary.Passed (which includes flaky tests), while the Passed filter pill shows summary.Passed - summary.Flaky. This means:
- Ring shows, say, 95% pass rate (42 tests as "passed")
- Passed pill says 39 tests
A user might be confused why the numbers don't add up. The flaky-indicator label ("3 flaky") bridges the gap, but it's a small text element that's easy to miss. Either:
- Adjust the ring to use
Passed - Flakyfor the green arc and add an orange arc for flaky (more visual clarity), or - Add a note to the ring center like "pass rate (incl. flaky)" to explicitly acknowledge the convention
The current approach isn't wrong — flaky tests did pass — but the pill/ring mismatch may prompt user confusion.
Strengths
The overall implementation is solid:
isFlaky()extracted as a named, reusable helper — used consistently acrossmatchesFilter,renderFlakySection, and the row renderer- CSS variables follow the established pattern (
--orange,--orange-d,@propertyregistration, dark/light mode) - XSS safety:
esc()is applied everywhere user-facing strings are interpolated retryAttemptwas already serialized into the JSON data model, so no schema additions needed for JS to work.retry-tagalready existed in CSS — smart reuse rather than duplication- Server-side and client-side flaky definitions are identical (
passed+retryAttempt > 0), so counts will always agree
- Add badgeLabel() JS helper to show 'flaky' badge instead of 'passed' for flaky tests in main list, failed section, and failure clusters - Split ring chart green arc into green (clean pass) + orange (flaky) - Update Passed stat card to show clean passes only (excluding flaky) - Simplify AccumulateStatus to accept ReportTestResult directly - Use classList.toggle for flaky pill visibility
There was a problem hiding this comment.
Code Review: feat(html-report): flaky test detection and summary section (Round 3)
Previous Review — All Addressed ✓
All issues from the second review have been resolved:
- Green "passed" badge for flaky tests: fixed with the new
badgeLabel()helper applied consistently acrossrender(),renderFailedSection(), andrenderFailureClusters()✓ - Ring chart inconsistency: green arc now represents clean passes only; orange arc added for flaky; the split correctly preserves the total arc length ✓
- Passed stat card: now shows
cleanPassed(excluding flaky) ✓
Remaining Issue: Stat cards no longer sum to Total
With flaky tests excluded from the "Passed" card, the stat cards are now mathematically inconsistent:
Total = cleanPassed + Flaky + Failed + TimedOut + Skipped + Cancelled
But the displayed stat cards are:
Total | Passed (clean) | Failed | Skipped | Cancelled
For example with 3 flaky tests:
- Total: 50
- Passed: 39 + Failed: 5 + Skipped: 3 + Cancelled: 0 = 47 ≠ 50
The 3 missing tests are accounted for only by the small flaky-indicator div beneath the stats. A user is likely to notice the arithmetic doesn't work and be confused. The flaky-indicator is easy to overlook — it's an <div> styled with .78rem text versus the prominent stat card layout.
Suggested fix: Add a proper "Flaky" stat card to keep the stat row self-consistent:
AppendStatCard(sb, "passed", cleanPassed.ToString(), "Passed", "var(--emerald)");
AppendStatCard(sb, "flaky", summary.Flaky.ToString(), "Flaky", "var(--orange)"); // add this
AppendStatCard(sb, "failed", (summary.Failed + summary.TimedOut).ToString(), "Failed", "var(--rose)");Then the flaky-indicator div can be removed entirely since it would be redundant. The stat card also provides a click-to-filter entry point (same as the other cards if wired up), and is consistent with how other statuses are treated.
Minor: passRate includes flaky tests but ring green arc does not
var passRate = summary.Total > 0 ? (double)summary.Passed / summary.Total * 100 : 0;
// rendered as the ring center % labelsummary.Passed includes flaky tests, so the ring center shows e.g. "95% pass rate" computed from Passed / Total — but the green arc of the ring now represents only cleanPassed / Total. The orange arc covers the gap, so a careful reader can reconcile them. This is a minor inconsistency — either computing passRate from cleanPassed or updating the label to "pass rate (incl. flaky)" would be cleaner — but it's not blocking.
Strengths
The implementation continues to be solid:
badgeLabel()is clean and now applied uniformly in all three rendering contexts- The split ring chart gives flaky tests a distinct visual weight without requiring a tooltip or key to interpret
AccumulateStatusswitch uses the correct C#whenguard ordering — more specificcase "passed" when retryAttempt > 0before plaincase "passed"is correctcleanPassed = summary.Passed - summary.Flakyis mathematically guaranteed safe (every flaky test also incrementsPassed, soFlaky ≤ Passedalways holds)- Server-side and client-side flaky definitions remain identical (
status == "passed" && retryAttempt > 0), so counts are always in sync
- Add Flaky stat card so stat row sums to Total (hidden when 0 flaky) - Remove redundant flaky-indicator div (stat card replaces it) - Fix pass rate % to exclude flaky tests (consistent with green arc)
There was a problem hiding this comment.
Code Review: feat(html-report): flaky test detection and summary section (Round 4)
Previous Review — All Addressed ✓
All issues from the third review have been resolved:
- Stat cards no longer sum to Total: Fixed by adding a conditional Flaky stat card when
summary.Flaky > 0, ensuring cleanPassed + Flaky + Failed + Skipped + Cancelled = Total ✓ passRateincludes flaky but ring doesn't: Fixed —passRatenow usescleanPassed / Total, so the ring centre label matches the green arc's proportion ✓
Implementation is Clean and Complete
Looking at the final diff, the key invariants all hold:
Server-side consistency:
cleanPassed = summary.Passed - summary.Flakyis always safe (Flaky ≤ Passedby construction inAccumulateStatus)- Flaky stat card is shown only when there are flaky tests — sensible UX choice (avoids a "0 Flaky" card polluting zero-flaky runs)
- C# switch ordering is correct:
case "passed" when retryAttempt > 0before plaincase "passed"
Client-side consistency:
isFlaky(t)and server-side flaky definition are identical — counts always agreebadgeLabel()applied in all three render contexts:render(),renderFailedSection(),renderFailureClusters()flakyPillstartshiddenin server-rendered HTML;renderFlakySection()toggles it client-side — correct separation of concerns
One Minor Observation (Non-blocking)
The sort in renderFlakySection uses only retryAttempt as the key:
flaky.sort(function(a,b){ return b.t.retryAttempt - a.t.retryAttempt; });Tests sharing the same retry count will appear in whatever order groups happens to be in — non-deterministic across runs. A secondary sort by displayName would make the list stable and easier to scan when many tests share the same retry count, but this is purely cosmetic and not blocking.
Overall Assessment
The PR is in excellent shape. Each round of feedback was addressed cleanly and incrementally. The feature is well-scoped, XSS-safe throughout (esc() on all user-facing strings), follows all existing CSS/JS conventions, and leaves the data model in a consistent state for any downstream consumers of the JSON. Ready to merge.
- Add CleanPassed and TotalFailed to ReportSummary, replacing 5 inline computations across the generator - Revert pass rate to include flaky tests (they did pass)
There was a problem hiding this comment.
Follow-up Code Review: feat(html-report): flaky test detection and summary section
All three issues raised in the previous review have been addressed. Here's the status:
Previous issues — all resolved
Bug: Passed pill count mismatch — Fixed. The server-side C# now uses summary.CleanPassed (i.e. Passed - Flaky) when rendering the Passed pill count, so the displayed count matches what the JS filter shows. The CleanPassed computed property in ReportSummary is the right place for this.
Missing Flaky property on ReportSummary — Fixed. Flaky is now a proper [JsonPropertyName("flaky")] serialized property populated by AccumulateStatus. This means downstream consumers of the raw JSON data will see the flaky count alongside all other summary fields.
Inline style vs. class-based visibility toggle — Fixed. The pill now uses pill.classList.toggle('hidden', !flaky.length) consistently with the rest of the file.
New observations in this revision
The flaky pill starts hidden server-side with class="pill hidden", but renderFlakySection() re-computes and shows it client-side. This is a reasonable approach but note a subtle edge case: the flaky pill is always rendered as hidden in the HTML regardless of whether summary.Flaky > 0. This means:
- If a user loads the report with JS disabled, flaky tests are never surfaced via the pill (though the stat card and section heading still appear).
- More practically: the server already knows
summary.Flakyat generation time, so the pill visibility could be set server-side by conditionally omitting thehiddenclass whensummary.Flaky > 0. This would remove the need forrenderFlakySection()to manipulate the pill DOM at all, giving cleaner separation of concerns. That said,renderFlakySection()needs to run regardless to populate the section body, so the current approach is perfectly functional.
This is a minor architectural point, not a blocker. The current implementation is consistent and correct.
AccumulateStatus refactor is clean. Passing ReportTestResult instead of just status is the right design — the switch with a when guard for the flaky case is readable and correct.
CleanPassed / TotalFailed as [JsonIgnore] computed properties is the right choice. These are derived values and don't need to be in the JSON payload.
Summary
The implementation is solid. All previous feedback was acted on promptly and correctly. The feature is consistent with the report's existing design patterns — isFlaky() and badgeLabel() helpers are well-placed, the CSS variables follow the established naming convention, and the ring chart segmentation is logical. Good work.
Summary
status === 'passed'andretryAttempt > 0Closes #5487