feat: Redpanda Connect connector docs automation with multi-release attribution #183
JakeSCahill merged 7 commits into main from
Conversation
- Fix semver validation for version extraction (`--connect-version`, `--from-version` flags, and filename parsing)
- Add null-safety to the multi-version PR summary formatter to prevent crashes on malformed data
- Remove duplicate cleanup logic that contradicted `versionsToKeep`
- Add 14 tests for the `generateMultiVersionPRSummary()` function
- Export `generateMultiVersionPRSummary` for testing

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Implements sequential processing of intermediate releases to accurately attribute new connectors to their actual release version instead of lumping all changes into the latest version.

Key improvements:
- Process each intermediate release between the last documented and latest version
- Match the cloud binary version to each release date for accurate platform attribution
- Fix CGO-only component false positives via augmentation stripping
- Add master diff aggregation across multiple releases
- New CLI flags: `--skip-intermediate`, `--from-version`

Bug fixes:
- Strip augmentation fields (`cloudSupported`, `requiresCgo`, `cloudOnly`) before version comparisons
- Prevent cloud-only/CGO components from appearing as "new" in the wrong versions
- Fix `buildConfigYaml` to only add the `label` field for inputs/outputs/processors

New files:
- `tools/redpanda-connect/github-release-utils.js` - GitHub release discovery and cloud version matching
- `tools/redpanda-connect/multi-version-summary.js` - Master diff aggregation
- `tools/redpanda-connect/AUTOMATION.md` - Comprehensive automation documentation
- `CLAUDE.md` - AI-optimized repository overview
- Tests for new functionality

Accuracy improved from ~70% to ~95% for connector attribution.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
The multi-version PR summary now includes comprehensive information for technical writers:
- Release notes links for each version
- New connector descriptions (2-sentence summaries)
- New fields table with component, field, and description columns
- Removed connectors and fields with version attribution
- Deprecated fields with migration guidance
- Changed defaults table showing old → new values
- Prioritized action items (cloud connectors first)
- Platform grouping (cloud vs self-hosted sections)

This makes the PR summary actionable for writers without needing to dig through JSON files for details.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
✅ Deploy Preview for docs-extensions-and-macros ready!
Important: Review skipped. Auto incremental reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configuration.
📝 Walkthrough

This PR introduces multi-version connector documentation automation for Redpanda Connect. It adds a GitHub release discovery utility to find intermediate releases between versions, implements multi-release diff aggregation, extends PR summary formatting to handle multiple releases with per-release breakdowns and aggregated statistics, and updates the connector docs handler to orchestrate intermediate release processing. Supporting changes include new CLI options for version control (

Sequence Diagram

```mermaid
sequenceDiagram
    participant CLI as CLI Handler
    participant Discovery as Release Discovery
    participant DataLoader as Data Loader
    participant Analyzer as Binary Analyzer
    participant DiffGen as Diff Generator
    participant Aggregator as Master Diff Aggregator
    participant Formatter as PR Summary Formatter
    CLI->>Discovery: discoverIntermediateReleases(from, to)
    Discovery->>Discovery: Fetch releases from GitHub
    Discovery-->>CLI: [Release v1, v2, v3...]
    loop For each consecutive version pair
        CLI->>DataLoader: loadConnectorDataForVersion(version)
        DataLoader-->>CLI: Connector data (stripped)
        CLI->>Analyzer: analyzeAllBinaries(newVersion)
        Analyzer-->>CLI: Binary analysis (Cloud/OSS/CGO)
        CLI->>DiffGen: Generate & write diff JSON
        DiffGen-->>CLI: connect-diff-<from>_to_<to>.json
    end
    CLI->>Aggregator: createMasterDiff(intermediateResults, finalDiff)
    Aggregator->>Aggregator: Read & parse all diff JSON files
    Aggregator->>Aggregator: Aggregate counts across releases
    Aggregator-->>CLI: masterDiff (aggregated metadata)
    CLI->>Formatter: generateMultiVersionPRSummary(masterDiff)
    Formatter->>Formatter: Format per-release breakdown
    Formatter->>Formatter: Compute aggregated totals
    Formatter-->>CLI: PR summary with multi-version output
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks: ✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches: 🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 9
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tools/redpanda-connect/pr-summary-formatter.js (1)
Lines 397-408: ⚠️ Potential issue | 🟠 Major

Guard `binaryAnalysis.comparison` everywhere and count `cloudOnly` connectors as cloud-supported.

`tools/redpanda-connect/connector-binary-analyzer.js` lines 431-525 can return `{ comparison: null }` when cloud analysis is skipped, so these dereferences can throw and abort summary generation. The quick summary and writer-action blocks also only look at `inCloud`/`notInCloud`, which drops `cloudOnly` connectors from the cloud-doc count/checklist.

💡 Suggested pattern

```diff
+  const comparison = binaryAnalysis?.comparison;
+  const inCloud = comparison?.inCloud ?? [];
+  const cloudOnly = comparison?.cloudOnly ?? [];
+  const notInCloud = comparison?.notInCloud ?? [];
+
   if (stats.newComponents > 0) {
     lines.push(`- **${stats.newComponents}** new connector${stats.newComponents !== 1 ? 's' : ''}`);
-    if (binaryAnalysis) {
+    if (comparison) {
       const newConnectorKeys = diffData.details.newComponents.map(c => `${c.type}:${c.name}`);
       const cloudSupported = newConnectorKeys.filter(key => {
-        const inCloud = binaryAnalysis.comparison.inCloud.some(c => `${c.type}:${c.name}` === key);
-        return inCloud;
+        return inCloud.some(c => `${c.type}:${c.name}` === key) ||
+               cloudOnly.some(c => `${c.type}:${c.name}` === key);
       }).length;
```

Also applies to: 551-559, 648-669, 739-743, 951-955
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/redpanda-connect/pr-summary-formatter.js` around lines 397 - 408, Guard all uses of binaryAnalysis.comparison before dereferencing it and include cloudOnly connectors when counting cloud-supported connectors: check binaryAnalysis && binaryAnalysis.comparison before accessing comparison.inCloud or comparison.notInCloud, and when computing newConnectorKeys' cloud support use comparison.inCloud.concat(comparison.cloudOnly) (or check both arrays) to count cloudSupported; ensure needsCloudDocs is treated as a numeric count (not a boolean) and then use if (needsCloudDocs > 0) to push the summary line. Apply the same guards and cloudOnly-inclusive counting to the other summary/writer-action blocks that reference comparison.inCloud/notInCloud.
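As an illustration of the guard described above, here is a minimal runnable sketch. The helper name and the sample connector entries are hypothetical, not taken from the actual module:

```javascript
// Hypothetical helper mirroring the suggested guard: optional chaining plus
// array defaults turn a skipped cloud analysis ({ comparison: null }) into
// empty lists instead of a TypeError, and cloudOnly entries count as
// cloud-supported alongside inCloud.
function countCloudSupported(newConnectorKeys, binaryAnalysis) {
  const comparison = binaryAnalysis?.comparison;
  const inCloud = comparison?.inCloud ?? [];
  const cloudOnly = comparison?.cloudOnly ?? [];
  const matches = key =>
    inCloud.some(c => `${c.type}:${c.name}` === key) ||
    cloudOnly.some(c => `${c.type}:${c.name}` === key);
  return newConnectorKeys.filter(matches).length;
}
```

With `{ comparison: null }` this returns 0 instead of throwing, and a `cloudOnly` match is now counted toward the cloud-doc total.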
🧹 Nitpick comments (3)
__tests__/tools/buildConfigYaml.test.js (1)
Lines 149-224: Solid coverage for complex field rendering. Tests appropriately verify:
- Nested object fields render with children
- Array-of-objects render as empty arrays (not expanded)
- Simple arrays render correctly
Consider adding an edge case test for an empty `children` array to ensure no rendering issues occur.

🧪 Optional: Add edge case test for empty children

```diff
+  it('should handle empty children array', () => {
+    const result = buildConfigYaml('inputs', 'kafka', [], false);
+    expect(result).toContain('inputs:');
+    expect(result).toContain('  label: ""');
+    expect(result).toContain('  kafka:');
+    // Should only have header lines, no field lines
+    const lines = result.split('\n');
+    expect(lines.length).toBe(3);
+  });
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@__tests__/tools/buildConfigYaml.test.js` around lines 149 - 224, Add a unit test for the empty-children edge case by calling buildConfigYaml with a field whose children is an empty array (e.g., { name: 'foo', type: 'object', kind: 'map', children: [] } and also test kind: 'array' if desired) and assert it does not render any child keys (expect(result).not.toContain('someChild:')) and that the parent renders appropriately (e.g., contains 'foo:' for map and 'foo: []' for array-of-objects). Place the test inside the existing "complex field types" describe block next to the other cases and reference buildConfigYaml to locate the behavior to validate.

tools/redpanda-connect/AUTOMATION.md (1)
Line 41: Add languages to the unlabeled fenced blocks. These fences trigger markdownlint MD040. Tag the ASCII diagram/tree blocks as `text` to keep the new doc lint-clean.

Also applies to: 85, 92, 628
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tools/redpanda-connect/AUTOMATION.md` at line 41, Update each unlabeled fenced code block that contains ASCII diagrams or tree-style ASCII art (the blocks that currently start with just ``` and contain ASCII diagrams/tree) by adding the language tag text after the opening fence (i.e., change ``` to ```text) so markdownlint MD040 is satisfied; search for the plain ``` fences surrounding ASCII diagram content (the diagram/tree blocks referenced in the review) and update them to use ```text consistently.

__tests__/tools/pr-summary-formatter.test.js (1)
Lines 507-519: Scope the cloud-indicator assertion to the connector entry.

`expect(summary).not.toContain('☁️')` is broader than the behavior this test cares about, so an unrelated cloud legend/header will break it even when `test_connector` is still unmarked. Assert on the `test_connector` line instead.

♻️ Suggested assertion

```diff
   // Should not crash and should not show cloud indicator
   expect(summary).toContain('`test_connector`');
-  expect(summary).not.toContain('☁️');
+  const connectorLine = summary.match(/`test_connector`[^\n]*/);
+  expect(connectorLine).toBeTruthy();
+  expect(connectorLine[0]).not.toContain('☁️');
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@__tests__/tools/pr-summary-formatter.test.js` around lines 507 - 519, The test currently checks globally that the summary does not contain the cloud emoji which is too broad; update the assertion to verify the connector-specific line for `test_connector` produced by generateMultiVersionPRSummary(masterDiff) does not include '☁️' instead of using expect(summary).not.toContain('☁️'); locate the test using createMasterDiff and createRelease and change the negative cloud assertion to target the specific connector entry (e.g., find the line that contains '`test_connector`' in the summary and assert that line does not include the cloud indicator).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@CLAUDE.md`:
- Around line 161-171: The example for the `fetch` command is out of sync with
the implementation in bin/doc-tools.js: the CLI exposes --owner, --repo,
--remote-path, --save-dir, and --filename (not --path, --tag, or --output) and
it does not perform version-specific fetches; either update the CLAUDE.md
example to use the actual flags (--owner, --repo, --remote-path, --save-dir,
--filename) and remove the claim about version-specific downloads, or implement
the missing flags/behavior in the fetch command in bin/doc-tools.js (add support
for --path/--tag/--output and versioned fetch logic) so the docs match the CLI.
- Around line 139-142: The example command is missing the required -s/--surface
flag enforced by the CLI (see bin/doc-tools.js where -s, --surface is
mandatory); update the example to include the surface flag and a concrete value
(e.g., npx doc-tools generate bundle-openapi --surface openapi --tag v25.3.1) so
the command runs successfully.
In `@tools/redpanda-connect/github-release-utils.js`:
- Around line 144-154: Ensure fromVersion and toVersion are validated as strings
before using startsWith: check typeof fromVersion === 'string' and typeof
toVersion === 'string' and if either is not a string throw the existing Invalid
starting/ending version Error for fromVersion/toVersion; only after those guards
compute normalizedFrom and normalizedTo (using startsWith('v') ? slice(1) :
value) and then run semver.valid on the normalized values. Reference variables:
fromVersion, toVersion, normalizedFrom, normalizedTo, and semver.valid.
In `@tools/redpanda-connect/pr-summary-formatter.js`:
- Around line 264-289: The current logic only emits otherConnectors when there
are no cloudConnectors or selfHostedConnectors, dropping unknown-platform
connectors; update the block that handles otherConnectors (the otherConnectors
variable and its if check) to always run when otherConnectors.length > 0 (remove
the && cloudConnectors.length === 0 && selfHostedConnectors.length === 0
condition) and add an appropriate header (e.g., lines.push('_Other
connectors:_')) before iterating so those connectors are included in the
checklist regardless of cloud/self-hosted presence.
In `@tools/redpanda-connect/rpcn-connector-docs-handler.js`:
- Around line 869-879: The catch currently logs errors and records failed pairs
in intermediateProcessingResults but the later logic still updates the Antora
`latest-connect-version` and exits 0; change the flow so any recorded failure
prevents advancing the latest version: set a failure flag (e.g.,
hadProcessingError) or rely on inspecting intermediateProcessingResults entries
(check for any item with success: false) after processing all pairs, and if any
failures exist, skip the Antora/latest-version update and exit with a non-zero
code (or throw) so CI/automation will not treat the run as fully successful;
update the code paths that write/update `latest-connect-version` to first
confirm all intermediateProcessingResults are success === true (or that
hadProcessingError is false) before proceeding.
- Around line 943-945: The oldIndex is only stripped in the Antora fallback
branch, causing shape mismatches when oldIndex is loaded via the --old-data path
or from existingDataFiles; update every code path that assigns oldIndex
(including the --old-data/oldPath load and the existingDataFiles branch) to wrap
the parsed object with stripAugmentationFields(...) (i.e., replace direct
assignments like oldIndex = JSON.parse(fs.readFileSync(oldPath,'utf8')) or
assignments from existingDataFiles with oldIndex =
stripAugmentationFields(JSON.parse(...)) ) so that generateConnectorDiffJson()
receives a consistently stripped oldIndex.
- Around line 950-958: The code currently swaps the snapshot files (creating
._connect-${newVersion}-augmented.json.tmp and renaming the original) before
calling analyzeAllBinaries(), but if analyzeAllBinaries() throws the original
connect-${newVersion}.json isn't restored; wrap the analyzeAllBinaries() call
and the rename/copy operations in a try/finally (or add a finally block) so that
regardless of errors you rename/move the temp augmented file back to its
expected filename and remove the .tmp, restoring the original snapshot;
specifically update the logic around where cleanOssDataPath, newIndex, and
analyzeAllBinaries() are used to perform the restore in finally to guarantee
cleanup and restore of connect-${newVersion}.json.
- Around line 495-507: stripAugmentationFields currently only filters out
cloudOnly connectors but misses those marked requiresCgo, so update the filter
inside stripAugmentationFields to also exclude connectors with requiresCgo
unless they have OSS config (config or fields). Concretely, change the predicate
in the cleanData[type] = cleanData[type].filter(...) for connectors to return
true only when the connector is not cloudOnly and not requiresCgo, or when it
has c.config or c.fields (e.g., return (!(c.cloudOnly || c.requiresCgo) ||
c.config || c.fields)); this ensures cgo-only injected connectors are removed
from the cleaned data.
- Around line 603-607: The current filename regex
/^connect-\d+\.\d+\.\d+\.json$/ must be relaxed to allow prerelease segments and
you must replace lexicographic .sort() calls on version strings with
semver-aware sorting; update the regex to capture prerelease (for example
/^connect-(\d+\.\d+\.\d+(?:-[0-9A-Za-z-.]+)?)\.json$/) and extract the captured
version, validate with semver.valid, then use semver.rsort(candidates) or
semver.sort(candidates) and pick the first element to get the highest semantic
version (adjust places that currently call candidates.sort() and pick last to
instead call semver.rsort/semver.sort and pick index 0); apply this change
wherever candidates are filtered and sorted (the blocks referencing the regex
and candidates.sort()).
---
Outside diff comments:
In `@tools/redpanda-connect/pr-summary-formatter.js`:
- Around line 397-408: Guard all uses of binaryAnalysis.comparison before
dereferencing it and include cloudOnly connectors when counting cloud-supported
connectors: check binaryAnalysis && binaryAnalysis.comparison before accessing
comparison.inCloud or comparison.notInCloud, and when computing
newConnectorKeys' cloud support use
comparison.inCloud.concat(comparison.cloudOnly) (or check both arrays) to count
cloudSupported; ensure needsCloudDocs is treated as a numeric count (not a
boolean) and then use if (needsCloudDocs > 0) to push the summary line. Apply
the same guards and cloudOnly-inclusive counting to the other
summary/writer-action blocks that reference comparison.inCloud/notInCloud.
---
Nitpick comments:
In `@__tests__/tools/buildConfigYaml.test.js`:
- Around line 149-224: Add a unit test for the empty-children edge case by
calling buildConfigYaml with a field whose children is an empty array (e.g., {
name: 'foo', type: 'object', kind: 'map', children: [] } and also test kind:
'array' if desired) and assert it does not render any child keys
(expect(result).not.toContain('someChild:')) and that the parent renders
appropriately (e.g., contains 'foo:' for map and 'foo: []' for
array-of-objects). Place the test inside the existing "complex field types"
describe block next to the other cases and reference buildConfigYaml to locate
the behavior to validate.
In `@__tests__/tools/pr-summary-formatter.test.js`:
- Around line 507-519: The test currently checks globally that the summary does
not contain the cloud emoji which is too broad; update the assertion to verify
the connector-specific line for `test_connector` produced by
generateMultiVersionPRSummary(masterDiff) does not include '☁️' instead of using
expect(summary).not.toContain('☁️'); locate the test using createMasterDiff and
createRelease and change the negative cloud assertion to target the specific
connector entry (e.g., find the line that contains '`test_connector`' in the
summary and assert that line does not include the cloud indicator).
In `@tools/redpanda-connect/AUTOMATION.md`:
- Line 41: Update each unlabeled fenced code block that contains ASCII diagrams
or tree-style ASCII art (the blocks that currently start with just ``` and
contain ASCII diagrams/tree) by adding the language tag text after the opening
fence (i.e., change ``` to ```text) so markdownlint MD040 is satisfied; search
for the plain ``` fences surrounding ASCII diagram content (the diagram/tree
blocks referenced in the review) and update them to use ```text consistently.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: daa2ab84-f179-4d5f-ac7d-70dfaa2c4f4d
📒 Files selected for processing (12)
- CLAUDE.md
- CLI_REFERENCE.adoc
- __tests__/tools/buildConfigYaml.test.js
- __tests__/tools/github-release-utils.test.js
- __tests__/tools/pr-summary-formatter.test.js
- bin/doc-tools.js
- tools/redpanda-connect/AUTOMATION.md
- tools/redpanda-connect/github-release-utils.js
- tools/redpanda-connect/helpers/buildConfigYaml.js
- tools/redpanda-connect/multi-version-summary.js
- tools/redpanda-connect/pr-summary-formatter.js
- tools/redpanda-connect/rpcn-connector-docs-handler.js
```js
// Normalize versions (remove 'v' prefix if present)
const normalizedFrom = fromVersion.startsWith('v') ? fromVersion.slice(1) : fromVersion;
const normalizedTo = toVersion.startsWith('v') ? toVersion.slice(1) : toVersion;

// Validate versions
if (!semver.valid(normalizedFrom)) {
  throw new Error(`Invalid starting version: ${fromVersion}`);
}
if (!semver.valid(normalizedTo)) {
  throw new Error(`Invalid ending version: ${toVersion}`);
}
```
Validate the inputs before calling startsWith().
If `fromVersion` or `toVersion` is `undefined` or a non-string, this throws a TypeError before the semver checks run, so callers get a crash instead of the intended `Invalid ... version` error.
💡 Suggested guard
```diff
- const normalizedFrom = fromVersion.startsWith('v') ? fromVersion.slice(1) : fromVersion;
- const normalizedTo = toVersion.startsWith('v') ? toVersion.slice(1) : toVersion;
+ const normalizeVersion = (value, label) => {
+   if (typeof value !== 'string') {
+     throw new Error(`Invalid ${label} version: ${value}`);
+   }
+   return value.startsWith('v') ? value.slice(1) : value;
+ };
+
+ const normalizedFrom = normalizeVersion(fromVersion, 'starting');
+ const normalizedTo = normalizeVersion(toVersion, 'ending');
```

🤖 Prompt for AI Agents
+ const normalizedTo = normalizeVersion(toVersion, 'ending');🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/redpanda-connect/github-release-utils.js` around lines 144 - 154,
Ensure fromVersion and toVersion are validated as strings before using
startsWith: check typeof fromVersion === 'string' and typeof toVersion ===
'string' and if either is not a string throw the existing Invalid
starting/ending version Error for fromVersion/toVersion; only after those guards
compute normalizedFrom and normalizedTo (using startsWith('v') ? slice(1) :
value) and then run semver.valid on the normalized values. Reference variables:
fromVersion, toVersion, normalizedFrom, normalizedTo, and semver.valid.
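The suggested guard can be exercised standalone. This sketch keeps just the type check and `v`-prefix stripping; in the real utility, `semver.valid` would then run on the returned value:

```javascript
// Sketch of the suggested normalizeVersion helper. Non-string input gets the
// intended "Invalid ... version" error instead of a raw TypeError from
// startsWith(); a leading 'v' is stripped before semver validation.
function normalizeVersion(value, label) {
  if (typeof value !== 'string') {
    throw new Error(`Invalid ${label} version: ${value}`);
  }
  return value.startsWith('v') ? value.slice(1) : value;
}
```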
```js
const cloudConnectors = allNewConnectors.filter(c => c.isCloud);
const selfHostedConnectors = allNewConnectors.filter(c => c.isSelfHostedOnly);
const otherConnectors = allNewConnectors.filter(c => !c.isCloud && !c.isSelfHostedOnly);

if (cloudConnectors.length > 0) {
  lines.push('_Cloud-supported (higher priority):_');
  cloudConnectors.forEach(conn => {
    lines.push(`- [ ] \`${conn.name}\` ${conn.type} ☁️ — introduced in **${conn.version}**`);
  });
  lines.push('');
}

if (selfHostedConnectors.length > 0) {
  lines.push('_Self-hosted only:_');
  selfHostedConnectors.forEach(conn => {
    lines.push(`- [ ] \`${conn.name}\` ${conn.type} 🖥️ — introduced in **${conn.version}**`);
  });
  lines.push('');
}

if (otherConnectors.length > 0 && cloudConnectors.length === 0 && selfHostedConnectors.length === 0) {
  otherConnectors.forEach(conn => {
    lines.push(`- [ ] \`${conn.name}\` ${conn.type} — introduced in **${conn.version}**`);
  });
  lines.push('');
}
```
Don't drop unknown-platform connectors from the multi-release checklist.
otherConnectors are only emitted when there are no cloud/self-hosted entries at all. If one release has binary-analysis data and another does not, the second release's new connectors disappear from Writer Action Items even though they still need docs.
💡 Suggested fix
```diff
-  if (otherConnectors.length > 0 && cloudConnectors.length === 0 && selfHostedConnectors.length === 0) {
+  if (otherConnectors.length > 0) {
+    lines.push('_Platform not determined:_');
     otherConnectors.forEach(conn => {
       lines.push(`- [ ] \`${conn.name}\` ${conn.type} — introduced in **${conn.version}**`);
     });
     lines.push('');
   }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/redpanda-connect/pr-summary-formatter.js` around lines 264 - 289, The
current logic only emits otherConnectors when there are no cloudConnectors or
selfHostedConnectors, dropping unknown-platform connectors; update the block
that handles otherConnectors (the otherConnectors variable and its if check) to
always run when otherConnectors.length > 0 (remove the && cloudConnectors.length
=== 0 && selfHostedConnectors.length === 0 condition) and add an appropriate
header (e.g., lines.push('_Other connectors:_')) before iterating so those
connectors are included in the checklist regardless of cloud/self-hosted
presence.
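To see the dropped-bucket behavior concretely, here is a self-contained sketch of the three-way split with the unconditional "other" branch applied. The connector entries are made up, version fields are omitted for brevity, and the `_Platform not determined:_` header is the reviewer's suggestion rather than existing output:

```javascript
// Minimal sketch of the checklist partition with the fix applied: the
// "other" bucket is emitted whenever it is non-empty, so connectors whose
// platform could not be determined still reach the writer checklist.
function buildChecklist(allNewConnectors) {
  const lines = [];
  const cloud = allNewConnectors.filter(c => c.isCloud);
  const selfHosted = allNewConnectors.filter(c => c.isSelfHostedOnly);
  const other = allNewConnectors.filter(c => !c.isCloud && !c.isSelfHostedOnly);

  if (cloud.length > 0) {
    lines.push('_Cloud-supported (higher priority):_');
    cloud.forEach(c => lines.push(`- [ ] \`${c.name}\` ${c.type} ☁️`));
  }
  if (selfHosted.length > 0) {
    lines.push('_Self-hosted only:_');
    selfHosted.forEach(c => lines.push(`- [ ] \`${c.name}\` ${c.type} 🖥️`));
  }
  if (other.length > 0) { // unconditional, per the suggested fix
    lines.push('_Platform not determined:_');
    other.forEach(c => lines.push(`- [ ] \`${c.name}\` ${c.type}`));
  }
  return lines;
}
```

A release with one cloud connector and one unknown-platform connector now lists both, where the original condition would have silently dropped the unknown one.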
```js
function stripAugmentationFields(data) {
  const cleanData = JSON.parse(JSON.stringify(data));
  const connectorTypes = ['inputs', 'outputs', 'processors', 'caches', 'rate_limits',
    'buffers', 'metrics', 'scanners', 'tracers', 'config', 'bloblang-methods'];

  for (const type of connectorTypes) {
    if (Array.isArray(cleanData[type])) {
      // Remove connectors that were added by augmentation (cloudOnly or requiresCgo without OSS data)
      cleanData[type] = cleanData[type].filter(c => {
        // Keep if it's not marked as cloudOnly
        // OR if it has a config/fields (meaning it came from OSS, not just binary analysis)
        return !c.cloudOnly || c.config || c.fields;
      });
```
stripAugmentationFields() still leaves injected cgo-only connectors behind.
Line 503 only filters `cloudOnly`, but line 1142 later injects cgo-only connectors with `requiresCgo: true`. Those survive the "clean" load path and get diffed as real OSS components.
🧹 Suggested fix
```diff
-      cleanData[type] = cleanData[type].filter(c => {
-        // Keep if it's not marked as cloudOnly
-        // OR if it has a config/fields (meaning it came from OSS, not just binary analysis)
-        return !c.cloudOnly || c.config || c.fields;
-      });
+      cleanData[type] = cleanData[type].filter(c => {
+        return !c.cloudOnly && !c.requiresCgo;
+      });
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/redpanda-connect/rpcn-connector-docs-handler.js` around lines 495 -
507, stripAugmentationFields currently only filters out cloudOnly connectors but
misses those marked requiresCgo, so update the filter inside
stripAugmentationFields to also exclude connectors with requiresCgo unless they
have OSS config (config or fields). Concretely, change the predicate in the
cleanData[type] = cleanData[type].filter(...) for connectors to return true only
when the connector is not cloudOnly and not requiresCgo, or when it has c.config
or c.fields (e.g., return (!(c.cloudOnly || c.requiresCgo) || c.config ||
c.fields)); this ensures cgo-only injected connectors are removed from the
cleaned data.
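A small runnable sketch of the widened predicate the prompt above describes. The sample entries are hypothetical, and this variant keeps augmented connectors that also carry OSS `config`/`fields` data, as the prompt suggests:

```javascript
// Drop connectors injected purely by binary analysis (cloudOnly or
// requiresCgo) unless they also carry OSS data (config or fields),
// meaning they came from the OSS snapshot rather than augmentation.
function stripInjectedConnectors(connectors) {
  return connectors.filter(c =>
    !(c.cloudOnly || c.requiresCgo) || c.config || c.fields
  );
}
```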
```js
if (!semver.valid(options.connectVersion)) {
  console.error(`Error: Invalid --connect-version format: ${options.connectVersion}`)
  console.error('Expected format: X.Y.Z (e.g., 4.50.0)')
  process.exit(1)
}
```
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
set -euo pipefail
rg -n -C2 'semver\.valid|connect-\\d\+\\.\\d\+\\.\\d+|\.sort\(\)' tools/redpanda-connect/rpcn-connector-docs-handler.js
python - <<'PY'
import re
files = [
    "connect-v4.50.0.json",
    "connect-4.82.0-rc2.json",
    "connect-4.9.0.json",
    "connect-4.10.0.json",
]
pat = re.compile(r"^connect-\d+\.\d+\.\d+\.json$")
print("regex matches:", {f: bool(pat.match(f)) for f in files})
print("lexicographic order:", sorted(files))
PY
```

Repository: redpanda-data/docs-extensions-and-macros
Length of output: 2844
🏁 Script executed:
```bash
cat > /tmp/test_semver.js << 'EOF'
const semver = require('semver');
const testVersions = [
  'v4.50.0',
  '4.82.0-rc2',
  '4.50.0',
  '4.9.0',
  '4.10.0',
];
console.log('semver.valid() results:');
testVersions.forEach(v => {
  console.log(`  ${v}: ${semver.valid(v)}`);
});
// Also test coerce
console.log('\nsemver.coerce() results:');
testVersions.forEach(v => {
  const coerced = semver.coerce(v);
  console.log(`  ${v}: ${coerced}`);
});
EOF
cd tools/redpanda-connect && node /tmp/test_semver.js
```

Repository: redpanda-data/docs-extensions-and-macros
Length of output: 949
🏁 Script executed:
```bash
cat tools/redpanda-connect/package.json | grep -A5 -B5 semver
```

Repository: redpanda-data/docs-extensions-and-macros
Length of output: 151
🏁 Script executed:
```bash
# Check all the locations mentioned in the review comment
rg -n 'semver\.valid|semver\.coerce' tools/redpanda-connect/rpcn-connector-docs-handler.js
```

Repository: redpanda-data/docs-extensions-and-macros
Length of output: 262
🏁 Script executed:
```bash
# Check what valid semver formats are and understand the issue better
cat tools/redpanda-connect/rpcn-connector-docs-handler.js | sed -n '600,620p' && \
echo "---" && \
cat tools/redpanda-connect/rpcn-connector-docs-handler.js | sed -n '700,720p' && \
echo "---" && \
cat tools/redpanda-connect/rpcn-connector-docs-handler.js | sed -n '730,755p'
```

Repository: redpanda-data/docs-extensions-and-macros
Length of output: 2856
🏁 Script executed:
```bash
# Check the "Also applies to" sections
cat tools/redpanda-connect/rpcn-connector-docs-handler.js | sed -n '710,720p'
```

Repository: redpanda-data/docs-extensions-and-macros
Length of output: 578
🏁 Script executed:
```bash
cat tools/redpanda-connect/rpcn-connector-docs-handler.js | sed -n '730,740p'
```

Repository: redpanda-data/docs-extensions-and-macros
Length of output: 486
🏁 Script executed:
```bash
cat tools/redpanda-connect/rpcn-connector-docs-handler.js | sed -n '740,755p'
```

Repository: redpanda-data/docs-extensions-and-macros
Length of output: 651
🏁 Script executed:
```bash
# Check if there's any version coercion or normalization
rg -n 'semver\.coerce|\.replace.*v|normalize' tools/redpanda-connect/rpcn-connector-docs-handler.js | head -20
```

Repository: redpanda-data/docs-extensions-and-macros
Length of output: 329
🏁 Script executed:
```bash
# Search for examples of how versions are used
rg -n 'connect-.*\.json|version.*=' tools/redpanda-connect/rpcn-connector-docs-handler.js | head -30
```

Repository: redpanda-data/docs-extensions-and-macros
Length of output: 2512
🏁 Script executed:
```bash
# Check comments or documentation about version format
rg -n 'Expected format|X\.Y\.Z|version.*format' tools/redpanda-connect/rpcn-connector-docs-handler.js
```

Repository: redpanda-data/docs-extensions-and-macros
Length of output: 503
🏁 Script executed:
```bash
# Look for where newVersion is used after being set
rg -n 'newVersion|dataFile' tools/redpanda-connect/rpcn-connector-docs-handler.js | grep -A2 -B2 'newVersion ='
```

Repository: redpanda-data/docs-extensions-and-macros
Length of output: 706
Use semver-aware sorting for version selection, not lexicographic string ordering.
The regex `/^connect-\d+\.\d+\.\d+\.json$/` only matches X.Y.Z format filenames. This causes two problems:

1. Prerelease versions fail file matching: `semver.valid('4.82.0-rc2')` returns `true`, so the validation passes, but the filename won't match the regex pattern, causing the file lookup to fail or select the wrong baseline.
2. Lexicographic sorting breaks semantic versions: `candidates.sort()` orders strings lexicographically, so `['4.9.0', '4.10.0'].sort()` becomes `['4.10.0', '4.9.0']`. Since the code selects the last element, it picks `4.9.0` instead of `4.10.0`.

Replace `.sort()` with semver-aware sorting using `semver.sort()` or `semver.rsort()` when selecting versions.
Affects lines 710–712, 743–751, 928–934, 983–986, 1184–1188.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/redpanda-connect/rpcn-connector-docs-handler.js` around lines 603 -
607, The current filename regex /^connect-\d+\.\d+\.\d+\.json$/ must be relaxed
to allow prerelease segments and you must replace lexicographic .sort() calls on
version strings with semver-aware sorting; update the regex to capture
prerelease (for example /^connect-(\d+\.\d+\.\d+(?:-[0-9A-Za-z-.]+)?)\.json$/)
and extract the captured version, validate with semver.valid, then use
semver.rsort(candidates) or semver.sort(candidates) and pick the first element
to get the highest semantic version (adjust places that currently call
candidates.sort() and pick last to instead call semver.rsort/semver.sort and
pick index 0); apply this change wherever candidates are filtered and sorted
(the blocks referencing the regex and candidates.sort()).
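The two failure modes above can be reproduced without the `semver` package. The sketch below uses the prerelease-aware filename regex from the prompt and a hand-rolled comparator equivalent to semver ordering for plain `X.Y.Z` tags (two prereleases of the same version compare equal here, which the real `semver.rsort()` handles more precisely):

```javascript
// Prerelease-aware filename pattern suggested in the review
const FILE_RE = /^connect-(\d+\.\d+\.\d+(?:-[0-9A-Za-z-.]+)?)\.json$/

// Numeric major.minor.patch comparison; a prerelease sorts before its release
function compareSemver (a, b) {
  const [aMain, ...aPre] = a.split('-')
  const [bMain, ...bPre] = b.split('-')
  const av = aMain.split('.').map(Number)
  const bv = bMain.split('.').map(Number)
  for (let i = 0; i < 3; i++) {
    if (av[i] !== bv[i]) return av[i] - bv[i]
  }
  if (aPre.length && !bPre.length) return -1 // 4.82.0-rc2 < 4.82.0
  if (!aPre.length && bPre.length) return 1
  return 0
}

// Pick the highest semantic version among matching data files
function latestConnectVersion (filenames) {
  const versions = filenames
    .map(name => (name.match(FILE_RE) || [])[1])
    .filter(Boolean)
  return versions.length ? versions.sort(compareSemver).pop() : null
}
```

With this comparator, `['connect-4.9.0.json', 'connect-4.10.0.json']` resolves to `4.10.0`, whereas the plain `.sort()` the review flags would have selected `4.9.0`.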
} catch (err) {
  console.error(`❌ Error processing ${fromVer} → ${toVer}: ${err.message}`)
  console.error('  Continuing with next version...\n')

  intermediateProcessingResults.push({
    fromVersion: fromVer,
    toVersion: toVer,
    error: err.message,
    success: false
  })
}
Partial intermediate failures should not advance latest-connect-version.

These branches record failed release pairs and keep going, but the command still updates Antora and exits 0 later. A failed fromVersion → toVersion pair is then silently dropped from the next automatic run unless someone manually replays it with --from-version.
🚦 Suggested fix
+ const hasIntermediateFailures = intermediateProcessingResults.some(r => !r.success)
+
// Generate PR summary
try {
const { printPRSummary } = require('./pr-summary-formatter.js')
// Use master diff if available, otherwise use single diff
printPRSummary(masterDiff || diffJson, binaryAnalysis, draftFiles, masterDiff ? true : false)
@@
- const wrote = setAntoraValue('asciidoc.attributes.latest-connect-version', newVersion)
- if (wrote) {
- console.log(`Done: Updated Antora version: ${newVersion}`)
+ if (!hasIntermediateFailures) {
+ const wrote = setAntoraValue('asciidoc.attributes.latest-connect-version', newVersion)
+ if (wrote) {
+ console.log(`Done: Updated Antora version: ${newVersion}`)
+ }
+ } else {
+ console.warn('Warning: Skipping Antora version update because some intermediate releases failed')
+ process.exitCode = 1
}
@@
- process.exit(0)
+ process.exit(process.exitCode ?? 0)Also applies to: 1767-1777
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/redpanda-connect/rpcn-connector-docs-handler.js` around lines 869 -
879, The catch currently logs errors and records failed pairs in
intermediateProcessingResults but the later logic still updates the Antora
`latest-connect-version` and exits 0; change the flow so any recorded failure
prevents advancing the latest version: set a failure flag (e.g.,
hadProcessingError) or rely on inspecting intermediateProcessingResults entries
(check for any item with success: false) after processing all pairs, and if any
failures exist, skip the Antora/latest-version update and exit with a non-zero
code (or throw) so CI/automation will not treat the run as fully successful;
update the code paths that write/update `latest-connect-version` to first
confirm all intermediateProcessingResults are success === true (or that
hadProcessingError is false) before proceeding.
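The gating the prompt describes can be isolated into a small sketch. The function below is hypothetical (the real handler inlines this logic), but it shows the check: scan `intermediateProcessingResults` for any `success: false` entry and only call `setAntoraValue` when there are none, returning a non-zero exit code otherwise:

```javascript
// Sketch: only advance latest-connect-version when every intermediate
// release pair processed successfully. finalizeRun is a hypothetical name.
function finalizeRun (intermediateProcessingResults, newVersion, setAntoraValue) {
  const failures = intermediateProcessingResults.filter(r => !r.success)
  if (failures.length > 0) {
    for (const f of failures) {
      console.warn(`Warning: ${f.fromVersion} → ${f.toVersion} failed: ${f.error}`)
    }
    console.warn('Skipping Antora version update; replay failed pairs with --from-version')
    return 1 // non-zero so CI does not treat the run as fully successful
  }
  setAntoraValue('asciidoc.attributes.latest-connect-version', newVersion)
  return 0
}
```

A subsequent automatic run then still sees the old `latest-connect-version` and retries the missed pairs instead of skipping past them.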
  // Strip augmentation fields to ensure clean comparisons
  oldIndex = stripAugmentationFields(JSON.parse(fs.readFileSync(oldPath, 'utf8')))
}
Apply stripAugmentationFields() to every oldIndex load path.
This only cleans the Antora fallback branch. When --old-data is used or the previous snapshot comes from existingDataFiles, oldIndex stays augmented while newIndex at Line 951 is stripped, so generateConnectorDiffJson() compares mismatched shapes.
🧼 Suggested fix
if (options.oldData && fs.existsSync(options.oldData)) {
- oldIndex = JSON.parse(fs.readFileSync(options.oldData, 'utf8'))
+ oldIndex = stripAugmentationFields(JSON.parse(fs.readFileSync(options.oldData, 'utf8')))
const m = options.oldData.match(/connect-([\d.]+)\.json$/)
if (m) oldVersion = m[1]
} else {
@@
const oldFile = existingDataFiles[existingDataFiles.length - 1]
oldVersion = oldFile.match(/connect-(\d+\.\d+\.\d+)\.json/)[1]
const oldPath = path.join(dataDir, oldFile)
- oldIndex = JSON.parse(fs.readFileSync(oldPath, 'utf8'))
+ oldIndex = stripAugmentationFields(JSON.parse(fs.readFileSync(oldPath, 'utf8')))
  console.log(`📋 Using old version data: ${oldFile}`)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/redpanda-connect/rpcn-connector-docs-handler.js` around lines 943 -
945, The oldIndex is only stripped in the Antora fallback branch, causing shape
mismatches when oldIndex is loaded via the --old-data path or from
existingDataFiles; update every code path that assigns oldIndex (including the
--old-data/oldPath load and the existingDataFiles branch) to wrap the parsed
object with stripAugmentationFields(...) (i.e., replace direct assignments like
oldIndex = JSON.parse(fs.readFileSync(oldPath,'utf8')) or assignments from
existingDataFiles with oldIndex = stripAugmentationFields(JSON.parse(...)) ) so
that generateConnectorDiffJson() receives a consistently stripped oldIndex.
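For reference, a minimal sketch of what `stripAugmentationFields()` must guarantee, assuming the augmentation keys named in this PR (`cloudSupported`, `requiresCgo`, `cloudOnly`). The real function also filters cloud-only/CGO-only connectors out of the index entirely; this sketch shows only the key-removal part that makes old and new indexes comparable shape-for-shape:

```javascript
// Keys added during augmentation, per the PR description
const AUGMENTATION_KEYS = ['cloudSupported', 'requiresCgo', 'cloudOnly']

// Return a deep copy with augmentation keys removed at every nesting level,
// so diff generation always compares clean OSS-to-OSS data.
function stripAugmentationFields (index) {
  const clone = JSON.parse(JSON.stringify(index)) // never mutate the caller's object
  const strip = node => {
    if (Array.isArray(node)) {
      node.forEach(strip)
    } else if (node && typeof node === 'object') {
      for (const key of AUGMENTATION_KEYS) delete node[key]
      Object.values(node).forEach(strip)
    }
  }
  strip(clone)
  return clone
}
```

Because it operates on a deep copy and recurses through every branch, the same call is safe on any `oldIndex` load path, which is exactly why the review asks for it to be applied uniformly.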
// Load and strip augmentation fields for clean comparisons
let newIndex = stripAugmentationFields(JSON.parse(fs.readFileSync(dataFile, 'utf8')))

// Save a clean copy of OSS data for binary analysis (before augmentation)
// This ensures the binary analyzer compares actual binaries, not augmented data
const cleanOssDataPath = path.join(dataDir, `._connect-${newVersion}-clean.json`)

// Strip augmentation fields to create clean data for comparison
// Use the already-stripped newIndex for clean data
const cleanData = JSON.parse(JSON.stringify(newIndex))
Always restore the original snapshot after the binary-analysis swap.
If analyzeAllBinaries() throws after the rename/copy step, the catch at Line 1099 logs and continues, but the original connect-${newVersion}.json never gets restored. That leaves ._connect-${newVersion}-augmented.json.tmp behind and changes what the later drafting/cleanup stages read from disk.
🛡️ Suggested fix
const expectedPath = path.join(dataDir, `connect-${newVersion}.json`)
+ const augmentedBackupPath = path.join(dataDir, `._connect-${newVersion}-augmented.json.tmp`)
let tempRenamed = false
- if (fs.existsSync(cleanOssDataPath)) {
- if (fs.existsSync(expectedPath)) {
- fs.renameSync(expectedPath, path.join(dataDir, `._connect-${newVersion}-augmented.json.tmp`))
- tempRenamed = true
- }
- fs.copyFileSync(cleanOssDataPath, expectedPath)
- }
-
- const analysisOptions = {
- skipCloud: false,
- skipCgo: false,
- cgoVersion: options.cgoVersion || null
- }
-
- binaryAnalysis = await analyzeAllBinaries(
- newVersion,
- options.cloudVersion || null,
- dataDir,
- analysisOptions
- )
-
- // Restore the augmented file
- if (tempRenamed) {
- const expectedPath = path.join(dataDir, `connect-${newVersion}.json`)
- fs.unlinkSync(expectedPath)
- fs.renameSync(path.join(dataDir, `._connect-${newVersion}-augmented.json.tmp`), expectedPath)
- }
+ try {
+ if (fs.existsSync(cleanOssDataPath)) {
+ if (fs.existsSync(expectedPath)) {
+ fs.renameSync(expectedPath, augmentedBackupPath)
+ tempRenamed = true
+ }
+ fs.copyFileSync(cleanOssDataPath, expectedPath)
+ }
+
+ const analysisOptions = {
+ skipCloud: false,
+ skipCgo: false,
+ cgoVersion: options.cgoVersion || null
+ }
+
+ binaryAnalysis = await analyzeAllBinaries(
+ newVersion,
+ options.cloudVersion || null,
+ dataDir,
+ analysisOptions
+ )
+ } finally {
+ if (tempRenamed && fs.existsSync(augmentedBackupPath)) {
+ if (fs.existsSync(expectedPath)) fs.unlinkSync(expectedPath)
+ fs.renameSync(augmentedBackupPath, expectedPath)
+ }
+  }

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tools/redpanda-connect/rpcn-connector-docs-handler.js` around lines 950 -
958, The code currently swaps the snapshot files (creating
._connect-${newVersion}-augmented.json.tmp and renaming the original) before
calling analyzeAllBinaries(), but if analyzeAllBinaries() throws the original
connect-${newVersion}.json isn't restored; wrap the analyzeAllBinaries() call
and the rename/copy operations in a try/finally (or add a finally block) so that
regardless of errors you rename/move the temp augmented file back to its
expected filename and remove the .tmp, restoring the original snapshot;
specifically update the logic around where cleanOssDataPath, newIndex, and
analyzeAllBinaries() are used to perform the restore in finally to guarantee
cleanup and restore of connect-${newVersion}.json.
Fixes all issues identified in code review:

**CLAUDE.md fixes:**
- Update fetch command example to match actual CLI flags (--owner, --repo, --remote-path, --save-dir)
- Add required --surface flag to bundle-openapi example

**Input validation:**
- Add string type validation for fromVersion/toVersion before calling .startsWith()

**PR summary improvements:**
- Fix otherConnectors to always show when present (remove restrictive condition)
- Add binaryAnalysis.comparison guards before accessing properties
- Include cloudOnly connectors in cloud-supported counts

**Error handling:**
- Add failure checking for intermediate processing before updating Antora version
- Exit with error code if any intermediate release processing fails
- Add try/finally for snapshot file restoration during binary analysis

**Data consistency:**
- Apply stripAugmentationFields consistently across all oldIndex load paths
- Filter both cloudOnly and requiresCgo connectors in stripAugmentationFields
- Ensure clean OSS-to-OSS comparisons for all diff generation

**Version handling:**
- Update regex to support prerelease versions (\d+\.\d+\.\d+(?:-[0-9A-Za-z-.]+)?)
- Replace lexicographic sort with semver.rsort/semver.sort for correct version ordering
- Applied to all 5 locations where version files are sorted

**Test improvements:**
- Add tests for empty children arrays in buildConfigYaml
- Make cloud emoji assertion more specific (check connector line, not entire summary)
- Add text language tags to ASCII art blocks in AUTOMATION.md for MD040 compliance

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Add support metadata (certified/community/enterprise) from info.csv to connector documentation pages.

Changes:
- parse-csv-connectors.js: Extract support field from CSV
- rpcn-connector-docs-handler.js: Parse CSV and pass metadata to generator
- generate-rpcn-connector-docs.js: Build csvMetadataMap and look up support level
- connector.hbs: Add :support: attribute to frontmatter
- Fix connector key format to use singular type (processor vs processors)

Bump version to 4.15.7

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
The CGO/cloud detection uses plural types (inputs, outputs, processors) while CSV metadata uses singular types (input, output, processor). Fixed by using separate keys:

- connectorKey (plural type) for CGO/cloud lookups
- csvKey (singular item.type) for CSV metadata lookups

This fixes the test failure in cgo-detection.test.js where the requiresCgo flag wasn't being set correctly.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
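The two-key lookup can be sketched as follows. The `type:name` key format, `lookupMetadata` name, and map/set shapes are assumptions for illustration, not the handler's actual internals:

```javascript
// Hypothetical sketch of the dual-key lookup described in the commit.
function lookupMetadata (item, cgoSet, csvMap) {
  const pluralType = item.type.endsWith('s') ? item.type : `${item.type}s`
  const connectorKey = `${pluralType}:${item.name}` // CGO/cloud maps key on plural types
  const csvKey = `${item.type}:${item.name}`        // info.csv keys on singular types
  return {
    requiresCgo: cgoSet.has(connectorKey),
    support: csvMap.get(csvKey) || null
  }
}
```

Deriving both keys from the same `item` in one place avoids the mismatch where a singular-typed key silently missed every entry in the plural-typed CGO map.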
Summary
This PR adds multi-release attribution to the Redpanda Connect connector documentation automation. When releases are missed, the automation now processes each intermediate release separately instead of lumping all changes into the latest version.
Key improvements:
Multi-Release Attribution
Problem
When the automation missed weekly releases (e.g., 4.81.0 through 4.85.0), all changes were attributed to version 4.85.0 instead of their actual release version. Writers couldn't tell which features appeared in which version.
Solution
New CLI Flags
- --skip-intermediate: Legacy mode (single comparison only)
- --from-version <version>: Override starting version instead of using antora.yml

Bug Fixes
CGO-Only Component False Positives
Problem: Components like tigerbeetle_cdc, zmq4, and ffi existed in OSS binaries all along but showed as "new" in 4.85.0 because augmented data was used for diff generation.

Fix: Added a stripAugmentationFields() function that removes cloud/CGO augmentation before version comparisons. This ensures diffs compare clean OSS-to-OSS data.

Result: 4.85.0 now shows 2 new components (correct) instead of 7 (eliminating 5 false positives).
Cloud Binary Version Mismatch
Problem: When processing intermediate releases, automation used latest cloud version (4.85.0) for ALL comparisons, causing incorrect platform attribution.
Fix: Added a findCloudVersionForDate() function that finds the appropriate cloud version for each OSS release date.

Result: Each intermediate release now uses its contemporary cloud version (4.82.0 uses cloud 4.82.0, 4.83.0 uses cloud 4.83.0, and so on).
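A plausible sketch of the date-matching rule: pick the newest cloud release published on or before the OSS release date. The release-object shape (`{ version, date }`) is an assumption for illustration; the actual `findCloudVersionForDate()` lives in github-release-utils.js:

```javascript
// Hypothetical sketch: match each OSS release to its contemporary cloud version.
function findCloudVersionForDate (cloudReleases, ossReleaseDate) {
  const target = new Date(ossReleaseDate).getTime()
  const eligible = cloudReleases
    .filter(r => new Date(r.date).getTime() <= target) // published on or before
    .sort((a, b) => new Date(a.date) - new Date(b.date))
  // Newest eligible release wins; null if the OSS release predates all cloud releases
  return eligible.length ? eligible[eligible.length - 1].version : null
}
```

Under this rule, an OSS release dated between the 4.83.0 and 4.85.0 cloud releases is compared against cloud 4.83.0, not the latest binary.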
Cloud-Only Connector Labeling
Problem: aws_cloudwatch_logs was placed in the cloud-only directory, but the PR summary labeled it "self-hosted only".

Fix: Changed from negative checks (!inCloud) to explicit positive checks for isSelfHostedOnly and isCloudOnly.

Configuration YAML Improvements
Label field restriction: The label field is now only added for components that support it (inputs, outputs, processors). Previously it was added for all types, including caches and metrics, where it's invalid.

Common vs Advanced deduplication: When common and advanced configurations are identical, only the common config is shown with a leading sentence. No tabs are generated for duplicate content.
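The label-field restriction amounts to a type gate. The real buildConfigYaml does considerably more; this sketch (with a hypothetical `configSkeleton` helper) isolates just that rule:

```javascript
// Only these component types accept a top-level label field
const LABELED_TYPES = new Set(['inputs', 'outputs', 'processors'])

// Sketch: emit a label only where the schema allows one
function configSkeleton (componentType, componentName) {
  const entry = {}
  if (LABELED_TYPES.has(componentType)) {
    entry.label = '' // invalid for caches, metrics, and other unlabeled types
  }
  entry[componentName] = {}
  return entry
}
```

Gating on an explicit allow-list, rather than special-casing the invalid types, means newly added component types default to the safe no-label behavior.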
Enhanced PR Summary
The multi-version PR summary now includes:
- Release notes links for each version
- New connector descriptions (2-sentence summaries)
- New fields table with component, field, and description columns
- Removed connectors and fields with version attribution
- Deprecated fields with migration guidance
- Changed defaults table showing old → new values
- Prioritized action items (cloud connectors first)
- Platform grouping (cloud vs self-hosted sections)
New Files
- tools/redpanda-connect/github-release-utils.js
- tools/redpanda-connect/multi-version-summary.js
- tools/redpanda-connect/AUTOMATION.md
- CLAUDE.md
- __tests__/tools/github-release-utils.test.js
- __tests__/tools/buildConfigYaml.test.js

Modified Files
- rpcn-connector-docs-handler.js: stripAugmentationFields(), intermediate release processing loop, cloud version detection
- pr-summary-formatter.js
- buildConfigYaml.js
- bin/doc-tools.js: --skip-intermediate and --from-version flags

Test Coverage
66 tests across 3 test files (all passing)
- buildConfigYaml.test.js: Label inclusion, deprecated fields, object/array rendering
- github-release-utils.test.js: Version parsing, release discovery, findCloudVersionForDate()
- pr-summary-formatter.test.js: Platform detection, multi-version summary, action items

Test plan
- npm test: all 66 tests pass
- --skip-intermediate

🤖 Generated with Claude Code