Skip to content

✨ Fuzzed string#4012

Open
dubzzz wants to merge 5 commits intomainfrom
fuzzed-string
Open

✨ Fuzzed string#4012
dubzzz wants to merge 5 commits intomainfrom
fuzzed-string

Conversation

@dubzzz
Copy link
Owner

@dubzzz dubzzz commented Jun 27, 2023

Category:

  • ✨ Introduce new features
  • 📝 Add or update documentation
  • ✅ Add or update tests
  • 🐛 Fix a bug
  • 🏷️ Add or update types
  • ⚡️ Improve performance
  • Other(s): ...

Potential impacts:

  • Generated values
  • Shrink values
  • Performance
  • Typings
  • Other(s): ...

@codesandbox-ci
Copy link

codesandbox-ci bot commented Jun 27, 2023

This pull request is automatically built and testable in CodeSandbox.

To see build info of the built libraries, click here or the icon next to each commit SHA.

Latest deployment of this branch, based on commit 36d789d:

Sandbox Source
Vanilla Configuration
@fast-check/examples Configuration

@codecov
Copy link

codecov bot commented Jun 27, 2023

Codecov Report

Merging #4012 (021223d) into main (95492a7) will decrease coverage by 1.05%.
The diff coverage is 15.38%.

❗ Current head 021223d differs from pull request most recent head 36d789d. Consider uploading reports for the commit 36d789d to get more accurate results

@@            Coverage Diff             @@
##             main    #4012      +/-   ##
==========================================
- Coverage   94.61%   93.57%   -1.05%     
==========================================
  Files         209      210       +1     
  Lines        5848     5926      +78     
  Branches     1344     1353       +9     
==========================================
+ Hits         5533     5545      +12     
- Misses        299      360      +61     
- Partials       16       21       +5     
Flag Coverage Δ
unit-tests 93.57% <15.38%> (-1.05%) ⬇️
unit-tests-14.x-Linux 93.57% <15.38%> (-1.05%) ⬇️
unit-tests-16.x-Linux 93.52% <15.38%> (-1.10%) ⬇️
unit-tests-18.x-Linux 93.57% <15.38%> (-1.05%) ⬇️
unit-tests-latest-Linux 93.57% <15.38%> (-1.05%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
packages/fast-check/src/arbitrary/fuzzedString.ts 12.00% <12.00%> (ø)
packages/fast-check/src/fast-check-default.ts 100.00% <100.00%> (ø)

@dubzzz dubzzz changed the title Fuzzed string ✨ Fuzzed string Aug 29, 2023
@dubzzz dubzzz added the 🧪 Experiment Performing some experiments for potential future features implying deep changes label Aug 30, 2023
@dubzzz
Copy link
Owner Author

dubzzz commented Mar 7, 2026

@claude review

@claude
Copy link
Contributor

claude bot commented Mar 7, 2026

Claude encountered an error —— View job


I'll analyze this and get back to you.

@dubzzz
Copy link
Owner Author

dubzzz commented Mar 7, 2026

@claude review

@claude
Copy link
Contributor

claude bot commented Mar 7, 2026

Claude finished @dubzzz's task in 1m 30s —— View job


Review of PR #4012: ✨ Fuzzed string

  • Read changed files and understand the implementation
  • Analyze code quality, correctness, and potential issues
  • Post review feedback

This PR introduces a Markov chain-based string fuzzer that generates strings resembling a given corpus. Here's my review:

Design concerns

1. fuzzedString10 is a code smell — use a constraints object instead

Exporting two separate functions (fuzzedString with depth=1 and fuzzedString10 with depth=10) doesn't follow the existing fast-check API conventions. Every other arbitrary uses a constraints object for configuration. This should be a single fuzzedString(corpus, constraints?) where constraints includes depth (and eventually strictness, charArb, etc).

2. Strictness parameter is dead code (fuzzedString.ts:78-80)

The constructor accepts strictness: 0 | 1 | 2 but immediately throws if it's not 2. This parameter is unused — either remove it or implement the other modes before merging.

if (this.strictness !== 2) {
  throw new Error('Strictness not being 2, not implemented');
}

3. No shrinking support (fuzzedString.ts:118-124)

canShrinkWithoutContext returns false and shrink returns an empty stream. This significantly reduces the usefulness of this arbitrary for property-based testing — counterexamples won't be minimized. At minimum, this limitation should be documented. Ideally, a basic shrinking strategy (e.g., shrinking toward shorter strings or toward corpus entries) should be implemented.

4. Missing JSDoc annotations

The exported functions at lines 127-133 have no JSDoc comments. Other arbitraries in fast-check include @remarks Since x.x.x and @public annotations and parameter documentation.

Correctness issues

5. Potential crash when allTransitions is empty (fuzzedString.ts:83-104)

If no transitions are found for the current state (which can happen with higher depths where the exact n-gram combination wasn't seen in the corpus), allTransitions will be empty. This causes:

  • totalWeight = 0
  • mrng.nextInt(0, -1) — undefined behavior
  • allTransitions[0][1] — crash on empty array access

This is a real bug that can manifest with small corpora and high depth values.

Fix this →

6. Composite key collisions in multiFromToSingleFrom (fuzzedString.ts:14-18)

The function serializes multi-character state as startCount + ':' + nonStart.join(''). Since nonStart values are joined without a separator, different state sequences can produce the same key. For example, states ["ab", "c"] and ["a", "bc"] both produce "0:abc". This causes incorrect transition weights.

Fix this →

7. Duplicate transitions across depths inflate weights non-uniformly (fuzzedString.ts:84-92)

The loop collects transitions from all depths 1..N and concatenates them into allTransitions. When the same transition appears at multiple depth levels, it gets duplicated in the array. This means its effective weight is multiplied by the number of depths it appears at, which skews the probability distribution in hard-to-predict ways. If this is intentional, it should be documented; if not, transitions should be deduplicated with merged weights.

Documentation / test issues

8. Duplicate documentation files with same slug

Both fuzzed.md and fuzzed10.md have slug: /core-blocks/arbitraries/fake-data/fuzzed/ — this will cause a Docusaurus build conflict. They should either be merged into one file or have distinct slugs.

9. Very low test coverage (15.38%)

There are no unit tests for the FuzzedString class. The only "testing" is through the documentation example generation in the e2e test. Edge cases like empty strings in corpus, single-character corpus, very short corpus, and high depth are untested.

10. Hardcoded special-casing in test file (Docs.md.spec.ts:147, 158)

The test file has fuzzedString-specific hardcoded branches (snippet.includes('fuzzedString') ? 10 * numRuns : numRuns). This doesn't scale — a better approach would be a marker comment in the doc file or a generic mechanism.

Minor nits

  • The _biasFactor parameter in generate (line 114) is ignored — bias is typically used to guide generation toward edge cases, which would be valuable here.
  • The constructor builds the transition map eagerly for all depths 1..N, which could be memory-intensive for large corpora with high depth. Consider lazy construction or documenting the trade-off.

Summary

This is an interesting POC for Markov chain-based string generation, but it needs more work before merging:

  1. Critical: Fix the empty allTransitions crash (issue Suggest smaller entries at first generations  #5)
  2. Critical: Fix the composite key collision (issue Need README  #6)
  3. Important: Replace fuzzedString/fuzzedString10 with a single function + constraints object
  4. Important: Add unit tests
  5. Nice to have: Implement basic shrinking

--- | fuzzed-string

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

🧪 Experiment Performing some experiments for potential future features implying deep changes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant