@Amrrx Amrrx commented Jul 13, 2025

Description:

Summary

Adds configurable header-based splitting to the MarkdownTextSplitter component for semantic document
chunking. It is implemented as a custom method, since it's not a native option in langchain.js.

Features

  • Dropdown selection for header levels (H1-H6)
  • Hierarchical splitting (H2 includes H1 headers)
  • Headers preserved with content sections
  • Prioritizes semantic boundaries over chunk size
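
For illustration, the behavior listed above could be sketched roughly as follows. This is a minimal sketch, not the code merged in this PR: `splitByHeaderLevel` and `HeaderLevel` are hypothetical names, only ATX-style (`#`) headers are handled, and no chunk-size limit is applied, so sections are never cut mid-way.

```typescript
// Hypothetical helper for illustration only; not the implementation in this PR.
type HeaderLevel = 1 | 2 | 3 | 4 | 5 | 6;

function splitByHeaderLevel(markdown: string, maxLevel: HeaderLevel): string[] {
  // Hierarchical: selecting H2 also splits on H1, i.e. any header of level 1..maxLevel.
  const headerPattern = new RegExp(`^#{1,${maxLevel}}\\s+\\S`);
  // Opening or closing line of a fenced code block (three backticks or tildes).
  const fencePattern = /^\s*(`{3}|~{3})/;

  const chunks: string[] = [];
  let current: string[] = [];
  let inFence = false;

  for (const line of markdown.split("\n")) {
    // Track code fences so "# comment" lines inside them are not treated as headers.
    if (fencePattern.test(line)) inFence = !inFence;

    // A qualifying header closes the previous chunk and starts a new one.
    if (!inFence && headerPattern.test(line) && current.length > 0) {
      chunks.push(current.join("\n").trim());
      current = [];
    }
    // The header line stays at the top of the chunk it introduces.
    current.push(line);
  }
  if (current.length > 0) chunks.push(current.join("\n").trim());

  // Drop chunks that contain only whitespace (e.g. blank lines before the first header).
  return chunks.filter((chunk) => chunk.length > 0);
}
```

Calling `splitByHeaderLevel(doc, 2)` would start a new chunk at every H1 and H2 while leaving H3 and deeper headers inside their parent section. Because boundaries come only from headers, chunk sizes vary with the document's structure.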

Testing

✅ Tested with 23KB real-world markdown document
✅ All splitting scenarios working correctly
✅ Production build successful

Results

  • H1: 5 chunks (4,568 chars avg)
  • H2: 21 chunks (1,086 chars avg)
  • H3: 69 chunks (329 chars avg)


@HenryHengZJ HenryHengZJ left a comment

thanks!

@HenryHengZJ HenryHengZJ merged commit d584c0b into FlowiseAI:main Jul 18, 2025
2 checks passed