Commit ef623ff
fix(docx): slow table parsing (#2553)
* chore(docx): remove unnecessary import
Signed-off-by: Cesar Berrospi Ramis <[email protected]>
* fix(docx): simplify parsing of simple tables
Simplify the parsing of tables with just text (no rich cells).
Move nested function group_cell_elements out of _handle_tables for readability.
Signed-off-by: Cesar Berrospi Ramis <[email protected]>
* chore(docx): reuse method for finding inline pictures
Signed-off-by: Cesar Berrospi Ramis <[email protected]>
* chore(docx): format strikethrough text
Signed-off-by: Cesar Berrospi Ramis <[email protected]>
* tests(docx): use fixtures to avoid converting same file multiple times
Signed-off-by: Cesar Berrospi Ramis <[email protected]>
* fix(docx): remove unnecessary argument docx_obj in functions
Signed-off-by: Cesar Berrospi Ramis <[email protected]>
* tests(docx): add test for rich table cells
Signed-off-by: Cesar Berrospi Ramis <[email protected]>
* chore(docx): small improvements in backend and its unit tests
Signed-off-by: Cesar Berrospi Ramis <[email protected]>
* chore(docx): parse superscript and subscript formatted text
Signed-off-by: Cesar Berrospi Ramis <[email protected]>
---------
Signed-off-by: Cesar Berrospi Ramis <[email protected]>
1 parent 0ba8d5d commit ef623ff
File tree
6 files changed, +3339 -191 lines changed
- docling/backend
- tests
- data
- docx
- groundtruth/docling_v2