refactor: extract models.py, ui.py, docx_utils.py from index.py #40

Closed
Chetic wants to merge 3 commits into main from refactor/codebase-cleanup

Chetic (Owner) commented Feb 26, 2026

Summary

  • Extract shared model utilities into models.py, eliminating duplication between index.py and search.py
  • Extract terminal UI classes (IndexingUI, FileProcessingContext, GracefulAbort) into ui.py (~560 lines)
  • Extract DOCX processing functions into docx_utils.py (~250 lines)
  • Modernize type annotations in index.py to Python 3.11+ syntax (list[], dict[], X | None)
  • Add ruff linter configuration to pyproject.toml

index.py drops from 2,655 to 1,720 lines (-35%).
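The reported reduction checks out arithmetically:

```latex
\frac{2655 - 1720}{2655} = \frac{935}{2655} \approx 0.352 \quad (\approx 35\%)
```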

Test plan

  • CI passes: all functional tests
  • ruff check src/chunksilo/ passes on new files
  • Verify no behavioral changes: purely mechanical extraction plus import updates
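ruff analyzes files statically and does not execute them, so it will not surface an ImportError that only occurs when the split modules are loaded. One hedged option for the "no behavioral changes" item is a smoke test that imports each module in a fresh interpreter. imports_cleanly is a hypothetical helper, and the module paths in the usage block assume the new files sit in the chunksilo package next to index.py.

```python
import subprocess
import sys


def imports_cleanly(module: str) -> bool:
    """Return True if `module` imports without error in a fresh interpreter.

    A separate process matters: inside a long-lived test session, an earlier
    import may already have populated sys.modules and masked an import-order bug.
    """
    result = subprocess.run(
        # Pass the module name as an argument rather than formatting it into code.
        [sys.executable, "-c", "import importlib, sys; importlib.import_module(sys.argv[1])", module],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


if __name__ == "__main__":
    # Assumed module paths for the files touched by this PR.
    for name in ["chunksilo.models", "chunksilo.ui", "chunksilo.docx_utils", "chunksilo.index"]:
        print(f"{name}: {'ok' if imports_cleanly(name) else 'FAILED'}")
```

Importing chunksilo.index last is deliberate: it is the module most likely to pull the others in and expose a cycle.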

🤖 Generated with Claude Code

Chetic and others added 2 commits February 26, 2026 06:30
…x.py

- Extract shared model utilities (_get_cached_model_path, resolve_flashrank_model_name, configure_offline_mode) into models.py, eliminating duplication between index.py and search.py
- Extract IndexingUI, FileProcessingContext, FileProcessingTimeoutError, GracefulAbort into ui.py (~560 lines)
- Extract DOCX processing (_parse_heading_level, _get_doc_temp_dir, _convert_doc_to_docx, split_docx_into_heading_documents) into docx_utils.py (~250 lines)
- Modernize type annotations in index.py to Python 3.11+ syntax (list[], dict[], X | None)
- Add ruff linter configuration to pyproject.toml

index.py drops from 2,655 to 1,720 lines (-35%).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Move the import of EXCLUDED_EMBED_METADATA_KEYS, EXCLUDED_LLM_METADATA_KEYS,
and get_heading_store to inside split_docx_into_heading_documents() to avoid
the index -> docx_utils -> index circular import at module load time.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
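The deferred-import approach in this commit can be reproduced in isolation. The sketch builds a throwaway package (deferred_demo and its file contents are hypothetical stand-ins, not chunksilo's real modules) in which index.py imports docx_utils at the top, while docx_utils imports from index only inside the function body. That import runs at call time, when index.py has finished executing and the constant exists.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path


def write_deferred_package(root: Path, name: str) -> None:
    """Create a throwaway package where docx_utils imports from index lazily."""
    pkg = root / name
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    # index.py still imports docx_utils at module level, before defining the constant.
    (pkg / "index.py").write_text(textwrap.dedent("""\
        from .docx_utils import split_docx_into_heading_documents

        EXCLUDED_EMBED_METADATA_KEYS = ["file_path"]
    """))
    # The import of index happens only when the function is called.
    (pkg / "docx_utils.py").write_text(textwrap.dedent("""\
        def split_docx_into_heading_documents():
            from .index import EXCLUDED_EMBED_METADATA_KEYS
            return EXCLUDED_EMBED_METADATA_KEYS
    """))


with tempfile.TemporaryDirectory() as tmp:
    write_deferred_package(Path(tmp), "deferred_demo")
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        index = importlib.import_module("deferred_demo.index")
        result = index.split_docx_into_heading_documents()
    finally:
        sys.path.remove(tmp)

print(result)  # ['file_path']
```

This works, but the cycle still exists; it is merely hidden inside a function body, and the dependency is invisible from the module's top-level imports.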

chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a4ee99ddb9

from llama_index.core import Document as LlamaIndexDocument

from . import cfgload
from .index import EXCLUDED_EMBED_METADATA_KEYS, EXCLUDED_LLM_METADATA_KEYS, get_heading_store

P0: Remove circular import between index and DOCX utils

Importing chunksilo.index now fails at module import time. index.py imports docx_utils, and docx_utils.py immediately imports EXCLUDED_EMBED_METADATA_KEYS, EXCLUDED_LLM_METADATA_KEYS, and get_heading_store back from index.py. Those names are not yet defined while the first import is still in progress, so Python raises ImportError from a partially initialized module. This blocks any workflow that loads chunksilo.index (including the CLI indexing entry points) before any runtime logic can execute.
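The failure mode can be reproduced with a throwaway package. circular_demo and its file contents are hypothetical stand-ins that mirror only the shape of the problem: index.py imports docx_utils before defining its constant, and docx_utils imports that constant back at module load time.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path


def write_circular_package(root: Path, name: str) -> None:
    """Create a throwaway package whose two modules import each other at top level."""
    pkg = root / name
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    # index.py imports docx_utils *before* defining the constant docx_utils needs.
    (pkg / "index.py").write_text(textwrap.dedent("""\
        from .docx_utils import split_docx_into_heading_documents

        EXCLUDED_EMBED_METADATA_KEYS = ["file_path"]
    """))
    # docx_utils.py imports that constant back from index.py at module load time.
    (pkg / "docx_utils.py").write_text(textwrap.dedent("""\
        from .index import EXCLUDED_EMBED_METADATA_KEYS

        def split_docx_into_heading_documents():
            return EXCLUDED_EMBED_METADATA_KEYS
    """))


error_message = ""
with tempfile.TemporaryDirectory() as tmp:
    write_circular_package(Path(tmp), "circular_demo")
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        importlib.import_module("circular_demo.index")
    except ImportError as exc:
        # index is already in sys.modules but has not yet defined the constant,
        # so the `from .index import ...` in docx_utils cannot find the name.
        error_message = str(exc)
    finally:
        sys.path.remove(tmp)

print(error_message)
```

On recent CPython versions the message names the missing attribute and describes circular_demo.index as a partially initialized module.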


Pass heading_store, excluded_embed_metadata_keys, and excluded_llm_metadata_keys
as parameters to split_docx_into_heading_documents() instead of importing them
from index.py. This cleanly breaks the circular dependency without any runtime
import tricks.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
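Passing the values in makes the import graph one-directional: index.py imports docx_utils.py, never the reverse. The sketch below is a simplified, hypothetical stand-in, split_sections_into_heading_documents. It takes pre-split (heading, body) pairs instead of a DOCX path, uses a plain dict as the heading store, and uses a small dataclass mirroring a few LlamaIndex Document fields. Only the three injected parameter names come from the commit message; the constant values are placeholders.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Document:
    """Minimal stand-in for a LlamaIndex Document, to keep the sketch self-contained."""
    text: str
    metadata: dict[str, Any] = field(default_factory=dict)
    excluded_embed_metadata_keys: list[str] = field(default_factory=list)
    excluded_llm_metadata_keys: list[str] = field(default_factory=list)


def split_sections_into_heading_documents(
    file_name: str,
    sections: list[tuple[str, str]],
    heading_store: dict[str, list[str]],
    excluded_embed_metadata_keys: list[str],
    excluded_llm_metadata_keys: list[str],
) -> list[Document]:
    """Hypothetical simplification: everything from index.py arrives as an argument."""
    documents: list[Document] = []
    for heading, body in sections:
        # Record headings in the caller-supplied store rather than a global from index.py.
        heading_store.setdefault(file_name, []).append(heading)
        documents.append(
            Document(
                text=body,
                metadata={"file_name": file_name, "heading": heading},
                # Copy so each document owns its lists instead of aliasing the caller's.
                excluded_embed_metadata_keys=list(excluded_embed_metadata_keys),
                excluded_llm_metadata_keys=list(excluded_llm_metadata_keys),
            )
        )
    return documents


# Caller side (roughly what index.py would do): it owns the constants and the store.
EXCLUDED_EMBED_METADATA_KEYS = ["file_name"]  # placeholder values
EXCLUDED_LLM_METADATA_KEYS = ["file_name"]
store: dict[str, list[str]] = {}
docs = split_sections_into_heading_documents(
    "report.docx",
    [("Introduction", "Intro text"), ("Results", "Results text")],
    store,
    EXCLUDED_EMBED_METADATA_KEYS,
    EXCLUDED_LLM_METADATA_KEYS,
)
print(store)  # {'report.docx': ['Introduction', 'Results']}
```

A side benefit is testability: a test can pass an empty dict as the store and inspect it afterwards, as above, without touching index.py's module-level state.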
Chetic closed this Feb 26, 2026