Batch spell-checker invocations for improved performance by ehamiter · Pull Request #214 · facelessuser/pyspelling

ehamiter · 2026-03-06T16:35:38Z

Currently, pyspelling spawns a separate aspell/hunspell subprocess for every source file. For large projects this becomes a bottleneck — process creation overhead dominates actual spell-checking time.

This PR restructures run_task() to separate the extraction and checking phases:

Extract phase — All files are processed through the pipeline (concurrently via ProcessPoolExecutor when jobs > 1) to produce extracted text sources.
Check phase — A single _batch_spellcheck() call combines all extracted text and invokes aspell/hunspell once. Misspelled words are then mapped back to their originating source via word-boundary matching.

New internal methods:

SpellChecker._collect_pipeline_sources() — yields pipeline-extracted sources without checking
Aspell._batch_spellcheck() / Hunspell._batch_spellcheck() — single-invocation batch checking
SpellingTask.extract_only() — extraction entry point for worker processes

Existing single-file behavior and results are preserved; this is purely a performance optimization.

Example with a project that has ~175 template files for processing:

	aspell	hunspell
Before	20.450s	36.507s
After	6.644s	6.591s
Speedup	3.1×	5.5×

(Using the total wall-clock times, and rounding speedup to 1 decimal.)

Caveat: I am a professional software dev but I authored this with assistance from Claude code; I have tested locally and also reviewed it but not sure what your stance on LLM/AI contributions are. No hard feelings if this is a dealbreaker.

facelessuser · 2026-03-06T17:15:36Z

My feelings are mixed, but if you have the technical ability to stand behind the work and fix it accordingly, I'm open to exploring a path forward.

Right off the bat:

Spelling checks need fixing in CI, I believe they can all be fixed by using proper captialization for their name hunspell -> Hunspell, and maybe either using the full name str -> string or just putting code language in backquotes str -> `str`.
I would prefer that the code is well tested, currently, we have untested code, see code coverage. Does this code work as intended? We don't know right now.

If you are willing to thoroughly vet the code, then I am open to considering it, but if the expectation is for me to figure out what the AI did and debug it to ensure it works properly, then maybe not.

facelessuser · 2026-03-06T17:21:31Z

Basically, I need people who can speak to the changes they present, understand it, and be confident that everything included makes sense and works as expected.

ehamiter · 2026-03-06T17:24:19Z

A reasonable take! I'll work on fixing CI tests and adding some additional tests to help increase coverage. Then we can see if this is something you'd like to add or close.

facelessuser · 2026-03-06T17:25:42Z

Yep, if there is code that should not be there, then rip it out. I am okay with ignoring some coverage, only if there is a reasonable explanation as to why it is maybe not important.

ehamiter · 2026-03-06T17:31:33Z

Sounds good. I'm closing this for now and will make a new one later with additional tests and try to remove some of the duplication between the aspell and hunspell functions.

facelessuser · 2026-03-06T17:35:25Z

The big thing is to make sure everything that is being suggested has a real reason for being there as well. I saw some interesting UTF-8 code manipulation. I'm not sure if that is needed or not, but it wasn't even tested. While it passed existing tests, there was still logic that was added for...reasons? I didn't dig too deep, but I'm expecting that to be done by the submitter before proposing. I obviously will dig in once it seems due diligence has been put in.

facelessuser · 2026-03-06T17:45:12Z

Also, did you compare this to cases where you tweak -j?

Batch spell-check calls to reduce subprocess overhead

a42bae7

gir-bot added S: needs-review Needs to be reviewed and/or approved. C: source Related to source code. labels Mar 6, 2026

ehamiter closed this Mar 6, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Batch spell-checker invocations for improved performance#214

Batch spell-checker invocations for improved performance#214
ehamiter wants to merge 1 commit intofacelessuser:masterfrom
ehamiter:master

ehamiter commented Mar 6, 2026

Uh oh!

facelessuser commented Mar 6, 2026

Uh oh!

facelessuser commented Mar 6, 2026

Uh oh!

ehamiter commented Mar 6, 2026

Uh oh!

facelessuser commented Mar 6, 2026

Uh oh!

ehamiter commented Mar 6, 2026

Uh oh!

facelessuser commented Mar 6, 2026

Uh oh!

facelessuser commented Mar 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

ehamiter commented Mar 6, 2026

Uh oh!

facelessuser commented Mar 6, 2026

Uh oh!

facelessuser commented Mar 6, 2026

Uh oh!

ehamiter commented Mar 6, 2026

Uh oh!

facelessuser commented Mar 6, 2026

Uh oh!

ehamiter commented Mar 6, 2026

Uh oh!

facelessuser commented Mar 6, 2026

Uh oh!

facelessuser commented Mar 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants