Improve performance in bigger projects by RobinMalfait · Pull Request #19632 · tailwindlabs/tailwindcss

RobinMalfait · 2026-02-02T11:28:34Z

This PR improves the performance of Oxide when scanning large codebases.

The Oxide API, looks something like this:

let scanner = new Scanner({ sources })
let candidates = scanner.scan() // Found candidates
let files = scanner.files // Scanned files
let globs = scanner.globs // Scanned globs

The files and globs are used to tell PostCSS, Vite, webpack etc which files to watch for changes.
The .scan() operation extracts the candidates from the source files. You can think of these as potential Tailwind CSS classes.

In all these scenarios we have to walk the file system and find files that match the sources.

1. Prevent multiple file system walks

The first big win came from the fact that accessing .files after a .scan() also does an entire walk of the file system (for the given sources), which is unnecessary because we just walked the file system.

This is something that's not really an issue in smaller codebases because we have mtime tracking. We don't re-scan a file if its mtime hasn't changed since the last scan. However, in large codebases with thousands of files, even walking the file system to check mtimes can be expensive.

2. Use parallel file system walking

Another big win is to use a parallel file system walker instead of a synchronous one. The big problem here is that the parallel build has 20ms-50ms of overhead which is noticeable on small codebases. We don't really know if you have a small or big codebase ahead of time, so maybe some kind of hint in the future would be useful.

So the solution I settled on right now is to use a synchronous walker for the initial scan, and then switch to a parallel walker for subsequent scans (think dev mode). This gives us the best of both worlds: fast initial scan on small codebases, and fast re-scans on large codebases.

Caveat: if you use the @tailwindcss/cli we know exactly which files changed so we can just re-scan those files directly without walking the file system at all. But in @tailwindcss/postcss we don't know which files changed, so we have to walk the file system to check mtimes.

While this improvement is nice, it resulted in an annoying issue related to mtime tracking. Since the parallel walker processes files in parallel, the mtime was typed as Arc<Mutex<FxHashMap<PathBuf, SystemTime>>> so to avoid locking, I decided to only walk the files here and collect their paths. Then later we check the mtime to know whether to re-scan them or not.

Initially I just removed the mtime tracking altogether. But it did have an impact when actually extracting candidates from those files, so I added it back later.

3. Delaying work

I was still a bit annoyed by the fact that we had to track mtime values for every file. This seems like annoying overhead, especially when doing a single build (no dev mode).

So the trick I applied here is to only start tracking mtime values after the initial scan.

This means that, in dev mode, we would do this:

Walk entire file system to track files.
On a subsequent scan, walk entire file system (again) and start tracking mtime values. This time, we use the parallel walker instead of the synchronous one.
On further scans, only re-scan files whose mtime has changed

The trade-off here is that on the second scan we always re-scan all files, even if they haven't changed. Since this typically only happens in dev mode, I think this is an acceptable trade-off especially if the initial build is therefor faster this way.

3. Small wins

There are also a few small wins in here that I would like to mention but that are less significant:

Pre-computed normalized source patterns instead of in every walker filter call.
Tried to avoid some allocations in various places. For example the pre_process_input always called content.to_vec() which allocates. Instead we now accept an owned Vec<u8> so we don't have to call .to_vec() in the default case (in my testing, this is ~92% of the time in the codebases I checked).
Made the Cursor struct smaller, which is used a lot during candidate extraction.

Benchmarks

Now for the fun stuff, the benchmarks!

The code for the benchmarks

import path from 'node:path'
import { bench, boxplot, do_not_optimize, run, summary } from 'mitata'

import { Scanner as ScannerPr } from '/path/to/repo/with/pr/branch/tailwindcss/crates/node'
import { Scanner as ScannerMain } from '/path/to/repo/with/main/branch/tailwindcss/crates/node'

let base = '/path/to/some/codebase'
let sources = [{ base, pattern: '**/*', negated: false }]

// Verify the results are the same before benchmarking
let scannerPr = new ScannerPr({ sources })
let scannerMain = new ScannerMain({ sources })

{
  let aCandidates = scannerPr.scan()
  let bCandidates = scannerMain.scan()

  if (aCandidates.length !== bCandidates.length) {
    throw new Error(`Mismatch in candidate count: ${aCandidates.length} vs ${bCandidates.length}`)
  }

  for (let i = 0; i < aCandidates.length; i++) {
    if (aCandidates[i] !== bCandidates[i]) {
      throw new Error(`Mismatch in candidate at index ${i}: ${aCandidates[i]} vs ${bCandidates[i]}`)
    }
  }

  let aFiles = scannerPr.files
  let bFiles = scannerMain.files

  if (aFiles.length !== bFiles.length) {
    throw new Error(`Mismatch in file count: ${aFiles.length} vs ${bFiles.length}`)
  }

  for (let i = 0; i < aFiles.length; i++) {
    if (aFiles[i] !== bFiles[i]) {
      throw new Error(`Mismatch in file at index ${i}: ${aFiles[i]} vs ${bFiles[i]}`)
    }
  }

  console.log('Scanned', aFiles.length, 'files')
  console.log('Extracted', aCandidates.length, 'candidates')
  console.log('Base =', base)
  console.log()
}

summary(() => {
  boxplot(() => {
    bench('PR (build, .scan()))', function* () {
      yield {
        [0]() {
          return new ScannerPr({ sources })
        },
        bench(scanner: ScannerPr) {
          do_not_optimize(scanner.scan())
        },
      }
    })

    bench('main (build, .scan()))', function* () {
      yield {
        [0]() {
          return new ScannerMain({ sources })
        },
        bench(scanner: ScannerMain) {
          do_not_optimize(scanner.scan())
        },
      }
    })
  })
})

summary(() => {
  boxplot(() => {
    bench('PR (build, .scan() + .files)', function* () {
      yield {
        [0]() {
          return new ScannerPr({ sources })
        },
        bench(scanner: ScannerPr) {
          do_not_optimize(scanner.scan())
          do_not_optimize(scanner.files)
        },
      }
    })

    bench('main (build, .scan() + .files)', function* () {
      yield {
        [0]() {
          return new ScannerMain({ sources })
        },
        bench(scanner: ScannerMain) {
          do_not_optimize(scanner.scan())
          do_not_optimize(scanner.files)
        },
      }
    })
  })
})

summary(() => {
  boxplot(() => {
    bench('PR (watch, .scan()))', function* () {
      yield {
        bench() {
          do_not_optimize(scannerPr.scan())
        },
      }
    })

    bench('main (watch, .scan()))', function* () {
      yield {
        bench() {
          do_not_optimize(scannerMain.scan())
        },
      }
    })
  })
})

summary(() => {
  boxplot(() => {
    bench('PR (watch, .scan() + .files)', function* () {
      yield {
        bench() {
          do_not_optimize(scannerPr.scan())
          do_not_optimize(scannerPr.files)
        },
      }
    })

    bench('main (watch, .scan() + .files)', function* () {
      yield {
        bench() {
          do_not_optimize(scannerMain.scan())
          do_not_optimize(scannerMain.files)
        },
      }
    })
  })
})

await run()

tailwindcss.com codebase

Scanned 462 files
Extracted 13200 candidates
Base = /Users/robin/github.com/tailwindlabs/tailwindcss.com

clk: ~3.09 GHz
cpu: Apple M1 Max
runtime: bun 1.3.3 (arm64-darwin)

In these benchmarks the PR one is consistently faster than main. It's not by a lot but that's mainly because the codebase itself isn't that big. It is a codebase with a lot of candidates though, but not that many files.

The candidate extraction was already pretty fast, so the wins here mainly come from avoiding re-walking the file system when accessing .files, and from delaying mtime tracking until after the initial scan.

Single initial build:

It's not a lot, but it's a bit faster. This is due to avoiding tracking the mtime values initially and making some small optimizations related to the struct size and allocations.

benchmark                     avg (min … max) p75 / p99    (min … top 1%)
--------------------------------------------- -------------------------------
PR (build, .scan()))            22.87 ms/iter  23.28 ms               █
                        (21.49 ms … 25.68 ms)  23.98 ms     ▂  ▂   ▂  █  ▂▂
                      (832.00 kb …   2.69 mb)   1.41 mb ▆▆▆▆█▆▆█▁▆▆█▁▁█▁▆██▁▆

main (build, .scan()))          25.67 ms/iter  26.12 ms █   █      █
                        (24.54 ms … 27.74 ms)  27.06 ms █   █ █    ███
                      (432.00 kb …   2.78 mb) 996.00 kb ██▁████▁█▁████▁█▁▁█▁█

                               ┌                                            ┐
                               ╷    ┌─────┬──┐     ╷
          PR (build, .scan())) ├────┤     │  ├─────┤
                               ╵    └─────┴──┘     ╵
                                                        ╷  ┌─────┬──┐       ╷
        main (build, .scan()))                          ├──┤     │  ├───────┤
                                                        ╵  └─────┴──┘       ╵
                               └                                            ┘
                               21.49 ms           24.28 ms           27.06 ms

summary
  PR (build, .scan()))
   1.12x faster than main (build, .scan()))

Single initial build + accessing .files:

We don't have to re-walk the entire file system even if we're just dealing with ~462 scanned files.

benchmark                     avg (min … max) p75 / p99    (min … top 1%)
--------------------------------------------- -------------------------------
PR (build, .scan() + .files)    22.54 ms/iter  22.99 ms █      ▂
                        (21.41 ms … 25.86 ms)  24.26 ms █ ▅  ▅▅█▅  ▅▅
                      (368.00 kb …   2.05 mb) 853.00 kb █▇█▇▇████▇▇██▁▁▇▁▁▇▁▇

main (build, .scan() + .files)  32.15 ms/iter  32.17 ms  █  ▂
                        (30.78 ms … 36.22 ms)  35.75 ms ▅█ ▅█  ▅
                      (400.00 kb …   2.45 mb) 952.00 kb ██▁██▇▇█▇▁▁▁▁▁▁▁▁▁▁▁▇

                               ┌                                            ┐
                               ╷┌──┬┐   ╷
  PR (build, .scan() + .files) ├┤  │├───┤
                               ╵└──┴┘   ╵
                                                            ╷┌───┬          ╷
main (build, .scan() + .files)                              ├┤   │──────────┤
                                                            ╵└───┴          ╵
                               └                                            ┘
                               21.41 ms           28.58 ms           35.75 ms

summary
  PR (build, .scan() + .files)
   1.43x faster than main (build, .scan() + .files)

Watch/dev mode, only scanning:

This now switches to the parallel walker, but since it's not a super big codebase we don't see a huge win here yet.

benchmark                     avg (min … max) p75 / p99    (min … top 1%)
--------------------------------------------- -------------------------------
PR (watch, .scan()))             6.85 ms/iter   7.22 ms   █▄
                          (6.34 ms … 7.94 ms)   7.91 ms  ▄██▃
                      ( 64.00 kb … 688.00 kb) 452.82 kb ▃████▆▂▂▁▂▁▂▁▁▁▅█▆▃▅▃

main (watch, .scan()))           7.92 ms/iter   8.08 ms   █ █  ▃ █▃▃
                          (7.41 ms … 8.71 ms)   8.68 ms   █▆█▆▃█████
                      (  0.00  b …  64.00 kb)  19.20 kb ▆▄██████████▆▁▆▆█▄▄▄▆

                               ┌                                            ┐
                               ╷  ┌──────┬──────┐            ╷
          PR (watch, .scan())) ├──┤      │      ├────────────┤
                               ╵  └──────┴──────┘            ╵
                                                    ╷    ┌───┬──┐           ╷
        main (watch, .scan()))                      ├────┤   │  ├───────────┤
                                                    ╵    └───┴──┘           ╵
                               └                                            ┘
                               6.34 ms            7.51 ms             8.68 ms

summary
  PR (watch, .scan()))
   1.16x faster than main (watch, .scan()))

Watch/dev mode, scanning + accessing .files:

Again we avoid re-walking the entire file system when accessing .files.

benchmark                     avg (min … max) p75 / p99    (min … top 1%)
--------------------------------------------- -------------------------------
PR (watch, .scan() + .files)    12.10 ms/iter  12.74 ms █ █     █ █ ▃▃▃
                        (10.69 ms … 13.89 ms)  13.81 ms █ █▂▂▂ ▇█▂█▂███▇
                      (128.00 kb …  10.73 mb)   5.23 mb █▆████▁█████████▆▆▆▆▆

main (watch, .scan() + .files)  14.44 ms/iter  14.74 ms   █
                        (13.93 ms … 15.33 ms)  15.18 ms   ███▅ █ ▅   ▅
                      ( 16.00 kb …  80.00 kb)  39.51 kb █▅████▁███▅▁█████▅▁▅▅

                               ┌                                            ┐
                               ╷      ┌──────┬──────┐         ╷
  PR (watch, .scan() + .files) ├──────┤      │      ├─────────┤
                               ╵      └──────┴──────┘         ╵
                                                               ╷ ┌───┬──┐   ╷
main (watch, .scan() + .files)                                 ├─┤   │  ├───┤
                                                               ╵ └───┴──┘   ╵
                               └                                            ┘
                               10.69 ms           12.93 ms           15.18 ms

summary
  PR (watch, .scan() + .files)
   1.19x faster than main (watch, .scan() + .files)

Synthetic 5000 files codebase

Based on the instructions from #19616 I created a codebase with 5000 files. Each file contains a flex class and a unique class like content-['/path/to/file'] to ensure we have a decent amount of unique candidates.

You can test the script yourself by running this:

mkdir -p fixtures/app-5000/src/components/{auth,dashboard,settings,profile,notifications,messages,search,navigation,footer,sidebar}/sub{001..500} && for dir in fixtures/app-5000/src/components/*/sub*; do echo "export const Component = () => <div className=\"flex content-['$dir']\">test</div>" > "$dir/index.tsx"; done && find fixtures/app-5000/src/components -type f | wc -lc

Scanned 5000 files
Extracted 5005 candidates
Base = /Users/robin/github.com/RobinMalfait/playground/scanner-benchmarks/fixtures/app-5000

clk: ~3.08 GHz
cpu: Apple M1 Max
runtime: bun 1.3.3 (arm64-darwin)

Single initial build:

As expected not a super big win here because it's a single build. But there is a noticeable improvement.

benchmark                     avg (min … max) p75 / p99    (min … top 1%)
--------------------------------------------- -------------------------------
PR (build, .scan()))           217.27 ms/iter 211.97 ms              █
                      (205.99 ms … 289.53 ms) 214.33 ms ▅      ▅ ▅▅▅▅█▅▅    ▅
                      (  3.34 mb …   4.25 mb)   3.72 mb █▁▁▁▁▁▁█▁███████▁▁▁▁█

main (build, .scan()))         249.26 ms/iter 239.88 ms               █
                      (231.51 ms … 381.66 ms) 241.01 ms ▅    ▅ ▅▅  ▅  █▅  ▅▅▅
                      (  4.22 mb …   4.78 mb)   4.49 mb █▁▁▁▁█▁██▁▁█▁▁██▁▁███

                               ┌                                            ┐
                               ╷    ┌─────╷──┬
          PR (build, .scan())) ├────┤     ┤  │
                               ╵    └─────╵──┴
                                                                ╷   ┌───────╷
        main (build, .scan()))                                  ├───┤       ┤
                                                                ╵   └───────╵
                               └                                            ┘
                               205.99 ms         223.50 ms          241.01 ms

summary
  PR (build, .scan()))
   1.15x faster than main (build, .scan()))

Single initial build + accessing .files:

Now things are getting interesting. Almost a 2x speedup by avoiding re-walking the file system when accessing .files.

benchmark                     avg (min … max) p75 / p99    (min … top 1%)
--------------------------------------------- -------------------------------
PR (build, .scan() + .files)   216.35 ms/iter 214.53 ms █ █ █
                      (211.00 ms … 242.64 ms) 221.45 ms █ █▅█ ▅▅      ▅     ▅
                      (  2.97 mb …   4.47 mb)   3.97 mb █▁███▁██▁▁▁▁▁▁█▁▁▁▁▁█

main (build, .scan() + .files) 414.79 ms/iter 406.05 ms     ██
                      (396.72 ms … 542.30 ms) 413.69 ms ▅   ██▅ ▅▅ ▅ ▅      ▅
                      (  5.19 mb …   6.03 mb)   5.63 mb █▁▁▁███▁██▁█▁█▁▁▁▁▁▁█

                               ┌                                            ┐
                               ┌┬╷
  PR (build, .scan() + .files) ││┤
                               └┴╵
                                                                        ╷┌──╷
main (build, .scan() + .files)                                          ├┤  ┤
                                                                        ╵└──╵
                               └                                            ┘
                               211.00 ms         312.34 ms          413.69 ms

summary
  PR (build, .scan() + .files)
   1.92x faster than main (build, .scan() + .files)

Watch/dev mode, only scanning:

This is where we see bigger wins because now we're using the parallel walker.

benchmark                     avg (min … max) p75 / p99    (min … top 1%)
--------------------------------------------- -------------------------------
PR (watch, .scan()))            76.26 ms/iter  77.41 ms                     █
                        (73.56 ms … 79.02 ms)  77.81 ms ▅   ▅  ▅ ▅▅ ▅▅▅   ▅ █
                      (  2.53 mb …   5.52 mb)   3.06 mb █▁▁▁█▁▁█▁██▁███▁▁▁█▁█

main (watch, .scan()))         166.71 ms/iter 165.14 ms █  █
                      (161.49 ms … 198.26 ms) 168.99 ms █ ▅█ ▅▅▅  ▅  ▅      ▅
                      (  1.08 mb …   2.72 mb)   1.24 mb █▁██▁███▁▁█▁▁█▁▁▁▁▁▁█

                               ┌                                            ┐
                               ╷┬┐
          PR (watch, .scan())) ├││
                               ╵┴┘
                                                                        ╷┌─┬╷
        main (watch, .scan()))                                          ├┤ │┤
                                                                        ╵└─┴╵
                               └                                            ┘
                               73.56 ms          121.28 ms          168.99 ms

summary
  PR (watch, .scan()))
   2.19x faster than main (watch, .scan()))

Watch/dev mode, scanning + accessing .files:

This is the biggest win of them all because we have all the benefits combined:

Avoiding re-walking the file system when accessing .files
Using the parallel walker for faster file system walking

benchmark                     avg (min … max) p75 / p99    (min … top 1%)
--------------------------------------------- -------------------------------
PR (watch, .scan() + .files)    84.04 ms/iter  84.84 ms            █
                        (80.96 ms … 87.53 ms)  87.27 ms ▅▅   ▅▅ ▅▅ █▅ ▅     ▅
                      ( 15.42 mb …  31.34 mb)  22.16 mb ██▁▁▁██▁██▁██▁█▁▁▁▁▁█

main (watch, .scan() + .files) 338.59 ms/iter 353.89 ms   █
                      (321.87 ms … 378.43 ms) 358.70 ms   █   █
                      (  2.39 mb …   2.45 mb)   2.42 mb ███▁▁██▁▁▁▁▁▁▁▁▁▁██▁█

                               ┌                                            ┐
                               ┬┐
  PR (watch, .scan() + .files) ││
                               ┴┘
                                                                      ┌──┬─┐╷
main (watch, .scan() + .files)                                        │  │ ├┤
                                                                      └──┴─┘╵
                               └                                            ┘
                               80.96 ms          219.83 ms          358.70 ms

summary
  PR (watch, .scan() + .files)
   4.03x faster than main (watch, .scan() + .files)

Test plan

All existing tests still pass
All public APIs remain the same
In the benchmarks I'm sharing, I first verify that the candidates returned and the files returned are the same before and after the change.
Benchmarked against real codebases, and against a synthetic large codebase (5000 files).

Fixes: #19616

pompom454 · 2026-02-05T19:58:09Z

This looks very promising! I can't wait to see what comes out of this!

Whenever we call `.scan()`, we will use `.scan_sources()` internally to walk the file tree. When we access `.files` or `.globs` we also ensure that we have the data available by calling `.scan_sources`. But now we will prevent a double scan in case we already used `.scan()`. Note: whenever we call `.scan()`, we will traverse the file system again.

Not relevant, and we can re-introduce it if needed.

… in walker filter

If we increase `self.pos` until it exceeds the `usize`, then we have bigger problems... `self.pos + 1` should be safe enough.

We're only storing the `input` and the `pos` such that the Cursor size is much smaller. The `prev`, `curr` and `next` are now methods that compute the values when needed. We also inline those function calls so there is no additional overhead.

This way we don't have to call `.to_vec()` in the default case, which is the majority of the files in a typical project.

We dropped it from the `filter_entry` before because we wanted to introduce `build_parallel`. We had to walk all files anyway, so now we will check the `mtime` before actually extracting candidates from the files.

Accessing the `mtime` of a file has some overhead. When we call `.scan()` (think build mode), then we just scan all the files. There is no need to track `mtime` yet. If we call `.scan()` a second time, then we are in a watcher mode environment. Only at this time do we start tracking `mtimes`. This technically means that we will 1 full scan for the initial scan, the second scan is yet another full scan, but from that point onwards we use the `mtime` information. The biggest benefit is that the initial call stays fast without overhead, which is perfect for a production build.

The walk_parallel is useful and faster, but only in "watch" mode when everything is considered warm. Walk parallel has a 20ms-50ms overhead cost in my testing when I just run a build instead of running in watch mode. So right now we will use a normal walk, and use a parallel walk in watch mode. The walk_parallel is still faster for large codebases, but we don't know that ahead of time unfortunately...

coderabbitai · 2026-02-17T19:16:57Z

Walkthrough

The pull request refactors the Cursor struct to encapsulate state management through accessor methods (curr(), next(), prev()) instead of public fields, adding Copy trait support. All cursor field accesses are updated to method calls throughout the extractor and pre-processor modules. The Scanner component is enhanced for incremental scanning with mtime-based change detection, using FxHashSet for deduplication of files and directories. A new init_tracing module provides runtime tracing initialization. The pre_process_input function signature changes to accept Vec ownership instead of slice references. Extraction functions return FxHashSet for deduplication. The Scanner gains a scan_content method to process incremental content changes. A CHANGELOG entry documents performance improvements for the Oxide scanner in larger projects.

🚥 Pre-merge checks | ✅ 2

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The PR title 'Improve performance in bigger projects' directly aligns with the main objective of the changeset, which focuses on performance improvements for scanning large codebases.
Description check	✅ Passed	The PR description comprehensively explains the performance improvements, including preventing multiple filesystem walks, using parallel file system walking with synchronous fallback, and delaying mtime tracking, which directly relate to the extensive changes across multiple files.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

crates/oxide/src/cursor.rs (1)
39-61: ⚠️ Potential issue | 🔴 Critical

Add bounds check in prev() to prevent out-of-bounds reads when pos > input.len().

The prev() method is vulnerable to OOB reads because pos can exceed input.len(). While advance_by() uses move_to() which clamps pos, the public advance() and advance_twice() methods directly increment pos without bounds checking, allowing pos to exceed input.len(). When prev() is called in this state, it only checks pos > 0 before accessing get_unchecked(self.pos - 1), risking an out-of-bounds read.
Proposed fix
 pub fn prev(&self) -> u8 {
-    if self.pos > 0 {
-        unsafe { *self.input.get_unchecked(self.pos - 1) }
-    } else {
-        0x00
-    }
+    if self.pos == 0 || self.pos > self.input.len() {
+        0x00
+    } else {
+        unsafe { *self.input.get_unchecked(self.pos - 1) }
+    }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/oxide/src/cursor.rs` around lines 39 - 61, prev() currently assumes
pos>0 is safe but can read OOB if pos > input.len(); either clamp pos on
mutation or guard reads. Fix by ensuring prev() checks that self.pos > 0 AND
self.pos <= self.input.len() before using get_unchecked(self.pos - 1) (i.e.,
return 0x00 if self.pos == 0 or self.pos > self.input.len()), or alternatively
have advance() and advance_twice() call move_to(self.pos + 1) / move_to(self.pos
+ 2) (like advance_by) so pos is always clamped; update either prev(), or
advance()/advance_twice() (or both) referencing the prev, advance,
advance_twice, advance_by, and move_to symbols to eliminate OOB reads.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@CHANGELOG.md`:
- Line 31: Update the changelog entry string "Improve performance Oxide scanner
in bigger projects
([`#19632`](https://github.com/tailwindlabs/tailwindcss/pull/19632))" to read
"Improve performance of the Oxide scanner in bigger projects
([`#19632`](https://github.com/tailwindlabs/tailwindcss/pull/19632))" to fix the
grammar; locate and edit the exact line containing that phrase in CHANGELOG.md.

In `@crates/oxide/src/scanner/init_tracing.rs`:
- Around line 47-54: The code panics on non-UTF8 paths because
as_path().to_str().unwrap() is used when printing absolute_file_path; replace
that with a lossless conversion such as as_path().to_string_lossy() (or
equivalent) so highlight receives a safe &str without panicking. Update the call
that builds the eprintln (referencing file_path and absolute_file_path) to use
the lossy string form before passing into highlight/dim to make debug tracing
resilient to non-UTF8 paths.

In `@crates/oxide/src/scanner/mod.rs`:
- Around line 356-447: The discover_sources function currently only inserts into
self.files, self.dirs, self.extensions and self.mtimes which causes deleted
files to persist and mtimes to grow; fix by clearing per-scan caches at the
start of discover_sources (reset self.files, self.dirs, self.extensions and any
per-scan collections used for this pass) before populating them, then after
iterating all_entries prune self.mtimes to retain entries only for the files
seen in this scan (use the seen_files set collected during the loop) so mtimes
doesn’t grow unbounded and get_files()/get_globs() reflect deletions. Ensure you
still preserve mtimes for currently-seen files when updating (insert/update only
for seen files) and keep the existing has_scanned_once/changed logic unchanged.

---

Outside diff comments:
In `@crates/oxide/src/cursor.rs`:
- Around line 39-61: prev() currently assumes pos>0 is safe but can read OOB if
pos > input.len(); either clamp pos on mutation or guard reads. Fix by ensuring
prev() checks that self.pos > 0 AND self.pos <= self.input.len() before using
get_unchecked(self.pos - 1) (i.e., return 0x00 if self.pos == 0 or self.pos >
self.input.len()), or alternatively have advance() and advance_twice() call
move_to(self.pos + 1) / move_to(self.pos + 2) (like advance_by) so pos is always
clamped; update either prev(), or advance()/advance_twice() (or both)
referencing the prev, advance, advance_twice, advance_by, and move_to symbols to
eliminate OOB reads.

coderabbitai · 2026-02-17T19:17:00Z

CHANGELOG.md

 - Fix infinite loop when using `@variant` inside `@custom-variant` ([#19633](https://github.com/tailwindlabs/tailwindcss/pull/19633))
 - Allow multiples of `.25` in `aspect-*` fractions ([#19688](https://github.com/tailwindlabs/tailwindcss/pull/19688))
 - Ensure changes to external files listed via `@source` trigger a full page reload when using `@tailwindcss/vite` ([#19670](https://github.com/tailwindlabs/tailwindcss/pull/19670))
+- Improve performance Oxide scanner in bigger projects ([#19632](https://github.com/tailwindlabs/tailwindcss/pull/19632))


⚠️ Potential issue | 🟡 Minor

Fix grammar for clarity in changelog entry.

Wording is a bit off; consider “Improve performance of the Oxide scanner in bigger projects” for readability.

✏️ Suggested tweak

-- Improve performance Oxide scanner in bigger projects ([`#19632`](https://github.com/tailwindlabs/tailwindcss/pull/19632)) +- Improve performance of the Oxide scanner in bigger projects ([`#19632`](https://github.com/tailwindlabs/tailwindcss/pull/19632))

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

- Improve performance Oxide scanner in bigger projects ([#19632](https://github.com/tailwindlabs/tailwindcss/pull/19632))

- Improve performance of the Oxide scanner in bigger projects ([`#19632`](https://github.com/tailwindlabs/tailwindcss/pull/19632))

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@CHANGELOG.md` at line 31, Update the changelog entry string "Improve performance Oxide scanner in bigger projects ([`#19632`](https://github.com/tailwindlabs/tailwindcss/pull/19632))" to read "Improve performance of the Oxide scanner in bigger projects ([`#19632`](https://github.com/tailwindlabs/tailwindcss/pull/19632))" to fix the grammar; locate and edit the exact line containing that phrase in CHANGELOG.md.

coderabbitai · 2026-02-17T19:17:00Z

crates/oxide/src/scanner/init_tracing.rs

+    let file_path = Path::new(&file_path);
+    let absolute_file_path = dunce::canonicalize(file_path)
+        .unwrap_or_else(|_| panic!("Failed to canonicalize {file_path:?}"));
+    eprintln!(
+        "{} Writing debug info to: {}\n",
+        dim("[DEBUG]"),
+        highlight(absolute_file_path.as_path().to_str().unwrap())
+    );


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

cat -n crates/oxide/src/scanner/init_tracing.rs | sed -n '40,60p'

Repository: tailwindlabs/tailwindcss

Length of output: 999

🏁 Script executed:

rg -A 10 "fn highlight" crates/oxide/src/

Repository: tailwindlabs/tailwindcss

Length of output: 813

🏁 Script executed:

rg "to_str()" crates/oxide/src/scanner/init_tracing.rs -B 2 -A 2

Repository: tailwindlabs/tailwindcss

Length of output: 207

Avoid panicking on non-UTF8 paths when printing the log location.

to_str().unwrap() will panic if the path contains non-UTF8 bytes. Use lossy conversion to keep debug tracing resilient.

💡 Suggested change

- eprintln!( - "{} Writing debug info to: {}\n", - dim("[DEBUG]"), - highlight(absolute_file_path.as_path().to_str().unwrap()) - ); + let display_path = absolute_file_path.to_string_lossy(); + eprintln!( + "{} Writing debug info to: {}\n", + dim("[DEBUG]"), + highlight(&display_path) + );

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@crates/oxide/src/scanner/init_tracing.rs` around lines 47 - 54, The code panics on non-UTF8 paths because as_path().to_str().unwrap() is used when printing absolute_file_path; replace that with a lossless conversion such as as_path().to_string_lossy() (or equivalent) so highlight receives a safe &str without panicking. Update the call that builds the eprintln (referencing file_path and absolute_file_path) to use the lossy string form before passing into highlight/dim to make debug tracing resilient to non-UTF8 paths.

coderabbitai · 2026-02-17T19:17:00Z

crates/oxide/src/scanner/mod.rs

+    #[tracing::instrument(skip_all)]
+    fn discover_sources(&mut self) -> (Vec<Vec<u8>>, Vec<PathBuf>) {
+        if self.sources_scanned {
+            return (vec![], vec![]);
+        }
+        self.sources_scanned = true;
+
+        let Some(walker) = &mut self.walker else {
+            return (vec![], vec![]);
+        };
+
+        // Use synchronous walk for the initial build (lower overhead) and parallel
+        // walk for subsequent calls (watch mode) where the overhead is amortised.
+        let all_entries = if self.has_scanned_once {
+            walk_parallel(walker)
+        } else {
+            walk_synchronous(walker)
+        };
+
+        let mut css_files: Vec<PathBuf> = vec![];
+        let mut content_paths: Vec<(PathBuf, String)> = Vec::new();
+        let mut seen_files: FxHashSet<PathBuf> = FxHashSet::default();
+
+        for (path, is_dir, extension) in all_entries {
+            if is_dir {
+                self.dirs.insert(path);
+            } else {
+                // Deduplicate: parallel walk can visit the same file from multiple threads
+                if !seen_files.insert(path.clone()) {
+                    continue;
+                }
+
+                // On re-scans, check mtime to skip unchanged files.
+                // On the first scan we skip this entirely to avoid extra
+                // metadata syscalls.
+                let changed = if self.has_scanned_once {
+                    let current_mtime = std::fs::metadata(&path)
+                        .ok()
+                        .and_then(|m| m.modified().ok());
+
+                    match current_mtime {
+                        Some(mtime) => {
+                            let prev = self.mtimes.insert(path.clone(), mtime);
+                            prev.is_none_or(|prev| prev != mtime)
+                        }
+                        None => true,
+                    }
+                } else {
+                    true
+                };
+
+                match extension.as_str() {
+                    // Special handing for CSS files, we don't want to extract candidates from
+                    // these files, but we do want to extract used CSS variables.
+                    "css" => {
+                        if changed {
+                            css_files.push(path.clone());
+                        }
+                    }
+                    _ => {
+                        if changed {
+                            content_paths.push((path.clone(), extension.clone()));
+                        }
+                    }
+                }
+
+                self.extensions.insert(extension);
+                self.files.insert(path);
+            }
+        }
+
+        // Read + preprocess all discovered files in parallel
+        let scanned_blobs: Vec<Vec<u8>> = content_paths
+            .into_par_iter()
+            .filter_map(|(path, ext)| {
+                let content = std::fs::read(&path).ok()?;
+                event!(tracing::Level::INFO, "Reading {:?}", path);
+                let processed = pre_process_input(content, &ext);
+                if processed.is_empty() {
+                    None
+                } else {
+                    Some(processed)
+                }
+            })
+            .collect();
+
+        if !self.has_scanned_once {
+            self.has_scanned_once = true;
+        }
+
+        (scanned_blobs, css_files)
+    }


⚠️ Potential issue | 🟠 Major

Reset/prune per-scan caches to avoid stale results and unbounded growth.

files, dirs, extensions, and mtimes are only ever inserted into during discovery, so deleted paths remain in get_files()/get_globs() results and mtimes can grow across long watch sessions. Clear per-scan caches and retain mtimes for currently-seen files.

🧹 Proposed fix (clear caches + retain mtimes)

fn discover_sources(&mut self) -> (Vec<Vec<u8>>, Vec<PathBuf>) { if self.sources_scanned { return (vec![], vec![]); } self.sources_scanned = true; let Some(walker) = &mut self.walker else { return (vec![], vec![]); }; + // Reset per-scan caches to avoid stale entries. + self.files.clear(); + self.dirs.clear(); + self.extensions.clear(); // Use synchronous walk for the initial build (lower overhead) and parallel // walk for subsequent calls (watch mode) where the overhead is amortised. let all_entries = if self.has_scanned_once { walk_parallel(walker) } else { walk_synchronous(walker) }; let mut css_files: Vec<PathBuf> = vec![]; let mut content_paths: Vec<(PathBuf, String)> = Vec::new(); let mut seen_files: FxHashSet<PathBuf> = FxHashSet::default(); for (path, is_dir, extension) in all_entries { if is_dir { self.dirs.insert(path); } else { // Deduplicate: parallel walk can visit the same file from multiple threads if !seen_files.insert(path.clone()) { continue; } ... self.extensions.insert(extension); self.files.insert(path); } } + // Drop mtimes for files that no longer exist. + self.mtimes.retain(|path, _| self.files.contains(path)); if !self.has_scanned_once { self.has_scanned_once = true; } (scanned_blobs, css_files) }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@crates/oxide/src/scanner/mod.rs` around lines 356 - 447, The discover_sources function currently only inserts into self.files, self.dirs, self.extensions and self.mtimes which causes deleted files to persist and mtimes to grow; fix by clearing per-scan caches at the start of discover_sources (reset self.files, self.dirs, self.extensions and any per-scan collections used for this pass) before populating them, then after iterating all_entries prune self.mtimes to retain entries only for the files seen in this scan (use the seen_files set collected during the loop) so mtimes doesn’t grow unbounded and get_files()/get_globs() reflect deletions. Ensure you still preserve mtimes for currently-seen files when updating (insert/update only for seen files) and keep the existing has_scanned_once/changed logic unchanged.

RobinMalfait force-pushed the fix/issue-19616 branch from ecdd645 to e49a695 Compare February 5, 2026 12:35

RobinMalfait added 20 commits February 17, 2026 20:04

rename scan_sources to discover_sources

2188bac

move init_tracing outside of the scanner

bf2874b

remove comment

2875245

Not relevant, and we can re-introduce it if needed.

pre-compute normalized source patterns to avoid per-entry allocations…

e06b984

… in walker filter

don't clone the entire vec

5047d2e

remove saturating_add in cursor

91f1bd2

If we increase `self.pos` until it exceeds the `usize`, then we have bigger problems... `self.pos + 1` should be safe enough.

make Cursor smaller

d254752

We're only storing the `input` and the `pos` such that the Cursor size is much smaller. The `prev`, `curr` and `next` are now methods that compute the values when needed. We also inline those function calls so there is no additional overhead.

accept owned content in pre_process_input

d4cd763

This way we don't have to call `.to_vec()` in the default case, which is the majority of the files in a typical project.

return FxHashSet from extract to avoid redundant sort

8a7786f

pass discovered data explicitly through return values

01c89ab

remove mtime tracking from walker filter

6cf9119

use parallel walker for initial source discovery

49ca262

re-add mtime tracking

808288c

We dropped it from the `filter_entry` before because we wanted to introduce `build_parallel`. We had to walk all files anyway, so now we will check the `mtime` before actually extracting candidates from the files.

only create the walker once

7f133f3

make files unique

4d52df0

make dirs unique

9e17bc6

update changelog

2e76d69

RobinMalfait force-pushed the fix/issue-19616 branch from e49a695 to 2e76d69 Compare February 17, 2026 19:05

RobinMalfait marked this pull request as ready for review February 17, 2026 19:05

RobinMalfait requested a review from a team as a code owner February 17, 2026 19:05

RobinMalfait merged commit 095ff96 into main Feb 17, 2026
9 checks passed

RobinMalfait deleted the fix/issue-19616 branch February 17, 2026 19:15

coderabbitai bot reviewed Feb 17, 2026

View reviewed changes

dcastil mentioned this pull request Feb 18, 2026

Add support for Tailwind CSS v4.2 dcastil/tailwind-merge#651

Merged

26 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Improve performance in bigger projects#19632

Improve performance in bigger projects#19632
RobinMalfait merged 20 commits intomainfrom
fix/issue-19616

RobinMalfait commented Feb 2, 2026

Uh oh!

pompom454 commented Feb 5, 2026

Uh oh!

Uh oh!

coderabbitai bot commented Feb 17, 2026

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Feb 17, 2026

Uh oh!

coderabbitai bot Feb 17, 2026

Uh oh!

coderabbitai bot Feb 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Comments

	- Improve performance Oxide scanner in bigger projects ([#19632](https://github.com/tailwindlabs/tailwindcss/pull/19632))
	- Improve performance of the Oxide scanner in bigger projects ([`#19632`](https://github.com/tailwindlabs/tailwindcss/pull/19632))

Uh oh!

Conversation

RobinMalfait commented Feb 2, 2026

1. Prevent multiple file system walks

2. Use parallel file system walking

3. Delaying work

3. Small wins

Benchmarks

tailwindcss.com codebase

Synthetic 5000 files codebase

Test plan

Uh oh!

pompom454 commented Feb 5, 2026

Uh oh!

Uh oh!

coderabbitai bot commented Feb 17, 2026

Walkthrough

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 17, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 17, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Feb 17, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Comments