Running DOMPurify in a Web Worker #577

This issue describes a performance problem that stems from DOMPurify's dependence on the DOM.

### Background & Context

I work on a note-taking app that essentially renders user-provided HTML documents. Those documents must be sanitized, and my goal is to do that as efficiently as possible. The problem is that, given how DOMPurify works, I can't implement this efficiently enough.

Initially I was just doing the following:

1. Turn the HTML into a DOM with DOMParser.
2. Sanitize the DOM with DOMPurify.

But there are multiple problems with that approach:

1. First, sanitization time scales with the size of the input, and it is synchronous, so it blocks the thread. It follows that I can't run sanitization on the main thread if I want to provide great performance to my users under pretty much all scenarios.
2. Second, when the user is editing the HTML document I probably don't need to sanitize it fully again. I can split the HTML into top-level tags and sanitize each of them individually; with some caching, only the top-level tags that changed get sanitized again, which should be a massive speedup.
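To make the second point concrete, here is a minimal sketch of the caching idea. It assumes the document has already been split into top-level HTML strings (the splitting itself needs a parser and is not shown), and it takes the sanitizer as a plain function. `createBlockSanitizer` and `Sanitize` are hypothetical names used only for illustration.

```ts
// Hypothetical helper: re-sanitize only the top-level blocks that changed.
// `sanitize` stands in for whatever sanitizer is used (e.g. a DOMPurify call).
type Sanitize = (html: string) => string;

function createBlockSanitizer(sanitize: Sanitize) {
  // Maps a raw top-level block to its sanitized output.
  let cache = new Map<string, string>();

  return function sanitizeBlocks(blocks: readonly string[]): string[] {
    // Build a fresh map per call so blocks that were deleted from the
    // document don't stay cached forever.
    const next = new Map<string, string>();
    const out = blocks.map((block) => {
      let clean = next.get(block) ?? cache.get(block);
      if (clean === undefined) {
        clean = sanitize(block); // cache miss: this block is new or was edited
      }
      next.set(block, clean);
      return clean;
    });
    cache = next;
    return out;
  };
}
```

With this shape, editing one paragraph triggers one sanitizer call instead of one per block. It only helps with total work, though; each individual call is still synchronous.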

I can address the second problem pretty easily, but the first can't be addressed, because DOMPurify relies on the DOM and the DOM APIs aren't available in a worker context. Userland implementations of the DOM APIs, like jsdom, come with their own major issues (jsdom in particular is slow and massive in my experience), and I just don't trust alternative non-DOM-based HTML sanitization libraries like js-xss, if they work at all.

So assuming it's in DOMPurify's interest to be able to be used efficiently, what should be done to fix this?

### Feature

I think the best way to address the issue is the following. DOMPurify already accepts a raw Node as input. If it also accepted a relatively simple NodeLike object, that is, an object implementing a very restricted set of DOM-like APIs, then I as a user could parse my HTML string with a third-party HTML parser and provide a simple adapter from the parser's output to DOMPurify. DOMPurify would work largely the same way, with potentially little change to its code, and users could run it in workers with relatively little work.
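To illustrate the shape I have in mind, here is a rough sketch. Everything in it is hypothetical: DOMPurify has no `NodeLike` type today, `ParsedNode` stands in for whatever tree a third-party parser emits, and `stripDisallowed` is a toy allowlist walker that only shows how little of the DOM such a traversal touches. It is not DOMPurify's algorithm and not a safe sanitizer.

```ts
// Plain tree shape that a hypothetical third-party parser might emit.
type ParsedNode =
  | { kind: "text"; text: string }
  | { kind: "element"; tag: string; attrs: Record<string, string>; children: ParsedNode[] };

// Hypothetical minimal surface a sanitizer could require instead of the full DOM.
interface NodeLike {
  readonly nodeType: number; // 1 = element, 3 = text (same values as the DOM)
  readonly nodeName: string;
  readonly childNodes: NodeLike[];
  getAttributeNames(): string[];
  removeAttribute(name: string): void;
  removeChild(child: NodeLike): void;
}

// Adapter from the parser's tree to NodeLike. Mutations are written back
// to the underlying ParsedNode so the caller can serialize it afterwards.
function adapt(node: ParsedNode): NodeLike {
  if (node.kind === "text") {
    return {
      nodeType: 3,
      nodeName: "#text",
      childNodes: [],
      getAttributeNames: () => [],
      removeAttribute: () => {},
      removeChild: () => {},
    };
  }
  const children = node.children.map(adapt);
  return {
    nodeType: 1,
    nodeName: node.tag.toUpperCase(),
    childNodes: children,
    getAttributeNames: () => Object.keys(node.attrs),
    removeAttribute: (name) => {
      delete node.attrs[name];
    },
    removeChild: (child) => {
      const i = children.indexOf(child);
      if (i !== -1) {
        children.splice(i, 1); // keep the wrapper list...
        node.children.splice(i, 1); // ...and the parsed tree in sync
      }
    },
  };
}

// Toy allowlist walker (NOT DOMPurify's logic): drops descendant elements
// whose tag is not allowed and attributes whose name is not allowed.
// Assumes upper-case tag names and lower-case attribute names.
function stripDisallowed(root: NodeLike, allowedTags: Set<string>, allowedAttrs: Set<string>): void {
  for (const child of [...root.childNodes]) {
    if (child.nodeType === 1 && !allowedTags.has(child.nodeName)) {
      root.removeChild(child);
      continue;
    }
    for (const name of child.getAttributeNames()) {
      if (!allowedAttrs.has(name)) child.removeAttribute(name);
    }
    stripDisallowed(child, allowedTags, allowedAttrs);
  }
}
```

Nothing here touches `document` or `window`, so code written against an interface like this could run inside a worker.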

Essentially, I think it's fine that DOMPurify needs a DOM-like API, but requiring the entirety of the DOM APIs becomes a problem from a performance perspective, because then it just can't be run in a worker.

Additionally, it may be a good idea to provide an asynchronous version of the API that yields to the event loop every 5ms or so, so that no matter how big the input string is, the thread never freezes for an unbounded amount of time.
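As a sketch of what that yielding could look like, the helper below processes a list of work items and hands control back to the event loop once a time budget is used up. `mapWithYield` is a hypothetical name, not an existing DOMPurify API, and how the sanitizer's work would be split into items (for example, one node per item) is an assumption.

```ts
// Hypothetical helper, not part of DOMPurify: runs `fn` over `items` in
// slices of roughly `budgetMs`, yielding to the event loop between slices.
async function mapWithYield<T, R>(
  items: readonly T[],
  fn: (item: T) => R,
  budgetMs = 5,
): Promise<R[]> {
  const out: R[] = [];
  let sliceStart = Date.now();
  for (const item of items) {
    out.push(fn(item));
    if (Date.now() - sliceStart >= budgetMs) {
      // Budget used up: let pending events (input, rendering) run first.
      await new Promise<void>((resolve) => setTimeout(resolve, 0));
      sliceStart = Date.now();
    }
  }
  return out;
}
```

The budget is only checked between items, so a single item that takes longer than the budget still blocks for its full duration. The work has to be split finely enough for the 5ms target to hold.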

I hope these potential improvements will be implemented, as currently I don't see a way to use DOMPurify with predictable and acceptable performance.
