lxml is pinned to lxml<5.0 because lxml>=5.0 introduced breaking changes related to system dependencies.
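lxml==5.2.1 introduced a new extra (the lxml.html.clean module was split out into the separate lxml_html_clean package), so we'll need to rename the requirement lxml --> lxml[html-clean].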
We have since upgraded to Python 3.12, so we can now investigate and remove this constraint.
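When investigating the unpin, a quick environment check can confirm which lxml is installed and whether lxml.html.clean still imports (it fails without the extra on newer releases). The following is a minimal sketch, not existing project code: the helper names check_lxml_setup and parse_version are hypothetical, and the 5.2.1 threshold is taken from the note above. It uses only the standard library; packaging.version.Version would be a more robust version parser if that dependency is available.

```python
"""Hypothetical pre-unpin check for lxml (a sketch, not project code)."""
import importlib
import importlib.metadata


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '5.2.1' into (5, 2, 1); each component is cut at its first non-digit."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_lxml_setup() -> dict:
    """Report the installed lxml version and whether lxml.html.clean imports."""
    try:
        installed = importlib.metadata.version("lxml")
    except importlib.metadata.PackageNotFoundError:
        # lxml is not installed in this environment at all.
        return {
            "installed": None,
            "needs_html_clean_extra": None,
            "html_clean_importable": False,
        }

    # Assumed threshold (from the note above): from 5.2.1 on, the
    # requirement should be lxml[html-clean] rather than plain lxml.
    needs_extra = parse_version(installed) >= (5, 2, 1)

    try:
        importlib.import_module("lxml.html.clean")
        importable = True
    except ImportError:
        # On newer lxml this is raised when lxml_html_clean is missing.
        importable = False

    return {
        "installed": installed,
        "needs_html_clean_extra": needs_extra,
        "html_clean_importable": importable,
    }


if __name__ == "__main__":
    print(check_lxml_setup())
```

If needs_html_clean_extra is True but html_clean_importable is False, the requirement has not yet been renamed to lxml[html-clean].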