Description
It would be immensely helpful if the ksoup-entities module provided an equivalent of Jsoup's Parser.unescapeEntities API:
https://jsoup.org/apidocs/org/jsoup/parser/Parser.html#unescapeEntities(java.lang.String,boolean)
With that method you can unescape all numeric character references such as &amp;#1234; (which decodes to Ӓ).
My use case is an RSS reader: feed content is often HTML-encoded even though that is not required, so I need to unescape it. For instance, here is one such feed:
https://lexfridman.com/feed/podcast/
Ideally, any Unicode character would be supported. Here are some more feeds that use a lot of escaped HTML entities:
https://vandal.elespanol.com/xml.cgi
https://open.firstory.me/rss/user/cklqae6gy388f0892i2qmv851
https://blog.codinghorror.com/rss/
https://www.ivoox.com/finanzas-personales-libertad-financiera_fg_f1703990_filtro_1.xml
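
To make the requested behavior concrete, here is a rough sketch of numeric entity decoding using only the Java standard library. It is not Jsoup's or ksoup's implementation; the class name NumericEntityDecoder and the method unescapeNumeric are hypothetical names chosen for illustration. It handles only decimal and hexadecimal references, not named entities such as &amp;amp;, which a real Parser.unescapeEntities equivalent would also cover.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Jsoup or ksoup.
final class NumericEntityDecoder {
    // Matches decimal (&#1234;) and hexadecimal (&#x4D2;) character references.
    private static final Pattern NUMERIC_REF =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]{1,6})|([0-9]{1,7}));");

    static String unescapeNumeric(String input) {
        Matcher m = NUMERIC_REF.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the text between the previous match and this one unchanged.
            out.append(input, last, m.start());
            int codePoint = m.group(1) != null
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2), 10);
            boolean surrogate = codePoint >= 0xD800 && codePoint <= 0xDFFF;
            if (codePoint != 0 && !surrogate && Character.isValidCodePoint(codePoint)) {
                // appendCodePoint also handles characters outside the BMP.
                out.appendCodePoint(codePoint);
            } else {
                // Leave references to invalid code points as they were.
                out.append(m.group());
            }
            last = m.end();
        }
        out.append(input, last, input.length());
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescapeNumeric("caf&#233; &#1234; &#x1F600;"));
    }
}
```

Because the sketch works on code points rather than UTF-16 chars, references above U+FFFF (emoji, for example) decode correctly, which matters for the feeds listed above.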