
Parser.unescapeEntities equivalent #17

@vanniktech

It would be immensely helpful if the ksoup-entities module provided an equivalent of Jsoup's `Parser.unescapeEntities` API:

https://jsoup.org/apidocs/org/jsoup/parser/Parser.html#unescapeEntities(java.lang.String,boolean)

With it, you can unescape all of the entities, e.g. `&#1234;` into Ӓ.
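To illustrate the kind of decoding being requested, here is a minimal Kotlin sketch. It is a hypothetical stopgap, not ksoup's or jsoup's API. The name `unescapeFeedText` and its small named-entity map are assumptions. Only a handful of named entities are covered (the full HTML5 table has over two thousand). Unlike jsoup, it only decodes references that end with a semicolon.

```kotlin
// Hypothetical sketch, NOT a ksoup or jsoup API.
// Only a few named entities are covered; a real implementation needs the full HTML5 table.
private val NAMED_ENTITIES = mapOf(
    "amp" to "&", "lt" to "<", "gt" to ">",
    "quot" to "\"", "apos" to "'", "nbsp" to "\u00A0",
)

// Matches hex (&#x4D2;), decimal (&#1234;) and named (&amp;) references; the semicolon is required.
private val ENTITY = Regex("&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);")

// Encodes a Unicode code point as a UTF-16 string, using a surrogate pair above U+FFFF.
private fun codePointToString(cp: Int): String =
    if (cp < 0x10000) {
        cp.toChar().toString()
    } else {
        val v = cp - 0x10000
        buildString {
            append((0xD800 + (v shr 10)).toChar())   // high surrogate
            append((0xDC00 + (v and 0x3FF)).toChar()) // low surrogate
        }
    }

fun unescapeFeedText(input: String): String =
    ENTITY.replace(input) { m ->
        val body = m.groupValues[1]
        val cp = when {
            body.startsWith("#x") || body.startsWith("#X") -> body.substring(2).toIntOrNull(16)
            body.startsWith("#") -> body.substring(1).toIntOrNull()
            // Named entity: decode if known, otherwise keep the original text.
            else -> return@replace NAMED_ENTITIES[body] ?: m.value
        }
        // Leave out-of-range, overflowing, or lone-surrogate references untouched.
        if (cp == null || cp > 0x10FFFF || cp in 0xD800..0xDFFF) m.value else codePointToString(cp)
    }
```

Decoding in a single regex pass means replacement text is never rescanned, so double-escaped input such as `&amp;lt;` becomes `&lt;` rather than `<`.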

My use case is an RSS reader. Oftentimes, even though it's not required, the content is HTML-encoded, so I need to unescape it. For instance, here's one such feed:

https://lexfridman.com/feed/podcast/

Note, though, that ideally any Unicode character would be supported. Here are some more feeds that use a lot of escaped HTML entities:

https://vandal.elespanol.com/xml.cgi
https://open.firstory.me/rss/user/cklqae6gy388f0892i2qmv851
https://blog.codinghorror.com/rss/
https://www.ivoox.com/finanzas-personales-libertad-financiera_fg_f1703990_filtro_1.xml
