Fix for handling non-Latin characters #7

Open
qurbat wants to merge 4 commits into deepseagirl:master from qurbat:patch-1

qurbat commented Apr 18, 2022

This change introduces support for search results containing non-Latin characters as part of the URL or description.

This is done by passing the final_string variable to the html.unescape() function (instead of printing it directly) at the last print call.
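
As a rough illustration of the effect (not the actual code from degoogle.py), the sketch below runs a string containing character references through `html.unescape()`. Only the variable name `final_string` comes from the description above; the sample value and the `decoded` name are made up for the example.

```python
import html

# Hypothetical sample value; in degoogle.py the real string is assembled
# from the search results. It holds numeric references for Hindi and Greek
# text plus the named reference &amp;.
final_string = (
    "&#2361;&#2367;&#2344;&#2381;&#2342;&#2368; &amp; &#x3b1;&#x3b2;&#x3b3;"
)

# Printing the raw string shows the character references unchanged.
print(final_string)

# html.unescape() converts named and numeric references to Unicode text.
decoded = html.unescape(final_string)
print(decoded)
```

Printing `decoded` instead of `final_string` is the whole change in spirit: the data is the same, only the references are resolved before display.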

qurbat added 2 commits April 19, 2022 00:28
This change introduces support for text containing non-Latin characters (Hindi, Urdu, Greek, for example).

This is done by printing `html.unescape(final_string)` instead of `final_string`.

qurbat (Author) commented Apr 19, 2022

@deepseagirl could you merge this after review?

qurbat changed the title from "Convert HTML entities" to "Fix for handling non-Latin characters" on May 26, 2022

qurbat (Author) commented May 26, 2022

@deepseagirl hi, just sending a ping on this. thanks!

qurbat (Author) commented Jun 27, 2022

@deepseagirl Can we close this?

deepseagirl (Owner) commented

hi, thanks. this is a good improvement :)
i moved the unescape to only occur on the result descriptions directly
with a flag to toggle the behavior on/off

new default will be to decode character references:

$ python3 degoogle.py "intitle:⟿ inurl:⟿"
-- 9 results --

TranslingualEdit - Wiktionary
https://en.wiktionary.org/wiki/%E2%9F%BF

Talk:⟿ - Wiktionary
https://en.wiktionary.org/wiki/Talk:%E2%9F%BF

flag to turn decoding off:

$ python3 degoogle.py -d "intitle:⟿ inurl:⟿"
-- 9 results --

TranslingualEdit - Wiktionary
https://en.wiktionary.org/wiki/%E2%9F%BF

Talk:⟿ - Wiktionary
https://en.wiktionary.org/wiki/Talk:%E2%9F%BF
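
For readers who want to see how a toggle like this could be wired up, here is a minimal sketch. It is not the code that landed in degoogle.py: only the `-d` flag and the "decode by default" behavior come from the comment above, while the long option name `--no-decode`, the `decode` attribute, and the helper names `build_parser` and `format_description` are assumptions made up for illustration.

```python
import argparse
import html


def build_parser():
    """Build a parser with a switch that turns decoding off (sketch only)."""
    parser = argparse.ArgumentParser(description="decode toggle sketch")
    parser.add_argument("query", help="search query")
    # -d is the flag shown in the example above; the long name and dest
    # are hypothetical. store_false makes decoding the default.
    parser.add_argument(
        "-d",
        "--no-decode",
        dest="decode",
        action="store_false",
        help="leave character references in descriptions untouched",
    )
    return parser


def format_description(desc, decode=True):
    """Unescape only the description text, and only when decoding is on."""
    return html.unescape(desc) if decode else desc


# Explicit argument lists keep the sketch independent of sys.argv.
default_args = build_parser().parse_args(["some query"])
raw_args = build_parser().parse_args(["-d", "some query"])

sample = "caf&eacute; &amp; bar"  # hypothetical description text
print(format_description(sample, default_args.decode))
print(format_description(sample, raw_args.decode))
```

Applying the unescape to the description alone, rather than to the whole output string, leaves URLs and titles exactly as they were received.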

the html.unescape python doc links to this list of named character references which seemed handy.
i didn't realize char references were such an in depth thing until now. if you're interested here is that link
https://html.spec.whatwg.org/multipage/named-characters.html#named-character-references
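
In case it helps, the same list is available from Python's standard library as the `html.entities.html5` dictionary, which `html.unescape()` relies on for named references. A small sketch follows; the variable names are arbitrary.

```python
import html
from html.entities import html5

# html5 maps reference names (usually with the trailing ";") to characters.
ampersand = html5["amp;"]
greater_than = html5["gt;"]
print(len(html5), "named character references")

# html.unescape() resolves the same names when they appear in text.
heart = html.unescape("&hearts;")
print(ampersand, greater_than, heart)
```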

deepseagirl (Owner) commented

i'll finalize this when i have a few more mins. should be soon now that it's this far along. thanks again

qurbat (Author) commented Jul 12, 2022

@deepseagirl no worries, and I realize you were not able to access a computer earlier, so it is no problem. the new changes look great! thank you & tc =)

qurbat (Author) commented Oct 8, 2022

@deepseagirl can we close?
