Normalize UTF-8 #2

Open
opened 2023-03-04 09:04:18 -05:00 by jmcbray · 1 comment
Owner

I was reading some pages in Español via Cosmarmot with TurboGopher on Mac OS 8. It handles ISO 8859-1 (Latin-1) text, but accented characters were coming across as "Ãx", where x is some ASCII character. This shouldn't happen, because accented Spanish characters have the same code points in Unicode as they do in ISO Latin-1. My guess is that each one is arriving as two code points, a combining accent and a letter. I think normalizing the UTF-8 before sending it to Gopher will prevent such mojibake.
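
For illustration, here is a minimal Python sketch of the normalization being proposed. It is not Cosmarmot's actual code; the variable names and the sample string are assumptions made up for the example. It only shows that NFC normalization folds a base letter plus a combining accent into a single precomposed code point.

```python
import unicodedata

# Hypothetical input: 'n' followed by U+0303 COMBINING TILDE (decomposed form).
decomposed = "Espan\u0303ol"

# NFC composes base letter + combining mark into one code point where possible.
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), len(composed))  # code point counts before and after
print(composed == "Espa\u00f1ol")      # compares against precomposed U+00F1
```

If the source text is already precomposed, this step leaves it unchanged.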

Author
Owner

So, I normalized the UTF-8 in b50e099cd5, but it doesn't resolve the issue. Maybe TurboGopher is just not great? I'm going to re-test with a simple Gopher browser on a Latin-1 terminal.
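
One possible explanation for why normalization alone didn't help, offered as an assumption rather than a confirmed diagnosis: even a precomposed character such as U+00F1 is two bytes in UTF-8, and a client that reads those bytes as Latin-1 will show "Ã" followed by a second character. The Python sketch below reproduces that effect and shows a hypothetical transcoding step to Latin-1; none of the names come from Cosmarmot.

```python
# Already-normalized (NFC) text; the sample string is made up for illustration.
text = "Espa\u00f1ol"

# UTF-8 encodes U+00F1 as the two bytes 0xC3 0xB1.
utf8_bytes = text.encode("utf-8")

# A Latin-1 client interprets each byte as one character: 0xC3 is 'Ã'.
as_latin1 = utf8_bytes.decode("latin-1")
print(as_latin1)

# Hypothetical fix: send Latin-1 bytes instead, replacing anything
# outside Latin-1 with '?' so encoding never raises.
latin1_bytes = text.encode("latin-1", errors="replace")
print(latin1_bytes)
```

Re-testing with a Latin-1 terminal, as planned above, would help confirm or rule this out.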

Reference
jmcbray/cosmarmot#2