Problematic glyphs in HTML output

Discussion:

Torsten Bronger

2005-03-27 09:37:47 UTC

Hi there!

In HTML output, some characters (e.g. Hungarian umlaut, Polish
l-slash, dotless i) are not printed nicely but are rendered as ASCII.
Unfortunately, as I noticed, this also affects XML output.

The reason stated in the source is that these characters "don't have
HTML support". I think this statement is rather outdated. Nearly
all browsers can digest &#....; to produce an arbitrary Unicode
character. It has been part of the standard for many years.
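
To make the &#....; notation concrete, here is a minimal sketch in
Python (not the language of the Texinfo tools; the helper name
numeric_reference is hypothetical) that builds the decimal numeric
character reference for the characters mentioned above:

```python
def numeric_reference(char: str) -> str:
    """Return the decimal HTML/XML numeric character reference for one character."""
    if len(char) != 1:
        raise ValueError("expected exactly one character")
    return f"&#{ord(char)};"


# Characters mentioned above, written as escapes so the source stays ASCII.
examples = {
    "dotless i": "\u0131",
    "Polish l-slash": "\u0142",
    "o with Hungarian umlaut": "\u0151",
}

for name, char in examples.items():
    print(f"{name}: {numeric_reference(char)}")
```

The loop prints &#305;, &#322; and &#337; for the three characters.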

XML can handle this anyway, but before a lot of new "if"s pollute the
source, I wonder whether you agree that this should be fixed for HTML,
too. I even think that one should fix the "Glyphs for Examples" like
"==>" or "-!-", since there are Unicode characters for them, too.

What do you think?

Bye,
Torsten.

--
Torsten Bronger, aquisgrana, europa vetus

Karl Berry

2005-03-27 14:38:52 UTC

> Nearly all browsers can digest &#....; to produce an arbitrary
> Unicode character.

Maybe they can digest &#, but can they give reasonable output from it?
In particular, what does lynx do when you give the Unicode for dotless
i, etc.? Can you make a test document and send it to the list, and
whoever is interested can try it in their favorite browser(s), so we can
get a reality check?

> I even think that one should fix the "Glyphs for Examples" like
> "==>" or "-!-", since there are Unicode characters for them, too.

Indeed, Patrice and I wrote down the mappings for everything we could
find in the "HTML Cross-references" chapter.

Thanks,
karl

Torsten Bronger

2005-03-27 16:09:11 UTC

Hi there!

Post by Karl Berry
>> Nearly all browsers can digest &#....; to produce an arbitrary
>> Unicode character.
> Maybe they can digest &#, but can they give reasonable output from it?

Well, not all, and not for all glyphs. However, I suppose people
reading those languages will use browsers that can display them.

Post by Karl Berry
> In particular, what does lynx do when you give the Unicode for dotless
> i, etc.?

It doesn't do too badly.

Post by Karl Berry
> Can you make a test document and send it to the list, and whoever
> is interested can try it in their favorite browser(s), so we can
> get a reality check?

Here you are:
http://www-users.rwth-aachen.de/torsten.bronger/glyphtest.html

The dotless j seems to be unavailable in Unicode, but maybe I was
just blind.

As for the arrows, IE has problems, at least under Win2k; I don't
know the situation with XP. This may suggest that it's better to
leave them as they are.

However, the diacritical characters should be replaced, for the
reason stated above, but also because even the sub-optimal form with
the diacritical sign *next* to the glyph is better than the ASCII
version in my opinion.
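
As a rough illustration of the two forms (a sketch in Python, not the
actual Texinfo code; accent_reference is a hypothetical helper), one
can prefer a precomposed character where Unicode has one and
otherwise fall back to the base letter followed by a combining mark,
which is the form some browsers draw next to the glyph:

```python
import unicodedata

# U+030B COMBINING DOUBLE ACUTE ACCENT, i.e. the "Hungarian umlaut".
COMBINING_DOUBLE_ACUTE = "\u030b"


def accent_reference(base: str, combining: str) -> str:
    """Return HTML for base + accent, preferring a precomposed character."""
    composed = unicodedata.normalize("NFC", base + combining)
    if len(composed) == 1:
        # Unicode has a single precomposed code point for this pair.
        return f"&#{ord(composed)};"
    # Fallback: keep the base letter and follow it with the combining mark.
    return f"{base}&#{ord(combining)};"


print(accent_reference("o", COMBINING_DOUBLE_ACUTE))
print(accent_reference("q", COMBINING_DOUBLE_ACUTE))
```

The first call yields &#337; (the precomposed o with double acute);
the second has no precomposed form and yields q&#779;. The fallback
assumes the base letter is plain ASCII.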

In any case, it should definitely be fixed for XML.

Bye,
Torsten.

--
Torsten Bronger, aquisgrana, europa vetus

Karl Berry

2005-03-28 01:08:53 UTC

> http://www-users.rwth-aachen.de/torsten.bronger/glyphtest.html

Thanks. It comes out reasonably well, both in my Mozilla 1.7.3 (and
as far back as 1.4.1, the oldest I have around) and in my lynx.
Unfortunately, in Netscape 4, they pretty much all appear as just a "?".

> I suppose people reading those languages will use browsers that can
> display them.

Seems reasonable. So I suppose such old browsers should not stop us
from moving on to better support those languages. Unless there is some
vehement objection, I'll welcome a patch to fix it for both HTML and
XML. At least I think you were volunteering to do that :)?

> However, the diacritical characters should be replaced, for the
> reason stated above, but also because even the sub-optimal form with
> the diacritical sign *next* to the glyph is better than the ASCII
> version in my opinion.

I tend to agree.

> The dotless j seems to be unavailable in Unicode, but maybe I was

Apparently it's coming in the next round.
(Google for unicode dotless j)
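
For reference, the dotless j was assigned the code point U+0237 in
Unicode 4.1. A quick way to check whether a given character database
knows it (a sketch using Python's standard unicodedata module, which
is an assumption here and not something the Texinfo tools use):

```python
import unicodedata

# U+0237 is the code point Unicode 4.1 assigned to the dotless j.
dotless_j = "\u0237"

# Look up the official character name and build the numeric reference.
name = unicodedata.name(dotless_j)
reference = f"&#{ord(dotless_j)};"

print(name)
print(reference)
```

This prints LATIN SMALL LETTER DOTLESS J and &#567;.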

Thanks,
k

Patrice Dumas

2006-03-27 22:19:27 UTC

Post by Torsten Bronger
>> I even think that one should fix the "Glyphs for Examples" like
>> "==>" or "-!-", since there are Unicode characters for them, too.
> Indeed, Patrice and I wrote down the mappings for everything we could
> find in the "HTML Cross-references" chapter.

For 'print', it seems that we missed the U+22A3 codepoint; I think it should
be used for cross-refs.

For 'point' U+2605 looks like the TeX symbol but not like the info one.
Should we use it for cross refs?

For 'expansion' I believe we should stick with U+2192, and not use U+21A6,
because it is too far from the info output.

--
Pat

Karl Berry

2006-03-28 18:39:40 UTC

> For 'print', it seems that we missed the U+22A3 codepoint; I think
> it should be used for cross-refs.

Ok.

> For 'point' U+2605 looks like the TeX symbol but not like the info one.
> Should we use it for cross refs?

I guess it is better than the U+2217 asterisk operator we have now, so
let's go with it. (The info representation is quite weird; let's not
worry about it.)

> For 'expansion' I believe we should stick with U+2192, and not use U+21A6,

Ok.

I updated the manual with the above two changes.
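
The three code points settled on above can be summarized as a small
lookup table. This is only an illustrative sketch in Python; the names
GLYPH_CODE_POINTS and glyph_reference are hypothetical and do not come
from the Texinfo sources:

```python
# Code points agreed on in this thread for three "Glyphs for Examples" commands.
GLYPH_CODE_POINTS = {
    "print": 0x22A3,      # LEFT TACK
    "point": 0x2605,      # BLACK STAR
    "expansion": 0x2192,  # RIGHTWARDS ARROW
}


def glyph_reference(command: str) -> str:
    """Return the hexadecimal numeric character reference for a glyph command."""
    return f"&#x{GLYPH_CODE_POINTS[command]:X};"


for command in GLYPH_CODE_POINTS:
    print(f"{command}: {glyph_reference(command)}")
```

For example, glyph_reference("print") gives &#x22A3;.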

Thanks,
Karl