#16
Problem displaying Unicode characters in CMD
JJ wrote:
| None of the mentioned fonts is accepted by the console, unfortunately.

There is Courier New, but the file isn't big enough. How many versions of the Courier New font are there?

There is Droid Sans Mono, but it's a smaller font file than Courier New.

And the guys here provide some numbers for just how dire the situation is.

https://graphicdesign.stackexchange....nicode-support

Paul
#17
"JJ" wrote
......

Following Paul's link, I found this:

http://unifoundry.com/unifont.html
(a font)

https://upload.wikimedia.org/wikiped...3.20131006.png
(a picture of the characters in that font)

I don't know if Windows will load it.

There are also interesting notes on console windows here:

https://stackoverflow.com/questions/...mmand-line-how

with the interesting idea of setting the display to UTF-8.

There's also this, from Michael Kaplan, who is, or at least was, pretty much the language programming expert at MS:

http://archives.miloush.net/michkap/...8/8306597.html

He shows how to programmatically jump through hoops to show rectangles in the console window that function as the characters they're supposed to be. Whoopee. It doesn't sound promising. But maybe you'll be the first.

| Interesting. Maybe that's coming across in dropdown text
| window as unicode but being interpreted as DBCS.
|
| That's impossible. The "Gothic" text can't possibly be "????" regardless of
| what it was originally encoded with.
|

No, I wouldn't think so. But some kind of fluke in the dropdown window is the only explanation I can think of.

| Did you actually see the katakana characters in the news message from your
| news client? That (and this) message was encoded using Big5, BTW.
|

I see them in the window. If I look at the message source, it shows with the English code page as a line of oddball characters. If I save the post and open it in Notepad I see rectangles. If I then paste that into an ANSI text window as part of a webpage I get ??????... But if I replace those with the rectangles from Notepad and save it as UTF-8, IE will show the characters. So... yes and no.
#18
On Sun, 6 Aug 2017 13:06:00 -0400, Mayayana wrote:
| Following Paul's link, I found this:
| http://unifoundry.com/unifont.html
| (a font)
| https://upload.wikimedia.org/wikiped...3.20131006.png
| (a picture of the characters in that font)
| I don't know if Windows will load it.

It won't, unfortunately.

| There are also interesting notes on console windows here:
| https://stackoverflow.com/questions/...mmand-line-how
| with the interesting idea of setting the display to UTF-8.

Well, that SO question is about working with Unicode as data, not as display. I don't have any problem with that either.

| There's also this, from Michael Kaplan, who is, or at least was, pretty much
| the language programming expert at MS:
| http://archives.miloush.net/michkap/...8/8306597.html
| He shows how to programmatically jump through hoops to show rectangles in
| the console window that function as the characters they're supposed to be.
| Whoopee. It doesn't sound promising. But maybe you'll be the first.

That actually shows the problem. The Windows console's design, in terms of displaying characters, is not natively UCS-2/UTF-16. It's more like native ANSI/OEM.

| No, I wouldn't think so. But some kind of fluke in the dropdown window is
| the only explanation I can think of.

FYI, most cross-platform applications use their own font rendering engine. They don't rely on Windows' built-in font rendering engine. Moreover, Thunderbird, Firefox and other Gecko-based applications use the Gecko browser engine for their main application GUI (as a GUI framework).

| I see them in the window. If I look at the message source it shows with the
| English code page, as a line of oddball characters.

That would be the Big5-encoded text shown using the ANSI character set.

| If I save the post and open it in Notepad I see rectangles.

That's when the font used for Notepad doesn't have the glyphs for those characters.

| If I then paste that into an ANSI text window as part of a webpage I get
| ??????...

What application is that? The ANSI character set is roughly the same as the code page. If the system code page is not CJK, Windows won't show the correct characters, even assuming that the font used for the display has the glyphs for those characters.

| But if I replace those with the rectangles from Notepad and save it as
| UTF-8, IE will show the characters. So... yes and no.

Well, IE has better internationalization support. Much better than the console, apparently. And if you take a look at the screenshot again, you'll notice that the console removes both the "Courier New" and "Lucida Console" fonts from the list when the system locale is set to CJK. So, it seems that the Windows console's design (in terms of display) is bound to the system code page. I think that's the main problem.

OK, I do believe now that there's no solution for this. Thanks for your support.
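
To illustrate the "Big5 text shown using the ANSI character set" point, here is a minimal Python sketch. The sample string, the helper name `view_as`, and the choice of Windows-1252 as the "English" ANSI code page are assumptions made for the example; the actual message discussed in the thread is not reproduced here.

```python
# Sketch only: the same Big5 bytes read through two different code pages.
# SAMPLE and the cp1252 choice are illustrative assumptions.

SAMPLE = "中文"


def view_as(data: bytes, codepage: str) -> str:
    """Decode raw bytes the way a viewer bound to `codepage` would."""
    return data.decode(codepage)


if __name__ == "__main__":
    raw = SAMPLE.encode("big5")       # bytes as they travel in a Big5 message
    print(raw.hex())                  # the raw byte values
    print(view_as(raw, "cp1252"))     # "oddball characters" on a Western code page
    print(view_as(raw, "big5"))       # the intended text with the right code page
```

The bytes never change; only the code page used to interpret them does, which is why a viewer tied to one system code page cannot show text from another.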
#19
On Sun, 06 Aug 2017 11:45:29 -0400, Paul wrote:
| There is Courier New, but the file isn't big enough. How many versions of
| the Courier New font are there?
| There is Droid Sans Mono, but it's a smaller font file than Courier New.
| And the guys here provide some numbers for just how dire the situation is.
| https://graphicdesign.stackexchange....nicode-support

In my collection, there are:

- Courier (Raster, TrueType, PostScript)
- Courier New KOI-8 (PostScript; KOI-8 character set)
- Courier Std (OpenType)
- Courier10 BT (TrueType, PostScript)
- CourierMCY (TrueType, PostScript)

AFAIK, all Courier fonts are monospaced, but I haven't seen any that have an adequate Unicode subrange (one that includes CJK). FYI, Windows' built-in PostScript font support can only handle the ANSI/OEM character sets.

I have a font information tool I wrote years ago. Here are the lists of Unicode subranges that some of the mentioned fonts have.

Courier New:
https://pastebin.com/6GqRtHK7

Droid Sans Mono:
https://pastebin.com/aP52cu6x

FreeMono:
https://pastebin.com/4prhSNsZ

GNU Unifont (mentioned by Mayayana):
https://pastebin.com/3V2XMiyQ

I have most of the fonts that have a CJK Unicode subrange, from many sources. I even have the excellent "Osaka" TrueType font from Mac OS X, converted to a Windows version (Mac TTF files are not binary compatible with Windows because they use big endian format). Yet none of the CJK fonts in my collection is accepted by the console's settings dialog if I don't set the system locale to CJK.

I don't think this problem has any solution. So, thanks for your time.
#20
"JJ" wrote
| If I then paste that into an ANSI text window
| as part of a webpage I get ??????...
|
| What application is that?

That's actually my own code editor. I made it with a RichEdit window and included a toggle option for ANSI or UTF-8. When set to ANSI I get ?s. When set to UTF-8 I get rectangles. There seems to be some kind of "sniffing" built in. In ANSI I should get ANSI characters, but Windows apparently picks up that it's UTF-8 and just doesn't try to render it. Yet if I load a UTF-8 webpage I don't get ?s for single UTF-8 characters. I get characters above 128 in English ANSI.

This has been an interesting exploration. The encoding options are so complicated. But I guess it makes sense that the console window would be ANSI. Most programming is English. I imagine CD or DEL don't change. So the only reason to support other languages would be for local differences in file/folder names.
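
One plausible source of the ?s, sketched in Python: when text that is already held as Unicode is converted to a single-byte ANSI code page, every character the code page lacks is typically replaced with a question mark. The helper name `to_ansi` and the cp1252 default are assumptions for the example, and Python's codec stands in for whatever conversion the editor really performs, which the thread doesn't show.

```python
# Sketch only: lossy conversion from Unicode text to an ANSI code page.
# `to_ansi` is a hypothetical helper; cp1252 is an assumed "English ANSI" page.


def to_ansi(text: str, codepage: str = "cp1252") -> bytes:
    """Encode text for a single-byte code page, using '?' for anything it lacks."""
    return text.encode(codepage, errors="replace")


if __name__ == "__main__":
    print(to_ansi("カタカナ ok"))   # katakana has no cp1252 equivalent
    print(to_ansi("café"))          # é exists in cp1252, so it survives as one byte
```

Once the replacement has happened the original characters are gone, so no later change of font or code page brings them back.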
#21
On Mon, 7 Aug 2017 18:23:51 -0400, Mayayana wrote:
| That's actually my own code editor. I made it with a RichEdit window and
| included a toggle option for ANSI or UTF-8. When set to ANSI I get ?s. When
| set to UTF-8 I get rectangles. There's seems to be some kind of "sniffing"
| built in. In ANSI I should get ANSI characters, but Windows apparently picks
| up that it's UTF-8 and just doesn't try to render it. Yet if I load a UTF-8
| webpage I don't get ?s for single UTF-8 characters. I get characters above
| 128 in English ANSI.

The Windows RichEdit control is Unicode-aware, even if the host application uses an ANSI GUI, e.g. WordPad in Windows 9x. You can test it by using RichEdit's built-in ALT+X shortcut while your application is set to ANSI mode. Press the shortcut when the input cursor is placed right after a character. Try that with two different characters that both show as "?" or as squares.
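
The idea behind the ALT+X test is that the shortcut swaps the character before the cursor for its hexadecimal code point, so two characters that look identical on screen reveal whether they are really different underneath. A rough Python equivalent of that lookup follows; the helper name `alt_x_code` and the four-digit uppercase formatting are assumptions for the example, not a description of RichEdit's exact output.

```python
# Sketch only: show the code point hiding behind a character.
# `alt_x_code` is a hypothetical helper that mimics the idea of ALT+X.


def alt_x_code(ch: str) -> str:
    """Return the hexadecimal code point of a single character."""
    return format(ord(ch), "04X")


if __name__ == "__main__":
    for ch in "カタ?":
        print(alt_x_code(ch))
```

If two squares give different codes, the control still holds the original characters and only the font is lacking. If both give 003F, the text was already replaced with real question marks.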
#22
"JJ" wrote
|
| The Windows' RichEdit control is Unicode aware, even if the host application
| uses an ANSI GUI. e.g. Wordpad in Windows 9x.

Interesting. I just pasted your file name into WordPad and it got sniffed out as Japanese, then rendered in a Japanese font that I didn't know I had. But my program is an editor for HTML and script. I want it to be locked into either ANSI or UTF-8, so I have a menu toggle, which changes the 3rd parameter when I send an EM_STREAMIN message to load a file. More accurately, I want ANSI, but sometimes there are UTF-8 webpages that are loaded and I want to be able to handle those.

It's kind of a shame, really. English webpages don't need to be UTF-8. Plain ASCII text is byte-for-byte identical in UTF-8. But companies like Microsoft often use things like curly quotes in UTF-8, which then corrupt the text if they're rendered as ANSI. They're using just enough to create a problem for ANSI rendering.
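
The curly-quote problem can be reproduced in a few lines of Python. This is a sketch under assumptions: Windows-1252 stands in for "ANSI", and the helper name `utf8_shown_as_ansi` and the sample strings are made up for the example.

```python
# Sketch only: UTF-8 bytes rendered through an ANSI (cp1252) code page.
# `utf8_shown_as_ansi` is a hypothetical helper; the strings are examples.


def utf8_shown_as_ansi(text: str) -> str:
    """Encode text as UTF-8, then decode the bytes as Windows-1252."""
    # errors="replace" because a few byte values are undefined in cp1252
    return text.encode("utf-8").decode("cp1252", errors="replace")


if __name__ == "__main__":
    plain = "It's plain ASCII"
    curly = "It\u2019s curly"      # U+2019 RIGHT SINGLE QUOTATION MARK

    print(utf8_shown_as_ansi(plain))   # unchanged: ASCII bytes mean the same in both
    print(utf8_shown_as_ansi(curly))   # the single quote turns into three odd characters
```

A page that sticks to ASCII survives either interpretation, which is why a single typographic quote is enough to make the encoding choice matter.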