#1
Problem displaying Unicode characters in CMD
See below screenshot.
http://i.imgur.com/aY3JAqX.jpg

My OS is Windows 7 with the English language, BTW. FYI, that CMD session was already started using the /U switch, and it's already using a TrueType font (Lucida Console). The Consolas font has the same problem too. My system already has the fonts required for displaying most Unicode characters (especially CJK), as shown by Windows Explorer in the screenshot.

There are claims that I have to set the active code page for that CMD session to UTF-8 (65001) via the CHCP command, but even that didn't help. I also tried the UTF-16 (1200) code page, since it's the closest thing to the OS's native UCS-2, but CMD says it's an invalid code page. My system code page is set to English, BTW, and it must not be changed.

With the CMD application, I have no problem working with Unicode characters as data. I only have a problem displaying them. Can anyone help?

PS:
- This is a CMD application problem, not the console window itself.
- Using an application other than CMD is not applicable, unless CMD can't display Unicode characters.
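For context on the /U switch mentioned above: it changes the encoding of redirected output to UTF-16LE; it does not change how glyphs get drawn on screen. A small Python sketch (illustrative, outside of CMD) of the "Unicode as data works, display doesn't" split:

```python
# "Unicode as data" vs "Unicode on screen": the bytes survive a round trip
# even when the console font has no glyphs to draw them with.
sample = "渡り鳥"                     # CJK text, like the file names in question

data = sample.encode("utf-16-le")    # what "cmd /U ... > file" writes
restored = data.decode("utf-16-le")  # reading the redirected file back

print(restored == sample)            # True: the data layer is intact
```

The display failure happens after this point, when the console tries to find glyphs for the decoded characters.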
#2
Problem displaying Unicode characters in CMD
JJ wrote:
[snip]

One of the answers here adds an additional entry to the Registry, so you can have another font choice. Maybe the characters you need would be in there?

https://stackoverflow.com/questions/...mmand-line-how

CMD.exe seems to be able to pass the characters along (from a shell perspective), but there are no real guarantees about what shows in the display itself. Which is a disaster. What good is an interactive shell which is not interactive?

Paul
#3
Problem displaying Unicode characters in CMD
"JJ" wrote
| - This is a CMD application problem. Not the console window itself.

I don't generally use console windows, but I assume you can only choose one font. In that case, Lucida is showing you what it's got, which doesn't include Chinese characters.
#4
Problem displaying Unicode characters in CMD
On 8/3/2017 11:21 AM, JJ wrote:
[snip]

What happens when you try this?

"Yeah, I've just resolved my problem. It was a fault of the default font in cmd.exe, which can't manage Unicode signs. To fix it (Windows 7 x64 Pro):

1. Open/run cmd.exe
2. Click on the icon at the top-left corner
3. Select Properties
4. Go to the Font tab
5. Select "Lucida Console" and click OK
6. Type "chcp 10000" at the prompt
7. Finally, "dir /b"

Enjoy your clean UTF-16 output with hearts, Chinese signs, and much more!"

https://stackoverflow.com/questions/...-16-on-cmd-exe
#5
Problem displaying Unicode characters in CMD
On 8/3/2017 4:16 PM, Mike S wrote:
[snip]

Sorry, forgot to add this:

Chcp displays the number of the active console code page, or changes the console's active console code page. Used without parameters, chcp displays the number of the active console code page.
Syntax: chcp [nnn]

  Code page   Country/region or language
  437         United States
  850         Multilingual (Latin I)
  852         Slavic (Latin II)
  855         Cyrillic (Russian)
  857         Turkish
  860         Portuguese
  861         Icelandic
  863         Canadian-French
  865         Nordic
  866         Russian
  869         Modern Greek

https://technet.microsoft.com/en-us/.../bb490874.aspx
#6
Problem displaying Unicode characters in CMD
On Thu, 3 Aug 2017 19:13:05 -0400, Mayayana wrote:
| I don't generally use console windows, but I assume you can only choose
| one font. In that case, Lucida is showing you what it's got, which
| doesn't include Chinese characters.

There are 3 fonts to choose from on my system: "Consolas", "Lucida Console", and "Raster Fonts". The first two are TrueType fonts.

You're right, the "Lucida Console" font does not have a Unicode block for CJK characters. However, I use the "Microsoft Sans Serif" font for the default Windows GUI via the Windows Classic theme. The "Microsoft Sans Serif" font does not have a Unicode block for CJK characters either, yet Windows Explorer can display the CJK characters correctly.

It's similar to using the "Lucida Console" font (or any other TrueType/OpenType font) in Notepad. If you copy any CJK character from e.g. Character Map, Notepad can display the characters correctly. This is possible because the system borrows character glyphs from other fonts which have them. CMD, however, behaves differently.
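The glyph borrowing described above is GDI font linking, which Windows configures under the FontLink\SystemLink registry key. A sketch of what such an entry looks like (the fallback font chosen here is illustrative, and the classic console window reportedly does not use GDI font linking, which would explain why CMD behaves differently from Notepad and Explorer):

```
Key:   HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\FontLink\SystemLink
Name:  Lucida Console              (type REG_MULTI_SZ)
Data:  MSGOTHIC.TTC,MS Gothic
```

Each line of the multi-string data names a "file,face" pair to try, in order, when the base font is missing a glyph.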
#7
Problem displaying Unicode characters in CMD
On Thu, 3 Aug 2017 16:16:14 -0700, Mike S wrote:
[snip]

Unfortunately, it has no effect. The console font is already set to Lucida Console. Setting the code page to 10000 (which is the Mac version of the Western code page) gives no error, but the DIR command still shows the same thing. That SO answer may be a solution, but I think it's missing something else.

Did you test that on your own system with an actual Unicode file name? If not, try creating a dummy file and renaming it to the name below. It's the exact same file name as the one on my system.

ソーラン渡り鳥 (島津亜矢 + 田 寿美).aac

Note: the above text is encoded in UTF-8.
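As an aside, a sketch (Python, for illustration) of what that UTF-8-encoded name looks like at the byte level; the ASCII characters stay one byte each, while each kana/kanji takes three:

```python
# The file name from the post above, as a Unicode string.
name = "ソーラン渡り鳥 (島津亜矢 + 田 寿美).aac"

raw = name.encode("utf-8")

# ASCII characters are 1 byte in UTF-8; these BMP CJK characters are 3 each.
print(len(name), "characters,", len(raw), "bytes")
print(raw[:9].hex(" "))   # the first three katakana, 3 bytes apiece
```

This is why a byte count and a character count disagree for names like this, even though the data round-trips losslessly.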
#8
Problem displaying Unicode characters in CMD
On Thu, 03 Aug 2017 16:04:25 -0400, Paul wrote:
| One of the answers here adds an additional entry to the Registry, so you
| can have another font choice. Maybe the characters you need would be in
| there? https://stackoverflow.com/questions/...mmand-line-how

Yes, I've just tried that. It seems that the console's settings dialog only accepts monospace fonts, plus some other unknown criteria. Not all monospace fonts are accepted, e.g. "Bitstream Vera Sans Mono", "DejaVu Vera Sans Mono", "saxMono". Some are displayed in the list but the console won't use them, and some aren't even displayed in the list. I did succeed in adding and using some monospace fonts, but none of them have any CJK Unicode block, e.g. "Andale Mono".

AFAIK, the "MS Gothic" font is a monospace font designed for the Japanese language, and it does have a CJK Unicode block (IIRC, it's the default GUI font in CJK versions of Windows 95), but the console's settings dialog won't accept that font (it won't display it in the list). So far, I haven't found any monospace CJK-compatible font which is accepted by the console's settings dialog.

| CMD.exe seems to be able to pass the characters (from a shell
| perspective), but there are no real guarantees on what shows in the
| display itself.

I've read in a discussion on the net that CMD doesn't respect the code page setting when displaying file names on the screen. It only works properly when the output is redirected to a file, as if the display path only uses the system code page, which is a global setting.
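That kind of mis-display can be modeled outside the console: a sketch of what happens when correctly encoded UTF-8 bytes are interpreted through a legacy single-byte codepage instead (a rough analogy for a display path that ignores the active code page, not a claim about conhost internals):

```python
# Correct UTF-8 bytes, wrongly interpreted through a single-byte codepage.
name = "島津亜矢"                      # part of the file name in this thread
raw = name.encode("utf-8")            # the data itself is fine

print(raw.decode("utf-8"))            # the original characters come back
print(raw.decode("cp1252", errors="replace"))  # mojibake, one char per byte
```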
#9
Problem displaying Unicode characters in CMD
"JJ" wrote
| You're right. The "Lucida Console" font does not have a Unicode block for
| CJK characters. [...] This is possible because the system borrows
| character glyphs from other fonts which have them. CMD, however, behaves
| differently.

I just tested Lucida in my console window on XP. I get a rectangle for a Chinese character. Ditto with Notepad, which I keep set to Verdana. Windows Explorer is probably more sophisticated. Likewise with browsers.

For instance, I keep a webpage for reference that I created with the full Unicode set, showing each character as: decimal value, character, UTF-8 byte values. I set the font as Verdana in CSS, but foreign characters still show up. Presumably the browser knows to pick a font that suits; I know that Firefox has settings in about:config for that. So if I use something like &#24692; to show the Unicode Chinese character 24692 (6074 is the hexadecimal version), then the browser knows to deal with that. I suspect those fonts may be built in. But browsers are designed to show anything graphical. Plain text windows are usually designed to show only one font. I'm surprised your Notepad shows the characters. Maybe MS made it more sophisticated in Vista/7 and it's no longer a plain Win32 text window.

Also note, with respect to Mike S's post: the local codepage has nothing to do with Unicode characters. It started out as ASCII, using one byte. In 7-bit ASCII, 0-127 are basic English characters. With the need to support foreign languages, ANSI was developed. Still one byte per character, and 0-127 are still the same, but 128-255 are displayed depending on the local codepage. In English, #149 is a bullet. In Russian it's probably a Cyrillic character. In Turkish, Turkish. Etc. The codepage setting decides that. You can set your system to function as Russian, Turkish, etc.

That solved the problem except for Korean, Chinese, and Japanese, which use a multibyte character set to deal with the limitations of ANSI. It's still read one byte at a time, but some byte values are signifiers for the next byte. So 65 is "A", for instance, but 120 65 might be the character for "tree" using the Japanese codepage. (Just an example. I don't know the signifier numbers offhand. Nor do I know Japanese.)

That's all in the world of one-byte encoding (which, confusingly, includes multi-byte Asian characters). Unicode, at least in the UCS-2 form Windows uses, is two-byte encoding. All characters needed have a number of their own. So Russian characters might be, say, 340-420. Chinese characters seem to be up in the mid-20,000s to 30,000s. It's an entirely different approach. 0-127 are still the same as ASCII, but while the bytes for "ab" in ASCII or ANSI are 97 98, in Unicode on Windows they're 97 0 98 0 (low byte first). Always 2 bytes.

That created a problem. The computing world was based on 1 byte = 1 character. Even multibyte encoding reads one byte at a time, made up of numbers from 0-255. Unicode is made up of numbers from 0 to 65535, using 2 bytes for each number. Completely different encoding. Unicode has been around for many years, but it requires different treatment and different programming APIs. Webpages were traditionally written in ANSI. JPG EXIF tags are in ANSI. Etc. Unicode is also superfluous to those of us in N. America and Europe, so it's been slow to be adopted.

To make the transition smoother, UTF-8 was created. UTF-8 is similar to the multibyte Asian encoding: it renders the Unicode characters using prepended flag bytes, so text can still be parsed one byte at a time. Webpages can be ANSI or UTF-8 without changing the basic file structure. There are no pesky null characters to screw things up. All that's needed is for the browser to know which way to parse. And of course, it still doesn't matter much in the West. So everyone's happy. Since UTF-8 does actually function as Unicode, codepages are not used.

Your console window probably deals in Unicode. But fonts deal in characters. So if the window can only render one font at a time, then it won't be able to render anything not drawn in Lucida.

That may be more than anyone cares to know. But I figure it's worth explaining, because the whole thing can get very confusing and there's a lot of misinformation about what's what when it comes to character encoding.
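The byte-reuse idea above can be verified with the codepage codecs that Python ships (a sketch; exact characters per Python's cp1252/cp1251 tables):

```python
# One byte, two codepages, two different characters: the codepage decides.
b = bytes([0xC0])
print(b.decode("cp1252"))   # 'À' under the Western (English) codepage
print(b.decode("cp1251"))   # 'А', a Cyrillic capital A, under the Russian one

# A Unicode encoding needs no codepage: the Euro sign U+20AC is always the
# same three UTF-8 bytes, none of them zero, so byte-oriented code survives.
print("€".encode("utf-8").hex(" "))   # e2 82 ac
```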
#10
Problem displaying Unicode characters in CMD
JJ wrote:
[snip]

I managed to modify my system enough so that Thunderbird shows characters instead of boxes. But since the font used (JhengHei Regular) isn't a monospaced font, there's no way that cmd.exe is going to use a font like that. Even with the registry hack, it will be excluded from the font menu.

https://s2.postimg.org/hax9prms9/no_squares.gif

This is the font I used. There's apparently more than one font for the job, and the characters are different in them, so only a native speaker/writer could possibly know whether that's an appropriate representation.

http://www.microsoft.com/en-us/downl....aspx?id=12072

msjh.ttf (14,713,760 bytes)

I see a distinct lack of mono fonts; lots of "Regular" and "Bold". And also font extensions, which most programs won't know how to use. Adding more font standards (other than .ttf) isn't real progress when nothing uses them.

I'd experiment with Courier New, but based on the size of the file on my system (303,296 bytes), it's just not big enough to have enough alternate pages of stuff. I had a copy of FontForge set up once, and I could see the pages in some of the fonts with it.

Paul
#11
Problem displaying Unicode characters in CMD
On Fri, 4 Aug 2017 09:40:03 -0400, Mayayana wrote:
[snip]

Well, the code page should be irrelevant, assuming that the font actually has the required Unicode block, but apparently it isn't. To add to the confusion, here is what happened when the system code page was set to Japanese:

http://i.imgur.com/mHfuaSW.jpg

And strangely, you'll notice that the Font Preview window shows the "MS Gothic" font name not as "MS Gothic" but as "MS LO[G" when the system code page is set to something other than Japanese (or probably other than CJK).
#12
Problem displaying Unicode characters in CMD
On Fri, 04 Aug 2017 12:15:24 -0400, Paul wrote:
[snip]

Well, Thunderbird doesn't use the Windows built-in console window. Moreover, most cross-platform applications use their own font rendering engines. Also see my recent reply to Mayayana.
#13
Problem displaying Unicode characters in CMD
On Sat, 5 Aug 2017 23:59:55 +0700, JJ wrote:
[snip]

You probably already know that the Japanese code page displays the backslash as the Yen currency character. This is the main reason I don't want to change my system locale to Japanese; otherwise, I would have done that already.
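A note on that quirk: at the code-point level, Microsoft's Japanese codepage keeps byte 0x5C as U+005C (backslash); the Yen appearance is a font-rendering convention, so paths like C:\Windows merely look like C:¥Windows. A sketch with Python's cp932 codec:

```python
# In cp932 (Microsoft Shift-JIS), byte 0x5C still decodes to U+005C, a
# backslash, at the Unicode level; classic Japanese fonts simply DRAW that
# code point with a Yen-sign glyph.
b = bytes([0x5C])
ch = b.decode("cp932")
print(repr(ch))               # '\\'
```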
#14
Problem displaying Unicode characters in CMD
"JJ" wrote
| Well, the code page should be irrelevant assuming that the font actually
| has the required Unicode block, but apparently it isn't.

No, it's two different things. The codepage is used to parse ANSI/DBCS. Unicode is 2-byte encoding and includes unique numeric values for all characters. That's what I was trying to clarify. The codepage is used only for ANSI/DBCS. It's not relevant with Unicode, because every character is assigned its own unique value, while the purpose of a codepage is to squeeze all languages into the 256 possible values of a byte. It does that by reusing bytes 128-255 depending on the language.

A font does not have a "Unicode block". It only has characters. Fonts and encoding are different things. It gets complicated because the DBCS languages (Chinese, Japanese, Korean) have to use multiple bytes for single characters in their non-Unicode encoding, while all other languages use one byte.

If you just look at Western languages, it's easier to see. A text file with a single byte 128 (h80) is a Euro sign when using the English codepage. In the Russian codepage, it's a different character. That's how you'd see it in Notepad on an English or Russian computer.

The Unicode value for a Euro sign is 8364, or hex 20AC. H20AC would show in a hex editor as AC 20. The English ANSI codepage would render those bytes as an angled dash followed by a space; the Russian codepage would render something else. But if Notepad knows it's Unicode, then both computers would render a Euro sign. Thus, no codepages for Unicode.

| And strangely, you'll notice that the Font Preview window shows that the
| "MS Gothic" font name is not "MS Gothic" but "MS ????" when the system
| code page is set to other than Japanese (or probably other than CJK).

Interesting. Maybe that's coming across in the dropdown text window as Unicode but being interpreted as DBCS.

So what you need seems to be a monospaced Unicode font that includes Japanese characters; then use Paul's trick to get at it in the console window. *If* your console window can really display Unicode. There's a list here:

https://en.wikipedia.org/wiki/Unicode_font

A few are monospaced, but the selection seems to be very limited. Arial Unicode MS has almost 40,000 characters, but many of the fonts only have 6,000 or so. What you need is a monospace Unicode font with Japanese characters. Do any include Japanese? I don't know. Maybe some Japanese company has specifically made such a thing.

If you change the codepage, you run into all sorts of complications, as you've seen. Any byte above 127 will render corrupt, and other oddities like the funky font dropdown selector can happen. With Japanese it will probably be worse, because it's a DBCS language rather than just ANSI. With DBCS, a byte above 127 may be a flag indicating how to interpret the following byte.
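The hex-editor example above can be reproduced with a few lines of Python (byte values per the standard UTF-16LE and cp1252 tables):

```python
# U+20AC (Euro sign) in UTF-16 little-endian order is the byte pair AC 20,
# which is how a hex editor would show it on Windows.
euro = "€"
le = euro.encode("utf-16-le")
print(le.hex(" "))            # ac 20

# Fed through the Western ANSI codepage one byte at a time, AC 20 comes out
# as the "angled dash" (NOT SIGN) followed by a space.
print(le.decode("cp1252"))    # ¬ and a space
```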
#15
Problem displaying Unicode characters in CMD
On Sat, 5 Aug 2017 16:16:24 -0400, Mayayana wrote:
[snip]

Maybe I should have mentioned the "Unicode block" as "Unicode subrange". Sorry for the confusion.

| Interesting. Maybe that's coming across in the dropdown text window as
| Unicode but being interpreted as DBCS.

That's impossible. The "Gothic" text can't possibly be "LO[G" regardless of what it was originally encoded with. Did you actually see the katakana characters in the news message from your news client? That (and this) message was encoded using Big5, BTW.

| So what you need seems to be a monospaced Unicode font that includes
| Japanese characters, then use Paul's trick to get at it in the console
| window.

None of the mentioned fonts is accepted by the console, unfortunately.