A Windows XP help forum. PCbanter


Problem displaying Unicode characters in CMD



 
 
  #1  
August 3rd 17, 07:21 PM posted to alt.windows7.general
JJ[_11_]

See below screenshot.

http://i.imgur.com/aY3JAqX.jpg

My OS is Windows 7 with English language, BTW.

FYI, that CMD session was already started using the /U switch, and it's
already using a TrueType font (Lucida Console). The other TrueType font,
Consolas, has the same problem. My system already has the fonts required for
displaying most Unicode characters (especially CJK), as shown by
Windows Explorer in the screenshot.

There are claims that I have to set the active code page for that CMD
session to UTF-8 (65001) via the CHCP command, but even that didn't help. I
also tried the UTF-16 (1200) code page since it's the closest thing to the
OS's native UCS-2, but CMD says it's an invalid code page. My system code
page is set to English, BTW, and it must not be changed on my system.

With the CMD application, I have no problem working with Unicode characters
as data. I only have a problem displaying them.

Can anyone help?

PS)
- This is a CMD application problem, not a problem with the console window
itself.
- Using an application other than CMD is not an option, unless CMD simply
can't display Unicode characters.
  #2  
August 3rd 17, 09:04 PM posted to alt.windows7.general
Paul[_32_]



One of the answers here adds an additional entry to the Registry,
so you can have another font choice. Maybe the characters you need
would be in there?

https://stackoverflow.com/questions/...mmand-line-how

CMD.exe seems to be able to pass the characters (from a shell perspective),
but there are no real guarantees on what shows in the display itself.
Which is a disaster. What good is an interactive shell,
which is not interactive ?

Paul
  #3  
August 4th 17, 12:13 AM posted to alt.windows7.general
Mayayana

"JJ" wrote

| - This is a CMD application problem. Not the console window itself.

I don't generally use console windows, but I assume
you can only choose one font. In that case, Lucida
is showing you what it's got, which doesn't include
Chinese characters.


  #4  
August 4th 17, 12:16 AM posted to alt.windows7.general
Mike S[_4_]


What happens when you try this?

Yeah,I've just resolved my problem. It was a fault of default font in
cmd.exe which can't manage unicode signs. To fix it(windows 7 x64 pro):

Open/run cmd.exe
Click on the icon at the top-left corner
Select properties
Then "Font" bar
Select "Lucida Console" and OK.
Write Chcp 10000 at the prompt
Finally dir /b

Enjoy your clean UTF-16 output with hearts, Chinese signs, and much more!

https://stackoverflow.com/questions/...-16-on-cmd-exe


  #5  
August 4th 17, 12:19 AM posted to alt.windows7.general
Mike S[_4_]


Sorry, forgot to add this

Chcp

Displays the number of the active console code page, or changes the
console's active console code page. Used without parameters, chcp
displays the number of the active console code page.
Syntax

chcp [nnn]

Code page   Country/region or language

437         United States
850         Multilingual (Latin I)
852         Slavic (Latin II)
855         Cyrillic (Russian)
857         Turkish
860         Portuguese
861         Icelandic
863         Canadian-French
865         Nordic
866         Russian
869         Modern Greek

https://technet.microsoft.com/en-us/.../bb490874.aspx


  #6  
August 4th 17, 01:44 PM posted to alt.windows7.general
JJ[_11_]

On Thu, 3 Aug 2017 19:13:05 -0400, Mayayana wrote:

| I don't generally use console windows, but I assume
| you can only choose one font. In that case, Lucida
| is showing you what it's got, which doesn't include
| Chinese characters.


There are 3 fonts to choose from in my system: "Consolas", "Lucida Console",
and "Raster Fonts". The first two are TrueType fonts.

You're right. The "Lucida Console" font does not have a Unicode block for
CJK characters. However, I use the "Microsoft Sans Serif" font for the
default Windows GUI via Windows Classic theme. "Microsoft Sans Serif" font
does not have a Unicode block for CJK characters either. Yet, Windows
Explorer can display the CJK characters correctly.

It's similar to using the "Lucida Console" font (or any other
TrueType/OpenType font) in Notepad. If you copy any CJK character from e.g.
Character Map, Notepad can display the characters correctly. This is
possible because the system borrows character glyphs from other fonts which
have them. CMD, however, behaves differently.
  #7  
August 4th 17, 01:44 PM posted to alt.windows7.general
JJ[_11_]

On Thu, 3 Aug 2017 16:16:14 -0700, Mike S wrote:
| What happens when you try this?
|
| Yeah,I've just resolved my problem. It was a fault of default font in
| cmd.exe which can't manage unicode signs. To fix it(windows 7 x64 pro):
|
| Open/run cmd.exe
| Click on the icon at the top-left corner
| Select properties
| Then "Font" bar
| Select "Lucida Console" and OK.
| Write Chcp 10000 at the prompt
| Finally dir /b
|
| Enjoy your clean UTF-16 output with hearts, Chinese signs, and much more!
|
| https://stackoverflow.com/questions/...-16-on-cmd-exe


Unfortunately, it has no effect. The console font is already set to Lucida
Console. Setting the code page to 10000 (which is the Mac version of the
Western code page) gives no error, but the DIR command still shows the same
thing.

That SO answer may be a solution, but I think it's missing something.

Did you test that on your own system with an actual Unicode file name? If
not, try creating a dummy file and renaming it to the name below. It's
exactly the same file name as the one on my system.

ソーラン渡り鳥 (島津亜矢 + 田 寿美).aac

Note: the above text is encoded in UTF-8.
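
A quick way to see why changing the code page alone may not be enough is to check which byte-oriented code pages can represent the Japanese part of that file name at all. This is only a sketch, assuming Python 3 is available: the codec names `cp437`, `mac_roman` and `cp932` are Python's names, used here as stand-ins for console code pages 437, 10000 and 932, and the helper `can_encode` is a made-up name for illustration.

```python
# Which byte-oriented code pages can represent the Japanese part of the file name?
NAME = "ソーラン渡り鳥"

def can_encode(text: str, codec: str) -> bool:
    """Return True if every character of `text` exists in `codec`."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

# Python codec names assumed to correspond to code pages 437, 10000, 932 and 65001.
for codec in ("cp437", "mac_roman", "cp932", "utf-8"):
    print(f"{codec:10} {can_encode(NAME, codec)}")
```

Only the Japanese code page and UTF-8 can hold these characters as bytes. That says nothing about whether the console font has glyphs for them, which is a separate question.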
  #8  
August 4th 17, 01:44 PM posted to alt.windows7.general
JJ[_11_]

On Thu, 03 Aug 2017 16:04:25 -0400, Paul wrote:

| One of the answers here, adds an additional entry to the Registry,
| so you can have another font choice. Maybe the characters you need
| would be in there ?
|
| https://stackoverflow.com/questions/...mmand-line-how


Yes, I've just tried that. It seems that the console's settings dialog only
accepts monospace fonts that also meet some other unknown criteria.

Not all monospace fonts are accepted, e.g. "Bitstream Vera Sans Mono",
"DejaVu Vera Sans Mono", "saxMono". Some are displayed in the list but the
console won't use them, and some aren't even displayed in the list. I did
succeed in adding and using some monospace fonts, e.g. "Andale Mono", but
none of them have any CJK Unicode block.

AFAIK, the "MS Gothic" font is a monospace font designed for Japanese
language and it does have CJK Unicode block (IIRC, it's the default GUI font
in CJK version of Windows 95), but the console's setting dialog won't accept
that font (it won't display it in the list). So far, I haven't found any
monospace CJK-compatible font which is accepted by the console's setting
dialog.

| CMD.exe seems to be able to pass the characters (from a shell perspective),
| but there are no real guarantees on what shows in the display itself.
| Which is a disaster. What good is an interactive shell,
| which is not interactive ?


I've read in a discussion on the net that CMD doesn't respect the code page
setting when displaying file names on the screen. It only works properly
when the output is redirected into a file. It's as if it only uses the
system code page, which is a global setting.
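
For what it's worth, the /U switch is documented as making the output of internal commands to a pipe or file Unicode, so the redirected output can at least be inspected outside the console. The sketch below assumes that output is UTF-16LE without a byte order mark and simulates it with a sample byte string; the function name `decode_cmd_unicode_output` is hypothetical.

```python
# Decode text produced by something like:  cmd /U /C dir /b > list.txt
def decode_cmd_unicode_output(raw: bytes) -> list[str]:
    """Decode UTF-16LE bytes (no BOM assumed) into a list of non-empty lines."""
    text = raw.decode("utf-16-le")
    return [line for line in text.splitlines() if line]

# Simulated redirected output: two file names, each terminated by CRLF.
sample = "ソーラン渡り鳥.aac\r\nreadme.txt\r\n".encode("utf-16-le")
print(decode_cmd_unicode_output(sample))
```

With a real file, the bytes would come from `open("list.txt", "rb").read()` instead of the simulated `sample`.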
  #9  
August 4th 17, 02:40 PM posted to alt.windows7.general
Mayayana

"JJ" wrote

| You're right. The "Lucida Console" font does not have a Unicode block for
| CJK characters. However, I use the "Microsoft Sans Serif" font for the
| default Windows GUI via Windows Classic theme. "Microsoft Sans Serif" font
| does not have a Unicode block for CJK characters either. Yet, Windows
| Explorer can display the CJK characters correctly.
|
| It's similar like using "Lucida Console" font (or any other
| TrueType/OpenType font) in Notepad. If you copy any CJK character from e.g.
| Character Map, Notepad can display the characters correctly. This is
| possible because the system borrows character glyphs from other font which
| have them. CMD however, behave differently.

I just tested Lucida in my console window on XP.
I get a rectangle for a Chinese character. Ditto with
Notepad, which I keep set to Verdana. Windows
Explorer is probably more sophisticated. Likewise with
browsers. For instance, I keep a webpage for reference
that I created with the full unicode set, showing
each as:

decimal value character UTF-8 byte values

I set the font as verdana in CSS, but foreign characters
still show up. Presumably the browser knows to pick a
font that suits. I know that Firefox has settings in
about:config for that. So if I use something like
&#24692; to show the unicode Chinese character 24692
(6074 is the hexadecimal version) then the browser knows
to deal with that. I suspect those fonts may be built in.
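
The decimal/hexadecimal relationship for such a numeric character reference can be checked with Python's standard `html` module. This is a small sketch, assuming Python 3; the variable names are illustrative only.

```python
import html

decimal_ref = "&#24692;"   # decimal numeric character reference
hex_ref = "&#x6074;"       # the same code point written in hexadecimal

char = html.unescape(decimal_ref)
print(char, ord(char), hex(ord(char)))
print(html.unescape(hex_ref) == char)
```

Both references resolve to the same code point; which font ends up drawing it is decided separately by the browser.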

But browsers are designed to show anything graphical.
Plain text windows are usually designed to show only
one font. I'm surprised your Notepad shows the characters.
Maybe MS made it more sophisticated in Vista/7 and it's
no longer a plain Win32 text window.

Also note with respect to Mike S's post: Local codepage
has nothing to do with unicode characters. It started out
as ASCII, using one byte. In 7-bit ASCII, 0-127 are basic
English characters. With the need to support foreign
languages, ANSI was developed. Still one byte per character.
0-127 are still the same. 128-255 are displayed depending
on the local codepage. In English, #149 is a bullet. In
Russian it's probably a Russian character. In Turkish,
Turkish. Etc. The codepage setting decides that. You
can set your system to function as Russian, Turkish, etc.
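
The effect of the codepage on bytes 128-255 can be shown with Python's codecs, used here as stand-ins for the Windows Western, Cyrillic, Greek and Turkish code pages. This is a sketch rather than a statement about any particular Windows setup. It uses byte 240 instead of 149, since as far as I can tell 149 is a bullet in all four of these Windows code pages.

```python
# One byte value read through four Windows "ANSI" code pages.
BYTE = bytes([0xF0])
CODECS = ("cp1252", "cp1251", "cp1253", "cp1254")

for codec in CODECS:
    print(codec, BYTE.decode(codec))

# Bytes 0-127 mean the same thing (ASCII) in all of them.
assert all(b"A".decode(codec) == "A" for codec in CODECS)
```

The byte on disk never changes; only the table used to interpret it does.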

That solved the problem except for Korean, Chinese,
Japanese, which use a multibyte character set to deal with
the limitations of ANSI. It's still one byte per character
but some byte values are signifiers for the next byte.
So 65 is "A", for instance, but 120 65 might be the character
for "tree" using the Japanese codepage. (Just an example.
I don't know the signifier numbers offhand. Nor do I know
Japanese.)
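
A concrete version of that idea, as a sketch using Python's `cp932` codec (Microsoft's Shift-JIS variant). The lead-byte ranges in the code are the commonly cited ones for Shift-JIS, stated here as an assumption; note that they all lie above 127, so the "120 65" above really is just an illustration. The helper `sjis_char_lengths` is a hypothetical name.

```python
# Shift-JIS (Python codec "cp932") mixes 1-byte and 2-byte characters.
for ch in ("A", "木"):          # "木" is the kanji for "tree"
    raw = ch.encode("cp932")
    print(ch, len(raw), raw.hex(" "))

def sjis_char_lengths(raw: bytes) -> list[int]:
    """Walk a cp932 byte string and report how many bytes each character uses."""
    lengths, i = [], 0
    while i < len(raw):
        # Assumed lead-byte ranges: 0x81-0x9F and 0xE0-0xFC; other bytes stand alone.
        is_lead = 0x81 <= raw[i] <= 0x9F or 0xE0 <= raw[i] <= 0xFC
        step = 2 if is_lead else 1
        lengths.append(step)
        i += step
    return lengths

print(sjis_char_lengths("A木B".encode("cp932")))
```

The parser still reads one byte at a time; a lead byte just tells it to consume the next byte as part of the same character.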

That's all in the world of one-byte encoding (which
confusingly includes multi-byte Asian characters).

Unicode is two byte encoding. All characters needed
have a number of their own. So Russian characters
might be, say, 340-420. Chinese characters seem to
be up in the mid-20,000s to 30,000s. It's an entirely
different approach. 0-127 are still the same as ASCII,
but the bytes for "ab" in ASCII or ANSI are 97-98.
In unicode they're 0-97-0-98. Always 2 bytes.
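
The byte values for "ab" can be checked directly. Strictly speaking this describes UTF-16, one encoding of Unicode, and the byte order depends on endianness; characters outside the first 65,536 code points take four bytes rather than two. A small sketch, assuming Python 3:

```python
text = "ab"
print(list(text.encode("utf-16-be")))   # big-endian byte order
print(list(text.encode("utf-16-le")))   # little-endian byte order

# A character outside the Basic Multilingual Plane needs a surrogate pair.
emoji = "\U0001F600"
print(len(emoji.encode("utf-16-le")))
```

Windows generally stores UTF-16 in little-endian order, so on disk the zero byte follows each ASCII byte rather than preceding it.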

That created a problem. The computing world was
based on 1 byte = 1 character. Even multibyte encoding
reads one byte at a time. It's made up of numbers
from 0-255. Unicode is made up of numbers from 0
to 65535, using 2 bytes for each number. Completely
different encoding.

Unicode has been around for many years, but it
requires different treatment. Different programming
APIs. Webpages are written in ANSI. JPG EXIF tags
are in ANSI. Etc. Unicode is also superfluous to those
of us in N. America and Europe. So it's been slow to
be adopted.

To make the transition smoother, UTF-8 was
created. UTF-8 is similar to the multibyte Asian
encoding. It renders the unicode characters using
prepended flag bytes. So text can still be parsed
one byte at a time. Webpages can be ANSI or UTF-8
without changing the basic file structure. There
are no pesky null characters to screw things up.
All that's needed is for the browser to know which
way to parse. And of course, it still doesn't matter
much in the West. So everyone's happy. Since UTF-8
does actually function as unicode, codepages are
not used.
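
To make the UTF-8 behaviour concrete, here is a short sketch (Python 3 assumed) that prints the byte sequences for an ASCII letter, the Euro sign, and the Chinese character 24692 mentioned earlier:

```python
for ch in ("a", "€", chr(24692)):
    raw = ch.encode("utf-8")
    print(f"U+{ord(ch):04X}", len(raw), raw.hex(" "))

# ASCII text is byte-for-byte identical in UTF-8, and no zero bytes appear.
assert "ab".encode("utf-8") == b"ab"
assert 0 not in "€".encode("utf-8")
```

The first byte of each multi-byte sequence signals how many continuation bytes follow, which is why the text can still be scanned one byte at a time.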

Your console window probably deals in unicode.
But fonts deal in characters. So if the window can
only render one font at a time then it won't be
able to render anything not drawn in Lucida.

That may be more than anyone cares to know.
But I figure it's worth explaining because the whole
thing can get very confusing and there's a lot of
misinformation about what's what when it comes to
character encoding.


  #10  
August 4th 17, 05:15 PM posted to alt.windows7.general
Paul[_32_]



I managed to modify my system enough so that Thunderbird
shows characters instead of boxes. But since the font
used (JhengHei Regular) isn't a monospaced font, there's
no way that cmd.exe is going to use a font like that. Even
with the registry hack, it will be excluded from the font menu.

https://s2.postimg.org/hax9prms9/no_squares.gif

This is the font I used. There's apparently more than one
font for the job, and the characters are different in them.
So only a native speaker/writer could possibly know whether
that's an appropriate representation.

http://www.microsoft.com/en-us/downl....aspx?id=12072

msjh.ttf 14,713,760 bytes

I see a distinct lack of mono fonts, lots of "Regular" and "Bold".
And also font extensions, which most programs won't know how to use.
Adding more font standards (other than .ttf) isn't real progress
when nothing uses them.

I'd experiment with Courier New, but based on the size of the
file in my system (303,296 bytes), that's just not big enough
to have enough alternate pages of stuff.

I had a copy of FontForge set up once, and I could see the
pages in some of the fonts with it.

Paul
  #11  
August 5th 17, 05:59 PM posted to alt.windows7.general
JJ[_11_]



Well, the code page should be irrelevant assuming that the font actually has
the required Unicode block, but apparently it isn't.

To add more confusion, here is what happened when the system code page was
set to Japanese.

http://i.imgur.com/mHfuaSW.jpg

And strangely, you'll notice that the Font Preview window shows that the "MS
Gothic" font name is not "MS Gothic" but "MS LO[G" when the system code page
is set to something other than Japanese (or probably other than CJK).
  #12  
August 5th 17, 06:00 PM posted to alt.windows7.general
JJ[_11_]



Well, Thunderbird doesn't use the Windows built-in console window. Moreover,
most cross-platform applications use their own font rendering engine.

Also see my recent reply to Mayayana.
  #13  
August 5th 17, 06:03 PM posted to alt.windows7.general
JJ[_11_]



You probably already know that the Japanese code page uses the Yen currency
character as the backslash. This is the main reason I don't want to change
my system locale to Japanese. Otherwise, I would use that already.
  #14  
August 5th 17, 09:16 PM posted to alt.windows7.general
Mayayana


"JJ" wrote

| Well, the code page should be irrelevant assuming that the font actually has
| the required Unicode block, but apparently it isn't.
|
No, it's two different things. The codepage is used to
parse ANSI/DBCS. Unicode is 2-byte encoding and includes
unique numeric values for all characters. That's what I was
trying to clarify. Codepage is used only for ANSI/DBCS. It's
not relevant with unicode because all used characters are
assigned a unique byte value, while the purpose of a
codepage is to squeeze all languages into a possible 256
values in a byte. It does that by reusing bytes 128-255
depending on the language.

A font does not have a "unicode block". It only has characters.
Fonts and encoding are different things.

It gets complicated because DBCS languages (Chinese,
Japanese, Korean), have to use multiple bytes for single
characters in their non-unicode encoding, while all other
languages use one byte. If you just look at Western
languages it's easier to see. A text file with a single byte
128 (H80) is a Euro sign when using the English codepage.
In the Russian codepage it looks like a capital A. That's
how you'd see it in Notepad on an English or Russian
computer. The unicode value for a Euro sign is 8364,
or hex 20AC. H20AC would show in a hex editor as AC 20.
The English ANSI codepage would render that as an angled
dash followed by a space. The Russian codepage would
render it as something like a capital M followed by a space.
But if Notepad knows it's unicode then both computers
would render a Euro sign. Thus, no codepages for unicode.
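
The Euro example can be reproduced with Python's codecs. This is a sketch under two assumptions: `cp1252` stands in for the English ANSI codepage, and `cp866` (the DOS Cyrillic page) stands in for the Russian one, since that is the page in which byte 128 is a Cyrillic capital A.

```python
# A single byte 0x80 read through two different code pages.
print(b"\x80".decode("cp1252"))   # Windows Western European
print(b"\x80".decode("cp866"))    # DOS Cyrillic (assumed to be the Russian page meant)

# The Euro sign as UTF-16 little-endian bytes, then misread as single-byte text.
euro_utf16 = "€".encode("utf-16-le")
print(euro_utf16.hex(" "))
print(repr(euro_utf16.decode("cp1252")), repr(euro_utf16.decode("cp866")))

# Read as UTF-16, the same two bytes give the Euro sign regardless of code page.
print(euro_utf16.decode("utf-16-le"))
```

The misread versions each end in a space because the second byte, hex 20, is the ASCII space character in both code pages.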

| And strangely, you'll notice that the Font Preview window shows that the "MS
| Gothic" font name is not "MS Gothic" but "MS ????" when the system code page
| is set to ther than Japanese (or probably other than CJK).

Interesting. Maybe that's coming across in dropdown text
window as unicode but being interpreted as DBCS.

So what you need seems to be a monospaced, unicode
font, that includes Japanese characters, then use Paul's
trick to get at it in the console window. *If* your console
window can really display unicode. There's a list here:

https://en.wikipedia.org/wiki/Unicode_font

A few are monospaced, but the selection seems to
be very limited. Arial Unicode MS has almost 40,000
characters, but many of the fonts only have 6,000 or
so. What you need is monospace unicode with
Japanese characters. Do any include Japanese? I don't
know. Maybe some Japanese company has specifically
made such a thing.

If you change the codepage you run into all sorts
of complications, as you've seen. Any byte above
127 will render corrupt, and other oddities like the funky
font dropdown selector can happen. With Japanese it will
probably be worse because it's a DBCS language rather
than just ANSI. With DBCS a byte above 127 will
be a flag indicating how to interpret the following byte.


  #15  
August 6th 17, 04:33 PM posted to alt.windows7.general
JJ[_11_]

On Sat, 5 Aug 2017 16:16:24 -0400, Mayayana wrote:
| A font does not have a "unicode block". It only has characters.
| Fonts and encoding are different things.


Maybe I should have mentioned the "Unicode block" as "Unicode subrange".
Sorry, for the confusion.

| Interesting. Maybe that's coming across in dropdown text
| window as unicode but being interpreted as DBCS.


That's impossible. The "Gothic" text can't possibly be "LO[G" regardless of
what it was originally encoded with.

Did you actually see the katakana characters in the news message from your
news client? That (and this) message was encoded using Big5, BTW.

| So what you need seems to be a monospaced, unicode
| font, that includes Japanese characters, then use Paul's
| trick to get at it in the console window. *If* your console
| window can really display unicode. There's a list here:
|
| https://en.wikipedia.org/wiki/Unicode_font
|
| A few are monospaced, but the selection seems to
| be very limited. Arial Unicode MS has almost 40,000
| characters, but many of the fonts only have 6,000 or
| so. What you need is monospace unicode with
| Japanese characters. Do any include Japanese? I don't
| know. Maybe some Japanese company has specifically
| made such a thing.


None of the mentioned fonts is accepted by the console, unfortunately.
 






Powered by vBulletin® Version 3.6.4
Copyright ©2000 - 2024, Jelsoft Enterprises Ltd.
Copyright 2004-2024 PCbanter.
The comments are property of their posters.