November 5th 17, 08:39 PM posted to alt.comp.os.windows-10,alt.usage.english,alt.windows7.general
Anton Shepelev
Convert those dastardly curly quotes to straight quotes on Windows?

Whiskers to Mayayana:

You keep complicating things unnecessarily.
ASCII is valid UTF-8. UTF-8 is not a character
set. It's an encoding method. It defines how
byte values represent characters.


You're the one complicating things, by trying to
deny that UTF-8 is the global standard character
set. Which it is. UTF-8 is not the same as
'Unicode', although the terms are often interchanged
erroneously.


I think I understand what Mayayana means. ASCII is
just a character set because it is defined as a
simple mapping of each character to a one-byte
value. Reading such text is a trivial task.

UTF-8, however, is more complicated than that. It is
an encoding method where each character is encoded
as a variable-length sequence of one to four bytes.
The reading algorithm is harder to implement, and
the internal storage of such characters is more
cumbersome: you either waste memory by storing every
character as a four-byte value or implement
convoluted and inefficient algorithms to work with
the underlying variable-length stream.
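
For illustration, here is a minimal sketch in C of
the variable-length reading that UTF-8 requires.
Validation of continuation bytes and malformed input
is deliberately omitted, so it shows the principle
rather than a robust decoder:

/* Decode one UTF-8 code point starting at p into *cp;
   return the number of bytes consumed.  No validation. */
#include <stdio.h>

static int utf8_decode(const unsigned char *p, unsigned long *cp)
{
    if (p[0] < 0x80) {                  /* 0xxxxxxx: plain ASCII        */
        *cp = p[0];
        return 1;
    } else if ((p[0] & 0xE0) == 0xC0) { /* 110xxxxx 10xxxxxx            */
        *cp = ((unsigned long)(p[0] & 0x1F) << 6) | (p[1] & 0x3F);
        return 2;
    } else if ((p[0] & 0xF0) == 0xE0) { /* 1110xxxx + two more bytes    */
        *cp = ((unsigned long)(p[0] & 0x0F) << 12)
            | ((unsigned long)(p[1] & 0x3F) << 6) | (p[2] & 0x3F);
        return 3;
    } else {                            /* 11110xxx + three more bytes  */
        *cp = ((unsigned long)(p[0] & 0x07) << 18)
            | ((unsigned long)(p[1] & 0x3F) << 12)
            | ((unsigned long)(p[2] & 0x3F) << 6) | (p[3] & 0x3F);
        return 4;
    }
}

int main(void)
{
    /* "u" followed by u with umlaut (U+00FC), encoded as C3 BC */
    const unsigned char text[] = { 'u', 0xC3, 0xBC, 0 };
    const unsigned char *p = text;
    unsigned long cp;

    while (*p) {
        p += utf8_decode(p, &cp);
        printf("U+%04lX\n", cp);
    }
    return 0;
}

Compiled and run, it prints U+0075 and U+00FC for
the two characters in the sample string.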

I prefer ASCII for its simplicity. All classic
typesetting systems work with ASCII sources. For
example, \(:u represents u with umlaut in Troff,
which I used to print the bar-joke collection.

Windows-1251 is obsolete.


Why do you think so? It is used extensively in the
Runet, and I store my text files in this encoding.
8-bit character sets are perfectly adequate for
combining English with any other language that has
fewer than 128 graphemes.

It's also Microsoft-only.


It is so by birth alone. Since both classic and
modern editors tend to support 1251 and KOI8-R
alike, the problem of incompatibility is virtually
non-existent, and, when required, may be solved with
widely available conversion utilities such as iconv.
Furthermore, the encoding is so simple that anyone
can write a trivial conversion program.
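
As a sketch of how trivial such a program can be,
here is a Windows-1251 to UTF-8 filter in C. It
handles only ASCII and the basic Cyrillic letters,
relying on the fact that 1251 keeps А..я in one
contiguous run at 0xC0..0xFF that maps directly onto
U+0410..U+044F; Ё/ё and the typographic characters
at 0x80..0xBF are left out to keep it short:

/* Convert Windows-1251 on stdin to UTF-8 on stdout
   (ASCII and the basic Cyrillic letters only). */
#include <stdio.h>

int main(void)
{
    int c;

    while ((c = getchar()) != EOF) {
        if (c < 0x80) {                 /* ASCII passes through unchanged */
            putchar(c);
        } else if (c >= 0xC0) {         /* Cyrillic letter A..ya          */
            unsigned cp = 0x0410 + (c - 0xC0);   /* Unicode code point    */
            putchar(0xC0 | (cp >> 6));           /* two-byte UTF-8        */
            putchar(0x80 | (cp & 0x3F));
        } else {
            putchar('?');               /* 0x80..0xBF not handled here    */
        }
    }
    return 0;
}

Used as a filter, e.g. ./cp1251-to-utf8 <in.txt
>out.txt (the program name is of course arbitrary),
it turns a 1251 stream on standard input into UTF-8
on standard output.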

Where do you get the idea that there's software
that can render UTF-8 but not ANSI? I'd be sur-
prised if such a thing exists. But the opposite
is true. That's why I suggested that sending in
ASCII is a good approach. It's the lowest common
denominator.


You're the one who keeps bringing up ANSI. I
think you probably mean the collection of
alternative (mutually incompatible) Microsoft
character sets that extend ASCII to give some
foreign non-USA alternative characters depending on
which character set is chosen.


Your paragraph is not a reply to Mayayana's. ASCII
is indeed the lowest common denominator for
Latin-based alphabets.

None of those character sets is compatible with
any of the others, nor with UTF-8 nor of course
with ASCII in that they are supersets of it.


I am not sure which ones you mean, but Windows-1251,
KOI8-R, and all the ISO/IEC 8859 character sets are
perfectly compatible with ASCII. Furthermore, the
witty designers of KOI8-R arranged it so that
removing the highest bit turns Russian text into a
readable Latin transliteration.
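
To see that trick in action, here is a tiny C
program. The three bytes are assumed to be the
KOI8-R encoding of the Russian word for "peace";
clearing bit 7 of each leaves the Latin letters
M, I, R:

/* Strip the high bit of KOI8-R Cyrillic letters to get
   a rough Latin transliteration. */
#include <stdio.h>

int main(void)
{
    unsigned char koi8[] = { 0xCD, 0xC9, 0xD2, 0 };  /* Cyrillic m-i-r */
    int i;

    for (i = 0; koi8[i] != 0; i++)
        putchar(koi8[i] & 0x7F);    /* clear bit 7 */
    putchar('\n');                  /* prints "MIR" */
    return 0;
}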

They worked well enough for people using their
computers to produce printed documents, but they
weren't designed for the internet and they don't
work well there.


And I think it is the other way round. Good printed
documents often require a richer set of characters
than ASCII provides, whereas on the web they are
rarely needed. I should hate to see '--' for an
em-dash in a printed book, but am content with it on
a web page.

--
() ascii ribbon campaign -- against html e-mail
/\ http://preview.tinyurl.com/qcy6mjc [archived]