A Windows XP help forum. PCbanter

Convert those dastardly curly quotes to straight quotes on Windows?



 
 
  #16  
Old October 8th 17, 04:39 PM posted to alt.comp.os.windows-10,alt.usage.english,alt.windows7.general
David E. Ross[_2_]
external usenet poster
 
Posts: 1,035
Default Convert those dastardly curly quotes to straight quotes on Windows?

On 10/7/2017 2:38 PM, harry newton wrote:
How can we convert those dastardly curly quotes to straight quotes on Windows?
http://i67.tinypic.com/2h5mjbr.jpg

I like to save into TEXT files on Windows technical information cut and
pasted from disjoint news articles where the unprintable curly quotes drive
me nuts!

Here is a screenshot of a sample cut and paste:
http://i67.tinypic.com/2h5mjbr.jpg

I tried cutting from the web and pasting into MS Word and then cutting from
MS Word and pasting into the text file - but the dastardly curly quotes
were still there.

I tried using Google Gmail, pasting into a composition window and then
hitting the "Tx" format text button, and even changing the font to some
other font, but the dastardly curly quotes were still there.

Since almost every technical web site uses the dastardly curly quotes, how
can I just get *rid* of them using a Windows method so that I can have a
text file that contains normal quotes?

Here's just one sample but the web is filled with dastardly curly quotes!
http://theverge.com/2017/10/6/16437790/iphone-8-swollen-battery-issue-apple-investigating


See http://www.fourmilab.ch/webtools/demoroniser/. This is a tool
that supposedly converts Microsoft "smart" characters to HTML-compatible
characters. Yes, it is 14 years old; and no, I have not tried it myself.
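
For anyone who would rather script the substitution than rely on an old tool, here is a minimal Python sketch. It is not taken from the demoroniser; the names QUOTE_TABLE and straighten_quotes are made up for illustration, and it assumes the pasted text has already been read in as Unicode (for example from a file saved as UTF-8). It only handles the four common curly quote characters.

```python
# Map the four common "smart" quote code points to their ASCII equivalents.
QUOTE_TABLE = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark (also used as an apostrophe)
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
})

def straighten_quotes(text: str) -> str:
    """Return text with curly quotes replaced by straight ASCII quotes."""
    return text.translate(QUOTE_TABLE)

sample = "\u201cIt\u2019s swollen,\u201d he said."
print(straighten_quotes(sample))
```

To clean a saved file, one option is to read it with open(path, encoding="utf-8"), pass the contents through straighten_quotes, and write the result back out.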

--
David E. Ross
http://www.rossde.com/

By allowing employers to eliminate coverage for birth control
from their insurance plans, President Trump has guaranteed there
will be an increase in the demand for abortions.
  #17  
Old October 8th 17, 04:48 PM posted to alt.comp.os.windows-10,alt.usage.english,alt.windows7.general
pyotr filipivich
external usenet poster
 
Posts: 752
Default Convert those dastardly curly quotes to straight quotes on Windows?

"Mayayana" on Sun, 8 Oct 2017 10:41:02 -0400
typed in alt.windows7.general the following:
"pyotr filipivich" wrote

|Microsoft is one of the worst for that
| problem. They write pages intended for an English-speaking
| audience, in English, then use just a handful of unnecessary
| UTF-8 characters that break the ANSI continuity. It makes
| no sense.
|
| IMOSHO, it makes no sense, but then it is Microsoft. Which often
| seem to have a lot of "I'm sure it makes sense - not to me, but to
| someone" elements.
|

That's a generous view. I don't see a problem
with switching to UTF-8, but what MS are doing is
to deliberately and unnecessarily break ASCII
compatibility without any need to do so,


MS has a policy of adopting something, modifying it out of
recognition, then insisting nothing else be used.

by replacing
quotes and spaces with unicode characters in UTF-8.
It seems to be a kind of political correctness attitude.
Nearly all English pages can easily be both ASCII and
UTF-8.

I wonder how journalists type those quotes. Maybe
they have a software program that does the conversion?

May be the word processing package they're using, which
automagically typesets on the fly.

--
pyotr filipivich
Next month's Panel: Graft - Boon or blessing?
  #18  
Old October 8th 17, 04:54 PM posted to alt.comp.os.windows-10,alt.usage.english,alt.windows7.general
Ken Blake[_5_]
external usenet poster
 
Posts: 2,221
Default Convert those dastardly curly quotes to straight quotes on Windows?

On Sat, 7 Oct 2017 23:30:52 +0000 (UTC), harry newton
wrote:

He who is Jason said on Sat, 7 Oct 2017 19:06:29 -0400:

Curly quotes (dastardly) are "normal" quotes. The straight quotes were
ASCII (and EBCDIC) excuses for "real" (dastardly, curly) quotes...


Depends on which side of the fence you live on.
Has the Internet Killed Curly Quotes?
https://www.theatlantic.com/technology/archive/2016/12/quotation-mark-wars/511766/

But I was just using "curly quotes" as just one of maybe a dozen or more
common dastardly abominations which just don't translate into text on
Windows, as shown in this simple example from Butterick's Practical
Typography:
https://practicaltypography.com/index.html#toc
Where curly quotes are just one of many evils:
https://practicaltypography.com/straight-and-curly-quotes.html

The problem is that my text editor (Gvim) isn't handling the dastardly
characters, so all I want to do is get rid of any character that any normal
text editor can't/won't/doesn't handle.



"Normal text editor"? I just pasted curly quotes into Notepad to be
sure it handled curly quotes. It does.

If yours doesn't, I suggest you change your text editor.
  #19  
Old October 8th 17, 04:56 PM posted to alt.comp.os.windows-10,alt.usage.english,alt.windows7.general
Ken Blake[_5_]
external usenet poster
 
Posts: 2,221
Default Convert those dastardly curly quotes to straight quotes on Windows?

On Sun, 8 Oct 2017 10:17:22 +0100, "J. P. Gilliver (John)"
wrote:

In message , harry newton
writes:
[]
The problem is that my text editor (Gvim) isn't handling the dastardly
characters, so all I want to do is get rid of any character that any normal
text editor can't/won't/doesn't handle.

[]
Of course, some would (and will) say why are you using a text editor
(probably inserting the word "still", to imply you're a dinosaur),




Using a text editor doesn't mean you're a dinosaur. Some of us
occasionally do things like create/modify .bat files.
  #20  
Old October 8th 17, 04:58 PM posted to alt.comp.os.windows-10,alt.usage.english,alt.windows7.general
J. P. Gilliver (John)[_4_]
external usenet poster
 
Posts: 2,679
Default Convert those dastardly curly quotes to straight quotes on Windows?

In message , pyotr
filipivich writes:
"Mayayana" on Sun, 8 Oct 2017 10:41:02 -0400
typed in alt.windows7.general the following:

[]
I wonder how journalists type those quotes. Maybe
they have a software program that does the conversion?

May be the word processing package they're using, which
automagically typesets on the fly.

Word does that (by default - you can turn it off); if I type "Fred", it
will convert the quotes into the 66 and 99 form (I think it calls them
"smart quotes"). [I think it does the same with single quotes, 'Fred'.]
You can stop it doing it either by turning off the setting, or on a
one-off basis by doing an Undo (Ctrl-Z) immediately after typing the ".

I wouldn't be surprised if some web-page editing software behaves
similarly.
--
J. P. Gilliver. UMRA: 1960/1985 MB++G()AL-IS-Ch++(p)Ar@T+H+Sh0!:`)DNAf

aibohphobia, n., The fear of palindromes.
  #21  
Old October 8th 17, 05:01 PM posted to alt.comp.os.windows-10,alt.usage.english,alt.windows7.general
J. P. Gilliver (John)[_4_]
external usenet poster
 
Posts: 2,679
Default Convert those dastardly curly quotes to straight quotes on Windows?

In message , Wolf K
writes:
[]
NB: ASCII is not ANSI. Ansi is ASCII plus codes 128 to 255.

Note ANSI 171 and 187. These are diagonal quotes, equivalent to curly
quotes.

As I recall it, characters 128 to 255 used to be called "extended
ASCII", way back in the Dark Ages, when people could write printer


Though that name for them gained wide circulation (and I sometimes use
it), I did read somewhere that it was never an official designation.

drivers by creating a list of escape codes and two- or three-byte codes
that described the dot pattern in the glyph matrix....

Another mess of esoteric useless knowledge.

Hi from a fellow dinosaur ... (-:
--
J. P. Gilliver. UMRA: 1960/1985 MB++G()AL-IS-Ch++(p)Ar@T+H+Sh0!:`)DNAf

aibohphobia, n., The fear of palindromes.
  #22  
Old October 8th 17, 05:06 PM posted to alt.comp.os.windows-10,alt.usage.english,alt.windows7.general
J. P. Gilliver (John)[_4_]
external usenet poster
 
Posts: 2,679
Default Convert those dastardly curly quotes to straight quotes on Windows?

In message , Ken Blake
writes:
On Sun, 8 Oct 2017 10:17:22 +0100, "J. P. Gilliver (John)"
wrote:

In message , harry newton
writes:
[]
The problem is that my text editor (Gvim) isn't handling the dastardly
characters, so all I want to do is get rid of any character that any normal
text editor can't/won't/doesn't handle.

[]
Of course, some would (and will) say why are you using a text editor
(probably inserting the word "still", to imply you're a dinosaur),




Using a text editor doesn't mean you're a dinosaur. Some of us
occasionally do things like create/modify .bat files.


I said _some_ would say. I'm not one of them (-:
--
J. P. Gilliver. UMRA: 1960/1985 MB++G()AL-IS-Ch++(p)Ar@T+H+Sh0!:`)DNAf

aibohphobia, n., The fear of palindromes.
  #23  
Old October 8th 17, 05:06 PM posted to alt.comp.os.windows-10,alt.usage.english,alt.windows7.general
Mayayana
external usenet poster
 
Posts: 6,438
Default Convert those dastardly curly quotes to straight quotes on Windows?

"Wolf K" wrote

| The OP wants to use a plain-text editor that only uses standard ASCII
| (not "extended ASCII", or codes - i. e. characters between 32 and 126
| decimal [plus newline]). He hasn't said why yet, but I understand what
| he wants. (I was going to say "... like Notepad", but Notepad does allow
| so-called "Extended ASCII", i. e. one particular set of the codes up to
| 255.) He is hoping for something that will render such text into
| nearest-equivalent (such as quotes that have directional qualities all
| into code 34 decimal).
| []
|
| NB: ASCII is not ANSI. Ansi is ASCII plus codes 128 to 255.
|
| Note ANSI 171 and 187. These are diagonal quotes, equivalent to curly
| quotes.
|

171 and 187 are double chevrons. That's not the same
as curly quotes. You're thinking of 147 and 148.
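
A quick way to check those four numbers is to decode them with Python's cp1252 codec, which corresponds to the Windows Western European code page. This is only a sketch; the variable name decoded is arbitrary.

```python
# Decode four single bytes using code page 1252 (Windows "ANSI" for Western European locales).
for value in (147, 148, 171, 187):
    char = bytes([value]).decode("cp1252")
    print(value, f"U+{ord(char):04X}", char)

# 147 and 148 come out as the curly double quotes, 171 and 187 as the double chevrons.
decoded = bytes([147, 148, 171, 187]).decode("cp1252")
```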

But that's true only for the standard English codepage.
If you're Russian you'll see Cyrillic characters. Something
like a capital Y and an oval with a vertical line through it.

ASCII is standard in all uses and matches the same
numbers in unicode. It specifies a basic western character
set for byte values 0 to 127. ANSI uses a "local codepage" to
define characters 128-255, while retaining the ASCII values
up to 127. The standard webpage encoding used to be
Windows English codepage ANSI. (ASCII in most cases.)
Now UTF-8 is more common.
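
The code page dependence is easy to see in a few lines of Python. This sketch decodes the same two bytes under three Windows code pages (Western European, Cyrillic, Greek); the byte value 228 and the helper name decode_with are just picked for illustration.

```python
BYTE_VALUES = bytes([97, 228])  # 97 is in the ASCII range, 228 is not

def decode_with(codepage: str) -> str:
    """Interpret BYTE_VALUES under the given single-byte code page."""
    return BYTE_VALUES.decode(codepage)

# The first character is "a" every time; the second depends on the code page.
for cp in ("cp1252", "cp1251", "cp1253"):
    print(cp, decode_with(cp))
```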

UTF-8 is a way to express unicode using single bytes.
Unicode-16, what's usually just referred to as unicode,
encodes thousands of characters in 2 bytes, so each character
can have its own specific encoding number in order to fit
English, Russian and everything else. ASCII and ANSI use
a one-byte-per-character encoding, except with a few
Asian languages.

In order to internationalize the Web with minimal upset,
UTF-8 became standard. It allows for encoding unicode 16
in a one-byte system. The first 128 values are still ASCII.
The second 128 are used to create values with up to 4
bytes. Thus all languages can be encoded in one system.
It's still read 1 byte at a time and most webpages don't
change because most are still basically ASCII. (Whereas
if we'd converted to unicode, all webpage files would
have had to be converted to 2-byte encoding, making for
a lot of work and doubling the size of HTML files.)
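
The size argument can be checked directly. In this sketch, "utf-16-le" stands in for the 2-byte encoding described above (strictly speaking, UTF-16 needs 4 bytes for characters outside the Basic Multilingual Plane, which doesn't matter for plain English text). The sample string is arbitrary.

```python
text = "Plain English text"

ascii_bytes = text.encode("ascii")
utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")   # 2 bytes per character here, no byte-order mark

# For pure ASCII text the UTF-8 bytes are identical to the ASCII bytes,
# while the UTF-16 form is twice as long.
print(len(ascii_bytes), len(utf8), len(utf16))
```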

The problem comes when UTF-8 is read as ANSI. (Most
text is still handled in one-byte-per-character ASCII/ANSI
encoding. Even things like JPG EXIF tags and PE file
import headers are ASCII/ANSI.)
There might be, say, 3 characters in UTF-8 that
indicate a left curly quote. I don't know exactly offhand.
But it might be, say, capital A with an umlaut, a 1/4 sign,
and a Euro sign. In the browser it's a left curly quote. In
Notepad it shows up as 3 wacky characters. The two
programs are interpreting the bytes by different standards.
So the text is corrupted. And that's just in English. A
browser reading the UTF-8 can display it properly and in
most cases will "sniff" the page to identify it even if the
HTML code does not specify. But when that single-byte
text is pasted to ANSI you see the ANSI characters. You
might see the Euro. A Russian will see something else. A
Greek will see a third thing.
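
The "wacky characters" can be reproduced by encoding a left curly quote as UTF-8 and then deliberately decoding the bytes with the wrong code page. A small sketch, using Python's cp1252 (Western European) and cp1251 (Cyrillic) codecs as stand-ins for an English and a Russian Windows setup:

```python
left_quote_utf8 = "\u201c".encode("utf-8")      # the three UTF-8 bytes for a left curly quote

as_western = left_quote_utf8.decode("cp1252")   # misread under the Western European code page
as_cyrillic = left_quote_utf8.decode("cp1251")  # the same bytes misread under the Cyrillic code page

print(list(left_quote_utf8))
print(as_western)
print(as_cyrillic)
```

Under cp1252 the three bytes show up as an a with a circumflex, a Euro sign and an oe ligature; under cp1251 they show up as three different, Cyrillic characters.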

What Harry is asking for is a simple way to convert
UTF-8 to ANSI using the standard English codepage. That
requires converting the string by parsing
the bytes. When the parser encounters bytes of 127+
it would need to decide how to treat them. Is it an
ANSI non-breaking space, character 160 in English? Or is byte 160
the first of 2, 3, or 4 bytes, together indicating a character
in UTF-8? If it turns out to be, say, 3 bytes that render a
left curly quote in UTF-8, some kind of filter has to recognize
that exact pattern and say, "Oh, that's a quote. We'll just
substitute character 34 for those 3 bytes."
So Harry's solution has to treat each specific UTF-8
character and decide what to substitute. It's not a 1-to-1
correspondence. In other words, Notepad already translated
the UTF-8 to ANSI, but now it has to be transliterated.
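
One common way to handle the "how do I treat bytes above 127" decision is to try strict UTF-8 first and only fall back to the English code page if that fails. It's a heuristic, not a guarantee, and the function name decode_bytes is made up for this sketch.

```python
def decode_bytes(raw: bytes) -> str:
    """Decode as UTF-8 if the bytes are valid UTF-8, else fall back to cp1252."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # errors="replace" substitutes U+FFFD for the few bytes cp1252 leaves undefined.
        return raw.decode("cp1252", errors="replace")

utf8_sample = b"\xe2\x80\x9cquoted\xe2\x80\x9d"   # curly quotes encoded as UTF-8
ansi_sample = b"\x93quoted\x94"                   # curly quotes encoded as cp1252

# Both inputs should end up as the same Unicode string.
print(decode_bytes(utf8_sample) == decode_bytes(ansi_sample))
```

Once the text is decoded correctly, replacing the curly quotes with character 34 is an ordinary character-for-character substitution.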

If those quotes were written as character 34 in the first
place then the encoding would not matter. Everyone would
see ", because " is in the ASCII range.

Whiskers made an interesting point that I wasn't aware of:
The page he links says that MS Office products have an option
for fancy characters like curly quotes. Maybe that helps explain
why so many of them are on webpages. MS Office users are
among the most parochial of all computer users. They're usually
not tech-literate but are computer-literate. The result is millions
of people who equate their computer with MS Office and
assume the whole world also uses MS Office. They're the people
who send emails from Word or send a 60,000 byte DOC file to
communicate 1 sentence of 24 bytes. Many of those same people
are probably also creating webpages from MS Word, oblivious
of the travesty.




  #24  
Old October 8th 17, 05:10 PM posted to alt.comp.os.windows-10,alt.usage.english,alt.windows7.general
J. P. Gilliver (John)[_4_]
external usenet poster
 
Posts: 2,679
Default Convert those dastardly curly quotes to straight quotes on Windows?

In message , Ken Blake
writes:
[]
The problem is that my text editor (Gvim) isn't handling the dastardly
characters, so all I want to do is get rid of any character that any normal
text editor can't/won't/doesn't handle.



"Normal text editor"? I just pasted curly quotes into Notepad to be
sure it handled curly quotes. It does.

If yours doesn't, I suggest you change your text editor.


The distinction is blurred. To some people, a text editor is something
that doesn't do formatting, bold, italic, underlined, fonts, etcetera
(and thus NotePad is one such); to other people, it is one that only
works with ASCII codes 32 to 126 plus newline. There _are_ places where
only the latter is valid. (Headerless usenet, for example, though ANSI
characters _usually_ get through that unaltered.)
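
For the stricter definition, a check along these lines would tell you whether a piece of text stays within codes 32 to 126 plus newline. The function name is_plain_ascii is made up for the sketch, and it deliberately treats tabs and carriage returns as outside the allowed set.

```python
def is_plain_ascii(text: str) -> bool:
    """True if every character is printable ASCII (32-126) or a newline."""
    return all(ch == "\n" or 32 <= ord(ch) <= 126 for ch in text)

print(is_plain_ascii("@echo off\n"))
print(is_plain_ascii("\u201cFred\u201d"))
```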
--
J. P. Gilliver. UMRA: 1960/1985 MB++G()AL-IS-Ch++(p)Ar@T+H+Sh0!:`)DNAf

aibohphobia, n., The fear of palindromes.
  #25  
Old October 8th 17, 05:11 PM posted to alt.comp.os.windows-10,alt.usage.english,alt.windows7.general
Andy Burns[_6_]
external usenet poster
 
Posts: 1,318
Default Convert those dastardly curly quotes to straight quotes onWindows?


Mayayana wrote:

ASCII is standard in all uses


Except when UK users want a pound sign £ and get a hash symbol # (yes I
realise Americans may call that a pound sign)
  #26  
Old October 8th 17, 05:28 PM posted to alt.comp.os.windows-10,alt.usage.english,alt.windows7.general
J. P. Gilliver (John)[_4_]
external usenet poster
 
Posts: 2,679
Default Convert those dastardly curly quotes to straight quotes on Windows?

In message , Mayayana
writes:
[]
ASCII is standard in all uses and matches the same
numbers in unicode. It specifies a basic western character
set for byte values 0 to 127. ANSI uses a "local codepage" to


If we're strictly talking about _characters_, it's 32 to 126 (-:
[]
In order to internationalize the Web with minimal upset,
UTF-8 became standard. It allows for encoding unicode 16
in a one-byte system. The first 128 values are still ASCII.
The second 128 are used to create values with up to 4
bytes. Thus all languages can be encoded in one system.


How does the receiving (and thus decoding) software know whether it's 2,
3, or 4 bytes - are three of the 128 beyond 127 reserved as meaning the
next one, two, or three bytes are part of the same character (somewhat
like, but also unlike, the shift characters in Baudot 5-bit code)?
[]
What Harry is asking for is a simple way to convert
UTF-8 to ANSI using the standard English codepage. That
requires converting the string by parsing
the bytes. When the parser encounters bytes of 127+
it would need to decide how to treat them. Is it an
ANSI non-breaking space, character 160 in English? Or is byte 160
the first of 2, 3, or 4 bytes, together indicating a character
in UTF-8? If it turns out to be, say, 3 bytes that render a
left curly quote in UTF-8, some kind of filter has to recognize
that exact pattern and say, "Oh, that's a quote. We'll just
substitute character 34 for those 3 bytes."
So Harry's solution has to treat each specific UTF-8
character and decide what to substitute. It's not a 1-to-1
correspondence. In other words, Notepad already translated
the UTF-8 to ANSI, but now it has to be transliterated.


Yes, it's not _that_ simple, though a many-to-1 (well, to-94) mapping
ought not to be impossible.
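
Before building such a mapping it helps to know which offending characters actually occur in a given text. A rough Python sketch that lists them with their Unicode names (the function name non_ascii_report is made up, and the type hints assume Python 3.9 or later):

```python
import unicodedata
from collections import Counter

def non_ascii_report(text: str) -> list[tuple[str, str, int]]:
    """List (code point, Unicode name, count) for each character outside 32-126 and newline."""
    counts = Counter(ch for ch in text if ch != "\n" and not 32 <= ord(ch) <= 126)
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), n)
        for ch, n in sorted(counts.items())
    ]

for row in non_ascii_report("\u201cFred\u201d \u2013 caf\u00e9"):
    print(row)
```

Each reported character can then be given an ASCII replacement by hand, which keeps the many-to-94 table limited to what actually turns up.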

If those quotes were written as character 34 in the first
place then the encoding would not matter. Everyone would
see ", because " is in the ASCII range.

Whiskers made an interesting point that I wasn't aware of:
The page he links says that MS Office products have an option
for fancy characters like curly quotes. Maybe that helps explain
why so many of them are on webpages. MS Office users are
among the most parochial of all computer users. They're usually
not tech-literate but are computer-literate. The result is millions
of people who equate their computer with MS Office and
assume the whole world also uses MS Office. They're the people
who send emails from Word or send a 60,000 byte DOC file to
communicate 1 sentence of 24 bytes. Many of those same people
are probably also creating webpages from MS Word, oblivious
of the travesty.

Yes, that's probably the cause.



--
J. P. Gilliver. UMRA: 1960/1985 MB++G()AL-IS-Ch++(p)Ar@T+H+Sh0!:`)DNAf

aibohphobia, n., The fear of palindromes.
  #27  
Old October 8th 17, 05:34 PM posted to alt.comp.os.windows-10,alt.usage.english,alt.windows7.general
Mayayana
external usenet poster
 
Posts: 6,438
Default Convert those dastardly curly quotes to straight quotes on Windows?

"Wolf K" wrote

| ANSI = ASCII plus 128 to 255. Most ANSI codes have Unicode counterparts.
|
| See
| http://ascii-table.com/ansi-codes.php
|

Answered in another post. Note that page you
linked explains that it's showing codepage 1252.
The standard Windows English codepage. That
only holds if your local language is set to English.

The whole thing gets very complicated. I tried
to clarify it in my other post. In brief, ASCII is the
same for everyone and deals with representing
characters with single byte values from 0 -127.
ANSI adds the rest of the byte - 128-255. But the
character represented depends on the local codepage.
Russian ANSI text is not the same as English ANSI
text and Turkish will be different yet again.

Unicode uses 2 bytes per character. A fundamentally
different way to encode characters. It allows for
characters in Russian, Turkish, etc to all have their
own unique numeric values.

UTF-8, which is how most webpages are now encoded,
is a one-byte encoding that uses 1-4 bytes to represent
all of the unicode set. (One byte in this case means that
each byte is read as a signifier while in normal unicode
2 bytes at a time are read as a signifier.)

So... "a" is 97 in ASCII. It's 97 in ANSI. It's 97 in UTF-8.
All one byte. In unicode it's byte 0 followed by byte 97.
But curly quotes are not in ASCII. In the English ANSI
codepage they're 147 and 148. But not in other codepages.
In unicode they're 8220 and 8221. 8220 would be represented
by byte values 32 and 28. 2 bytes for the single character,
read as a single, 2-byte numeric value. In UTF-8 encoding
the left curly quote is rendered with bytes 226-128-156
(hex E2 80 9C). It's not a 3-byte number. It's a pattern of 3
1-byte numbers.
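
Those numbers can be confirmed in Python. In this sketch "utf-16-be" is used for the 2-byte form so that the high byte comes first, matching the byte order described above; Windows normally stores it little-endian, so on disk the two bytes would appear in the opposite order.

```python
left_quote = "\u201c"

code_point = ord(left_quote)                         # Unicode code point as a decimal number
ansi_bytes = list(left_quote.encode("cp1252"))       # one byte in the English code page
utf16_bytes = list(left_quote.encode("utf-16-be"))   # two bytes, high byte first
utf8_bytes = list(left_quote.encode("utf-8"))        # three separate one-byte values
letter_a = list("a".encode("utf-16-be"))             # a zero byte followed by 97

print(code_point, ansi_bytes, utf16_bytes, utf8_bytes, letter_a)
```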

If you download this webpage in a hex editor and look at
the bytes you can see:

https://www.dwheeler.com/essays/quotes-test-utf-8.html

The page also shows how it's possible to use non-standard
characters in standard ASCII HTML by using HTML encoding:
&#8220; (or the named entity &ldquo;) will render as a left curly quote, regardless of
language.
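
Python's standard library can go in both directions with these HTML references. A small sketch: html.unescape turns a reference back into the character, and the "xmlcharrefreplace" error handler writes non-ASCII characters as numeric references so the output stays pure ASCII.

```python
import html

named = html.unescape("&ldquo;")      # named character reference
numeric = html.unescape("&#8220;")    # numeric character reference

# Encode to ASCII, replacing anything outside ASCII with a numeric reference.
ascii_safe = "\u201cFred\u201d".encode("ascii", errors="xmlcharrefreplace").decode("ascii")

print(named == numeric, ascii_safe)
```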


  #28  
Old October 8th 17, 05:52 PM posted to alt.comp.os.windows-10,alt.usage.english,alt.windows7.general
Mayayana
external usenet poster
 
Posts: 6,438
Default Convert those dastardly curly quotes to straight quotes on Windows?

"J. P. Gilliver (John)" wrote

| ASCII is standard in all uses and matches the same
| numbers in unicode. It specifies a basic western character
| set for byte values 0 to 127. ANSI uses a "local codepage" to
|
| If we're strictly talking about _characters_, it's 32 to 126 (-:

Ms. Line Return and Mr. Null might take offense to that.
In parsing they're all characters. Even a null. I sometimes
use character 1 as a marker in text programmatically
because it's neutral. It means nothing in English, formatting,
etc but it still acts as a character.

| How does the receiving (and thus decoding) software know whether it's 2,
| 3, or 4 bytes - are three of the 128 beyond 127 reserved as meaning the
| next one, two, or three bytes are part of the same character (somewhat
| like, but also unlike, the shift characters in Baudot 5-bit code)?

There are webpages about that. I actually wrote some
VBScript code for it awhile back, but now I've forgotten.
It's a pain in the neck.

http://www.jsware.net/jsware/scrfiles.php5#u2a
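
For what it's worth, the rule as I understand it from the UTF-8 definition is that the high bits of the first byte announce the length, and every following byte in the sequence starts with the bits 10. The Python sketch below is not the VBScript linked above, just an illustration with a made-up function name.

```python
def utf8_sequence_length(first_byte: int) -> int:
    """Return how many bytes the UTF-8 sequence starting with first_byte occupies."""
    if first_byte < 0x80:            # 0xxxxxxx: plain ASCII, one byte
        return 1
    if 0xC2 <= first_byte <= 0xDF:   # 110xxxxx: two-byte sequence
        return 2
    if 0xE0 <= first_byte <= 0xEF:   # 1110xxxx: three-byte sequence
        return 3
    if 0xF0 <= first_byte <= 0xF4:   # 11110xxx: four-byte sequence
        return 4
    # 0x80-0xBF are continuation bytes; 0xC0, 0xC1 and 0xF5-0xFF never appear in valid UTF-8.
    raise ValueError(f"0x{first_byte:02X} cannot start a UTF-8 sequence")

# The left curly quote is E2 80 9C, and E2 announces a three-byte sequence.
print(utf8_sequence_length(0xE2))
```

So it is less like a Baudot shift character and more like a length prefix packed into the first byte.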


  #29  
Old October 8th 17, 05:57 PM posted to alt.comp.os.windows-10,alt.usage.english,alt.windows7.general
Char Jackson
external usenet poster
 
Posts: 10,449
Default Convert those dastardly curly quotes to straight quotes on Windows?

On Sat, 7 Oct 2017 23:30:52 +0000 (UTC), harry newton
wrote:

The problem is that my text editor (Gvim) isn't handling the dastardly
characters, so all I want to do is get rid of any character that any normal
text editor can't/won't/doesn't handle.


The obvious answer is to use another text editor, one that doesn't have
the problems that you object to. I use and recommend Notepad++.

The other obvious approach is to write a macro, or a series of macros if
you want a more modular approach, that you can run to fix the evils that
you see with GVIM. Most text editors include macro capability, but if
yours doesn't, you mentioned having access to Word, so in a pinch you
could do it there by using VBA.

I repeat, though, the obvious answer is to use another text editor. If
Notepad++ isn't to your liking, many of my colleagues have settled on
Ultra Edit or Textpad, so you might give those a try.

Closing thought, does GVIM let you choose a better character set, one
that includes symbols for the things that are currently not able to be
displayed?

  #30  
Old October 8th 17, 06:32 PM posted to alt.comp.os.windows-10,alt.usage.english,alt.windows7.general
Char Jackson
external usenet poster
 
Posts: 10,449
Default Convert those dastardly curly quotes to straight quotes on Windows?

On Sun, 8 Oct 2017 17:11:10 +0100, Andy Burns
wrote:

Mayayana wrote:

ASCII is standard in all uses


Except when UK users want a pound sign £ and get a hash symbol # (yes I
realise Americans may call that a pound sign)


I was working with a customer about a year ago, helping him edit the
config file for a piece of his networking gear. He wanted to add a
comment, which in that case is signified by a line starting with the "#"
symbol.

I asked him to type a pound sign. He paused, scanning his keyboard
unsuccessfully, so I helpfully added, "Shift-3". He said, "Oh! You mean
a hashtag!"

Millennials... Thanks, Twitter!

 



