#16
Convert those dastardly curly quotes to straight quotes on Windows?
On 10/7/2017 2:38 PM, harry newton wrote:
How can we convert those dastardly curly quotes to straight quotes on Windows? http://i67.tinypic.com/2h5mjbr.jpg

I like to save into TEXT files on Windows technical information cut and pasted from disjoint news articles where the unprintable curly quotes drive me nuts! Here is a screenshot of a sample cut and paste: http://i67.tinypic.com/2h5mjbr.jpg

I tried cutting from the web and pasting into MS Word and then cutting from MS Word and pasting into the text file - but the dastardly curly quotes were still there.

I tried using Google Gmail, pasting into a composition window and then hitting the "Tx" format text button, and even changing the font to some other font, but the dastardly curly quotes were still there.

Since almost every technical web site uses the dastardly curly quotes, how can I just get *rid* of them using a Windows method so that I can have a text file that contains normal quotes?

Here's just one sample but the web is filled with dastardly curly quotes! http://theverge.com/2017/10/6/16437790/iphone-8-swollen-battery-issue-apple-investigating

See http://www.fourmilab.ch/webtools/demoroniser/. This is a tool that supposedly converts Microsoft "smart" characters to HTML-compatible characters. Yes, it is 14 years old; and no, I have not tried it myself.

--
David E. Ross
http://www.rossde.com/

By allowing employers to eliminate coverage for birth control from their insurance plans, President Trump has guaranteed there will be an increase in the demand for abortions.
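For readers who would rather script the substitution than install a tool, here is a minimal Python sketch of the idea. It is not the demoroniser and not anything posted in this thread; the function name `straighten` and the choice of replacements are assumptions made for illustration, and the table is deliberately short.

```python
# Hypothetical helper: map common "smart" punctuation to ASCII stand-ins.
# The table is illustrative, not exhaustive; extend it as needed.
SMART_TO_ASCII = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark (also used as apostrophe)
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # horizontal ellipsis
    "\u00a0": " ",    # no-break space
}

# str.maketrans accepts a dict of single characters to replacement strings.
_TABLE = str.maketrans(SMART_TO_ASCII)


def straighten(text: str) -> str:
    """Return text with every character listed in SMART_TO_ASCII replaced."""
    return text.translate(_TABLE)


sample = "\u201cIt\u2019s swollen,\u201d he said\u2026"
print(straighten(sample))  # "It's swollen," he said...
```

Characters that are not in the table pass through unchanged, so anything else non-ASCII in the pasted text would still need its own entry.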
#17
Convert those dastardly curly quotes to straight quotes on Windows?
"Mayayana" on Sun, 8 Oct 2017 10:41:02 -0400
typed in alt.windows7.general the following:

"pyotr filipivich" wrote
| Microsoft is one of the worst for that problem. They write pages intended for an English-speaking audience, in English, then use just a handful of unnecessary UTF-8 characters that break the ANSI continuity. It makes no sense.
|
| IMOSHO, it makes no sense, but then it is Microsoft. Which often seem to have a lot of "I'm sure it makes sense - not to me, but to someone" elements.
|

That's a generous view. I don't see a problem with switching to UTF-8, but what MS are doing is to deliberately and unnecessarily break ASCII compatibility without any need to do so,

MS has a policy of adopting something, modifying it out of recognition, then insisting nothing else be used.

by replacing quotes and spaces with unicode characters in UTF-8. It seems to be a kind of political correctness attitude. Nearly all English pages can easily be both ASCII and UTF-8.

I wonder how journalists type those quotes. Maybe they have a software program that does the conversion?

May be the word processing package they're using, which automagically typesets on the fly.

--
pyotr filipivich
Next month's Panel: Graft - Boon or blessing?
#18
Convert those dastardly curly quotes to straight quotes on Windows?
On Sat, 7 Oct 2017 23:30:52 +0000 (UTC), harry newton wrote:

He who is Jason said on Sat, 7 Oct 2017 19:06:29 -0400:

Curly quotes (dastardly) are "normal" quotes. The straight quotes were ASCII (and EBCDIC) excuses for "real" (dastardly, curly) quotes...

Depends on which side of the fence you live on. Has the Internet Killed Curly Quotes? https://www.theatlantic.com/technology/archive/2016/12/quotation-mark-wars/511766/

But I was just using "curly quotes" as just one of maybe a dozen or more common dastardly abominations which just don't translate into text on Windows, as shown in this simple example from Butterick's Practical Typography: https://practicaltypography.com/index.html#toc

Where curly quotes are just one of many evils: https://practicaltypography.com/straight-and-curly-quotes.html

The problem is that my text editor (Gvim) isn't handling the dastardly characters, so all I want to do is get rid of any character that any normal text editor can't/won't/doesn't handle.

"Normal text editor"? I just pasted curly quotes into Notepad to be sure it handled curly quotes. It does. If yours doesn't, I suggest you change your text editor.
#19
Convert those dastardly curly quotes to straight quotes on Windows?
On Sun, 8 Oct 2017 10:17:22 +0100, "J. P. Gilliver (John)" wrote:

harry newton writes:
[]
The problem is that my text editor (Gvim) isn't handling the dastardly characters, so all I want to do is get rid of any character that any normal text editor can't/won't/doesn't handle.
[]

Of course, some would (and will) say why are you using a text editor (probably inserting the word "still", to imply you're a dinosaur),

Using a text editor doesn't mean you're a dinosaur. Some of us occasionally do things like create/modify .bat files.
#20
Convert those dastardly curly quotes to straight quotes on Windows?
pyotr filipivich writes:

"Mayayana" on Sun, 8 Oct 2017 10:41:02 -0400 typed in alt.windows7.general the following:
[]
I wonder how journalists type those quotes. Maybe they have a software program that does the conversion?

May be the word processing package they're using, which automagically typesets on the fly.

Word does that (by default - you can turn it off); if I type "Fred", it will convert the quotes into the 66 and 99 form (I think it calls them "smart quotes"). [I think it does the same with single quotes, 'Fred'.] You can stop it doing it either by turning off the setting, or on a one-off basis by doing an Undo (Ctrl-Z) immediately after typing the ".

I wouldn't be surprised if some web-page editing software behaves similarly.

--
J. P. Gilliver. UMRA: 1960/1985 MB++G()AL-IS-Ch++(p)Ar@T+H+Sh0!:`)DNAf

aibohphobia, n., The fear of palindromes.
#21
Convert those dastardly curly quotes to straight quotes on Windows?
Wolf K writes:
[]
NB: ASCII is not ANSI. Ansi is ASCII plus codes 128 to 255.

Note ANSI 171 and 187. These are diagonal quotes, equivalent to curly quotes.

As I recall it, characters 128 to 255 used to be called "extended ASCII", way back in the Dark Ages, when people could write printer

Though that name for them gained wide circulation (and I sometimes use it), I did read somewhere that it was never an official designation.

drivers by creating a list of escape codes and two- or three-byte codes that described the dot pattern in the glyph matrix.... Another mess of esoteric useless knowledge.

Hi from a fellow dinosaur ... (-:

--
J. P. Gilliver. UMRA: 1960/1985 MB++G()AL-IS-Ch++(p)Ar@T+H+Sh0!:`)DNAf

aibohphobia, n., The fear of palindromes.
#22
Convert those dastardly curly quotes to straight quotes on Windows?
Ken Blake writes:

On Sun, 8 Oct 2017 10:17:22 +0100, "J. P. Gilliver (John)" wrote:

harry newton writes:
[]
The problem is that my text editor (Gvim) isn't handling the dastardly characters, so all I want to do is get rid of any character that any normal text editor can't/won't/doesn't handle.
[]

Of course, some would (and will) say why are you using a text editor (probably inserting the word "still", to imply you're a dinosaur),

Using a text editor doesn't mean you're a dinosaur. Some of us occasionally do things like create/modify .bat files.

I said _some_ would say. I'm not one of them (-:

--
J. P. Gilliver. UMRA: 1960/1985 MB++G()AL-IS-Ch++(p)Ar@T+H+Sh0!:`)DNAf

aibohphobia, n., The fear of palindromes.
#23
Convert those dastardly curly quotes to straight quotes on Windows?
"Wolf K" wrote
| The OP wants to use a plain-text editor that only uses standard ASCII (not "extended ASCII", or codes - i. e. characters between 32 and 126 decimal [plus newline]). He hasn't said why yet, but I understand what he wants. (I was going to say "... like Notepad", but Notepad does allow so-called "Extended ASCII", i. e. one particular set of the codes up to 255.) He is hoping for something that will render such text into nearest-equivalent (such as quotes that have directional qualities all into code 34 decimal).
| []
|
| NB: ASCII is not ANSI. Ansi is ASCII plus codes 128 to 255.
|
| Note ANSI 171 and 187. These are diagonal quotes, equivalent to curly quotes.
|

171 and 187 are double chevrons. That's not the same as curly quotes. You're thinking of 147 and 148. But that's true only for the standard English codepage. If you're Russian you'll see Cyrillic characters. Something like a capital Y and an oval with a vertical line through it.

ASCII is standard in all uses and matches the same numbers in unicode. It specifies a basic western character set for byte values 0 to 127. ANSI uses a "local codepage" to define characters 128-255, while retaining the ASCII values up to 127.

The standard webpage encoding used to be Windows English codepage ANSI. (ASCII in most cases.) Now UTF-8 is more common. UTF-8 is a way to express unicode using single bytes. UTF-16, what's usually just referred to as unicode, encodes thousands of characters in 2 bytes, so each character can have its own specific encoding number in order to fit English, Russian and everything else. ASCII and ANSI use a one-byte-per-character encoding, except with a few Asian languages.

In order to internationalize the Web with minimal upset, UTF-8 became standard. It allows for encoding unicode in a one-byte system. The first 128 values are still ASCII. The second 128 are used to create values with up to 4 bytes. Thus all languages can be encoded in one system.
It's still read 1 byte at a time and most webpages don't change because most are still basically ASCII. (Whereas if we'd converted to unicode, all webpage files would have had to be converted to 2-byte encoding, making for a lot of work and doubling the size of HTML files.)

The problem comes when UTF-8 is read as ANSI. (Most text is still handled in one-byte-per-character ASCII/ANSI encoding. Even things like JPG EXIF tags and PE file import headers are ASCII/ANSI.) There might be, say, 3 characters in UTF-8 that indicate a left curly quote. I don't know exactly offhand. But it might be, say, capital A with an umlaut, a 1/4 sign, and a Euro sign. In the browser it's a left curly quote. In Notepad it shows up as 3 wacky characters. The two programs are interpreting the bytes by different standards. So the text is corrupted. And that's just in English.

A browser reading the UTF-8 can display it properly and in most cases will "sniff" the page to identify it even if the HTML code does not specify. But when that single-byte text is pasted to ANSI you see the ANSI characters. You might see the Euro. A Russian will see something else. A Greek will see a third thing.

What Harry is asking for is a simple way to convert UTF-8 to ANSI using the standard English codepage. That requires converting the string by parsing the bytes. When the parser encounters bytes of 127+ it would need to decide how to treat them. Is it an ANSI non-breaking space, character 160 in English? Or is byte 160 one of 2, 3, or 4 bytes, together indicating a character in UTF-8? If it turns out to be, say, 3 bytes that render a left curly quote in UTF-8, some kind of filter has to recognize that exact pattern and say, "Oh, that's a quote. We'll just substitute character 34 for those 3 bytes." So Harry's solution has to treat each specific UTF-8 character and decide what to substitute. It's not a 1-to-1 correspondence.
In other words, Notepad already translated the UTF-8 to ANSI, but now it has to be transliterated.

If those quotes were written as character 34 in the first place then the encoding would not matter. Everyone would see ", because " is in the ASCII range.

Whiskers made an interesting point that I wasn't aware of: The page he links says that MS Office products have an option for fancy characters like curly quotes. Maybe that helps explain why so many of them are on webpages. MS Office users are among the most parochial of all computer users. They're usually not tech-literate but are computer-literate. The result is millions of people who equate their computer with MS Office and assume the whole world also uses MS Office. They're the people who send emails from Word or send a 60,000 byte DOC file to communicate 1 sentence of 24 bytes. Many of those same people are probably also creating webpages from MS Word, oblivious of the travesty.
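The byte values discussed in this post can be checked directly. The following Python sketch is only an illustration; the helper `describe` is a made-up name, and the codec names (`cp1252` for the English Windows codepage, `cp1251` for Windows Cyrillic, `cp866` for DOS Cyrillic) are Python's, not anything taken from the thread. It prints Unicode character names rather than the characters themselves so that the output survives any console encoding.

```python
import unicodedata


def describe(text: str) -> list:
    """Return the Unicode name of each character in text."""
    return [unicodedata.name(ch, "UNNAMED") for ch in text]


# Bytes 147/148 and 171/187 in the English Windows codepage.
print(describe(bytes([147, 148]).decode("cp1252")))
# ['LEFT DOUBLE QUOTATION MARK', 'RIGHT DOUBLE QUOTATION MARK']
print(describe(bytes([171, 187]).decode("cp1252")))
# ['LEFT-POINTING DOUBLE ANGLE QUOTATION MARK', 'RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK']

# The same two quote bytes under a DOS Cyrillic codepage.
print(describe(bytes([147, 148]).decode("cp866")))
# ['CYRILLIC CAPITAL LETTER U', 'CYRILLIC CAPITAL LETTER EF']

# A left curly quote encoded as UTF-8, then misread as cp1252.
left_quote = "\u201c"
utf8_bytes = left_quote.encode("utf-8")
print(list(utf8_bytes))  # [226, 128, 156]

mojibake = utf8_bytes.decode("cp1252")
print(describe(mojibake))
# ['LATIN SMALL LETTER A WITH CIRCUMFLEX', 'EURO SIGN', 'LATIN SMALL LIGATURE OE']

# Reversing the wrong decode recovers the original character.
repaired = mojibake.encode("cp1252").decode("utf-8")
print(repaired == left_quote)  # True
```

Under Python's codecs the "3 wacky characters" come out as a-circumflex, the Euro sign and the oe ligature. The Cyrillic letters appear with `cp866`; with `cp1251` bytes 147 and 148 decode to the same curly quotes as in `cp1252`.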
#24
Convert those dastardly curly quotes to straight quotes on Windows?
Ken Blake writes:
[]
The problem is that my text editor (Gvim) isn't handling the dastardly characters, so all I want to do is get rid of any character that any normal text editor can't/won't/doesn't handle.

"Normal text editor"? I just pasted curly quotes into Notepad to be sure it handled curly quotes. It does. If yours doesn't, I suggest you change your text editor.

The distinction is blurred. To some people, a text editor is something that doesn't do formatting, bold, italic, underlined, fonts, etcetera (and thus NotePad is one such); to other people, it is one that only works with ASCII codes 32 to 126 plus newline. There _are_ places where only the latter is valid. (Headerless usenet, for example, though ANSI characters _usually_ get through that unaltered.)

--
J. P. Gilliver. UMRA: 1960/1985 MB++G()AL-IS-Ch++(p)Ar@T+H+Sh0!:`)DNAf

aibohphobia, n., The fear of palindromes.
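The stricter definition above (codes 32 to 126 plus newline) is easy to test for. This is a rough Python sketch, with `non_plain_chars` as a hypothetical name, that lists every character in a string falling outside that range so one can see what would need replacing.

```python
import unicodedata


def non_plain_chars(text: str):
    """Yield (index, char, name) for each character outside ASCII 32..126 and newline."""
    for i, ch in enumerate(text):
        if ch == "\n" or 32 <= ord(ch) <= 126:
            continue
        # Control characters such as tab have no Unicode name, hence the default.
        yield i, ch, unicodedata.name(ch, "UNNAMED")


pasted = "He said \u201chello\u201d\n"
for index, ch, name in non_plain_chars(pasted):
    print(index, hex(ord(ch)), name)
# 8 0x201c LEFT DOUBLE QUOTATION MARK
# 14 0x201d RIGHT DOUBLE QUOTATION MARK
```

Note that this check treats tab and carriage return as offenders too, since they are outside 32 to 126; whether that is wanted depends on the destination.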
#25
Convert those dastardly curly quotes to straight quotes on Windows?
Mayayana wrote:

ASCII is standard in all uses

Except when UK users want a pound sign £ and get a hash symbol # (yes I realise Americans may call that a pound sign)
#26
Convert those dastardly curly quotes to straight quotes on Windows?
Mayayana writes:
[]
ASCII is standard in all uses and matches the same numbers in unicode. It specifies a basic western character set for byte values 0 to 127. ANSI uses a "local codepage" to

If we're strictly talking about _characters_, it's 32 to 126 (-:

[]
In order to internationalize the Web with minimal upset, UTF-8 became standard. It allows for encoding unicode in a one-byte system. The first 128 values are still ASCII. The second 128 are used to create values with up to 4 bytes. Thus all languages can be encoded in one system.

How does the receiving (and thus decoding) software know whether it's 2, 3, or 4 bytes - are three of the 128 beyond 127 reserved as meaning the next one, two, or three bytes are part of the same character (somewhat like, but also unlike, the shift characters in Baudot 5-bit code)?

[]
What Harry is asking for is a simple way to convert UTF-8 to ANSI using the standard English codepage. That requires converting the string by parsing the bytes. When the parser encounters bytes of 127+ it would need to decide how to treat them. Is it an ANSI non-breaking space, character 160 in English? Or is byte 160 one of 2, 3, or 4 bytes, together indicating a character in UTF-8? If it turns out to be, say, 3 bytes that render a left curly quote in UTF-8, some kind of filter has to recognize that exact pattern and say, "Oh, that's a quote. We'll just substitute character 34 for those 3 bytes." So Harry's solution has to treat each specific UTF-8 character and decide what to substitute. It's not a 1-to-1 correspondence. In other words, Notepad already translated the UTF-8 to ANSI, but now it has to be transliterated.

Yes, it's not _that_ simple, though a many-to-1 (well, to-94) mapping ought not to be impossible.

If those quotes were written as character 34 in the first place then the encoding would not matter. Everyone would see ", because " is in the ASCII range.
Whiskers made an interesting point that I wasn't aware of: The page he links says that MS Office products have an option for fancy characters like curly quotes. Maybe that helps explain why so many of them are on webpages. MS Office users are among the most parochial of all computer users. They're usually not tech-literate but are computer-literate. The result is millions of people who equate their computer with MS Office and assume the whole world also uses MS Office. They're the people who send emails from Word or send a 60,000 byte DOC file to communicate 1 sentence of 24 bytes. Many of those same people are probably also creating webpages from MS Word, oblivious of the travesty.

Yes, that's probably the cause.

--
J. P. Gilliver. UMRA: 1960/1985 MB++G()AL-IS-Ch++(p)Ar@T+H+Sh0!:`)DNAf

aibohphobia, n., The fear of palindromes.
#27
Convert those dastardly curly quotes to straight quotes on Windows?
"Wolf K" wrote
| ANSI = ASCII plus 128 to 255. Most ANSI codes have Unicode counterparts.
|
| See
| http://ascii-table.com/ansi-codes.php
|

Answered in another post. Note that page you linked explains that it's showing codepage 1252. The standard Windows English codepage. That only holds if your local language is set to English.

The whole thing gets very complicated. I tried to clarify it in my other post. In brief, ASCII is the same for everyone and deals with representing characters with single byte values from 0-127. ANSI adds the rest of the byte - 128-255. But the character represented depends on the local codepage. Russian ANSI text is not the same as English ANSI text and Turkish will be different yet again.

Unicode, as Windows uses it (UTF-16), takes 2 bytes for most characters. A fundamentally different way to encode characters. It allows for characters in Russian, Turkish, etc to all have their own unique numeric values. UTF-8, which is how most webpages are now encoded, is a one-byte encoding that uses 1-4 bytes to represent all of the unicode set. (One byte in this case means that each byte is read as a signifier while in normal unicode 2 bytes at a time are read as a signifier.)

So... "a" is 97 in ASCII. It's 97 in ANSI. It's 97 in UTF-8. All one byte. In unicode it's byte 0 followed by byte 97.

But curly quotes are not in ASCII. In the English ANSI codepage they're 147 and 148. But not in other codepages. In unicode they're 8220 and 8221. 8220 would be represented by byte values 32 and 28. 2 bytes for the single character, read as a single, 2-byte numeric value. In UTF-8 encoding the left curly quote is rendered with bytes 226-128-156 (hex E2 80 9C). It's not a 3-byte number. It's a pattern of 3 1-byte numbers.
If you download this webpage and look at the bytes in a hex editor you can see: https://www.dwheeler.com/essays/quotes-test-utf-8.html

The page also shows how it's possible to use non-standard characters in standard ASCII HTML by using HTML encoding: an entity such as &ldquo; will render as a left curly quote, regardless of language.
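The numbers quoted in this post can be reproduced with a few lines of Python. This is a sketch for checking, not part of the original reply; `byte_values` is a hypothetical helper, `utf-16-be` is used so that the high byte comes first as in the "32 and 28" example, and the two entity spellings are the standard named and numeric HTML forms for the left curly quote.

```python
import html


def byte_values(ch: str) -> dict:
    """Map codec name to the list of byte values for ch (None if not encodable)."""
    result = {}
    for codec in ("ascii", "cp1252", "utf-16-be", "utf-8"):
        try:
            result[codec] = list(ch.encode(codec))
        except UnicodeEncodeError:
            result[codec] = None
    return result


print(byte_values("a"))
# {'ascii': [97], 'cp1252': [97], 'utf-16-be': [0, 97], 'utf-8': [97]}

print(byte_values("\u201c"))
# {'ascii': None, 'cp1252': [147], 'utf-16-be': [32, 28], 'utf-8': [226, 128, 156]}

# HTML character references are pure ASCII and decode to the same code point.
print(ord(html.unescape("&ldquo;")))  # 8220
print(ord(html.unescape("&#8220;")))  # 8220
```

The `None` under `ascii` for the curly quote is the whole problem in miniature: there is simply no ASCII byte for that character.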
#28
Convert those dastardly curly quotes to straight quotes on Windows?
"J. P. Gilliver (John)" wrote
| ASCII is standard in all uses and matches the same numbers in unicode. It specifies a basic western character set for byte values 0 to 127. ANSI uses a "local codepage" to
|
| If we're strictly talking about _characters_, it's 32 to 126 (-:

Ms. Line Return and Mr. Null might take offense to that. In parsing they're all characters. Even a null. I sometimes use character 1 as a marker in text programmatically because it's neutral. It means nothing in English, formatting, etc but it still acts as a character.

| How does the receiving (and thus decoding) software know whether it's 2, 3, or 4 bytes - are three of the 128 beyond 127 reserved as meaning the next one, two, or three bytes are part of the same character (somewhat like, but also unlike, the shift characters in Baudot 5-bit code)?

There are webpages about that. I actually wrote some VBScript code for it awhile back, but now I've forgotten. It's a pain in the neck. http://www.jsware.net/jsware/scrfiles.php5#u2a
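On the question of how a decoder knows the sequence length: in UTF-8 the first byte of each character carries that information in its leading bits, and the bytes that follow all start with the bits 10. The Python sketch below illustrates that rule; it is not the VBScript mentioned above, and the function names are made up for the example. It does no validation of the continuation bytes.

```python
def utf8_sequence_length(lead: int) -> int:
    """Return how many bytes the UTF-8 sequence starting with byte value lead occupies."""
    if lead < 0x80:            # 0xxxxxxx: plain ASCII, one byte
        return 1
    if 0xC2 <= lead <= 0xDF:   # 110xxxxx: start of a two-byte sequence
        return 2
    if 0xE0 <= lead <= 0xEF:   # 1110xxxx: start of a three-byte sequence
        return 3
    if 0xF0 <= lead <= 0xF4:   # 11110xxx: start of a four-byte sequence
        return 4
    # 0x80..0xBF are continuation bytes (10xxxxxx); the rest are never valid leads.
    raise ValueError(f"byte {lead:#04x} cannot start a UTF-8 sequence")


def split_utf8(data: bytes) -> list:
    """Split UTF-8 data into one bytes object per character, using lead bytes only."""
    chunks = []
    i = 0
    while i < len(data):
        n = utf8_sequence_length(data[i])
        chunks.append(data[i:i + n])
        i += n
    return chunks


print(split_utf8("a\u201c\u00a3".encode("utf-8")))
# [b'a', b'\xe2\x80\x9c', b'\xc2\xa3']
```

So rather than three reserved shift codes, whole ranges of byte values are reserved: one range announces a two-byte character, another a three-byte character, another a four-byte character, and a fourth range is used only for the trailing bytes.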
#29
Convert those dastardly curly quotes to straight quotes on Windows?
On Sat, 7 Oct 2017 23:30:52 +0000 (UTC), harry newton wrote:

The problem is that my text editor (Gvim) isn't handling the dastardly characters, so all I want to do is get rid of any character that any normal text editor can't/won't/doesn't handle.

The obvious answer is to use another text editor, one that doesn't have the problems that you object to. I use and recommend Notepad++.

The other obvious approach is to write a macro, or a series of macros if you want a more modular approach, that you can run to fix the evils that you see with GVIM. Most text editors include macro capability, but if yours doesn't, you mentioned having access to Word, so in a pinch you could do it there by using VBA.

I repeat, though, the obvious answer is to use another text editor. If Notepad++ isn't to your liking, many of my colleagues have settled on Ultra Edit or Textpad, so you might give those a try.

Closing thought, does GVIM let you choose a better character set, one that includes symbols for the things that are currently not able to be displayed?
#30
Convert those dastardly curly quotes to straight quotes on Windows?
On Sun, 8 Oct 2017 17:11:10 +0100, Andy Burns wrote:

Mayayana wrote:

ASCII is standard in all uses

Except when UK users want a pound sign £ and get a hash symbol # (yes I realise Americans may call that a pound sign)

I was working with a customer about a year ago, helping him edit the config file for a piece of his networking gear. He wanted to add a comment, which in that case is signified by a line starting with the "#" symbol. I asked him to type a pound sign. He paused, scanning his keyboard unsuccessfully, so I helpfully added, "Shift-3". He said, "Oh! You mean a hashtag!"

Millennials... Thanks, Twitter!