Convert those dastardly curly quotes to straight quotes on Windows?

**Mayayana**

"harry newton" wrote

| What did I do wrongly?

You pasted into Notepad and saved as UTF-8, but
then you opened it in VIM, which is rendering ANSI.
Try opening the same file in Notepad.

I know you don't want to hear this, but it's
complicated.

If you save text as UTF-8 you solve the problem,
but then you need to read/edit as UTF-8 and need a
program that recognizes it. Suddenly you're dealing
with a number of plain text formats.
Notice the first 3 weird marks in your VIM screenshot?
Those are the UTF-8 marker characters, bytes EF BB BF,
that tell the editor it's UTF-8. Vim doesn't see them.
It's just rendering them as characters.
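
For anyone who wants to check a file for that marker outside an editor, here is a minimal Python sketch. Python is not used anywhere in this thread, and the helper name `strip_utf8_bom` is made up for illustration. It shows the three marker bytes, how an ANSI (Windows-1252) viewer would draw them, and the data with the marker removed.

```python
import codecs

def strip_utf8_bom(data):
    """Return data without a leading UTF-8 byte-order mark, if one is present."""
    if data.startswith(codecs.BOM_UTF8):  # the bytes EF BB BF
        return data[len(codecs.BOM_UTF8):]
    return data

# Hypothetical sample: what Notepad writes when saving "hello" as UTF-8 with a marker.
sample = codecs.BOM_UTF8 + "hello".encode("utf-8")

print(sample[:3].hex(" "))          # the three marker bytes in hex
print(sample.decode("cp1252")[:3])  # how an ANSI viewer renders those bytes
print(strip_utf8_bom(sample))       # the text bytes with the marker stripped
```

Python's built-in "utf-8-sig" codec does the same stripping when decoding, which is usually the simpler route when reading a whole file.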

But your sample provides a good example of what you're
up against. There are more UTF-8 characters in that text
than just curly quotes. That's why I was saying that it's not
as simple as a basic replacement. The number and variety of
UTF-8 characters in copied text is an unknown. If you just
need to do a replace on curly quotes that's doable, but any
more than that may take time.

You can try my scripts, which I linked to yesterday:

http://www.jsware.net/jsware/scrfiles.php5#u2a

Even that's not a totally simple solution because UTF-8
does not entirely translate to ANSI. I explained it in the
readme file. One script translates UTF-8 characters that
match English ANSI characters. It's basically what you want.
Another converts the text to unicode and then converts
that to ANSI. The 3rd script copies webpage text and
converts the encoding to English codepage using ADODB.
In many cases UTF-8 - ANSI works fine. But it's like
translating between dialects. It's not just a simple case of
correcting byte values.
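
To illustrate why the translation is lossy, here is a small Python sketch. It is not one of the scripts linked above, and the name `to_ansi` is hypothetical. It assumes "ANSI" means the Windows-1252 codepage. Characters that exist in that codepage get a single byte; anything else is replaced with a question mark.

```python
def to_ansi(text, codepage="cp1252"):
    """Encode text to a single-byte codepage; unmappable characters become '?'."""
    return text.encode(codepage, errors="replace")

# Curly quotes have Windows-1252 equivalents (bytes 0x93 and 0x94).
print(to_ansi("\u201cquoted\u201d"))

# A character outside the codepage cannot be carried over at all.
print(to_ansi("\U0001F4A9"))
```

Using errors="strict" instead would raise an exception on the first unmappable character, which is a reasonable choice if silent loss is not acceptable.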

I recently read that
one can now depict a pile of **** emoji in unicode. It's
been assigned a unicode code point! But there's no pile
of **** character in the English ANSI codepage. So it
won't translate.
..... And that gets into a whole other kettle of fish that
I mentioned yesterday: Your translations will be limited not
only by your codepage but also by fonts. Most fonts don't
have a pile of **** character. Most fonts don't cover much
if any of the unicode range. That's why you get the boxes.
You're using a primitive console font in Vim that's very limited
in its coverage. Fine if you're just hanging around coding
with your Babylonian friends, but inadequate to handle text
from a "modern" webpage.

**Mayayana**

"harry newton" wrote

| I've installed Notepad++ but I don't know anything about it yet.

It's a basic coding editor. Like Vim, it's primarily
designed to provide garden variety coding features,
like line numbers and syntax highlighting. But it's
solidly built and includes most extras you might want.
There's a dedicated menu just for encoding options.

**harry newton**

He who is J. P. Gilliver (John) said on Mon, 9 Oct 2017 13:41:49 +0100:

Does it accept the Alt-plus-NumPad method of entering characters (in
Windows)?

Check this screenshot out I just made for you!
http://wetakepic.com/images/2017/10/09/digraphs.jpg

I know more now than I did yesterday, so I can say that this works in vi:
control + q + 147 == adds an opening curly quote unprintable character
control + q + 148 == adds a closing curly quote unprintable character
See also: http://vim.wikia.com/wiki/Entering_special_characters

Try this in Notepad first, to see how it works:
0. Ensure the Num Lock light is on.
1. Press _and hold down_ the Alt key.
2. Type some digits - on the numeric pad, _not_ the top row of keys.
3. Release the Alt key.
(It's a lot smoother than it sounds.)

In vi, this process of [Numlock]Alt + {0,1,2,...9} didn't seem to do
anything differently than if I had used the top row of number keys.

That is, it typed in the numbers.

Sad, I know, but I had actually memorised the codes for several
characters I used a lot: 0177 IIRC for plus-or-minus ±, 0215 for
multiply ×, 0176 and 0186 for degrees and raised o ° º (though I can
never remember which is which), 230 for micro μ. (Note that a lot of
those start with a zero, which _is_ needed.)

Look at this, which I didn't know VIM could do until now!
http://wetakepic.com/images/2017/10/09/digraphs.jpg

Typing the following digraphs all worked just fine in VIM:
control + q + 177 == resulted in a single +/- (±) character
control + q + 215 == resulted in a multiply (×) character
control + q + 176 == resulted in a degree (°) character
control + q + 186 == resulted in an underscore-degree (º) character
control + q + 230 == resulted in a single ae (æ) character
:digraphs (will list all of them in vim)
The zero wasn't needed as is explained here in section 12:32

[This is fine if you _have_ a numeric pad. Since my default machine is
this netbook which doesn't, I now use Allchars which generates them by
two-character sequences that are fairly intuitive - +- for the
plus-or-minus, "o for o umlaut ö, and so on.]

Now that I know something about the digraphs, I have no problem entering
them into VI (using control+q or control+k and then the decimal value).

That means I now have (almost) no problem *removing* and/or *replacing*
them in vi, simply because the digraphs (what a funny name) are easy to
type into the search and replace command.

So, with the use of "digraphs", the problem has been instantly solved!

Anyway, this might be a way to get these characters into your vi s/x/y/
string, since you now know their codes; try it. (Try with and without
the leading zero.)

I have no problem now entering the digraphs [147] and [148] for the curly
quotes. I also have no problem *identifying* the digraphs for anything else
since simply sidling up to the character and pressing "ga" will inform me
of the digraph's digital, hex, and octal value in vim.

Once I have the digraph value, I can enter it into the search command.
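
For comparison outside Vim, here is a rough Python sketch of the same lookup. The function name `describe_cp1252` is invented for this example, and it assumes the text is being viewed as Windows-1252, which is where the values 147 and 148 come from. It prints a character's byte value in decimal, hex, and octal, loosely in the style of "ga".

```python
def describe_cp1252(ch):
    """Report a character's Windows-1252 byte value in decimal, hex, and octal."""
    byte = ch.encode("cp1252")[0]  # raises UnicodeEncodeError if not in cp1252
    return f"{byte}, Hex {byte:02x}, Octal {byte:03o}"

# Opening and closing curly double quotes (U+201C and U+201D).
for ch in ("\u201c", "\u201d"):
    print(describe_cp1252(ch))
```

Note that these are codepage byte values, not Unicode code points; in a UTF-8 session the same characters report as 8220 and 8221.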

Everyone has an opinion; everyone has a
favorite, a certain one they absolutely swear by."

Including you of course (-:

The beauty of choosing the most well known of all command-line text editors
is that my choice can't be wrong since everyone has to compare to it.

That generally means two things in canonical freeware:
1. The canonical freeware generally does almost *everything*
2. Anyone copying it has to compare to the canonical freeware

In general, since #1 exists, they copy in #2 by eliminating the faults
(e.g., a better GUI or a faster GUI, or whatever) but not in functionality
for if they had better functionality, they'd be the canonical freeware and
not the copy.

See previous post, that just _perhaps_ you've got "muscle memory" for
some of the _in_efficiencies of vi? Not that I'm suggesting you change -
I was the same over Windows 98 (and still use XP in preference to 7).

Well, to support your point, I admit it's not intuitive that a search and
replace of a to b is the set of instructions:
:s/a/b
which means:
: = begin a command
s/ = search for
a = a
/b = and replace it with b

But those commands are run ten thousand times a year, so, it's not hard for
the muscles to remember them.

However, the great news is that a search and replace of the curly quotes
uses the *same* method - so all I needed to learn was what digraphs were!

https://www.merriam-webster.com/dictionary/digraph
"a group of two successive letters whose phonetic value is a single sound
(such as ea in bread or ng in sing)"

I don't know everything there is to know about freeware.
I'm always learning.
But less open than you once were. (I know; I am too.)

The thing about freeware, which I probably know as well as anyone here
does, is that the *price* of freeware is in the immense time and energy to
find and use the best freeware.

For example, I've spent entire nights trying to get Shotcut to do exactly
what I want it to do. It's not that Shotcut freeware can't do what I want
it to do - it's that I don't know the exact steps yet to make Shotcut do
what I want it to do.

Once you go through that process, when someone suggests another freeware,
e.g., VirtualDub, you look at it for just a few seconds and see the huge
faults, and you drop it based on that "hunch".

With freeware, to cut the immense cost of studying each one (we don't get
paid to evaluate freeware - because if I did - I would LOVE that job), we
have to go on hunches just like an HR rep does when evaluating a stack of
incoming resumes.

So maybe Notepad is far more efficient than vi is.

Now, now - nobody ever claimed that.

OK. You're the second person to chastise me for that; but I only belatedly
realized people were talking about Notepad++ which is nothing like Notepad.

I've installed Notepad++ to test.

But then why can't I find Notepad yet on any list of the most efficient
text editors (at least not in my cursory search just now)?
Probably (a) because it clearly isn't (b) the sort of people who vote on
such things may include a lot of vi/emacs enthusiasts. (Actually, I
doubt there's anything as democratic as a _vote_ anyway: virtually all
such lists will be one person's view.)

I disagree.
There will almost always be "canonical" software that does a certain job
where *most* people will agree and *all* people will have heard of it such
that *every* competitor *must* compare against the canonical software.

My humble opinion is that if vi isn't in any given list of "canonical" text
editors, then it's not a list of canonical text editors.

Back on topic ... since the problem is solved, I'll write up a summary so
that others can jump to the chase, which increases our tribal knowledge
overall.
http://wetakepic.com/images/2017/10/09/digraphs.jpg

**harry newton**

He who is Mayayana said on Mon, 9 Oct 2017 10:13:17 -0400:

And that works? Your link is talking about ANSI replacements,
replacing 147 and 148 with 34. It's not common for people to be
using those in ANSI text because they'll be corrupted in a non-
English codepage. Since you're copying from webpages I'm
assuming that you're dealing with UTF-8. But if your method
works then....

Everything seems to be working now in VIM once you guys helped me figure
out how to determine the special character hex/octal/decimal value, which
was to simply type "ga" with the character selected.

Then it was an easy matter to search and replace using standard syntax:
:s/a/b = which means search for "a" and replace it with "b"

I had trouble with the multiple search & replace syntax though:
:s/[a,b]/c = which means search for a and/or b & replace it with c.

But I just switched to a different syntax and it worked:
%d147 & %d148 = this is one way to use the digraph in a command
control+q+147 & control+q+148 = this is another way to use digraphs

The first syntax didn't work in the multiple search and replace command,
but the second syntax did work.

FAILED:
:s/[x,y]/z/g == works fine to replace x and/or y with z throughout the file
:s/[\%d147,\%d148]/"/g == fails to replace for every line of the file

Likewise:
:s/x\|y/z/g == works fine to replace x and/or y with z throughout the file
:s/\%d147\|\%d148/"/g == fails to replace for every line of the file

WORKED:
:%s/[control+q+147,control+q+148]/"/g == replaces both with straight quotes

**Paul[_32_]**

Mayayana wrote:
"harry newton" wrote

| Yikes. Is my name in the MS Word document I already uploaded?
| http://www.filedropper.com/fixquote
|
Beats me. The site won't let me download without an account.
Are you really serious? You need to get a less obnoxious place to
store files. If you don't have anything like your own website or
private space, you can probably use dropbox. I find that links
from there work fine for direct download if I just change the
parameter dl=0 to dl=1.

| Is my name in my Word file?
|

If it's a DOCX rename it to .ZIP and open it. Extract
all the files, then use that famous VIM jack-of-all-trades
to search for your name. You might also do a binary
search in a hex editor. But be warned: DOCX is a bloated
mess. I managed to find a DOCX on my system and looked at
it just now. No less than 10 XML files are contained in it.
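
The rename-and-search step can also be scripted. Below is a minimal Python sketch under a few assumptions: the helper name `find_in_docx` is made up, the demo archive is an in-memory stand-in rather than a real Word file, and the search is a plain byte match. Python's zipfile module does not care about the file extension, so no rename to .ZIP is needed.

```python
import io
import zipfile

def find_in_docx(source, needle):
    """Return names of archive members whose raw bytes contain needle (UTF-8)."""
    target = needle.encode("utf-8")
    with zipfile.ZipFile(source) as archive:
        return [name for name in archive.namelist()
                if target in archive.read(name)]

# Demo on an in-memory stand-in for a .docx (a real file path works too).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("docProps/core.xml", "<dc:creator>Jane Doe</dc:creator>")
    archive.writestr("word/document.xml", "<w:t>Hello</w:t>")

print(find_in_docx(buffer, "Jane Doe"))
```

A byte match can miss a name that Word has split across several XML runs in the document body, so an empty result is not proof that the name is absent.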

In a DOC file the whole thing is stored as semi-binary.
The actual text may be ANSI or unicode. They set it up
like a binary file, with sections. (It's a general type that
Microsoft refer to as a compound storage file.) I think
the personal data comes near the end and may include
local paths. I just opened an old Microsoft EULA DOC
in HxD hex editor and found this that doesn't appear in
the doc itself:
Christina Olson... Microsoft Word 10.

I've never used MS Office and only use Libre Office as
necessary for business contracts and such. So I'm not
familiar with how one cleans tracks.

I didn't have a problem downloading from Filedropper.
You probably needed to have scripts turned on :-)

*******

One difference between .docx and .doc, is the .doc format
used to have a problem with "tips&tails". When writing out
a .doc, the very end of the document might be rounded
to the nearest sector, and the parts of the sector not
needed, were filled with crap. And that crap might have
personally identifiable materials (leakage). If you
sent the .doc to someone as an attachment, the non-valid
end of the file might be included.

The difference with .docx, is you might not notice. Since
it's a ZIP format, you might not notice binary placed
after the end of file. I doubt DOCX has this problem,
but it would be pretty hard to tell with just a hex editor.

The above .docx download, seems to have no encrypted content, so
I have to assume the document was actually empty. Microsoft
likes to apply a default encryption to the content, even
when a password is not set on a document. The "password"
on the default encryption is "VelvetSweatshop". You can
use that keyword in a Google search, to get more info.

This is why naively dropping a Microsoft document onto
your hex editor and "looking for the owners name", isn't
going to work. Lots of stuff you might want to look at,
has had 50,000 or 100,000 passes of an encryption method
applied to it. That's part of the reason a document
might "look like binary". Hardly any useful text to be
seen.

I tried my hand at tracing the handling of such, in the
LibreOffice source code (which should have an implementation)
but gave up after a while. My shovel wasn't big enough.
There is a handling path for virtually every version of
Word that ever existed. There are a couple 500 page specs
to look at. And so on. Need a big shovel...

One of the funny bits about the 500 page spec, is the
word "VelvetSweatshop" cannot be found by doing a
document search. To hide their embarrassment, Microsoft
shows the password as a series of individual characters,
as in "the first letter in the password is 0x56". I thought
that was pretty clever. You would not want that to be
easy to find, now would you.

But since the LibreOffice designers figured it out (with
the help of the available MS documentation), a Black Hat
could figure it out too. That's if there actually was some
info to be had. In something other than the Properties
the author sets in the document, claiming ownership.

I was hoping, by being inside the LibreOffice engine,
I could find a buffer with plaintext in it. That was
my basic plan.

Paul

**Mayayana**

"Paul" wrote

| Beats me. The site won't let me download without an account.
| I didn't have a problem downloading from Filedropper.
| You probably needed to have scripts turned on :-)
|
I assumed that much. I'm not about to allow a
site to run javascript just to download a file. But in
the page code, the script that makes the download
option appear after clicking the captcha button is
this:

input type="button"
onClick="window.location='http://www.filedropper.com/createaccount.php'"
It didn't do anything for me because it was specifically
designed to malfunction without script. At any rate, his later
link was fine, to sendspace.com.

| This is why naively dropping a Microsoft document onto
| your hex editor and "looking for the owners name", isn't
| going to work.

It might not work in docx but it works fine in doc.
I was only able to find 2 docx on my system. Both were
from MS docs downloads. I opened one and it's all in plain
text, if you can call XML plain text. So I don't know what
you mean about encryption. But I didn't find much revealing
info. Core.xml looks like it can contain such info, though.
Here's the closest thing to private data I found:

<dc:creator>mon11</dc:creator>
<cp:keywords></cp:keywords>
<dc:description></dc:description>
<cp:lastModifiedBy>mon11</cp:lastModifiedBy>
<cp:revision>44</cp:revision>
<dcterms:created xsi:type="dcterms:W3CDTF">2009-05-27T09:25:00Z</dcterms:created>
<dcterms:modified xsi:type="dcterms:W3CDTF">2009-05-29T08:18:00Z</dcterms:modified>
</cp:coreProperties>

Not very racy.
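
For pulling just the name fields out of core.xml, here is a short Python sketch. The function name `core_authors` and the trimmed sample document are illustrative assumptions; the two namespace URIs are the standard ones for OOXML core properties and Dublin Core.

```python
import xml.etree.ElementTree as ET

NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}

def core_authors(core_xml):
    """Extract the creator and last-modified-by names from core.xml bytes."""
    root = ET.fromstring(core_xml)
    return {
        "creator": root.findtext("dc:creator", default="", namespaces=NS),
        "lastModifiedBy": root.findtext("cp:lastModifiedBy", default="", namespaces=NS),
    }

# Trimmed-down sample modeled on the fragment quoted above.
SAMPLE = (
    b'<cp:coreProperties'
    b' xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"'
    b' xmlns:dc="http://purl.org/dc/elements/1.1/">'
    b'<dc:creator>mon11</dc:creator>'
    b'<cp:lastModifiedBy>mon11</cp:lastModifiedBy>'
    b'</cp:coreProperties>'
)

print(core_authors(SAMPLE))
```

In a real .docx this part normally lives at docProps/core.xml inside the archive.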

| One of the funny bits about the 500 page spec, is the
| word "VelvetSweatshop" cannot be found by doing a
| document search.

Are you sure you're getting out enough, Paul?
There's something about looking for "VelvetSweatshop"
in a Word doc that seems a bit, well, unseemly.

**harry newton**

He who is harry newton said on Sat, 7 Oct 2017 21:38:03 +0000 (UTC):

How can we convert those dastardly curly quotes to straight quotes on Windows?
http://i67.tinypic.com/2h5mjbr.jpg

SOLVED!
http://wetakepic.com/images/2017/10/09/samplede854.jpg

The problem statement (to summarize the thread):
Cut-and-pasted text from web pages has unprintable characters in vi.
The solution (to summarize the tribal knowledge gained & leveraged):
a. Learn how to identify the non-printable digraphs encountered
b. Learn how to replace them with desired printable characters
http://wetakepic.com/images/2017/10/09/fixquote3de984.jpg

SUMMARY:
I learned a new word today, which is "digraphs", defined in VIM as:
:digraphs http://wetakepic.com/images/2017/10/09/digraphs.jpg

Whenever I cut-and-paste from web pages such as this one:
https://practicaltypography.com/straight-and-curly-quotes.html
Certain digraphs appeared as black boxes in my vim pure text editor.

We tried to fix the problem using Microsoft Word & Notepad but they failed!
http://wetakepic.com/images/2017/10/09/fix4deee6.jpg

We first went off on the wrong track trying to use "smart quotes" in Word:
http://wetakepic.com/images/2017/10/09/smartquotesff622.jpg

Then we went on the wrong track of saving as "pure text" in Word & Notepad:
http://wetakepic.com/images/2017/10/09/fix1.jpg

Then we went on another wrong track of character encoding:
http://wetakepic.com/images/2017/10/09/fix3aa9bb.jpg

But the right track was hiding in plain sight, all along, in gvim.
http://wetakepic.com/images/2017/10/09/fixquote32fc33.jpg

I knew how to do a search & replace in gvim (as does everyone); but I had
entered in the backslash thinking that the backslash delimiter was needed
to delimit the digraph's leading percent sign which I also had thought was
needed!

In short, this is a normal search and replace for multiple items in gvim:
:%s/[x,y]/z/g = search for x and/or y, and replace all with z

The "ga" command tells me the following decimal values in vim:
'66' smartquote is decimal 147 (aka %d147)
'99' smartquote is decimal 148 (aka %d148)
http://wetakepic.com/images/2017/10/09/hex.jpg

Therefore, this worked for entering the curly quote digraphs in vi:
:%s/\%d147/"/g == replaces all '66' curly quotes with straight quotes
:%s/\%d148/"/g == replaces all '99' curly quotes with straight quotes

And this worked for entering the curly quote digraphs in pairs in vi:
:%s/[control+q+147,control+q+148]/"/g == for Windows
:%s/[control+v+147,control+v+148]/"/g == for Linux

But this did not work to enter the digraphs in pairs in vi:
:%s/[\%d147,\%d148]/"/g == fails to replace for every line of the file

I didn't realize the digraph percent sign was not always needed in vi:
:%s/[\d147,\d148]/"/g

So the syntax problem of replacing curly quotes in gvim is now solved!
The same method can be used to replace any non-printable character.
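
The same replacement can also be done outside Vim. This is only a sketch in Python, not the method used in this thread, and the names `CURLY_TO_STRAIGHT` and `straighten` are invented here. It assumes the text has already been decoded correctly into a Python string, and it also covers the single curly quotes.

```python
# Map each curly quote to its straight equivalent.
CURLY_TO_STRAIGHT = str.maketrans({
    "\u201c": '"',  # opening double quote (the '66' shape)
    "\u201d": '"',  # closing double quote (the '99' shape)
    "\u2018": "'",  # opening single quote
    "\u2019": "'",  # closing single quote / apostrophe
})

def straighten(text):
    """Replace curly quotes with straight ones, leaving other text untouched."""
    return text.translate(CURLY_TO_STRAIGHT)

print(straighten("\u201cIt\u2019s fixed,\u201d he said."))
```

Extending the table with more entries handles other unwanted characters in the same pass.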

Thanks!

**Ken Blake[_5_]**

On Mon, 9 Oct 2017 12:51:20 +0100, "J. P. Gilliver (John)"
wrote:

In message , harry newton
writes:
He who is Char Jackson said on Mon, 09 Oct 2017 00:22:09 -0500:

The reason I'm using GVIM on Windows is that vi allows for quick edits.
I tried to use Google to find "gvim quick edits" and came up empty.
What
does GVIM do that other text editors don't do? What are quick edits? I
assume those two words don't have their obvious meaning, because every
text editor does that, so what is it?

I apologize. My words were not clear. Hence I caused unnecessary
confusion.
I appreciate that you followed the implied meaning of my words, which makes
it my fault for not explaining that "vi" is all I ever wanted. GVIM just
came with the package when I searched for the best "vi" editor on Windows.

I never use the GUI in GVIM. I never use the mouse. Both hands are on the
keyboard, because all I do with GVIM is use the "vi" part of VIM.
So what I meant by "quick edits" was my own concoction of words to describe
the fact that I can cut and paste and copy and rename and delete and
reorder, etc., all using keyboard presses.

For example, I can delete a line by typing "dd" and I can yank a line by
typing "yy" and I can paste those deleted or yanked lines ten lines lower
by typing "p" after jumping down ten lines with "10j" such that the
self-concocted phrase "quick edits" means something like this sequence:

yy10jp == this is a "quick edit" yanking a line to place it 10 lines lower

Just out of curiosity: do you _ever_ use Ctrl-C, X, and V? Or
highlighting to select (which can be done from the keyboard - I usually
do - rather than the mouse)?

Or alternatively, select with either the keyboard or the mouse, then
either press delete or drag it where you want it.

He persists in thinking that the text editor he uses is his only
choice, when there are many others that will do at least as good a job
as his, and at least as easily.

**harry newton**

He who is Mayayana said on Mon, 9 Oct 2017 10:44:38 -0400:

And then what will you do when you need to type
a Euro sign, copyright, or other such symbol?

It's easy now that I know what a digraph is & how to handle them:
http://wetakepic.com/images/2017/10/09/digraphs98b40.jpg

I didn't know this simple answer at the start of this thread because I knew
nothing of digraphs (I didn't even know that the word existed).
http://vimdoc.sourceforge.net/htmldoc/digraph.html

Knowing a little bit now about digraphs, the answer to your question
happens to be very simple using the gvim editor of choice.
http://vim.1045645.n5.nabble.com/Solved-Display-Euro-character-in-gvim-7-2-td1176874.html

That is, to find all the digraphs in gvim, I simply enter the command:
:digraphs

Here is what results for me, on my old 32-bit gvim freeware:
http://wetakepic.com/images/2017/10/09/digraphs98b40.jpg

It seems that the Euro sign is digraph decimal 128, hex 80, octal 200.
https://en.wikipedia.org/wiki/European_Currency_Unit
https://en.wikipedia.org/wiki/Euro_sign
And that the Copyright sign is the digraph decimal [169].
https://en.wikipedia.org/wiki/Copyright_symbol

To answer your question, I need only type the decimal [169] digraph
to show the copyright symbol, and to type the decimal [128] digraph and
choose a font (e.g., Courier New) that knows about that digraph to show the
Euro symbol.
http://wetakepic.com/images/2017/10/09/digraphs98b40.jpg

Notice that solving the Euro problem you proposed just now, also caused me
to accidentally solve the smartquote digraph problem by mistake!

**harry newton**

He who is harry newton said on Mon, 9 Oct 2017 17:01:05 +0000 (UTC):

SOLVED!
http://wetakepic.com/images/2017/10/09/samplede854.jpg

SOLVED ANOTHER WAY!
http://wetakepic.com/images/2017/10/09/digraphs98b40.jpg

Mayayana asked me how to show the Euro symbol in gvim, and when I figured
it out (using my newly found keyword "digraph"), I found yet another and
far easier way to solve the smartquote digraphs being non-viewable
characters!
http://vimdoc.sourceforge.net/htmldoc/digraph.html

All I needed to do, in gvim, was change the font from the default to
something that recognized the Euro (and smartquote) digraphs, such as
"Courier New".
http://vim.1045645.n5.nabble.com/Solved-Display-Euro-character-in-gvim-7-2-td1176874.html

Our combined tribal knowledge now records TWO answers to the OP question!

**harry newton**

He who is Mayayana said on Mon, 9 Oct 2017 10:02:35 -0400:

It's easy enough to look these things up. In fact I
already mentioned them.

I agree with you that I didn't have the knowledge that you have to solve
the problem when I first asked the question.

The problem with "looking these things up" is that I was looking up these
things in the wrong direction.

I first thought that MS Word "smart quotes" would work - but they failed.
Then I thought, as you did, that character encoding would work - but it too
was never the problem.

So the problem with 'looking things up' is that you have to pretty much
know what the answer is before you can find the answer, in such cases.

It turns out the answer is all about searching and replacing what is a new
word to me, which is digraphs!

Here's a sample that I found listed third at DDG searching
for curly quotes in UTF-8. It's not a set of characters
for typing curly quotes. It's 3 bytes that define a curly
quote in UTF-8 but read as 3 characters in ANSI. And
you can't type them.
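
That three-bytes-read-as-three-characters effect is easy to reproduce. Here is a minimal Python sketch, assuming "ANSI" means Windows-1252; the variable names are just for illustration. It encodes one opening curly quote as UTF-8 and then misreads the bytes as ANSI.

```python
curly = "\u201c"                        # one opening curly double quote
utf8_bytes = curly.encode("utf-8")      # stored as three bytes in UTF-8
misread = utf8_bytes.decode("cp1252")   # the same bytes read as ANSI

print(utf8_bytes.hex(" "))  # the three bytes in hex
print(len(misread))         # number of characters an ANSI reader sees
print(misread)              # what those characters look like
```

The closing curly quote is not used here because its third UTF-8 byte (0x9D) has no character assigned in Python's cp1252 codec, so the same decode would raise an error instead.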

I think the whole UTF-8 direction is a bust.
It's not about encoding (as far as I can tell), especially since GVIM seems
to handle any encoding simply by typing the encoding:
:set encoding=cp1252 -- tells vim to use Windows-1252 character encoding
:set encoding=utf-8 -- tells vim to return to UTF-8 encoding.

I think the entire encoding issue was a red herring.
(I admit though that I don't understand encoding; all I know is the
empirical evidence which is that *every* encoding suggestion I tried failed
- and - the empirical evidence that the problem was solved by digraphs, no
matter what the encoding).

You don't seem to really be reading the information
people are giving you. Why not look into the encoding?
Why not switch to a better editor? (Global search
and replace is not a rare animal.)

I read everything everyone suggested. I installed all the programs everyone
suggested. In a way, that was the problem which took me all night to figure
out, which is that most of the suggestions were red herrings.

I'm pretty sure that encoding is a red herring.

However, I readily admit that I don't understand encoding so I base that
assumption only on the purely empirical evidence of:
1. I set the encoding in every way suggested - and all failed.
2. I solved the problem without dealing with encoding.
https://groups.google.com/d/msg/alt.usage.english/H5lxAMNh5Kw/zbIkHZ_xAQAJ

But I expect you'll also find other issues if you want
to keep that webpage text as ANSI. That's why it may
not be the simple solution that you insist on finding.
Sometimes spaces are done with UTF-8 character
combinations. Things like a copyright sign will be different
in UTF-8 vs ANSI. And so on.

Thank you for that advice Mayayana, where I openly admit I really don't
understand this encoding stuff (because it has never mattered in what I
do).

Since I aim to stay in pure text, the only encoding I care about is the
KISS super simple most compatible US American Keyboard encoding. That's it.

I didn't know this when I first asked, but since the answer is readily
solved with only a simple search and replace, I don't think the encoding
was ever the problem.

Also, the answer was solved a second way, by accident, when I tried to
answer your question about how to display the Euro, which was to simply
change my font from the default to something that recognizes the Euro (and
smartquote) digraphs!
http://wetakepic.com/images/2017/10/09/digraphs98b40.jpg

But I could be wrong that encoding isn't the underlying issue since I don't
understand why I even have to deal with encoding, since I'm perfectly happy
with the characters on the US American keyboard in any standard font.

**harry newton**

He who is J. P. Gilliver (John) said on Mon, 9 Oct 2017 12:46:40 +0100:

Note You can find and replace all instances of single or double curly
quotes with straight quotes in your document. To do this, clear the
"Straight quotes" with "smart quotes" check box on the AutoFormat As You
Type tab. On the Edit menu, click Replace. In both the Find what and
Replace with boxes, type ' or ", and then click Find Next or Replace
All.

I haven't tried this. Note it says put ' or "" in _both_ boxes, i. e.
tell it to replace " with " - I grant that would _not_ have occurred to
me!)

Now that the problem is solved two different ways, both written up to
leverage the tribal knowledge gained...
https://groups.google.com/d/msg/alt.usage.english/H5lxAMNh5Kw/zbIkHZ_xAQAJ

I'm catching up on all the helpful responses.

Your suggestion above turned out to be *perfect* which was that the second
simplest solution of all was to figure out what the digraphs were (a new
word for me) for the non-printable characters and then to run a global
search and replace.

The digraphs were found by the gvim "ga" command as decimal 147 and
decimal 148 respectively for the opening & closing curly quotes.

So the standard gvim search and replace command replaces them with straight
quotes.

The only problem was that I had to learn a few basics about text editors:

a. I had to realize that non-printable digraphs were the problem.
b. I had to realize that encoding to "pure text" was not the problem.
c. I had to learn how to find a digraph's decimal equivalent.
d. I had to syntactically learn how to search-and-replace digraphs.

Once those hurdles were solved, the great news is that replacing digraphs
is the same as replacing anything else - only with some added syntactically
required characters.

Hence, your approach of learning the search-and-replace was spot on
perfect!

It turns out that Mayayana's recent suggestion of displaying the Euro caused
me to accidentally find the *simplest* solution, which was simply to change
the font in gvim to Courier New (or any font that knows about the
smartquote (and euro) digraphs!).
http://wetakepic.com/images/2017/10/09/digraphs98b40.jpg

**harry newton**

He who is J. P. Gilliver (John) said on Mon, 9 Oct 2017 12:51:20 +0100:

Just out of curiosity: do you _ever_ use Ctrl-C, X, and V? Or
highlighting to select (which can be done from the keyboard - I usually
do - rather than the mouse)?

In vi, to copy and paste and to change a word or to change a sentence is so
easy that you "often" *don't need* the control+x, control+v, and control+c
keyboard shortcuts.

But I use them all the time in *all* my programs, as they are embedded into
muscle memory just as much as the vi commands are.

So, I use them in vi if/when my muscle memory knee-jerks them into place!

(Also: the above would move an individual line. How easy is it to move,
say, a sentence, that starts in the middle of a line and ends in the
middle of the next line - to a position in the middle of another line?)

The gvim editor isn't perfect, but the long answer to your question is that
a regular expression can find *anything* and then the rest of the commands
just move around what the regular expression found.

I almost never need to use the gvim GUI, but if I needed to do something
that was easier done with the GUI (aka the mouse) than by regular
expressions, my "lazy" attribute (inherent in all computer knowledgeable
people) would take over.

In that case, I'd just select the line and cut and paste it just like you
would.

Having the option to do whatever you want is what the best freeware is all
about. GVIM isn't the only pure text editor, as I'm sure they *all* can do
what you just asked ... can't they?

**harry newton**

He who is Ken Blake said on Mon, 09 Oct 2017 10:59:18 -0700:

Or alternatively, select with either the keyboard or the mouse, then
either press delete or drag it where you want it.

Exactly.

I don't think there is a text editor alive that can't cut and paste with
the mouse.

He persists in thinking that the text editor he uses is his only
choice, when there are many others that will do at least as good a job
as his, and at least as easily.

I don't want to argue with you, especially as you're helping me, and as,
together, we just increased the tribal knowledge of the ng as a whole
because the answer to the original question turned out to be super simple
as summarized here:
http://wetakepic.com/images/2017/10/09/digraphs98b40.jpg
https://groups.google.com/d/msg/alt.usage.english/H5lxAMNh5Kw/pSNETZP2AQAJ

The only reason this thread is long is simply because nobody knew the
answer until now (one day later).

However, having said I don't want to argue, I will state that I'll use the
best of the freeware that is out there, where the *cost* of the freeware is
in learning which is the best and in learning how to use it.

For example, I am just now learning how to use Shotcut freeware, which,
like GIMP freeware, does *everything* you need to do - but in a super
secret way.

Why is it the best then?
Because it does *everything* you need it to do.

The cost is in *learning* how to do it.
Once that cost is expended, then trying out dozens of other video editors
isn't fruitful anymore.

SO you want to stick with the canonical software that can do everything.
Gvim is such software.
Notepad++ might be such software - I don't know - it's too early to tell.

**harry newton**

He who is Mayayana said on Mon, 9 Oct 2017 11:13:29 -0400:

You pasted into Notepad and saved as UTF-8, but
then you opened it in VIM, which is rendering ANSI.
Try opening the same file in Notepad.

I know you don't want to hear this, but it's
complicated.

I think the super simple simplest answer is here:
https://groups.google.com/d/msg/alt.usage.english/H5lxAMNh5Kw/pSNETZP2AQAJ

If you save text as UTF-8 you solve the problem,
but then you need to read/edit as UTF-8 and need a
program that recognizes it. Suddenly you're dealing
with a number of plain text formats.

I can't explain why it worked, but simply changing the GVIM font from the
default to "Courier New" solved the problem of me not seeing the smartquote
digraphs!
http://wetakepic.com/images/2017/10/09/digraphs98b40.jpg

Notice the first 3 weird marks in your VIM screenshot?
Those are the UTF-8 marker characters, bytes EF BB BF,
that tell the editor it's UTF-8. Vim doesn't see them.
It's just rendering them as characters.

How does that explain that it works when I change the font?

But your sample provides a good example of what you're
up against. There are more UTF-8 characters in that text
than just curly quotes.

I agree. The problem seems to also show itself with the Euro.

That's why I was saying that it's not
as simple as a basic replacement. The number and variety of
UTF-8 characters in copied text is an unknown. If you just
need to do a replace on curly quotes that's doable, but any
more than that may take time.

The problem seems to be solved by changing the font, so, that's about as
simple as things get (since I don't care what font I use for text editing).

Can anyone explain why simply changing the font worked?
