A Windows XP help forum. PCbanter


Can I copy text not meant for copying?

#16
March 10th 12, 02:14 PM posted to microsoft.public.windowsxp.general
jim

On Sat, 10 Mar 2012 08:23:18 -0500, in
microsoft.public.windowsxp.general, jim wrote:

On Fri, 09 Mar 2012 00:21:32 -0500, in
microsoft.public.windowsxp.general, Paul wrote:

When you have a PDF to play with, post the URL, so we can try our
own bags of tricks :-) I haven't tried "copy busting" in a while.

Paul



I recently tried to bust one for a friend -- trying three different
utilities, one was XPDF using command line options in pdftotext.exe which
*claimed* to bypass restrictions and failed spectacularly (I may have
done it wrong). I have a query out for that URL now and will post it
if/when I get it.

jim



Looks like this is it:
http://rapidlibrary.com/files/2disco...y9wwi89on.html

jim

#17
March 10th 12, 02:39 PM posted to microsoft.public.windowsxp.general
jim

On Sat, 10 Mar 2012 09:08:36 -0500, in
microsoft.public.windowsxp.general, Paul wrote:

jim wrote:
On Fri, 09 Mar 2012 00:21:32 -0500, in
microsoft.public.windowsxp.general, Paul wrote:

When you have a PDF to play with, post the URL, so we can try our
own bags of tricks :-) I haven't tried "copy busting" in a while.

Paul



I recently tried to bust one for a friend -- trying three different
utilities, one was XPDF using command line options in pdftotext.exe which
*claimed* to bypass restrictions and failed spectacularly (I may have
done it wrong). I have a query out for that URL now and will post it
if/when I get it.

jim


If you do "properties", take a look at the security settings,
while viewing the document in Acrobat Reader. Perhaps there
is something there to explain why it can't be busted. Maybe
Adobe had enough time to re-think how to fix the "honor" system...

Paul



He is a savvy fellow who told me he was using printscreen and then an OCR
app., so I dropped it. I know he would prefer to do "select text" and
"copy to clipboard".

jim
#18
March 10th 12, 03:10 PM posted to microsoft.public.windowsxp.general
jim

On Sat, 10 Mar 2012 09:09:33 -0500, in
microsoft.public.windowsxp.general, "Mayayana" wrote:

| I recently tried to bust one for a friend -- trying three different
| utilities, one was XPDF using command line options in pdftotext.exe which
| *claimed* to bypass restrictions and failed spectacularly

I wrote about that in my earlier post. XPDF claims
no such thing. In fact, the author has specifically
written an explanation saying that he doesn't feel
right about bypassing restrictions.

http://www.foolabs.com/xpdf/cracking.html

XPDF is also outdated, and never worked all that
well in the first place. It actually only requires a
very small edit to make pdftotext.exe ignore restrictions:

In pdftotext.c one just needs to comment out the
permission check:

// check for copy permission
/*
if (!doc->okToCopy()) {
  error(-1, "Copying of text from this document is not allowed.");
  exitCode = 3;
  goto err2;
}
*/

Unfortunately, one also needs to be capable of
recompiling the software.


Well, that, and one would need the source code -- or be very clever in
locating the command within the compiled code, then nulling it, etc...

As far as being outdated, yes of course it is, but I am a fan of
"outdated" since that usually means that you are going to bypass
GUI-cleverness.

jim

I looked around, at one point, for a program that
ignores restrictions and found that it seems to be
mostly a commercial thing. If you don't mind paying,
you can have the functionality. But for some reason
the OSS people "respect" the design of PDFs, which is
unfortunate since, as Paul said, most copy-protected
PDFs seem to be that way simply because the author
wasn't paying attention to the settings.


#19
March 10th 12, 03:35 PM posted to microsoft.public.windowsxp.general
Mayayana

| Unfortunately, one also needs to be capable of
| recompiling the software.
|

| Well, that, and one would need the source code --

It's open source software. The source code is freely
available. Unfortunately, it's not realistic for most people
to recompile their own software.

| As far as being outdated, yes of course it is, but I am a fan of
| "outdated" since that usually means that you are going to bypass
| GUI-cleverness.
|

I don't see the connection. There's very old GUI software
and very new command line software. By outdated I'm just
talking about the functionality. XPDF wasn't perfect to start
with, and now it's several years old. The actual text output
is sloppy. It comes out with a lot of junk text and imperfections.

There's nothing inherently good about console software. It just
means that the author is either a keyboard lover or that they
weren't willing to put in the work required to write a GUI.

I later found Sumatra PDF, which is also open source and can
also be recompiled to ignore restrictions. Aside from the
restrictions issues, Sumatra PDF is a good, small PDF reader
that's free and will extract text better than XPDF does.


#20
March 10th 12, 03:46 PM posted to microsoft.public.windowsxp.general
Paul

jim wrote:



He is a savvy fellow who told me he was using printscreen and then an OCR
app., so I dropped it. I know he would prefer to do "select text" and
"copy to clipboard".

jim


It's looking like some kind of font encoding problem, rather than
copy prevention. Still working on it...

filename = 2discoverislam_com_riyad_us_saliheen.pdf
type = PDF 1.4
size = 5638888 bytes
md5sum = df45ea78241da54c928ba8b91c94c59e

Paul
#21
March 10th 12, 05:15 PM posted to microsoft.public.windowsxp.general
jim

On Sat, 10 Mar 2012 10:35:01 -0500, in
microsoft.public.windowsxp.general, "Mayayana" wrote:

| As far as being outdated, yes of course it is, but I am a fan of
| "outdated" since that usually means that you are going to bypass
| GUI-cleverness.
|

I don't see the connection.


That's OK. ;-)
#22
March 10th 12, 07:10 PM posted to microsoft.public.windowsxp.general
micky

On Fri, 09 Mar 2012 00:21:32 -0500, Paul wrote:

micky wrote:


It didn't work! This is the webpage:

http://www.tropicana.com/#/trop_prod...anaPurePremium

I wanted to get the 3 lines of black text above the 5 bottles***.


I have two web browsers. One with Adobe Flash installed, and one without.
The "5 bottles" only appear in the Flash-based version of the webpage.
The text in this case is in a Flash image, and is not text "you can wipe over".

The non-Flash-equipped browser shows a quite different page. I was
able to copy this text from the non-Flash page. The page is
entirely different, with different text. I got this via copy/paste
of the non-Flash page (with anything needing Unicode removed).


Wow, I got a technical answer from you, and the text I wanted too!! I
don't suppose I can call you every time I want to copy text. No,
probably not.

"We're committed to using the best fruit to give you the great tasting juices
you love and the nutrition your body needs. Each 59oz container of Tropicana
Pure Premium has 16 fresh-picked oranges squeezed into it and an 8oz glass
gives you 100% vitamin C to help you maintain a healthy immune system."


I have to check if Tropicana is one of those that can keep
fresh-picked squeezed oranges in a vat for months. Haven't had time.
I guess that would mean they are picked when they're fresh, not that
they're sold when they are.

The claim of 100% vitamin C, I guess that means your glass is filled to the
rim with dried Ascorbic Acid crystals :-) Linus Pauling would be overjoyed.


He deserves it.

http://en.wikipedia.org/wiki/Ascorbic_acid



On a PDF file, it lets me select all -- it's even in the drop-down
menu -- but it doesn't let me copy/paste it. And copy is not in
the drop down menu. It will take me a while to find the url for this
file from a stock broker, because now I'm working with a downloaded
copy. I plan to post again.


When you have a PDF to play with, post the URL, so we can try our
own bags of tricks :-) I haven't tried "copy busting" in a while.


Okay.

Here it is. I found this on the web under a shorter, more sensible
name, but this is the same:
http://fa.morganstanleyindividual.co...e47b92e0b3.pdf

When you download this, it has an entry in the Edit drop-down list
for Select All, but the entries for Copy and Cut are greyed out, and
Ctrl-C doesn't work either.

Paul


#23
March 10th 12, 07:50 PM posted to microsoft.public.windowsxp.general
Ken Blake, MVP

On Sat, 10 Mar 2012 14:10:46 -0500, micky
wrote:


Here it is. I found this on the web under a shorter, more sensible
name, but this is the same:
http://fa.morganstanleyindividual.co...e47b92e0b3.pdf

When you download this, it has an entry in the Edit drop-down list
for Select All, but the entries for Copy and Cut are greyed out, and
Ctrl-C doesn't work either.



If you use the free Foxit Reader (which I think is the best pdf
reader) instead of Adobe Reader, it's easy to copy and paste from a
pdf document.
Ken Blake, Microsoft MVP
#24
March 10th 12, 08:35 PM posted to microsoft.public.windowsxp.general
Mayayana


|
http://fa.morganstanleyindividual.co...e47b92e0b3.pdf
|
| When you download this, it has an entry in the Edit drop-down list
| for Select All, but the entries for Copy and Cut are greyed out, and
| Ctrl-C doesn't work either.
|
|
| If you use the free Foxit Reader (which I think is the best pdf
| reader) instead of Adobe Reader, it's easy to copy and paste from a
| pdf document.

But not that document. The idea was to find a way
to copy from restricted PDFs. Foxit, like most (all?)
free PDF software, will not allow bypassing restrictions.

(Foxit is also bloated, commercial nagware with a
spyware installer and attempts to install junk during
program install. It's a matter of preference, of course,
but personally I used to use Foxit and now prefer
Sumatra. Though I don't have any fancy requirements
-- just to read/print basic PDFs.)



#25
March 10th 12, 09:52 PM posted to microsoft.public.windowsxp.general
Ken Blake, MVP

On Sat, 10 Mar 2012 15:35:37 -0500, "Mayayana"
wrote:


| If you use the free Foxit Reader (which I think is the best pdf
| reader) instead of Adobe Reader, it's easy to copy and paste from a
| pdf document.

But not that document. The idea was to find a way
to copy from restricted PDFs. Foxit, like most (all?)
free PDF software, will not allow bypassing restrictions.




Ah, thanks. I had missed that that was what the issue was.

Ken Blake, Microsoft MVP
#26
March 10th 12, 10:08 PM posted to microsoft.public.windowsxp.general
Robert Macy

On Mar 10, 2:52 pm, "Ken Blake, MVP" wrote:
On Sat, 10 Mar 2012 15:35:37 -0500, "Mayayana"

wrote:
| If you use the free Foxit Reader (which I think is the best pdf
| reader) instead of Adobe Reader, it's easy to copy and paste from a
| pdf document.


But not that document. The idea was to find a way
to copy from restricted PDFs. Foxit, like most (all?)
free PDF software, will not allow bypassing restrictions.


Ah, thanks. I had missed that that was what the issue was.

Ken Blake, Microsoft MVP


As long as it's not illegal or such, I use
http://freemypdf.com/
They even convert whiz-bang 500kB files down to 300kB files that
open on Win98 using Adobe 5! The only thing I could see added
when I opened the result with Foxit was an index along the left-hand side,
but that's not much use when the document is only 2 pages to begin with!
#27
March 10th 12, 10:49 PM posted to microsoft.public.windowsxp.general
Paul

Paul wrote:
jim wrote:


He is a savvy fellow who told me he was using printscreen and then an OCR
app., so I dropped it. I know he would prefer to do "select text" and
"copy to clipboard".

jim


It's looking like some kind of font encoding problem, rather than
copy prevention. Still working on it...

filename = 2discoverislam_com_riyad_us_saliheen.pdf
type = PDF 1.4
size = 5638888 bytes
md5sum = df45ea78241da54c928ba8b91c94c59e

Paul


The document security settings show "print only" - all other options
are turned off. This doesn't look like a default document security
setting, so I'd say it was on purpose.
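The "print only" setting lives in the /P integer of the document's /Encrypt dictionary, where each permission is a single bit. Below is a minimal Python sketch that decodes such a value using the bit positions the PDF specification gives for revision 3 and later security handlers. Reading /P out of an actual file is left to a PDF library and is not shown here, and the value -1852 in the demo was constructed by hand to mean "print and high-quality print only"; it was not read from this document.

```python
# Bit positions (1 = lowest-order bit) of the /P permission flags,
# as listed in the PDF specification for security handler revision 3+.
PERMISSION_BITS = {
    "print": 3,
    "modify": 4,
    "copy_extract": 5,
    "annotate": 6,
    "fill_forms": 9,
    "extract_accessibility": 10,
    "assemble": 11,
    "print_high_quality": 12,
}


def decode_permissions(p: int) -> dict:
    """Turn a /P value into {permission name: allowed?}."""
    p &= 0xFFFFFFFF  # /P is stored as a signed 32-bit integer
    return {name: bool((p >> (bit - 1)) & 1) for name, bit in PERMISSION_BITS.items()}


if __name__ == "__main__":
    # Hand-built example value, not taken from the PDF in this thread.
    for name, allowed in decode_permissions(-1852).items():
        print(f"{name:22} {'yes' if allowed else 'no'}")
```

With -1852 only `print` and `print_high_quality` come out as allowed, which is what a "print only" dialog would show; a value of -4 (every bit from 3 upward set) allows everything.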

Using the Ghostscript ps2ps converter

ps2ps input.pdf output.ps

and opening the output in a text editor shows things like this.
ASCII85 is an encoding method. LZW implies compression/decompression.
But when I tried converting the string using a web-based ASCII85
decoder, followed by an attempt at LZWDecode, the results didn't
make any sense. So perhaps I'm missing something as to what
chain of filters is actually being used here.

%%BeginResource: file (PDF Function obj_1059)
1059 0 obj
<</Filter[/ASCII85Decode
/LZWDecode]
/FunctionType 0
/Domain[0
1]
/Range[-1
1]
/BitsPerSample 8
/Decode[-1
1.00787]
/Size[256]/Length 39>>stream
J03]G3$]7K#DEP:q1$o*=mro@So+\\4E(J,~>
endstream
endobj
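The two-step attempt (ASCII85, then LZW) can be reproduced offline. Here is a minimal Python sketch. The standard library's `base64.a85decode(..., adobe=True)` handles the ASCII85 step once the `~>` end-of-data marker is present (the helper appends it if it is missing). The standard library has no LZW, so `lzw_decode` is a hand-written decoder for the PDF variant: MSB-first codes of 9 to 12 bits, 256 = clear table, 257 = end of data, EarlyChange 1. One thing worth noting is that this object is declared /FunctionType 0, i.e. a sampled function, and /Size[256] with /BitsPerSample 8 suggests that even a correct decode would produce 256 one-byte samples of a lookup table rather than readable text. That may be why the web-based result didn't make any sense.

```python
import base64


def a85_decode_pdf(data: bytes) -> bytes:
    """ASCII85Decode as used in PDF streams, where '~>' ends the data."""
    data = data.strip()
    if not data.endswith(b"~>"):
        # Supply the end-of-data marker (or just its '>') if absent.
        data += b">" if data.endswith(b"~") else b"~>"
    return base64.a85decode(data, adobe=True)


def lzw_decode(data: bytes, early_change: int = 1) -> bytes:
    """LZWDecode as specified for PDF: MSB-first codes of 9-12 bits,
    code 256 clears the table, code 257 ends the data."""
    table = [bytes([i]) for i in range(256)] + [b"", b""]  # 256/257 reserved
    width, prev = 9, None
    bitbuf = bitcount = 0
    out = bytearray()
    for byte in data:
        bitbuf = (bitbuf << 8) | byte
        bitcount += 8
        while bitcount >= width:
            bitcount -= width
            code = bitbuf >> bitcount          # top `width` bits
            bitbuf &= (1 << bitcount) - 1      # keep only unread bits
            if code == 256:                    # clear: reset table and width
                del table[258:]
                width, prev = 9, None
                continue
            if code == 257:                    # end of data
                return bytes(out)
            if code < len(table):
                entry = table[code]
                if prev is not None and len(table) < 4096:
                    table.append(prev + entry[:1])
            elif code == len(table) and prev is not None:
                entry = prev + prev[:1]        # code not yet in table (KwKwK case)
                table.append(entry)
            else:
                raise ValueError(f"invalid LZW code {code}")
            out += entry
            prev = entry
            # EarlyChange 1: widen one code earlier than strictly necessary.
            if width < 12 and len(table) + early_change >= (1 << width):
                width += 1
    return bytes(out)


if __name__ == "__main__":
    # LZW example from the PDF specification.
    print(lzw_decode(bytes.fromhex("800B6050220C0C8501")))
```

The byte string 80 0B 60 50 22 0C 0C 85 01 is the LZW example from the PDF specification and decodes to `-----A---B`. For object 1059 you would call `lzw_decode(a85_decode_pdf(...))` on the stream text. Whether the forum copy of that stream survived intact enough to decode is unknown.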

Later on in the document, something similar seems to be happening,
and it's my guess that each line of text in the document has been
reduced to a chunk of stuff like this. The "/F[/A85" is short
for "/Filter /ASCII85Decode", but I can't find the declarations
as such at the top of the document. (The document is a computer
program, and the "A85 routine" should be defined further up in
the document.) Perhaps even those definitions are obfuscated.

%%BeginResource: file (PDF CharProc obj_1072)
1072 0 obj
/Length 285 stream
101 0 5 -101 108 0 d1
103 0 0 101 5 -101 cm
BI
/IM true
/W 103
/H 101
/BPC 1
/D[1
0]
/F[/A85
/CCF]
/DP[null
/K -1
/Columns 103
/EndOfBlock false]
ID
-E*7?Ea1MPJ%,=TrVk^CrVlhH^\@VT^]33Yrr"J\,62Trnm2Crnm2Ep\THTs4I@drVsXis1dIg
%,=TJ,\?l ~
EI
endstream
endobj
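The PDF specification defines a fixed set of abbreviations that may only be used between BI and ID in an inline image, and /F, /A85 and /CCF are among them. A search for a declaration further up the file would therefore come up empty. Below is a small Python sketch that expands the common key and filter abbreviations. It treats every /Name the same regardless of position, which is a simplification, and leaves out /I because that can mean either Interpolate or the Indexed colour space.

Expanded, the header describes a 103 x 101, 1-bit image mask (/ImageMask true) compressed with CCITT fax encoding, where /K -1 selects Group 4. The d1 operator on the first line of the stream marks this as a glyph procedure in a Type 3 font, which matches the "CharProc" resource label. Read that way, each character appears to be drawn as a small bitmap. That would fit the "font encoding" suspicion: unless the font also supplies a mapping back to Unicode, a text extractor has only pictures to work with.

```python
import re

# Key abbreviations allowed inside an inline-image dictionary (BI ... ID).
KEY_ABBREV = {
    "BPC": "BitsPerComponent",
    "CS": "ColorSpace",
    "D": "Decode",
    "DP": "DecodeParms",
    "F": "Filter",
    "H": "Height",
    "IM": "ImageMask",
    "W": "Width",
}

# Filter-name abbreviations allowed in the same place.
FILTER_ABBREV = {
    "AHx": "ASCIIHexDecode",
    "A85": "ASCII85Decode",
    "LZW": "LZWDecode",
    "Fl": "FlateDecode",
    "RL": "RunLengthDecode",
    "CCF": "CCITTFaxDecode",
    "DCT": "DCTDecode",
}


def expand_inline_image_header(header: str) -> str:
    """Replace abbreviated /Names in an inline-image header with full names.
    Simplification: keys and filter names are not told apart by position."""
    def expand(match):
        name = match.group(1)
        return "/" + KEY_ABBREV.get(name, FILTER_ABBREV.get(name, name))

    return re.sub(r"/([A-Za-z0-9]+)", expand, header)


if __name__ == "__main__":
    print(expand_inline_image_header(
        "/IM true /W 103 /H 101 /BPC 1 /D[1 0] /F[/A85 /CCF] /DP[null"))
```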

*******

I gave up on that approach for now, and tested OCR to see how good it could be.
I used Ghostscript, to print each page as a TIFF file. This produced 561 files
of 32MB each. The files are quite compressible, for whatever that's worth.

gswin32 -sDEVICE=tiffgray -sOutputFile=output-%03d.tif -dTextAlphaBits=4
-dGraphicsAlphaBits=4 -r600 -dSAFER -dBATCH -dNOPAUSE input.pdf

You can get Ghostscript here. Once it's installed, I used a command-line
invocation to get my output.

http://pages.cs.wisc.edu/~ghost/doc/GPL/gpl902.htm
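For repeat runs, the same command can be wrapped in a short Python script. This sketch assumes Ghostscript is on the PATH as `gswin32c` (the console build that ships alongside `gswin32`) or `gs` on Linux, and passes exactly the options shown above. The helper `tiffgray_page_bytes` is there only to show where the 32MB per file comes from: the tiffgray device writes one byte per pixel, so a page costs (width x dpi) x (height x dpi) bytes before compression. The US Letter page size in the demo is an assumption; the thread doesn't give the page size.

```python
import shutil
import subprocess


def gs_tiffgray_args(pdf_path: str, dpi: int = 600, exe: str = "gswin32c") -> list:
    """Argument list matching the command in the post: one 8-bit gray TIFF per page."""
    return [
        exe,
        "-sDEVICE=tiffgray",
        "-sOutputFile=output-%03d.tif",  # output-001.tif, output-002.tif, ...
        "-dTextAlphaBits=4",
        "-dGraphicsAlphaBits=4",
        f"-r{dpi}",
        "-dSAFER",
        "-dBATCH",
        "-dNOPAUSE",
        pdf_path,
    ]


def tiffgray_page_bytes(width_in: float, height_in: float, dpi: int) -> int:
    """Uncompressed size of one tiffgray page: one byte per pixel."""
    return round(width_in * dpi) * round(height_in * dpi)


def render(pdf_path: str, dpi: int = 600) -> None:
    """Run Ghostscript; assumes gswin32c (Windows) or gs (Linux) is on PATH."""
    exe = shutil.which("gswin32c") or shutil.which("gs")
    if exe is None:
        raise FileNotFoundError("Ghostscript not found on PATH")
    subprocess.run(gs_tiffgray_args(pdf_path, dpi, exe), check=True)


if __name__ == "__main__":
    per_page = tiffgray_page_bytes(8.5, 11, 600)  # US Letter assumed
    print(per_page, per_page / 2**20, 561 * per_page / 2**30)
```

For an 8.5 x 11 inch page at 600 dpi that is 5100 x 6600 = 33,660,000 bytes, about 32 MiB, in line with the files described above. Across 561 pages it adds up to roughly 17.6 GiB uncompressed. Rendering at 300 dpi, the top of the range Paper Capture accepts, cuts each page to a quarter of that.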

I selected one page out of the lot for testing. I have an old copy of Acrobat
Distiller for PC, which includes (for its time) a new feature called Paper
Capture. If you present a PDF page containing an image, and the image is
in the resolution range of around 200-300 DPI or so, there is an OCR engine
you can use to convert the text. The text is "overlaid" on top of the
image pixmap in the document. You save that out, and then you have
a document consisting of the original image with a layer of text
sitting on top of it. (That's to make OCR errors stand out better.)
If you wipe over the surface of the resulting document, you can then
copy and paste. What the copy loses is white space, so things like
the "CHAPTER 4" string at the bottom don't get the right number
of spaces padded on the left of them.

Anyway, this is the copy/paste from the OCR output. This is page 23 by OCR.
Note that normally that feature worked like crap, so I'd say the tool
liked the quality of the input and wasn't put off much by the
appearance of the font. I've had plenty of stuff that
produced close to 100% errors when run through that tool.
I can see below that "O" and zero got mixed up by the OCR:
instead of "O" as an exclamation, it put a zero.
If I do the OCR with a 200 DPI image, I get a few more errors.
So it can be done with OCR, but correcting OCR errors by
hand is madness. (I've tried it.) That particular OCR won't
accept input of a higher resolution (even though I can make
higher resolutions if necessary).

******* ghostscript -- TIFGRAY_600DPI -- SCALED_300DPI -- Print to PostScript
-- Acrobat Distiller -- Adobe Acrobat "Paper Capture" -- (Copy/Paste text) *******

2. Since scholarship and piousness are the foremost qualifications for counsellors and advisors, there is
no restriction of age for them.
3. The ruler should always be very considerate and tolerant.
4. The ruler should never hesitate from accepting truth and righteousness.
51. Ibn Mas‘ud (May Allah be pleased with him) reported: Messenger of Allah (PBUH) said, “You will
see after me favouritism and things which you will disapprove of.” They submitted: “What do you
order us to do (under such circumstances)?” He replied, “Discharge your obligations and ask your
rights from Allah”.
[Al-Bukhari and Muslim].
Commentary: This Hadith tells that if you have rulers who deny your rights and give themselves and
their relatives preference over you then patience is a better recourse. Rather than revolting against
them, you should seek pardon and forgiveness from Allah and pray for His Protection against the
mischief and tyranny of the rulers provided they do not show outright disbelief.
52. Usaid bin Hudhair (May Allah be pleased with him) reported that: A person from among the Ansar
said, “0 Messenger of Allah! You appointed such and such person and why do you not appoint me?”
Messenger of Allah (PBUH) said, “After me you will see others given preference to you, but you
should remain patient till you meet me at the Haud (Al-Kauthar in Jannah)“.
[Al-Bukhari and Muslim].
Commentary:
1. The prophecy of the Prophet (PBUH) came true, which is a miracle as well as an evidence of his
truthfulness.
2. The Haud (pond) mentioned here is Haud Al-Kauthar which is granted to the Prophet (PBUH) in
Jannah or in the field where people will be assembled on the Day of Resurrection. There he will offer
his followers cups of pure drink with his own hands. It will be such that one who would take it will
never feel thirst again.
3. Demand for an office is not a pleasant quality. It is, therefore, prohibited to give office to a person
who demands it. It is, however, permissible only in case a person feels that he is more competent than
others and there is no one else in view who is more intelligent, capable and pious.
53. ‘Abdullah bin Abu Aufa (May Allah be pleased with him) reported: The Messenger of Allah
(PBUH) at one time when he confronted the enemy, and was waiting for the sun to set, stood up and
said, “0 people! Do not long for encountering the enemy and supplicate to Allah to grant you security.
But when you face the enemy, show patience and steadfastness; and keep it in mind that Jannah lies
under the shade of the swords.” Then he invoked Allah, saying, “0 Allah, Revealer of the Book,
Disperser of the clouds, Defeater of the Confederates, put our enemy to rout and help us in overpowering
them”.
[Al-Bukhari and Muslim].
Commentary:
1. Although great stress has been laid on full preparation and readiness for Jihad, it is prohibited to
wish for war with enemy.
2. Patience is a great weapon of a Muslim. In the context of Jihad, it means steadfastness, fortitude and
fearlessness of death in the battlefield.
3. Muslims are ordained not to rely entirely on weapons, material resources and their military prowess.
They are advised to pray to Allah for
CHAPTER 4
******* End conversion page 23 *******

Paul
#28
March 11th 12, 02:29 AM posted to microsoft.public.windowsxp.general
Char Jackson

On Sat, 10 Mar 2012 15:35:37 -0500, "Mayayana"
wrote:


|
http://fa.morganstanleyindividual.co...e47b92e0b3.pdf
|
| When you download this, it has an entry in the Edit drop-down list
| for Select All, but the entries for Copy and Cut are greyed out, and
| Ctrl-C doesn't work either.
|
|
| If you use the free Foxit Reader (which I think is the best pdf
| reader) instead of Adobe Reader, it's easy to copy and paste from a
| pdf document.

But not that document. The idea was to find a way
to copy from restricted PDFs. Foxit, like most (all?)
free PDF software, will not allow bypassing restrictions.


About 5-6 years ago I had a co-worker who used to produce all of his
engineering documents as pdf files. While the rest of the engineering
teams used Word so that others could make changes, with this guy you
had to send him your requested changes in a separate document and he'd
perhaps get around to incorporating them. No amount of rib poking got
him to conform to company standards, so I nosed around the corporate
website and found a licensed copy of Passware. It worked a treat on
his 'secure' pdf's. From then on I would use Passware to remove the
security, convert the pdf to Word, make my changes, then send it back
either in Word or if I was feeling frisky I'd convert the updated
document back to pdf and add my own security.

It took a while, but he eventually stopped by my office to ask WTF?
After that, no more restricted pdf's!

#29
March 11th 12, 03:48 AM posted to microsoft.public.windowsxp.general
jim

On Sat, 10 Mar 2012 17:49:06 -0500, in
microsoft.public.windowsxp.general, Paul wrote:

Wow, Paul, you did a lot here. I am passing all of this on to my friend.

Thanks,

jim
#30
March 11th 12, 08:09 AM posted to microsoft.public.windowsxp.general
Paul

jim wrote:



Wow, Paul, you did a lot here. I am passing all of this on to my friend.

Thanks,

jim


Not all documents printed this way have fonts good enough for OCR. So if the
authors had been even more devious in their selection of fonts, they
could have broken my cheap OCR, and many other, better OCRs as well.
OCR doesn't like characters with "broken" edges. Since there is
no "noise" in a Ghostscript conversion to TIFF, the OCR should
really do well.

There is also a remote possibility that the junk that ends up
in the copy/paste buffer (Evince in Linux let me wipe and copy
the text) could be a simple substitution code. Nothing I've seen
in the file yet smacks of "encryption", except in a trivial way.
But I don't have any automated tools for detecting simple
substitution codes. The embedded fonts might have been juggled
in the font table, such that, say, the letter "W" causes "X" to be
printed on the screen. So re-mapping the font table might be a way
they can mess it up. This was undoubtedly a "feature" of the tool
doing the conversion from MSWord to PDF. If they wanted, they could
re-juggle the fonts on a page-by-page basis, but again, there's no
evidence of strong purposeful design there by the company making
the PDF generator. It could be that, if you figured out the
substitution code, it would be good for the
whole document.
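If the copy/paste junk does turn out to be a simple substitution, the OCR output can serve as a crib. Line up a stretch of pasted junk with the same stretch of OCR'd text and read the mapping off character by character. Below is a minimal Python sketch. It assumes a strict one-to-one, per-character substitution and two samples that align exactly (same length, no dropped spaces). OCR slips such as zero for "O" would show up as conflicts, so the crib should be a short, hand-checked passage. The function names are illustrative, not from any existing tool, and if the fonts were re-juggled page by page, each page would need its own crib.

```python
def derive_mapping(garbled: str, plain: str) -> dict:
    """Map each pasted (garbled) character to the character OCR read at the
    same position. Assumes a one-to-one, per-character substitution."""
    if len(garbled) != len(plain):
        raise ValueError("crib samples must line up character for character")
    mapping = {}
    for g, p in zip(garbled, plain):
        if mapping.setdefault(g, p) != p:
            raise ValueError(f"conflict for {g!r}: {mapping[g]!r} vs {p!r}")
    return mapping


def decode(garbled: str, mapping: dict, unknown: str = "?") -> str:
    """Apply the mapping; characters not covered by the crib become `unknown`."""
    return "".join(mapping.get(ch, unknown) for ch in garbled)


if __name__ == "__main__":
    # Synthetic demo: scramble a known phrase, then recover it from the crib.
    plain = "patience is a great weapon"
    scramble = str.maketrans("abcdefghijklmnopqrstuvwxyz",
                             "qwertyuiopasdfghjklzxcvbnm")
    garbled = plain.translate(scramble)
    table = derive_mapping(garbled, plain)
    print(garbled, "->", decode(garbled, table))
```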

Paul
 




All times are GMT +1.

Copyright 2004-2024 PCbanter.
The comments are property of their posters.