#1
OT Web Page Capture App
Is there an app that will capture all contents of a highlighted area on a webpage, text and images, and put them in a .DOC or similar editable format? I can save to a PDF but get all the other garbage on the page. The PDF looks like the webpage. I can do this with great difficulty using Libre Office; however, the images, although there, are badly sized and the page format is somewhat messed up. I save from Libre Office to a .DOC and open it in Word and try to edit, but that is a nightmare too. Editing text and graphics in Word is nearly impossible. Suggestions please.
#2
On 5/28/2015 7:46 PM, Bob R wrote:
Is there an app that will capture all contents of a highlighted area on a webpage, text and images, and put in a .DOC or similar editable format? I can save to a PDF but get all the other garbage on the page. The PDF looks like the webpage. I can do this with great difficulty using Libre Office, however, the images, although there, are badly sized and the page format is messed up some. I save from Libre Office to a .DOC and open In Word and try to edit but it is a nightmare too. Editing text and graphics in Word is nearly impossible. Suggestions please.

Look at http://www.lightenpdf.com/pdf-to-word-converter.html they have a free trial.
#3
Bob R wrote:
Is there an app that will capture all contents of a highlighted area on a webpage, text and images, and put in a .DOC or similar editable format? I can save to a PDF but get all the other garbage on the page. The PDF looks like the webpage. I can do this with great difficulty using Libre Office, however, the images, although there, are badly sized and the page format is messed up some. I save from Libre Office to a .DOC and open In Word and try to edit but it is a nightmare too. Editing text and graphics in Word is nearly impossible. Suggestions please.

I've been toying with an answer to this question, but was having trouble coming up with an "angle" to frame an answer.

So what I'll start with is this PDF sample. Open this in your PDF viewer, then imagine a conversion process taking place.

http://ecee.colorado.edu/~kuester/smith/smith.pdf

What's the first thing you notice? Things are going on in there which cannot be expressed in either Microsoft Word or LibreOffice. Things are going on in there that can be expressed in Adobe Illustrator.

But I don't think that's what you want to hear. You apparently want some "subset" of capability: a series of simple rectangular objects, with no text sitting on spline curves or arbitrary paths.

When people design translators (and I tried to write one once, so I learned some lessons), the idea is to find a one-to-one mapping for every object in the source documents. Because, dammit, you want to do a good job. When you're writing your translator, your objective is to make it "pixel perfect". For every object in the source document, you want to provide an exact translation in the destination document. You want to preserve Z-axis priority (so occluded objects are occluded in the same way in the translation).

Now, you can go along for a fair while and find lots of primitives that translate nicely. Then along comes something where the coordinate system simply doesn't allow mapping from one to the other with any accuracy. And it goes downhill from there.

So while I can imagine a very simplified case, where you could convert some primitives (ones on horizontal baselines), the wheels would rapidly fall off if the source of the PDF was something like Adobe Illustrator. Because it can emit primitives (like gradients) that only it can understand on input. PDF has no trouble expressing what Adobe Illustrator can do, so they're roughly graphical "peers". Word and LO are not even in the same ballpark.

So if you attempted this translation, and you got anywhere remotely close to an accurate translation (90% of the stuff is in there), I'm thinking that's pretty good. I could easily find samples where almost nothing intelligible would come through. And I would start looking for Adobe Illustrator samples as a source of confounding input.

So I'm thinking the subset you are looking for is something like this:

Word --> PDF, then later PDF --> Word

where, since the original source was a pretty simple-minded tool, the translation process is pretty simple going in the reverse direction. There's no way to get a spline curve in the original PDF, because Word has no way to make one. There's no way to get a gradient as such in there either.

And now it's no longer a "general purpose" translator, it's an "I want to recover my Word source from this PDF I managed to save" translator. And try fitting that into a Google search and getting an intelligible answer :-)

If even *some* of the original document comes out intelligible, it's a miracle :-) It's that difficult to do a good job.

We used to see this with OCR packages too. There were OCR packages that could recognize text. Great. On screen, you could see the letters, and it was reasonably close. Then you selected "save as Word", maybe it got dumped into a .RTF, you'd pull it into Word... and all the text strings were on top of one another, and the spacing was a mess. So while the OCR software was good at picking out the text, it was **** poor at emitting primitives properly for Word.

Which should make you doubly appreciative when *any* tool pulls even one of the items out of the source and gets it right.

So when designing a translator, the first question you ask is, "are the formats of equal expressive power?" And the answer is, Word and LO are simple-minded. PDF is a lot more capable, and you can do things in a PDF for which there is no translation in Word.

If you design a translator that only covers some subset, then the product has to clearly define what those boundaries might be. Otherwise, the users become very annoyed with your effort in the first five minutes. When my OCR software couldn't make good Word documents, I just uninstalled it. I'd just wasted ninety-nine bucks.

Paul
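The "equal expressive power" question above can be sketched in a few lines of Python. This is only an illustration: the primitive names and the mapping table are hypothetical and do not come from any real PDF or Word API. The point is that a subset translator should report what it had to drop instead of pretending to be pixel perfect.

```python
# Hypothetical primitive names, invented for illustration only.
# Left side: what the richer source format can emit.
# Right side: the closest thing the simpler target format offers.
SUPPORTED = {
    "text_on_baseline": "paragraph_run",
    "raster_image": "inline_picture",
    "rectangle": "drawing_shape",
}


def translate(primitives):
    """Split source primitives into (translated, dropped) lists."""
    translated, dropped = [], []
    for name in primitives:
        target = SUPPORTED.get(name)
        if target is None:
            dropped.append(name)  # no equivalent in the target format
        else:
            translated.append((name, target))
    return translated, dropped


def coverage(primitives):
    """Fraction of primitives that have a one-to-one mapping."""
    if not primitives:
        return 1.0
    translated, _ = translate(primitives)
    return len(translated) / len(primitives)


if __name__ == "__main__":
    page = ["text_on_baseline", "raster_image", "text_on_spline", "gradient_fill"]
    done, lost = translate(page)
    print(f"translated {len(done)} of {len(page)}; no equivalent for: {lost}")
```

A converter built this way at least tells the user where its boundaries are, which is the complaint about the OCR package above.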
#4
Bob R wrote:
Is there an app that will capture all contents of a highlighted area on a webpage, text and images, and put in a .DOC or similar editable format? I can save to a PDF but get all the other garbage on the page. The PDF looks like the webpage. I can do this with great difficulty using Libre Office, however, the images, although there, are badly sized and the page format is messed up some. I save from Libre Office to a .DOC and open In Word and try to edit but it is a nightmare too. Editing text and graphics in Word is nearly impossible. Suggestions please.

How about saving the web page as an HTML folder using an HTML editor? SeaMonkey Composer can do that. It's primitive but works. Then you can edit words, move & resize pictures, delete ads, etc. KompoZer and BlueGriffon are decent basic webpage editors.
#5
Bob R wrote:
Is there an app that will capture all contents of a highlighted area on a webpage, text and images, and put in a .DOC or similar editable format? I can save to a PDF but get all the other garbage on the page. The PDF looks like the webpage. I can do this with great difficulty using Libre Office, however, the images, although there, are badly sized and the page format is messed up some. I save from Libre Office to a .DOC and open In Word and try to edit but it is a nightmare too. Editing text and graphics in Word is nearly impossible. Suggestions please.

In general, what you ask is impossible. For the best editing capabilities, I'd save the HTML and edit that.

--
Mike Barnes
Cheshire, England
#6
Mike Barnes wrote:
Bob R wrote: Is there an app that will capture all contents of a highlighted area on a webpage, text and images, and put in a .DOC or similar editable format? I can save to a PDF but get all the other garbage on the page. The PDF looks like the webpage. I can do this with great difficulty using Libre Office, however, the images, although there, are badly sized and the page format is messed up some. I save from Libre Office to a .DOC and open In Word and try to edit but it is a nightmare too. Editing text and graphics in Word is nearly impossible. Suggestions please.

In general, what you ask is impossible. For the best editing capabilities, I'd save the HTML and edit that.

http://medicine.buffalo.edu/ooc/reso...cial/word.html

"Recent versions of Word can open web pages directly from the internet."

"As you can see from the picture below, a page from our website looks very different when Word turns its navigation menus into editable text: Menus become long lists in Word"

So it's also possible to import a web page. I don't think I've ever tried that with my old copy of Word. And I don't have any new versions to try with.

It looks comparable to a SeaMonkey Composer experience, judging by their example. I tried loading a page into Composer before giving my PDF answer, and didn't think that was going to be good enough. Maybe the most recent version of Word does a better job, as I keep getting surprised by the progress they're making.

Paul
#7
I'd second Mike Barnes's response: HTML and CSS are not so difficult to learn. I often re-edit saved webpages. Trying to edit something like that through point-and-click will never work very well.

Unfortunately, though, many sites use WYSIWYG programs, or custom software, to generate the code behind their pages, generating vast amounts of junk code. And many smaller sites are created by small business owners who sign up with monstrous operations like wix.com, patch together a quickie website with drag-drop, and end up with webpages that don't really even use HTML. They're more like database front-ends.

All of that is to say that not all webpages are alike. So your success in pasting to Word will depend, to some extent, on the particular page you're dealing with. Also, the usefulness of the graphical layout will vary. For example, a webpage with an extensive table of, say, houseplant info will be worth saving with the tables. But an article about houseplants is better saved as plain text.

Here's a great example of the kind of muck that results from auto-generated webpages:

https://support.microsoft.com/en-us/kb/310521

It's a page about creating an HTML file in Word. The webpage has 3-4 paragraphs of text, yet the actual page is 185 KB -- 185,000 characters! And there's 1.2 MB of files that go with it. On top of all that, the page is broken without javascript. It's an auto-generated page, so the authors don't need to be concerned with making sense of the code later... so it's barely human-readable.

With a webpage like that I go to View - Style - No Style in Firefox/Pale Moon, then select and copy the actual 3 paragraphs of text, then save that in Notepad. There's actually nothing useful added by the graphics, menus, or formatting in that page.
Everything useful is contained in the text I've pasted below here, which contains less than 3,000 characters: ---------------- begin MS webpage content ---------- This article provides a step-by-step guide to how to create an HTML document, including items such as typing text and adding images and hyperlinks to your HTML document. Create Your HTML Document Use one of the following two methods to create your new HTML document. Method 1 Start Microsoft Word. In the New Document task pane, click Blank Web Page under New. On the File menu, click Save. NOTE: The Save as type box defaults to Web Page (*.htm; *.html). In the File name box, type the file name that you want for your document, and then click Save. Method 2 Start Microsoft Word. Create a new blank document. On the File menu, click Save as Web Page. In the File name box, type the file name that you want for your document, and then click Save. Add Text and Hyperlinks to Your HTML Document Open the HTML document that you created earlier in this article. To do this, follow these steps: On the File menu, click Open. Browse to the location that you saved your article to, in the "Create Your HTML Document" section of this article. Select the file and then click Open. Type the following text into the document: You can use Microsoft Word to create HTML documents as easily as you can create normal Word documents. To create a hyperlink, select the words "Microsoft Word" in the text that you typed. On the Insert menu, click Hyperlink. In the Insert Hyperlink dialog box, type http://www.microsoft.com/word in the Address box, and then click OK. Save your changes to the document. Add an Image to Your HTML Document Place your insertion point where you want to place an image in your document. On the Insert menu, point to Picture, and then click ClipArt. In the Insert ClipArt task pane, click Search. 
NOTE: If you click Search without typing anything into the Search Text box, the search result will display all of the currently available images on your system. In the Results section, select the image that you want to insert into the page. Save your changes and then close the document. Open an HTML Document in Word Do one of the following. If the New Document task pane is still displayed: In the New Document task pane, select the document under Open a document. This opens the document directly. -or- If the New Document task pane is not displayed: On the File menu, click Open. In the Open dialog box, locate the HTML document that you created earlier, and then select it. Click Open. REFERENCES For more information about HTML support in Word 2002, follow these steps: Open Microsoft Word 2002. On the Help menu, click Microsoft Word Help. Click the Answer Wizard tab. Type HTML in the What would you like to do? box, and then click Search. Related topics will be displayed. Click any item to display the information. |
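For anyone who would rather script the "No Style, copy the text, paste into Notepad" routine described in this post, here is a rough Python sketch using only the standard library. It is not what Firefox does internally; it simply drops script and style blocks from a saved page, keeps the remaining text, and reports how much of the file was actual text. The function names are made up for this example, and menu text will still come through, so some hand trimming is likely.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text that is not inside script, style or noscript elements."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text outside skipped elements,
        # with runs of whitespace collapsed to single spaces.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(" ".join(data.split()))


def visible_text(html):
    """Return the page text, one text chunk per line."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def text_ratio(html):
    """Fraction of the file's characters that are visible text."""
    return len(visible_text(html)) / len(html) if html else 0.0


if __name__ == "__main__":
    sample = (
        "<html><head><style>p{color:red}</style>"
        "<script>var x = 1;</script></head>"
        "<body><p>Hello   world</p><p>Second</p></body></html>"
    )
    print(visible_text(sample))
    print(f"text is {text_ratio(sample):.0%} of the file")
```

To try it on a real page, read the saved .htm file into a string and pass it to `visible_text`; a very low ratio is a hint that the page is mostly generated markup.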
#8
Paul writes:
Bob R wrote: Is there an app that will capture all contents of a highlighted area on a webpage, text and images, and put in a .DOC or similar editable format? I can save to a PDF but get all the other garbage on the page. The PDF looks like the webpage.

How do you do that (save it as a PDF) - are you using a special application, is it a function of your browser (if so which browser?), or are you just using a to-PDF "printer" like pdf995 or one of many similar?

I can do this with great difficulty using Libre Office, however, the images, although there, are badly sized and the page format is messed up some. I save from Libre Office to a .DOC and open In Word and try to edit but it is a nightmare too. Editing text and graphics in Word is nearly impossible. Suggestions please.

[LONG bit on how PDF converters have little chance of working all the time snipped]

As the other Paul said, I'd save it as an HTML folder (to get the images), then edit it in something. Word can read HTML directly, though I'd _never_ use its ability to save _back_ to HTML, because it tries to use HTML more like PDF to force a page layout, and thus can convert a ten-line HTML page into something many tens of kB. (But I don't _think_ creating it back as HTML is what you want anyway - see second question below.)

When you say "Editing text and graphics in Word is nearly impossible", is that your view regardless, or only when the text-and-graphics have come via a specific route? (I'd agree if it presents as layers, or whatever the right terminology is for that "floating over/under" thing Word can do.)

Two questions - how are you highlighting the section of the webpage you want (just using the mouse and/or cursor-and-shift keys, or something more complex); and: what do you want to do with the extract once you've got it - put it in a (Word-type say?) document or similar, or what?

--
J. P. Gilliver. UMRA: 1960/1985 MB++G()AL-IS-Ch++(p)Ar@T+H+Sh0!:`)DNAf
Intelligence isn't complete without the full picture and the full picture is all about doubt. Otherwise, you go the way of George Bush. - baroness Eliza Manningham-Buller (former head of MI5), Radio Times 3-9 September 2011.
#9
In message , Paul in Houston TX
writes: Bob R wrote: Is there an app that will capture all contents of a highlighted area on a webpage, text and images, and put in a .DOC or similar editable format? I can save to a PDF but get all the other garbage on the page. The PDF looks like the webpage. I can do this with great difficulty using Libre Office, however, the images, although there, are badly sized and the page format is messed up some. I save from Libre Office to a .DOC and open In Word and try to edit but it is a nightmare too. Editing text and graphics in Word is nearly impossible. Suggestions please. How about saving the web page as an HTML folder using a HTML editor? Basic Firefox can save as such a folder (you select "HTML, complete" or something like that), though you'd then need an HTML editor to actually edit it. (I used to quite like the one that some versions of Netscape had - reasonable compromise between WYSIWYG and keeping you at least to some extent mindful of the underlying HTML code - but that's probably not up to modern web-page things. But there must be plenty of HTML editors. [Of which I don't consider Word to be one - it can read an HTML document, but I'd never use it to write one, even saving back the one it just read.]) SeaMonkey Composer can do that. It's primitive but works. Then you can edit words, move & resize pictures, delete ads, etc. Kompozer and Blue Griffin are decent basic webpage editors. -- J. P. Gilliver. UMRA: 1960/1985 MB++G()AL-IS-Ch++(p)Ar@T+H+Sh0!:`)DNAf Intelligence isn't complete without the full picture and the full picture is all about doubt. Otherwise, you go the way of George Bush. - baroness Eliza Manningham-Buller (former head of MI5), Radio Times 3-9 September 2011. |
#10
J. P. Gilliver (John) wrote:
In message , Paul in Houston TX writes: Bob R wrote: Is there an app that will capture all contents of a highlighted area on a webpage, text and images, and put in a .DOC or similar editable format? I can save to a PDF but get all the other garbage on the page. The PDF looks like the webpage. I can do this with great difficulty using Libre Office, however, the images, although there, are badly sized and the page format is messed up some. I save from Libre Office to a .DOC and open In Word and try to edit but it is a nightmare too. Editing text and graphics in Word is nearly impossible. Suggestions please. How about saving the web page as an HTML folder using a HTML editor? Basic Firefox can save as such a folder (you select "HTML, complete" or something like that), though you'd then need an HTML editor to actually edit it. No, you wouldn't *need* an HTML editor, though it might help if you had limited time or inclination to learn HTML. Windows notepad would do. I edit *lots* of HTML (also CSS, JavaScript) and I use an ordinary text editor. I do have a copy of Dreamweaver but I haven't used it for years. -- Mike Barnes Cheshire, England |
#11
In message , Mike Barnes
writes: J. P. Gilliver (John) wrote: In message , Paul in Houston TX writes: Bob R wrote: Is there an app that will capture all contents of a highlighted area on a webpage, text and images, and put in a .DOC or similar editable format? I can save to a PDF but get all the other garbage on the page. The PDF looks like the webpage. [] How about saving the web page as an HTML folder using a HTML editor? Basic Firefox can save as such a folder (you select "HTML, complete" or something like that), though you'd then need an HTML editor to actually edit it. No, you wouldn't *need* an HTML editor, though it might help if you had limited time or inclination to learn HTML. Windows notepad would do. I edit *lots* of HTML (also CSS, JavaScript) and I use an ordinary text editor. I do have a copy of Dreamweaver but I haven't used it for years. True; I do all my HTML editing in a text editor. However, (a) the OP wanted "or similar editable format", which suggests he's not familiar with editing raw HTML, and (b) many modern web pages are autogenerated, and thus even if you do know the syntax of HTML, are a Wright Payne to edit - twenty or thirty levels of nested DIVs, even three or four levels of nested TABLEs, complex formatting tags with just an nbsp inside ... so if he did save them, he'd for practical purposes need an HTML editor to edit them. (I'd say - by a narrow margin - "Incredimail" is the worst candidate for creating bloated HTML, but there are others that are pretty bad - Word among them.) -- J. P. Gilliver. UMRA: 1960/1985 MB++G()AL-IS-Ch++(p)Ar@T+H+Sh0!:`)DNAf Anybody who thinks there can be unlimited growth in a static, limited environment, is either mad or an economist. - Sir David Attenborough, in Radio Times 10-16 November 2012 |
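Before deciding whether a saved page is worth hand-editing, it can help to measure how deeply its DIVs or TABLEs are nested, as described in the post above. The following is a minimal sketch with the standard-library parser; the class and function names are invented for this example, and badly unbalanced markup will give only an approximate figure.

```python
from html.parser import HTMLParser


class DepthCounter(HTMLParser):
    """Track the deepest nesting level reached by one tag name."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        # The parser reports tag names in lower case.
        if tag == self.tag:
            self.depth += 1
            self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1


def max_nesting(html, tag="div"):
    """Return the maximum nesting depth of `tag` in the given HTML text."""
    counter = DepthCounter(tag)
    counter.feed(html)
    counter.close()
    return counter.max_depth


if __name__ == "__main__":
    page = "<div><div><div>x</div></div><div>y</div></div>"
    print("deepest div nesting:", max_nesting(page))
```

A page that reports twenty or thirty levels is probably one for a graphical editor or for the plain-text route rather than for Notepad.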
#12
| As the other Paul said, I'd save it as an HTML folder (to get the
| images), then edit it in something. Word can read HTML directly, though
| I'd _never_ use its ability to save _back_ to HTML,

I just did something similar yesterday. Someone sent me a very long DOC, which I opened in Libre Office and found difficult to read. It was laid out as pages, with giant spaces at the bottom of many pages and hard-to-read serif fonts that were far too big. I then saved it as HTML, which LO did a pretty good job of, and found it quick and easy to change fonts, font sizes and colors. I now have a very readable HTML version, which I generally prefer over both DOCs and PDFs for readability.

But I wouldn't use an office program to open an HTML file. There are plenty of actual HTML editors for that. Even Notepad will work. Since HTML source code is plain text, it doesn't need a special editor except to get conveniences like syntax color highlighting.
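The font and color changes described above can also be applied without opening an editor, by dropping a small stylesheet into the saved file. This is a sketch only: the CSS values are arbitrary examples, the function name is made up, and it assumes the export does not pin fonts with inline styles or FONT tags, which would still win over a rule like this and would need removing by hand.

```python
import re

# Example override only; pick whatever font, size and color you find readable.
OVERRIDE = "body { font-family: Verdana, sans-serif; font-size: 100%; color: #222; }"


def add_style(html, css=OVERRIDE):
    """Insert a <style> block just before </head>, or at the top if there is no head."""
    block = f"<style>\n{css}\n</style>\n"
    match = re.search(r"</head\s*>", html, flags=re.IGNORECASE)
    if match:
        return html[:match.start()] + block + html[match.start():]
    return block + html


if __name__ == "__main__":
    page = "<html><head><title>t</title></head><body><p>x</p></body></html>"
    print(add_style(page))
```

Because the block goes last in the head, it takes precedence over earlier stylesheet rules of equal specificity.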
#13
| No, you wouldn't *need* an HTML editor, though it might help if you had
| limited time or inclination to learn HTML. Windows notepad would do. I
| edit *lots* of HTML (also CSS, JavaScript) and I use an ordinary text
| editor. I do have a copy of Dreamweaver but I haven't used it for years.
|

You don't find that more difficult? I use an editor I wrote myself, with syntax color highlighting, simple debugging, easy attribute lookup, toggling webpage view, CSS popup menu, etc. I find it *much* easier and faster than a simple text editor. I added every feature that I thought would make writing more convenient for me.

I haven't seen much of Dreamweaver. Last I saw, it was a dual editor, with source view and WYSIWYG functionality. The problem with that approach is that the two methods don't really mix. It's usually hard to edit WYSIWYG-generated code. Anyone who really knows HTML (as it sounds like you do) can't benefit from WYSIWYG convenience.
#14
Mayayana wrote:
| No, you wouldn't *need* an HTML editor, though it might help if you had
| limited time or inclination to learn HTML. Windows notepad would do. I
| edit *lots* of HTML (also CSS, JavaScript) and I use an ordinary text
| editor. I do have a copy of Dreamweaver but I haven't used it for years.
|
You don't find that more difficult? I use an editor I wrote myself, with syntax color highlighting, simple debugging, easy attribute lookup, toggling webpage view, CSS popup menu, etc. I find it *much* easier and faster than a simple text editor. I added every feature that I thought would make writing more convenient for me.

For most languages - not just HTML - I find a plain editor works best. The editor I use:

syntax colour highlighting - yes (HTML, PHP, CSS and JavaScript)
debugging - no, my browsers have excellent debuggers built in
attribute lookup - not necessary, I can remember the few I use
web page view - I prefer using a real browser

I haven't seen much of Dreamweaver. Last I saw it was a dual editor, with source view and WYSIWYG functionality. The problem with that approach is that the two methods don't really mix. It's usually hard to edit WYSIWYG-generated code. Anyone who really knows HTML (as it sounds like you do) can't benefit from WYSIWYG convenience.

One of the things I liked about Dreamweaver was that it didn't mess with my code, except that it had its own ideas about whitespace. Some operations, such as table column manipulation, can be done more easily graphically. But I don't use tables anywhere near as much as I used to. Also, Dreamweaver had an excellent template system, but now PHP works better.

--
Mike Barnes
Cheshire, England
#15
| The editor I use:
|
| syntax colour highlighting - yes (HTML, PHP, CSS and JavaScript)

Ah. I thought you were talking about a variation on Notepad. The color coding really helps to see what's what.

| attribute lookup - not necessary, I can remember the few I use

You're a better man than I am. I often forget things like whether VALIGN works with this, or only with that. With CSS it's worse. I never bothered to learn all the properties and their possible values, so I find a popup "intellisense" auto-insert menu makes things much easier. (I don't believe all that stuff about improving memory with age by memorizing things and doing crossword puzzles. I'll take all the crutches I can get.)

| web page view - I prefer using a real browser
|

I use a real browser. It's basically an IE window embedded in my software. I also customized it so that I can hover over an element and see the styles for that element.

I think everyone is different with webpage design and coding, but for me there's only one sensible way to do it: I design the page in quirks mode for IE, which takes care of all IE versions "in one fell swoop". I then take that page and adapt it until it looks the same way in Firefox. That takes care of all other browsers. The result is two variations that don't need script and display dependably in all browsers without needing to resort to hacks like <!-- If IE8......