Semalt Expert: How To Extract Text From Web Pages

While scraping tools can extract data from multiple pages in a matter of seconds, the one sure way of extracting text from a web page has always been to highlight and copy it. This method becomes cumbersome, however, when you have to copy text from many pages. In addition, some web developers lock a page's content to prevent copying.

To start off, there are several quick methods of extracting text from web pages. Depending on the amount of text you want to obtain, you can choose between the following methods:

1. Save-page method

This technique relies on the browser's ability to save a local copy of the current web page. To do so, press Ctrl+S, or right-click on the page and select the save-page option from the popup menu. This opens a file dialog in which you specify some attributes of the saved page.

In the lower section, a "filename" field lets you specify the name of the saved file. Note that the browser will also create a folder with a similar name containing the page's attached data, such as images and backgrounds.

Below that, a "save as type" option lets you specify the file type. Since we are interested in text only, select the ".txt" type if your browser offers it. This creates a text file containing all of the web page's text, which can be edited with any text editor or word processor. This method is especially useful when you need to copy full pages. If you need to leave out parts of the text, simply open the text file and cut out the unnecessary text.
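When the save dialog offers only HTML formats, the saved page can still be reduced to plain text with a short script. The following is a minimal sketch in Python using only the standard library's html.parser module. The names TextExtractor, html_to_text and convert_saved_page, and the file names in the usage comment, are hypothetical and chosen for illustration; the sketch also assumes the saved file is UTF-8 encoded.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text of a page, skipping <script> and <style> contents."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0   # > 0 while inside a skipped tag
        self._chunks = []      # non-empty text fragments, in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only text that is outside skipped tags and not pure whitespace.
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self._chunks.append(text)

    def text(self):
        # One fragment per line; text split by inline tags such as <b>
        # therefore ends up on separate lines in this simple sketch.
        return "\n".join(self._chunks)


def html_to_text(html):
    """Return the text content of an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def convert_saved_page(html_path, txt_path):
    """Read a saved HTML page (assumed UTF-8) and write its text to txt_path."""
    with open(html_path, encoding="utf-8") as source:
        text = html_to_text(source.read())
    with open(txt_path, "w", encoding="utf-8") as target:
        target.write(text)


# Example call (hypothetical file names):
# convert_saved_page("saved_page.html", "saved_page.txt")
```

As with the manual method, the resulting text file can then be opened and trimmed by hand if only part of the page is needed.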

2. Ctrl+C and Ctrl+V method

This is probably the oldest trick in the book: using only your mouse, highlight the text you wish to extract, then copy it and paste it elsewhere. This method is useful when you need to copy snippets and quickly use them in another document.

To do this, scroll to the part of the page containing the text you need, then press and hold the left mouse button at the start of the text, which switches the cursor from "navigation" mode to "highlight" mode. Keep holding the button and drag the cursor to highlight the text. When you are done, release the button and right-click on the highlighted text to bring up the context menu, then click the "copy" option to copy the selected text.

Navigate to the document where you want to save the text, right-click to bring up the menu, and click "paste".

Note that you can choose between various paste modes; if you are interested in text only, click "paste as plain text".
