
Q&A: How do you get plain text from HTML?


  1. Plain text can come from any HTML tag that holds text: <p>, <h1>, <h2>, <h3>, <h4>, <h5>, <h6>, <td> (inside a <tr> in a <table>), <th> (inside a <tr> in a <table>) and so on. All text inside a tag counts as text.
  2. Some of the text may be present in the HTML but not shown on the webpage. It can be hidden by CSS or by scripts such as JavaScript.
  3. Some of the text may be changed by a script (such as JavaScript). You can see the final result in the browser's web inspector.
  4. Some of the text is actually part of an image. It cannot be extracted as text unless the image is run through OCR (optical character recognition) software or an online OCR service.
  5. Some of the text may be hidden because another object covers it. For example, text inside a <div> tag can be covered by other elements inside the same tag. Another example is text inside an HTML5 <video> tag: it stays hidden because the browser shows the video and only falls back to that text when it cannot play video.
  6. Some images carry hidden text, too. It is stored in the alt="…" attribute of the <img> tag.
  7. Text inside an applet or object that requires a plugin (such as Java, Flash or Silverlight) may not be fully copyable as text. For ordinary pages, you can still get the plain text from HTML with the Reading View feature available in some browsers, such as the Android stock browser, Firefox and Safari. Extensions may be available for Chrome and other browsers.
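
If you would rather do this in code than by hand, here is a minimal sketch in Python that covers points 1 and 6 above: it collects the text found inside tags and the alt text of images. It uses only the standard library's html.parser module. The names PlainTextExtractor and html_to_text are made up for this example. It does not run JavaScript or apply CSS, so it cannot tell whether text is hidden or changed on the rendered page, and it does no OCR.

```python
from html.parser import HTMLParser


class PlainTextExtractor(HTMLParser):
    """Hypothetical helper: collects tag text and <img> alt text."""

    # The content of these tags is code or styling, not page text.
    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []       # text fragments in document order
        self._skip_depth = 0   # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1
        elif tag == "img":
            # Point 6: the alt attribute holds the image's hidden text.
            alt = dict(attrs).get("alt")
            if alt and alt.strip():
                self.chunks.append(alt.strip())

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Collapse runs of whitespace and ignore empty fragments.
        text = " ".join(data.split())
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def html_to_text(html):
    """Return the text fragments of an HTML string, one per line."""
    parser = PlainTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


if __name__ == "__main__":
    sample = (
        "<h1>Title</h1><p>Hello <b>world</b></p>"
        "<script>var x = 1;</script>"
        "<img src='a.png' alt='A logo'>"
    )
    print(html_to_text(sample))
```

Each fragment ends up on its own line, so text split by inline tags such as <b> is split in the output as well. Joining the fragments with spaces instead is an easy change if you prefer a single line.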

