mirror of https://github.com/lorien/awesome-web-scraping.git synced 2024-11-21 17:17:03 +02:00

add html5-parser to python

Andriy Orehov 2020-10-19 21:48:59 +03:00
parent e0ff53e53b
commit 10d98d9faa
2 changed files with 2 additions and 0 deletions

.gitignore

@@ -1,6 +1,7 @@
*.swp
*.swo
*.orig
.idea
html
Pipfile.lock


@@ -102,6 +102,7 @@ This list contains python libraries related to web scraping and data processing
* [chopper](https://github.com/jurismarches/chopper) - Tool to extract a part from HTML page with corresponding CSS rules and preserving correct HTML.
* [selectolax](https://github.com/rushter/selectolax) - Python bindings to Modest engine (fast HTML5 parser with CSS selectors).
* [parsel](https://github.com/scrapy/parsel) - Lets you extract data from XML/HTML documents using XPath or CSS selectors.
* [html5-parser](https://github.com/kovidgoyal/html5-parser) - Fast C based HTML 5 parsing for python.
### HTML/XML : Sanitizing