mirror of https://github.com/lorien/awesome-web-scraping.git synced 2024-11-24 08:32:19 +02:00

Merge pull request #37 from ahivert/master

Add chopper tool
lorien 2017-12-13 16:47:28 +03:00 committed by GitHub
commit b31f06d9f0
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

@@ -68,6 +68,7 @@ This list contains python libraries related to web scraping and data processing
     * [xhtml2pdf](https://github.com/chrisglass/xhtml2pdf) - HTML/CSS to PDF converter.
     * [untangle](https://github.com/stchris/untangle) - Converts XML documents to Python objects for easy access.
     * [hodor](https://github.com/CompileInc/hodor) - Configuration driven wrapper around lxml and cssselect.
+    * [chopper](https://github.com/jurismarches/chopper) - Tool to extract a part from HTML page with corresponding CSS rules and preserving correct HTML.
     * [selectolax](https://github.com/rushter/selectolax) - Python bindings to Modest engine (fast HTML5 parser with CSS selectors).
 * Sanitizing
     * [Bleach](http://bleach.readthedocs.org/en/latest/) - cleaning of HTML (requires html5lib)
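The entry added by this commit describes chopper as extracting part of an HTML page while "preserving correct HTML". The sketch below illustrates only that balancing idea, using Python's standard-library `html.parser`. It is not chopper's API and it ignores the CSS-rule side entirely; `SubtreeExtractor`, `extract_by_id`, and selecting the fragment by its `id` attribute are hypothetical choices made for illustration.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never take an end tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


def _render_start(tag, attrs):
    """Re-serialize a start tag; valueless attributes (e.g. `disabled`) stay bare."""
    rendered = "".join(
        f" {name}" if value is None else f' {name}="{escape(value, quote=True)}"'
        for name, value in attrs
    )
    return f"<{tag}{rendered}>"


class SubtreeExtractor(HTMLParser):
    """Hypothetical helper: copy the first element whose id matches, plus its children.

    Comments, doctypes and processing instructions are simply dropped.
    """

    def __init__(self, target_id):
        super().__init__(convert_charrefs=True)
        self.target_id = target_id
        self.stack = []   # tags opened inside the subtree and not yet closed
        self.out = []
        self.done = False

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if not self.stack and dict(attrs).get("id") != self.target_id:
            return  # still outside the target element
        self.out.append(_render_start(tag, attrs))
        if tag not in VOID_TAGS:
            self.stack.append(tag)
        elif not self.stack:
            self.done = True  # the target itself is a void element

    def handle_endtag(self, tag):
        if self.done or tag not in self.stack:
            return  # outside the subtree, or a stray end tag we drop
        # Explicitly close any tags the source left open, e.g. an unclosed <p>.
        while True:
            open_tag = self.stack.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break
        if not self.stack:
            self.done = True

    def handle_data(self, data):
        if self.done or not self.stack:
            return
        # convert_charrefs unescaped normal text, so escape it again;
        # script/style content arrives raw and must stay raw.
        raw = self.stack[-1] in ("script", "style")
        self.out.append(data if raw else escape(data, quote=False))


def extract_by_id(html_text, target_id):
    """Return the matching element as standalone, balanced HTML ("" if not found)."""
    parser = SubtreeExtractor(target_id)
    parser.feed(html_text)
    parser.close()
    # Close anything still open when the input ended.
    parser.out.extend(f"</{tag}>" for tag in reversed(parser.stack))
    return "".join(parser.out)


doc = ('<html><body><div id="nav">menu</div>'
       '<div id="main"><p>Fish &amp; chips <b>today</div>'
       '<p>footer</p></body></html>')
print(extract_by_id(doc, "main"))
# <div id="main"><p>Fish &amp; chips <b>today</b></p></div>
```

The stack of open tags is what keeps the fragment well-formed: an end tag for an outer element first closes everything opened inside it. This sketch does not apply HTML5's implied-end-tag rules (sibling `<li>` elements without end tags would come out nested), which a real parser-backed tool handles properly.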