Mirror of https://github.com/lorien/awesome-web-scraping.git, synced 2024-11-24 08:32:19 +02:00
Update python.md
parent a3aa31fe12
commit 46ef0ccbb8
@@ -70,8 +70,6 @@ This list contains python libraries related to web scraping and data processing
 * Sanitizing
 * [Bleach](http://bleach.readthedocs.org/en/latest/) - cleaning of HTML (requires html5lib)
 * [sanitize](https://github.com/Alir3z4/sanitize) - Bringing sanity to world of messed-up data.
-* Tables
-* [rows](https://github.com/turicas/rows) - A common, beautiful interface to tabular data, no matter the format (currently CSV, HTML, XLS, TXT -- more coming!)
 
 ## Text Processing
 
@@ -180,7 +178,7 @@ This list contains python libraries related to web scraping and data processing
 * [langid.py](https://github.com/saffsd/langid.py) - Stand-alone language identification system.
 * [Korean](https://korean.readthedocs.org/) - A library for [Korean](http://en.wikipedia.org/wiki/Korean_language) morphology.
 * [pymorphy2](https://github.com/kmike/pymorphy2) - Morphological analyzer (POS tagger + inflection engine) for Russian language.
-* [pypln](http://www.pypln.org/) - Distributed Natural Language Processing
+* [PyPLN](https://github.com/NAMD/pypln.backend) - A distributed pipeline for natural language processing, made in Python. he goal of the project is to create an easy way to use NLTK for processing big corpora, with a Web interface.
 
 ## Browser automation and emulation
 * [selenium](http://selenium.googlecode.com/git/docs/api/py/api.html) - automating real browsers (Chrome, Firefox, Opera, IE)
@@ -257,8 +255,6 @@ This list contains python libraries related to web scraping and data processing
 * [you-get](http://www.soimort.org/you-get/) - A YouTube/Youku/Niconico video downloader written in Python 3.
 * Wiki
 * [WikiTeam](https://github.com/WikiTeam/wikiteam) - Tools for downloading and preserving wikis.
-* Tables
-* [rows](https://github.com/turicas/rows) - A common, beautiful interface to tabular data, no matter the format (currently CSV, HTML, XLS, TXT -- more coming!)
 
 ## WebSocket
 