mirror of https://github.com/lorien/awesome-web-scraping.git synced 2024-11-30 08:57:19 +02:00
Commit Graph

165 Commits

Author SHA1 Message Date
lorien
ad05430f52
Add cloudpickle to python list 2023-04-18 00:59:47 +07:00
lorien
84d0299a68
Add multiple serialization libs to python list 2023-04-16 21:30:52 +07:00
Nykakin
0b6b60f48f Move chompjs to separate section 2022-12-21 08:57:52 +01:00
Nykakin
48e231f63e Add chompjs 2022-11-28 17:14:33 +01:00
Sergey Scat
58f7955d3c
Add Unicaps to Captcha Solving (Python) 2022-09-05 13:32:33 +03:00
Some User
b5f82fc622 Fix spaces 2022-06-24 14:47:34 +03:00
milahu
46d625c36a
sort by github stars 2022-04-18 08:54:38 +02:00
milahu
1c3879c18a
fix links 2022-04-18 08:51:15 +02:00
Ronie Martinez
22589140be
Add Dude to python web scraping frameworks
Added Dude (https://github.com/roniemartinez/dude)
2022-03-18 01:55:37 +01:00
lorien
fa4a1f9c59
Add captcha solving libraries 2022-03-16 12:48:45 +03:00
lorien
9e1145e29e
Remove links to outdated python lists 2022-03-16 12:35:20 +03:00
rand-net
a8bdec29ee
Adds dribbble-py - a dribbble.com scraper 2022-03-07 13:14:22 +00:00
Thiago Lages de Alencar
b4190c7e22
Update python.md
Link was down, switched to github repository
2022-01-09 05:18:06 -03:00
lorien
5004b1b71f
Update python.md 2021-08-10 17:15:23 +03:00
lorien
44e7083e3a
Merge pull request #126 from ray-102/master
Added Extractnet an ML based crawler in Python
2021-07-24 21:29:06 +03:00
0xflotus
66da267fa5
fixed small error 2021-04-05 00:48:14 +02:00
ray-102
d36d03b647
Added ExtractNet 2021-03-07 14:20:53 +08:00
lorien
68548b984e
Update python.md 2020-12-12 18:42:28 +03:00
Prayson Wilfred Daniel
ff18f3f2dd
moved photon to correct category
Moved photo to web extraction
2020-12-03 19:26:03 +01:00
Prayson Wilfred Daniel
46b1749888
Update python.md
added seleniumbase, photon,  and frontera
2020-12-01 09:27:28 +01:00
Prayson Wilfred Daniel
a65a3d4baa
added gazpacho
gazpacho is a simple, fast, and modern web scraping library
2020-10-30 10:51:39 +01:00
Thuc Phan
4bd58c232a
Add Gerapy 2020-10-24 17:32:33 +07:00
Gregory Petukhov
5b3163acb1
Merge pull request #117 from Proteusiq/patch-2
Added Playwright
2020-10-19 22:03:28 +03:00
Andriy Orehov
10d98d9faa add html5-parser to python 2020-10-19 21:48:59 +03:00
Prayson Wilfred Daniel
743d4324b4
Added Playwright
Playwright > Selenium (> == greater)
2020-10-18 10:38:55 +02:00
Gregory Petukhov
e0ff53e53b
Merge pull request #114 from hedythedev/patch-2
add dateparser to python.md
2020-10-13 00:41:39 +03:00
Prayson Wilfred Daniel
3857575481
Added Starbelly
Starbelly is a user-friendly web crawler that is easy to deploy and configure.
2020-10-12 20:41:58 +02:00
Hedy Li
6a118a3536
move dateparser to text processing time and date 2020-10-08 13:20:37 +08:00
Hedy Li
e30c1f9735
add dateparser to python.md 2020-10-02 07:52:42 +08:00
Peter Thaleikis
0aef446138
Updating link to grab docs 2020-09-23 00:21:26 +04:00
Gregory Petukhov
2288fdc88b
Update python.md 2020-09-14 13:17:38 +03:00
Prayson Wilfred Daniel
9a37c1adfb
Added Advertools
advertools is added in Web Content Extraction
2020-09-09 12:09:32 +02:00
alireza
f734fb42e2 add autoscraper for python 2020-09-02 20:11:05 +04:30
Gregory Petukhov
64e44f26e9
Use github links for some of packages in the list 2020-03-27 17:01:52 +03:00
Namal Dayarathna
384f3a5207
Fixed outdated link for cssselect 2020-02-17 20:44:23 +00:00
Adrien Barbaresi
2c45ad3f17
Scraping library added
- Content scraping library
2020-01-29 13:09:38 +01:00
The Woops
f3a1d60fc9
change external links to github links 2020-01-14 19:03:42 +01:00
The Woops
929290f39f
Added 3 popular NLP-libraries 2020-01-13 14:36:39 +01:00
Gregory Petukhov
aedc67bb7d
Add python:orjson 2020-01-08 16:35:49 +03:00
Gregory Petukhov
f26b8691f3
Add python:httptools 2019-12-25 08:16:09 +03:00
Gregory Petukhov
0625cde22f
add scapy to python 2019-12-16 01:43:28 +03:00
Gregory Petukhov
3c88280317 Fix formatting 2019-11-28 23:20:57 +03:00
Gregory Petukhov
68402d69e3
Add dnspython 2019-11-23 00:19:54 +03:00
Gregory Petukhov
14df577fd4
Update httplib2 description and link 2019-11-22 04:54:32 +03:00
Gregory Petukhov
6c66ce8086
Add ioweb to python web scraping frameworks 2019-11-22 04:48:28 +03:00
Gregory Petukhov
b1aed659bf
Add javascript engine bindings to python list 2019-11-22 01:55:05 +03:00
Gregory Petukhov
390d725b49
Add cloudscraper to python page 2019-11-22 01:43:29 +03:00
Gregory Petukhov
18c5f9a3ce
Fix pycrumbs link 2019-11-08 13:51:43 +03:00
Gregory Petukhov
314fe87505
Merge pull request #96 from andriyor/add-python-user-agnt
add uap-python to Python User-Agent parser
2019-10-25 16:45:03 +03:00
Gregory Petukhov
4fbf8d0a7b
Fix pull request 2019-10-25 16:44:33 +03:00