mirror of
https://github.com/lorien/awesome-web-scraping.git
synced 2024-11-28 08:48:58 +02:00
Console Tools
Console tools related to web scraping and data processing
HTTP Clients
- curl - supports SSL certificates, HTTP POST, HTTP PUT, FTP uploading, HTTP form based upload, proxies, HTTP/2, cookies, user+password authentication (Basic, Plain, Digest, CRAM-MD5, NTLM, Negotiate and Kerberos), file transfer resume, proxy tunneling and more.
- httpie - sends arbitrary HTTP requests using a simple and natural syntax and displays colorized output. HTTPie can be used for testing, debugging, and generally interacting with HTTP servers.
- wget - package for retrieving files using HTTP, HTTPS and FTP, the most widely used Internet protocols. It is a non-interactive command-line tool, so it can easily be called from scripts, cron jobs, terminals without X-Windows support, etc.
- crawley - Unix-way web scraper/crawler.
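
Because these clients are non-interactive, they are easy to drive from a script. The following is a minimal sketch, not part of the original list, of assembling a curl invocation that uses three of the features named above (output to a file, transfer resume, and a proxy). The helper name `build_curl_command`, the example URL, and the proxy address are hypothetical; the sketch only builds the argument list and does not run curl.

```python
import shlex


def build_curl_command(url, output, proxy=None, resume=False, user=None):
    """Return an argv list for a curl download; nothing is executed here."""
    # --location follows redirects, --output writes the body to a file.
    argv = ["curl", "--location", "--output", output]
    if resume:
        # "--continue-at -" asks curl to work out the resume offset itself.
        argv += ["--continue-at", "-"]
    if proxy:
        # Accepts URLs such as http://host:port or socks5://host:port.
        argv += ["--proxy", proxy]
    if user:
        # Credentials in "name:password" form.
        argv += ["--user", user]
    argv.append(url)
    return argv


# Hypothetical values, for illustration only.
cmd = build_curl_command(
    "https://example.com/data.csv",
    "data.csv",
    proxy="socks5://127.0.0.1:9050",
    resume=True,
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` when curl is installed; keeping the arguments as a list avoids shell quoting problems.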
Specific Formats Processing
- Office
  - unoconv - convert between any document format supported by LibreOffice/OpenOffice.
- CSV
  - csvkit - utilities for converting to and working with CSV.
Proxy Wrappers
- proxychains - a tool that forces any TCP connection made by a given application to go through a proxy.
- proxychains-ng - a preloader that hooks socket calls in dynamically linked programs and redirects them through one or more SOCKS/HTTP proxies. A continuation of the unmaintained proxychains project.
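
A proxy wrapper is used by prefixing an ordinary command with the wrapper and pointing it at a configuration file that lists the proxies. The sketch below is an illustration under assumptions rather than something taken from the projects' documentation: it assumes the proxychains-ng executable is installed as `proxychains4` (the name may differ by distribution), and the helper names, config path, and proxy address are hypothetical. It only renders the config text and the argument list; nothing is executed.

```python
def render_proxychains_config(proxies):
    """Render a minimal proxychains-style config from (type, ip, port) tuples."""
    lines = [
        "strict_chain",  # use every listed proxy, in order
        "proxy_dns",     # resolve host names through the proxy chain
        "",
        "[ProxyList]",
    ]
    for proxy_type, ip, port in proxies:
        lines.append(f"{proxy_type} {ip} {port}")
    return "\n".join(lines) + "\n"


def wrap_command(argv, config_path):
    """Prefix a command with the wrapper (binary name is an assumption)."""
    return ["proxychains4", "-f", config_path] + list(argv)


# Hypothetical local SOCKS5 proxy and config path, for illustration only.
config_text = render_proxychains_config([("socks5", "127.0.0.1", 9050)])
wrapped = wrap_command(["curl", "https://example.com/"], "/tmp/proxychains.conf")
print(config_text)
print(wrapped)
```

Because the wrapper hooks socket calls at load time, the wrapped program needs no proxy support of its own, but the approach only applies to dynamically linked programs.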
Other Lists
- structured-text-tools - A list of command line tools for manipulating structured text data