mirror of https://github.com/lorien/awesome-web-scraping.git synced 2024-11-28 08:48:58 +02:00
awesome-web-scraping/console_tools.md
Shevchenko Alexey 4bf66999f3 order fix
2022-10-10 21:58:50 +03:00

Console Tools

Console tools related to web scraping and data processing

HTTP Clients

  • curl - supports SSL certificates, HTTP POST, HTTP PUT, FTP uploading, HTTP form based upload, proxies, HTTP/2, cookies, user+password authentication (Basic, Plain, Digest, CRAM-MD5, NTLM, Negotiate and Kerberos), file transfer resume, proxy tunneling and more.
  • httpie - sends arbitrary HTTP requests using a simple and natural syntax and displays colorized output. HTTPie can be used for testing, debugging, and generally interacting with HTTP servers.
  • wget - package for retrieving files using HTTP, HTTPS and FTP, the most widely used Internet protocols. It is a non-interactive command-line tool, so it can easily be called from scripts, cron jobs, terminals without X-Windows support, etc.
  • crawley - unix-way web scraper/crawler.
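
As a rough illustration of how the clients above can be driven from a script, the sketch below builds curl and wget command lines as argument lists without running them. The helper names (`curl_command`, `wget_command`) and the example URL are hypothetical; the long options used (`--location`, `--proxy`, `--continue-at`, `--output`, `--continue`) are standard curl and wget flags, but check them against your installed versions.

```python
import shlex


def curl_command(url, output=None, proxy=None, resume=False):
    """Build an argv list for a curl download (hypothetical helper; nothing is executed)."""
    # --silent hides the progress meter, --show-error still reports failures,
    # --location follows HTTP redirects.
    argv = ["curl", "--silent", "--show-error", "--location"]
    if proxy:
        argv += ["--proxy", proxy]        # e.g. "socks5://127.0.0.1:9050"
    if resume:
        argv += ["--continue-at", "-"]    # "-" lets curl work out the resume offset
    if output:
        argv += ["--output", output]
    argv.append(url)
    return argv


def wget_command(url, resume=False):
    """Build an argv list for a non-interactive wget download (hypothetical helper)."""
    argv = ["wget", "--quiet"]
    if resume:
        argv.append("--continue")         # resume a partially downloaded file
    argv.append(url)
    return argv


if __name__ == "__main__":
    # The URL is a placeholder; shlex.join quotes the arguments for a POSIX shell.
    print(shlex.join(curl_command("https://example.com/data.csv",
                                  output="data.csv", resume=True)))
    print(shlex.join(wget_command("https://example.com/data.csv", resume=True)))
```

The resulting lists can be passed to `subprocess.run(argv, check=True)`; keeping them as lists rather than shell strings avoids quoting problems with URLs that contain `&` or `?`.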

Specific Formats Processing

  • Office
    • unoconv - convert between any document format supported by LibreOffice/OpenOffice.
  • CSV
    • csvkit - utilities for converting to and working with CSV.

Proxy Wrappers

  • proxychains - a tool that forces any TCP connection made by any given application to go through a proxy.
  • proxychains-ng - a preloader that hooks socket calls in dynamically linked programs and redirects them through one or more SOCKS/HTTP proxies. A continuation of the unmaintained proxychains project.
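
To make the proxy-wrapper idea concrete, here is a minimal sketch that renders a proxychains-style configuration file and builds the command line that wraps another program. It assumes the usual proxychains-ng conventions (a `[ProxyList]` section with one `type host port` entry per line, the `proxychains4` binary, and its `-f` option for a custom config file); the helper names and the example proxy address are hypothetical, so verify the details against the version you have installed.

```python
def proxychains_config(proxies, chain_type="strict_chain", proxy_dns=True):
    """Render a minimal proxychains-style config (hypothetical helper).

    `proxies` is a list of (type, host, port) tuples, e.g. ("socks5", "127.0.0.1", 9050).
    """
    lines = [chain_type]                  # strict_chain: use every proxy, in order
    if proxy_dns:
        lines.append("proxy_dns")         # resolve host names through the chain
    lines.append("[ProxyList]")
    for scheme, host, port in proxies:
        lines.append(f"{scheme} {host} {port}")
    return "\n".join(lines) + "\n"


def proxychains_argv(config_path, command):
    """Wrap `command` (an argv list) so it runs under proxychains4 (assumed binary name)."""
    return ["proxychains4", "-f", config_path, *command]


if __name__ == "__main__":
    # Placeholder proxy: a local SOCKS5 listener on port 9050.
    print(proxychains_config([("socks5", "127.0.0.1", 9050)]), end="")
    print(proxychains_argv("proxychains.conf", ["curl", "https://example.com/"]))
```

Because the wrapper works by preloading a library into dynamically linked programs, statically linked binaries are not expected to be affected by it.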

Other Lists