mirror of https://github.com/lorien/awesome-web-scraping.git synced 2024-11-28 08:48:58 +02:00
awesome-web-scraping/javascript.md
2015-08-21 22:06:22 +05:00

JavaScript Web Scraping

This list contains JavaScript libraries related to web scraping and data processing.

Network

  • node-http2 - An HTTP/2 client and server implementation for node.js
  • httpinvoke - A no-dependencies HTTP client library for browsers and Node.js with a promise-based or Node.js-style callback-based API that supports progress events, text and binary file upload and download, partial response bodies, request and response headers, and status codes.
  • request - Simplified HTTP request client.
  • socks5-http-client - SOCKS v5 HTTP client implementation in JavaScript for Node.js
  • rest - RESTful HTTP client for JavaScript
  • wreck - HTTP Client Utilities

Web-Scraping Frameworks

  • Full Featured Crawlers
    • TODO
  • Other
    • TODO

HTML/XML Parsing

  • General
    • TODO
  • Sanitizing
    • js-xss - Sanitize untrusted HTML (to prevent XSS) with a configuration specified by a Whitelist.

Text Processing

Libraries for parsing and manipulating plain text.

  • General
    • string.js - Extra JavaScript string methods.
    • accounting.js - A lightweight JavaScript library for number, money and currency formatting - fully localisable, zero dependencies.
    • validator.js - String validation and sanitization.
  • Date and time
    • moment - Parse, validate, manipulate, and display dates in JavaScript.
    • date - Date() for humans.
    • ms.js - Tiny millisecond conversion utility.
  • HTML entities
    • he - A robust HTML entity encoder/decoder written in JavaScript.
  • Money
    • money.js - Simple and tiny JavaScript library for realtime currency conversion and exchange rate calculation, from any currency, to any currency.
  • Color
    • chroma.js - JavaScript library for all kinds of color manipulations.
    • color - JavaScript color conversion and manipulation library.
    • TinyColor - Fast, small color manipulation and conversion for JavaScript.

Specific Formats Processing

Libraries for parsing and manipulating specific text formats.

  • General
    • jBinary - High-level I/O (loading, parsing, manipulating, serializing, saving) for binary files with declarative syntax for describing file types and data structures.
  • CSV
    • BabyParse - Fast and reliable CSV parser based on Papa Parse. Papa Parse is for the browser, Baby Parse is for Node.js.
  • JSON
    • json3 - A modern JSON implementation compatible with nearly all JavaScript platforms.

Natural Language Processing

Libraries for working with human languages.

  • TODO

Browser automation and emulation

  • phantomjs - Scriptable Headless WebKit.
  • slimerjs - A PhantomJS-like tool running Gecko.
  • casperjs - Navigation scripting & testing utility for PhantomJS and SlimerJS.
  • zombie - Insanely fast, full-stack, headless browser testing using node.js.
  • nightmare - A high-level wrapper for PhantomJS that lets you automate browser tasks.

Multiprocessing

  • TODO

Asynchronous

Libraries for asynchronous network programming.

  • TODO

Queue

  • TODO

Email

Libraries for parsing email.

  • TODO

URL and Network Address Manipulation

Libraries for parsing/modifying URLs and network addresses.

  • URL
    • query-string - Parse and stringify URL query strings.
    • URI.js - JavaScript URL mutation library.
    • jsurl - Lightweight URL manipulation with JavaScript.
  • Network Address
    • TODO
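
To illustrate the kind of URL parsing and modification this section covers, here is a minimal sketch. It assumes a recent Node.js and uses only the built-in `URL` and `URLSearchParams` classes, not the API of any library listed above. The example URL and the `nextPageUrl` helper are hypothetical.

```javascript
// Sketch with Node.js built-ins only; the URL and helper names are made up.
const pageUrl = new URL('https://example.com/catalog?page=2&tag=books&tag=sale');

// Read query parameters; getAll() collects every value of a repeated key.
const page = Number(pageUrl.searchParams.get('page'));
const tags = pageUrl.searchParams.getAll('tag');

// Build the URL of the next page without mutating the original URL object.
function nextPageUrl(url) {
  const next = new URL(url.href);
  const current = Number(next.searchParams.get('page') || '1');
  next.searchParams.set('page', String(current + 1));
  return next.href;
}

// Resolve a relative link found on the page against the page URL.
const absoluteLink = new URL('../item/42?ref=list', pageUrl).href;

console.log(page, tags, nextPageUrl(pageUrl), absoluteLink);
```

Resolving relative links against the page URL before queueing them is a common way for a scraper to avoid fetching the same page under different spellings.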

Web Content Extracting

Libraries for extracting web contents.

  • Text and Meta Data from HTML pages
    • TODO

WebSocket

Libraries for working with WebSocket.

  • TODO

DNS Resolving

  • TODO

Computer Vision

  • tracking.js - A modern approach for Computer Vision on the web.
  • ocrad.js - OCR in JavaScript via Emscripten.

Proxy Server

  • TODO

Data Structure

  • immutable - Immutable persistent data collections for JavaScript that increase efficiency and simplicity.

Other JavaScript lists