mirror of
https://github.com/lorien/awesome-web-scraping.git
synced 2024-11-28 08:48:58 +02:00
JavaScript Web Scraping
This list contains JavaScript libraries related to web scraping and data processing.
- JavaScript Web Scraping
- Network
- Web-Scraping Frameworks
- HTML/XML Parsing
- Text Processing
- Specific Formats Processing
- Natural Language Processing
- Browser automation and emulation
- Multiprocessing
- Queue
- Cloud Computing
- URL and Network Address Manipulation
- Web Content Extracting
- Asynchronous
- WebSocket
- DNS Resolving
- Computer Vision
- Proxy Server
- Other JavaScript Lists
- Data Structure
Network
- node-http2 - An HTTP/2 client and server implementation for Node.js.
- httpinvoke - A no-dependencies HTTP client library for browsers and Node.js, with a promise-based or Node.js-style callback-based API that covers progress events, text and binary file upload and download, partial response bodies, request and response headers, and status codes.
- request - Simplified HTTP request client.
- socks5-http-client - SOCKS v5 HTTP client implementation in JavaScript for Node.js
- rest - RESTful HTTP client for JavaScript
- wreck - HTTP Client Utilities
Web-Scraping Frameworks
- Full Featured Crawlers
- TODO
- Other
- TODO
HTML/XML Parsing
- General
- TODO
- Sanitizing
- js-xss - Sanitize untrusted HTML (to prevent XSS) with a configuration specified by a whitelist.
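To make the difference between sanitizing and plain escaping concrete, here is a minimal sketch of the simpler technique, escaping. It is not how js-xss works: a whitelist sanitizer keeps allowed tags, while this hypothetical `escapeHtml` helper neutralizes every tag by replacing the five characters that are significant in HTML.

```javascript
// Hypothetical helper for illustration; not part of js-xss or any listed library.
// Escaping turns ALL markup into inert text, unlike whitelist-based sanitizing.
const HTML_ESCAPES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;',
};

function escapeHtml(text) {
  // Replace each significant character with its entity in a single pass.
  return String(text).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

const escaped = escapeHtml('<b>"x" & y</b>');
console.log(escaped);
```

Escaping is usually enough when scraped text is only displayed; a sanitizer is the better fit when some of the original markup has to survive.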
Text Processing
Libraries for parsing and manipulating plain text.
- General
- string.js - Extra JavaScript string methods.
- accounting.js - A lightweight JavaScript library for number, money and currency formatting - fully localisable, zero dependencies.
- validator.js - String validation and sanitization.
- Date and time
- moment - Parse, validate, manipulate, and display dates in JavaScript.
- moment-timezone - Timezone support for moment.js.
- date - Date() for humans.
- ms.js - Tiny millisecond conversion utility.
- HTML entities
- he - A robust HTML entity encoder/decoder written in JavaScript.
- Money
- money.js - Simple and tiny JavaScript library for realtime currency conversion and exchange rate calculation, from any currency, to any currency.
- Color
Specific Formats Processing
Libraries for parsing and manipulating specific text formats.
- General
- jBinary - High-level I/O (loading, parsing, manipulating, serializing, saving) for binary files with declarative syntax for describing file types and data structures.
- CSV
- BabyParse - Fast and reliable CSV parser based on Papa Parse. Papa Parse is for the browser, Baby Parse is for Node.js.
- JSON
- json3 - A modern JSON implementation compatible with nearly all JavaScript platforms.
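As a rough illustration of what a CSV parser has to handle, the sketch below splits a single CSV line while respecting quoted fields and doubled quotes. The function name `parseCsvLine` is hypothetical, and this is a simplified stand-in rather than the BabyParse API: it does not handle fields that span several lines, which a real parser must.

```javascript
// Hypothetical helper for illustration; not the BabyParse or Papa Parse API.
// Splits ONE line of CSV, honoring "quoted, fields" and doubled quotes ("").
function parseCsvLine(line) {
  const fields = [];
  let field = '';
  let inQuotes = false;

  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') {
        field += '"'; // a doubled quote inside quotes is a literal quote
        i++;
      } else if (ch === '"') {
        inQuotes = false; // closing quote
      } else {
        field += ch;
      }
    } else if (ch === '"') {
      inQuotes = true; // opening quote
    } else if (ch === ',') {
      fields.push(field); // an unquoted comma ends the field
      field = '';
    } else {
      field += ch;
    }
  }
  fields.push(field); // the last field has no trailing comma
  return fields;
}

const row = parseCsvLine('a,"b,c","say ""hi"""');
console.log(row);
```

The quoting rules are the main reason to prefer a tested CSV library over `line.split(',')` for scraped data.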
Natural Language Processing
Libraries for working with human languages.
- TODO
Browser automation and emulation
- phantomjs - Scriptable Headless WebKit.
- slimerjs - A PhantomJS-like tool running Gecko.
- casperjs - Navigation scripting & testing utility for PhantomJS and SlimerJS.
- zombie - Insanely fast, full-stack, headless browser testing using node.js.
- nightmare - A high-level wrapper for PhantomJS that lets you automate browser tasks.
Multiprocessing
- TODO
Asynchronous
Libraries for asynchronous network programming.
- TODO
Queue
- TODO
Email
Libraries for parsing email.
- TODO
URL and Network Address Manipulation
Libraries for parsing/modifying URLs and network addresses.
- URL
- query-string - Parse and stringify URL query strings.
- URI.js - JavaScript URL mutation library.
- jsurl - Lightweight URL manipulation with JavaScript.
- Network Address
- TODO
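For a sense of what parsing and modifying a URL looks like in practice, here is a minimal sketch that uses only the `URL` global built into Node.js and browsers, not any of the libraries listed above. The helper name `withPage` and the example address are assumptions made for illustration.

```javascript
// Hypothetical helper for illustration; relies only on the built-in URL global.
// Returns the same address with its `page` query parameter set and the
// fragment dropped, a common step when walking paginated listings.
function withPage(address, page) {
  const url = new URL(address);
  url.searchParams.set('page', String(page)); // replaces an existing value in place
  url.hash = ''; // fragments are never sent to the server
  return url.toString();
}

const next = withPage('https://example.com/catalog?sort=price&page=1#top', 2);
console.log(next); // https://example.com/catalog?sort=price&page=2
```

The listed libraries add conveniences on top of this kind of manipulation, such as richer query-string handling.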
Web Content Extracting
Libraries for extracting web content.
- Text and Meta Data from HTML pages
- TODO
WebSocket
Libraries for working with WebSocket.
- TODO
DNS Resolving
- TODO
Computer Vision
- tracking.js - A modern approach for Computer Vision on the web.
- ocrad.js - OCR in JavaScript via Emscripten.
Proxy Server
- TODO
Data Structure
- immutable - Immutable persistent data collections for JavaScript that increase efficiency and simplicity.