mirror of https://github.com/lorien/awesome-web-scraping.git synced 2024-11-28 08:48:58 +02:00

Create ruby.md

This commit is contained in:
Gregory Petukhov 2015-08-16 21:16:59 +05:00
parent 4973812455
commit 485b95b5a7

ruby.md Normal file

@ -0,0 +1,142 @@
# Ruby Web Scraping
This list contains Ruby libraries related to web scraping and data processing.
* [Ruby Web Scraping](#ruby-web-scraping)
* [Network](#network)
* [Web-Scraping Frameworks](#web-scraping-frameworks)
* [HTML/XML Parsing](#htmlxml-parsing)
* [Text Processing](#text-processing)
* [Specific Formats Processing](#specific-formats-processing)
* [Natural Language Processing](#natural-language-processing)
* [Downloader](#downloader)
* [Browser automation and emulation](#browser-automation-and-emulation)
* [Multiprocessing](#multiprocessing)
* [Asynchronous](#asynchronous)
* [Queue](#queue)
* [Cloud Computing](#cloud-computing)
* [Email](#email)
* [URL Manipulation](#url-manipulation)
* [Web Content Extracting](#web-content-extracting)
* [WebSocket](#websocket)
* [DNS Resolving](#dns-resolving)
* [Computer Vision](#computer-vision)
* [Geolocation](#geolocation)
* [Other Ruby Lists](#other-ruby-lists)
## Network
* [httparty](https://github.com/jnunemaker/httparty) - Makes HTTP fun again!
* [faraday](https://github.com/lostisland/faraday) - Simple, but flexible HTTP client library, with support for multiple backends.
* [http](https://github.com/tarcieri/http) - A simple Ruby DSL for making HTTP requests
* [excon](https://github.com/excon/excon) - Usable, fast, simple HTTP(S) 1.1 for Ruby
* [nestful](https://github.com/maccman/nestful) - Simple Ruby HTTP/REST client with a sane API
* [EM-HTTP-Request](https://github.com/igrigorik/em-http-request) - EventMachine based asynchronous HTTP client
## Web-Scraping Frameworks
* TODO
## HTML/XML Parsing
* [nokogiri](https://github.com/sparklemotion/nokogiri) - HTML, XML, SAX, and Reader parser with XPath and CSS selector support
* [loofah](https://github.com/flavorjones/loofah) - HTML/XML manipulation and sanitization based on Nokogiri
## Text Processing
*Libraries for parsing and manipulating plain text.*
* General
    * TODO
## Specific Formats Processing
*Libraries for parsing and manipulating specific text formats.*
* Office
    * [Yomu](https://github.com/Erol) - Read text and metadata from files and documents (.doc, .docx, .pages, .odt, .rtf, .pdf)
    * [spreadsheet](https://github.com/zdavatz/spreadsheet) - Read and write spreadsheet documents.
    * [roo](https://github.com/Empact/roo) - Read access for all spreadsheet types and read/write access for Google Spreadsheets.
    * [google-spreadsheet-ruby](https://github.com/gimite/google-spreadsheet-ruby) - Read and write Google Spreadsheets.
    * [rubyXL](https://github.com/weshatheleopard/rubyXL) - Parse, create, and manipulate Microsoft Excel (.xlsx/.xlsm) documents
    * [remote_table](https://github.com/seamusabshere/remote_table) - Open local or remote XLSX, XLS, ODS, CSV (comma separated), TSV (tab separated), other delimited, fixed-width files, and Google Docs.
    * [sheets](https://github.com/bspaulding/Sheets) - Work with spreadsheets easily in a native Ruby format.
    * [workbook](https://github.com/murb/workbook) - Models workbooks as tables that contain rows of cells; reads and writes Excel, ODS, CSV, and tab-separated files.
    * [oxcelix](https://github.com/gbiczo/oxcelix) - A fast Excel 2007/2010 (.xlsx) file parser that returns a collection of Matrix objects
    * [wrap_excel](https://github.com/tomiacannondale/wrap_excel) - Wraps win32ole to make Excel operations easy to use from Ruby.
## Natural Language Processing
*Libraries for working with human languages.*
* [Treat](https://github.com/louismullie/treat) - Treat is a toolkit for natural language processing and computational linguistics in Ruby
## Downloader
*Libraries for downloading.*
* TODO
## Browser automation and emulation
* TODO
## Multiprocessing
* [Celluloid](https://github.com/celluloid/celluloid) - Actor-based concurrent object framework for Ruby
* [Parallel](https://github.com/grosser/parallel) - Ruby parallel processing made simple and fast
## Asynchronous
*Libraries for asynchronous networking programming.*
* [EventMachine](https://github.com/eventmachine/eventmachine) - event-driven I/O and lightweight concurrency library
## Queue
* [Resque](https://github.com/resque/resque) - A Redis-backed Ruby library for creating background jobs, placing them on multiple queues.
* [Delayed::Job](https://github.com/tobi/delayed_job) - Database backed asynchronous priority queue.
* [Qu](https://github.com/bkeepers/qu) - A Ruby library for queuing and processing background jobs.
* [Sidekiq](https://github.com/mperham/sidekiq) - Simple, efficient background processing for Ruby
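
Resque and Sidekiq persist jobs in Redis, and Delayed::Job in a database, so that separate worker processes can consume them. The sketch below is not any of their APIs: it uses Ruby's built-in `Queue` (`Thread::Queue`) to show the same producer/consumer shape inside a single process. The URLs are placeholders and the "work" stands in for fetching a page.

```ruby
jobs = Queue.new
results = Queue.new

# Placeholder job payloads: URLs a scraper would fetch.
urls = %w[https://example.com/1 https://example.com/2 https://example.com/3]

# Two worker threads consume jobs until they receive a nil stop signal.
workers = Array.new(2) do
  Thread.new do
    # Queue#pop blocks until a job is available.
    while (url = jobs.pop)
      # Stand-in for real work (download and parse the page).
      results << [url, url.length]
    end
  end
end

urls.each { |u| jobs << u }        # producer enqueues jobs
workers.size.times { jobs << nil } # one stop signal per worker
workers.each(&:join)

# Drain the result queue; sort because worker completion order varies.
processed = Array.new(results.size) { results.pop }.sort
```

A persistent backend adds what this sketch lacks: jobs survive restarts and workers can run on other machines.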
## Cloud Computing
* TODO
## Email
*Libraries for parsing email.*
* [mail](https://github.com/mikel/mail) - A Really Ruby Mail Library
## URL Manipulation
*Libraries for parsing URLs.*
* TODO
## Web Content Extracting
*Libraries for extracting web contents.*
* TODO
## WebSocket
*Libraries for working with WebSocket.*
* [em-websocket](https://github.com/igrigorik/em-websocket) - EventMachine based WebSocket server
## DNS Resolving
* TODO
## Computer Vision
* TODO
## Geolocation
* [geocoder](https://github.com/alexreisner/geocoder) - Complete Ruby geocoding solution
* [Geokit](https://github.com/geokit/geokit) - Geokit gem provides geocoding and distance/heading calculations.
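
A common way to compute distances between coordinates is the haversine great-circle formula. The plain-Ruby sketch below is not Geokit's implementation or API; it assumes a spherical Earth with a mean radius of 6371 km, and the Paris and London coordinates are rounded.

```ruby
# Mean Earth radius in kilometres (spherical-Earth assumption).
EARTH_RADIUS_KM = 6371.0

def deg_to_rad(deg)
  deg * Math::PI / 180.0
end

# Great-circle distance in km between two points given in degrees.
def haversine_km(lat1, lon1, lat2, lon2)
  dlat = deg_to_rad(lat2 - lat1)
  dlon = deg_to_rad(lon2 - lon1)
  a = Math.sin(dlat / 2)**2 +
      Math.cos(deg_to_rad(lat1)) * Math.cos(deg_to_rad(lat2)) * Math.sin(dlon / 2)**2
  2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a))
end

paris  = [48.8566, 2.3522]
london = [51.5074, -0.1278]

# Distance in km between the two cities.
puts haversine_km(*paris, *london).round(1)
```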
## Other Ruby Lists
* TODO