1
0
mirror of https://github.com/lorien/awesome-web-scraping.git synced 2024-11-28 08:48:58 +02:00

Update ruby.md

This commit is contained in:
Gregory Petukhov 2015-08-16 21:28:16 +05:00
parent 1ab0fff51f
commit e4b8a565da

32
ruby.md
View File

@ -33,9 +33,10 @@ This list contains ruby libraries related to web scraping and data processing
* [nestful](https://github.com/maccman/nestful) Simple Ruby HTTP/REST client with a sane API
* [EM-HTTP-Request](https://github.com/igrigorik/em-http-request) - EventMachine based asynchronous HTTP client
## Web-Scraping Frameworks
* TODO
* [upton](https://github.com/propublica/upton) - A batteries-included framework for easy web-scraping
## HTML/XML Parsing
@ -53,17 +54,21 @@ This list contains ruby libraries related to web scraping and data processing
*Libraries for parsing and manipulating specific text formats.*
* Office
* [Yomu](https://github.com/Erol) - Read text and metadata from files and documents (.doc, .docx, .pages, .odt, .rtf, .pdf)
* [spreadsheet](https://github.com/zdavatz/spreadsheet) - The Spreadsheet Library is designed to read and write Spreadsheet Documents.
* [roo](https://github.com/Empact/roo) - Roo implements read access for all spreadsheet types and read/write access for Google spreadsheets.
* [google-spreadsheet-ruby](https://github.com/gimite/google-spreadsheet-ruby) - This is a library to read/write Google Spreadsheet.
* [rubyXL](https://github.com/weshatheleopard/rubyXL) - rubyXL is a gem which allows the parsing, creation, and manipulation of Microsoft Excel (.xlsx/.xlsm) Documents
* [remote_table](https://github.com/seamusabshere/remote_table) - Open local or remote XLSX, XLS, ODS, CSV (comma separated), TSV (tab separated), other delimited, fixed-width files, and Google Docs.
* [sheets](https://github.com/bspaulding/Sheets) - Work with spreadsheets easily in a native ruby format.
* [workbook](https://github.com/murb/workbook) - Workbook contains workbooks, as in a table, contains rows, contains cells, reads/writes excel, ods and csv and tab separated files...
* [oxcelix](https://github.com/gbiczo/oxcelix) - A fast Excel 2007/2010 (.xlsx) file parser that returns a collection of Matrix objects
* [wrap_excel](https://github.com/tomiacannondale/wrap_excel) - WrapExcel is to wrap the win32ole, and easy to use Excel operations with ruby. Detailed description please see the README.
* Office
* [Yomu](https://github.com/Erol) - Read text and metadata from files and documents (.doc, .docx, .pages, .odt, .rtf, .pdf)
* [spreadsheet](https://github.com/zdavatz/spreadsheet) - The Spreadsheet Library is designed to read and write Spreadsheet Documents.
* [roo](https://github.com/Empact/roo) - Roo implements read access for all spreadsheet types and read/write access for Google spreadsheets.
* [google-spreadsheet-ruby](https://github.com/gimite/google-spreadsheet-ruby) - This is a library to read/write Google Spreadsheet.
* [rubyXL](https://github.com/weshatheleopard/rubyXL) - rubyXL is a gem which allows the parsing, creation, and manipulation of Microsoft Excel (.xlsx/.xlsm) Documents
* [remote_table](https://github.com/seamusabshere/remote_table) - Open local or remote XLSX, XLS, ODS, CSV (comma separated), TSV (tab separated), other delimited, fixed-width files, and Google Docs.
* [sheets](https://github.com/bspaulding/Sheets) - Work with spreadsheets easily in a native ruby format.
* [workbook](https://github.com/murb/workbook) - Workbook contains workbooks, as in a table, contains rows, contains cells, reads/writes excel, ods and csv and tab separated files...
* [oxcelix](https://github.com/gbiczo/oxcelix) - A fast Excel 2007/2010 (.xlsx) file parser that returns a collection of Matrix objects
* [wrap_excel](https://github.com/tomiacannondale/wrap_excel) - WrapExcel is to wrap the win32ole, and easy to use Excel operations with ruby. Detailed description please see the README.
* libpcap
[PacketFul](https://github.com/packetfu/packetfu) - A library for reading and writing packets to an interface or to a libpcap-formatted file.
* JSON
* [JsonCompare](https://github.com/a2design-company/json-compare) - Returns the difference between two JSON files
## Natural Language Processing
@ -97,6 +102,7 @@ This list contains ruby libraries related to web scraping and data processing
* [Delayed::Job](https://github.com/tobi/delayed_job) — Database backed asynchronous priority queue.
* [Qu](https://github.com/bkeepers/qu) A Ruby library for queuing and processing background jobs.
* [Sidekiq](https://github.com/mperham/sidekiq) Simple, efficient background processing for Ruby
* [Sneakers](https://github.com/jondot/sneakers) - A fast background processing framework for Ruby and RabbitMQ
## Cloud Computing
* TODO
@ -117,7 +123,7 @@ This list contains ruby libraries related to web scraping and data processing
*Libraries for extracting web contents.*
* TODO
* [Metainspector](https://github.com/jaimeiniesta/metainspector) - scrapes a given URL, and returns its title, meta description, meta keywords, an array with all the links, all the images in it, etc
## WebSocket