mirror of
https://github.com/lorien/awesome-web-scraping.git
synced 2024-11-28 08:48:58 +02:00
Update ruby.md
parent 1ab0fff51f
commit e4b8a565da
32
ruby.md
@@ -33,9 +33,10 @@ This list contains ruby libraries related to web scraping and data processing
* [nestful](https://github.com/maccman/nestful) - Simple Ruby HTTP/REST client with a sane API
* [EM-HTTP-Request](https://github.com/igrigorik/em-http-request) - EventMachine-based asynchronous HTTP client

## Web-Scraping Frameworks

* TODO
* [upton](https://github.com/propublica/upton) - A batteries-included framework for easy web-scraping

## HTML/XML Parsing

@@ -53,17 +54,21 @@ This list contains ruby libraries related to web scraping and data processing

*Libraries for parsing and manipulating specific text formats.*

* Office
  * [Yomu](https://github.com/Erol) - Read text and metadata from files and documents (.doc, .docx, .pages, .odt, .rtf, .pdf)
  * [spreadsheet](https://github.com/zdavatz/spreadsheet) - Reads and writes spreadsheet documents.
  * [roo](https://github.com/Empact/roo) - Implements read access for all spreadsheet types and read/write access for Google spreadsheets.
  * [google-spreadsheet-ruby](https://github.com/gimite/google-spreadsheet-ruby) - A library to read and write Google Spreadsheets.
  * [rubyXL](https://github.com/weshatheleopard/rubyXL) - A gem for parsing, creating, and manipulating Microsoft Excel (.xlsx/.xlsm) documents
  * [remote_table](https://github.com/seamusabshere/remote_table) - Open local or remote XLSX, XLS, ODS, CSV (comma separated), TSV (tab separated), other delimited, fixed-width files, and Google Docs.
  * [sheets](https://github.com/bspaulding/Sheets) - Work with spreadsheets easily in a native Ruby format.
  * [workbook](https://github.com/murb/workbook) - Models workbooks as tables containing rows containing cells; reads and writes Excel, ODS, CSV, and tab-separated files.
  * [oxcelix](https://github.com/gbiczo/oxcelix) - A fast Excel 2007/2010 (.xlsx) file parser that returns a collection of Matrix objects
  * [wrap_excel](https://github.com/tomiacannondale/wrap_excel) - Wraps win32ole to make Excel operations easy to use from Ruby. See the README for a detailed description.
* libpcap
  * [PacketFu](https://github.com/packetfu/packetfu) - A library for reading and writing packets to an interface or to a libpcap-formatted file.
* JSON
  * [JsonCompare](https://github.com/a2design-company/json-compare) - Returns the difference between two JSON files

## Natural Language Processing

@@ -97,6 +102,7 @@ This list contains ruby libraries related to web scraping and data processing
* [Delayed::Job](https://github.com/tobi/delayed_job) - Database-backed asynchronous priority queue.
* [Qu](https://github.com/bkeepers/qu) - A Ruby library for queuing and processing background jobs.
* [Sidekiq](https://github.com/mperham/sidekiq) - Simple, efficient background processing for Ruby
* [Sneakers](https://github.com/jondot/sneakers) - A fast background processing framework for Ruby and RabbitMQ

## Cloud Computing
* TODO
@@ -117,7 +123,7 @@ This list contains ruby libraries related to web scraping and data processing

*Libraries for extracting web contents.*

* TODO
* [Metainspector](https://github.com/jaimeiniesta/metainspector) - Scrapes a given URL and returns its title, meta description, meta keywords, links, images, etc.

## WebSocket