From b117a73a5415a6a22c0c6c4896ec7a0542599c77 Mon Sep 17 00:00:00 2001
From: Gregory Petukhov <lorien@lorien.name>
Date: Thu, 13 Aug 2015 02:49:22 +0600
Subject: [PATCH] Add contributing guide and web services list; translate python.md to English

---
 CONTRIBUTING.md | 15 +++++++++++++++
 Makefile        |  1 +
 README.md       |  7 ++++++-
 python.md       | 44 ++++++++++++++++++++++----------------------
 web_service.md  |  3 +++
 5 files changed, 47 insertions(+), 23 deletions(-)
 create mode 100644 CONTRIBUTING.md
 create mode 100644 web_service.md

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
new file mode 100644
index 0000000..b74a2cd
--- /dev/null
+++ b/CONTRIBUTING.md
@@ -0,0 +1,15 @@
+# Contributing
+
+Your contributions are always welcome!
+
+## Guidelines
+
+* Clone the web-scraping repo.
+* Add a section if needed.
+    * Add a section description.
+    * Add the section title to the table of contents.
+* Search previous suggestions before making a new one, as yours may be a duplicate.
+* Add your links: `* [project-name](http://example.com/) - A short description ends with a dot.` Avoid generic wording in the description; describe what is unique about the library.
+* Check your spelling and grammar.
+* Make sure your text editor is set to remove trailing whitespace.
+* Send a Pull Request.
diff --git a/Makefile b/Makefile
index 972d130..8b40960 100644
--- a/Makefile
+++ b/Makefile
@@ -3,3 +3,4 @@
 html:
 	python -m markdown README.md > html/README.html
 	python -m markdown python.md > html/python.html
+	python -m markdown web_service.md > html/web_service.html
diff --git a/README.md b/README.md
index 736ce76..26aaf19 100644
--- a/README.md
+++ b/README.md
@@ -8,4 +8,9 @@ The list of tools, programming libraries and APIs used in web-scraping.
 
 ## Other
 
-* [Python](http://github.com/lorien/web-scraping/blob/master/web_service.md)
+* [Web Services](http://github.com/lorien/web-scraping/blob/master/web_service.md)
+
+
+## [Contributing](https://github.com/lorien/web-scraping/blob/master/CONTRIBUTING.md)
+
+Make this list better! Your contributions are always welcome!
diff --git a/python.md b/python.md
index f00fdb3..abef4ac 100644
--- a/python.md
+++ b/python.md
@@ -3,34 +3,34 @@
 ## Network Request
 * [urllib](https://docs.python.org/3.4/library/urllib.html?highlight=urllib#module-urllib) - standard python network library
 * [requests](http://www.python-requests.org/) - network library
-* [grab](http://github.com/lorien/grab) - network library (pycurl based)
+* [grab](http://docs.grablib.org/en/latest/) - network library (pycurl based)
 * [pycurl](http://pycurl.sourceforge.net/) - network library (binding to [libcurl](http://curl.haxx.se/libcurl/))
 * [urllib3](https://github.com/shazow/urllib3) - network library
 
 ## Web-Scraping Frameworks
-* [grab](http://github.com/lorien/grab) - web-scraping framework (pycurl/multicurl based)
-* [scrapy](http://scrapy.org/) - web-scraping framework (twisted based)
+* [grab](http://docs.grablib.org/en/latest/#grab-spider-user-manual) - web-scraping framework (pycurl/multicurl based)
+* [scrapy](http://scrapy.org/) - web-scraping framework (twisted based). Does not support Python 3.
 
 ## HTML/XML Parsing
-* [lxml](http://lxml.de) - effective HTML/XML processing library. Supports XPATH.
-* [cssselect](https://pythonhosted.org/cssselect) - quering DOM tree with CSS expressions
-* [pyquery](http://pythonhosted.org//pyquery/) - парсинг XML/HTML с помощью jquery-запросов (требует lxml)
-* [BeautifulSoup](http://www.crummy.com/software/BeautifulSoup/bs4/doc/) - глючная тормозная библиотека для парсинга XML/HTML, не поддерживающая xpath запросы, плюсом является то, что она написана на чистом питоне.
-* [WHATWG](http://html5lib.readthedocs.org/en/latest/]html5lib[/url] - парсинг и сериализация HTML по спецификации [url=http://www.whatwg.org/) (на неё ориентируются современные веб-браузеры). Возможно выбрать для построения DOM-дерева, как встроенные средства питона, так и библиотеки lxml или BeautifulSoup.
-* [feedparser](http://pythonhosted.org/feedparser/) - парсинг RSS и ATOM фидов.
-* [Bleach](http://bleach.readthedocs.org/en/latest/) - библиотека для очистки HTML (требует html5lib)
+* [lxml](http://lxml.de) - efficient HTML/XML processing library. Supports XPath. Written in C.
+* [cssselect](https://pythonhosted.org/cssselect) - querying the DOM tree with CSS selectors
+* [pyquery](http://pythonhosted.org//pyquery/) - querying the DOM tree with jQuery-like selectors
+* [BeautifulSoup](http://www.crummy.com/software/BeautifulSoup/bs4/doc/) - slow HTML/XML processing library, written in pure Python
+* [html5lib](http://html5lib.readthedocs.org/en/latest/) - parsing and serializing HTML according to the [WHATWG spec](http://www.whatwg.org/), which modern web browsers follow.
+* [feedparser](http://pythonhosted.org/feedparser/) - parsing RSS/Atom feeds.
+* [Bleach](http://bleach.readthedocs.org/en/latest/) - cleaning HTML (requires html5lib)
 
-## Эмуляторы браузеров
-* [selenium](http://selenium.googlecode.com/git/docs/api/py/api.html) - средство для тестирования веб-интерфейсов с помощью реальных браузеров: google chrome, opera, firefox, IE. Можно и сайты через него парсить. Есть некоторые неудобства, связанные с идеологией инструмента - он позволяет эмулировать внешние действия пользователя. Например, задать свой собственный Referer - это уже проблема. Не требует дополнительных зависимостей.
-* [Ghost.py](http://carrerasrodrigo.github.io/Ghost.py/) - обёртка над QtWebKit (требует PyQt или PySide)
-* [Spynner](https://github.com/makinacorpus/spynner) - ещё одна обёртка над QtWebKit, более ничего не знаю про эту либу :)
+## Browser automation and emulation
+* [selenium](http://selenium.googlecode.com/git/docs/api/py/api.html) - automating real browsers (Chrome, Firefox, Opera, IE)
+* [Ghost.py](http://carrerasrodrigo.github.io/Ghost.py/) - wrapper around QtWebKit (requires PyQt)
+* [Spynner](https://github.com/makinacorpus/spynner) - wrapper around QtWebKit (requires PyQt)
 
-## Параллельная многозадачность
-* [threading](http://docs.python.org/2.7/library/threading.html) - встроенный в python модуль для реализации многозадачности с помощью тредов (объекты языка, выполняющиеся в одном процессе). Минусы подхода в том, что вы не можете загрузить вычислениями более одного ядра.
-* [multiprocessing](http://docs.python.org/2.7/library/multiprocessing.html) - встроенный в python модуль для реализации многозадачности с помощью процессов
-* [celery](http://celery.readthedocs.org/en/latest/index.html) - навороченная реализация очереди задач, поддерживающая различные бэкенды для хранения этой самой очереди задач.
-* [RQ](http://python-rq.org/docs/) - легковесная очередь задач, использующая redis
+## Multiprocessing
+* [threading](http://docs.python.org/2.7/library/threading.html) - standard Python library for running threads. Effective for I/O-bound tasks; offers little benefit for CPU-bound tasks because of the Python GIL.
+* [multiprocessing](http://docs.python.org/2.7/library/multiprocessing.html) - standard Python library for running processes.
+* [celery](http://celery.readthedocs.org/en/latest/index.html) - task queue manager
+* [RQ](http://python-rq.org/docs/) - lightweight task queue manager based on Redis
 
-## Облачные вычислениях
-* [picloud](http://docs.picloud.com/) - выполнение python-кода в облаке
-* [dominoup.com](http://www.dominoup.com/) - выполнение R, Python и matlab кода в облаке
+## Cloud Computing
+* [picloud](http://docs.picloud.com/) - executing Python code in the cloud
+* [dominoup.com](http://www.dominoup.com/) - executing R, Python and MATLAB code in the cloud
diff --git a/web_service.md b/web_service.md
new file mode 100644
index 0000000..346d20d
--- /dev/null
+++ b/web_service.md
@@ -0,0 +1,3 @@
+# Web-scraping Web Services
+
+* [import.io](https://import.io/) - web service for extracting data from web pages