## WHAT-IS

`Scrapy` is an open source, collaborative framework for extracting the data
you need from websites in a fast, simple, yet extensible way.

This image is based on `debian:jessie` and takes up only 278.6 MB.
You can create a Scrapy (v0.24.6) project on top of this image.
## HOW-TO

```
$ docker run --name scrapy -it vimagick/scrapy
>>> scrapy startproject demo
>>> cd demo
>>> scrapy genspider example example.com
>>> scrapy edit example
>>> scrapy crawl example
```
## TODO-LIST
- [x] build [libxml2][1]/[libxslt][2] from source
- [x] add [scrapy_bash_completion][3] script

[1]: http://www.xmlsoft.org/downloads.html
[2]: http://git.gnome.org/browse/libxslt/
[3]: https://github.com/scrapy/scrapy/raw/master/extras/scrapy_bash_completion