mirror of https://github.com/vimagick/dockerfiles.git synced 2025-02-09 13:47:10 +02:00
## WHAT-IS
`Scrapy` is an open source and collaborative framework for extracting the data
you need from websites in a fast, simple, yet extensible way.

This image is based on `debian:jessie` and takes only 278.6 MB.
You can create a Scrapy (v0.24.6) project on top of this image.
## HOW-TO
```
$ docker run --name scrapy -it vimagick/scrapy
>>> scrapy startproject demo
>>> cd demo
>>> scrapy genspider example example.com
>>> scrapy edit example
>>> scrapy crawl example
```
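
The interactive session above can also be driven from the host. The following is only a minimal sketch under stated assumptions: it assumes the image accepts a command after its name (`docker run IMAGE scrapy ...`), and the `/code` mount point and the helper names `docker_cmd` and `plan` are hypothetical, not part of this image. It only builds the command lines; `scrapy edit` is skipped because it opens an editor.

```python
import os
import shlex

IMAGE = "vimagick/scrapy"  # image name taken from the HOW-TO above
WORKDIR = "/code"          # hypothetical mount point inside the container


def docker_cmd(host_dir, *scrapy_args, subdir=""):
    """Build a `docker run` argv that runs one scrapy command in a throwaway container."""
    workdir = WORKDIR + ("/" + subdir if subdir else "")
    return [
        "docker", "run", "--rm",
        "-v", "{}:{}".format(host_dir, WORKDIR),  # keep generated files on the host
        "-w", workdir,                            # run inside the project directory
        IMAGE,
        "scrapy", *scrapy_args,
    ]


def plan(host_dir, project="demo", spider="example", domain="example.com"):
    """Return the HOW-TO steps as a list of argv lists (nothing is executed)."""
    return [
        docker_cmd(host_dir, "startproject", project),
        docker_cmd(host_dir, "genspider", spider, domain, subdir=project),
        docker_cmd(host_dir, "crawl", spider, subdir=project),
    ]


if __name__ == "__main__":
    # Print the commands so they can be reviewed before running them by hand.
    for argv in plan(os.getcwd()):
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Each list returned by `plan` can be passed to `subprocess.run(argv, check=True)` once Docker and the image are available. Mounting the current directory keeps the generated `demo` project on the host after the container exits.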
## TODO-LIST
- [x] build [libxml2][1]/[libxslt][2] from source
- [x] add [scrapy_bash_completion][3] script

[1]: http://www.xmlsoft.org/downloads.html
[2]: http://git.gnome.org/browse/libxslt/
[3]: https://github.com/scrapy/scrapy/raw/master/extras/scrapy_bash_completion