crawlee
=======

[Crawlee][1] is a web scraping and browser automation library.

```bash
$ docker run --rm -e PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD=1 -e PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=1 -v $PWD:/tmp -w /tmp apify/actor-node:16 npx crawlee create -t cheerio-js my-crawler
$ docker-compose build my-crawler
$ docker-compose run --rm my-crawler
$ tree my-crawler/storage/
├── datasets
│   └── default
│       └── 000000001.json
├── key_value_stores
└── request_queues
```
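
After a run, the scraped records sit as JSON files under `datasets/default` in the storage tree above. A minimal sketch for printing them from the host, assuming that layout; `show_dataset` is a hypothetical helper name for illustration, not part of Crawlee or this repository. Call it with no argument to read `my-crawler/storage/datasets/default`, or pass another dataset directory.

```bash
# Hypothetical helper: print every JSON record in a dataset directory.
# The default path assumes the storage layout shown in the tree above.
show_dataset() {
  local dir="${1:-my-crawler/storage/datasets/default}"
  local f
  for f in "$dir"/*.json; do
    # If the glob matched nothing, "$f" is the literal pattern and does not exist.
    [ -e "$f" ] || { echo "no records in $dir" >&2; return 1; }
    echo "== $f"
    cat "$f"
    echo
  done
}
```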
[1]: https://crawlee.dev/