mirror of https://github.com/vimagick/dockerfiles.git synced 2024-11-24 08:52:31 +02:00

crawlee

Crawlee is a web scraping and browser automation library.

$ docker run --rm -it -v $PWD:/tmp apify/actor-node:20 sh
>>> export PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD=1
>>> export PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=1
>>> npx crawlee create -t cheerio-js my-crawler
>>> mv my-crawler /tmp
>>> exit

$ docker-compose build my-crawler

$ docker-compose run --rm my-crawler

$ tree my-crawler/storage/
├── datasets
│   └── default
│       └── 000000001.json
├── key_value_stores
└── request_queues
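
To check what the crawler scraped, the dataset files shown above can be printed from the host. The helper below is only a sketch: the function name `show_dataset` is hypothetical, and the default path assumes the project was moved to `./my-crawler` as in the earlier steps and that its storage directory is visible on the host.

```shell
# Print every JSON record in a Crawlee dataset directory.
# show_dataset is a hypothetical helper name, not part of Crawlee.
# The default path matches the tree listing above; pass another directory as $1.
show_dataset() {
  dir="${1:-my-crawler/storage/datasets/default}"
  for f in "$dir"/*.json; do
    [ -e "$f" ] || continue   # skip when the glob matches no files
    printf '== %s ==\n' "$f"  # header naming the file being printed
    cat "$f"
    printf '\n'
  done
}
```

Calling `show_dataset` with no argument prints the default dataset; `show_dataset path/to/other/dataset` reads a different directory.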