crawlee
=======

[Crawlee][1] is a web scraping and browser automation library.

```bash
$ docker run --rm -it -v $PWD:/tmp apify/actor-node:20 sh
>>> export PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD=1
>>> export PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=1
>>> npx crawlee create -t cheerio-js my-crawler
>>> mv my-crawler /tmp
>>> exit

$ docker-compose build my-crawler

$ docker-compose run --rm my-crawler

$ tree my-crawler/storage/
├── datasets
│   └── default
│       └── 000000001.json
├── key_value_stores
└── request_queues
```
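
Once a run has finished, the scraped items can be inspected straight from the storage directory. The helper below is a minimal sketch, not part of Crawlee: the function name `show_dataset` and its default path are assumptions based on the `tree` output above, and it relies only on standard shell tools.

```bash
# Hypothetical helper (not part of Crawlee): print every JSON item found in
# a dataset directory, followed by the number of items.
show_dataset() {
  # Assumed default, taken from the tree output above.
  local dir="${1:-my-crawler/storage/datasets/default}"
  local count=0
  local file
  for file in "$dir"/*.json; do
    # Skip the unexpanded glob pattern when no JSON file exists.
    [ -e "$file" ] || continue
    echo "== ${file##*/}"
    cat "$file"
    echo
    count=$((count + 1))
  done
  echo "items: $count"
}
```

Calling `show_dataset` with no argument reads the default dataset; pass another directory as the first argument to inspect a different one.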
[1]: https://crawlee.dev/