mirror of
https://github.com/MontFerret/ferret.git
synced 2025-07-15 01:25:00 +02:00
Grammar Check
16
README.md
@@ -3,15 +3,15 @@
 
 
 ## What is it?
-```ferret``` is a web scraping system aiming to simplify data extraction from the web for such things like ui testing, machine learning and analytics.
+```ferret``` is a web scraping system aiming to simplify data extraction from the web for such things like UI testing, machine learning and analytics.
-Having it's own declarative language, ```ferret``` abstracts away technical details and complexity of the underlying technologies, helping to focus on the data itself.
+Having its own declarative language, ```ferret``` abstracts away technical details and complexity of the underlying technologies, helping to focus on the data itself.
 It's extremely portable, extensible and fast.
 
 ## Show me some code
 The following example demonstrates the use of dynamic pages.
 First of all, we load the main Google Search page, type search criteria into an input box and then click a search button.
 The click action triggers a redirect, so we wait till its end.
-Once the page gets loaded, we iterate over all elements in search results and assign output to a variable.
+Once the page gets loaded, we iterate over all elements in search results and assign the output to a variable.
 The final for loop filters out empty elements that might be because of inaccurate use of selectors.
 
 ```aql
@@ -49,14 +49,14 @@ RETURN (
 Nowadays data is everything and who owns data - owns the world.
 I have worked on multiple data-driven projects where data was an essential part of a system and I realized how cumbersome writing tons of scrapers is.
 After some time looking for a tool that would let me to not write a code, but just express what data I need, decided to come up with my own solution.
-```ferret``` project is an ambitious initiative trying to bring universal platform for writing scrapers without any hassle.
+```ferret``` project is an ambitious initiative trying to bring the universal platform for writing scrapers without any hassle.
 
 ## Inspiration
 FQL (Ferret Query Language) is heavily inspired by [AQL](https://www.arangodb.com/) (ArangoDB Query Language).
 But due to the domain specifics, there are some differences in how things work.
 
 ## WIP
-Be aware, the the project is under heavy development. There is no documentation and some things may change in the final release.
+Be aware, that the project is under heavy development. There is no documentation and some things may change in the final release.
 For query syntax, you may go to [ArangoDB web site](https://docs.arangodb.com/3.3/AQL/index.html) and use AQL docs as docs for FQL - since they are identical.
 
 
@@ -107,7 +107,7 @@ Please use `Ctrl-D` to exit this program.
 
 ```
 
-**Note:** symbol ```%``` is used to start and end multi line queries. You also can use heredoc format.
+**Note:** symbol ```%``` is used to start and end multi-line queries. You also can use the heredoc format.
 
 If you want to execute a query stored in a file, just pass a file name:
 
@@ -126,7 +126,7 @@ ferret < ./docs/examples/static-page.fql
 
 ### Browser mode
 
-By default, ``ferret`` loads HTML pages via http protocol, because it's faster.
+By default, ``ferret`` loads HTML pages via HTTP protocol, because it's faster.
 But nowadays, there are more and more websites rendered with JavaScript, and therefore, this 'old school' approach does not really work.
 For such cases, you may fetch documents using Chrome or Chromium via Chrome DevTools protocol (aka CDP).
 First, you need to make sure that you launched Chrome with ```remote-debugging-port=9222``` flag.
@@ -345,7 +345,7 @@ func getStrings() ([]string, error) {
 }
 ```
 
-On top of that, you can completely turn off standard library, by passing the following option:
+On top of that, you can completely turn off the standard library, bypassing the following option:
 
 ```go
 comp := compiler.New(compiler.WithoutStdlib())