mirror of
https://github.com/laurent22/joplin.git
synced 2024-11-27 08:21:03 +02:00
Docs: Improve explanation for using custom OCR language data (#10910)
parent
9dc29d034b
commit
e05d7197bd
@ -32,8 +32,19 @@ This pluggable interface is present in the software but not currently exposed. W

## Custom OCR language data URL

After enabling OCR, Joplin downloads language files from https://cdn.jsdelivr.net/npm/@tesseract.js-data/. This URL can be customized in settings > general > advanced > "OCR: Language data URL or path". The URL or path should point to a directory that contains a `.traineddata.gz` file for each language to be used for OCR. After the first download, language data files are cached.

For reference, an example `.traineddata.gz` file can be found [here](https://cdn.jsdelivr.net/npm/@tesseract.js-data/eng/4.0.0_best_int/eng.traineddata.gz).
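
As a rough illustration of the directory requirement above, the following Python sketch reports which languages have no language data file in a given directory. It assumes the files sit directly in that directory and are named `[language].traineddata.gz`, as in the example URL; the function name `missing_language_files` is hypothetical and not part of Joplin.

```python
from pathlib import Path


def missing_language_files(data_dir, languages):
    """Return the language codes with no `<code>.traineddata.gz` file in data_dir.

    Assumption: files are stored directly in data_dir and named after the
    language code (e.g. `eng.traineddata.gz`).
    """
    directory = Path(data_dir)
    return [
        code
        for code in languages
        if not (directory / f"{code}.traineddata.gz").is_file()
    ]
```

An empty list means every requested language has a matching file. The check looks only at file names and does not verify that the files are valid language data.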

For example, to use OCR on a computer without internet access:

1. On a computer with internet access, download the `.traineddata.gz` files for the languages that will be OCRed.
    - English: https://cdn.jsdelivr.net/npm/@tesseract.js-data/eng/4.0.0_best_int/eng.traineddata.gz
    - French: https://cdn.jsdelivr.net/npm/@tesseract.js-data/fra/4.0.0_best_int/fra.traineddata.gz
    - In general, trained data can be obtained from `https://cdn.jsdelivr.net/npm/@tesseract.js-data/[language]/4.0.0_best_int/[language].traineddata.gz` where `[language]` should be replaced with `eng`, `fra`, `chi_sim`, `deu`, `spa`, or one of the other supported language codes.
2. Transfer the `.traineddata.gz` files to the offline computer.
3. Move all of the files to the same directory (e.g. `C:\Users\User\Documents\joplin-ocr-data\`).
4. In Joplin, open settings > general > advanced.
5. Set "OCR: Language data URL or path" to the path of the directory that contains the language data files.
    - This should be the path to the directory selected in step 3.
6. Click "Apply".
7. Enable OCR.
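
The download in step 1 could be scripted on the computer that has internet access. The Python sketch below builds each URL from the pattern given in step 1 and saves the files into one directory. The helper names `traineddata_url` and `download_language_data` are hypothetical, and the sketch assumes the `4.0.0_best_int` path segment from the URLs above applies to every requested language.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Base URL taken from the documentation above.
BASE_URL = "https://cdn.jsdelivr.net/npm/@tesseract.js-data"


def traineddata_url(language):
    """Build the download URL for one language code (e.g. "eng")."""
    return f"{BASE_URL}/{language}/4.0.0_best_int/{language}.traineddata.gz"


def download_language_data(languages, dest_dir):
    """Download one `.traineddata.gz` file per language into dest_dir.

    Returns the list of saved file paths. Requires internet access.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    for language in languages:
        target = dest / f"{language}.traineddata.gz"
        urlretrieve(traineddata_url(language), str(target))
        saved.append(target)
    return saved
```

Calling `download_language_data(["eng", "fra"], "joplin-ocr-data")` would fetch the English and French files into a `joplin-ocr-data` directory, which can then be copied to the offline computer as described in steps 2 and 3.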

To replace existing cached language data (for example, with custom data), click "Clear cache and re-download language data files".