* consolidated endpoints, added live configuration * added ml settings to server * added settings dashboard * updated deps, fixed typos * simplified modelconfig updated tests * Added ml setting accordion for admin page updated tests * merge `clipText` and `clipVision` * added face distance setting clarified setting * add clip mode in request, dropdown for face models * polished ml settings updated descriptions * update clip field on error * removed unused import * add description for image classification threshold * pin safetensors for arm wheel updated poetry lock * moved dto * set model type only in ml repository * revert form-data package install use fetch instead of axios * added slotted description with link updated facial recognition description clarified effect of disabling tasks * validation before model load * removed unnecessary getconfig call * added migration * updated api updated api updated api --------- Co-authored-by: Alex Tran <alex.tran1502@gmail.com>
Immich Machine Learning
- Image classification
- CLIP embeddings
- Facial recognition
Setup
This project uses Poetry, so be sure to install it first.
Running poetry install --no-root --with dev
will install everything you need in an isolated virtual environment.
To add or remove dependencies, you can use the commands poetry add $PACKAGE_NAME
and poetry remove $PACKAGE_NAME
, respectively.
Be sure to commit the poetry.lock
and pyproject.toml
files to reflect any changes in dependencies.
Load Testing
To measure inference throughput and latency, you can use Locust using the provided locustfile.py
.
Locust works by querying the model endpoints and aggregating their statistics, meaning the app must be deployed.
You can run load_test.sh
to automatically deploy the app locally and start Locust, optionally adjusting its env variables as needed.
Alternatively, for more custom testing, you may also run locust
directly: see the documentation. Note that in Locust's jargon, concurrency is measured in users
, and each user runs one task at a time. To achieve a particular per-endpoint concurrency, multiply that number by the number of endpoints to be queried. For example, if there are 3 endpoints and you want each of them to receive 8 requests at a time, you should set the number of users to 24.