mirror of https://github.com/vimagick/dockerfiles.git synced 2024-12-23 01:39:27 +02:00

Commit ef1887567c by kev (2024-08-20 18:31:26 +08:00): add llama.cpp

llama.cpp

The main goal of llama.cpp is to enable LLM inference with minimal setup and state-of-the-art performance on a wide variety of hardware, locally and in the cloud.

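Download a model in GGUF format into `data/` (here, the 1.1B-parameter TinyLlama chat model with 2-bit Q2_K quantization):

```bash
$ mkdir -p data
$ wget -P data https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/tinyllama-1.1b-chat-v1.0.Q2_K.gguf
```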
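A failed or truncated download (for example, an HTML error page saved under the model's file name) usually only surfaces later, when the server cannot load the file. GGUF files start with the 4-byte ASCII magic `GGUF`, so a quick header check catches the most common failure. `is_gguf` below is a hypothetical helper, not part of this repo, and it checks only those 4 bytes, not that the rest of the file is intact.

```bash
# Hypothetical helper (not part of this repo): succeed only if the file starts
# with the 4-byte GGUF magic. It does NOT verify that the model is complete.
is_gguf() {
  local file="$1"
  # a missing or unreadable file is not a valid model
  [ -f "$file" ] && [ -r "$file" ] || return 1
  # GGUF files begin with the ASCII bytes "GGUF"
  [ "$(head -c 4 "$file")" = "GGUF" ]
}
```

For example, `is_gguf data/tinyllama-1.1b-chat-v1.0.Q2_K.gguf || echo "bad download"` prints nothing when the header looks right.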

Start the server in the background with Docker Compose:

```bash
$ docker compose up -d
```
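`docker compose up -d` returns once the container has started, which may be before the model has finished loading, so a request sent immediately afterwards can fail. The sketch below polls until the server reports ready. It assumes the container runs the llama.cpp HTTP server, whose `GET /health` endpoint answers with a success status once the model is loaded and an error status (such as 503) while it is still loading; `wait_for_server` itself is a hypothetical helper, and the default URL is taken from the port used in the `curl` example.

```bash
# Hypothetical helper: poll the server's health endpoint until it answers with a
# 2xx status or the timeout (in seconds) expires. Requires bash (uses $SECONDS).
wait_for_server() {
  local url="${1:-http://localhost:8080/health}"
  local timeout="${2:-120}"
  local start=$SECONDS
  # -f turns HTTP errors (e.g. 503 while the model is loading) into failures;
  # --max-time keeps a single probe from hanging
  until curl -sf --max-time 2 -o /dev/null "$url"; do
    if (( SECONDS - start >= timeout )); then
      echo "not ready after ${timeout}s: $url" >&2
      return 1
    fi
    sleep 1
  done
}
```

Usage: `docker compose up -d && wait_for_server && echo ready`.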

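Send a prompt to the server's `/completion` endpoint (`n_predict` limits the number of tokens to generate):

```bash
$ curl --request POST \
    --url http://localhost:8080/completion \
    --header "Content-Type: application/json" \
    --data '{"prompt": "Building a website can be done in 10 simple steps:","n_predict": 128}'
```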
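The request above returns a JSON object; the generated text is in its `content` field. The sketch below wraps the same request in a small shell function that prints only that text. It is a hypothetical helper that assumes `curl` and `jq` are installed; `LLAMA_URL` is an assumed variable name that defaults to the address used above.

```bash
# Hypothetical helper: send one prompt to the llama.cpp /completion endpoint and
# print only the generated text. Requires curl and jq.
# LLAMA_URL is an assumed variable; it defaults to the address used above.
llama_complete() {
  local prompt="$1"
  local n_predict="${2:-128}"
  local url="${LLAMA_URL:-http://localhost:8080}/completion"
  local body response
  # let jq build the JSON so quotes and newlines in the prompt are escaped
  body=$(jq -n --arg prompt "$prompt" --argjson n_predict "$n_predict" \
    '{prompt: $prompt, n_predict: $n_predict}') || return 1
  response=$(curl -sf --request POST --url "$url" \
    --header "Content-Type: application/json" \
    --data-binary "$body") || return 1
  # the non-streaming response is a JSON object whose "content" field holds the text
  jq -r '.content' <<<"$response"
}
```

For example: `llama_complete "Building a website can be done in 10 simple steps:" 128`. Building the body with `jq --arg` instead of pasting the prompt into a quoted JSON string keeps the request valid when the prompt itself contains quotes or newlines.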