llama.cpp
=========
The main goal of [llama.cpp][1] is to enable LLM inference with minimal setup
and state-of-the-art performance on a wide variety of hardware - locally and
in the cloud.
```bash
$ mkdir -p data
$ wget -P data https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/tinyllama-1.1b-chat-v1.0.Q2_K.gguf
$ docker compose up -d
$ curl --request POST \
       --url http://localhost:8080/completion \
       --header "Content-Type: application/json" \
       --data '{"prompt": "Building a website can be done in 10 simple steps:","n_predict": 128}'
```
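The steps above assume a `docker-compose.yml` next to this README that serves
the downloaded GGUF model on port 8080. A minimal sketch of such a file, using
the upstream server image (the image tag, flags, and mount paths here are
assumptions, not taken from this README - adjust them to your setup):

```yaml
# minimal sketch: llama.cpp server backed by the model in ./data
services:
  llama-cpp:
    image: ghcr.io/ggerganov/llama.cpp:server
    command: >
      -m /models/tinyllama-1.1b-chat-v1.0.Q2_K.gguf
      --host 0.0.0.0
      --port 8080
    ports:
      - "8080:8080"
    volumes:
      - ./data:/models
    restart: unless-stopped
```

The `/completion` endpoint replies with a JSON object whose `content` field
holds the generated text; `n_predict` caps the number of tokens generated.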
[1]: https://github.com/ggerganov/llama.cpp