diff --git a/.gitignore b/.gitignore
index 8db9bcb..79093b4 100644
--- a/.gitignore
+++ b/.gitignore
@@ -165,5 +165,5 @@ cython_debug/
 # option (not recommended) you can uncomment the following to ignore the entire idea folder.
 .idea/
 
-# model .bin files
-docker/auto_docker/*.bin
+# downloaded model .bin files
+docker/open_llama/*.bin
diff --git a/docker/README.md b/docker/README.md
index f4954d1..c7e92d0 100644
--- a/docker/README.md
+++ b/docker/README.md
@@ -7,16 +7,21 @@
 **Note #2:** NVidia GPU CuBLAS support requires a NVidia GPU with sufficient VRAM (approximately as much as the size above) and Docker NVidia support (see [container-toolkit/install-guide](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html))
 
 # Simple Dockerfiles for building the llama-cpp-python server with external model bin files
-- `./openblas_simple/Dockerfile` - a simple Dockerfile for non-GPU OpenBLAS, where the model is located outside the Docker image
-  - `cd ./openblas_simple`
-  - `docker build -t openblas_simple .`
-  - `docker run -e USE_MLOCK=0 -e MODEL=/var/model/<model-path> -v <model-root-path>:/var/model -t openblas_simple`
-    where `<model-root-path>/<model-path>` is the full path to the model file on the Docker host system.
-- `./cuda_simple/Dockerfile` - a simple Dockerfile for CUDA accelerated CuBLAS, where the model is located outside the Docker image
-  - `cd ./cuda_simple`
-  - `docker build -t cuda_simple .`
-  - `docker run -e USE_MLOCK=0 -e MODEL=/var/model/<model-path> -v <model-root-path>:/var/model -t cuda_simple`
-    where `<model-root-path>/<model-path>` is the full path to the model file on the Docker host system.
+## openblas_simple - a simple Dockerfile for non-GPU OpenBLAS, where the model is located outside the Docker image
+```
+cd ./openblas_simple
+docker build -t openblas_simple .
+docker run -e USE_MLOCK=0 -e MODEL=/var/model/<model-path> -v <model-root-path>:/var/model -t openblas_simple
+```
+where `<model-root-path>/<model-path>` is the full path to the model file on the Docker host system.
+
+## cuda_simple - a simple Dockerfile for CUDA accelerated CuBLAS, where the model is located outside the Docker image
+```
+cd ./cuda_simple
+docker build -t cuda_simple .
+docker run -e USE_MLOCK=0 -e MODEL=/var/model/<model-path> -v <model-root-path>:/var/model -t cuda_simple
+```
+where `<model-root-path>/<model-path>` is the full path to the model file on the Docker host system.
 
 # "Open-Llama-in-a-box" - Download a MIT licensed Open Llama model and install into a Docker image that runs an OpenBLAS-enabled llama-cpp-python server
 ```
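For concreteness, here is the `openblas_simple` run command from the new README text with the placeholders filled in. This is an illustrative sketch only: the host directory `/home/user/models` and the file `open-llama-7b-q4_0.bin` are assumed values, not part of the diff.

```
# Illustrative values only (substitute your own):
#   <model-root-path> = /home/user/models
#   <model-path>      = open-llama-7b-q4_0.bin
docker run -e USE_MLOCK=0 \
  -e MODEL=/var/model/open-llama-7b-q4_0.bin \
  -v /home/user/models:/var/model \
  -t openblas_simple
```

`USE_MLOCK=0` turns off llama.cpp's memory locking, which can fail inside containers running with the default locked-memory ulimit.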
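The `cuda_simple` image is run the same way, but the container also needs access to the GPU. With the NVIDIA Container Toolkit from Note #2 installed, that is typically done with Docker's `--gpus` flag; the flag and the paths below are assumptions for illustration, not part of the diff.

```
# Assumes Docker 19.03+ with the NVIDIA Container Toolkit on the host;
# model paths reuse the illustrative values from the previous example.
docker run --gpus=all -e USE_MLOCK=0 \
  -e MODEL=/var/model/open-llama-7b-q4_0.bin \
  -v /home/user/models:/var/model \
  -t cuda_simple
```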