Commit graph

21 commits

Author SHA1 Message Date
Andrei Betlen 9eafc4c49a Refactor server to use factory 2023-05-01 22:38:46 -04:00
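The factory refactor likely centers on building the FastAPI app from explicit settings rather than at import time. A minimal sketch of that pattern, assuming FastAPI and pydantic v1; `create_app` and the `Settings` fields here are illustrative, not necessarily the exact identifiers in this commit:

```python
from typing import Optional

from fastapi import FastAPI
from pydantic import BaseSettings  # pydantic v1 settings class


class Settings(BaseSettings):
    model: str = "models/ggml-model.bin"  # hypothetical default path
    n_ctx: int = 512


def create_app(settings: Optional[Settings] = None) -> FastAPI:
    """Build the app from explicit settings instead of module-level globals."""
    if settings is None:
        settings = Settings()
    app = FastAPI(title="llama_cpp server")
    # ...register completion/embedding routes and load the model here...
    return app
```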
Andrei Betlen 9ff9cdd7fc Fix import error 2023-05-01 15:11:15 -04:00
Lucas Doyle efe8e6f879 llama_cpp server: slight refactor to init_llama function
Define an init_llama function that starts llama with the supplied settings, instead of doing it in the global context of app.py.

This makes the test less brittle, since it no longer needs to modify os.environ before importing the app.
2023-04-29 11:42:23 -07:00
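A sketch of the idea behind init_llama: construct the model from supplied settings inside a function rather than at import time. The `Settings` attributes are assumptions, though `llama_cpp.Llama` does accept `model_path` and `n_ctx`:

```python
import llama_cpp

llama = None  # module-level handle, set by init_llama


def init_llama(settings):
    """Create the Llama instance from supplied settings, so tests can
    pass settings directly instead of patching os.environ first."""
    global llama
    llama = llama_cpp.Llama(
        model_path=settings.model,  # assumed settings field
        n_ctx=settings.n_ctx,       # assumed settings field
    )
```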
Lucas Doyle 6d8db9d017 tests: simple test for server module 2023-04-29 11:42:20 -07:00
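A minimal sketch of what such a test might look like, using FastAPI's TestClient; the module path and the route checked are illustrative assumptions:

```python
from fastapi.testclient import TestClient

from llama_cpp.server import app as server_app  # assumed module path


def test_app_is_importable_and_serves_docs():
    client = TestClient(server_app.app)
    response = client.get("/docs")  # FastAPI's built-in docs page
    assert response.status_code == 200
```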
Lucas Doyle 468377b0e2 llama_cpp server: app is now importable, still runnable as a module 2023-04-29 11:41:25 -07:00
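The usual pattern for "importable, still runnable as a module": expose `app` at module level and guard the server startup. A sketch assuming uvicorn; host and port are placeholders:

```python
import uvicorn
from fastapi import FastAPI

app = FastAPI()  # importable by tests and other modules

if __name__ == "__main__":
    # `python -m llama_cpp.server` still starts the server directly
    uvicorn.run(app, host="localhost", port=8000)
```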
Andrei Betlen 3cab3ef4cb Update n_batch for server 2023-04-25 09:11:32 -04:00
Andrei Betlen e4647c75ec Add use_mmap flag to server 2023-04-19 15:57:46 -04:00
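A sketch of how such a flag might be threaded through the server settings; the pydantic v1 style and the pass-through are assumptions, though `llama_cpp.Llama` does take a `use_mmap` argument:

```python
from pydantic import BaseSettings


class Settings(BaseSettings):
    use_mmap: bool = True  # forwarded to llama_cpp.Llama(use_mmap=...)
```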
Andrei Betlen 92c077136d Add experimental cache 2023-04-15 12:03:09 -04:00
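The experimental cache lets the model reuse saved state for repeated prompt prefixes instead of re-evaluating them. A sketch using `llama_cpp.LlamaCache`; the exact server-side wiring is an assumption:

```python
import llama_cpp

llama = llama_cpp.Llama(model_path="models/ggml-model.bin")
llama.set_cache(llama_cpp.LlamaCache())  # reuse state across similar prompts
```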
Andrei Betlen 6c7cec0c65 Fix completion request 2023-04-14 10:01:15 -04:00
Andrei Betlen 4f5f99ef2a Formatting 2023-04-12 22:40:12 -04:00
Andrei Betlen 0daf16defc Enable logprobs on completion endpoint 2023-04-12 19:08:11 -04:00
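With logprobs enabled on the completion endpoint, a client can request per-token log-probabilities in the OpenAI style. An illustrative request; host, port, and route are assumptions:

```python
import requests

response = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "prompt": "Q: Name the planets in the solar system? A:",
        "max_tokens": 16,
        "logprobs": 5,  # top-5 log-probabilities per generated token
    },
)
print(response.json()["choices"][0]["logprobs"])
```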
Andrei Betlen 19598ac4e8 Fix threading bug. Closes #62 2023-04-12 19:07:53 -04:00
Andrei Betlen b3805bb9cc Implement logprobs parameter for text completion. Closes #2 2023-04-12 14:05:11 -04:00
Andrei Betlen 213cc5c340 Remove async from function signature to avoid blocking the server 2023-04-11 11:54:31 -04:00
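The reasoning behind dropping `async`: FastAPI runs plain `def` endpoints in a worker thread pool, while `async def` endpoints run on the event loop, so a long synchronous model call inside `async def` stalls every other request. A sketch with a stand-in for the blocking inference call:

```python
import time

from fastapi import FastAPI

app = FastAPI()


def slow_inference(prompt: str) -> dict:
    time.sleep(5)  # stand-in for a blocking llama.cpp call
    return {"text": "..."}


@app.post("/v1/completions")
def create_completion(prompt: str):  # plain def: runs in a thread, event loop stays free
    return slow_inference(prompt)
```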
Andrei Betlen 0067c1a588 Formatting 2023-04-08 16:01:18 -04:00
Andrei Betlen da539cc2ee Safer calculation of default n_threads 2023-04-06 21:22:19 -04:00
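One plausible reading of "safer": guard against `os.cpu_count()` returning None and never drop below one thread. The exact formula is an assumption:

```python
import os

# Fall back if cpu_count() is unavailable, and floor the result at 1.
n_threads = max((os.cpu_count() or 2) // 2, 1)
```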
Andrei Betlen 930db37dd2 Merge branch 'main' of github.com:abetlen/llama_cpp_python into main 2023-04-06 21:07:38 -04:00
Andrei Betlen 55279b679d Handle prompt list 2023-04-06 21:07:35 -04:00
MillionthOdin16 c283edd7f2 Set n_batch to the default value and reduce the thread count:
Change the batch size to the llama.cpp default of 8. I've seen issues in llama.cpp where batch size affects the quality of generations (it shouldn't, but in case that's still an issue, I changed it to the default).

Set the auto-determined number of threads to half the system count. ggml will sometimes lock cores at 100% while doing nothing; this is being addressed, but it can make for a bad user experience if cores are pegged at 100%.
2023-04-05 18:17:29 -04:00
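Taken together, the defaults this commit describes might look like the following; the field names and pydantic v1 style are illustrative:

```python
import multiprocessing

from pydantic import BaseSettings


class Settings(BaseSettings):
    n_batch: int = 8  # llama.cpp's default batch size
    n_threads: int = max(multiprocessing.cpu_count() // 2, 1)  # half the system count
```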
MillionthOdin16 76a82babef Set n_batch to the default value of 8. I think this is a leftover from when n_ctx was missing and n_batch was 2048. 2023-04-05 17:44:53 -04:00
Andrei Betlen 44448fb3a8 Add server as a subpackage 2023-04-05 16:23:25 -04:00