Commit graph

70 commits

Author SHA1 Message Date
Mug 5f81400fcb Also ignore errors on input prompts 2023-04-26 14:45:51 +02:00
Mug be2c961bc9 Merge branch 'main' of https://github.com/abetlen/llama-cpp-python 2023-04-26 14:38:09 +02:00
Mug c4a8491d42 Fix decode errors permanently 2023-04-26 14:37:06 +02:00
Andrei Betlen cc706fb944 Add ctx check and re-order __init__. Closes #112 2023-04-25 09:00:53 -04:00
Andrei Betlen d484c5634e Bugfix: Check cache keys as prefix to prompt tokens 2023-04-24 22:18:54 -04:00
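The prefix-matching idea behind d484c5634e can be sketched in isolation. The `PrefixTokenCache` class and its method names below are hypothetical stand-ins for illustration, not the library's actual cache implementation:

```python
from typing import Dict, Optional, Sequence, Tuple

# Illustrative token cache keyed by tuples of prompt tokens.
# Instead of requiring an exact key match, a cached key counts as a
# hit when it is a prefix of the new prompt's tokens, so generation
# can resume after the shared prefix rather than starting over.

class PrefixTokenCache:
    def __init__(self) -> None:
        self.store: Dict[Tuple[int, ...], str] = {}

    def put(self, tokens: Sequence[int], state: str) -> None:
        self.store[tuple(tokens)] = state

    def longest_prefix_key(self, tokens: Sequence[int]) -> Optional[Tuple[int, ...]]:
        """Return the longest cached key that is a prefix of `tokens`."""
        best: Optional[Tuple[int, ...]] = None
        for key in self.store:
            if len(key) <= len(tokens) and tuple(tokens[: len(key)]) == key:
                if best is None or len(key) > len(best):
                    best = key
        return best

cache = PrefixTokenCache()
cache.put([1, 2, 3], "state-a")
cache.put([1, 2], "state-b")
print(cache.longest_prefix_key([1, 2, 3, 4]))  # (1, 2, 3): longest cached prefix
```

Checking for a prefix rather than an exact key is what lets a cached generation state be reused when the new prompt merely extends an earlier one.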
Andrei Betlen cbe95bbb75 Add cache implementation using llama state 2023-04-24 19:54:41 -04:00
Andrei Betlen 2c359a28ff Merge branch 'main' of github.com:abetlen/llama_cpp_python into main 2023-04-24 17:51:27 -04:00
Andrei Betlen 197cf80601 Add save/load state api for Llama class 2023-04-24 17:51:25 -04:00
Andrei Betlen 86f8e5ad91 Refactor internal state for Llama class 2023-04-24 15:47:54 -04:00
Andrei f37456133a Merge pull request #108 from eiery/main (Update n_batch default to 512 to match upstream llama.cpp) 2023-04-24 13:48:09 -04:00
eiery aa12d8a81f Update llama.py (update n_batch default to 512 to match upstream llama.cpp) 2023-04-23 20:56:40 -04:00
Andrei Betlen 7230599593 Disable mmap when applying lora weights. Closes #107 2023-04-23 14:53:17 -04:00
Andrei Betlen 0df4d69c20 If lora base is not set avoid re-loading the model by passing NULL 2023-04-18 23:45:25 -04:00
Andrei Betlen 453e517fd5 Add seperate lora_base path for applying LoRA to quantized models using original unquantized model weights. 2023-04-18 10:20:46 -04:00
Andrei Betlen eb7f278cc6 Add lora_path parameter to Llama model 2023-04-18 01:43:44 -04:00
Andrei Betlen 89856ef00d Bugfix: only eval new tokens 2023-04-15 17:32:53 -04:00
Andrei Betlen 92c077136d Add experimental cache 2023-04-15 12:03:09 -04:00
Andrei Betlen a6372a7ae5 Update stop sequences for chat 2023-04-15 12:02:48 -04:00
Andrei Betlen 83b2be6dc4 Update chat parameters 2023-04-15 11:58:43 -04:00
Andrei Betlen 62087514c6 Update chat prompt 2023-04-15 11:58:19 -04:00
Andrei Betlen 02f9fb82fb Bugfix 2023-04-15 11:39:52 -04:00
Andrei Betlen 3cd67c7bd7 Add type annotations 2023-04-15 11:39:21 -04:00
Andrei Betlen d7de0e8014 Bugfix 2023-04-15 00:08:04 -04:00
Andrei Betlen e90e122f2a Use clear 2023-04-14 23:33:18 -04:00
Andrei Betlen ac7068a469 Track generated tokens internally 2023-04-14 23:33:00 -04:00
Andrei Betlen 6e298d8fca Set kv cache size to f16 by default 2023-04-14 22:21:19 -04:00
Andrei Betlen 6153baab2d Clean up logprobs implementation 2023-04-14 09:59:33 -04:00
Andrei Betlen 26cc4ee029 Fix signature for stop parameter 2023-04-14 09:59:08 -04:00
Andrei Betlen 6595ad84bf Add field to disable reseting between generations 2023-04-13 00:28:00 -04:00
Andrei Betlen 22fa5a621f Revert "Deprecate generate method" (reverts commit 6cf5876538) 2023-04-13 00:19:55 -04:00
Andrei Betlen c854c2564b Don't serialize stateful parameters 2023-04-12 14:07:14 -04:00
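Commits c854c2564b and e96a5c5722 point at a common Python pattern: make an object pickleable by serializing only its constructor parameters and rebuilding runtime state on load. The `Model` class below is a generic illustration of that pattern, not the project's actual `Llama` code:

```python
import pickle

class Model:
    """Toy model wrapper holding an unpicklable runtime handle."""

    def __init__(self, model_path: str, n_ctx: int = 512):
        self.model_path = model_path
        self.n_ctx = n_ctx
        # Stand-in for a native context handle that cannot be pickled.
        self.ctx = object()

    def __getstate__(self):
        # Serialize only the stateless configuration parameters.
        return {"model_path": self.model_path, "n_ctx": self.n_ctx}

    def __setstate__(self, state):
        # Rebuild the runtime state from the saved parameters.
        self.__init__(state["model_path"], state["n_ctx"])

m = pickle.loads(pickle.dumps(Model("weights.bin", n_ctx=1024)))
print(m.n_ctx)  # 1024, with a freshly rebuilt runtime context
```

Dropping stateful fields in `__getstate__` is what keeps the pickle payload small and avoids trying to serialize native handles.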
Andrei Betlen 2f9b649005 Style fix 2023-04-12 14:06:22 -04:00
Andrei Betlen 6cf5876538 Deprecate generate method 2023-04-12 14:06:04 -04:00
Andrei Betlen b3805bb9cc Implement logprobs parameter for text completion. Closes #2 2023-04-12 14:05:11 -04:00
Andrei Betlen 1f67ad2a0b Add use_mmap option 2023-04-10 02:11:35 -04:00
Andrei Betlen 314ce7d1cc Fix cpu count default 2023-04-08 19:54:04 -04:00
Andrei Betlen 3fbc06361f Formatting 2023-04-08 16:01:45 -04:00
Andrei Betlen e96a5c5722 Make Llama instance pickleable. Closes #27 2023-04-05 06:52:17 -04:00
Andrei Betlen cefc69ea43 Add runtime check to ensure embedding is enabled if trying to generate embeddings 2023-04-05 03:25:37 -04:00
Andrei Betlen 5c50af7462 Remove workaround 2023-04-05 03:25:09 -04:00
Andrei Betlen c137789143 Add verbose flag. Closes #19 2023-04-04 13:09:24 -04:00
Andrei Betlen 5075c16fcc Bugfix: n_batch should always be <= n_ctx 2023-04-04 13:08:21 -04:00
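The invariant fixed in 5075c16fcc (`n_batch` must not exceed `n_ctx`) amounts to a small guard at construction time. A minimal sketch, with an illustrative helper name rather than the library's actual code:

```python
def resolve_n_batch(n_batch: int, n_ctx: int) -> int:
    """Validate and clamp the batch size so it never exceeds the
    context size, mirroring the n_batch <= n_ctx invariant."""
    if n_batch < 1:
        raise ValueError("n_batch must be positive")
    return min(n_batch, n_ctx)

print(resolve_n_batch(512, 256))  # clamped to 256
print(resolve_n_batch(128, 2048))  # already valid: 128
```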
Andrei Betlen caf3c0362b Add return type for default __call__ method 2023-04-03 20:26:08 -04:00
Andrei Betlen 4aa349d777 Add docstring for create_chat_completion 2023-04-03 20:24:20 -04:00
Andrei Betlen 7fedf16531 Add support for chat completion 2023-04-03 20:12:44 -04:00
Andrei Betlen 3dec778c90 Update to more sensible return signature 2023-04-03 20:12:14 -04:00
Andrei Betlen ae004eb69e Fix #16 2023-04-03 18:46:19 -04:00
Andrei Betlen 4f509b963e Bugfix: Stop sequences and missing max_tokens check 2023-04-02 03:59:19 -04:00
Andrei Betlen 353e18a781 Move workaround to new sample method 2023-04-02 00:06:34 -04:00
Andrei Betlen a4a1bbeaa9 Update api to allow for easier interactive mode 2023-04-02 00:02:47 -04:00