Commit graph

98 commits

Author SHA1 Message Date
Andrei Betlen cdeaded251 Bugfix: Ensure logs are printed when streaming 2023-05-10 16:12:17 -04:00
Andrei Betlen d957422bf4 Implement sampling as in llama.cpp main example 2023-05-08 21:21:25 -04:00
Andrei Betlen 93a9019bb1 Merge branch 'main' of github.com:abetlen/llama_cpp_python into Maximilian-Winter/main 2023-05-08 19:57:09 -04:00
Andrei Betlen 65d9cc050c Add OpenAI frequency and presence penalty parameters. Closes #169 2023-05-08 01:30:18 -04:00
Andrei Betlen e72f58614b Change pointer to lower-overhead byref 2023-05-07 20:01:34 -04:00
Andrei Betlen 0e94a70de1 Add in-memory longest prefix cache. Closes #158 2023-05-07 19:31:26 -04:00
Andrei Betlen 2753b85321 Format 2023-05-07 13:19:56 -04:00
Andrei Betlen 7c3743fe5f Update llama.cpp 2023-05-07 00:12:47 -04:00
Andrei Betlen bc853e3742 Fix type for eval_logits in LlamaState object 2023-05-06 21:32:50 -04:00
Maximilian Winter 515d9bde7e Fixed some things and activated cuBLAS 2023-05-06 23:40:19 +02:00
Maximilian Winter aa203a0d65 Added mirostat sampling to the high level API. 2023-05-06 22:47:47 +02:00
Andrei Betlen 98bbd1c6a8 Fix eval logits type 2023-05-05 14:23:14 -04:00
Andrei Betlen 66e28eb548 Fix temperature bug 2023-05-05 14:00:41 -04:00
Andrei Betlen b6a9a0b6ba Add types for all low-level api functions 2023-05-05 12:22:27 -04:00
Andrei Betlen 5be0efa5f8 Cache should raise KeyError when key is missing 2023-05-05 12:21:49 -04:00
Andrei Betlen 853dc711cc Format 2023-05-04 21:58:36 -04:00
Andrei Betlen 97c6372350 Rewind model to longest prefix. 2023-05-04 21:58:27 -04:00
Andrei Betlen 329297fafb Bugfix: Missing logits_to_logprobs 2023-05-04 12:18:40 -04:00
Andrei Betlen 9e5b6d675a Improve logging messages 2023-05-03 10:28:10 -04:00
Andrei Betlen 43f2907e3a Support smaller state sizes 2023-05-03 09:33:50 -04:00
Andrei Betlen dd9ad1c759 Formatting 2023-05-01 21:51:16 -04:00
Andrei Betlen b6747f722e Fix logprob calculation. Fixes #134 2023-05-01 17:45:08 -04:00
Andrei Betlen 350a1769e1 Update sampling api 2023-05-01 14:47:55 -04:00
Mug 18a0c10032 Remove excessive errors="ignore" and add utf8 test 2023-04-29 12:19:22 +02:00
Mug b7d14efc8b Python weirdness 2023-04-28 13:20:31 +02:00
Mug eed61289b6 Don't detect off tokens, detect off detokenized utf8 2023-04-28 13:16:18 +02:00
Mug 3a98747026 One day, I'll fix off-by-1 errors permanently too 2023-04-28 12:54:28 +02:00
Mug c39547a986 Detect multi-byte responses and wait 2023-04-28 12:50:30 +02:00
Mug 5f81400fcb Also ignore errors on input prompts 2023-04-26 14:45:51 +02:00
Mug be2c961bc9 Merge branch 'main' of https://github.com/abetlen/llama-cpp-python 2023-04-26 14:38:09 +02:00
Mug c4a8491d42 Fix decode errors permanently 2023-04-26 14:37:06 +02:00
Andrei Betlen cc706fb944 Add ctx check and re-order __init__. Closes #112 2023-04-25 09:00:53 -04:00
Andrei Betlen d484c5634e Bugfix: Check cache keys as prefix to prompt tokens 2023-04-24 22:18:54 -04:00
Andrei Betlen cbe95bbb75 Add cache implementation using llama state 2023-04-24 19:54:41 -04:00
Andrei Betlen 2c359a28ff Merge branch 'main' of github.com:abetlen/llama_cpp_python into main 2023-04-24 17:51:27 -04:00
Andrei Betlen 197cf80601 Add save/load state api for Llama class 2023-04-24 17:51:25 -04:00
Andrei Betlen 86f8e5ad91 Refactor internal state for Llama class 2023-04-24 15:47:54 -04:00
Andrei f37456133a Merge pull request #108 from eiery/main: Update n_batch default to 512 to match upstream llama.cpp 2023-04-24 13:48:09 -04:00
eiery aa12d8a81f Update llama.py: update n_batch default to 512 to match upstream llama.cpp 2023-04-23 20:56:40 -04:00
Andrei Betlen 7230599593 Disable mmap when applying lora weights. Closes #107 2023-04-23 14:53:17 -04:00
Andrei Betlen 0df4d69c20 If lora base is not set avoid re-loading the model by passing NULL 2023-04-18 23:45:25 -04:00
Andrei Betlen 453e517fd5 Add separate lora_base path for applying LoRA to quantized models using original unquantized model weights. 2023-04-18 10:20:46 -04:00
Andrei Betlen eb7f278cc6 Add lora_path parameter to Llama model 2023-04-18 01:43:44 -04:00
Andrei Betlen 89856ef00d Bugfix: only eval new tokens 2023-04-15 17:32:53 -04:00
Andrei Betlen 92c077136d Add experimental cache 2023-04-15 12:03:09 -04:00
Andrei Betlen a6372a7ae5 Update stop sequences for chat 2023-04-15 12:02:48 -04:00
Andrei Betlen 83b2be6dc4 Update chat parameters 2023-04-15 11:58:43 -04:00
Andrei Betlen 62087514c6 Update chat prompt 2023-04-15 11:58:19 -04:00
Andrei Betlen 02f9fb82fb Bugfix 2023-04-15 11:39:52 -04:00
Andrei Betlen 3cd67c7bd7 Add type annotations 2023-04-15 11:39:21 -04:00
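
Commit 65d9cc050c above adds OpenAI-style frequency and presence penalty parameters. As a rough illustration of how such penalties are commonly applied to next-token logits, here is a minimal sketch; the function and parameter names are assumptions for illustration, not the library's actual API:

```python
from collections import Counter
from typing import List


def apply_penalties(
    logits: List[float],
    generated: List[int],
    frequency_penalty: float,
    presence_penalty: float,
) -> List[float]:
    """Penalize tokens that already appeared in the generated sequence.

    Follows the published OpenAI formula: subtract frequency_penalty times
    the token's count so far, plus presence_penalty once if the token has
    appeared at all.
    """
    counts = Counter(generated)
    out = list(logits)
    for token, count in counts.items():
        out[token] -= frequency_penalty * count + presence_penalty
    return out
```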
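Commits 0e94a70de1, 5be0efa5f8, d484c5634e, and cbe95bbb75 together describe a cache that stores llama state keyed by prompt tokens, matches lookups against the longest cached key that is a prefix of the prompt, and raises KeyError on a miss. The sketch below illustrates that idea only; the class name is hypothetical and the saved state is left opaque:

```python
from typing import Dict, List, Optional, Tuple


class LlamaPrefixCache:
    """Toy in-memory cache: keys are token sequences, values are saved state."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[int, ...], object] = {}

    def __setitem__(self, tokens: List[int], state: object) -> None:
        self._store[tuple(tokens)] = state

    def __getitem__(self, tokens: List[int]) -> object:
        """Return state for the longest cached key that prefixes `tokens`.

        Raises KeyError when no cached key is a prefix, per commit 5be0efa5f8.
        """
        key = tuple(tokens)
        best: Optional[Tuple[int, ...]] = None
        for cached in self._store:
            if len(cached) <= len(key) and key[: len(cached)] == cached:
                if best is None or len(cached) > len(best):
                    best = cached
        if best is None:
            raise KeyError("no cached state is a prefix of the given tokens")
        return self._store[best]
```

On a hit, the model can then rewind to the matched prefix (commit 97c6372350) and evaluate only the remaining new tokens (commit 89856ef00d).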