Commit graph

514 commits

Author SHA1 Message Date
c0sogi 1551ba10bd Added RouteErrorHandler for server 2023-07-16 14:57:39 +09:00
Andrei Betlen 8ab098e49d Re-order Llama class params 2023-07-15 15:35:08 -04:00
Andrei Betlen e4f9db37db Fix context_params struct layout 2023-07-15 15:34:55 -04:00
Andrei Betlen f0797a6054 Merge branch main into custom_rope 2023-07-15 15:11:01 -04:00
randoentity 3f8f276f9f Add bindings for custom_rope 2023-07-10 17:37:46 +02:00
Andrei Betlen a86bfdf0a5 bugfix: truncate completion max_tokens to fit context length by default 2023-07-09 18:13:29 -04:00
Andrei Betlen 6f70cc4b7d bugfix: pydantic settings missing / changed fields 2023-07-09 18:03:31 -04:00
Andrei 5d756de314 Merge branch 'main' into add_unlimited_max_tokens 2023-07-08 02:37:38 -04:00
Andrei b8e0bed295 Merge pull request #453 from wu-qing-157/main
Fix incorrect token_logprobs (due to indexing after sorting)
2023-07-08 02:31:52 -04:00
Andrei Betlen d6e6aad927 bugfix: fix compatibility bug with openai api on last token 2023-07-08 00:06:11 -04:00
Andrei Betlen 4f2b5d0b53 Format 2023-07-08 00:05:10 -04:00
Andrei Betlen 34c505edf2 perf: convert pointer to byref 2023-07-07 22:54:07 -04:00
Andrei Betlen 52753b77f5 Upgrade fastapi to 0.100.0 and pydantic v2 2023-07-07 21:38:46 -04:00
Andrei Betlen 11eae75211 perf: avoid allocating new buffers during sampling 2023-07-07 19:28:53 -04:00
Andrei Betlen a14d8a9b3f perf: assign to candidates data structure instead 2023-07-07 18:58:43 -04:00
wu-qing-157 9e61661518 fix indexing token_logprobs after sorting 2023-07-07 10:18:49 +00:00
Andrei Betlen 57d8ec3899 Add setting to control request interruption 2023-07-07 03:37:23 -04:00
Andrei Betlen 4c7cdcca00 Add interruptible streaming requests for llama-cpp-python server. Closes #183 2023-07-07 03:04:17 -04:00
Andrei Betlen 98ae4e58a3 Update llama.cpp 2023-07-06 17:57:56 -04:00
Andrei Betlen b994296c75 Update llama.cpp 2023-07-05 01:00:14 -04:00
Andrei Betlen c67f786360 Update llama.cpp 2023-06-29 01:08:15 -04:00
Andrei Betlen e34f4414cf Hotfix: logits_all bug 2023-06-29 00:57:27 -04:00
Andrei Betlen a2ede37bd5 Load logits directly into scores buffer 2023-06-29 00:45:46 -04:00
Andrei Betlen b95b0ffbeb Use pre-allocated buffers to store input_ids and scores 2023-06-29 00:40:47 -04:00
Andrei Betlen a5e059c053 Free model when llama is unloaded. Closes #434 2023-06-28 23:58:55 -04:00
Andrei Betlen 3379dc40a1 Merge branch 'main' of github.com:abetlen/llama_cpp_python into main 2023-06-26 08:50:48 -04:00
Andrei Betlen 952228407e Update llama.cpp 2023-06-26 08:50:38 -04:00
Andrei Betlen b4a3db3e54 Update type signature 2023-06-26 08:50:30 -04:00
Andrei 5eb4ebb041 Merge branch 'main' into fix-state-pickle 2023-06-26 08:45:02 -04:00
samfundev d788fb49bf Only concatenate after all batches are done 2023-06-24 15:51:46 -04:00
Andrei 877ca6d016 Merge branch 'main' into fix-state-pickle 2023-06-23 15:13:07 -04:00
Alexey 282698b6d3 server: pass seed param from command line to llama 2023-06-23 00:19:24 +04:00
Andrei Betlen e37798777e Update llama.cpp 2023-06-20 11:25:10 -04:00
Andrei Betlen d410f12fae Update docs. Closes #386 2023-06-17 13:38:48 -04:00
Andrei Betlen 9f528f4715 Merge branch 'main' of github.com:abetlen/llama_cpp_python into main 2023-06-17 13:37:17 -04:00
Andrei Betlen d7153abcf8 Update llama.cpp 2023-06-16 23:11:14 -04:00
imaprogrammer fd9f294b3a Update llama.py: Added the number of input tokens to the ValueError exception 2023-06-16 14:11:57 +05:30
Andrei Betlen 1e20be6d0c Add low_vram to server settings 2023-06-14 22:13:42 -04:00
Andrei Betlen 44b83cada5 Add low_vram parameter 2023-06-14 22:12:33 -04:00
Andrei Betlen f7c5cfaf50 Format server options 2023-06-14 22:08:28 -04:00
Andrei Betlen 9c41a3e990 Merge branch 'main' of github.com:abetlen/llama_cpp_python into main 2023-06-14 21:50:43 -04:00
Andrei f568baeef1 Merge pull request #351 from player1537-forks/th/add-logits-bias-parameter
Add support for `logit_bias` and `logit_bias_type` parameters
2023-06-14 21:49:56 -04:00
Andrei Betlen f27393ab7e Add additional verbose logs for cache 2023-06-14 21:46:48 -04:00
Andrei Betlen 4cefb70cd0 Merge branch 'main' of github.com:abetlen/llama_cpp_python into main 2023-06-14 21:40:19 -04:00
Andrei Betlen 715f98c591 Update llama.cpp 2023-06-14 21:40:13 -04:00
Okabintaro 10b0cb727b fix: Make LLamaState picklable for disk cache
I fixed the issue by making the saved state a bytes object instead of the ctypes one, which can't be pickled.
2023-06-13 12:03:31 +02:00
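The fix above boils down to copying the raw ctypes state buffer into a plain bytes object before storing it, since bytes can be pickled for a disk cache while a ctypes array cannot. A minimal sketch of that idea, with illustrative function names (not the library's API):

```python
import ctypes
import pickle

def capture_state(state_buf: ctypes.Array) -> bytes:
    # Copy the raw ctypes buffer into an immutable, picklable bytes object.
    return bytes(state_buf)

def restore_state(data: bytes) -> ctypes.Array:
    # Rebuild a ctypes buffer of the same size from the pickled bytes.
    return (ctypes.c_uint8 * len(data)).from_buffer_copy(data)

original = (ctypes.c_uint8 * 4)(1, 2, 3, 4)
blob = pickle.dumps(capture_state(original))  # works: bytes are picklable
assert list(restore_state(pickle.loads(blob))) == [1, 2, 3, 4]
```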
Gabor 3129a0e7e5 correction to add back environment variable support <3 docker 2023-06-11 01:11:24 +01:00
Gabor 3ea31930e5 fixes abetlen/llama-cpp-python #358 2023-06-11 00:58:08 +01:00
Andrei Betlen 21acd7901f Re-enable cache 2023-06-10 12:22:31 -04:00
Andrei Betlen 6639371407 Update llama.cpp 2023-06-10 12:17:38 -04:00
Tanner Hobson eb7645b3ba Add support for logit_bias and logit_bias_type parameters 2023-06-09 13:13:08 -04:00
Andrei Betlen 0da655b3be Temporarily disable cache until save state bug is fixed. 2023-06-09 11:10:24 -04:00
Andrei Betlen 556c7edf47 Truncate max_tokens if it exceeds context length 2023-06-09 10:57:36 -04:00
Andrei Betlen 0c42168508 Fix cache implementation breaking changes 2023-06-08 13:19:23 -04:00
Andrei Betlen 607d217caa Allow both .so and .dylib extensions for macos 2023-06-08 00:27:19 -04:00
Andrei 0f0b447fa4 Merge pull request #289 from Maximilian-Winter/main
Diskcache implementation for llama state.
2023-06-06 17:03:03 -04:00
Andrei d508573fb4 Merge pull request #328 from spirilis/mirostat
Added mirostat support for completions, chat completions API
2023-06-06 16:58:23 -04:00
Andrei Betlen aad4b17f52 Update llama.cpp 2023-06-06 16:23:55 -04:00
Andrei Betlen 8b4968ea67 Fix resize issue. Closes #330 2023-06-06 11:37:57 -04:00
Eric B 9b1c9e902c Added mirostat support for completions, chat completions API 2023-06-05 22:37:11 -04:00
Andrei Betlen 7b57420ea9 Update llama.cpp 2023-06-05 18:17:29 -04:00
Maximilian-Winter 29f9c9cca3 Added both LlamaCache classes, Disk and RAM. 2023-05-31 22:33:56 +02:00
Maximilian Winter 9ea7a379d3 Merge branch 'abetlen:main' into main 2023-05-31 12:55:51 +02:00
Andrei 49fe9395a1 Merge pull request #277 from abetlen/add-numpy-support
Use numpy for internal buffers
2023-05-29 20:59:30 -04:00
Maximilian-Winter 719c3eae0a Diskcache implementation for llama state. 2023-05-28 15:56:38 +02:00
Andrei Betlen 80066f0b80 Use async routes 2023-05-27 09:12:58 -04:00
Andrei Betlen c2b59a5f59 Import unused import 2023-05-26 22:59:29 -04:00
Andrei Betlen 8f2b4456ad Format 2023-05-26 22:04:31 -04:00
Andrei Betlen 84e313bd6e Align dtype to match c structs 2023-05-26 22:02:16 -04:00
Andrei Betlen 66bcb8d70d Merge branch 'main' into add-numpy-support 2023-05-26 20:25:03 -04:00
Andrei Betlen 8f35bddd7e Fix stop sequence performance bug. 2023-05-26 20:23:49 -04:00
Andrei Betlen 7fc7bc30e7 Remove usage of eval_tokens for cache check 2023-05-26 20:12:05 -04:00
Andrei Betlen fe331ec589 Replace eval_logits and eval_tokens with numpy arrays 2023-05-26 20:03:31 -04:00
Andrei Betlen 8eb9769f78 Add support for numpy 2023-05-26 16:12:45 -04:00
Andrei Betlen 4c1b7f7a76 Bugfix for logits_processor and stopping_criteria 2023-05-26 10:25:28 -04:00
Andrei Betlen 433a2e3e8a Add extra logits_processor and stopping_criteria 2023-05-26 03:13:24 -04:00
Andrei Betlen f74b90ed67 Fix streaming hang on last token when cache is on. 2023-05-26 03:03:01 -04:00
Andrei Betlen 5be8354e11 Added tokenizer 2023-05-26 03:00:51 -04:00
Andrei Betlen 8fa2ef1959 Format 2023-05-26 03:00:35 -04:00
Andrei Betlen 6bd1075291 Merge branch 'Maximilian-Winter/main' into main 2023-05-26 02:56:11 -04:00
Andrei Betlen ca01f98e09 Add LlamaTokenizer class 2023-05-25 14:11:33 -04:00
Andrei Betlen 1d247e0f35 Add StoppingCriteria and LogitsProcessor to generate to match huggingface API 2023-05-25 14:04:54 -04:00
Maximilian-Winter c2585b6889 Fixed list elements typing 2023-05-25 10:54:08 +02:00
Maximilian-Winter da463e6c8c Added types to logit processor list and stop criteria list 2023-05-25 09:07:16 +02:00
Maximilian-Winter c05fcdf42f Fixed none value of logits processors. 2023-05-24 22:02:06 +02:00
Maximilian-Winter 5bb780d455 Implemented logit processors and stop criteria 2023-05-24 21:55:44 +02:00
Andrei Betlen fab064ded9 Remove unnecessary ffi calls 2023-05-23 17:56:21 -04:00
Andrei Betlen 0adb9ec37a Use model_name and index in response 2023-05-21 21:30:03 -04:00
Andrei Betlen 922b5b2bfd Merge branch 'main' into server-embedding 2023-05-21 21:21:38 -04:00
Andrei Betlen cd102e9da1 Cache shared library function calls for static tokens 2023-05-21 19:18:56 -04:00
Andrei Betlen b895511cca Fix penalize_nl 2023-05-21 18:38:06 -04:00
Andrei Betlen 03e2947b03 Fix unnecessary memory allocation while sampling 2023-05-21 18:36:34 -04:00
Andrei Betlen fafe47114c Update llama.cpp 2023-05-21 17:47:21 -04:00
Andrei Betlen 76b1d2cd20 Change properties to functions to match token functions 2023-05-20 08:24:06 -04:00
Andrei Betlen a7ba85834f Add n_ctx, n_vocab, and n_embd properties 2023-05-20 08:13:41 -04:00
Simon Chabot e783f1c191 feat: make embedding support list of string as input
makes the /v1/embedding route similar to OpenAI api.
2023-05-20 01:23:32 +02:00
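The change described above amounts to accepting either a single string or a list of strings as the embedding input, matching the OpenAI API shape. A rough sketch under that assumption (the helper name is hypothetical):

```python
from typing import List, Union

def normalize_embedding_input(input: Union[str, List[str]]) -> List[str]:
    # Wrap a single string so downstream code always sees a list of strings.
    return [input] if isinstance(input, str) else list(input)

assert normalize_embedding_input("hello") == ["hello"]
assert normalize_embedding_input(["a", "b"]) == ["a", "b"]
```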
Andrei Betlen 01a010be52 Fix llama_cpp and Llama type signatures. Closes #221 2023-05-19 11:59:33 -04:00
Andrei Betlen a8cd169251 Bugfix: Stop sequences can be strings 2023-05-19 03:15:08 -04:00
Andrei Betlen 17d4271b04 Fix logprobs for completions and implement for streaming logprobs. 2023-05-19 02:20:27 -04:00
Andrei Betlen a634a2453b Allow first logprob token to be null to match openai api 2023-05-19 02:04:57 -04:00
Andrei Betlen dc39cc0fa4 Use server sent events function for streaming completion 2023-05-19 02:04:30 -04:00
Andrei Betlen f0ec6e615e Stream tokens instead of text chunks 2023-05-18 11:35:59 -04:00
Andrei Betlen 21d8f5fa9f Remove unused union 2023-05-18 11:35:15 -04:00
Andrei Betlen 61d58e7b35 Check for CUDA_PATH before adding 2023-05-17 15:26:38 -04:00
Andrei Betlen 7c95895626 Merge branch 'main' of github.com:abetlen/llama_cpp_python into main 2023-05-17 15:19:32 -04:00
Aneesh Joy e9794f91f2 Fixed CUBLAS DLL load issue in Windows 2023-05-17 18:04:58 +01:00
Andrei Betlen 4f342795e5 Update token checks 2023-05-17 03:35:13 -04:00
Andrei Betlen f5c2f998ab Format 2023-05-17 02:00:39 -04:00
Andrei Betlen d28b753ed2 Implement penalize_nl 2023-05-17 01:53:26 -04:00
Andrei Betlen f11e2a781c Fix last_n_tokens_size 2023-05-17 01:42:51 -04:00
Andrei Betlen 7e55244540 Fix top_k value. Closes #220 2023-05-17 01:41:42 -04:00
Andrei Betlen a7c9e38287 Update variable name 2023-05-16 18:07:25 -04:00
Andrei Betlen a3352923c7 Add model_alias option to override model_path in completions. Closes #39 2023-05-16 17:22:00 -04:00
Andrei Betlen a65125c0bd Add sampling defaults for generate 2023-05-16 09:35:50 -04:00
Andrei Betlen cbac19bf24 Add winmode arg only on windows if python version supports it 2023-05-15 09:15:01 -04:00
Andrei Betlen c804efe3f0 Fix obscure Windows DLL issue. Closes #208 2023-05-14 22:08:11 -04:00
Andrei Betlen cdf59768f5 Update llama.cpp 2023-05-14 00:04:22 -04:00
Andrei Betlen 7a536e86c2 Allow model to tokenize strings longer than context length and set add_bos. Closes #92 2023-05-12 14:28:22 -04:00
Andrei Betlen 8740ddc58e Only support generating one prompt at a time. 2023-05-12 07:21:46 -04:00
Andrei Betlen 8895b9002a Revert "llama_cpp server: prompt is a string". Closes #187
This reverts commit b9098b0ef7.
2023-05-12 07:16:57 -04:00
Andrei Betlen 7be584fe82 Add missing tfs_z parameter 2023-05-11 21:56:19 -04:00
Andrei Betlen cdeaded251 Bugfix: Ensure logs are printed when streaming 2023-05-10 16:12:17 -04:00
Lucas Doyle 02e8a018ae llama_cpp server: document presence_penalty and frequency_penalty, mark as supported 2023-05-09 16:25:00 -07:00
Andrei Betlen d957422bf4 Implement sampling as in llama.cpp main example 2023-05-08 21:21:25 -04:00
Andrei Betlen 93a9019bb1 Merge branch 'main' of github.com:abetlen/llama_cpp_python into Maximilian-Winter/main 2023-05-08 19:57:09 -04:00
Andrei Betlen 82d138fe54 Fix: default repeat_penalty 2023-05-08 18:49:11 -04:00
Andrei Betlen 29f094bbcf Bugfix: not falling back to environment variables when a default value is set. 2023-05-08 14:46:25 -04:00
Andrei Betlen 0d6c60097a Show default value when --help is called 2023-05-08 14:21:15 -04:00
Andrei Betlen 022e9ebcb8 Use environment variable if parsed cli arg is None 2023-05-08 14:20:53 -04:00
Andrei Betlen 0d751a69a7 Set repeat_penalty to 0 by default 2023-05-08 01:50:43 -04:00
Andrei Betlen 65d9cc050c Add openai frequency and presence penalty parameters. Closes #169 2023-05-08 01:30:18 -04:00
Andrei Betlen a0b61ea2a7 Bugfix for models endpoint 2023-05-07 20:17:52 -04:00
Andrei Betlen e72f58614b Change pointer to lower overhead byref 2023-05-07 20:01:34 -04:00
Andrei Betlen 14da46f16e Added cache size to settings object. 2023-05-07 19:33:17 -04:00
Andrei Betlen 0e94a70de1 Add in-memory longest prefix cache. Closes #158 2023-05-07 19:31:26 -04:00
Andrei Betlen 8dfde63255 Fix return type 2023-05-07 19:30:14 -04:00
Andrei Betlen 2753b85321 Format 2023-05-07 13:19:56 -04:00
Andrei Betlen 627811ea83 Add verbose flag to server 2023-05-07 05:09:10 -04:00
Andrei Betlen 3fbda71790 Fix mlock_supported and mmap_supported return type 2023-05-07 03:04:22 -04:00
Andrei Betlen 5a3413eee3 Update cpu_count 2023-05-07 03:03:57 -04:00
Andrei Betlen 1a00e452ea Update settings fields and defaults 2023-05-07 02:52:20 -04:00
Andrei Betlen 86753976c4 Revert "llama_cpp server: delete some ignored / unused parameters"
This reverts commit b47b9549d5.
2023-05-07 02:02:34 -04:00
Andrei Betlen c382d8f86a Revert "llama_cpp server: mark model as required"
This reverts commit e40fcb0575.
2023-05-07 02:00:22 -04:00
Andrei Betlen d8fddcce73 Merge branch 'main' of github.com:abetlen/llama_cpp_python into better-server-params-and-fields 2023-05-07 01:54:00 -04:00
Andrei Betlen 7c3743fe5f Update llama.cpp 2023-05-07 00:12:47 -04:00
Andrei Betlen bc853e3742 Fix type for eval_logits in LlamaState object 2023-05-06 21:32:50 -04:00
Maximilian Winter 515d9bde7e Fixed somethings and activated cublas 2023-05-06 23:40:19 +02:00
Maximilian Winter aa203a0d65 Added mirostat sampling to the high level API. 2023-05-06 22:47:47 +02:00
Andrei Betlen 98bbd1c6a8 Fix eval logits type 2023-05-05 14:23:14 -04:00
Andrei Betlen b5f3e74627 Add return type annotations for embeddings and logits 2023-05-05 14:22:55 -04:00
Andrei Betlen 3e28e0e50c Fix: runtime type errors 2023-05-05 14:12:26 -04:00
Andrei Betlen e24c3d7447 Prefer explicit imports 2023-05-05 14:05:31 -04:00
Andrei Betlen 40501435c1 Fix: types 2023-05-05 14:04:12 -04:00
Andrei Betlen 66e28eb548 Fix temperature bug 2023-05-05 14:00:41 -04:00
Andrei Betlen 6702d2abfd Fix candidates type 2023-05-05 14:00:30 -04:00
Andrei Betlen 5e7ddfc3d6 Fix llama_cpp types 2023-05-05 13:54:22 -04:00
Andrei Betlen b6a9a0b6ba Add types for all low-level api functions 2023-05-05 12:22:27 -04:00
Andrei Betlen 5be0efa5f8 Cache should raise KeyError when key is missing 2023-05-05 12:21:49 -04:00
Andrei Betlen 24fc38754b Add cli options to server. Closes #37 2023-05-05 12:08:28 -04:00
Andrei Betlen 853dc711cc Format 2023-05-04 21:58:36 -04:00
Andrei Betlen 97c6372350 Rewind model to longest prefix. 2023-05-04 21:58:27 -04:00
Andrei Betlen 329297fafb Bugfix: Missing logits_to_logprobs 2023-05-04 12:18:40 -04:00
Lucas Doyle 3008a954c1 Merge branch 'main' of github.com:abetlen/llama-cpp-python into better-server-params-and-fields 2023-05-03 13:10:03 -07:00
Andrei Betlen 9e5b6d675a Improve logging messages 2023-05-03 10:28:10 -04:00
Andrei Betlen 43f2907e3a Support smaller state sizes 2023-05-03 09:33:50 -04:00
Andrei Betlen 1d47cce222 Update llama.cpp 2023-05-03 09:33:30 -04:00
Lucas Doyle b9098b0ef7 llama_cpp server: prompt is a string
Not sure why this union type was here, but looking at llama.py, prompt is only ever processed as a string for completion.

This was breaking types when generating an OpenAPI client.
2023-05-02 14:47:07 -07:00
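The gist of the change: declaring the field as a plain str produces a clean OpenAPI schema, whereas a Union[str, List[str]] field confused the generated client. A minimal sketch of the field shape (the surrounding model is simplified, not the server's full definition):

```python
from pydantic import BaseModel

class CreateCompletionRequest(BaseModel):
    # Previously (per the commit): prompt: Union[str, List[str]] = ""
    # Now a plain string, which generates an unambiguous OpenAPI schema.
    prompt: str = ""
    max_tokens: int = 16
```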
Matt Hoffner f97ff3c5bb Update llama_cpp.py 2023-05-01 20:40:06 -07:00
Andrei 7ab08b8d10 Merge branch 'main' into better-server-params-and-fields 2023-05-01 22:45:57 -04:00
Andrei Betlen 9eafc4c49a Refactor server to use factory 2023-05-01 22:38:46 -04:00
Andrei Betlen dd9ad1c759 Formatting 2023-05-01 21:51:16 -04:00
Lucas Doyle dbbfc4ba2f llama_cpp server: fix to ChatCompletionRequestMessage
When I generate a client, it breaks because it fails to process the schema of ChatCompletionRequestMessage.

These changes fix that:
- I think `Union[Literal["user"], Literal["channel"], ...]` is the same as `Literal["user", "channel", ...]`
- Turns out the default value `Literal["user"]` isn't JSON serializable, so replace it with `"user"`
2023-05-01 15:38:19 -07:00
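Both points above are small pydantic details: a single Literal with several values is equivalent to a Union of single-value Literals, and the default must be a plain string so the generated JSON schema can serialize it. A hedged sketch (role values and fields are illustrative):

```python
from typing import Literal
from pydantic import BaseModel

class ChatCompletionRequestMessage(BaseModel):
    # One Literal with several values instead of Union[Literal[...], ...],
    # and a plain string default instead of a Literal type object.
    role: Literal["system", "user", "assistant"] = "user"
    content: str = ""
```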
Lucas Doyle fa2a61e065 llama_cpp server: fields for the embedding endpoint 2023-05-01 15:38:19 -07:00
Lucas Doyle 8dcbf65a45 llama_cpp server: define fields for chat completions
Slight refactor for common fields shared between completion and chat completion
2023-05-01 15:38:19 -07:00
Lucas Doyle 978b6daf93 llama_cpp server: add some more information to fields for completions 2023-05-01 15:38:19 -07:00
Lucas Doyle a5aa6c1478 llama_cpp server: add missing top_k param to CreateChatCompletionRequest
`llama.create_chat_completion` definitely has a `top_k` argument, but it's missing from `CreateChatCompletionRequest`. decision: add it
2023-05-01 15:38:19 -07:00
Lucas Doyle 1e42913599 llama_cpp server: move logprobs to supported
I think this is actually supported (it's in the arguments of `LLama.__call__`, which is how the completion is invoked). decision: mark as supported
2023-05-01 15:38:19 -07:00
Lucas Doyle b47b9549d5 llama_cpp server: delete some ignored / unused parameters
`n`, `presence_penalty`, `frequency_penalty`, `best_of`, `logit_bias`, `user`: not supported, excluded from the calls into llama. decision: delete it
2023-05-01 15:38:19 -07:00
Lucas Doyle e40fcb0575 llama_cpp server: mark model as required
`model` is ignored, but currently marked "optional"... on the one hand we could mark it "required" to make it explicit in case the server supports multiple llamas at the same time, but we could also delete it since it's ignored. decision: mark it required for the sake of OpenAI API compatibility.

I think out of all parameters, `model` is probably the most important one for people to keep using even if it's ignored for now.
2023-05-01 15:38:19 -07:00
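In pydantic terms, "required" simply means the field has no default, so clients must send a model name even though the server currently ignores it. A tiny illustrative sketch (the model and description are simplified, not the server's actual definition):

```python
from pydantic import BaseModel, Field

class CreateCompletionRequest(BaseModel):
    # Required: no default value, so omitting "model" is a validation error,
    # matching the OpenAI API shape even while the value is ignored.
    model: str = Field(..., description="The model to use for the completion.")
```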
Andrei Betlen b6747f722e Fix logprob calculation. Fixes #134 2023-05-01 17:45:08 -04:00
Andrei Betlen 9ff9cdd7fc Fix import error 2023-05-01 15:11:15 -04:00
Andrei Betlen 350a1769e1 Update sampling api 2023-05-01 14:47:55 -04:00
Andrei Betlen 7837c3fdc7 Fix return types and import comments 2023-05-01 14:02:06 -04:00
Andrei Betlen ccf1ed54ae Merge branch 'main' of github.com:abetlen/llama_cpp_python into main 2023-05-01 11:35:14 -04:00
Andrei Betlen 80184a286c Update llama.cpp 2023-05-01 10:44:28 -04:00
Lucas Doyle efe8e6f879 llama_cpp server: slight refactor to init_llama function
Define an init_llama function that starts llama with the supplied settings instead of doing it in the global context of app.py.

This allows the test to be less brittle by not needing to mess with os.environ before importing the app.
2023-04-29 11:42:23 -07:00
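The refactor described above moves model construction out of module import time and into a function that takes explicit settings, so tests can build the app without patching os.environ first. A rough, self-contained sketch of that pattern (names and fields are illustrative, not the server's actual code):

```python
from dataclasses import dataclass
from fastapi import FastAPI

@dataclass
class Settings:
    model: str = "models/7B/ggml-model.bin"
    n_ctx: int = 2048

def init_llama(settings: Settings):
    # In the real server this would construct llama_cpp.Llama(...).
    return {"model": settings.model, "n_ctx": settings.n_ctx}

def create_app(settings: Settings) -> FastAPI:
    app = FastAPI()
    app.state.llama = init_llama(settings)  # no module-level globals needed
    return app

# A test can now do: TestClient(create_app(Settings(model="test.bin")))
```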
Lucas Doyle 6d8db9d017 tests: simple test for server module 2023-04-29 11:42:20 -07:00
Lucas Doyle 468377b0e2 llama_cpp server: app is now importable, still runnable as a module 2023-04-29 11:41:25 -07:00
Andrei 755f9fa455 Merge pull request #118 from SagsMug/main
Fix UnicodeDecodeError permanently
2023-04-29 07:19:01 -04:00
Mug 18a0c10032 Remove excessive errors="ignore" and add utf8 test 2023-04-29 12:19:22 +02:00
Andrei Betlen ea0faabae1 Update llama.cpp 2023-04-28 15:32:43 -04:00
Mug b7d14efc8b Python weirdness 2023-04-28 13:20:31 +02:00
Mug eed61289b6 Don't detect off tokens, detect off detokenized utf8 2023-04-28 13:16:18 +02:00
Mug 3a98747026 One day, I'll fix off-by-1 errors permanently too 2023-04-28 12:54:28 +02:00
Mug c39547a986 Detect multi-byte responses and wait 2023-04-28 12:50:30 +02:00
Andrei Betlen 9339929f56 Update llama.cpp 2023-04-26 20:00:54 -04:00
Mug 5f81400fcb Also ignore errors on input prompts 2023-04-26 14:45:51 +02:00
Mug be2c961bc9 Merge branch 'main' of https://github.com/abetlen/llama-cpp-python 2023-04-26 14:38:09 +02:00
Mug c4a8491d42 Fix decode errors permanently 2023-04-26 14:37:06 +02:00
Andrei Betlen cbd26fdcc1 Update llama.cpp 2023-04-25 19:03:41 -04:00
Andrei Betlen 3cab3ef4cb Update n_batch for server 2023-04-25 09:11:32 -04:00
Andrei Betlen cc706fb944 Add ctx check and re-order __init__. Closes #112 2023-04-25 09:00:53 -04:00
Andrei Betlen d484c5634e Bugfix: Check cache keys as prefix to prompt tokens 2023-04-24 22:18:54 -04:00
Andrei Betlen cbe95bbb75 Add cache implementation using llama state 2023-04-24 19:54:41 -04:00
Andrei Betlen 2c359a28ff Merge branch 'main' of github.com:abetlen/llama_cpp_python into main 2023-04-24 17:51:27 -04:00
Andrei Betlen 197cf80601 Add save/load state api for Llama class 2023-04-24 17:51:25 -04:00
Andrei Betlen 86f8e5ad91 Refactor internal state for Llama class 2023-04-24 15:47:54 -04:00
Andrei f37456133a Merge pull request #108 from eiery/main
Update n_batch default to 512 to match upstream llama.cpp
2023-04-24 13:48:09 -04:00
Andrei Betlen 02cf881317 Update llama.cpp 2023-04-24 09:30:10 -04:00
eiery aa12d8a81f Update llama.py
update n_batch default to 512 to match upstream llama.cpp
2023-04-23 20:56:40 -04:00
Andrei Betlen 7230599593 Disable mmap when applying lora weights. Closes #107 2023-04-23 14:53:17 -04:00
Andrei Betlen e99caedbbd Update llama.cpp 2023-04-22 19:50:28 -04:00
Andrei Betlen 1eb130a6b2 Update llama.cpp 2023-04-21 17:40:27 -04:00
Andrei Betlen e4647c75ec Add use_mmap flag to server 2023-04-19 15:57:46 -04:00
Andrei Betlen 0df4d69c20 If lora base is not set avoid re-loading the model by passing NULL 2023-04-18 23:45:25 -04:00
Andrei Betlen 95c0dc134e Update type signature to allow for null pointer to be passed. 2023-04-18 23:44:46 -04:00
Andrei Betlen 453e517fd5 Add separate lora_base path for applying LoRA to quantized models using original unquantized model weights. 2023-04-18 10:20:46 -04:00
Andrei Betlen eb7f278cc6 Add lora_path parameter to Llama model 2023-04-18 01:43:44 -04:00
Andrei Betlen 35abf89552 Add bindings for LoRA adapters. Closes #88 2023-04-18 01:30:04 -04:00
Andrei Betlen 89856ef00d Bugfix: only eval new tokens 2023-04-15 17:32:53 -04:00
Andrei Betlen 92c077136d Add experimental cache 2023-04-15 12:03:09 -04:00
Andrei Betlen a6372a7ae5 Update stop sequences for chat 2023-04-15 12:02:48 -04:00
Andrei Betlen 83b2be6dc4 Update chat parameters 2023-04-15 11:58:43 -04:00
Andrei Betlen 62087514c6 Update chat prompt 2023-04-15 11:58:19 -04:00
Andrei Betlen 02f9fb82fb Bugfix 2023-04-15 11:39:52 -04:00
Andrei Betlen 3cd67c7bd7 Add type annotations 2023-04-15 11:39:21 -04:00
Andrei Betlen d7de0e8014 Bugfix 2023-04-15 00:08:04 -04:00
Andrei Betlen e90e122f2a Use clear 2023-04-14 23:33:18 -04:00
Andrei Betlen ac7068a469 Track generated tokens internally 2023-04-14 23:33:00 -04:00
Andrei Betlen 6e298d8fca Set kv cache size to f16 by default 2023-04-14 22:21:19 -04:00
Andrei Betlen 6c7cec0c65 Fix completion request 2023-04-14 10:01:15 -04:00
Andrei Betlen 6153baab2d Clean up logprobs implementation 2023-04-14 09:59:33 -04:00
Andrei Betlen 26cc4ee029 Fix signature for stop parameter 2023-04-14 09:59:08 -04:00
Andrei Betlen 6595ad84bf Add field to disable reseting between generations 2023-04-13 00:28:00 -04:00
Andrei Betlen 22fa5a621f Revert "Deprecate generate method"
This reverts commit 6cf5876538.
2023-04-13 00:19:55 -04:00
Andrei Betlen 4f5f99ef2a Formatting 2023-04-12 22:40:12 -04:00
Andrei Betlen 0daf16defc Enable logprobs on completion endpoint 2023-04-12 19:08:11 -04:00
Andrei Betlen 19598ac4e8 Fix threading bug. Closes #62 2023-04-12 19:07:53 -04:00
Andrei Betlen 005c78d26c Update llama.cpp 2023-04-12 14:29:00 -04:00
Andrei Betlen c854c2564b Don't serialize stateful parameters 2023-04-12 14:07:14 -04:00
Andrei Betlen 2f9b649005 Style fix 2023-04-12 14:06:22 -04:00
Andrei Betlen 6cf5876538 Deprecate generate method 2023-04-12 14:06:04 -04:00
Andrei Betlen b3805bb9cc Implement logprobs parameter for text completion. Closes #2 2023-04-12 14:05:11 -04:00
Andrei Betlen 9f1e565594 Update llama.cpp 2023-04-11 11:59:03 -04:00
Andrei Betlen 213cc5c340 Remove async from function signature to avoid blocking the server 2023-04-11 11:54:31 -04:00
jm12138 90e1021154 Add unlimited max_tokens 2023-04-10 15:56:05 +00:00
Mug 2559e5af9b Changed the environment variable name to "LLAMA_CPP_LIB" 2023-04-10 17:27:17 +02:00
Mug ee71ce8ab7 Make Windows users happy (hopefully) 2023-04-10 17:12:25 +02:00
Mug cf339c9b3c Better custom library debugging 2023-04-10 17:06:58 +02:00
Mug 4132293d2d Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into local-lib 2023-04-10 17:00:42 +02:00