Commit graph

413 commits

Author SHA1 Message Date
Andrei Betlen f0ec6e615e Stream tokens instead of text chunks 2023-05-18 11:35:59 -04:00
Andrei Betlen 21d8f5fa9f Remove unused union 2023-05-18 11:35:15 -04:00
Andrei Betlen 61d58e7b35 Check for CUDA_PATH before adding 2023-05-17 15:26:38 -04:00
Andrei Betlen 7c95895626 Merge branch 'main' of github.com:abetlen/llama_cpp_python into main 2023-05-17 15:19:32 -04:00
Aneesh Joy e9794f91f2 Fixed CUBLAS DLL load issue on Windows 2023-05-17 18:04:58 +01:00
Andrei Betlen 4f342795e5 Update token checks 2023-05-17 03:35:13 -04:00
Andrei Betlen f5c2f998ab Format 2023-05-17 02:00:39 -04:00
Andrei Betlen d28b753ed2 Implement penalize_nl 2023-05-17 01:53:26 -04:00
Andrei Betlen f11e2a781c Fix last_n_tokens_size 2023-05-17 01:42:51 -04:00
Andrei Betlen 7e55244540 Fix top_k value. Closes #220 2023-05-17 01:41:42 -04:00
Andrei Betlen a7c9e38287 Update variable name 2023-05-16 18:07:25 -04:00
Andrei Betlen a3352923c7 Add model_alias option to override model_path in completions. Closes #39 2023-05-16 17:22:00 -04:00
Andrei Betlen a65125c0bd Add sampling defaults for generate 2023-05-16 09:35:50 -04:00
Andrei Betlen cbac19bf24 Add winmode arg only on windows if python version supports it 2023-05-15 09:15:01 -04:00
Andrei Betlen c804efe3f0 Fix obscure Windows DLL issue. Closes #208 2023-05-14 22:08:11 -04:00
Andrei Betlen cdf59768f5 Update llama.cpp 2023-05-14 00:04:22 -04:00
Andrei Betlen 7a536e86c2 Allow model to tokenize strings longer than context length and set add_bos. Closes #92 2023-05-12 14:28:22 -04:00
Andrei Betlen 8740ddc58e Only support generating one prompt at a time. 2023-05-12 07:21:46 -04:00
Andrei Betlen 8895b9002a Revert "llama_cpp server: prompt is a string". Closes #187
This reverts commit b9098b0ef7.
2023-05-12 07:16:57 -04:00
Andrei Betlen 7be584fe82 Add missing tfs_z parameter 2023-05-11 21:56:19 -04:00
Andrei Betlen cdeaded251 Bugfix: Ensure logs are printed when streaming 2023-05-10 16:12:17 -04:00
Lucas Doyle 02e8a018ae llama_cpp server: document presence_penalty and frequency_penalty, mark as supported 2023-05-09 16:25:00 -07:00
Andrei Betlen d957422bf4 Implement sampling as in llama.cpp main example 2023-05-08 21:21:25 -04:00
Andrei Betlen 93a9019bb1 Merge branch 'main' of github.com:abetlen/llama_cpp_python into Maximilian-Winter/main 2023-05-08 19:57:09 -04:00
Andrei Betlen 82d138fe54 Fix: default repeat_penalty 2023-05-08 18:49:11 -04:00
Andrei Betlen 29f094bbcf Bugfix: not falling back to environment variables when a default value is set. 2023-05-08 14:46:25 -04:00
Andrei Betlen 0d6c60097a Show default value when --help is called 2023-05-08 14:21:15 -04:00
Andrei Betlen 022e9ebcb8 Use environment variable if parsed cli arg is None 2023-05-08 14:20:53 -04:00
Andrei Betlen 0d751a69a7 Set repeat_penalty to 0 by default 2023-05-08 01:50:43 -04:00
Andrei Betlen 65d9cc050c Add openai frequency and presence penalty parameters. Closes #169 2023-05-08 01:30:18 -04:00
Andrei Betlen a0b61ea2a7 Bugfix for models endpoint 2023-05-07 20:17:52 -04:00
Andrei Betlen e72f58614b Change pointer to lower overhead byref 2023-05-07 20:01:34 -04:00
Andrei Betlen 14da46f16e Added cache size to settings object. 2023-05-07 19:33:17 -04:00
Andrei Betlen 0e94a70de1 Add in-memory longest prefix cache. Closes #158 2023-05-07 19:31:26 -04:00
Andrei Betlen 8dfde63255 Fix return type 2023-05-07 19:30:14 -04:00
Andrei Betlen 2753b85321 Format 2023-05-07 13:19:56 -04:00
Andrei Betlen 627811ea83 Add verbose flag to server 2023-05-07 05:09:10 -04:00
Andrei Betlen 3fbda71790 Fix mlock_supported and mmap_supported return type 2023-05-07 03:04:22 -04:00
Andrei Betlen 5a3413eee3 Update cpu_count 2023-05-07 03:03:57 -04:00
Andrei Betlen 1a00e452ea Update settings fields and defaults 2023-05-07 02:52:20 -04:00
Andrei Betlen 86753976c4 Revert "llama_cpp server: delete some ignored / unused parameters"
This reverts commit b47b9549d5.
2023-05-07 02:02:34 -04:00
Andrei Betlen c382d8f86a Revert "llama_cpp server: mark model as required"
This reverts commit e40fcb0575.
2023-05-07 02:00:22 -04:00
Andrei Betlen d8fddcce73 Merge branch 'main' of github.com:abetlen/llama_cpp_python into better-server-params-and-fields 2023-05-07 01:54:00 -04:00
Andrei Betlen 7c3743fe5f Update llama.cpp 2023-05-07 00:12:47 -04:00
Andrei Betlen bc853e3742 Fix type for eval_logits in LlamaState object 2023-05-06 21:32:50 -04:00
Maximilian Winter 515d9bde7e Fixed some things and activated cuBLAS 2023-05-06 23:40:19 +02:00
Maximilian Winter aa203a0d65 Added mirostat sampling to the high level API. 2023-05-06 22:47:47 +02:00
Andrei Betlen 98bbd1c6a8 Fix eval logits type 2023-05-05 14:23:14 -04:00
Andrei Betlen b5f3e74627 Add return type annotations for embeddings and logits 2023-05-05 14:22:55 -04:00
Andrei Betlen 3e28e0e50c Fix: runtime type errors 2023-05-05 14:12:26 -04:00
Andrei Betlen e24c3d7447 Prefer explicit imports 2023-05-05 14:05:31 -04:00
Andrei Betlen 40501435c1 Fix: types 2023-05-05 14:04:12 -04:00
Andrei Betlen 66e28eb548 Fix temperature bug 2023-05-05 14:00:41 -04:00
Andrei Betlen 6702d2abfd Fix candidates type 2023-05-05 14:00:30 -04:00
Andrei Betlen 5e7ddfc3d6 Fix llama_cpp types 2023-05-05 13:54:22 -04:00
Andrei Betlen b6a9a0b6ba Add types for all low-level api functions 2023-05-05 12:22:27 -04:00
Andrei Betlen 5be0efa5f8 Cache should raise KeyError when key is missing 2023-05-05 12:21:49 -04:00
Andrei Betlen 24fc38754b Add cli options to server. Closes #37 2023-05-05 12:08:28 -04:00
Andrei Betlen 853dc711cc Format 2023-05-04 21:58:36 -04:00
Andrei Betlen 97c6372350 Rewind model to longest prefix. 2023-05-04 21:58:27 -04:00
Andrei Betlen 329297fafb Bugfix: Missing logits_to_logprobs 2023-05-04 12:18:40 -04:00
Lucas Doyle 3008a954c1 Merge branch 'main' of github.com:abetlen/llama-cpp-python into better-server-params-and-fields 2023-05-03 13:10:03 -07:00
Andrei Betlen 9e5b6d675a Improve logging messages 2023-05-03 10:28:10 -04:00
Andrei Betlen 43f2907e3a Support smaller state sizes 2023-05-03 09:33:50 -04:00
Andrei Betlen 1d47cce222 Update llama.cpp 2023-05-03 09:33:30 -04:00
Lucas Doyle b9098b0ef7 llama_cpp server: prompt is a string
Not sure why this union type was here, but looking at llama.py, the prompt is only ever processed as a string for completion.

This was breaking types when generating an OpenAPI client.
2023-05-02 14:47:07 -07:00
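For context, a minimal sketch (assumed model and field names, not the repo's exact definitions) of the kind of change this commit describes:

```python
# Hypothetical before/after: narrowing the prompt field from a Union to a
# plain str so OpenAPI client generators emit a simple string type.
from typing import List, Union

from pydantic import BaseModel, Field


class CreateCompletionRequestBefore(BaseModel):
    # Union types like this confused the generated OpenAPI client.
    prompt: Union[str, List[str]] = ""


class CreateCompletionRequestAfter(BaseModel):
    # llama.py only ever treats the prompt as a single string, so a plain
    # str keeps the generated schema simple.
    prompt: str = Field(default="", description="The prompt to complete.")
```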
Matt Hoffner f97ff3c5bb Update llama_cpp.py 2023-05-01 20:40:06 -07:00
Andrei 7ab08b8d10 Merge branch 'main' into better-server-params-and-fields 2023-05-01 22:45:57 -04:00
Andrei Betlen 9eafc4c49a Refactor server to use factory 2023-05-01 22:38:46 -04:00
Andrei Betlen dd9ad1c759 Formatting 2023-05-01 21:51:16 -04:00
Lucas Doyle dbbfc4ba2f llama_cpp server: fix to ChatCompletionRequestMessage
When I generate a client, it breaks because it fails to process the schema of ChatCompletionRequestMessage.

These changes fix that:
- I think `Union[Literal["user"], Literal["channel"], ...]` is the same as `Literal["user", "channel", ...]`
- It turns out the default value `Literal["user"]` isn't JSON serializable, so replace it with "user"
2023-05-01 15:38:19 -07:00
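As a rough illustration of the two fixes described above (role names here are placeholders, not necessarily the server's exact set):

```python
from typing import Literal

from pydantic import BaseModel, Field


class ChatCompletionRequestMessage(BaseModel):
    # One multi-value Literal expresses the same constraint as a Union of
    # single-value Literals and yields a cleaner generated schema.
    role: Literal["system", "user", "assistant"] = Field(
        # The default must be the plain string "user"; the typing construct
        # Literal["user"] is not JSON serializable when the schema is dumped.
        default="user"
    )
    content: str = ""
```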
Lucas Doyle fa2a61e065 llama_cpp server: fields for the embedding endpoint 2023-05-01 15:38:19 -07:00
Lucas Doyle 8dcbf65a45 llama_cpp server: define fields for chat completions
Slight refactor for common fields shared between completion and chat completion
2023-05-01 15:38:19 -07:00
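A minimal sketch (assumed field names and defaults) of the shared-fields refactor mentioned above:

```python
from typing import List

from pydantic import BaseModel, Field


class CommonSamplingFields(BaseModel):
    # Fields shared by the completion and chat-completion request bodies.
    max_tokens: int = Field(default=16, ge=1)
    temperature: float = Field(default=0.8, ge=0.0, le=2.0)
    top_p: float = Field(default=0.95, ge=0.0, le=1.0)
    stream: bool = False


class CreateCompletionRequest(CommonSamplingFields):
    prompt: str = ""


class CreateChatCompletionRequest(CommonSamplingFields):
    messages: List[dict] = Field(default_factory=list)
```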
Lucas Doyle 978b6daf93 llama_cpp server: add some more information to fields for completions 2023-05-01 15:38:19 -07:00
Lucas Doyle a5aa6c1478 llama_cpp server: add missing top_k param to CreateChatCompletionRequest
`llama.create_chat_completion` definitely has a `top_k` argument, but it's missing from `CreateChatCompletionRequest`. Decision: add it.
2023-05-01 15:38:19 -07:00
Lucas Doyle 1e42913599 llama_cpp server: move logprobs to supported
I think this is actually supported (it's in the arguments of `Llama.__call__`, which is how the completion is invoked). Decision: mark as supported.
2023-05-01 15:38:19 -07:00
Lucas Doyle b47b9549d5 llama_cpp server: delete some ignored / unused parameters
`n`, `presence_penalty`, `frequency_penalty`, `best_of`, `logit_bias`, `user`: not supported, excluded from the calls into llama. Decision: delete them.
2023-05-01 15:38:19 -07:00
Lucas Doyle e40fcb0575 llama_cpp server: mark model as required
`model` is ignored but currently marked "optional". On the one hand, it could be marked "required" to make it explicit in case the server ever supports multiple llamas at the same time; on the other, it could be deleted since it's ignored. Decision: mark it required for the sake of OpenAI API compatibility.

I think out of all parameters, `model` is probably the most important one for people to keep using even if it's ignored for now.
2023-05-01 15:38:19 -07:00
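A small sketch of the decision above, i.e. a required `model` field whose value the single-model server can still ignore (the description text is illustrative):

```python
from pydantic import BaseModel, Field


class CompletionRequestWithModel(BaseModel):
    # "..." marks the field as required; the handler may still ignore the
    # value and always use the one model the server was started with.
    model: str = Field(
        ..., description="Ignored for now, kept for OpenAI API compatibility."
    )
    prompt: str = ""
```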
Andrei Betlen b6747f722e Fix logprob calculation. Fixes #134 2023-05-01 17:45:08 -04:00
Andrei Betlen 9ff9cdd7fc Fix import error 2023-05-01 15:11:15 -04:00
Andrei Betlen 350a1769e1 Update sampling api 2023-05-01 14:47:55 -04:00
Andrei Betlen 7837c3fdc7 Fix return types and import comments 2023-05-01 14:02:06 -04:00
Andrei Betlen ccf1ed54ae Merge branch 'main' of github.com:abetlen/llama_cpp_python into main 2023-05-01 11:35:14 -04:00
Andrei Betlen 80184a286c Update llama.cpp 2023-05-01 10:44:28 -04:00
Lucas Doyle efe8e6f879 llama_cpp server: slight refactor to init_llama function
Define an init_llama function that starts llama with the supplied settings instead of just doing it in the global context of app.py.

This allows the test to be less brittle: it no longer needs to mess with os.environ and then import the app.
2023-04-29 11:42:23 -07:00
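A minimal sketch (assumed names, not the repo's exact code) of the pattern this commit describes, building the app from explicit settings instead of at import time:

```python
from fastapi import FastAPI
from pydantic import BaseModel


class Settings(BaseModel):
    model: str
    n_ctx: int = 2048


def create_app(settings: Settings) -> FastAPI:
    app = FastAPI(title="llama.cpp server")
    # The real server would construct llama_cpp.Llama(...) from the settings
    # here and register the completion / chat / embedding routes.
    app.state.settings = settings
    return app


# A test can now build an app directly, without touching os.environ:
# app = create_app(Settings(model="models/ggml-model.bin"))
```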
Lucas Doyle 6d8db9d017 tests: simple test for server module 2023-04-29 11:42:20 -07:00
Lucas Doyle 468377b0e2 llama_cpp server: app is now importable, still runnable as a module 2023-04-29 11:41:25 -07:00
Andrei 755f9fa455 Merge pull request #118 from SagsMug/main
Fix UnicodeDecodeError permanently
2023-04-29 07:19:01 -04:00
Mug 18a0c10032 Remove excessive errors="ignore" and add utf8 test 2023-04-29 12:19:22 +02:00
Andrei Betlen ea0faabae1 Update llama.cpp 2023-04-28 15:32:43 -04:00
Mug b7d14efc8b Python weirdness 2023-04-28 13:20:31 +02:00
Mug eed61289b6 Don't detect off tokens, detect off detokenized utf8 2023-04-28 13:16:18 +02:00
Mug 3a98747026 One day, I'll fix off-by-1 errors permanently too 2023-04-28 12:54:28 +02:00
Mug c39547a986 Detect multi-byte responses and wait 2023-04-28 12:50:30 +02:00
Andrei Betlen 9339929f56 Update llama.cpp 2023-04-26 20:00:54 -04:00
Mug 5f81400fcb Also ignore errors on input prompts 2023-04-26 14:45:51 +02:00
Mug be2c961bc9 Merge branch 'main' of https://github.com/abetlen/llama-cpp-python 2023-04-26 14:38:09 +02:00
Mug c4a8491d42 Fix decode errors permanently 2023-04-26 14:37:06 +02:00
Andrei Betlen cbd26fdcc1 Update llama.cpp 2023-04-25 19:03:41 -04:00
Andrei Betlen 3cab3ef4cb Update n_batch for server 2023-04-25 09:11:32 -04:00
Andrei Betlen cc706fb944 Add ctx check and re-order __init__. Closes #112 2023-04-25 09:00:53 -04:00
Andrei Betlen d484c5634e Bugfix: Check cache keys as prefix to prompt tokens 2023-04-24 22:18:54 -04:00
Andrei Betlen cbe95bbb75 Add cache implementation using llama state 2023-04-24 19:54:41 -04:00
Andrei Betlen 2c359a28ff Merge branch 'main' of github.com:abetlen/llama_cpp_python into main 2023-04-24 17:51:27 -04:00
Andrei Betlen 197cf80601 Add save/load state api for Llama class 2023-04-24 17:51:25 -04:00
Andrei Betlen 86f8e5ad91 Refactor internal state for Llama class 2023-04-24 15:47:54 -04:00
Andrei f37456133a Merge pull request #108 from eiery/main
Update n_batch default to 512 to match upstream llama.cpp
2023-04-24 13:48:09 -04:00
Andrei Betlen 02cf881317 Update llama.cpp 2023-04-24 09:30:10 -04:00
eiery aa12d8a81f Update llama.py
Update n_batch default to 512 to match upstream llama.cpp
2023-04-23 20:56:40 -04:00
Andrei Betlen 7230599593 Disable mmap when applying lora weights. Closes #107 2023-04-23 14:53:17 -04:00
Andrei Betlen e99caedbbd Update llama.cpp 2023-04-22 19:50:28 -04:00
Andrei Betlen 1eb130a6b2 Update llama.cpp 2023-04-21 17:40:27 -04:00
Andrei Betlen e4647c75ec Add use_mmap flag to server 2023-04-19 15:57:46 -04:00
Andrei Betlen 0df4d69c20 If lora base is not set avoid re-loading the model by passing NULL 2023-04-18 23:45:25 -04:00
Andrei Betlen 95c0dc134e Update type signature to allow for null pointer to be passed. 2023-04-18 23:44:46 -04:00
Andrei Betlen 453e517fd5 Add separate lora_base path for applying LoRA to quantized models using original unquantized model weights. 2023-04-18 10:20:46 -04:00
Andrei Betlen eb7f278cc6 Add lora_path parameter to Llama model 2023-04-18 01:43:44 -04:00
Andrei Betlen 35abf89552 Add bindings for LoRA adapters. Closes #88 2023-04-18 01:30:04 -04:00
Andrei Betlen 89856ef00d Bugfix: only eval new tokens 2023-04-15 17:32:53 -04:00
Andrei Betlen 92c077136d Add experimental cache 2023-04-15 12:03:09 -04:00
Andrei Betlen a6372a7ae5 Update stop sequences for chat 2023-04-15 12:02:48 -04:00
Andrei Betlen 83b2be6dc4 Update chat parameters 2023-04-15 11:58:43 -04:00
Andrei Betlen 62087514c6 Update chat prompt 2023-04-15 11:58:19 -04:00
Andrei Betlen 02f9fb82fb Bugfix 2023-04-15 11:39:52 -04:00
Andrei Betlen 3cd67c7bd7 Add type annotations 2023-04-15 11:39:21 -04:00
Andrei Betlen d7de0e8014 Bugfix 2023-04-15 00:08:04 -04:00
Andrei Betlen e90e122f2a Use clear 2023-04-14 23:33:18 -04:00
Andrei Betlen ac7068a469 Track generated tokens internally 2023-04-14 23:33:00 -04:00
Andrei Betlen 6e298d8fca Set kv cache size to f16 by default 2023-04-14 22:21:19 -04:00
Andrei Betlen 6c7cec0c65 Fix completion request 2023-04-14 10:01:15 -04:00
Andrei Betlen 6153baab2d Clean up logprobs implementation 2023-04-14 09:59:33 -04:00
Andrei Betlen 26cc4ee029 Fix signature for stop parameter 2023-04-14 09:59:08 -04:00
Andrei Betlen 6595ad84bf Add field to disable resetting between generations 2023-04-13 00:28:00 -04:00
Andrei Betlen 22fa5a621f Revert "Deprecate generate method"
This reverts commit 6cf5876538.
2023-04-13 00:19:55 -04:00
Andrei Betlen 4f5f99ef2a Formatting 2023-04-12 22:40:12 -04:00
Andrei Betlen 0daf16defc Enable logprobs on completion endpoint 2023-04-12 19:08:11 -04:00
Andrei Betlen 19598ac4e8 Fix threading bug. Closes #62 2023-04-12 19:07:53 -04:00
Andrei Betlen 005c78d26c Update llama.cpp 2023-04-12 14:29:00 -04:00
Andrei Betlen c854c2564b Don't serialize stateful parameters 2023-04-12 14:07:14 -04:00
Andrei Betlen 2f9b649005 Style fix 2023-04-12 14:06:22 -04:00
Andrei Betlen 6cf5876538 Deprecate generate method 2023-04-12 14:06:04 -04:00
Andrei Betlen b3805bb9cc Implement logprobs parameter for text completion. Closes #2 2023-04-12 14:05:11 -04:00
Andrei Betlen 9f1e565594 Update llama.cpp 2023-04-11 11:59:03 -04:00
Andrei Betlen 213cc5c340 Remove async from function signature to avoid blocking the server 2023-04-11 11:54:31 -04:00
jm12138 90e1021154 Add unlimited max_tokens 2023-04-10 15:56:05 +00:00
Mug 2559e5af9b Changed the environment variable name into "LLAMA_CPP_LIB" 2023-04-10 17:27:17 +02:00
Mug ee71ce8ab7 Make Windows users happy (hopefully) 2023-04-10 17:12:25 +02:00
Mug cf339c9b3c Better custom library debugging 2023-04-10 17:06:58 +02:00
Mug 4132293d2d Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into local-lib 2023-04-10 17:00:42 +02:00
Mug 76131d5bb8 Use environment variable for library override 2023-04-10 17:00:35 +02:00
Andrei Betlen 1f67ad2a0b Add use_mmap option 2023-04-10 02:11:35 -04:00
Andrei Betlen c3c2623e8b Update llama.cpp 2023-04-09 22:01:33 -04:00
Andrei Betlen 314ce7d1cc Fix cpu count default 2023-04-08 19:54:04 -04:00
Andrei Betlen 3fbc06361f Formatting 2023-04-08 16:01:45 -04:00
Andrei Betlen 0067c1a588 Formatting 2023-04-08 16:01:18 -04:00
Andrei Betlen 38f442deb0 Bugfix: Wrong size of embeddings. Closes #47 2023-04-08 15:05:33 -04:00
Andrei Betlen ae3e9c3d6f Update shared library extension for macos 2023-04-08 02:45:21 -04:00
Andrei Betlen da539cc2ee Safer calculation of default n_threads 2023-04-06 21:22:19 -04:00
Andrei Betlen 930db37dd2 Merge branch 'main' of github.com:abetlen/llama_cpp_python into main 2023-04-06 21:07:38 -04:00
Andrei Betlen 55279b679d Handle prompt list 2023-04-06 21:07:35 -04:00
MillionthOdin16 c283edd7f2 Set n_batch to the default value and reduce thread count:
Change batch size to the llama.cpp default of 8. I've seen issues in llama.cpp where batch size affects the quality of generations. (It shouldn't.) But in case that's still an issue, I changed it to the default.

Set the auto-determined number of threads to 1/2 the system count. ggml will sometimes lock cores at 100% while doing nothing; this is being addressed, but it can cause a bad experience for the user if cores are pegged at 100%.
2023-04-05 18:17:29 -04:00
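A rough sketch of the defaults described above (assumed variable names):

```python
import multiprocessing

n_ctx = 512
n_batch = min(8, n_ctx)  # llama.cpp's default batch size at the time
# Use roughly half the logical cores so ggml's busy-waiting worker threads
# don't peg every CPU at 100%.
n_threads = max(multiprocessing.cpu_count() // 2, 1)
```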
MillionthOdin16 76a82babef Set n_batch to the default value of 8. I think this is leftover from when n_ctx was missing and n_batch was 2048. 2023-04-05 17:44:53 -04:00
Andrei Betlen 44448fb3a8 Add server as a subpackage 2023-04-05 16:23:25 -04:00
Mug e3ea354547 Allow local llama library usage 2023-04-05 14:23:01 +02:00
Andrei Betlen e96a5c5722 Make Llama instance pickleable. Closes #27 2023-04-05 06:52:17 -04:00
Andrei Betlen 7643f6677d Bugfix for Python3.7 2023-04-05 04:37:33 -04:00
Andrei Betlen cefc69ea43 Add runtime check to ensure embedding is enabled if trying to generate embeddings 2023-04-05 03:25:37 -04:00
Andrei Betlen 5c50af7462 Remove workaround 2023-04-05 03:25:09 -04:00
Andrei Betlen 51dbcf2693 Bugfix: wrong signature for quantize function 2023-04-04 22:36:59 -04:00
Andrei Betlen c137789143 Add verbose flag. Closes #19 2023-04-04 13:09:24 -04:00
Andrei Betlen 5075c16fcc Bugfix: n_batch should always be <= n_ctx 2023-04-04 13:08:21 -04:00
Andrei Betlen caf3c0362b Add return type for default __call__ method 2023-04-03 20:26:08 -04:00
Andrei Betlen 4aa349d777 Add docstring for create_chat_completion 2023-04-03 20:24:20 -04:00
Andrei Betlen 7fedf16531 Add support for chat completion 2023-04-03 20:12:44 -04:00
Andrei Betlen 3dec778c90 Update to more sensible return signature 2023-04-03 20:12:14 -04:00
Andrei Betlen ae004eb69e Fix #16 2023-04-03 18:46:19 -04:00
MillionthOdin16 a0758f0077 Update llama_cpp.py with PR requests
Rename lib_base_name and load_shared_library to _lib_base_name and _load_shared_library.
2023-04-03 13:06:50 -04:00
MillionthOdin16 a40476e299 Update llama_cpp.py
Make shared library code more robust with some platform-specific functionality and more descriptive errors when failures occur.
2023-04-02 21:50:13 -04:00
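An illustrative sketch (not the repo's exact code) of platform-specific shared-library loading with a descriptive error on failure:

```python
import ctypes
import pathlib
import sys


def _load_shared_library(lib_base_name: str) -> ctypes.CDLL:
    # Pick the shared-library suffix for the current platform.
    if sys.platform.startswith("linux"):
        suffix = ".so"
    elif sys.platform == "darwin":
        suffix = ".dylib"
    elif sys.platform == "win32":
        suffix = ".dll"
    else:
        raise RuntimeError(f"Unsupported platform: {sys.platform}")

    lib_path = pathlib.Path(__file__).parent / f"lib{lib_base_name}{suffix}"
    if not lib_path.exists():
        raise FileNotFoundError(f"Shared library not found at {lib_path}")
    try:
        return ctypes.CDLL(str(lib_path))
    except OSError as e:
        raise RuntimeError(f"Failed to load shared library {lib_path}: {e}") from e
```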
Andrei Betlen 1ed8cd023d Update llama_cpp and add kv_cache api support 2023-04-02 13:33:49 -04:00
Andrei Betlen 4f509b963e Bugfix: Stop sequences and missing max_tokens check 2023-04-02 03:59:19 -04:00
Andrei Betlen 353e18a781 Move workaround to new sample method 2023-04-02 00:06:34 -04:00
Andrei Betlen a4a1bbeaa9 Update api to allow for easier interactive mode 2023-04-02 00:02:47 -04:00
Andrei Betlen eef627c09c Fix example documentation 2023-04-01 17:39:35 -04:00
Andrei Betlen 1e4346307c Add documentation for generate method 2023-04-01 17:36:30 -04:00
Andrei Betlen 67c70cc8eb Add static methods for beginning and end of sequence tokens. 2023-04-01 17:29:30 -04:00
Andrei Betlen 318eae237e Update high-level api 2023-04-01 13:01:27 -04:00
Andrei Betlen 69e7d9f60e Add type definitions 2023-04-01 12:59:58 -04:00
Andrei Betlen 49c8df369a Fix type signature of token_to_str 2023-03-31 03:25:12 -04:00
Andrei Betlen 670d390001 Fix ctypes typing issue for Arrays 2023-03-31 03:20:15 -04:00
Andrei Betlen 1545b22727 Fix array type signatures 2023-03-31 02:08:20 -04:00
Andrei Betlen c928e0afc8 Formatting 2023-03-31 00:00:27 -04:00
Andrei Betlen 8908f4614c Update llama.cpp 2023-03-28 21:10:23 -04:00
Andrei Betlen 70b8a1ef75 Add support to get embeddings from high-level api. Closes #4 2023-03-28 04:59:54 -04:00
Andrei Betlen 3dbb3fd3f6 Add support for stream parameter. Closes #1 2023-03-28 04:03:57 -04:00
Andrei Betlen 30fc0f3866 Extract generate method 2023-03-28 02:42:22 -04:00
Andrei Betlen 1c823f6d0f Refactor Llama class and add tokenize / detokenize methods. Closes #3 2023-03-28 01:45:37 -04:00
Andrei Betlen 8ae3beda9c Update Llama to add params 2023-03-25 16:26:23 -04:00
Andrei Betlen 4525236214 Update llama.cpp 2023-03-25 16:26:03 -04:00
Andrei Betlen b121b7c05b Update docstring 2023-03-25 12:33:18 -04:00
Andrei Betlen fa92740a10 Update llama.cpp 2023-03-25 12:12:09 -04:00
Andrei Betlen df15caa877 Add mkdocs 2023-03-24 18:57:59 -04:00
Andrei Betlen 4da5faa28b Bugfix: cross-platform method to find shared lib 2023-03-24 18:43:29 -04:00
Andrei Betlen b93675608a Handle errors returned by llama.cpp 2023-03-24 15:47:17 -04:00
Andrei Betlen 7786edb0f9 Black formatting 2023-03-24 14:59:29 -04:00
Andrei Betlen c784d83131 Update llama.cpp and re-organize low-level api 2023-03-24 14:58:42 -04:00
Andrei Betlen b9c53b88a1 Use n_ctx provided from actual context not params 2023-03-24 14:58:10 -04:00
Andrei Betlen 2cc499512c Black formatting 2023-03-24 14:35:41 -04:00
Andrei Betlen e24c581b5a Implement prompt batch processing as in main.cpp 2023-03-24 14:33:38 -04:00
Andrei Betlen a28cb92d8f Remove model_name param 2023-03-24 04:04:29 -04:00
Andrei Betlen eec9256a42 Bugfix: avoid decoding partial utf-8 characters 2023-03-23 16:25:13 -04:00
Andrei Betlen e63ea4dbbc Add support for logprobs 2023-03-23 15:51:05 -04:00
Andrei Betlen 465238b179 Updated package to build with skbuild 2023-03-23 13:54:14 -04:00
Andrei Betlen 79b304c9d4 Initial commit 2023-03-23 05:33:06 -04:00