Commit graph

1647 commits

Author SHA1 Message Date
Andrei Betlen 0281214863 chore: Bump version 2024-04-20 00:09:37 -04:00
Andrei Betlen cc81afebf0 feat: Add stopping_criteria to ChatFormatter, allow stopping on arbitrary token ids, fixes llama3 instruct 2024-04-20 00:00:53 -04:00
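A minimal sketch of the mechanism this commit exposes: a stopping criterion that halts generation when an arbitrary token id is sampled (here llama-3's `<|eot_id|>`, id 128009). The model path is a placeholder, and the `StoppingCriteriaList` usage reflects the pre-existing completion API rather than the new ChatFormatter wiring.

```python
import llama_cpp

EOT_ID = 128009  # llama-3 <|eot_id|>; the correct id depends on the loaded vocab

def stop_on_eot(input_ids, logits) -> bool:
    # input_ids is the numpy array of tokens generated so far
    return len(input_ids) > 0 and input_ids[-1] == EOT_ID

llm = llama_cpp.Llama(model_path="models/llama-3-8b-instruct.gguf")  # placeholder
out = llm.create_completion(
    "Hello",
    stopping_criteria=llama_cpp.StoppingCriteriaList([stop_on_eot]),
)
```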
Andrei Betlen d17c1887a3 feat: Update llama.cpp 2024-04-19 23:58:16 -04:00
Andrei Betlen 893a27a736 chore: Bump version 2024-04-18 01:43:39 -04:00
Andrei Betlen a128c80500 feat: Update llama.cpp 2024-04-18 01:39:45 -04:00
Lucca Zenóbio 4f42664955
feat: update grammar schema converter to match llama.cpp (#1353)
* feat: improve function calling

* feat: grammar

* fix

* fix

* fix
2024-04-18 01:36:25 -04:00
Andrei Betlen fa4bb0cf81 Revert "feat: Update json to grammar (#1350)"
This reverts commit 610a592f70.
2024-04-17 16:18:16 -04:00
Lucca Zenóbio 610a592f70
feat: Update json to grammar (#1350)
* feat: improve function calling

* feat: grammar
2024-04-17 10:10:21 -04:00
khimaros b73c73c0c6
feat: add disable_ping_events flag (#1257)
For backward compatibility, this is false by default.

It can be set to true to disable EventSource pings,
which are not supported by some OpenAI clients.

Fixes https://github.com/abetlen/llama-cpp-python/issues/1256
2024-04-17 10:08:19 -04:00
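The flag is a server setting, so it should be reachable from the CLI that `llama_cpp.server` generates from its settings model. A hedged sketch; the exact flag spelling and boolean syntax are assumptions:

```python
# Launch the OpenAI-compatible server with SSE ping events disabled.
import subprocess

subprocess.run(
    [
        "python", "-m", "llama_cpp.server",
        "--model", "models/model.gguf",   # placeholder path
        "--disable_ping_events", "True",  # suppress ": ping" keep-alives
    ],
    check=True,
)
```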
tc-wolf 4924455dec
feat: Make saved state more compact on-disk (#1296)
* State load/save changes

- Only store up to `n_tokens` logits instead of the full `(n_ctx, n_vocab)`
  sized array.
  - Cuts the saved state from ~1500MB to ~350MB for an example prompt with
    ~300 tokens.
- Auto-formatting changes

* Back out formatting changes
2024-04-17 10:06:50 -04:00
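Back-of-envelope sketch of why the change shrinks the saved state: the old code serialized a full `(n_ctx, n_vocab)` float32 scores array whether or not the slots were filled, while the new code keeps only the `n_tokens` rows actually computed. The `n_ctx`/`n_vocab` values below are illustrative assumptions, not numbers taken from the PR:

```python
n_ctx, n_vocab, n_tokens = 8192, 32000, 300
f32 = 4  # bytes per float32 logit

old = n_ctx * n_vocab * f32     # every context slot, filled or not
new = n_tokens * n_vocab * f32  # only logits that were computed

print(f"old: {old / 2**20:.0f} MiB, new: {new / 2**20:.0f} MiB")
# old: 1000 MiB, new: 37 MiB -- same order as the PR's ~1500MB vs ~350MB,
# which also counts the KV cache and other state
```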
Andrei Betlen 9842cbf99d feat: Update llama.cpp 2024-04-17 10:06:15 -04:00
ddh0 c96b2daebf feat: Use all available CPUs for batch processing (#1345) 2024-04-17 10:05:54 -04:00
Andrei Betlen a420f9608b feat: Update llama.cpp 2024-04-14 19:14:09 -04:00
Andrei Betlen 90dceaba8a feat: Update llama.cpp 2024-04-14 11:35:57 -04:00
Andrei Betlen 2e9ffd28fd feat: Update llama.cpp 2024-04-12 21:09:12 -04:00
Andrei Betlen ef29235d45 chore: Bump version 2024-04-10 03:44:46 -04:00
Andrei Betlen bb65b4d764 fix: pass correct type to chat handlers for chat completion logprobs 2024-04-10 03:41:55 -04:00
Andrei Betlen 060bfa64d5 feat: Add support for yaml based configs 2024-04-10 02:47:01 -04:00
Andrei Betlen 1347e1d050 feat: Add typechecking for ctypes structure attributes 2024-04-10 02:40:41 -04:00
Andrei Betlen 889d0e8981 feat: Update llama.cpp 2024-04-10 02:25:58 -04:00
Andrei Betlen 56071c956a feat: Update llama.cpp 2024-04-09 09:53:49 -04:00
Andrei Betlen 08b16afe11 chore: Bump version 2024-04-06 01:53:38 -04:00
Andrei Betlen 7ca364c8bd feat: Update llama.cpp 2024-04-06 01:37:43 -04:00
Andrei Betlen b3bfea6dbf fix: Always embed metal library. Closes #1332 2024-04-06 01:36:53 -04:00
Andrei Betlen f4092e6b46 feat: Update llama.cpp 2024-04-05 10:59:31 -04:00
Andrei Betlen 2760ef6156 Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main 2024-04-05 10:51:54 -04:00
Andrei Betlen 1ae3abbcc3 fix: missing logprobs in response, incorrect response type for functionary, minor type issues. Closes #1328 Closes #1314 2024-04-05 10:51:44 -04:00
Andrei Betlen 49bc66bfa2 fix: missing logprobs in response, incorrect response type for functionary, minor type issues. Closes #1328 #1314 2024-04-05 10:50:49 -04:00
Andrei Betlen 9111b6e03a feat: Update llama.cpp 2024-04-05 09:21:02 -04:00
Sigbjørn Skjæret 7265a5dc0e
fix(docs): incorrect tool_choice example (#1330) 2024-04-05 09:14:03 -04:00
Andrei Betlen 909ef66951 docs: Rename cuBLAS section to CUDA 2024-04-04 03:08:47 -04:00
Andrei Betlen 1db3b58fdc docs: Add docs explaining how to install pre-built wheels. 2024-04-04 02:57:06 -04:00
Andrei Betlen c50309e52a docs: LLAMA_CUBLAS -> LLAMA_CUDA 2024-04-04 02:49:19 -04:00
Andrei Betlen 612e78d322 fix(ci): use correct script name 2024-04-03 16:15:29 -04:00
Andrei Betlen 34081ddc5b chore: Bump version 2024-04-03 15:38:27 -04:00
Andrei Betlen 368061c04a Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main 2024-04-03 15:35:30 -04:00
Andrei Betlen 5a5193636b feat: Update llama.cpp 2024-04-03 15:35:28 -04:00
Andrei 5a930ee9a1
feat: Binary wheels for CPU, CUDA (12.1 - 12.3), Metal (#1247)
* Generate binary wheel index on release

* Add total release downloads badge

* Update download label

* Use official cibuildwheel action

* Add workflows to build CUDA and Metal wheels

* Update generate index workflow

* Update workflow name
2024-04-03 15:32:13 -04:00
Andrei Betlen 8649d7671b fix: segfault when logits_all=False. Closes #1319 2024-04-03 15:30:31 -04:00
Andrei Betlen f96de6d920 Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main 2024-04-03 00:55:21 -04:00
Andrei Betlen e465157804 feat: Update llama.cpp 2024-04-03 00:55:19 -04:00
Yuri Mikhailov 62aad610e1
fix: last tokens passing to sample_repetition_penalties function (#1295)
Co-authored-by: ymikhaylov <ymikhaylov@x5.ru>
Co-authored-by: Andrei <abetlen@gmail.com>
2024-04-01 15:25:43 -04:00
Andrei Betlen 45bf5ae582 chore: Bump version 2024-04-01 10:28:22 -04:00
lawfordp2017 a0f373e310
fix: Changed local API doc references to hosted (#1317) 2024-04-01 10:21:00 -04:00
Limour f165048a69
feat: add support for KV cache quantization options (#1307)
* add KV cache quantization options

https://github.com/abetlen/llama-cpp-python/discussions/1220
https://github.com/abetlen/llama-cpp-python/issues/1305

* Add ggml_type

* Use ggml_type instead of string for quantization

* Add server support

---------

Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2024-04-01 10:19:28 -04:00
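A hedged sketch of the resulting API, following the PR bullets ("Add ggml_type", "Use ggml_type instead of string for quantization"); the exact parameter and constant spellings are assumptions:

```python
import llama_cpp

llm = llama_cpp.Llama(
    model_path="models/model.gguf",   # placeholder path
    type_k=llama_cpp.GGML_TYPE_Q8_0,  # quantize the K cache to q8_0
    type_v=llama_cpp.GGML_TYPE_Q8_0,  # quantize the V cache to q8_0
)
```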
windspirit95 aa9f1ae011
feat: Add logprobs support to chat completions (#1311)
* Add logprobs return in ChatCompletionResponse

* Fix duplicate field

* Set default to false

* Simplify check

* Add server example

---------

Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2024-03-31 13:30:13 -04:00
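A minimal sketch of the OpenAI-style request shape this adds; `top_logprobs` and the `logits_all=True` load-time requirement are assumptions based on the PR bullets and the existing completion API:

```python
import llama_cpp

llm = llama_cpp.Llama(
    model_path="models/model.gguf",  # placeholder path
    logits_all=True,                 # retain per-token logits for logprobs
)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Say hi"}],
    logprobs=True,   # defaults to False, per the PR
    top_logprobs=3,  # per-token alternatives, OpenAI-compatible
)
print(out["choices"][0]["logprobs"])
```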
Andrei Betlen 1e60dba082 feat: Update llama.cpp 2024-03-29 13:34:23 -04:00
Andrei Betlen dcbe57fcf8 feat: Update llama.cpp 2024-03-29 12:45:27 -04:00
Andrei Betlen 125b2358c9 feat: Update llama.cpp 2024-03-28 12:06:46 -04:00
Andrei Betlen 901fe02461 feat: Update llama.cpp 2024-03-26 22:58:53 -04:00