Commit graph

  • 946156fb6c feat: Update llama.cpp Andrei Betlen 2024-04-30 15:46:45 -0400
  • 9286b5caac Merge branch 'main' of github.com:abetlen/llama_cpp_python into main Andrei Betlen 2024-04-30 15:45:36 -0400
  • f116175a5a fix: Suppress all logs when verbose=False, use hardcoded fileno's to work in colab notebooks. Closes #796 Closes #729 Andrei Betlen 2024-04-30 15:45:34 -0400
  • 3226b3c5ef fix: UTF-8 handling with grammars (#1415) Jonathan Soma 2024-04-30 14:33:23 -0400
  • 945c62c567 docs: Change all examples from interpreter style to script style. Andrei Betlen 2024-04-30 10:15:04 -0400
  • 26478ab293 docs: Update README.md Andrei Betlen 2024-04-30 10:11:38 -0400
  • b14dd98922 chore: Bump version Andrei Betlen 2024-04-30 09:39:56 -0400
  • 29b6e9a5c8 fix: wrong parameter for flash attention in pickle __getstate__ Andrei Betlen 2024-04-30 09:32:47 -0400
  • 22d77eefd2 feat: Add option to enable flash_attn to Llama params and ModelSettings Andrei Betlen 2024-04-30 09:29:16 -0400
  • 8c2b24d5aa feat: Update llama.cpp Andrei Betlen 2024-04-30 09:27:55 -0400
  • 6332527a69 fix(ci): Fix build-and-release.yaml (#1413) Olivier DEBAUCHE 2024-04-30 15:16:14 +0200
  • c8cd8c17c6 docs: Update README to include CUDA 12.4 wheels Andrei Betlen 2024-04-30 03:12:46 -0400
  • f417cce28a chore: Bump version Andrei Betlen 2024-04-30 03:11:02 -0400
  • 3489ef09d3 fix: Ensure image renders before text in chat formats regardless of message content order. Andrei Betlen 2024-04-30 03:08:46 -0400
  • d03f15bb73 fix(ci): Fix bug in use of upload-artifact failing to merge multiple artifacts into a single release. Andrei Betlen 2024-04-30 02:58:55 -0400
  • 26c7876ba0 chore: Bump version Andrei Betlen 2024-04-30 01:48:40 -0400
  • fe2da09538 feat: Generic Chat Formats, Tool Calling, and Huggingface Pull Support for Multimodal Models (Obsidian, LLaVA1.6, Moondream) (#1147) Andrei 2024-04-30 01:35:38 -0400
  • 97fb860eba feat: Update llama.cpp Andrei Betlen 2024-04-29 23:34:55 -0400
  • df2b5b5d44 chore(deps): bump actions/upload-artifact from 3 to 4 (#1412) dependabot[bot] 2024-04-29 22:53:42 -0400
  • be43018e09 chore(deps): bump actions/configure-pages from 4 to 5 (#1411) dependabot[bot] 2024-04-29 22:53:21 -0400
  • 32c000f3ec chore(deps): bump softprops/action-gh-release from 1 to 2 (#1408) dependabot[bot] 2024-04-29 22:52:58 -0400
  • 03c654a3d9 ci(fix): Workflow actions updates and fix arm64 wheels not included in release (#1392) Olivier DEBAUCHE 2024-04-30 04:52:23 +0200
  • 0c3bc4b928 fix(ci): Update generate wheel index script to include cu12.3 and cu12.4 Closes #1406 Andrei Betlen 2024-04-29 12:37:22 -0400
  • 2355ce2227 ci: Add support for pre-built cuda 12.4.1 wheels (#1388) Olivier DEBAUCHE 2024-04-28 05:44:47 +0200
  • a411612b38 feat: Add support for str type kv_overrides Andrei Betlen 2024-04-27 23:42:19 -0400
  • c9b85bf098 feat: Update llama.cpp Andrei Betlen 2024-04-27 23:41:54 -0400
  • c07db99e5b chore(deps): bump pypa/cibuildwheel from 2.16.5 to 2.17.0 (#1401) dependabot[bot] 2024-04-27 20:51:13 -0400
  • 7074c4d256 chore(deps): bump docker/build-push-action from 4 to 5 (#1400) dependabot[bot] 2024-04-27 20:51:02 -0400
  • 79318ba1d1 chore(deps): bump docker/login-action from 2 to 3 (#1399) dependabot[bot] 2024-04-27 20:50:50 -0400
  • 27038db3d6 chore(deps): bump actions/cache from 3.3.2 to 4.0.2 (#1398) dependabot[bot] 2024-04-27 20:50:39 -0400
  • 17bdfc818f chore(deps): bump conda-incubator/setup-miniconda from 2.2.0 to 3.0.4 (#1397) dependabot[bot] 2024-04-27 20:50:28 -0400
  • f178636e1b fix: Functionary bug fixes (#1385) Jeffrey Fong 2024-04-28 08:49:52 +0800
  • e6bbfb863c examples: fix quantize example (#1387) iyubondyrev 2024-04-28 02:48:47 +0200
  • c58b56123d ci: Update action versions in build-wheels-metal.yaml (#1390) Olivier DEBAUCHE 2024-04-28 02:47:49 +0200
  • 9e7f738220 ci: Update dependabot.yml (#1391) Olivier DEBAUCHE 2024-04-28 02:47:07 +0200
  • 65edc90671 chore: Bump version Andrei Betlen 2024-04-26 10:11:31 -0400
  • 173ebc7878 fix: Remove duplicate pooling_type definition and add missing n_vocab definition in bindings Andrei Betlen 2024-04-25 21:36:09 -0400
  • f6ed21f9a2 feat: Allow for possibly non-pooled embeddings (#1380) Douglas Hanley 2024-04-25 20:32:44 -0500
  • fcfea66857 fix: pydantic deprecation warning Andrei Betlen 2024-04-25 21:21:48 -0400
  • 7f52335c50 feat: Update llama.cpp Andrei Betlen 2024-04-25 21:21:29 -0400
  • 266abfc1a3 fix(ci): Fix metal tests as well Andrei Betlen 2024-04-25 03:09:46 -0400
  • de37420fcf fix(ci): Fix python macos test runners issue Andrei Betlen 2024-04-25 03:08:32 -0400
  • 2a9979fce1 feat: Update llama.cpp Andrei Betlen 2024-04-25 02:48:26 -0400
  • ce85be97e2 Merge https://github.com/abetlen/llama-cpp-python baalajimaestro 2024-04-25 10:48:33 +0530
  • c50d3300d2 chore: Bump version Andrei Betlen 2024-04-23 02:53:20 -0400
  • 611781f531 ci: Build arm64 wheels. Closes #1342 Andrei Betlen 2024-04-23 02:48:09 -0400
  • 53ebcc8bb5 feat(server): Provide ability to dynamically allocate all threads if desired using -1 (#1364) Sean Bailey 2024-04-23 02:35:38 -0400
  • 507c1da066 fix: Update scikit-build-core build dependency avoid bug in 0.9.1 (#1370) Geza Velkey 2024-04-23 08:34:15 +0200
  • 8559e8ce88 feat: Add Llama-3 chat format (#1371) abk16 2024-04-23 06:33:29 +0000
  • 617d536e1c feat: Update llama.cpp Andrei Betlen 2024-04-23 02:31:40 -0400
  • d40a250ef3 feat: Use new llama_token_is_eog in create_completions Andrei Betlen 2024-04-22 00:35:47 -0400
  • b21ba0e2ac Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main Andrei Betlen 2024-04-21 20:46:42 -0400
  • 159cc4e5d9 feat: Update llama.cpp Andrei Betlen 2024-04-21 20:46:40 -0400
  • 0281214863 chore: Bump version Andrei Betlen 2024-04-20 00:09:37 -0400
  • cc81afebf0 feat: Add stopping_criteria to ChatFormatter, allow stopping on arbitrary token ids, fixes llama3 instruct Andrei Betlen 2024-04-20 00:00:53 -0400
  • d17c1887a3 feat: Update llama.cpp Andrei Betlen 2024-04-19 23:58:16 -0400
  • 893a27a736 chore: Bump version Andrei Betlen 2024-04-18 01:43:39 -0400
  • a128c80500 feat: Update llama.cpp Andrei Betlen 2024-04-18 01:39:45 -0400
  • 4f42664955 feat: update grammar schema converter to match llama.cpp (#1353) Lucca Zenóbio 2024-04-18 02:36:25 -0300
  • fa4bb0cf81 Revert "feat: Update json to grammar (#1350)" Andrei Betlen 2024-04-17 16:18:16 -0400
  • 610a592f70 feat: Update json to grammar (#1350) Lucca Zenóbio 2024-04-17 11:10:21 -0300
  • b73c73c0c6 feat: add disable_ping_events flag (#1257) khimaros 2024-04-17 14:08:19 +0000
  • 4924455dec feat: Make saved state more compact on-disk (#1296) tc-wolf 2024-04-17 09:06:50 -0500
  • 9842cbf99d feat: Update llama.cpp Andrei Betlen 2024-04-17 10:06:15 -0400
  • c96b2daebf feat: Use all available CPUs for batch processing (#1345) ddh0 2024-04-17 09:04:33 -0500
  • a420f9608b feat: Update llama.cpp Andrei Betlen 2024-04-14 19:14:09 -0400
  • 90dceaba8a feat: Update llama.cpp Andrei Betlen 2024-04-14 11:35:57 -0400
  • 2e9ffd28fd feat: Update llama.cpp Andrei Betlen 2024-04-12 21:09:12 -0400
  • ef29235d45 chore: Bump version Andrei Betlen 2024-04-10 03:44:46 -0400
  • bb65b4d764 fix: pass correct type to chat handlers for chat completion logprobs Andrei Betlen 2024-04-10 03:41:55 -0400
  • 060bfa64d5 feat: Add support for yaml based configs Andrei Betlen 2024-04-10 02:47:01 -0400
  • 1347e1d050 feat: Add typechecking for ctypes structure attributes Andrei Betlen 2024-04-10 02:40:41 -0400
  • 889d0e8981 feat: Update llama.cpp Andrei Betlen 2024-04-10 02:25:58 -0400
  • 56071c956a feat: Update llama.cpp Andrei Betlen 2024-04-09 09:53:49 -0400
  • 0078e0f1cf Merge https://github.com/abetlen/llama-cpp-python baalajimaestro 2024-04-06 16:34:43 +0530
  • 08b16afe11 chore: Bump version Andrei Betlen 2024-04-06 01:53:38 -0400
  • 7ca364c8bd feat: Update llama.cpp Andrei Betlen 2024-04-06 01:37:43 -0400
  • b3bfea6dbf fix: Always embed metal library. Closes #1332 Andrei Betlen 2024-04-06 01:36:53 -0400
  • f4092e6b46 feat: Update llama.cpp Andrei Betlen 2024-04-05 10:59:31 -0400
  • 2760ef6156 Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main Andrei Betlen 2024-04-05 10:51:54 -0400
  • 1ae3abbcc3 fix: missing logprobs in response, incorrect response type for functionary, minor type issues. Closes #1328 Closes #1314 Andrei Betlen 2024-04-05 10:50:49 -0400
  • 49bc66bfa2 fix: missing logprobs in response, incorrect response type for functionary, minor type issues. Closes #1328 #1314 Andrei Betlen 2024-04-05 10:50:49 -0400
  • 9111b6e03a feat: Update llama.cpp Andrei Betlen 2024-04-05 09:21:02 -0400
  • 7265a5dc0e fix(docs): incorrect tool_choice example (#1330) Sigbjørn Skjæret 2024-04-05 15:14:03 +0200
  • 8b9cd38c0d Merge https://github.com/abetlen/llama-cpp-python baalajimaestro 2024-04-05 10:38:53 +0530
  • 909ef66951 docs: Rename cuBLAS section to CUDA Andrei Betlen 2024-04-04 03:08:47 -0400
  • 1db3b58fdc docs: Add docs explaining how to install pre-built wheels. Andrei Betlen 2024-04-04 02:57:06 -0400
  • c50309e52a docs: LLAMA_CUBLAS -> LLAMA_CUDA Andrei Betlen 2024-04-04 02:49:19 -0400
  • 612e78d322 fix(ci): use correct script name Andrei Betlen 2024-04-03 16:15:29 -0400
  • 34081ddc5b chore: Bump version Andrei Betlen 2024-04-03 15:38:27 -0400
  • 368061c04a Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main Andrei Betlen 2024-04-03 15:35:30 -0400
  • 5a5193636b feat: Update llama.cpp Andrei Betlen 2024-04-03 15:35:28 -0400
  • 5a930ee9a1 feat: Binary wheels for CPU, CUDA (12.1 - 12.3), Metal (#1247) Andrei 2024-04-03 15:32:13 -0400
  • 8649d7671b fix: segfault when logits_all=False. Closes #1319 Andrei Betlen 2024-04-03 15:30:31 -0400
  • f96de6d920 Merge branch 'main' of https://github.com/abetlen/llama-cpp-python into main Andrei Betlen 2024-04-03 00:55:21 -0400
  • e465157804 feat: Update llama.cpp Andrei Betlen 2024-04-03 00:55:19 -0400
  • 62aad610e1 fix: last tokens passing to sample_repetition_penalties function (#1295) Yuri Mikhailov 2024-04-02 04:25:43 +0900
  • 45bf5ae582 chore: Bump version Andrei Betlen 2024-04-01 10:28:22 -0400
  • a0f373e310 fix: Changed local API doc references to hosted (#1317) lawfordp2017 2024-04-01 08:21:00 -0600
  • f165048a69 feat: add support for KV cache quantization options (#1307) Limour 2024-04-01 22:19:28 +0800
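Several commits above introduce server-facing options: YAML-based configs (060bfa64d5), the Llama-3 chat format (8559e8ce88), and the flash_attn setting (22d77eefd2). As a rough sketch only of how these might combine in one config file: the field names (`host`, `port`, `models`, `model`, `model_alias`, `chat_format`, `n_gpu_layers`, `flash_attn`) are assumptions based on the server's model settings, and the model paths are placeholders, not files shipped with the project.

```yaml
# Hypothetical llama-cpp-python server config (field names assumed;
# check the server's ModelSettings for the authoritative schema).
host: 0.0.0.0
port: 8000
models:
  - model: ./models/llama-3-8b-instruct.Q4_K_M.gguf  # placeholder path
    model_alias: llama-3
    chat_format: llama-3      # chat format added in 8559e8ce88
    n_gpu_layers: -1          # offload all layers if a GPU is available
    flash_attn: true          # option added in 22d77eefd2
  - model: ./models/moondream2.Q4_K_M.gguf           # placeholder path
    model_alias: moondream
```

Under this assumed layout, the server would expose both models by alias through its OpenAI-compatible endpoints, selected via the request's `model` field.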