Commit graph

74 commits

Author SHA1 Message Date
zocainViken 6dde6bd09c bug fixing (#925) 2023-11-20 12:31:52 -05:00
Andrei Betlen 74167bdfb2 Update Functions notebook 2023-11-10 13:02:30 -05:00
Andrei Betlen 85ead98a3e Update Functions notebook example 2023-11-10 12:49:14 -05:00
Andrei Betlen 1b376c62b7 Update functionary for new OpenAI API 2023-11-10 02:51:58 -05:00
Andrei Betlen 598780fde8 Update Multimodal notebook 2023-11-08 00:48:25 -05:00
Damian Stewart aab74f0b2b
Multimodal Support (Llava 1.5) (#821)
* llava v1.5 integration

* Point llama.cpp to fork

* Add llava shared library target

* Fix type

* Update llama.cpp

* Add llava api

* Revert changes to llama and llama_cpp

* Update llava example

* Add types for new gpt-4-vision-preview api

* Fix typo

* Update llama.cpp

* Update llama_types to match OpenAI v1 API

* Update ChatCompletionFunction type

* Reorder request parameters

* More API type fixes

* Even More Type Updates

* Add parameter for custom chat_handler to Llama class

* Fix circular import

* Convert to absolute imports

* Fix

* Fix pydantic Jsontype bug

* Accept list of prompt tokens in create_completion

* Add llava1.5 chat handler

* Add Multimodal notebook

* Clean up examples

* Add server docs

---------

Co-authored-by: Andrei Betlen <abetlen@gmail.com>
2023-11-07 22:48:51 -05:00
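The multimodal PR above adds types for the new gpt-4-vision-preview-style API; a hedged sketch of the message shape a llava-1.5 chat handler consumes (field names follow the OpenAI v1 chat API; the prompt text and image URL are placeholders):

```python
# Sketch of an OpenAI-v1-style multimodal chat message, as accepted by a
# llava-1.5 chat handler. The image URL here is a placeholder, not a real asset.
message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What is shown in this image?"},
        {
            "type": "image_url",
            "image_url": {"url": "https://example.com/cat.png"},
        },
    ],
}

# The content is a list of typed parts; text and image parts can be mixed.
part_types = [part["type"] for part in message["content"]]
print(part_types)  # -> ['text', 'image_url']
```

The list-of-parts `content` (rather than a plain string) is what distinguishes the vision-preview message format from the older chat API.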
Andrei 3af7b21ff1
Add functionary support (#784)
* Add common grammars and json-schema-to-grammar utility function from llama.cpp

* Pass functions to format function

* Add basic functionary formatting

* Add LlamaChatHandler for more complex chat use cases

* Add function calling example notebook

* Add support for regular chat completions alongside function calling
2023-11-03 02:12:14 -04:00
Andrei ab028cb878
Migrate inference to llama_batch and llama_decode api (#795)
* Add low-level batching notebook

* fix: tokenization of special characters: (#850)

It should behave like llama.cpp, where most out-of-the-box usages treat special characters accordingly.

* Update CHANGELOG

* Cleanup

* Fix runner label

* Update notebook

* Use llama_decode and batch api

* Support logits_all parameter

---------

Co-authored-by: Antoine Lizee <antoine.lizee@gmail.com>
2023-11-02 20:13:57 -04:00
Andrei Betlen f4090a0bb2 Add NUMA support; low-level API users must now explicitly call llama_backend_init at the start of their programs. 2023-09-13 23:00:43 -04:00
Juarez Bochi 20ac434d0f Fix low level api examples 2023-09-07 17:50:47 -04:00
Andrei 2adf6f3f9a
Merge pull request #265 from dmahurin/fix-from-bytes-byteorder
fix "from_bytes() missing required argument 'byteorder'"
2023-05-26 12:53:06 -04:00
Andrei 34ad71f448
Merge pull request #274 from dmahurin/fix-missing-antiprompt
low_level_api_chat_cpp.py: Fix missing antiprompt output in chat.
2023-05-26 12:52:34 -04:00
Don Mahurin 0fa2ec4903 low_level_api_chat_cpp.py: Fix missing antiprompt output in chat. 2023-05-26 06:54:28 -07:00
Don Mahurin d6a7adb17a fix "missing 1 required positional argument: 'min_keep'" 2023-05-23 06:42:22 -07:00
Don Mahurin 327eedbfe1 fix "from_bytes() missing required argument 'byteorder'" 2023-05-23 00:20:34 -07:00
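The error in the commit above comes from `int.from_bytes`, which requires an explicit `byteorder` argument on the Python versions in use at the time (it only gained a `"big"` default in Python 3.11):

```python
# int.from_bytes raises "from_bytes() missing required argument: 'byteorder'"
# when byteorder is omitted on Python < 3.11; pass it explicitly.
data = b"\x01\x00"

little = int.from_bytes(data, byteorder="little")  # low byte first
big = int.from_bytes(data, byteorder="big")        # high byte first

print(little, big)  # -> 1 256
```

Passing `byteorder` explicitly also keeps the code portable across Python versions.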
Andrei Betlen c7788c85ab Add Guidance example 2023-05-19 03:16:58 -04:00
Andrei 7499fc1cbb
Merge pull request #126 from Stonelinks/deprecate-example-server
Deprecate example server
2023-05-08 19:29:04 -04:00
Mug eaf9f19aa9 Fix lora 2023-05-08 15:27:42 +02:00
Mug 2c0d9b182c Fix session loading and saving in low level example chat 2023-05-08 15:27:03 +02:00
Mug fd80ddf703 Fix a bug with wrong type 2023-05-06 22:22:28 +02:00
Mug 996f63e9e1 Add utf8 to chat example 2023-05-06 15:16:58 +02:00
Mug 3ceb47b597 Fix mirostat requiring c_float 2023-05-06 13:35:50 +02:00
Mug 9797394c81 Fix wrong parsed type for logit_bias 2023-05-06 13:27:52 +02:00
Mug 1895c11033 Rename postfix to suffix to match upstream 2023-05-06 13:18:25 +02:00
Mug 0e9f227afd Update low level examples 2023-05-04 18:33:08 +02:00
Lucas Doyle 0fcc25cdac examples fastapi_server: deprecate
This commit "deprecates" the example fastapi server: it remains runnable but points folks at the module if they want to learn more.

Rationale:

Currently there exist two server implementations in this repo:

- `llama_cpp/server/__main__.py`, the module that's runnable by consumers of the library with `python3 -m llama_cpp.server`
- `examples/high_level_api/fastapi_server.py`, which is probably a copy-pasted example by folks hacking around

IMO this is confusing. As a new user of the library, I see they've both been updated relatively recently, but looking at them side by side there's a diff.

The one in the module seems better:
- supports logits_all
- supports use_mmap
- has experimental cache support (with some mutex thing going on)
- some stuff with streaming support was moved around more recently than fastapi_server.py
2023-05-01 22:34:23 -07:00
Mug c39547a986 Detect multi-byte responses and wait 2023-04-28 12:50:30 +02:00
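"Detect multi-byte responses and wait" describes buffering an incomplete UTF-8 sequence until the remaining bytes arrive; the standard library's incremental decoder gives this behaviour directly (a minimal sketch, not the repository's code; the byte stream is illustrative):

```python
import codecs

# An incremental decoder buffers a trailing partial multi-byte sequence
# instead of raising, which is exactly the "detect and wait" behaviour.
decoder = codecs.getincrementaldecoder("utf-8")()

# "é" is 0xC3 0xA9 in UTF-8; feed it split across two chunks.
first = decoder.decode(b"caf\xc3")   # incomplete: holds back the 0xC3 byte
second = decoder.decode(b"\xa9")     # completes the character

print(repr(first), repr(second))  # -> 'caf' 'é'
```

Token-by-token generation can split a character across tokens in the same way, which is why the chat examples needed this.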
Mug 5f81400fcb Also ignore errors on input prompts 2023-04-26 14:45:51 +02:00
Mug 3c130f00ca Remove try catch from chat 2023-04-26 14:38:53 +02:00
Mug c4a8491d42 Fix decode errors permanently 2023-04-26 14:37:06 +02:00
Mug 53d17ad003 Fixed end of text wrong type, and fix n_predict behaviour 2023-04-17 14:45:28 +02:00
Mug 3bb45f1658 More reasonable defaults 2023-04-10 16:38:45 +02:00
Mug 0cccb41a8f Added iterative search to prevent instructions from being echoed, added ignore-eos and no-mmap options, and fixed a bug where one character too many was echoed 2023-04-10 16:35:38 +02:00
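The "iterative search" that keeps instructions from being echoed amounts to checking whether the tail of the generated text is a prefix of the antiprompt, and withholding that tail until it either completes the antiprompt or diverges. A hedged stdlib-only sketch (the antiprompt string is illustrative):

```python
def held_back(output: str, antiprompt: str) -> str:
    """Return the longest tail of `output` that is a prefix of `antiprompt`.

    That tail must be withheld from display: it may still grow into the
    full antiprompt, which should never be echoed to the user.
    """
    for n in range(min(len(output), len(antiprompt)), 0, -1):
        if antiprompt.startswith(output[-n:]):
            return output[-n:]
    return ""

# "### Instr" could still become "### Instruction:", so hold it back.
print(repr(held_back("Sure thing!\n### Instr", "### Instruction:")))  # -> '### Instr'
print(repr(held_back("Sure thing!", "### Instruction:")))             # -> ''
```

Scanning tail lengths from longest to shortest guarantees the maximal ambiguous suffix is withheld.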
Andrei Betlen 196650ccb2 Update model paths to be more clear they should point to file 2023-04-09 22:45:55 -04:00
Andrei Betlen 6d1bda443e Add clients example. Closes #46 2023-04-08 09:35:32 -04:00
Andrei 41365b0456
Merge pull request #15 from SagsMug/main
llama.cpp chat example implementation
2023-04-07 20:43:33 -04:00
Mug 16fc5b5d23 More interoperability to the original llama.cpp, and arguments now work 2023-04-07 13:32:19 +02:00
Mug 10c7571117 Fixed too many newlines, now onto args.
Still needs shipping work so you could do "python -m llama_cpp.examples." etc.
2023-04-06 15:33:22 +02:00
Mug 085cc92b1f Better llama.cpp interoperability
Has some too many newline issues so WIP
2023-04-06 15:30:57 +02:00
MillionthOdin16 c283edd7f2 Set n_batch to default values and reduce thread count:
Change batch size to the llama.cpp default of 8. I've seen issues in llama.cpp where batch size affects the quality of generations (it shouldn't, but in case that's still an issue, I changed it to the default).

Set the auto-determined number of threads to half the system count. ggml will sometimes lock cores at 100% while doing nothing. This is being addressed, but it can make for a bad user experience if cores are pegged at 100%.
2023-04-05 18:17:29 -04:00
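The half-the-cores default described above can be sketched with the standard library (the exact heuristic in the repository may differ; this is an illustration of the rationale, not its code):

```python
import os

def default_threads() -> int:
    """Default thread count: half the reported CPU count, but at least one.

    os.cpu_count() can return None on exotic platforms, hence the fallback.
    """
    return max(1, (os.cpu_count() or 1) // 2)

print(default_threads())
```

Halving the count leaves headroom for the OS and other processes, which matters when ggml spins cores at 100% even while idle.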
Andrei Betlen e1b5b9bb04 Update fastapi server example 2023-04-05 14:44:26 -04:00
Mug 283e59c5e9 Fix bug where init_break was not being set when exiting via antiprompt, among other fixes. 2023-04-05 14:47:24 +02:00
Mug 99ceecfccd Move to new examples directory 2023-04-05 14:28:02 +02:00
Mug e4c6f34d95 Merge branch 'main' of https://github.com/abetlen/llama-cpp-python 2023-04-05 14:18:27 +02:00
Andrei Betlen b1babcf56c Add quantize example 2023-04-05 04:17:26 -04:00
Andrei Betlen c8e13a78d0 Re-organize examples folder 2023-04-05 04:10:13 -04:00
Andrei Betlen c16bda5fb9 Add performance tuning notebook 2023-04-05 04:09:19 -04:00
Mug c862e8bac5 Fix repeating instructions and an antiprompt bug 2023-04-04 17:54:47 +02:00
Mug 9cde7973cc Fix stripping instruction prompt 2023-04-04 16:20:27 +02:00
Mug da5a6a7089 Added instruction mode, fixed infinite generation, and various other fixes 2023-04-04 16:18:26 +02:00