Commit graph

49 commits

Author SHA1 Message Date
Lucas Doyle 0fcc25cdac examples fastapi_server: deprecate
This commit "deprecates" the example fastapi server by remaining runnable but pointing folks at the module if they want to learn more.

Rationale:

Currently there exist two server implementations in this repo:

- `llama_cpp/server/__main__.py`, the module that's runnable by consumers of the library with `python3 -m llama_cpp.server`
- `examples/high_level_api/fastapi_server.py`, which is probably a copy-pasted example by folks hacking around

IMO this is confusing. As a new user of the library I see that both have been updated relatively recently, but comparing them side by side shows they have diverged.

The one in the module seems better:
- supports logits_all
- supports use_mmap
- has experimental cache support (with some mutex thing going on)
- some of the streaming support code was moved around more recently than in fastapi_server.py (see the sketch after this entry)
2023-05-01 22:34:23 -07:00
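For reference, the module server this commit points people toward is launched with `python3 -m llama_cpp.server`. Below is a minimal sketch of querying it, assuming it has already been started and configured with a model, and assuming it listens on localhost:8000 with an OpenAI-style `/v1/completions` endpoint; the port, endpoint path, and payload fields are assumptions about the server's defaults at the time, not something this commit specifies.

```python
# Minimal sketch: query a locally running `python3 -m llama_cpp.server` instance.
# Assumes the server is already running with a model configured, and that it
# exposes an OpenAI-style /v1/completions endpoint on localhost:8000
# (assumed defaults, not guaranteed by this commit).
import requests

resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "prompt": "Q: Name the planets in the solar system. A:",
        "max_tokens": 64,
        "temperature": 0.8,
    },
    timeout=60,
)
resp.raise_for_status()
# Print the raw JSON response rather than assuming its exact shape.
print(resp.json())
```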
Mug c39547a986 Detect multi-byte responses and wait 2023-04-28 12:50:30 +02:00
Mug 5f81400fcb Also ignore errors on input prompts 2023-04-26 14:45:51 +02:00
Mug 3c130f00ca Remove try catch from chat 2023-04-26 14:38:53 +02:00
Mug c4a8491d42 Fix decode errors permanently 2023-04-26 14:37:06 +02:00
Mug 53d17ad003 Fix wrong type for the end-of-text token, and fix n_predict behaviour 2023-04-17 14:45:28 +02:00
Mug 3bb45f1658 More reasonable defaults 2023-04-10 16:38:45 +02:00
Mug 0cccb41a8f Added iterative search to prevent instructions from being echoed, added ignore-eos and no-mmap options, and fixed a bug where one character too many was echoed 2023-04-10 16:35:38 +02:00
Andrei Betlen 196650ccb2 Update model paths to make it clearer they should point to a file 2023-04-09 22:45:55 -04:00
Andrei Betlen 6d1bda443e Add clients example. Closes #46 2023-04-08 09:35:32 -04:00
Andrei 41365b0456
Merge pull request #15 from SagsMug/main
llama.cpp chat example implementation
2023-04-07 20:43:33 -04:00
Mug 16fc5b5d23 More interoperability with the original llama.cpp; arguments now work 2023-04-07 13:32:19 +02:00
Mug 10c7571117 Fixed too many newlines, now onto args.
Still needs shipping work so you could do "python -m llama_cpp.examples." etc.
2023-04-06 15:33:22 +02:00
Mug 085cc92b1f Better llama.cpp interoperability
Still has some too-many-newlines issues, so WIP
2023-04-06 15:30:57 +02:00
MillionthOdin16 c283edd7f2 Set n_batch to the default value and reduce thread count:
Change the batch size to the llama.cpp default of 8. I've seen issues in llama.cpp where batch size affects the quality of generations (it shouldn't), but in case that's still an issue, I changed it to the default.

Set the auto-determined number of threads to half the system count. ggml will sometimes lock cores at 100% while doing nothing; this is being addressed, but it can cause a bad experience for the user if cores are pegged at 100%. A sketch of these settings follows this entry.
2023-04-05 18:17:29 -04:00
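For context, the two knobs discussed in that commit map onto constructor parameters of the high-level `Llama` class. A minimal sketch follows, assuming the constructor accepts `n_batch` and `n_threads` as it did around this time, and that calling the model object performs a text completion; the model path is a placeholder.

```python
# Minimal sketch of the batch-size and thread-count settings discussed above.
# n_batch / n_threads are assumed constructor parameters; the model path is
# a placeholder, not a file shipped with the repository.
import multiprocessing

from llama_cpp import Llama

llm = Llama(
    model_path="./models/ggml-model.bin",                 # placeholder path
    n_batch=8,                                            # llama.cpp default batch size
    n_threads=max(multiprocessing.cpu_count() // 2, 1),   # half the system cores
)

# Calling the model runs a completion; print the raw result dict.
print(llm("Q: What is the capital of France? A:", max_tokens=16))
```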
Andrei Betlen e1b5b9bb04 Update fastapi server example 2023-04-05 14:44:26 -04:00
Mug 283e59c5e9 Fix bug where init_break was not being set when exiting via antiprompt, among other fixes. 2023-04-05 14:47:24 +02:00
Mug 99ceecfccd Move to new examples directory 2023-04-05 14:28:02 +02:00
Mug e4c6f34d95 Merge branch 'main' of https://github.com/abetlen/llama-cpp-python 2023-04-05 14:18:27 +02:00
Andrei Betlen b1babcf56c Add quantize example 2023-04-05 04:17:26 -04:00
Andrei Betlen c8e13a78d0 Re-organize examples folder 2023-04-05 04:10:13 -04:00
Andrei Betlen c16bda5fb9 Add performance tuning notebook 2023-04-05 04:09:19 -04:00
Mug c862e8bac5 Fix repeating instructions and an antiprompt bug 2023-04-04 17:54:47 +02:00
Mug 9cde7973cc Fix stripping instruction prompt 2023-04-04 16:20:27 +02:00
Mug da5a6a7089 Added instruction mode, fixed infinite generation, and various other fixes 2023-04-04 16:18:26 +02:00
Mug 0b32bb3d43 Add instruction mode 2023-04-04 11:48:48 +02:00
Andrei Betlen ffe34cf64d Allow user to set llama config from env vars 2023-04-04 00:52:44 -04:00
Andrei Betlen 05eb2087d8 Small fixes for examples 2023-04-03 20:33:07 -04:00
Andrei Betlen 7fedf16531 Add support for chat completion 2023-04-03 20:12:44 -04:00
Andrei Betlen f7ab8d55b2 Update context size defaults. Closes #11 2023-04-03 20:11:13 -04:00
Mug f1615f05e6 Chat llama.cpp example implementation 2023-04-03 22:54:46 +02:00
Andrei Betlen caff127836 Remove commented out code 2023-04-01 15:13:01 -04:00
Andrei Betlen f28bf3f13d Bugfix: enable embeddings for fastapi server 2023-04-01 15:12:25 -04:00
Andrei Betlen ed6f2a049e Add streaming and embedding endpoints to fastapi example 2023-04-01 13:05:20 -04:00
Andrei Betlen 9fac0334b2 Update embedding example to new api 2023-04-01 13:02:51 -04:00
Andrei Betlen 5e011145c5 Update low level api example 2023-04-01 13:02:10 -04:00
Andrei Betlen 5f2e822b59 Rename inference example 2023-04-01 13:01:45 -04:00
Andrei Betlen 70b8a1ef75 Add support to get embeddings from high-level api. Closes #4 2023-03-28 04:59:54 -04:00
Andrei Betlen 3dbb3fd3f6 Add support for stream parameter. Closes #1 2023-03-28 04:03:57 -04:00
Andrei Betlen dfe8608096 Update examples 2023-03-24 19:10:31 -04:00
Andrei Betlen a61fd3b509 Add example based on stripped down version of main.cpp from llama.cpp 2023-03-24 18:57:25 -04:00
Andrei Betlen 2cc499512c Black formatting 2023-03-24 14:35:41 -04:00
Andrei Betlen d29b05bb67 Update example to match alpaca training prompt 2023-03-24 14:34:15 -04:00
Andrei Betlen 15e3dc7897 Add fastapi example 2023-03-24 01:41:24 -04:00
Andrei Betlen 9af16b63fd Added low-level api inference example 2023-03-23 23:45:59 -04:00
Andrei Betlen 8680332203 Update examples 2023-03-23 23:12:42 -04:00
Andrei Betlen 90c78723de Add basic langchain demo 2023-03-23 16:25:24 -04:00
Andrei Betlen 3d6eb32c76 Update basic example 2023-03-23 14:57:31 -04:00
Andrei Betlen 79b304c9d4 Initial commit 2023-03-23 05:33:06 -04:00