llama.cpp

Author	SHA1	Message	Date
Andrei Betlen	4852a6a39c	Fix built in GBNF grammar rules	2023-11-08 00:06:22 -05:00
Andrei Betlen	64f5153c35	Add seed parameter to chat handlers	2023-11-07 23:41:29 -05:00
Andrei Betlen	86aeb9f3a1	Add seed parameter support for completion and chat_completion requests. Closes #884	2023-11-07 23:37:28 -05:00
Damian Stewart	aab74f0b2b	Multimodal Support (Llava 1.5) (#821) * llava v1.5 integration * Point llama.cpp to fork * Add llava shared library target * Fix type * Update llama.cpp * Add llava api * Revert changes to llama and llama_cpp * Update llava example * Add types for new gpt-4-vision-preview api * Fix typo * Update llama.cpp * Update llama_types to match OpenAI v1 API * Update ChatCompletionFunction type * Reorder request parameters * More API type fixes * Even More Type Updates * Add parameter for custom chat_handler to Llama class * Fix circular import * Convert to absolute imports * Fix * Fix pydantic Jsontype bug * Accept list of prompt tokens in create_completion * Add llava1.5 chat handler * Add Multimodal notebook * Clean up examples * Add server docs --------- Co-authored-by: Andrei Betlen <abetlen@gmail.com>	2023-11-07 22:48:51 -05:00
Andrei Betlen	56171cf7bf	Bump version	2023-11-06 09:37:55 -05:00
Andrei Betlen	be0add1b2d	Fix type bug	2023-11-06 09:30:38 -05:00
Andrei Betlen	e214a58422	Refactor Llama class internals	2023-11-06 09:16:36 -05:00
Andrei Betlen	bbffdaebaa	Refactor autotokenizer format to reusable function	2023-11-06 09:07:27 -05:00
Joe	4ff8def4d0	#717: Add support for Huggingface Autotokenizer (#790) Co-authored-by: Andrei <abetlen@gmail.com>	2023-11-05 18:06:36 -05:00
earonesty	3580e2c5df	Update llama_chat_format.py (#869) * Update llama_chat_format.py properly formal llama2 with first-message prompt embedded * Update llama_chat_format.py	2023-11-05 17:00:13 -05:00
Andrei Betlen	f0b30ef7dc	Update llama.cpp	2023-11-05 16:57:10 -05:00
Andrei Betlen	2ec043af76	Clean up stdout / stderr suppression	2023-11-03 13:02:15 -04:00
Andrei Betlen	4ea7027c41	Rename internal only module utils to _utils	2023-11-03 12:55:55 -04:00
Andrei Betlen	df9362eeea	Update llama.cpp	2023-11-03 11:34:50 -04:00
Andrei	3af7b21ff1	Add functionary support (#784) * Add common grammars and json-schema-to-grammar utility function from llama.cpp * Pass functions to format function * Add basic functionary formatting * Add LlamaChatHandler for more complex chat use cases * Add function calling example notebook * Add support for regular chat completions alongside function calling	2023-11-03 02:12:14 -04:00
Andrei	ab028cb878	Migrate inference to llama_batch and llama_decode api (#795) * Add low-level batching notebook * fix: tokenization of special characters: (#850) It should behave like llama.cpp, where most out of the box usages treat special characters accordingly * Update CHANGELOG * Cleanup * Fix runner label * Update notebook * Use llama_decode and batch api * Support logits_all parameter --------- Co-authored-by: Antoine Lizee <antoine.lizee@gmail.com>	2023-11-02 20:13:57 -04:00
Andrei Betlen	8350de9a18	Bump version	2023-11-02 15:53:01 -04:00
Andrei Betlen	011b95d7f3	Fix name 'open' is not defined exception. Closes #860	2023-11-02 15:30:55 -04:00
Andrei Betlen	fa83cc5f9c	Update llama.cpp Fix build examples Exclude examples directory Revert cmake changes Try actions/checkout@v4 Try to update submodules Revert	2023-11-02 14:28:15 -04:00
Antoine Lizee	4d4e0f11e2	fix: tokenization of special characters: (#850) It should behave like llama.cpp, where most out of the box usages treat special characters accordingly	2023-11-02 14:28:14 -04:00
Andrei Betlen	6b3aa7fc8f	Bump version	2023-11-01 19:25:03 -04:00
Sujeendran Menon	7b136bb5b1	Fix for shared library not found and compile issues in Windows (#848) * fix windows library dll name issue * Updated README.md Windows instructions * Update llama_cpp.py to handle different windows dll file versions	2023-11-01 18:55:57 -04:00
cebtenzzre	eefd76fe81	llama: fix exception in Llama.__del__ (#846)	2023-11-01 18:53:57 -04:00
David Ponce	3fc9147218	Iterate over tokens that should be biased rather than the entire vocabulary. (#851)	2023-11-01 18:53:47 -04:00
Marko Tasic	9c8f4dca5f	fixed Llama._create_completion suffix check, it can be either None or str instance (#854)	2023-11-01 18:52:50 -04:00
Daniel Thuerck	5f8f369d1b	Pass-Through grammar parameter in web server. (#855) Closes #778	2023-11-01 18:51:12 -04:00
Adam Katora	25cb710281	Update llama_types.py (#849) Minor typo fix, funcion -> function	2023-11-01 18:50:11 -04:00
Andrei Betlen	d808fd436c	Update llama.cpp	2023-10-31 21:29:35 -04:00
Andrei Betlen	53861c9e53	Update llama.cpp	2023-10-24 03:13:32 -04:00
gmcgoldr	09a8406c83	Fix streaming doesn't return finish reason (#798) When streaming the yield that contains the finish can be skipped. This change ensures that yield isn't skipped.	2023-10-19 02:55:56 -04:00
Andrei Betlen	28c2b884e2	Merge branch 'main' of github.com:abetlen/llama_cpp_python into main	2023-10-19 02:55:31 -04:00
Andrei Betlen	ff580031d2	Update llama.cpp	2023-10-19 02:55:08 -04:00
Xiaoyu Kevin Hu	a315128d66	update value check for n_gpu_layers field (#826)	2023-10-18 18:25:25 -04:00
Pierre Alexandre SCHEMBRI	10304d75fc	Make use of suppress_stdout_stderr when freeing model (#803)	2023-10-15 13:52:43 -04:00
Ma, Guokai	a1ac199980	Fix repeat greeting (#808) * fix repeated greeting * remove seperator between role and message	2023-10-15 13:52:21 -04:00
Eric Liu	b50166500e	Add validation for tensor_split size exceeding LLAMA_MAX_DEVICES (#820) * Add validation for tensor_split size exceeding LLAMA_MAX_DEVICES * reword	2023-10-15 13:51:51 -04:00
Andrei Betlen	d6a130a052	Print traceback on server error	2023-10-10 15:56:04 -04:00
Andrei Betlen	43dfe1e2ab	Update llama.cpp	2023-10-05 16:07:49 -04:00
Andrei Betlen	a7d17b8ac9	Update llama.cpp	2023-10-03 15:23:35 -04:00
Andrei Betlen	305482bd41	Add chatml chat format	2023-09-30 21:01:34 -04:00
Andrei Betlen	5ef5280ef9	Log server exceptions to stdout	2023-09-30 19:13:36 -04:00
Andrei Betlen	fab4bccc35	Bump version	2023-09-30 16:04:46 -04:00
Andrei Betlen	d696251fbe	Fix logits_all bug	2023-09-30 16:02:35 -04:00
Andrei Betlen	6ee413d79e	Bump version	2023-09-30 13:23:09 -04:00
Andrei Betlen	42bb721d64	Fix bug in embedding	2023-09-30 13:20:22 -04:00
Andrei Betlen	5d62d55a82	Bump version	2023-09-30 00:07:06 -04:00
Andrei Betlen	386c88b68e	Bump version	2023-09-29 20:07:31 -04:00
Andrei Betlen	d9bce17794	Update server params	2023-09-29 19:59:12 -04:00
Andrei Betlen	3720c739d4	Update llama.cpp	2023-09-29 19:58:21 -04:00
Andrei	3bca7708fb	Configurable Chat Formats (#711) * Add configurable default chat completion format. * Remove chat_template file to avoid circular import * Update llama_types * Add chat format	2023-09-29 19:52:04 -04:00
Josh XT	a945404b4a	Fix rope scaling defaults (#767) * Fix rope scale with backwards compatibility * Fix defaults * Fix op * Remove backwards compatibility * Check single val	2023-09-29 16:03:57 -04:00
Andrei Betlen	1a1c3dc418	Update llama.cpp	2023-09-28 22:42:03 -04:00
Andrei Betlen	4177ae6d34	Bump version	2023-09-25 14:38:38 -04:00
Viacheslav/Slava Tradunsky	3d5e5b1c04	Adds openai-processing-ms response header (#748)	2023-09-25 13:55:58 -04:00
Andrei Betlen	dbca136fea	Update llama_types and names to match openai api	2023-09-20 15:38:26 -04:00
Andrei Betlen	38e34c97f0	Update llama.cpp	2023-09-18 16:11:27 -04:00
Andrei Betlen	8d75016549	Install required runtime dlls to package directory on windows	2023-09-16 14:57:49 -04:00
Andrei Betlen	acf18fcdf0	Bump version	2023-09-15 14:22:21 -04:00
Andrei Betlen	b047b3034e	Remove confusing helpstring from server cli args. Closes #719	2023-09-15 14:09:43 -04:00
Andrei Betlen	24fec0b242	Bump version	2023-09-14 18:33:08 -04:00
Andrei Betlen	8474665625	Update base_path to fix issue resolving dll in windows isolation container.	2023-09-14 14:51:43 -04:00
Andrei Betlen	507bcc7171	Bump version	2023-09-13 23:15:23 -04:00
Andrei Betlen	0449d29b9f	Fix boolean env vars and cli arguments	2023-09-13 23:09:57 -04:00
earonesty	58a6e42cc0	Update app.py (#705)	2023-09-13 23:01:34 -04:00
Andrei Betlen	f4090a0bb2	Add numa support, low level api users must now explicitly call llama_backend_init at the start of their programs.	2023-09-13 23:00:43 -04:00
Andrei Betlen	c999325e8e	Fix boolean cli flags	2023-09-13 22:56:10 -04:00
Andrei Betlen	4daf77e546	Format	2023-09-13 21:23:23 -04:00
Andrei Betlen	2920c4bf7e	Update server params. Added lora_base, lora_path, low_vram, and main_gpu. Removed rms_norm_eps and n_gqa (deprecated in llama.cpp)	2023-09-13 21:23:13 -04:00
Andrei Betlen	6a20293fc2	Reorder init params to match llama.cpp order	2023-09-13 21:20:26 -04:00
Andrei Betlen	c8f9b8a734	Explicitly make all init params other than model_path into keyword only params	2023-09-13 21:19:47 -04:00
Andrei Betlen	a68f9e2791	Add kwargs to init to catch extra params	2023-09-13 21:19:02 -04:00
Andrei Betlen	9e345a47a2	remove print	2023-09-13 21:12:27 -04:00
Andrei Betlen	517f9ed80b	Convert missed llama.cpp constants into standard python types	2023-09-13 21:11:52 -04:00
Andrei Betlen	c4c440ba2d	Fix tensor_split cli option	2023-09-13 20:00:42 -04:00
Andrei Betlen	203ede4ba2	Bump version	2023-09-13 18:07:08 -04:00
Andrei Betlen	759405c84b	Fix issue with Literal and Optional cli arguments not working. Closes #702	2023-09-13 18:06:12 -04:00
Devrim	da9df78db0	Add X-Request-ID request header for mirroring custom IDs. (#703)	2023-09-13 16:18:31 -04:00
Andrei Betlen	8e13520796	Bump version	2023-09-13 01:47:58 -04:00
Andrei Betlen	2787663a25	Bump version	2023-09-12 21:00:01 -04:00
Andrei Betlen	6e89775759	Bump version	2023-09-12 18:57:01 -04:00
Andrei Betlen	bb4e67e7aa	Using dynamic version	2023-09-12 18:56:36 -04:00
Andrei Betlen	1910793f56	Merge branch 'main' into v0.2-wip	2023-09-12 16:43:32 -04:00
Andrei Betlen	c7901f1141	Bump version	2023-09-12 16:16:40 -04:00
janvdp	33ce931cce	merge upstream	2023-09-09 21:21:04 +02:00
Andrei Betlen	d3f63211ef	Update llama.cpp	2023-09-09 12:12:32 -04:00
janvdp	da0fdafc32	import version in __init__.py	2023-09-05 21:09:28 +02:00
janvdp	6e8e64d09a	add version file	2023-09-05 21:09:08 +02:00
Andrei Betlen	186626d58e	Update llama.cpp	2023-09-01 14:26:13 -04:00
Andrei Betlen	47de3ab104	Update llama.cpp	2023-08-29 07:36:20 -04:00
Andrei Betlen	3f76e1de52	cjk pr minor cleanup	2023-08-29 07:21:59 -04:00
Andrei	bae44ec8bf	Merge pull request #309 from MeouSker77/fix-CJK Fix CJK and emoji stream output	2023-08-29 06:58:10 -04:00
Andrei Betlen	e0dcbc28a1	Update llama.cpp	2023-08-28 10:33:45 -04:00
Andrei Betlen	4887973c22	Update llama.cpp	2023-08-27 12:59:20 -04:00
Andrei Betlen	3a29d65f45	Update llama.cpp	2023-08-26 23:36:24 -04:00
Andrei Betlen	5de8009706	Add copilot-codex completions endpoint for drop-in copilot usage	2023-08-25 17:49:14 -04:00
Andrei Betlen	ac47d55577	Merge branch 'main' into v0.2-wip	2023-08-25 15:45:22 -04:00
Andrei Betlen	ef23d1e545	Update llama.cpp	2023-08-25 14:35:53 -04:00
Andrei Betlen	48cf43b427	Use _with_model variants for tokenization	2023-08-25 13:43:16 -04:00
Andrei Betlen	8ac59465b9	Strip leading space when de-tokenizing.	2023-08-25 04:56:48 -04:00
Andrei Betlen	c2d1deaa8a	Update llama.cpp	2023-08-24 18:01:42 -04:00

518 commits