Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[Unreleased]

Added

  • k-quants support

[v0.1.58]

Added

  • Metal support for Apple Silicon

[v0.1.57]

Added

  • OpenLLaMA 3B support

[v0.1.56]

Added

  • First version of the changelog
  • Server: Use async routes
  • Use numpy for internal buffers to reduce memory usage and improve performance.

Fixed

  • Performance bug in the stop sequence check that was slowing down streaming.