llama.cpp/examples/high_level_api
MillionthOdin16 c283edd7f2 Set n_batch to default values and reduce thread count:
Change the batch size to the llama.cpp default of 8. I've seen issues in llama.cpp where batch size affects the quality of generations (it shouldn't, but in case that's still a problem, this reverts to the default).

Set the auto-determined number of threads to half the system core count. ggml will sometimes peg cores at 100% while doing nothing; this is being addressed upstream, but it can make for a bad user experience when cores are pinned at 100%.
2023-04-05 18:17:29 -04:00
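
A minimal sketch of what these defaults look like when constructing the high-level API object, assuming the llama_cpp.Llama constructor's n_batch and n_threads parameters; the model path and prompt are placeholders:

    import multiprocessing

    from llama_cpp import Llama

    # n_batch=8 matches the llama.cpp default mentioned above; n_threads uses
    # half the detected core count so ggml does not peg every core at 100%.
    llm = Llama(
        model_path="./models/ggml-model.bin",  # placeholder path
        n_batch=8,
        n_threads=max(multiprocessing.cpu_count() // 2, 1),
    )

    # Simple completion call; the high-level API returns an OpenAI-style dict.
    output = llm("Q: Name the planets in the solar system. A:", max_tokens=48)
    print(output["choices"][0]["text"])
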
fastapi_server.py Set n_batch to default values and reduce thread count: 2023-04-05 18:17:29 -04:00
high_level_api_embedding.py Re-organize examples folder 2023-04-05 04:10:13 -04:00
high_level_api_inference.py Re-organize examples folder 2023-04-05 04:10:13 -04:00
high_level_api_streaming.py Re-organize examples folder 2023-04-05 04:10:13 -04:00
langchain_custom_llm.py Re-organize examples folder 2023-04-05 04:10:13 -04:00