Merge pull request #456 from AgentJ-WR/patch-1

Show how to adjust context window in README.md
Andrei 2023-07-08 02:32:20 -04:00 committed by GitHub
commit 236c4cf442


@@ -105,6 +105,15 @@ Below is a short example demonstrating how to use the high-level API to generate
}
```
### Adjusting the Context Window
The context window of the Llama models determines the maximum number of tokens the model can process at once. By default it is set to 512 tokens, but it can be adjusted based on your requirements.
For instance, if you want to work with larger contexts, you can expand the context window by setting the `n_ctx` parameter when initializing the `Llama` object:
```python
from llama_cpp import Llama

llm = Llama(model_path="./models/7B/ggml-model.bin", n_ctx=2048)
```
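Prompts longer than the context window will be truncated or rejected, so it can be useful to check a prompt's token count up front. Below is a minimal sketch, assuming the `llm` object and `n_ctx=2048` value from above; `Llama.tokenize` expects bytes and returns a list of token ids:
```python
prompt = "Q: Name the planets in the solar system. A: "

# Count how many of the available context tokens the prompt consumes.
n_tokens = len(llm.tokenize(prompt.encode("utf-8")))
print(f"Prompt uses {n_tokens} of 2048 context tokens")
```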
## Web Server
`llama-cpp-python` offers a web server which aims to act as a drop-in replacement for the OpenAI API.
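As an illustration of the drop-in idea, here is a minimal sketch using the pre-1.0 `openai` client interface. It assumes the server is already running locally on the default port 8000 (e.g. via `python3 -m llama_cpp.server --model ./models/7B/ggml-model.bin`) and that the `openai` package is installed; the API key and model name below are placeholders, since the local server does not validate them:
```python
import openai

# Point the OpenAI client at the local llama-cpp-python server.
openai.api_base = "http://localhost:8000/v1"
openai.api_key = "sk-no-key-required"  # placeholder; the local server does not check it

response = openai.ChatCompletion.create(
    model="local-model",  # placeholder model name
    messages=[{"role": "user", "content": "Name the planets in the solar system."}],
)
print(response.choices[0].message.content)
```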