---
title: API Reference
---

## High Level API

High-level Python bindings for llama.cpp.

::: llama_cpp.Llama
    options:
        members:
            - __init__
            - tokenize
            - detokenize
            - reset
            - eval
            - sample
            - generate
            - create_embedding
            - embed
            - create_completion
            - __call__
            - create_chat_completion
            - set_cache
            - save_state
            - load_state
            - token_bos
            - token_eos
        show_root_heading: true

::: llama_cpp.LlamaGrammar
    options:
        members:
            - from_string
            - from_json_schema

::: llama_cpp.LlamaCache
    options:
        show_root_heading: true

::: llama_cpp.LlamaState
    options:
        show_root_heading: true

::: llama_cpp.LogitsProcessor
    options:
        show_root_heading: true

::: llama_cpp.LogitsProcessorList
    options:
        show_root_heading: true

::: llama_cpp.StoppingCriteria
    options:
        show_root_heading: true

::: llama_cpp.StoppingCriteriaList
    options:
        show_root_heading: true

## Low Level API

Low-level Python bindings for llama.cpp using Python's ctypes library.

::: llama_cpp.llama_cpp
    options:
        show_if_no_docstring: true
        # filter only members starting with `llama_`
        filters:
            - "^llama_"

::: llama_cpp.llama_cpp
    options:
        show_if_no_docstring: true
        show_root_heading: false
        show_root_toc_entry: false
        heading_level: 4
        # filter only members starting with `LLAMA_`
        filters:
            - "^LLAMA_"

## Misc

::: llama_cpp.llama_types
    options:
        show_if_no_docstring: true