# Configuration

## AgentConfig

The AgentConfig struct is created internally when you call Agent::make(). Defaults are sensible for most use cases:

```rust
pub struct AgentConfig {
    pub system_prompt: String,               // default: ""
    pub template: ChatTemplate,              // default: Chatml
    pub max_iterations: usize,               // default: 10
    pub eviction_strategy: EvictionStrategy, // default: 8K tokens
}
```

You control these via builder methods, not by constructing AgentConfig directly.
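
For example, a few of these defaults can be tweaked at once through the builder. `with_eviction_strategy` is documented below; `with_system_prompt` and `with_max_iterations` are hypothetical names used here purely for illustration, so check the Agent API reference for the exact methods:

```rust
// Sketch only: with_system_prompt and with_max_iterations are assumed
// builder names; with_eviction_strategy is shown later on this page.
let agent = Agent::make(config).await?
    .with_system_prompt("You are a terse, helpful assistant.") // hypothetical
    .with_max_iterations(5)                                     // hypothetical
    .with_eviction_strategy(EvictionStrategy { max_safe_tokens: 8000 });
```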

## EvictionStrategy

Controls when and how old messages are removed from context:

```rust
pub struct EvictionStrategy {
    pub max_safe_tokens: usize,  // default: 8000
}
```

When `total_tokens + prompt_overhead > max_safe_tokens`, the framework pops the oldest messages (FIFO) until the history fits within the budget again.

```rust
let agent = Agent::make(config).await?
    .with_eviction_strategy(EvictionStrategy { max_safe_tokens: 4096 });
```

The default of 8K is a rough safe point for 8K-context models. For 128K models you might set it to 64K or higher. The exact value depends on how much output room you need.
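
The rule itself is easy to restate in code. The sketch below uses placeholder types (`Message`, `total_tokens`) purely to illustrate the FIFO behaviour; it is not the framework's actual implementation:

```rust
use std::collections::VecDeque;

// Placeholder message type for illustration only.
struct Message {
    token_count: usize,
}

fn total_tokens(history: &VecDeque<Message>) -> usize {
    history.iter().map(|m| m.token_count).sum()
}

// Drop the oldest messages (FIFO) until history + prompt overhead fits
// under the budget, mirroring the eviction rule described above.
fn evict(history: &mut VecDeque<Message>, prompt_overhead: usize, max_safe_tokens: usize) {
    while !history.is_empty() && total_tokens(history) + prompt_overhead > max_safe_tokens {
        history.pop_front();
    }
}
```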

## LLMEngineConfig

This is the enum you pass to Agent::make():

```rust
pub enum LLMEngineConfig {
    #[cfg(feature = "openai-api")]
    OpenAI(OpenAIEngineConfig),
    #[cfg(feature = "llama-cpp")]
    Llama(LlamaEngineConfig),
    Custom(Box<dyn LLMEngineTrait>),
}
```

### OpenAI config

```rust
OpenAIEngineConfig {
    api_key: String,
    base_url: String,    // "https://api.openai.com/v1"
    model_name: String,  // "gpt-4o"
    temp: f32,           // 0.0 – 2.0
    top_p: f32,          // 0.0 – 1.0
}
```

base_url can point to any OpenAI-compatible endpoint (DeepSeek, Ollama with OpenAI adapter, etc.).
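
For instance, a config aimed at a local Ollama server might look like the sketch below. The URL, key, and model name are placeholder values, and the struct is assumed to be constructible directly from its public fields:

```rust
// Sketch: targeting a local OpenAI-compatible endpoint.
// Endpoint URL, key, and model name are example values only.
let config = LLMEngineConfig::OpenAI(OpenAIEngineConfig {
    api_key: "ollama".into(),                     // many local servers ignore the key
    base_url: "http://localhost:11434/v1".into(),
    model_name: "llama3.1".into(),
    temp: 0.7,
    top_p: 0.9,
});
let agent = Agent::make(config).await?;
```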

### Llama.cpp config

```rust
LlamaEngineConfig {
    model_path: String,              // path to .gguf file
    mmproj_path: Option<String>,     // external vision projector (e.g., mmproj-model-f16.gguf)
    integrated_vision: bool,         // whether the model has native vision capabilities
    max_tokens: i32,                 // max tokens to predict
    buffer_size: usize,              // batch buffer size for piece decoding
    use_gpu: bool,                   // offload layers to GPU
    n_gpu_layers: u32,               // how many layers to offload to GPU
    n_ctx: u32,                      // context window size
    n_tokens: usize,                 // batch size for prompt processing
    n_seq_max: i32,                  // max sequences in a batch
    penalty_last_n: i32,             // past tokens to consider for penalties
    penalty_repeat: f32,             // repetition penalty
    penalty_freq: f32,               // frequency penalty
    penalty_present: f32,            // presence penalty
    temp: f32,                       // temperature (0.0 – 2.0)
    top_p: f32,                      // nucleus sampling threshold
    seed: u32,                       // RNG seed for deterministic generation
    min_keep: usize,                 // min-keep sampling boundary
}
```
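
In practice you usually set only a handful of fields. The sketch below assumes `LlamaEngineConfig` implements `Default` (an assumption, not something stated here); if it does not, spell out every field explicitly:

```rust
// Sketch of a local GPU setup; the model path is an example and
// `..Default::default()` assumes the config implements Default.
let config = LLMEngineConfig::Llama(LlamaEngineConfig {
    model_path: "models/example-7b-q4_k_m.gguf".into(),
    use_gpu: true,
    n_gpu_layers: 99,   // offload as many layers as will fit
    n_ctx: 8192,
    temp: 0.7,
    ..Default::default()
});
```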

Validation runs at load time – if required fields are missing or out of range, you get an EngineError immediately rather than a cryptic crash mid-inference.
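
Because the check happens up front, it pays to handle the result of `Agent::make` instead of unwrapping it. The error's exact variants are not covered here, so this sketch simply reports the failure:

```rust
// Surface a load-time validation failure instead of panicking.
let agent = match Agent::make(config).await {
    Ok(agent) => agent,
    Err(e) => {
        eprintln!("engine failed to load: {e}");
        return Err(e.into()); // propagate through the caller's Result
    }
};
```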

## Feature flags

```toml
[dependencies]
ambi = { version = "0.3", default-features = false, features = ["openai-api"] }
```

| Feature | What it enables | Dependencies |
| --- | --- | --- |
| `openai-api` | OpenAI-compatible cloud backend | `async-openai` |
| `llama-cpp` | Local inference via llama.cpp | `llama-cpp-2`, `llama-cpp-sys-2` |
| `cuda` | CUDA acceleration (implies `llama-cpp`) | + CUDA SDK |
| `vulkan` | Vulkan acceleration | + Vulkan SDK |
| `metal` | Apple Metal acceleration | + Metal framework |
| `rocm` | AMD ROCm acceleration | + ROCm |
| `macro` | `#[tool]` and `#[agent]` attribute macros (see ambi-macros) | `ambi-macros` |
| `mtmd` | Multimodal support for Llama (VLM) | + `base64` |

You cannot enable more than one GPU backend at once – there's a compile-time compile_error! guard for this.
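
For example, a local CUDA build might enable just the accelerator feature, since `cuda` pulls in `llama-cpp` per the table above; the version number mirrors the snippet earlier on this page:

```toml
[dependencies]
# cuda implies llama-cpp, so one accelerator feature is enough.
ambi = { version = "0.3", default-features = false, features = ["cuda"] }
```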

## Runtime requirement

The agent API is async, so your application also needs a Tokio runtime:

```toml
tokio = { version = "1", features = ["rt-multi-thread", "sync", "time", "macros"] }
```

See native platform for details.
