Python Binding

The Python binding lets you use Ambi from Python with full access to OpenAI-compatible APIs, custom LLM engines, tool calling, and streaming.

Installation

bash
pip install ambi-python

Build from source

Requires maturin and a Rust toolchain.

bash
git clone https://github.com/Maskviva/Ambi.git
cd Ambi/bindings/python

# Install maturin if needed
pip install maturin

# Build and install the native module
maturin develop --release

Import directly:

python
from ambi import Agent, AgentState, Pipeline, LLMEngineConfig

Build & Publish

bash
cd bindings/python

# Build wheel
maturin build --release

# Publish to PyPI
maturin publish --username __token__ --password pypi-xxxxx

# Or use twine
maturin build --release
pip install twine
twine upload target/wheels/ambi_python-*.whl

Quick Start

python
import asyncio
from ambi import Agent, AgentState, Pipeline, LLMEngineConfig

async def main():
    # 1. Configure the engine
    config = LLMEngineConfig.openai(
        api_key="sk-...",
        base_url="https://api.openai.com/v1",
        model_name="gpt-4o-mini",
        temp=0.7,
        top_p=0.9,
    )

    # 2. Create the agent
    agent = await Agent.make(config)
    agent = agent.template("chatml").preamble("You are a helpful assistant.")

    # 3. Chat
    state = AgentState("session-1")
    runner = Pipeline.chat_runner(5)
    reply = await runner.chat(agent, state, "Hello!")
    print(reply)

asyncio.run(main())

API Reference

| Python API | JS Equivalent | Description |
| --- | --- | --- |
| `await Agent.make(config)` | `await Agent.make(config)` | Create an agent |
| `agent.preamble(text)` | `agent.preamble(text)` | Set system prompt |
| `agent.template(type_str)` | `agent.template(type)` | Template type (`"chatml"`, `"llama3"`, …) |
| `agent.custom_template(...)` | `agent.customTemplate(...)` | Custom template (13 kwargs) |
| `agent.add_tool(name, desc, params_json, cb)` | `agent.tool(tool(...))` | Register a tool |
| `agent.with_standard_formatting()` | `agent.withStandardFormatting()` | Enable standard formatting |
| `agent.with_eviction_strategy(...)` | `agent.withEvictionStrategy(...)` | Memory eviction |
| `agent.max_iterations(n)` | `agent.maxIterations(n)` | Max tool iterations |
| `agent.with_tool_tags(s, e)` | `agent.withToolTags(s, e)` | Custom tool tags |
| `agent.count_tokens(text)` | `agent.countTokens(text)` | Token counting |
| `AgentState(id)` | `new AgentState(id)` | Session state |
| `LLMEngineConfig.openai(...)` | `LLMEngineConfig.openai(...)` | OpenAI engine |
| `LLMEngineConfig.custom(handler)` | `LLMEngineConfig.custom(handler)` | Custom Python engine |
| `Pipeline.chat_runner(n)` | `Pipeline.chatRunner(n)` | Chat runner |
| `Pipeline.custom(handler)` | `Pipeline.custom(handler)` | Custom Python pipeline |
| `await runner.chat(...)` | `await runner.chat(...)` | Non-streaming chat |
| `await runner.chat_stream(...)` | `await runner.chatStream(...)` | Streaming chat |
| `await stream.next_chunk()` | `await stream.nextChunk()` | Read next token |
| `resolve_request(id, result)` | `resolveRequest(id, result)` | Resolve async callback |

Tool Registration

Build the JSON schema manually or with a small Python helper, then call add_tool():

python
import json

def build_tool(options):
    name = options["name"]
    description = options["description"]
    required = list(options["parameters"].keys())
    properties = {}
    for key, val in options["parameters"].items():
        if isinstance(val, list):
            properties[key] = {"type": "string", "enum": val, "description": key}
        elif isinstance(val, str):
            properties[key] = {"type": val, "description": key}
        else:
            properties[key] = val
    params_json = json.dumps({"type": "object", "properties": properties, "required": required})

    def wrapped(args_json):
        args = json.loads(args_json)
        result = options["callback"](args)
        return result if isinstance(result, str) else json.dumps(result)

    return name, description, params_json, wrapped

tool_args = build_tool({
    "name": "get_weather",
    "description": "Query real-time weather for a city",
    "parameters": {"city": {"type": "string", "description": "City name"}},
    "callback": lambda args: {"temperature": 25, "condition": "Sunny"},
})

agent = agent.add_tool(*tool_args)
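
The wrapped callback receives the tool arguments as a JSON string and must return a string; dict results are serialized. A minimal sketch of that round trip, independent of ambi (the `get_weather` stand-in is illustrative, not a real lookup):

```python
import json

def get_weather(args):
    # Hypothetical stand-in for a real weather lookup.
    return {"temperature": 25, "condition": "Sunny", "city": args["city"]}

def wrapped(args_json):
    # ambi passes tool arguments as a JSON string and expects a string back.
    args = json.loads(args_json)
    result = get_weather(args)
    return result if isinstance(result, str) else json.dumps(result)

reply = wrapped('{"city": "Paris"}')
```

This mirrors what `build_tool` generates, so the callback you write can freely return plain dicts.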

Custom LLM Engine

Create a custom engine from any Python callable. The handler must be synchronous; kick off async work inside it and call resolve_request() when the result is ready:

python
import asyncio, json
from ambi import resolve_request

def handler(req_json: str):
    payload = json.loads(req_json)
    request_id = payload["request_id"]
    request = payload["request"]

    async def do_work():
        result = await my_async_llm_call(request["formatted_prompt"])
        resolve_request(request_id, result)

    asyncio.create_task(do_work())

config = LLMEngineConfig.custom(chat_handler=handler, supports_multimodal=False)
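
Per the handler above, the JSON string carries a `request_id` plus a `request` object with at least a `formatted_prompt`. A standalone sketch of the decode step (the payload literal here is illustrative, not captured from ambi):

```python
import json

# Illustrative payload in the shape the handler above expects.
req_json = json.dumps({
    "request_id": "req-1",
    "request": {"formatted_prompt": "<|im_start|>user\nHello!<|im_end|>"},
})

payload = json.loads(req_json)
request_id = payload["request_id"]   # echo this back via resolve_request()
prompt = payload["request"]["formatted_prompt"]
```

Keep the `request_id` around: it is the correlation key that `resolve_request(request_id, result)` uses to route your answer back to the waiting call.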

Streaming

python
stream = await runner.chat_stream(agent, state, "Tell me a story")
while True:
    chunk = await stream.next_chunk()
    if chunk is None:
        break
    print(chunk, end="", flush=True)
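
The next_chunk() loop adapts naturally to an async generator, so the stream can be consumed with `async for`. A generic sketch that works for any object whose next_chunk() returns None at end of stream (the FakeStream below stands in for the real object returned by chat_stream):

```python
import asyncio

async def iter_chunks(stream):
    # Yield tokens until next_chunk() signals end-of-stream with None.
    while True:
        chunk = await stream.next_chunk()
        if chunk is None:
            return
        yield chunk

class FakeStream:
    # Stand-in for the object returned by runner.chat_stream(...).
    def __init__(self, chunks):
        self._chunks = iter(chunks)

    async def next_chunk(self):
        return next(self._chunks, None)

async def main():
    parts = [c async for c in iter_chunks(FakeStream(["Once", " upon", " a time"]))]
    return "".join(parts)

story = asyncio.run(main())
```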

Template Strings

Built-in templates are available as functions returning dicts:

python
from ambi import chatml_template, deepseek_template, llama3_template

tpl = deepseek_template()
print(tpl["system_prefix"])  # <|SYS_START|>\n

Available: chatml_template, llama3_template, gemma_template, phi3_template, zephyr_template, deepseek_template, qwen_template, mistral_template, llama2_template.
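
If a template dict exposes matching prefix/suffix pairs, rendering a prompt is plain string concatenation. The sketch below uses a hand-written ChatML-style dict; every key name other than `system_prefix` (shown above) is an assumption, not the verified output of chatml_template():

```python
# Hand-written ChatML-style template dict; key names other than
# system_prefix are assumptions for illustration.
tpl = {
    "system_prefix": "<|im_start|>system\n",
    "system_suffix": "<|im_end|>\n",
    "user_prefix": "<|im_start|>user\n",
    "user_suffix": "<|im_end|>\n",
}

def render(tpl, system, user):
    # Wrap each message in its role's prefix/suffix pair.
    return (tpl["system_prefix"] + system + tpl["system_suffix"]
            + tpl["user_prefix"] + user + tpl["user_suffix"])

prompt = render(tpl, "You are helpful.", "Hello!")
```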

Released under the Apache-2.0 License.