
Stream Formatter

A stream formatter processes the LLM's output token-by-token in real time, deciding what to show the user and what to hide.

Why you need one

When the LLM outputs raw text, it may include:

  • <think> / </think> tags (reasoning blocks)
  • [TOOL_CALL]...[/TOOL_CALL] blocks (raw tool call JSON)

These are useful for the machine but noisy for humans. The formatter strips or replaces them before the text reaches the user.

Default: PassThroughFormatter

By default, nothing is filtered – all text goes straight through:

rust
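// Default formatter: forwards every token unchanged.
pub struct PassThroughFormatter;
impl StreamFormatter for PassThroughFormatter {
    fn push(&mut self, token: &str) -> String {
        // No filtering or buffering.
        token.to_string()
    }
    fn flush(&mut self) -> String {
        // Nothing is held back, so there is nothing left to emit.
        String::new()
    }
}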

Standard formatting

rust
let agent = Agent::make(config).await?
    .with_standard_formatting();

This enables StandardStreamFormatter, which:

  1. Scans each incoming token for the tool start/end tags and think tags
  2. Buffers text within tool call blocks and suppresses it
  3. Replaces the <think> and </think> tags with a [Thinking]:\n label, keeping the reasoning text
  4. Labels non-tool content with [Content]:
  5. Enforces a maximum buffer size (8192 bytes by default) so that pathological output cannot exhaust memory

Example transformation:

Raw LLM output:
  Let me think about this
  <think>
  The user is asking about the weather...
  </think>
  [TOOL_CALL]{"name":"get_weather","args":{"city":"Tokyo"}}[/TOOL_CALL]
  The weather in Tokyo is...

Formatted output:
  [Thinking]:
  The user is asking about the weather...
  [Content]:
  The weather in Tokyo is...
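
The buffering-and-suppression step (step 2 above) can be sketched as follows. This is a minimal illustration, not the actual StandardStreamFormatter source: the trait is redeclared locally as a stand-in, ToolCallStripper and partial_tag_len are hypothetical names, and the [Thinking]/[Content] labels and the buffer cap are left out. The part worth noticing is that a tag may be split across two tokens, so a possible partial tag at the end of the buffer is held back until the next push().

```rust
// Local stand-in for the library's trait, so the sketch compiles on its own.
trait StreamFormatter {
    fn push(&mut self, token: &str) -> String;
    fn flush(&mut self) -> String;
}

const OPEN: &str = "[TOOL_CALL]";
const CLOSE: &str = "[/TOOL_CALL]";

// Hypothetical formatter that hides tool-call blocks and nothing else.
#[derive(Default)]
struct ToolCallStripper {
    buf: String,   // text that has been neither emitted nor discarded yet
    in_tool: bool, // true while inside a tool-call block
}

// Length of the longest suffix of `s` that is a proper prefix of `tag`.
// Both tags are ASCII, so byte slicing is safe here.
fn partial_tag_len(s: &str, tag: &str) -> usize {
    (1..tag.len())
        .rev()
        .find(|&n| s.ends_with(&tag[..n]))
        .unwrap_or(0)
}

impl StreamFormatter for ToolCallStripper {
    fn push(&mut self, token: &str) -> String {
        self.buf.push_str(token);
        let mut out = String::new();
        loop {
            // Outside a block we look for the opening tag, inside for the closing one.
            let tag = if self.in_tool { CLOSE } else { OPEN };
            match self.buf.find(tag) {
                Some(pos) => {
                    if !self.in_tool {
                        out.push_str(&self.buf[..pos]); // visible text before the block
                    }
                    self.buf.replace_range(..pos + tag.len(), "");
                    self.in_tool = !self.in_tool;
                }
                None => {
                    // Hold back only a possible partial tag at the end of the buffer.
                    let keep = partial_tag_len(&self.buf, tag);
                    let cut = self.buf.len() - keep;
                    if !self.in_tool {
                        out.push_str(&self.buf[..cut]);
                    }
                    self.buf.replace_range(..cut, "");
                    return out;
                }
            }
        }
    }

    fn flush(&mut self) -> String {
        // A dangling partial tag turned out to be plain text; an unterminated
        // tool call is dropped.
        let rest = std::mem::take(&mut self.buf);
        if self.in_tool { String::new() } else { rest }
    }
}
```

Feeding it the chunks `Hi [TOOL`, `_CALL]{"a":1}[/TOOL`, and `_CALL] there` yields only the text outside the block, even though both tags arrive split across chunk boundaries.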

Custom StreamFormatter

Implement the StreamFormatter trait:

rust
use ambi::types::StreamFormatter;

struct MyFormatter;

impl StreamFormatter for MyFormatter {
    fn push(&mut self, token: &str) -> String {
        // Simple: just uppercase everything
        token.to_uppercase()
    }

    fn flush(&mut self) -> String {
        String::new()
    }
}

Then inject it:

rust
let agent = Agent::make(config).await?
    .with_stream_formatter(|| Box::new(MyFormatter));

The with_stream_formatter method takes a factory closure (not an instance) because a new formatter is created for each streaming request. This matters for stateful formatters: buffers accumulated during one request must not carry over into the next.
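
To see why a factory helps, consider a formatter that carries state. The sketch below is illustrative only: the trait is redeclared locally, and NumberingFormatter and run_request are hypothetical stand-ins for a stateful formatter and for whatever the agent does internally per request.

```rust
// Local stand-in for the library's trait, so the sketch compiles on its own.
trait StreamFormatter {
    fn push(&mut self, token: &str) -> String;
    fn flush(&mut self) -> String;
}

// Hypothetical stateful formatter: prefixes each chunk with a running index.
#[derive(Default)]
struct NumberingFormatter {
    seen: usize,
}

impl StreamFormatter for NumberingFormatter {
    fn push(&mut self, token: &str) -> String {
        self.seen += 1;
        format!("{}:{} ", self.seen, token)
    }

    fn flush(&mut self) -> String {
        String::new()
    }
}

// Hypothetical stand-in for one streaming request: the factory is called
// once, so the request starts from a fresh formatter.
fn run_request(factory: &dyn Fn() -> Box<dyn StreamFormatter>, chunks: &[&str]) -> String {
    let mut fmt = factory();
    let mut out: String = chunks.iter().map(|c| fmt.push(c)).collect();
    out.push_str(&fmt.flush());
    out
}
```

With a factory such as `|| Box::new(NumberingFormatter::default()) as Box<dyn StreamFormatter>`, two consecutive calls to run_request both start counting at 1. Sharing a single instance would let the counter from the first request leak into the second.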

When the formatter is called

  • In streaming mode: every LLM token chunk goes through push(), and flush() runs after the stream ends.
  • In sync mode: the pipeline creates a formatter internally and passes the full output through it once, calling push() followed by flush().
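
The two call patterns can be sketched as a pair of driver functions. The names below (LineFormatter, format_streaming, format_sync) are hypothetical and the trait is redeclared locally; the real pipeline may differ in detail. LineFormatter only emits complete lines, which makes the role of flush() visible: without it, the final partial line would be lost.

```rust
// Local stand-in for the library's trait, so the sketch compiles on its own.
trait StreamFormatter {
    fn push(&mut self, token: &str) -> String;
    fn flush(&mut self) -> String;
}

// Hypothetical formatter: emits complete lines, holds back the trailing partial line.
#[derive(Default)]
struct LineFormatter {
    pending: String,
}

impl StreamFormatter for LineFormatter {
    fn push(&mut self, token: &str) -> String {
        self.pending.push_str(token);
        match self.pending.rfind('\n') {
            Some(pos) => {
                // Keep everything after the last newline, emit the rest.
                let rest = self.pending.split_off(pos + 1);
                std::mem::replace(&mut self.pending, rest)
            }
            None => String::new(),
        }
    }

    fn flush(&mut self) -> String {
        std::mem::take(&mut self.pending)
    }
}

// Streaming mode: one push() per chunk, one flush() at the end.
fn format_streaming(fmt: &mut dyn StreamFormatter, chunks: &[&str]) -> String {
    let mut out = String::new();
    for chunk in chunks {
        out.push_str(&fmt.push(chunk));
    }
    out.push_str(&fmt.flush());
    out
}

// Sync mode: the whole output goes through a single push(), then flush().
fn format_sync(fmt: &mut dyn StreamFormatter, full: &str) -> String {
    let mut out = fmt.push(full);
    out.push_str(&fmt.flush());
    out
}
```

For a well-behaved formatter both paths should produce the same text no matter how the input is chunked, which is a useful property to check when writing a custom implementation.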

Buffer overflow protection

StandardStreamFormatter has a hard cap (max_buffer_size, default 8192 bytes). If the buffer exceeds it, the formatter clears itself and logs an error. This is a safety net against pathological LLM output.
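
A cap of this kind can be sketched in a few lines. BoundedBuffer and DEFAULT_MAX_BUFFER_SIZE are hypothetical names, eprintln! stands in for whatever logger the library uses, and checking the size before appending (rather than after) is an assumption of this sketch.

```rust
// Assumed default, matching the 8192 bytes mentioned above.
const DEFAULT_MAX_BUFFER_SIZE: usize = 8192;

// Hypothetical buffer that refuses to grow past a fixed number of bytes.
struct BoundedBuffer {
    buf: String,
    max_buffer_size: usize,
}

impl BoundedBuffer {
    fn new(max_buffer_size: usize) -> Self {
        Self { buf: String::new(), max_buffer_size }
    }

    // Appends `text`. If that would exceed the cap, logs an error, clears
    // the buffer, and returns false instead.
    fn append(&mut self, text: &str) -> bool {
        if self.buf.len() + text.len() > self.max_buffer_size {
            eprintln!(
                "stream buffer would exceed {} bytes; clearing",
                self.max_buffer_size
            );
            self.buf.clear();
            return false;
        }
        self.buf.push_str(text);
        true
    }
}

impl Default for BoundedBuffer {
    fn default() -> Self {
        Self::new(DEFAULT_MAX_BUFFER_SIZE)
    }
}
```

Clearing the buffer discards whatever was held at that moment, so a formatter built on this trades completeness for bounded memory.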

Released under the Apache-2.0 License.