To receive the response as it is generated, pass "stream": true in the request body. The response arrives as Server-Sent Events (SSE): a sequence of data: chunks in OpenAI format, terminated by a data: [DONE] line.
from openai import OpenAI

client = OpenAI(base_url="https://api.ethereal.llc/v1", api_key="eth-...")

stream = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Write a haiku about code"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
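
If the full text is needed after streaming finishes, the deltas can be accumulated instead of printed. The following is a minimal sketch, not part of the API: collect_text and fake_chunk are hypothetical names, and fake_chunk only imitates the shape of the chunk objects used in the loop above so that the example runs without a network call.

```python
from types import SimpleNamespace


def collect_text(stream):
    """Join streamed content deltas; return (text, finish_reason)."""
    parts = []
    finish_reason = None
    for chunk in stream:
        if not chunk.choices:
            # Defensive assumption: skip any chunk that carries no choices.
            continue
        choice = chunk.choices[0]
        if choice.delta.content:
            parts.append(choice.delta.content)
        if choice.finish_reason is not None:
            finish_reason = choice.finish_reason
    return "".join(parts), finish_reason


def fake_chunk(content=None, finish_reason=None):
    """Hypothetical stand-in that mimics the attributes read above."""
    delta = SimpleNamespace(content=content)
    choice = SimpleNamespace(delta=delta, index=0, finish_reason=finish_reason)
    return SimpleNamespace(choices=[choice])


# Usage with stand-in chunks; with the real client, pass `stream` instead.
text, reason = collect_text([
    fake_chunk("Sil"),
    fake_chunk("ent"),
    fake_chunk(finish_reason="stop"),
])
print(text, reason)
```

With the real client, the same helper would be called as collect_text(stream) in place of the for loop.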

Chunk format

data: {"id":"...","object":"chat.completion.chunk","choices":[{"delta":{"content":"Sil"},"index":0,"finish_reason":null}]}

data: {"id":"...","object":"chat.completion.chunk","choices":[{"delta":{},"index":0,"finish_reason":"stop"}]}

data: [DONE]
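
For clients that do not use the OpenAI SDK, the chunk format above can be parsed by hand. The following is a rough sketch under stated assumptions: iter_sse_content is a hypothetical helper name, the input is an iterable of already-decoded text lines, and only data: lines are handled (other SSE fields are ignored).

```python
import json


def iter_sse_content(lines):
    """Yield content fragments from SSE lines in the chunk format shown above."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and anything that is not a data line
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                yield content


# The sample lines are copied from the chunk format above.
sample = [
    'data: {"id":"...","object":"chat.completion.chunk","choices":[{"delta":{"content":"Sil"},"index":0,"finish_reason":null}]}',
    "",
    'data: {"id":"...","object":"chat.completion.chunk","choices":[{"delta":{},"index":0,"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_sse_content(sample)))
```

The final chunk has an empty delta and a finish_reason of "stop", so it contributes no text; the generator stops as soon as it sees [DONE].
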
Streaming requests are billed by the actual number of tokens generated, exactly as with a regular (non-streaming) request.