Streaming allows you to receive partial responses from the API as they are generated, rather than waiting for the entire response to be completed. This can significantly reduce perceived latency and improve the user experience, especially for longer generations.

How Streaming Works

The Nusantara AI API implements streaming using Server-Sent Events (SSE) when you set the stream: true parameter in your request. When a streaming request is made:
  1. Content-Type Header: The API sets the Content-Type header to text/event-stream.
  2. Partial Data Chunks: The server sends data in small chunks. Each chunk is an event with a specific type and data payload.
  3. End of Stream Signal: The stream is terminated by a final event, such as response.completed, or by a data: [DONE] message, indicating that no further data will be sent.
This allows your client application to display or process the generated content incrementally.
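
To make the mechanics concrete, here is a minimal sketch of reading the raw SSE stream without an SDK, using Python's requests library. The data payloads and the [DONE] sentinel follow the descriptions on this page; the Authorization: Bearer header is an assumption based on how the OpenAI-compatible SDK configuration shown later sends the API key, so treat this as an illustration of the wire format rather than a drop-in client.

import json
import requests

API_KEY = "<YOUR_NUSANTARA_API_KEY>"

response = requests.post(
    "https://api.neosantara.xyz/v1/responses",
    headers={
        # Bearer auth is assumed here; it matches how the OpenAI SDK sends api_key.
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
        "Accept": "text/event-stream",
    },
    json={"model": "nusantara-base", "input": "Explain AI streaming.", "stream": True},
    stream=True,  # keep the connection open and read the body incrementally
)
response.raise_for_status()

for line in response.iter_lines(decode_unicode=True):
    # SSE frames arrive as "event: ..." and "data: ..." lines separated by blank lines.
    if not line or not line.startswith("data:"):
        continue
    payload = line[len("data:"):].strip()
    if payload == "[DONE]":
        break  # end-of-stream sentinel
    event = json.loads(payload)
    if event.get("type") == "response.output_text.delta":
        print(event["delta"], end="", flush=True)

In practice you will usually let an SDK handle this parsing for you, as shown in the examples below.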

Usage with SDKs

You can enable streaming easily by setting stream: true in your request body. Below are examples for both of our main endpoints.
  • Responses API
  • Chat Completions API
The /v1/responses endpoint uses a modern, event-driven stream. Each chunk is a typed event, allowing you to easily handle different parts of the response, such as text deltas or function call arguments.

Request Body Example

{
  "model": "nusantara-base",
  "input": "Explain the concept of AI streaming in detail.",
  "stream": true
}

Example Streaming Events

event: response.created
data: {"type":"response.created","response":{"id":"resp_abc123",...}}

event: response.output_item.added
data: {"type":"response.output_item.added","item":{"id":"msg_def456",...}}

event: response.output_text.delta
data: {"type":"response.output_text.delta","delta":"AI "}

event: response.output_text.delta
data: {"type":"response.output_text.delta","delta":"streaming "}

... more delta events ...

event: response.completed
data: {"type":"response.completed","response":{...}}

Python & JavaScript Examples

from openai import OpenAI

# Initialize the client for Nusantara AI
client = OpenAI(
    api_key="<YOUR_NUSANTARA_API_KEY>", 
    base_url="https://api.neosantara.xyz/v1"
)

def stream_response(prompt: str, model: str = "nusantara-base"):
    """Makes a streaming request to the Responses API."""
    print(f"Streaming from /v1/responses for model: {model}")
    print(f"Prompt: {prompt}\n")

    try:
        stream = client.responses.create(
            model=model,
            input=prompt,
            stream=True,
            max_output_tokens=2045  # the Responses API names this max_output_tokens, not max_tokens
        )

        full_response_content = ""
        print("AI Response: ", end="", flush=True)
        for event in stream:
            # Check for the event type containing text chunks
            if event.type == 'response.output_text.delta':
                content = event.delta
                full_response_content += content
                print(content, end="", flush=True)

        print("\n\n--- Stream finished ---")
        return full_response_content

    except Exception as e:
        print(f"\nAn error occurred: {e}")
        return None

if __name__ == "__main__":
    user_prompt = "Tell me a short story about a mythical creature from Indonesian folklore."
    streamed_text = stream_response(user_prompt)
    if streamed_text:
        print(f"\nTotal streamed content length: {len(streamed_text)} characters")
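
The example above only handles text deltas. Because every chunk is a typed event, you can branch on other event types in the same loop, for example to capture streamed function call arguments or to react to the end of the stream. In the sketch below, response.output_text.delta and response.completed come from the example events shown earlier, while response.function_call_arguments.delta is an assumption based on OpenAI-style event naming; verify the exact names against the events your requests actually produce.

def stream_with_event_handling(prompt: str, model: str = "nusantara-base"):
    """Streams a response and branches on the type of each event."""
    # Reuses the `client` configured at the top of the Python example above.
    stream = client.responses.create(model=model, input=prompt, stream=True)

    for event in stream:
        if event.type == "response.output_text.delta":
            # Incremental text for the current output item.
            print(event.delta, end="", flush=True)
        elif event.type == "response.function_call_arguments.delta":
            # Streamed arguments for a tool/function call, if the model makes one
            # (event name assumed from OpenAI-style streams; not shown above).
            print(f"[tool-call args chunk] {event.delta}")
        elif event.type == "response.completed":
            # Final event; the complete response object is attached to it.
            print("\n--- response completed ---")

Handling response.completed explicitly is useful when you need the finished response object (for example, to read usage information) in addition to the incrementally printed text.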