This flexible open-weight reasoning model is designed for developers and enterprises who need transparency and customization while maintaining advanced reasoning capabilities for complex tasks. The Llama 3.3 Nemotron model is trained to think step-by-step before responding, excelling at tasks like coding, mathematics, and agentic workflows. By default, this reasoning process is active, giving you detailed chain-of-thought responses.

How to Use the API

For reasoning models that produce longer, more detailed responses, we highly recommend streaming tokens to ensure the best user experience.

Default Behavior

By default, the model provides a step-by-step thought process before the final answer.

from openai import OpenAI

client = OpenAI(
  base_url="https://api.neosantara.xyz/v1",
  api_key="<YOUR_NUSANTARA_API_KEY>"
)

# Stream the response so reasoning tokens appear as they are generated.
stream = client.chat.completions.create(
  model="llama-3.3-nemotron-super-49b-v1.5",
  messages=[{
    "role": "user",
    "content": "Solve this logic puzzle: If all roses are flowers and some flowers are red, can we conclude that some roses are red?",
  }],
  temperature=0.7,
  stream=True
)

# Print each token as it arrives; delta.content can be None on some chunks.
for chunk in stream:
  print(chunk.choices[0].delta.content or "", end="", flush=True)

This will produce a response that includes the model’s thought process, followed by the final answer.

Disabling Reasoning

To get a direct answer without the chain-of-thought process, add /no_think to the beginning of your system prompt.
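To disable reasoning, prepend /no_think to the system prompt as described above. A minimal sketch follows; the `no_think_messages` helper and the `NUSANTARA_API_KEY` environment variable are illustrative conveniences, not part of the API:

```python
import os

# Illustrative helper (not part of the API): prepend the /no_think
# directive to a system prompt, as the docs above describe.
def no_think_messages(system_prompt, user_prompt):
    return [
        {"role": "system", "content": f"/no_think {system_prompt}"},
        {"role": "user", "content": user_prompt},
    ]

messages = no_think_messages(
    "You are a concise assistant.",
    "What is the capital of France?",
)

# Only call the API when a key is configured; otherwise just show the payload.
api_key = os.environ.get("NUSANTARA_API_KEY")
if api_key:
    from openai import OpenAI

    client = OpenAI(base_url="https://api.neosantara.xyz/v1", api_key=api_key)
    response = client.chat.completions.create(
        model="llama-3.3-nemotron-super-49b-v1.5",
        messages=messages,
        temperature=0.7,
    )
    print(response.choices[0].message.content)
else:
    print(messages[0]["content"])
```

With the directive in place, the response should contain only the final answer, without the preceding thought process.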

Best Practices

To get the best results from Llama 3.3 Nemotron, treat it like an expert problem-solver. Provide a clear, high-level objective and let the model determine the best steps to reach the solution.
  • Strengths: Excels at open-ended reasoning, multi-step logic, and complex coding or mathematical problems.
  • Avoid Over-prompting: Micromanaging each step can limit the model’s advanced reasoning capabilities. Give it the goal, not the exact path.
  • Provide Clear Objectives: Ensure your prompt is clear and unambiguous to get the most accurate and relevant response.
  • Use Streaming: For complex queries, the reasoning process can generate a lot of text. Streaming the response provides a much better user experience.
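If you want to display the reasoning and the final answer separately in your UI, you can split the response text. Reasoning models in this family commonly wrap the chain of thought in <think>…</think> tags; assuming that convention (verify it against your model's actual output), a small helper:

```python
def split_reasoning(text):
    """Split a response into (reasoning, answer), assuming the model
    wraps its chain of thought in <think>...</think> tags."""
    start = text.find("<think>")
    end = text.find("</think>")
    if start == -1 or end == -1:
        # No reasoning block found; treat the whole text as the answer.
        return "", text.strip()
    reasoning = text[start + len("<think>"):end].strip()
    answer = text[end + len("</think>"):].strip()
    return reasoning, answer
```

For example, `split_reasoning("<think>All roses are flowers...</think>No, we cannot conclude that.")` returns the reasoning and the final answer as separate strings, which you can render in a collapsible panel and a main view respectively.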

Use Cases

  • Code Generation & Analysis: Analyze large codebases, suggest improvements, and generate complex code snippets.
  • Strategic Planning: Develop multi-stage plans, reasoning about optimal approaches and potential obstacles.
  • Complex Document Analysis: Process and summarize technical specifications, legal contracts, and research papers.
  • Agentic Workflows: Build sophisticated AI agents that can perform complex, multi-step tasks.
  • Scientific Research: Assist in hypothesis generation, experimental design, and data analysis.
  • Advanced Problem Solving: Handle ambiguous requirements by inferring unstated assumptions and providing logical solutions.

Next Steps