Chat Completions

Generate a Chat Completion

Use this endpoint to generate a model response for a given conversation. The request and response shapes follow the common OpenAI Chat Completions format, so you can call it with the official OpenAI SDK or any compatible client.

Endpoint

POST https://www.tokiai.ai/v1/chat/completions

Request Parameters

Parameter	Type	Required	Description
`model`	string	Yes	The model identifier (e.g., `deepseek/deepseek-chat-v3`).
`messages`	array	Yes	A list of message objects comprising the conversation so far.
`temperature`	number	No	Sampling temperature, between 0 and 2. Higher values make the output more random; lower values make it more focused and deterministic. Default: 1.
`top_p`	number	No	Nucleus sampling, an alternative to sampling with temperature: the model considers only the tokens that make up the top `top_p` probability mass. Default: 1.
`frequency_penalty`	number	No	Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far.
`presence_penalty`	number	No	Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far.
`max_tokens`	integer	No	The maximum number of tokens to generate in the chat completion.
`stream`	boolean	No	If `true`, partial message deltas are sent as Server-Sent Events while the response is generated.
`response_format`	object	No	An object specifying the format that the model must output. Used to enable JSON mode.
`tools`	array	No	Tool definitions. Support depends on the selected model.
`tool_choice`	string/object	No	Controls whether and how the model calls tools.
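
The documented ranges can be checked on the client before a request is sent. The following is a minimal sketch, not part of the TOKI API: `validate_params` is a hypothetical helper, and it enforces only the required fields and numeric ranges listed in the table above.

```python
# Hypothetical client-side check; ranges are taken from the parameter table.
RANGES = {
    "temperature": (0.0, 2.0),
    "frequency_penalty": (-2.0, 2.0),
    "presence_penalty": (-2.0, 2.0),
}
REQUIRED = ("model", "messages")


def validate_params(payload: dict) -> list[str]:
    """Return a list of problems found in a request payload (empty if none)."""
    problems = []
    for key in REQUIRED:
        if key not in payload:
            problems.append(f"missing required parameter: {key}")
    for key, (low, high) in RANGES.items():
        if key in payload and not low <= payload[key] <= high:
            problems.append(f"{key} must be between {low} and {high}")
    return problems


payload = {
    "model": "deepseek/deepseek-chat-v3",
    "messages": [{"role": "user", "content": "What is machine learning?"}],
    "temperature": 0.7,
}
print(validate_params(payload))  # prints []
```

The server remains the authority on what is accepted; a check like this only catches obvious mistakes early.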

Message Object

Each object in the `messages` array requires a `role` and `content`.

interface Message {
  role: 'system' | 'developer' | 'user' | 'assistant' | 'tool';
  content: string | ContentPart[];
  name?: string; // Used for tool calls
}

Support for multimodal inputs, tool calls, JSON mode, developer messages, and sampling parameters can differ by model. Check the model page, the console, and the server response for the currently supported behavior.
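
As an illustration of the interface above, a conversation can be assembled from plain dictionaries. This is a sketch only: `make_message` is a hypothetical helper, and the role check simply mirrors the `role` union in the `Message` interface.

```python
# Roles taken from the Message interface above.
ROLES = {"system", "developer", "user", "assistant", "tool"}


def make_message(role: str, content, name=None) -> dict:
    """Build one message dict; content is a string or a list of content parts."""
    if role not in ROLES:
        raise ValueError(f"unknown role: {role!r}")
    if not isinstance(content, (str, list)):
        raise TypeError("content must be a string or a list of content parts")
    message = {"role": role, "content": content}
    if name is not None:
        message["name"] = name  # optional field, omitted when not given
    return message


messages = [
    make_message("system", "You are a helpful assistant."),
    make_message("user", "What is machine learning?"),
]
```

The resulting `messages` list can be passed directly as the `messages` request parameter.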

Example Request

cURL

curl https://www.tokiai.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKI_API_KEY" \
  -d '{
    "model": "deepseek/deepseek-chat-v3",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is machine learning?"}
    ],
    "temperature": 0.7,
    "max_tokens": 500
  }'
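
The same request can be made from Python. The sketch below uses only the standard library and mirrors the cURL call above; `build_request` and `create_chat_completion` are hypothetical helper names, and the API key is assumed to be in the `TOKI_API_KEY` environment variable as in the cURL example.

```python
import json
import os
import urllib.request

API_URL = "https://www.tokiai.ai/v1/chat/completions"


def build_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Prepare the same POST request as the cURL example, without sending it."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def create_chat_completion(payload: dict, api_key: str) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_request(payload, api_key)) as response:
        return json.load(response)


payload = {
    "model": "deepseek/deepseek-chat-v3",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is machine learning?"},
    ],
    "temperature": 0.7,
    "max_tokens": 500,
}
# Building the request does not touch the network.
request = build_request(payload, os.environ.get("TOKI_API_KEY", ""))
```

Calling `create_chat_completion(payload, api_key)` performs the network request; with `urllib`, non-2xx responses surface as `urllib.error.HTTPError`.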

Response

The response follows the common OpenAI Chat Completions structure.

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1714000000,
  "model": "deepseek/deepseek-chat-v3",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Machine learning is a subset of artificial intelligence..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 150,
    "total_tokens": 175
  }
}
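
Reading the reply out of this structure takes a few dictionary lookups. The sketch below parses the sample response shown above; `extract_reply` is a hypothetical helper, and it assumes a single choice at index 0, as in the sample.

```python
import json

# The sample response from this page, used here as test input.
raw = """
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1714000000,
  "model": "deepseek/deepseek-chat-v3",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Machine learning is a subset of artificial intelligence..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 25, "completion_tokens": 150, "total_tokens": 175}
}
"""


def extract_reply(response: dict) -> tuple[str, str]:
    """Return (assistant text, finish_reason) from the first choice."""
    choice = response["choices"][0]
    return choice["message"]["content"], choice["finish_reason"]


response = json.loads(raw)
text, finish_reason = extract_reply(response)
usage = response["usage"]
print(finish_reason)          # prints stop
print(usage["total_tokens"])  # prints 175
```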

Streaming

Set `stream` to `true` to receive the response incrementally as Server-Sent Events (SSE):

cURL

curl https://www.tokiai.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKI_API_KEY" \
  -d '{
    "model": "deepseek/deepseek-chat-v3",
    "messages": [
      {"role": "user", "content": "Introduce TOKI in one sentence"}
    ],
    "stream": true
  }'

Clients should read the `data:` lines from the SSE stream and stop when the `[DONE]` sentinel is received.
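
The reading loop can be sketched as follows. This is an illustration under stated assumptions, not a reference client: `iter_content` is a hypothetical helper, the chunk shape (`choices[0].delta.content`) is assumed from the OpenAI-style streaming format, and the `sample` lines are hand-written input rather than captured server output.

```python
import json
from typing import Iterable, Iterator


def iter_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from SSE lines until the [DONE] sentinel."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end of stream; anything after this is ignored
        chunk = json.loads(data)
        # Assumed chunk shape (OpenAI-style): choices[i].delta.content
        for choice in chunk.get("choices", []):
            fragment = choice.get("delta", {}).get("content")
            if fragment:
                yield fragment


# Hand-written sample input, for illustration only.
sample = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Hello"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": " world"}}]}',
    "",
    "data: [DONE]",
    'data: {"choices": [{"index": 0, "delta": {"content": "ignored"}}]}',
]
print("".join(iter_content(sample)))  # prints Hello world
```

In a real client, `lines` would be the decoded lines of the HTTP response body, read as they arrive.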
