
Chat with NPCs

Authentication and Real-Time Interactions

This document provides an in-depth explanation of the real-time communication flow. You will use:

  1. POST /token – For user authentication and acquiring a JWT access token.
  2. WebSocket /ws – For real-time, bi-directional communication supporting text or audio-based interactions.

Overview

  1. Obtain an Access Token
    Call the POST /token endpoint with user credentials. The server will respond with a Bearer token, which you can use to authenticate other requests.

  2. Establish a WebSocket Connection

    • Open a WebSocket connection to wss://api.aarda.ai/ws.
    • On connection, immediately send a JSON message containing the api_token field with the token you obtained from /token.
  3. Initialize Session

    • Send an initialize message that can contain user_uuid (the unique identifier of the requesting user) and session_id (the unique identifier of a session belonging to the user_uuid).
    • The server will create or resume a session and respond with an initialize_response that returns the server-confirmed user_uuid and session_id.
  4. Interact in Real-Time

    • Send subsequent message or audio payloads, always including user_uuid and session_id.
    • Receive text (and optionally audio) responses in real-time.

1. POST /token

The POST /token endpoint handles user authentication. It expects credentials (username and password) as OAuth2PasswordRequestForm data, i.e. submitted as application/x-www-form-urlencoded.

Endpoint

POST /token

Request Body

Field | Type | Description
username | string | User's username
password | string | User's password (plaintext)

Note: This is sent as form data, not JSON.

Example (cURL)

curl -X POST "http://localhost:8000/token" \
-H "Content-Type: application/x-www-form-urlencoded" \
-d "username=alice" \
-d "password=verysecretpassword"

Example (JSON-like representation)

{
  "username": "alice",
  "password": "verysecretpassword"
}

(In practice, you will submit this in x-www-form-urlencoded format.)

Response

A JSON object containing the token and token type:

Field | Type | Description
access_token | string | The JWT used for future requests
token_type | string | Typically "bearer"

Example Response

{
  "access_token": "eyJhbGciOiJIUzI1NiIsInR5c...<snip>...T2_H0s",
  "token_type": "bearer"
}

Common Failure Scenarios

  • 401 Unauthorized
    If the username or password is incorrect, you'll receive:
    {
      "detail": "Incorrect username or password"
    }
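
Putting this together in client code, a minimal JavaScript sketch might look like the following (the local URL mirrors the cURL example above; adapt it to your deployment):

// Hedged sketch: request a token with form-encoded credentials and surface a 401.
async function getAccessToken(username, password) {
  const form = new URLSearchParams({ username, password });
  const res = await fetch("http://localhost:8000/token", {
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: form,
  });
  if (res.status === 401) {
    const { detail } = await res.json();
    throw new Error(detail); // "Incorrect username or password"
  }
  return res.json(); // { access_token, token_type }
}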

2. WebSocket /ws

Once you have an access token, you can establish a WebSocket connection.
Important: You must pass user_uuid and session_id with each message so the server can track session context.

Connection Steps

  1. Connect
    Open a WebSocket connection to:

    wss://api.aarda.ai/ws
  2. Send Token
    Immediately upon connection, send a JSON message containing your api_token (the JWT you got from /token):

    {
      "api_token": "<YOUR_JWT_HERE>"
    }

    Example (JavaScript):

    const socket = new WebSocket("wss://api.aarda.ai/ws");

    socket.onopen = () => {
      const initData = JSON.stringify({
        api_token: "eyJhbGciOiJIUzI1NiIsInR5c...",
      });
      socket.send(initData);
    };
  3. Authenticate
    The server verifies your token. If valid, the connection remains open. Otherwise, it closes the connection with a 1008 policy violation code.

Subsequent Messages

After successful authentication, you can send/receive multiple types of JSON messages:


2.1 - initialize Message

The first significant message after the token is verified has type = "initialize". It sets up (or resumes) your session context.

Request:

{
  "type": "initialize",
  "user_uuid": "abc123", // Your unique user ID
  "mood": "friendly",
  "characterId": "123",
  "playerId": "456",
  "sceneId": "789",
  "forcedKnowledgeBricks": ["Some knowledge..."],
  "overrideQSystemPrompt": null,
  "overrideNpcChatPrompt": null,
  "overridePlayerChatPrompt": null,
  "overrideNpcPrompt": null,
  "overridePlayerPrompt": null,
  "audioSupport": true,
  "language": "en-US",
  "audioFormat": "pcm_22050" // or "mp3_44100_32"
}

Server Response (JSON):

{
  "type": "text",
  "source": "initialize_response",
  "user_uuid": "abc123",
  "session_id": 42
}
  • If no user_uuid is provided, the server will generate a new one.
  • Save both user_uuid and session_id because subsequent messages must include them.
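
For example, a client that wants to resume an earlier conversation can echo the saved identifiers back in the initialize message. The sketch below is illustrative only: it assumes the authenticated socket from the connection steps, uses sessionStorage as one possible place to keep the IDs, and omits the character/player/audio fields shown in the example above.

// Hedged sketch: persist the server-confirmed IDs and reuse them to resume a session.
socket.addEventListener("message", (event) => {
  if (typeof event.data !== "string") return;
  const msg = JSON.parse(event.data);
  if (msg.source === "initialize_response") {
    sessionStorage.setItem("user_uuid", msg.user_uuid);
    sessionStorage.setItem("session_id", String(msg.session_id));
  }
});

// On a later visit, send back whatever was saved (omit user_uuid to get a fresh one).
const savedUuid = sessionStorage.getItem("user_uuid");
const savedSession = sessionStorage.getItem("session_id");
socket.send(JSON.stringify({
  type: "initialize",
  ...(savedUuid ? { user_uuid: savedUuid } : {}),
  ...(savedSession ? { session_id: Number(savedSession) } : {}),
  audioSupport: false,
  language: "en-US",
}));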

2.2 - Subsequent Messages

After initialization, you can send messages of type = "message" or type = "audio".
Important: Each payload must contain the same user_uuid and session_id given (or returned) during the initialization phase.

Text-Based Message

Request:

{
  "type": "message",
  "user_uuid": "abc123",
  "session_id": 42,
  "message": "Hello, how are you today?"
}

Server Response (Text - JSON):

{
  "type": "text",
  "source": "text_response",
  "response": "Hello! I'm doing well, thank you for asking.",
  "flags_player": [],
  "flags_character": [],
  "immediate_emotion": "<emotion>",
  "accumulated_emotion": "<emotion>",
  "tokens_spent": 42
}
  • response: The AI's text output.
  • flags_player: Flags for context triggered by the player's message.
  • flags_character: Flags for context triggered by the character's message.
  • immediate_emotion: The NPC's emotion at the moment of this response.
  • accumulated_emotion: The NPC's overall emotion, accumulated across all messages in the session.
  • tokens_spent: How many tokens this message consumed.

If you have audioSupport enabled, the server will also deliver a separate binary frame containing the audio version of the above text.
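
As a concrete illustration, a small helper for sending a text turn plus a handler that reads the fields above might look like this (a sketch only; it assumes the authenticated socket and the IDs returned by initialize_response):

// Hedged sketch: send one text turn and inspect the text_response fields.
function sendChatMessage(socket, userUuid, sessionId, text) {
  socket.send(JSON.stringify({
    type: "message",
    user_uuid: userUuid,
    session_id: sessionId,
    message: text,
  }));
}

socket.addEventListener("message", (event) => {
  if (typeof event.data !== "string") return; // binary frames carry audio
  const msg = JSON.parse(event.data);
  if (msg.source === "text_response") {
    console.log("NPC:", msg.response);
    console.log("emotions:", msg.immediate_emotion, "/", msg.accumulated_emotion);
    console.log("flags:", msg.flags_player, msg.flags_character);
    console.log("tokens spent:", msg.tokens_spent);
  }
});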

Audio-Based Message

Request (JSON):

{
  "type": "audio",
  "user_uuid": "abc123",
  "session_id": 42,
  "encoding": "audio/webm", // or "audio/pcm"
  "data": "<base64_or_binary_payload>"
}
  • encoding – The format of your audio data. If set to "audio/webm", the server expects raw binary WEBM_OPUS data. Otherwise, it expects LINEAR16 data.
  • data – The actual audio payload (base64 string or raw bytes).
  • Sample rate – Always 48000 Hz.

Server Steps:

  1. Decodes the audio data.
  2. Runs speech-to-text to convert it into text.
  3. Returns the recognized text as:
    {
      "type": "text",
      "source": "transcription",
      "response": "Hi there!"
    }
  4. Processes that recognized text via the chat logic, returning the standard chat response (text + optional audio).
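
In a browser, one way to produce a compatible payload is to capture microphone audio with MediaRecorder as WEBM_OPUS and send it base64-encoded. This is a hedged sketch, not the only valid approach; the 48000 Hz constraint follows the sample-rate note above, and the three-second recording window is arbitrary:

// Hedged sketch: capture ~3 seconds of microphone audio as WEBM_OPUS,
// base64-encode it, and send it as an "audio" message.
async function sendAudioTurn(socket, userUuid, sessionId) {
  const stream = await navigator.mediaDevices.getUserMedia({
    audio: { sampleRate: 48000 }, // the server expects 48000 Hz
  });
  const recorder = new MediaRecorder(stream, { mimeType: "audio/webm;codecs=opus" });
  const chunks = [];
  recorder.ondataavailable = (e) => chunks.push(e.data);
  recorder.onstop = async () => {
    const blob = new Blob(chunks, { type: "audio/webm" });
    const buffer = await blob.arrayBuffer();
    // base64-encode the raw bytes (fine for short clips)
    const base64 = btoa(String.fromCharCode(...new Uint8Array(buffer)));
    socket.send(JSON.stringify({
      type: "audio",
      user_uuid: userUuid,
      session_id: sessionId,
      encoding: "audio/webm",
      data: base64,
    }));
    stream.getTracks().forEach((t) => t.stop());
  };
  recorder.start();
  setTimeout(() => recorder.stop(), 3000); // record for ~3 seconds
}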

Example Flow:

sequenceDiagram
  participant Client
  participant Server

  Client->>Server: Connect to wss://.../ws
  Note over Server: Wait for first message
  Client->>Server: {"api_token": "..."}
  Server->>Server: Validates token
  Server->>Client: Connection accepted
  Client->>Server: {"type": "initialize", ...}
  Note over Server: Setup session with given parameters
  Server->>Client: {"type":"text", "source":"initialize_response", "user_uuid":"abc123", "session_id":42}
  Client->>Server: {"type": "audio", "encoding": "audio/webm", "data": "...", "user_uuid":"abc123", "session_id":42}
  Server->>Server: Transcribe audio
  Server->>Client: {"type":"text", "source":"transcription", "response":"Hi there!"}
  Server->>Server: Process "Hi there!" in the conversation
  Server->>Client: {"type":"text", "source":"text_response", "response":"Hello back!"}
  Server->>Client: <binary audio data>

Error and Disconnection Handling

  • If the api_token is missing or invalid, the server closes the WebSocket with code 1008.
  • If any uncaught exception occurs, the server tries to close the connection gracefully.
  • Handle onclose or onerror events on the client side to know when the connection has been dropped.
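
A hedged client-side sketch for these cases (connectAndInitialize is a hypothetical helper that re-runs the connect, authenticate, and initialize steps; the retry policy is illustrative, not prescribed by the API):

// Hedged sketch: react to dropped or rejected connections.
socket.onclose = (event) => {
  if (event.code === 1008) {
    // Policy violation: the api_token was missing or invalid; fetch a new JWT before retrying.
    console.warn("Authentication rejected, refresh the token via POST /token");
  } else {
    // Any other closure: an illustrative fixed back-off before reconnecting.
    console.warn(`Connection closed (code ${event.code}), retrying in 5 s`);
    setTimeout(connectAndInitialize, 5000); // hypothetical reconnect helper
  }
};

socket.onerror = (err) => {
  console.error("WebSocket error:", err); // onclose will usually follow
};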

FAQ / Common Pitfalls

  1. Where do I pass the JWT token for normal HTTP endpoints?
    You normally include it in the Authorization header as Bearer <ACCESS_TOKEN>.
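
    For instance, with fetch (the endpoint path below is purely illustrative):

    const res = await fetch("https://api.aarda.ai/some-protected-endpoint", {
      headers: { Authorization: `Bearer ${accessToken}` }, // accessToken from POST /token
    });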

  2. Does the server automatically store conversation state?
    Yes. The server uses user_uuid, characterId, and playerId as the index for conversation state. When the same combination is used again, the server picks up the existing conversation.

  3. How do I change the voice or audio settings?
    Send them in the initialize message:

    {
      "audioSupport": true,
      "language": "en-US",
      "audioFormat": "pcm_22050"
    }

    The server picks up your preferences for text-to-speech generation.

  4. What if I only want text responses (no audio)?
    Set "audioSupport": false in the initialize message.


Summary

The /token endpoint and /ws WebSocket endpoint form a powerful duo for authentication and real-time conversational workflows in your FastAPI application. Use /token to retrieve a secure JWT, and establish a WebSocket connection to /ws for an interactive session supporting both text and audio messages.


Example: Full WebSocket Client Flow (Pseudocode)

Below is a simple example (in pseudocode/JavaScript) illustrating how you might connect, initialize, send messages, and receive text+audio:

// 1. Get token from /token (assume you already have 'accessToken')

// 2. Connect WebSocket
const socket = new WebSocket("wss://api.aarda.ai/ws");

socket.onopen = () => {
  // Immediately send api_token
  socket.send(JSON.stringify({ api_token: accessToken }));
};

socket.onmessage = (event) => {
  if (typeof event.data === "string") {
    // JSON-encoded text message
    const jsonMessage = JSON.parse(event.data);
    if (jsonMessage.source === "initialize_response") {
      // Save these for future messages
      window.myUserUuid = jsonMessage.user_uuid;
      window.mySessionId = jsonMessage.session_id;
      console.log("Session initialized. User UUID & Session ID set.");
    } else if (jsonMessage.source === "text_response") {
      console.log("AI responded:", jsonMessage.response);
    } else if (jsonMessage.source === "transcription") {
      console.log("Audio transcribed:", jsonMessage.response);
    }
  } else {
    // Binary data -> audio (this example requests "mp3_44100_32", so treat frames as MP3)
    const audioBlob = new Blob([event.data], { type: "audio/mpeg" });
    const audioURL = URL.createObjectURL(audioBlob);
    const audioElement = new Audio(audioURL);
    audioElement.play();
  }
};

// 3. Initialize Session
socket.send(
  JSON.stringify({
    type: "initialize",
    user_uuid: "abc123",
    mood: "excited",
    characterId: 12,
    playerId: 43,
    sceneId: 0,
    audioSupport: true,
    language: "en-US",
    audioFormat: "mp3_44100_32",
  }),
);

// 4. Send a text message
socket.send(
  JSON.stringify({
    type: "message",
    user_uuid: "abc123",
    session_id: 42, // Use the sessionId from the server
    message: "Hello, what's happening?",
  }),
);

// 5. Or send an audio message (base64 example)
socket.send(
  JSON.stringify({
    type: "audio",
    user_uuid: "abc123",
    session_id: 42,
    encoding: "audio/pcm", // or "audio/webm"; the base64 payload goes in "data"
    data: "UklGRnQAAABXQVZFZm10IBIAAAABAAEAQB8AAIA+AAACABAAZGF0YQAAAAA=",
  }),
);

Conclusion

With user_uuid and session_id included in each message, the application can maintain conversations per user and session in real time. Always:

  1. Obtain your JWT via /token.
  2. Open a WebSocket connection to /ws.
  3. Immediately provide your api_token.
  4. Use type = "initialize" to start or resume a session (providing user_uuid and any known session_id).
  5. For each text or audio message, include the same user_uuid and session_id.
  6. Receive text (and optional audio) responses from the server, maintaining context across messages.