# Inference: Querying Text Models in AetherMind

AetherMind provides a powerful, OpenAI-compatible REST API for querying text models. Users can interact with it in multiple ways:

* **AetherMind Python Client Library** – Seamless integration with Python applications.
* **Web Console** – A user-friendly interface for testing models interactively.
* **LangChain** – Integration for building chained and retrieval-augmented applications.
* **Direct API Invocation** – Use your preferred programming language or HTTP tooling.
* **OpenAI Python Client** – Drop-in compatibility with existing OpenAI-based code.
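As a sketch of direct API invocation, the request below is built with Python's standard library. The endpoint URL, payload fields, and header names are assumptions based on the API's stated OpenAI compatibility, not confirmed values:

```python
import json
import urllib.request

# Hypothetical endpoint and placeholder key, shown only to illustrate the
# request shape; take the exact URL from the AetherMind API documentation.
API_URL = "https://api.aethermind.ai/v1/chat/completions"
API_KEY = "<AETHERMIND_API_KEY>"

payload = {
    "model": "accounts/dashflow/models/llama-v3-8b-instruct",
    "messages": [{"role": "user", "content": "Say this is a test"}],
}

request = urllib.request.Request(
    API_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    method="POST",
)

# Actually sending the request requires a valid API key:
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```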

**Using the Web Console**

All AetherMind models are accessible via the web console at **AetherMind.ai**. Users can select a model and enter a prompt, along with additional request parameters, in the playground.

* **Non-chat models** use the **completions API**, which directly processes input.
* **Chat models (instruct models)** use the **chat completions API**, which formats input according to the model's conversation style.
* Advanced users can revert to the completions API by disabling the **"Use chat template"** option.

**Using the API**

**Chat Completions API**

Models configured with a conversation style support the chat completions API, which formats multi-turn messages according to that style. For example:

```python
from dashflow.client import DashFlow

client = DashFlow(api_key="<DASHFLOW_API_KEY>")
response = client.chat.completions.create(
  model="accounts/dashflow/models/llama-v3-8b-instruct",
  messages=[{
    "role": "user",
    "content": "Say this is a test",
  }],
)
print(response.choices[0].message.content)
```
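The chat completions API is stateless, so multi-turn conversations are maintained client-side by appending each reply to the message list before the next request. A minimal sketch, with the assistant's reply hardcoded for illustration (in practice it comes from `response.choices[0].message.content`):

```python
# Start with the first user turn.
messages = [{"role": "user", "content": "Say this is a test"}]

# Hardcoded stand-in for the model's reply to keep the sketch self-contained.
assistant_reply = "This is a test."

# Append the assistant turn, then the follow-up user turn, and send the
# whole list in the next chat completions request.
messages.append({"role": "assistant", "content": assistant_reply})
messages.append({"role": "user", "content": "Now say it in French."})
```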

**Overriding the System Prompt**

Some conversation styles include a default system prompt (e.g., "You are a helpful AI."). Users can override it by making the first message in the list a `"system"` message:

```python
[
  {
    "role": "system",
    "content": "You are a pirate."
  },
  {
    "role": "user",
    "content": "Hello, what is your name?"
  }
]
```

To remove the system prompt, set the `content` field to an empty string.
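Both cases can be handled with a small helper that prepends a system message to an existing message list. The helper name is illustrative, not part of the client library:

```python
def with_system_prompt(messages, content):
    """Return a copy of `messages` whose first entry is a system message.

    Per the API convention described above, passing an empty string
    suppresses the conversation style's default system prompt.
    """
    # Drop any existing system messages, then prepend the new one.
    rest = [m for m in messages if m["role"] != "system"]
    return [{"role": "system", "content": content}] + rest

messages = [{"role": "user", "content": "Hello, what is your name?"}]
pirate = with_system_prompt(messages, "You are a pirate.")
no_system = with_system_prompt(messages, "")
```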

**Completions API**

Text models generate a continuation of the given prompt. Generation stops once the response reaches the maximum number of output tokens or the model emits a special end-of-sequence (EOS) token.

```python
from dashflow.client import DashFlow

client = DashFlow(api_key="<DASHFLOW_API_KEY>")
response = client.completions.create(
  model="accounts/dashflow/models/llama-v3-8b-instruct",
  prompt="Say this is a test",
)
print(response.choices[0].text)
```

**Advanced Options**

* **Streaming** – Stream tokens in real time, useful for chat applications.
* **Async Mode** – Issue asynchronous requests for higher throughput.
* **Multiple Choices** – Generate several output variations per request.
* **Max Tokens** – Cap the number of tokens in each response.
* **Temperature** – Control randomness (higher values produce more varied output).
* **Top-p & Top-k Sampling** – Restrict sampling to the most likely tokens.
* **Repetition Penalty** – Discourage redundant output and looping behavior.
* **Mirostat Algorithm** – Steer generation toward a target level of surprise (perplexity).
* **Logit Bias** – Raise or lower the likelihood of specific tokens.
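How temperature, top-k, and top-p interact can be illustrated with a toy, stdlib-only sketch over a hardcoded distribution. This shows the concepts only; the server-side implementation may differ:

```python
import math

def filter_logits(logits, temperature=1.0, top_k=0, top_p=1.0):
    """Return a {token: probability} dict over tokens that survive filtering."""
    # Temperature scales logits before softmax: <1 sharpens, >1 flattens.
    scaled = {t: l / temperature for t, l in logits.items()}
    z = sum(math.exp(l) for l in scaled.values())
    probs = sorted(((t, math.exp(l) / z) for t, l in scaled.items()),
                   key=lambda kv: kv[1], reverse=True)
    if top_k:                      # top-k: keep only the k most likely tokens
        probs = probs[:top_k]
    kept, total = [], 0.0
    for t, p in probs:             # top-p: smallest set with mass >= top_p
        kept.append((t, p))
        total += p
        if total >= top_p:
            break
    norm = sum(p for _, p in kept)  # renormalize the surviving mass
    return {t: p / norm for t, p in kept}

toy = {"cat": 2.0, "dog": 1.0, "fish": 0.1}
print(filter_logits(toy, temperature=0.7, top_k=2))
```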

**Debugging Options**

* **Ignore EOS** – Continue generating past the end-of-sequence token.
* **Logprobs** – Return per-token log-probabilities.
* **Echo** – Return the prompt together with the completion for verification.
* **Raw Output** – Return the unprocessed model response for debugging.
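Logprobs are typically used to spot tokens the model was unsure about. The payload below is illustrative, modeled on the OpenAI-style completions format; the exact field names AetherMind returns may differ:

```python
import math

# Illustrative logprobs payload; in practice this comes from the response.
logprobs = {
    "tokens": ["Say", " this", " is", " a", " test"],
    "token_logprobs": [-0.12, -0.05, -0.31, -0.02, -0.44],
}

# Convert log-probabilities back to probabilities and collect tokens
# below an arbitrary confidence threshold.
flagged = [
    token
    for token, lp in zip(logprobs["tokens"], logprobs["token_logprobs"])
    if math.exp(lp) < 0.75
]
print(flagged)
```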

**Tokenization in AetherMind**

AetherMind models process text in tokens, where token count affects cost and response length. The actual number of tokens used in a request is returned in the `usage` field of the API response.
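The `usage` field makes cost accounting straightforward. The sketch below uses an illustrative `usage` object and a hypothetical per-token price, shown only for the arithmetic:

```python
# Illustrative `usage` object as returned in an API response.
usage = {"prompt_tokens": 12, "completion_tokens": 48, "total_tokens": 60}

PRICE_PER_1K_TOKENS = 0.20  # hypothetical price in USD per 1,000 tokens

# Total tokens (prompt + completion) drive the cost of the request.
cost = usage["total_tokens"] / 1000 * PRICE_PER_1K_TOKENS
print(f"{usage['total_tokens']} tokens -> ${cost:.4f}")
```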

For more details, visit the AetherMind API documentation at **AetherMind.ai/docs**.

