Context API
The Context API retrieves relevant memories and formats them for direct injection into LLM prompts. It handles token budgeting, ranking, and formatting so you can focus on building your application.
Get LLM Context
POST /v1/context
Retrieves semantically relevant memories formatted for LLM consumption.
Required Scope: search:read
Request Body
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| query | string | Yes | - | The context query (typically the user's current question) |
| max_tokens | integer | No | 4000 | Maximum tokens for the context window |
| tier | string | No | - | Filter by tier: short, medium, long |
| format | string | No | text | Output format: text or json |
Response
| Field | Type | Description |
|---|---|---|
| context | string | Formatted context ready for LLM injection |
| memories | array | Source memories with relevance info |
| token_count | integer | Estimated token count of the context |
| retrieval_time_ms | integer | Time spent retrieving context |
Memory Object in Response
| Field | Type | Description |
|---|---|---|
| id | string | Memory ID |
| content | string | Memory content |
| tier | string | Memory tier |
| score | number | Combined relevance score |
cURL Example
curl -X POST "https://api.mymemoryos.com/api/v1/context" \
-H "Authorization: Bearer mos_live_<your_key>" \
-H "Content-Type: application/json" \
-d '{
"query": "What does the user prefer for their development environment?",
"max_tokens": 2000,
"format": "text"
}'
JavaScript Example
const response = await fetch('https://api.mymemoryos.com/api/v1/context', {
method: 'POST',
headers: {
'Authorization': 'Bearer mos_live_<your_key>',
'Content-Type': 'application/json'
},
body: JSON.stringify({
query: 'What does the user prefer for their development environment?',
max_tokens: 2000,
format: 'text'
})
});
const { data } = await response.json();
console.log(`Retrieved ${data.memories.length} memories (${data.token_count} tokens)`);
console.log('Context:', data.context);
Python Example
import requests
response = requests.post(
'https://api.mymemoryos.com/api/v1/context',
headers={
'Authorization': 'Bearer mos_live_<your_key>',
'Content-Type': 'application/json'
},
json={
'query': 'What does the user prefer for their development environment?',
'max_tokens': 2000,
'format': 'text'
}
)
data = response.json()['data']
print(f"Retrieved {len(data['memories'])} memories ({data['token_count']} tokens)")
print(f"Context: {data['context']}")Response Example (Text Format)
{
"data": {
"context": "User prefers dark mode interfaces and minimal UI designs\n\nUser uses VS Code as their primary IDE with Vim keybindings\n\nUser prefers TypeScript over JavaScript for all projects",
"memories": [
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"content": "User prefers dark mode interfaces and minimal UI designs",
"tier": "long",
"score": 0.92
},
{
"id": "550e8400-e29b-41d4-a716-446655440001",
"content": "User uses VS Code as their primary IDE with Vim keybindings",
"tier": "long",
"score": 0.88
},
{
"id": "550e8400-e29b-41d4-a716-446655440002",
"content": "User prefers TypeScript over JavaScript for all projects",
"tier": "long",
"score": 0.85
}
],
"token_count": 156,
"retrieval_time_ms": 85
},
"meta": {
"request_id": "req_abc123",
"latency_ms": 95
}
}
Response Example (JSON Format)
{
"data": {
"context": "[{\"id\":\"550e8400-e29b-41d4-a716-446655440000\",\"content\":\"User prefers dark mode interfaces and minimal UI designs\",\"tier\":\"long\",\"score\":0.92}]",
"memories": [...],
"token_count": 245,
"retrieval_time_ms": 85
},
"meta": {
"request_id": "req_abc123",
"latency_ms": 95
}
}
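With format: json, the context field is a JSON-encoded array of memory objects rather than pre-formatted text, so you can build your own formatting. A minimal sketch, assuming a request like the JavaScript example above but with format: 'json':
const { data } = await response.json();

// With format: 'json', context is a JSON string of memory objects
const items = JSON.parse(data.context);

// Build your own context block, e.g. one bullet per memory
const contextBlock = items
  .map(m => `- (${m.tier}, score ${m.score}) ${m.content}`)
  .join('\n');
Injecting Context into LLM Prompts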
The Context API is designed for seamless integration with LLM applications. Here are common patterns:
Basic Prompt Injection
import OpenAI from 'openai';

const openai = new OpenAI(); // assumes OPENAI_API_KEY is set in the environment

async function generateResponse(userMessage) {
// 1. Get relevant context
const contextResponse = await fetch('https://api.mymemoryos.com/api/v1/context', {
method: 'POST',
headers: {
'Authorization': 'Bearer mos_live_<your_key>',
'Content-Type': 'application/json'
},
body: JSON.stringify({
query: userMessage,
max_tokens: 2000
})
});
const { data: contextData } = await contextResponse.json();
// 2. Build the prompt with context
const systemPrompt = `You are a helpful assistant with access to user memories and preferences.
Here is relevant context about the user:
<context>
${contextData.context}
</context>
Use this context to personalize your response when relevant.`;
// 3. Call your LLM
const completion = await openai.chat.completions.create({
model: 'gpt-4',
messages: [
{ role: 'system', content: systemPrompt },
{ role: 'user', content: userMessage }
]
});
return completion.choices[0].message.content;
}
With Tier Prioritization
async function getLayeredContext(query, tokenBudget = 4000) {
// Allocate budget across tiers
const budgets = {
long: Math.floor(tokenBudget * 0.5), // 50% for core facts
medium: Math.floor(tokenBudget * 0.3), // 30% for recent context
short: Math.floor(tokenBudget * 0.2) // 20% for session context
};
const contexts = await Promise.all([
fetch('https://api.mymemoryos.com/api/v1/context', {
method: 'POST',
headers: {
'Authorization': 'Bearer mos_live_<your_key>',
'Content-Type': 'application/json'
},
body: JSON.stringify({
query,
max_tokens: budgets.long,
tier: 'long'
})
}).then(r => r.json()),
fetch('https://api.mymemoryos.com/api/v1/context', {
method: 'POST',
headers: {
'Authorization': 'Bearer mos_live_<your_key>',
'Content-Type': 'application/json'
},
body: JSON.stringify({
query,
max_tokens: budgets.medium,
tier: 'medium'
})
}).then(r => r.json()),
fetch('https://api.mymemoryos.com/api/v1/context', {
method: 'POST',
headers: {
'Authorization': 'Bearer mos_live_<your_key>',
'Content-Type': 'application/json'
},
body: JSON.stringify({
query,
max_tokens: budgets.short,
tier: 'short'
})
}).then(r => r.json())
]);
return {
longTerm: contexts[0].data.context,
mediumTerm: contexts[1].data.context,
shortTerm: contexts[2].data.context,
totalTokens: contexts.reduce((sum, c) => sum + c.data.token_count, 0)
};
}
Structured Prompt Template
function buildPromptWithContext(context, userMessage) {
return `# System Instructions
You are an AI assistant with persistent memory about this user.
# Long-Term Knowledge
These are established facts about the user:
${context.longTerm || 'No long-term memories available.'}
# Recent Context
Recent interactions and temporary preferences:
${context.mediumTerm || 'No recent context available.'}
# Current Session
Information from this conversation:
${context.shortTerm || 'No session context available.'}
# User Message
${userMessage}
# Instructions
- Use the context above to personalize your response
- Reference specific memories when relevant
- If context contradicts the user's current request, prioritize the request
- Do not fabricate information not present in the context`;
}
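A sketch of tying the two helpers together, assuming the getLayeredContext and buildPromptWithContext functions above and an OpenAI client configured as in the basic example (the function name is illustrative):
async function generateLayeredResponse(userMessage) {
  // Retrieve tier-weighted context and assemble the structured prompt
  const layeredContext = await getLayeredContext(userMessage, 4000);
  const prompt = buildPromptWithContext(layeredContext, userMessage);

  const completion = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: prompt }]
  });
  return completion.choices[0].message.content;
}
Token Budgeting Strategies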
Effective token budgeting ensures you get the most relevant context without exceeding LLM limits.
Model Context Windows
| Model | Max Context | Recommended Context Budget |
|---|---|---|
| GPT-4 | 8K / 32K / 128K | 2K-4K / 8K-16K / 32K-64K |
| GPT-3.5 | 4K / 16K | 1K-2K / 4K-8K |
| Claude 3 | 200K | 20K-50K |
| Llama 3 | 8K | 2K-4K |
Budget Allocation Formula
function calculateContextBudget(modelMaxTokens, expectedResponseTokens = 1000) {
const systemPromptTokens = 500; // Reserve for system instructions
const bufferTokens = 200; // Safety buffer
const availableForContext = modelMaxTokens
- systemPromptTokens
- expectedResponseTokens
- bufferTokens;
return Math.floor(availableForContext * 0.8); // Use 80% of available
}
// Example: GPT-4 8K model
const budget = calculateContextBudget(8192);
console.log(`Context budget: ${budget} tokens`); // 5193 tokens
Dynamic Budget Based on Query
async function getContextWithDynamicBudget(query, options = {}) {
const {
model = 'gpt-4',
expectedResponseLength = 'medium'
} = options;
const modelLimits = {
'gpt-4': 8192,
'gpt-4-32k': 32768,
'gpt-4-turbo': 128000,
'gpt-3.5-turbo': 4096,
'gpt-3.5-turbo-16k': 16384
};
const responseBudgets = {
short: 500,
medium: 1000,
long: 2000
};
const maxTokens = modelLimits[model] || 8192;
const responseTokens = responseBudgets[expectedResponseLength] || 1000;
const contextBudget = calculateContextBudget(maxTokens, responseTokens);
return fetch('https://api.mymemoryos.com/api/v1/context', {
method: 'POST',
headers: {
'Authorization': 'Bearer mos_live_<your_key>',
'Content-Type': 'application/json'
},
body: JSON.stringify({
query,
max_tokens: contextBudget
})
}).then(r => r.json());
}
Caching Context
For applications with repeated similar queries, cache context responses:
class ContextCache {
constructor(ttlMs = 60000) { // 1 minute default TTL
this.cache = new Map();
this.ttlMs = ttlMs;
}
getCacheKey(query, options) {
return JSON.stringify({
query: query.toLowerCase().trim(),
max_tokens: options.max_tokens,
tier: options.tier,
format: options.format
});
}
get(query, options) {
const key = this.getCacheKey(query, options);
const entry = this.cache.get(key);
if (entry && Date.now() - entry.timestamp < this.ttlMs) {
return entry.data;
}
return null;
}
set(query, options, data) {
const key = this.getCacheKey(query, options);
this.cache.set(key, {
data,
timestamp: Date.now()
});
}
clear() {
this.cache.clear();
}
}
const contextCache = new ContextCache();
async function getCachedContext(query, options = {}) {
const cached = contextCache.get(query, options);
if (cached) {
return cached;
}
const response = await fetch('https://api.mymemoryos.com/api/v1/context', {
method: 'POST',
headers: {
'Authorization': 'Bearer mos_live_<your_key>',
'Content-Type': 'application/json'
},
body: JSON.stringify({ query, ...options })
});
const { data } = await response.json();
contextCache.set(query, options, data);
return data;
}
Best Practices
1. Query Design
Write context queries that match how memories are stored:
// Good: Specific, matches memory content style
const query = "user's programming language preferences and coding style";
// Less effective: Too vague
const query = "preferences";2. Token Estimation
Memory OS estimates tokens as characters / 4. For more accurate counting:
import { encoding_for_model } from 'tiktoken';
function countTokens(text, model = 'gpt-4') {
  const enc = encoding_for_model(model);
  const tokenCount = enc.encode(text).length;
  enc.free(); // free the WASM-backed encoder to avoid leaking memory
  return tokenCount;
}
3. Fallback Handling
Handle cases where context retrieval fails or returns empty:
async function getContextSafely(query, options = {}) {
try {
const response = await fetch('https://api.mymemoryos.com/api/v1/context', {
method: 'POST',
headers: {
'Authorization': 'Bearer mos_live_<your_key>',
'Content-Type': 'application/json'
},
body: JSON.stringify({ query, ...options })
});
if (!response.ok) {
console.warn('Context API error:', response.status);
return { context: '', memories: [], token_count: 0 };
}
const { data } = await response.json();
if (!data.memories.length) {
console.info('No relevant memories found');
}
return data;
} catch (error) {
console.error('Context retrieval failed:', error);
return { context: '', memories: [], token_count: 0 };
}
}
4. Context Freshness
For time-sensitive applications, consider re-fetching context periodically:
class ContextManager {
  constructor(apiKey, refreshInterval = 30000) {
    this.apiKey = apiKey;
    this.currentContext = null;
    this.lastQuery = null;
    this.lastFetch = 0;
    this.refreshInterval = refreshInterval;
  }
async getContext(query, options = {}) {
const queryChanged = query !== this.lastQuery;
const needsRefresh = !this.currentContext ||
queryChanged ||
Date.now() - this.lastFetch > this.refreshInterval;
if (needsRefresh) {
this.currentContext = await this.fetchContext(query, options);
this.lastQuery = query;
this.lastFetch = Date.now();
}
return this.currentContext;
}
async fetchContext(query, options) {
const response = await fetch('https://api.mymemoryos.com/api/v1/context', {
method: 'POST',
headers: {
'Authorization': `Bearer ${this.apiKey}`,
'Content-Type': 'application/json'
},
body: JSON.stringify({ query, ...options })
});
const { data } = await response.json();
return data;
}
}
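For example, a chat application might keep one ContextManager per active conversation and ask it for context on every turn; the manager only re-fetches when the query changes or the refresh interval elapses. A brief sketch (the query string and options are illustrative):
const manager = new ContextManager('mos_live_<your_key>', 30000);

async function handleTurn(userMessage) {
  // Served from the manager's cache unless the query changed or the data is stale
  const context = await manager.getContext(userMessage, { max_tokens: 2000 });
  console.log(`Using ${context.token_count} tokens of context`);
  return context.context;
}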