Memory OS

Context API

The Context API retrieves relevant memories and formats them for direct injection into LLM prompts. It handles token budgeting, ranking, and formatting so you can focus on building your application.

Get LLM Context

HTTP
POST /api/v1/context

Retrieves semantically relevant memories formatted for LLM consumption.

Required Scope: search:read

Request Body

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| query | string | Yes | - | The context query (typically the user's current question) |
| max_tokens | integer | No | 4000 | Maximum tokens for the context window |
| tier | string | No | - | Filter by tier: short, medium, long |
| format | string | No | text | Output format: text or json |

Response

| Field | Type | Description |
|---|---|---|
| context | string | Formatted context ready for LLM injection |
| memories | array | Source memories with relevance info |
| token_count | integer | Estimated token count of the context |
| retrieval_time_ms | integer | Time spent retrieving context |

Memory Object in Response

| Field | Type | Description |
|---|---|---|
| id | string | Memory ID |
| content | string | Memory content |
| tier | string | Memory tier |
| score | number | Combined relevance score |

cURL Example

Bash
curl -X POST "https://api.mymemoryos.com/api/v1/context" \
  -H "Authorization: Bearer mos_live_<your_key>" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What does the user prefer for their development environment?",
    "max_tokens": 2000,
    "format": "text"
  }'

JavaScript Example

JavaScript
const response = await fetch('https://api.mymemoryos.com/api/v1/context', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer mos_live_<your_key>',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    query: 'What does the user prefer for their development environment?',
    max_tokens: 2000,
    format: 'text'
  })
});

const { data } = await response.json();
console.log(`Retrieved ${data.memories.length} memories (${data.token_count} tokens)`);
console.log('Context:', data.context);

Python Example

Python
import requests

response = requests.post(
    'https://api.mymemoryos.com/api/v1/context',
    headers={
        'Authorization': 'Bearer mos_live_<your_key>',
        'Content-Type': 'application/json'
    },
    json={
        'query': 'What does the user prefer for their development environment?',
        'max_tokens': 2000,
        'format': 'text'
    }
)

data = response.json()['data']
print(f"Retrieved {len(data['memories'])} memories ({data['token_count']} tokens)")
print(f"Context: {data['context']}")

Response Example (Text Format)

JSON
{
  "data": {
    "context": "User prefers dark mode interfaces and minimal UI designs\n\nUser uses VS Code as their primary IDE with Vim keybindings\n\nUser prefers TypeScript over JavaScript for all projects",
    "memories": [
      {
        "id": "550e8400-e29b-41d4-a716-446655440000",
        "content": "User prefers dark mode interfaces and minimal UI designs",
        "tier": "long",
        "score": 0.92
      },
      {
        "id": "550e8400-e29b-41d4-a716-446655440001",
        "content": "User uses VS Code as their primary IDE with Vim keybindings",
        "tier": "long",
        "score": 0.88
      },
      {
        "id": "550e8400-e29b-41d4-a716-446655440002",
        "content": "User prefers TypeScript over JavaScript for all projects",
        "tier": "long",
        "score": 0.85
      }
    ],
    "token_count": 156,
    "retrieval_time_ms": 85
  },
  "meta": {
    "request_id": "req_abc123",
    "latency_ms": 95
  }
}

Response Example (JSON Format)

JSON
{
  "data": {
    "context": "[{\"id\":\"550e8400-e29b-41d4-a716-446655440000\",\"content\":\"User prefers dark mode interfaces and minimal UI designs\",\"tier\":\"long\",\"score\":0.92}]",
    "memories": [...],
    "token_count": 245,
    "retrieval_time_ms": 85
  },
  "meta": {
    "request_id": "req_abc123",
    "latency_ms": 95
  }
}
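
When format is json, the context field is itself a JSON-encoded string rather than prose. A minimal sketch of turning it back into objects on the client, assuming the array shape shown above:

JavaScript
// `data` is the `data` object from a request made with format: 'json'
const structuredMemories = JSON.parse(data.context);

// Each entry mirrors the memory object: { id, content, tier, score }
for (const memory of structuredMemories) {
  console.log(`[${memory.tier}] ${memory.score.toFixed(2)} - ${memory.content}`);
}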

Injecting Context into LLM Prompts

The Context API is designed for direct integration with LLM applications. Here are common patterns:

Basic Prompt Injection

JavaScript
async function generateResponse(userMessage) {
  // 1. Get relevant context
  const contextResponse = await fetch('https://api.mymemoryos.com/api/v1/context', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer mos_live_<your_key>',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      query: userMessage,
      max_tokens: 2000
    })
  });

  const { data: contextData } = await contextResponse.json();

  // 2. Build the prompt with context
  const systemPrompt = `You are a helpful assistant with access to user memories and preferences.

Here is relevant context about the user:
<context>
${contextData.context}
</context>

Use this context to personalize your response when relevant.`;

  // 3. Call your LLM
  const completion = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [
      { role: 'system', content: systemPrompt },
      { role: 'user', content: userMessage }
    ]
  });

  return completion.choices[0].message.content;
}

With Tier Prioritization

JavaScript
async function getLayeredContext(query, tokenBudget = 4000) {
  // Allocate budget across tiers
  const budgets = {
    long: Math.floor(tokenBudget * 0.5),   // 50% for core facts
    medium: Math.floor(tokenBudget * 0.3), // 30% for recent context
    short: Math.floor(tokenBudget * 0.2)   // 20% for session context
  };

  const contexts = await Promise.all([
    fetch('https://api.mymemoryos.com/api/v1/context', {
      method: 'POST',
      headers: {
        'Authorization': 'Bearer mos_live_<your_key>',
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        query,
        max_tokens: budgets.long,
        tier: 'long'
      })
    }).then(r => r.json()),

    fetch('https://api.mymemoryos.com/api/v1/context', {
      method: 'POST',
      headers: {
        'Authorization': 'Bearer mos_live_<your_key>',
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        query,
        max_tokens: budgets.medium,
        tier: 'medium'
      })
    }).then(r => r.json()),

    fetch('https://api.mymemoryos.com/api/v1/context', {
      method: 'POST',
      headers: {
        'Authorization': 'Bearer mos_live_<your_key>',
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        query,
        max_tokens: budgets.short,
        tier: 'short'
      })
    }).then(r => r.json())
  ]);

  return {
    longTerm: contexts[0].data.context,
    mediumTerm: contexts[1].data.context,
    shortTerm: contexts[2].data.context,
    totalTokens: contexts.reduce((sum, c) => sum + c.data.token_count, 0)
  };
}

Structured Prompt Template

JavaScript
function buildPromptWithContext(context, userMessage) {
  return `# System Instructions
You are an AI assistant with persistent memory about this user.

# Long-Term Knowledge
These are established facts about the user:
${context.longTerm || 'No long-term memories available.'}

# Recent Context
Recent interactions and temporary preferences:
${context.mediumTerm || 'No recent context available.'}

# Current Session
Information from this conversation:
${context.shortTerm || 'No session context available.'}

# User Message
${userMessage}

# Instructions
- Use the context above to personalize your response
- Reference specific memories when relevant
- If context contradicts the user's current request, prioritize the request
- Do not fabricate information not present in the context`;
}
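
The layered context returned by getLayeredContext plugs directly into this template. Here is a sketch of the two helpers used together; the openai client is assumed to be configured as in the basic example above:

JavaScript
async function respondWithLayeredContext(userMessage) {
  // 1. Retrieve context with per-tier token budgets
  const context = await getLayeredContext(userMessage, 4000);

  // 2. Render the structured prompt template
  const prompt = buildPromptWithContext(context, userMessage);

  // 3. Call the LLM with the assembled prompt
  const completion = await openai.chat.completions.create({
    model: 'gpt-4',
    messages: [{ role: 'user', content: prompt }]
  });

  return completion.choices[0].message.content;
}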

Token Budgeting Strategies

Effective token budgeting ensures you get the most relevant context without exceeding LLM limits.

Model Context Windows

| Model | Max Context | Recommended Context Budget |
|---|---|---|
| GPT-4 | 8K / 32K / 128K | 2K-4K / 8K-16K / 32K-64K |
| GPT-3.5 | 4K / 16K | 1K-2K / 4K-8K |
| Claude 3 | 200K | 20K-50K |
| Llama 3 | 8K | 2K-4K |

Budget Allocation Formula

JavaScript
function calculateContextBudget(modelMaxTokens, expectedResponseTokens = 1000) {
  const systemPromptTokens = 500; // Reserve for system instructions
  const bufferTokens = 200;       // Safety buffer

  const availableForContext = modelMaxTokens
    - systemPromptTokens
    - expectedResponseTokens
    - bufferTokens;

  return Math.floor(availableForContext * 0.8); // Use 80% of available
}

// Example: GPT-4 8K model
const budget = calculateContextBudget(8192);
console.log(`Context budget: ${budget} tokens`); // 5193 tokens

Dynamic Budget Based on Query

JavaScript
async function getContextWithDynamicBudget(query, options = {}) {
  const {
    model = 'gpt-4',
    expectedResponseLength = 'medium'
  } = options;

  const modelLimits = {
    'gpt-4': 8192,
    'gpt-4-32k': 32768,
    'gpt-4-turbo': 128000,
    'gpt-3.5-turbo': 4096,
    'gpt-3.5-turbo-16k': 16384
  };

  const responseBudgets = {
    short: 500,
    medium: 1000,
    long: 2000
  };

  const maxTokens = modelLimits[model] || 8192;
  const responseTokens = responseBudgets[expectedResponseLength] || 1000;
  const contextBudget = calculateContextBudget(maxTokens, responseTokens);

  return fetch('https://api.mymemoryos.com/api/v1/context', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer mos_live_<your_key>',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({
      query,
      max_tokens: contextBudget
    })
  }).then(r => r.json());
}

Caching Context

For applications with repeated similar queries, cache context responses:

JavaScript
class ContextCache {
  constructor(ttlMs = 60000) { // 1 minute default TTL
    this.cache = new Map();
    this.ttlMs = ttlMs;
  }

  getCacheKey(query, options) {
    return JSON.stringify({
      query: query.toLowerCase().trim(),
      max_tokens: options.max_tokens,
      tier: options.tier,
      format: options.format
    });
  }

  get(query, options) {
    const key = this.getCacheKey(query, options);
    const entry = this.cache.get(key);

    if (entry && Date.now() - entry.timestamp < this.ttlMs) {
      return entry.data;
    }

    return null;
  }

  set(query, options, data) {
    const key = this.getCacheKey(query, options);
    this.cache.set(key, {
      data,
      timestamp: Date.now()
    });
  }

  clear() {
    this.cache.clear();
  }
}

const contextCache = new ContextCache();

async function getCachedContext(query, options = {}) {
  const cached = contextCache.get(query, options);
  if (cached) {
    return cached;
  }

  const response = await fetch('https://api.mymemoryos.com/api/v1/context', {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer mos_live_<your_key>',
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ query, ...options })
  });

  const { data } = await response.json();
  contextCache.set(query, options, data);

  return data;
}

Best Practices

1. Query Design

Write context queries that match how memories are stored:

JavaScript
// Good: Specific, matches memory content style
const query = "user's programming language preferences and coding style";

// Less effective: Too vague
const query = "preferences";

2. Token Estimation

Memory OS estimates tokens as characters / 4. For more accurate counting:

JavaScript
import { encoding_for_model } from 'tiktoken';

function countTokens(text, model = 'gpt-4') {
  const enc = encoding_for_model(model);
  const count = enc.encode(text).length;
  enc.free(); // release the WASM-backed encoder when done
  return count;
}
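
If you only need a rough client-side number that mirrors the server-side heuristic, a character-based estimate is enough. The exact rounding Memory OS uses is an assumption here:

JavaScript
// Approximates the characters / 4 heuristic; rounding up is an assumption
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}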

3. Fallback Handling

Handle cases where context retrieval fails or returns empty:

JavaScript
async function getContextSafely(query, options = {}) {
  try {
    const response = await fetch('https://api.mymemoryos.com/api/v1/context', {
      method: 'POST',
      headers: {
        'Authorization': 'Bearer mos_live_<your_key>',
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({ query, ...options })
    });

    if (!response.ok) {
      console.warn('Context API error:', response.status);
      return { context: '', memories: [], token_count: 0 };
    }

    const { data } = await response.json();

    if (!data.memories.length) {
      console.info('No relevant memories found');
    }

    return data;
  } catch (error) {
    console.error('Context retrieval failed:', error);
    return { context: '', memories: [], token_count: 0 };
  }
}
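
A caller can then degrade gracefully, omitting the context block from the prompt when nothing was retrieved. A sketch building on the basic injection pattern above:

JavaScript
async function buildSystemPrompt(userMessage) {
  const data = await getContextSafely(userMessage, { max_tokens: 2000 });

  // Omit the <context> section entirely when no memories were returned
  const contextBlock = data.context
    ? `\n\nHere is relevant context about the user:\n<context>\n${data.context}\n</context>`
    : '';

  return `You are a helpful assistant with access to user memories.${contextBlock}\n\nUse this context to personalize your response when relevant.`;
}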

4. Context Freshness

For time-sensitive applications, consider re-fetching context periodically:

JavaScript
class ContextManager {
  constructor(apiKey, refreshInterval = 30000) { // refresh at most every 30 seconds by default
    this.apiKey = apiKey;
    this.currentContext = null;
    this.lastQuery = null;
    this.lastFetch = 0;
    this.refreshInterval = refreshInterval;
  }

  async getContext(query, options = {}) {
    const queryChanged = query !== this.lastQuery;
    const needsRefresh = !this.currentContext ||
      queryChanged ||
      Date.now() - this.lastFetch > this.refreshInterval;

    if (needsRefresh) {
      this.currentContext = await this.fetchContext(query, options);
      this.lastQuery = query;
      this.lastFetch = Date.now();
    }

    return this.currentContext;
  }

  async fetchContext(query, options) {
    const response = await fetch('https://api.mymemoryos.com/api/v1/context', {
      method: 'POST',
      headers: {
        'Authorization': `Bearer ${this.apiKey}`,
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({ query, ...options })
    });

    const { data } = await response.json();
    return data;
  }
}
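
A sketch of wiring the manager into a chat handler; the generateResponseWithContext helper is hypothetical and stands in for the prompt-building and LLM call shown in the earlier examples:

JavaScript
const contextManager = new ContextManager('mos_live_<your_key>');

async function handleUserMessage(userMessage) {
  // Re-fetches only when the query changes or the refresh interval elapses
  const context = await contextManager.getContext(userMessage, { max_tokens: 2000 });

  // Hypothetical helper: build the prompt and call the LLM as shown earlier
  return generateResponseWithContext(context, userMessage);
}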