Memory OS

Relevance Scoring

Memory OS uses a multi-factor scoring algorithm to rank memories during retrieval. This ensures that the most relevant, recent, and important memories are surfaced first, mimicking how human memory prioritizes information.

Overview

The combined relevance score is calculated from six weighted factors:

Factor                 Weight   Description
Semantic Similarity    40%      How closely the memory matches the query meaning
Recency                20%      How recently the memory was created or accessed
Importance             15%      Explicit importance score assigned to the memory
Access Frequency       10%      How often the memory has been retrieved
User Feedback          10%      Signals from useful/not useful ratings
Entity Co-occurrence   5%       Shared entities between query and memory

The Scoring Formula

TEXT
combined_score = (0.40 * similarity) +
                 (0.20 * recency_score) +
                 (0.15 * importance_score) +
                 (0.10 * access_score) +
                 (0.10 * feedback_score) +
                 (0.05 * entity_score)

All component scores are normalized to a 0-1 range, resulting in a final score between 0 and 1.
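
To make the weighting concrete, here is a minimal sketch of the combination as a function. It is illustrative only (a hypothetical helper, not part of the SDK), assuming each component score is already normalized to 0-1.

JavaScript
// Hypothetical helper illustrating the weighted combination; not part of the SDK.
// Each component score is assumed to already be normalized to the 0-1 range.
function combinedScore({ similarity, recency, importance, access, feedback, entity }) {
  return (
    0.40 * similarity +
    0.20 * recency +
    0.15 * importance +
    0.10 * access +
    0.10 * feedback +
    0.05 * entity
  );
}

// A semantically strong, recently accessed memory with default importance
combinedScore({
  similarity: 0.89,
  recency: 0.75,
  importance: 0.5,
  access: 0.3,
  feedback: 0.5,
  entity: 0
});
// => ~0.66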

Factor Details

1. Semantic Similarity (40%)

The largest factor is semantic similarity, calculated using vector embeddings. When you search, your query is converted to an embedding and compared against stored memory embeddings using cosine similarity.

How it works:

  • Query text is embedded using the same model as stored memories
  • Cosine similarity is computed between query and memory embeddings
  • Higher similarity means the memory content is more semantically related

Example:

TEXT
Query: "What programming languages does the user prefer?"
Memory: "User works primarily with Python and TypeScript"
Similarity: 0.87 (high - related to programming languages)

Memory: "User prefers dark mode in their IDE"
Similarity: 0.45 (low - related to user preferences but not languages)
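
For reference, cosine similarity can be computed as in the minimal sketch below. The helper is illustrative; Memory OS handles embedding and comparison internally, and real embeddings have hundreds of dimensions.

JavaScript
// Illustrative cosine similarity between two embedding vectors (hypothetical helper).
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy 3-dimensional vectors for illustration
cosineSimilarity([0.2, 0.8, 0.1], [0.25, 0.7, 0.3]); // => ~0.96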

2. Recency (20%)

Recent memories are often more relevant. The recency score decays exponentially based on the memory's age and tier.

Decay rates by tier:

  • Short-term: Half-life of 6 hours
  • Medium-term: Half-life of 7 days
  • Long-term: Half-life of 90 days

Formula:

TEXT
recency_score = exp(-decay_rate * hours_since_access)

A memory accessed 1 hour ago scores higher than one accessed a week ago, assuming both are in the same tier.
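
A minimal sketch of this calculation, assuming each tier's decay rate is derived from its half-life (decay_rate = ln(2) / half-life in hours); the helper is hypothetical, not part of the SDK:

JavaScript
// Hypothetical helper illustrating tier-based recency decay.
// The decay rate is derived from the half-life so the score halves once per half-life.
const HALF_LIFE_HOURS = { short: 6, medium: 7 * 24, long: 90 * 24 };

function recencyScore(tier, hoursSinceAccess) {
  const decayRate = Math.log(2) / HALF_LIFE_HOURS[tier];
  return Math.exp(-decayRate * hoursSinceAccess);
}

recencyScore("short", 1);   // => ~0.89 (accessed 1 hour ago)
recencyScore("short", 168); // => ~0.00 (accessed 1 week ago; short-term has fully decayed)
recencyScore("long", 168);  // => ~0.95 (accessed 1 week ago; long-term barely decays)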

3. Importance Score (15%)

Each memory has an explicit importance score (0-1) that you can set during creation or update. This lets you manually boost critical memories.

Default values:

  • New memories start at 0.5
  • Promote to 0.7-0.9 for important information
  • Demote to 0.2-0.4 for nice-to-have context
JavaScript
// Create a high-importance memory
await client.memories.create({
  content: "User is the CEO and needs executive-level responses",
  tier: "long",
  content_type: "fact",
  importance_score: 0.95,
  metadata: { category: "user-profile" }
});

// Update importance based on feedback
await client.memories.update("memory-id", {
  importance_score: 0.8  // Boost after positive feedback
});
Python
# Create a high-importance memory
client.memories.create(
    content="User is the CEO and needs executive-level responses",
    tier="long",
    content_type="fact",
    importance_score=0.95,
    metadata={"category": "user-profile"}
)

# Update importance based on feedback
client.memories.update("memory-id",
    importance_score=0.8  # Boost after positive feedback
)
Bash
# Create a high-importance memory
curl -X POST https://api.mymemoryos.com/v1/memories \
  -H "Authorization: Bearer $MEMORY_OS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "User is the CEO and needs executive-level responses",
    "tier": "long",
    "content_type": "fact",
    "importance_score": 0.95,
    "metadata": {"category": "user-profile"}
  }'

4. Access Frequency (10%)

Memories that are frequently retrieved are likely more relevant. Access frequency is tracked automatically and normalized.

How it works:

  • Each retrieval increments access_count
  • Score is calculated as min(1, access_count / 20)
  • The score reaches its maximum of 1.0 at 20 accesses, preventing runaway scores

This creates a self-reinforcing loop where useful memories become easier to find.
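
A minimal sketch of this normalization (a hypothetical helper, shown for illustration):

JavaScript
// Hypothetical helper illustrating access-frequency normalization.
// The score grows linearly with access_count and caps at 1.0 after 20 accesses.
function accessScore(accessCount) {
  return Math.min(1, accessCount / 20);
}

accessScore(3);  // => 0.15
accessScore(20); // => 1.0
accessScore(50); // => 1.0 (capped)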

5. User Feedback (10%)

Explicit feedback from users or your application adjusts memory relevance.

Feedback types:

  • useful: Boosts relevance score
  • not_useful: Decreases relevance score
  • outdated: Marks for review/decay
  • incorrect: Significantly penalizes the memory
JavaScript
// Record positive feedback
await client.feedback.create({
  memory_id: "memory-id",
  type: "useful",
  context: "User found this information helpful"
});

// Record negative feedback
await client.feedback.create({
  memory_id: "memory-id",
  type: "not_useful",
  context: "Information was outdated"
});
Python
# Record positive feedback
client.feedback.create(
    memory_id="memory-id",
    type="useful",
    context="User found this information helpful"
)

# Record negative feedback
client.feedback.create(
    memory_id="memory-id",
    type="not_useful",
    context="Information was outdated"
)
Bash
# Record positive feedback
curl -X POST https://api.mymemoryos.com/v1/feedback \
  -H "Authorization: Bearer $MEMORY_OS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "memory_id": "memory-id",
    "type": "useful",
    "context": "User found this information helpful"
  }'

6. Entity Co-occurrence (5%)

Memories that share entities with the query receive a boost. Entities are extracted automatically from memory content.

Example:

TEXT
Query: "What does Sarah think about the React migration?"
Entities: ["Sarah", "React"]

Memory 1: "Sarah mentioned concerns about the migration timeline"
Entities: ["Sarah", "migration"]
Entity overlap: 1 (Sarah)
Entity score: 0.5

Memory 2: "The React migration is scheduled for Q2"
Entities: ["React", "migration", "Q2"]
Entity overlap: 1 (React)
Entity score: 0.5

Memory 3: "Sarah prefers Vue over React"
Entities: ["Sarah", "Vue", "React"]
Entity overlap: 2 (Sarah, React)
Entity score: 1.0
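
The scores in this example are consistent with dividing the overlap by the number of query entities. A minimal sketch under that assumption (hypothetical helper, not part of the SDK):

JavaScript
// Hypothetical helper illustrating entity co-occurrence scoring.
// Assumes the score is the fraction of query entities that also appear in the memory.
function entityScore(queryEntities, memoryEntities) {
  const memorySet = new Set(memoryEntities);
  const overlap = queryEntities.filter((e) => memorySet.has(e)).length;
  return queryEntities.length ? overlap / queryEntities.length : 0;
}

entityScore(["Sarah", "React"], ["Sarah", "migration"]);    // => 0.5 (Memory 1)
entityScore(["Sarah", "React"], ["Sarah", "Vue", "React"]); // => 1.0 (Memory 3)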

Understanding Search Results

Search results include individual scores for transparency:

JSON
{
  "results": [
    {
      "id": "mem_123",
      "content": "User prefers Python for data analysis work",
      "similarity": 0.89,
      "combined_score": 0.82,
      "relevance_score": 0.75,
      "tier": "long",
      "created_at": "2024-01-10T15:30:00Z"
    }
  ],
  "search_type": "semantic",
  "threshold": 0.7
}
  • similarity: Raw semantic similarity (0-1)
  • combined_score: Final weighted score used for ranking
  • relevance_score: Stored relevance score (updated by decay and feedback)

Tuning for Your Use Case

Different applications may need to emphasize different factors. Here are common tuning patterns:

Real-time Chat (Prioritize Recency)

For chatbots where recent context matters most:

JavaScript
// Emphasize short-term memories and recent access
const results = await client.search({
  query: "What is the user currently working on?",
  tier: "short",  // Only look at short-term memories
  limit: 5
});

Knowledge Base (Prioritize Importance)

For FAQ or documentation systems where accuracy trumps recency:

JavaScript
// Focus on long-term semantic memories with high importance
const results = await client.search({
  query: "How does authentication work?",
  tier: "long",
  memory_nature: "semantic",
  threshold: 0.8  // Higher threshold for precision
});

Personalization (Balance All Factors)

For personalization engines that need both history and preferences:

JavaScript
// Use default scoring but filter by relevant metadata
const context = await client.getContext({
  query: "Personalize the homepage for this user",
  max_tokens: 3000
  // Default scoring works well for personalization
});

Minimum Thresholds

Use the threshold parameter to set a minimum combined score:

JavaScript
// Only return highly relevant memories
const results = await client.search({
  query: "User preferences",
  threshold: 0.8,  // 80% minimum combined score
  limit: 10
});

// More lenient for broad context
const broadResults = await client.search({
  query: "Any relevant user information",
  threshold: 0.5,  // 50% minimum
  limit: 50
});
Python
# Only return highly relevant memories
results = client.search(
    query="User preferences",
    threshold=0.8,  # 80% minimum combined score
    limit=10
)

# More lenient for broad context
broad_results = client.search(
    query="Any relevant user information",
    threshold=0.5,  # 50% minimum
    limit=50
)
Bash
# Only return highly relevant memories
curl -X POST https://api.mymemoryos.com/v1/search \
  -H "Authorization: Bearer $MEMORY_OS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "User preferences",
    "threshold": 0.8,
    "limit": 10
  }'

Best Practices

1. Set Meaningful Importance Scores

Don't leave everything at the default 0.5. Actively manage importance:

JavaScript
// Critical information
{ importance_score: 0.9 }  // User identity, key preferences

// Standard information
{ importance_score: 0.5 }  // Regular interactions

// Low-priority context
{ importance_score: 0.3 }  // Nice-to-have details

2. Collect Feedback

Implement feedback loops to improve relevance over time:

JavaScript
// After using a memory successfully
if (userFoundResponseHelpful) {
  await client.feedback.create({
    memory_id: usedMemoryId,
    type: "useful"
  });
}

3. Use Appropriate Tiers

The tier affects decay rate, which impacts recency scoring:

  • Short-term memories decay quickly, keeping them relevant only briefly
  • Long-term memories decay slowly, remaining relevant for months

4. Monitor Combined Scores

Track the combined scores of retrieved memories to calibrate your thresholds:

JavaScript
const results = await client.search({ query, limit: 10 });

// Log score distribution
const scores = results.results.map(r => r.combined_score);
console.log(`Score range: ${Math.min(...scores)} - ${Math.max(...scores)}`);
console.log(`Average: ${scores.reduce((a,b) => a+b, 0) / scores.length}`);