AI Agent Memory System Design: Short-term Memory + Long-term Memory + Vector Retrieval

When building an AI Agent, the design of its memory system is what separates a "dumb" agent that only remembers the current conversation from a smart one that can learn from the past. This article walks you through building a complete 3-tier memory system with HolySheep AI, an AI API platform that advertises costs up to 85% lower than other providers.

The real problem: when the agent remembers nothing

Have you ever run into errors like these while working with an AI Agent?

ConnectionError: Failed to establish a new connection: 
Connection timeout after 30 seconds

During handling of the above exception, another exception occurred:

httpx.ConnectTimeout: Connection timeout
API Request Failed: Maximum retries exceeded

Or worse, an authentication error that breaks off the entire conversation:

401 Unauthorized: Invalid API key or expired token
{
  "error": {
    "message": "Incorrect API key provided",
    "type": "invalid_request_error",
    "code": "invalid_api_key"
  }
}

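Errors like these come from the network or from authentication, not from memory design as such. But when an agent keeps all of its state only inside the request in flight, every such failure wipes out the conversation. Without an effective memory management mechanism, the agent cannot maintain context, does not retain important information, and has to start over from scratch each time. Let's design a complete solution.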

Overview of the 3-tier memory system architecture

An AI Agent's memory system is split into 3 tiers, each serving a different purpose:

Tier 1 - Short-term Memory: stores the context of the current conversation; its capacity is limited
Tier 2 - Long-term Memory: stores important information across many sessions
Tier 3 - Vector Retrieval: semantic search over the history, returning the most relevant results
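To make the division of labour concrete, here is a minimal sketch of how the three tiers might feed a single chat request. The function name `assemble_context`, the section labels, and the choice to fold recalled items into the system message are illustrative assumptions, not part of the HolySheep API; the output uses the common `role`/`content` chat message format that the code below also sends.

```python
from typing import Dict, List


def assemble_context(
    system_prompt: str,
    long_term_facts: List[str],    # Tier 2: durable facts about the user
    retrieved: List[str],          # Tier 3: semantically similar past snippets
    recent: List[Dict[str, str]],  # Tier 1: messages from the current session
) -> List[Dict[str, str]]:
    """Hypothetical helper: merge the three memory tiers into one message list."""
    parts = [system_prompt]
    if long_term_facts:
        parts.append("Known facts about the user:\n" + "\n".join(f"- {f}" for f in long_term_facts))
    if retrieved:
        parts.append("Relevant past context:\n" + "\n".join(f"- {r}" for r in retrieved))
    # Recalled memory goes into the system message; the short-term buffer
    # follows verbatim so the order of the live dialogue is preserved.
    messages = [{"role": "system", "content": "\n\n".join(parts)}]
    messages.extend(recent)
    return messages


ctx = assemble_context(
    "You are a helpful assistant.",
    ["Prefers answers in Vietnamese"],
    ["Last week the user asked about SQLite indexes"],
    [{"role": "user", "content": "How do I speed up my queries?"}],
)
print(len(ctx), ctx[0]["role"], ctx[-1]["role"])  # prints: 2 system user
```

Keeping recalled memory in the system message separates it from the actual dialogue; a real agent would also need to keep the combined size within the model's token budget.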

Implementing Short-term Memory

Short-term memory acts as a buffer that holds the messages of the current session. When the buffer fills up, we need a mechanism to compress it or dump it into long-term memory.
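An alternative to summarizing the whole buffer at once is a sliding window: keep the newest messages that fit a token budget and hand the evicted ones to long-term storage. The sketch below assumes the same rough 4-characters-per-token heuristic used by `estimate_tokens`; `rough_tokens` and `trim_to_budget` are hypothetical names.

```python
from typing import Dict, List, Tuple

Message = Dict[str, str]


def rough_tokens(text: str) -> int:
    # Same crude heuristic as ShortTermMemory.estimate_tokens: ~4 chars per token
    return len(text) // 4


def trim_to_budget(messages: List[Message], max_tokens: int) -> Tuple[List[Message], List[Message]]:
    """Keep the newest messages within max_tokens; return (kept, evicted)."""
    kept: List[Message] = []
    used = 0
    # Walk from newest to oldest so recent turns win when space runs out
    for i in range(len(messages) - 1, -1, -1):
        cost = rough_tokens(messages[i]["content"])
        if used + cost > max_tokens:
            # This message and everything older are evicted, in original order
            return kept[::-1], messages[: i + 1]
        kept.append(messages[i])
        used += cost
    # Everything fits: nothing to evict
    return kept[::-1], []


msgs = [
    {"role": "user", "content": "a" * 40},       # ~10 tokens
    {"role": "assistant", "content": "b" * 40},  # ~10 tokens
    {"role": "user", "content": "c" * 40},       # ~10 tokens
]
kept, evicted = trim_to_budget(msgs, max_tokens=25)
print(len(kept), len(evicted))  # prints: 2 1
```

The evicted list is what a caller would summarize or store in long-term memory before discarding it.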

from datetime import datetime
from typing import Callable, List, Dict, Optional
from collections import deque

class ShortTermMemory:
    """Short-term memory: holds the context of the current conversation"""

    def __init__(self, max_messages: int = 20, max_tokens: int = 4000,
                 summarizer: Optional[Callable[[List[Dict]], str]] = None):
        self.max_messages = max_messages
        self.max_tokens = max_tokens
        # deque(maxlen=...) silently drops the oldest message once it is full
        self.messages = deque(maxlen=max_messages)
        self.session_id = datetime.now().strftime("%Y%m%d_%H%M%S")
        self.created_at = datetime.now()
        # Optional callable: list of messages -> summary text (e.g. an LLM call).
        # This parameter is an addition; without it a simple fallback is used.
        self.summarizer = summarizer

    def add_message(self, role: str, content: str, metadata: Optional[Dict] = None):
        """Add a message; returns compression info if the token limit was exceeded"""
        message = {
            "role": role,
            "content": content,
            "timestamp": datetime.now().isoformat(),
            "metadata": metadata or {}
        }
        self.messages.append(message)

        # Compress once the estimated token count exceeds the limit
        if self.estimate_tokens() > self.max_tokens:
            return self.compress()
        return None

    def estimate_tokens(self) -> int:
        """Rough estimate at ~4 characters per token (typical for English).
        Vietnamese text usually needs more tokens per character, so this
        tends to underestimate for it."""
        total_chars = sum(len(m["content"]) for m in self.messages)
        return total_chars // 4

    def get_context(self, system_prompt: str = "") -> List[Dict]:
        """Build the full message list to send to the API"""
        context = []
        if system_prompt:
            context.append({"role": "system", "content": system_prompt})
        # Send only role/content: some chat APIs reject unknown message fields
        context.extend({"role": m["role"], "content": m["content"]}
                       for m in self.messages)
        return context

    def compress(self) -> Dict:
        """Replace the buffer with a summary when the limit is exceeded"""
        old_messages = list(self.messages)
        summary = self._create_summary(old_messages)

        # Reset the buffer, keeping only the summary
        self.messages.clear()
        self.messages.append({
            "role": "system",
            "content": f"[SUMMARY OF EARLIER CONVERSATION]: {summary}",
            "timestamp": datetime.now().isoformat(),
            "metadata": {}
        })

        return {"action": "compressed", "summary": summary}

    def _create_summary(self, messages: List[Dict]) -> str:
        """Summarize messages, using the AI summarizer if one was provided"""
        if self.summarizer is not None:
            return self.summarizer(messages)
        # Fallback without an API call: keep the first 200 characters of each message
        return " | ".join(f"{m['role']}: {m['content'][:200]}" for m in messages)
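To have `_create_summary` produce an AI-written summary, one option is to send the buffered messages as a single summarization request. The helper below only builds the `messages` list; its name and the prompt wording are assumptions. The result could then be passed to a chat completion call such as `call_holysheep_api` in the next section.

```python
from typing import Dict, List


def build_summary_request(messages: List[Dict[str, str]], max_words: int = 150) -> List[Dict[str, str]]:
    """Hypothetical helper: turn buffered chat turns into one summarization prompt."""
    # Flatten the transcript as "role: content" lines, skipping empty messages
    transcript = "\n".join(
        f"{m['role']}: {m['content']}" for m in messages if m.get("content")
    )
    return [
        {
            "role": "system",
            "content": (
                f"Summarize the conversation below in at most {max_words} words. "
                "Keep user preferences, decisions and open questions."
            ),
        },
        {"role": "user", "content": transcript},
    ]


request = build_summary_request([
    {"role": "user", "content": "I prefer short answers."},
    {"role": "assistant", "content": "Understood."},
])
```

Asking the model to keep preferences and decisions biases the summary toward exactly the kind of facts that are worth carrying into long-term memory later.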

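Using it with the HolySheep API

import os
from typing import Dict, List, Optional

import httpx

def call_holysheep_api(messages: List[Dict], model: str = "deepseek-chat",
                       api_key: Optional[str] = None) -> str:
    """Send a chat completion request to HolySheep and return the reply text"""
    url = "https://api.holysheep.ai/v1/chat/completions"
    # Read the key from an environment variable (the variable name is our
    # choice) instead of hard-coding it; falls back to the placeholder
    api_key = api_key or os.environ.get("HOLYSHEEP_API_KEY", "YOUR_HOLYSHEEP_API_KEY")
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json"
    }
    payload = {
        "model": model,
        "messages": messages,
        "temperature": 0.7
    }

    try:
        with httpx.Client(timeout=30.0) as client:
            response = client.post(url, headers=headers, json=payload)
            response.raise_for_status()
            return response.json()["choices"][0]["message"]["content"]
    except httpx.ConnectTimeout:
        # Transient network failure: safe to retry with backoff at the call site
        print("Connection timeout: retry with backoff")
        raise
    except httpx.HTTPStatusError as e:
        if e.response.status_code == 401:
            # Retrying will not help: the API key is missing, wrong, or expired
            print("Authentication failed: check the API key")
        raise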
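The timeout handler above only prints a message and re-raises. A minimal sketch of retrying with exponential backoff and jitter follows; `with_retries` is a hypothetical helper. Its default `retry_on` catches only built-in connection and timeout errors so the block runs without httpx; with the function above you would pass httpx's exception types instead. A 401 should not be retried, since a bad key fails every time.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    fn: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying transient failures with exponential backoff + jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # give up: surface the last error to the caller
            # Delay doubles each attempt (0.5s, 1s, 2s, ...), capped at max_delay;
            # "full jitter" sleeps a random time in [0, delay] so clients desynchronize
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, delay))
    raise RuntimeError("unreachable")  # the loop always returns or raises
```

Usage with the HolySheep call might look like `with_retries(lambda: call_holysheep_api(messages), retry_on=(httpx.ConnectTimeout, httpx.ReadTimeout))`. Exceptions not listed in `retry_on`, such as `httpx.HTTPStatusError` for a 401, propagate immediately.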

Implementing Long-term Memory

Long-term memory uses a database such as SQLite or PostgreSQL to keep important information across sessions. This is where the user's preferences, facts, and knowledge are stored. The implementation below uses SQLite.

import sqlite3
import json
from datetime import datetime
from typing import List, Dict, Optional
from pathlib import Path

class LongTermMemory:
    """Long-term memory: persists information across sessions"""
    
    def __init__(self, db_path: str = "agent_memory.db"):
        self.db_path = db_path
        self._init_database()
    
    def _init_database(self):
        """Create the memories table and its index if they do not exist"""
        conn = sqlite3.connect(self.db_path)
        cursor = conn.cursor()
        
        cursor.execute("""
            CREATE TABLE IF NOT EXISTS memories (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                user_id TEXT NOT NULL,
                memory_type TEXT NOT NULL,
                content TEXT NOT NULL,
                embedding BLOB,
                importance INTEGER DEFAULT 5,
                created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
                updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
                access_count INTEGER DEFAULT 0,
                last_accessed TIMESTAMP
            )
        """)
        
        cursor.execute("""
            CREATE INDEX IF NOT EXISTS idx_user_memory 
            ON memories(user_id, memory_type)
        """)
        
        conn.commit()
        conn.close()
    
    def store(self, user_id: str, memory_type: str, content: str, 
              importance: int = 5, embedding: Optional[List[float]] = None):
        """Store a memory; the embedding (if any) is serialized as JSON text"""
        conn = sqlite3.connect(self.db_path)
        cursor = conn.cursor()
        
        cursor.execute("""
            INSERT INTO memories 
            (user_id, memory_type, content, embedding, importance)
            VALUES (?, ?, ?, ?, ?)
        """, (
            user_id, 
            memory_type, 
            content, 
            json.dumps(embedding) if embedding is not None else None,
            importance
        ))
        
        memory_id = cursor.lastrowid
        conn.commit()
        conn.close()
        
        return memory_id
    
    def retrieve(self, user_id: str, memory_type: Optional[str] = None,
                 limit: int = 10) -> List[Dict]:
        """Retrieve memories, optionally filtered by type, most important first"""
        conn = sqlite3.connect(self.db_path)
        cursor = conn.cursor()
        
        if memory_type:
            cursor.execute("""
                SELECT id, content, memory_type, importance, access_count
                FROM memories 
                WHERE user_id = ? AND memory_type = ?
                ORDER BY importance DESC, created_at DESC
                LIMIT ?
            """, (user_id, memory_type, limit))
        else:
            cursor.execute("""
                SELECT id, content, memory_type, importance, access_count
                FROM memories 
                WHERE user_id = ?
                ORDER BY importance DESC, created_at DESC
                LIMIT ?
            """, (user_id, limit))
        
        results = cursor.fetchall()
        conn.close()
        
        return [
            {
                "id": row[0],
                "content": row[1],
                "type": row[2],
                "importance": row[3],
                "access_count": row[4]
            }
            for row in results
        ]
    
    def update_importance(self, memory_id: int, delta: int):
        """Adjust a memory's importance and record the access"""
        conn = sqlite3.connect(self.db_path)
        cursor = conn.cursor()
        
        cursor.execute("""
            UPDATE memories 
            SET importance = importance + ?,
                access_count = access_count + 1,
                last_accessed = CURRENT_TIMESTAMP
            WHERE id = ?
        """, (delta, memory_id))
        
        conn.commit()
        conn.close()
    
    def forget_old(self, user_id: str, keep_count: int = 100):
        """Delete all but the keep_count most important (then newest) memories"""
        conn = sqlite3.connect(self.db_path)
        cursor = conn.cursor()
        
        cursor.execute("""
            DELETE FROM memories 
            WHERE user_id = ? 
            AND id NOT IN (
                SELECT id FROM memories 
                WHERE user_id = ?
                ORDER BY importance DESC, created_at DESC
                LIMIT ?
            )
        """, (user_id, user_id, keep_count))
        
        deleted = cursor.rowcount
        conn.commit()
        conn.close()
        
        return deleted
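The ranking shared by `retrieve` and `forget_old` (importance first, then recency) can be checked in isolation. The self-contained snippet below recreates a trimmed-down `memories` table in an in-memory SQLite database, fills it with hypothetical sample rows, and runs the same `DELETE ... NOT IN (... LIMIT ?)` statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE memories (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        user_id TEXT NOT NULL,
        content TEXT NOT NULL,
        importance INTEGER DEFAULT 5,
        created_at TIMESTAMP
    )"""
)
# Hypothetical sample data: (user_id, content, importance, created_at)
rows = [
    ("u1", "likes concise answers", 9, "2024-01-01 10:00:00"),
    ("u1", "asked about the weather", 2, "2024-01-02 10:00:00"),
    ("u1", "works with PostgreSQL", 7, "2024-01-03 10:00:00"),
    ("u1", "said hello", 2, "2024-01-04 10:00:00"),
    ("u2", "another user's memory", 1, "2024-01-01 10:00:00"),
]
conn.executemany(
    "INSERT INTO memories (user_id, content, importance, created_at) VALUES (?, ?, ?, ?)",
    rows,
)

# Same pruning statement as LongTermMemory.forget_old, keeping 3 rows for u1
cur = conn.execute(
    """DELETE FROM memories
       WHERE user_id = ?
       AND id NOT IN (
           SELECT id FROM memories WHERE user_id = ?
           ORDER BY importance DESC, created_at DESC
           LIMIT ?
       )""",
    ("u1", "u1", 3),
)
print("deleted:", cur.rowcount)  # prints: deleted: 1
survivors = [r[0] for r in conn.execute(
    "SELECT content FROM memories WHERE user_id = 'u1' "
    "ORDER BY importance DESC, created_at DESC"
)]
print(survivors)  # prints: ['likes concise answers', 'works with PostgreSQL', 'said hello']
```

Among the two importance-2 rows, the newer one survives, and the other user's row is untouched. Because SQLite's `CURRENT_TIMESTAMP` has one-second resolution, rows stored in the same second tie on `created_at`; adding `id DESC` as a final sort key would make the choice deterministic.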

Implementing Vector Retrieval with HolySheep

Semantic search is the heart of a RAG (Retrieval Augmented Generation) system. We use an embedding model to convert text into vectors and then search by cosine similarity.
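Cosine similarity is cos(a, b) = a·b / (‖a‖ ‖b‖). The loop in `search` below computes it one stored vector at a time; with NumPy the scores for the whole store can come from a single matrix-vector product once every row is normalized. `top_k_cosine` is a hypothetical helper, and like `search` it applies the threshold after taking the top k.

```python
from typing import List, Tuple

import numpy as np


def top_k_cosine(
    matrix: np.ndarray, query: np.ndarray, k: int = 5, threshold: float = 0.7
) -> List[Tuple[int, float]]:
    """Return [(row_index, score), ...] for the k rows most similar to query.

    matrix: shape (n_docs, dim), one embedding per row; query: shape (dim,).
    """
    if matrix.shape[0] == 0:
        return []
    # Normalize rows and the query; the small epsilon avoids division by zero
    eps = 1e-12
    m = matrix / (np.linalg.norm(matrix, axis=1, keepdims=True) + eps)
    q = query / (np.linalg.norm(query) + eps)
    scores = m @ q  # cosine similarity of every row with the query
    # Sort descending by score, keep k, then drop anything below the threshold
    order = np.argsort(-scores)[:k]
    return [(int(i), float(scores[i])) for i in order if scores[i] >= threshold]


docs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(top_k_cosine(docs, np.array([1.0, 0.1]), k=2))  # rows 0 and 2, in that order
```

For `VectorMemory`, the matrix would be `np.vstack(self.vectors)`; caching it instead of rebuilding it per query avoids repeated copying as the store grows.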

import httpx
import numpy as np
from typing import List, Dict, Tuple
import json

class VectorMemory:
    """Vector retrieval system for semantic search"""
    
    def __init__(self, api_key: str, embedding_model: str = "embedding-3"):
        self.api_key = api_key
        self.embedding_model = embedding_model
        self.base_url = "https://api.holysheep.ai/v1"
        self.vectors = []  # list of embedding vectors
        self.metadatas = []  # metadata for each vector, in the same order
    
    def get_embedding(self, text: str) -> List[float]:
        """Fetch an embedding vector from the HolySheep API"""
        url = f"{self.base_url}/embeddings"
        headers = {
            "Authorization": f"Bearer {self.api_key}",
            "Content-Type": "application/json"
        }
        payload = {
            "model": self.embedding_model,
            "input": text
        }
        
        with httpx.Client(timeout=30.0) as client:
            response = client.post(url, headers=headers, json=payload)
            response.raise_for_status()
            data = response.json()
            return data["data"][0]["embedding"]
    
    def add(self, text: str, metadata: Dict):
        """Embed a document and add it to the vector store"""
        embedding = self.get_embedding(text)
        self.vectors.append(np.array(embedding))
        self.metadatas.append({
            "text": text,
            **metadata
        })
    
    def search(self, query: str, top_k: int = 5, threshold: float = 0.7) -> List[Dict]:
        """Return up to top_k most similar documents whose score is >= threshold"""
        query_embedding = self.get_embedding(query)
        query_vector = np.array(query_embedding)
        
        # Compute cosine similarity against every stored vector (linear scan)
        similarities = []
        for i, vector in enumerate(self.vectors):
            cos_sim = self._cosine_similarity(query_vector, vector)
            similarities.append((i, cos_sim))
        
        # Sort by similarity, highest first
        similarities.sort(key=lambda x: x[1], reverse=True)
        
        # Return at most top_k results that also clear the threshold
        results = []
        for idx, score in similarities[:top_k]:
            if score >= threshold:
                results.append({
                    "content": self.metadatas[idx]["text"],
                    "score": float(score),
                    **self.metadatas[idx]
                })
        
        return results
    
    def _cosine_similarity(self, a: np.ndarray, b: np.ndarray) -> float:
        """Cosine similarity between two vectors (0.0 if either is all zeros)"""
        norm_a = np.linalg.norm(a)
        norm_b = np.linalg.norm(b)
        if norm_a == 0 or norm_b == 0:
            return 0.0  # avoid division by zero on degenerate vectors
        return float(np.dot(a, b) / (norm_a * norm_b))
    
    def save_index(self, filepath: str):
        """Save the index (vectors and metadata) to a JSON file"""
        data = {
            "vectors": [v.tolist() for v in self.vectors],
            "metadatas": self.metadatas
        }
        with open(filepath, 'w') as f:
            json.dump(data, f)
    
    def load_index(self, filepath: str):
        """Load the index from a JSON file written by save_index"""
        with open(filepath, 'r') as f:
            data = json.load(f)
        self.vectors = [np.array(v) for v in data["vectors"]]
        self.metadatas = data["metadatas"]
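`save_index` writes every vector as a JSON list of floats, which is easy to inspect but large and slow to parse once the store grows. A hedged alternative keeps the vectors in NumPy's binary `.npz` format and the metadata in a JSON sidecar file; the function names and the `<base>.npz` / `<base>.meta.json` layout are assumptions for illustration.

```python
import json
from pathlib import Path
from typing import Dict, List, Tuple

import numpy as np


def save_index_npz(base: str, vectors: List[np.ndarray], metadatas: List[Dict]) -> None:
    """Write vectors to <base>.npz and metadata to <base>.meta.json."""
    if vectors:
        # float32 halves the file size; the precision loss rarely matters for ranking
        matrix = np.vstack(vectors).astype(np.float32)
    else:
        matrix = np.empty((0, 0), dtype=np.float32)
    np.savez_compressed(base + ".npz", vectors=matrix)
    # ensure_ascii=False keeps Vietnamese text readable in the sidecar file
    Path(base + ".meta.json").write_text(
        json.dumps(metadatas, ensure_ascii=False), encoding="utf-8"
    )


def load_index_npz(base: str) -> Tuple[List[np.ndarray], List[Dict]]:
    """Inverse of save_index_npz; returns (vectors, metadatas)."""
    with np.load(base + ".npz") as data:
        matrix = data["vectors"]  # fully loaded into memory before the file closes
    metadatas = json.loads(Path(base + ".meta.json").read_text(encoding="utf-8"))
    return list(matrix), metadatas
```

The returned tuple matches the shape of `self.vectors` and `self.metadatas`, so `load_index` could assign it directly.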


============== INTEGRATION WITH HOLYSHEEP AI ==============

class HolySheepAgentMemory:
    """Full integration of the memory system with the HolySheep API"""
    
    def __init__(self, api_key: str):
        self.short_term
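To illustrate how the tiers could cooperate within a single turn, here is a minimal, hypothetical flow written against plain callables so it runs without the API: `recall` stands in for long-term and vector lookups, `chat` for a chat completion call, and `remember` for writing the exchange back. None of these names come from HolySheep; they are placeholders.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def handle_turn(
    user_text: str,
    history: List[Message],
    recall: Callable[[str], List[str]],
    chat: Callable[[List[Message]], str],
    remember: Callable[[str, str], None],
    system_prompt: str = "You are a helpful assistant.",
) -> str:
    """One agent turn: read memory, call the model, then write memory."""
    # 1. Read path: fetch memories relevant to this query (long-term + vector)
    recalled = recall(user_text)
    system = system_prompt
    if recalled:
        system += "\n\nRelevant memories:\n" + "\n".join(f"- {r}" for r in recalled)
    # 2. Build the request: system message, short-term history, new user turn
    messages = [{"role": "system", "content": system}, *history,
                {"role": "user", "content": user_text}]
    reply = chat(messages)  # may raise (timeout, 401, ...)
    # 3. Write path: runs only if chat() succeeded
    history.append({"role": "user", "content": user_text})
    history.append({"role": "assistant", "content": reply})
    remember(user_text, reply)
    return reply
```

Writing back only after `chat` returns means a failed API call, such as the timeouts shown at the start of this article, leaves every memory tier unchanged rather than half-updated.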