diff --git a/README.md b/README.md index 58ca928..591a27b 100644 --- a/README.md +++ b/README.md @@ -28,6 +28,8 @@ This is a tutorial project of [Pocket Flow](https://github.com/The-Pocket/Pocket 🤯 All these tutorials are generated **entirely by AI** by crawling the GitHub repo! +- [AgentDock](https://the-pocket.github.io/Tutorial-Codebase-Knowledge/AgentDock) - Create specialized AI agents with custom personalities, tools, and conversation handling! + - [AutoGen Core](https://the-pocket.github.io/Tutorial-Codebase-Knowledge/AutoGen%20Core) - Build AI teams that talk, think, and solve problems together like coworkers! - [Browser Use](https://the-pocket.github.io/Tutorial-Codebase-Knowledge/Browser%20Use) - Let AI surf the web for you, clicking buttons and filling forms like a digital assistant! diff --git a/docs/AgentDock/01_agent_configuration___agentconfig___.md b/docs/AgentDock/01_agent_configuration___agentconfig___.md new file mode 100644 index 0000000..84c6d1e --- /dev/null +++ b/docs/AgentDock/01_agent_configuration___agentconfig___.md @@ -0,0 +1,281 @@ +# Chapter 1: Agent Configuration (`AgentConfig`) + +Welcome to the AgentDock tutorial! We're excited to help you get started building powerful AI agents. This first chapter introduces one of the most fundamental concepts: the **Agent Configuration**, often referred to as `AgentConfig`. + +## What's the Big Idea? + +Imagine you want to build a helpful AI assistant specifically designed to answer questions about finance. How do you tell the AI *how* to behave? How do you give it the right tools, like the ability to look up current stock prices? And how do you give it a name and personality, maybe making it sound knowledgeable but cautious about giving financial advice? + +This is where `AgentConfig` comes in. It's like a **blueprint** or a **recipe** for creating a specific AI agent. It gathers all the essential instructions in one place, defining: + +* **Who** the agent is (its name and ID). 
+* **How** it should act and talk (its personality). +* **What** it can do (the tools and capabilities it has). +* **Initial setup** details (like starting messages in a chat). + +Think of it like a character sheet for a game character. The sheet tells you the character's name, their personality traits (like "brave" or "cautious"), their skills (like "lockpicking" or "spellcasting"), and maybe their starting equipment. An `AgentConfig` does the same for an AI agent. + +## Anatomy of an AgentConfig + +An `AgentConfig` is typically defined in a simple text file (using JSON format, which we'll see soon). Let's break down the key ingredients of this recipe: + +1. **Identification (`agentId`, `name`, `description`)**: Basic info to know which agent is which. + * `agentId`: A unique, internal code name (e.g., `finance-assistant`). + * `name`: A friendly, display name (e.g., "Finance Assistant"). + * `description`: A short explanation of what the agent does. + +2. **Personality (`personality`)**: This is crucial! It's a set of instructions telling the underlying AI language model how to behave. It could be a simple sentence or a detailed list of traits and rules. + * Example: "You are a helpful Finance Assistant. You provide information about stocks but always remind the user this is not financial advice." + +3. **Nodes (`nodes`)**: These are the specific **capabilities** or **tools** the agent can use. If our Finance Assistant needs to look up stock prices, we list the "stock price lookup" node here. We'll learn more about these in the [Tools](02_tools_.md) and [Nodes (`BaseNode`, `AgentNode`)](03_nodes___basenode____agentnode___.md) chapters. + * Example: `["stock_price", "crypto_price"]` means the agent can use these two tools. + +4. **Node Configurations (`nodeConfigurations`)**: Some nodes might need specific settings. For example, which AI model should the agent use for its "brain"? This section holds those settings. 
We'll touch upon the AI model part in the [CoreLLM (LLM Abstraction)](04_corellm__llm_abstraction__.md) chapter. + * Example: Setting the specific AI model version for the language processing node. + +5. **Chat Settings (`chatSettings`)**: Controls how the chat interface works with this agent. + * `initialMessages`: What the agent says when you first open a chat with it. + * `historyPolicy`: How much of the past conversation the agent should remember. + * `chatPrompts`: Suggested questions the user might ask, displayed in the chat interface. + +## Example: Defining Our Finance Assistant + +Let's look at a simplified `AgentConfig` for our "Finance Assistant". This would typically be in a file like `agents/finance-assistant/template.json`: + +```json +{ + "version": "1.0", + "agentId": "finance-assistant", + "name": "Finance Assistant", + "description": "Provides stock and crypto info.", + "personality": [ + "You are a knowledgeable finance assistant.", + "You specialize in stocks and crypto using real-time data.", + "Always provide disclaimers: not financial advice." + ], + "nodes": [ + "llm.anthropic", // The AI model node + "stock_price", // Tool to get stock prices + "crypto_price" // Tool to get crypto prices + ], + "nodeConfigurations": { + "llm.anthropic": { // Settings for the AI model + "model": "claude-3-5-sonnet-20240620", // Which specific AI model to use + "temperature": 0.7 // Controls creativity (lower = more focused) + } + // Settings for stock_price or crypto_price could go here if needed + }, + "chatSettings": { + "initialMessages": [ + "Hello! I'm your Finance Assistant. Ask me about stocks or crypto!" + ], + "chatPrompts": [ + "What's Bitcoin's price?", + "Show me Apple's stock price" + ] + } +} +``` + +**Explanation:** + +* This JSON file defines our `finance-assistant`. +* `personality` gives it instructions on how to behave (knowledgeable, focused on finance, includes disclaimer). Notice it can be an array of strings, which are combined. 
+* `nodes` tells us it uses an AI model (`llm.anthropic`) and two tools (`stock_price`, `crypto_price`). +* `nodeConfigurations` sets specific options for the `llm.anthropic` node. +* `chatSettings` defines its greeting (`initialMessages`) and suggests some starting questions (`chatPrompts`). + +By creating this simple file, we've defined the entire behavior and capabilities of our Finance Assistant! + +## How AgentDock Uses This Configuration + +So you've written your `template.json` file. What happens next? When you want to interact with an agent (like the Finance Assistant), AgentDock needs to load and understand its configuration. + +**High-Level Steps:** + +1. **Request:** You ask AgentDock to start a chat with the "Finance Assistant". +2. **Find Template:** AgentDock looks for the `template.json` file corresponding to `finance-assistant`. +3. **Load & Validate:** It reads the JSON file. It then checks if the configuration is valid (does it have all the required parts? Is the format correct?). +4. **Inject Secrets:** It might securely add necessary information, like API keys required by some nodes (e.g., the key needed to use the `llm.anthropic` AI model). This is *not* stored directly in the template file for security. +5. **Ready:** AgentDock now has a complete, validated `AgentConfig` object in memory, ready to use for managing the agent's conversation and actions. 
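The five steps above can be sketched as a small, self-contained TypeScript program. Everything here is illustrative: `loadAgent` and the in-memory `templateStore` are stand-ins for AgentDock's real internals, and plain checks replace the Zod validation used by the actual system.

```typescript
// Hypothetical, minimal re-creation of the load flow -- not the real AgentDock API.
type AgentConfig = {
  agentId: string;
  name: string;
  personality: string[];
  nodes: string[];
  nodeConfigurations: Record<string, Record<string, unknown>>;
};

// Step 2: a stand-in for the template file store.
const templateStore: Record<string, AgentConfig> = {
  "finance-assistant": {
    agentId: "finance-assistant",
    name: "Finance Assistant",
    personality: ["You are a knowledgeable finance assistant."],
    nodes: ["llm.anthropic", "stock_price"],
    nodeConfigurations: { "llm.anthropic": { model: "claude-3-5-sonnet-20240620" } },
  },
};

// Steps 3-4: validate the template and inject the secret API key.
export function loadAgent(agentId: string, apiKey: string): AgentConfig {
  const template = templateStore[agentId];
  if (!template) throw new Error(`No template for '${agentId}'`);
  // Step 3: validation (the real system uses a Zod schema for this).
  if (!template.agentId || !Array.isArray(template.nodes)) {
    throw new Error("Invalid template");
  }
  // Step 4: the key is injected at load time, never stored in the template file.
  return {
    ...template,
    nodeConfigurations: {
      ...template.nodeConfigurations,
      "llm.anthropic": { ...template.nodeConfigurations["llm.anthropic"], apiKey },
    },
  };
}

// Step 5: a complete, validated config, ready for the chat to start.
const config = loadAgent("finance-assistant", "sk-test-123");
console.log(config.name);
```

Note how the API key only exists in the returned object, mirroring the "inject secrets" step: the template itself never contains it.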
+
+Here's a simplified diagram of that process:
+
+```mermaid
+sequenceDiagram
+    participant User
+    participant AgentDock System
+    participant Template File Store
+    participant Validator
+    participant Config Object
+
+    User->>AgentDock System: Start chat with 'finance-assistant'
+    AgentDock System->>Template File Store: Find template for 'finance-assistant'
+    Template File Store-->>AgentDock System: Return template.json content
+    AgentDock System->>Validator: Validate this content
+    Validator-->>AgentDock System: Content is valid
+    AgentDock System->>Config Object: Create AgentConfig (add API key, etc.)
+    AgentDock System-->>User: Agent is ready, chat starts
+```
+
+## Under the Hood: Loading and Validation Code
+
+Let's peek at some (simplified) code to see how this loading happens.
+
+**Loading the Configuration:**
+
+The core function responsible for this is `loadAgentConfig`, found in `agentdock-core/src/config/agent-config.ts`.
+
+```typescript
+// File: agentdock-core/src/config/agent-config.ts
+
+import { AgentConfig, AgentConfigSchema } from '../types/agent-config';
+import { createError, ErrorCode } from '../errors';
+// ... other imports
+
+/**
+ * Load and validate an agent configuration
+ */
+export async function loadAgentConfig(
+  template: any,   // The raw content from template.json
+  apiKey?: string  // The API key to be added securely
+): Promise<AgentConfig> {
+  try {
+    // 1. Basic checks
+    if (!apiKey) {
+      throw createError(/* ... */); // Error if API key is missing
+    }
+    if (!template) {
+      throw createError(/* ... */); // Error if template is missing
+    }
+
+    // 2. Prepare the config object
+    const config = {
+      ...template, // Copy properties from the template
+      // Ensure nodes is a mutable array
+      nodes: [...template.nodes],
+      // Add the API key to the correct node configuration
+      nodeConfigurations: {
+        ...template.nodeConfigurations,
+        // Simplified: assume we know the LLM node needs the key
+        ['llm.anthropic']: {
+          ...template.nodeConfigurations?.['llm.anthropic'],
+          apiKey // Add the key here
+        }
+      },
+      // ... potentially prepare other fields like chatSettings ...
+    };
+
+    // 3. Validate the final config object using a schema
+    return AgentConfigSchema.parse(config); // Throws error if invalid
+
+  } catch (error) {
+    // Handle errors gracefully
+    throw createError(/* ... */);
+  }
+}
+```
+
+**Explanation:**
+
+1. It first checks that the necessary inputs (`template` content and `apiKey`) are present.
+2. It creates a new `config` object, copying data from the `template` and securely adding the `apiKey` to the right place within `nodeConfigurations`.
+3. Crucially, it uses `AgentConfigSchema.parse(config)` to validate the entire structure. If anything is missing or has the wrong format, this step fails, preventing an invalid agent from being used.
+
+**Validation Schema (`AgentConfigSchema`):**
+
+How does `AgentConfigSchema.parse` work? AgentDock uses a library called Zod to define the expected structure and data types for the `AgentConfig`. This definition is in `agentdock-core/src/types/agent-config.ts`.
+
+```typescript
+// File: agentdock-core/src/types/agent-config.ts
+
+import { z } from 'zod'; // Zod library for schema validation
+// ... other imports like PersonalitySchema ...
+ +/** + * Zod schema for validating agent configurations + */ +export const AgentConfigSchema = z.object({ + version: z.string(), // Must be a string + agentId: z.string(), // Must be a string + name: z.string(), + description: z.string(), + personality: PersonalitySchema, // Uses a specific schema for personality + nodes: z.array(z.string()), // Must be an array of strings + nodeConfigurations: z.record(z.any()), // An object with any kind of values + chatSettings: z.object({ // Must be an object with these fields: + initialMessages: z.array(z.string()).optional(), // Optional array of strings + // ... other chat settings fields ... + }), + // ... other optional fields like orchestration, maxConcurrency ... +}); + +// The actual AgentConfig type used in TypeScript code +export type AgentConfig = z.infer; +``` + +**Explanation:** + +* This schema precisely defines what a valid `AgentConfig` must look like. +* `z.object({...})` defines an object structure. +* `z.string()`, `z.array(z.string())`, `z.record(z.any())`, `z.optional()` specify the types and requirements for each field. +* When `AgentConfigSchema.parse(config)` is called, Zod checks if the `config` object matches this structure. If not, it raises an error. This ensures that all agents loaded into AgentDock meet the expected format. + +**Bundling Templates:** + +You might notice multiple `template.json` files, one for each agent. For efficiency, especially in web applications, AgentDock uses a build script (`scripts/bundle-templates.ts`) to gather all these JSON files into a single TypeScript file (`src/generated/templates.ts`). This makes it faster and easier to access any agent's template when needed, without reading individual files every time. 
+ +```typescript +// Simplified idea from scripts/bundle-templates.ts + +import fs from 'fs'; +import path from 'path'; + +async function bundleTemplates() { + const templatesDir = path.join(process.cwd(), 'agents'); + const outputFile = path.join(process.cwd(), 'src/generated/templates.ts'); + const templates = {}; // Object to hold all templates + + // 1. Find all agent folders (like 'finance-assistant') + const agentDirs = await fs.readdir(templatesDir); + + // 2. Read each agent's template.json + for (const agentId of agentDirs) { + const templatePath = path.join(templatesDir, agentId, 'template.json'); + const templateContent = await fs.readFile(templatePath, 'utf-8'); + templates[agentId] = JSON.parse(templateContent); // Store it + } + + // 3. Write a TypeScript file exporting all templates + const fileContent = ` + // Generated file + export const templates = ${JSON.stringify(templates, null, 2)} as const; + // ... helper functions to get templates ... + `; + await fs.writeFile(outputFile, fileContent); + console.log('Templates bundled!'); +} + +bundleTemplates(); +``` + +This script essentially creates a ready-to-use collection of all agent blueprints. + +## Conclusion + +You've just learned about `AgentConfig`, the fundamental blueprint for defining AI agents in AgentDock! + +* It acts like a **recipe** or **character sheet**. +* It specifies the agent's **identity**, **personality**, **tools (nodes)**, and **settings**. +* It's typically defined in a `template.json` file for each agent. +* AgentDock loads, validates, and prepares this configuration using schemas (`AgentConfigSchema`) to ensure everything is correct before the agent starts working. + +Understanding `AgentConfig` is the first step to creating your own custom agents. You now know how to define *what* an agent is and *what* it should be able to do. 
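Before moving on, it may help to see why the validation step matters. The sketch below is purely illustrative: a hand-rolled `parseAgentConfig` stands in for the real Zod schema. A well-formed template passes, while a malformed one is rejected before the agent can ever run.

```typescript
// Illustrative only: a hand-rolled stand-in for AgentConfigSchema.parse().
function parseAgentConfig(raw: unknown): { agentId: string; nodes: string[] } {
  const obj = raw as Record<string, unknown>;
  if (typeof obj?.agentId !== "string") {
    throw new Error("agentId must be a string");
  }
  if (!Array.isArray(obj.nodes) || !obj.nodes.every((n) => typeof n === "string")) {
    throw new Error("nodes must be an array of strings");
  }
  return { agentId: obj.agentId as string, nodes: obj.nodes as string[] };
}

// A valid template passes...
const ok = parseAgentConfig({ agentId: "finance-assistant", nodes: ["stock_price"] });
console.log(ok.agentId);

// ...but a malformed one (nodes is a string, not an array) is rejected.
try {
  parseAgentConfig({ agentId: "finance-assistant", nodes: "stock_price" });
} catch (e) {
  console.log((e as Error).message); // prints: nodes must be an array of strings
}
```

Zod does the same job declaratively, with richer error messages and an inferred TypeScript type thrown in for free.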
+ +In the next chapter, we'll dive deeper into one of the most exciting parts mentioned here: the **Tools** that give your agents their special abilities. + +Next: [Chapter 2: Tools](02_tools_.md) + +--- + +Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge) \ No newline at end of file diff --git a/docs/AgentDock/02_tools_.md b/docs/AgentDock/02_tools_.md new file mode 100644 index 0000000..31fffec --- /dev/null +++ b/docs/AgentDock/02_tools_.md @@ -0,0 +1,195 @@ +# Chapter 2: Tools + +In [Chapter 1: Agent Configuration (`AgentConfig`)](01_agent_configuration___agentconfig___.md), we learned how the `AgentConfig` acts as a blueprint for our AI agents. We saw that the `nodes` list in the configuration defines the capabilities an agent has. Now, let's zoom in on a very important type of capability: **Tools**. + +## Why Do Agents Need Tools? + +Imagine you ask a friend to find the current price of Apple stock. Your friend doesn't magically know the price; they'll likely use a tool – maybe a finance app on their phone or a website on their computer. + +AI agents in AgentDock work similarly. While the core AI (the Large Language Model or LLM) is great at understanding language, generating text, and reasoning, it doesn't inherently know real-time information like stock prices, today's weather, or the latest news headlines. It also can't perform complex, specialized calculations or directly interact with external services on its own. + +That's where **Tools** come in! They give the agent specialized abilities, just like giving a human a calculator, a map, or access to a search engine. + +## What Exactly Are Tools in AgentDock? + +In AgentDock, **Tools** are specialized types of [Nodes (`BaseNode`, `AgentNode`)](03_nodes___basenode____agentnode___.md) designed to perform specific, well-defined actions. Think of them as the dedicated equipment an agent can use. 
+ +Here are some common examples you'll find in AgentDock: + +* `stock_price`: Looks up the current price of a stock. +* `weather`: Fetches the weather forecast for a specific location. +* `search`: Performs a web search to find recent information. +* `deep_research`: Conducts more in-depth research on a topic, potentially analyzing multiple sources. +* `crypto_price`: Gets the current price of a cryptocurrency. + +These tools often work by: + +1. **Interacting with external APIs:** Like asking a weather service (e.g., Open-Meteo) or a stock data provider (e.g., AlphaVantage) for information. +2. **Performing complex calculations:** Maybe a tool does some specialized financial modeling. +3. **Accessing specific data sources:** A tool might connect to a particular database. + +When you define an agent using `AgentConfig`, you specify which tools it's *allowed* to use in the `nodes` list. Our Finance Assistant from Chapter 1, for example, was given access to `stock_price` and `crypto_price`. + +```json +// Simplified excerpt from finance-assistant/template.json +{ + // ... other config ... + "nodes": [ + "llm.anthropic", // The agent's "brain" (an LLM Node) + "stock_price", // Tool allowed: Stock Price Lookup + "crypto_price" // Tool allowed: Crypto Price Lookup + ] + // ... other config ... +} +``` + +This means the Finance Assistant *can* use these tools, but it won't use them randomly. It needs a reason. + +## How Does an Agent Decide to Use a Tool? + +This is where the magic of the underlying AI model comes in! Here's the typical flow when you chat with an agent: + +1. **User Request:** You ask the agent a question, like "What's the stock price for Google?" +2. **Agent Analyzes:** The agent's "brain" (the [CoreLLM (LLM Abstraction)](04_corellm__llm_abstraction__.md)) analyzes your request. It understands you're asking for a specific piece of real-time data. +3. **Tool Identification:** The LLM checks the list of tools it's allowed to use (from the `AgentConfig`). 
It sees `stock_price` is available and recognizes it's the right tool for the job. +4. **Tool Call Request:** The LLM decides to use the tool. It tells the AgentDock system: "I need to use the `stock_price` tool with the parameter `symbol: 'GOOGL'`." +5. **System Executes Tool:** AgentDock receives this request. It finds the actual `stock_price` tool code and runs it, passing 'GOOGL' as the input. The tool connects to the stock data API and gets the price. +6. **Tool Result:** The `stock_price` tool finishes its job and returns the result (e.g., `{ price: 180.50, currency: 'USD', symbol: 'GOOGL', ... }`). +7. **Result to Agent:** AgentDock gives this result back to the LLM. +8. **Agent Responds:** The LLM now has the information it needed. It formulates a user-friendly answer, like "The current stock price for GOOGL is $180.50 USD." and sends it back to you. + +Crucially, the agent only uses tools *when necessary* based on the conversation and the tools available to it. + +## Anatomy of a Tool + +Let's peek at what a tool definition looks like internally. Tools in AgentDock are built following specific patterns. Here’s a highly simplified view inspired by the `stock_price` tool (`src/nodes/stock-price/index.ts`): + +```typescript +// Simplified conceptual structure of a tool +import { z } from 'zod'; // Library for defining expected inputs + +// Define the expected input parameters for the tool +const stockPriceSchema = z.object({ + symbol: z.string().describe('Stock symbol (e.g., AAPL, MSFT)'), + // Other parameters could go here (like API key, optional) +}); + +// Define the tool itself +export const stockPriceTool = { + // 1. Name: How the agent refers to the tool + name: 'stock_price', + + // 2. Description: Helps the agent understand what the tool does + description: 'Get the current stock price for a given symbol', + + // 3. Parameters: Defines what inputs the tool needs + parameters: stockPriceSchema, + + // 4. 
Execute Function: The actual code that runs the tool + async execute(params: { symbol: string }, options: any) { + // Simplified logic: + console.log(`Tool executing: Getting price for ${params.symbol}`); + // ... code to call the AlphaVantage API using params.symbol ... + const priceData = { price: 180.50, /* ... other data ... */ }; // Dummy data + + // ... code to format the result nicely (e.g., using a UI component) ... + const formattedResult = `Stock Price for ${params.symbol}: $${priceData.price}`; + + console.log('Tool finished execution.'); + return formattedResult; // Return the result + } +}; +``` + +**Explanation:** + +1. **`name`**: A unique identifier (like `stock_price`). This is the name used in the `AgentConfig`'s `nodes` list and how the LLM requests the tool. +2. **`description`**: A clear explanation for the LLM, helping it decide *when* to use this tool. +3. **`parameters`**: Defines the inputs the tool expects. Here, it needs a `symbol` which must be a string. Zod (`z`) helps validate the inputs. +4. **`execute`**: This is the heart of the tool. It's an asynchronous function (`async`) that takes the parameters provided by the LLM (like `{ symbol: 'GOOGL' }`) and performs the action (like calling an external API). It then returns the result. Often, this result is formatted for display, potentially using special UI components. + +## How AgentDock Manages Tools: The Tool Registry + +You might wonder: how does AgentDock know about all these different tools like `stock_price`, `weather`, etc.? And how does it give the *right* tools to the *right* agent? + +This is managed by the **Tool Registry**. + +Think of the Tool Registry as a central toolbox or catalog where all available tools are kept and organized. + +1. 
**Registration:** When AgentDock starts up (or more accurately, when tools are first needed, thanks to lazy initialization in `src/lib/tools.ts`), tools defined in the codebase (like those in `src/nodes/stock-price`, `src/nodes/weather`, etc.) are "registered". This means they are added to the Tool Registry's internal list. The file `src/nodes/registry.ts` gathers many of these custom tools together. + + ```typescript + // Simplified concept from src/nodes/registry.ts + import { stockPriceTool } from './stock-price'; + import { weatherTool } from './weather'; + import { searchTool } from './search'; + // ... import other tools ... + + // A collection holding all known custom tools + export const allTools = { + 'stock_price': stockPriceTool, + 'weather': weatherTool, + 'search': searchTool, + // ... other tools mapped by their name ... + }; + ``` + +2. **Provisioning:** When an agent needs to run (e.g., when you start a chat), AgentDock looks at the `nodes` list in the agent's `AgentConfig`. It then asks the Tool Registry: "Please give me the tools named 'stock_price' and 'crypto_price' for this agent." + + ```typescript + // Conceptual usage of the Tool Registry + import { getToolRegistry } from 'agentdock-core'; + + const agentConfigNodes = ['llm.anthropic', 'stock_price', 'crypto_price']; + const registry = getToolRegistry(); // Get the central registry + + // Ask the registry for the specific tools the agent needs + const agentTools = registry.getToolsForAgent(agentConfigNodes); + // agentTools would now contain the actual 'stock_price' and 'crypto_price' tool objects + // (It ignores 'llm.anthropic' because that's not a typical tool) + ``` + The `getToolRegistry` function (`agentdock-core/src/nodes/tool-registry.ts`) ensures there's a single, shared registry instance. The `getToolsForAgent` method filters the master list to provide only the tools requested in the agent's configuration. + +3. 
**Execution:** When the LLM decides to use a tool, the system (specifically the [Orchestration (`OrchestrationManager`)](05_orchestration___orchestrationmanager___.md)) uses the tool object provided by the registry to call its `execute` method with the required parameters. + +Here's a diagram showing the flow when an agent uses a tool: + +```mermaid +sequenceDiagram + participant User + participant Agent (LLM) + participant Orchestrator + participant ToolRegistry + participant StockPriceTool + + User->>Agent (LLM): What's the price of GOOGL? + Agent (LLM)->>Orchestrator: Need to use 'stock_price' tool with symbol='GOOGL' + Orchestrator->>ToolRegistry: Get tool 'stock_price' + ToolRegistry-->>Orchestrator: Here is the stockPriceTool object + Orchestrator->>StockPriceTool: Execute with symbol='GOOGL' + Note right of StockPriceTool: Calls external API, gets data... + StockPriceTool-->>Orchestrator: Return result { price: 180.50, ... } + Orchestrator->>Agent (LLM): Here's the result from 'stock_price' + Agent (LLM)->>User: The current price for GOOGL is $180.50 USD. +``` + +This registry system makes AgentDock flexible. You can easily add new tools by defining them and registering them, making them available for your agents to use. + +## Conclusion + +Tools are essential components in AgentDock, acting as specialized extensions for your AI agents. + +* They allow agents to perform actions beyond basic language processing, like **accessing real-time data**, **calling external services**, or **performing complex calculations**. +* Tools are defined with a **name**, **description**, expected **parameters**, and an **`execute` function**. +* Agents (specifically the LLM) **decide when to use** a tool based on the user's request and the tool's description. +* The **Tool Registry** manages all available tools, and AgentDock provides the specific tools listed in an agent's `AgentConfig` when needed. 
+ +By equipping your agents with the right tools, you can significantly enhance their capabilities and build powerful, specialized AI assistants. + +Now that we understand `AgentConfig` and `Tools`, let's dive deeper into the general concept they both relate to: **Nodes**. Tools are just one type of Node an agent can use. + +Next: [Chapter 3: Nodes (`BaseNode`, `AgentNode`)](03_nodes___basenode____agentnode___.md) + +--- + +Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge) \ No newline at end of file diff --git a/docs/AgentDock/03_nodes___basenode____agentnode___.md b/docs/AgentDock/03_nodes___basenode____agentnode___.md new file mode 100644 index 0000000..0eb0d3d --- /dev/null +++ b/docs/AgentDock/03_nodes___basenode____agentnode___.md @@ -0,0 +1,217 @@ +# Chapter 3: Nodes (`BaseNode`, `AgentNode`) + +In [Chapter 2: Tools](02_tools_.md), we learned how agents use specialized "Tools" like `stock_price` or `weather` to perform specific tasks. But what exactly *are* these tools internally? And what about the agent's "brain" itself – the part that understands language and decides *when* to use a tool? + +This chapter introduces the fundamental concept that underlies both tools and the agent's core logic: **Nodes**. + +## What's the Big Idea? Nodes as Workers + +Imagine a factory assembly line. You have different stations, each with a specialized worker performing a specific task: one worker attaches a wheel, another paints the body, another inspects the quality. + +In AgentDock, **Nodes** are like these specialized workers. They are the fundamental building blocks or capabilities that make up an agent. Every distinct function or capability within the system is represented as a Node. + +There are different *types* of workers (Nodes): + +1. **The General Employee Handbook (`BaseNode`):** Before any worker starts, they get an employee handbook defining common rules, how they get paid, their employee ID, etc. 
`BaseNode` is like this handbook. It's the **template** or **base blueprint** for *all* nodes (all workers) in AgentDock. It defines the common structure and properties every node must have, like a unique ID and configuration settings. +2. **The Assembly Line Manager (`AgentNode`):** On the assembly line, there's usually a manager overseeing the process, deciding what needs to be done next, talking to other workers (tools), and reporting the final result. The `AgentNode` is like this manager. It's a very specific, crucial type of Node that handles the **core conversation logic**. It uses the powerful AI language model ([CoreLLM (LLM Abstraction)](04_corellm__llm_abstraction__.md)) to understand the user, generate responses, and decide *when* to delegate tasks to other specialized workers (the Tools we learned about in Chapter 2). + +So, the "Tools" from the previous chapter are just one kind of specialized worker (Node). The agent's main "thinking" part is another specialized worker (the `AgentNode`). Both inherit their basic structure from the `BaseNode` template. + +## `BaseNode`: The Blueprint for All Nodes + +`BaseNode` is the foundation. Think of it as the abstract concept of a "worker" in our factory. It doesn't *do* a specific job itself, but it defines what *all* workers must have: + +* **An ID (`id`):** Like an employee ID badge, every node instance gets a unique identifier. +* **Configuration (`config`):** Instructions or settings specific to *this* worker's job (e.g., which specific model the `AgentNode` should use, or which API key a `stock_price` tool needs). +* **Metadata:** Information *about* the worker type (e.g., its name like "Stock Price Tool", what inputs it needs, what outputs it produces). + +You don't usually interact with `BaseNode` directly. Instead, you use specific types of nodes (like `AgentNode` or `stock_price`) that are *built using* the `BaseNode` blueprint. 
+
+Here's a simplified peek at the structure defined in `agentdock-core/src/nodes/base-node.ts`:
+
+```typescript
+// Simplified structure from agentdock-core/src/nodes/base-node.ts
+
+// The "Employee Handbook" blueprint
+export abstract class BaseNode<TConfig = unknown> {
+  // Every worker gets a unique ID
+  readonly id: string;
+
+  // Every worker has specific job instructions
+  protected config: TConfig;
+
+  // Information about the type of worker
+  readonly metadata: NodeMetadata; // Contains label, description, etc.
+
+  // The constructor: how a new worker instance is created
+  constructor(id: string, config: TConfig) {
+    this.id = id;
+    this.config = config;
+    // ... sets up metadata based on abstract methods below ...
+  }
+
+  // Abstract methods (MUST be implemented by specific worker types)
+  // Like sections in the handbook that need details filled in
+  protected abstract getLabel(): string;       // What's the worker's job title?
+  protected abstract getDescription(): string; // What does this worker do?
+  // ... other abstract methods for inputs, outputs, etc. ...
+
+  // The actual work function (also must be defined by specific workers)
+  abstract execute(input: unknown): Promise<unknown>;
+}
+```
+
+**Explanation:**
+
+* `abstract class BaseNode<TConfig>`: This means `BaseNode` is a template, not a concrete worker; `TConfig` describes the shape of each worker's configuration.
+* `id`, `config`, `metadata`: These are the common properties all nodes will have.
+* `constructor`: The standard way to create any node instance.
+* `abstract` methods (`getLabel`, `getDescription`, `execute`): These are placeholders. Any *specific* node type (like `AgentNode` or `stock_price`) must provide its own implementation for these, defining its specific job title, description, and how it actually does its work (`execute`).
+
+## `AgentNode`: The Conversational Manager
+
+Now, let's look at the most important specialized worker: the `AgentNode`. This node is responsible for the agent's conversational abilities. It's the "manager" on the assembly line.
+ +The `AgentNode` *is* a `BaseNode`, meaning it follows the same basic blueprint (it has an ID, config, metadata). But it has a very specific job: + +1. **Communicate:** It interacts with the underlying AI Language Model ([CoreLLM (LLM Abstraction)](04_corellm__llm_abstraction__.md)) to understand user input and generate text responses. +2. **Coordinate:** It looks at the list of available Tools (other nodes defined in the [Agent Configuration (`AgentConfig`)](01_agent_configuration___agentconfig___.md)) and decides if and when to use them based on the conversation. +3. **Manage State:** It keeps track of the conversation flow. + +When you look at an `AgentConfig` like the one for our Finance Assistant: + +```json +// Simplified excerpt from finance-assistant/template.json +{ + // ... + "nodes": [ + "llm.anthropic", // <--- This name points to an AgentNode type + "stock_price", // <--- This name points to a Tool Node type + "crypto_price" // <--- This name points to a Tool Node type + ], + "nodeConfigurations": { + "llm.anthropic": { // <--- Configuration FOR the AgentNode + "model": "claude-3-5-sonnet-20240620", + "apiKey": "..." // (Added securely, not stored here) + } + // Config for stock_price could go here too + }, + "personality": [ /* ... instructions for the AgentNode ... */ ] + // ... +} +``` + +* The name `"llm.anthropic"` tells AgentDock: "I need an `AgentNode` worker configured to use the Anthropic LLM provider." +* The `nodeConfigurations["llm.anthropic"]` section provides the specific settings (`config`) for *that* `AgentNode` instance (which model to use, the API key). +* The `personality` section provides instructions that the `AgentNode` will use when talking to the LLM. +* The other names (`"stock_price"`, `"crypto_price"`) tell AgentDock: "I also need these Tool workers available." The `AgentNode` is then *aware* of these tools and can decide to use them. 
+
+Here's a simplified look at the `AgentNode` class from `agentdock-core/src/nodes/agent-node.ts`:
+
+```typescript
+// Simplified structure from agentdock-core/src/nodes/agent-node.ts
+import { BaseNode } from './base-node'; // It uses the blueprint!
+import { CoreLLM } from '../llm'; // It needs an LLM to think
+import { AgentConfig } from '../types/agent-config'; // It uses AgentConfig
+
+// Configuration specific to AgentNode (API keys, full AgentConfig)
+export interface AgentNodeConfig {
+  apiKey: string;
+  agentConfig?: AgentConfig; // The blueprint for the whole agent
+  // ... other options like provider, fallback settings ...
+}
+
+// The AgentNode "worker" class, extending the BaseNode blueprint
+export class AgentNode extends BaseNode<AgentNodeConfig> {
+  readonly type = 'core.agent'; // Specific type identifier
+  private llm: CoreLLM; // Holds the reference to the LLM "brain"
+
+  constructor(id: string, config: AgentNodeConfig) {
+    // Call the BaseNode constructor first (standard procedure)
+    super(id, config);
+
+    // AgentNode specific setup: Create the LLM instance
+    if (!config.apiKey) throw new Error('API key needed!');
+    if (!config.agentConfig) throw new Error('AgentConfig needed!');
+    this.llm = this.createLLMInstance(/* ... using config ... */);
+    // ... setup fallback LLM if configured ...
+  }
+
+  // --- Implementation of BaseNode abstract methods ---
+  protected getLabel(): string { return this.config.agentConfig?.name || 'Agent'; }
+  protected getDescription(): string { return this.config.agentConfig?.description || ''; }
+  // ... other required methods ...
+
+  // The CORE method for AgentNode (different from simple 'execute')
+  async handleMessage(options: AgentNodeHandleMessageOptions): Promise<unknown> {
+    // 1. Figure out which tools are available (using AgentConfig & Orchestration)
+    // 2. Prepare the system prompt (using personality from AgentConfig)
+    // 3. 
Call the LLM (this.llm) with messages, tools, prompt
+    // 4. Return the result stream
+    // (This is complex, covered more in Orchestration chapter)
+  }
+
+  // The standard 'execute' is less relevant here; handleMessage is key
+  async execute(input: unknown): Promise<unknown> {
+    throw new Error('Use handleMessage for AgentNode');
+  }
+
+  private createLLMInstance(/* ... */): CoreLLM { /* ... */ } // Helper
+}
+```
+
+**Explanation:**
+
+* `export class AgentNode extends BaseNode`: This clearly shows `AgentNode` is a specific *type* of `BaseNode`. It inherits the basic structure but adds its own logic.
+* `AgentNodeConfig`: Defines the specific configuration needed *just* for an `AgentNode`, including the crucial `agentConfig`, which contains the personality, tool list, and other settings.
+* `constructor`: It takes its specific `config` and uses it to set up the necessary components, most importantly the `CoreLLM` instance.
+* `handleMessage`: This is the primary method where the `AgentNode` does its work – managing the conversation, deciding on tool use, and interacting with the LLM. We'll see more about how this orchestration works in the [Orchestration (`OrchestrationManager`)](05_orchestration___orchestrationmanager___.md) chapter.
+
+## How Nodes are Created and Used
+
+So, you define node *names* (like `"llm.anthropic"` or `"stock_price"`) in your `AgentConfig`. How does AgentDock turn these names into actual working Node instances?
+
+1. **Configuration Loading:** AgentDock loads the `AgentConfig` for the agent you want to use (as seen in [Chapter 1](01_agent_configuration___agentconfig___.md)).
+2. **Node Registry:** AgentDock has an internal catalog called the `NodeRegistry` (`agentdock-core/src/nodes/node-registry.ts`). This registry knows which *code* corresponds to which node *name*. For example, it knows that the name `"llm.anthropic"` should be handled by the `AgentNode` class, and `"stock_price"` should be handled by the `StockPriceNode` class (which is also a `BaseNode`).
+3. 
**Instantiation:** For each node listed in the `AgentConfig.nodes`, AgentDock asks the `NodeRegistry` to create an instance of the corresponding Node class. It passes the specific configuration from `AgentConfig.nodeConfigurations` to the node's constructor. +4. **Ready:** Now AgentDock has actual Node objects (an `AgentNode` instance, a `StockPriceNode` instance, etc.) ready to work together. The `AgentNode` will manage the flow, potentially calling the `execute` method of the Tool nodes when needed. + +Here's a simplified diagram of creating the `AgentNode`: + +```mermaid +sequenceDiagram + participant System as AgentDock System + participant Config as AgentConfig ('finance-assistant') + participant Registry as NodeRegistry + participant Constructor as AgentNode Constructor + + System->>Config: Read 'nodes' list (contains 'llm.anthropic') + System->>Config: Read 'nodeConfigurations' for 'llm.anthropic' + System->>Registry: Request Node for type 'llm.anthropic' + Registry-->>System: Return AgentNode class & its required config schema + System->>Constructor: Create AgentNode instance with ID & specific config + Note right of Constructor: BaseNode logic runs first, then AgentNode specific setup (like creating LLM) + Constructor-->>System: Return ready AgentNode instance +``` + +This process ensures that the agent is assembled correctly with all the necessary "workers" (Nodes) based on its blueprint (`AgentConfig`). + +## Conclusion + +Nodes are the core building blocks in AgentDock, like specialized workers on an assembly line. + +* **`BaseNode`** is the fundamental **template** or blueprint for all nodes, defining common properties like `id` and `config`. +* **`AgentNode`** is a specific, vital type of node that extends `BaseNode`. It acts as the **conversational manager**, using the LLM, handling chat logic, and deciding when to use Tools (which are other types of nodes). 
+* Tools (like `stock_price`) are also specific types of nodes extending `BaseNode`, designed for specialized tasks. +* The `AgentConfig` lists the *names* of nodes an agent needs, and the `NodeRegistry` knows how to build the actual Node instances from those names. + +Understanding Nodes helps you see how different capabilities (talking, using tools) are organized and interact within AgentDock. The `AgentNode` is the central piece orchestrating these capabilities. Its "brain" relies heavily on the underlying AI model. + +In the next chapter, we'll explore the abstraction that represents this AI "brain": the [CoreLLM (LLM Abstraction)](04_corellm__llm_abstraction__.md). + +Next: [Chapter 4: CoreLLM (LLM Abstraction)](04_corellm__llm_abstraction__.md) + +--- + +Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge) \ No newline at end of file diff --git a/docs/AgentDock/04_corellm__llm_abstraction__.md b/docs/AgentDock/04_corellm__llm_abstraction__.md new file mode 100644 index 0000000..7f520c0 --- /dev/null +++ b/docs/AgentDock/04_corellm__llm_abstraction__.md @@ -0,0 +1,232 @@ +# Chapter 4: CoreLLM (LLM Abstraction) + +In [Chapter 3: Nodes (`BaseNode`, `AgentNode`)](03_nodes___basenode____agentnode___.md), we learned that the `AgentNode` is the "manager" of our AI agent, handling conversations and deciding when to use Tools. But how does this manager actually *talk* to the powerful AI brain, the Large Language Model (LLM)? That's where `CoreLLM` comes in! + +## The Problem: So Many Remotes! + +Imagine you have several different TVs at home: a Sony, an LG, and a Samsung. Each TV comes with its own specific remote control. If you want to turn on the Sony, you need the Sony remote. To change the channel on the LG, you need the LG remote. It's a bit confusing, right? + +Different Large Language Models (LLMs) like OpenAI's GPT-4, Anthropic's Claude, and Google's Gemini are like those different TVs. 
Each one has its own way of being controlled – its own specific Application Programming Interface (API), its own way of handling requests, and its own format for responses. + +If our `AgentNode` had to learn the specific "remote control" for every single LLM, its code would become very complex and hard to manage. What if we wanted to switch our "Finance Assistant" from using Claude to using Gemini? We'd have to rewrite parts of the `AgentNode`! + +## The Solution: A Universal Remote (`CoreLLM`) + +Wouldn't it be great if you had *one* universal remote that could control all your TVs? You press the "power" button, and it knows how to turn on whichever TV you're pointing it at. + +**`CoreLLM` is AgentDock's universal remote control for LLMs.** + +It acts as a **translator** or **adapter**. The `AgentNode` doesn't need to know the specific details of talking to Claude or GPT or Gemini. It just talks to `CoreLLM` using a single, simple set of commands, like: + +* "Generate some text based on this conversation." +* "Start streaming a response based on this conversation." +* "Generate some text, and here are the tools you can use." + +`CoreLLM` then takes these simple commands and translates them into the specific instructions needed by the actual LLM provider (like Anthropic or Google) that the agent is configured to use (remember the `nodeConfigurations` in [Chapter 1: Agent Configuration (`AgentConfig`)](01_agent_configuration___agentconfig___.md)?). + +```json +// From finance-assistant/template.json in Chapter 1 +"nodeConfigurations": { + "llm.anthropic": { // Config for the AgentNode using CoreLLM + "model": "claude-3-5-sonnet-20240620", // Tells CoreLLM *which* model + "apiKey": "..." // Tells CoreLLM the key for Anthropic + } +} +``` + +Based on this configuration, `CoreLLM` knows it needs to talk to Anthropic using the specified model and API key. + +## How the `AgentNode` Uses `CoreLLM` + +The `AgentNode` holds an instance of `CoreLLM`. 
When it needs the LLM's help (e.g., to respond to a user message or decide if a tool is needed), it calls methods on its `CoreLLM` instance. + +Let's look at a simplified example of how the `AgentNode` might ask `CoreLLM` to stream a response: + +```typescript +// Simplified example inside AgentNode's logic + +// 'this.llm' is the CoreLLM instance for this AgentNode +const llm = this.llm; + +// 1. Prepare the conversation history for the LLM +const messages: CoreMessage[] = [ + { role: 'system', content: 'You are a helpful assistant.' }, + { role: 'user', content: 'Hi! Can you tell me a joke?' } +]; + +// 2. Ask CoreLLM to start streaming a response +const streamResult = await llm.streamText({ + messages: messages, + // Maybe provide tools if needed +}); + +// 3. The AgentNode can now work with the streamResult +// (e.g., send the streaming text back to the user interface) +// It doesn't need to know if this came from Claude, GPT, or Gemini! +``` + +**Explanation:** + +1. The `AgentNode` prepares the input for the LLM – usually the conversation history (`messages`). +2. It calls the `streamText` method on its `CoreLLM` instance (`llm`). This is the "universal command". +3. `CoreLLM` takes care of talking to the *actual* configured LLM (like Claude) behind the scenes. It returns a `streamResult` object that the `AgentNode` can use in a standard way, regardless of the underlying provider. + +`CoreLLM` provides other simple methods like: + +* `generateText()`: For when you need the full response at once, not streamed. +* `generateObject()`: For asking the LLM to output structured data (like JSON) matching a specific format. + +## Key Benefits of `CoreLLM` + +* **Simplicity:** The `AgentNode` code stays clean and doesn't need provider-specific logic. +* **Flexibility:** You can easily switch the underlying LLM for an agent just by changing its `AgentConfig` (e.g., changing `"llm.anthropic"` to `"llm.openai"` and updating the configuration). 
No code changes needed in the `AgentNode`! +* **Consistency:** It handles text generation, streaming, and tool usage calls in a uniform way across all supported LLMs. +* **Centralized Logic:** It handles the common complexities of interacting with LLMs, like formatting API requests, managing API keys, and basic error handling. + +## What Happens Under the Hood? + +How does this "universal remote" actually work? `CoreLLM` relies heavily on a fantastic library called the **Vercel AI SDK**. This SDK provides tools to interact with many different LLM providers in a standardized way. + +Here’s a step-by-step look at what happens when the `AgentNode` calls `CoreLLM.streamText()`: + +1. **Request Received:** `CoreLLM` gets the request from `AgentNode`, including the messages and maybe a list of available tools. +2. **Identify Provider:** `CoreLLM` knows which LLM provider (e.g., Anthropic) and model (e.g., `claude-3-5-sonnet-20240620`) to use based on the configuration it received when it was created (originally from the `AgentConfig`). +3. **Call AI SDK:** `CoreLLM` uses the Vercel AI SDK's functions. It essentially says to the SDK: "Please stream text using *this specific Anthropic model*, with *these messages* and *these tools*." +4. **SDK Handles API Call:** The Vercel AI SDK takes care of formatting the request correctly for the Anthropic API and makes the actual network call. +5. **Provider Streams Response:** The LLM provider (Anthropic) starts sending back the response in small pieces (streaming). +6. **SDK Processes Stream:** The Vercel AI SDK receives these pieces and makes them available in a standardized stream format. +7. **Result Returned:** `CoreLLM` gets this standardized stream result from the SDK and passes it back to the `AgentNode`. 
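The "universal remote" idea behind these steps can be sketched in a few lines. The code below is a toy illustration, not the real `CoreLLM` or the Vercel AI SDK: `ChatModel`, `UniversalLLM`, and the two fake provider objects are invented stand-ins for provider-specific clients.

```typescript
// Toy "universal remote" adapter (not the real CoreLLM).
// The two provider objects are fakes standing in for real SDK clients.

interface ChatModel {
  complete(prompt: string): Promise<string>;
}

const fakeAnthropic: ChatModel = {
  complete: async (prompt) => `[claude] ${prompt}`
};

const fakeOpenAI: ChatModel = {
  complete: async (prompt) => `[gpt] ${prompt}`
};

class UniversalLLM {
  constructor(private model: ChatModel) {}

  // Callers use one method and never see provider-specific details
  generateText(prompt: string): Promise<string> {
    return this.model.complete(prompt);
  }
}

// Swapping providers changes only the construction, never the calling code:
const llm = new UniversalLLM(fakeAnthropic);
llm.generateText('Tell me a joke').then((reply) => console.log(reply)); // logs: [claude] Tell me a joke
```

This is the same design choice `CoreLLM` makes: the caller depends on one small interface, and the provider behind it can change without touching the calling code.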
+ +Here's a simplified diagram of that flow: + +```mermaid +sequenceDiagram + participant AN as AgentNode + participant CL as CoreLLM + participant SDK as Vercel AI SDK + participant LLMApi as LLM Provider API (e.g., Anthropic) + + AN->>CL: streamText(messages, tools) + Note right of CL: Knows it needs to use Anthropic model X + CL->>SDK: streamText({ model: anthropicModelX, messages, tools }) + SDK->>LLMApi: Formats and sends API Request (stream) + LLMApi-->>SDK: Streams back response chunks + SDK-->>CL: Provides standardized StreamResult + CL-->>AN: Returns the StreamResult +``` + +## A Peek at the Code + +Let's look at the main `CoreLLM` class definition (`agentdock-core/src/llm/core-llm.ts`). We'll keep it simple! + +```typescript +// Simplified from agentdock-core/src/llm/core-llm.ts +import { + LanguageModel, // Represents the specific LLM model (e.g., Claude) + CoreMessage, + streamText, // Function from Vercel AI SDK + generateText, // Function from Vercel AI SDK + // ... other imports from 'ai' SDK ... +} from 'ai'; +import { LLMConfig } from './types'; // Our config type + +export class CoreLLM { + private model: LanguageModel; // The actual model object from the AI SDK + private config: LLMConfig; // Stores provider, apiKey, model name etc. + + constructor({ model, config }: { model: LanguageModel; config: LLMConfig }) { + this.model = model; + this.config = config; + // Ready to act as a universal remote for 'this.model'! + } + + // --- Universal Commands --- + + async streamText(options: { messages: CoreMessage[], /*...*/ }) { + // Uses the AI SDK's streamText function, passing the specific model + const streamResult = await streamText({ + model: this.model, // Tells SDK which actual model to use + messages: options.messages, + // ... pass other options like tools, temperature ... 
+ }); + return streamResult; // Return the standardized result + } + + async generateText(options: { messages: CoreMessage[], /*...*/ }) { + // Uses the AI SDK's generateText function + const result = await generateText({ + model: this.model, // Tells SDK which actual model to use + messages: options.messages, + // ... pass other options ... + }); + return result; // Return the standardized result + } + + // ... other methods like generateObject, streamObject ... +} +``` + +**Explanation:** + +* The `CoreLLM` class takes the specific `LanguageModel` object (created by the AI SDK for a provider like Anthropic or OpenAI) and the `LLMConfig` in its `constructor`. +* Methods like `streamText` and `generateText` are simple wrappers around the corresponding functions from the Vercel AI SDK (`ai` library). +* Crucially, they pass `this.model` to the SDK functions, telling the SDK *which specific LLM* to talk to for this particular `CoreLLM` instance. + +**How is the right `LanguageModel` created?** + +There's a helper function `createLLM` (`agentdock-core/src/llm/create-llm.ts`) that reads the `LLMConfig` and uses other helpers (`agentdock-core/src/llm/model-utils.ts`) to ask the Vercel AI SDK to create the correct `LanguageModel` object (e.g., an Anthropic one or an OpenAI one). + +```typescript +// Simplified from agentdock-core/src/llm/create-llm.ts +import { CoreLLM } from './core-llm'; +import { LLMConfig } from './types'; +import { + createAnthropicModel, // Helper to get Anthropic model object + createOpenAIModel, // Helper to get OpenAI model object + // ... other model creation helpers ... 
+} from './model-utils'; + +export function createLLM(config: LLMConfig): CoreLLM { + let model; // This will hold the LanguageModel object + + // Check the provider name from the config + switch (config.provider) { + case 'anthropic': + model = createAnthropicModel(config); // Get Anthropic model + break; + case 'openai': + model = createOpenAIModel(config); // Get OpenAI model + break; + // ... cases for gemini, deepseek, groq ... + default: + throw new Error(`Unsupported provider: ${config.provider}`); + } + + // Create the CoreLLM instance, giving it the specific model object + return new CoreLLM({ model, config }); +} +``` + +**Explanation:** + +* `createLLM` acts like a factory. Based on the `provider` field in the `config`, it calls the appropriate function (like `createAnthropicModel`) to get the specific `LanguageModel` object from the Vercel AI SDK. +* It then creates and returns a `CoreLLM` instance equipped with that specific model, ready to be used by an `AgentNode`. + +## Conclusion + +You've now learned about `CoreLLM`, AgentDock's essential abstraction for interacting with Large Language Models! + +* It acts like a **universal remote control**, hiding the differences between LLM providers like Anthropic, OpenAI, and Google. +* It provides a **simple, consistent interface** (`streamText`, `generateText`, etc.) for the `AgentNode` to use. +* It makes it **easy to switch LLMs** by just changing the [Agent Configuration (`AgentConfig`)](01_agent_configuration___agentconfig___.md). +* Under the hood, it leverages the **Vercel AI SDK** to communicate with the actual LLM APIs. + +`CoreLLM` simplifies development significantly, allowing the rest of AgentDock (especially the `AgentNode`) to focus on conversation logic and tool usage without worrying about the specific details of each LLM provider. 
+ +Now that we understand how the agent configures itself (`AgentConfig`), uses tools (`Tools`), organizes its capabilities (`Nodes`), and talks to its AI brain (`CoreLLM`), how is the entire conversation flow managed? How does the system decide when to call the LLM versus when to execute a tool? That's where orchestration comes in. + +Next: [Chapter 5: Orchestration (`OrchestrationManager`)](05_orchestration___orchestrationmanager___.md) + +--- + +Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge) \ No newline at end of file diff --git a/docs/AgentDock/05_orchestration___orchestrationmanager___.md b/docs/AgentDock/05_orchestration___orchestrationmanager___.md new file mode 100644 index 0000000..510f867 --- /dev/null +++ b/docs/AgentDock/05_orchestration___orchestrationmanager___.md @@ -0,0 +1,368 @@ +# Chapter 5: Orchestration (`OrchestrationManager`) + +In [Chapter 4: CoreLLM (LLM Abstraction)](04_corellm__llm_abstraction__.md), we saw how `CoreLLM` acts like a universal remote, letting our `AgentNode` talk to different AI brains (LLMs) easily. The `AgentNode` can ask the `CoreLLM` to generate text, and it can also give the `CoreLLM` a list of [Tools](02_tools_.md) it might want to use. + +But this raises a question: How does the agent decide *when* to just talk versus *when* to use a specific tool? And can we guide this process to make the agent behave in a more structured way? + +## The Problem: Predictable Steps + +Imagine you want an agent to help you research a topic. A good process might be: + +1. **Search:** First, use a search tool to gather basic information. +2. **Think:** Then, use its AI brain (LLM) to structure and analyze that information. +3. **Reflect:** Finally, use the LLM again to summarize the key findings and potential biases. + +If we just let the agent decide freely, it might jump straight to reflecting without searching, or it might search multiple times randomly. 
How can we ensure it follows these specific steps in the right order? + +## The Solution: The Conductor (`OrchestrationManager`) + +This is where the **`OrchestrationManager`** comes in. Think of it as the **conductor of an orchestra**. The agent's capabilities (talking via `CoreLLM`, using various `Tools`) are like the different instruments. The `OrchestrationManager` doesn't play any instruments itself, but it holds the musical score (the rules) and guides the musicians (the agent's capabilities) on *when* and *how* to play. + +It helps create more structured, predictable, and controllable agent behavior by: + +* Defining different **stages** or **steps** in a conversation. +* Specifying **which tools** are allowed at each step. +* Sometimes enforcing a **strict sequence** of tools that *must* be used. + +These rules are defined in a special section called `orchestration` within the agent's blueprint, the [Agent Configuration (`AgentConfig`)](01_agent_configuration___agentconfig___.md). + +## Key Concepts of Orchestration + +Let's break down the "musical score" – the `OrchestrationConfig` – that the `OrchestrationManager` reads. This configuration is typically part of the agent's `template.json` file. + +### 1. `OrchestrationConfig`: The Rulebook + +This is the main JSON object within `template.json` that defines the rules. Here's a simplified example from the "Cognitive Reasoner" agent, which is designed for complex thinking tasks: + +```json +// Simplified from agents/cognitive-reasoner/template.json +{ + // ... other AgentConfig properties ... + "orchestration": { + "description": "Guides the agent to select a mode (sequence) based on the query.", + "steps": [ + // ... Step definitions go here ... + ] + } + // ... other AgentConfig properties ... +} +``` + +### 2. Steps (`steps`): Conversation Stages + +Inside the `orchestration` block, you define a list of `steps`. Each step represents a distinct phase or mode of the conversation. 
+ +* **`name`**: A unique identifier for the step (e.g., "ResearchMode", "ProblemSolvingMode"). +* **`description`**: Explains what happens in this step. +* **`isDefault`**: If `true`, this step is used when no other step's conditions are met. Think of it as the starting or fallback stage. + +```json +// Simplified step definition +{ + "name": "ResearchMode", + "description": "Agent focuses on searching and analyzing.", + // ... other properties like conditions, sequence ... +}, +{ + "name": "DefaultMode", + "description": "General mode, no specific restrictions.", + "isDefault": true +} +``` + +### 3. Sequences (`sequence`): Mandatory Tool Order + +Some steps might require the agent to use a specific set of tools in a precise order. This is defined using the `sequence` property within a step. + +```json +// Simplified step with a sequence +{ + "name": "ResearchMode", + "description": "Research using Search -> Think -> Reflect.", + "sequence": [ + "search", // Must use 'search' first + "think", // Then must use 'think' + "reflect" // Finally must use 'reflect' + ], + // ... other properties ... +} +``` + +If a step has a `sequence`, the `OrchestrationManager` will only allow the agent to use the *next* required tool in that list. + +### 4. Conditions (`conditions`): Switching Steps + +How does the agent move from one step to another? `conditions` define the rules for activating a specific step. + +* **`type`**: The kind of condition. Common types: + * `tool_used`: Becomes true if a specific tool (`value`) was recently used. + * `sequence_match`: Becomes true if the sequence of recently used tools matches the `sequence` defined in *this* step (useful for triggering a step *after* a sequence completes). +* **`value`**: The value to check against (e.g., the tool name for `tool_used`). 
+ +```json +// Simplified step with a condition +{ + "name": "AnalysisMode", + "description": "Activates after the 'search' tool has been used.", + "conditions": [ + { "type": "tool_used", "value": "search" } + ] + // ... other properties ... +} +``` + +The `OrchestrationManager` checks these conditions to determine which step is currently active. + +### 5. Tool Availability (`availableTools`): Allowed/Denied Tools + +Within a step, even if there isn't a strict `sequence`, you can still control which tools the agent is allowed or forbidden to use. + +* **`allowed`**: A list of tool names. Only these tools can be used in this step. +* **`denied`**: A list of tool names. These tools *cannot* be used in this step. + +```json +// Simplified step with tool availability rules +{ + "name": "CreativeMode", + "description": "Focus on brainstorming, no searching allowed.", + "availableTools": { + "allowed": ["think", "brainstorm"], // Only these are okay + "denied": ["search"] // Explicitly forbid search + } +} +``` + +If a step has *both* `sequence` and `availableTools`, the `sequence` takes priority – only the next tool in the sequence is allowed. `availableTools` is more commonly used in steps *without* a sequence. + +### 6. `OrchestrationManager`: The Enforcer + +The `OrchestrationManager` is the actual system component (a class in the code) that puts all these rules into action during a conversation. + +* It **reads** the `OrchestrationConfig` for the specific agent. +* It **keeps track** of the current state for each user session (which step is active? what tools were used recently? how far into a sequence are we?). This state is managed by an internal helper called `OrchestrationStateManager`. +* It **evaluates** the `conditions` to determine the current `activeStep`. +* It **filters** the list of all possible tools based on the active step's `sequence` or `availableTools`. 
+* It **updates** the state when a tool is used (e.g., adds the tool to history, advances the sequence index). + +## How Orchestration Works in Practice + +Let's follow the flow when a user sends a message to an agent using orchestration: + +1. **User Message:** You send a message, e.g., "Research the impact of AI on jobs." +2. **AgentNode Receives:** The agent's main [Nodes (`BaseNode`, `AgentNode`)](03_nodes___basenode____agentnode___.md) receives the message. +3. **Check Orchestration:** Before calling the [CoreLLM (LLM Abstraction)](04_corellm__llm_abstraction__.md), the `AgentNode` asks the `OrchestrationManager`: "For this user session, given the `OrchestrationConfig`, what's the current state and which tools are allowed *right now*?" +4. **Manager Evaluates:** The `OrchestrationManager` retrieves the session's state (using `OrchestrationStateManager`). Let's say the state indicates the "ResearchMode" step is active, and the `sequence` requires "search" next. The manager determines that only the "search" tool is currently allowed. +5. **Filtered Tools:** The `OrchestrationManager` tells the `AgentNode`: "Only allow the 'search' tool." +6. **LLM Call:** The `AgentNode` calls `CoreLLM.streamText()`, providing the user message, conversation history, and *only the allowed tools* (in this case, just "search"). +7. **LLM Decides:** The LLM analyzes the request ("Research...") and sees that the "search" tool is available and appropriate. It decides to use it. +8. **Tool Execution:** The system executes the "search" tool. +9. **Tool Result & State Update:** The "search" tool returns its results. The `AgentNode` (often via a helper service like `LLMOrchestrationService`) informs the `OrchestrationManager`: "The 'search' tool was just used for this session." +10. **Manager Updates State:** The `OrchestrationManager` updates the session state: adds "search" to the `recentlyUsedTools` history and advances the `sequenceIndex` for "ResearchMode" (now expecting "think"). 
+11. **LLM Continues:** The `AgentNode` gives the search results back to the `CoreLLM`. +12. **Next Interaction:** Now, if the LLM needs to use another tool, the `AgentNode` will again ask the `OrchestrationManager`. This time, the manager will see the state expects "think" next in the sequence and will only allow the "think" tool. +13. **Final Response:** The process continues until the sequence is complete or the LLM generates a final text response to the user. + +## Under the Hood: A Quick Look + +Let's see how the `OrchestrationManager` coordinates this. + +**High-Level Flow Diagram:** + +```mermaid +sequenceDiagram + participant User + participant AgentNode + participant OrchManager as OrchestrationManager + participant OrchState as OrchestrationStateManager + participant CoreLLM + + User->>AgentNode: Send message ("Research...") + AgentNode->>OrchManager: Get Allowed Tools (SessionID, Config) + OrchManager->>OrchState: Get Current State (SessionID) + OrchState-->>OrchManager: Return State (e.g., step=ResearchMode, index=0) + Note right of OrchManager: Sequence expects 'search' + OrchManager-->>AgentNode: Return Allowed Tools (['search']) + AgentNode->>CoreLLM: Generate Response (with ['search']) + CoreLLM-->>AgentNode: Request to use 'search' + Note over AgentNode,CoreLLM: Tool 'search' executes... + AgentNode->>OrchManager: Process Tool Usage ('search', SessionID, Config) + OrchManager->>OrchState: Update State (add 'search' to history, index=1) + OrchState-->>OrchManager: Confirm Update + OrchManager-->>AgentNode: Acknowledge Tool Processed + AgentNode->>CoreLLM: Provide 'search' result + CoreLLM-->>AgentNode: Generate response / Request next tool ('think') + AgentNode->>User: Send response +``` + +**Key Code Components:** + +* **`OrchestrationManager` (`agentdock-core/src/orchestration/index.ts`):** The main class coordinating the logic. 
+
+  ```typescript
+  // Simplified from agentdock-core/src/orchestration/index.ts
+  import { OrchestrationStateManager, createOrchestrationStateManager } from './state';
+  import { StepSequencer, createStepSequencer } from './sequencer';
+  // ... other imports
+
+  export class OrchestrationManager {
+    private stateManager: OrchestrationStateManager;
+    private sequencer: StepSequencer;
+
+    constructor(options: OrchestrationManagerOptions = {}) {
+      // Gets or creates the state manager (handles storage)
+      this.stateManager = createOrchestrationStateManager(options);
+      // Gets or creates the sequencer (handles sequence logic)
+      this.sequencer = createStepSequencer(this.stateManager);
+    }
+
+    // Determines the current step based on conditions and state
+    async getActiveStep(config, messages, sessionId): Promise<OrchestrationStep | null> {
+      const state = await this.stateManager.getOrCreateState(sessionId);
+      // ... logic to check conditions against state.recentlyUsedTools ...
+      // ... finds matching step or default step ...
+      // ... updates state.activeStep if changed ...
+      return activeStep;
+    }
+
+    // Determines allowed tools based on the active step's rules
+    async getAllowedTools(config, messages, sessionId, allToolIds): Promise<string[]> {
+      const activeStep = await this.getActiveStep(config, messages, sessionId);
+      if (!activeStep) return allToolIds; // No rules, allow all
+
+      // If sequence, ask the sequencer for the next tool
+      if (activeStep.sequence?.length) {
+        return this.sequencer.filterToolsBySequence(activeStep, sessionId, allToolIds);
+      }
+
+      // Otherwise, apply allowed/denied rules
+      // ... logic using activeStep.availableTools ... 
      return filteredTools;
    }

    // Updates state after a tool is used
    async processToolUsage(config, messages, sessionId, toolName): Promise<void> {
      const activeStep = await this.getActiveStep(config, messages, sessionId);
      if (!activeStep) return;

      // Tell the sequencer (which updates state via stateManager)
      await this.sequencer.processTool(activeStep, sessionId, toolName);

      // Re-check if the step should change now that the tool was used
      await this.getActiveStep(config, messages, sessionId);
    }

    // Gets the current state (used for conditions, etc.)
    async getState(sessionId: SessionId): Promise<AIOrchestrationState | null> {
      return await this.stateManager.toAIOrchestrationState(sessionId);
    }

    // Updates arbitrary parts of the state
    async updateState(sessionId, partialState) {
      return await this.stateManager.updateState(sessionId, partialState);
    }
  }
  ```

  **Explanation:**

  * The `OrchestrationManager` uses helper classes: `OrchestrationStateManager` (to load/save the current status like `activeStep`, `recentlyUsedTools`, `sequenceIndex`) and `StepSequencer` (to handle the specific logic of enforcing `sequence` rules).
  * `getActiveStep` finds the right step based on rules and history.
  * `getAllowedTools` filters tools based on the active step's sequence or allow/deny lists.
  * `processToolUsage` records that a tool was used and advances any active sequence.

* **`OrchestrationStateManager` (`agentdock-core/src/orchestration/state.ts`):** Handles saving and loading the orchestration state for each session. It uses the [Session Management (`SessionManager`)](07_session_management___sessionmanager___.md) and [Storage (`StorageProvider`, `StorageFactory`)](08_storage___storageprovider____storagefactory___.md) components discussed later to persist this state.

  ```typescript
  // Simplified concept from agentdock-core/src/orchestration/state.ts
  import { SessionManager } from '../session';
  // ...

  export class OrchestrationStateManager {
    private sessionManager: SessionManager<OrchestrationState>; // Uses SessionManager!

    constructor(/* ... options including storage ... */) {
      // ... initializes sessionManager with storage ...
    }

    // Gets state (or creates if missing) using SessionManager
    async getOrCreateState(sessionId): Promise<OrchestrationState> {
      const result = await this.sessionManager.getSession(sessionId);
      if (result.success && result.data) return result.data;
      // ... handle creation if needed ...
      return newState;
    }

    // Updates state using SessionManager
    async updateState(sessionId, updates): Promise<OrchestrationState | null> {
      const updateFn = (currentState) => ({ ...currentState, ...updates });
      const result = await this.sessionManager.updateSession(sessionId, updateFn);
      return result.data;
    }

    // Adds tool to history (within updateState usually)
    async addUsedTool(sessionId, toolName) { /* ... */ }

    // Converts internal state to AI-facing state
    async toAIOrchestrationState(sessionId): Promise<AIOrchestrationState | null> { /* ... */ }
  }
  ```

* **`StepSequencer` (`agentdock-core/src/orchestration/sequencer.ts`):** Focuses specifically on managing the `sequence` logic within a step.

  ```typescript
  // Simplified concept from agentdock-core/src/orchestration/sequencer.ts
  import { OrchestrationStateManager } from './state';
  // ...

  export class StepSequencer {
    private stateManager: OrchestrationStateManager;

    // Filters tools: returns ONLY the next expected tool if in a sequence
    async filterToolsBySequence(step, sessionId, allToolIds): Promise<string[]> {
      const state = await this.stateManager.getState(sessionId);
      const currentIndex = state?.sequenceIndex ?? 0;
      const expectedTool = step.sequence?.[currentIndex];

      if (expectedTool && allToolIds.includes(expectedTool)) {
        return [expectedTool]; // Only allow the expected tool
      }
      return allToolIds; // Sequence finished (or expected tool unavailable): allow all tools
    }

    // Processes tool use: advances sequence index if the tool matches
    async processTool(step, sessionId, usedTool): Promise<boolean> {
      await this.stateManager.addUsedTool(sessionId, usedTool); // Always track history

      const state = await this.stateManager.getState(sessionId);
      const currentIndex = state?.sequenceIndex ?? 0;
      const expectedTool = step.sequence?.[currentIndex];

      if (step.sequence && expectedTool === usedTool) {
        // Advance the index in the state
        await this.stateManager.updateState(sessionId, { sequenceIndex: currentIndex + 1 });
        return true;
      }
      return false; // Tool didn't match sequence
    }
  }
  ```

By separating concerns (`Manager` for overall coordination, `StateManager` for persistence, `Sequencer` for sequence logic), the system remains organized and easier to manage.

## Conclusion

You've learned about the `OrchestrationManager`, AgentDock's "conductor" for controlling agent behavior!

* It allows you to define **structured workflows** using `steps`, `conditions`, `sequences`, and `availableTools` within the agent's `OrchestrationConfig`.
* It acts as a **gatekeeper**, determining which tools the agent is allowed to use at any given moment based on the current state and rules.
* It helps create **more predictable and reliable agents**, especially for tasks requiring specific multi-step processes.
* It relies on `OrchestrationStateManager` to remember the state for each conversation and `StepSequencer` to handle mandatory tool orders.

Understanding orchestration unlocks the ability to build sophisticated agents that follow specific protocols or workflows.

Now that we've covered the core components of an agent (Config, Tools, Nodes, LLM, Orchestration), how does an external user actually interact with an agent? The next chapter explains how AgentDock exposes agents through a web API.
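Before moving on, the gatekeeping behavior described in this chapter can be condensed into a self-contained toy sketch. Note that this is purely illustrative: `ToySequencer` is an invented name, and it keeps its sequence position in an in-memory `Map`, whereas the real `StepSequencer` persists it per session through the `OrchestrationStateManager`:

```typescript
// Toy model of sequence-based tool gating (illustrative only, not the AgentDock API).
type Step = { sequence: string[] };

class ToySequencer {
  private index = new Map<string, number>(); // sessionId -> position in the sequence

  // While a sequence is active, only the next expected tool is allowed.
  allowedTools(step: Step, sessionId: string, allTools: string[]): string[] {
    const i = this.index.get(sessionId) ?? 0;
    const expected = step.sequence[i];
    if (expected !== undefined && allTools.includes(expected)) {
      return [expected]; // Gate is closed to everything but the expected tool
    }
    return allTools; // Sequence finished: open the gate again
  }

  // Advance the position only when the used tool matches the expected one.
  processTool(step: Step, sessionId: string, usedTool: string): boolean {
    const i = this.index.get(sessionId) ?? 0;
    if (step.sequence[i] === usedTool) {
      this.index.set(sessionId, i + 1);
      return true;
    }
    return false;
  }
}

// Walk through the "search then think" example from the chapter.
const step: Step = { sequence: ['search', 'think'] };
const seq = new ToySequencer();
const all = ['search', 'think', 'stock_price'];

console.log(seq.allowedTools(step, 's1', all)); // only 'search' at first
seq.processTool(step, 's1', 'search');
console.log(seq.allowedTools(step, 's1', all)); // now only 'think'
seq.processTool(step, 's1', 'think');
console.log(seq.allowedTools(step, 's1', all)); // sequence complete: all tools again
```

Because each session ID tracks its own position, two conversations never interfere with each other's sequence — the same isolation the real implementation achieves via per-session state.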
+ +Next: [Chapter 6: API Route (`/api/chat/[agentId]/route.ts`)](06_api_route____api_chat__agentid__route_ts___.md) + +--- + +Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge) \ No newline at end of file diff --git a/docs/AgentDock/06_api_route____api_chat__agentid__route_ts___.md b/docs/AgentDock/06_api_route____api_chat__agentid__route_ts___.md new file mode 100644 index 0000000..e8dff29 --- /dev/null +++ b/docs/AgentDock/06_api_route____api_chat__agentid__route_ts___.md @@ -0,0 +1,319 @@ +# Chapter 6: API Route (`/api/chat/[agentId]/route.ts`) + +In the [Chapter 5: Orchestration (`OrchestrationManager`)](05_orchestration___orchestrationmanager___.md), we saw how the `OrchestrationManager` acts like a conductor, guiding the agent through specific steps and controlling tool usage. But how does your message from the chat window even reach the agent and its conductor in the first place? And how does the agent's response get back to your screen? + +Let's explore the main "front door" for all chat interactions: the API Route. + +## The Big Idea: The Chat Server's Front Door + +Imagine you're sending a letter to a specific department (an agent) in a large company (AgentDock). You don't just throw the letter over the fence! You put it in an envelope, address it to the right department (`agentId`), and drop it off at the company's mailroom (the API Route). + +The mailroom (`/api/chat/[agentId]/route.ts`) is the central point on the AgentDock web server that receives all incoming chat messages from the user interface. Its job is to: + +1. **Receive the Message:** Accept the incoming "letter" (the user's message and chat history). +2. **Identify the Recipient:** Figure out which agent (`agentId`) the message is for. +3. **Check Credentials:** Find the necessary keys (like API keys) to allow the agent to work. +4. 
**Prepare the Agent:** Load the agent's instructions ([Agent Configuration (`AgentConfig`)](01_agent_configuration___agentconfig___.md)). +5. **Deliver the Message:** Pass the message and instructions to the right internal systems (like the [Nodes (`BaseNode`, `AgentNode`)](03_nodes___basenode____agentnode___.md) and [Orchestration (`OrchestrationManager`)](05_orchestration___orchestrationmanager___.md)). +6. **Return the Reply:** Get the agent's response (often as a stream of text) and send it back to the user's browser. + +This specific file, `src/app/api/chat/[agentId]/route.ts`, is a special kind of file in a Next.js web application. It defines a server-side endpoint that your browser can talk to. + +## Key Concepts + +Let's quickly understand a few key ideas: + +* **API Route:** In Next.js (the web framework AgentDock uses), an API route is a file inside the `app/api` directory that runs on the server, not in the user's browser. It allows the browser (front-end) to securely communicate with the server (back-end) to perform actions or get data. +* **Dynamic Route (`[agentId]`):** The square brackets `[]` in the filename mean this part of the URL is dynamic. So, whether you send a message to `/api/chat/finance-assistant` or `/api/chat/travel-planner`, this *same* `route.ts` file handles the request. It can then look at the URL to know which `agentId` was requested. +* **HTTP Request:** When you send a message in the chat UI, your browser sends an HTTP `POST` request to the API route. This request contains data like your message, the chat history, and potentially which agent you're talking to. +* **HTTP Response (Streaming):** The API route needs to send the agent's reply back. Instead of waiting for the entire reply (which might take time for the AI to generate), it usually *streams* the response. 
This means it sends the text back in small chunks as soon as they are generated, so you see the reply appearing word by word in the chat UI, making it feel much faster. + +## How it Works: From User Message to Agent Reply + +Let's trace the journey of your chat message: + +1. **You Type & Send:** You type "What's Apple's stock price?" in the chat window for the "Finance Assistant" and hit Send. +2. **Browser Sends Request:** Your browser packages up your message, the chat history, and sends an HTTP POST request to `/api/chat/finance-assistant`. It might also include an API key you provided in the settings via a special header (`x-api-key`). +3. **API Route Receives:** The `route.ts` file on the server receives this request. +4. **Identify Agent:** It extracts `agentId` ("finance-assistant") from the URL. +5. **Get Data:** It reads the message history and your new message from the request body. +6. **Find API Key:** It looks for the LLM API key needed by the Finance Assistant. It checks (in order): + * The `x-api-key` header from the request. + * Global API key settings stored securely. + * Environment variables on the server (if "Bring Your Own Key" mode is off). +7. **Load Agent Blueprint:** It finds the `template.json` for "finance-assistant" and uses `loadAgentConfig` (from [Chapter 1: Agent Configuration (`AgentConfig`)](01_agent_configuration___agentconfig___.md)) to create the complete `AgentConfig` object, securely adding the API key. +8. **Call the Adapter:** It calls a helper function, `processAgentMessage` (found in `src/lib/agent-adapter.ts`). This function acts as an intermediary to the core agent logic. +9. **Adapter Works:** `processAgentMessage` takes the `AgentConfig`, messages, API key, etc. It might: + * Get the [Orchestration (`OrchestrationManager`)](05_orchestration___orchestrationmanager___.md) instance. + * Create the specific [Nodes (`BaseNode`, `AgentNode`)](03_nodes___basenode____agentnode___.md) needed (like the `AgentNode` itself). 
+ * Call the `AgentNode`'s `handleMessage` method, passing the messages, orchestration manager, and other context. +10. **Agent Thinks & Responds:** The `AgentNode` uses its [CoreLLM (LLM Abstraction)](04_corellm__llm_abstraction__.md) and available [Tools](02_tools_.md) (like `stock_price`), guided by the `OrchestrationManager`, to process the request and generate a response stream. +11. **Stream Back to Route:** The `AgentNode`'s `handleMessage` method returns a result object containing the live response stream back to the `processAgentMessage` function, which returns it to the API route (`route.ts`). +12. **Route Sends to Browser:** The API route takes this stream and sends it back to your browser as the HTTP response. +13. **UI Displays:** Your browser receives the streaming text chunks and displays them in the chat window. + +Phew! That seems like a lot, but it happens very quickly. The key is that the API route acts as the central coordinator for handling web requests. + +## Under the Hood + +Let's look a bit closer at the code and flow. + +### Sequence Diagram + +This diagram shows the main players involved when you send a message: + +```mermaid +sequenceDiagram + participant Browser as User Browser + participant Route as API Route (/api/chat/[agentId]/route.ts) + participant Adapter as Agent Adapter (processAgentMessage) + participant Agent as AgentNode + participant LLM as CoreLLM + + Browser->>Route: POST /api/chat/finance-assistant (message, history, apiKey?) + Route->>Route: Extract agentId, body, headers + Route->>Route: Resolve LLM API Key + Route->>Route: Load AgentConfig('finance-assistant') + Route->>Adapter: processAgentMessage(config, messages, apiKey, ...) + Adapter->>Agent: Create AgentNode(config, apiKey) + Adapter->>Agent: agent.handleMessage(messages, orchestrationMgr, ...) + Agent->>LLM: streamText(prompt, history, tools) + Note right of LLM: LLM generates response stream... 
+ LLM-->>Agent: Return StreamResult object + Agent-->>Adapter: Return StreamResult object + Adapter-->>Route: Return StreamResult object + Route->>Route: Create Streaming HTTP Response from StreamResult + Route-->>Browser: Send streaming response + Browser->>Browser: Display text as it arrives +``` + +### Code Walkthrough: `route.ts` + +This file contains the main `POST` function that handles incoming chat requests. + +```typescript +// File: src/app/api/chat/[agentId]/route.ts +import { NextRequest, NextResponse } from 'next/server'; +import { + loadAgentConfig, // From Chapter 1 + APIError, ErrorCode, logger, LogCategory, Message +} from 'agentdock-core'; +import { templates, TemplateId } from '@/generated/templates'; // Bundled blueprints +import { processAgentMessage } from '@/lib/agent-adapter'; // The helper adapter +import { getLLMInfo } from '@/lib/utils'; +// ... other imports like resolveApiKey, addOrchestrationHeaders ... + +// This function handles POST requests to /api/chat/[agentId] +export async function POST( + request: NextRequest, + context: { params: Promise<{ agentId: string }> } +) { + try { + // 1. Get agentId from the URL (e.g., "finance-assistant") + const { agentId } = await context.params; + logger.debug(LogCategory.API, 'ChatRoute', 'Processing chat request', { agentId }); + + // 2. Read data from the request (messages, session info) + const body = await request.json(); + const { messages, system, sessionId: requestSessionId, config: runtimeOverrides } = body; + + // 3. Find the agent's blueprint (template) + const template = templates[agentId as TemplateId]; + if (!template) { + throw new APIError('Template not found', /*...*/); + } + + // 4. Figure out which LLM provider this agent uses + const llmInfo = getLLMInfo(template); + + // 5. 
Resolve the necessary LLM API key (checks headers, settings, env vars) + const apiKey = await resolveApiKey(request, llmInfo.provider, /* isByokOnly */); + if (!apiKey) { + throw new APIError('API key is required', /*...*/); + } + + // 6. Load the full AgentConfig, injecting the API key securely + const fullAgentConfig = await loadAgentConfig(template, apiKey); + + // 7. Get Session ID (from header or body, or create new) + const clientSessionId = request.headers.get('x-session-id') || requestSessionId; + const finalSessionId = clientSessionId || `session-${agentId}-${Date.now()}`; + + // --- Orchestration State Handling (Simplified) --- + let orchestrationState = null; + if (template.orchestration) { + // Fetch state if needed, using the orchestration adapter + orchestrationState = await import('@/lib/orchestration-adapter') + .then(m => m.getOrchestrationState(finalSessionId, /*...*/)); + } + // --- End Orchestration --- + + // 8. Call the Agent Adapter to handle the core logic + const result = await processAgentMessage({ + agentId, + messages: messages as Message[], + sessionId: finalSessionId, + apiKey, + provider: llmInfo.provider, + system, + config: runtimeOverrides, // Pass runtime overrides like temperature + fullAgentConfig: fullAgentConfig, // Pass the loaded config + orchestrationState // Pass the fetched state + // ... potentially pass fallback API key ... + }); + + // 9. Create the streaming response from the adapter's result + const response = result.toDataStreamResponse(); + + // 10. Add extra headers (like session ID, token usage, orchestration state) + response.headers.set('x-session-id', finalSessionId); + // ... add x-token-usage header if available ... + await import('@/lib/orchestration-adapter') + .then(m => m.addOrchestrationHeaders(response, finalSessionId)); + + // 11. 
Send the streaming response back to the browser + return response; + + } catch (error) { + // Handle errors gracefully + logger.error(LogCategory.API, 'ChatRoute', 'Error processing chat request', /*...*/); + return new Response(JSON.stringify(normalizeError(error)), /*...*/); + } +} +``` + +**Explanation:** + +1. The `POST` function receives the `request` and `context` (which contains the `agentId`). +2. It gets the agent's template (`template.json` content). +3. It resolves the required `apiKey` using the `resolveApiKey` helper. +4. It loads the full `fullAgentConfig` using `loadAgentConfig`. +5. It determines the `finalSessionId`. +6. It potentially fetches the current `orchestrationState` (relevant for [Chapter 5: Orchestration (`OrchestrationManager`)](05_orchestration___orchestrationmanager___.md)). +7. Crucially, it calls `processAgentMessage`, passing all the necessary information. +8. It gets a `result` object back from the adapter. This object contains methods to create the final response. +9. It calls `result.toDataStreamResponse()` to get a standard web `Response` object that streams the agent's reply. +10. It adds useful headers (like the session ID and orchestration state) to the response. +11. It returns the `response` to the browser. + +### Code Walkthrough: `agent-adapter.ts` + +The API route delegates the core agent interaction logic to `processAgentMessage` in this adapter file. This keeps the route file cleaner. + +```typescript +// File: src/lib/agent-adapter.ts +import { + logger, LogCategory, Message, AgentNode, AgentConfig +} from 'agentdock-core'; +import { v4 as uuidv4 } from 'uuid'; +import { getOrchestrationManagerInstance } from '@/lib/orchestration-adapter'; +// ... other imports ... + +// Options expected by processAgentMessage +interface HandleMessageOptions { + // ... agentId, messages, apiKey, provider, etc ... 
+ fullAgentConfig: AgentConfig; // The fully loaded AgentConfig + sessionId?: string; + orchestrationState?: any; // Optional pre-fetched state +} + +// This function orchestrates the interaction with the AgentNode +export async function processAgentMessage(options: HandleMessageOptions) { + const { + agentId, + messages, + sessionId, + apiKey, + provider, + system, + config: runtimeOverrides, + _fallbackApiKey, + orchestrationState, + fullAgentConfig // Use the passed-in config + } = options; + + const finalSessionId = sessionId || uuidv4(); + + // Get the shared OrchestrationManager instance + const manager = getOrchestrationManagerInstance(); + // Ensure session state exists (important for orchestration) + await manager.ensureStateExists(finalSessionId); + + // 1. Create the AgentNode instance + // We pass the loaded fullAgentConfig here. + const agent = new AgentNode( + agentId, + { + agentConfig: fullAgentConfig, // The core blueprint! + apiKey, + provider, + options: runtimeOverrides, // LLM overrides (temp, etc.) + // ... fallback key if needed ... + } + ); + + try { + // 2. Call the AgentNode's main method + // Pass messages, session ID, the manager, and pre-fetched state + const result = await agent.handleMessage({ + messages, + sessionId: finalSessionId, + orchestrationManager: manager, // Pass the manager instance + systemOverride: system, + ...(runtimeOverrides ? { config: runtimeOverrides } : {}), + ...(orchestrationState ? { orchestrationState } : {}) // Pass state + }); + + // 3. Return the result object (containing the stream) to the API route + // The adapter doesn't process the stream itself. 
+ return { + ...result, // Include all properties from the AgentNode result + // Helper method used by the API route to create the final response + toDataStreamResponse(opts = {}) { + return result.toDataStreamResponse(opts); + } + }; + + } catch (error) { + logger.error(LogCategory.API, 'AgentAdapter', 'Error processing agent message', /*...*/); + throw error; // Re-throw for the API route to handle + } +} +``` + +**Explanation:** + +1. It receives the `fullAgentConfig` (already loaded by the API route), `apiKey`, `messages`, `sessionId`, etc. +2. It gets the instance of the [Orchestration (`OrchestrationManager`)](05_orchestration___orchestrationmanager___.md). +3. It creates the specific [Nodes (`BaseNode`, `AgentNode`)](03_nodes___basenode____agentnode___.md) instance for this agent, passing in the `fullAgentConfig` and `apiKey`. +4. It calls the crucial `agent.handleMessage` method, providing all the context needed for the agent to run (messages, session ID, the orchestration manager, etc.). +5. It receives the `result` object back from `agent.handleMessage`. This object contains the stream and other information (like token usage promises). +6. It returns this `result` object back to the API route (`route.ts`). + +### API Key Resolution + +The `resolveApiKey` function (used inside `route.ts`) handles finding the correct LLM API key. It checks in this order: + +1. `x-api-key` header: If the user provided a key directly in the UI settings for this session. +2. Global Settings: Checks secure storage for a globally configured key for the required LLM provider (e.g., Anthropic). +3. Environment Variables: If "Bring Your Own Key" (BYOK) mode is OFF, it looks for server environment variables like `ANTHROPIC_API_KEY` or `OPENAI_API_KEY`. If BYOK is ON, it *stops* before checking environment variables, requiring the user to provide a key via steps 1 or 2. + +This ensures flexibility while allowing administrators to control key usage. 
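That lookup order amounts to a small fallback chain. The sketch below is illustrative only — `resolveKey` and the `KeySources` shape are invented for this example and do not match the real `resolveApiKey` signature:

```typescript
// Illustrative fallback chain for resolving an LLM API key (not AgentDock's real helper).
interface KeySources {
  headerKey?: string;                      // value of the x-api-key request header, if any
  globalSettings: Record<string, string>;  // per-provider keys from secure settings storage
  env: Record<string, string | undefined>; // server environment variables
  byokOnly: boolean;                       // "Bring Your Own Key" mode
}

function resolveKey(provider: string, src: KeySources): string | null {
  // 1. A key supplied with the request always wins.
  if (src.headerKey) return src.headerKey;

  // 2. Next, a globally configured key for this provider.
  const globalKey = src.globalSettings[provider];
  if (globalKey) return globalKey;

  // 3. Server environment variables are only consulted when BYOK mode is OFF.
  if (!src.byokOnly) {
    const envKey = src.env[`${provider.toUpperCase()}_API_KEY`];
    if (envKey) return envKey;
  }

  return null; // caller responds with "API key is required"
}
```

The important property is the third step: with `byokOnly` enabled the chain stops after checking the header and global settings, so a missing user key surfaces as an error instead of silently falling back to the server's own credentials.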
## Conclusion

You've now seen how the `/api/chat/[agentId]/route.ts` API route acts as the essential entry point for chat interactions in AgentDock's web interface.

* It's the **front door** that receives user messages via HTTP requests.
* It identifies the target **agent** using the dynamic `[agentId]` part of the URL.
* It **coordinates** loading the agent's configuration, resolving API keys, and managing session information.
* It delegates the core agent logic to helper functions/adapters (like `processAgentMessage`) which interact with `AgentNode`, `CoreLLM`, and `OrchestrationManager`.
* It efficiently **streams** the agent's response back to the user's browser.

Understanding this route helps you see how the user interface connects to the powerful backend components we've discussed. But how does AgentDock remember the state of your conversation across multiple messages, especially for things like orchestration? That involves session management.

Next: [Chapter 7: Session Management (`SessionManager`)](07_session_management___sessionmanager___.md)

---

Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)
\ No newline at end of file
diff --git a/docs/AgentDock/07_session_management___sessionmanager___.md b/docs/AgentDock/07_session_management___sessionmanager___.md
new file mode 100644
index 0000000..1a4d3fe
--- /dev/null
+++ b/docs/AgentDock/07_session_management___sessionmanager___.md
@@ -0,0 +1,294 @@
# Chapter 7: Session Management (`SessionManager`)

In [Chapter 6: API Route (`/api/chat/[agentId]/route.ts`)](06_api_route____api_chat__agentid__route_ts___.md), we saw how the API route acts as the front door, receiving your messages and sending back the agent's replies. We also saw how it gets the `sessionId` and potentially loads the `orchestrationState`. But how does the system *remember* that state between your different messages?
How does it keep your conversation separate from someone else talking to the same agent? + +## The Problem: Amnesia and Mixed-Up Conversations + +Imagine talking to an assistant who forgets everything you just said the moment you pause. You ask, "What's the capital of France?" They answer, "Paris." Then you ask, "What's its population?" And they reply, "Whose population?" They've forgotten you were talking about Paris! + +Also, imagine you're chatting with the "Finance Assistant", and someone else starts chatting with the same assistant at the same time. If the system isn't careful, your request for Apple's stock price might get mixed up with their request for Bitcoin's price, leading to confusing or wrong answers for both of you. + +We need a way to: + +1. **Remember:** Keep track of important details *within* a single conversation (like the current topic, recent messages, or the current step in an [Orchestration (`OrchestrationManager`)](05_orchestration___orchestrationmanager___.md)). +2. **Isolate:** Ensure that each user's conversation is completely separate and doesn't interfere with others. + +## The Solution: The Hotel Front Desk (`SessionManager`) + +This is exactly what the **`SessionManager`** does. Think of AgentDock as a hotel. Each time a new guest (a user starting a chat) arrives, the `SessionManager` (the friendly hotel front desk clerk) gives them a unique room key (`sessionId`). + +* **Unique Key (`sessionId`):** This key ensures the guest can only access their own room. In AgentDock, the `sessionId` uniquely identifies *your* specific conversation. +* **Room (`SessionState`):** The hotel room holds the guest's belongings and maybe notes about their stay (like "requested extra towels"). 
In AgentDock, the `SessionState` is where we store information specific to *your* conversation, like the [Orchestration (`OrchestrationManager`)](05_orchestration___orchestrationmanager___.md) status (which step you're on), cumulative token counts, or even recent message summaries. +* **Front Desk (`SessionManager`):** The front desk manages all the keys and knows which key belongs to which room. It can retrieve information about a specific guest's stay if you give them the key. In AgentDock, the `SessionManager` manages all active `sessionId`s and their corresponding `SessionState`. It provides functions to create, get, update, and delete the state associated with a specific `sessionId`. + +By using the `SessionManager`, AgentDock ensures that when you send a message with your unique `sessionId` (your room key), the system retrieves *your* specific `SessionState` (checks *your* room), processes your message in that context, updates the state if needed (maybe puts a new note in your room), and sends the reply back to you, without ever affecting any other guest's (user's) conversation. + +## Key Concepts + +1. **`SessionId`:** A unique string (like `session-finance-12345` or a long random ID) that identifies one specific conversation instance. This is your unique room key. +2. **`SessionState`:** An object holding the data associated with a `SessionId`. This can include anything the system needs to remember for that specific chat, like the `activeStep` or `recentlyUsedTools` used by the [Orchestration (`OrchestrationManager`)](05_orchestration___orchestrationmanager___.md). This is the information stored *inside* your room. +3. **`SessionManager`:** The class responsible for managing the lifecycle of sessions. It provides methods like `createSession`, `getSession`, `updateSession`, and `deleteSession`. This is the front desk clerk. +4. **`StorageProvider`:** Where does the `SessionManager` actually store the `SessionState`? 
It uses a pluggable [Storage (`StorageProvider`, `StorageFactory`)](08_storage___storageprovider____storagefactory___.md) (like in-memory storage for quick tests, or Redis for a persistent, shared storage). This is the hotel's record-keeping system (a filing cabinet, a computer database).

## How it's Used: Remembering Orchestration State

Let's revisit the `OrchestrationStateManager` from [Chapter 5: Orchestration (`OrchestrationManager`)](05_orchestration___orchestrationmanager___.md). Its job is to remember things like the `activeStep` or `sequenceIndex` for a specific conversation. How does it do this across multiple messages? **It uses the `SessionManager`!**

The `OrchestrationStateManager` doesn't store the state itself directly. Instead, it asks the `SessionManager` to store and retrieve the orchestration-related data as part of the overall `SessionState` for that conversation.

Here's a simplified example showing how `OrchestrationStateManager` uses `SessionManager`:

```typescript
// Simplified concept from agentdock-core/src/orchestration/state.ts
import { SessionManager } from '../session'; // Uses SessionManager!
import { SessionId, SessionState } from '../types/session';

// Define what orchestration state looks like (extends basic SessionState)
interface OrchestrationState extends SessionState {
  activeStep?: string;
  recentlyUsedTools: string[];
  sequenceIndex?: number;
  // ... other fields like lastAccessed, cumulativeTokenUsage ...
}

// Factory function to create a NEW blank state for a session
function createDefaultOrchestrationState(sessionId: SessionId): OrchestrationState {
  return { sessionId, recentlyUsedTools: [], sequenceIndex: 0, /* ... */ };
}

export class OrchestrationStateManager {
  // It HOLDS a SessionManager instance, specifically for OrchestrationState
  private sessionManager: SessionManager<OrchestrationState>;

  constructor(/* ... options including storageProvider ...
*/) {
    // Creates a SessionManager, telling it HOW to create a default state
    // and WHICH storage provider to use.
    this.sessionManager = new SessionManager(
      createDefaultOrchestrationState, // Function to create new state
      options.storageProvider, // Where to store it
      'orchestration-state' // Namespace in storage
    );
  }

  // Get state for a session (using SessionManager)
  async getState(sessionId: SessionId): Promise<OrchestrationState | null> {
    // Ask the SessionManager to get the data for this key
    const result = await this.sessionManager.getSession(sessionId);
    return result.success ? result.data : null;
  }

  // Update state for a session (using SessionManager)
  async updateState(sessionId: SessionId, updates: Partial<OrchestrationState>): Promise<OrchestrationState | null> {
    // Define HOW to merge the updates with the current state
    const updateFn = (currentState: OrchestrationState): OrchestrationState => {
      return { ...currentState, ...updates, lastAccessed: Date.now() };
    };
    // Ask SessionManager to apply this update function
    const result = await this.sessionManager.updateSession(sessionId, updateFn);
    return result.success ? result.data : null;
  }

  // Add a used tool (uses updateState above)
  async addUsedTool(sessionId: SessionId, toolName: string): Promise<OrchestrationState | null> {
    const state = await this.getState(sessionId);
    if (!state) return null;
    const updatedTools = [toolName, ...(state.recentlyUsedTools || [])].slice(0, 10);
    // Calls updateState, which uses sessionManager.updateSession
    return this.updateState(sessionId, { recentlyUsedTools: updatedTools });
  }
}
```

**Explanation:**

1. `OrchestrationState` defines the specific pieces of information the orchestrator needs to remember (like `activeStep`). It includes `sessionId` because all session states need it.
2. The `OrchestrationStateManager` *contains* a `SessionManager` instance. This `SessionManager` is specifically configured to handle `OrchestrationState`.
3.
When `OrchestrationStateManager` needs to get the state (`getState`), it simply calls `this.sessionManager.getSession(sessionId)`. +4. When it needs to update the state (`updateState`, `addUsedTool`), it defines *how* the state should change and then calls `this.sessionManager.updateSession(sessionId, updateFn)`. + +The `OrchestrationStateManager` focuses on the *logic* of orchestration, while the `SessionManager` handles the generic task of *storing and retrieving* that state reliably and separately for each session, using the configured [Storage (`StorageProvider`, `StorageFactory`)](08_storage___storageprovider____storagefactory___.md). + +## Under the Hood: `SessionManager` Internals + +How does the `SessionManager` itself work? It's essentially a bridge between your code and the storage layer. + +**High-Level Flow (Updating State):** + +```mermaid +sequenceDiagram + participant OSM as OrchestrationStateManager + participant SM as SessionManager + participant SP as StorageProvider (e.g., Redis) + + OSM->>SM: updateSession(sessionId, updateFunction) + SM->>SP: get(storageKey) + SP-->>SM: Return current StoredSessionData (if exists) + Note right of SM: Applies updateFunction to the state + SM->>SP: set(storageKey, updated StoredSessionData, {ttl}) + SP-->>SM: Confirm data stored + SM-->>OSM: Return updated state (or error) +``` + +**Code Structure (`SessionManager`):** + +Let's look at a simplified version of the `SessionManager` class itself. 

```typescript
// Simplified from agentdock-core/src/session/index.ts
import { SessionId, SessionState, SessionResult } from '../types/session';
import { StorageProvider } from '../storage/types';
import { logger } from '../logging';

// Wrapper for stored data, includes state, metadata, TTL
interface StoredSessionData<T extends SessionState> {
  state: T;
  metadata: { createdAt: Date; lastAccessedAt: Date; /*...*/ };
  ttlMs: number;
}

export class SessionManager<T extends SessionState> {
  private storage: StorageProvider; // The storage system (e.g., Memory, Redis)
  private defaultStateGenerator: (sessionId: SessionId) => T; // How to make a new state
  private storageNamespace: string; // Prefix for storage keys
  private defaultTtlMs: number; // Default time-to-live

  constructor(
    defaultStateGenerator: (sessionId: SessionId) => T,
    storageProvider: StorageProvider,
    storageNamespace: string = 'sessions',
    options: { defaultTtlMs?: number } = {}
  ) {
    this.storage = storageProvider; // Store the provided storage instance
    this.defaultStateGenerator = defaultStateGenerator;
    this.storageNamespace = storageNamespace;
    this.defaultTtlMs = options.defaultTtlMs || 30 * 60 * 1000; // Default 30 mins
    logger.debug('SessionManager initialized', { ns: storageNamespace });
  }

  // Helper to create the key used in storage
  private getStorageKey(sessionId: SessionId): string {
    return `${this.storageNamespace}:${sessionId}`;
  }

  // Get session data from storage
  async getSession(sessionId: SessionId): Promise<SessionResult<T>> {
    const storageKey = this.getStorageKey(sessionId);
    try {
      const storedData = await this.storage.get<StoredSessionData<T>>(storageKey);
      if (!storedData) {
        return { success: false, sessionId, error: 'Session not found' };
      }
      // TODO: Optionally update lastAccessed time here?
      return { success: true, sessionId, data: storedData.state };
    } catch (error: any) {
      logger.error('Error getting session', { error: error.message });
      return { success: false, sessionId, error: error.message };
    }
  }

  // Update session data in storage
  async updateSession(sessionId: SessionId, updateFn: (state: T) => T): Promise<SessionResult<T>> {
    const storageKey = this.getStorageKey(sessionId);
    try {
      const storedData = await this.storage.get<StoredSessionData<T>>(storageKey);
      if (!storedData) {
        return { success: false, sessionId, error: 'Session not found for update' };
      }

      // Apply the update function to the state
      const updatedState = updateFn(storedData.state);

      // Prepare the full data object to store
      const updatedSessionData: StoredSessionData<T> = {
        ...storedData,
        state: updatedState,
        metadata: { ...storedData.metadata, lastAccessedAt: new Date() }
      };

      // Calculate TTL in seconds for storage provider
      const ttlSeconds = this.defaultTtlMs > 0 ? Math.floor(this.defaultTtlMs / 1000) : undefined;

      // Save back to storage with TTL
      await this.storage.set(storageKey, updatedSessionData, { ttlSeconds });

      return { success: true, sessionId, data: updatedState };
    } catch (error: any) {
      logger.error('Error updating session', { error: error.message });
      return { success: false, sessionId, error: error.message };
    }
  }

  // Create a brand new session in storage
  async createSession(options: { sessionId?: SessionId } = {}): Promise<SessionResult<T>> {
    const sessionId = options.sessionId || `session_${Date.now()}`; // Generate ID if needed
    const storageKey = this.getStorageKey(sessionId);

    // Check if it already exists first
    const existing = await this.storage.get<StoredSessionData<T>>(storageKey);
    if (existing) {
      logger.warn('Session already exists, returning existing.', { sessionId });
      return this.getSession(sessionId); // Just return the existing one
    }

    // Create new state and metadata
    const state = this.defaultStateGenerator(sessionId);
    const now = new Date();
    const sessionData: StoredSessionData<T> = {
      state,
      metadata: { createdAt: now, lastAccessedAt: now },
      ttlMs: this.defaultTtlMs
    };
    const ttlSeconds = this.defaultTtlMs > 0 ? Math.floor(this.defaultTtlMs / 1000) : undefined;

    try {
      // Store the new session
      await this.storage.set(storageKey, sessionData, { ttlSeconds });
      return { success: true, sessionId, data: state };
    } catch (error: any) {
      logger.error('Error creating session', { error: error.message });
      return { success: false, sessionId, error: error.message };
    }
  }

  // Delete a session
  async deleteSession(sessionId: SessionId): Promise<SessionResult<boolean>> {
    const storageKey = this.getStorageKey(sessionId);
    try {
      const deleted = await this.storage.delete(storageKey);
      return { success: true, sessionId, data: deleted };
    } catch (error: any) {
      logger.error('Error deleting session', { error: error.message });
      return { success: false, sessionId, error: error.message };
    }
  }
}
```

**Explanation:**

1. **Constructor:** Takes the `defaultStateGenerator` function (so it knows how to create a *new* empty state), the `storageProvider` instance (where to save/load), and a `storageNamespace` (to keep keys organized).
2. **`getStorageKey`:** A simple helper to create a unique key for the storage system (e.g., `orchestration-state:session-12345`).
3. **`getSession`:** Uses `storage.get(key)` to retrieve the data. It returns just the `state` part.
4. **`updateSession`:** Retrieves the current data using `storage.get(key)`, applies the `updateFn` provided by the caller (e.g., `OrchestrationStateManager`) to modify the `state`, updates the `lastAccessedAt` timestamp, and then saves the *entire* `StoredSessionData` object back using `storage.set(key, data, {ttl})`. Using `ttl` tells the storage system to automatically delete the data after a period of inactivity.
5. **`createSession`:** Creates a new state using `defaultStateGenerator`, wraps it in `StoredSessionData` with current timestamps, and saves it using `storage.set(key, data, {ttl})`.
6. **`deleteSession`:** Uses `storage.delete(key)` to remove the session data.

The `SessionManager` provides a clean API (`getSession`, `updateSession`, etc.) while hiding the details of how and where the data is actually stored.

## Conclusion

You've now learned about the `SessionManager`, AgentDock's system for remembering things within a conversation and keeping different conversations separate.

* It acts like a **hotel front desk**, managing unique **keys (`sessionId`)** for each conversation.
* It stores conversation-specific data (**`SessionState`**) like orchestration status or token counts.
* It ensures **isolation** between different user chats.
* It relies on a **`StorageProvider`** to actually save and load the session data.
* Core components like `OrchestrationStateManager` *use* the `SessionManager` to persist their state across messages.

Understanding `SessionManager` shows how AgentDock maintains context and separation in conversations. But how does the actual storage part work? Where do these session states get saved?
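Before moving on, the get state → apply `updateFn` → save state cycle from this chapter can be condensed into a tiny self-contained sketch. A plain `Map` stands in for the real `StorageProvider`, and the names (`MiniState`, `updateSession`) are simplified stand-ins for illustration, not AgentDock's actual code:

```typescript
// Minimal stand-in for the SessionManager update cycle described above.
// A plain Map plays the role of the StorageProvider (no persistence, no TTL).

type MiniState = { sessionId: string; recentlyUsedTools: string[]; lastAccessed?: number };

const store = new Map<string, MiniState>(); // the "filing cabinet"
const namespace = 'orchestration-state';
const key = (sessionId: string) => `${namespace}:${sessionId}`; // e.g. "orchestration-state:session-123"

function updateSession(sessionId: string, updateFn: (s: MiniState) => MiniState): MiniState | null {
  const current = store.get(key(sessionId));  // 1. get the current state
  if (!current) return null;                  // "Session not found"
  const updated = updateFn(current);          // 2. apply the caller's update function
  store.set(key(sessionId), updated);         // 3. save the whole object back (real code also sets a TTL)
  return updated;
}

// Seed a session, then record a used tool the way addUsedTool does:
store.set(key('session-123'), { sessionId: 'session-123', recentlyUsedTools: [] });
const result = updateSession('session-123', (s) => ({
  ...s,
  recentlyUsedTools: ['stock_price', ...s.recentlyUsedTools].slice(0, 10),
  lastAccessed: Date.now(),
}));
console.log(result?.recentlyUsedTools); // -> ['stock_price']
```

Because each session's data lives under its own namespaced key, two conversations can never see each other's state — the isolation comes from the key, not from any special logic.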
+ +Next: [Chapter 8: Storage (`StorageProvider`, `StorageFactory`)](08_storage___storageprovider____storagefactory___.md) + +--- + +Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge) \ No newline at end of file diff --git a/docs/AgentDock/08_storage___storageprovider____storagefactory___.md b/docs/AgentDock/08_storage___storageprovider____storagefactory___.md new file mode 100644 index 0000000..262cbf9 --- /dev/null +++ b/docs/AgentDock/08_storage___storageprovider____storagefactory___.md @@ -0,0 +1,336 @@ +# Chapter 8: Storage (`StorageProvider`, `StorageFactory`) + +In the [Chapter 7: Session Management (`SessionManager`)](07_session_management___sessionmanager___.md), we learned how the `SessionManager` keeps track of conversation details using a unique `sessionId` and `SessionState`. But we left one question unanswered: where does the `SessionManager` actually *save* all that `SessionState` data? Does it just keep it in the computer's temporary memory, or does it put it somewhere more permanent? + +## The Problem: Where to Keep the Notes? + +Imagine our `SessionManager` (the hotel front desk clerk) needs to keep notes about each guest's stay (the `SessionState`). Where should they put these notes? + +* **Sticky Notes:** They could use sticky notes on their desk. This is fast and easy for temporary notes, but if the power goes out or they go home, the notes are lost! This is like storing data only in the computer's active memory. +* **Filing Cabinet:** They could put the notes in a sturdy filing cabinet. This takes a bit more effort to file and retrieve, but the notes are safe even if the power goes out. If multiple clerks need access, they can share the cabinet. This is like using a persistent database (like Redis or Vercel KV). + +Which option is best? It depends! For quick tests or development, maybe sticky notes (memory) are fine. 
For a real application where you don't want to lose conversation state, you need the filing cabinet (persistent storage). + +AgentDock needs a way to *choose* where to store data like session state, API keys, or other temporary information, without the rest of the application needing to worry about the details. + +## The Solution: Flexible Filing Cabinets + +AgentDock solves this with the **Storage** system, primarily using two concepts: + +1. **`StorageProvider` (The Filing Cabinet Blueprint):** This is like a standard blueprint that defines what *any* filing cabinet must be able to do. It says every cabinet must have standard operations like: + * `get(key)`: Find and retrieve a specific file (data) using its label (key). + * `set(key, value)`: Put a new file (data) into the cabinet with a specific label (key). + * `delete(key)`: Remove a file using its label. + * `exists(key)`: Check if a file with a specific label exists. + * (And a few others for handling multiple files or lists). + This blueprint ensures that components like the [Session Management (`SessionManager`)](07_session_management___sessionmanager___.md) can use *any* storage system that follows the rules, without needing to know if it's a simple in-memory store or a powerful Redis database. + +2. **`StorageFactory` (The Filing Cabinet Store):** This is like a store where you can choose and get the specific type of filing cabinet you need based on your requirements (and configuration). You tell the factory: + * "I need an 'in-memory' cabinet for testing." + * "I need a 'Redis' cabinet connected to this address for production." + * "I need a 'Vercel KV' cabinet for my Vercel deployment." + The factory knows how to build and give you the correct cabinet (a `StorageProvider` instance) ready to use. + +## How `SessionManager` Uses Storage + +Let's look back at the `SessionManager` from [Chapter 7: Session Management (`SessionManager`)](07_session_management___sessionmanager___.md). 
It needs to save and load `SessionState`. How does it use our storage concepts?

When a `SessionManager` is created, it's *given* a specific `StorageProvider` instance (like the Redis cabinet obtained from the `StorageFactory`).

```typescript
// Simplified concept from SessionManager constructor
import { StorageProvider } from '../storage/types';
import { getStorageFactory } from '../storage/factory';

export class SessionManager<T extends SessionState> {
  private storage: StorageProvider; // Holds the filing cabinet instance!

  constructor(
    defaultStateGenerator: (sessionId: SessionId) => T,
    storageProvider?: StorageProvider, // Can be passed in!
    storageNamespace: string = 'sessions'
    // ...
  ) {
    // Use the provided cabinet, or get a default one (memory) from the factory
    this.storage = storageProvider || getStorageFactory().getProvider({
      type: 'memory', // Default to simple memory storage
      namespace: storageNamespace
    });
    // ...
  }

  // --- Methods using the storage ---

  async getSession(sessionId: SessionId): Promise<SessionResult<T>> {
    const storageKey = this.getStorageKey(sessionId); // e.g., "sessions:session-123"
    // Use the cabinet's 'get' operation
    const storedData = await this.storage.get<StoredSessionData<T>>(storageKey);
    // ... handle result ...
  }

  async updateSession(sessionId: SessionId, updateFn: (state: T) => T): Promise<SessionResult<T>> {
    const storageKey = this.getStorageKey(sessionId);
    // ... get current data using this.storage.get() ...
    // ... apply updateFn ...
    // Use the cabinet's 'set' operation to save back
    await this.storage.set(storageKey, updatedSessionData, { /* ttl options */ });
    // ... handle result ...
  }

  async deleteSession(sessionId: SessionId): Promise<SessionResult<boolean>> {
    const storageKey = this.getStorageKey(sessionId);
    // Use the cabinet's 'delete' operation
    const deleted = await this.storage.delete(storageKey);
    // ... handle result ...
  }
}
```

**Explanation:**

1. The `SessionManager` takes a `storageProvider` when it's created. If none is provided, it gets a default `memory` provider from the `StorageFactory`.
2. Inside its methods (`getSession`, `updateSession`, `deleteSession`), it calls the corresponding standard methods (`get`, `set`, `delete`) on the `this.storage` object (which is the specific `StorageProvider` instance it was given).

The `SessionManager` doesn't care *which* type of storage provider it has, as long as it follows the standard blueprint (`StorageProvider` interface). This makes `SessionManager` flexible and decoupled from the storage details.

## Under the Hood: Building and Using Cabinets

Let's look closer at the blueprint and the factory.

### `StorageProvider`: The Blueprint

The core interface defining the standard operations is `StorageProvider` found in `agentdock-core/src/storage/types.ts`.

```typescript
// Simplified from agentdock-core/src/storage/types.ts

/**
 * Core storage provider interface (The Filing Cabinet Blueprint)
 */
export interface StorageProvider {
  /** Retrieve a file by its label (key) */
  get<T>(key: string, options?: StorageOptions): Promise<T | null>;

  /** Put a file (value) into the cabinet with a label (key) */
  set<T>(key: string, value: T, options?: StorageOptions): Promise<void>;

  /** Remove a file by its label (key) */
  delete(key: string, options?: StorageOptions): Promise<boolean>;

  /** Check if a file with this label (key) exists */
  exists(key: string, options?: StorageOptions): Promise<boolean>;

  /** Get multiple files at once */
  getMany<T>(keys: string[], options?: StorageOptions): Promise<Record<string, T | null>>;

  /** Save multiple files at once */
  setMany<T>(items: Record<string, T>, options?: StorageOptions): Promise<void>;

  /** Delete multiple files at once */
  deleteMany(keys: string[], options?: StorageOptions): Promise<void>;

  /** List files whose labels start with a prefix */
  list(prefix: string, options?: ListOptions): Promise<string[]>;

  /** Get a range of items from a list stored at a key */
  getList<T>(key: string, start?: number, end?: number, options?: StorageOptions): Promise<T[] | null>;

  /** Save an entire list to a key */
  saveList<T>(key: string, values: T[], options?: StorageOptions): Promise<void>;

  /** Remove an entire list */
  deleteList(key: string, options?: StorageOptions): Promise<boolean>;

  /** Clean up resources (like closing connections) */
  destroy?(): Promise<void>;
}
```

**Explanation:**

* This interface simply lists the standard methods that any storage system (in-memory, Redis, Vercel KV, etc.) must implement.
* Methods like `get`, `set`, `delete` are the fundamental operations.
* Others like `getMany`, `setMany`, `list`, `getList` provide more advanced or optimized ways to interact with the storage.
* Returning a `Promise` means these operations might take some time (like talking to a network database) and are handled asynchronously.

### `StorageFactory`: The Store

The `StorageFactory` (`agentdock-core/src/storage/factory.ts`) is responsible for creating instances of specific `StorageProvider` implementations. It acts as a central point for configuration and management.

**How it's used:**

You typically get the single, shared factory instance and ask it for a provider:

```typescript
import { getStorageFactory } from 'agentdock-core/storage';

// Get the factory instance (singleton)
const factory = getStorageFactory();

// Ask the factory for a Redis provider for the 'sessions' namespace
const redisProvider = factory.getProvider({
  type: 'redis', // Specify the type of cabinet
  namespace: 'sessions', // A prefix for keys for this usage
  // Config might be read from environment variables inside the factory
});

// Ask for a default provider (might be 'memory' or configured elsewhere)
const defaultProvider = factory.getDefaultProvider();

// Now you can use redisProvider.set(...) or defaultProvider.get(...)
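// For example, once you have a provider, you only ever use the blueprint
// methods — the same calls work whether Redis or memory sits underneath
// (hypothetical key/value, inside an async function):
//   await redisProvider.set('greeting', 'hello');
//   const greeting = await redisProvider.get<string>('greeting');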
+``` + +**Inside the Factory (Simplified):** + +```typescript +// Simplified from agentdock-core/src/storage/factory.ts +import { MemoryStorageProvider, RedisStorageProvider, VercelKVProvider } from './providers'; +import { StorageProvider, StorageProviderFactory, StorageProviderOptions } from './types'; + +// Registry to hold functions that create providers +interface ProviderRegistry { [type: string]: StorageProviderFactory; } +// Cache to reuse provider instances (e.g., one connection per namespace) +interface ProviderCache { [cacheKey: string]: StorageProvider; } + +export class StorageFactory { + private static instance: StorageFactory; // Singleton instance + private providers: ProviderRegistry = {}; // Holds creator functions + private providerCache: ProviderCache = {}; // Holds created instances + private defaultType: string = 'memory'; // Default cabinet type + + private constructor() { + // Register built-in cabinet types and how to make them + this.registerProvider('memory', (options = {}) => new MemoryStorageProvider(options)); + this.registerProvider('redis', (options = {}) => { + // Reads process.env.REDIS_URL, process.env.REDIS_TOKEN internally + return new RedisStorageProvider({ namespace: options.namespace, /* ... */ }); + }); + this.registerProvider('vercel-kv', (options = {}) => new VercelKVProvider({ namespace: options.namespace })); + // ... + } + + // Get the single shared factory instance + public static getInstance(): StorageFactory { /* ... returns instance ... */ } + + // Add a new cabinet type + public registerProvider(type: string, factory: StorageProviderFactory): void { /* ... adds to providers ... 
*/ }

  // Get (or create and cache) a specific cabinet instance
  public getProvider(options: Partial<StorageProviderOptions> = {}): StorageProvider {
    const type = options.type || this.defaultType;
    const namespace = options.namespace || 'default';
    const cacheKey = `${type}:${namespace}`; // e.g., "redis:sessions"

    // Reuse if already created for this type and namespace
    if (this.providerCache[cacheKey]) {
      return this.providerCache[cacheKey];
    }

    // Find the creator function for this type
    const factory = this.providers[type];
    if (!factory) throw new Error(`Provider type '${type}' not registered`);

    // Create the new cabinet instance
    const provider = factory({ namespace, ...options.config }); // Pass config

    // Cache it for next time
    this.providerCache[cacheKey] = provider;
    return provider;
  }
  // ... other methods like setDefaultType, getDefaultProvider ...
}

// Helper function to get the singleton instance easily
export function getStorageFactory(): StorageFactory {
  return StorageFactory.getInstance();
}
```

**Explanation:**

1. **Singleton:** There's usually only one `StorageFactory` instance in the application (`getInstance`).
2. **Registration:** When the factory starts, it registers (`registerProvider`) the known types ('memory', 'redis', 'vercel-kv') along with functions that know how to create them. These creation functions often read connection details (like `REDIS_URL`) from environment variables.
3. **`getProvider`:** This is the main method. You ask for a `type` and `namespace`.
* It creates a `cacheKey` (e.g., "redis:sessions").
* It checks its `providerCache`. If an instance for that key already exists (meaning we already connected to Redis for the 'sessions' namespace), it returns the existing one to reuse the connection.
* If not cached, it finds the registered creation function (`factory`) for the requested `type`.
* It calls the function to create the new `StorageProvider` instance (e.g., actually connecting to Redis).
+ * It stores the new instance in the `providerCache` and returns it. + +This factory pattern makes it easy to manage different storage backends and configure them centrally, often using environment variables. For example, the `getConfiguredStorageProvider` helper function used by the `OrchestrationManager` checks `process.env.KV_STORE_PROVIDER` or `process.env.REDIS_URL` to decide which type of provider to request from the factory. + +**Flow Diagram: Getting and Using a Provider** + +```mermaid +sequenceDiagram + participant AppCode as Application Code (e.g., SessionManager) + participant Factory as StorageFactory + participant Cache as Provider Cache + participant Creator as Provider Creator Fn (e.g., for Redis) + participant Provider as StorageProvider Instance (e.g., Redis) + + AppCode->>Factory: getProvider({type: 'redis', ns: 'sessions'}) + Factory->>Cache: Check cache['redis:sessions'] + Cache-->>Factory: Not found + Factory->>Factory: Find creator fn for 'redis' + Factory->>Creator: Create({namespace: 'sessions', ...}) + Note right of Creator: Reads env vars (REDIS_URL), connects to Redis... + Creator-->>Factory: Return new RedisProvider instance + Factory->>Cache: Store instance at cache['redis:sessions'] + Factory-->>AppCode: Return RedisProvider instance + AppCode->>Provider: set('myKey', 'myValue') + Provider->>Provider: Talk to actual Redis server + Provider-->>AppCode: Confirm set operation +``` + +### Example Providers (`MemoryStorageProvider`, `RedisStorageProvider`) + +AgentDock includes implementations of the `StorageProvider` interface: + +* **`MemoryStorageProvider` (`agentdock-core/src/storage/providers/memory-provider.ts`):** + * Stores data in a simple JavaScript `Map` in the computer's memory. + * Very fast, requires no external setup. + * **Data is lost** when the application restarts. + * Good for development, testing, or temporary data. Can optionally use a shared global map for persistence across serverless function calls if configured. 
+ +* **`RedisStorageProvider` (`agentdock-core/src/storage/providers/redis-provider.ts`):** + * Uses the `@upstash/redis` library to connect to a Redis database (like Upstash or Vercel KV which uses the Redis API). + * Data is **persistent** across restarts. + * Requires a running Redis instance and connection details (URL, token) usually provided via environment variables (`REDIS_URL`, `SRH_TOKEN` or `KV_REST_API_TOKEN`). + * Suitable for production. + +* **`VercelKVProvider` (`agentdock-core/src/storage/providers/vercel-kv-provider.ts`):** + * Uses the `@vercel/kv` library, specifically designed for Vercel KV. + * Data is **persistent**. + * Typically configured automatically by Vercel environment variables when deployed. + * Ideal for applications hosted on Vercel. + +Each of these classes implements all the methods defined in the `StorageProvider` interface (`get`, `set`, `delete`, etc.) using the specific commands for its backend (in-memory Map operations, Redis commands, or Vercel KV SDK calls). + +## Conclusion + +You've reached the end of our core concepts tutorial! In this final chapter, we explored AgentDock's flexible **Storage** system: + +* It solves the problem of needing different places to store data (like session state) depending on the environment (testing vs. production). +* The **`StorageProvider`** interface acts as a **standard blueprint** defining common operations (`get`, `set`, `delete`) for any storage system. +* The **`StorageFactory`** acts as a **store or builder**, creating specific `StorageProvider` instances (like `MemoryStorageProvider`, `RedisStorageProvider`, `VercelKVProvider`) based on configuration, often read from environment variables. +* Components like the [Session Management (`SessionManager`)](07_session_management___sessionmanager___.md) are built to work with *any* `StorageProvider`, making them flexible and decoupled from storage details. 
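The pieces summarized above — blueprint, concrete provider, caching factory — can be sketched end to end in a few lines of self-contained TypeScript. Everything here (`MiniStorageProvider`, `MiniMemoryProvider`, `MiniStorageFactory`) is a simplified stand-in for illustration, not AgentDock's real classes:

```typescript
// A minimal sketch of the pattern: blueprint -> implementation -> factory.
// Simplified stand-ins for AgentDock's interfaces, not its actual code.

interface MiniStorageProvider {
  get<T>(key: string): Promise<T | null>;
  set<T>(key: string, value: T): Promise<void>;
  delete(key: string): Promise<boolean>;
}

// "Sticky notes": fast, but gone when the process exits.
class MiniMemoryProvider implements MiniStorageProvider {
  private data = new Map<string, unknown>();
  async get<T>(key: string): Promise<T | null> {
    return this.data.has(key) ? (this.data.get(key) as T) : null;
  }
  async set<T>(key: string, value: T): Promise<void> {
    this.data.set(key, value);
  }
  async delete(key: string): Promise<boolean> {
    return this.data.delete(key);
  }
}

// The "store": hands out one cached provider per type:namespace.
class MiniStorageFactory {
  private cache = new Map<string, MiniStorageProvider>();
  getProvider(type: 'memory', namespace: string): MiniStorageProvider {
    const cacheKey = `${type}:${namespace}`;
    let provider = this.cache.get(cacheKey);
    if (!provider) {
      provider = new MiniMemoryProvider(); // a Redis-backed class would slot in here
      this.cache.set(cacheKey, provider);
    }
    return provider;
  }
}

// Consumer code only sees the blueprint, so providers are interchangeable.
async function demo() {
  const factory = new MiniStorageFactory();
  const sessions = factory.getProvider('memory', 'sessions');
  await sessions.set('sessions:session-123', { activeStep: 'greeting' });
  console.log(await sessions.get('sessions:session-123'));
  console.log(factory.getProvider('memory', 'sessions') === sessions); // true: cached per type:namespace
}
demo();
```

Swapping in a persistent class that satisfies the same interface would leave `demo()` untouched — which is exactly why components like `SessionManager` can stay ignorant of the storage backend.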
+ +This flexible storage approach allows AgentDock to adapt to different deployment scenarios, from simple local testing to robust cloud deployments, ensuring data like conversation state can be managed appropriately. + +Congratulations on completing the AgentDock core concepts tutorial! You now have a foundational understanding of: + +* How agents are defined ([Agent Configuration (`AgentConfig`)](01_agent_configuration___agentconfig___.md)) +* How they gain special abilities ([Tools](02_tools_.md)) +* The building blocks of capabilities ([Nodes (`BaseNode`, `AgentNode`)](03_nodes___basenode____agentnode___.md)) +* How agents interact with AI models ([CoreLLM (LLM Abstraction)](04_corellm__llm_abstraction__.md)) +* How complex workflows are managed ([Orchestration (`OrchestrationManager`)](05_orchestration___orchestrationmanager___.md)) +* How users interact via the web ([API Route (`/api/chat/[agentId]/route.ts`)](06_api_route____api_chat__agentid__route_ts__.md)) +* How conversation state is remembered ([Session Management (`SessionManager`)](07_session_management___sessionmanager___.md)) +* And how that state is saved ([Storage (`StorageProvider`, `StorageFactory`)](08_storage___storageprovider____storagefactory___.md)) + +You're now well-equipped to start building and customizing your own powerful AI agents with AgentDock! + +--- + +Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge) \ No newline at end of file diff --git a/docs/AgentDock/index.md b/docs/AgentDock/index.md new file mode 100644 index 0000000..fbd26d6 --- /dev/null +++ b/docs/AgentDock/index.md @@ -0,0 +1,65 @@ +# Tutorial: AgentDock + +AgentDock is a framework for building *AI agents*. Think of it like a toolkit to create specialized AI assistants. +You define an agent's **personality**, what **tools** (like search or calculators) it can use, and how it should handle conversations using a configuration file (`AgentConfig`). 
The core logic is handled by **Nodes**, especially the `AgentNode`, which interacts with Language Models (`CoreLLM`) and follows rules (`Orchestration`) to decide when to talk and when to use tools.
It keeps track of conversations (`Session Management`) and saves data (`Storage`), all accessible through a web API (`API Route`).


**Source Repository:** [https://github.com/AgentDock/AgentDock](https://github.com/AgentDock/AgentDock)

```mermaid
flowchart TD
    A0["Agent Configuration (AgentConfig)"]
    A1["Nodes (BaseNode, AgentNode)"]
    A2["Tools"]
    A3["CoreLLM (LLM Abstraction)"]
    A4["Orchestration (OrchestrationManager)"]
    A5["Session Management (SessionManager)"]
    A6["API Route (/api/chat/[agentId]/route.ts)"]
    A7["Storage (StorageProvider, StorageFactory)"]
    A0 -- "Specifies allowed" --> A2
    A0 -- "Contains rules for" --> A4
    A1 -- "Reads configuration from" --> A0
    A1 -- "Uses" --> A2
    A1 -- "Calls for generation" --> A3
    A1 -- "Consults rules with" --> A4
    A2 -- "Can access" --> A3
    A4 -- "Manages state via" --> A5
    A5 -- "Persists state using" --> A7
    A6 -- "Loads" --> A0
    A6 -- "Delegates chat to" --> A1
    A6 -- "Manages" --> A5
    A6 -- "Reads keys from" --> A7
```

## Chapters

1. [Agent Configuration (`AgentConfig`)](01_agent_configuration___agentconfig___.md)
2. [Tools](02_tools_.md)
3. [Nodes (`BaseNode`, `AgentNode`)](03_nodes___basenode____agentnode___.md)
4. [CoreLLM (LLM Abstraction)](04_corellm__llm_abstraction__.md)
5. [Orchestration (`OrchestrationManager`)](05_orchestration___orchestrationmanager___.md)
6. [API Route (`/api/chat/[agentId]/route.ts`)](06_api_route____api_chat__agentid__route_ts___.md)
7. [Session Management (`SessionManager`)](07_session_management___sessionmanager___.md)
8. [Storage (`StorageProvider`, `StorageFactory`)](08_storage___storageprovider____storagefactory___.md)


---

Generated by [AI Codebase Knowledge Builder](https://github.com/The-Pocket/Tutorial-Codebase-Knowledge)
\ No newline at end of file
diff --git a/docs/index.md b/docs/index.md
index f5d87c9..f4ceb1e 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -20,6 +20,7 @@ This is a tutorial project of [Pocket Flow](https://github.com/The-Pocket/Pocket

 ## Example Tutorials for Popular GitHub Repositories

+- [AgentDock](./AgentDock/index.md) - Create specialized AI agents with custom personalities, tools, and conversation handling!
 - [AutoGen Core](./AutoGen Core/index.md) - Build AI teams that talk, think, and solve problems together like coworkers!
 - [Browser Use](./Browser Use/index.md) - Let AI surf the web for you, clicking buttons and filling forms like a digital assistant!
 - [Celery](./Celery/index.md) - Supercharge your app with background tasks that run while you sleep!