Task Extraction#
📚 What You’ll Learn

Key Concepts:

- How conversations become structured, actionable tasks
- Context compression and dependency detection
- Data source integration during extraction
- Task-specific context optimization

Prerequisites: Understanding of Message and Execution Flow
Time Investment: 10 minutes for complete understanding
Core Problem#
Challenge: Agentic systems need conversational awareness without requiring every component to process entire chat histories.
Traditional Approaches (Flawed):

- Reprocess the entire conversation history at every step: slow and expensive
- Apply generic chat-history summarization: loses task-relevant details
- Provide no conversational context at all: downstream components lose awareness entirely
Framework Solution: The framework compresses context at a single point. The Task Extraction system distills only the task-relevant conversational context, converting human conversations into structured, actionable inputs for downstream processing.
Architecture#
Task extraction operates as the first pipeline step, converting raw conversations into structured tasks:
```python
# Input: full conversation history
messages = [
    HumanMessage("Remember that data from yesterday?"),
    HumanMessage("Can you analyze the trends?")
]

# Output: structured task
ExtractedTask(
    task="Analyze trends in the data from yesterday's conversation",
    depends_on_chat_history=True,
    depends_on_user_memory=False
)
```
Key Benefits:

- Downstream components receive compressed, actionable tasks
- Conversational references are resolved (“that data” → specific context)
- Dependencies are clearly identified for capability selection
Implementation#
TaskExtractionNode processes conversations automatically:
```python
@infrastructure_node
class TaskExtractionNode(BaseInfrastructureNode):
    name = "task_extraction"
    description = "Task Extraction and Processing"

    @staticmethod
    async def execute(state: AgentState, **kwargs):
        # Get native LangGraph messages
        messages = state["messages"]

        # Retrieve external context if available
        retrieval_result = await data_manager.retrieve_all_context(state)

        # Extract task using LLM
        extracted_task = await _extract_task(messages, retrieval_result)

        return {
            "task_current_task": extracted_task.task,
            "task_depends_on_chat_history": extracted_task.depends_on_chat_history,
            "task_depends_on_user_memory": extracted_task.depends_on_user_memory
        }
```
Bypass LLM-based Task Extraction
Task extraction can be bypassed temporarily with the /task:off slash command, or disabled by default via the configuration system. Bypassing skips LLM-based task extraction and passes the full conversation history directly to downstream processing, depending on your workflow needs.
Bypass Behavior:

- Skips LLM-based task extraction entirely
- Passes full conversation history and retrieved datasource results as the “extracted task”
- Sets dependency flags to True (assumes full context and chat history needed)
- Maintains compatibility with downstream orchestration
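As a rough sketch (the names here are illustrative, not the framework's actual internals), bypass mode amounts to packaging the raw history as the "extracted task" with both dependency flags forced on:

```python
from dataclasses import dataclass


@dataclass
class ExtractedTask:
    task: str
    depends_on_chat_history: bool
    depends_on_user_memory: bool


def bypass_task_extraction(messages, retrieval_result=None) -> ExtractedTask:
    """Skip the LLM call: pass the raw conversation through as the task."""
    history = "\n".join(m.content for m in messages)
    context = f"\n\nRetrieved context:\n{retrieval_result}" if retrieval_result else ""
    return ExtractedTask(
        task=history + context,
        depends_on_chat_history=True,   # assume full chat history is needed
        depends_on_user_memory=True,    # assume user memory is needed
    )
```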
When to Use Bypass Mode:

- Code R&D scenarios where full conversational context aids development
- Short conversation histories where task extraction overhead exceeds benefits
- Minimal external data scenarios where context compression isn’t needed
- High-throughput applications requiring reduced LLM call latency (trades orchestrator processing cost for extraction speed)
Advantages:

- Faster upstream pipeline (skips LLM-based task extraction)
- No risk of losing conversational context or nuance

Disadvantages:

- Longer capability selection process (full conversation history included)
- Longer orchestrator prompts (full conversation history included)
- Slower plan generation (more tokens to process)
- Potential for information overload in complex conversations
Structured Output#
Task extraction uses structured LLM generation for consistency:
```python
class ExtractedTask(BaseModel):
    task: str = Field(description="Actionable task from conversation")
    depends_on_chat_history: bool = Field(description="Requires previous context")
    depends_on_user_memory: bool = Field(description="Requires stored user data")
```
Task Compression Examples:
```text
User: "What's the weather like?"
→ Task: "Get current weather conditions"
→ Dependencies: history=False, memory=False

User: "How does that compare to yesterday?"
→ Task: "Compare current weather to yesterday's weather data"
→ Dependencies: history=True, memory=False

User: "Use my preferred location"
→ Task: "Get weather for the Bay Area"
→ Dependencies: history=False, memory=True
```
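In the framework the schema is a Pydantic model validated by the LLM's structured-generation mode. As a self-contained illustration of the validation step only, here is a stdlib-only sketch (the function name is hypothetical) that checks a structured response against the same three fields:

```python
import json
from dataclasses import dataclass


@dataclass
class ExtractedTask:
    task: str
    depends_on_chat_history: bool
    depends_on_user_memory: bool


def parse_extracted_task(raw: str) -> ExtractedTask:
    """Validate an LLM's JSON output against the expected task schema."""
    data = json.loads(raw)
    return ExtractedTask(
        task=str(data["task"]),
        depends_on_chat_history=bool(data["depends_on_chat_history"]),
        depends_on_user_memory=bool(data["depends_on_user_memory"]),
    )


# Example structured response from the extraction LLM
raw = ('{"task": "Compare current weather to yesterday\'s weather data", '
       '"depends_on_chat_history": true, "depends_on_user_memory": false}')
extracted = parse_extracted_task(raw)
```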
Data Source Integration#
Task extraction automatically integrates available data sources:
```python
# Automatic data retrieval during extraction
try:
    data_manager = get_data_source_manager()
    retrieval_result = await data_manager.retrieve_all_context(state)
    logger.info(f"Retrieved data from {retrieval_result.total_sources_attempted} sources")
except Exception as e:
    logger.warning(f"Data source retrieval failed, proceeding without external context: {e}")
```
Graceful Degradation: Task extraction continues without external data if sources are unavailable.
Context Enrichment: Available data sources can improve dependency detection and task clarity.
Error Handling#
Task extraction includes retry policies optimized for LLM operations:
```python
@staticmethod
def classify_error(exc: Exception, context: dict):
    # Retry network/API timeouts
    if isinstance(exc, (ConnectionError, TimeoutError)):
        return ErrorClassification(
            severity=ErrorSeverity.RETRIABLE,
            user_message="Network timeout during task extraction, retrying..."
        )
    # Don't retry validation errors
    if isinstance(exc, (ValueError, TypeError)):
        return ErrorClassification(
            severity=ErrorSeverity.CRITICAL,
            user_message="Task extraction configuration error"
        )
```
Retry Policy: 3 attempts with exponential backoff for network issues.
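A minimal sketch of such a retry policy (illustrative, not the framework's implementation) retries only the retriable error classes and backs off exponentially between attempts:

```python
import asyncio
import random


async def with_retries(op, attempts=3, base_delay=1.0):
    """Retry retriable failures with exponential backoff (sketch)."""
    for attempt in range(attempts):
        try:
            return await op()
        except (ConnectionError, TimeoutError):
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error
            # Delays of base_delay * 1, 2, 4, ... plus jitter
            await asyncio.sleep(base_delay * 2 ** attempt + random.random() * 0.1)
```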
Integration Patterns#
Automatic Pipeline Integration: Task extraction runs automatically as the first infrastructure node in message processing.
State Integration: Results are stored in agent state for downstream consumption:
```python
# Downstream capabilities access extracted task
current_task = state.get("task_current_task")
needs_history = state.get("task_depends_on_chat_history", False)
needs_memory = state.get("task_depends_on_user_memory", False)
```
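A downstream capability might use these flags to assemble only the context it actually needs. A hypothetical sketch (the function and state keys beyond those shown above are illustrative):

```python
def build_capability_context(state: dict) -> dict:
    """Gather only the context the extracted task's dependency flags call for."""
    context = {"task": state.get("task_current_task")}
    if state.get("task_depends_on_chat_history", False):
        context["chat_history"] = state.get("messages", [])
    if state.get("task_depends_on_user_memory", False):
        context["user_memory"] = state.get("user_memory", {})
    return context
```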
Prompt System Integration: Uses framework prompt builders for domain-specific extraction:
```python
# Applications can customize with domain-specific builders
class ALSTaskExtractionPromptBuilder(DefaultTaskExtractionPromptBuilder):
    def get_instructions(self) -> str:
        return "Extract tasks related to ALS accelerator operations..."
```
Configuration#
Task extraction resolves its model settings through the framework's configuration system, with optional application-specific overrides:

```python
# Framework configuration
task_extraction_config = get_model_config("framework", "task_extraction")

# Application-specific overrides
als_config = get_model_config("als_assistant", "task_extraction")
```
Troubleshooting#
Task Extraction Timeouts:

- Built-in retry logic with exponential backoff
- Check network connectivity to LLM provider
- Verify model configuration in config.yml

Missing Dependencies:

- Review prompt builder examples for dependency detection
- Check chat history formatting
- Consider application-specific prompt overrides

Data Source Failures:

- Task extraction gracefully degrades without external data
- Check data source provider registration
- Verify data source configuration
Performance Considerations#
Optimization Features:

- Async processing with asyncio.to_thread() for non-blocking LLM calls
- Parallel data source retrieval
- Structured output for consistent parsing

Memory Management:

- Only essential task information persists in agent state
- Data retrieval results don’t persist beyond extraction
- Native LangGraph message compatibility
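The asyncio.to_thread() pattern mentioned above can be illustrated with a stand-in for a blocking SDK call (the function names are illustrative, not framework APIs): offloading each synchronous call to a worker thread keeps the event loop free to run calls concurrently.

```python
import asyncio
import time


def blocking_llm_call(prompt: str) -> str:
    """Stand-in for a synchronous SDK call that would block the event loop."""
    time.sleep(0.05)  # simulated network latency
    return f"response to: {prompt}"


async def extract_concurrently(prompts):
    # Each blocking call runs in a worker thread; gather awaits them in parallel
    return await asyncio.gather(
        *(asyncio.to_thread(blocking_llm_call, p) for p in prompts)
    )
```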
See also

- Task Extraction: API reference for task extraction classes and functions
- Context Management System: Context compression and dependency detection patterns
- ../04_orchestrator-planning: How tasks become execution plans
- Message and Execution Flow: Message processing pipeline architecture