Agentic AI
Planning & Control Flow
Proactive Goal Creator
Intent & Description
🎯 Intent
Anticipate the user’s goal by capturing surrounding multimodal context (gestures, screen state, environment) in addition to what the user types or says.
📋 Context
A team builds an agent for a setting where the user cannot or will not articulate the full context in text: an accessibility tool used by someone with limited speech, an ambient home assistant, an embodied robot, or a screen-aware coding helper. Cameras, microphones, screen capture, or other sensors are available and can supply context the user does not state. The team has operational and privacy approvals to capture and process that data.
💡 Solution
- A proactive goal creator runs alongside the dialogue interface.
- It activates context-capture devices (cameras for gestures, screen recorders for UI state, microphones for ambient audio, environment sensors).
- It passes the multimodal data through context engineering and combines it with the user’s articulated prompt to produce a refined goal.
- The component must notify users whenever context is being captured, and it should keep the false-positive rate of its inferred goals low so that users are not surprised by actions they did not ask for.
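The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: `ContextSignal`, `ProactiveGoalCreator`, and the `notify` callback are hypothetical names, the captured signals are assumed to arrive already summarized as text by some upstream perception step, and the string concatenation in `refine_goal` only stands in for real context engineering.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ContextSignal:
    """One observation from a capture device (illustrative fields only)."""
    modality: str      # e.g. "gesture", "screen", "audio"
    description: str   # text summary from an assumed upstream perception step


@dataclass
class ProactiveGoalCreator:
    # Hypothetical hook used to tell the user that capture is active.
    notify: Callable[[str], None]
    signals: List[ContextSignal] = field(default_factory=list)

    def capture(self, signal: ContextSignal) -> None:
        # Notify before storing anything, so capture is never silent.
        self.notify(f"Capturing {signal.modality} context")
        self.signals.append(signal)

    def refine_goal(self, prompt: str) -> str:
        # Stand-in for context engineering: fold captured signals into the prompt.
        if not self.signals:
            return prompt
        context = "; ".join(f"{s.modality}: {s.description}" for s in self.signals)
        return f"{prompt} [context: {context}]"


notices: List[str] = []
creator = ProactiveGoalCreator(notify=notices.append)
creator.capture(ContextSignal("screen", "editor shows a failing test"))
creator.capture(ContextSignal("gesture", "user points at line 42"))
goal = creator.refine_goal("fix this")
print(goal)
```

Here the terse prompt "fix this" only becomes actionable once the screen and gesture signals are attached, and every capture produces a user-visible notice.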
Real-world Use Case
- Embodied / ambient interaction is the primary surface, not chat.
- Accessibility needs make dialogue-only interaction insufficient.
- Context-capture is justified by clear user value and disclosed appropriately.
Advantages
- Agent acts on anticipated intent, not only on explicit prompts.
- Richer context yields more accurate goal extraction.
- Users with disabilities can interact via captured context rather than dialogue alone.
Disadvantages
- Multimodal capture and continuous processing are expensive.
- Capture creates privacy and consent obligations: it must be disclosed to users and bounded in scope.
- False positives can interrupt the user when no intent was actually expressed.
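One common way to limit such interruptions is to act on an anticipated goal only when its confidence clears a threshold. The sketch below assumes a hypothetical `AnticipatedGoal` type whose `confidence` score comes from some upstream intent model; the default threshold of 0.8 is an arbitrary illustrative value, not a recommendation.

```python
from dataclasses import dataclass


@dataclass
class AnticipatedGoal:
    text: str
    confidence: float  # 0.0 to 1.0, assumed to come from an upstream intent model


def should_act(goal: AnticipatedGoal, user_prompted: bool, threshold: float = 0.8) -> bool:
    """Decide whether the agent may act on a goal.

    Explicit prompts always pass. Goals inferred purely from captured
    context pass only when their confidence reaches the threshold.
    """
    return user_prompted or goal.confidence >= threshold


weak = AnticipatedGoal("turn on the lights", confidence=0.55)
strong = AnticipatedGoal("turn on the lights", confidence=0.92)

print(should_act(weak, user_prompted=False))
print(should_act(strong, user_prompted=False))
print(should_act(weak, user_prompted=True))
```

Raising the threshold trades missed anticipations for fewer unprompted interruptions; the right value depends on how costly a surprise is in the deployment setting.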