Research Question Coverage Across Competitive Landscape
Executive Summary
This analysis maps the five core research questions (Q1-Q5) against existing projects and research
programs. It identifies three distinct gaps where Alpha India can own defensible intellectual property:
🔴 Critical Gap (Q2 + Q4 in decentralized context): No existing project combines
personality-driven heterogeneity with decentralized learning where humans guide some agents and
knowledge spreads through peers. This is Alpha India's primary opportunity.
Q1: Task Generalization
How can robots solve unanticipated tasks beyond their training distribution?
Project | Coverage | Approach | Limitation
DeepMind Multi-Agent RL | ✓ Core | Multi-agent coordination in complex environments (StarCraft II) | Homogeneous agents; static environments
MIT Interactive Agents | ✓ Core | Learning from human demonstration + task guidance | Single-agent focus; centralized
Covariant / Embodied AI | ✓ Core | Real-world robotic task transfer (sim-to-real) | Single-robot; no multi-agent learning
Google DeepRL Robotics | ✓ Core | Multi-robot task learning with sim-to-real transfer | No personality heterogeneity; limited human guidance
Meta-Learning (MAML) | ✓ Core | Fast adaptation to novel tasks (few-shot learning) |
✓ Coverage Status: Well-covered
Task generalization is actively researched by major labs (DeepMind, Google, OpenAI, Berkeley, MIT), so
Alpha India's primary moat does not lie here.
Alpha India Angle: Task generalization under personality-driven
heterogeneity and survival constraints. Most work assumes homogeneous
agents or centralized learning. Alpha India tests: Does personality-heterogeneous learning generalize to
novel tasks faster than homogeneous learning?
Q2: Situational Awareness
How do agents learn what they don't know? How does personality affect
environmental perception and uncertainty estimation?
Project | Coverage | Approach | Gap
CMU Swarm Robotics | ◐ Partial | Decentralized swarm awareness + coordination | Pre-programmed behavior; no learning; no personality
Covariant / Embodied AI | ◐ Partial | Vision-language models for embodied perception | Single-agent; no multi-agent awareness
OpenAI Emergent Communication | ◐ Partial | Agents learn to communicate about environmental states | Focused on communication protocol, not perception
DeepMind, Google, Meta-Learning | ✗ None | — | Assume perfect state observation; no awareness mechanism
🔴 CRITICAL GAP: Q2 is almost entirely unexplored in multi-agent, personality-driven
contexts.
What exists: CMU studies swarm awareness (but behavior is pre-programmed). Covariant
studies embodied perception (but single-agent). OpenAI studies communication (but not perception per
se).
What doesn't exist: How does personality affect what a decentralized agent perceives
about its environment? Does a curious agent perceive risk differently? Can agents learn situational
awareness collectively?
For example, aggressive VBs ignore subtle signals → take calculated risks
Why defensible: Mars rovers and deep-sea swarms MUST be situationally aware with
limited communication. Personality-driven awareness is novel and directly applicable.
Q3: Epistemic Agency
When do agents know to ask for help (peers, humans) vs. self-research? What's the
cost-benefit under uncertainty?
Project | Coverage | Approach | Limitation
DeepMind Multi-Agent RL | ✓ Core | Multi-agent coordination + emergent cooperation | No deliberate help-seeking; agents just coordinate
OpenAI Emergent Communication | ✓ Core | Agents learn to communicate without pre-defined protocol | Agents must communicate; no choice to self-rely
CMU Swarm Robotics | ✓ Core | Decentralized coordination without central authority | Pre-programmed; no learning; no personality-driven strategy
MIT Interactive Agents | ◐ Partial | Robots ask humans for help when uncertain | Single-agent; asking is not strategic (always ask when uncertain)
✓ Coverage Status: Moderately covered
Epistemic agency (help-seeking, communication) is well-studied in multi-agent contexts. However, the
strategic, personality-driven aspect is underdeveloped.
Alpha India Angle: Personality-driven help-seeking strategy. An aggressive agent might
self-research even when uncertain (high risk tolerance). A collaborative agent asks peers first (high
social trust). None of the projects surveyed above tests this systematically in decentralized swarms.
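The help-seeking trade-off described above can be sketched as a simple decision rule. This is only an illustration under stated assumptions, not Alpha India's method: the trait names risk_tolerance and social_trust, the 0.5 cut-off, and the function choose_action are hypothetical and were chosen to mirror the aggressive and collaborative examples.

```python
from dataclasses import dataclass


@dataclass
class Personality:
    # Both traits are hypothetical and scaled to the range 0..1.
    risk_tolerance: float
    social_trust: float


def choose_action(uncertainty: float, p: Personality) -> str:
    """Return 'self_research', 'ask_peer', or 'ask_human'."""
    # The agent keeps working alone while its uncertainty stays within
    # what its risk tolerance allows.
    if uncertainty <= p.risk_tolerance:
        return "self_research"
    # Beyond that point it seeks help; high social trust favours peers first.
    # The 0.5 cut-off is an arbitrary assumption for illustration.
    return "ask_peer" if p.social_trust >= 0.5 else "ask_human"


aggressive = Personality(risk_tolerance=0.9, social_trust=0.2)
collaborative = Personality(risk_tolerance=0.3, social_trust=0.8)

for name, p in [("aggressive", aggressive), ("collaborative", collaborative)]:
    print(name, choose_action(0.6, p))
```

At the same uncertainty level the two personalities choose differently, which is the behaviour a systematic experiment would need to vary and measure.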
Q4: Training Signal Design
What form of human guidance works best? (Demonstration vs. critique vs. Socratic
dialogue). How does feedback propagate in decentralized teams?
Project | Coverage | Approach | Limitation
MIT Interactive Agents | ✓ Core | Learning from human demonstration + correction feedback | Single-agent; centralized
OpenAI Learning from Feedback (RLHF) | ✓ Core | Agents learn from human preference signals | Single-agent (LLM focused); no multi-agent learning
M³HF (2025 - Recent) | ◐ Emerging | Multi-agent learning from multi-phase human feedback (expert + non-expert) | Centralized training. Agents learn together from a central authority. Not decentralized.
MARLHF (2024 - Recent) | ◐ Emerging | Multi-agent RL from preference-only human feedback | Offline, batch learning. Not real-time interactive guidance. Not peer-propagated.
🔴 CRITICAL GAP: Q4 in decentralized, real-time, personality-aware context.
What exists: MIT tests demonstration vs. feedback (single-agent). OpenAI tests
preference learning (LLM-focused). M³HF and MARLHF test multi-agent feedback (but centralized
training).
What doesn't exist: How do different types of human guidance (demonstration, critique,
Socratic dialogue) propagate through a decentralized swarm? Does personality affect how agents weight
peer feedback vs. human feedback? Can you scale human guidance (1 human → 100 agents)?
🎯 USP Opportunity: Decentralized Learning from Mixed Human Feedback
First to systematically test: Multi-agent learning where feedback comes from diverse
human sources (players guiding different VBs) and propagates through peers, with personality-driven
weighting.
Humans guide *some* agents (not all)
Guidance spreads through peer testimony
Personality determines trust in peers vs. humans vs. self
Measure: Can 1 human effectively guide 100 agents via peer propagation?
Why defensible: In space/swarms, you can't guide every robot. Feedback must
propagate through peers with personality-driven trust. This is novel and mission-critical.
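One way to picture personality-driven weighting of self, peer, and human signals is a normalized weighted average. The sketch below is an assumption made for illustration: the function blend_feedback, the trust keys "self", "peer", and "human", and the use of scalar estimates are all hypothetical, and a real system could combine feedback quite differently.

```python
from typing import Mapping


def blend_feedback(own: float, peer: float, human: float,
                   trust: Mapping[str, float]) -> float:
    """Combine three estimates using personality-specific trust weights.

    `trust` maps the hypothetical keys 'self', 'peer', and 'human' to
    non-negative weights; they are normalized so only their ratios matter.
    """
    total = trust["self"] + trust["peer"] + trust["human"]
    if total <= 0:
        raise ValueError("at least one trust weight must be positive")
    weighted = (trust["self"] * own
                + trust["peer"] * peer
                + trust["human"] * human)
    return weighted / total


# A hypothetical agent that trusts humans twice as much as peers or itself.
human_leaning = {"self": 1.0, "peer": 1.0, "human": 2.0}
updated = blend_feedback(own=0.0, peer=1.0, human=1.0, trust=human_leaning)
print(updated)
```

Varying the trust weights per personality while holding the feedback constant would be one way to isolate the effect of personality on how guidance is absorbed.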
Q5: Evolution
Can simulated evolution produce human-like cognitive capabilities? How does
population diversity emerge and persist?
No task constraints; no human guidance; no survival pressure
DeepMind, OpenAI, Meta, Google | ✗ None | — | Focus on RL or supervised learning; not evolutionary
✓ Coverage Status: Well-covered (in isolation)
Evolution and diversity are studied (UC Berkeley, NEAT). But evolution combined with task learning,
survival constraints, and human guidance is unexplored.
Alpha India Angle: Evolution under survival pressure + task
learning + human guidance. Do personality-heterogeneous populations
develop specialization faster? Do they adapt to environmental change faster? Does diversity persist or
converge?
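Whether diversity persists or converges needs a concrete metric. One common choice, offered here as an assumption rather than the project's chosen measure, is the Shannon entropy of the personality-type distribution tracked across generations. The function name personality_entropy and the type labels are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence


def personality_entropy(types: Sequence[str]) -> float:
    """Shannon entropy (in bits) of the personality types in a population.

    Zero means the population has converged to a single type; higher values
    mean the types are more evenly spread.
    """
    n = len(types)
    if n == 0:
        return 0.0
    counts = Counter(types)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical snapshots of one population at two points in time.
early = ["curious", "aggressive", "curious", "aggressive"]
late = ["aggressive", "aggressive", "aggressive", "aggressive"]
print(personality_entropy(early), personality_entropy(late))
```

Plotting this value per generation would show whether heterogeneity is maintained under survival pressure or collapses toward one dominant personality.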
Q2 in decentralized context – No one studies personality-driven situational awareness
Q4 in decentralized context – No one tests feedback propagation through peer networks
Option A: Focus on Q2 (Personality-Aware Situational Awareness)
The Research Question
How does personality affect what agents perceive about their environment?
Can we measure and optimize personality-driven awareness?
Game Mechanic
VBs with different personalities explore differently. Curious VBs
discover more resources/hazards. Risk-averse VBs avoid threats better. Measurement: Do
heterogeneous populations learn more about their environment?
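The discovery-rate measurement mentioned in this option could be computed along the following lines. This is a sketch under assumptions: the grid-cell representation, the function discovery_rate, and the sample data are hypothetical and serve only to make the metric concrete.

```python
from typing import Iterable, Set, Tuple

Cell = Tuple[int, int]


def discovery_rate(visited_by_agent: Iterable[Set[Cell]],
                   points_of_interest: Set[Cell]) -> float:
    """Fraction of resources/hazards found by at least one agent."""
    if not points_of_interest:
        return 0.0
    # Pool everything any agent has seen, then keep only the cells that
    # actually contain a resource or hazard.
    seen = set().union(*visited_by_agent)
    return len(seen & points_of_interest) / len(points_of_interest)


# Hypothetical exploration logs for two VBs on a small grid.
logs = [{(0, 0), (1, 1)}, {(1, 1), (2, 2)}]
targets = {(1, 1), (2, 2), (3, 3), (4, 4)}
print(discovery_rate(logs, targets))
```

Comparing this value between a heterogeneous population and a homogeneous one, run on the same map for the same number of steps, is one way to answer the measurement question.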
Advantages
Novel, unexplored research gap. Easy to measure (discovery rate, threat
avoidance). Direct space application (Mars rovers).
Risk
May be too narrow for a 90-day MVP. Hard to engage players around
"situational awareness."
Option B: Focus on Q4 (Decentralized Learning from Mixed Feedback)
The Research Question
Can diverse human guidance (from different players) propagate through a
decentralized swarm? Can personality-driven trust determine how agents weight feedback?
Game Mechanic
Players guide their own VB (not all VBs). Guidance spreads via peer
learning. Personality determines how much a VB trusts peers vs. humans. Measurement: Does
heterogeneous learning enable 1 human to guide 100 VBs?
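The "1 human guides 100 VBs" measurement can be prototyped with a toy threshold model of peer propagation. Everything here is an assumption made for illustration: the ring topology, the peer_trust values, the 0.5 adoption threshold, and the function propagate are hypothetical and are not taken from the game design.

```python
from typing import List, Set


def propagate(neighbors: List[List[int]], peer_trust: List[float],
              guided: Set[int], threshold: float = 0.5,
              max_rounds: int = 1000) -> Set[int]:
    """Return the set of agents holding the guidance once spreading stops.

    An agent adopts the guidance when at least one neighbor already has it
    and the agent's own peer trust meets the threshold.
    """
    informed = set(guided)
    for _ in range(max_rounds):
        newly = {
            agent
            for agent in range(len(neighbors))
            if agent not in informed
            and peer_trust[agent] >= threshold
            and any(n in informed for n in neighbors[agent])
        }
        if not newly:
            break
        informed |= newly
    return informed


# Toy setup: 100 VBs on a ring, one of them (index 0) guided by a human.
n = 100
ring = [[(i - 1) % n, (i + 1) % n] for i in range(n)]
trusting = [0.8] * n
coverage = len(propagate(ring, trusting, guided={0})) / n
print(coverage)
```

In this toy model a few low-trust agents placed on the ring cut off everything behind them, which suggests that both network topology and the personality mix should be controlled variables in the experiment.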
Advantages
Novel, publishable, directly addresses human guidance scaling. Engaging
game loop (players see their guidance propagate). Critical for swarm robotics.
Risk
Requires careful experimental design to isolate feedback type effects.
Complex measurement (peer vs. human vs. self weighting).
Option C: Combined USP (Personality-Driven Learning Under Survival Constraints)
The Research Question
What emerges when you combine personality heterogeneity, survival
pressure, embodied learning, and human guidance? Do agents specialize? Do they develop
complementary roles?
Game Mechanic
All five research questions integrated. VBs have survival needs (food,
energy). Personality drives learning strategy. Humans guide some VBs. Knowledge is spatially
indexed. Measurement: Does personality-driven specialization emerge? Does the population adapt
faster?
Advantages
Most comprehensive. Covers the full research space. Most defensible
(hardest to replicate). Strongest investor story (addresses real-world problems).
Risk
Most complex to implement and measure. Risk of having too many variables
to control for in 90 days.