RAMP Implementation

RAMP Policy

RAMP is a hybrid robot policy that combines language-model symbolic planning with affordance-aware robot assignment and motion-feasibility ranking. The policy uses an LLM plan as a proposal, then checks robot capabilities, object affordances, placement geometry, and simulated execution before choosing an action.

Environment Setup

The experiments use a dual-UR10 tabletop setup with a tray target and multiple household objects. The four arrangements below represent different initial object layouts used to test whether RAMP can adapt robot assignment and grasp strategy from the current belief state.

Figures: RAMP environment arrangements A–D (four initial object layouts), plus a legend of the scene objects.

Methodology

At the beginning of each episode, the environment is reset and the first observation is converted into a belief state. The belief tracks scene frames, object poses, robot capabilities, object affordances, held objects, tray placements, and recovery poses from interrupted executions.
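The belief-state fields listed above could be grouped into a simple container like the following sketch. All names and the flat pose/affordance encodings are illustrative assumptions, not RAMP's actual data structures:

```python
from dataclasses import dataclass, field

# Hypothetical belief container mirroring the fields described in the text.
# Field names and value encodings are assumptions for illustration.
@dataclass
class Belief:
    object_poses: dict = field(default_factory=dict)        # object -> pose tuple
    robot_capabilities: dict = field(default_factory=dict)  # robot -> set of supported affordances
    object_affordances: dict = field(default_factory=dict)  # object -> set of affordances
    held_objects: dict = field(default_factory=dict)        # robot -> held object (or None)
    tray_placements: list = field(default_factory=list)     # objects already placed on the tray
    recovery_poses: dict = field(default_factory=dict)      # robot -> pose saved after interruption

    def update(self, observation: dict) -> None:
        """Fold a fresh observation into the belief; new poses replace stale ones."""
        self.object_poses.update(observation.get("object_poses", {}))
        self.held_objects.update(observation.get("held_objects", {}))
```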

Given the belief and the natural-language goal, RAMP queries an LLM for a symbolic Python plan containing high-level actions such as pick, place, lift, and return_home. RAMP does not directly execute this plan. It treats the LLM output as an initial proposal and rewrites it using affordance and robot-capability metadata.
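Since the LLM output is only a proposal, it first has to be parsed into symbolic actions that the rewriting stage can manipulate. A minimal parser might look like this; the action names come from the text, but the concrete `action(arg, ...)` plan syntax is an assumption:

```python
import re

# Action vocabulary taken from the text; the line-based plan syntax is assumed.
PLAN_ACTIONS = {"pick", "place", "lift", "return_home"}

def parse_plan(plan_text: str) -> list[tuple[str, ...]]:
    """Extract (action, *args) tuples from lines like 'pick(robot_left, mug)'."""
    plan = []
    for line in plan_text.splitlines():
        m = re.match(r"\s*(\w+)\((.*)\)\s*$", line)
        if not m or m.group(1) not in PLAN_ACTIONS:
            continue  # skip comments, prose, or unknown actions
        args = [a.strip() for a in m.group(2).split(",") if a.strip()]
        plan.append((m.group(1), *args))
    return plan
```

Keeping the plan as plain tuples makes the later reassignment steps trivial list transformations.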

Pipeline: Belief State → LLM Plan → Affordance Rewrite → Twin Rollout → Best Action

Affordance-Aware Refinement

Candidate Ranking

RAMP evaluates several plan variants: the original LLM proposal, nearest-compatible-robot assignments, and robot-biased variants for each manipulator. Each candidate is executed in a sampled twin environment without visualization.
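The three variant families described above can be sketched as a small generator. The variant structure follows the text; the helper names and tuple layout are illustrative assumptions:

```python
# Hypothetical candidate generator for the three variant families in the text:
# the original proposal, a nearest-compatible-robot reassignment, and one
# robot-biased variant per manipulator. Helper names are assumptions.
def candidate_plans(llm_plan, robots, nearest_robot):
    """Yield plan variants to be rolled out in the twin environment."""
    yield llm_plan  # 1. original LLM proposal, unchanged
    # 2. reassign each pick/place to the nearest capability-compatible robot
    yield [(a[0], nearest_robot(a), *a[2:]) if a[0] in ("pick", "place") else a
           for a in llm_plan]
    # 3. one biased variant per manipulator, forcing that robot where applicable
    for r in robots:
        yield [(a[0], r, *a[2:]) if a[0] in ("pick", "place") else a
               for a in llm_plan]
```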

score = task_cost
      + w_d * end_effector_distance
      + w_c * capability_penalty
      + w_v * constraint_violations
The terms weigh task cost, end-effector distance, capability match, and constraint violations; the candidate with the lowest score is selected.
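The scoring rule above translates directly into code. The weight values below are illustrative placeholders, not RAMP's reported settings:

```python
# Illustrative weights; the formula structure comes from the text,
# the numeric values are assumptions.
W_D, W_C, W_V = 1.0, 5.0, 100.0

def score(task_cost, ee_distance, capability_penalty, violations):
    """Lower is better: the candidate with the minimum score is executed."""
    return task_cost + W_D * ee_distance + W_C * capability_penalty + W_V * violations
```

Selecting the winner is then just `min(candidates, key=...)` over the rolled-out feature tuples.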

Closed-Loop Execution

The selected plan is cached and returned one action at a time. The real environment executes each action using motion planning, including KOMO-based pick, lift, transport, and place motions. If an action succeeds, RAMP continues with the cached plan. If execution fails or violates a constraint, the plan is invalidated and RAMP replans from the updated belief state.

Consecutive pick failures are tracked per object. After repeated failures, the object is marked unreachable and removed from future candidate plans to avoid retry loops.
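The per-object failure tracking could be implemented with a small counter class. The failure threshold of 3 is an assumed value; the reset-on-success and mark-unreachable behavior follows the text:

```python
from collections import Counter

MAX_PICK_FAILURES = 3  # assumed threshold; the text only says "repeated failures"

class FailureTracker:
    """Track consecutive pick failures per object, per the text."""
    def __init__(self):
        self.fails = Counter()
        self.unreachable = set()

    def record(self, obj: str, success: bool) -> None:
        if success:
            self.fails[obj] = 0  # a success resets the consecutive-failure count
        else:
            self.fails[obj] += 1
            if self.fails[obj] >= MAX_PICK_FAILURES:
                self.unreachable.add(obj)  # exclude from future candidate plans
```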

Execution Protocol

  1. Instantiate the task, updater, environment, and policy from Hydra.
  2. Reset the environment and initialize the belief from observation.
  3. Sample a twin environment for policy-side simulation.
  4. Pass the current goal and belief to RAMP at every step.
  5. Execute the selected action in the real environment.
  6. Log execution statistics and feed success or constraint feedback to RAMP.
  7. Stop when the task is complete, no action remains, or the maximum number of environment steps is reached.
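Steps 2 through 7 of the protocol form a simple outer loop; a sketch under assumed `env`/`policy` interfaces (these method names are placeholders, not RAMP's real API):

```python
# Hypothetical episode loop mirroring protocol steps 2-7.
# env, policy, and belief interfaces are illustrative assumptions.
def run_episode(env, policy, belief, goal, max_steps=15):
    obs = env.reset()
    belief.update(obs)                         # step 2: initialize belief
    for _ in range(max_steps):                 # step 7: environment-step budget
        action = policy.act(goal, belief)      # step 4: query RAMP with goal + belief
        if action is None:                     # step 7: no action remains
            break
        obs, success, done = env.step(action)  # step 5: execute in the real environment
        belief.update(obs)
        policy.feedback(action, success)       # step 6: success / constraint feedback
        if done:                               # step 7: task complete
            break
```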

In the reported configuration, RAMP uses at most 15 environment steps, one feedback cycle, LLM querying enabled, and up to 50 candidate evaluations.
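Since the stack is instantiated from Hydra, the reported values could live in a YAML file along these lines. Only the numbers come from the text; every key name here is an assumption:

```yaml
# Hypothetical Hydra-style config sketch; key names are illustrative.
policy:
  max_env_steps: 15    # episode step budget
  feedback_cycles: 1   # one replanning feedback cycle
  use_llm: true        # LLM plan querying enabled
  max_candidates: 50   # candidate-evaluation budget
```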