(A) Training: Our system is trained in three stages.
In Stage 1, we train a prediction model \(\pi_{\text{pre}}\) through supervised learning that takes point cloud input and predicts the optimal target position \(P_t\) for object repositioning.
Stage 2 focuses on training three low-level skills via reinforcement learning: a pushing policy \(\pi_{\text{push}}\) that repositions objects to target locations, and two policies \(\pi_{\text{wall}}, \pi_{\text{edge}}\) that enable grasping of ungraspable objects from walls and table edges via extrinsic dexterity.
In Stage 3, we jointly fine-tune these policies to ensure smooth transitions between consecutive skills.
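The three-stage schedule above can be sketched as follows. This is a minimal illustration only: the function names (`fit_supervised`, `train_rl`, `finetune_jointly`) and the `skill=` interface are hypothetical placeholders, not the system's actual training API.

```python
def train_three_stages(fit_supervised, train_rl, finetune_jointly, data, env):
    """Hypothetical three-stage training schedule (interfaces are illustrative)."""
    # Stage 1: supervised learning of the prediction model pi_pre,
    # which maps point clouds to target positions P_t.
    pi_pre = fit_supervised(data)

    # Stage 2: reinforcement learning of the three low-level skills.
    pi_push = train_rl(env, skill="push")   # reposition objects to targets
    pi_wall = train_rl(env, skill="wall")   # grasp against a wall
    pi_edge = train_rl(env, skill="edge")   # grasp at a table edge

    # Stage 3: joint finetuning for smooth transitions between skills.
    return finetune_jointly(pi_pre, pi_push, pi_wall, pi_edge)
```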
(B) Inference: During inference, our system first uses \(\pi_{\text{pre}}\) to process the environmental point cloud, determining whether to execute \(\pi_{\text{wall}}\) or \(\pi_{\text{edge}}\) while simultaneously predicting the corresponding target position \(P_t\).
The pushing policy \(\pi_{\text{push}}\) then moves the object to this target position, followed by the selected extrinsic dexterity policy (\(\pi_{\text{wall}}\) or \(\pi_{\text{edge}}\)) to complete the grasp.
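The inference procedure can be sketched as below. All policy call signatures here are hypothetical stand-ins (the paper does not specify an API); the sketch only shows the control flow: predict skill and target, push, then grasp.

```python
def infer_and_grasp(point_cloud, pi_pre, pi_push, pi_wall, pi_edge):
    """Hypothetical inference pipeline: skill selection, pushing, then grasping."""
    # pi_pre consumes the environmental point cloud and outputs both a
    # discrete skill choice ("wall" or "edge") and a target position P_t.
    skill_choice, target_pos = pi_pre(point_cloud)

    # The pushing policy moves the object to the predicted target position.
    pi_push(point_cloud, target_pos)

    # The selected extrinsic-dexterity policy completes the grasp.
    grasp_policy = pi_wall if skill_choice == "wall" else pi_edge
    grasp_policy(point_cloud)
    return skill_choice, target_pos
```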