
Commit f25afaf

Merge pull request #534 - Perception and Navigation Architecture Overhaul
# Perception and Navigation Architecture Overhaul

Major refactoring to improve modularity and organization of perception and navigation systems.

## Key Changes

### Core
- Added timeout to RPC calls

### Navigation
- Added recovery server
- Added stuck checking
- Added orientation to Path

### 👁️ Perception Improvements
- Enhanced object tracking with improved spatial perception
- Updated manipulation modules to use standardized DimOS types
- Removed deprecated visual servoing code

### ⚙️ Core Infrastructure
- Enhanced LCM pub/sub with observable operators
- Improved message types (OccupancyGrid, Image, Header)
- Added comprehensive transform utilities and testing
- New camera module for monocular depth estimation

### 🤖 Skill Refactor
- All navigation skills now work again

## Impact

This refactoring establishes a cleaner separation of concerns between robot-specific code and reusable navigation/perception modules, setting up the foundation for more scalable development.

Former-commit-id: 001cda9
2 parents b8af44c + 84c15fa commit f25afaf


43 files changed (+2101, −1439 lines)

assets/agent/prompt.txt (3 additions, 28 deletions)

```diff
@@ -29,35 +29,12 @@ PERCEPTION & TEMPORAL AWARENESS:
 - You can recognize and track humans and objects in your field of view
 
 NAVIGATION & MOVEMENT:
-- You can navigate to semantically described locations using Navigate (e.g., "go to the kitchen")
-- You can navigate to visually identified objects using NavigateToObject (e.g., "go to the red chair")
+- You can navigate to semantically described locations using NavigateWithText (e.g., "go to the kitchen")
+- You can navigate to visually identified objects using NavigateWithText (e.g., "go to the red chair")
 - You can follow humans through complex environments using FollowHuman
-- You can execute precise movement to specific coordinates using NavigateToGoal like if you're navigating to a GetPose waypoint
 - You can perform various body movements and gestures (sit, stand, dance, etc.)
-- When navigating to a location like Kitchen or Bathroom or couch, use the generic Navigate skill to query spatial memory and navigate
 - You can stop any navigation process that is currently running using KillSkill
-- Appended to every query you will find current objects detection and Saved Locations like this:
-
-Current objects detected:
-[DETECTED OBJECTS]
-Object 1: refrigerator
-ID: 1
-Confidence: 0.88
-Position: x=9.44m, y=5.87m, z=-0.13m
-Rotation: yaw=0.11 rad
-Size: width=1.00m, height=1.46m
-Depth: 4.92m
-Bounding box: [606, 212, 773, 456]
-----------------------------------
-Object 2: box
-ID: 2
-Confidence: 0.84
-Position: x=11.30m, y=5.10m, z=-0.19m
-Rotation: yaw=-0.03 rad
-Size: width=0.91m, height=0.37m
-Depth: 6.60m
-Bounding box: [753, 149, 867, 195]
-----------------------------------
+
 
 Saved Robot Locations:
 - LOCATION_NAME: Position (X, Y, Z), Rotation (X, Y, Z)
@@ -70,8 +47,6 @@ Saved Robot Locations:
 
 ***When navigating to an object not in current object detected, run NavigateWithText, DO NOT EXPLORE with raw move commands!!!***
 
-***The object detection list is not a comprehensive source of information, when given a visual query like "go to the person wearing a hat" or "Do you see a dog", always Prioritize running observe skill and NavigateWithText***
-
 PLANNING & REASONING:
 - You can develop both short-term and long-term plans to achieve complex goals
 - You can reason about spatial relationships and plan efficient navigation paths
```

dimos/core/__init__.py (1 addition, 1 deletion)

```diff
@@ -54,7 +54,7 @@ def __getattr__(self, name: str):
 
         if name in self.rpcs:
             return lambda *args, **kwargs: self.rpc.call_sync(
-                f"{self.remote_name}/{name}", (args, kwargs)
+                f"{self.remote_name}/{name}", (args, kwargs), timeout=2.0
             )
 
         # return super().__getattr__(name)
```
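The added `timeout=2.0` bounds every proxied RPC call, so a dead remote module stalls the caller for at most two seconds instead of hanging forever. The internals of dimos's `call_sync` are not shown in this diff; as a rough sketch of the pattern only (all names here are hypothetical, not the dimos API), a synchronous call over an asynchronous transport can be bounded with a `queue.Queue` wait:

```python
import queue
import threading


class RpcTimeoutError(Exception):
    """Raised when the remote side does not reply in time."""


def call_sync(dispatch, method, payload, timeout=2.0):
    """Send a request and block until a reply arrives or `timeout` elapses.

    `dispatch(method, payload, reply)` is a hypothetical transport hook that
    delivers the result by calling `reply(value)` from any thread.
    """
    replies = queue.Queue(maxsize=1)
    dispatch(method, payload, replies.put)
    try:
        # Bounded blocking wait: raises queue.Empty after `timeout` seconds.
        return replies.get(timeout=timeout)
    except queue.Empty:
        raise RpcTimeoutError(f"{method} timed out after {timeout}s") from None


# A fake transport that answers one method and silently drops another.
def fake_dispatch(method, payload, reply):
    if method == "robot/get_pose":
        threading.Timer(0.01, reply, args=[{"x": 1.0}]).start()
    # "robot/never_replies" falls through: no reply is ever sent.


result = call_sync(fake_dispatch, "robot/get_pose", ())
```

The design trade-off is the usual one for sync-over-async bridges: a fixed timeout turns silent deadlocks into loud, catchable errors, at the cost of failing legitimate calls that take longer than the bound.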

dimos/hardware/zed_camera.py (30 additions, 46 deletions)

```diff
@@ -31,13 +31,14 @@
 from dimos.hardware.stereo_camera import StereoCamera
 from dimos.core import Module, Out, rpc
 from dimos.utils.logging_config import setup_logger
+from dimos.protocol.tf import TF
+from dimos.msgs.geometry_msgs import Transform, Vector3, Quaternion
 
 # Import LCM message types
-from dimos_lcm.sensor_msgs import Image
+from dimos.msgs.sensor_msgs import Image, ImageFormat
 from dimos_lcm.sensor_msgs import CameraInfo
-from dimos_lcm.geometry_msgs import PoseStamped
-from dimos_lcm.std_msgs import Header, Time
-from dimos_lcm.geometry_msgs import Pose, Point, Quaternion
+from dimos.msgs.geometry_msgs import PoseStamped
+from dimos.msgs.std_msgs import Header
 
 logger = setup_logger(__name__)
 
@@ -591,6 +592,9 @@ def __init__(
         self._subscription = None
         self._sequence = 0
 
+        # Initialize TF publisher
+        self.tf = TF()
+
         logger.info(f"ZEDModule initialized for camera {camera_id}")
 
     @rpc
@@ -675,12 +679,8 @@ def _capture_and_publish(self):
         if left_img is None or depth is None:
             return
 
-        # Get timestamp
-        timestamp_ns = time.time_ns()
-        timestamp = Time(sec=timestamp_ns // 1_000_000_000, nsec=timestamp_ns % 1_000_000_000)
-
         # Create header
-        header = Header(seq=self._sequence, stamp=timestamp, frame_id=self.frame_id)
+        header = Header(self.frame_id)
         self._sequence += 1
 
         # Publish color image
@@ -709,20 +709,11 @@ def _publish_color_image(self, image: np.ndarray, header: Header):
             image_rgb = image
 
         # Create LCM Image message
-        height, width = image_rgb.shape[:2]
-        encoding = "rgb8" if len(image_rgb.shape) == 3 else "mono8"
-        step = width * (3 if len(image_rgb.shape) == 3 else 1)
-        data = image_rgb.tobytes()
-
         msg = Image(
-            data_length=len(data),
-            header=header,
-            height=height,
-            width=width,
-            encoding=encoding,
-            is_bigendian=0,
-            step=step,
-            data=data,
+            data=image_rgb,
+            format=ImageFormat.RGB,
+            frame_id=header.frame_id,
+            ts=header.ts,
         )
 
         self.color_image.publish(msg)
@@ -734,22 +725,12 @@ def _publish_depth_image(self, depth: np.ndarray, header: Header):
         """Publish depth image as LCM message."""
         try:
             # Depth is float32 in meters
-            height, width = depth.shape[:2]
-            encoding = "32FC1"  # 32-bit float, single channel
-            step = width * 4  # 4 bytes per float
-            data = depth.astype(np.float32).tobytes()
-
             msg = Image(
-                data_length=len(data),
-                header=header,
-                height=height,
-                width=width,
-                encoding=encoding,
-                is_bigendian=0,
-                step=step,
-                data=data,
+                data=depth,
+                format=ImageFormat.DEPTH,
+                frame_id=header.frame_id,
+                ts=header.ts,
             )
-
             self.depth_image.publish(msg)
 
         except Exception as e:
@@ -767,7 +748,7 @@ def _publish_camera_info(self):
         resolution = info.get("resolution", {})
 
         # Create CameraInfo message
-        header = Header(seq=0, stamp=Time(sec=int(time.time()), nsec=0), frame_id=self.frame_id)
+        header = Header(self.frame_id)
 
         # Create camera matrix K (3x3)
         K = [
@@ -830,22 +811,25 @@
             logger.error(f"Error publishing camera info: {e}")
 
     def _publish_pose(self, pose_data: Dict[str, Any], header: Header):
-        """Publish camera pose as PoseStamped message."""
+        """Publish camera pose as PoseStamped message and TF transform."""
         try:
             position = pose_data.get("position", [0, 0, 0])
             rotation = pose_data.get("rotation", [0, 0, 0, 1])  # quaternion [x,y,z,w]
 
-            # Create Pose message
-            pose = Pose(
-                position=Point(x=position[0], y=position[1], z=position[2]),
-                orientation=Quaternion(x=rotation[0], y=rotation[1], z=rotation[2], w=rotation[3]),
-            )
-
             # Create PoseStamped message
-            msg = PoseStamped(header=header, pose=pose)
-
+            msg = PoseStamped(ts=header.ts, position=position, orientation=rotation)
             self.pose.publish(msg)
 
+            # Publish TF transform
+            camera_tf = Transform(
+                translation=Vector3(position),
+                rotation=Quaternion(rotation),
+                frame_id="zed_world",
+                child_frame_id="zed_camera_link",
+                ts=header.ts,
+            )
+            self.tf.publish(camera_tf)
+
         except Exception as e:
             logger.error(f"Error publishing pose: {e}")
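The recurring move in this file is that the new `dimos.msgs` `Image` owns its numpy payload plus a format tag, so publishers stop hand-rolling `height`/`width`/`step`/`tobytes()` at every call site. The real class is not shown here; a minimal sketch of the idea (hypothetical fields and methods, mirroring only what the diff's call sites use) looks like:

```python
import time
from dataclasses import dataclass
from enum import Enum, auto

import numpy as np


class ImageFormat(Enum):
    RGB = auto()
    DEPTH = auto()


@dataclass
class Image:
    """Sketch of an image message that owns its numpy payload.

    Height, width, and byte layout are derived from the array itself,
    so callers no longer serialize by hand at each publish site.
    """

    data: np.ndarray
    format: ImageFormat
    frame_id: str
    ts: float

    @property
    def height(self) -> int:
        return self.data.shape[0]

    @property
    def width(self) -> int:
        return self.data.shape[1]

    def encode(self) -> bytes:
        # Serialization happens once, at the transport boundary.
        return self.data.tobytes()


rgb = np.zeros((480, 640, 3), dtype=np.uint8)
msg = Image(data=rgb, format=ImageFormat.RGB, frame_id="zed_camera_link", ts=time.time())
```

Centralizing the layout bookkeeping in the message type is what lets the diff delete the duplicated `encoding`/`step` arithmetic from the color, depth, and visualization publishers.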

dimos/manipulation/visual_servoing/detection3d.py (4 additions, 4 deletions)

```diff
@@ -26,7 +26,8 @@
 from dimos.perception.detection2d.utils import calculate_object_size_from_bbox
 from dimos.perception.common.utils import bbox2d_to_corners
 
-from dimos_lcm.geometry_msgs import Pose, Vector3, Quaternion, Point
+from dimos.msgs.geometry_msgs import Pose, Vector3, Quaternion
+from dimos.msgs.std_msgs import Header
 from dimos_lcm.vision_msgs import (
     Detection3D,
     Detection3DArray,
@@ -39,7 +40,6 @@
     Pose2D,
     Point2D,
 )
-from dimos_lcm.std_msgs import Header
 from dimos.manipulation.visual_servoing.utils import (
     estimate_object_depth,
     visualize_detections_3d,
@@ -179,8 +179,8 @@ def process_frame(
         else:
             # If no transform, use camera coordinates
             center_pose = Pose(
-                Point(obj_cam_pos[0], obj_cam_pos[1], obj_cam_pos[2]),
-                Quaternion(0.0, 0.0, 0.0, 1.0),  # Default orientation
+                position=Vector3(obj_cam_pos[0], obj_cam_pos[1], obj_cam_pos[2]),
+                orientation=Quaternion(0.0, 0.0, 0.0, 1.0),  # Default orientation
+            )
```

dimos/manipulation/visual_servoing/manipulation_module.py (11 additions, 27 deletions)

```diff
@@ -27,8 +27,9 @@
 import numpy as np
 
 from dimos.core import Module, In, Out, rpc
-from dimos_lcm.sensor_msgs import Image, CameraInfo
-from dimos_lcm.geometry_msgs import Vector3, Pose, Point, Quaternion
+from dimos.msgs.sensor_msgs import Image
+from dimos.msgs.geometry_msgs import Vector3, Pose, Quaternion
+from dimos_lcm.sensor_msgs import CameraInfo
 from dimos_lcm.vision_msgs import Detection3DArray, Detection2DArray
 
 from dimos.hardware.piper_arm import PiperArm
@@ -207,7 +208,9 @@ def __init__(
         self.target_click = None
 
         # Place target position and object info
-        self.home_pose = Pose(Point(0.0, 0.0, 0.0), Quaternion(0.0, 0.0, 0.0, 1.0))
+        self.home_pose = Pose(
+            position=Vector3(0.0, 0.0, 0.0), orientation=Quaternion(0.0, 0.0, 0.0, 1.0)
+        )
         self.place_target_position = None
         self.target_object_height = None
         self.retract_distance = 0.12
@@ -239,22 +242,14 @@ def stop(self):
     def _on_rgb_image(self, msg: Image):
         """Handle RGB image messages."""
         try:
-            data = np.frombuffer(msg.data, dtype=np.uint8)
-            if msg.encoding == "rgb8":
-                self.latest_rgb = data.reshape((msg.height, msg.width, 3))
-            else:
-                logger.warning(f"Unsupported RGB encoding: {msg.encoding}")
+            self.latest_rgb = msg.data
        except Exception as e:
            logger.error(f"Error processing RGB image: {e}")
 
     def _on_depth_image(self, msg: Image):
         """Handle depth image messages."""
         try:
-            if msg.encoding == "32FC1":
-                data = np.frombuffer(msg.data, dtype=np.float32)
-                self.latest_depth = data.reshape((msg.height, msg.width))
-            else:
-                logger.warning(f"Unsupported depth encoding: {msg.encoding}")
+            self.latest_depth = msg.data
         except Exception as e:
             logger.error(f"Error processing depth image: {e}")
 
@@ -896,19 +891,7 @@ def _publish_visualization(self, viz_image: np.ndarray):
         """Publish visualization image to LCM."""
         try:
             viz_rgb = cv2.cvtColor(viz_image, cv2.COLOR_BGR2RGB)
-            height, width = viz_rgb.shape[:2]
-            data = viz_rgb.tobytes()
-
-            msg = Image(
-                data_length=len(data),
-                height=height,
-                width=width,
-                encoding="rgb8",
-                is_bigendian=0,
-                step=width * 3,
-                data=data,
-            )
-
+            msg = Image.from_numpy(viz_rgb)
             self.viz_image.publish(msg)
         except Exception as e:
             logger.error(f"Error publishing visualization: {e}")
@@ -935,7 +918,8 @@ def get_place_target_pose(self) -> Optional[Pose]:
         place_pos[2] += z_offset + 0.1
 
         place_center_pose = Pose(
-            Point(place_pos[0], place_pos[1], place_pos[2]), Quaternion(0.0, 0.0, 0.0, 1.0)
+            position=Vector3(place_pos[0], place_pos[1], place_pos[2]),
+            orientation=Quaternion(0.0, 0.0, 0.0, 1.0),
         )
 
         ee_pose = self.arm.get_ee_pose()
```
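The visualization path now hands the numpy array straight to `Image.from_numpy`, and the subscriber callbacks read `msg.data` back as an array with no `frombuffer`/`reshape` step. The factory's internals are not shown in this diff; as a hedged sketch (hypothetical helper, not the dimos implementation), such a `from_numpy` typically infers the format tag from the array's shape and dtype:

```python
import numpy as np


def image_format_from_array(arr: np.ndarray) -> str:
    """Infer a wire-format tag from array shape and dtype.

    A rough sketch of the checks a from_numpy-style factory might perform;
    the format names here are illustrative only.
    """
    if arr.ndim == 3 and arr.shape[2] == 3 and arr.dtype == np.uint8:
        return "RGB"  # interleaved 8-bit color
    if arr.ndim == 2 and arr.dtype == np.float32:
        return "DEPTH"  # single-channel depth in meters
    raise ValueError(f"unsupported image array: shape={arr.shape}, dtype={arr.dtype}")


color_tag = image_format_from_array(np.zeros((480, 640, 3), dtype=np.uint8))
depth_tag = image_format_from_array(np.zeros((480, 640), dtype=np.float32))
```

Inferring the format once, inside the factory, is what makes the old per-callback `encoding` branching redundant: producer and consumer agree on layout through the message type instead of through string comparisons.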

dimos/manipulation/visual_servoing/pbvs.py (2 additions, 6 deletions)

```diff
@@ -21,7 +21,7 @@
 from typing import Optional, Tuple, List
 from collections import deque
 from scipy.spatial.transform import Rotation as R
-from dimos_lcm.geometry_msgs import Pose, Vector3, Quaternion, Point
+from dimos.msgs.geometry_msgs import Pose, Vector3, Quaternion
 from dimos_lcm.vision_msgs import Detection3D, Detection3DArray
 from dimos.utils.logging_config import setup_logger
 from dimos.manipulation.visual_servoing.utils import (
@@ -158,11 +158,7 @@ def update_tracking(self, new_detections: Optional[Detection3DArray] = None) ->
             True if target was successfully tracked, False if lost (but target is kept)
         """
         # Check if we have a current target
-        if (
-            not self.current_target
-            or not self.current_target.bbox
-            or not self.current_target.bbox.center
-        ):
+        if not self.current_target:
             return False
 
         # Add new detections to history if provided
```
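The collapsed guard is safe only if the message type guarantees that a detection always carries a populated `bbox` with a `center`, so a single `None` check on the target subsumes the old three-clause test. A minimal sketch of that reasoning (hypothetical stand-in types, not the dimos `Detection3D`):

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class BBox3D:
    center: Tuple[float, float, float]  # required: present whenever a BBox3D exists


@dataclass
class Detection3D:
    bbox: BBox3D  # required field: every Detection3D carries a bbox


def has_target(current_target: Optional[Detection3D]) -> bool:
    # With bbox and center required by construction, checking the target
    # itself is equivalent to the old nested-attribute guard.
    return current_target is not None


tracked = has_target(Detection3D(bbox=BBox3D(center=(1.0, 2.0, 0.5))))
lost = has_target(None)
```

Pushing invariants into the type is what makes defensive attribute checks at every call site unnecessary; if `bbox` were genuinely optional, the simplification would be unsound.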
