@@ -20,3 +20,48 @@ image = Image.from_file("path/to/your/image.jpg")
 response = model.query(image.data, "What do you see in this image?")
 print(response)
 ```
+
+## Moondream Hosted Model
+
+The `MoondreamHostedVlModel` class provides access to the hosted Moondream API for fast vision-language tasks.
+
+**Prerequisites:**
+
+You must export your API key before using the model:
+```bash
+export MOONDREAM_API_KEY="your_api_key_here"
+```
+
+### Capabilities
+
+The model supports four modes of operation:
+
+1. **Caption**: Generate a description of the image.
+2. **Query**: Ask natural language questions about the image.
+3. **Detect**: Find bounding boxes for specific objects.
+4. **Point**: Locate the center points of specific objects.
+
+### Example Usage
+
+```python
+from dimos.models.vl.moondream_hosted import MoondreamHostedVlModel
+from dimos.msgs.sensor_msgs import Image
+
+model = MoondreamHostedVlModel()
+image = Image.from_file("path/to/image.jpg")
+
+# 1. Caption
+print(f"Caption: {model.caption(image)}")
+
+# 2. Query
+print(f"Answer: {model.query(image, 'Is there a person in the image?')}")
+
+# 3. Detect (returns ImageDetections2D)
+detections = model.query_detections(image, "person")
+for det in detections.detections:
+    print(f"Found person at {det.bbox}")
+
+# 4. Point (returns list of (x, y) coordinates)
+points = model.point(image, "person")
+print(f"Person centers: {points}")
+```
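
The prerequisite in the added section (exporting `MOONDREAM_API_KEY`) can also be checked from Python before the model is constructed. The sketch below is only an illustration: `require_moondream_api_key` is a hypothetical helper, not part of `dimos` or Moondream, and the diff does not show whether `MoondreamHostedVlModel` performs its own check. The only name taken from the diff is the environment variable.

```python
import os
from collections.abc import Mapping


def require_moondream_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the value of MOONDREAM_API_KEY, or raise if it is unset or blank.

    Hypothetical helper for illustration; not part of dimos or Moondream.
    """
    # Treat a whitespace-only value the same as a missing one.
    key = env.get("MOONDREAM_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "MOONDREAM_API_KEY is not set; export it before creating the model."
        )
    return key


if __name__ == "__main__":
    try:
        require_moondream_api_key()
        print("MOONDREAM_API_KEY is set")
    except RuntimeError as err:
        print(f"Configuration error: {err}")
```

Calling a check like this before `MoondreamHostedVlModel()` would surface a missing key as a clear configuration error rather than as a failed API request later on. Passing the environment in as a mapping keeps the helper easy to test without touching the real process environment.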