🌙 Moondream 3 Preview - Vision Language Model
Try Moondream 3, a state-of-the-art vision language model with a mixture-of-experts architecture. This demo showcases all four skills: Query, Caption, Point, and Detect.
Advanced Settings
- Temperature: 0.1 to 2
- Top-p: 0.1 to 1
- Max Tokens: 50 to 2048
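As a minimal sketch of how these sampling settings might be kept in range outside the demo UI, the helper below clamps each value to the slider bounds shown above. Pairing the three ranges with Temperature, Top-p, and Max Tokens (in that order) is an assumption based on the demo's input columns. The function name `build_settings`, its default values, and the dictionary keys are hypothetical, not part of the demo or the model's API.

```python
# Slider bounds taken from the demo; mapping them to these three
# settings (in this order) is an assumption.
TEMPERATURE_RANGE = (0.1, 2.0)
TOP_P_RANGE = (0.1, 1.0)
MAX_TOKENS_RANGE = (50, 2048)


def _clamp(value, bounds):
    """Restrict value to the closed interval given by bounds."""
    low, high = bounds
    return max(low, min(high, value))


def build_settings(temperature=0.5, top_p=0.9, max_tokens=512):
    """Return a settings dict with every value clamped to its range.

    The defaults and key names are illustrative placeholders.
    """
    return {
        "temperature": float(_clamp(temperature, TEMPERATURE_RANGE)),
        "top_p": float(_clamp(top_p, TOP_P_RANGE)),
        "max_tokens": int(_clamp(max_tokens, MAX_TOKENS_RANGE)),
    }
```

Out-of-range inputs are pulled back to the nearest bound rather than rejected, which mirrors how a slider behaves.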
Example Queries
Inputs: Select Task, Question, Caption Length, Object to Find, Enable Reasoning (better for complex questions), Temperature, Top-p, Max Tokens
About Moondream 3
- Architecture: 9B total parameters, 2B active, with mixture-of-experts
- Skills: Query (Q&A), Caption, Point detection, Object detection
- Features: 32K context length, multi-crop high resolution processing
- Model: moondream/moondream3-preview
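The four skills listed above could be routed from a single entry point, roughly as the demo's task selector does. The sketch below assumes a loaded model object that exposes `query`, `caption`, `point`, and `detect` methods with the keyword arguments shown. Those method names and signatures are assumptions to verify against the model card for moondream/moondream3-preview, and `run_skill` itself is a hypothetical helper.

```python
def run_skill(model, task, image, text=None, caption_length="normal",
              reasoning=False, settings=None):
    """Dispatch one of the four skills to an already-loaded model.

    `model` is assumed to expose query/caption/point/detect methods;
    the keyword names used here are assumptions, not a confirmed API.
    """
    task = task.lower()
    if task == "query":
        if not text:
            raise ValueError("Query needs a question")
        return model.query(image=image, question=text,
                           reasoning=reasoning, settings=settings)
    if task == "caption":
        return model.caption(image=image, length=caption_length,
                             settings=settings)
    if task in ("point", "detect"):
        if not text:
            raise ValueError(f"{task.capitalize()} needs an object to find")
        method = model.point if task == "point" else model.detect
        return method(image=image, object=text)
    raise ValueError(f"Unknown task: {task!r}")
```

Keeping the dispatch in one function makes it easy to validate inputs per task before any model call is made.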
Tips:
- Query: Ask open-ended questions about images or use for text-only tasks
- Caption: Generate short, normal, or long descriptions of images
- Point: Find specific objects and get their coordinates
- Detect: Get bounding boxes for objects in images
- Enable reasoning for complex visual understanding tasks
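To draw Point and Detect results on an image, the returned coordinates usually need converting to pixels. The sketch below assumes coordinates are normalized to the range 0 to 1, that points arrive as dictionaries with `x` and `y` keys, and that boxes use `x_min`, `y_min`, `x_max`, `y_max`. Both the normalization and the key names are assumptions to check against the model's actual output, and the two helper functions are hypothetical.

```python
def point_to_pixels(point, width, height):
    """Convert an assumed-normalized point dict to (x, y) pixel coordinates."""
    return (round(point["x"] * width), round(point["y"] * height))


def box_to_pixels(box, width, height):
    """Convert an assumed-normalized box dict to (left, top, right, bottom) pixels."""
    return (
        round(box["x_min"] * width),
        round(box["y_min"] * height),
        round(box["x_max"] * width),
        round(box["y_max"] * height),
    )


# Example with made-up values for a 640x480 image.
center = point_to_pixels({"x": 0.5, "y": 0.25}, 640, 480)
region = box_to_pixels(
    {"x_min": 0.25, "y_min": 0.5, "x_max": 0.75, "y_max": 1.0}, 640, 480
)
```

The `(left, top, right, bottom)` ordering matches what common drawing routines such as Pillow's `ImageDraw.rectangle` accept.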