🌙 Moondream 3 Preview - Vision Language Model

Try Moondream 3, a vision language model with a mixture-of-experts architecture. This demo showcases all four of its skills: Query, Caption, Point, and Detect.

Built with anycoder

Demo Controls

  • Select Task: Query, Caption, Point, or Detect
  • Task input: Question (Query), Caption Length (Caption), or Object to Find (Point and Detect)
  • Enable Reasoning: better for complex questions
  • Advanced Settings: Temperature (0.1 to 2), Top-p (0.1 to 1), Max Tokens (50 to 2048)
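The task selector and sliders above can be wired to the model roughly as shown below. This is a minimal sketch, not the demo's actual code. It clamps each slider value into the range shown in Advanced Settings and routes the selected task to a skill method. The method names and keyword arguments (`query`, `caption`, `point`, `detect`, `question=`, `reasoning=`, `length=`, `settings=`) and the settings key names are assumptions based on the model's published usage examples. The helpers `build_settings` and `run_task` are hypothetical. The sketch also assumes that reasoning applies only to Query.

```python
from typing import Any, Dict, Optional

# Slider ranges shown in the demo's Advanced Settings panel.
TEMPERATURE_RANGE = (0.1, 2.0)
TOP_P_RANGE = (0.1, 1.0)
MAX_TOKENS_RANGE = (50, 2048)
CAPTION_LENGTHS = ("short", "normal", "long")


def _clamp(value, bounds):
    """Force value into the closed interval [lo, hi]."""
    lo, hi = bounds
    return max(lo, min(hi, value))


def build_settings(temperature: float, top_p: float, max_tokens: int) -> Dict[str, Any]:
    """Hypothetical helper: clamp slider values and pack them into a dict.

    The key names are an assumption about what the model's `settings`
    argument accepts; check the model card before relying on them.
    """
    return {
        "temperature": float(_clamp(temperature, TEMPERATURE_RANGE)),
        "top_p": float(_clamp(top_p, TOP_P_RANGE)),
        "max_tokens": int(_clamp(int(max_tokens), MAX_TOKENS_RANGE)),
    }


def run_task(model, image, task: str, text: str = "", reasoning: bool = False,
             settings: Optional[Dict[str, Any]] = None):
    """Hypothetical router from the selected task to a skill method on `model`.

    All method names and keyword arguments used here are assumptions,
    not a verified API.
    """
    if task == "Query":
        return model.query(image=image, question=text, reasoning=reasoning, settings=settings)
    if task == "Caption":
        length = text or "normal"
        if length not in CAPTION_LENGTHS:
            raise ValueError(f"caption length must be one of {CAPTION_LENGTHS}")
        return model.caption(image, length=length, settings=settings)
    if task in ("Point", "Detect"):
        if not text:
            raise ValueError(f"{task} needs an object to find")
        method = model.point if task == "Point" else model.detect
        return method(image, text)
    raise ValueError(f"unknown task: {task!r}")
```

Clamping on the server side keeps out-of-range values from reaching the model even if a client bypasses the sliders.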

About Moondream 3

  • Architecture: mixture-of-experts with 9B total parameters, of which 2B are active per token
  • Skills: Query (Q&A), Caption, Point (object pointing), Detect (object detection)
  • Features: 32K context length, multi-crop high-resolution image processing
  • Model: moondream/moondream3-preview
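The "2B active" figure means that only a subset of experts runs for each token. That works out to roughly 2/9, or about 22%, of the total parameters. The sketch below shows generic top-k gating, which is the usual mechanism behind this. It is an assumption for illustration only. The expert count, the value of k, and the router details here are not Moondream's. The helpers `top_k_route` and `moe_forward` are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def top_k_route(logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and softmax-normalize only their logits."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    m = max(logits[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(logits[i] - m) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x: Vector, experts: Sequence[Callable[[Vector], Vector]],
                router: Callable[[Vector], Sequence[float]], k: int) -> Vector:
    """Run only the k routed experts on x and return their weighted sum.

    Experts that are not selected are never called, so their parameters
    are inactive for this token.
    """
    out = [0.0] * len(x)
    for idx, weight in top_k_route(router(x), k):
        y = experts[idx](x)
        out = [o + weight * v for o, v in zip(out, y)]
    return out
```

For example, with four experts and k=2, each token touches the weights of two experts. Total capacity grows with the number of experts, while per-token compute tracks only the active ones.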

Tips:

  • Query: Ask open-ended questions about images, or use it for text-only tasks
  • Caption: Generate short, normal, or long descriptions of images
  • Point: Find specific objects and get their coordinates
  • Detect: Get bounding boxes for objects in images
  • Enable reasoning for complex visual understanding tasks
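Point and Detect return coordinates, and to draw them on the original image you usually need to convert them to pixels. The sketch below assumes a specific output shape. It assumes the coordinates are normalized to [0, 1], points are dicts with `x`/`y` keys, and boxes use `x_min`/`y_min`/`x_max`/`y_max` keys. Verify this shape against the model card. The helper names are hypothetical.

```python
from typing import Dict, List, Tuple


def points_to_pixels(points: List[Dict[str, float]], width: int,
                     height: int) -> List[Tuple[int, int]]:
    """Scale assumed-normalized {"x", "y"} points to integer pixel coordinates."""
    return [(round(p["x"] * width), round(p["y"] * height)) for p in points]


def boxes_to_pixels(boxes: List[Dict[str, float]], width: int,
                    height: int) -> List[Tuple[int, int, int, int]]:
    """Scale assumed-normalized boxes to (x_min, y_min, x_max, y_max) in pixels."""
    return [
        (round(b["x_min"] * width), round(b["y_min"] * height),
         round(b["x_max"] * width), round(b["y_max"] * height))
        for b in boxes
    ]
```

Normalized outputs stay independent of the input resolution, so the same result can be mapped onto a resized preview or the full-size original.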