voxel51/mose-v2

mose-v2

A FiftyOne remote zoo dataset integration for MOSEv2, a large-scale video object segmentation benchmark: thousands of videos, instance masks, and diverse real-world conditions (occlusion, small objects, weather, low light, camouflage, etc.). See the project site and upstream repo for the full benchmark description.

Source and citation

@article{MOSEv2,
  title={{MOSEv2}: A More Challenging Dataset for Video Object Segmentation in Complex Scenes},
  author={Ding, Henghui and Ying, Kaining and Liu, Chang and He, Shuting and Jiang, Xudong and Jiang, Yu-Gang and Torr, Philip HS and Bai, Song},
  journal={arXiv preprint arXiv:2508.05630},
  year={2025}
}

Quick start

Installation

pip install fiftyone
pip install gdown   # required for Google Drive download; see also requirements.txt

Load via the FiftyOne Dataset Zoo

import fiftyone as fo
import fiftyone.zoo as foz

dataset = foz.load_zoo_dataset(
    "https://github.com/voxel51/mose-v2",
    split="train",  # or "validation"
    max_samples=1000,  # optional, for quicker exploration
)

session = fo.launch_app(dataset)

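# For a dynamic grouped view: one group per sequence, frames ordered by frame_number
grouped_view = dataset.group_by("sequence_id", order_by="frame_number")
session.view = grouped_view  # show the grouped view in the App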

Notes:

  • Downloads train and validation archives from Google Drive (file IDs are in __init__.py as DRIVE_FILE_IDS).
  • Extracts train/ and valid/ under the FiftyOne-managed dataset directory. A symlink validation -> valid is created when needed so split names match FiftyOne’s expectations.
dataset_dir/
  train/
    JPEGImages/<sequence_name>/{00000,00001,...}.jpg
    Annotations/<sequence_name>/{00000,00001,...}.png
  valid/
    JPEGImages/<sequence_name>/{00000,00001,...}.jpg
    Annotations/<sequence_name>/00000.png
  • Registers one sample per video frame. Segmentation is stored as an indexed PNG per frame (ground_truth: fo.Segmentation with mask_path).
  • Annotation masks are 8-bit indexed PNGs: pixel value 0 is background; value N is object instance N.
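
As a rough illustration of the mask encoding described above (0 is background, N is object instance N), the sketch below counts pixels per instance and extracts a binary mask for one instance. It works on a plain 2D list of palette indices so it runs without extra dependencies. The function names and the toy mask are hypothetical and are not part of this repository.

```python
from collections import Counter


def instance_pixel_counts(mask_rows):
    """Count pixels per object instance in an indexed mask.

    mask_rows is a 2D iterable of palette indices (one inner iterable per row).
    Value 0 is background and is excluded from the result.
    """
    counts = Counter()
    for row in mask_rows:
        counts.update(int(v) for v in row)
    counts.pop(0, None)  # drop the background entry if present
    return dict(sorted(counts.items()))


def binary_mask(mask_rows, instance_id):
    """Return a 2D list of booleans that is True where the pixel belongs to instance_id."""
    return [[int(v) == instance_id for v in row] for row in mask_rows]


# Toy 3x4 mask with two instances (illustrative values, not real MOSEv2 data).
# For a real frame, the indices could be loaded with Pillow and NumPy, e.g.
# numpy.asarray(PIL.Image.open(mask_path)) on a palette-mode PNG.
toy_mask = [
    [0, 0, 1, 1],
    [0, 2, 1, 1],
    [0, 2, 2, 0],
]

print(instance_pixel_counts(toy_mask))  # {1: 4, 2: 3}
print(binary_mask(toy_mask, 2)[1])      # [False, True, False, False]
```

Because instance IDs are stored as pixel values, the set of non-zero values in a frame's mask is the set of objects visible in that frame.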

Sample fields

| Field | Role |
| --- | --- |
| filepath | Path to the JPEG frame |
| sequence_id | Video sequence name |
| frame_number | Zero-based frame index |
| tags | Split and sequence (e.g. train, sequence id) |
| ground_truth | Segmentation with mask_path to the indexed PNG |
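
The fields above can be read off the on-disk layout. The sketch below shows one plausible way to derive sequence_id, frame_number, and the matching mask path from a frame path. It assumes the JPEGImages/<sequence_name>/<frame>.jpg layout described earlier; describe_frame is a hypothetical helper and is not the code in this repository's __init__.py.

```python
from pathlib import Path


def describe_frame(jpeg_path):
    """Derive sample fields from a path like <split>/JPEGImages/<sequence>/<frame>.jpg."""
    p = Path(jpeg_path)
    sequence_id = p.parent.name   # directory name is the sequence name
    frame_number = int(p.stem)    # "00003" -> 3 (zero-based frame index)
    split_dir = p.parents[2]      # the train/ or valid/ directory
    # The mask shares the frame's file name but lives under Annotations/ as a PNG.
    # This only builds the path; it does not check that the file exists
    # (in the validation split only the first frame has a mask).
    mask_path = split_dir / "Annotations" / sequence_id / f"{p.stem}.png"
    return {
        "filepath": p.as_posix(),
        "sequence_id": sequence_id,
        "frame_number": frame_number,
        "split_dir": split_dir.name,
        "mask_path": mask_path.as_posix(),
    }


# "abc123" is a made-up sequence name used only for illustration.
info = describe_frame("dataset_dir/train/JPEGImages/abc123/00003.jpg")
print(info["sequence_id"], info["frame_number"], info["mask_path"])
```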

Statistics

| Split | Sequences | Total Samples | Annotated Samples |
| --- | --- | --- | --- |
| train | 3,666 | 311,843 | 311,843 |
| validation | 433 | 66,526 | 433 (first frame only) |
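
For a quick sense of scale, the figures in the table above can be turned into average clip length and annotation coverage per split. The numbers are copied from the table; the summarize helper is hypothetical and exists only for this arithmetic.

```python
# Figures copied from the statistics table.
stats = {
    "train": {"sequences": 3666, "samples": 311843, "annotated": 311843},
    "validation": {"sequences": 433, "samples": 66526, "annotated": 433},
}


def summarize(split_stats):
    """Return average frames per sequence and the fraction of annotated frames."""
    return {
        "frames_per_sequence": split_stats["samples"] / split_stats["sequences"],
        "annotated_fraction": split_stats["annotated"] / split_stats["samples"],
    }


for split, split_stats in stats.items():
    summary = summarize(split_stats)
    print(
        f"{split}: {summary['frames_per_sequence']:.1f} frames/sequence, "
        f"{summary['annotated_fraction']:.2%} annotated"
    )
```

This works out to roughly 85 frames per sequence in train and roughly 154 in validation, where fewer than 1% of frames carry a mask because only the first frame of each sequence is annotated.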

Visualize

Each image is tagged with its split and with its sequence name — frames that share a sequence_id belong to the same clip.

For a video-like browser in the App, use a dynamic grouped view — one group per sequence, frames ordered by frame_number.

MOSEv2 sample visualization (grid)

MOSEv2 grouped / carousel view

About

FiftyOne integration of MOSEv2 (https://mose.video/), a complex video object segmentation dataset.
