ViVo

Documentation

The documentation on this page outlines the steps involved in organising the dataset, undistorting the images, generating and filtering the point cloud data, and generating the masks. The following outlines the steps involved in generating the dataset we used for the NVS tasks in the paper:
You should have downloaded a Scene Name and have a Session ID (e.g. for Scene Name = athlete_row the ID is 3). These refer to the capture sessions, e.g. all scenes with Session ID 1 were captured on the same day.

Code & Environment Installation

To install the code repository and conda environment, follow the steps below.


# Clone and enter the data processing repository
git clone https://github.com/azzarelli/dataproc.git
cd dataproc

# Create base conda env
conda create -n vivo python=3.10 -y
conda activate vivo

# Install PyTorch (remember to change the torch & CUDA tags for your system)
pip install torch torchvision

# Install requirements
pip install -r requirements.txt
            

RAW File Organisation

The raw folder structure follows:


[Scene Name]/
├──calibration.json
├──rotation_correction.json # Copy and Paste from here
├──train/
│   ├──[Camera ID #1]/
│   │   ├── v1_6_7907_[Camera ID]_depth-image_[Frame ID]_[UTC timestamp].png
│   │   ├── v1_6_7907_[Camera ID]_depth-image_[Frame ID]_[UTC timestamp].png.meta.json
│   │   ├── v1_6_7907_[Camera ID]_colour-image_[Frame ID]_[UTC timestamp].jpg
│   │   ├── v1_6_7907_[Camera ID]_colour-image_[Frame ID]_[UTC timestamp].jpg.meta.json
│   │   ...
│   │
│   ├──[Camera ID #2]/...
│   ...
│   └──[Camera ID #10]/...
│
└──test/
    ├──[Camera ID #11]/...
    ...
    └──[Camera ID #14]/...
            

The train/ and test/ folders follow the same structure. Verify that train/ contains 10 sub-folders and test/ contains 4 sub-folders. The camera IDs are unique, so folders can be merged or renamed if users wish.
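As a sanity check, a short script along the following lines can confirm the expected camera counts before processing. The `check_split_counts` helper is our own illustration, not part of the repository:

```python
from pathlib import Path

def check_split_counts(scene_dir, expected=None):
    """Count camera sub-folders per split and compare against the expected layout."""
    expected = expected or {"train": 10, "test": 4}
    counts = {}
    for split, n_expected in expected.items():
        cams = [p for p in (Path(scene_dir) / split).iterdir() if p.is_dir()]
        counts[split] = len(cams)
        if len(cams) != n_expected:
            print(f"WARNING: {split}/ has {len(cams)} camera folders, expected {n_expected}")
    return counts
```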

The provided files are the depth images (.png), RGB color images (.jpg) and the relevant per-frame metadata for each.

Each file name should look like "v1_6_7907_000409113112_colour-image_0000002834_1739289401999071367", where "000409113112" is the camera ID, "0000002834" is the frame number and "1739289401999071367" is the UTC timestamp.
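If you need to index files programmatically, the components can be recovered by splitting the file stem on underscores. The `parse_capture_filename` helper below is our own illustration (the repository does not ship it):

```python
def parse_capture_filename(stem):
    """Split a capture file stem such as
    'v1_6_7907_000409113112_colour-image_0000002834_1739289401999071367'
    into camera ID, image kind, frame number and UTC timestamp."""
    parts = stem.split("_")
    return {
        "camera_id": parts[3],
        "kind": parts[4],            # 'colour-image' or 'depth-image'
        "frame": int(parts[5]),
        "timestamp": int(parts[6]),  # UTC timestamp
    }
```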

Remember to copy the correct dict into rotation_correction.json (see the rotation_correction.json listings at the bottom of this page).

Pre-Processing Data (pre_process.py)

The dataset pre-processing script pre_process.py turns the above RAW file structure into the following file structure (used for 3-D reconstruction). This script also allows you to undistort color/depth images, generate point clouds and filter the point cloud data. Note that [New Scene Name] is provided by you in the GUI when inserting the destination filepath.


[New Scene Name]/
├──calibration.json
├──rotation_correction.json
├──capture-area.json
├──train/
│   ├──[Camera ID #1]/
│   │   ├──color            # RAW image
│   │   │    ├──[Frame #1].jpg
│   │   │    ...
│   │   │    └──[Frame #300].jpg
│   │   ├──color_corrected  # undistorted image
│   │   │    ├──(same as color/)
│   │   ├──depth            # RAW depth
│   │   │    ├──[Frame #1].png
│   │   │    ...
│   │   │    └──[Frame #300].png
│   │   ├──depth_corrected  # undistorted depth
│   │   │    ├──(same as depth/)
│   │   └──meta
│   │        ├──[Frame #1].color.json
│   │        ├──[Frame #1].depth.json
│   │        ...
│   │        ├──[Frame #300].color.json
│   │        └──[Frame #300].depth.json
│   │
│   ├──[Camera ID #2]/...
│   ...
│   └──[Camera ID #10]/...
│
└──test/
    ├──[Camera ID #11]/...
    ...
    └──[Camera ID #14]/...
            
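Given this layout, the per-frame files for one camera can be located with a small helper. The `frame_paths` function below is our own sketch; the frame identifier is passed as a string because the exact zero-padding is whatever pre_process.py writes:

```python
from pathlib import Path

def frame_paths(scene_dir, split, camera_id, frame, corrected=True):
    """Return the color/depth/meta paths for one frame of one camera,
    following the processed directory tree shown above."""
    cam = Path(scene_dir) / split / camera_id
    color_dir = "color_corrected" if corrected else "color"
    depth_dir = "depth_corrected" if corrected else "depth"
    return {
        "color": cam / color_dir / f"{frame}.jpg",
        "depth": cam / depth_dir / f"{frame}.png",
        "color_meta": cam / "meta" / f"{frame}.color.json",
        "depth_meta": cam / "meta" / f"{frame}.depth.json",
    }
```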

Various arguments are available to change the window size and set the root directory. Running the following command opens the GUI.


python pre_process.py --root-dir /folder/containing/the/scene/ --session 1/2/3
            

The video below shows the steps involved in preprocessing the RAW data into the organised data. The additional GUI options allow for RGB and depth image undistortion, the choice of sparse, dense or no point cloud, and the choice to produce a point cloud for frame #1 only or for every frame. You can also filter the point cloud using a radial distance mask or a box filter (provided you input the correct Session ID using --session #). The box filter removes all points outside the staging area (outlined by red tape).
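Conceptually, the two filters can be sketched as below. The real implementations live in pre_process.py; this pure-Python version, with caller-supplied bounds, only illustrates the idea:

```python
import math

def radial_distance_mask(points, center, max_dist):
    """Keep points within max_dist of center (the radial distance filter)."""
    return [p for p in points if math.dist(p, center) <= max_dist]

def box_filter(points, lo, hi):
    """Keep points inside the axis-aligned box [lo, hi]
    (analogous to the staging-area box filter)."""
    return [p for p in points
            if all(lo[i] <= p[i] <= hi[i] for i in range(3))]
```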



You should be able to automate the point cloud generation and undistortion process using the following pseudocode as a template.


data_directory = '/path/to/folder'
scene_info = [
    # ('Scene Name', Session ID) pairs, e.g. ('athlete_row', 3)
    ...
]
for scene_name, session_id in scene_info:
    pc_generator = Generator(
        datadir=data_directory,
        settings={
            "undistort": True,  # For undistorting images
            "pcd": {            # For point cloud generation
                "sparse": True,
                "dense": False,
                "perframe": False,  # Generate for every frame
                "initial": True,    # Generate for frame #1 only
                "max_depth": -1.,   # Use -1 if you do not want to apply this filter
                "box_filter": False
            }
        },
        session=session_id
    )
    # Run the point cloud generator
    pc_generator.run()

    # Generate a video of the point cloud representation
    pc_generator.generate_novel_views()
            


Mask Generation (mask_gen.py)

This interface works with SAM2. You can install the additional dependencies using the following code. Please visit the official SAM2 repository for information or help on SAM2-related issues. After downloading the checkpoints and configuration files, you will be ready to generate masks.


# Make the extended utilities folder
mkdir utils_ext
cd utils_ext

# Clone SAM2 
git clone https://github.com/facebookresearch/sam2.git
cd sam2

# Install the SAM2 environment (for more information see the SAM2 repository)
pip install -e .

# Download SAM2 checkpoints and configuration files
cd ../../
mkdir checkpoints && mkdir configs

cd checkpoints
# Download a checkpoint, e.g. 'sam2.1_hiera_large.pt' (this checkpoint
# works comfortably with an RTX 3090 GPU), and copy it into this folder

cd ../configs
mkdir sam2.1 && cd sam2.1
# Download the matching config, e.g. 'sam2.1_hiera_l.yaml'

# Go to main code (top-level) & run mask_gen.py
cd ../../
python mask_gen.py
            

The video below shows the steps involved in applying the mask to each image and frame. Input the target folder, then select the camera you want to generate masks for. You can also select a specific frame to mask. We suggest starting with frame 0 and refining the mask if the propagated SAM2 mask produces incorrect results.

1. When you are ready, initialize SAM2.

2. Once the model has loaded, click on the image where you want to add positive prompts.

3. Then hit Set Masks.
a. Once the masks have been generated, you can add more positive prompts to refine the mask.
b. You can also click Set Pos/Neg to switch to using negative prompts.
c. You can also click Reset Prompt to reset the SAM2 prompts.

4. Once you are happy, hit Propagate and Save. When this is done you can move on to other cameras in the GUI.





Select the rotation_correction.json based on the Session ID

If you do not know the Session ID for your scene, you can find it in the catalogue next to your [Scene Name].


## Copy and paste one of the following into "[Scene Name]/rotation_correction.json" ##
# For Session 1:
{
	"000809414712":-1,
	"000875114712":1,
	"000906614712":1,
	"000950714712":-1,
	"000236320812":1,
	"000404613112":1,
	"000409113112":1,
	"000454921912":-1,
	"000469213112":-1,
	"000558313112":-1,
	"000582921912":-1,
	"000594313112":-1,
	"000639313112":1,
	"000951614712":1
}
# For Session 2:
{
    "000809414712":-1,
    "000875114712":1,
    "000906614712":1,
    "000950714712":-1,
    "000236320812":1,
    "000404613112":1,
    "000409113112":1,
    "000454921912":-1,
    "000469213112":-1,
    "000558313112":-1,
    "000582921912":-1,
    "000639313112":1,
    "000951614712":1,
    "000594313112":-1
}

# For Session 3:
{
    "000497113112":1,
    "000499613112":-1,
    "000511713112":1,
    "000516213112":1,
    "000236320812":1,
    "000404613112":1,
    "000409113112":1,
    "000454921912":-1,
    "000469213112":-1,
    "000558313112":-1,
    "000582921912":-1,
    "000639313112":1,
    "000951614712":1,
    "000594313112":-1
}
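If you are processing many scenes, the copy-and-paste step can be scripted. The helper below is our own sketch; it writes the session-1 dict copied verbatim from the listing above (add sessions 2 and 3 to `ROTATION_CORRECTIONS` the same way):

```python
import json
from pathlib import Path

# Per-session camera rotation corrections, copied from the listings above
# (session 1 shown; sessions 2 and 3 follow the same pattern).
ROTATION_CORRECTIONS = {
    1: {
        "000809414712": -1, "000875114712": 1, "000906614712": 1,
        "000950714712": -1, "000236320812": 1, "000404613112": 1,
        "000409113112": 1, "000454921912": -1, "000469213112": -1,
        "000558313112": -1, "000582921912": -1, "000594313112": -1,
        "000639313112": 1, "000951614712": 1,
    },
}

def write_rotation_correction(scene_dir, session_id):
    """Write the correct dict to [Scene Name]/rotation_correction.json."""
    out = Path(scene_dir) / "rotation_correction.json"
    out.write_text(json.dumps(ROTATION_CORRECTIONS[session_id], indent=4))
    return out
```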