Exploring the Impact of Rendering Method and Motion Quality on Model Performance When Using Multi-View Synthetic Data for Action Recognition

Stanislav Panev*1      Emily Kim*1      Sai Abhishek Si Namburu1      Desislava Nikolova2
Celso de Melo3      Fernando de la Torre1      Jessica Hodgins1
1Carnegie Mellon University      2Technical University of Sofia      3Army Research Lab
*Equal Contribution

WACV 2024

REMAG is a human action recognition (HAR) dataset suite comprising five datasets: one real and four synthetic, created by combining two rendering methods (CG and neural) with two motion sources (motion capture and video-based). Each dataset includes three camera views.


This paper explores the use of synthetic data for a human action recognition (HAR) task to avoid the challenges of collecting and labeling real-world datasets. We introduce a new dataset suite comprising five datasets, eleven common human activities, three synchronized camera views (aerial and ground) in three outdoor environments, and three visual domains (one real and two synthetic). For the synthetic data, two rendering methods (standard computer graphics and neural rendering) and two sources of human motion (motion capture and video-based motion reconstruction) were employed. We evaluated each dataset type by training popular activity recognition models and comparing their performance on real test data. Our results show that models trained on synthetic data achieve slightly lower accuracy (by 4-8%) than models trained on real data. On the other hand, a model pre-trained on synthetic data and fine-tuned on limited real data surpasses models trained on either domain alone. Data rendered with standard computer graphics (CG) delivers better performance than data generated with the neural rendering method. The results also suggest that the quality of the human motions in the training data affects test performance: motion capture delivers higher test accuracy than video-based motion reconstruction. Additionally, a model trained on CG aerial-view synthetic data exhibits greater robustness to camera viewpoint changes than one trained on real data.


Human activity classes presented in our dataset suite.



The download links lead to ZIP files.

Real Data

11 activities | 24 subjects | 1.5k sequences | 1.8M frames

Synthetic Data

Human Character Motion Sources
- Motion capture data (VICON): 11 activities | 26 subjects
- Video-based motions (VIBE): 5 activities (gestures only) | 15 subjects

Rendering Method: Computer Graphics

Motion capture motions: 25.4k sequences | 31.4M frames

Aerial View (152.93GB)
MD5: a7eec0f4576242d188e74ee10ebc877e
Google Drive | Alternative Hosting

Ground View (418.78GB)
MD5: ca2955cb44a3ac3c074e53406745f41c
Google Drive | Alternative Hosting

Video-based motions: 6.1k sequences | 5.0M frames

Aerial View (24.43GB)
MD5: f5657e247de7b74a83ab4df79e1b33e5
Google Drive | Alternative Hosting

Ground View (67.15GB)
MD5: 1cd58dc521c532eaf7a994bffdef62ff
Google Drive | Alternative Hosting

Rendering Method: Neural Rendering + Computer Graphics (Liquid Warping GAN + Blender)

Motion capture motions: 25.4k sequences | 31.2M frames

Aerial View (91.02GB)
MD5: 3a8ae0c7a539a4e52ff3e112fe9f9af9
Google Drive | Alternative Hosting

Ground View (213.67GB)
MD5: 68a3cdb8019b08fff3cb7d2b9bc05576
Google Drive | Alternative Hosting

Video-based motions: 6.1k sequences | 5.0M frames

Aerial View (14.52GB)
MD5: f04b912914478515bdbebec84b27b901
Google Drive | Alternative Hosting

Ground View (34.86GB)
MD5: c00dfd92321415b42704bb2e4b6d9cbe
Google Drive | Alternative Hosting
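After downloading, the archives can be checked against the MD5 checksums listed above to guard against corrupted or incomplete transfers. Below is a minimal sketch in Python; the local filename and the checksum chosen for comparison are placeholders for whichever archive you downloaded:

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading in 1 MiB chunks
    so that multi-hundred-GB archives do not need to fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


# Hypothetical local filename; compare against the matching checksum above,
# e.g. the CG / motion-capture aerial-view archive:
# expected = "a7eec0f4576242d188e74ee10ebc877e"
# assert md5_of_file("aerial_view.zip") == expected
```

The same check can also be done on the command line with `md5sum <file>` (Linux) or `md5 <file>` (macOS).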


The datasets are released under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.


@InProceedings{Panev_2024_WACV,
	author    = {Panev, Stanislav and Kim, Emily and Namburu, Sai Abhishek Si and Nikolova, Desislava and de Melo, Celso and De la Torre, Fernando and Hodgins, Jessica},
	title     = {Exploring the Impact of Rendering Method and Motion Quality on Model Performance When Using Multi-View Synthetic Data for Action Recognition},
	booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
	month     = {January},
	year      = {2024},
	pages     = {4592-4602}
}