3D Multimedia Analytics, Search and Generation




In Conjunction with ICME 2025

June 30 - July 4, Nantes, France

News!

  • Jan 10, 2025:   The website is now live. The call for papers is available.





Overview


   Today, ubiquitous multimedia sensors and large-scale computing infrastructures are producing 3D multi-modality data at a rapid velocity, such as 3D point clouds acquired with LiDAR sensors, RGB-D videos recorded by Kinect cameras, meshes of varying topology, and volumetric data. 3D multimedia combines different content forms such as text, audio, images, and video with 3D information, enabling better perception of the world, since the real world is three-dimensional rather than two-dimensional. For example, a robot can manipulate objects successfully by recognizing them in RGB frames and perceiving their size from point clouds. Researchers have strived to push the limits of 3D multimedia search and generation in various applications, such as autonomous driving, robotic visual navigation, smart industrial manufacturing, logistics distribution, and logistics picking. 3D multimedia (e.g., videos and point clouds) can also help agents grasp, move, and place packages automatically in logistics picking systems. Therefore, 3D multimedia analytics is one of the fundamental problems in multimedia understanding.
   Different from 3D vision, 3D multimedia analytics mainly concentrates on fusing 3D content with other media. It is a very challenging problem that involves multiple tasks, such as human 3D mesh recovery and analysis, 3D shape and scene generation from real-world data, 3D virtual talking heads, 3D multimedia classification and retrieval, 3D semantic segmentation, 3D object detection and tracking, 3D multimedia scene understanding, and so on.
   The purpose of this workshop is therefore to: 1) bring together state-of-the-art research on 3D multimedia analysis; 2) call for a coordinated effort to understand the opportunities and challenges emerging in 3D multimedia analysis; 3) identify key tasks and evaluate the state-of-the-art methods; 4) showcase innovative methodologies and ideas; 5) introduce interesting real-world 3D multimedia analysis systems or applications; and 6) propose new real-world or simulated datasets and discuss future directions. We solicit original contributions in all fields of 3D multimedia analysis that exploit multi-modality data to build strong 3D representations. We believe this workshop will offer a timely collection of research updates to benefit researchers and practitioners in the broad multimedia communities.




Call for papers

   We invite submissions to the ICME 2025 Workshop on 3D Multimedia Analytics, Search and Generation (3DMM2025), which brings researchers together to discuss robust, interpretable, and responsible technologies for 3D multimedia analysis. We solicit original research and survey papers of no more than 6 pages (including all text, figures, and references). Each submitted paper will be peer-reviewed by at least three reviewers. All accepted papers will be presented as either oral or poster presentations, and a best paper award will be given. Papers that violate anonymity or do not use the ICME submission template will be rejected without review. By submitting a manuscript to this workshop, the authors acknowledge that no paper substantially similar in content has been submitted to another workshop or conference during the review period. Authors should prepare their manuscripts according to the Guide for Authors of ICME. For detailed instructions, see here. The submission address is here.
  The scope of this workshop includes, but is not limited to, the following topics:

  • Generative Models for 3D Multimedia and 3D Multimedia Synthesis
  • Generating 3D Multimedia from Real-world Data
  • 3D Multimodal Analysis and Description
  • Multimedia Virtual/Augmented Reality
  • 3D Multimedia Systems
  • 3D Multimedia Search and Recommendation
  • Mobile 3D Multimedia
  • 3D Shape Estimation and Reconstruction
  • 3D Scene and Object Understanding
  • High-level Representation of 3D Multimedia Data
  • 3D Multimedia Application in Industry

  Fast Review for Rejected Regular Submissions of ICME 2025
  We have set up a Fast Review mechanism for regular submissions rejected by the ICME main conference, and we strongly encourage authors of rejected papers to submit them to this workshop. To submit through Fast Review, authors must write a one-page cover letter clarifying how the paper has been revised and attach all previous reviews. All papers submitted through Fast Review will be reviewed directly by meta-reviewers, who will make the decisions.




Invited speakers



Prof. Yi Jin
Beijing Jiaotong University, China
Title: Research on Key Techniques of Task-Oriented Point Cloud Sampling Based on Deep Learning
Abstract: With the rapid advancement of 3D perception technologies and the exponential growth of data scales, sampling techniques have become fundamental preprocessing operations in computer vision. As a critical component of 3D data processing, point cloud sampling not only serves the dual purposes of dimensionality reduction and quality enhancement, but, more importantly, determines the performance of subsequent perception tasks. This presentation systematically elaborates on recent advances in integrating task-oriented sampling with deep learning. Following the proposed "point cloud representation - adaptive sampling mechanism - task performance optimization" framework, this work establishes a bio-inspired task-oriented sampling theory: by developing multi-level deep network architectures, it overcomes key technical bottlenecks in inter-point relationship modeling, local feature extraction, regional similarity discrimination, and cross-region correlation mining, achieving remarkable improvements in both sampling quality and task generalization. Finally, the discussion will focus on how multimodal fusion technologies are reshaping the design paradigms and industrial landscapes of next-generation intelligent systems.
Biography: Yi Jin is a Professor and currently serves as Assistant Dean at the School of Computer and Information Technology, Beijing Jiaotong University. She received her Ph.D. from Beijing Jiaotong University, China, in 2010. Her research interests include semantic understanding of traffic video, image processing, and computer vision. She has been honored with multiple prestigious awards, including a nomination for the IEEE Computer Society Annual Best Paper Award (2022), a Silver Medal at the 50th International Exhibition of Inventions Geneva (2025), a Gold Medal of the Macao International Innovation and Invention Expo (2024), the Second Prize of the Innovation Achievement Award from the China Industry-University-Research Institute Collaboration Association (2023), and the Second Prize of the Science and Technology Award from the China Institute of Communications (2024). She has published over 70 high-quality papers in top-tier journals and conferences, including IEEE TIFS, IEEE TMM, AAAI, and ACM MM; among these, five are ESI Highly Cited Papers. She has served as editor-in-chief or co-editor of three academic books and textbooks, and contributed to three national and industry standards. She has led or participated in over 20 research projects at the national and ministerial levels, while securing 1 international patent and 35 Chinese invention patents. She serves on the editorial boards of two prominent SCI-indexed journals, the Journal of The Franklin Institute and the Journal of Electronic Imaging, and also contributes as a guest editor for several leading international and domestic journals in her field.




Dr. Wenbo Hu
Tencent ARC Lab, China
Title: GenConstruction: The Mutual Benefit between Content Generation and Reconstruction
Abstract: Video generation models have demonstrated remarkable content generation capabilities and hold great potential as world simulators. However, because they fundamentally model the world in a 2D space, challenges persist in ensuring 3D rationality and consistency. Meanwhile, 3D foundation models have made significant progress, yet recovering accurate and stable 3D information from 2D observations in open-world scenarios remains a formidable task. This work finds that content generation and reconstruction can mutually benefit each other, enabling an upward spiral of development. Specifically, leveraging the capabilities of generation models can significantly enhance the accuracy and stability of 3D reconstruction for 3D foundation models. Conversely, the structural information provided by 3D foundation models can greatly improve the 3D consistency and controllability of video generation models. In this talk, we will share a series of our exploratory works in this direction and discuss the evolving paths of generation and reconstruction.
Biography: Wenbo Hu is currently a senior researcher at Tencent ARC Lab, where he leads an effort on generative world models, including 3D from images/videos, novel view synthesis, and video generation. He obtained his Ph.D. degree in Computer Science and Engineering from The Chinese University of Hong Kong (CUHK) in 2022. His work was selected as a Best Paper Finalist at ICCV 2023. He received the CCF Elite Collegiate Award in 2017. He has served as a reviewer for top-tier conferences and journals, including SIGGRAPH, SIGGRAPH Asia, CVPR, ICCV, ECCV, NeurIPS, ICML, EG, TVCG, and IJCV.




Prof. Weidong Geng
Zhejiang University, China
Title: Multi-modal Modelling of Body Language for Digital Human
Abstract: HumanAIGC is a cross-disciplinary research area spanning digital humans and AIGC. This talk will present how to model body language to drive digital humans with correlated visual-audio modalities, especially how to choreograph, stylize, and personalize gestures and postures for digital humans from textual descriptions, speech, and music. The application of AIGC-based digital humans to TV-program production will also be discussed.
Biography: Weidong Geng is a Professor at the College of Computer Science & Technology, Zhejiang University, PR China. From 1995 to 2000, he was at Zhejiang University, where his research interests were in CAD/CG and intelligent systems. He joined the Fraunhofer Institute for Media Communication (formerly GMD.IMK), Germany, as a research scientist in 2000. In 2002, he worked at the Multimedia Innovation Center, The Hong Kong Polytechnic University, Hong Kong. Since 2003, he has been working in the State Key Laboratory of CAD&CG, Zhejiang University, and his current research focuses on computer-aided design, digital humans, perceptual user interfaces, interactive media, and AIGC.



Organizers

Peng Dai
Noah’s Ark Lab, Canada
Shan An
JD Health, China
Kun Liu
Explore Academy of JD.com, China
Xuri Ge
University of Glasgow, UK
Guoxin Wang
Zhejiang University, China
Wu Liu
University of Science and Technology of China, China
Antonios Gasteratos
Democritus University of Thrace, Greece



Workshop Agenda

Time Description
9:30-9:40 Opening
9:40-10:05 Keynote 1: Research on Key Techniques of Task-Oriented Point Cloud Sampling Based on Deep Learning
10:05-10:30 Keynote 2: GenConstruction: The Mutual Benefit between Content Generation and Reconstruction
10:30-10:55 Keynote 3: Multi-modal Modelling of Body Language for Digital Human
10:55-12:15 Oral Presentations (8 papers, ~10 min each)
12:15-12:20 Best Paper Award Announcement, Discussion, and Closing



Accepted Papers

Oral Order Time Paper Title
1 10:55-11:05 Keypoint Ensemble For Image Matching
2 11:05-11:15 MF-Adapter: Better 3D Foundation Model with Multimodal Fusion Adapter
3 11:15-11:25 DHGS: Decoupled Hybrid Gaussian Splatting for Driving Scene
4 11:25-11:35 Optimizing Cooperative Multi-Object Tracking using Graph Signal Processing
5 11:35-11:45 Guided Model-based LiDAR Super-Resolution for Resource-Efficient Automotive Scene Segmentation
6 11:45-11:55 LIVE-FIT: LED-based Immersive Virtual Environment with Fusion, Interaction, and Transmission
7 11:55-12:05 Benchmarking Learnable Mesh and Texture Representations for Immersive Digital Twins
8 12:05-12:15 MVLLaVA: An Intelligent Agent for Unified and Flexible Novel View Synthesis


Previous Workshops on 3DMM: 3DMM-ICME2022, 3DMM-ICME2023, 3DMM-ICME2024

If you have any questions, feel free to contact peng [DOT] dai [DOT] ca [AT] ieee.org