Research

3D Creation Systems

3D

FlexGen: Flexible Multi-View Generation from Text and Image Inputs

Couples text and image conditioning with view-consistent diffusion so creative teams can preview avatars before capture.

3D

Sat2City: 3D City Generation from a Single Satellite Image

Cascaded latent diffusion lifts a single satellite image into immersive street-level 3D, enabling rapid venue scouting.

3D

PRM: Photometric Stereo Based Large Reconstruction Model

Scales photometric stereo with transformer priors so holographic doubles retain sub-millimeter surface detail.

3D

Ref-NeuS: Ambiguity-Reduced Neural Implicit Surface Learning

Uses reflection anomaly detection and a reflection-aware photometric loss to reconstruct shiny surfaces with far less geometric ambiguity.
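
As a rough illustration of the loss named above (a minimal sketch, not the authors' code; `reflection_score` is a hypothetical stand-in for the paper's anomaly-detection output):

```python
import torch

def reflection_aware_photometric_loss(pred_rgb, gt_rgb, reflection_score):
    """Down-weight the photometric term where reflections are likely.

    pred_rgb, gt_rgb:  (N, 3) rendered and ground-truth colors per ray.
    reflection_score:  (N,) in [0, 1]; higher = more likely specular
                       (hypothetical stand-in for the anomaly detector).
    """
    per_ray = (pred_rgb - gt_rgb).abs().mean(dim=-1)  # L1 color error per ray
    weight = 1.0 - reflection_score                   # trust diffuse pixels more
    return (weight * per_ray).sum() / weight.sum().clamp(min=1e-6)
```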

3D

LucidFusion: Reconstructing 3D Gaussians with Arbitrary Unposed Images

Introduces Relative Coordinate Gaussians that align unposed multi-view inputs and rasterize them into consistent 3D geometry.
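
A minimal sketch of the core idea (illustrative names, not the paper's API): predict every pixel's 3D position in one shared reference-camera frame, so Gaussians from different unposed views land in a common space without estimated poses.

```python
import torch

def pixels_to_relative_gaussians(coord_maps, colors):
    """coord_maps: (V, H, W, 3) per-pixel 3D points, all expressed in a single
    reference-camera frame; colors: (V, H, W, 3) matching RGB values."""
    centers = coord_maps.reshape(-1, 3)               # each pixel -> one Gaussian center
    rgb = colors.reshape(-1, 3)
    scales = torch.full((centers.shape[0], 3), 0.01)  # placeholder isotropic scale
    return centers, rgb, scales                       # feed to a Gaussian rasterizer
```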

3D

Kiss3DGen: Repurposing Image Diffusion Models for 3D Asset Generation

Transfers image diffusion priors into 3D pipelines so stage assets keep a brand-consistent style.

3D

FlexPainter: Flexible and Multi-View Consistent Texture Generation

Introduces shared conditional embeddings plus view-synchronized diffusion to paint seamless textures from any prompt mix.

3D

Advancing High-Fidelity 3D & Texture Generation with 2.5D Latents

Unifies multi-view RGB, normals, and coordinates into 2.5D latents so a lightweight refiner can output coherent 3D.
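
A hedged sketch of what such a per-view representation might look like (shapes and packing order are assumptions, not the paper's layout):

```python
import torch

def pack_25d(rgb, normals, coords):
    """Stack per-view RGB, surface normals, and 3D coordinate maps channel-wise
    so a single 2D network sees all three. Inputs: (V, H, W, 3) each."""
    x = torch.cat([rgb, normals, coords], dim=-1)  # (V, H, W, 9)
    return x.permute(0, 3, 1, 2)                   # (V, 9, H, W), channels-first
```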

3D

DiMeR: Disentangled Mesh Reconstruction Model

Separates structure and detail with diffusion-guided priors to recover animatable meshes from sparse captures.

3D

LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching

Uses interval score matching to boost text-to-3D fidelity, making it ideal for rapid companion prototyping.
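
A hedged sketch of how the update differs from standard score distillation (illustrative, not the authors' code): instead of matching the denoiser's prediction against random Gaussian noise, interval score matching compares predictions at two timesteps reached by deterministic DDIM-style inversion of the rendering, giving a lower-variance gradient.

```python
import torch

def ism_loss(eps_pred_t, eps_pred_s, render, weight_t):
    """eps_pred_t: denoiser output at timestep t on the noised rendering x_t.
    eps_pred_s: denoiser output at an earlier step s reached by DDIM-style
    inversion (in plain SDS this slot would hold random noise instead).
    render: the differentiable rendering g(theta); weight_t: timestep weight."""
    residual = (eps_pred_t - eps_pred_s).detach()  # stop-grad through the denoiser
    return weight_t * (residual * render).sum()    # backprop updates the 3D params
```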

AI Agent Systems

Agent

GaussianProperty: Integrating Physical Properties to 3D Gaussians with LMMs

Pairs large multimodal models with Gaussian controllers so digital actors obey real-world lighting, inertia, and contact rules.

Agent

ComfyMind: Toward General-Purpose Generation via Tree-Based Planning

Builds Semantic Workflow Interfaces plus tree-based planning with localized feedback so ComfyUI agents can execute complex, multi-modal jobs reliably.
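
A minimal sketch of tree-based planning with localized feedback (structure inferred from the summary above, not ComfyMind's actual interfaces): a failing step is retried in place rather than forcing a full replan.

```python
from dataclasses import dataclass, field

@dataclass
class PlanNode:
    step: str                                  # a semantic workflow step
    children: list = field(default_factory=list)

def execute(step: str) -> bool:
    """Placeholder; a real system would dispatch to a ComfyUI workflow."""
    return True

def run_plan(node: PlanNode, max_retries: int = 2) -> bool:
    for _ in range(max_retries + 1):           # localized feedback: retry this node
        if execute(node.step):                 # only, keeping the rest of the tree
            break
    else:
        return False
    return all(run_plan(child, max_retries) for child in node.children)
```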

Agent

Long-Video Audio Synthesis with Multi-Agent Collaboration

LVAS-Agent splits dubbing into scene segmentation, script writing, sound design, and synthesis with discussion-correction and retrieval loops.
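
A hedged orchestration sketch of that four-role split (function names follow the summary; bodies are placeholders, not the paper's implementation):

```python
def segment_scenes(video):             # placeholder: scene-segmentation agent
    return ["scene_1", "scene_2"]

def write_script(scene):               # placeholder: script-writing agent
    return f"script({scene})"

def design_sound(script, notes=None):  # placeholder: sound-design agent
    return f"plan({script}, {notes})"

def critique(plan):                    # placeholder: discussion-correction check
    return None                        # None = no objections raised

def synthesize(plan):                  # placeholder: synthesis agent
    return f"audio({plan})"

def dub_long_video(video, max_rounds=2):
    tracks = []
    for scene in segment_scenes(video):
        script = write_script(scene)
        plan = design_sound(script)
        for _ in range(max_rounds):    # short discussion-correction loop
            notes = critique(plan)
            if notes is None:
                break
            plan = design_sound(script, notes)
        tracks.append(synthesize(plan))
    return tracks
```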

Agent

PreGenie: Agentic Framework for High-Quality Presentation Generation

A two-stage Slidev pipeline in which analysis-and-generation and review-and-regeneration agents iterate over Markdown and rendered slides to produce human-quality decks.

Agent

PresentCoach: Dual-Agent Presentation Coaching via Exemplars

An Ideal Presentation Agent auto-produces exemplar videos, while Coach and Audience Agents deliver multimodal OIS feedback to speakers.
