FlexGen: Flexible Multi-View Generation from Text and Image Inputs
Couples text-and-image conditioning with view-consistent diffusion so creative teams preview avatars before capture.
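A minimal sketch of one way such joint conditioning could be wired, assuming hypothetical projection layers (`JointCondition`, `text_proj`, `img_proj`) and CLIP-like token shapes; FlexGen's actual fusion mechanism may differ:

```python
import torch
import torch.nn as nn

class JointCondition(nn.Module):
    """Project text and image embeddings into one token sequence
    that a view-consistent diffusion UNet can cross-attend to."""
    def __init__(self, text_dim=768, img_dim=1024, cond_dim=768):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, cond_dim)
        self.img_proj = nn.Linear(img_dim, cond_dim)

    def forward(self, text_tokens, image_tokens):
        # text_tokens: (B, T, text_dim); image_tokens: (B, I, img_dim)
        fused = torch.cat(
            [self.text_proj(text_tokens), self.img_proj(image_tokens)], dim=1
        )
        return fused  # (B, T+I, cond_dim) cross-attention context

cond = JointCondition()(torch.randn(2, 77, 768), torch.randn(2, 257, 1024))
print(cond.shape)  # torch.Size([2, 334, 768])
```

Concatenating the projected token sequences lets one cross-attention layer serve prompts that are text-only, image-only, or mixed.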
Cascaded latent diffusion lifts orbital imagery into immersive street-level scenes, enabling rapid venue scouting.
Scales photometric stereo with transformer priors so holographic doubles retain sub-millimeter surface detail.
Uses reflection anomaly detection and reflection-aware photometric loss to reconstruct shiny surfaces without ambiguity.
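One plausible reading of a reflection-aware photometric loss, sketched with a hypothetical MAD-based anomaly mask; the paper's actual detector and weighting are not specified here:

```python
import numpy as np

def reflection_aware_photometric_loss(rendered, observed, k=2.0):
    """Downweight pixels whose residual is anomalously large, treating
    them as specular reflections rather than geometry errors."""
    residual = np.abs(rendered - observed).mean(axis=-1)  # per-pixel error
    # Flag anomalies: residuals more than k robust std devs above the median.
    med = np.median(residual)
    mad = np.median(np.abs(residual - med)) + 1e-8
    weight = (residual < med + k * 1.4826 * mad).astype(np.float32)
    return (weight * residual).sum() / (weight.sum() + 1e-8)

loss = reflection_aware_photometric_loss(
    np.random.rand(64, 64, 3), np.random.rand(64, 64, 3)
)
```

Masking the anomalous residuals keeps specular highlights from being explained away as false geometry.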
Introduces Relative Coordinate Gaussians that align unposed multi-view inputs and rasterize them into consistent 3D geometry.
Transfers image diffusion priors into 3D pipelines so stage assets keep brand-consistent style.
Introduces shared conditional embeddings plus view-synchronized diffusion to paint seamless textures from any prompt mix.
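A toy sketch of the view-synchronization idea, assuming a hypothetical `synchronize_views` step run after each denoising iteration; real pipelines rasterize through the mesh's UV map rather than this random index map:

```python
import numpy as np

def synchronize_views(view_latents, uv_index, tex_size):
    """After each denoising step, scatter per-view latents into a shared
    UV texture, average overlapping texels, then gather back so every
    view sees the same texture state."""
    C = view_latents.shape[-1]
    tex_sum = np.zeros((tex_size, C))
    tex_cnt = np.zeros((tex_size, 1))
    for lat, idx in zip(view_latents, uv_index):  # idx: pixel -> texel id
        np.add.at(tex_sum, idx, lat)
        np.add.at(tex_cnt, idx, 1.0)
    texture = tex_sum / np.maximum(tex_cnt, 1.0)
    return np.stack([texture[idx] for idx in uv_index]), texture

# Two 16-pixel views rasterized onto an 8-texel toy texture.
views = np.random.randn(2, 16, 4)
uvs = np.random.randint(0, 8, size=(2, 16))
synced_views, shared_tex = synchronize_views(views, uvs)
```

Because overlapping views are reconciled at every step rather than after sampling, seams never have a chance to form.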
Unifies multiview RGB, normals, and coordinates into 2.5D latents so a lightweight refiner can output coherent 3D.
Separates structure and detail with diffusion-guided priors to recover animatable meshes from sparse captures.
Uses interval score matching to boost text-to-3D fidelity, ideal for rapid companion prototyping.
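A simplified sketch of an interval-score-matching-style objective (as in LucidDreamer), using a stub noise predictor, one-step DDIM inversion, and no timestep weighting; the actual method inverts over several steps:

```python
import torch

def ism_loss(x0, eps_model, alphas_bar, t, s, cond):
    """Interval score matching: replace SDS's (eps_pred - eps) residual
    with the difference between noise predictions at two timesteps s < t
    along a DDIM-inverted trajectory, which reduces gradient variance."""
    a_s, a_t = alphas_bar[s], alphas_bar[t]
    # One-step DDIM inversion from x0 toward timestep s.
    eps0 = eps_model(x0, 0, None)
    x_s = a_s.sqrt() * x0 + (1 - a_s).sqrt() * eps0
    # Unconditional score at (x_s, s), then push on to x_t.
    eps_s = eps_model(x_s, s, None)
    x0_hat = (x_s - (1 - a_s).sqrt() * eps_s) / a_s.sqrt()
    x_t = a_t.sqrt() * x0_hat + (1 - a_t).sqrt() * eps_s
    # Interval residual: conditional score at t vs. inversion score at s.
    grad = eps_model(x_t, t, cond) - eps_s
    return (grad.detach() * x0).sum()  # gradient flows through x0 only

# Stub epsilon model standing in for a pretrained diffusion UNet.
eps_model = lambda x, t, c: torch.tanh(x)
alphas_bar = torch.linspace(0.999, 0.01, 1000)
x0 = torch.randn(1, 4, 32, 32, requires_grad=True)
loss = ism_loss(x0, eps_model, alphas_bar, t=600, s=200, cond="prompt")
loss.backward()
```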
Pairs large multimodal models with Gaussian controllers so digital actors obey real-world lighting, inertia, and contact rules.
Builds Semantic Workflow Interfaces plus search-tree planning with localized feedback so ComfyUI agents can execute complex, multi-modal jobs reliably.
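A generic best-first search skeleton illustrating search-tree planning with localized feedback; the function names and the toy scorer are hypothetical, not the paper's API:

```python
import heapq

def plan_workflow(start, expand, score, is_complete, budget=100):
    """Best-first search over partial workflows. `score` returns localized
    feedback for the most recently added node, so bad branches are pruned
    early instead of after a full execution."""
    frontier = [(0.0, 0, start)]  # (cost, tiebreak, partial workflow)
    tiebreak = 1
    for _ in range(budget):
        if not frontier:
            return None
        cost, _, node = heapq.heappop(frontier)
        if is_complete(node):
            return node
        for child in expand(node):
            heapq.heappush(frontier, (cost + score(child), tiebreak, child))
            tiebreak += 1
    return None

# Toy example: assemble the pipeline ["load", "denoise", "upscale", "save"].
target = ["load", "denoise", "upscale", "save"]
plan = plan_workflow(
    start=[],
    expand=lambda wf: [wf + [op] for op in target],
    score=lambda wf: 0.0 if wf == target[: len(wf)] else 1.0,
    is_complete=lambda wf: wf == target,
)
print(plan)
```

Localized feedback is what makes the tree tractable: each node is scored as it is added, so the agent never pays for executing a whole broken workflow.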
LVAS-Agent splits dubbing into scene segmentation, script writing, sound design, and synthesis with discussion-correction and retrieval loops.
Two-stage Slidev pipeline where analysis/generation and review/regeneration agents iterate over Markdown + renders for human-level decks.
Ideal Presentation Agent auto-produces exemplar videos, while Coach + Audience Agents deliver multimodal OIS feedback to speakers.