Efficient VLA Models
Academic Homepage
Master's Student, Fudan University
I work on efficient multimodal generative models for autonomous driving, with a focus on Vision-Language-Action models, diffusion/flow-based planning, and AR-to-diffusion distillation.
About Me
I am a master's student at Fudan University, working on multimodal generative models and autonomous driving. My research focuses on efficient Vision-Language-Action models, especially diffusion- and flow-based methods for motion planning.
Recently, I have been exploring how to convert autoregressive VLAs into parallel generative models through block diffusion, hierarchical distillation, and reinforcement learning. I am broadly interested in efficient multimodal reasoning, embodied intelligence, and safety-critical decision making.
Research Interests
My current work centers on efficient VLA planning systems, generative trajectory modeling, and reliable decision making for autonomous driving.
Parallel generation, block diffusion, discrete diffusion, and efficient decoding for Vision-Language-Action models.
Trajectory-as-language, closed-loop planning, NAVSIM / Bench2Drive evaluation, and safety-critical decision making.
Progressive block-wise adaptation, block-wise distillation, and cross-scale distillation from autoregressive VLAs to diffusion models.
GRPO, PDMS reward optimization, simulator-guided feedback, and safe trajectory alignment.
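Of the interests above, GRPO has a compact core. The rewards for a group of trajectories sampled for the same scene are normalized within that group, so no learned value function is needed. The minimal sketch below shows only this advantage step, assuming each trajectory already has a scalar score (for example, a PDMS-style closed-loop score from a simulator). The function name `group_relative_advantages` and the example scores are hypothetical. It also uses the population standard deviation, whereas some GRPO implementations use the sample standard deviation.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: normalize rewards within one sampled group.

    `rewards` holds one scalar score per trajectory sampled for the same scene
    (assumed to be a PDMS-like score computed elsewhere; not computed here).
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    # eps keeps the division finite when every score in the group is identical
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Hypothetical scores for four trajectories sampled for one scene.
    scores = [0.9, 0.7, 0.7, 0.5]
    print([round(a, 3) for a in group_relative_advantages(scores)])
```

Trajectories that score above the group mean receive positive advantages and are reinforced, while those below the mean are pushed down. A group of identical scores yields zero advantage, and therefore no learning signal.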
Selected Publications
A discrete flow matching framework for parallel coarse-to-fine motion planning in autonomous driving, enabling efficient bidirectional trajectory refinement with simulator-guided alignment.
A masked diffusion Vision-Language-Action framework for autonomous driving, integrating non-causal trajectory decoding, MoE scaling, and online reinforcement learning for closed-loop planning.
A research direction on transforming pretrained autoregressive VLAs into efficient block diffusion models through progressive block-wise adaptation, block-wise distillation, and cross-scale model distillation.
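Block diffusion decoding can be illustrated with a generic sketch. Blocks of tokens are produced left to right, as in an autoregressive model, while the tokens inside each block start masked and are filled in parallel over a few denoising steps. At each step, the most confident predictions are committed first (confidence-based unmasking in the spirit of MaskGIT, here with a simple linear schedule). This is an illustration under stated assumptions, not the method of these papers. `predict` is a hypothetical stand-in for a VLA denoiser, and the toy predictor, block size, and step count are made up.

```python
MASK = -1  # placeholder id for a not-yet-decoded slot


def decode_block_diffusion(predict, seq_len, block_size, steps_per_block):
    """Decode blocks left to right; unmask tokens within a block in parallel.

    `predict(tokens, positions)` (hypothetical) returns {pos: (token, confidence)}
    for the requested masked positions, conditioned on all committed tokens.
    """
    tokens = [MASK] * seq_len
    for start in range(0, seq_len, block_size):
        block = range(start, min(start + block_size, seq_len))
        for step in range(steps_per_block):
            masked = [p for p in block if tokens[p] == MASK]
            if not masked:
                break
            # One parallel denoising call for every masked slot in this block,
            # conditioned on all tokens committed so far (earlier blocks included).
            preds = predict(tokens, masked)
            # Commit an even share of the remaining slots per step (ceil division),
            # keeping the most confident predictions; the last step commits the rest.
            k = -(-len(masked) // (steps_per_block - step))
            best = sorted(masked, key=lambda p: preds[p][1], reverse=True)[:k]
            for p in best:
                tokens[p] = preds[p][0]
    return tokens


def toy_predict(tokens, positions):
    # Toy stand-in: proposes token 10 * p, more confident for earlier slots.
    return {p: (10 * p, 1.0 / (1 + p)) for p in positions}


if __name__ == "__main__":
    print(decode_block_diffusion(toy_predict, seq_len=6, block_size=3, steps_per_block=2))
```

With `block_size=3` and `steps_per_block=2`, the six tokens take four model calls instead of the six that token-by-token autoregressive decoding would need. Larger blocks or fewer steps per block trade accuracy for speed.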
Selected Projects
Parallel coarse-to-fine motion planning via discrete flow matching.
A VLA-based planning framework that casts future trajectory generation as discrete flow matching over structured trajectory tokens.
Masked diffusion VLA framework for autonomous driving.
A diffusion-based VLA framework that iteratively refines discrete future trajectory tokens using masked denoising and reinforcement learning.
Turning autoregressive VLAs into efficient parallel generative models.
A research line on progressive block-wise adaptation, block-wise teacher-student distillation, and cross-scale diffusion model transfer.
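Casting trajectories as structured discrete tokens requires some way to discretize continuous waypoints. One simple and common option is uniform binning of each coordinate, sketched below. The coordinate ranges, the 256 bins per axis, and the interleaved x/y layout (x ids in [0, 256), y ids in [256, 512)) are illustrative assumptions, not the tokenizer used in these projects. For in-range values, the round-trip error is at most half a bin width per axis.

```python
# Illustrative defaults (assumptions): ego-frame metres, 256 bins per axis.
X_RANGE = (-10.0, 60.0)  # longitudinal range
Y_RANGE = (-20.0, 20.0)  # lateral range
NUM_BINS = 256


def _to_bin(v, lo, hi, num_bins):
    v = min(max(v, lo), hi)                     # clip out-of-range values
    idx = int((v - lo) / (hi - lo) * num_bins)  # uniform bin index
    return min(idx, num_bins - 1)               # v == hi maps to the last bin


def _from_bin(idx, lo, hi, num_bins):
    return lo + (idx + 0.5) * (hi - lo) / num_bins  # bin centre


def tokenize_trajectory(waypoints, x_range=X_RANGE, y_range=Y_RANGE, num_bins=NUM_BINS):
    """Encode [(x, y), ...] as interleaved tokens [x0, y0, x1, y1, ...].

    x ids lie in [0, num_bins); y ids are offset into [num_bins, 2 * num_bins).
    """
    tokens = []
    for x, y in waypoints:
        tokens.append(_to_bin(x, *x_range, num_bins))
        tokens.append(num_bins + _to_bin(y, *y_range, num_bins))
    return tokens


def detokenize_trajectory(tokens, x_range=X_RANGE, y_range=Y_RANGE, num_bins=NUM_BINS):
    """Decode interleaved tokens back to approximate (x, y) waypoints."""
    return [
        (_from_bin(tokens[i], *x_range, num_bins),
         _from_bin(tokens[i + 1] - num_bins, *y_range, num_bins))
        for i in range(0, len(tokens), 2)
    ]


if __name__ == "__main__":
    traj = [(0.0, 0.0), (5.2, -0.3), (12.7, 1.1)]
    toks = tokenize_trajectory(traj)
    print(toks)
    print(detokenize_trajectory(toks))
```

Finer bins reduce quantization error but enlarge the vocabulary. A coarse-to-fine scheme could instead emit a coarse bin first and then a refinement token.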
Experience & Education
Fudan University · Shanghai, China
Research on multimodal generative models, Vision-Language-Action models, and autonomous driving.
South China University of Technology · Guangzhou, China
Undergraduate studies in the School of Mathematics.
Contact
I am always open to research discussions and collaborations on efficient multimodal generative models, autonomous driving, and embodied intelligence.