1Shanghai Jiao Tong University
2Bytedance Research
*Equal contributions †Corresponding author
Overview of RadarLLM. We first encode radar point clouds into discrete tokens via a Motion-guided Radar Tokenizer. The Radar-aware Language Model then aligns these tokens with textual representations in a shared embedding space by jointly optimizing unsupervised token reconstruction and supervised bidirectional radar-text translation.
Architecture and training pipeline of the Motion-guided Radar Tokenizer. Built upon our Aggregate VQ-VAE architecture, the tokenizer compresses radar point cloud sequences into discrete semantic tokens through joint point cloud sequence reconstruction and motion embedding learning.
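To make the tokenization step concrete, below is a minimal sketch of the vector-quantization lookup at the heart of any VQ-VAE-style tokenizer: a continuous latent is mapped to the index of its nearest codebook entry, and that index serves as the discrete token. The shapes, names, and toy codebook are illustrative assumptions; the paper's Aggregate VQ-VAE operates on aggregated radar point cloud features and is more involved.

```python
# Hypothetical sketch of VQ-VAE quantization: map a continuous latent
# vector to the index of its nearest codeword (the discrete token).
def quantize(latent, codebook):
    """Return the index of the codeword closest to `latent` in L2 distance."""
    best_idx, best_dist = 0, float("inf")
    for idx, code in enumerate(codebook):
        dist = sum((l - c) ** 2 for l, c in zip(latent, code))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx

# Toy codebook with 3 codewords of dimension 2 (illustrative only).
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 0.5]]
token = quantize([0.9, 1.2], codebook)  # -> 1 (nearest codeword)
```

In training, the encoder output is replaced by the selected codeword (with a straight-through gradient), so the downstream language model only ever sees the discrete token indices.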
Virtual radar-text data generation pipeline. The Radar-Text dataset is constructed by simulating radar reflections from SMPL motion sequences using ray tracing and signal processing techniques, based on existing motion-text datasets.
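One ingredient of simulating radar returns from SMPL motion is the per-point radial (Doppler) velocity, since a millimeter-wave radar only observes motion along its line of sight. The sketch below shows that projection in isolation; the names (`radar_pos`, `point`, `velocity`) are illustrative, and the actual pipeline additionally performs ray tracing and signal processing to produce realistic reflections.

```python
# Hedged sketch: project a body point's velocity onto the radar's line
# of sight to obtain the radial (Doppler) velocity the radar would measure.
import math

def radial_velocity(radar_pos, point, velocity):
    """Return the component of `velocity` along the radar-to-point direction."""
    los = [p - r for p, r in zip(point, radar_pos)]        # line-of-sight vector
    norm = math.sqrt(sum(c * c for c in los))
    unit = [c / norm for c in los]                         # unit line of sight
    return sum(v * u for v, u in zip(velocity, unit))

# A point 3 m in front of the radar, moving at 1 m/s along the line of
# sight and 2 m/s tangentially: only the 1 m/s radial part is observed.
v_r = radial_velocity([0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [1.0, 2.0, 0.0])  # -> 1.0
```

This is why tangential motion is largely invisible to a single radar, and one motivation for motion-guided learning on top of the raw point clouds.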
Comparison with state-of-the-art methods on virtual and real datasets.
@article{lai2025radarllm,
  title   = {RadarLLM: Empowering Large Language Models to Understand Human Motion from Millimeter-wave Point Cloud Sequence},
  author  = {Lai, Zengyuan and Yang, Jiarui and Xia, Songpengcheng and Lin, Lizhou and Sun, Lan and Wang, Renwen and Liu, Jianran and Wu, Qi and Pei, Ling},
  journal = {arXiv preprint arXiv:xxxx.xxxxx},
  year    = {2025}
}