CoST: Contrastive Quantization based Semantic Tokenization for Generative Recommendation
Aug. 1st, 2024: Got accepted by RecSys ‘24.
Jieming ZHU*, Mengqun JIN*, Qijiong LIU, Zexuan QIU, Zhenhua DONG, and Xiu LI#
*Equal contribution (co-first authors); #Corresponding author
[Code] [Paper]
Abstract
Embedding-based retrieval serves as a dominant approach to candidate item matching in industrial recommender systems. With the success of generative AI, generative retrieval has recently emerged as a new retrieval paradigm for recommendation, which casts item retrieval as a generation problem. Its model consists of two stages: semantic tokenization and autoregressive generation. The first stage constructs discrete semantic codes to index items, while the second stage autoregressively generates the semantic codes of candidate items. Semantic tokenization is therefore a crucial preliminary step for training generative recommendation models. Existing research usually adopts a quantizer trained with a reconstruction loss (e.g., RQ-VAE) to obtain semantic codes of items, but such a method fails to capture the proximity information among items that is essential for modeling item relationships in recommender systems. In this paper, we propose a contrastive quantization based semantic tokenization approach (dubbed CoST), which leverages both item relationships and semantic information to learn semantic codes. Our experimental results show that semantic tokenization has a large effect on generative recommendation, and that CoST brings up to 40% improvements in NDCG@5 and Recall@5 on the MIND dataset over previous baselines.
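To make the distinction concrete, below is a minimal PyTorch sketch (not the authors' implementation) of the two ingredients the abstract contrasts: residual quantization in the style of RQ-VAE, which maps an item embedding to a tuple of discrete code indices, and an InfoNCE-style contrastive loss over the quantized embeddings of related item pairs. All module names, dimensions, and the temperature value are illustrative assumptions.

```python
# A minimal sketch, assuming PyTorch; all hyperparameters are illustrative.
import torch
import torch.nn.functional as F

class ResidualQuantizer(torch.nn.Module):
    """Quantize an embedding into num_levels code indices via nested codebooks."""
    def __init__(self, num_levels=3, codebook_size=256, dim=64):
        super().__init__()
        self.codebooks = torch.nn.Parameter(
            torch.randn(num_levels, codebook_size, dim) * 0.02)

    def forward(self, x):
        residual = x
        codes, quantized = [], torch.zeros_like(x)
        for level in range(self.codebooks.shape[0]):
            book = self.codebooks[level]         # (K, D) codebook at this level
            dists = torch.cdist(residual, book)  # (B, K) distances to codes
            idx = dists.argmin(dim=-1)           # nearest code index per item
            picked = book[idx]                   # (B, D) selected code vectors
            quantized = quantized + picked
            residual = residual - picked         # quantize what remains next level
            codes.append(idx)
        # Straight-through estimator so gradients flow back to the encoder.
        quantized = x + (quantized - x).detach()
        return torch.stack(codes, dim=-1), quantized

def contrastive_loss(q_anchor, q_positive, temperature=0.1):
    """InfoNCE over quantized embeddings: positives are related item pairs
    (e.g., co-interacted items); other in-batch items act as negatives."""
    a = F.normalize(q_anchor, dim=-1)
    p = F.normalize(q_positive, dim=-1)
    logits = a @ p.t() / temperature             # (B, B) similarity matrix
    labels = torch.arange(a.shape[0], device=a.device)
    return F.cross_entropy(logits, labels)

# Usage: quantize two related items, then apply the contrastive objective
# (in practice combined with the usual reconstruction term of RQ-VAE).
rq = ResidualQuantizer()
item_a, item_b = torch.randn(32, 64), torch.randn(32, 64)
codes_a, q_a = rq(item_a)
codes_b, q_b = rq(item_b)
loss = contrastive_loss(q_a, q_b)
```

The straight-through estimator keeps the quantization step differentiable, so a contrastive signal of this kind can shape the codebooks toward preserving item proximity rather than reconstruction fidelity alone.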
Citation
@misc{zhu2024cost,
  title  = {CoST: Contrastive Quantization based Semantic Tokenization for Generative Recommendation},
  author = {Zhu, Jieming and Jin, Mengqun and Liu, Qijiong and Qiu, Zexuan and Dong, Zhenhua and Li, Xiu},
  year   = {2024}
}