Discrete Semantic Tokenization for Deep CTR Prediction
Mar. 6th, 2024: Accepted by TheWebConf '24.
Qijiong LIU, Hengchang HU, Jiahao WU, Jieming ZHU, Min-Yen KAN#, Xiao-Ming WU#
[Code] [Paper]
Abstract
Incorporating item content knowledge into deep click-through rate (CTR) models remains challenging, particularly under the time and space constraints of industrial scenarios. The content-based paradigm sacrifices time for space, while the embedding-based paradigm trades space for time. We introduce UIST, a user–item semantic tokenization approach that follows the semantic-based paradigm. UIST offers swift training and inference while keeping memory usage limited. Unlike the embedding-based paradigm, which directly converts item and user semantics into a unified high-dimensional representation, UIST discretizes dense vectors into shorter sequences of tokens. Additionally, we design a hierarchical mixture inference module to analyze the contribution of each user–item token pair.
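To make the idea of discretizing a dense vector into a short token sequence concrete, here is a minimal sketch. It assumes a greedy residual quantization scheme, which the abstract does not specify. The codebooks are random rather than learned, and all names and sizes (`tokenize`, `reconstruct`, `DIM`, `LEVELS`, `CODES`) are hypothetical and chosen only for illustration. It should not be read as the UIST implementation.

```python
import random


def nearest(codebook, vec):
    """Return the index of the codeword closest to vec (squared Euclidean)."""
    best_idx, best_dist = 0, float("inf")
    for idx, code in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(vec, code))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def tokenize(vec, codebooks):
    """Quantize a dense vector into one discrete token per codebook level."""
    residual = list(vec)
    tokens = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        tokens.append(idx)
        # Subtract the chosen codeword so the next level encodes what is left.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tokens


def reconstruct(tokens, codebooks):
    """Sum the selected codewords to approximate the original vector."""
    out = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(tokens, codebooks):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Assumed sizes: a 16-dim embedding, 3 levels, 8 codewords per level.
# Codebooks are random here; a real system would learn them.
rng = random.Random(0)
DIM, LEVELS, CODES = 16, 3, 8
codebooks = [
    [[rng.gauss(0, 1.0 / (level + 1)) for _ in range(DIM)] for _ in range(CODES)]
    for level in range(LEVELS)
]
item_embedding = [rng.gauss(0, 1) for _ in range(DIM)]
item_tokens = tokenize(item_embedding, codebooks)
print(item_tokens)  # three integers, each in range(CODES)
```

Under these assumptions, each item is stored as three small integers instead of sixteen floats, and the codebooks are shared across all items. This illustrates why a token-based representation can be cheaper to cache than a dense embedding.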
Citation
@misc{liu2024semantic,