Discrete Semantic Tokenization for Deep CTR Prediction

Mar. 6th, 2024: Accepted by TheWebConf ’24.

Qijiong LIU, Hengchang HU, Jiahao WU, Jieming ZHU, Min-Yen KAN#, Xiao-Ming WU#
[Code] [Paper]

Abstract

Incorporating item content knowledge into deep click-through rate (CTR) models remains challenging, particularly under the time and space efficiency constraints of industrial scenarios. The content-based paradigm sacrifices time for space, while the embedding-based paradigm trades space for time. We introduce UIST, a user–item semantic tokenization approach that follows the semantic-based paradigm. UIST offers fast training and inference while keeping memory usage low. Unlike the embedding-based paradigm, which converts item and user semantics directly into a unified high-dimensional representation, UIST discretizes dense vectors into discrete tokens of shorter length. Additionally, we design a hierarchical mixture inference module to analyze the contribution of each user–item token pair.
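To make the idea of discretizing a dense vector into a few tokens more concrete, here is a minimal sketch. The abstract does not say which quantizer UIST uses, so this example assumes a simple residual quantization scheme with randomly initialized codebooks; in practice the codebooks would be learned. All names (`tokenize`, `detokenize`, `nearest_code`) and sizes (`DIM`, `LEVELS`, `CODES`) are hypothetical and are not taken from the paper or its code.

```python
import random


def nearest_code(vector, codebook):
    """Return the index of the codebook entry closest to `vector` (squared L2)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))


def tokenize(vector, codebooks):
    """Discretize a dense vector into one integer token per codebook level.

    Each level quantizes the residual left over by the previous levels,
    so the output is a short sequence of token IDs.
    """
    residual = list(vector)
    tokens = []
    for codebook in codebooks:
        idx = nearest_code(residual, codebook)
        tokens.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tokens


def detokenize(tokens, codebooks):
    """Approximate the original vector by summing the selected code vectors."""
    approx = [0.0] * len(codebooks[0][0])
    for idx, codebook in zip(tokens, codebooks):
        approx = [a + c for a, c in zip(approx, codebook[idx])]
    return approx


# Hypothetical sizes: 64-dim embedding, 4 levels, 256 codes per level.
rng = random.Random(0)
DIM, LEVELS, CODES = 64, 4, 256
codebooks = [
    [[rng.gauss(0, 1.0 / (level + 1)) for _ in range(DIM)] for _ in range(CODES)]
    for level in range(LEVELS)
]
item_embedding = [rng.gauss(0, 1) for _ in range(DIM)]
item_tokens = tokenize(item_embedding, codebooks)  # a list of 4 integer IDs
```

Under these assumed sizes, an item is stored as 4 token IDs (one byte each with 256 codes per level) instead of 64 floating-point values, which illustrates where the space saving of a token-based representation comes from. The same routine could be applied to user vectors.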

Citation

@inproceedings{liu2024semantic,
title = {Discrete Semantic Tokenization for Deep CTR Prediction},
author = {Qijiong Liu and Hengchang Hu and Jiahao Wu and Jieming Zhu and Min-Yen Kan and Xiao-Ming Wu},
booktitle = {Proceedings of the ACM Web Conference 2024},
month = {may},
year = {2024},
address = {Singapore}
}

https://liu.qijiong.work/2024/03/06/Research-UIST/

Author: Qijiong LIU (Jyonn)
Posted on: 2024-03-06
Updated on: 2024-05-28