Reference: |
Cai, J., Wohn, D. Y., Mittal, A., and Sureshbabu, D. (2018). Utilitarian and hedonic motivations for live streaming shopping. In Proceedings of the 2018 ACM International Conference on Interactive Experiences for TV and Online Video, pages 81–88.
Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
Fu, W. (2021). Consumer choices in live streaming retailing, evidence from Taobao ecommerce. In The 2021 12th International Conference on E-business, Management and Economics, pages 12–20.
Han, J., Yu, Y., Liu, F., Tang, R., and Zhang, Y. (2019). Optimizing ranking algorithm in recommender system via deep reinforcement learning. In 2019 International Conference on Artificial Intelligence and Advanced Manufacturing (AIAM), pages 22–26. IEEE.
Hofmann, K., Whiteson, S., and Rijke, M. D. (2013). Fidelity, soundness, and efficiency of interleaved comparison methods. ACM Transactions on Information Systems (TOIS), 31(4):1–43.
Howard, R. A. (1960). Dynamic programming and Markov processes.
Jambo Live Streaming Platform (2020). Jambo live streaming platform. https://jambolive.tv/.
Katehakis, M. N. and Veinott Jr, A. F. (1987). The multi-armed bandit problem: decomposition and computation. Mathematics of Operations Research, 12(2):262–268.
Kulesza, A., Taskar, B., et al. (2012). Determinantal point processes for machine learning. Foundations and Trends® in Machine Learning, 5(2–3):123–286.
Ladosz, P., Weng, L., Kim, M., and Oh, H. (2022). Exploration in deep reinforcement learning: A survey. Information Fusion.
Li, L., Chu, W., Langford, J., and Schapire, R. E. (2010). A contextual-bandit approach to personalized news article recommendation. In Proceedings of the 19th International Conference on World Wide Web, pages 661–670.
Lillicrap, T. P., Hunt, J. J., Pritzel, A., Heess, N., Erez, T., Tassa, Y., Silver, D., and Wierstra, D. (2015). Continuous control with deep reinforcement learning. arXiv preprint arXiv:1509.02971.
Liu, F., Tang, R., Li, X., Zhang, W., Ye, Y., Chen, H., Guo, H., and Zhang, Y. (2018). Deep reinforcement learning based recommendation with explicit user-item interactions modeling. arXiv preprint arXiv:1810.12027.
Liu, Y., Shen, Z., Zhang, Y., and Cui, L. (2021). Diversity-promoting deep reinforcement learning for interactive recommendation. In 5th International Conference on Crowd Science and Engineering, pages 132–139.
Meta Platforms (2023). Ax • adaptive experimentation platform. https://ax.dev/.
Michael Gimelfarb (2020). Adaptive epsilon-greedy exploration policy using Bayesian ensembles. https://github.com/mike-gimelfarb/bayesian-epsilon-greedy.
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A. A., Veness, J., Bellemare, M. G., Graves, A., Riedmiller, M., Fidjeland, A. K., Ostrovski, G., et al. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540):529–533.
OpenAI (2023). Deep deterministic policy gradient - spinning up documentation. https://spinningup.openai.com/en/latest/algorithms/ddpg.html/.
Plappert, M., Houthooft, R., Dhariwal, P., Sidor, S., Chen, R. Y., Chen, X., Asfour, T., Abbeel, P., and Andrychowicz, M. (2017). Parameter space noise for exploration. arXiv preprint arXiv:1706.01905.
Rafailidis, D. and Nanopoulos, A. (2015). Modeling users preference dynamics and side information in recommender systems. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 46(6):782–792.
Sutton, R. S. and Barto, A. G. (2018). Reinforcement learning: An introduction. MIT Press.
Tokic, M. (2010). Adaptive ε-greedy exploration in reinforcement learning based on value differences. In KI 2010: Advances in Artificial Intelligence: 33rd Annual German Conference on AI, Karlsruhe, Germany, September 21-24, 2010. Proceedings 33, pages 203–210. Springer.
Wikipedia (2022). Ornstein–Uhlenbeck process. https://en.wikipedia.org/wiki/Ornstein%E2%80%93Uhlenbeck_process.
Wongkitrungrueng, A., Dehouche, N., and Assarut, N. (2020). Live streaming commerce from the sellers’ perspective: implications for online relationship marketing. Journal of Marketing Management, 36(5-6):488–518.
Wu, Q., Liu, Y., Miao, C., Zhao, Y., Guan, L., and Tang, H. (2019). Recent advances in diversified recommendation. arXiv preprint arXiv:1905.06589.
Yuyan, Z., Xiayao, S., and Yong, L. (2019). A novel movie recommendation system based on deep reinforcement learning with prioritized experience replay. In 2019 IEEE 19th International Conference on Communication Technology (ICCT), pages 1496–1500. IEEE.
Zhao, X., Zhang, L., Ding, Z., Xia, L., Tang, J., and Yin, D. (2018). Recommendations with negative feedback via pairwise deep reinforcement learning. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 1040–1048.
Zheng, G., Zhang, F., Zheng, Z., Xiang, Y., Yuan, N. J., Xie, X., and Li, Z. (2018). DRN: A deep reinforcement learning framework for news recommendation. In Proceedings of the 2018 World Wide Web Conference, pages 167–176.
|