Reference: | [1] F. Ricci, L. Rokach, and B. Shapira, "Introduction to recommender systems handbook," in Recommender systems handbook, ed: Springer, 2011, pp. 1-35. [2] G. Adomavicius and A. Tuzhilin, "Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions," IEEE transactions on knowledge and data engineering, vol. 17, pp. 734-749, 2005. [3] M. Deshpande and G. Karypis, "Item-based top-n recommendation algorithms," ACM Transactions on Information Systems (TOIS), vol. 22, pp. 143-177, 2004. [4] G. Linden, B. Smith, and J. York, "Amazon. com recommendations: Item-to-item collaborative filtering," IEEE Internet computing, vol. 7, pp. 76-80, 2003. [5] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl, "Item-based collaborative filtering recommendation algorithms," in Proceedings of the 10th international conference on World Wide Web, 2001, pp. 285-295. [6] J. B. Schafer, J. Konstan, and J. Riedl, "Recommender systems in e-commerce," in Proceedings of the 1st ACM conference on Electronic commerce, 1999, pp. 158-166. [7] C. D. Manning, P. Raghavan, and H. Schütze, Introduction to information retrieval vol. 1: Cambridge university press Cambridge, 2008. [8] L. Terveen, W. Hill, B. Amento, D. McDonald, and J. Creter, "PHOAKS: A system for sharing recommendations," Communications of the ACM, vol. 40, pp. 59-62, 1997. [9] B. Mobasher, H. Dai, T. Luo, and M. Nakagawa, "Discovery and evaluation of aggregate usage profiles for web personalization," Data mining and knowledge discovery, vol. 6, pp. 61-82, 2002. [10] Y. G. Jung, M. S. Kang, and J. Heo, "Clustering performance comparison using K-means and expectation maximization algorithms," Biotechnology & Biotechnological Equipment, vol. 28, pp. S44-S48, 2014. [11] J. Vermorel and M. Mohri, "Multi-armed bandit algorithms and empirical evaluation," in ECML, 2005, pp. 437-448. [12] A. Mahajan and D. Teneketzis, "Multi-armed bandit problems," Foundations and Applications of Sensor Management, pp. 121-151, 2008. [13] M. Coggan, "Exploration and exploitation in reinforcement learning," Research supervised by Prof. Doina Precup, CRA-W DMP Project at McGill University, 2004. [14] L. Li, W. Chu, J. Langford, and R. E. Schapire, "A contextual-bandit approach to personalized news article recommendation," in Proceedings of the 19th international conference on World wide web, 2010, pp. 661-670. [15] T. Lu, D. Pál, and M. Pál, "Contextual multi-armed bandits," in Proceedings of the Thirteenth international conference on Artificial Intelligence and Statistics, 2010, pp. 485-492. [16] M. Fukushima, T. Takayama, and M. Takayama, "How Developers Explore and Exploit Instant Innovation from Experiment to Implementing New Product Development," in IFIP International Conference on Product Lifecycle Management, 2014, pp. 507-517. [17] J. C. Gittins, "Bandit processes and dynamic allocation indices," Journal of the Royal Statistical Society. Series B (Methodological), pp. 148-177, 1979. [18] X. Wang, Y. Wang, D. Hsu, and Y. Wang, "Exploration in interactive personalized music recommendation: a reinforcement learning approach," ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), vol. 11, p. 7, 2014. [19] R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction vol. 1: MIT press Cambridge, 1998. [20] M. Tokic, "Adaptive ε-greedy exploration in reinforcement learning based on value differences," in Annual Conference on Artificial Intelligence, 2010, pp. 203-210. [21] P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire, "The nonstochastic multiarmed bandit problem," SIAM journal on computing, vol. 32, pp. 48-77, 2002. [22] J. Langford and T. Zhang, "The epoch-greedy algorithm for multi-armed bandits with side information," in Advances in neural information processing systems, 2008, pp. 817-824. [23] O. Chapelle and L. Li, "An empirical evaluation of thompson sampling," in Advances in neural information processing systems, 2011, pp. 2249-2257. [24] S. L. Scott, "A modern Bayesian look at the multi‐armed bandit," Applied Stochastic Models in Business and Industry, vol. 26, pp. 639-658, 2010. [25] W. R. Thompson, "On the likelihood that one unknown probability exceeds another in view of the evidence of two samples," Biometrika, vol. 25, pp. 285-294, 1933. [26] P. Auer, "Using confidence bounds for exploitation-exploration trade-offs," Journal of Machine Learning Research, vol. 3, pp. 397-422, 2002. [27] P. Auer, N. Cesa-Bianchi, and P. Fischer, "Finite-time analysis of the multiarmed bandit problem," Machine learning, vol. 47, pp. 235-256, 2002. [28] N. Hansen and A. Ostermeier, "Adapting arbitrary normal mutation distributions in evolution strategies: The covariance matrix adaptation," in Evolutionary Computation, 1996., Proceedings of IEEE International Conference on, 1996, pp. 312-317. [29] R. Allesiardo, R. Féraud, and D. Bouneffouf, "A neural networks committee for the contextual bandit problem," in International Conference on Neural Information Processing, 2014, pp. 374-381. [30] R. Féraud, R. Allesiardo, T. Urvoy, and F. Clérot, "Random Forest for the Contextual Bandit Problem," in Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, 2016, pp. 93-101. [31] M. Valko, N. Korda, R. Munos, I. Flaounas, and N. Cristianini, "Finite-time analysis of kernelised contextual bandits," arXiv preprint arXiv:1309.6869, 2013. [32] L. Tang, Y. Jiang, L. Li, and T. Li, "Ensemble contextual bandits for personalized recommendation," in Proceedings of the 8th ACM Conference on Recommender Systems, 2014, pp. 73-80. [33] K.-H. Huang and H.-T. Lin, "Linear Upper Confidence Bound Algorithm for Contextual Bandit Problem with Piled Rewards," in Pacific-Asia Conference on Knowledge Discovery and Data Mining, 2016, pp. 143-155. [34] A. Swaminathan, A. Krishnamurthy, A. Agarwal, M. Dudík, J. Langford, D. Jose, et al., "Off-policy evaluation for slate recommendation," arXiv preprint arXiv:1605.04812, 2016. [35] P. Thomas and E. Brunskill, "Data-efficient off-policy policy evaluation for reinforcement learning," in International Conference on Machine Learning, 2016, pp. 2139-2148. [36] L. Li, W. Chu, J. Langford, and X. Wang, "Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms," in Proceedings of the fourth ACM international conference on Web search and data mining, 2011, pp. 297-306. [37] S. Li, A. Karatzoglou, and C. Gentile, "Collaborative filtering bandits," in Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval, 2016, pp. 539-548. |