    Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/140754


    Title: 基於文字探勘技術及模型組合比較結果之旅館推薦應用
    Hotel recommendation application based on text mining technology and model combination comparison results
    Authors: 陳麒仲
    Chen, Chi-Chung
    Contributors: 周珮婷
    陳麒仲
    Chen, Chi-Chung
    Keywords: Travel reviews
    Conditional entropy
    Cosine similarity
    TF-IDF
    Word2Vec
    SVM
    Date: 2022
    Issue Date: 2022-07-01 16:58:15 (UTC+8)
    Abstract: In this age of a well-developed Internet, booking hotels through online reservation websites has become commonplace, and a hotel's reviews on those sites directly affect travelers' booking choices. Raising their own ratings and reducing negative reviews from travelers is a goal every hotel operator pursues, with particular emphasis on reducing negative reviews. Drawing up improvement plans that target the problems raised in negative reviews is therefore an effective way to address the root cause and raise a hotel's rating. Travelers, for their part, want to stay in hotels that satisfy them and do not spoil their trip, but booking still requires going through the information for each hotel, so having a system recommend suitable hotels saves both time and effort.

    This study uses a web crawler to collect hotel reviews from the booking website Booking.com for one popular tourist country in Northern Europe and one in Southern Europe. It combines the text mining method TF-IDF with the information measure conditional entropy to find the negative keywords of hotels in a specific country, which helps local hoteliers draw up plans to reduce negative reviews and is also used to define the label of a genuinely negative-review traveler. Word vector models and popular machine learning classification algorithms are then used to predict that label. Because the focus is on catching genuinely negative-review travelers, Recall, F1 Score, and AUC Score are chosen as the model evaluation metrics. The results show that the combination of a word vector model trained with Word2Vec and an SVM classifier, which handles imbalanced data well, performs better; in particular, the Skip-gram model, which predicts the surrounding words from the input middle word, outperforms CBOW. Finally, for each traveler predicted to be a genuinely negative reviewer, the cosine similarity between the negative reviews that traveler has left and the negative keywords of each popular hotel is computed, and hotels with lower similarity scores are recommended.
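    The final recommendation step in the abstract can be illustrated with a small sketch. It is not the thesis implementation: the tokens, the hotel names, the smoothed IDF formula, and the helper names `tf_idf_vectors`, `cosine_similarity`, and `recommend` are all assumptions made for illustration. It weights a traveler's complaint terms and each hotel's negative keywords with TF-IDF, scores each pair with cosine similarity, and lists hotels from lowest to highest score, so that hotels whose known problems overlap least with the traveler's past complaints come first.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: how many documents contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            # Smoothed IDF (an assumed variant) keeps every weight positive.
            term: (count / len(doc)) * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(traveler_complaints, hotel_keywords):
    """Rank hotels by ascending similarity to the traveler's complaints."""
    names = list(hotel_keywords)
    docs = [traveler_complaints] + [hotel_keywords[name] for name in names]
    vectors = tf_idf_vectors(docs)
    traveler_vec = vectors[0]
    scores = {
        name: cosine_similarity(traveler_vec, vec)
        for name, vec in zip(names, vectors[1:])
    }
    # Lowest similarity first: least overlap with what the traveler disliked.
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical data: tokens from one traveler's negative reviews and
# negative keywords previously extracted for three made-up hotels.
traveler = ["noisy", "room", "dirty", "bathroom", "noisy"]
hotels = {
    "Hotel A": ["noisy", "street", "dirty", "room"],
    "Hotel B": ["slow", "wifi", "small", "breakfast"],
    "Hotel C": ["dirty", "bathroom", "rude", "staff"],
}

ranking = recommend(traveler, hotels)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

    In this toy data, Hotel B shares no terms with the traveler's complaints, so its score is zero and it is listed first. A fuller pipeline would presumably take its keyword weights from the TF-IDF and conditional entropy step rather than from hand-written token lists.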
    Description: Master's thesis
    National Chengchi University
    Department of Statistics
    109354022
    Source URI: http://thesis.lib.nccu.edu.tw/record/#G0109354022
    Data Type: thesis
    DOI: 10.6814/NCCU202200539
    Appears in Collections: [Department of Statistics] Theses

    Files in This Item:

    File          Size      Format
    402201.pdf    3126Kb    Adobe PDF
