National Chengchi University Institutional Repository (NCCUR): Item 140.119/115202
    Please use this permanent URL to cite or link to this document: https://nccur.lib.nccu.edu.tw/handle/140.119/115202


    Title: 兩階層式垃圾郵件過濾機制之研究
    A Study of Two-tier Filtering Schemes for Anti-spam
    Authors: 葉生正
    蘇民揚
    張僩鈞
    Keywords: Support Vector Machine (SVM); Naive Bayes; Information Gain
    Date: 2006
    Uploaded: 2017-12-18 17:38:26 (UTC+8)
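    Abstract: Spam is rampant today, and a wide range of countermeasures has emerged in response. Among content-filtering methods, the machine-learning approaches Support Vector Machine (SVM) and Naïve Bayes perform best. This paper therefore combines the SVM's fast hyperplane-based classification with the flexibility of Naïve Bayes to design a two-tier spam filtering scheme. The experiments use 1,000 training emails and 200 test emails for each of Chinese and English. After Chinese word segmentation and English tokenization, Information Gain scores determine the keywords used to train the SVM. The SVM's classification results on the test samples are then screened with four margin distances defined in this paper to pick out the emails that fall in the ambiguous region, and those emails are scored by the improved Bayesian probability model proposed here to decide their class. The results show that accuracy improves after re-scoring the samples selected by each of the four margin distances, with Maximum Distance and Average Distance giving the most significant improvement. Under the optimal model, overall classification accuracy exceeds 97% for both the Chinese and English samples, which supports the feasibility and contribution of the proposed two-tier filtering scheme and improved Naïve Bayes model.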
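    The keyword-selection step can be illustrated with a minimal sketch. It assumes each email has already been segmented into a set of tokens and that Information Gain is computed for a binary "term present / term absent" feature; the function names (`entropy`, `information_gain`, `top_keywords`) and the toy data are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy in bits of a list of class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(docs, labels, term):
    """IG(term) = H(class) - H(class | term present or absent)."""
    classes = sorted(set(labels))
    all_counts = Counter(labels)
    with_term = Counter(l for d, l in zip(docs, labels) if term in d)
    without_term = Counter(l for d, l in zip(docs, labels) if term not in d)
    n = len(docs)
    n_with = sum(with_term.values())
    n_without = n - n_with
    h_class = entropy([all_counts[c] for c in classes])
    h_cond = ((n_with / n) * entropy([with_term[c] for c in classes])
              + (n_without / n) * entropy([without_term[c] for c in classes]))
    return h_class - h_cond

def top_keywords(docs, labels, k):
    """Return the k terms with the highest IG (ties broken alphabetically)."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab, key=lambda t: (-information_gain(docs, labels, t), t))
    return ranked[:k]

# Toy corpus (hypothetical): each email is a set of tokens.
docs = [
    {"free", "prize", "click"},
    {"free", "offer"},
    {"meeting", "report"},
    {"report", "click"},
]
labels = ["spam", "spam", "ham", "ham"]

print(top_keywords(docs, labels, 2))
```

    In this toy corpus, "free" and "report" each separate the two classes perfectly, so they rank first, while "click" appears equally in both classes and carries no information. The selected keywords would then define the feature vector used to train the SVM.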
    The Support Vector Machine (SVM) and Naive Bayes are well-known machine-learning algorithms for content-based spam filtering. Building on the fast hyperplane-based classification of the SVM and the flexible threshold setting of Naive Bayes, this paper proposes a two-tier anti-spam filtering scheme that combines an SVM with a new Naive Bayes model. In the first tier, Information Gain is used to select the keywords that form the SVM training vectors. The paper also defines four kinds of margin around the hyperplane and passes the samples that fall within the margin to the second tier, where a Bayesian probability calculation decides their class. The experimental results indicate that every margin setting improves accuracy by about 1% to 4%, most notably the Maximum Distance and Average Distance margins. In addition, the optimal model achieves an overall accuracy above 97% on both the Chinese and English sample emails. These results support the feasibility of the proposed two-tier filtering scheme and the new Naive Bayes model.
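    The routing logic of a two-tier filter can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not define the four margins or the improved Bayes model, so `margin` is a hypothetical fixed threshold, the second tier is a plain Bernoulli Naive Bayes with Laplace smoothing, and `svm_distance` is assumed to be the signed distance from an already trained SVM's hyperplane, with positive values meaning spam.

```python
import math
from collections import Counter

def train_naive_bayes(docs, labels):
    """Fit a Bernoulli Naive Bayes on token sets; labels must be 'spam' or 'ham'."""
    vocab = sorted(set().union(*docs))
    n_docs = Counter(labels)
    doc_freq = {"spam": Counter(), "ham": Counter()}
    for tokens, label in zip(docs, labels):
        doc_freq[label].update(set(tokens))
    return {"vocab": vocab, "n_docs": n_docs, "doc_freq": doc_freq}

def naive_bayes_log_odds(model, tokens):
    """Return log P(spam | tokens) - log P(ham | tokens)."""
    total = sum(model["n_docs"].values())
    log_post = {}
    for c in ("spam", "ham"):
        n_c = model["n_docs"][c]
        lp = math.log(n_c / total)
        for term in model["vocab"]:
            # Laplace-smoothed probability that the term appears in class c.
            p = (model["doc_freq"][c][term] + 1) / (n_c + 2)
            lp += math.log(p if term in tokens else 1.0 - p)
        log_post[c] = lp
    return log_post["spam"] - log_post["ham"]

def two_tier_classify(tokens, svm_distance, margin, nb_model, nb_threshold=0.0):
    """Tier 1: accept the SVM label when the sample lies outside the margin band.
    Tier 2: otherwise re-score the sample with Naive Bayes."""
    if abs(svm_distance) > margin:
        return ("spam" if svm_distance > 0 else "ham"), "svm"
    log_odds = naive_bayes_log_odds(nb_model, tokens)
    return ("spam" if log_odds > nb_threshold else "ham"), "bayes"

# Toy training data (hypothetical): each email is a set of tokens.
train_docs = [
    {"free", "prize", "click"},
    {"free", "offer"},
    {"meeting", "report"},
    {"report", "click"},
]
train_labels = ["spam", "spam", "ham", "ham"]
nb_model = train_naive_bayes(train_docs, train_labels)

# A confident SVM decision is kept; a near-boundary one is handed to tier 2.
print(two_tier_classify({"free", "offer"}, 1.5, 0.5, nb_model))
print(two_tier_classify({"free", "click"}, -0.1, 0.5, nb_model))
```

    In the second call the SVM distance is slightly negative, so the SVM alone would label the email as ham, but because it falls inside the margin band the Bayes tier decides instead. Raising or lowering `nb_threshold` is one way to express the flexible threshold setting the abstract attributes to Naive Bayes.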
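    Relation: Proceedings of TANET 2006 (台灣網際網路研討會, Taiwan Internet Conference)
    Track: Information and Communication Security; Prevention of Inappropriate Information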
    Type: conference
    Appears in collections: [TANET 台灣網際網路研討會] Conference Papers

    Files in this item:

    File      Description   Size    Format      Views
    659.pdf                 363Kb   Adobe PDF   2297


    All items in the NCCU repository are protected by their original copyright.



    Copyright Announcement
    1. The digital content of this website is part of the National Chengchi University Institutional Repository. It is provided free of charge for non-commercial use in academic research and public education. Please use it in a proper and reasonable manner and respect the rights of copyright owners. For commercial use, please obtain authorization from the copyright owner in advance.

    2. Every effort has been made in building this website to avoid infringing the rights of copyright owners. If you believe that any material on the website infringes copyright, please contact our staff (nccur@nccu.edu.tw). We will remove the work from the repository and investigate your claim.