National Chengchi University Institutional Repository (NCCUR): Item 140.119/115375
    Please use this identifier to cite or link to this item: https://nccur.lib.nccu.edu.tw/handle/140.119/115375


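    Title: An Efficient Electronic Document Identification Method Combining Similarity and Difference Computation with a Re-learning Mechanism: The Case of Identifying Pornographic and Medical Web Pages (一個能兼具相似度與差異度計算以及再學習機制的有效率電子文件辨識方法:以色情及醫學網頁辨識為例)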
    Authors: 許志堅
    Contributors: College of Communication
    Keywords: Document Classification; Data Mining; Decision Tree; Filtering Porn Sites
    Date: 2014
    Issue Date: 2017-12-25 15:02:50 (UTC+8)
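    Abstract: This study proposes a systematic and effective method for identifying electronic documents that, in addition to computing similarity, also analyzes differences in order to avoid misclassification. Taking pornographic web pages and medical web pages as examples, we use a decision tree data mining algorithm from machine learning to learn the feature attributes of different types of web pages and derive association rules for judging unknown documents. The method has the following features.
    (I) Association rule analysis of the separate and mixed knowledge of pornographic and medical information: we analyze the features of pornographic and medical information that can be used for matching and filtering, and design three types of data for decision tree computation: (1) pornographic page decision tree analysis: training and computing on the features of pornographic pages to find the association rules for filtering pornographic pages alone; (2) medical page decision tree analysis: training and computing on the features of medical pages to find the association rules for identifying medical pages alone; (3) mixed-data decision tree analysis of pornographic and medical pages: training and computing on the features of pages that mix pornographic and medical information to find the association rules that tell the two apart when both kinds of information may coexist. Besides improving the ability to filter pornographic pages, the aim is to identify medical pages correctly and avoid misclassification and confusion.
    (II) Re-learning mechanism: the content of pornographic pages may change over time or with currently popular events and is constantly renewed, which makes filtering difficult. We use machine learning to design a "re-learning" mechanism that captures the dynamic keyword changes in the feature values of pornographic documents.
    (III) A filtering mechanism for pornographic and medical (including sex education) pages that is both efficient and accurate: this study is based on feature value extraction and avoids both time-consuming image analysis and time-consuming semantic analysis. It uses the ID3 algorithm to build a systematic and efficient filtering mechanism that achieves higher filtering accuracy.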
    In this study, we apply a decision tree data mining technique to the basic attributes of porn sites and analyze the association rules for identifying whether an unknown web site is legitimate or pornographic. We focus on web page content and apply the same technique to derive association rules for medical pages and pornographic pages. We then propose a systematic method to accurately identify whether an unknown web page is pornographic or legitimate. The project has three major parts: (I) Computing association rules for medical pages and pornographic pages: we design three kinds of rule database for the decision tree computation: (1) a database of pornographic pages; (2) a database of medical pages; (3) a database mixing medical and pornographic pages. (II) The re-learning mechanism: since the keywords of pornographic pages change constantly, we construct a re-learning mechanism that records new pornographic keywords. (III) An effective filtering mechanism with a strong ability to recognize porn sites: without processing images or performing semantic analysis, our filtering method relies only on association rules and keywords.
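The abstract names ID3 over keyword-based features but gives no implementation details. The following is only a minimal sketch of textbook ID3 on boolean keyword-presence features, not the project's actual system. The feature names (`explicit_term`, `clinical_term`), the labels, and the toy training rows are all hypothetical and chosen for illustration; they show how a mixed porn/medical training set can lead the tree to split on a feature that separates the two classes.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting on one feature."""
    remainder = 0.0
    for value in {row[feature] for row in rows}:
        subset = [lab for row, lab in zip(rows, labels) if row[feature] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder


def id3(rows, labels, features):
    """Build a decision tree as nested dicts; leaves are class labels."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority
    best = max(features, key=lambda f: information_gain(rows, labels, f))
    remaining = [f for f in features if f != best]
    node = {"feature": best, "default": majority, "children": {}}
    for value in sorted({row[best] for row in rows}):
        pairs = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        node["children"][value] = id3([r for r, _ in pairs],
                                      [l for _, l in pairs],
                                      remaining)
    return node


def classify(tree, row):
    """Walk the tree; fall back to the node's majority label for unseen values."""
    while isinstance(tree, dict):
        tree = tree["children"].get(row[tree["feature"]], tree["default"])
    return tree


# Hypothetical mixed training set: an explicit term alone is ambiguous,
# because it also appears on some medical pages.
rows = [
    {"explicit_term": True, "clinical_term": False},
    {"explicit_term": True, "clinical_term": False},
    {"explicit_term": True, "clinical_term": True},
    {"explicit_term": False, "clinical_term": True},
    {"explicit_term": False, "clinical_term": True},
]
labels = ["porn", "porn", "medical", "medical", "medical"]

tree = id3(rows, labels, ["explicit_term", "clinical_term"])
print(tree["feature"])
print(classify(tree, {"explicit_term": True, "clinical_term": True}))
```

On this toy data the hypothetical `clinical_term` feature has the higher information gain, so a page that contains both explicit and clinical vocabulary is routed to the medical class rather than being filtered. Each root-to-leaf path can be read as one rule of the kind the abstract calls an association rule.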
    Relation: Project period: 2014/08/01~2015/07/31
    103-2410-H-004-112
    Data Type: report
    Appears in Collections:[Department of Radio & Television & Graduate Program] NSC Projects
