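Abstract: | This project investigates privacy-preserving WebID analytics on the decentralized Social Web. We first argue why personal data management should rely on open, decentralized control rather than closed, centralized control. We then propose a policy-aware system architecture in which a data owner can select a trusted data controller to anonymize the WebID that carries his or her personal data and social network context. These anonymized WebID datasets are made available for big data analytics as RDF(S) linked data. In addition, we adopt RHadoop, an analytics platform that combines R and Hadoop, to analyze large-scale, RDF(S)-based decentralized social datasets effectively. Finally, we design and implement three types of machine-executable policies needed to control WebID datasets: access control policies for data users, data handling policies, and data disclosure policies. These policies can invoke the RHadoop analytics modules and further balance data utility against personal data protection. These results were published at the IEEE Web Intelligence-2014 international conference in Warsaw, Poland. We have also completed the first draft of another paper, Propagation Control Services for WebID Analytics on the Decentralized Social Web, and are preparing to submit it to a relevant international computer science conference or edited book. The draft extends the published paper and analyzes, from the viewpoint of propagation control services, the relationships among the participants in a decentralized social network: data owners, data controllers, and data users. We reuse the three types of WebID dataset control policies and stress that they must be enforced in an accountable and transparent manner to provide data propagation control services. Finally, we point out how these three types of policies can call WebID propagation control service modules along the information value chain to resolve the conflict between WebID data protection and utility. For the detailed research objectives, literature review, research methods and steps, conclusions, and future work of this project, Crafting the Balance between Big Data Analytics Utility and Protection in the Semantic Data Cloud, please refer to the paper published at the IEEE Web Intelligence-2014 international conference, Privacy-Preserving WebID Analytics on the Decentralized Policy-Aware Social Web (https://dl.acm.org/citation.cfm?id=2682811 ), and to another paper under submission, Propagation Control Services for WebID Analytics on the Decentralized Social Web. In addition, the 2014 (ROC year 103) master's thesis by graduate student Jhao-Siang Sun (孫肇祥), Using R and Hadoop/MapReduce for FOAF-based Social Network Analytics, is also one of the results of this project. We address the research challenges of privacy-preserving WebID analytics on the decentralized Social Web. We first argue why personal data management should use open, decentralized control rather than closed, centralized control. Then, we present a policy-aware architecture in which a data owner hand-picks a trusted data controller to mask his/her personally identifiable information (PII) and other sensitive social relationships of the WebID, so that only anonymous RDF(S) linked datasets are available for analytics. Moreover, we advocate using an R and Hadoop integration paradigm, called RHadoop, for effective hybrid WebID analytics of large-scale social network linked datasets. 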
Finally, we propose several types of semantics-enabled policies that invoke the RHadoop hybrid WebID analytics and further balance data utility and protection on the privacy-aware Social Web. The primary stakeholders in WebID analytics are the data owner, the data controller, and the data user. Three types of semantics-enabled policy are proposed and enforced by data controllers to enable access control, data handling, and data releasing actions on the WebID datasets. Policy enforcement at the data controllers should be accountable and transparent in order to provide WebID propagation control services. Each data controller enforces a data handling policy to anonymize massive numbers of WebIDs. Moreover, the super data controller enforces access control and data releasing policies to ensure that the data owners receive privacy-preserving WebID analytics services. Finally, we point out how to resolve the conflict between WebID protection and utility by using the different types of semantics-enabled policy to call WebID propagation control services at the data controllers of an information value chain. For more detailed information about the research results of this project, Crafting the Balance between Big Data Analytics Utility and Protection in the Semantic Data Cloud, MOST 102-2221-E-004-014-, please refer to the paper published at the IEEE International Conference on Web Intelligence-2014, Warsaw, Poland (https://dl.acm.org/citation.cfm?id=2682811 ) and to another article under submission, Propagation Control Services for WebID Analytics on the Decentralized Social Web. A master's thesis, Using R and Hadoop/MapReduce for FOAF-based Social Network Analytics, submitted by Jhao-Siang Sun, is also one of the results of this project. |