Abstract: | Information retrieval (IR) is the activity of obtaining information resources relevant to an information need from a collection of information resources. With IR techniques now widespread, a large body of research applies them to analyzing information in different domains, such as extracting useful information from news and financial reports. In finance, soft information usually refers to text, including opinions, ideas, and market commentary, whereas hard information refers to the numbers recorded in financial reports or to financial measures (such as return and volatility). In this project, I plan to use both the soft information in financial reports and hard information (such as a company's historical stock volatility) to rank companies by risk. Financial risk is the potential that a chosen investment instrument (e.g., a stock) will lead to a loss. In finance, volatility is a commonly used empirical measure of risk, and it varies with a number of factors. Given a collection of texts, the first goal of this project is to rank the entities associated with the texts according to a real-world quantity. In this proposal, the texts can be financial reports, which publicly traded companies publish annually; the entities can be the companies themselves, and the quantity can be the future volatility of stock returns.
Specifically, I will divide the companies' stock return volatilities within a year into risk levels, which can be regarded as the relative risk among the companies. After this split, I will use the soft information in the financial reports to rank the companies so that the ranking agrees with their relative risk levels. In addition, using the trained model, I will try to identify meaningful terms and analyze the relations between these terms and a company's risk. Finally, beyond the soft information in financial reports, the second goal of this project is to incorporate hard information, such as a company's historical volatilities, into the objective function of the learning process, in the hope of obtaining better ranking results. |
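The step of turning volatilities into relative risk levels can be illustrated with a minimal sketch. The abstract does not say how volatility is computed or how many levels are used, so everything here is an assumption made for illustration: volatility is taken as the sample standard deviation of daily log returns, the levels are equal-sized rank buckets, and the function names `volatility` and `risk_levels` as well as the toy price series are hypothetical.

```python
import math
import statistics


def volatility(prices):
    """Sample standard deviation of log returns (an assumed definition)."""
    returns = [math.log(b / a) for a, b in zip(prices, prices[1:])]
    return statistics.stdev(returns)


def risk_levels(vols, n_levels=3):
    """Assign each company a level 0..n_levels-1 by its volatility rank.

    Level 0 holds the least volatile companies; the number of levels
    is an assumption, not a value taken from the proposal.
    """
    ordered = sorted(vols, key=vols.get)  # company names, ascending volatility
    return {name: i * n_levels // len(ordered) for i, name in enumerate(ordered)}


# Toy closing prices for three hypothetical companies within one year.
prices = {
    "A": [100, 101, 100, 101, 100],
    "B": [100, 110, 95, 112, 90],
    "C": [100, 103, 99, 104, 98],
}

vols = {name: volatility(series) for name, series in prices.items()}
levels = risk_levels(vols)
print(levels)
```

The resulting levels only encode the relative order of the companies within the year, which is the kind of target a text-based ranking model could then be trained to reproduce.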