Please use this identifier to cite or link to this item:
https://nccur.lib.nccu.edu.tw/handle/140.119/94721
Title: | A study of "Spaghetti" PCA for time-dependent interval data applied to stock prices in Taiwan |
Authors: | Chiu, Ta Ching (邱大倞) |
Contributors: | Liu, Hui Mei (劉惠美); Cheng, Tsung Chi (鄭宗記); Chiu, Ta Ching (邱大倞) |
Keywords: | Principal component analysis; Interval data; Time dependent; Oriented intervals |
Date: | 2009 |
Issue Date: | 2016-05-09 11:37:47 (UTC+8) |
Abstract: | Interval data are generally defined by the upper and lower values taken by a unit on a continuous variable. In this study, we consider a special type of interval description that depends on time. The original idea (Irpino, 2006, Pattern Recognition Letters, 27, 504-513) is that each observation is characterized by an oriented interval with a starting value and a closing value for each period of observation: for example, the opening and closing prices of a stock in a given week. Several factorial methods have been developed for interval data, but none yet for oriented intervals. Irpino presented an extension of principal component analysis to time-dependent interval data or, more generally, to oriented intervals. From a geometrical point of view, the approach can be regarded as an analysis of oriented segments (nicely called "spaghetti") in a multidimensional space whose dimensions are the observation periods. In this thesis, we extend the approach further: besides the weekly opening and closing prices, we also use the weekly highest and lowest prices, to recover information that the original formulation leaves out. We also replace the uniform distribution used by Irpino with a beta distribution, to check whether this yields a noticeable improvement. These extensions require lengthy computations of means, variances and covariances, from which the correlation matrix used for PCA is obtained. Considering the value of the information gained, the extension that adds the highest and lowest prices is the best choice. |
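The abstract only outlines the computations behind Spaghetti PCA. The following is a minimal Python sketch of the basic (opening/closing) case, under our reading of Irpino (2006): each unit is a segment in R^p running from its vector of opening prices to its vector of closing prices, and a point on the segment is start + t * (end - start) with one parameter t shared by all periods. Taking t uniform reproduces the uniform case; taking t ~ Beta(a, b) is one assumed way to express the beta variant. The function names `segment_moments` and `spaghetti_pca` and the `a`, `b` parameterization are hypothetical, not the thesis's code, and the high/low extension is not reproduced here.

```python
import numpy as np


def segment_moments(start, end, a=1.0, b=1.0):
    """Mean vector and covariance matrix of n oriented segments in R^p.

    start, end: arrays of shape (n, p); row i is one unit (e.g. a stock),
    column j is one period (e.g. a week). A point on segment i is
    x_i(t) = start_i + t * (end_i - start_i), with a single t ~ Beta(a, b)
    shared by all periods (assumption); a = b = 1 is the uniform case.
    """
    s = np.asarray(start, dtype=float)
    d = np.asarray(end, dtype=float) - s           # direction of each segment
    m1 = a / (a + b)                               # E[t]
    m2 = a * (a + 1) / ((a + b) * (a + b + 1))     # E[t^2]
    n = s.shape[0]
    mean = (s + m1 * d).mean(axis=0)
    # Average over units of E[x_j x_k] = s_j s_k + E[t](s_j d_k + d_j s_k) + E[t^2] d_j d_k
    second = (s.T @ s + m1 * (s.T @ d + d.T @ s) + m2 * (d.T @ d)) / n
    cov = second - np.outer(mean, mean)
    return mean, cov


def spaghetti_pca(start, end, a=1.0, b=1.0):
    """PCA on the correlation matrix derived from segment_moments.

    Returns eigenvalues (descending), eigenvectors (columns), and the scores
    of the standardized start and end points. Because the projection is
    linear, each segment maps to the segment joining its projected endpoints.
    """
    mean, cov = segment_moments(start, end, a, b)
    sd = np.sqrt(np.diag(cov))
    corr = cov / np.outer(sd, sd)
    eigval, eigvec = np.linalg.eigh(corr)          # eigh returns ascending order
    order = np.argsort(eigval)[::-1]
    eigval, eigvec = eigval[order], eigvec[:, order]
    zs = (np.asarray(start, dtype=float) - mean) / sd
    ze = (np.asarray(end, dtype=float) - mean) / sd
    return eigval, eigvec, zs @ eigvec, ze @ eigvec


# Hypothetical opening and closing prices: 3 stocks x 2 weeks (made-up numbers)
opening = [[50.0, 52.0], [30.0, 29.5], [80.0, 83.0]]
closing = [[52.0, 51.0], [29.0, 31.0], [82.0, 84.5]]
values, vectors, start_scores, end_scores = spaghetti_pca(opening, closing)
```

Writing the uniform case as Beta(1, 1) lets one set of formulas cover both distributions: only E[t] and E[t^2] change. How the thesis actually parameterizes its beta distribution is not stated in this record.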
Reference: | Cazes, P., Chouakria, A., Diday, E. & Schektman, Y. (1997) “Extension de l’analyse en composantes principales à des données de type intervalle”, Revue de Statistique Appliquée, XLV, 3, 5-24.
Chen, P.D. (2009) “An extension of Spaghetti PCA for time dependent interval data”, Master's thesis, National Chengchi University, Taipei, Taiwan, R.O.C.
Diday, E. (1987) “Introduction à l’approche symbolique en analyse des données”, Premières Journées Symbolique-Numérique, Université de Paris IX Dauphine.
Diday, E. (2002) “An Introduction to Symbolic Data Analysis and the Sodas Software”, Journal of Symbolic Data Analysis, 0, ISSN 1723-5081.
Gioia, F. & Lauro, C.N. (2005) “Basic statistical methods for interval data”, Statistica Applicata [Italian Journal of Applied Statistics], 17, 1, 75-104.
Gioia, F. & Lauro, C.N. (2006) “Principal component analysis on interval data”, Computational Statistics, 21, 2, 343-363.
Goupil, F., Touati, M., Diday, E. & Van Der Veen, H. (2000) “Symbolic Analysis of Financial Data”.
Irpino, A. (2006) “Spaghetti PCA analysis: An extension of principal components analysis to time dependent interval data”, Pattern Recognition Letters, 27, 504-513.
Lauro, C.N. & Palumbo, F. (1998) “New approaches to principal component analysis to interval data”, International Seminar on New Techniques & Technologies for Statistics, NTTS’98, 4-6 November 1998, Sorrento, Italy.
Lauro, C.N. & Palumbo, F. (2000) “Principal Component Analysis of Interval Data: A Symbolic Data Analysis Approach”, Computational Statistics, 15, 1, 73-87.
Lauro, C.N. & Palumbo, F. (2003) “Some results and new perspectives in Principal Component Analysis for interval data”, Atti del Convegno CLADAG’03, Gruppo di Classificazione della Società Italiana di Statistica, 237-244.
Palumbo, F. & Lauro, C.N. (2003) “A PCA for interval valued data based on midpoints and radii”, New developments in Psychometrics, Yanai H. et al. eds., Psychometric Society, Springer-Verlag, Tokyo.
Zuccolotto, P. (2007) “Principal components of sample estimates: an approach through symbolic data analysis”, Statistical Methods & Applications, 16, 173-192. |
Description: | Master's thesis, Department of Statistics, National Chengchi University, 96354016 |
Source URI: | http://thesis.lib.nccu.edu.tw/record/#G0096354016 |
Data Type: | thesis |
Appears in Collections: | [Department of Statistics] Theses and Dissertations
All items in the NCCU Institutional Repository (政大典藏) are protected by copyright, with all rights reserved.