Please use this identifier to cite or link to this item:
https://nccur.lib.nccu.edu.tw/handle/140.119/142100
Title: | Predict Disaster during Typhoons and Floods with Meteorological data by using Logistic Regression and Decision Tree models |
Authors: | 王崇飛 Wang, Chung-Fei |
Contributors: | Chang, Jia-Ming (張家銘); Wang, Chung-Fei (王崇飛) |
Keywords: | Disaster; Weather; Logistic Regression; Decision Tree; Rainfall; Wind speed; Typhoon; Flood |
Date: | 2022 |
Issue Date: | 2022-10-05 09:09:06 (UTC+8) |
Abstract: | Because of its geographical location, Taiwan faces typhoons, floods, and other natural disasters every year. Although disasters cannot be prevented, technology can be used to reduce the threats and damage they cause.

In recent years, growing computing power has made big data, artificial intelligence, and machine learning popular topics, yet data analysis has rarely been applied to disaster and meteorological data in disaster prevention and response. This study therefore builds models from meteorological data and disaster information using Logistic Regression and Decision Tree.

The study first collects historical meteorological data, disaster information, and weather-station data, then cleans them by unifying formats and fields and removing irrelevant records. The cleaned data are integrated per weather station according to how they relate to one another, forming the baseline dataset for subsequent statistical analysis and modeling.

Each model uses a station's meteorological data as the independent variables and the disaster data as the dependent variable. The training and test sets are split using different sampling methods to build a model for each weather station. The model then produces predictions for the test set, and a confusion matrix is used to compare accuracy, precision, recall, and F1-Score under different conditions.

The highest average accuracy is 99.7%, the highest average precision is 67.9%, the highest average recall is 81.9%, and the highest average F1-Score is 48.6%. Looking at individual stations, the highest F1-Score is 96.6%, at station C0M730 (East District, Chiayi City). Besides C0M730, 60 stations reached the expected performance (F1-Score > 80%) among the 224 models built in this study. The stations that fall short will be refined in future work using other algorithms or sampling methods.

Although technology cannot change the climate, it can change how we prepare for and respond to it: the best preparation is how we face the worst case. |
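The abstract reports a highest average accuracy of 99.7% next to a highest average F1-Score of 48.6%. A gap like that typically appears when disaster-labelled records are rare, because predicting "no disaster" almost everywhere already scores high accuracy. The sketch below shows how the four confusion-matrix metrics are computed; the function names and the counts for the single "station" are hypothetical illustrations, not data or code from the thesis.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) for a binary disaster / no-disaster label."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1  # predicted disaster, disaster occurred
        elif pred == positive:
            fp += 1  # predicted disaster, none occurred (false alarm)
        elif truth == positive:
            fn += 1  # disaster occurred but was missed
        else:
            tn += 1  # correctly predicted no disaster
    return tp, fp, fn, tn


def scores(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Accuracy, precision, recall and F1-Score; 0.0 where a ratio is undefined."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical test set for one station: 1,000 records, only 10 labelled as disaster.
y_true = [1] * 10 + [0] * 990
# Hypothetical predictions: 4 disasters caught, 6 missed, 4 false alarms.
y_pred = [1] * 4 + [0] * 6 + [1] * 4 + [0] * 986

counts = confusion_counts(y_true, y_pred)
result = scores(*counts)
print(counts)  # (4, 4, 6, 986)
print({name: round(value, 3) for name, value in result.items()})
# {'accuracy': 0.99, 'precision': 0.5, 'recall': 0.4, 'f1': 0.444}
```

In this made-up case accuracy is 99% while F1-Score is about 44%, which is why precision, recall, and F1-Score are more informative than accuracy for rare disaster events.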
Reference: |
[1] National Fire Agency, Ministry of the Interior, 全民防災E點通 (disaster-prevention portal), historical disasters section. Retrieved from: https://bear.emic.gov.tw/MY/#/home/disasterInfo/history
[2] National Science and Technology Center for Disaster Reduction, 全球災害事件簿 (global disaster event records), typhoon events. Retrieved from: https://den.ncdr.nat.gov.tw/1132/1188/
[3] Civil IoT Taiwan, Data Service Platform. Retrieved from: https://ci.taiwan.gov.tw/dsp/index.aspx
[4] Central Weather Bureau, observation station codes and station status lookup. Retrieved from: https://e-service.cwb.gov.tw/wdps/obs/state.htm
[5] Ministry of the Interior, TGOS national address geocoding service. Retrieved from: https://www.tgos.tw/tgos/Web/Address/TGOS_Address.aspx
[6] Python pandas, DataFrame. Retrieved from: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html
[7] scikit-learn, LogisticRegression. Retrieved from: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
[8] scikit-learn, Decision Trees. Retrieved from: https://scikit-learn.org/stable/modules/Tree.html
[9] imbalanced-learn, SMOTE. Retrieved from: https://imbalanced-learn.org/stable/references/generated/imblearn.over_sampling.SMOTE.html
[10] imbalanced-learn, TomekLinks. Retrieved from: https://imbalanced-learn.org/stable/references/generated/imblearn.under_sampling.TomekLinks.html
[11] imbalanced-learn, Combination of over- and under-sampling methods. Retrieved from: https://imbalanced-learn.org/stable/references/combine.html
[12] Matplotlib, 3D scatterplot. Retrieved from: https://matplotlib.org/stable/gallery/mplot3d/scatter3d.html |
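The abstract says the training and test sets were split with different sampling methods, and the reference list points to imbalanced-learn's SMOTE, TomekLinks, and combined over/under-sampling. As a rough illustration of the oversampling idea only, here is a from-scratch SMOTE-style sketch; it is not imbalanced-learn's implementation and not necessarily how the thesis applied it. The function `smote`, its parameters, and the (rainfall, wind speed) rows are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def smote(
    minority: Sequence[Point], n_synthetic: int, k: int = 5, seed: int = 0
) -> List[Point]:
    """SMOTE-style oversampling: each synthetic point lies on the segment between
    a randomly chosen minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority-class samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic: List[Point] = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to `base`.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical disaster-labelled training rows: (rainfall in mm, wind speed in m/s).
disaster_rows = [(120.0, 18.0), (150.0, 22.0), (90.0, 15.0), (200.0, 25.0)]
new_rows = smote(disaster_rows, n_synthetic=6, k=2)
for row in new_rows:
    print(tuple(round(v, 1) for v in row))
```

Oversampling like this would normally be applied to the training split only, so the test set keeps the real class ratio. Because distances mix millimetres and metres per second, features would usually be standardized first; with the library itself, `imblearn.over_sampling.SMOTE(k_neighbors=...).fit_resample(X, y)` performs the resampling.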
Description: | Master's thesis, In-service Master's Program, Department of Computer Science, National Chengchi University, 106971017 |
Source URI: | http://thesis.lib.nccu.edu.tw/record/#G0106971017 |
Data Type: | thesis |
DOI: | 10.6814/NCCU202201508 |
Appears in Collections: | [In-service Master's Program, Department of Computer Science] Theses
Files in This Item:
File | Description | Size | Format |
101701.pdf | | 5686Kb | Adobe PDF |
All items in the NCCU Institutional Repository (政大典藏) are protected by copyright, with all rights reserved.