Please use this identifier to cite or link to this item:
https://nccur.lib.nccu.edu.tw/handle/140.119/157107
Title: | CapST: Leveraging Capsule Networks and Temporal Attention for Accurate Model Attribution in Deep-fake Videos |
Authors: | Ahmad, Wasim (汪新); Peng, Yan Tsung; Chang, Yuan-Hao; Ganfure, Gaddisa Olani; Khan, Sarwar
Contributors: | TIGP in Social Networks and Human-Centered Computing, fifth-year Ph.D.
Date: | 2025-04 |
Issue Date: | 2025-05-27 11:09:35 (UTC+8) |
Abstract: | Deep-fake videos, generated through AI face-swapping techniques, have garnered considerable attention due to their potential for impactful impersonation attacks. While existing research primarily distinguishes real from fake videos, attributing a deep-fake to its specific generation model or encoder is crucial for forensic investigation, enabling precise source tracing and tailored countermeasures. This approach not only enhances detection accuracy by leveraging unique model-specific artifacts but also provides insights essential for developing proactive defenses against evolving deep-fake techniques. Addressing this gap, this article investigates the model attribution problem for deep-fake videos using two datasets, Deepfakes from Different Models (DFDM) and GANGen-Detection, which comprise deep-fake videos and images generated by GAN models. We select only the fake images from the GANGen-Detection dataset to align with DFDM, consistent with the goal of this study: model attribution rather than real/fake classification. This study formulates deep-fake model attribution as a multiclass classification task, introducing a novel Capsule-Spatial-Temporal (CapST) model that effectively integrates a modified VGG19 (utilizing only the first 26 out of 52 layers) for feature extraction, combined with Capsule Networks and a Spatio-Temporal attention mechanism. The Capsule module captures intricate feature hierarchies, enabling robust identification of deep-fake attributes, while a video-level fusion technique leverages temporal attention mechanisms to process concatenated feature vectors and capture temporal dependencies in deep-fake videos. By aggregating insights across frames, our model achieves a comprehensive understanding of video content, resulting in more precise predictions. Experimental results on the DFDM and GANGen-Detection datasets demonstrate the efficacy of CapST, achieving substantial improvements in accurately categorizing deep-fake videos over baseline models, all while demanding fewer computational resources.
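
The abstract describes the CapST pipeline as a truncated VGG19 backbone, a capsule module, and temporal attention that fuses per-frame features into a single video-level attribution prediction. The following is a minimal, hypothetical PyTorch sketch of that pipeline, written only from the abstract: the class name CapSTSketch, the capsule count and dimension, the spatial pooling, and the default number of attribution classes are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the CapST idea from the abstract: truncated VGG19
# features -> capsule-style vectors -> temporal attention over frames.
# All layer sizes and hyperparameters below are assumptions for illustration.
import torch
import torch.nn as nn
from torchvision.models import vgg19

class CapSTSketch(nn.Module):
    def __init__(self, num_classes=5, num_capsules=8, capsule_dim=16):
        super().__init__()
        # Keep only the early VGG19 convolutional layers, as the abstract
        # describes a truncated backbone (exact cut point is an assumption).
        self.backbone = nn.Sequential(*list(vgg19(weights=None).features.children())[:26])
        # Project feature maps into "primary capsule" vectors.
        self.primary_caps = nn.Conv2d(512, num_capsules * capsule_dim, kernel_size=1)
        self.num_capsules, self.capsule_dim = num_capsules, capsule_dim
        feat_dim = num_capsules * capsule_dim
        # Temporal attention: one scalar score per frame feature vector.
        self.attn = nn.Linear(feat_dim, 1)
        self.classifier = nn.Linear(feat_dim, num_classes)

    @staticmethod
    def squash(v, dim=-1):
        # Capsule squashing nonlinearity: keeps direction, bounds vector length.
        norm_sq = (v ** 2).sum(dim=dim, keepdim=True)
        return (norm_sq / (1.0 + norm_sq)) * v / (norm_sq.sqrt() + 1e-8)

    def forward(self, video):                       # video: (B, T, 3, H, W)
        b, t = video.shape[:2]
        x = video.flatten(0, 1)                     # (B*T, 3, H, W)
        x = self.backbone(x)                        # (B*T, 512, h, w)
        x = self.primary_caps(x)                    # (B*T, C*D, h, w)
        x = x.flatten(2).mean(-1)                   # global average pool over space
        x = self.squash(x.view(b * t, self.num_capsules, self.capsule_dim))
        frame_feats = x.flatten(1).view(b, t, -1)   # (B, T, C*D) per-frame features
        weights = torch.softmax(self.attn(frame_feats), dim=1)  # (B, T, 1)
        video_feat = (weights * frame_feats).sum(dim=1)         # attention-weighted fusion
        return self.classifier(video_feat)          # per-video attribution logits
```

As a usage example, calling the module on a clip tensor such as torch.randn(2, 8, 3, 112, 112) (two videos of eight frames) returns one logit vector per video; the softmax over the frame dimension is what realizes the attention-weighted temporal aggregation the abstract describes.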
Relation: | ACM Transactions on Multimedia Computing, Communications and Applications, Vol.21, No.4, pp.1-23 |
Data Type: | article |
DOI Link: | https://doi.org/10.1145/3715138
DOI: | 10.1145/3715138 |
Appears in Collections: | [International Doctoral Degree Program in Social Networks and Human-Centered Computing (TIGP)] Journal Articles
Files in This Item:
File | Description | Size | Format
index.html | | 0Kb | HTML
All items in 政大典藏 are protected by copyright, with all rights reserved.