DETECTION ALGORITHMS AND MODELS FOR SYNTHETICALLY GENERATED VISUAL MEDIA

Keywords

Deepfake detection, synthetic visual media, spatial-temporal analysis, Vision Transformers, Convolutional Neural Networks, digital forensics, transfer learning.

How to Cite

Khakimbekov, D. (2026). DETECTION ALGORITHMS AND MODELS FOR SYNTHETICALLY GENERATED VISUAL MEDIA. INTERNATIONAL MULTIDISCIPLINARY SCIENCE CONFERENCE, 1(3), 9-12. https://doi.org/10.5281/zenodo.18942887

Abstract

In recent years, the digital landscape has witnessed a surge in the use of Generative Adversarial Networks (GANs) and diffusion models to synthesize hyper-realistic visual content. While these technologies have legitimate applications, they are frequently exploited to bypass biometric access control systems, manipulate digital evidence, and execute sophisticated social engineering attacks. Consequently, detecting synthetically generated visual media has become a critical task in cybersecurity and system analysis. The most widely deployed detection algorithms currently operate on a spatial paradigm: they isolate individual video frames and use deep learning models to identify localized blending artifacts, lighting inconsistencies, or resolution discrepancies. However, modern generative models have evolved to produce near-perfect single images. Furthermore, when synthetic videos are transmitted across information systems or social media platforms, they undergo heavy compression, which erases the microscopic pixel-level artifacts that spatial detectors rely upon. To overcome these limitations, this research shifts the detection paradigm from purely spatial artifact hunting to continuous temporal logic analysis. The fundamental premise of this work is that while an AI generator can perfectly synthesize an isolated frame, it struggles to maintain physiological consistency and natural movement dynamics across a continuous sequence of frames.
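The abstract's premise — that a generator can render each frame convincingly yet fail to keep motion smooth across frames — can be illustrated with a deliberately minimal sketch. The function name, the grayscale-frame input, and the jitter-based score below are illustrative assumptions of this page, not the paper's actual model, which would operate on learned CNN/Transformer features rather than raw pixel differences:

```python
import numpy as np

def temporal_consistency_score(frames: np.ndarray) -> float:
    """Score temporal jitter in a video clip.

    frames: array of shape (T, H, W), grayscale frames in [0, 1].
    Returns a non-negative score; higher values indicate erratic
    frame-to-frame motion, a crude proxy for synthetic sequences.
    """
    # First-order change: mean absolute pixel difference per transition.
    diffs = np.abs(np.diff(frames.astype(np.float32), axis=0))  # (T-1, H, W)
    motion = diffs.mean(axis=(1, 2))                            # (T-1,)
    # Second-order change: how much the motion magnitude itself fluctuates.
    jitter = np.abs(np.diff(motion))                            # (T-2,)
    # Normalize jitter by average motion so the score is scale-invariant.
    return float(jitter.mean() / (motion.mean() + 1e-8))
```

A real video with natural dynamics yields a smoothly varying `motion` curve and hence a low score, while a sequence of independently plausible but temporally incoherent frames yields a high one; the paper's spatial-temporal models learn far richer versions of this signal.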

