Monday, February 9, 2026
Panda with Pixels
The research team corrupted an image of a giant panda (original displayed below) with noise, in this case small pixel perturbations. The noise confused the AI model into misclassifying the image as a bubble, even though humans can still easily recognize it as a panda.

University of Iowa engineering and mathematics researchers have discovered mathematical evidence explaining why even the most advanced artificial intelligence systems are still easy to trick. Their study shows that deep neural networks, such as those used for tasks like image recognition, are fundamentally limited when it comes to high-dimensional data.

Even the smallest changes to an image can push AI systems into making incorrect predictions. This phenomenon, called adversarial fragility, has puzzled researchers for more than a decade, and why it happens has remained a mystery.

“We now have a clearer understanding of why neural networks behave this way with our mathematical analysis and numerical experiments,” said Weiyu Xu, corresponding author of the study. Xu is a UI professor of electrical and computer engineering (ECE) and applied mathematical and computational sciences (AMCS).

Ziqing Lu, a co-first author of the study and a PhD candidate mentored by Xu, noted, “Our results show that adversarial fragility isn’t just because of some non-robust features in training data as suggested by previous research. It’s a mathematical consequence of the trained neural network performing feature compression and effectively only looking at a small subset of meaningful features.”

Image of the original giant panda

The hope is that these new insights provide a foundation for safer and more reliable AI systems, Lu said.

The paper articulating the findings has been accepted to the 14th International Conference on Learning Representations (ICLR), one of the world’s leading venues for cutting-edge research in machine learning and artificial intelligence. The conference will be held in April in Rio de Janeiro, Brazil.

In the study, the team shows that a neural network’s worst-case robustness can be significantly worse than the robustness of an ideal, theoretically optimal classifier. Their matrix-theoretic analysis demonstrates how increasing input dimensions can inherently weaken a network’s resistance to adversarial manipulation.
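The dimension effect can be illustrated with a toy calculation that is separate from the paper’s actual analysis: for a simple linear classifier, the per-pixel change an attacker needs to flip the decision shrinks as the number of input dimensions grows. The short Python sketch below is a hypothetical illustration of that scaling; the weights, inputs, and perturbation budget are invented for the example and do not come from the study.

```python
import numpy as np

# Toy illustration (not the paper's construction): for a linear classifier
# sign(w . x), pushing every pixel by eps in the worst-case direction shifts
# the score by eps * ||w||_1. Because ||w||_1 grows with the input dimension d,
# the per-pixel change needed to cross the decision boundary shrinks as d grows.
rng = np.random.default_rng(0)

for d in (100, 10_000, 1_000_000):           # hypothetical input sizes (pixel counts)
    w = rng.normal(size=d) / np.sqrt(d)       # made-up weight vector, unit-scale norm
    x = rng.normal(size=d)                    # made-up input image, flattened
    margin = abs(w @ x)                       # how far the score sits from the boundary
    eps = 1.01 * margin / np.abs(w).sum()     # per-pixel budget just large enough to flip
    x_adv = x - np.sign(w @ x) * eps * np.sign(w)
    flipped = np.sign(w @ x_adv) != np.sign(w @ x)
    print(f"d={d:>9,}  per-pixel change needed: {eps:.5f}  decision flipped: {flipped}")
```

In this toy setting, the per-pixel change falls roughly like one over the square root of the dimension, which mirrors, in a much simpler form, the broader point that higher-dimensional inputs leave more room for tiny, coordinated perturbations.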

In addition to Xu and Lu, authors include Jingchao Gao, Raghu Mudumbai, Xiaodong Wu, Jirong Yi, Catherine Xu, Myung Cho and Hui Xie from the University of Iowa’s Department of Electrical and Computer Engineering and Program of Applied Mathematical and Computational Sciences.