Abstract:
Adversarial attacks are the subject of a rapidly growing field that
studies how intentionally crafted inputs can fool machine
learning models. This has severe implications for the
security of machine learning systems, because it allows attackers
to bypass safeguards and cause a system to
malfunction. Finding solutions for these attacks involves
creating specific attack scenarios using a particular dataset and
training a model based on that dataset. Adversarial attacks on a
trained model can significantly reduce its accuracy by
manipulating the decision boundary, causing instances that were
initially classified correctly to be misclassified, resulting in
a notable decline in the model's ability to classify instances
accurately after an attack. Studying this process helps us develop
strategies to defend against these attacks. However, a significant
challenge arises because generating these attack scenarios for a
specific dataset is time-consuming. Moreover, the disparity
between the model's prediction outcomes before and after the
attack tends to lack clear interpretability. The common factor
underlying both limitations is time: the longer it takes to devise
a solution, the more opportunity an attacker has to cause harm in real-world
situations. In this paper, we propose two approaches to address
the above gaps: minimizing the time required for attack
generation using data augmentation and understanding the
effects of an attack on the model's decision-making process by
generating more interpretable descriptions. We show that
these descriptions can be used to gain insight into how an attack
affects the model's decision-making process by identifying the
most critical features for the model's prediction before and after
the attack. Our work can potentially improve the security of
machine learning systems by making it more difficult for
attackers to generate effective attacks.