Combatting and detecting FGSM and PGD adversarial noise.

Here I have tested and expanded on some of my earlier ideas for dealing with adversarial noise. Here is a list of published defences. This group has made a package for testing the robustness of models against different types of adversarial noise, standardising a measure of robustness. The published defences have been tested against fast gradient sign method (FGSM) noise and projected gradient descent (PGD) noise, described here. In my previous blogs, I tested noise that was generated for trained models and saved to a file, for each class and for different epsilon values. This might not be state of the art, but it does successfully corrupt inputs and allows for much faster experimenting. In this blog, I test these same models against noise that is generated for each datapoint individually. As I showed in the second blog, some input data is easier to corrupt, and some classes are easier to target.

The novel changes made to the models are removing the softmax function and applying a quadratic cost function to the final linear layer of a deep neural network; I refer to these as linear-quadratic (LQ) models. They are compared to models with a softmax function trained using cross entropy, which I refer to as softmax-cross entropy (SCE) models. I have also pruned the models, which means removing weights by setting them to 0 based on the smallest mean activated value (LQ Pruned). The process for pruning is explained in my first blog here. Notebooks used for previous blogs use a framework called parana, which I built, for building and pruning models. For the notebook RobustML test.ipynb, found in this GitHub repo, the models have been loaded into .py files in TensorFlow, so the pre-trained models can be tested by anyone.
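To make the attack setup concrete, here is a minimal numpy sketch of per-datapoint FGSM and PGD noise for a toy linear model with a quadratic cost. The weights `W`, step size `alpha`, and the linear model itself are illustrative stand-ins, not the actual models from the repo.

```python
import numpy as np

def fgsm_linear(W, x, target, eps):
    """One-step FGSM for a toy linear model logits = W @ x with quadratic cost."""
    logits = W @ x
    grad_logits = 2.0 * (logits - target)    # d(quadratic cost)/d(logits)
    grad_x = W.T @ grad_logits               # chain rule back to the input
    return np.clip(x + eps * np.sign(grad_x), 0.0, 1.0)

def pgd_linear(W, x, target, eps, steps=10, alpha=0.01):
    """Iterated signed-gradient steps, projected back into the eps-ball around x."""
    x_adv = x.copy()
    for _ in range(steps):
        grad_x = W.T @ (2.0 * (W @ x_adv - target))
        x_adv = x_adv + alpha * np.sign(grad_x)
        x_adv = np.clip(x_adv, x - eps, x + eps)   # projection step
        x_adv = np.clip(x_adv, 0.0, 1.0)           # keep valid pixel range
    return x_adv
```

The real attacks maximise the training loss rather than moving towards a target, but the signed-gradient-plus-projection structure is the same.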

The advantage of these methods is that they do not require a great deal of extra work, and they are agnostic to the type of noise used as an attack. Extra training is not required, and pruning these models is very fast compared to the time taken to train them. While I can't guarantee that these methods increase robustness to all types of noise, robustness improved for every type of noise tested.
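For reference, the two cost functions being compared can be sketched as follows. `lq_loss` is my reading of the quadratic-cost-on-logits change described above, and `sce_loss` is the standard softmax-cross entropy baseline; both take one-hot targets.

```python
import numpy as np

def lq_loss(logits, targets):
    # quadratic cost applied directly to the final linear layer (no softmax)
    return np.mean(np.sum((logits - targets) ** 2, axis=1))

def sce_loss(logits, targets):
    # standard softmax + cross entropy, for comparison
    z = logits - logits.max(axis=1, keepdims=True)   # stabilise the exponentials
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -np.mean(np.sum(targets * log_probs, axis=1))
```

Note that the LQ loss is minimised when the logits exactly equal the one-hot target, whereas the SCE loss keeps pushing the correct logit arbitrarily far above the others.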

Robustness improvements

Here are some tables comparing models with the same sized weight matrices trained with different cost functions and pruned.

Fully connected network

FGSM noise


PGD noise


Convolutional network

FGSM noise


PGD noise



There are some robustness improvements to be made by switching the output layer/cost function, though not quite enough to be considered a robust model. Pruning adds big improvements for these fully connected models, but less for the convolutional models, whose pruned results have not been shown here. The robustness improvements I found were in the 1%-2% range, which could be well within the margin of noise. Part of the reason for this is that convolutional models have better robustness to begin with, but also that pruning the initial layers has a much bigger influence on robustness, and here the fully connected layers come after 3 convolutional layers. This is shown clearly in my first blog. I have had some success pruning weights from convolutional layers, but it is much more difficult, with smaller improvements than for fully connected models, and I cannot do this in TensorFlow, so it can't be tested with these noise generation methods.
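As a rough illustration of the pruning criterion, here is a sketch that zeroes the fraction of weights with the smallest mean activated value. I am assuming here that a weight's mean activated value is its magnitude times the mean absolute activation of its input unit; the actual parana implementation may differ.

```python
import numpy as np

def prune_by_mean_activation(W, inputs, frac):
    """Zero out the fraction `frac` of weights with the smallest
    mean activated value, assumed to be |w_ij| * mean |input_i|."""
    mean_act = np.abs(inputs).mean(axis=0)      # mean |activation| per input unit
    score = np.abs(W) * mean_act[:, None]       # score for each weight
    k = int(frac * W.size)                      # number of weights to remove
    thresh = np.sort(score.ravel())[k]
    return np.where(score < thresh, 0.0, W)     # set the smallest-scoring weights to 0
```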

A theory for the improvement seen in pruned models is that there are fewer pathways for gradients to flow. This may go some way towards explaining the improvements seen in linear-quadratic models. In earlier experiments I found that linear-quadratic models are more resilient to pruning than softmax-cross entropy models, which suggests that training linear-quadratic models encourages sparsity of active parameters. Gradient descent does not lead to weights with a value of exactly zero, but many are so small that they are within a rounding error of zero. This sort of sparsity of weights could lead to the same sort of restrictions on the flow of gradients. I have not tested these models on noise that is designed to defeat obfuscated gradients, and the improvements in robustness to PGD noise in the convolutional model are not nearly as encouraging.

Output analysis

In my second blog here I described a method for estimating the confidence of a model: dividing the largest value in the output vector by the second largest value. I show there that this can distinguish between MNIST digits with better and worse handwriting, from a subjective point of view. It was also quite effective for predicting whether an image was corrupted by adversarial noise. A smaller confidence value means the largest and second largest outputs are close together; this could be interpreted as the model 'thinking' that the input image could be either of two outputs, with the confidence being slightly higher for one class than the other. There are images in the MNIST dataset that have low confidence; the blog mentioned above shows that images that could fool a human, like a 3 that looks like a 2, have low confidence estimates. I have not tested this with other datasets, but it is safe to assume that there will always be low confidence images, because there will always be low quality images, even if the distinction is not as clear as a 3 that looks like a 2.
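The 1-2 ratio itself is a one-liner; a minimal sketch (assuming the output values are positive, which holds after a softmax but not necessarily for raw LQ outputs):

```python
import numpy as np

def one_two_ratio(outputs):
    # confidence estimate: largest output value divided by the second largest
    top2 = np.sort(outputs, axis=1)[:, -2:]
    return top2[:, 1] / top2[:, 0]
```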

Fully connected neural network

Here is the plot that I use to demonstrate how many images can be detected using this 1-2 ratio confidence estimate. It is a cumulative distribution function: the X axis is the 1-2 ratio described above (the range of 1-2 ratios changes for different models), and the Y axis is the fraction of images with a confidence value above the X axis value. I find this useful for looking at the overlap of distributions. In the first plot, about 55% of clean images have a 1-2 ratio above 5, 30% of FGSM corrupted images do, and no PGD corrupted images do.
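The curve being plotted is simply the fraction of inputs whose 1-2 ratio exceeds each threshold, which can be computed as in this sketch (the variable names are illustrative):

```python
import numpy as np

def frac_above(ratios, xs):
    # fraction of inputs whose 1-2 ratio exceeds each threshold x
    ratios = np.asarray(ratios)
    return np.array([(ratios > x).mean() for x in xs])
```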

Full softmax-cross entropy model


Full linear-quadratic model


Pruned linear-quadratic model


Convolution neural network

Full softmax-cross entropy model


Full linear-quadratic model


As you can see above, this method becomes less effective when a model is pruned. I have not shown output analysis for a pruned convolutional model, because there is not as much of an advantage there. The advantage of linear-quadratic models, however, is clear. These plots are generated using an epsilon value of 0.2 for FGSM noise and 0.1 for PGD noise. They only show the confidence values of inputs that have been corrupted, and do not show whether the noise was effective in corrupting the inputs. My second blog has more detail about the 1-2 ratios of effective and ineffective noise.

Different epsilon values

Below are 2 plots from the linear-quadratic convolutional model, showing the behaviour of 1-2 ratios for noise with different epsilon values. This is the model with which I observed the least improvement in PGD noise robustness. Note that the range of the X axis differs between the two plots.

PGD noise


FGSM noise


There are some interesting results for the PGD noise. Even in the softmax-cross entropy models, shown in the section above, most confidence values for corrupted inputs are much smaller than those of clean inputs. For the linear-quadratic model shown here, 98.2% of all clean inputs have a 1-2 confidence value of more than 2, while less than 10% of PGD corrupted inputs with an epsilon value of 0.2 or higher do, and 4.5% for an epsilon value of 0.3. FGSM noise has higher confidence values than PGD noise, but as epsilon values increase there is still a drop-off in high confidence datapoints. Noise with an epsilon value of 0.05 does seem to have higher confidence estimates, but given the low rates of corruption at this level, the high confidence outputs correspond to images that have not been successfully corrupted, giving high confidence, correct outputs.

The 1-2 ratio could be used to reject most of the corrupted inputs. This is not defined as robustness, nor should it be; these things need to be clearly defined. It could, however, be used in a practical sense to improve the security of a system using these models. While the increases in robustness on their own are not as impressive as some other techniques, being able to reject such a high percentage of corrupted images while losing a negligible number of clean images could be a consideration for any system using image recognition.
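A hypothetical rejection rule based on the 1-2 ratio might look like the sketch below. The function name, the `threshold`, and both input arrays are illustrative; in practice the threshold would be tuned per model, as the plots above suggest.

```python
import numpy as np

def reject_report(clean_ratios, corrupted_ratios, threshold):
    """Fraction of clean inputs kept, and of corrupted inputs rejected,
    when inputs with a 1-2 ratio below `threshold` are refused."""
    clean_kept = (np.asarray(clean_ratios) >= threshold).mean()
    corrupted_rejected = (np.asarray(corrupted_ratios) < threshold).mean()
    return clean_kept, corrupted_rejected
```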


For fully connected models, pruning helps robustness, but does less for the output analysis. For convolutional models, pruning gives less of a robustness increase and seems to result in worse confidence predictions. The fully connected model was pruned to get the biggest robustness increase without losing overall performance; the ratio of parameters that are removed can be tweaked to improve confidence estimates while still gaining robustness.

Compared to the previous 2 blogs, where I used saved noise files, the performance of models using these techniques is similar. The trends hold: there are quite big improvements to be made by changing the final layer/cost function, and some improvements to be made by pruning weights with small mean activated values.

The contributing factors to the effectiveness of output analysis are the type of noise, the epsilon value, the quality of the input, and the output class. There is quite a big range of performance improvements to be made using these methods, but they will clearly need to be tailored to each individual application.

Next on my to-do list is to try combining these methods with other methods that increase robustness. I will happily take suggestions.

Notebook used

RobustML test