Putting it all together

For transferable black-box attacks, we often initialize more than one model: the model being directly attacked serves as the white-box surrogate, and the remaining models act as black-box victims used to evaluate the attack's transferability, i.e. its effectiveness in a black-box scenario where the victim model's internals are unknown to the attacker.
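
A minimal sketch of this role split (the model names here are only illustrative; the full example below fills in the attack and evaluation loops):

import torch

from torchattack import AttackModel

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# White-box surrogate: the attack back-propagates through this model directly
surrogate = AttackModel.from_pretrained('resnet50').to(device)

# Black-box victims: never touched during the attack, only used to check whether
# the generated adversarial examples transfer
victims = [AttackModel.from_pretrained(name).to(device) for name in ('vgg11', 'inception_v3')]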

A full example

To put everything together, we show a full example that does the following.

  1. Load the NIPS 2017 dataset.
  2. Initialize a pretrained ResNet-50 as the white-box surrogate model for crafting adversarial examples.
  3. Initialize two additional pretrained models, VGG-11 and Inception-v3, as black-box victim models to evaluate transferability.
  4. Run the classic MI-FGSM attack and report its performance.
examples/mifgsm_transfer.py
import torch
from rich.progress import track

from torchattack import MIFGSM, AttackModel
from torchattack.evaluate import FoolingRateMetric, NIPSLoader, save_image_batch

bs = 16
eps = 8 / 255
root = 'datasets/nips2017'
save_dir = 'outputs'
model_name = 'resnet50'
victim_names = ['vgg11', 'inception_v3']
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Initialize the white-box model, fooling rate metric, and dataloader
model = AttackModel.from_pretrained(model_name).to(device)
frm = FoolingRateMetric()
dataloader = NIPSLoader(root, transform=model.transform, batch_size=bs)

# Initialize the attacker MI-FGSM
adversary = MIFGSM(model, model.normalize, device, eps)

# Attack loop and save generated adversarial examples
for x, y, fnames in track(dataloader, description=f'Evaluating white-box {model_name}'):
    x, y = x.to(device), y.to(device)
    x_adv = adversary(x, y)

    # Track fooling rate
    x_outs = model(model.normalize(x))
    adv_outs = model(model.normalize(x_adv))
    frm.update(y, x_outs, adv_outs)

    # Save adversarial examples to `save_dir`
    save_image_batch(x_adv, save_dir, fnames)

# Evaluate fooling rate
cln_acc, adv_acc, fr = frm.compute()
print(f'White-box ({model_name}): {cln_acc:.2%} -> {adv_acc:.2%} (FR: {fr:.2%})')

# For all victim models
for vname in victim_names:
    # Initialize the black-box model, fooling rate metric, and dataloader
    vmodel = AttackModel.from_pretrained(model_name=vname).to(device)
    vfrm = FoolingRateMetric()

    # Create relative transform (relative to the white-box model) to avoid degrading the
    # effectiveness of adversarial examples through image transformations
    vtransform = vmodel.create_relative_transform(model)

    # Load the clean and adversarial examples from separate dataloaders
    clnloader = NIPSLoader(root=root, transform=vmodel.transform, batch_size=bs)
    advloader = NIPSLoader(
        image_root=save_dir,
        pairs_path=f'{root}/images.csv',
        transform=vtransform,
        batch_size=bs,
    )

    # Black-box evaluation loop
    for (x, y, _), (xadv, _, _) in track(
        zip(clnloader, advloader),
        total=len(clnloader),
        description=f'Evaluating black-box {vname}',
    ):
        x, y, xadv = x.to(device), y.to(device), xadv.to(device)
        vx_outs = vmodel(vmodel.normalize(x))
        vadv_outs = vmodel(vmodel.normalize(xadv))
        vfrm.update(y, vx_outs, vadv_outs)

    # Evaluate fooling rate
    vcln_acc, vadv_acc, vfr = vfrm.compute()
    print(f'Black-box ({vname}): {vcln_acc:.2%} -> {vadv_acc:.2%} (FR: {vfr:.2%})')
$ python examples/mifgsm_transfer.py
Evaluating white-box resnet50 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:27
White-box (resnet50): 95.10% -> 0.10% (FR: 99.89%)
Evaluating black-box vgg11 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:02
Black-box (vgg11): 80.30% -> 59.70% (FR: 25.65%)
Evaluating black-box inception_v3 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:02
Black-box (inception_v3): 92.60% -> 75.80% (FR: 18.14%)
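
FoolingRateMetric returns the clean accuracy, the adversarial accuracy, and the fooling rate. The numbers printed above are consistent with the fooling rate being the relative accuracy drop, (cln_acc - adv_acc) / cln_acc, i.e. the fraction of originally correctly classified samples that the attack manages to flip. A quick sanity check against the output above (this formula is an assumption for illustration; refer to FoolingRateMetric for the exact computation):

# Sanity-check the reported fooling rates with numbers taken from the run above,
# assuming FR = (clean accuracy - adversarial accuracy) / clean accuracy
for name, cln, adv in [
    ('resnet50', 0.9510, 0.0010),
    ('vgg11', 0.8030, 0.5970),
    ('inception_v3', 0.9260, 0.7580),
]:
    print(f'{name}: FR = {(cln - adv) / cln:.2%}')
# resnet50: FR = 99.89%
# vgg11: FR = 25.65%
# inception_v3: FR = 18.14%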

Create relative transform for victim models (New in v1.5.0)

Notice in our example how we dynamically created a relative transform for each victim model. We use AttackModel.create_relative_transform so that the victim model's transform does not introduce additional, unnecessary transforms such as resizing and cropping that could degrade the transferability of the adversarial perturbation.
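
As a small illustration (the model names are only for demonstration, and the exact composition of the returned transform depends on the model pair), you can print the two pipelines side by side:

from torchattack import AttackModel

surrogate = AttackModel.from_pretrained('resnet50')
victim = AttackModel.from_pretrained('inception_v3')

# The victim's full preprocessing pipeline: re-resizing/cropping saved adversarial
# images with this would partially destroy the pixel-level perturbation
print(victim.transform)

# The relative pipeline: only what is strictly needed on top of the surrogate's
# preprocessing, skipping redundant resize/crop steps
print(victim.create_relative_transform(surrogate))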

Attack Runner

torchattack provides a simple command-line runner at torchattack.evaluate.runner, along with the function run_attack, for evaluating the transferability of attacks. The runner itself also serves as a full example of how researchers can use torchattack to create attacks and evaluate their transferability.

An exhaustive example runs the PGD attack,

  • with an epsilon constraint of 16/255,
  • on the ResNet-18 model as the white-box surrogate model,
  • transferred to the VGG-11, DenseNet-121, and Inception-V3 models as the black-box victim models,
  • on the NIPS 2017 dataset,
  • with a maximum of 200 samples and a batch size of 4,
$ python -m torchattack.evaluate.runner \
    --attack PGD \
    --eps 16/255 \
    --model-name resnet18 \
    --victim-model-names vgg11 densenet121 inception_v3 \
    --dataset-root datasets/nips2017 \
    --max-samples 200 \
    --batch-size 4
PGD(model=ResNet, device=cuda, normalize=Normalize, eps=0.063, alpha=None, steps=10, random_start=True, clip_min=0.000, clip_max=1.000, targeted=False, lossfn=CrossEntropyLoss())
Attacking ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:07
Surrogate (resnet18): cln_acc=81.50%, adv_acc=0.00% (fr=100.00%)
Victim (vgg11): cln_acc=77.00%, adv_acc=34.00% (fr=55.84%)
Victim (densenet121): cln_acc=87.00%, adv_acc=37.00% (fr=57.47%)
Victim (inception_v3): cln_acc=92.00%, adv_acc=70.00% (fr=23.91%)
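
The same evaluation can also be driven programmatically through run_attack. A minimal sketch, assuming its keyword arguments mirror the CLI flags above (the exact signature may differ between versions, so check torchattack.evaluate.runner in your installed release):

from torchattack.evaluate.runner import run_attack

# Programmatic counterpart of the CLI invocation above; the keyword names are
# assumed to mirror the CLI flags and may differ between versions
run_attack(
    attack='PGD',
    attack_args={'eps': 16 / 255},
    model_name='resnet18',
    victim_model_names=['vgg11', 'densenet121', 'inception_v3'],
    dataset_root='datasets/nips2017',
    max_samples=200,
    batch_size=4,
)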