OJSP, ICASSP 2025
Masahiro Kada1 Ryota Yoshihashi1 Satoshi Ikehata1, 2 Rei Kawakami1 Ikuro Sato1, 3
1Institute of Science Tokyo 2National Institute of Informatics 3Denso IT Laboratory
Abstract
Mixture of experts with a sparse expert-selection rule has recently been gaining much attention because of its scalability without compromising inference time. However, unlike standard neural networks, sparse mixture-of-experts models inherently exhibit discontinuities in the output space, which may impede the acquisition of appropriate invariance to input perturbations and degrade performance on tasks such as classification. To address this issue, we propose Pairwise Router Consistency (PRC), which effectively penalizes the discontinuities occurring under natural deformations of input images. Used together with the supervised loss, the PRC loss empirically improves classification accuracy on the ImageNet-1K, CIFAR-10, and CIFAR-100 datasets compared to a baseline method. Notably, our method with 1-expert selection slightly outperforms the baseline method using 2-expert selection. We also confirmed that models trained with our method undergo discontinuous output changes less frequently under input perturbations.
Method
We train V-MoE [C. Riquelme+, NeurIPS 2021] with Pairwise Router Consistency (PRC), a regularization term that encourages the router to route the same input image patches consistently across deformed views of an image.
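As a rough illustration of such a consistency penalty, here is a minimal PyTorch sketch. The symmetric-KL form and the names `prc_loss` and `lambda_prc` are assumptions made for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F


def prc_loss(logits_a: torch.Tensor, logits_b: torch.Tensor) -> torch.Tensor:
    """Pairwise router-consistency penalty (illustrative sketch).

    logits_a, logits_b: gating logits of shape [num_tokens, num_experts]
    produced by the same MoE router for two deformed views of one image
    (patch tokens assumed to be aligned across the two views).
    """
    log_p = F.log_softmax(logits_a, dim=-1)
    log_q = F.log_softmax(logits_b, dim=-1)
    # Symmetric KL between the per-token routing distributions; the exact
    # consistency measure is an assumption, not the paper's definition.
    kl_pq = F.kl_div(log_q, log_p, reduction="batchmean", log_target=True)
    kl_qp = F.kl_div(log_p, log_q, reduction="batchmean", log_target=True)
    return 0.5 * (kl_pq + kl_qp)


# Combined with the supervised loss (lambda_prc is a hypothetical weight):
# loss = F.cross_entropy(preds, labels) + lambda_prc * prc_loss(la, lb)
```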
Evaluation
We conducted quantitative evaluations of classification accuracy on the ImageNet-1K, CIFAR-10, CIFAR-100, and Oxford Flowers-102 datasets.
| Model | Top-k | ImageNet-1K | CIFAR-10 | CIFAR-100 | Flowers-102 |
|---|---|---|---|---|---|
| V-MoE-S | 2 | 75.84% | 95.20% | 81.38% | 89.30% |
| V-MoE-S w/ PRC | 2 | 76.27% | 95.36% | 82.27% | 90.18% |
| V-MoE-S | 1 | 75.23% | 94.81% | 81.18% | 90.21% |
| V-MoE-S w/ PRC | 1 | 75.92% | 95.12% | 82.12% | 91.24% |
We visualized how routing decisions change as Gaussian noise is gradually added to the input image.

[Figure: per-patch routing visualizations under increasing Gaussian noise; panels show the Original Image and the routing of V-MoE and V-MoE w/ PRC]
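One simple way to quantify such routing changes is to track how often each token's top-1 expert flips as the noise strength grows. The sketch below is an illustrative diagnostic, not the paper's protocol; `router` is a hypothetical callable returning per-token gating logits, and the noise levels are placeholder values.

```python
import torch


@torch.no_grad()
def top1_flip_rate(router, image, sigmas=(0.0, 0.05, 0.1, 0.2)):
    """Fraction of patch tokens whose top-1 expert changes, relative to
    the clean input, as Gaussian noise of increasing strength is added.

    router: hypothetical callable mapping an image tensor to gating
    logits of shape [num_tokens, num_experts].
    """
    clean_top1 = router(image).argmax(dim=-1)  # [num_tokens]
    rates = []
    for sigma in sigmas:
        noisy = image + sigma * torch.randn_like(image)
        noisy_top1 = router(noisy).argmax(dim=-1)
        rates.append((noisy_top1 != clean_top1).float().mean().item())
    return rates
```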
BibTeX
@ARTICLE{10858379,
  author={Kada, Masahiro and Yoshihashi, Ryota and Ikehata, Satoshi and Kawakami, Rei and Sato, Ikuro},
  journal={IEEE Open Journal of Signal Processing},
  title={Robustifying Routers Against Input Perturbations for Sparse Mixture-of-Experts Vision Transformers},
  year={2025},
  volume={},
  number={},
  pages={1-9},
  keywords={Perturbation methods;Routing;Transformers;Predictive models;Contrastive learning;Data models;Computational modeling;Training;Image classification;Computer vision;Mixture of Experts;Dynamic Neural Network;Image Classification;Vision Transformer},
  doi={10.1109/OJSP.2025.3536853}}
Acknowledgements
This work was supported by JSPS KAKENHI Grant Number JP22H03642 and DENSO IT LAB Recognition and Learning Algorithm Collaborative Research Chair (Science Tokyo).