We study the problem of choosing algorithm hyper-parameters in unsupervised domain adaptation, i.e., with labeled data in a source domain and unlabeled data in a target domain drawn from a different input distribution. We follow the strategy of computing several models with different hyper-parameters and subsequently computing a linear aggregation of these models. While several heuristics follow this strategy, methods that rely on thorough theories for bounding the target error are still missing. To this end, we propose a method that extends weighted least squares to vector-valued functions, e.g., deep neural networks. We show that the target error of the proposed algorithm is asymptotically not worse than twice the error of the unknown optimal aggregation. We also perform a large-scale empirical comparative study on several datasets, including text, images, electroencephalogram, body sensor signals, and signals from mobile phones. Our method outperforms deep embedded validation (DEV) and importance weighted validation (IWV) on all datasets, setting a new state of the art for solving parameter choice issues in unsupervised domain adaptation with theoretical error guarantees. We further study several competitive heuristics, all outperforming IWV and DEV on at least five datasets. However, our method outperforms each heuristic on at least five of seven datasets.
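The aggregation step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the importance weights `beta` (the target-to-source density ratio at each source sample) are already available, and the function names `aggregate_weights` and `aggregate_predict` are hypothetical. The coefficients of the linear aggregation are obtained from an importance-weighted least squares problem over vector-valued model outputs.

```python
import numpy as np

def aggregate_weights(preds, labels, beta):
    """Importance-weighted least squares coefficients for a linear aggregation.

    preds:  (m, n, d) outputs of m models on n labeled source samples
    labels: (n, d) source labels (e.g. one-hot vectors)
    beta:   (n,) importance weights, assumed to be given
    """
    preds = np.asarray(preds, dtype=float)
    labels = np.asarray(labels, dtype=float)
    beta = np.asarray(beta, dtype=float)
    n = labels.shape[0]
    weighted = preds * beta[None, :, None]
    # Weighted Gram matrix of the models: G[k, l] = mean_i beta_i <f_k(x_i), f_l(x_i)>
    G = np.einsum("kid,lid->kl", weighted, preds) / n
    # Weighted correlation with the labels: g[k] = mean_i beta_i <f_k(x_i), y_i>
    g = np.einsum("kid,id->k", weighted, labels) / n
    # The pseudo-inverse handles models that are (nearly) linearly dependent
    return np.linalg.pinv(G) @ g

def aggregate_predict(coeffs, target_preds):
    """Combine (m, n, d) model outputs on target data into (n, d) predictions."""
    return np.tensordot(coeffs, np.asarray(target_preds, dtype=float), axes=(0, 0))
```

In this sketch, each of the `m` models would be trained with a different hyper-parameter setting, and `aggregate_predict` would be applied to their outputs on the unlabeled target data.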
ACHA
On a regularization of unsupervised domain adaptation in RKHS
E. R. Gizewski, L. Mayer, B. A. Moser, D. H. Nguyen, and 4 more authors
We analyze the use of the so-called general regularization scheme in the scenario of unsupervised domain adaptation under the covariate shift assumption. Learning algorithms arising from this scheme are generalizations of the importance weighted regularized least squares method, which to date is among the most widely used approaches in the covariate shift setting. We explore a link between the considered domain adaptation scenario and the estimation of Radon-Nikodym derivatives in reproducing kernel Hilbert spaces, where the general regularization scheme can also be employed and generalizes the kernelized unconstrained least-squares importance fitting. We estimate the convergence rates of the corresponding regularized learning algorithms and discuss how to resolve the issue of tuning their regularization parameters. The theoretical results are illustrated by numerical examples, one of which is based on real data collected for automatic stenosis detection in cervical arteries.
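As an illustration of the baseline that this abstract generalizes, the following minimal sketch implements importance weighted regularized least squares in an RKHS with Tikhonov regularization. It is not the authors' code. It assumes a Gaussian kernel, takes the importance weights `beta` (values of the Radon-Nikodym derivative at the source samples) as given, and uses hypothetical function names. Minimizing (1/n) Σ β_i (f(x_i) − y_i)² + λ‖f‖² over the RKHS leads to the linear system (BK + nλI)α = By with B = diag(β).

```python
import numpy as np

def gaussian_kernel(A, B, bandwidth=1.0):
    """Gaussian kernel matrix between the rows of A and the rows of B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def iw_kernel_ridge_fit(X, y, beta, lam, bandwidth=1.0):
    """Solve (B K + n*lam*I) alpha = B y, where B = diag(beta).

    X: (n, p) source inputs, y: (n,) source labels,
    beta: (n,) positive importance weights (assumed given), lam: regularization > 0.
    """
    n = X.shape[0]
    K = gaussian_kernel(X, X, bandwidth)
    # beta[:, None] * K scales row i of K by beta_i, i.e. computes B K
    return np.linalg.solve(beta[:, None] * K + n * lam * np.eye(n), beta * y)

def iw_kernel_ridge_predict(X_train, alpha, X_new, bandwidth=1.0):
    """Evaluate f(x) = sum_j alpha_j k(x, x_j) at the rows of X_new."""
    return gaussian_kernel(X_new, X_train, bandwidth) @ alpha
```

With all weights equal to one, the system reduces to ordinary kernel ridge regression. Other members of the general regularization scheme would replace the Tikhonov step with a different spectral filter.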