Autonomous robots are highly relevant targets for interaction studies, but can exhibit behavioral variability that confounds experimental validity. Currently, testing on real systems is the only means to prevent this, but remains very labour-intensive and often happens too late. To improve this situation, we are working towards early testing by means of partial simulation, with automated assessment, and based upon continuous software integration to prevent regressions. We will introduce the concept and describe a proof-of-concept that demonstrates fast feedback and coherent experiment results across repeated trials.