Digital imaging has become one of the most important techniques in environmental monitoring and exploration. In the marine environment, mobile platforms such as autonomous underwater vehicles (AUVs) are now equipped with high-resolution cameras to capture huge collections of images of the seabed. However, the timely evaluation of all these images is a bottleneck, as tens of thousands of images or more can be collected during a single dive. This makes computational support for marine image analysis essential. Computer-aided analysis of environmental images (and marine images in particular) with machine learning algorithms is promising, but it is challenging and differs from other imaging domains because training data and class labels cannot be collected as efficiently and comprehensively as in other areas. In this paper, we present Machine learning Assisted Image Annotation (MAIA), a new image annotation method for environmental monitoring and exploration that overcomes the obstacle of missing training data. The method combines autoencoder networks with a Mask Region-based Convolutional Neural Network (Mask R-CNN), which allows human observers to annotate large image collections much faster than before. We evaluated the method on three marine image datasets featuring different types of background, imaging equipment and object classes. Using MAIA, we were able to annotate objects of interest with an average recall of 84.1%, more than twice as fast as with “traditional” annotation methods, which rely purely on software-supported direct visual inspection and manual annotation. The speed gain increases proportionally with the size of the dataset. The MAIA approach represents a substantial improvement on the path to greater efficiency in the annotation of large benthic image collections.
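
For readers less familiar with the metric, the recall figure quoted above can be read as the share of true objects of interest that ended up annotated. The following Python sketch shows that calculation under stated assumptions: the function names `recall` and `average_recall`, the unweighted averaging across datasets, and the toy counts are all illustrative and are not taken from the evaluation itself.

```python
def recall(found: int, missed: int) -> float:
    """Fraction of true objects of interest that were found: TP / (TP + FN)."""
    total = found + missed
    if total == 0:
        raise ValueError("recall is undefined when there are no true objects")
    return found / total


def average_recall(counts: list[tuple[int, int]]) -> float:
    """Unweighted mean of per-dataset recall (an assumed averaging scheme)."""
    if not counts:
        raise ValueError("need at least one dataset")
    return sum(recall(found, missed) for found, missed in counts) / len(counts)


# Toy (found, missed) counts for three hypothetical datasets.
# These are made-up numbers for illustration, not results from the paper.
toy_counts = [(8, 2), (9, 1), (7, 3)]
print(f"average recall: {average_recall(toy_counts):.3f}")
```

An unweighted mean treats each dataset equally regardless of how many objects it contains; pooling all counts before dividing would instead weight larger datasets more heavily, and the two schemes can give different values.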