Self-supervised monocular depth estimation (SMDE) is a deep learning approach that uses reconstructed images as the supervisory signal: the reconstruction loss is backpropagated to train the depth estimation model. SMDE has achieved remarkable performance in autonomous driving, where an ego-motion matrix is used to relate consecutive frames (shown in Fig. 1) [1-3].
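To make the role of the ego-motion matrix concrete, the following is a minimal NumPy sketch of the usual view-synthesis geometry, not the implementation used in this work. It assumes known pinhole intrinsics `K`, a 4x4 ego-motion matrix `T` mapping target-camera coordinates to source-camera coordinates, nearest-neighbour sampling, and a plain L1 photometric loss; the function names are hypothetical, and practical systems typically rely on differentiable bilinear sampling and richer losses.

```python
import numpy as np


def reproject_with_ego_motion(depth, K, T):
    """Map each target-frame pixel to its location in the source frame.

    depth: (H, W) predicted depth of the target frame (assumed > 0)
    K:     (3, 3) pinhole intrinsics (assumed known)
    T:     (4, 4) ego-motion matrix, target camera -> source camera
    Returns an (H, W, 2) array of (u, v) coordinates in the source frame.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))  # each (H, W)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)
    # Back-project pixels to 3D points: X = depth * K^-1 * [u, v, 1]^T
    points = (pix @ np.linalg.inv(K).T) * depth[..., None]
    # One rigid motion shared by every pixel (static scene, moving camera)
    points = points @ T[:3, :3].T + T[:3, 3]
    # Project back to the image plane of the source frame
    proj = points @ K.T
    return proj[..., :2] / proj[..., 2:3]


def sample_nearest(image, coords):
    """Reconstruct the target view by nearest-neighbour lookup in the source image."""
    H, W = image.shape[:2]
    u = np.clip(np.rint(coords[..., 0]).astype(int), 0, W - 1)
    v = np.clip(np.rint(coords[..., 1]).astype(int), 0, H - 1)
    return image[v, u]


def photometric_l1(target, reconstructed):
    """Mean absolute difference used here as the supervisory signal."""
    return float(np.mean(np.abs(target - reconstructed)))
```

In a training loop, `photometric_l1` between the target frame and the frame reconstructed from a neighbouring view would be the quantity minimized with respect to the depth (and pose) predictions.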
To enhance surgeons' 3D perception in minimally invasive surgery (MIS), it is essential to generate precise depth maps from laparoscopic images. Unlike the autonomous driving scene, the training framework here is established under the assumption that the camera is static and the objects are dynamic (moving surgical instruments and deformable tissues). Therefore, the ego-motion model is replaced by a 3D displacement (3DD) module.
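The difference from the ego-motion setting can be illustrated with a short NumPy sketch. The text above does not specify the output format of the 3DD module, so this sketch assumes, purely for illustration, that it predicts a dense per-pixel 3D displacement field of shape (H, W, 3) in camera coordinates; the function name and the intrinsics `K` are likewise hypothetical.

```python
import numpy as np


def reproject_with_displacement(depth, K, displacement):
    """Map each target-frame pixel to the source frame under a static camera.

    depth:        (H, W) predicted depth of the target frame (assumed > 0)
    K:            (3, 3) pinhole intrinsics (assumed known)
    displacement: (H, W, 3) per-pixel 3D motion (assumed 3DD output format)
    Returns an (H, W, 2) array of (u, v) coordinates in the source frame.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))  # each (H, W)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)
    # Back-project pixels to 3D points in the (static) camera frame
    points = (pix @ np.linalg.inv(K).T) * depth[..., None]
    # Each point moves independently; no shared rigid transform is applied
    moved = points + displacement
    # Project the displaced points back onto the image plane
    proj = moved @ K.T
    return proj[..., :2] / proj[..., 2:3]
```

With a zero displacement field the mapping is the identity, which matches the static-camera assumption: any change between frames must be explained by the motion of instruments and tissue rather than by the camera.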