Depth estimation from a single image is an exciting challenge in computer vision. Self-supervised monocular depth estimation has recently gained considerable popularity because it does not require ground-truth depth during training. Instead of a ground-truth depth map, current methods rely on view synthesis as the supervisory signal for depth prediction. Recent works also leverage semantic cues by training in a multi-task setup. However, these methods face an inherent problem when learning task-specific and task-sharing features, which results in less accurate depth features. In this work, we propose to explicitly apply a mechanism by which the network can weigh features for different tasks and avoid interference between the tasks of depth estimation and semantic segmentation. In other words, we employ an attention-guided encoder network to learn both task-specific and task-sharing depth features.
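To illustrate the idea of weighing shared features per task, the following is a minimal NumPy sketch and not the architecture used in this work. It assumes one common form of soft attention: each task applies a 1x1 convolution followed by a sigmoid to the shared encoder features, and the resulting mask gates those features element-wise. The function name `task_attention`, the tensor shapes, and the random parameters are all hypothetical and chosen only for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def task_attention(shared, weight, bias):
    """Gate shared features with a per-task soft attention mask.

    shared: (C, H, W) shared encoder features.
    weight: (C, C) weights of a 1x1 convolution (hypothetical parameters).
    bias:   (C,) bias of the 1x1 convolution.
    Returns the gated features and the mask, both of shape (C, H, W).
    """
    # A 1x1 convolution is a per-pixel linear map over the channel axis.
    logits = np.einsum("oc,chw->ohw", weight, shared) + bias[:, None, None]
    mask = sigmoid(logits)  # soft weights between 0 and 1
    return mask * shared, mask

rng = np.random.default_rng(0)
C, H, W = 4, 3, 5
shared = rng.standard_normal((C, H, W))

# One set of (assumed, randomly initialised) attention parameters per task.
params = {
    task: (rng.standard_normal((C, C)), np.zeros(C))
    for task in ("depth", "segmentation")
}

# Each task reads the same shared features through its own mask.
features = {}
masks = {}
for task, (w, b) in params.items():
    features[task], masks[task] = task_attention(shared, w, b)
```

In a trained network the mask parameters would be learned jointly with the encoder, so that each task can suppress shared features that interfere with it while retaining those that help.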