Most objects in the visual world are partially occluded, but humans can recognize them without difficulty. However, it remains unknown whether object recognition models such as convolutional neural networks (CNNs) can handle real-world occlusion. It is also unclear whether efforts to make these models robust to constant mask occlusion carry over to real-world occlusion. We test both humans and these computational models on a challenging object recognition task under extreme occlusion, in which target objects are heavily occluded by irrelevant real objects in real backgrounds. Our results show that human vision is very robust to extreme occlusion, whereas CNNs are not, even when modified to handle constant mask occlusion. This implies that the ability to handle constant mask occlusion does not entail robustness to real-world occlusion. For comparison, we propose another computational model that uses object parts and subparts in a compositional manner to build robustness to occlusion. This model performs significantly better than the CNN-based models on our task, and its error patterns are similar to those of humans. These findings suggest that testing under extreme occlusion can better reveal the robustness of visual recognition, and that the principle of composition can encourage such robustness.