We propose Spatially-Aware open-set Network Dissection (SAND), a technique to identify and label the concepts learned by individual neurons of deep vision networks. In addition to a neuron's activation strength for an image, we leverage its spatial pattern of activation to guide our predictions toward more accurate and relevant concepts while avoiding confounding information. We highlight the regions important to a neuron through image masking, which has the dual advantage of blocking out irrelevant concepts and handling irregularly shaped activation regions. We use CLIP to connect highly activating image regions with descriptive concepts, and measure the quality of our results through human evaluation in two domains: natural images and medical images. Finally, we propose an automated approach that uses image generation to evaluate a neuron's assigned concepts, addressing the lack of ground-truth labels for this task.
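The masking step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the helper name, the nearest-neighbour upsampling, and the percentile-based threshold are all assumptions made for the example. The idea it demonstrates is that thresholding the (upsampled) activation map yields a binary mask that follows the activation region's exact shape, blocking out the rest of the image before it is passed to CLIP's image encoder.

```python
import numpy as np

def mask_image_by_activation(image, activation, keep_percentile=95):
    """Mask an image to a neuron's most activating region (hypothetical helper).

    image: (H, W, 3) float array.
    activation: (h, w) low-resolution activation map for one neuron.
    Pixels whose upsampled activation falls below the given percentile are
    zeroed, so irregularly shaped regions are preserved exactly.
    """
    H, W = image.shape[:2]
    h, w = activation.shape
    # Nearest-neighbour upsample of the activation map to image resolution.
    rows = np.arange(H) * h // H
    cols = np.arange(W) * w // W
    up = activation[rows[:, None], cols[None, :]]
    # Keep only the most strongly activated pixels (threshold is an assumption).
    thresh = np.percentile(up, keep_percentile)
    mask = up >= thresh
    return image * mask[:, :, None]
```

The masked image would then be embedded with CLIP's image encoder and compared against text embeddings of candidate concept descriptions, with the highest-similarity concepts assigned to the neuron.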