I propose a method that can visually explain the classification decisions of deep neural networks (DNNs). Many methods have been proposed in machine learning and computer vision that seek to
clarify the decisions of machine learning black boxes, specifically DNNs. All of these methods try to
gain insight into why the network “chose class A” as an answer. Humans search for explanations by
asking two types of questions. The first is, “Why did you choose this answer?” The second
is, “Why did you not choose answer B over A?” The previously proposed methods cannot
answer the latter directly or efficiently.
I introduce a method capable of answering the second question both directly and efficiently.
In general, the proposed method generates explanations in the input space of any model that supports
efficient function and gradient evaluation. It neither requires knowledge of the underlying
classifier nor relies on heuristics to generate its explanations, and it is computationally fast to evaluate.
I provide extensive experimental results on three different datasets, showing the robustness of my
approach and its superiority in gaining insight into the inner representations of machine learning
models. As an example, I demonstrate that my method can detect and explain how a network trained to
recognize hair color actually detects eye color, a bias in the trained classifier that other methods
cannot find.
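
To make the gradient-based, input-space idea above concrete, the following is a minimal sketch of one way such a contrastive explanation could be produced: a small, sparse perturbation of the input is optimized toward a contrast class B using only forward and gradient evaluations of the classifier. The function name, hyperparameters, and the L1 sparsity penalty are illustrative assumptions, not the exact procedure developed here.

```python
# Illustrative sketch only: answer "why A and not B?" by finding a small
# input-space perturbation that pushes a differentiable classifier toward B.
import torch
import torch.nn.functional as F

def contrastive_explanation(model, x, class_b, steps=200, lr=0.05, l1_weight=0.01):
    """x: a single input with a batch dimension, e.g. shape (1, C, H, W)."""
    model.eval()
    delta = torch.zeros_like(x, requires_grad=True)    # perturbation in input space
    optimizer = torch.optim.Adam([delta], lr=lr)
    target = torch.tensor([class_b], device=x.device)
    for _ in range(steps):
        optimizer.zero_grad()
        logits = model(x + delta)                      # only forward + gradient access needed
        # Encourage the contrast class B while keeping the change small and
        # sparse, so the explanation stays local and human-readable.
        loss = F.cross_entropy(logits, target) + l1_weight * delta.abs().sum()
        loss.backward()
        optimizer.step()
    return delta.detach()                              # large entries = evidence for "why not B"
```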
I provide details on how this framework can be applied to discrete data as well. I
propose a SoftEmbedding function to be employed in conjunction with the discrete embedding
function. I show results on textual reviews demonstrating my method’s ability to find bias in learned
classifiers.
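
As a rough illustration, assuming a PyTorch-style setup, a SoftEmbedding layer could relax the hard token lookup into a softmax-weighted mixture over the rows of an existing embedding matrix, so that gradients can flow back to the (soft) token choices. The class name and interface below are my own illustrative assumptions; the actual function is specified in the detailed treatment.

```python
# Illustrative sketch of a soft relaxation of a discrete embedding lookup.
import torch
import torch.nn as nn

class SoftEmbedding(nn.Module):
    def __init__(self, embedding: nn.Embedding):
        super().__init__()
        self.weight = embedding.weight          # reuse the trained discrete embedding table

    def forward(self, token_logits: torch.Tensor) -> torch.Tensor:
        # token_logits: (batch, seq_len, vocab_size) continuous scores per position.
        probs = torch.softmax(token_logits, dim=-1)
        # Weighted average of embedding vectors; with one-hot `probs` this reduces
        # exactly to the ordinary discrete embedding lookup.
        return probs @ self.weight              # (batch, seq_len, embed_dim)
```

In such a setup, the token logits of a real review could be optimized by gradient descent toward a contrast class, analogously to the continuous-input sketch above.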
Finally, I provide the results of a user study measuring how much this feedback helps
users improve their understanding of the network’s learned function in comparison with other
possible methods.