About
The annual meeting of the Cognitive Science Society is aimed at basic and applied cognitive science research. The conference hosts the latest theories and data from the world's best cognitive science researchers. Each year, in addition to submitted papers, researchers are invited to highlight some aspect of cognitive science.
Volume 14, 1992
Talks
Why Intelligent Systems Should Get Depressed Occasionally and Appropriately
Some researchers suggest that depression may be adaptive. For example, depression may provide an opportunity to assess our capabilities, learn from past failures, trigger personal change, and allocate activity away from futile goals. There are a variety of signature phenomena associated with depression, such as stable, global, and internal styles of failure explanation, a cognitive loop of failure-related rumination, lowered self-esteem and self-efficacy, and increased negative generalization and depressive realism. DEPlanner is presented, a simulated agent that adapts to failure in a simulated environment and exhibits eight targeted signature phenomena of depression.
Preconditions and Appropriateness Conditions
Classical plan preconditions implicitly play a dual role, both documenting the facts necessary for a plan to be sound and listing the conditions under which it should be used. As the closed-world assumption is relaxed these two roles begin to diverge, particularly when attempts are made to use plans in situations other than those for which they were originally constructed. Rosenschein and Kaelbling exploit one aspect of the divergence by suggesting that some logical preconditions can be considered in the design phase of building an agent, but "compiled away" so that the agent need not explicitly consider them [Rosenschein and Kaelbling, 1986]. We suggest an alternative view whereby an agent can explicitly reason and learn about which conditions are the best cues for employing standard plans, and discuss the idea in the context of the Runner project.
Multi-agent Interactions: A Vocabulary of Engagement
Our project concerns the definition of a content theory of action appropriate for agents that act in a multi-agent environment and its implementation in a multi-agent system. Such a theory has to explain what agents know and how they use this knowledge; it has to identify what resources are available to the agents when they must decide on an action; it has to allow agents to reason and engage in concrete activity in their domain. More important for our research, and in contrast with numerous works in Distributed Artificial Intelligence, such a vocabulary must provide a basis for agents to decide and learn when, how or with whom they should cooperate. In this paper we suggest a vocabulary of interactions for intelligent agents. Our vocabulary attempts to do justice to the situated character of action with respect to the disparate but related dimensions of physicality, sociality and experience.
A Vocabulary for Indexing Plan Interactions and Repairs
Solving the multiple goals problem has been a major issue in Artificial Intelligence models of planning (Sussman, 1975; Sacerdoti, 1975; Wilensky, 1978; Wilensky, 1980; Wilensky, 1983; Carbonell, 1979); however, most models have assumed that the best plan for a set of goals to be satisfied in conjunction will arise from a simple combination of the best individual plans for each goal. Human planners, by contrast, seem to possess an ability to look at a set of goals and characterize them as a whole, instead of as a collection of individual goals (Hayes-Roth and Hayes-Roth, 1979). In this paper, we introduce the notion of indexing complex multiple-goal plans in terms of the interactions between the goals that they satisfy. We present the vocabulary requirements for representing the causality behind goal interactions, the general planning strategies used to resolve these interactions, and the specific plans based on these more general resolution strategies that are instantiated in the actual planning problem.
Dynamic Construction of Mental Models in Connectionist Networks
The task of "model construction", which is that of constructing a detailed representation of a situation based on some clue, forms an important component of a number of cognitive activities. This paper addresses the problem of dynamic model construction from a connectionist perspective. It discusses how to represent models as patterns of activity within a connectionist network, and how dynamic generation of such patterns can be efficiently achieved.
Learning Relations in an Interactive Architecture
This paper presents a connectionist architecture for deriving unknown role fillers in relational expressions. First, a restricted solution to the binding problem is presented which ensures systematicity in principle, and allows for sufficient compositionality so as to enable instantiation of shared variables in conjunctive expressions where the same object may fill a variety of roles in a variety of relations. Next, a more detailed architecture is explicated (an extension of McClelland's 1981 "Interactive Activation Competition" architecture) which allows for systematicity in practice while providing a training procedure for relations. Finally, results of the learning procedure for the Family Tree data set (Hinton, 1990) are used to demonstrate robust generalization in this domain.
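As a rough illustration of the interactive-activation style of settling that this architecture builds on, the sketch below encodes two stored relational facts as localist instance units and lets the network settle on an unknown role filler. The units, weights, and update constants follow the general IAC recipe but are our own illustrative choices, not the architecture described in the abstract.

```python
# Minimal IAC-style settling: localist units, one "instance" unit per stored
# fact, excitatory links between a fact and its parts, inhibition between
# competitors.  Probing with FATHER + JOHN should settle on MARY as the
# unknown role filler.  All connectivity and constants are assumptions.
import numpy as np

units = ["father", "uncle", "john", "tom", "mary", "kate",
         "fact-fjm", "fact-utk"]       # fact-fjm = father(john, mary), etc.
idx = {u: i for i, u in enumerate(units)}
n = len(units)
W = np.zeros((n, n))

def connect(a, b, w):
    W[idx[a], idx[b]] = W[idx[b], idx[a]] = w

for fact, parts in [("fact-fjm", ["father", "john", "mary"]),
                    ("fact-utk", ["uncle", "tom", "kate"])]:
    for p in parts:
        connect(fact, p, 1.0)
connect("fact-fjm", "fact-utk", -1.0)  # competing instance pool
connect("mary", "kate", -1.0)          # competing filler pool

ext = np.zeros(n)
ext[idx["father"]] = ext[idx["john"]] = 1.0   # the probe

a = np.zeros(n)
for _ in range(100):                   # standard IAC-style activation update
    net = W @ np.clip(a, 0, None) + ext
    a += 0.1 * np.where(net > 0, net * (1.0 - a), net * (a + 0.2)) - 0.1 * a

print({u: round(float(a[idx[u]]), 2) for u in ("mary", "kate")})  # mary wins
```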
Developing Microfeatures by Analogy
A technique is described whereby the output of ACME, a localist constraint satisfaction model of analogical mapping (Holyoak & Thagard, 1989), is used to constrain the distributed representations of domain objects developed by Hinton's (1986) multilayer model of propositional learning. In a series of computational experiments, the ability of Hinton's network to transfer knowledge from a source domain to a target domain is systematically examined by first training the model on the full set of propositions representing a source domain together with a subset of propositions representing an isomorphic target domain, and then testing the network on the untrained target propositions. Without additional constraints, basic gradient descent can recover only a negligible proportion of the untrained propositions. Comparison of simulation results using various combinations of the distributed mapping technique and weight decay indicates that general-purpose network optimization techniques may go some way toward improving the transfer performance of distributed network models. However, performance can be improved substantially more when optimization techniques are combined with the distributed representation mapping technique.
Direct, Incremental Learning of Fuzzy Propositions
To enable the gradual learning of symbolic representations, a new fuzzy logical operator is developed that supports the expression of negation to degrees. As a result, simple fuzzy propositions become instantiable in a feedforward network having multiplicative nodes and tunable negation links. A backpropagation learning procedure has been straightforwardly developed for such a network and applied to effect the direct, incremental learning of fuzzy propositions in a natural and satisfying manner. Some results of this approach and comparisons to related approaches are discussed as well as directions for further extension.
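The abstract does not give the negation operator's exact form, so the sketch below assumes a Sugeno-style negation N_lam(x) = (1 - x) / (1 + lam * x), blended with the identity by a learnable degree g, feeding a multiplicative proposition node. The data, loss, and finite-difference training are likewise illustrative only.

```python
# Sketch of degree-tunable negation in a multiplicative network.  The
# Sugeno form, the blending parameterization, and all constants are our
# assumptions, not the paper's operator.
import numpy as np

def soft_neg(x, g, lam=0.0):
    """Negate x to degree g: g=0 passes x through, g=1 fully negates."""
    n = (1.0 - x) / (1.0 + lam * x)        # lam=0 reduces to 1 - x
    return (1.0 - g) * x + g * n

rng = np.random.default_rng(1)
X = rng.random((200, 2))                   # truth degrees of two inputs
target = X[:, 0] * (1.0 - X[:, 1])         # degrees of "x1 AND NOT x2"

def predict(g):                            # multiplicative (product) node
    return soft_neg(X[:, 0], g[0]) * soft_neg(X[:, 1], g[1])

def loss(g):
    return np.mean((predict(g) - target) ** 2)

g = np.array([0.5, 0.5])                   # tunable negation degrees
for _ in range(1000):                      # finite-difference gradient steps
    grad = np.zeros(2)
    for i in range(2):
        e = np.zeros(2)
        e[i] = 1e-4
        grad[i] = (loss(g + e) - loss(g - e)) / 2e-4
    g = np.clip(g - 2.0 * grad, 0.0, 1.0)

print(g.round(2))    # approaches [0, 1]: leave x1 alone, fully negate x2
```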
The Effects of Pattern Presentation on Interference in Backpropagation Networks
This paper reviews six approaches to solving the problem of 'catastrophic sequential interference'. It is concluded that all of these methods function by reducing (or circumventing) hidden-layer overlap. A new method is presented, called 'random rehearsal training', that further explores an approach introduced by Hetherington and Seidenberg (1989). A constant number of patterns, randomly selected from those learned earlier, is rehearsed with every newly learned pattern. This scheme of rehearsing patterns may, perhaps, be compared to the functioning of the 'articulatory loop' (Baddeley, 1986). It is shown that this presentation method may virtually eliminate sequential interference.
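A minimal sketch of the rehearsal scheme as described: each newly learned pattern is interleaved with a constant number of randomly chosen earlier patterns. The one-layer delta-rule associator, pattern sizes, and learning rate below are our own stand-ins for the backpropagation networks studied in the paper.

```python
# "Random rehearsal training" sketch: sequential learning, but each new
# pattern's training is interleaved with a fixed number of old patterns.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out = 16, 16
W = np.zeros((n_out, n_in))               # minimal linear associator

def train_step(x, t, lr=0.05):
    """One delta-rule update of the weights toward target t."""
    global W
    W += lr * np.outer(t - W @ x, x)

patterns = [(rng.choice([-1.0, 1.0], n_in), rng.choice([-1.0, 1.0], n_out))
            for _ in range(10)]

REHEARSAL = 3        # constant number of old patterns rehearsed per new one
learned = []
for x, t in patterns:                     # strictly sequential learning...
    for _ in range(50):
        train_step(x, t)
        if learned:                       # ...plus random rehearsal
            picks = rng.choice(len(learned),
                               size=min(REHEARSAL, len(learned)),
                               replace=False)
            for i in picks:
                train_step(*learned[i])
    learned.append((x, t))

# Early patterns are retained despite later learning (low error throughout).
print([round(float(np.mean((W @ x - t) ** 2)), 3) for x, t in learned])
```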
Developmental Changes in Infants' Perceptual Processing of Biomechanical Motions
In order to process the reduced information in pointlight images of human movement, observers rely upon general processing heuristics as well as representations more specific to human gait. This paper explores changes in the perception of structure from motion in young infants. We re-examined data from 17 experiments, involving infants of 3 and 5 months, to determine which stimulus features of pointlight motion infants use to organize percepts, and how perception changes. By combining discrimination and encoding information we provide a picture of developing perceptual processes. Five-month-olds encode the stimuli more quickly than 3-month-olds, while the younger infants discriminate pairs of stimuli more frequently. Infants of both ages use phase information to discriminate displays. Three-month-olds discriminate canonical forms from modified forms when the stimuli are organized about a vertical axis, whereas 5-month-olds discriminate these forms only when one of the figures takes on a human-like configuration. These results support a view in which differential skill in what is encoded characterizes development. Furthermore, this work may help guide the integration of theory-formation models with heuristic and constraint-based models, into a more complete account of perception.
The Role of Measurement in the Construction of Conservation Knowledge
Conservation knowledge and measurement abilities are two central components in quantitative development. Piaget's position is that conservation is a logical pre-requisite of measurement, while Miller's is the reverse. In this paper we illustrate how measurement is employed as an empirical tool in the construction of conservation knowledge. This account predicts the familiar pattern of conservation development from the limits on young children's measurement abilities. We present Q-Soar, a computational model that acquires number conservation knowledge by simulating children's performance in a published conservation training study. This model shows that measurement enables a verification process to be executed which is the basis of conservation learning.
An Investigation of Balance Scale Success
The success of a connectionist model of cognitive development on the balance scale task is due to manipulations which impede convergence of the backpropagation learning algorithm. The model was trained at different levels of a biased training environment with exposure to a varied number of training instances. The effects of weight updating method and modifying the network topology were also examined. In all cases in which these manipulations caused a decrease in convergence rate, there was an increase in the proportion of psychologically realistic runs. We conclude that incremental connectionist learning is not sufficient for producing psychologically successful connectionist balance scale models, but must be accompanied by a slowing of convergence.
Chunking Processes and Context Effects in Letter Perception
Chunking is formalized as the dual process of building percepts by recognizing in stimuli chunks stored in memory, and creating new chunks by welding together those found in the percepts. As such it is a very attractive process with which to account for phenomena of perception and learning. Servan-Schreiber and Anderson (1990) demonstrated that chunking is at the root of the "implicit learning" phenomenon, and Servan-Schreiber (1990; 1991) extended that analysis to cover category learning as well. This paper aims to demonstrate the potential of chunking as a theory of perception by presenting a model of context effects in letter perception. Starting from a set of letter segments the model creates from experience chunks that encode partial letters, then letters, then partial words, and finally words. The model's ability to recognize letters alone, or in words, pseudo-words, or strings of unrelated letters is then tested using a backward masking task. The model reproduces the word and pseudoword superiority effects.
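The dual process described here (recognize stored chunks in the stimulus, then weld adjacent percept chunks into new memory chunks) can be rendered in a few lines. Greedy longest-match parsing and single-pass welding are simplifying assumptions, not the model's actual mechanics.

```python
# Dual chunking sketch: parse a stimulus into the largest known chunks,
# then weld adjacent chunks found in the percept into new memory chunks.
def parse(stimulus, memory):
    """Cover the stimulus with the longest chunks currently in memory."""
    percept, i = [], 0
    while i < len(stimulus):
        for size in range(len(stimulus) - i, 0, -1):
            piece = stimulus[i:i + size]
            if size == 1 or piece in memory:   # single symbols always known
                percept.append(piece)
                i += size
                break
    return percept

def learn(stimulus, memory):
    percept = parse(stimulus, memory)
    for a, b in zip(percept, percept[1:]):     # weld neighbouring chunks
        memory.add(a + b)
    return percept

memory = set()
for _ in range(4):        # repeated exposure builds ever-larger chunks:
    print(learn("word", memory))   # letters -> letter pairs -> whole word
```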
Abstractness and Transparency in the Mental Lexicon
This research is concerned with the structure and properties of the mental representations for morphologically complex words in English. In a series of experiments, using a cross-modal priming task, we ask whether the lexical entry for derivationally suffixed and prefixed words is morphologically structured or not, and how this relates to the semantic and the phonological transparency of the relationship between the stem and the affix (govern + ment is semantically transparent, depart + ment is not; happy + ness is phonologically transparent, vain + ity is not). We find strong evidence for morphological decomposition, at the level of the lexical entry, for semantically transparent prefixed and suffixed forms, independent of the degree of surface transparency in the phonological relationship between the stem and the affix. Semantically opaque forms, in contrast, seem to behave like monomorphemic words. We discuss the implications of this for a theory of lexical representation and the processes of acquisition.
Polysemy and Lexical Representation: The Case of Three English Prepositions
This paper is a preliminary analysis from a cognitive linguistics perspective of the meaning of three very high frequency prepositions in English, at, on, and in, which are argued to be inherently polysemous. Although these so-called grammatical morphemes are usually defined in terms of topological relations, the majority of their usages are far too abstract or non-geometric for such spatially-oriented characterizations. Because they seem to sustain a variety of meanings which often overlap, they are exemplary lexical items for testing theories of lexical representation. Arguments against monosemous accounts center on their inability to formulate schemas which include all appropriate usages while excluding usages of other prepositions. Many of the usages differ only on the basis of variable speaker perspective and construal. A polysemic account is currently being developed and tested experimentally in a series of studies involving how native and non-native speakers of English evaluate and categorize various usages of the different prepositions. Initial results indicate that these naive categorizations reflect a gradient of deviation from a canonical spatial sense. Furthermore, deviant usages tend to form fairly robust clusters consonant with a constrained polysemic analysis.
Grammaticality Judgement in Chinese-English Bilinguals: A Gating Experiment
An on-line gating method was used to investigate Chinese-English bilinguals' performance in a grammaticality judgment task. Evidence of different transfer patterns (i.e., backward and forward transfer in early and late second language acquisition) was found in the data reported here. There were strong and systematic relations between performance on the judgments of grammaticality and a separate sentence interpretation task. However, there is also some evidence that inter-language transfer or interference occurs earlier in acquisition for the judgment task than for the sentence interpretation. Judgments of well-formedness might be one of the first domains to "soften" when one language comes into contact with another. Furthermore, it is possible that Chinese and English are more "inter-penetrable" for both forward and backward transfer between these two languages than has been observed between any two language types to date.
Understanding English Past-Tense Formation: The Shared Meaning Hypothesis
It has long been controversial whether language behavior is best described and explained with reference to the constructs of formal linguistic theory, or with reference to information processing concepts and the communicative goals of speakers. Recent work by Kim, Pinker, Prince and Prasada (1991) argues that the vocabulary of formal grammatical theory is essential to psychological explanation. They demonstrate that speakers' evaluation of the well-formedness of past-tense forms is sensitive to whether novel verb forms are perceived to be extended from nouns or verbs. I show this pattern of preferences to be a consequence of semantic similarity between the novel sense of the verb and the irregular verb to which it is phonologically related. The data are consistent with the tenets of functional grammar: speakers' choice of one linguistic form over another is influenced by perceived communicative gain (Kuno, 1987; Bates & MacWhinney, 1989). The salient task in judging novel verbs phonologically related to irregular verbs is guarding against miscommunication. Dizzy Dean aside, that so few mortals have ever flown out to center field testifies to speakers' success.
A Recognition Model of Geometry Theorem-Proving
This paper describes POLYA, a computer program that writes geometry proofs. POLYA actively collects features from a geometry diagram on the basis of which it recognizes and applies knowledge from known examples. We present a vocabulary of visual targets, results, and actions to support incremental parsing of diagrams. We also show how scripts can be used to organize visual actions into useful sequences. We show how those sequences can be used to parse diagrams and instantiate proofs. Finally, we show how scripts represent the implicit spatial knowledge conveyed by examples.
Simulating Theories of Mental Imagery
A knowledge representation scheme for computational imagery has previously been proposed. This scheme incorporates three interrelated representations: a long-term memory descriptive representation and two working memory representations, corresponding to the distinct visual and spatial components of mental imagery. It also includes a set of primitive functions for reconstructing and reasoning with image representations. In this paper we suggest that the representation scheme addresses the controversy involved in the imagery debate by providing the computational tools for specifying, implementing and testing alternative theories of mental imagery. This capability is illustrated by considering the representation and processing issues involved in the mental rotation task.
Fractal (Reconstructive Analogue) Memory
This paper proposes a new approach to mental imagery that has the potential for resolving an old debate. We show that the methods by which fractals emerge from dynamical systems provide a natural computational framework for the relationship between the "deep" representations of long-term visual memory and the "surface" representations of the visual array, a distinction proposed by Kosslyn (1980). The concept of an iterated function system (IFS) as a highly compressed representation for a complex topological set of points in a metric space (Barnsley, 1988) is embedded in a connectionist model for mental imagery tasks. Two advantages of this approach over previous models are the capability for topological transformations of the images, and the continuity of the deep representations with respect to the surface representations.
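For readers unfamiliar with IFS codes, the sketch below shows how a "deep" representation of a few affine-map coefficients regenerates an arbitrarily dense "surface" image via the chaos game. The coefficients are Barnsley's classic fern; the paper's connectionist embedding is not reproduced here.

```python
# An IFS as a compact "deep" code whose "surface" image is reconstructed by
# iteration (the chaos game).  Twenty-four coefficients regenerate the fern.
import numpy as np

# Each row: a, b, c, d, e, f of the map (x, y) -> (ax + by + e, cx + dy + f)
MAPS = np.array([
    [ 0.00,  0.00,  0.00, 0.16, 0.0, 0.00],
    [ 0.85,  0.04, -0.04, 0.85, 0.0, 1.60],
    [ 0.20, -0.26,  0.23, 0.22, 0.0, 1.60],
    [-0.15,  0.28,  0.26, 0.24, 0.0, 0.44],
])
PROBS = [0.01, 0.85, 0.07, 0.07]          # map-selection probabilities

def chaos_game(n_points=50_000, seed=0):
    rng = np.random.default_rng(seed)
    pts = np.empty((n_points, 2))
    x = y = 0.0
    for i in range(n_points):
        a, b, c, d, e, f = MAPS[rng.choice(4, p=PROBS)]
        x, y = a * x + b * y + e, c * x + d * y + f
        pts[i] = x, y
    return pts

pts = chaos_game()
print(pts.min(axis=0), pts.max(axis=0))   # the whole fern from 24 numbers
```

Because the surface set varies continuously with the map coefficients, small changes to the deep code deform the image smoothly, which is the continuity property the abstract highlights.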
When Can Visual Images Be Re-Interpreted? Non-Chronometric Tests of Pictorialism
The question of re-interpreting images can be seen as a new focus for the imagery debate, since the possibility would appear to be a direct prediction of the pictorial account. Finke, Pinker and Farah (1989) have claimed that their results "refute" the earlier negative evidence of Chambers and Reisberg (1985), while Peterson, Kihlstrom, Rose & Glisky (1992) have used the ambiguous stimuli of Chambers and Reisberg to show that under certain conditions, these images may be reinterpreted after all. By employing newly devised tasks, our own experiments have provided further conflicting evidence concerning the conditions under which images can and cannot be reinterpreted. We consider their bearing on the fundamental 'format' issue which neither Finke et al. (1989) nor Peterson et al. (1992) address directly.
"Ill-Structured Representations" for Ill-Structured Problems
While the distinction between well-structured and ill-structured problems is widely recognized in cognitive science, it has not generally been noted that there are often significant differences in the external representations which accompany the two problem types. It is here suggested that there is a corresponding distinction to be made between "well-structured" and "ill-structured" representations. Such a distinction is used to further differentiate diagrams into finer-grained types, loosely corresponding to sketches and drafting-type diagrams, and it is argued that ill-structured, open-ended problems, like the preliminary phases of design problem solving, need "ill-structured" diagrammatic representations. Data from protocol studies of expert designers are used to support the thesis.
Sensory Discrimination in a Short-Term Trace Memory
We propose a fully recurrent neural network to model low-level auditory memory in a task to discriminate intensities of sequentially presented tones across a range of varying interstimulus intervals. In this model, memory represents a sensory-trace of the stimulus and takes the form of slow relaxation of a number of units to a globally attractive equilibrium value near zero. The same-different judgment is based on a derivative of the output of the dynamic memory. Gaussian noise added to unit activations was found to improve the resilience of stored information although at the cost of decreased sensitivity. The model exhibits many qualitative properties of human performance on a roving-standard intensity discrimination task.
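A single leaky unit captures the gist of the trace mechanism: activity relaxes slowly toward a near-zero equilibrium while accumulating noise, so discriminability falls as the interstimulus interval grows. The decay rate, noise level, and the change-score readout standing in for the model's derivative-based judgment are all illustrative assumptions, and the fully recurrent network is collapsed to one unit.

```python
# Leaky-trace sketch of sensory-trace memory across an interstimulus
# interval (ISI), with a d'-style measure of same/different discrimination.
import numpy as np

rng = np.random.default_rng(4)

def trial(i1, i2, isi_steps, decay=0.02, noise=0.02):
    """Hold tone intensity i1 as a decaying, noisy trace across the ISI,
    then read out a decision variable when the comparison tone arrives."""
    trace = i1
    for _ in range(isi_steps):            # slow relaxation toward zero
        trace += -decay * trace + rng.normal(0.0, noise)
    return i2 - trace                     # change score at comparison time

for isi in (5, 50, 200):
    same = [trial(0.8, 0.8, isi) for _ in range(500)]
    diff = [trial(0.8, 0.9, isi) for _ in range(500)]
    dprime = (np.mean(diff) - np.mean(same)) / np.std(same)
    print(isi, round(float(dprime), 2))   # discriminability falls with ISI
```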
A Speech Based Connectionist Model of Human Short Term Memory
In recent years connectionist modelling of Short Term Memory (STM) has been a popular subject of research amongst cognitive psychologists. The direct implications for natural language generation and processing of the speech-based phenomena observed in immediate-recall STM experiments make the development of a psychologically plausible STM model very attractive. In this paper we present a connectionist Short Term Store (STS) which is developed using both traditional STM theories of interference and trace decay. The proposed store has all the essential characteristics of human short term memory. It is capable of on-line storage and recall of temporal sequences, it has a limited span, exhibits clear primacy and recency effects, and demonstrates word-length and phonological similarity effects.
Does Memory Activation Grow with List Strength and/or Length
Recognition of an item from a list is typically modeled by assuming that the representations of the items are activated in parallel and combined or summed into a single measure (sometimes termed 'familiarity' or 'degree-of-match') on which a recognition decision is based. The present research asks whether extra items (length), or extra repetitions (strength), increase this activation measure. Activation was assessed through examining hits and false alarms as the length or strength of word categories were varied. The use of a categorized list insured that response criteria were not changed across the length and strength manipulations. The results demonstrated that: 1) the activation does not change with an increase in the strength of presented items other than the test item; and 2) the activation is increased by an increase in the number of presented items in a category. The results provide important constraints for models of memory, because most models predict or assume either that activation grows with both length and strength, or grows with neither. In fact, the only extant model that can predict both the length and strength findings is the differentiation version of the SAM model (Shiffrin, Ratcliff, & Clark, 1990).
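A generic global-matching sketch of the recognition decision being probed: traces are activated in parallel by the probe, activations are summed into one familiarity value, and "old" is declared above a criterion. The similarity rule and criterion below are invented; this is not the SAM differentiation model itself.

```python
# Summed-activation ("familiarity") recognition decision over item traces.
import numpy as np

rng = np.random.default_rng(2)
DIM = 50

def trace_activation(probe, trace, strength=1.0):
    """Activation grows with probe-trace similarity and trace strength."""
    sim = max(0.0, float(probe @ trace) / DIM)
    return strength * sim ** 3            # steep similarity rule (assumed)

study = [rng.choice([-1.0, 1.0], DIM) for _ in range(8)]
strengths = [1.0] * len(study)            # extra repetitions would raise these

def familiarity(probe):                   # parallel activation, then a sum;
    return sum(trace_activation(probe, t, s)   # extra list items add summands
               for t, s in zip(study, strengths))

CRITERION = 0.5
for name, p in [("old", study[0]), ("new", rng.choice([-1.0, 1.0], DIM))]:
    f = familiarity(p)
    print(name, round(f, 3), "-> 'old'" if f > CRITERION else "-> 'new'")
```

The length and strength questions in the abstract amount to asking whether adding summands, or raising the strengths of non-tested traces, changes this familiarity value.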
Misinformed and Biased: Genuine Memory Distortions or Artifactual Phenomena
In the present study, two cognitive phenomena until now treated apart were compared to each other: hindsight bias and misinformation effect. Both phenomena result from the same basic retroactive-interference procedure, focusing on how memory of originally encoded material is distorted by the encoding of subsequent, conflicting information. The results showed that subjects' recollections of the original information were similarly distorted under both conditions, that is, the amount of hindsight bias was as large as the misinformation effect. More fine-grained analyses, however, revealed important differences. With the additional results of a probability mixture model it was found that only hindsight subjects suffered from memory impairment and that, moreover, their recollections included genuine blends. The misinformation effect, on the other hand, turned out to be an artifact of averaging across two different sets of recollections. These results represent compelling data with respect to the ongoing discussion about the existence of genuine memory blends.
Neurally Motivated Constraints on the Working Memory Capacity of a Production System for Parallel Processing: Implications of a Connectionist Model Based on Temporal Synchrony
The production system formulation plays an important role in models of cognition. However, there do not exist neurally plausible realizations of production systems that can support fast and automatic processing of productions involving variables and n-ary relations. In this paper we show that the neurally plausible model for rapid reasoning over facts and rules involving n-ary predicates and variables proposed by Ajjanagadde and Shastri can be interpreted as such a production system. This interpretation is significant because it suggests neurally motivated constraints on the capacity of the working memory of a production system capable of fast parallel processing. It shows that a large number of rules, even those containing variables, may fire in parallel and a large number of facts may reside in the working memory, provided no predicate is instantiated more than a small number of times (about 3) and the number of distinct entities referenced by the facts in working memory remains small.
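A toy rendering of the temporal-synchrony idea behind these constraints: each distinct entity occupies one phase slot within a cycle, roles are bound to entities by firing in their phase, and the number of slots per cycle caps how many distinct entities working memory can reference at once. The cap of seven and the predicate names below are assumptions for illustration.

```python
# Phase-based variable binding: the cycle length caps the number of
# distinct entities that can be referenced simultaneously.
PHASES_PER_CYCLE = 7          # assumed cap on distinct entities

def bind(bindings):
    """Assign each distinct entity a phase; fail if the cycle is full."""
    phases, assignment = {}, {}
    for role, entity in bindings.items():
        if entity not in phases:
            if len(phases) >= PHASES_PER_CYCLE:
                raise ValueError("too many distinct entities in one cycle")
            phases[entity] = len(phases)
        assignment[role] = (entity, phases[entity])   # role fires in phase
    return assignment

# Two facts sharing entities need only three phases, not five roles' worth.
print(bind({"give.giver": "john", "give.recip": "mary", "give.obj": "book",
            "own.owner": "mary", "own.obj": "book"}))
```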
Psychological Responses to Anomalous Data
A crucial aspect of understanding knowledge acquisition and theory change is understanding how people respond to anomalous information. We propose that there are seven fundamental responses that people make to anomalous information. We provide evidence from the history of science and from psychology for each of these responses, and we present the results of a study that explores some of the factors that determine these responses.
Modelling Inductive And Deductive Discovery Strategies In Galilean Kinematics
This paper investigates how different strategies affect the success and efficiency of scientific discovery, by examining different approaches in Galilean kinematics. Computational models with biases for inductive or deductive approaches to discovery were constructed to simulate the processes involved in finding coherent and empirically correct sets of laws. The performance of the models shows that the best overall strategy is to begin with an inductive bias and then perform tight cycles of law generation and experimental testing. Comparison of the models with previous findings indicates that the best overall strategy for discovery depends on the relative ease of search in hypothesis and experiment spaces.
Complexity Management in a Discovery Task
Previous psychological research about scientific discovery has often focused on subjects' heuristics for discovering simple concepts with one relevant dimension or a few relevant dimensions with simple two-way interactions. This paper presents results from an experiment in which subjects had to discover a concept involving complex three-way interactions on a multi-valued output by running experiments in a computerized microworld. Twenty-two CMU undergraduates attempted the task, of whom sixteen succeeded, in an average of 85 minutes. The analyses focus on three strategies used to regulate task complexity. First, subjects preferred depth-first to breadth-first search, with successful subjects regulating the number of features varied from experiment to experiment most effectively. Second, subjects systematically regulated the length of their experiments. Third, a new explicit search heuristic (Put Upon Stack Heuristic) used by successful subjects is described.
Scientific Induction: Individual versus Group Processes and Multiple Hypotheses
It has been suggested that groups can evaluate multiple hypotheses better than individuals. The present study employed Wason's (1960) 2-4-6 task to examine the effects of multiple hypotheses in scientific induction. Subjects worked either individually or in four-member interacting groups. Subjects were also instructed to test either a single hypothesis or a pair of hypotheses. The results indicate that groups perform significantly better than individuals. When testing multiple hypotheses, groups were more likely to determine the target hypothesis than individuals. Interacting groups generated more positive tests that received negative feedback and received more disconfirmation than individuals. When multiple hypotheses were tested, interacting groups used more diagnostic tests than individuals. Interacting groups appear to search their experiment space and evaluate the evidence received better than individuals.
Team Cognition in the Cockpit: Linguistic Control of Shared Problem Solving
Communication of professional air transport crews (2- and 3-member crews) in simulated inflight emergencies was analyzed in order to determine (1) whether certain communication features distinguish high-performing from low-performing crews, and (2) whether crew size affects communication used for problem solving. Analyses focused on metacognitively explicit talk; i.e., language used to build a shared understanding of the problem, goals, plans and solution strategies. Normalized frequencies of utterances were compared during normal (low workload) and abnormal (high workload) phases of flight. High-performing captains, regardless of crew size, were found to be more metacognitively explicit than low-performing captains, and effective captains in 3-member crews were found to be most explicit. First officers' talk complemented their captains' talk: first officers in low-performing crews tended to be more explicit than first officers in high-performing crews.
A Unified Process Model of Syntactic and Semantic Error Recovery in Sentence Understanding
The development of models of human sentence processing has traditionally followed one of two paths. Either the model posited a sequence of processing modules, each with its own task-specific knowledge (e.g., syntax and semantics), or it posited a single processor utilizing different types of knowledge inextricably integrated into a monolithic knowledge base. Our previous work in modeling the sentence processor resulted in a model in which different processing modules used separate knowledge sources but operated in parallel to arrive at the interpretation of a sentence. One highlight of this model is that it offered an explanation of how the sentence processor might recover from an error in choosing the meaning of an ambiguous word: the semantic processor briefly pursued the different interpretations associated with the different meanings of the word in question until additional text confirmed one of them, or until processing limitations were exceeded. Errors in syntactic ambiguity resolution were assumed to be handled in some other way by a separate syntactic module. Recent experimental work by Laurie Stowe strongly suggests that the human sentence processor deals with syntactic error recovery using a mechanism very much like that proposed by our model of semantic error recovery. Another way to interpret Stowe's finding that two significantly different kinds of errors are handled in the same way is this: the human sentence processor consists of a single unified processing module utilizing multiple independent knowledge sources in parallel. A sentence processor built upon this architecture should at times exhibit behavior associated with modular approaches, and at other times act like an integrated system. In this paper we explore some of these ideas via a prototype computational model of sentence processing called COMPERE, and propose a set of psychological experiments for testing our theories.
What You Infer Might Hurt You - A Guiding Principle for a Discourse Planner
Most Natural Language Generation systems developed to date assume that a user will learn only what is explicitly stated in the discourse. This assumption leads to the generation of discourse that states explicitly all the information to be conveyed, and that does not address further inferences from the discourse. The content planning mechanism described in this paper addresses these problems by taking into consideration the inferences the user is likely to make from the presented information. These inferences are modeled by means of inference rules, which are applied in a prescriptive manner to generate discourse that conveys the intended information, and in a predictive mode to draw further conclusions from the presented information. In addition, our mechanism minimizes the generated discourse by presenting only information the user does not know or about which s/he has misconceptions. The domain of our implementation is the explanation of concepts in high school algebra.
Theme Construction from Belief Conflict and Resolution
Story themes are generalized advice that a story contains, and theme recognition provides a way for a system to show that it has understood the story. THUNDER is a story understanding system that implements a model of theme construction from belief conflicts and resolutions. A belief conflict consists of conflicting evaluative beliefs regarding a story character's plan. When execution of the plan results in a realized success or failure for the character, a resolution to the conflict is recognized from the additional reasons that the realization provides for the evaluative beliefs in conflict. The theme of the story is generated by reasoning about how the resolution shows the beliefs in conflict to be correct or incorrect, and produces a statement of generalized advice about reasons for evaluation. Two types of advice are generated by THUNDER: (1) reason advice about the reasons for evaluation that the story shows to be correct, and (2) avoidance advice about how failures that occur as the result of erroneous evaluations could be avoided. The algorithms for constructing both types of advice and examples of THUNDER constructing themes are presented.
Communicating Abstract Advice: the Role of Stories
People often give advice by telling stories. Stories both recommend a course of action and exemplify general conditions in which that recommendation is appropriate. A computational model of advice taking using stories must address two related problems: determining the story's recommendations and appropriateness conditions, and showing that these obtain in the new situation. In this paper, we present an efficient solution to the second problem based on caching the results of the first. Our proposal has been implemented in Brainstormer, a planner that takes abstract advice.
Primacy Effects and Selective Attention in Incremental Clustering
Incremental clustering is a type of categorization in which learning is unsupervised and changes to category structure occur gradually. While there has been little psychological research on this subject, several computational models for incremental clustering have been constructed. Although these models provide a good fit to data provided by some psychological studies, they overlook the importance of selective attention in incremental clustering. This paper compares the performance of two models, Anderson's (1990) rational model of categorization, and Fisher's (1987) COBWEB, to that of human subjects in a task which stresses the importance of selective attention. In the study, subjects were shown a series of pictorial stimuli in one of two orders. The results showed that subjects focused their attention on the first extreme feature they saw, and later used this feature to classify ambiguous stimuli. Both models fail to predict human performance. These results indicate the need for a selective attention mechanism in incremental clustering as well as provide one constraint on how such a mechanism might work.
Some Epistemic Benefits of Action: Tetris, a Case Study
We present data and argument to show that in Tetris, a real-time interactive video game, certain cognitive and perceptual problems are more quickly, easily, and reliably solved by performing actions in the world rather than by performing computational actions in the head alone. We have found that some translations and rotations are best understood as using the world to improve cognition. They are not being used to implement a plan, or to implement a reaction. To substantiate our position we have implemented a computational laboratory that lets us record keystrokes and game situations, as well as allows us to dynamically create situations. Using the data of over 30 subjects playing 6 games, tachistoscopic tests of some of these subjects, and results from our own successful efforts at building expert systems to play Tetris, we show why knowing how to use one's environment to enhance speed and robustness are important components in skilled play.
Reference Features as Guides to Reasoning About Opportunities
An intelligent agent acting in a complex and unpredictable world must be able to both plan ahead and react quickly to changes in its surroundings. In particular, such an agent must be able to react quickly when faced with unexpected opportunities to fulfill its goals. We consider the issue of how an agent should respond to perceived opportunities, and we describe a method for determining quickly whether it is rational to seize an opportunity or whether a more detailed analysis is required. Our system uses a set of heuristics based on reference features to identify situations and objects that characteristically involve problematic patterns of interaction. We discuss the recognition of reference features, and their use in focusing the system's reasoning onto potentially adverse interactions between its ongoing plans and the current opportunity.
The Evolutionary Induction of Subroutines
In this paper we describe a genetic algorithm capable of evolving large programs by exploiting two new genetic operators which construct and deconstruct parameterized subroutines. These subroutines protect useful partial solutions and help to solve the scaling problem for a class of genetic problem solving methods. We demonstrate that our algorithm acquires useful subroutines by evolving a modular program from "scratch" to play and win at Tic-Tac-Toe against a flawed "expert". This work also amplifies our previous note (Pollack, 1991) that a phase transition is the principle behind induction in dynamical cognitive models.
Learning Several Lessons from One Experience
The architecture of an intelligent agent must include components that carry out a wide variety of cognitive tasks, including perception, goal activation, plan generation, plan selection, and execution. In order to make use of opportunities to learn, such a system must be capable of determining which system components should be modified as a result of a new experience, and how lessons that are appropriate for each component's task can be derived from the experience. We describe an approach that uses a self-model as a source of information about each system component. The model is used to determine whether a component should be augmented in response to a new example, and a portion of the model, the component performance specifications, is used to determine what aspects of an example are relevant to each component and to express the details of the lessons learned in vocabulary that is appropriate to the component. We show how this approach is implemented in the CASTLE system, which learns strategic concepts in the domain of chess.
Energy Minimization and Directionality in Phonological Theories
Goldsmith (1990, 1991) and Lakoff (in press) have both proposed phonological theories involving parallel constraint satisfaction, and making explicit reference to Smolensky's (1986) harmony theory. We show here that the most straightforward implementation of phonological constraint satisfaction models as spin glasses does not work, due to the need for directionality in constraints. Imposing directionality negates some of the advantages hoped for from such a model. We have developed a neural network that implements a subset of the operations in the Goldsmith and Lakoff phonological theories, but proper behavior requires asymmetric connections and essentially feed-forward processing. After describing the architecture of this network we will move on to the issue of whether spin glass models are really an appropriate metaphor for phonological systems.
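The directionality problem can be seen directly from the symmetric energy E = -1/2 * sum_ij w_ij s_i s_j: a constraint between two units pulls on both ends equally, so a rule like "repair s2 to agree with s1, never the reverse" is inexpressible. The toy Hopfield-style settling below makes the point; it is not the network developed in the paper.

```python
# Symmetric spin-glass settling has no preferred direction of repair:
# which unit gives way depends only on the (random) update order.
import numpy as np

rng = np.random.default_rng(3)
W = np.array([[0.0, 1.0],
              [1.0, 0.0]])               # one symmetric "agree" constraint

def settle(s, steps=10):
    s = s.copy()
    for _ in range(steps):
        i = rng.integers(len(s))
        s[i] = 1.0 if W[i] @ s > 0 else -1.0   # flip to lower the energy
    return s

# Both disagreeing start states settle into agreement, but the energy
# E = -0.5 * s @ W @ s cannot say *which* unit should have changed.
print(settle(np.array([1.0, -1.0])), settle(np.array([-1.0, 1.0])))
```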
Integrating Category Acquisition with Inflectional Marking: A Model of the German Nominal System
Linguistic categories play a key role in virtually every theory that has a bearing on human language. This paper presents a connectionist model of grammatical category formation and use, within the domain of the German nominal system. The model demonstrates (1) how categorical information can be created through cooccurrence learning; (2) how grammatical categorization and inflectional marking can be integrated in a single system; (3) how the use of cooccurrence information, semantic information and surface feature information can be usefully combined in a learning system; and (4) how a computational model can scale up toward simulating the full range of phenomena involved in an actual system of inflectional morphology. This is, to our knowledge, the first connectionist model to simultaneously address all these issues for a domain of language acquisition.
Rules or Connections? The Past Tense Revisited
We describe a connectionist model of the past tense that generates both regular and irregular past tense forms with good generalization. The model also exhibits frequency effects that have been taken as evidence for a past tense rule (Pinker, 1991) and consistency effects that are not predicted by rule-based accounts. Although not a complete account of the past tense, this work suggests that connectionist models may capture generalizations about linguistic phenomena that rule-based accounts miss.
A Connectionist Account of English Inflectional Morphology: Evidence from Language Change
One example of linguistic productivity that has been much discussed in the developmental literature is the verb inflection system of English. Opinion is divided on the issue of whether regular/irregular distinctions in surface behavior must be attributed to an underlying distinction in the mechanisms of production. Looking at the course of historical development in English, the current paper evaluates potential shortcomings of two competing approaches. Two sets of simulations are presented. The first argues that a single-mechanism model offers a natural account of historical facts that would be problematic for a dual mechanism approach. The second addresses a potential problem for a single-mechanism account, the question of default behavior, and demonstrates that even in the absence of superior type frequency a network is capable of developing a "default" category. We conclude that the single network account offers a more promising mechanism for explaining English verb inflection.
Learning Language in the Service of a Task
For language comprehension, using an easily specified task instead of a linguistic theoretic structure as the target of training and comprehension ameliorates several problems, and using constraint satisfaction as a processing mechanism ameliorates several more: namely, 1) stipulating an a priori linguistic representation as a target is no longer necessary, 2) meaning is grounded in the task, 3) constraints from lexical, syntactic, and task-oriented information are easily learned and combined, and 4) the dramatically informal, "noisy" grammar of natural speech is easily handled. The task used here is a simple jigsaw puzzle wherein one subject tells another where to place the puzzle blocks. In this paper, only the task of understanding to which block each command refers is considered. Accordingly, the inputs to a recurrent PDP model are the consecutive words of a command presented in turn and the set of blocks yet to be placed on the puzzle. The output is the particular block referred to by the command. In a first simulation, the model is trained on an artificial corpus that captures important characteristics of subjects' language. In a second simulation, the model is trained on the actual language produced by 42 subjects. The model learns the artificial corpus entirely, and the natural corpus fairly well. The benefits of embedding comprehension in a communicative task and the benefits of constraint satisfaction are discussed.
On the Unitization of Novel, Complex Visual Stimuli
We investigated the degree to which novel conjunctions of features come to be represented as perceptual wholes. Subjects were trained in a visual search task using novel, conjunctively defined stimuli composed of discrete features. The stimulus sets were designed so that successful search required identification of a conjunction of at least two features. With extended training, the slope of the search functions dropped by large amounts. Various transfer tasks were used to rule out the possibility that the organization of sequential search strategies involving simple features could account for this result. The perceptual discriminability or confusability of the stimuli exerted an important influence on the rate of unitization. The nature of the perceptual unit appears to depend on the subset of features which are diagnostic for carrying out a particular discrimination task. The results provide important constraints for models of visual perception and recognition.
Discovering and Using Perceptual Grouping Principles in Visual Information Processing
Despite the fact that complex visual scenes contain multiple, overlapping objects, people perform object recognition with ease and accuracy. Psychological and neuropsychological data argue for a segmentation process that assists in object recognition by grouping low-level visual features based on which object they belong to. We review several approaches to segmentation/recognition and argue for a bottom-up segmentation process that is based on feature grouping heuristics. The challenge of this approach is to determine appropriate grouping heuristics. Previously, researchers have hypothesized grouping heuristics and then tested their psychological validity or computational utility. We suggest a basic principle underlying these heuristics: they are a reflection of the structure of the environment. We have therefore taken an adaptive approach to the problem of segmentation in which a system, called MAGIC, learns how to group features based on a set of presegmented examples. Whereas traditional grouping principles indicate the conditions under which features should be bound together as part of the same object, the grouping principles learned by MAGIC also indicate when features should be segregated into different objects. We describe psychological studies aimed at determining whether limitations of MAGIC correspond to limitations of human visual information processing.
The Role of Genericity in the Perception of Illusory Contours
Visual images are ambiguous. Any given image, or collection of images, is consistent with an infinite number of possible states of the external world. Yet, the human visual system seems to have little difficulty in reducing this potential uncertainty to one, or perhaps a few, perceptual interpretations. Many vision researchers have investigated what sorts of constraints (assumptions about the external world and the images formed of it) the visual system might be using to arrive at its perceptions. One important class of constraints is those based on genericity or general position. We propose a theory of illusory contours in which general position assumptions are used to infer certain necessary conditions for the occurrence of illusory figures that appear to occlude their inducers. Experiments with human subjects are described. The results of these experiments suggest an important role for general position assumptions in understanding the perception of illusory contours. It is also demonstrated that parallelism of contours of "blob" type inducers is an important determinant of illusory contour strength.
Perceiving the Size of Trees Via Their Form
Physical constraints on growth produce continuous variations in the shape of biological objects that correspond to their sizes. We investigated whether two such properties of tree form can be visually discriminated and used to evaluate the height of trees. Observers judged simulated tree silhouettes of constant image size. Comparison was made to judgments of real trees in natural viewing conditions. Tree form was shown to confer an absolute metric on ground texture gradients. Eyeheight information was also shown to be ineffective as an alternative source of absolute scale.
Toward the Origins of Dyslexia
A series of experiments will be reported comparing the performance of groups of 11-year-old and 15-year-old children with dyslexia with groups of normal children matched for chronological age (CA) and reading age (RA) respectively. Experiments testing gross motor skill demonstrated that both groups of children with dyslexia showed significant deficits in balance when required to undertake a further task at the same time, whereas the control groups were not affected. It was concluded that the control children balanced automatically whereas the children with dyslexia did not. Further experiments indicated that working memory performance was not disordered, in that deficits in memory span were paralleled by deficits in speed of articulation. Tests of information processing speed led to an interesting dissociation. Simple reaction performance was indistinguishable from that of the CA controls. By contrast, on the simplest possible choice reaction, both groups of children with dyslexia were slowed to the level of their RA controls. It was concluded that the locus of the speed deficit lay within the decision-making process. Further experiments demonstrating deficits in sensory thresholds and abnormal evoked potentials will also be reported. We conclude that an automatisation deficit is consistent with most of the known problems of children with dyslexia.
A Memory Architecture for Case-Based Argumentation
This paper describes a memory organization that supports intelligent memory-based argumentation. Our goal is to build a system that can argue opposite sides of an issue by retrieving stories that support or oppose it. Rather than attempting to determine how a story relates to a point on the fly, we explicitly represent the points that the stories support or oppose, as well as how they support or oppose those points. We have developed a hierarchy of story point types; associated with each type is a set of rhetorical templates, which describe the ways that a story could support or oppose a point of that type. Each template consists of a series of assertion types on which the argument depends. This enables the program to attack intelligently the foundations of the point it is trying to refute. Our approach is being developed within the context of the ILS Story Archive, a large multimedia case base which includes stories from a wide variety of domains.
Constructive Similarity Assessment: Using Stored Cases to Define New Situations
A fundamental issue in case-based reasoning is similarity assessment: determining similarities and differences between new and retrieved cases. Many methods have been developed for comparing input case descriptions to the cases already in memory. However, the success of such methods depends on the input case description being sufficiently complete to reflect the important features of the new situation, which is not assured. In case-based explanation of anomalous events during story understanding, the anomaly arises because the current situation is incompletely understood; consequently, similarity assessment based on matches between known current features and old cases is likely to fail because of gaps in the current case's description. Our solution to the problem of gaps in a new case's description is an approach that we call constructive similarity assessment. Constructive similarity assessment treats similarity assessment not as a simple comparison between fixed new and old cases, but as a process for deciding which types of features should be investigated in the new situation and, if the features are borne out by other knowledge, added to the description of the current case. Constructive similarity assessment does not merely compare new cases to old: using prior cases as its guide, it dynamically carves augmented descriptions of new cases out of memory.
Generic Teleological Mechanisms and their Use in Case Adaptation
In experience-based (or case-based) reasoning, new problems are solved by retrieving and adapting the solutions to similar problems encountered in the past. An important issue in experience-based reasoning is to identify different types of knowledge and reasoning useful for different classes of case-adaptation tasks. In this paper, we examine a class of non-routine case-adaptation tasks that involve patterned insertions of new elements in old solutions. We describe a model-based method for solving this task in the context of the design of physical devices. The method uses knowledge of generic teleological mechanisms (GTMs), such as cascading. Old designs are adapted to meet new functional specifications by accessing and instantiating the appropriate GTM. The Kritik2 system evaluates the computational feasibility and sufficiency of this method for design adaptation.
Representing Cases as Knowledge Sources that Apply Local Similarity Metrics
A model of case-based reasoning is presented that relies on a procedural representation for cases. In an implementation of this model, cases are represented as knowledge sources in a blackboard architecture. Case knowledge sources define local neighborhoods of similarity and are triggered if a problem case falls within a neighborhood. This form of "local indexing" is a viable alternative where global similarity metrics are unavailable. Other features of this approach include the potential for fine-grained scheduling of case retrieval, a uniform representation for cases and other knowledge sources in hybrid systems that incorporate case-based reasoning and other reasoning methods, and a straightforward way to represent the actions generated by cases. This model of case-based reasoning has been implemented in a prototype system ("Broadway") that selects from a case base automobiles that meet a car buyer's requirements most closely and explains its selections.
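A minimal sketch of this "local indexing" idea: each case is a knowledge source carrying its own neighborhood predicate and fires only when the problem falls inside it, so no global similarity metric is needed. The car-domain attributes and thresholds are invented; Broadway's actual representation is surely richer.

```python
# Cases as blackboard knowledge sources with local similarity neighborhoods.
from dataclasses import dataclass
from typing import Callable, Dict, List

Problem = Dict[str, float]

@dataclass
class CaseKS:
    name: str
    solution: str
    in_neighborhood: Callable[[Problem], bool]   # local similarity test

cases: List[CaseKS] = [
    CaseKS("budget-commuter", "recommend compact sedan",
           lambda p: p["price"] <= 15000 and p["seats"] >= 4),
    CaseKS("family-hauler", "recommend minivan",
           lambda p: p["seats"] >= 6),
    CaseKS("weekend-sport", "recommend coupe",
           lambda p: p["price"] > 25000 and p["seats"] <= 2),
]

def triggered(problem: Problem) -> List[CaseKS]:
    """Fire every case whose neighborhood contains the problem; a
    blackboard scheduler would then choose among the triggered cases."""
    return [c for c in cases if c.in_neighborhood(problem)]

for c in triggered({"price": 14000, "seats": 5}):
    print(c.name, "->", c.solution)
```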
Multicases: A Case-Based Representation for Procedural Knowledge
This paper focuses on the representation of procedures in a case-based reasoner. It proposes a new method, the multicase, where several examples are merged without generalization into a single structure. The first part of the paper describes multicases as they are being implemented within the FLOABN project (Alterman, Zito-Wolf, and Carpenter 1991) and discusses some properties of multicases, including simplicity of use, ease of transfer between episodes, and better management of case detail. The second part presents a quantitative analysis of storage, indexing and decision costs based on a decision-tree model of procedures. This model shows that multicases have significantly reduced storage and decision costs compared to two other representation schemes.
Locally-to-Globally Consistent Processing in Similarity
SIAM, a model of structural similarity, is presented. SIAM, along with models of analogical reasoning, predicts that the relative similarity of different scenes will vary as a function of processing time. SIAM's prediction is empirically tested by having subjects make speeded judgements about whether two scenes have the same objects. The similarity of two scenes with different objects is measured by the percentage of trials on which the scenes are called the same. Consistent with SIAM's prediction, similarity becomes increasingly influenced by the global consistency of feature matches with time. Early on, feature matches are most influential if they belong to similar objects. Later on, feature matches are most influential if they place objects in alignment in a manner that is consistent with other strong object alignments. The similarity of two scenes, rather than being a single fixed quantity, varies systematically with the time spent on the comparison.
Goal-Directed Processes in Similarity Judgement
This study explored the effects of a goal and subjects' knowledge in similarity judgements. We hypothesized that the process of computing similarity consists of two phases: explanation and feature comparison. When a goal is salient and the knowledge required to achieve it is available, people compute similarity by explaining the goal in terms of a given state by using domain knowledge. Thus, in this case, rated similarity should be a function of the distance between the goal and the state. When the explanation fails, the judgements should instead be based on feature comparison. Expert, novice, and naive subjects were asked to solve the Tower of Hanoi puzzle. The subjects were required to judge the similarity between the goal and various states of the puzzle. The results showed that their judgements differed, depending on their expertise. While experts' ratings were best characterized by the number of operators necessary to transform a given state to the goal, those of naive subjects were completely based on the number of shared features. The second experiment revealed that the experts' judgements of similarity are not due to learned contiguity through practice.
Correlated Properties in Artifact and Natural Kind Concepts
Property intercorrelations are viewed as central to the representation and processing of real-world object concepts. In contrast, prior research into real-world object concepts has incorporated the assumption that properties are independent and additive. In two studies, the role of correlated properties was explored. Property norms had been collected for 190 natural kinds and artifacts. In Experiment 1, property intercorrelations influenced performance in a property verification task. In Experiment 2, concept similarity, as measured by overlap of independent properties, predicted short interval priming latency for artifacts. In contrast, concept similarity, as measured by overlap of correlated property pairs, predicted short interval priming for natural kinds. The influence of property intercorrelations was stronger for natural kinds because they tended to contain a higher proportion of correlated properties. It was concluded that people encode knowledge about independent and correlated properties of real-world objects. Presently, a Hopfield network is being implemented to explore implications of allowing a system to encode property intercorrelations. Finally, results suggest that semantic relatedness can be defined in terms of property overlap between concepts.
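The Hopfield direction mentioned at the end can be sketched briefly. In a Hebbian weight matrix, properties that co-occur across stored concepts acquire mutually excitatory weights, so a probe containing one member of a correlated pair tends to reinstate the other. The code is illustrative only; the abstract gives no network details.

```python
import numpy as np

def store(patterns):
    # patterns: (n_concepts, n_properties) array with values +/-1.
    # Hebbian storage: correlated properties get positive mutual weights.
    W = sum(np.outer(p, p) for p in patterns).astype(float)
    np.fill_diagonal(W, 0.0)
    return W / len(patterns)

def recall(W, probe, steps=10):
    # Repeatedly update the state; correlated properties fill each other in.
    s = probe.astype(float).copy()
    for _ in range(steps):
        s = np.sign(W @ s)
        s[s == 0] = 1.0
    return s
```

For example, if "feathers" and "beak" always co-occur in the stored concepts, a partial probe containing only "feathers" will tend to settle into a state that also activates "beak".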
Extending the Domain of a Feature-based Model of Property Induction
A connectionist model of argument strength, which applies to arguments involving natural categories and unfamiliar predicates, was proposed by Sloman (1991). The model applies to arguments such as robins have sesamoid bones, therefore hawks have sesamoid bones. The model is based on the hypothesis that argument strength is related to the proportion of the conclusion category's features that are shared by the premise categories. The model assumes a two-stage process in which premises are first encoded by connecting the features of premise categories to the predicate. Conclusions are then tested by examining the degree of activation of the predicate upon presentation of the features of the conclusion category. The current work extends the domain of the model to arguments with familiar predicates which are nonexplainable in the sense that the relation between the category and predicate of each statement is difficult to explain. We report an experiment which demonstrates that both of the phenomena observed with single-premise specific arguments involving unfamiliar predicates are also observed using nonexplainable predicates. We also show that the feature-based model can quantitatively fit subjects' judgments of the strength of arguments with familiar but nonexplainable predicates.
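Stated compactly (our notation, not Sloman's), the core hypothesis is that the strength of an argument from premise category P to conclusion category C grows with the proportion of C's features already covered by P:

```latex
% F(X) denotes the feature set of category X (notation assumed).
\mathrm{strength}(P \rightarrow C) \;\propto\; \frac{|F(P) \cap F(C)|}{|F(C)|}
```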
An Instantiation Model of Category Typicality and Instability
According to the instantiation principle, when we make a judgment about a relatively superordinate category, we follow a two-step process. First, we instantiate the category into one or more subordinates. Second, we make a judgment based on the subordinates. Instantiation theory applied to typicality judgments makes the following predictions. When subjects judge the typicality of a category A with respect to category B, their mean typicality judgment should equal the weighted mean typicality (with respect to B) of subordinate categories of A. Furthermore, typicality judgments for category A will be unstable (i.e., have a high standard deviation) to the extent that A has a large number of diverse subordinates. The instantiation principle was implemented in a computer simulation, which used production frequencies and typicality ratings for subordinates to predict ratings for superordinate-level categories. In two experiments, subjects judged the typicalities of various animal and food categories. The instantiation model successfully predicted the means and standard deviations for the observed distributions of responses for these categories. Extensions and other applications of the instantiation principle are also briefly discussed.
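The two predictions can be written out directly (notation assumed): with production frequencies f_i and subordinate typicalities t_i(B), the superordinate's judged typicality and its instability are the frequency-weighted mean and spread of the subordinate values:

```latex
T(A \mid B) = \frac{\sum_i f_i \, t_i(B)}{\sum_i f_i},
\qquad
\mathrm{SD}(A \mid B) =
\sqrt{\frac{\sum_i f_i \,\bigl(t_i(B) - T(A \mid B)\bigr)^2}{\sum_i f_i}}
```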
Inhibition and Brain Computation
The synapse plays a fundamental role in the computations performed by the brain. The excitatory or inhibitory nature of a synapse represents a (simplified) characterization of both the synapse itself and the computational role it plays in the larger circuit. Much speculation concerns the functional importance of excitation and inhibition in the physiology of the cerebral cortex. The current study uses neural network (connectionist) models to ask whether or not the relative proportion of inhibition (i.e., inhibitory synapses) and excitation (i.e., excitatory synapses) in the brain affects the development of its neural networks. The results are affirmative: an artificial neural network, designed to perform a particular task involving winner-take-all output nodes, is sensitive to the initial configuration of positive (excitatory) and negative (inhibitory) connections (synapses), such that it learns considerably faster when started with 60-75% inhibitory connections than when it includes a greater or lesser proportion than this. Implications of this result for neuroanatomy and neurophysiology are discussed.
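The manipulation is easy to picture as an initialization step: fix the proportion of negative (inhibitory) weights, train to criterion, and sweep that proportion. The sketch below shows only the initialization; the sizes, ranges, and training loop are our assumptions, not the study's.

```python
import numpy as np

def init_weights(n_in, n_out, prop_inhibitory, rng=None):
    # Random magnitudes, with a chosen fraction of connections made negative.
    rng = rng or np.random.default_rng(0)
    magnitudes = rng.uniform(0.05, 0.5, size=(n_out, n_in))
    signs = np.where(rng.random((n_out, n_in)) < prop_inhibitory, -1.0, 1.0)
    return magnitudes * signs

# Sweep the inhibitory proportion; the reported result is that roughly
# 60-75% inhibitory connections yields the fastest learning.
for p in (0.25, 0.50, 0.65, 0.75, 0.90):
    W = init_weights(n_in=20, n_out=5, prop_inhibitory=p)
    # ... train W on the winner-take-all task, record epochs-to-criterion ...
```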
Relearning after Damage in Connectionist Networks: Implications for Patient Rehabilitation
Connectionist modeling is applied to issues in cognitive rehabilitation, concerning the degree and speed of recovery through retraining, the extent of generalization to untreated items, and how treated items are selected to maximize this generalization. A network previously used to model impairments in mapping orthography to semantics is retrained after damage. The degree of relearning and generalization varies considerably for different lesion locations, and has interesting implications for understanding the nature and variability of recovery in patients. In a second simulation, retraining on words whose semantics are atypical of their category yields more generalization than retraining on more prototypical words, suggesting a surprising strategy for selecting items in patient therapy to maximize recovery.
Modelling Paraphasias in Normal and Aphasic Speech
We model word substitution errors made by normal and aphasic speakers with an interactive activation model of lexicalization. This comprises a three-layer architecture of semantic, lexical, and phonological units. We test four hypotheses about the origin of aphasic word substitutions: that they result from pathological decay, loss of within-level inhibitory connections, increased initial random noise, or reduced flow of activation from the semantic to the lexical level. We conclude that a version of the final hypothesis best explains the aphasic data, but with random fluctuations in connection strength rather than a uniform decrement. This model accounts for aspects of recovery in aphasia, and frequency and imageability effects in paraphasias. Pathological lexical access is related to transient lexical access difficulties in normal speakers to provide an account of normal word substitution errors. We argue that similar constraints operate in each case. This model predicts imageability and frequency effects which are verified by analysis of our normal speech error data.
Linguistic Permeability of Unilateral Neglect: Evidence from American Sign Language
Unilateral visual neglect is considered primarily an attentional deficit in which a patient fails to report or orient to novel or meaningful stimuli presented contralateral to a hemispheric lesion (Heilman et al. 1985). A recent resurgence of interest in attentional disorders has led to more thorough investigations of patients exhibiting neglect and associated disorders. These studies have begun to illuminate specific components which underlie attentional deficits, and further serve to explicate interactions between attentional mechanisms and other cognitive processes such as lexical and semantic knowledge. The present paper adds to this growing literature and presents a case study of a deaf user of American Sign Language who evidences severe unilateral left neglect following a right cerebral infarct. Surprisingly, his ability to identify visually presented linguistic signs is unaffected by the left neglect, even when the signs fall in his contralesional visual field. In contrast, the identification of non-linguistic objects presented to the contralesional visual field is greatly impaired. This novel and important finding has implications for our understanding of the domain specificity of attentional disorders and adds new insights into the interactions and penetrability of neglect in the face of linguistic knowledge. These results are discussed in relation to the computational model of neglect proposed by Mozer and Behrmann (1990).
Hippocampal-System Function in Stimulus Representation and Generalization: A Computational Theory
We propose a computational theory of hippocampal-system function in mediating stimulus representation in associative learning. A connectionist model based on this theory is described here, in which the hippocampal system develops new and adaptive stimulus representations which are predictive, distributed, and compressed; other cortical and cerebellar modules are presumed to use these hippocampal representations to recode their own stimulus representations. This computational theory can be seen as an extension and/or refinement of several prior characterizations of hippocampal function, including theories of chunking, stimulus selection, cue-configuration, and contextual coding. The theory does not address temporal aspects of hippocampal function. Simulations of the intact and lesioned model provide an account of data on diverse effects of hippocampal-region lesions, including simple discrimination learning, sensory preconditioning, reversal training, latent inhibition, contextual shifts, and configural learning. Potential implications of this theory for understanding human declarative memory, temporal processing, and neural mechanisms are briefly discussed.
Learning Distributed Representations for Syllables
This paper presents a connectionist model of how representations for syllables might be learned from sequences of phones. A simple recurrent network is trained to distinguish a set of words in an artificial language, which are presented to it as sequences of phonetic feature vectors. The distributed syllable representations that are learned as a side-effect of this task are used as input to other networks. It is shown that these representations encode syllable structure in a way which permits the regeneration of the phone sequences (for production) as well as systematic phonological operations on the representations.
Finding Linguistic Structure with Recurrent Neural Networks
Simple recurrent networks have been used extensively in modelling the learning of various aspects of linguistic structure. We discuss how such networks can be trained, and empirically compare two training algorithms, Elman's "copyback" regime and back-propagation through time, on simple tasks. Although these studies reveal that the copyback architecture has only a limited ability to pay attention to past input, other work has shown that this scheme can learn interesting linguistic structure in small grammars. In particular, the hidden unit activations cluster together to reveal linguistically interesting categories. We explore various ways in which this clustering of hidden units can be performed, and find that a wide variety of different measures produce similar results and appear to be implicit in the statistics of the sequences learnt. This perspective suggests a number of avenues for further research.
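For readers unfamiliar with the copyback regime, the forward pass below (a minimal numpy sketch; weight shapes and nonlinearity are assumed) shows what distinguishes it: the context units hold a verbatim copy of the previous hidden state, and during training error is not propagated back through that copy, unlike back-propagation through time, which unrolls the network across steps.

```python
import numpy as np

def srn_forward(x_seq, W_in, W_ctx, W_out):
    # W_in: (H, D), W_ctx: (H, H), W_out: (O, H) -- shapes assumed.
    h = np.zeros(W_ctx.shape[0])
    outputs = []
    for x in x_seq:
        # Context = verbatim copy of the previous hidden state; under the
        # copyback regime this copy is treated as a fixed input, so no
        # gradient flows back through time.
        h = np.tanh(W_in @ x + W_ctx @ h)
        outputs.append(W_out @ h)
    return outputs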
A Phonologically Motivated Input Representation for the Modelling of Auditory Word Perception in Continuous Speech
Representational choices are crucial to the success of connectionist modelling. Most previous models of auditory word perception in continuous speech have relied upon a traditional Chomsky-Halle style inventory of features; many have also postulated a localist phonemic level of representation mediating a featural and a lexical level. A different immediate representation of the speech input is proposed, motivated by current developments in phonological theory, namely Government Phonology. The proposed input representation consists of nine elements with physical correlates. A model of speech perception employing this input representation is described. Successive bundles of elements arrive across time at the input. Each is mapped, by means of recurrent connections, onto a window representing the current bundle and a context consisting of three such bundles either side of the current bundle. Simulations demonstrate the viability of the proposed input representation. A simulation of the compensation for coarticulation effect (Elman and McClelland, 1989) demonstrates an interpretation which does not involve top-down interaction between lexical and lower levels. The model described is envisaged as part of a wider model of language processing incorporating semantic and orthographic levels of representation, with no local lexical entries.
A PDP Approach to Processing Center-Embedded Sentences
Recent PDP models have been shown to have great promise in contributing to the understanding of the mechanisms which subserve language processing. In this paper we address the specific question of how multiply embedded sentences might be processed. It has been shown experimentally that comprehension of center-embedded structures is poor relative to right-branching structures. It also has been demonstrated that this effect can be attenuated, such that the presence of semantically constrained lexical items in center-embedded sentences improves processing performance. This raises two questions: (1) What is it about the processing mechanism that makes center-embedded sentences relatively difficult? (2) How are the effects of semantic bias accounted for? Following an approach outlined in Elman (1990, 1991), we train a simple recurrent network in a prediction task on various syntactic structures, including center-embedded and right-branching sentences. As the results show, the behavior of the network closely resembles the pattern of experimental data, both in yielding superior performance in right-branching structures (compared with center-embeddings), and in processing center-embeddings better when they involve semantically constrained lexical items. This suggests that the recurrent network may provide insight into the locus of similar effects in humans.
Forced Simple Recurrent Neural Networks and Grammatical Inference
A simple recurrent neural network (SRN) introduced by Elman [1990] can be trained to infer a regular grammar from the positive examples of symbol sequences generated by the grammar. The network is trained, through the back-propagation of error, to predict the next symbol in each sequence, as the symbols are presented successively as inputs to the network. The modes of prediction failure of the SRN architecture are investigated. The SRN's internal encoding of the context (the previous symbols of the sequence) is found to be insufficiently developed when a particular aspect of context is not required for the immediate prediction at some point in the input sequence, but is required later. It is shown that this mode of failure can be avoided by using the auto-associative recurrent network (AARN). The AARN architecture contains additional output units, which are trained to show the current input and the current context. The effect of the size of the training set for grammatical inference is also considered. The SRN has been shown to be effective when trained on an infinite (very large) set of positive examples [Servan-Schreiber et al., 1991]. When a finite (small) set of positive training data is used, the SRN architectures demonstrate a lack of generalization capability. This problem is solved through a new training algorithm that uses both positive and negative examples of the sequences. Simulation results show that when there is a restriction on the number of nodes in the hidden layers, the AARN succeeds in the cases where the SRN fails.
Decomposition of Temporal Sequences
This paper deals with the decomposition of temporal sequences and the emergence of events. The problematic nature of various definitions of events is first reviewed and a hypothesis - the cut hypothesis - is proposed. The cut hypothesis states that a sequence of stimuli is cut out to become a cognitive entity if it is repeatedly experienced in different contexts. The hypothesis can thus explain the emergence of events on the basis of former experience. Two experiments were conducted to compare the predictions of the cut hypothesis to the predictions of two other explanations, explanation by association and explanation by changes along the sequence of stimuli. The first experiment showed that subjects better recognized a certain sequence after seeing it repeated as a whole than after seeing it as a part of another repeating sequence. The second experiment demonstrated that after experiencing a certain repeating sequence, subjects would hardly consider dividing it in its midst, even though that point was a point of maximal change, as evidenced by divisions.
The Role of Correlational Structure in Learning Event Categories
How do people learn categories of simple, transitive events? We claim that people attempt to recover from input the predictive structure that is the basis of 'good', inferentially rich categories. Prior work with object categories found facilitation in learning a component relation (e.g. feathers covary with beak) when that correlation was embedded in a system of other, mutually relevant correlations. Little research has investigated event categories, but researchers have suggested that verb meanings (hence perhaps event categories) might be organized quite differently from noun meanings (and object categories). Thus it is far from clear whether the learning biases or procedures found for object categories will also appear for event learning. Two experiments investigated the effects of systematic correlational structure on learning the regularities comprising a set of event categories. Both found the same pattern of facilitation from correlational coherence as found earlier with object categories. We briefly discuss relations to 1) other constraints on concept learning that focus on the organization of the whole system of concepts and 2) learning paradigms that produce competition, not facilitation, between correlated cues.
Development of Schemata During Event Parsing
The present work combines both process level descriptions and learned knowledge structures in a simple recurrent connectionist network to model human parsing judgements of two videotaped event sequences. The network accommodates the complex event boundary judgement time-series and provides insight into the activation and development of schemata and their role during encoding.
Skill as the Fit Between Performer Resources and Task Demands: A Perspective from Software Use and Learning
This paper goes beyond the routine vs. adaptive expertise distinction seen most recently in Holyoak (1991) by offering a framework which locates skill in the fit between performer resources and task demands. Empirical support for this framework is derived from a review of the literature about "real world" software learning and usage.
The Nature of Expertise in Anagram Solution
Second-generation theories of expertise have stressed the knowledge differences between experts and novices and have used the serial architecture of the production system as a model for both expert and novice problem solving. Recently, Holyoak (1991) has proposed a third generation of theories based on the idea of expertise-related differences in the processing of solution constraints. According to this view, the problem solving of experts, in contrast to that of novices, often is better characterized as a process of satisfying multiple solution constraints in parallel than as a process of serially testing and rejecting hypotheses. We provide data from three experiments that are consistent with this hypothesis for the domain of anagram solution.
Allocation of Effort to Risky Decisions
This research investigates expertise at decision making under risk and the allocation of cognitive effort as risky decisions are made. We conceptualize risk within a space defined by decision variables that managers monitor in their environment. We present a representation of the risk space that captures how foreign exchange traders understand risk in spot currency markets. Results from an experiment with professional traders as subjects show that the risk space explains whether and when traders make decisions to buy, sell, and hold spot positions in foreign currencies. An index of cognitive effort is presented that can be used to predict subjects' level of confidence in their assessments of market behavior. Effort is relatively high when conditions are likely to trigger uncertainty. Effort is relatively low when markets act as expected.
A Constraint Satisfaction Model of Cognitive Dissonance Phenomena
A constraint satisfaction network model simulated cognitive dissonance data from the insufficient justification and free choice paradigms. The networks captured the psychological regularities in both paradigms. In the case of free choice, the model fit the human data better than did cognitive dissonance theory.
A Rational Theory of Cognitive Strategy Selection and Change
This paper presents a rational theory of cognitive strategy selection and change in which the cognitive agent in consideration is proposed to be adaptive in choosing the "best" or optimal strategy from a set of strategies available to be employed. The optimal strategy is assumed to maximize the difference between the expected utility of the goal which the selected strategy would lead to and the computational cost associated with achieving this goal. We considered an example of strategy selection and change in computer programming and interpreted the results from a set of experimental studies we had conducted in this domain in the light of this rational framework. We also substantiated our theoretical claims by developing a computer simulation of this example. The simulation was implemented in ACT-R, a cognitive model constrained by rational analysis as well as by experimental data.
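The selection criterion reduces to a single expression (our notation, not the paper's): from the available strategy set S, the agent picks

```latex
% U(G(s)) = utility of the goal strategy s leads to; C(s) = computational cost.
s^{*} \;=\; \arg\max_{s \in S}\; \Bigl[\, \mathbb{E}\bigl[U(G(s))\bigr] - C(s) \,\Bigr]
```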
Simultaneous Question Comprehension and Answer Retrieval
A model is described for question comprehension in which parsing, memory activation, identification and application of retrieval heuristics, and answer formulation are highly interactive processes operating in parallel. The model contrasts significantly with serial models in the literature, although it is more in line with parallel models of sentence comprehension. Two experiments are described in support of the parallel view of question answering. In one, differential reading times for different question types were shown to be present only when subjects intended to answer the questions they were reading. In another, reading times for words in questions increased and answering times decreased when a unique answer could be identified early in the questions. The results suggest that source node activation and answer retrieval begin during parsing. Both symbolic and connectionist approaches to modeling question answering are potentially influenced by this perspective.
Implicit Argument Inferences in On-Line Comprehension
While people are capable of constructing a variety of inferences during text processing, recent work on inferences suggests that only a restricted number of inferences are constructed on-line. We investigated whether implicit semantic information associated with the arguments of verbs is automatically encoded. Short passives such as "The ship was sunk" are intuitively understood as containing an implicit Agent, e.g. that someone is responsible for the ship's sinking. To investigate whether implicit Agents are encoded automatically, short passives were compared to intransitive sentences with the same propositional content. The experimental logic depended on a specific property of rationale clauses such as "to collect an insurance settlement"; namely, that the contextual element associated with the understood subject of the rationale clause must be capable of volitional action. If people encode an implicit Agent while processing short passives, then they should be able to associate it with the understood subject of a rationale clause. No such association should be possible with intransitives. In two experiments, intransitives elicited longer reading times and were judged to be less felicitous than short passives at the earliest point possible in the rationale clause. Short passives were judged fully felicitous and their reading times did not differ from control sentences with explicit agents.
Another Context Effect in Sentence Processing: Implications for the Principle of Referential Support
A major goal of psycholinguistics is to determine what sources of information are used immediately in language comprehension, and what sources come into play at later stages. Prepositional phrase attachment ambiguities were used in a self-paced reading task to compare contexts that contained one or two possible referents for the verb phrase (VP) in the target sentence. With one set of sentences, a VP-attachment preference was observed in the 2-VP-referent context, but not in the 1-VP-referent context. With another set of sentences, no effect of context was observed. This result falls outside of the scope of the principle of referential support (Altmann & Steedman, 1988) as currently formulated. It suggests that a similar but more broadly-based theory is required.
Consulting Temporal Context During Sentence Comprehension: Evidence from the Monitoring of Eye Movements in Reading
An important aspect of language processing is the comprehender's ability to determine temporal relations between an event denoted by a verb and events already established in the discourse. This often requires the tense of a verb to be evaluated in relation to specific temporal discourse properties. We investigate the time course of this process by examining how the temporal properties of a discourse influence the initial processing of temporarily ambiguous reduced relative clauses. Much of the empirical work on the reading of reduced relative clauses has revealed that readers experience a large mis-analysis effect (or 'garden path') in reduced relatives like "The student spotted by the proctor received a warning" because the reader has initially interpreted the verb "spotted" as a past tense verb in a main clause. Recent results from an eye movement study are provided which indicate that this mis-analysis of relative clauses can be eliminated when the temporal constraints of a discourse do not easily permit a main clause past tense interpretation. Such a finding strongly suggests that readers process tense in relation to the temporal properties of the discourse, and that constraints from these properties can rapidly influence processing at a structural level.
Plausibility and Syntactic Ambiguity Resolution
Different theories of human syntactic parsing make conflicting claims concerning the role of non-syntactic information (e.g. semantics, real world knowledge) in on-line parsing. We address this debate by examining the effect of plausibility of thematic role assignments on the processing of syntactic ambiguities. In a self-paced reading experiment, ambiguous condition reading times were longer than unambiguous condition times at the point of syntactic disambiguation only when plausibility cues had supported the incorrect interpretation. Off-line measures of plausibility also predicted reading time effects in regression analyses. These results indicate that plausibility information may influence thematic role assignment and the initial interpretation of a syntactic ambiguity, and they argue against parsing models in which the syntactic component is blind to plausibility information.
The Time Course of Metaphor Comprehension
This research investigates the process by which people understand metaphors. We apply processing distinctions from computational models of analogy to derive predictions for psychological theories of metaphor. We distinguish two classes of theories: those that begin with a matching process (e.g. Gentner & Clement, 1988; Ortony, 1979) and those that begin with a mapping process (e.g. Glucksberg and Keysar, 1990). In matching theories, processing begins with a comparison of the two terms of the metaphor. In mapping theories, processing begins by deriving an abstraction from the base (or vehicle) term, which is then mapped to the target (or topic). In three experiments, we recorded subjects' time to interpret metaphors. The metaphors were preceded by either the base term, the target term, or nothing. The rationale was as follows. First, interpretations should be faster with advance presentation of either term than without, simply because of advance encoding. The important prediction is that if the initial process is mapping from the base, then seeing the base in advance should be more facilitative than seeing the target in advance. Matching models predict no difference in interpretation time between base and target priming. The results generally supported matching-first models, although support for mapping-first models was found with highly conventional metaphors.
Is the Future Always Ahead? Evidence for System-Mappings in Understanding Space-Time Metaphors
Languages often use spatial terms to talk about time. FRONT-BACK spatial terms are the terms most often imported from SPACE to TIME cross-linguistically. However, in English there are two different metaphorical mapping systems assigning FRONT-BACK to events in time. This research examines the psychological reality of the two mapping systems: specifically, we ask whether subjects construct global domain-mappings between SPACE and TIME when comprehending sentences such as "Graduation lies before her" and "His birthday comes before Christmas." Two experiments were conducted to test the above question. In both experiments, subjects' comprehension time was slowed down when temporal relations were presented across the two different metaphorical systems inconsistently. This suggests that people had to pay a substantial remapping cost when the mapping system was switched from one to the other. The existence of domain mappings in on-line processing further suggests that the two SPACE/TIME metaphorical mapping systems are psychologically real.
Indirect Analogical Mapping
An Indirect Analogical Mapping Model (IMM) is proposed and preliminary tests are described. Most extant models of analogical mapping enumerate explicit units to represent all possible correspondences between elements in the source and target analogs. IMM is designed to conform to more reasonable assumptions about the representation of propositions in human memory. It computes analogical mappings indirectly -- as a form of guided retrieval -- and without the use of explicit mapping units. IMM's behavior is shown to meet each of Holyoak and Thagard's (1989) computational constraints on analogical mapping. For their constraint of pragmatic centrality, IMM yields more intuitive mappings than does Holyoak and Thagard's model.
Visual Analogical Mapping
This paper describes some results of research aimed at understanding the structures and processes required for understanding analogical thinking that involves images and diagrams. We will describe VAMP.1 and VAMP.2, two programs for visual analogical mapping. VAMP.1 uses a knowledge representation scheme proposed by Janice Glasgow that captures spatial information using nested three-dimensional arrays. VAMP.2 overcomes some limitations of VAMP.1 by replacing the array representation with a scheme inspired by Minsky's Society of Mind and connectionism.
Probing the Emergent Behavior of Tabletop, an Architecture Uniting High-Level Perception with Analogy-Making
Tabletop is a computer model of analogy-making that has a nondeterministic parallel architecture. It is based on the premise that analogy-making is a by-product of high-level perception, and it operates in a restricted version of an everyday domain: that of place-settings on a table. The domain's simplicity helps clarify the tight link between perception and analogy-making. In each problem, a table configuration is given; the user, hypothetically seated at the table, points at some object. The program responds by doing "the same thing", as determined from the opposite side of the table. Being nondeterministic, Tabletop acts differently when run repeatedly on any problem. Thus to understand how diverse pressures affect the program, one must compile statistics of many runs on many problems. Tabletop was tested on several families of interrelated problems, and a performance landscape was built up, representing its "likes" and "dislikes". Through qualitative comparisons of this landscape with human preferences, one can assess the psychological realism of Tabletop's "taste".
Concept Learning and Flexible Weighting
We previously introduced an exemplar model, named GCM-ISW, that exploits a highly flexible weighting scheme. Our simulations showed that it records faster learning rates and higher asymptotic accuracies on several artificial categorization tasks than models with more limited abilities to warp input spaces. This paper extends our previous work; it describes experimental results that suggest human subjects also invoke such highly flexible schemes. In particular, our model provides significantly better fits than models with less flexibility, and we hypothesize that humans selectively weight attributes depending on an item's location in the input space.
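The flavor of such flexible weighting can be sketched as follows (an illustration in the spirit of GCM-ISW, not its actual specification): where a standard exemplar model applies one global weight vector inside the similarity kernel, here each stored exemplar carries its own weights, so attribute salience varies with location in the input space.

```python
import numpy as np

def similarity(x, exemplar, weights, c=2.0):
    # Weighted city-block distance through an exponential decay kernel,
    # the standard exemplar-model form; `weights` is local to this exemplar.
    return np.exp(-c * np.sum(weights * np.abs(x - exemplar)))

def classify(x, exemplars, exemplar_weights, labels):
    # Each category's evidence is the summed similarity of its exemplars.
    votes = {}
    for e, w, lab in zip(exemplars, exemplar_weights, labels):
        votes[lab] = votes.get(lab, 0.0) + similarity(x, e, w)
    return max(votes, key=votes.get)
```

Letting the weights differ across exemplars is what "warps" different regions of the input space differently, which is the flexibility the abstract credits for the better fits.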
Adaptation of Cue-Specific Learning Rates in Network Models of Human Category Learning
Recent engineering considerations have prompted an improvement to the least mean squares (LMS) learning rule for training one-layer adaptive networks: incorporating a dynamically modifiable learning rate for each associative weight accelerates overall learning and provides a mechanism for adjusting the salience of individual cues (Sutton, 1992a,b). Prior research has established that the standard LMS rule can characterize aspects of animal learning (Rescorla & Wagner, 1972) and human category learning (Gluck & Bower, 1988a,b). We illustrate here how this enhanced LMS rule is analogous to adding a cue-salience or attentional component to the psychological model, giving the network model a means for discriminating between relevant and irrelevant cues. We then demonstrate the effectiveness of this enhanced LMS rule for modeling human performance in two non-stationary learning tasks for which the standard LMS network model fails to adequately account for the data (Hurwitz, 1990; Gluck, Glauthier, & Sutton, in preparation).
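A simplified version of the idea (parameter names and details are our assumptions, loosely following Sutton's incremental delta-bar-delta scheme) looks like this: each cue keeps its own learning rate, which grows when successive weight updates for that cue are consistently in the same direction and shrinks when they are not, behaving like an attentional salience term.

```python
import numpy as np

def adaptive_lms(X, y, meta_rate=0.05, n_epochs=20):
    # X: (n_trials, n_cues) binary cue matrix; y: (n_trials,) outcomes.
    n = X.shape[1]
    w = np.zeros(n)
    log_alpha = np.full(n, np.log(0.1))   # per-cue learning rates (log space)
    trace = np.zeros(n)                   # recent update direction per cue
    for _ in range(n_epochs):
        for x, target in zip(X, y):
            error = target - w @ x
            # Consistent successive updates for a cue raise its rate;
            # sign-flipping (noisy) updates lower it.
            log_alpha += meta_rate * error * x * trace
            w += np.exp(log_alpha) * error * x
            trace = 0.9 * trace + 0.1 * error * x
    return w, np.exp(log_alpha)
```

With a fixed single rate this reduces to the standard LMS (Rescorla-Wagner) rule, which is why the enhancement can be read as adding cue salience to the psychological model.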
Abstractional and Associative Processes in Concept Learning: A Simulation of Pigeon Data.
Symbolic and associative theories have been claimed to be able to account for concept learning from examples. Given that there seems to be enough empirical evidence supporting both claims, we have tried to integrate associative and symbolic formulations into a single computational model that abstracts information from empirical data at the same time that it takes into account the strength with which each hypothesis is associated with reward. The model is tested in a simulation of pigeon data in a fuzzy concept learning task, where only a few abstractions are stored in representation of all the training patterns and strengthened or weakened depending on their predictive value.
Multivariable Function Learning: Applications of the Adaptive Regression Model to Intuitive Physics
We investigated multivariable function learning--the acquisition of quantitative mappings between multiple continuous stimulus dimensions and a single continuous response dimension. Our subjects learned to predict the amounts of time that a ball takes to roll down inclined planes varying in length and angle of inclination. Performance with respect to the length of the plane was quite good, even very early in learning. On the other hand, performance with respect to the angle of the plane was systematically biased early in learning, but eventually became quite good. An extension of Koh and Meyer's (1991) adaptive regression model accounts well for the results. Implications for the study of intuitive physics more generally are discussed.
Memory for Multiplication Facts
It takes approximately one second for an adult to respond to the problem "7 x 8". The results of that second are well documented, and there are a number of competing theories attempting to explain the phenomena [Campbell & Graham 1985; Ashcraft 1987; Siegler 1988]. However, there are few fully articulated models available to test specific assumptions [McCloskey, Harley, & Sokol 1991]. This paper presents a connectionist account of mental multiplication which models adult reaction time and error patterns. The phenomenon is viewed as spreading activation between stimulus digits and target products, and is implemented by a multilayered network augmented with a version of the "cascade" equations [McClelland 1979]. Simulations are performed to mimic Campbell & Graham's [1985] experiments measuring adults' memory for single-digit multiplication. A surprisingly small number of assumptions are needed to replicate the results found in the psychological literature--fewer than some (less explicit) theories presuppose.
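The cascade equations referred to have a simple form (after McClelland, 1979): each unit's activation relaxes toward its current net input at rate kappa, so activation builds gradually through the layers and reaction times can be read off the dynamics rather than assumed:

```latex
a_i(t) = \kappa\,\mathrm{net}_i(t) + (1-\kappa)\,a_i(t-1),
\qquad
\mathrm{net}_i(t) = \sum_j w_{ij}\,a_j(t)
```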
Calculating Salience of Knowledge
As information systems continue to grow in size and scope, advances in data management fall more and more on the critical path for the usability of these systems. This paper reports on the implementation and applicability of an important function - that of calculating the conceptual salience of knowledge or data in a knowledge base or database. Salience is calculated with a method based on Tversky's formulation of salience as composed of two factors: intensity and discriminability. The salience computation has been implemented and tested on a database and is independent of the particular knowledge area.
The Interaction of Memory and Explicit Concepts in Learning
The extent to which concepts, memory, and planning are necessary to the simulation of intelligent behavior is a fundamental philosophical issue in AI. An active and productive segment of the research community has taken the position that multiple low-level agents, properly organized, can account for high-level behavior. The empirical research relevant to this debate with fully operational systems has thus far been primarily on mobile robots that do simple tasks. This paper recounts experiments with Hoyle, a system in a cerebral, rather than a physical, domain. The program learns to perform well and quickly, often outpacing its human creators at two-person, perfect information board games. Hoyle demonstrates that a surprising amount of intelligent behavior can be treated as if it were situation-determined, that often planning is unnecessary, and that the memory required to support this learning is minimal. The contribution of this paper is its demonstration of how explicit, rather than implicit, concept representation strengthens a reactive system that learns, and reduces its reliance on memory.
REMIND: Integrating Language Understanding and Episodic Memory Retrieval in a Connectionist Network
Most AI simulations have modeled memory retrieval separately from language understanding, even though both activities seem to use many of the same processes. This paper describes REMIND, a structured spreading-activation model of integrated text comprehension and episodic reminding. In REMIND, activation is spread through a semantic network that performs dynamic inferencing and disambiguation to infer a conceptual representation of an input cue. Because stored episodes are associated with the concepts used to understand them, the spreading-activation process also activates any memory episodes that share features or knowledge structures with the cue. After a conceptual representation is formed of the cue, the episode in the network with the highest activation is recalled from memory. Since the inferences made from a cue often include actors' plans and goals only implied in its text, REMIND is able to get abstract remindings that would not be possible without an integrated understanding and retrieval model.
A Model of the Role of Expertise in Analog Retrieval
This paper presents a model of the use of expert knowledge to improve the accuracy of analog retrieval. This model, match refinement by structural difference links (MRSDL), is based upon the assumption that expertise in domains requiring analogical reasoning consists in part of knowledge of the structural similarities and differences between some pairs of the source analogs. In an empirical evaluation on four data sets, MRSDL consistently retrieved the most similar or nearly most similar source analog. Achieving comparable accuracy on these data sets with a two-stage retrieval technique such as MAC/FAC would require exhaustive matching with more than half of the source analogs. The evaluation also showed that parallel competitive matching is often substantially faster than exhaustive matching or MRSDL.
The Story with Reminding: Memory Retrieval is Influenced by Analogical Similarity
AI models of reminding (ARCS, MAC/FAC) that predict that memory access is influenced by analogical similarity are tested. In Experiment 1, subjects initially studied a set of 12 target stories. Later, subjects read 10 other cue stories and were asked to write down the stories they were reminded of from the first set. Cue stories were associated with either an analogous and disanalogous target (competition condition), an analogous target (singleton condition), or a disanalogous target (singleton condition). An effect of analogical similarity was found only in the competition condition. Experiment 2 used the same design but targets and cues were simple subject-verb-object sentences. Cue sentences shared similar nouns and verbs with target sentences. Materials were constructed such that associated nouns either consistently mapped or cross-mapped between cues and targets. Consistent-mapped sentences were recalled more than cross-mapped sentences in both conditions. Issues for future research are addressed.
Abductive Explanation of Emotions
Emotions and cognition are inextricably intertwined. Feelings influence thoughts and actions, which in turn give rise to new emotional reactions. We claim that people infer emotional states in others using common-sense psychological theories of the interactions between emotions, cognition, and action. We have developed a situation calculus theory of emotion elicitation representing knowledge underlying common-sense causal reasoning involving emotions. We show how the theory can be used to construct explanations of emotional states. The method for constructing explanations is based on the notion of abduction. This method has been implemented in a computer program called AMAL. The results of computational experiments using AMAL to construct explanations of examples based on cases taken from a diary study of emotions indicate that the abductive approach to explanatory reasoning about emotions offers significant advantages. We found that the majority of the diary study examples cannot be explained using deduction alone, but they can be explained by making abductive inferences. The inferences provide useful information relevant to emotional states.
Assessing Explanatory Coherence: A New Method for Integrating Verbal Data with Models of On-line Belief Revision
In an earlier study, we modeled subjects' beliefs in textually embedded propositions with ECHO, a computational system for simulating explanatory evaluations (Schank & Ranney, 1991). We both presumed and found that subjects' representations of the texts were not completely captured by the (a priori) representations generated and encoded into ECHO; extraneous knowledge likely contributed to subjects' biases toward certain hypotheses. This study builds on previous work via two questions: First, how well can ECHO predict subjects' belief evaluations when a priori representations are not used? To assess this, we asked subjects to predict (and explain, with alternatives) an endpoint pendular-release trajectory, while collecting believability ratings for their on-line beliefs; subjects' protocols were then "blindly" encoded and simulated with ECHO, and their ratings were compared to ECHO's resulting activations. Second, how similar are different coders' encodings of the same reasoning episode? To assess intercoder agreement, we examined the fit between ECHO's activations for coders' encodings of the same protocols. We found that intercoder correlations were acceptable, and ECHO predicted subjects' ratings well, almost as well as those from the more constrained situations modeled by Schank and Ranney (1991).
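For reference, ECHO evaluates coherence by settling a constraint network under the standard updating rule (after Thagard, 1989), with decay d, net input to unit j, and activations bounded between min and max; the protocol encodings above supply the units and constraints:

```latex
a_j(t{+}1) = a_j(t)(1-d) +
\begin{cases}
\mathrm{net}_j(t)\,\bigl(\mathrm{max} - a_j(t)\bigr) & \text{if } \mathrm{net}_j(t) > 0,\\[2pt]
\mathrm{net}_j(t)\,\bigl(a_j(t) - \mathrm{min}\bigr) & \text{otherwise.}
\end{cases}
```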
Educating Migraine Patients Through On-Line Generation of Medical Explanations
Computer support for learning in technical domains such as medicine requires an intelligent interface between the non-expert and the technical knowledge base. We describe a general method for constructing such interfaces and demonstrate its applicability for patient education. The employment of this technology in a medical clinic poses problems which are linguistic, psychological, and socio-cultural, rather than technological, in nature.
Validating COGNITIO by Simulating a Student Learning to Program in Smalltalk
We describe COGNITIO, a computational theory of learning and cognition, and provide evidence of its psychological validity by comparing the protocols of a student learning to program in Smalltalk against a COGNITIO-based computer simulation of the same. COGNITIO is a production system cognitive architecture that accounts parsimoniously for human learning based on three learning mechanisms: schema formation, episodic memory, and knowledge compilation. The results of the simulation support the validity of COGNITIO as a computational theory of learning and cognition. We also draw some implications of COGNITIO for the teaching of complex problem solving skills.
Using Theory Revision to Model Students and Acquire Stereotypical Errors
Student modeling has been identified as an important component to the long term development of Intelligent Computer-Aided Instruction (ICAI) systems. Two basic approaches have evolved to model student misconceptions. One uses a static, predefined library of user bugs which contains the misconceptions modeled by the system. The other uses induction to learn student misconceptions from scratch. Here, we present a third approach that uses a machine learning technique called theory revision. Using theory revision allows the system to automatically construct a bug library for use in modeling while retaining the flexibility to address novel errors.
Knowledge Tracing in the ACT Programming Tutor
The ACT Programming Tutor provides assistance to students as they write short computer programs. The tutor is constructed around a set of several hundred programming rules that allows the tutor to solve exercises step-by-step along with the student. This paper evaluates the tutor's student modeling procedure that is used to guide remediation. This procedure, termed knowledge tracing, employs an overlay of the tutor's programming rules. In knowledge tracing, the tutor maintains an estimate of the probability that the student has learned each of the rules. The probability for a rule is updated at each opportunity to apply the rule, based on the student's performance. The predictive validity of the modeling procedure for tutor performance accuracy and posttest performance accuracy is assessed. Individual differences in learning parameters and cognitive rules are discussed, along with possible improvements in the modeling procedure.
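The knowledge-tracing update can be sketched concretely (a common formulation of the procedure described here; the parameter values are placeholders): after each opportunity, the estimate that the rule is learned is revised by Bayes' rule from the observed response, using slip and guess probabilities, and then incremented by the chance of learning at that opportunity.

```python
def trace(p_learned, correct, p_guess=0.2, p_slip=0.1, p_learn=0.3):
    # Bayesian revision of P(rule learned) from the observed response.
    if correct:
        num = p_learned * (1 - p_slip)
        post = num / (num + (1 - p_learned) * p_guess)
    else:
        num = p_learned * p_slip
        post = num / (num + (1 - p_learned) * (1 - p_guess))
    # The rule may also be acquired at this opportunity.
    return post + (1 - post) * p_learn

p = 0.3                      # prior: probability the rule is already known
for response in (True, False, True, True):
    p = trace(p, response)   # estimate rises with correct applications
```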
Integrating Case Presentation with Simulation-Based Learning-by-Doing
In this paper we argue that the key to teaching someone to perform a complex task is to interleave instruction and practice in a way that exploits the synergism between the two effectively. Furthermore, we argue that computer simulations provide a particularly promising environment in which to achieve this interleaving. We will illustrate our argument by describing a simulation-based system we are building to train people to perform complex social tasks, such as selling consulting services. In particular, we will focus on the system's ability to present real-world cases at the moment that they are relevant to the student's simulated activities. In doing so, we hope to contribute both to the construction of useful teaching systems and to the theory of case-based reasoning, particularly in case retrieval.
Diagnosis can Help in Intelligent Tutoring
Recently there has been controversy about whether Intelligent Tutoring Systems are, even potentially, more effective than standard CAL programs; that is, whether it is educationally more valuable to attempt to identify the cause of a user's mistakes rather than merely explain the correct method. This issue was addressed by comparative testing of two versions of the SUMIT Intelligent Tutoring Assistant for arithmetic: a diagnostic version, which diagnosed errors and gave appropriate messages, and a 'CAL' version that was identical in all respects except that it made no diagnoses and therefore gave standard error messages indicating the correct method. In a comparative study of the two versions, a class of 9-year-old children was first divided into two matched groups on the basis of a pencil and paper pre-test; then both groups had two 30-minute individual sessions with the appropriate version of SUMIT, and performance was assessed on a subsequent pencil and paper post-test. Both groups improved significantly in their performance from pre-test to post-test, but the diagnostic group showed significantly greater reductions in the number of bugs. It is concluded that diagnostic remediation can be more effective than non-diagnostic approaches.
The Proper Treatment of Cognition
The apparent contradiction between Smolensky's claim that connectionism presents a dynamical conception of the nature of cognition as an alternative to the traditional symbolic conception, and Giunti's recent argument that computational systems are special cases of dynamical systems, can be resolved by adopting a framework in which (a) cognitive systems are dynamical systems, (b) cognition is state-space evolution in dynamical systems, and (c) differences between major research paradigms in cognitive science are differences in the kind of dynamical systems thought most appropriate for modeling some aspect of cognition, and in the kinds of concepts, tools and techniques used to understand systems of that kind.
Are Computational Explanations Vacuous?
There is a certain worry about computational information processing explanations which occasionally arises. It takes the following general form: The informational content of computational systems is not genuine. It is ascribed to the system by an external observer. But if this is the case, why can't it be ascribed to any system? And if it can be ascribed to any system, then surely it is a vacuous notion for explanatory purposes. I respond to this worry by arguing that not every system can be accurately described as a computational information processing system.
Taking Connectionism Seriously
Connectionism is drawing much attention as a new paradigm for cognitive science. An important objective of connectionism has become the definition of a subsymbolic bridge between the mind and the brain. By analyzing an important example of this subsymbolic approach, NETtalk, I will show that this type of connectionism does not fulfil its promises and is applying new techniques in a symbolic approach. It is shown that connectionist models can only become part of such a new approach when they are embedded in an alternative conceptual framework where the emphasis is placed not upon what knowledge a system must possess to be able to accomplish a task but on how a system can develop this knowledge through its interaction with the environment.
Compositionality and Systematicity in Connectionist Language Learning
In a now famous paper, Fodor and Pylyshyn (1988) argue that connectionist networks, as they are commonly constructed and trained, are incapable of displaying certain crucial characteristics of human thought and language. These include the capacity to employ compositionally structured representations and to exhibit systematicity in thought and language production. Since the appearance of Fodor and Pylyshyn's paper, a number of connectionists have produced what seem to be counter-examples to the Fodor-Pylyshyn thesis. The present work examines two of these apparent counter-examples; one is due to Elman and the other to St. John and McClelland. It is argued that although Elman's and St. John & McClelland's networks discover a degree of compositionality, and display a degree of systematic behaviour, the degrees involved are substantially less than those found in humans, and (consequently) are less than what Fodor & Pylyshyn require (or presumably would require if the question were put to them).
The (Non)Necessity of Recursion in Natural Language Processing
The prima facie unbounded nature of natural language, contrasted with the finite character of our memory and computational resources, is often taken to warrant a recursive language processing mechanism. The widely held distinction between an idealized infinite grammatical competence and the actual finite natural language performance provides further support for a recursive processor. In this paper, I argue that it is only necessary to postulate a recursive language mechanism insofar as the competence/performance distinction is upheld. However, I provide reasons for eschewing the latter and suggest that only data regarding observable linguistic behaviour ought to be used when modeling the human language mechanism. A connectionist model of language processing—the simple recurrent network proposed by Elman—is discussed as an example of a non-recursive alternative and I conclude that the computational power of such models promises to be sufficient to account for natural language behaviour.
Posters
Encoding and Retrieval Processes: Separate Issues in Problem Solving
Studies investigating the facilitation of spontaneous access during problem solving by manipulating encoding processes suggest that similar processing at acquisition and test (i.e., problem-oriented processing) enhances spontaneous access (Adams et al., 1988; Lockhart et al., 1988). Bowden (1985) argues that access difficulty is due to problem solving time (i.e., retrieval) constraints rather than acquisition processes. Ross et al. (1989) have challenged Bowden by suggesting that an increase in retrieval time allows subjects to "catch on" to the experimental procedure. This study investigates this claim and also attempts to separate acquisition and retrieval factors by crossing problem solving time (40, 80, 120 sec) with acquisition processing factors (problem-oriented, fact-oriented, and mixed orientation). The mixed condition includes problem-oriented and fact-oriented as a within-subjects variable. Results show an increase in performance from 40 sec to 80 sec, but no added benefit beyond 80 sec. Problem-oriented processing facilitates spontaneous access. The critical evaluation is that of the mixed condition. Performance in the mixed condition also shows a facilitation of spontaneous access for those acquisition materials that involve problem-oriented processing, but not fact-oriented processing, suggesting that one form of encoding facilitates later access.
A Grounded Mental Model of Physical Systems: A Modular Connectionist Architecture
Some basic characteristics of subjects' use of mental models of physical systems are discussed. Many previously suggested representations for physical knowledge, including qualitative-reasoning-based models, do not account for these experimental findings. This paper presents a connectionist architecture which suggests an explanation of these experimental results. Two simulation experiments are described which demonstrate how mental models of physical systems may evolve and why grounding the symbols used by a mental model in a quantitative representation is necessary.
Self-Organization of Auditory Motion Detectors
This work addresses the question of how neural networks self-organise to recognize familiar sequential patterns. A neural network model with mild constraints on its initial architecture learns to encode the direction of spectral motion as auditory stimuli excite the units in a tonotopically arranged input layer like that found after peripheral processing by the cochlea. The network consists of a series of inhibitory clusters with excitatory interconnections that self-organize as streams of stimuli excite the clusters over time. Self-organization is achieved by application of the learning heuristics developed by Marshall (1990) for the self-organization of excitatory and inhibitory pathways in visual motion detection. These heuristics are implemented through linear thresholding equations for unit activation having faster-than-linear inhibitory response. Synaptic weights are learned throughout processing according to the competitive algorithm explored in Malsburg (1973).
Simple+Robust = Pragmatic: A Natural Language Query Processing Model for Card-type Databases
Real users' queries to databases, written in their natural language, tend to be extra-grammatical, erroneous, and sometimes just a sequence of keywords. Since most conventional natural language interfaces are semi-natural, they cannot treat such real queries very well. This paper proposes a new natural language query interpretation model, named SIMPLA. Because the model has a keyword-based parsing mechanism, it is robust enough to cope with extra-grammatical sentences. The strong keyword-based parsing capability depends heavily upon the target database's being "card"-type. SIMPLA provides several operators to define peripheral knowledge regarding the target database. Such peripheral knowledge is stored virtually in parts of the target "card"-type database. Since the target database with the peripheral knowledge remains "card"-type, SIMPLA does not sacrifice its robust natural language processing capability, while it gains the ability to respond to questions concerning peripheral knowledge.
Integrating Reactivity, Goals, and Emotion in a Broad Agent
Researchers studying autonomous agents are increasingly examining the problem of integrating multiple capabilities into single agents. The Oz project is developing technology for dramatic, interactive, simulated worlds. One requirement of such worlds is the presence of broad, though perhaps shallow, agents. To support our needs, we are developing an agent architecture, called Tok, that displays reactivity, goal-directed behavior, and emotion, along with other capabilities. Integrating the components of Tok into a coherent whole raises issues of how the parts interact, and seems to place constraints on the nature of each component. Here we describe briefly the integration issues we have encountered in building a particular Tok agent (Lyotard the cat), note their impact on the architecture, and suggest that modeling emotion, in particular, may constrain the design of integrated agent architectures.
Dedal: using Domain Concepts to Index Engineering Design Information
The goal of Dedal is to facilitate the reuse of engineering design experience by providing an intelligent guide for browsing multimedia design documents. Based on protocol analysis of design activities, we defined a language to describe the content and the form of technical documents for mechanical design. We use this language to index pages of an Electronic Design Notebook which contains text and graphics material, meeting reports and transcripts of conversations among designers. Index and query representations combine elements of the design language with concepts from a model of the designed artifact. The information retrieval mechanism uses heuristic knowledge from the artifact model to help engineers formulate questions, guide the search for relevant information and refine the existing set of indices. Dedal is a compromise between domain-independent argumentation-based systems and pure model-based systems which assume a complete formalization of all design documents.
No Logic? No Problem! Using A Covariation Analysis On A Deductive Task
Subjects were presented with previously played Mastermind games in the form of "Mastermind problems". Although each problem was formally deducible, and in some cases overdetermined, subjects nevertheless usually failed to make more than a third of the potential deductions. A Bayesian model that treated the task as one of "probabilistic reasoning" rather than "logical deduction" accounted well for the performance of the lower performing subjects. It is argued that at least some of the reasoning failures seen on hypothesis evaluation tasks such as this one are produced in part by the solver's replacement of a "deduction" representation with a "probabilistic reasoning" representation.
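A minimal sketch of the contrast at issue, assuming an invented two-peg Mastermind variant and an invented noise parameter: a deductive solver would eliminate codes inconsistent with the feedback outright, whereas a probabilistic solver merely downweights them via Bayes' rule, so no hypothesis is ever fully deduced away.

from itertools import product

COLORS = "RGBY"
codes = list(product(COLORS, repeat=2))          # all candidate hidden codes

def exact_matches(code, guess):
    return sum(c == g for c, g in zip(code, guess))

def update(prior, guess, feedback, noise=0.15):
    # Bayes rule: consistent codes get likelihood 1-noise, inconsistent
    # ones get likelihood noise, so inconsistency weakens rather than
    # eliminates a hypothesis (the "probabilistic reasoning" stance).
    post = {}
    for code, p in prior.items():
        like = (1 - noise) if exact_matches(code, guess) == feedback else noise
        post[code] = p * like
    z = sum(post.values())
    return {c: p / z for c, p in post.items()}

belief = {c: 1 / len(codes) for c in codes}      # uniform prior
belief = update(belief, guess=("R", "G"), feedback=1)
print(max(belief, key=belief.get), round(max(belief.values()), 3))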
Projected Meaning, Grounded Meaning and Intrinsic Meaning
It is proposed that the fundamental difference between representations whose constituent symbols have intrinsic meaning (e.g. mental representations) and those whose symbols have meanings we consider "projected" (e.g. computational representations) is causal. More specifically, this distinction depends on differences in how physical change is brought about, or what we call "causal mechanisms". These mechanisms serve to physically ground our intuitive notions about syntax and semantics.
Seeing is Believing: Why Vision Needs Semantics
Knowledge about the functional properties of the world constrains and informs perception. For example, looking at a table, a chair, a building, or a sculpture, we are able to resolve occluded attachments because we know that in order to stand, an object's center of gravity must lie within its footprint. When we see a floating wheel in the interior of a vehicle, we know that it is probably the means by which the driver communicates steering information to the chassis. Movable handles imply input to machines; fixed handles imply an upside and a downside to any object they grace. We are constructing a machine-understanding machine with which to explore the usefulness of semantics in perception. This system will investigate simple mechanical devices such as gear trains, simultaneously building a representation of the structures and functions of parts, and using that representation to guide and disambiguate perception. In this paper we discuss how this work has led to an understanding of perception in which a semantics of structure and function plays a central role in guiding even the lowest-level perceptual actions.
A Case-Based Approach to Problem Formulation
In domains requiring complex relational representations, simply expressing a new problem may be a complex, error-prone, and time-consuming task. This paper presents an approach to problem formulation, termed case-based formulation (CBF), that uses previous cases as a model and a guide for expressing new cases. By expressing new problems in terms of old, CBF can potentially increase the speed and accuracy of problem formulation, reduce the computational expense of retrieval, and determine the relevant similarities and differences between a new case and the most similar old cases as a side-effect of expressing the new case. Three forms of CBF can be distinguished by the extent to which the retrieval and adaptation of previous cases are automated and the extent to which the facts of multiple cases can be combined. An initial implementation of one form of CBF is described and its ability to use previous cases to increase the efficiency and accuracy of new-case formalization is illustrated with a complex relational case.
Orthographic and Semantic Similarity in Auditory Rhyme Decisions
Seidenberg and Tanenhaus (1979) demonstrated that orthographic information is obligatorily activated during auditory word recognition by showing that rhyme decisions to orthographically similar rhymes (pie-tie) were quicker than rhyme decisions to orthographically dissimilar rhymes (rye-tie). This effect could be due to the fact that orthographic and phonological codes are closely inter-related in lexical memory and the two dimensions are highly correlated. However, it could also be an example of a more general similarity bias in making rhyme decisions, in which subjects cannot ignore irrelevant information from other dimensions. We explored this latter possibility by having subjects make rhyme decisions to words that vary in orthographic similarity and also to words that vary in semantic similarity (good-kind, cruel-kind). This possibility is ruled out in two experiments in which we fail to find an interference effect with semantically related trials, while replicating the basic orthographic interference and facilitation results.
Analogy and Representation: Support for the Copycat Model
We report two experiments which assessed the psychological validity of the Copycat framework for analogy, which proposes that analogy is a process of creating a representation. Experiment 1 presented subjects with two letter-string analogies: "If abc is changed to abd, how would kji be changed in the same way?", and the same statement but with mrrjjj as the string to be changed. Each subject attempted to solve both analogies and the order of presentation was varied. The predictions of Copycat very closely matched the performance of human subjects on the first analogy people solved. However, the second analogy task showed substantial asymmetrical transfer effects that the model does not directly predict. Substantially greater transfer was observed from the mrrjjj analogy, for which it is hard to produce a highly structured representation, to the easier-to-represent kji analogy, than vice versa. In Experiment 2 the first part of the statement of the problem was "If aabbcc is changed to aabbcd...". In this case kji becomes harder to represent than mrrjjj. As predicted, this version yielded more transfer from kji to mrrjjj than the reverse. In both experiments transfer was asymmetric, with greater transfer from less structured to more structured problems than the reverse. Overall, the study supported Copycat's contention that representation is a vital component of understanding analogical processing.
Using Cognitive Biases to Guide Feature Set Selection
Although learning is a cognitive task, machine learning algorithms, in general, fail to take advantage of existing psychological limitations. In this paper, we use a learning task from the field of natural language processing and examine three well-known cognitive biases for human information processing: 1) the tendency to rely on the most recent information, 2) the heightened accessibility of the subject of a sentence, and 3) short term memory limitations. In a series of experiments, we modify a baseline instance representation in response to these limitations and show that the overall performance of the learning algorithm improves as increasingly more cognitive biases and limitations are explicitly incorporated into the instance representation.
The Interaction of Principles and Examples in Instructions
Learners often have difficulty following instructions written at a general enough level to apply to many different cases. Presence and type of example (the example either matched the first task, did not match the first task, or was not present) and presence of a principle (that provided a rationale for part of a procedure) were manipulated in a set of instructions for computer text editing in order to examine whether initial performance and later transfer could be improved. The results suggest that a principle can aid initial learning from general instructions if no example is given or the example does not match the first task. The principle could help users disambiguate the instructions by providing a rationale for potentially misunderstood actions. However, if the example matches the first task, then the presence of a principle seems to slow initial performance, perhaps because the learner tries to compare and integrate the example and the principle. On later training tasks, however, a principle improves performance. These results suggest that the features of instructions that aid initial performance and those that aid later performance are different, and careful research on how to integrate these features is important.
Strategies for Contributing to Collaborative Arguments
The Argumentation Project at LRDC aims to support students in knowledge building by means of collaborative argumentation. A component of this project is a system for helping students generate arguments in a dialogical situation. Empirical research suggests that students generally have difficulty generating arguments for different positions on an issue and may resort to giving arguments that are insincere or irrelevant. Our system will assist the arguer by constraining him to respond relevantly and consistently to the actions of other arguers, suggesting appropriate ways to respond. This assistance will be provided by strategies derived from conversational maxims and "good conduct" rules for collaborative argumentation. We describe a prototype system that uses these strategies to simulate both sides of a dialogical argument.
The Zoo Keeper's Paradox
Default reasoning is a mode of commonsense reasoning which lets us jump to plausible conclusions when there is no contrary information. A crucial operation of default reasoning systems is the checking and maintaining of consistency. However, it has been argued that default reasoning is inconsistent: any rational agent will believe that it has some false beliefs, and by doing so, the agent guarantees itself an inconsistent belief set (Israel, 1980). Perlis (1986) develops Israel's argument into an argument for the inconsistency of recollective Socratic default reasoning systems. The Zoo Keeper's Paradox has been offered as a concrete example to demonstrate the inconsistency of commonsense beliefs. In this paper, we show that Israel's and Perlis' arguments are not well founded. A rational agent only needs to believe that some of its beliefs are possibly or probably false. This requirement does not imply that the beliefs of rational agents are necessarily inconsistent. Decision theory is used to show that concrete examples of seemingly inconsistent beliefs, such as the Zoo Keeper's Paradox, can be rational as well as consistent. These examples show that analyses of commonsense beliefs can be very misleading when utility is ignored. We also examine the justifications of the exploratory and incredulous approaches in default reasoning; decision-theoretic considerations favor the exploratory approach.
A Connectionist Architecture for Sequential Decision Learning
A connectionist architecture and learning algorithm for sequential decision learning are presented. The architecture provides representations for probabilities and utilities. The learning algorithm provides a mechanism to learn from long-term rewards/utilities while observing information available locally in time. The mechanism is based on gradient ascent on the current estimate of the long-term reward in the weight space defined by a "policy" network. The learning principle can be seen as a generalization of previous methods proposed to implement "policy iteration" mechanisms with connectionist networks. The algorithm is simulated for an "agent" moving in an environment described as a simple one-dimensional random walk. Results show the agent discovers optimal moving strategies in simple cases and learns how to avoid short-term suboptimal rewards in order to maximize long-term rewards in more complex cases.
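A rough sketch of this style of learning, reduced to a single policy parameter on an invented one-dimensional walk with a step cost and a delayed end reward; the REINFORCE-like update below is a stand-in for, not a reproduction of, the paper's algorithm.

import math, random

random.seed(0)
theta = 0.0                         # policy parameter: P(move right) = sigmoid(theta)

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def episode(theta, length=10):
    # Walk on states 0..length: every step costs a little, and only the
    # right end pays off, so short-term costs must be tolerated to reach
    # the long-term reward.
    pos, trace, ret = length // 2, [], 0.0
    for _ in range(30):
        p_right = sigmoid(theta)
        right = random.random() < p_right
        trace.append((right, p_right))
        pos += 1 if right else -1
        ret -= 0.01                              # short-term penalty per move
        if pos <= 0:
            return trace, ret                    # left end: no reward
        if pos >= length:
            return trace, ret + 1.0              # right end: delayed reward
    return trace, ret

alpha = 0.5
for _ in range(2000):
    trace, ret = episode(theta)
    for right, p in trace:
        grad = (1 - p) if right else -p          # d/dtheta of log P(action)
        theta += alpha * ret * grad / len(trace) # ascend estimated return
print("learned P(right):", round(sigmoid(theta), 3))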
Early Warnings of Plan Failure, False Positives and Envelopes: Experiments and a Model
We analyze a tradeoff between early warnings of plan failures and false positives. In general, a decision rule that provides earlier warnings will also produce more false positives. Slack time envelopes are decision rules that warn of plan failures in our Phoenix system. Until now, they have been constructed according to ad hoc criteria. In this paper we show that good performance under different criteria can be achieved by slack time envelopes throughout the course of a plan, even though envelopes are very simple decision rules. We also develop a probabilistic model of plan progress, from which we derive an algorithm for constructing slack time envelopes that achieve desired tradeoffs between early warnings and false positives.
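A toy illustration of the tradeoff, with all numbers invented and no claim to match the Phoenix implementation: an envelope warns when observed progress falls below the on-pace progress line minus a slack term, and shrinking the slack yields earlier warnings but more false positives on plans that would in fact have succeeded.

import random

random.seed(1)

def run(slack, deadline=100, goal=100):
    progress, warned_at = 0.0, None
    for t in range(1, deadline + 1):
        progress += random.uniform(0.7, 1.4)     # noisy progress per tick
        required = goal * t / deadline           # on-pace progress line
        if warned_at is None and progress < required - slack:
            warned_at = t                        # envelope violated: warn
    return warned_at, progress >= goal

for slack in (2, 5, 10):
    trials = [run(slack) for _ in range(2000)]
    warns = [w for w, ok in trials if w is not None]
    false_pos = sum(1 for w, ok in trials if w is not None and ok)
    mean_warn = sum(warns) / len(warns) if warns else float("nan")
    print(f"slack={slack:2d}  warnings={len(warns):4d}  "
          f"false positives={false_pos:4d}  mean warning time={mean_warn:5.1f}")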
Syllable Priming and Lexical Representations: Evidence from Experiments and Simulations
This paper explores the composition of syllable structure in lexical representations. Data from auditory lexical decision experiments are presented which demonstrate that syllable structure is represented in the mental lexicon and that the effects of syllable structure are separable from shared segmental overlap. The data also indicate that syllable representations correspond to a surface syllable rather than the abstract underlying syllable posited by some linguistic theories. These findings raise questions concerning the origin of syllable structure in lexical representations. A connectionist simulation utilizing the TIMIT database shows that syllable-like structure may be induced from exposure to phonetic input. Taken together, these results suggest that knowledge of surface syllable structure is actively used in understanding language and that this knowledge may derive from a speaker's experience with language.
An Empirically Based Computationally Tractable Dialogue Model
We describe an empirically based approach to the computational management of dialogues. It is based on an explicit, theoretically motivated position regarding the status of computational models, where it is claimed that computational models of discourse can only be about computers' processing of language. The dialogue model is based on an extensive analysis of collected dialogues from various application domains. Issues concerning computational tractability have also been decisive for its development. It is concluded that a simple dialogue-grammar-based model is sufficient for the management of dialogues with natural language interfaces. We also describe the grammar used by the dialogue manager of a natural language interface for a database system.
Learning Context-free Grammars: Capabilities and Limitations of a Recurrent Neural Network with an External Stack Memory
This work describes an approach for inferring Deterministic Context-free (DCF) Grammars in a connectionist paradigm using a Recurrent Neural Network Pushdown Automaton (NNPDA). The NNPDA consists of a recurrent neural network connected to an external stack memory through a common error function. We show that the NNPDA is able to learn the dynamics of an underlying pushdown automaton from examples of grammatical and non-grammatical strings. Not only does the network learn the state transitions in the automaton, it also learns the actions required to control the stack. In order to use continuous optimization methods, we develop an analog stack which reverts to a discrete stack by quantization of all activations, after the network has learned the transition rules and stack actions. We further show an enhancement of the network's learning capabilities by providing hints. In addition, an initial comparative study of simulations with first, second and third order recurrent networks has shown that the increased degrees of freedom in higher order networks improve generalization but not necessarily learning speed.
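A minimal sketch of a continuous ("analog") stack in this spirit, with illustrative details that do not follow the NNPDA's exact formulation: symbols are pushed and popped with fractional strengths rather than all-or-none, and quantizing the strengths afterwards recovers a discrete stack.

class AnalogStack:
    def __init__(self):
        self.items = []                     # list of (symbol, strength)

    def push(self, symbol, strength):
        if strength > 0:
            self.items.append((symbol, strength))

    def pop(self, strength):
        # Remove `strength` units of stack mass from the top, possibly
        # consuming several fractional entries along the way.
        popped = []
        while strength > 1e-9 and self.items:
            sym, s = self.items.pop()
            take = min(s, strength)
            popped.append((sym, take))
            strength -= take
            if s > take:                    # partially consumed entry stays
                self.items.append((sym, s - take))
        return popped

    def quantize(self):
        # Round strengths to 0/1, reverting to an ordinary discrete stack.
        self.items = [(sym, 1.0) for sym, s in self.items if s >= 0.5]

st = AnalogStack()
st.push("a", 0.8)
st.push("b", 0.6)
print(st.pop(1.0))      # pops all of "b" and part of "a"
st.quantize()
print(st.items)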
The Role of Expertise in the Development of Display-Based Problem Solving Strategies
This paper reports two experiments which explore the relationship between working memory and the development of expertise. Consideration is given to the role played by external memory sources and display-based problem solving in computer programming tasks. Evidence is presented which suggests that expertise in programming is dependent upon the development of strategies for effectively utilizing external displays. In this context, it appears that novices rely extensively upon working memory to generate as much of a solution as possible before transferring it to an external source. In contrast, experts make extensive use of an external display as an information repository. These results are discussed in terms of a framework which emphasizes the role of display-based problem solving and its contribution to strategy development.
Taxonomies and Part-Whole Hierarchies in the Acquisition of Word Meaning - A Connectionist Model
The aim of this paper is to introduce a simple connectionist model for the acquisition of word meaning, and to demonstrate how this model can be enhanced based on empirical observations about language learning in children. The main sources are observations by Markman (1989, 1990) about constraints children place on word meaning, and by Nelson (1988), as well as Benelli (1988), about the role of language in the acquisition of concept taxonomies. The model enhancements based on these observations, and those authors' conclusions, are mainly built on well-known neural mechanisms such as resonance, reset and recruitment, as first introduced in the adaptive resonance theory (ART) models by Grossberg (1976). In this way, the strength of connectionist models in plausibly modeling detailed aspects of natural language is underlined.
Point of View: Modeling the Emotions of Others
When people reason about the behavior of others they often find that their predictions and explanations involve attributing emotions to those about whom they are reasoning. In this paper we discuss the internal models and representations we have used to make machine reasoning of this kind possible. In doing so, we briefly sketch a simulated-world program called the Affective Reasoner. Elsewhere, we have discussed the Affective Reasoner's mechanisms for generating emotions in response to situations that impinge on an agent's concerns, for generating actions in response to emotions, and for reasoning about emotion episodes from cases [Elliott, 1992]. Here we give details about how agents in the Affective Reasoner model each other's point of view, both for the purpose of reasoning about one another's emotion-based actions, and for "having" emotions about the fortunes (good or bad) of others (i.e., feeling sorry for someone, feeling happy for them, resenting their good fortune, or gloating over their bad fortune). To do this, agents maintain Concerns-of-Others representations (COOs) to establish points of view for other agents, and use cases to reason about those agents' expressions of emotions.
Using Stories to Enhance and Simplify Computer Simulations for Teaching
Computer-based simulations are a valuable teaching tool because they permit a learner to explore a phenomenon on his own and to learn from his mistakes. Two factors, however, limit the use of computer simulations in teaching: good simulations are hard to build, and learners can flounder with just a simulation. We have built HeRMiT, a case-based tutor that integrates a simulation with a library of videotaped stories. The stories make up for any lack of depth or fidelity in the simulation by facilitating the generalization and application of underlying principles.
Bootstrapping Syntactic Categories
In learning the structure of a new domain, it appears necessary to simultaneously discover an appropriate set of categories and a set of rules defined over them. We show how this bootstrapping problem may be solved in the case of learning syntactic categories, without making assumptions about the nature of linguistic rules. Each word is described by a vector of bigram statistics, which describe the distribution of local contexts in which it occurs; cluster analysis with respect to an appropriate similarity metric groups together words with similar distributions of contexts. Using large noisy untagged corpora of English, the resulting clusters are in good agreement with a standard linguistic analysis. A similar method is also applied to classify short sequences of words into phrasal syntactic categories. This statistical approach can be straightforwardly realised in a neural network, which finds syntactically interesting categories from real text, whereas the principal alternative network approach is limited to finding the categories in small artificial grammars. The general strategy, using simple statistics to find interesting categories without assumptions about the nature of the rules defined over those categories, may be applicable to other domains.
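A small sketch of the distributional idea on an invented toy corpus (the paper's corpora are large and noisy): each word is described by counts of its left and right neighbours, and words with similar context vectors are grouped together.

from collections import Counter, defaultdict
from itertools import combinations

corpus = ("the cat sat on the mat the dog sat on the rug "
          "a cat ran on a mat a dog ran on a rug").split()

# Bigram context statistics: who appears to each word's left and right.
contexts = defaultdict(Counter)
for i, w in enumerate(corpus):
    if i > 0:
        contexts[w]["L:" + corpus[i - 1]] += 1
    if i < len(corpus) - 1:
        contexts[w]["R:" + corpus[i + 1]] += 1

def similarity(a, b):
    # Cosine similarity between two context-count vectors.
    ca, cb = contexts[a], contexts[b]
    dot = sum(ca[k] * cb[k] for k in ca)
    na = sum(v * v for v in ca.values()) ** 0.5
    nb = sum(v * v for v in cb.values()) ** 0.5
    return dot / (na * nb)

words = sorted(contexts)
pairs = sorted(combinations(words, 2), key=lambda p: -similarity(*p))
for a, b in pairs[:5]:                       # most distributionally alike
    print(f"{a:4s} {b:4s} {similarity(a, b):.2f}")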
Frequency Effects on Categorization and Recognition
An experiment investigating effects of familiarity (indicated by presentation frequency) on categorization and recognition behavior is presented. Results show frequency influenced performance under speeded response conditions only, producing increased categorization of new, similar items with the frequent item, and differentiation (a decrease in false alarms to these same items) in recognition. These results are evaluated with respect to different versions of an exemplar model of categorization and recognition (Medin & Schaffer, 1978; Nosofsky, Clark & Shinn, 1989). Models that include a mechanism for differentiation, or changes in the similarity computation to a familiar example, provided better descriptions of both categorization and recognition behavior than models without this added aspect. The addition of a differentiation mechanism improved fits to categorization data of all three versions of exemplar models considered: the type model (in which repetitions do not produce separate memory traces), the token model (which posits individual memory traces for each repetition of an item) and the frequency parameter model (which includes frequency weighting as a free parameter).
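A schematic exemplar model in the Medin and Schaffer (1978) tradition, with invented stimuli and parameters, showing one place a frequency manipulation can enter: as repeated traces (the token model) or, as here, as a weight on a single trace (the frequency parameter model).

def sim(item, exemplar, s=0.3):
    # Context-model similarity: multiply a mismatch parameter s (0 < s < 1)
    # over the dimensions on which the two items differ.
    prod = 1.0
    for a, b in zip(item, exemplar):
        if a != b:
            prod *= s
    return prod

# Exemplars stored as (features, category, frequency weight).
memory = [((1, 1, 0), "A", 5.0),      # the frequently presented item
          ((1, 0, 1), "A", 1.0),
          ((0, 0, 1), "B", 1.0),
          ((0, 1, 0), "B", 1.0)]

def p_category(item, cat):
    evid = {"A": 0.0, "B": 0.0}
    for feats, c, freq in memory:
        evid[c] += freq * sim(item, feats)   # frequency-weighted evidence
    return evid[cat] / sum(evid.values())

probe = (1, 1, 1)                      # new item similar to the frequent one
print("P(A | probe):", round(p_category(probe, "A"), 3))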
Declarative Learning: Cognition without Primitives
Declarative learning by experience is a foundational cognitive capability, and we argue that, over and above the normal processes of declarative learning, the ability for truly novel learning is the critical capability which bootstraps human cognition. Next we assert that none of the established models of machine learning and no established architecture for cognition have adequate declarative learning capabilities, in that all depend for their success on some pre-characterisation of the learning domain in terms of state space or pre-existing primitives geared to the domain. Finally we describe briefly the Contextual Memory System, which was designed explicitly to support all five declarative learning capabilities. The CMS underlies the Maths Understander machine learning system which 'reads' mathematics texts from scratch, assimilating mathematics concepts, and using them not only to check proofs but also to solve problems.
Hebbian Learning of Artificial Grammars
A connectionist model is presented that uses a Hebbian learning rule to acquire knowledge about an artificial grammar (AG). The validity of the model was evaluated by the simulation of two classic experiments from the AG learning literature. The first experiment showed that human subjects were significantly better at learning to recall a set of strings generated by an AG than by a random process. The model shows the same pattern of performance. The second experiment showed that human subjects were able to generalize the knowledge they acquired during AG learning to novel strings generated by the same grammar. The model is also capable of generalization, and the percentages of errors made by human subjects and by the model are qualitatively and quantitatively very similar. Overall, the model suggests that Hebbian learning is a viable candidate for the mechanism by which human subjects become sensitive to the regularities present in AGs. From the perspective of computational neuroscience, the implications of the model for implicit learning theory, as well as what the model may suggest about the relationship between implicit and explicit memory, are discussed.
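A bare-bones Hebbian associator, assuming an invented letter set and training sample rather than the paper's full model, included only to make the learning rule concrete: weights grow with the coactivation of units (here, adjacent letters), and the summed learned association then serves as a crude familiarity measure for new strings.

import numpy as np

letters = "MTVRX"
idx = {c: i for i, c in enumerate(letters)}
W = np.zeros((len(letters), len(letters)))   # W[i, j]: association i -> j

def train(strings, lr=0.1):
    for s in strings:
        for a, b in zip(s, s[1:]):
            # Hebb rule: strengthen the connection between units that are
            # active together, i.e. letters that occur in adjacent positions.
            W[idx[a], idx[b]] += lr

def familiarity(s):
    # Summed learned association along the string's transitions.
    return sum(W[idx[a], idx[b]] for a, b in zip(s, s[1:]))

train(["MTV", "MTTV", "MVRX", "MTVRX"] * 5)          # grammatical sample
print(familiarity("MTVRX"), familiarity("XRTM"))     # grammatical vs. random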
Comparison of Well-Structured & Ill-Structured Task Environments and Problem Spaces
Many of our results in the problem-solving literature are from puzzle-game domains. Intuitively, most of us feel that there are differences between puzzle problems and open-ended, real-world problems. There has been some attempt to capture these differences in the vocabulary of "ill-structured" and "well-structured" problems. However, there seem to be no empirical studies directed at this distinction. This paper examines and compares the task environments and problem spaces of a prototypical well-structured problem (cryptarithmetic) with the task environments and problem spaces of a class of prototypical ill-structured problems (design problems). Results indicate substantive differences, both in the task environments and the problem spaces.
Prediction Performance as a Function of the Representation Language in Concept Formation Systems
Existing concept formation systems employ diverse representation formalisms, ranging from logical to probabilistic, to describe acquired concepts. These systems are usually evaluated in terms of their prediction performance and/or psychological validity. The evaluation studies, however, fail to take into account the underlying concept representation as one of the parameters that influence system performance, so, whatever the outcome, the performance is bound to be interpreted as 'representation specific'. This paper evaluates the performance of INC2, an incremental concept formation system, relative to the language used for representing concepts. The study includes the whole continuum, from logical to probabilistic representation. The results demonstrate the correctness of our assumption that performance does depend on the chosen concept representation language.
Transitions Between Modes of Inquiry in a Rule Discovery Task
Studies of rule discovery behavior employ one of two research paradigms: In the reception paradigm the item evaluated on each trial is provided by the researcher; in the selection/generation paradigm the item to be evaluated is selected or generated by the subject. The prevalence of both paradigms and their correspondence to well established modes of scientific inquiry led us to the hypothesis that if given the choice, subjects would employ both modes of inquiry. To test this hypothesis 27 adults and 27 8th graders solved three rule discovery problems in a computer environment which allowed free transitions between item reception and generation. Almost all the adults and roughly half the children employed both modes of inquiry on at least one problem, with adults much likelier to generate items. The use of a method of inquiry came in blocks with generation tending to follow reception. An inverse relationship was found between item generation and the proportion of positive instances supplied by the environment. Within both age groups, consistent individual differences were found regarding inquiry style. These results shed new light on inquiry behavior and demonstrate the desirability of letting subjects freely choose between differing modes of inquiry.
Are Rules a Thing of the Past? The Acquisition of Verbal Morphology by an Attractor Network
This paper investigates the ability of a connectionist attractor network to learn a system analogous to part of the system of English verbal morphology. The model learned to produce phonological representations of stems and inflected forms in response to semantic inputs. The model was able to resolve several outstanding problems. It displayed all three stages of the characteristic U-shaped pattern of acquisition of the English past tense (early correct performance, a period of overgeneralizations and other errors, and eventual mastery). The network is also able to simulate direct access (the ability to create an inflected form directly from a semantic representation without having to first access an intermediate base form). The model was easily able to resolve homophonic verbs (such as ring and wring). In addition, the network was able to apply the past tense, third person -s and progressive -ing suffixes productively to novel forms and to display sensitivity to the subregularities that mark families of irregular past tense forms. The network also simulates the frequency by regularity interaction that has been found in reaction time studies of human subjects and provides a possible explanation for some hypothesized universal constraints upon morphological operations.
Memory and Discredited Information: Can You Forget I Ever Said That?
Previous research has found that when information stored in memory is discredited, it can still influence later inferences one makes. This has previously been considered as an editing problem, where one has inferences based on the information prestored in memory before the discrediting, and one cannot successfully trace out and alter those inferences. However, in the course of comprehending an account, one can potentially make inferences after a discrediting, which may also show influence from the discredited information. In this experiment, subjects read a series of reports about a fire investigation, and their opportunity to make inferences before a correction appeared in the series was manipulated. Subjects received a correction statement either directly following the information it was to discredit, or with several statements intervening. The results show that subjects who received the correction directly after the information it corrected made as many inferences based on the discredited information as subjects who received the correction later (and thus could presumably make many more inferences before the correction occurred). This suggests that discredited information can influence inferences made after a correction, as well as those made before. Several hypotheses accounting for this effect are proposed.
A Fine-Grained Model of Skill Acquisition: Fitting Cascade to Individual Subjects
The Cascade model of cognitive skill acquisition was developed to integrate a number of AI techniques and to account for psychological results on the self-explanation effect. In previous work, we compared Cascade's behavior to aggregate data collected from the protocols of 9 subjects in a self-explanation study. Here, we report the results of a fine-grained analysis, in which we matched Cascade's behavior to the individual protocols of each of the subjects. Our analyses demonstrate empirically that Cascade is a good model of subject behavior at the level of goals and inferences. It covers most of the subjects' example-studying behavior and 60% to 90% of their problem-solving behavior. In addition, this research forced us to develop general, feasible methods for matching a simulation to large protocols (approximately 3000 stages total). Finally, the analyses point out some weaknesses in the Cascade system and provide us with direction for future analyses of the model and data.
A Re-examination of Graded Membership in Animal and Artifact Categories
Previous studies of gradedness have failed to distinguish between the issues of typicality and category membership. Thus, data which have been taken to demonstrate that membership is a matter of degree may only demonstrate that typicality is graded. The present paper reports the results of two studies that attempt to overcome limitations of past methods. In the first study, subjects were asked to rate both typicality and category membership for the same stimuli as a way of distinguishing the two questions. A second study was based on the notion that there may be no definitive answer to questions about membership in graded categories. Thus, disagreements about membership in all-or-none and graded categories may have different qualities. Stimuli included animal and artifact categories as well as animals that had undergone different kinds of transformations. Results from both studies suggest some support for claims that membership in animal and artifact categories is graded.
Progressions of Conceptual Models of Cardiovascular Physiology and their Relationship to Expertise
The application of scientific principles in diverse science domains is widely regarded as a hallmark of expertise. However, in medicine, the role of basic science knowledge is the subject of considerable controversy. In this paper, we present a study that examines students' and experts' understanding of complex biomedical concepts related to cardiovascular physiology. In the experiment, subjects were presented with questions and problems pertaining to cardiac output, venous return, and the mechanical properties of the cardiovascular system. The results indicated a progression of conceptual models as a function of expertise, which was evident in predictive accuracy, and in the explanation and application of these concepts. The study also documented and characterized the etiology of significant misconceptions that impeded subjects' ability to reason about the cardiovascular and circulatory system. Certain conceptual errors were evident even in the responses of physicians. The scope of application of basic science principles is not as evident in the practice of medicine as in the applied physical domains. Students and medical practitioners do not experience the same kinds of epistemic challenges to counter their naive intuitions.
Augmenting Qualitative Simulation with Global Filtering
Capturing correct changes both locally and globally is crucial to predicting the behavior of physical systems. However, due to the nature of qualitative simulation techniques, they cannot avoid losing some information which is useful for finding precise global behavior. This paper describes how global constraints are represented and manipulated in current simulation systems, using a model of an internal combustion engine. The basic idea of our approach is to automatically generate additional information for maintaining global constraints during simulation, so that simulation techniques can filter global behaviors with sufficient information. This is done by automatically introducing variables and controlling their values to guide correct transitions between the behaviors. We express this idea within the framework of Qualitative Process (QP) theory. This technique has been implemented and integrated into an existing qualitative simulation program, QPE.
A Production System Model of Cognitive Impairments Following Frontal Lobe Damage
A computer model is presented which performs four different types of tasks sometimes impaired by frontal damage: the Wisconsin Card Sorting Test, the Stroop task, a motor sequencing task and a context memory task. Patterns of performance typical of frontal-damaged patients are shown to result in each task from the same type of damage to the model, namely the weakening of associations among elements in working memory. The simulation shows how a single underlying type of damage could result in impairments on a variety of seemingly distinct tasks. Furthermore, the hypothesized damage affects the processing components that carry out the task rather than a distinct central executive responsible for coordinating these components.
Inference Evaluation in Deductive, Inductive and Analogical Reasoning
An experiment with a three-factor design is described which tests the impact of 1) the degree of mapping isomorphism, 2) differences in the type of reasoning (deduction, induction, and analogy), and 3) the kind of entities changed (objects, attributes, and relations) on the certainty of the inferences made. All three factors were found to have significant main effects, and significant interactions between the first factor and each of the others were also found. Several specific results are discussed. For example, the certainty of deductive inferences is not significantly different from that in induction and analogy when there is no one-to-one mapping between the descriptions. Moreover, deduction, induction and analogy behave similarly in relation to that factor. This is considered possible support for the existence of a uniform computational mechanism for the evaluation of inferences in all three kinds of reasoning, a mechanism which is primarily based on the degree of isomorphism.
Identifying Language from Speech: An Example of High-Level, Statistically-Based Feature Extraction
We are studying the extraction of high-level, statistically-based features from raw speech. Given carefully chosen features, we conjecture that extraction can be performed reliably and in real time. As an example of this process, we demonstrate how speech samples can be classified reliably into categories according to what language was spoken. The success of our method depends critically on the distributional patterns of speech over time. We observe that spoken communication among humans utilizes a myriad of devices to convey messages, including frequency, pitch, sequencing, etc., as well as prosodic and durational properties of the signal. The complexity of the interactions among these is difficult to capture in any simplistic model, which has necessitated the use of models capable of addressing this complexity, such as hidden Markov models and neural networks. We have chosen to use neural networks for this study. A neural network is trained from speech samples collected from fluent, bilingual speakers in an anechoic chamber. These samples are classified according to what language is being spoken and randomly grouped into training and testing sets. Training is conducted over a fixed, short interval (segment) of speech, while testing involves applying the network multiple times to segments within a larger, variable-size window. Plurality vote determines the classification. Empirically, the proper size of the window can be chosen to yield virtually 100% classification accuracy for English and French in the tests we have performed.
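A sketch of the voting procedure described, with a trivial invented stand-in for the trained network: the segment classifier is applied across a larger window, and a plurality vote over the per-segment labels classifies the whole sample.

from collections import Counter

def classify_segment(segment):
    # Stand-in for the trained network: label one short segment.
    # (Invented rule: just compares the segment's mean to a threshold.)
    return "english" if sum(segment) / len(segment) > 0.5 else "french"

def classify_window(samples, seg_len=10):
    votes = Counter()
    for start in range(0, len(samples) - seg_len + 1, seg_len):
        votes[classify_segment(samples[start:start + seg_len])] += 1
    label, _ = votes.most_common(1)[0]       # plurality vote over segments
    return label, votes

window = [0.7] * 60 + [0.3] * 20             # mostly "english"-looking signal
print(classify_window(window))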
Toward a Knowledge Representation for Simple Narratives
In this paper we report progress on the design of a knowledge representation formalism, based on Allen's temporal logic [Allen, 1984], to be used in a generative model of narratives. Our goal is to develop a model that will simultaneously generate text and meaning representations so that claims about recovery of meaning from text can be assessed. We take as our domain a class of simple stories based on Grimm's fairy tales. We base our work on story grammars, as they are the only available framework with a declarative representation. We provide the logical foundations for developing a story grammar [Rumelhart, 1975] into a generative model of simple narratives. We have provided definitions to specify the "syntactic" categories of the story grammar and the constraints between constituents.
Question Asking During Learning with a Point and Query Interface
Educational software would benefit from question asking facilities that are theoretically grounded in psychology, education, and artificial intelligence. Our previous research has investigated the psychological mechanisms of question asking and has developed a computationally tractable model of human question answering. We have recently developed a Point and Query (P&Q) human-computer interface based on this research. With the P&Q software, the student asks a question by simply pointing to a word or picture element and then to a question chosen from a menu of "good" questions associated with the element. This study examined students' question asking over time, using the P&Q software, while learning about woodwind instruments. While learning, the students were expected to solve tasks that required either deep-level causal knowledge or superficial knowledge. The frequency of questions asked with the P&Q interface was approximately 800 times the number of questions asked per student per hour in a classroom. The learning goals directly affected the ordering of questions over time. For example, students did not ask deep-level causal questions unless that knowledge was necessary to achieve the learning goal.
Towards a Distributed Network Learning Framework: Theory and Technology to Support Educational Electronic Learning Environments
Electronic networks are being increasingly used to support a variety of educational activities. Although early research in this area has been promising, there has been less work to date concerning more basic cognitive and theoretical issues associated with the design and use of educational electronic networks. This paper proposes a distributed network learning framework (DNLF) which will be presented through three main aspects: (a) network mediators and the flow of information and knowledge, (b) networks and cognitive theories of learning, and (c) the network-human interface. As an example of an application of the distributed network learning framework, an ongoing research and development project is discussed that involves a cognitively-based educational electronic communication tool, The Message Assistant. In addition to standard electronic mail features such as creating, sending, and receiving messages, this program includes a user-defined incremental expert system and hypertextual linking functions to assist a network mediator in her or his evaluation, organization, and distribution of network information and knowledge. The distributed network learning framework can function as a flexible and extendable set of conceptual views from which to examine and to work with different aspects of dynamically evolving network learning environments.
A Theory of Dynamic Selective Vigilance and Preference Reversal, Based on the Example of New Coke
A neural network theory of preference reversal is presented. This theory includes a model of why New Coke was preferred to Old Coke on taste tests but was unpopular in the market. The model uses competing drive loci representing "excitement" and "security." Context influences which drive wins the competition and, hence, which stimulus attributes are attended to. Our network's design, outlined in stages, is based on Grossberg's gated dipole theory. Three sets of dipoles, representing attributes, categories, and drives, are connected by modifiable associative synapses. The network also includes competition among categories and enhancement of attention by mismatch of expectation.
Why are Situations Hard
An ecological model of human information processing is introduced which characterizes intuition as a state oracle providing information for particular types of situations for which attunements to constraints have been developed. The consequences of this model are examined, showing among other things that: for a cognitive task with a fixed problem space, difficulty can only be reduced by introducing metaphor; difficulty of translation is minimal for a situationally equivalent metaphor; a situationally equivalent metaphor preserves and reflects extrinsic information about the situation; any situation containing a subcategory isomorphic to a problem situation can be made into a metaphor by supplying instructions; and these characteristics can be exploited by an algorithm which chooses a metaphor in such a way that attunements are substituted for problem constraints and instructions are used as an "error term".
Front-end Serial Processing of Complex and Compound Words: The APPLE Model
Native speaker competence in English includes the ability to produce and recognize morphologically complex words such as blackboard and indestructibility as well as novel constructions such as quoteworthiness. This paper addresses the question: How do subjects 'see into' these complex strings? It presents, as an answer, the Automatic Progressive Parsing and Lexical Excitation (APPLE) model of complex word recognition and demonstrates how the model can provide a natural account of the complex and compound word recognition data in the literature. The APPLE model has as its core a recursive procedure which isolates progressively larger substrings of a complex word and allows for the lexical excitation of constituent morphemes. The model differs from previous accounts of morphological decomposition in that it supports a view of the mental lexicon in which the excitation of lexical entries and the construction of morphological representations is automatic and obligatory.
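A simplified sketch of progressive parsing with lexical excitation, using an invented mini-lexicon and none of the APPLE model's further machinery: progressively larger substrings are matched against the lexicon, and parsing continues from the end of each excited morpheme.

LEXICON = {"black", "board", "in", "destruct", "ible", "ity",
           "quote", "worthy", "ness"}

def parse(word, start=0, path=()):
    # Yield every decomposition of word[start:] into known morphemes,
    # considering progressively larger substrings from `start` and
    # "exciting" each lexical entry the substring matches.
    if start == len(word):
        yield path
        return
    for end in range(start + 1, len(word) + 1):
        piece = word[start:end]
        if piece in LEXICON:                 # lexical excitation of the piece
            yield from parse(word, end, path + (piece,))

print(list(parse("blackboard")))             # [('black', 'board')]
print(list(parse("indestructible")))         # [('in', 'destruct', 'ible')]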
The Search Image Hypothesis in Animal Behavior: Its Relevance to Analyzing Vision at the Complexity Level
We show how a concept from animal behavior, the search image hypothesis, is relevant to complexity considerations in computational vision. In particular, we show that this hypothesis is an indication of the validity of the bounded/unbounded visual search distinction proposed by Tsotsos. Specifically, we show that bounded visual search corresponds to a broad range of naturally occurring, target-driven problems in which attention alters the search behavior of animals.
Learning by Problem Solving versus by Examples: The Benefits of Generating and Receiving Information
This experiment contrasts learning by solving problems with learning by studying examples, while attempting to control for the elaborations that accompany each solution step. Subjects were given different instructional materials for a set of probability problems. They were either provided with or asked to generate solutions, and they were either provided with or asked to create their own explanations for the solutions. Subjects were then tested on a set of related problems. Subjects in all four conditions exhibited good performance on the near transfer test problems. On the far transfer problems, however, subjects in two cells exhibited stronger performance: those solving and elaborating on their own and those receiving both solutions and elaborations from the experimenter. There was also an indication of a generation effect in the far transfer case, benefiting subjects who generated their own solutions. In addition, subjects' self-explanations on a particular concept were predictive of good performance on the corresponding subtask of the test problems.
The Phase Tracker of Attention
We introduce a new mechanism of selective attention among perceptual groups as part of a computational model of early vision. In this model, selection of objects is a two-stage process: perceptual grouping is first performed in parallel in connectionist networks which dynamically bind together the neural activities triggered in response to related features in the image; secondly, by locking its output on the quasi-periodic bursts of activity associated with a single perceptual group, a dynamic network called the phase tracker of attention produces a temporal filter which retains the selected group for further processing, while rejecting the unattended ones. Simulations show that the network's behavior matches known psychological data that fit in the descriptive framework of object-based theories of visual attention.
An Extension of Rhetorical Structure Theory for the Treatment of Retrieval Dialogues
A unification of a speech-act oriented model for information-seeking dialogues (COR) with a model to describe the structure of monological text units (RST) is presented. This paper focuses on the extensions necessary for RST to be applicable to information-seeking dialogues: new relations must be defined and basic assumptions of RST have to be relaxed. Our approach is verified by interfacing the dialogue component of an intelligent multimedia retrieval system with a component for natural language generation.
A Connectionist Solution to the Multiple Instantiation Problem Using Temporal Synchrony
Shastri and Ajjanagadde have described a neurally plausible system for knowledge representation and reasoning that can represent systematic knowledge involving n-ary predicates and variables, and perform a broad class of reasoning with extreme efficiency. The system maintains and propagates variable bindings using temporally synchronous (i.e., in-phase) firing of appropriate nodes. This paper extends the reasoning system to incorporate multiple instantiation of predicates, so that any predicate can be instantiated up to k times, k being a system parameter. The ability to accommodate multiple instantiations of a predicate allows the system to handle a much broader class of rules, including bounded transitivity and recursion. The time and space requirements increase only by a constant factor, and the extended system can still answer queries in time proportional to the length of the shortest derivation of the query.
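A toy rendering of temporal-synchrony binding, with invented entities and predicates: each entity fires in its own phase, a role is bound to an entity by sharing its phase, and multiple instantiation amounts to a predicate's role nodes carrying up to k phase assignments at once.

K = 3                                   # system parameter: max instantiations

entity_phase = {"John": 0, "Mary": 1, "Book": 2}   # one firing phase each

# give(giver, recipient, object) instantiated twice, within the k-bound;
# each role is "bound" to an entity by firing in that entity's phase.
give_instances = [
    {"giver": entity_phase["John"], "recipient": entity_phase["Mary"],
     "object": entity_phase["Book"]},
    {"giver": entity_phase["Mary"], "recipient": entity_phase["John"],
     "object": entity_phase["Book"]},
]
assert len(give_instances) <= K         # the extension's bound on instances

def bound_entities(instances, role):
    # Read off a role's bindings by matching phases back to entities.
    inverse = {phase: name for name, phase in entity_phase.items()}
    return [inverse[inst[role]] for inst in instances]

print("givers:", bound_entities(give_instances, "giver"))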
Acquiring Rules for Need-Based Actions Aided by Perception and Language
The CHILDLIKE system is designed to learn about objects, object qualities, relationships among objects, and words that refer to them. Once sufficient visual-linguistic associations have been established, they can be used as foundations for a) further learning involving language alone and b) reasoning about the effect of different actions on perceived objects and relations, and internally sensed need levels. Here, we address the issue of learning efficient rules for action selection. A trial-and-error (or reinforcement) learning algorithm is used to acquire and refine action-related rules. Learning takes place via generation of hypotheses to guide movement through sequences of states, as well as modifications to two entities: the weight associated with each action, which encodes the uncertainty underlying the action, and the potential value (or vector) of each state which encodes the desirability of the state with respect to the current needs. CHILDLIKE is described, and issues relating to the handling of uncertainty, generalization of rules and the role of a short-term memory are also briefly addressed.
Genetically Generated Neural Networks I: Representational Effects
This paper studies several applications of genetic algorithms (GAs) within the neural networks field. The system was used to generate neural network circuit architectures. This was accomplished by using the GA to determine the weights in a fully interconnected network. The importance of the internal genetic representation was shown by testing different approaches. The effects on speed of optimization of varying the constraints imposed upon the desired network were also studied. It was observed that relatively loose constraints provided results comparable to a fully constrained system. The neural network circuits generated were recurrent competitive fields, as described by Grossberg (1982).
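A compact sketch of the basic setup, assuming an invented XOR task, population size, and mutation scale, and using mutation-only selection rather than any particular crossover scheme: a GA searches over the weight vector of a small, fixed network, with fitness given by how well the network reproduces the target behaviour.

import random

random.seed(0)
N_W = 9                                  # 2-2-1 threshold net: 6 weights + 3 biases

def net(w, x1, x2):
    act = lambda z: 1.0 if z > 0 else 0.0
    h1 = act(w[0] * x1 + w[1] * x2 + w[6])
    h2 = act(w[2] * x1 + w[3] * x2 + w[7])
    return act(w[4] * h1 + w[5] * h2 + w[8])

CASES = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]     # XOR target

def fitness(w):
    return sum(net(w, a, b) == y for a, b, y in CASES)

# Each genome is simply the flat list of network weights.
pop = [[random.uniform(-1, 1) for _ in range(N_W)] for _ in range(60)]
for gen in range(200):
    pop.sort(key=fitness, reverse=True)
    if fitness(pop[0]) == 4:
        break
    parents = pop[:20]                   # truncation selection
    pop = parents + [[g + random.gauss(0, 0.3) for g in random.choice(parents)]
                     for _ in range(40)] # mutated offspring
pop.sort(key=fitness, reverse=True)
print("generation:", gen, "best fitness:", fitness(pop[0]), "of 4")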
Genetically Generated Neural Networks II: Searching for an Optimal Representation
Genetic Algorithms (GAs) make use of an internal representation of a given system in order to perform optimization functions. The actual structural layout of this representation, called a genome, has a crucial impact on the outcome of the optimization process. The purpose of this paper is to study the effects of different internal representations in a GA which generates neural networks. A second GA was used to optimize the genome structure; the optimized structure produces an optimized system within a shorter time interval.
Using Analogies in Natural Language Generation
Any system with explanatory capabilities must be able to generate descriptions of concepts defined in its knowledge base. The use of analogies to highlight selected features in these descriptions can greatly enhance their effectiveness, as analogies are a powerful and compact means of communicating ideas and descriptions. In this paper, we describe a system that can make use of analogies in generating descriptions. We outline the differences between using analogies in problem solving and using them in language generation, and show how the discourse structure kept by our generation system provides knowledge that aids in finding an acceptable analogy to express a description.
Perceiving Size in Events Via Kinematic Form
Traditional solutions to the problem of size perception have confounded size and distance perception. We investigated size perception using information that is independent of distance. As do the shapes of biological objects (Bingham, 1992), the forms of events vary with size. We investigated whether observers were able to use size specific variations in the kinematic forms of events as information about size. Observers judged the size of a ball in displays containing only kinematic information about size. This was accomplished by covarying object distance and actual size to produce equivalent image sizes for all objects and extents in the displays. Simulations were generated using dynamical models for planar events. Motions were confined to a plane parallel to the display screen. Mass density, friction, and elasticity were held constant over changes in size, simulating wooden balls. Observers were able to detect the increasing sizes of the equal image size balls. Mean size judgments exhibited a pattern predicted by a scaling factor in the equation of motion derived using similarity analysis.
Parallelism in Pronoun Comprehension
The aim of this study was to distinguish between two heuristic strategies proposed to account for the assignment of ambiguous pronouns: a subject assignment strategy and a parallel function strategy. According to the subject assignment strategy, a pronoun is assigned to a preceding subject noun phrase, whereas according to the parallel function strategy, a pronoun is assigned to a previous noun phrase with the same grammatical function. These two strategies were tested by examining the interpretation of ambiguous subject and non-subject pronouns. There was a strong preference for assigning both types of pronouns to preceding subject noun phrases, which supported the subject assignment strategy. However, the preference was reduced for non-subject pronouns compared to subject pronouns, which we interpreted as evidence for grammatical parallelism. A subsidiary aim of the study was to investigate text-level effects of order-of-mention, where a pronoun is assigned to a noun phrase which has been mentioned in the same sequential position. We did not observe any strong effects, although we did observe a possible topic assignment strategy where topic-hood depended on order-of-mention. A post hoc inspection of the materials revealed possible effects of intra-sentential order-of-mention parallelism. We conclude that a subject assignment strategy, a parallel grammatical function strategy, a topic assignment strategy, and a parallel order-of-mention strategy may all constrain the interpretation of ambiguous subject and non-subject pronouns.
Constraints on Models of Recognition and Recall Imposed by Data on the Time Course of Retrieval
Reaction time distributions in recognition conditions were compared to those in cued recall to explore the time course of retrieval, to test current models, and to provide constraints for the development of new models (including, to take an example, the class of recurrent neural nets, since they naturally produce reaction time predictions). Two different experimental paradigms were used. Results from a free response procedure showed fundamental differences between the two test modes, both in mean reaction time and the general shape of the distributions. Analysis of data from a signal-to-respond procedure revealed large differences between recognition and recall in the rate of growth of performance. These results suggest the existence of different processes underlying retrieval in recognition and cued recall. One model posits parallel activation of separate memory traces; for recognition, the summed activation is used for a decision, but for recall a search is based on sequential probabilistic choices from the traces. Further constraining models was the observation of nearly identical reaction time distributions for positive and negative responses in recognition, suggesting a single process for recognition decisions for targets and distractors.
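A toy rendering of that dual-process proposal, with invented activation values and thresholds, might look as follows: recognition decides on summed activation in a single step, while recall samples traces sequentially and can fail:

```python
import random

# Sketch of the dual-process idea in the abstract: recognition decides on
# summed trace activation; recall searches traces by sequential
# probabilistic choice. All numbers and thresholds are assumptions.

def recognize(activations, criterion=1.5):
    # Parallel activation of all traces; decision on the sum.
    return sum(activations.values()) > criterion

def recall(activations, target, max_samples=5):
    # Sequential probabilistic sampling of traces, weighted by activation.
    traces, weights = zip(*activations.items())
    for _ in range(max_samples):
        choice = random.choices(traces, weights=weights)[0]
        if choice == target:
            return choice          # successful retrieval
    return None                    # retrieval failure

memory = {"dog": 0.9, "cat": 0.4, "tree": 0.1}
print(recognize(memory))           # recognition: fast summed decision
print(recall(memory, "cat"))       # recall: slower sequential search
```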
A Model of Knowledge-Based Skill Acquisition
We hypothesize that two important functions of declarative knowledge in learning are to enable the learner to detect and to correct errors. We describe psychologically plausible mechanisms for both functions. The mechanisms are implemented in a computational model which learns cognitive skills in three different domains, illustrating the cognitive function of abstract principles, concrete facts, and tutoring messages in skill acquisition.
Problem-Solving Stereotypes for an Intelligent Assistant
This paper examines the role of case-based reasoning in a problem-solving assistant system, which differs from an autonomous problem solver in that it shares the problem-solving task with a human partner. The paper focuses on the criteria driving the system designer's (or the system's) choice of cases, of representation vocabulary, and of indexing terms, and upon how the assumption of a human in the problem-solving loop influences these criteria. It presents these theoretical considerations in the context of work in progress on IOPS, a case-based intelligent assistant for airline irregular operations scheduling.
Communicating Properties Using Salience-Induced Comparisons
A method for generating simple comparison sentences of the form "A is like B" is proposed. The postulated input to the generator consists of the name of an entity A, and a set of descriptors about A in the form of attribute:value pairs. The main source of knowledge that controls decision making is a probabilistic conception of salience of empirically observable properties among concrete objects. We also use a salience heuristic based on the notion of property intrinsicness. The information-theoretic concept of redundancy is used to quantify salience in probabilistic contexts. Salience factors influencing selection decisions are modeled as utilities and costs, and the decision for selecting the best object of comparison is based on the maximization of net expected utility. The method proposed has been implemented in a generation system written in CProlog.
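A rough sketch of the selection step, assuming a binary-property redundancy measure for salience and a fixed per-mismatch cost; the numbers and the exact utility function are illustrative stand-ins, not the paper's CProlog implementation:

```python
import math

# Sketch of salience-driven selection of a comparison object ("A is like B").
# The salience measure and utility/cost figures are assumptions.

def salience(p):
    # A property carried by few objects (low base rate p) is more salient;
    # redundancy-style measure 1 - H(p) for a binary property, 0 < p < 1.
    h = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    return 1 - h

def best_comparison(a_props, candidates, cost=0.1):
    # Net expected utility: salience of shared properties, minus a fixed
    # cost per property the candidate has but A lacks (misleading baggage).
    def net_utility(b_props):
        gain = sum(salience(p) for prop, p in b_props.items() if prop in a_props)
        penalty = cost * len(set(b_props) - set(a_props))
        return gain - penalty
    return max(candidates, key=lambda name: net_utility(candidates[name]))

# Properties mapped to assumed base rates among concrete objects.
A = {"red": 0.2, "round": 0.4}
candidates = {"apple": {"red": 0.2, "round": 0.4, "edible": 0.3},
              "ball":  {"round": 0.4, "bouncy": 0.05}}
print(best_comparison(A, candidates))
```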
Dynamic Gating in Vision
Visual attention requires the selection of salient regions and their remapping into a position-invariant format. We propose the dynamic-gating model, which is capable of autonomous remapping. It combines the localization network of Koch and Ullman (1985) with a modified shifter-circuit network (Anderson & Van Essen, 1987). Autonomous selection and remapping of salient regions result from local gating dynamics and local connectivity, implying that scaling to large problem sizes is straightforward.
Understanding Detective Stories
In this paper, we illustrate a general approach to psychological inference by considering its application to a simple detective story. Detective stories provide a fertile ground for the investigation of psychological inference, because their plots so often hinge on the mental states of the characters involved. Although our analysis contains several logically independent suggestions for how to tackle some of the different problems that arise in understanding the story, one guiding principle underlies our approach: the re-use thesis. According to the re-use thesis, certain inferential mechanisms whose primary function has nothing in particular to do with psychological inference can be re-used for psychological inference tasks. In the course of the paper, we present several examples of the re-use thesis in action. Finally, we sketch how these applications of the re-use thesis can contribute to an understanding of our detective story.
Imagery as Process Representation in Problem Solving
In this paper, we describe the characteristics of imagery phenomena in problem solving, develop a model for the process of forming and observing mental images in problem solving, and check the model against data obtained from subjects. Then, we describe the interaction between imaging and problem solving observed in our experiments, and discuss the use of our model to simulate it. We also briefly discuss the relation between mental models and mental images.
What Does a System Need to Know to Understand a User's Plans?
During natural language interactions, it is often the case that a set of statements issued by a speaker/writer can be interpreted in a number of ways by a listener/reader. Sometimes the intended interpretation cannot be determined by considering only conversation coherence and relevance of the presented information, and specialized domain knowledge may be necessary in choosing the intended interpretation. In this paper, we identify the points during the inference process where such specialized knowledge can be successfully applied to aid in assessing the likelihood of an interpretation, and present the results of an inference process that uses domain knowledge, in addition to other factors, such as coherence and relevance, to choose the interpretation intended by the speaker. Our mechanism has been developed for use in task-oriented consultation systems. The particular domain that we have chosen for exploration is that of a travel agency.
Calculating Breadth of Knowledge
Since the advent of computers, information systems have grown in terms of the quantity of knowledge they deal with. Advances in data management are on the critical path for usability of these systems. This paper reports on a novel approach to an important problem: that of calculating the conceptual breadth of knowledge or data in a knowledge base or database. Breadth determination is useful in that ascribing meta-level knowledge of conceptual content can help to predict, for example, the validity of the closed-world assumption or the likelihood of encountering new information of a particular type. The point at which a system determines it is likely to have breadth in a given knowledge area may also serve as the trigger point for calculations that assume relatively complete knowledge in that area. The accurate determination of when a system has complete knowledge in an area is crucial for the accurate application of many AI algorithms.
Learning and Problem Solving Under a Memory Load
A problem-solving experiment is described where the difficulty Ss experienced in solving a particular puzzle is manipulated using a dual-task paradigm. Although Ss show impaired performance solving the puzzle the first time, performance improves considerably on a second trial, and Ss are not impaired by a second-trial memory load. In spite of the improvement in performance, Ss are virtually unable to report any information about the problem or their solution strategies. A model is presented that describes the pattern of performance across the levels of memory load and across the two trials. The theoretical implications of this model are discussed.
Categorization and Stimulus Structure
Concept discovery experiments have yielded theories that work well for simple, rule-governed categories. They appear less applicable to richly structured natural categories, however. This paper explores the possibility that a complex but structured environment provides more opportunities for learning than the early theories allowed. Specifically, category structure may aid learning in two ways: correlated attributes may act jointly, rather than individually, and natural structure may allow more efficient cue sampling. An experiment is presented which suggests that each of these advantages may be found for natural categories. The results call into question independent sampling assumptions inherent in many concept learning theories and are consistent with the idea that correlated attributes act jointly. In order to model natural category learning, modifications to existing models are suggested.
"Adaptation" to Displacement Prisms Is Sensorimotor Learning
Observers reaching to a target seen through wedge-shaped displacement prisms initially reach in the direction of displacement, correcting their reaches over a series of about 12 trials. With subsequent removal of the prisms, observers initially reach to the opposite side of the target, correcting over about 6 trials. This phenomenon has been called "adaptation" because of its similarity to the adaptation of sensory thresholds to prevailing energy levels. We show, however, that this perturbation to visually guided reaching only mimics sensory adaptation initially. Subsequent changes show that this is sensorimotor learning. Error in pointing to targets is the commonly used measure. We measured times for rapid reaches to place a stylus in a target. Participants wearing a prism worked to achieve criterion times previously established with normal, unperturbed vision. Blocks of trials with and without a prism were alternated. Both the number of trials to criterion and the mean times per block of trials decreased over successive blocks in a session, as well as over successive days. By the third day, participants were able to respond rapidly to perturbations. This reflects the acquisition of a new skill that must be similar to that acquired by users of corrective lenses.
An Analysis of How Students Take the Initiative in Keyboard-to-Keyboard Tutorial Dialogues in a Fixed Domain
By student initiatives we mean productions which the student could reasonably expect to modify the course of the tutorial dialogue. Asking a question is one kind of student initiative. This paper describes a system called CircSim-Tutor which we are building, the background of the project, the 28 hour-long tutoring sessions analyzed in this paper, and the analysis done. It compares our work to previous work, gives a classification of the student initiatives found and of the tutor's responses to them, and discusses some examples.
Visual Attention and Manipulator Control
One function of visual attention is as a filter that selects one region of the visual field for enhanced detection and recognition processing. A second function of attention is to provide localization information, which can be used in guiding motor activity. A visual system in which the eyes can be moved requires such localization information to guide eye movements. Furthermore, the control of arm and hand movements for object manipulation is simplified by attentional localization of the hand with respect to a fixation frame centered on the object. This paper describes this role of attention in the visual guidance of simple motor behaviors associated with unskilled object manipulation behaviors.
Additive Modular Learning in preemptrons
Cognitive scientists, AI researchers in particular, have long recognized the enormous benefits of modularity (e.g., Simon, 1969), as well as the need for self-organization (Samuel, 1967) in creating artifacts whose complexity approaches that of human intelligence. And yet these two goals seem almost incompatible, since truly modular systems are usually designed, and systems that truly learn are inherently nonmodular and produce only simple behaviors. Our paper seeks to remedy this shortcoming by developing a new architecture of Additive Adaptive Modules, which we instantiate as Addam, a modular agent whose behavioral repertoire evolves as the complexity of the environment is increased.
MusicSoar: Soar as an Architecture for Music Cognition
Newell (1990) argued that the time is ripe for unified theories of cognition that encompass the full scope of cognitive phenomena. Newell and his colleagues (Newell, 1990; Laird, Newell & Rosenbloom, 1987) have proposed Soar as a candidate theory. We are exploring the application of Soar to the domain of music cognition. MusicSoar is a theory of the cognitive processes in music perception. An important feature of MusicSoar is that it attempts to satisfy the real-time constraints of music perception within the Soar framework. If MusicSoar is a plausible model of music cognition, then it indicates that much of a listener's ability is based on a kind of memory-based reasoning involving pattern recognition and fast retrieval of information from memory: Soar's problem-solving methods of creating subgoals are too slow for routine perception, but they are involved in creating the knowledge in long-term memory that can then meet the processing demands of music in real time.
Attention, Memory, and Concepts in Autism
In this paper, it is hypothesized that many of the behavioral abnormalities found in autistic persons result from deficits in fundamental cognitive abilities. Memory and attention are the most likely candidates. The memory deficit may be primarily one of retrieval, possibly exacerbated by an encoding deficit. However, both types of memory deficit are probably the result of a primary deficit in attention. This is supported by the observation that the autistic memory deficit resembles that following frontal lobe, rather than mediotemporal lobe, damage. This and other evidence is used to draw a parallel between autism and frontal lobe syndrome. In light of this analogy, the paper explores how a primary deficit in the fundamental cognitive ability of attention may be responsible for the more secondary autistic deficits in memory and in more advanced forms of cognition, such as language acquisition, symbol manipulation, rule extraction, and social interaction.
Collaborative Mediation of the Setting of Activity
Various aspects of task settings, including the actors and the physical environment, interact in complex ways in the construction and selection of action. In this paper, we examine the process of collaborative mediation, that is, how collaborators facilitate activity by making aspects of the setting available or accessible to the principal actor. We investigate collaborative mediation in three activities: verbal descriptions of strongly structured objects, such as one's house; cooperative computer use; and parent-child cooking. In each of these cases, the collaborator's role with respect to the principal actor and the rest of the setting differs, but they are all of a similar kind. The collaborator makes available different aspects of the setting (physical setting, goals, tests of success, etc.) as needed at appropriate moments, thus helping to operationalize goals via physical guidance, advice, indication of aspects of the setting to make them accessible or relevant, or the taking of initiative which moves the activity forward more directly. Our analysis elaborates the methods by which agents can mediate one another's construction of the settings in which they find themselves, and so facilitate successful activity. We thus extend and generalize similar analyses and approach a general theory.
Incremental Reminding: the Case-Based Elaboration and Interpretation of Complex Problem Situations
When solving a complex problem, gathering relevant information to understand the situation and imposing appropriate interpretations on that information are critical to problem solving success. These two tasks are especially difficult in weak-theory domains -- domains in which knowledge is incomplete, uncertain, and contradictory. In such domains, experts may rely on experience for all aspects of problem solving. We have developed a case-based approach to problem elaboration and interpretation in such domains. An experience-based problem-solver should be able to incrementally acquire information and, in the course of that acquisition, be reminded of multiple cases in order to present multiple viewpoints on problems that present multiple faults. We are addressing issues of 1) elaboration and interpretation of complex problem situations; 2) multiple interpretations; and 3) the role of categories as the foci of reasoning in the context of the Organizational Change Advisor (ORCA). Its model of incremental reminding is a plausible mechanism for this sort of expert problem solving behavior, and one that works well in weak-theory domains. Because there is an implicit cost associated with retrieving a complex case, ORCA implements a retrieval-time similarity function that requires that both general expectations and specific situational relevance be considered before a story is told to the user; this increases the chances that a retrieved case will be useful.
Integrating Causal learning Rules with Backpropagation in PDS Networks
This paper presents a method for training PDP networks that, unlike backpropagation, does not require excessive amounts of training data or massive amounts of training time to generate appropriate generalizations. The method that we present uses general conceptual knowledge about cause-and-effect relationships within a single training instance to constrain the number of possible generalizations. We describe how this approach has been previously implemented in rule-based systems, and we present a method for implementing the rules within the framework of Parallel Distributed Semantic (PDS) Networks, which use multiple PDP networks structured in the form of a semantic network. Integrating rules about causality with backprop in PDS Networks retains the advantages of PDP, while avoiding the problems of enormous numbers of training instances and excessive amounts of training time.
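One way to picture the idea, under our own simplifying assumptions, is a mask derived from causal knowledge that clamps impossible cause-effect connections to zero, so a single training instance cannot generalize across them; the network shape, mask, and update rule below are illustrative, not the paper's PDS implementation:

```python
import numpy as np

# Sketch: prior causal knowledge as a connection mask on a small PDP layer.
# Connections ruled out by cause-effect knowledge stay clamped at zero.

rng = np.random.default_rng(1)
n_in, n_out = 4, 2
W = rng.normal(0, 0.1, (n_out, n_in))

# Assumed causal knowledge: only inputs 0 and 1 can cause output 0;
# only inputs 2 and 3 can cause output 1.
mask = np.array([[1, 1, 0, 0],
                 [0, 0, 1, 1]], dtype=float)

def train_step(x, t, lr=0.5):
    global W
    y = 1 / (1 + np.exp(-(W * mask) @ x))      # masked forward pass
    grad = np.outer((y - t) * y * (1 - y), x)  # output-layer backprop gradient
    W -= lr * grad * mask                      # updates respect the causal mask

train_step(np.array([1., 0., 0., 1.]), np.array([1., 0.]))
```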
Fuzzy Evidential Logic: A Model of Causality for Commonsense Reasoning
This paper proposes a fuzzy evidential model for commonsense causal reasoning. After an analysis of the advantages and limitations of existing accounts of causality, a generalized rule-based model FEL (Fuzzy Evidential Logic) is proposed that takes into account the inexactness and the cumulative evidentiality of commonsense reasoning. It corresponds naturally to a neural (connectionist) network. Detailed analyses are performed regarding how the model handles commonsense causal reasoning.
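A minimal sketch of a FEL-style rule, assuming a weighted cumulative combination clipped to [0, 1] to mirror the connectionist correspondence the abstract mentions; the weights and the example are invented:

```python
# Sketch of a rule with graded antecedents contributing weighted,
# cumulative evidence for a conclusion, like a connectionist unit.
# The combination function and all numbers are assumptions.

def fel_rule(evidence, weights, threshold=0.0):
    # evidence: condition -> degree in [0, 1]; weights may be negative
    # (inhibitory). Conclusion confidence accumulates across conditions.
    total = sum(weights[c] * d for c, d in evidence.items())
    return max(0.0, min(1.0, total)) if total > threshold else 0.0

# "wet grass" example (assumed numbers): rain is strong positive evidence,
# a running sprinkler weaker; drought inhibits the conclusion.
weights = {"rain": 0.8, "sprinkler": 0.5, "drought": -0.6}
print(fel_rule({"rain": 0.9, "drought": 0.2}, weights))  # cumulative evidence
```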
Exemplar Competition: A Variation on Category Learning in the Competition Model
Two cue validity models for category learning were compared to the exemplar model of Medin & Schaffer (1978). The cue validity models tested for the use of two cue validity measures from the Competition Model of Bates & MacWhinney (1982, 1987, 1989) ("reliability" and "overall validity"); one of these models additionally tested for "rote" associations between items and categories. Twenty-four undergraduate subjects learned to classify pseudowords into two categories over 40 blocks of trials. The overall fit of the cue validity model without rote associations was poor, but the fit of the model that included them was nearly identical to that of the exemplar model (R² = .89 vs. .90). However, both cue validity models failed to capture differences predicted by exemplar similarity, but not cue validity, that were apparent as early as the first block of learning trials. The critical parameters in the Medin-Schaffer model were fit as a logarithmic function of the learning block to provide a uniform account of learning across the 40 blocks of trials. The evidence that we provide suggests that competition at the level of exemplars should be considered as a possible extension of the Competition Model.
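For reference, the exemplar benchmark here is the Medin & Schaffer (1978) context model, in which similarity to a stored exemplar is the product of per-dimension mismatch parameters and category choice follows a ratio rule; the parameter values below are illustrative:

```python
# The Medin & Schaffer (1978) context model: similarity is the *product*
# of mismatch parameters, one per differing dimension. Parameter values
# and stimuli here are illustrative.

def similarity(item, exemplar, s):
    # s[d] in (0, 1) is the mismatch parameter for dimension d;
    # matching dimensions contribute 1, so mismatches multiply.
    sim = 1.0
    for d, (a, b) in enumerate(zip(item, exemplar)):
        if a != b:
            sim *= s[d]
    return sim

def p_category(item, exemplars_by_cat, s):
    # Choice probability: summed similarity to a category's exemplars
    # over summed similarity to all exemplars (ratio rule).
    totals = {c: sum(similarity(item, e, s) for e in es)
              for c, es in exemplars_by_cat.items()}
    z = sum(totals.values())
    return {c: t / z for c, t in totals.items()}

exemplars = {"A": [(1, 1, 0), (1, 0, 1)], "B": [(0, 0, 1), (0, 1, 0)]}
print(p_category((1, 1, 1), exemplars, s=[0.3, 0.5, 0.4]))
```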
A View of Diagnostic Reasoning as a Memory-directed Task
Diagnostic reasoning underlies many intelligent activities, including (but not limited to) situation assessment/context recognition, natural language understanding, scene recognition, interpretation of scientific observations, and, of course, medical diagnosis and other forms of fault-finding. In this paper, we present a memory-directed, schema-based approach to diagnostic reasoning. Features of the problem are used to "evoke" one or more possible diagnoses, stored as schemas. Schemas contain information about their "manifestations" that can be used to confirm or deny the diagnosis and, in some applications, information that can be used to take action based on the diagnosis. Potential advantages of the approach include cognitive plausibility, rule exception handling via (generalized) case-based reasoning, applicability to multiple domains, extensibility from experience, and a natural way to organize knowledge about what to do after a diagnosis is made.
Defining the Action Selection Problem
There has been a lack of progress in the field of action selection due to an incomplete understanding of the problem being faced. The differing nature of the constituent parts of the action selection/"time-allocation" problem has not been properly appreciated. Some common sub-problems, such as obtaining food and avoiding predators, are described in terms of the demands they make on an animal's time. The significant differences between these sub-problems are highlighted, and a classificatory scheme is proposed with which sub-problems can be categorized. The need to take into account the full range of different sub-problems is demonstrated with a few examples. A particular shortcoming shared by all of the better-known action selection mechanisms, from both robotics and animal behaviour, is described.
Analogical versus Rule-Based Classification
Classification models have implicitly assumed that the nature of the representation that emerges from encoding will determine the type of classification strategy that will be used. These experiments, however, demonstrate that differences in classification performance can occur even when different transfer strategies operate on identical representations. Specifically, a series of examples was presented under incidental concept learning conditions. When the encoding task was completed, subjects were induced to make transfer decisions by analogy to stored information or to search for and apply rules. Across four experiments, an analogical transfer mode was found to be more effective than a rule-based transfer mode for preserving co-occurring features in classification decisions. This result held across a variety of category structures and stimulus materials. It was difficult for subjects who adopted an analytic transfer strategy to test hypotheses and identify regularities that were embedded in stored instances. Alternatively, subjects who adopted an analogical strategy preserved feature covariations as an indirect result of similarity-based retrieval and comparison processes.
A Simple Recurrent Network Model of Serial Conditioning: Implications for Temporal Event Representation
Elman (1990) proposed a connectionist architecture for the representation of temporal relationships. This approach is applied to the modeling of serial conditioning. Elman's basic simple recurrent network (SRN) was modified to focus its attention on the prediction of important events (Unconditioned Stimuli, or USs) by limiting the connection weights for other events (the Conditioned Stimuli, or CSs). With this modification, the model exhibited blocking and serial conditioning to sequential stimulus compounds. An exploration of the underlying mechanisms suggests that event terminations (CS offsets) were used in predicting US occurrences following simple trace conditioning, and event beginnings (CS onsets) were more important following serial conditioning. The results held true under a series of learning rate and momentum values.
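A bare-bones rendering of the modified architecture, with assumed layer sizes and an assumed clipping bound standing in for the paper's weight limits (the training loop is omitted):

```python
import numpy as np

# Minimal Elman-style SRN sketch with the abstract's modification: weights
# from CS input units are bounded so the network's capacity is focused on
# predicting the US. Sizes, bound, and copy-back step are assumptions.

rng = np.random.default_rng(2)
n_cs, n_hidden = 3, 6
CS_BOUND = 0.3                                   # cap on CS connection weights

W_in = rng.normal(0, 0.1, (n_hidden, n_cs + 1))  # CS units plus one US unit
W_ctx = rng.normal(0, 0.1, (n_hidden, n_hidden)) # recurrent context weights
W_out = rng.normal(0, 0.1, (1, n_hidden))        # predicts next-step US
context = np.zeros(n_hidden)

def step(cs, us):
    global context
    x = np.concatenate([cs, [us]])
    h = np.tanh(W_in @ x + W_ctx @ context)
    context = h.copy()                           # Elman copy-back of hidden state
    return 1 / (1 + np.exp(-W_out @ h))          # predicted probability of US

def clip_cs_weights():
    # The modification: limit connection weights for CS (non-US) inputs.
    W_in[:, :n_cs] = np.clip(W_in[:, :n_cs], -CS_BOUND, CS_BOUND)
```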
The Figural Effect and a Graphical Algorithm for Syllogistic Reasoning
Theories of syllogistic reasoning based on Euler Circles have foundered on a combinatorial explosion caused by an inappropriate interpretation of the diagrams. A new interpretation is proposed, allowing single diagrams to abstract over multiple logical models of premises, permitting solution by a simple rule which involves the identification of individuals whose existence is entailed by the premises. This solution method suggests a performance model, which predicts some of the phenomena of the Figural Effect, a tendency for subjects to prefer conclusions in which the terms preserve their grammatical status from the premises (Johnson-Laird & Steedman, 1978). Twenty-one students were asked to identify the necessary individuals for each of the 64 pairs of premises. The order in which the three terms specifying the individuals were produced was shown to be as predicted by the performance model, but contrary to the presumed predictions of Mental Models theory.
A Computational Best-Examples Model
In the past, several machine learning algorithms were developed based on the exemplar view. However, none of the algorithms implemented the best-examples model, in which the concept representation is restricted to exemplars that are typical of the concept. This paper describes a computational best-examples model and empirical evaluations of the algorithm. In this algorithm, typicalities of instances are first measured, then typical instances are selected to store as concept descriptions. The algorithm is also able to handle irrelevant attributes by learning attribute relevancies for each concept. The experimental results showed that the best-examples model recorded lower storage requirements and higher classification accuracies than three other algorithms on several domains.
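A compact sketch of the selection step under assumed details: typicality is measured as mean similarity to same-category instances, and only the top-scoring instances are retained as the concept description; the similarity measure and cutoff are our assumptions:

```python
# Sketch of a best-examples learner: keep only the most typical instances
# as the concept description. Similarity measure and cutoff are assumptions.

def sim(a, b):
    # Simple feature-overlap similarity between two boolean-feature vectors.
    return sum(x == y for x, y in zip(a, b)) / len(a)

def best_examples(instances, keep=2):
    # Typicality: mean similarity of an instance to all other instances
    # of the same concept.
    typ = [(sum(sim(i, j) for j in instances if j is not i) /
            (len(instances) - 1), i) for i in instances]
    typ.sort(reverse=True)
    return [i for _, i in typ[:keep]]            # retain typical exemplars only

birds = [(1, 1, 1), (1, 1, 0), (0, 1, 1), (1, 0, 0)]
print(best_examples(birds))
```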