Behavioral experiments are often feed-forward: they begin with designing the experiment, and proceed by collecting the data, analyzing it, and drawing inferences from the results. Active learning is an alternative approach where partial experimental data is used to iteratively design subsequent data collection. Here, we study experimental application of Bayesian Active Model Selection (BAMS), which designs trials to discriminate between a set of candidate models. We consider a model set defined by a generative grammar of Gaussian Process kernels that can model both simple functions and complex compositions of them. To validate the method experimentally, we use BAMS to discover how factors such as contrast and number affect numerosity judgements. We compare the rate of convergence of the active-learning method to a baseline passive-learning strategy that selects trials at random. Active learning over a structured model space may increase the efficiency and robustness of behavioral data acquisition and modeling.
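The idea of selecting trials to discriminate between candidate GP models can be sketched as follows. This is a minimal illustration, not the exact BAMS criterion: it weights a small hand-written "grammar" of scikit-learn kernels by their log marginal likelihoods and scores candidate trials by posterior-weighted disagreement of predictive means, a crude stand-in for the mutual-information objective; the data and kernel set are toy placeholders.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, DotProduct, WhiteKernel

rng = np.random.default_rng(0)

# Candidate models from a tiny kernel "grammar": two base kernels
# and one composition (all with additive noise).
kernels = [
    RBF() + WhiteKernel(),
    DotProduct() + WhiteKernel(),
    RBF() + DotProduct() + WhiteKernel(),
]

# Observed data so far (toy stand-in for behavioral responses).
X = rng.uniform(0, 1, size=(6, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.standard_normal(6)

models = [
    GaussianProcessRegressor(kernel=k, normalize_y=True).fit(X, y)
    for k in kernels
]

# Approximate model posterior from log marginal likelihoods
# (uniform prior over models).
lml = np.array([m.log_marginal_likelihood_value_ for m in models])
post = np.exp(lml - lml.max())
post /= post.sum()

# Score candidate trials by posterior-weighted disagreement of the
# models' predictive means; pick the most discriminating trial.
cands = np.linspace(0, 1, 50)[:, None]
means = np.stack([m.predict(cands) for m in models])  # (n_models, n_cands)
avg = post @ means
disagreement = post @ (means - avg) ** 2
next_trial = cands[int(np.argmax(disagreement))]
```

In a passive baseline, `next_trial` would instead be drawn uniformly at random from `cands`, which is the comparison the abstract describes.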