Learnable gate functions control the evolution of memory in recurrent neural networks, enabling them to store information for later retrieval. This thesis proposes modeling gates as discrete latent random variables and learning them using variational inference. To this end, we factorize the inference problem into two subproblems. Factorizing in this way has the added benefit of allowing recurrent networks to learn from acausal information.
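To make the idea concrete, here is a minimal sketch of a single recurrent step in which the update gate is treated as a discrete Bernoulli latent variable sampled from a learned probability, rather than a soft sigmoid value. This is an illustrative forward pass only (no variational training loop); the function and parameter names are hypothetical, and the reset gate of a standard GRU is omitted for brevity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step_discrete_gate(x, h, params, rng):
    """One recurrent step with a discrete (sampled) update gate.

    Illustrative sketch: the gate z is a Bernoulli sample from a
    learned probability p_z, instead of the usual soft gate value.
    """
    Wz, Uz, Wh, Uh = params
    p_z = sigmoid(x @ Wz + h @ Uz)                    # gate probability
    z = (rng.random(p_z.shape) < p_z).astype(float)   # discrete sample in {0, 1}
    h_tilde = np.tanh(x @ Wh + (z * h) @ Uh)          # candidate state
    h_next = z * h + (1.0 - z) * h_tilde              # keep old state where z = 1
    return h_next, p_z

rng = np.random.default_rng(0)
d = 4
params = tuple(rng.standard_normal((d, d)) * 0.1 for _ in range(4))
h = np.zeros(d)
x = rng.standard_normal(d)
h_next, p_z = gru_step_discrete_gate(x, h, params, rng)
```

In a variational treatment, `p_z` would come from an inference network and training would optimize an evidence lower bound over the sampled gates; the sketch above shows only the generative step.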
We introduce the GRU-VI model as a proof of concept, along with a synthetic memorization task to test our hypothesis. The memorization task and corresponding dataset are designed to expose the network to inputs for which some time steps are more important than others, with this importance signaled by a part of the input unrelated to the prediction problem itself. We compare the GRU-VI model’s accuracy across different lag values, which increase the amount of time the network must store the input. We find that, as the lag increases, the GRU-VI model does not improve in accuracy over the baseline. Finally, we discuss these results and propose some modifications to better learn sequences with acausal information.
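A task of this shape can be generated simply: a stream of random symbols plus a separate marker channel flagging the one time step whose symbol must be recalled `lag` steps later. The sketch below is a hypothetical example generator in this spirit, not the thesis's actual dataset; all names are illustrative.

```python
import numpy as np

def make_lag_example(seq_len, lag, n_symbols, rng):
    """Generate one example of a lag-based memorization task.

    Hypothetical sketch: `symbols` is the prediction-relevant input,
    `marker` is the importance signal (unrelated to the symbol values),
    and the target is to output the marked symbol at step t + lag.
    """
    symbols = rng.integers(0, n_symbols, size=seq_len)
    marker = np.zeros(seq_len, dtype=int)
    t = int(rng.integers(0, seq_len - lag))  # marked step, leaving room for the lag
    marker[t] = 1
    target_step = t + lag
    target = int(symbols[t])                 # symbol to recall after the lag
    return symbols, marker, target_step, target

rng = np.random.default_rng(0)
symbols, marker, target_step, target = make_lag_example(
    seq_len=20, lag=5, n_symbols=8, rng=rng)
```

Sweeping `lag` then controls how long the network must hold the marked symbol in memory, matching the comparison described above.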
Maximum Entropy Reinforcement Learning (MaxEnt RL) algorithms such as Soft Q-Learning (SQL) trade off reward and policy entropy, which has the potential to improve training stability and robustness. Most MaxEnt RL methods, however, use a constant tradeoff coefficient (temperature), contrary to the intuition that the temperature should be high early in training to avoid overfitting to noisy value estimates and decrease later in training as we increasingly trust high value estimates to truly lead to good rewards. Moreover, our confidence in value estimates is state-dependent, increasing every time we use more evidence to update a state's value estimate. In this paper, we present a simple state-based temperature scheduling approach and instantiate it for SQL as StaTeS-SQL. We prove the convergence of this method in the tabular case, describe how to use pseudo-counts generated by a density model to schedule the state-dependent temperature in large state spaces, and propose a combination of our method with advanced techniques collectively known as Rainbow. We evaluate our approach on the Atari Learning Environment benchmark and outperform Rainbow in 18 of 20 domains.
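The core mechanism can be sketched in a few lines: keep a per-state visit count (or pseudo-count from a density model in large state spaces) and decay that state's temperature as its count grows, then use the temperature in the soft (log-sum-exp) value backup of Soft Q-Learning. The decay rule and names below are illustrative assumptions, not the paper's exact schedule.

```python
import math
from collections import defaultdict

class StateTemperature:
    """Hypothetical state-based temperature schedule.

    The temperature for a state decays with its visit count, so
    rarely updated states keep a high temperature (more exploration),
    while well-estimated states approach a low floor.
    """
    def __init__(self, tau0=1.0, tau_min=0.01):
        self.counts = defaultdict(int)
        self.tau0 = tau0
        self.tau_min = tau_min

    def update(self, state):
        self.counts[state] += 1

    def tau(self, state):
        n = self.counts[state]
        return max(self.tau_min, self.tau0 / math.sqrt(n + 1))

def soft_value(q_values, tau):
    """Soft state value used in Soft Q-Learning: tau * logsumexp(Q / tau).

    As tau -> 0 this approaches max(Q); larger tau weights
    suboptimal actions more heavily.
    """
    m = max(q_values)
    return tau * math.log(sum(math.exp((q - m) / tau) for q in q_values)) + m

sched = StateTemperature()
for _ in range(99):
    sched.update("s0")         # 99 visits -> tau = 1 / sqrt(100) = 0.1
v = soft_value([1.0, 0.5], sched.tau("s0"))
```

With pseudo-counts from a density model, `sched.counts[state]` would simply be replaced by the model's pseudo-count for that state; the schedule itself is unchanged.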