Learning Task Informed Abstractions

Current model-based reinforcement learning methods struggle when operating from complex visual scenes because they cannot prioritize task-relevant features. To mitigate this problem, we propose learning Task Informed Abstractions (TIA) that explicitly separate reward-correlated visual features from distractors. To learn TIA, we introduce the formalism of the Task Informed MDP (TiMDP), which is realized by training two models that learn visual features via cooperative reconstruction, with one model adversarially dissociated from the reward signal. Empirical evaluation shows that TIA leads to significant performance gains over state-of-the-art methods on many visual control tasks where natural and unconstrained visual distractions pose a formidable challenge.

In many real-world problems, the state space of an MDP cannot be accessed directly but must be inferred from high-dimensional sensory observations. To explicitly segregate task-relevant and task-irrelevant factors, we propose to model the latent embedding space $\mathcal{S}$ with two components: a task-relevant component $\mathcal{S}^+$ and a task-irrelevant component $\mathcal{S}^-$. We assume that the reward is fully determined by the task-relevant component, $r: \mathcal{S}^+ \to \mathbb{R}$, and that the task-irrelevant component contains no information about the reward: $\mathrm{MI}(r_t; s^{-}_{t}) = 0$ at each time step $t$.
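
The factorization above can be illustrated with a minimal NumPy sketch. It is not the TIA implementation: the contiguous split of the latent vector, the names `split_latent` and `reward_head`, the dimensions, and the linear reward model are all assumptions made for illustration. The sketch only shows the structural assumption that the reward reads $s^+$ alone; it does not show how $\mathrm{MI}(r_t; s^{-}_{t}) = 0$ would be encouraged during training.

```python
import numpy as np


def split_latent(s, dim_plus):
    """Split a latent vector into (s_plus, s_minus).

    dim_plus is a hypothetical hyperparameter: the size of the
    task-relevant component. The contiguous split is an assumption.
    """
    return s[..., :dim_plus], s[..., dim_plus:]


def reward_head(s_plus, w, b):
    """Toy linear reward model r(s^+); it never receives s^-."""
    return s_plus @ w + b


rng = np.random.default_rng(0)
dim_plus, dim_minus = 4, 6
w = rng.normal(size=dim_plus)  # illustrative reward weights
b = 0.1

# A batch of 8 latent states.
s = rng.normal(size=(8, dim_plus + dim_minus))
s_plus, s_minus = split_latent(s, dim_plus)
r = reward_head(s_plus, w, b)

# Resample only the task-irrelevant part of every state.
s_perturbed = s.copy()
s_perturbed[:, dim_plus:] = rng.normal(size=(8, dim_minus))
r_perturbed = reward_head(split_latent(s_perturbed, dim_plus)[0], w, b)

# The predicted reward is unchanged because it depends on s^+ only.
assert np.allclose(r, r_perturbed)
```

Because the reward head takes only `s_plus` as input, any change confined to `s_minus` leaves the predicted reward unchanged, which is the property the mapping $r: \mathcal{S}^+ \to \mathbb{R}$ expresses.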

Molecular dynamics (MD) simulation is the workhorse of various scientific domains. However, simulating a physical system with many particles is computationally very expensive. Learning-based force fields have made major progress in accelerating ab-initio MD simulation but are still significantly slower than classical force fields. Complex systems such as batteries and proteins take weeks to months to simulate, even with classical force fields. We adopt a different ML approach by learning time-averaged acceleration at a coarse-grained level from trajectory data generated by traditional MD simulation. We coarse-grain a physical system using graph clustering and then use a deep graph neural network to model the time-averaged evolution. Our model can simulate complex systems at a lower spatial and temporal resolution while preserving key statistics of interest. Despite being trained only to make single-step predictions, our model can roll out for 100,000 steps and recover properties related to long-time dynamics at the 1-10 ns level. Our model applies to a range of estimation problems for complex systems, including predicting the radius of gyration of single-chain coarse-grained polymers of more than 1000 beads in implicit solvent and the Li diffusivity of multi-component Li-ion polymer electrolyte systems.
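
As a rough illustration of how training targets of this kind might be built, the sketch below coarse-grains an atomistic trajectory and estimates a time-averaged acceleration per bead. Several things here are assumptions rather than the method described above: the cluster assignment is taken as given (standing in for the output of a graph clustering step), beads are placed at the mass-weighted center of their atoms, and the time-averaged acceleration is taken to be a central second difference over a coarse step of `stride * dt`. The function names and the toy data are hypothetical.

```python
import numpy as np


def coarse_grain(positions, masses, assignment, n_beads):
    """Map atom positions (T, N, 3) to bead positions (T, n_beads, 3).

    Each bead is the mass-weighted center of its assigned atoms.
    `assignment[i]` is the bead index of atom i (assumed given).
    """
    n_atoms = positions.shape[1]
    weights = np.zeros((n_beads, n_atoms))
    weights[assignment, np.arange(n_atoms)] = masses
    weights /= weights.sum(axis=1, keepdims=True)
    return np.einsum("bn,tnd->tbd", weights, positions)


def time_averaged_acceleration(cg_positions, dt, stride):
    """Central second difference over a coarse time step stride * dt."""
    x_prev = cg_positions[: -2 * stride]
    x_curr = cg_positions[stride:-stride]
    x_next = cg_positions[2 * stride:]
    return (x_next - 2.0 * x_curr + x_prev) / (stride * dt) ** 2


# Toy trajectory: 4 atoms under constant per-atom acceleration.
rng = np.random.default_rng(0)
dt, stride, n_frames = 0.1, 3, 20
masses = np.array([1.0, 3.0, 2.0, 2.0])
assignment = np.array([0, 0, 1, 1])  # atoms 0,1 -> bead 0; atoms 2,3 -> bead 1

t = np.arange(n_frames)[:, None, None] * dt
x0 = rng.normal(size=(1, 4, 3))
acc = rng.normal(size=(1, 4, 3))
positions = x0 + 0.5 * acc * t**2  # shape (20, 4, 3)

cg = coarse_grain(positions, masses, assignment, n_beads=2)
targets = time_averaged_acceleration(cg, dt, stride)

# For constant acceleration, the estimate equals the mass-weighted
# average acceleration of the atoms in each bead.
expected = coarse_grain(acc, masses, assignment, n_beads=2)
assert np.allclose(targets, expected)
```

In a learned simulator, arrays like `targets` would serve as single-step regression labels for a model that takes the coarse-grained state as input. Using a larger `stride` lowers the temporal resolution in the same way that coarse-graining lowers the spatial one.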