We introduce a generalized machine learning framework to probabilistically parameterize upper-scale models, in the form of nonlinear PDEs consistent with a continuum theory, from coarse-grained atomistic simulation data of mechanical deformation and flow processes. The framework combines a hypothesized coarse-graining methodology with manifold learning and surrogate-based optimization. Coarse-grained, high-dimensional data describing the quantities of interest of the multiscale models are projected onto a nonlinear manifold, whose geometric and topological structure is exploited to measure behavioral discrepancies between models as manifold distances. A surrogate model is constructed using Gaussian process regression to map the stochastic parameters to these distances. Derivative-free optimization is then employed to adaptively identify a unique set of upper-scale model parameters that rapidly reproduces the system's behavior while remaining consistent with the coarse-grained atomic-level simulations. The method is applied to learn the parameters of the shear transformation zone (STZ) theory of plasticity, which describes plastic deformation in amorphous solids, together with the coarse-graining parameters needed to translate between atomistic and continuum representations. We show that the methodology successfully links coarse-grained microscale simulations to macroscale observables and achieves a high level of parity between the models across scales.
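The surrogate-and-search loop described above can be illustrated with a minimal sketch, under strong simplifying assumptions; it is not the implementation used in this work. Everything named here is hypothetical: `toy_discrepancy` stands in for the manifold distance between continuum and coarse-grained atomistic data (with an arbitrary optimum at 0.62), a single scalar parameter replaces the full parameter set, the kernel length scale is fixed rather than learned, and a grid search over the Gaussian process posterior mean stands in for the derivative-free optimizer. No exploration term based on the posterior variance is included.

```python
import math

def rbf(a, b, length=0.3):
    """Squared-exponential kernel between two scalar parameters (assumed fixed length scale)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            # Clamp the pivot to guard against round-off on near-singular kernels.
            L[i][j] = math.sqrt(max(s, 1e-12)) if i == j else s / L[j][j]
    return L

def chol_solve(L, b):
    """Solve (L L^T) x = b by forward, then backward, substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def fit_gp(xs, ys, jitter=1e-6):
    """Return the GP posterior-mean function fitted to (parameter, distance) pairs."""
    pts = list(xs)  # copy so later appends by the caller do not affect this fit
    n = len(pts)
    K = [[rbf(pts[i], pts[j]) + (jitter if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    offset = sum(ys) / n  # constant prior mean set to the sample average
    alpha = chol_solve(cholesky(K), [y - offset for y in ys])
    return lambda x: offset + sum(a * rbf(x, p) for a, p in zip(alpha, pts))

def toy_discrepancy(theta):
    """Hypothetical stand-in for the manifold distance between the two models."""
    return (theta - 0.62) ** 2

def calibrate(n_iter=6):
    """Adaptively sample the parameter where the surrogate predicts the smallest distance."""
    xs = [0.0, 0.5, 1.0]                      # initial design
    ys = [toy_discrepancy(x) for x in xs]
    grid = [i / 100 for i in range(101)]      # candidate parameter values
    for _ in range(n_iter):
        surrogate = fit_gp(xs, ys)
        candidates = [c for c in grid if c not in xs]  # skip already-evaluated points
        nxt = min(candidates, key=surrogate)           # derivative-free search on the surrogate
        xs.append(nxt)
        ys.append(toy_discrepancy(nxt))
    return xs[ys.index(min(ys))]              # best parameter actually evaluated

best_theta = calibrate()
```

In a realistic setting each call to the discrepancy function would require running the upper-scale model and computing a distance on the learned manifold, which is why the number of such evaluations is kept small and the surrogate is refitted after every new sample.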