The Expectation-Maximization (EM) algorithm is well known in the machine learning literature and has been widely used for training probabilistic and certain non-probabilistic models, such as mixtures of Gaussians and K-means, respectively. Despite the vast volume of research on applying the EM algorithm to probabilistic models, there has been little work on its use for non-probabilistic models. In this dissertation, various piecewise constant models and their learning procedures in the literature are reviewed, and an EM-based optimization is proposed for each model. The proposed EM algorithms retain the spirit of the original EM algorithm, with each one suitably modified to fit the non-probabilistic nature of its model.

The two-step structure of the EM algorithm naturally fits models with a modular structure, such as neural networks or mixture models. This dissertation shows how the EM algorithm makes it possible to treat a piecewise constant model as a modular structure and to optimize the model module by module. The optimization procedure consists of two steps: an expectation/assignment step and a maximization/update step. More specifically, in the EM algorithm a maximization or minimization problem must be solved for each module of the structure, with the parameters of each module's optimization problem supplied by the expectation step for that module. It is shown that these optimization problems are NP-hard and can often be approximated through a suitable surrogate objective function; novel surrogate functions are proposed. The proposed EM-based approach is applied to several piecewise constant models, such as the prototype nearest neighbor model. For each model, the convergence guarantee and computational complexity of the developed EM algorithm are presented.

Finally, extensive experiments show that the proposed EM-based algorithms perform comparably to or better than several state-of-the-art models and algorithms. In addition, the proposed approach to optimizing piecewise constant models provides in-depth interpretability of the training procedure. The proposed optimization algorithm is applied specifically to the synthetic reduced nearest neighbor model for classification, adversarial label poisoning, the robust synthetic reduced nearest neighbor model, and synthetic reduced nearest neighbor regression.
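To make the alternation concrete, the following Python sketch illustrates the general two-step scheme on a prototype nearest neighbor model. It is a minimal illustration, not the dissertation's algorithm: the function name em_prototype_fit is hypothetical, squared error stands in for the model-specific surrogate objectives developed in the dissertation, and the majority-vote label update is one simple choice of per-module minimizer.

    import numpy as np

    def em_prototype_fit(X, y, k, n_iter=50, seed=None):
        """EM-style alternating optimization for a prototype nearest
        neighbor model: an expectation/assignment step followed by a
        per-prototype maximization/update step (illustrative sketch)."""
        rng = np.random.default_rng(seed)
        # Initialize the prototypes at randomly chosen training points.
        idx = rng.choice(len(X), size=k, replace=False)
        centers, labels = X[idx].copy(), y[idx].copy()
        for _ in range(n_iter):
            # Expectation/assignment step: assign each sample to its
            # nearest prototype (the "module" responsible for it).
            dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
            assign = dists.argmin(axis=1)
            # Maximization/update step: solve a local problem per module.
            # Under the squared-error surrogate used here, the minimizing
            # position is the mean of the assigned points, and the
            # prototype label is their majority vote.
            for j in range(k):
                members = assign == j
                if members.any():
                    centers[j] = X[members].mean(axis=0)
                    vals, counts = np.unique(y[members], return_counts=True)
                    labels[j] = vals[counts.argmax()]
        return centers, labels

A new point is then classified with the label of its nearest prototype, which is what makes the learned model piecewise constant over the input space.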