Modern applications, including natural language processing, sensor networks, collaborative filtering, and federated learning, require collecting data from diverse sources. However, these sources may be tainted by untrustworthy, erroneous, or adversarial data. Moreover, even in the absence of corruption, the sources might not share a common underlying distribution. Instead, they may fall into distinct groups, each with its own, arbitrarily different data distribution.
For instance, consider a movie recommendation system in which users rate movies. Ratings from different users can vary with their genre preferences, illustrating how data distributions differ across sources.
In this thesis, we consider a range of problems in these settings:\begin{enumerate}
\item Robust estimation of structured distributions, both discrete and continuous.
\item Robust classification.
\item List-decodable regression.
\item Mixed linear regression with small batches.
\item Robust parameter estimation in graph settings.
\end{enumerate}
In the presence of corrupted data sources, previous approaches to these problems have been limited in computational complexity, estimation accuracy, or sample complexity.
This thesis introduces new methods that address these limitations, focusing on robust learning from corrupted data sources. In doing so, it extends the range of settings in which accurate distribution estimation, regression, classification, and parameter inference are achievable across diverse application domains.