Making Decisions Under Uncertainty for Large Data Domains
- Author(s): Roytman, Alan James
- Advisor(s): Ostrovsky, Rafail
- et al.
In this thesis, we study key questions that touch upon many important problems in practice which are data-intensive. How can we process this influx of data while using a small amount of memory without sacrificing solution quality? We study this question in the context of the classical k-means clustering problem for the streaming model under a data separability assumption. We design a near-optimal streaming approximation algorithm that uses small space and makes one pass over the stream.
The streaming model may be too restrictive for certain problems that demand more computational resources. Can we still provide provable guarantees for such applications, where the input arrives online? We consider this question in the context of load balancing problems for data centers and study various scheduling problems which are energy-aware. Moreover, we show that our algorithmic techniques have applications to the machine learning community for a fundamental online convex optimization problem by giving insight into fine-tuning the tradeoff between two performance benchmarks: the competitive ratio and regret.