## Scholarly Works (222 results)

Pre-2018 CSE ID: CS2008-0916

Hierarchical clustering is a clustering method that defines clusters on the data at various granularities, ranging from a single cluster containing all input data points down to clusters of individual points. Any desired number of clusters can be obtained by cutting the hierarchy at some level and merging the nodes of the pruned branches into clusters.

Clusters at any given level of the hierarchy depend on the clusters formed at the previous level. Hierarchical clustering approaches operate greedily, without backtracking. The final hierarchy is often not what the user expects, but it can be improved by providing feedback. This work studies various ways of interacting with the hierarchy: providing feedback to the hierarchy and incorporating that feedback into it. We discuss metrics to quantify the quality of a hierarchy. We apply the designed feedback mechanism to datasets with different attribute types. We report the results of applying these methods to datasets and the improvements in the hierarchies according to the defined metrics.
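The greedy, no-backtracking behavior described above can be illustrated with a minimal sketch. It assumes bottom-up single-linkage merging on one-dimensional points, which the abstract does not specify; the function name `single_linkage` and the sample data are hypothetical and are not taken from the report.

```python
from itertools import combinations


def single_linkage(points, k):
    """Greedily merge the two closest clusters until k clusters remain.

    Assumption: 1-D points and single-linkage distance (closest pair of
    members). Each merge is final, so an early merge is never undone.
    """
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        # Find the pair of clusters with the smallest single-linkage distance.
        for i, j in combinations(range(len(clusters)), 2):
            d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return [sorted(c) for c in clusters]


if __name__ == "__main__":
    data = [1.0, 1.2, 5.0, 5.1, 9.0]
    print(single_linkage(data, 3))
    print(single_linkage(data, 2))
```

Stopping the merge loop at `k` clusters plays the role of cutting the hierarchy at a level. Because the two-cluster result is built on top of the three-cluster result, any early merge the user dislikes persists at every coarser level, which is the situation user feedback is meant to address.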

Background: Social practices around marital sex and family planning in Indian societies often result in gendered inequities within households, such as husbands’ elevated alcohol use, poor gender equity ideologies, and wives’ intimate partner violence (IPV) victimization from husbands. The resulting power imbalance women face may contribute to challenges in contraception use and exclude wives from pregnancy decision-making.

Objective: To explore gendered inequities in relation to the reproductive health outcomes of modern spacing contraception and unintended pregnancy in Maharashtra, India.

Methods: This analysis includes data from rural, non-sterilized couples (N=1,081) (Chapters 2-3), and postpartum (≤6 months) wives in urban slums (N=1,047) (Chapter 4). Associations were tested between 1) husbands’ elevated alcohol use and gender equity ideologies with wives’ IPV victimization via logistic regression (Chapter 2), 2) wives’ IPV victimization with use of modern spacing contraception via multinomial regression (Chapter 3), and 3) wives’ reports of externally-decided pregnancy and IPV victimization with unintended pregnancy, through multinomial logistic regression (Chapter 4).

Results: Chapter 2 findings indicate that wives were less likely to report IPV if husbands reported greater gender equity ideologies (AOR: 0.97, 95% CI: 0.95, 0.99); husbands’ elevated alcohol use was associated with increased risk of IPV (AOR: 1.89, 95% CI: 1.01, 3.40). Results from Chapter 3 show that women reporting physical IPV were more likely to report condom use (AOR: 2.07, 95% CI: 1.01, 3.89), and women reporting sexual IPV were more likely to report other modern spacing contraception (AOR: 2.86, 95% CI: 1.14, 7.16). Chapter 4 demonstrates that women reporting externally-decided pregnancies were more likely to have mistimed pregnancies (AOR: 6.14, 95% CI: 3.60, 10.46), as were women reporting IPV (AOR: 2.12, 95% CI: 1.38, 3.25).
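For readers less familiar with how adjusted odds ratios relate to regression output, the following is a minimal sketch of the usual conversion. It assumes Wald-type intervals on the log-odds scale, which the abstract does not state; the helper names are hypothetical, and the coefficient and standard error are back-calculated for illustration rather than reported by the dissertation.

```python
import math

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval


def odds_ratio_ci(beta, se, z=Z_95):
    """Convert a log-odds coefficient and its standard error to (OR, lower, upper)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def coefficient_from_ci(lower, upper, z=Z_95):
    """Recover the implied coefficient and standard error from a CI.

    Assumption: the interval is symmetric on the log scale (Wald interval).
    """
    beta = (math.log(lower) + math.log(upper)) / 2
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se


if __name__ == "__main__":
    # Interval reported above for externally-decided pregnancy (AOR 6.14).
    beta, se = coefficient_from_ci(3.60, 10.46)
    print(odds_ratio_ci(beta, se))
```

Under this assumption the point estimate is the geometric mean of the interval bounds, which is why the reported AOR of 6.14 sits closer to the lower bound than to the upper one.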

Conclusion: This dissertation supports the need for gender equity counseling for husbands, with potential utility of integration within existing alcohol intervention services for men (Chapter 2). Results from Chapter 3 indicate that wives contending with IPV are accessing family planning services, thus presenting opportunities for IPV intervention. Finally, results from Chapter 4 support the need to include questions on wives’ roles in pregnancy decision-making in both screening and intervention efforts within family planning services.

Nearest neighbor search is a basic primitive used in machine learning and information retrieval. We look at exact nearest neighbor search algorithms that use tree structures. The most basic tree structure used for fast nearest neighbor search is the k-d tree. This thesis examines the shortcomings of k-d trees and explores various ways to improve their performance. First, we look at PCA trees, which perform well but are expensive in time. We then study randomized trees, which are very efficient data structures and are flexible in space complexity. Next, we introduce a new randomized tree structure, the two-vantage-point tree, which outperforms all other tree structures, including PCA trees, r-k-d trees, and RP trees. Finally, we look at spillover on trees, which can be used to improve the performance of any tree structure. We then compare randomized trees with spillover and show that spill trees only work well with a very small spill factor. If more space is allowed, two-vantage-point trees are preferred over spill trees.
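As a baseline for the structures compared above, here is a minimal sketch of exact nearest neighbor search with a plain k-d tree. It assumes median splits that cycle through the coordinate axes and Euclidean distance; the function names and the dictionary-based node layout are illustrative choices, not the thesis's implementation, and none of the PCA, randomized, or two-vantage-point variants are shown.

```python
import math


def build_kdtree(points, depth=0):
    """Build a k-d tree by splitting at the median along a cycling axis."""
    if not points:
        return None
    axis = depth % len(points[0])
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }


def nearest(node, query, best=None):
    """Return (distance, point) of the exact nearest neighbor of query."""
    if node is None:
        return best
    d = math.dist(node["point"], query)
    if best is None or d < best[0]:
        best = (d, node["point"])
    diff = query[node["axis"]] - node["point"][node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, query, best)
    # Visit the far side only if the splitting plane is closer than the best match.
    if abs(diff) < best[0]:
        best = nearest(far, query, best)
    return best


if __name__ == "__main__":
    pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
    tree = build_kdtree(pts)
    print(nearest(tree, (9, 2)))
```

The pruning test on the splitting plane is what keeps the search exact while still skipping subtrees. When that test rarely succeeds, the search degrades toward a linear scan.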

In the first part of this thesis, we examine the computational complexity of three fundamental statistical tasks: maximum likelihood estimation, maximum a posteriori estimation, and approximate posterior sampling. We show that maximum likelihood estimation for mixtures of spherical Gaussians is NP-hard. We also demonstrate that hardness of maximum likelihood estimation implies hardness of maximum a posteriori estimation and approximate posterior sampling in many instances.

In the second part of this thesis, we explore the behavior of a common sampling algorithm known as the Gibbs sampler. We show that in the context of Bayesian Gaussian mixture models, this algorithm can take a very long time to converge, even when the data looks as though it were generated by the model. We also demonstrate that when a particular variant of the Gibbs sampler is used in the context of a class of bipartite graphical models, called Restricted Boltzmann Machines, it can be guaranteed to converge quickly in certain instances.
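To make the sampling scheme concrete, the following is a minimal sketch of a Gibbs sampler on a deliberately simple target. It assumes a zero-mean, unit-variance bivariate normal with correlation `rho`, which is a toy stand-in and not the Bayesian Gaussian mixture model or Restricted Boltzmann Machine setting analyzed in the thesis; the function name and parameters are hypothetical.

```python
import random


def gibbs_bivariate_normal(rho, n_samples, seed=0):
    """Gibbs sampling for a standard bivariate normal with correlation rho.

    Each step resamples one coordinate from its exact conditional:
    x | y ~ N(rho * y, 1 - rho^2), and symmetrically for y | x.
    """
    rng = random.Random(seed)
    x, y = 0.0, 0.0
    sd = (1.0 - rho * rho) ** 0.5
    samples = []
    for _ in range(n_samples):
        x = rng.gauss(rho * y, sd)
        y = rng.gauss(rho * x, sd)
        samples.append((x, y))
    return samples


if __name__ == "__main__":
    draws = gibbs_bivariate_normal(rho=0.8, n_samples=20000)
    mean_xy = sum(x * y for x, y in draws) / len(draws)
    print(mean_xy)  # should be near rho for a well-mixed chain
```

In this toy case the conditional standard deviation shrinks as `rho` approaches 1, so successive draws move in small steps and the chain explores the target slowly. This is only a loose analogue of the slow convergence studied in the thesis.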

In the third part of this thesis, we consider learning problems in which the learner is allowed to solicit interaction from a user. In the context of classification, we present an efficient active learning algorithm whose performance is guaranteed to be comparable to any active learning algorithm for the particular instance under consideration. We also introduce a generic framework, termed interactive structure learning, for interactively learning complex structures over data, and we present a simple and effective algorithm for this setting which enjoys nice statistical properties.

We prove a formula relating Dedekind zeta functions associated to a number field $k$ to certain Shintani zeta functions, whose analytic properties and values at non-positive integers have been well studied by Takuro Shintani. This allows us to compute explicit formulas for Dedekind zeta functions, partial zeta functions and certain $L$-series and their derivatives evaluated at non-positive integers. We relate the explicitly given values of the derivatives of partial zeta functions at $s=0$ to those predicted by the abelian Stark conjecture. Though this conjecture remains open, we are able to write down explicit formulas for the absolute values of the conjectured Stark units.

The main ingredient in these formulas is an explicit proof of Shintani's unit theorem for number fields of arbitrary signature. This says that the totally positive units of a number field $k$ have a fundamental domain given by a signed union of polyhedral cones in the Minkowski space of the field. Existence of such domains was known to Shintani. In the case where $k$ is a totally real field, Colmez, Diaz y Diaz--Friedman and Charollois--Dasgupta--Greenberg were able to construct such domains and give their generators explicitly. We give an explicit construction of such domains for number fields of arbitrary signature with an exact formula for the domain. Moreover, our construction is cohomological, allowing for future cohomological applications of Shintani's method as in the work of Charollois--Dasgupta--Greenberg.

This construction allows us to write Dedekind zeta functions and partial zeta functions in terms of certain analytic zeta functions defined over polyhedral cones (Shintani zeta functions). Thus we are able to translate questions about special values of Dedekind zeta functions into questions about special values of Shintani zeta functions, whose values at non-positive integers are given by closed finite expressions due to work of Shintani.
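For orientation, the standard objects named above can be written out as follows. These are textbook definitions, not the formulas proved in this work, and the notation ($\mathfrak{a}$, $N\mathfrak{a}$, $\mathrm{Cl}(k)$, $A$, $x$) is assumed here for illustration. For $\operatorname{Re}(s) > 1$, with sums running over nonzero integral ideals of $k$,

$$\zeta_k(s) = \sum_{\mathfrak{a}} N\mathfrak{a}^{-s} = \sum_{C \in \mathrm{Cl}(k)} \zeta_k(s, C), \qquad \zeta_k(s, C) = \sum_{\mathfrak{a} \in C} N\mathfrak{a}^{-s}.$$

The decomposition over ideal classes is the simplest instance of a partial zeta function; the setting of Stark's conjecture typically uses ray classes instead. In one common form, a Shintani zeta function attached to an $r \times n$ matrix $A = (a_{ij})$ with positive real entries and a vector $x \in \mathbb{R}_{>0}^{r}$ is, for $\operatorname{Re}(s)$ sufficiently large,

$$\zeta(s, A, x) = \sum_{z \in \mathbb{Z}_{\ge 0}^{r}} \prod_{j=1}^{n} \Big( \sum_{i=1}^{r} a_{ij}\,(z_i + x_i) \Big)^{-s}.$$

Heuristically, each polyhedral cone in a fundamental domain contributes sums of this shape, with the rows of $A$ coming from the cone's generators. The precise statement for arbitrary signature is the subject of the work itself and may use a variant of this definition.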

In interactive machine learning, the learning machine is engaged in some fashion with an information source (e.g. a human or another machine). In this thesis, we study frameworks for interactive machine learning.

In the first part, we consider interaction in supervised learning. The typical model of interaction in supervised learning has been restricted to labels alone. We study a framework in which the learning machine can receive feedback that goes beyond labels of data points, to features that may be indicative of a particular label. We call this framework learning with feature feedback and study it formally in several settings.

In the second part, we study interaction in unsupervised learning, in particular, topic modeling. Topic models are popular tools for analyzing large text corpora. However, the topics discovered by a topic model are often not meaningful to practitioners. We study two different interactive protocols for topic modeling that allow users to address deficiencies and build models that yield meaningful topics.

In the third part, we study interactive machine teaching. Unlike traditional machine teaching, in which teachers do not interact with the learners, our framework allows interactive teachers to efficiently teach any concept to any learner.