Article
Peer Reviewed

NNH: Improving performance of nearest-neighbor searches using histograms

UC Irvine Previously Published Works (2004)

Efficient search for nearest neighbors (NN) is a fundamental problem arising in a large variety of applications of vast practical interest. In this paper we propose a novel technique, called NNH ("Nearest Neighbor Histograms"), which uses specific histogram structures to improve the performance of NN search algorithms. A primary feature of our proposal is that such histogram structures can co-exist in conjunction with a plethora of NN search algorithms without the need to substantially modify them. The main idea behind our proposal is to choose a small number of pivot objects in the space, and pre-calculate the distances to their nearest neighbors. We provide a complete specification of such histogram structures and show how to use the information they provide towards more effective searching. In particular, we show how to construct them, how to decide the number of pivots, how to choose pivot objects, how to incrementally maintain them under dynamic updates, and how to utilize them in conjunction with a variety of NN search algorithms to improve the performance of NN searches. Our intensive experiments show that nearest neighbor histograms can be efficiently constructed and maintained, and when used in conjunction with a variety of algorithms for NN search, they can improve the performance dramatically.
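
The pivot idea in this abstract can be illustrated with a small sketch. It assumes Euclidean points, brute-force precomputation, and hand-picked pivots; the function names (`build_nnh`, `knn_upper_bound`) are hypothetical and are not taken from the paper. By the triangle inequality, the k nearest neighbors of a pivot p all lie within d(q, p) + r_k(p) of a query q, so that sum is an upper bound on the query's k-th nearest-neighbor distance, which a search algorithm could use for pruning.

```python
import math

def build_nnh(points, pivots, k):
    """For each pivot, store the sorted distances to its k nearest data points."""
    return [sorted(math.dist(p, x) for x in points)[:k] for p in pivots]

def knn_upper_bound(query, pivots, nnh, k):
    """Upper bound on the query's k-th NN distance (triangle inequality)."""
    return min(math.dist(query, p) + radii[k - 1]
               for p, radii in zip(pivots, nnh))

def knn_distance_bruteforce(query, points, k):
    """Exact k-th NN distance, used here only to check the bound."""
    return sorted(math.dist(query, x) for x in points)[k - 1]

# Toy data: two small groups of points and one hand-picked pivot per group
# (illustrative values, not from the paper).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0), (5.0, 6.0)]
pivots = [(0.5, 0.5), (5.5, 5.5)]

nnh = build_nnh(points, pivots, k=2)
bound = knn_upper_bound((1.0, 1.0), pivots, nnh, k=2)
exact = knn_distance_bruteforce((1.0, 1.0), points, k=2)
print(bound >= exact)
```

The bound is only as tight as the nearest pivot allows, which is presumably why the number and placement of pivots matter.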

Article
Peer Reviewed

Supporting efficient record linkage for large data sets using mapping techniques

UC Irvine Previously Published Works (2006)

This paper describes an efficient approach to record linkage. Given two lists of records, the record-linkage problem consists of determining all pairs that are similar to each other, where the overall similarity between two records is defined based on domain-specific similarities over individual attributes. The record-linkage problem arises naturally in the context of data cleansing that usually precedes data analysis and mining. Since the scalability issue of record linkage was addressed in [21], the repertoire of database techniques dealing with multidimensional data sets has significantly increased. Specifically, many effective and efficient approaches for distance-preserving transforms and similarity joins have been developed. Based on these advances, we explore a novel approach to record linkage. For each attribute of records, we first map values to a multidimensional Euclidean space that preserves domain-specific similarity. Many mapping algorithms can be applied, and we use the Fastmap approach [16] as an example. Given the merging rule that defines when two records are similar based on their attribute-level similarities, a set of attributes is chosen along which the merge will proceed. A multidimensional similarity join over the chosen attributes is used to find similar pairs of records. Our extensive experiments using real data sets show that our solution has very good efficiency and recall.
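
As a rough illustration of the mapping step, the sketch below computes the first FastMap coordinate for a handful of strings, using edit distance as the domain-specific similarity. The sample names, the hand-picked pivot pair, and the helper names are all assumptions made for this example; the full algorithm chooses pivots heuristically and repeats the projection on residual distances to obtain further dimensions.

```python
def edit_distance(s, t):
    """Levenshtein distance via the classic row-by-row dynamic program."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (cs != ct))) # substitution or match
        prev = cur
    return prev[-1]

def fastmap_coordinate(obj, pivot_a, pivot_b, d):
    """Project obj onto the line through the two pivots (cosine law)."""
    dab = d(pivot_a, pivot_b)
    return (d(pivot_a, obj) ** 2 + dab ** 2 - d(pivot_b, obj) ** 2) / (2 * dab)

# Hypothetical attribute values and a hand-picked, far-apart pivot pair.
names = ["smith", "smyth", "smithe", "johnson", "jonson"]
a, b = "smith", "johnson"

coords = {n: fastmap_coordinate(n, a, b, edit_distance) for n in names}
for name, x in sorted(coords.items(), key=lambda kv: kv[1]):
    print(f"{name:8s} {x:.2f}")
```

By construction the first pivot maps to 0 and the second to the distance between the pivots; the other values fall wherever the cosine-law projection places them.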

Article
Peer Reviewed

Comparison of clustering techniques for residential energy behavior using smart meter data

LBL Publications (2017)

Current practice in whole time series clustering of residential meter data focuses on aggregated or subsampled load data at the customer level, which ignores day-to-day differences within customers. This information is critical to determine each customer's suitability to various demand-side management strategies that support intelligent power grids and smart energy management. Clustering daily load shapes provides fine-grained information on customer attributes and sources of variation for subsequent models and customer segmentation. In this paper, we apply 11 clustering methods to daily residential meter data. We evaluate their parameter settings and suitability based on 6 generic performance metrics and post-checking of resulting clusters. Finally, we recommend suitable techniques and parameters based on the goal of discovering diverse daily load patterns among residential customers. To the authors' knowledge, this paper is the first robust comparative review of clustering techniques applied to daily residential load shape time series in the power systems literature.

Article
Peer Reviewed

Weak base pairing in both seed and 3' regions reduces RNAi off-targets and enhances si/shRNA designs

UC San Francisco Previously Published Works (2014)

© The Author(s) 2014. The use of RNA interference is becoming routine in scientific discovery and treatment of human disease. However, its applications are hampered by unwanted effects, particularly off-targeting through miRNA-like pathways. Recent studies

Article
Peer Reviewed

Update of Euclidean windows of the hadronic vacuum polarization

LBL Publications (2023)

We compute the standard Euclidean window of the hadronic vacuum polarization using multiple independent blinded analyses. We improve the continuum and infinite-volume extrapolations of the dominant quark-connected light-quark isospin-symmetric contribution and address additional subleading systematic effects from sea-charm quarks and residual chiral-symmetry breaking from first principles. We find $a_\mu^{\mathrm{W}} = 235.56(65)(50) \times 10^{-10}$, which is in $3.8\sigma$ tension with the recently published dispersive result of $a_\mu^{\mathrm{W}} = 229.4(1.4) \times 10^{-10}$ [G. Colangelo, A. X. El-Khadra, M. Hoferichter, A. Keshavarzi, C. Lehner, P. Stoffer, and T. Teubner, Phys. Lett. B 833, 137313 (2022), doi:10.1016/j.physletb.2022.137313] and in agreement with other recent lattice determinations. We also provide a result for the standard short-distance window. The results reported here are unchanged compared to our presentation at the Edinburgh workshop of the g-2 Theory Initiative in 2022 [C. Lehner, Talk Presented at the 5th Plenary Meeting of the g-2 Theory Initiative in Edinburgh (2022)].
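
The quoted 3.8σ tension can be reproduced with a short calculation. This is a sketch that assumes the two lattice uncertainties and the dispersive uncertainty are independent and can be combined in quadrature; the abstract does not label the two lattice errors, so the variable names below are placeholders.

```python
import math

# Values from the abstract, in units of 1e-10.
lattice, err_1, err_2 = 235.56, 0.65, 0.50
dispersive, disp_err = 229.4, 1.4

# Assumption: all three uncertainties are independent (quadrature sum).
combined = math.sqrt(err_1**2 + err_2**2 + disp_err**2)
tension = (lattice - dispersive) / combined

print(f"{tension:.1f} sigma")  # prints "3.8 sigma"
```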

Creative Commons 'BY' version 4.0 license

Article
Peer Reviewed

Relativistic ultrafast electron diffraction at high repetition rates

UC Berkeley Previously Published Works (2023)

The ability to resolve the dynamics of matter on its native temporal and spatial scales constitutes a key challenge and convergent theme across chemistry, biology, and materials science. The last couple of decades have witnessed ultrafast electron diffraction (UED) emerge as one of the forefront techniques with the sensitivity to resolve atomic motions. Increasingly sophisticated UED instruments are being developed that are aimed at increasing the beam brightness in order to observe structural signatures, but so far they have been limited to low average current beams. Here, we present the technical design and capabilities of the HiRES (High Repetition-rate Electron Scattering) instrument, which blends relativistic electrons and high repetition rates to achieve orders of magnitude improvement in average beam current compared to the existing state-of-the-art instruments. The setup utilizes a novel electron source to deliver femtosecond duration electron pulses at up to MHz repetition rates for UED experiments. An instrument response function of sub-500 fs is demonstrated, with < 100 fs time resolution targeted in the future. We provide example cases of diffraction measurements on solid-state and gas-phase samples, including both micro- and nanodiffraction (featuring 100 nm beam size) modes, which showcase the potential of the instrument for novel UED experiments.

Article
Peer Reviewed

A vital role for TUBB8 in human oocyte meiotic spindle assembly and maturation

UC Berkeley Previously Published Works (2016)

Article
Peer Reviewed

TBX6 null variants and a common hypomorphic allele in congenital scoliosis.

UC Davis Previously Published Works (2015)

BACKGROUND: Congenital scoliosis is a common type of vertebral malformation. Genetic susceptibility has been implicated in congenital scoliosis. METHODS: We evaluated 161 Han Chinese persons with sporadic congenital scoliosis, 166 Han Chinese controls, and 2 pedigrees, family members of which had a 16p11.2 deletion, using comparative genomic hybridization, quantitative polymerase-chain-reaction analysis, and DNA sequencing. We carried out tests of replication using an additional series of 76 Han Chinese persons with congenital scoliosis and a multicenter series of 42 persons with 16p11.2 deletions. RESULTS: We identified a total of 17 heterozygous TBX6 null mutations in the 161 persons with sporadic congenital scoliosis (11%); we did not observe any null mutations in TBX6 in 166 controls ($P < 3.8 \times 10^{-6}$). These null alleles include copy-number variants (12 instances of a 16p11.2 deletion affecting TBX6) and single-nucleotide variants (1 nonsense and 4 frame-shift mutations). However, the discordant intrafamilial phenotypes of 16p11.2 deletion carriers suggest that heterozygous TBX6 null mutation is insufficient to cause congenital scoliosis. We went on to identify a common TBX6 haplotype as the second risk allele in all 17 carriers of TBX6 null mutations ($P < 1.1 \times 10^{-6}$). Replication studies involving additional persons with congenital scoliosis who carried a deletion affecting TBX6 confirmed this compound inheritance model. In vitro functional assays suggested that the risk haplotype is a hypomorphic allele. Hemivertebrae are characteristic of TBX6-associated congenital scoliosis. CONCLUSIONS: Compound inheritance of a rare null mutation and a hypomorphic allele of TBX6 accounted for up to 11% of congenital scoliosis cases in the series that we analyzed. (Funded by the National Basic Research Program of China and others.)

Article
Peer Reviewed

The anomalous magnetic moment of the muon in the Standard Model

UC Irvine Previously Published Works (2020)

We review the present status of the Standard Model calculation of the anomalous magnetic moment of the muon. This is performed in a perturbative expansion in the fine-structure constant $\alpha$ and is broken down into pure QED, electroweak, and hadronic contributions. The pure QED contribution is by far the largest and has been evaluated up to and including $O(\alpha^5)$ with negligible numerical uncertainty. The electroweak contribution is suppressed by $(m_\mu/M_W)^2$ and only shows up at the level of the seventh significant digit. It has been evaluated up to two loops and is known to better than one percent. Hadronic contributions are the most difficult to calculate and are responsible for almost all of the theoretical uncertainty. The leading hadronic contribution appears at $O(\alpha^2)$ and is due to hadronic vacuum polarization, whereas at $O(\alpha^3)$ the hadronic light-by-light scattering contribution appears. Given the low characteristic scale of this observable, these contributions have to be calculated with nonperturbative methods, in particular, dispersion relations and the lattice approach to QCD. The largest part of this review is dedicated to a detailed account of recent efforts to improve the calculation of these two contributions with either a data-driven, dispersive approach, or a first-principle, lattice-QCD approach. The final result reads $a_\mu^{\mathrm{SM}} = 116\,591\,810(43) \times 10^{-11}$ and is smaller than the Brookhaven measurement by $3.7\sigma$. The experimental uncertainty will soon be reduced by up to a factor of four by the new experiment currently running at Fermilab, and also by the future J-PARC experiment. This and the prospects to further reduce the theoretical uncertainty in the near future, which are also discussed here, make this quantity one of the most promising places to look for evidence of new physics.
