Open Access Publications from the University of California


UC San Francisco Previously Published Works

Computational linguistics: A new tool for exploring biopolymer structures and statistical mechanics


Unlike homopolymers, biopolymers are composed of specific sequences of different types of monomers. In proteins and RNA molecules, one-dimensional sequence information encodes a three-dimensional fold, which in turn gives rise to a corresponding molecular function. Such folded structures are not treated adequately by traditional methods of polymer statistical mechanics. A promising new way to solve problems in the statistical mechanics of biomolecules comes from computational linguistics, the field that uses computers to parse and understand sentences in natural languages. Here, we give two examples. First, we show that a dynamic programming method from computational linguistics gives a fast way to search protein models for native structures. Interestingly, the computational search process closely resembles the physical folding process. Second, linguistics-based dynamic programming methods are also useful for computing partition functions and densities of states for some foldable biopolymers; applications to helix-bundle proteins are reviewed here. In these ways, computational linguistics is helping to solve problems of searching and counting biopolymer conformations.
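The shared dynamic programming idea (a chart over subsequences, filled from short spans to long ones, as in CKY parsing) can be illustrated with a small example. This is a minimal sketch under stated assumptions, not the authors' protein or helix-bundle method: it swaps in a toy RNA-like model of nested base pairs (Nussinov/McCaskill-style), and the pair energies `PAIR_ENERGY`, the minimum loop size `MIN_LOOP`, and the functions `pair_weight` and `inside` are all hypothetical. Filling the chart with sums gives a partition function; filling it with maxima returns the Boltzmann weight of the best-scoring structure, a simple analogue of searching for a native state.

```python
import math
import operator
from typing import Callable, Dict, Tuple

# Hypothetical pair energies in units of kT (toy values, not a real RNA model).
PAIR_ENERGY: Dict[Tuple[str, str], float] = {
    ("G", "C"): -3.0, ("C", "G"): -3.0,
    ("A", "U"): -2.0, ("U", "A"): -2.0,
    ("G", "U"): -1.0, ("U", "G"): -1.0,
}
MIN_LOOP = 3  # hypothetical: a pair must enclose at least 3 unpaired monomers


def pair_weight(a: str, b: str) -> float:
    """Boltzmann weight exp(-E/kT) for pairing a with b, or 0.0 if not allowed."""
    e = PAIR_ENERGY.get((a, b))
    return 0.0 if e is None else math.exp(-e)


def inside(seq: str, combine: Callable[[float, float], float]) -> float:
    """CKY-style chart over half-open spans seq[i:j] of a toy nested-pair model.

    combine=operator.add sums over all structures (partition function Z);
    combine=max keeps only the highest-weight structure (Viterbi-style search).
    """
    n = len(seq)
    # Every entry starts at 1.0; only empty spans chart[i][i] keep that value,
    # since each nonempty span is overwritten below.
    chart = [[1.0] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            # Case 1: the last monomer, j-1, is unpaired.
            value = chart[i][j - 1]
            # Case 2: j-1 pairs with some k, enclosing at least MIN_LOOP monomers.
            for k in range(i, j - 1 - MIN_LOOP):
                w = pair_weight(seq[k], seq[j - 1])
                if w > 0.0:
                    value = combine(value, chart[i][k] * w * chart[k + 1][j - 1])
            chart[i][j] = value
    return chart[0][n]


# Usage: "GAAAC" has two structures, the open chain and one G-C hairpin.
z = inside("GAAAC", operator.add)  # equals 1 + exp(3)
best = inside("GAAAC", max)        # equals exp(3), the hairpin's weight
ground_energy = -math.log(best)    # -3.0 kT in this toy model
```

Because every structure is assembled from smaller, non-crossing spans, the chart has O(n^2) entries and takes O(n^3) time, rather than enumerating exponentially many conformations one at a time. Switching `operator.add` for `max` is the same change that turns a summing (inside) parser into a best-parse (Viterbi) parser.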
