Light-responsive proteins enable control of biological processes with unprecedented precision, holding great promise for clinical and industrial applications. Introducing these proteins into cultured cells or the tissues of live animals allows investigation and control of a range of cellular and organismal functions, including neuronal activity, intracellular signaling, gene expression, and cell proliferation. In this work, we take several approaches to optimizing phytochromes and phytochrome-based optobiology tools. We also present a deep learning framework for protein biology with direct implications for identifying functionally relevant residues in phytochromes.
First, by co-expressing cyanobacterial enzymes, we show that it is possible to increase endogenous chromophore production. Chromophores are bilin molecules that covalently bind to phytochromes, enabling photoconversion. Endogenous production of chromophores is a key development for the use of phytochromes in mammalian cells. We demonstrate that the limiting factors in chromophore production are two of the enzymes required in the chromophore pathway, and not solely heme as previously reported. We show how stoichiometry and species-matching affect chromophore production, and how chromophore levels can impact the performance of phytochrome-based optogenetic systems. Next, we demonstrate the utility of coupling the endogenous chromophore pathway with a light-responsive module composed of cyanobacterial Phytochrome B (PhyB) and its interacting factor (PIF3) to control the expression of reporter genes.
Finally, we present a deep learning framework to identify complex relationships inherent in multiple sequence alignments. We develop a hierarchical attention network (HAN) for protein sequence families (HANprot) and demonstrate its performance in terms of relevant residue matching. We also demonstrate its utility in finding relevant residues for PhyB, with a view to optimizing its photolabile properties. The residues identified by HANprot can serve as a starting point for further protein investigations when structural or database annotations are lacking.