Split-wrmScarlet and split-sfGFP: tools for faster, easier fluorescent labeling of endogenous proteins in Caenorhabditis elegans

Jérôme Goudeau; Catherine S. Sharp; Jonathan Paw; Laura Savy; Manuel D. Leonetti; Andrew G. York; Dustin L. Updike; Cynthia Kenyon; Maria Ingaramo

Research Article

Note that this is a limited PDF or print version; animated and interactive figures are disabled. For the full version of this article, please visit one of the following links: https://andrewgyork.github.io/split_wrmscarlet https://calico.github.io/split_wrmscarlet

Jérôme Goudeau¹,⁴, Catherine S. Sharp², Jonathan Paw¹, Laura Savy³, Manuel D. Leonetti³, Andrew G. York¹, Dustin L. Updike², Cynthia Kenyon¹,⁴ and Maria Ingaramo¹,⁴

¹Calico Life Sciences LLC, South San Francisco, CA, United States

²Mount Desert Island Biological Laboratory, Bar Harbor, Maine 04672, United States

³Chan Zuckerberg Biohub, San Francisco, California 94158, United States

⁴Corresponding authors: jerome@calicolabs.com, cynthia@calicolabs.com, ingaramo@calicolabs.com

Abstract

We create and share a new red fluorophore, along with a set of strains, reagents and protocols, to make it faster and easier to label endogenous C. elegans proteins with fluorescent tags. CRISPR-mediated fluorescent labeling of C. elegans proteins is an invaluable tool, but it is much more difficult to insert fluorophore-size DNA segments than it is to make small gene edits. In principle, high-affinity asymmetrically-split fluorescent proteins solve this problem in C. elegans: the small fragment can quickly and easily be fused to almost any protein of interest and can be detected wherever the large fragment is expressed and complemented. However, there is currently only one available strain stably expressing the large fragment of a split fluorescent protein, restricting this solution to a single tissue (the germline) in the highly autofluorescent green channel. No available C. elegans lines express unbound large fragments of split red fluorescent proteins, and even state-of-the-art split red fluorescent proteins are dim compared to the canonical split-sfGFP protein. In this study, we engineer a bright, high-affinity new split red fluorophore, split-wrmScarlet. We generate transgenic C. elegans lines to allow easy single-color labeling in muscle or germline and dual-color labeling in somatic cells. We also describe 'glonads', a novel expression strategy for the germline, where traditional expression strategies struggle. We validate these strains by targeting split-wrmScarlet to several genes whose products label distinct organelles, and we provide a protocol for easy, cloning-free CRISPR/Cas9 editing. As the collection of split-FP strains for labeling in different tissues or organelles expands, we will post updates at doi.org/10.5281/zenodo.3993663

Peer review status

Anonymously reviewed at Genetics. Abby Dernburg and Dustin Updike gave valuable input on an earlier version of this article.

Cite as: doi:10.5281/zenodo.3993663

Introduction

Genetically expressed fluorophores are essential tools for visualizing and quantifying cellular proteins. In C. elegans, fluorescent proteins have traditionally been introduced on extrachromosomal arrays [Kimble 1982, Mello 1991] or via MosSCI-based integration [Frøkjær-Jensen 2008, Frøkjær-Jensen 2012]. These methods have enabled important discoveries but can also lead to artifacts due to supraphysiological gene-expression levels and lack of endogenous regulatory control. In recent years, the repertoire of C. elegans transgenic tools has expanded [see Nance 2019 for review], particularly due to advances in CRISPR/Cas9 genome-editing technologies [Paix 2014, Dickinson 2016]. CRISPR/Cas9 allows precise transgene insertion by homology-directed repair (HDR) and can be used to label an endogenous gene at its native locus with a fluorescent protein [Friedland 2013, Dokshin 2018, Farboud 2019, Vicencio 2019].

However, relative to CRISPR/Cas9-mediated integration of smaller transgenes, genomic insertion of large DNA fragments like those encoding fluorescent proteins remains a challenge, both because repair with double-stranded templates is less efficient than repair with single-stranded oligodeoxynucleotide donors (ssODN) [Farboud 2019], and because of the requirement for cloning to prepare the HDR donor template. Recent methods such as ‘hybrid’ [Dokshin 2018] and ‘nested’ [Vicencio 2019] CRISPR remove the need for cloning but still require preparation of the DNA template or several rounds of injections and selection of transgenic progeny. As a result, using CRISPR with small ssODN templates is currently faster, easier, cheaper and more efficient than with large templates. In our lab, we routinely make C. elegans genome edits with short ssODNs with almost guaranteed success. In contrast, in our experience, large edits using double-stranded DNA templates are more time-consuming and have higher failure rates.

Our preferred approach is to combine the utility of full-length fluorescent proteins with the convenience of short genomic edits, by using high-affinity asymmetrically-split fluorescent proteins [Cabantous 2004]. These fluorophores typically separate a GFP-like protein between the 10th and 11th strands of the beta barrel, splitting it asymmetrically into a large (FP_1-10) and a small (FP₁₁) fragment. The fragments are not individually fluorescent, but upon binding one another, recapitulate the fluorescent properties of an intact fluorophore (Figure 1A). Unlike the low-affinity split fluorescent proteins used in BiFC assays [Hu 2002], high-affinity binding between the fragments is critical here. Our preferred approach for tagging a new cellular protein begins with a C. elegans strain expressing the large FP_1-10 fragment in cells of interest, unattached to any cellular protein. This way, only the small FP₁₁ fragment (<72 nt) needs to be inserted to tag the target protein, which will only fluoresce in compartments where it can bind the large fragment. These short insertions tend to be faster, easier, and more reliable than inserting a >600 nt full-length fluorescent protein [Paix 2015, Prior 2017, Dokshin 2018, Richardson 2018]. Therefore, collections of C. elegans lines stably expressing the large FP_1-10 in different tissues are an invaluable resource allowing rapid fluorescent tagging in a cell type of choice. Stable lines with red FP_1-10 fragments would be especially useful, given C. elegans’ substantial autofluorescence in the GFP channel.

Green and red asymmetrically-split fluorescent proteins have been used to combine cell and protein specificity in C. elegans neurons and synapses [Noma 2017, He 2019, Feng 2019]; however, these strains used extrachromosomal arrays, not stable lines, which are more time-consuming to maintain and can have variable expression levels. To the best of our knowledge, there is only one available unbound FP_1-10 stable C. elegans line, which expresses sfGFP_1-10 in the germline [Hefel 2019], and there are no available lines with red FP_1-10 fragments. Existing red split fluorophores are also much dimmer in C. elegans than green ones, despite recent improvements like split-sfCherry3 [Feng 2019]. In addition, we often struggle to express genome-integrated full-length fluorescent protein fusions in the germline, potentially due to generational silencing.

Here, we describe tools that reduce these obstacles for convenient fluorescent labeling of endogenous C. elegans proteins. We engineer split-wrmScarlet, a new split red fluorescent protein based on mScarlet [Bindels 2016, El Mouridi 2017], which is three times brighter in worms than split-sfCherry3 (https://www.addgene.org/138966/). We generate and share C. elegans lines carrying single-copy insertions of split-wrmScarlet_1-10 expressed broadly in somatic cells (https://cgc.umn.edu/strain/CF4582) and specifically in muscle (https://cgc.umn.edu/strain/CF4610). We also describe a novel approach to make C. elegans lines with robust germline expression of exogenous proteins that appears to be resistant to generational silencing. We use this approach to make a germline specific split-wrmScarlet_1-10 strain (https://cgc.umn.edu/strain/DUP237). We provide a protocol for an easy, cloning-free method to label endogenous genes with FP₁₁s using CRISPR/Cas9, commercially available synthetic single-stranded oligodeoxynucleotide (ssODN) donors, and microinjection (doi.org/10.17504/protocols.io.bamkic4w). We validate this protocol by targeting split-wrmScarlet₁₁ to six different genes whose products have distinct cellular locations. We also show that labeling with tandem split-wrmScarlet₁₁-repeats increases fluorescence in vivo, and we provide the plasmid necessary to generate the dsDNA template through Addgene (https://www.addgene.org/158533). We also generate a strain expressing an integrated copy of sfGFP_1-10 [Pédelacq 2005] in somatic cells (https://cgc.umn.edu/strain/CF4587), and a strain expressing sGFP2_1-10 [Köker 2018] specifically in the germline (https://cgc.umn.edu/strain/DUP223). Finally, we generate a dual-color strain expressing both sfGFP_1-10 and split-wrmScarlet_1-10 in somatic cells (https://cgc.umn.edu/strain/CF4588) for two-color applications such as colocalization studies or organelle interaction. 
As the collection of split-FP strains and related resources for labeling different tissues, organelles and proteins expands, we will post updates at doi.org/10.5281/zenodo.3993663.

Results

Split-wrmScarlet

To engineer split-wrmScarlet, we first introduced a 32 amino acid spacer between the 10th and 11th β-strands of yeast-codon-optimized mScarlet, following a strategy described previously [Feng 2017]. We subjected the spacer-inserted mScarlet sequence to several rounds of semi-random mutagenesis in E. coli, generating a version with fluorescence comparable to that of full-length mScarlet when expressed in bacteria. However, upon separating the two fragments into two S. cerevisiae plasmids to test for complementation, we observed no detectable fluorescence in yeast. We therefore continued with several rounds of selection of new mutant libraries in yeast using FACS, fusing the small fragment (without the MDELYK C-terminal residues) from our brightest E. coli clone to a plasma-membrane-targeted blue FP (mTagBFP) and expressing the soluble large fragment from a high-copy-number vector containing a strong promoter. The brightest resulting protein, which we named split-wrmScarlet, contained 10 amino acid substitutions relative to the C-terminally truncated mScarlet (Figure S1, A and B). Fluorescence microscopy of yeast containing both plasmids corroborated that split-wrmScarlet showed the expected localization and can reach brightness comparable to that of intact mScarlet in yeast (Figure S2, A and B).

Split-wrmScarlet is threefold brighter than split-sfCherry3 in C. elegans muscles

In order to compare split-wrmScarlet to split-sfCherry3, the brightest published red split-FP at the time of the experiment, we combined the FP_1-10 and FP₁₁ fragments into a single plasmid for each fluorophore. Specifically, we generated worm-codon-optimized plasmids encoding three nuclear localization signals (NLS), mTagBFP2, FP₁₁, a T2A peptide-bond-skipping sequence, mNeonGreen and the corresponding FP_1-10, driven by the eft-3 promoter (Figure S3A). Each FP₁₁ was linked to mTagBFP2 in order to reduce the risk of proteolysis of the short peptide, and mNeonGreen was linked to FP_1-10 to monitor its expression, and for normalization purposes. Each construct was injected into wild-type animals and fluorescent progeny were analyzed. Unexpectedly, split-sfCherry3 turned out to be toxic when expressed ubiquitously, whereas 99% of split-wrmScarlet-overexpressing worms became viable adults (Figure S3B).

In an attempt to reduce split-sfCherry3-associated toxicity, we modified our construct by using the muscle-specific myo-3 promoter and removing the NLS sequence (Figure 1B). We did not detect toxicity associated with the expression of these constructs and were able to compare the fluorescence of split-sfCherry3 and split-wrmScarlet in young adults. Red fluorescence emitted from split-wrmScarlet was 2.9-fold higher than that of split-sfCherry3 when normalized to the mNeonGreen signal (Figure 1B, 1C and S4). We also observed a 60% higher expression level of mTagBFP2 in the split-wrmScarlet-expressing animals (Supplementary Figure S4). It is worth noting that differences in expression levels could influence both brightness and toxicity comparisons. A more controlled way to compare the split FPs at similar expression levels would be to make single-copy genomic insertions of these constructs at a neutral site in the genome.
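The normalization used for this comparison can be sketched in a few lines of Python. This is only an illustration of the arithmetic (background-subtracted red signal divided by background-subtracted mNeonGreen signal, then a ratio of group means); the function names and every intensity value below are hypothetical placeholders, not measurements from this study.

```python
from statistics import mean

def normalized_ratio(red, green, background_red=0.0, background_green=0.0):
    """Red signal normalized to the green (mNeonGreen) signal after background subtraction."""
    green_corrected = green - background_green
    if green_corrected <= 0:
        raise ValueError("green signal must exceed its background")
    return (red - background_red) / green_corrected

def fold_change(ratios_a, ratios_b):
    """Mean normalized ratio of group A divided by that of group B."""
    return mean(ratios_a) / mean(ratios_b)

# Hypothetical (red, green) mean intensities per animal, arbitrary units.
group_a = [(210.0, 110.0), (410.0, 110.0)]
group_b = [(110.0, 110.0), (110.0, 60.0)]

ratios_a = [normalized_ratio(r, g, 10.0, 10.0) for r, g in group_a]
ratios_b = [normalized_ratio(r, g, 10.0, 10.0) for r, g in group_b]
print(f"fold change (A over B): {fold_change(ratios_a, ratios_b):.2f}")
```

Dividing by the green channel corrects for animal-to-animal differences in FP_1-10 expression, but it cannot correct for differences in FP₁₁ availability, which is why the mTagBFP2 signal is reported separately.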

**Figure 1: Engineering and evaluating split-wrmScarlet.** (A) Principle of endogenous protein labeling with split-wrmScarlet. The protein structure from split-wrmScarlet was generated using Phyre2 and PyMOL. (B) Schematic of the plasmids encoding split-wrmScarlet and split-sfCherry3. Each plasmid consists of the large FP_1-10 sequence fused to mNeonGreen, and the corresponding small FP₁₁ sequence fused to mTagBFP2. The T2A sequence ensures that mTagBFP2::FP11 and the corresponding mNeonGreen::FP_1-10 are separated. The images are representative displays of the ratio of red to green fluorescence intensity from images acquired under identical conditions after background subtraction and masking with the same threshold. Scale bar, 50 µm. (C) Emission intensities from split-sfCherry3 and split-wrmScarlet normalized to mNeonGreen. Mean ± s.d. Circles are individuals (n=6 for each split fluorescent protein). ****P < 0.0001.

Split-wrmScarlet₁₁-mediated tagging in all somatic tissues or specifically in muscles

Our protein-tagging approach was analogous to existing split-FP methods developed for human cells [Kamiyama 2016, Leonetti 2016] and C. elegans [Hefel 2019]. It requires split-wrmScarlet_1-10 (i.e. just the large fragment of split-wrmScarlet without the 11th β-strand) to be expressed in the cell or tissue of interest, and the small split-wrmScarlet₁₁ fragment to be inserted at an endogenous locus to tag a protein of interest (Figure 1A). To build strains expressing single-copy insertions of split-wrmScarlet_1-10, we first optimized its sequence for C. elegans codon usage [Redeman 2011] and included three introns (Table S1). The strain expressing split-wrmScarlet_1-10 throughout the soma (driven by the eft-3 promoter and unc-54 3’UTR) was generated by editing the genome of the existing MosSCI line CA1200 [Zhang 2015] and replacing the sequence encoding tir-1::mRuby with split-wrmScarlet_1-10 using CRISPR/Cas9 and hybrid DNA templates [Paix 2015, Dokshin 2018] (Supplementary Table S4). In order to perform tissue-specific labeling, we generated a strain expressing muscle-specific split-wrmScarlet_1-10 using the SKI-LODGE system in the strain WBM1126 [Silva-García 2019] (Supplementary Table S4). The expression of split-wrmScarlet_1-10 in these two lines affected neither the number of viable progeny (Figure S5A) nor lifespan (Figure S5B and Table S6), suggesting that the expression of split-wrmScarlet_1-10 had no deleterious effect. To tag a gene of interest with the split-wrmScarlet₁₁ fragment, we used microinjection of preassembled Cas9 ribonucleoproteins, because this method enables high-efficiency genome editing in worms [Paix 2015]. The most efficient insertion of short sequences in C. elegans was previously shown to be achieved using ssODN donors [Paix 2015, Prior 2017, Dokshin 2018]. A great advantage of this strategy is that all of the components required for editing are commercially available or can be synthesized rapidly in the lab [Leonetti 2016]. 
Synthetic ssODNs have a typical size limit of 200 nt. The small size of split-wrmScarlet₁₁ (18-24 a.a.) is key: 200 nt can encompass split-wrmScarlet₁₁ (66-84 nt, including a 4 a.a. linker) flanked by two homology arms of >34 nt each (up to 67 nt for the 18 a.a. tag or 58 nt for the 24 a.a. tag) for HDR. In principle, a few days after the somatic and/or muscle-specific split-wrmScarlet_1-10 strain(s) are microinjected, progeny can be screened for red fluorescence, genotyped and sequenced to check the accuracy of editing (Figure 2; a detailed protocol is available at doi.org/10.17504/protocols.io.bamkic4w). If desired, co-CRISPR strategies such as dpy-10(cn64) [Paix 2015] or co-injection with pRF4 [Dokshin 2018] can be used to screen for correct candidates and to control for microinjection efficacy and payload toxicity.
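The length budget behind these numbers is simple arithmetic, sketched here in Python. The 200 nt limit, the 4 a.a. linker and the 18 and 24 a.a. tag lengths come from the text; the function name and the assumption that the two homology arms are of equal length are ours, for illustration only.

```python
SSODN_LIMIT_NT = 200  # typical synthesis limit quoted in the text
LINKER_AA = 4         # linker length quoted in the text

def max_homology_arm_nt(tag_aa, linker_aa=LINKER_AA, limit_nt=SSODN_LIMIT_NT):
    """Longest equal-length homology arm (nt) that fits on each side of the insert."""
    insert_nt = 3 * (tag_aa + linker_aa)  # three nucleotides per codon
    remaining_nt = limit_nt - insert_nt
    if remaining_nt < 0:
        raise ValueError("insert alone exceeds the ssODN length limit")
    return remaining_nt // 2

for tag_aa in (18, 24):
    insert_nt = 3 * (tag_aa + LINKER_AA)
    arm_nt = max_homology_arm_nt(tag_aa)
    print(f"{tag_aa} a.a. tag: insert {insert_nt} nt, arms up to {arm_nt} nt each")
```

The same function can be used to check whether a longer linker or a different tag still leaves arms above the >34 nt guideline mentioned above.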

**Figure 2: Split-wrmScarlet₁₁-mediated tagging.** Schematic representation of the split-wrmScarlet workflow to visualize endogenous proteins specifically in muscles, germline, or throughout the soma. Some illustrations were created with BioRender.com.

To test our approach, we used it to tag six proteins with distinct subcellular localizations. Starting with the somatic split-wrmScarlet_1-10 parental strain CF4582, we introduced split-wrmScarlet₁₁ at the N-terminus of TBB-2, FIB-1 or VHA-13 or at the C-terminus of EAT-6, HIS-3 and TOMM-20 (Supplementary Table S4). These proteins mark the cytoskeleton, nucleoli, lysosomes, plasma membrane, nuclei and mitochondria, respectively. Importantly, for tagging transmembrane proteins, the split-wrmScarlet₁₁ tag was introduced at the terminus exposed to the cytosol. Split-wrmScarlet fluorescence from all six proteins matched their expected subcellular localization in somatic cells (Figure 3, A-F). To test the muscle-specific split-wrmScarlet_1-10 line CF4610, we tagged the N-terminus of the endogenous FIB-1 with split-wrmScarlet₁₁ and confirmed the fluorescence from nucleoli in muscle cells (Figure 3G). Together, our results show that split-wrmScarlet enables rapid fluorescent tagging of proteins with disparate cytoplasmic or nuclear locations expressed from their endogenous loci.

The 18 a.a. split-wrmScarlet₁₁ sequence used for these experiments ends with two glycines. In mammalian cells, C-terminal gly-gly sequences have been reported to function as degradation signals [Koren 2018]. Our TOMM-20::split-wrmScarlet₁₁ had a spontaneous mutation of the last glycine to a stop codon (Supplementary Material, Table S4), which could be problematic if the protein degradation mechanism, DesCEND (destruction via C-end degrons), operates in C. elegans. However, we do not detect differences in protein abundance of HIS-3 versus HIS-3::split-wrmScarlet₁₁ by western blots in C. elegans (Figure S11). We also do not detect differences in protein abundance of mScarlet truncated to end in gly-gly compared to mScarlet ending in MDELYK via fluorescence in yeast (Figure S12). Nonetheless, we recommend using a 24 a.a. split-wrmScarlet₁₁ sequence YTVVEQYEKSVARHCTGGMDELYK when labeling proteins at their C-terminus to avoid the possibility that split-wrmScarlet₁₁ ending in gly-gly could function as a degron. This modified sequence still fits within the 200 nt ssODN synthesis limit and works at least as well as the 18 a.a. split-wrmScarlet₁₁ sequence (Figure S6).
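As a small illustration of this recommendation, the Python sketch below checks a tag for a C-terminal gly-gly and picks a tag variant by tagging terminus. The 24 a.a. sequence is quoted from the text; the 18 a.a. sequence is inferred by removing the MDELYK tail and should be treated as an assumption, as should the helper names.

```python
TAG_24 = "YTVVEQYEKSVARHCTGGMDELYK"  # 24 a.a. sequence quoted in the text
TAG_18 = TAG_24[:-6]                 # assumed 18 a.a. form: same sequence without MDELYK

def ends_in_gly_gly(peptide):
    """True if the peptide's last two residues are both glycine."""
    return peptide.upper().endswith("GG")

def tag_for_terminus(terminus):
    """Pick a tag variant: the 24 a.a. form for C-terminal fusions, the 18 a.a. form otherwise."""
    if terminus == "C":
        return TAG_24
    if terminus == "N":
        return TAG_18
    raise ValueError("terminus must be 'N' or 'C'")

for terminus in ("N", "C"):
    tag = tag_for_terminus(terminus)
    print(terminus, len(tag), tag, "ends in GG:", ends_in_gly_gly(tag))
```

For an N-terminal fusion the tag is followed by the linker and the target protein, so the gly-gly is internal rather than at the protein's C-terminus.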

**Figure 3: Split-wrmScarlet labeling of proteins with distinct subcellular locations.** Endogenous proteins tagged with split-wrmScarlet₁₁ in animals expressing split-wrmScarlet_1-10 in somatic tissues, in muscles or in the germline. (A-F) Confocal images of worms expressing somatic split-wrmScarlet_1-10 and (A) EAT-6::split-wrmScarlet₁₁ (plasma membrane), (B) split-wrmScarlet₁₁::TBB-2 (cytoskeleton), (C) split-wrmScarlet₁₁::FIB-1 (nucleoli), (D) HIS-3::split-wrmScarlet₁₁ (nuclei), (E) split-wrmScarlet₁₁::VHA-13 (lysosomes), or (F) TOMM-20::split-wrmScarlet₁₁ (mitochondria). (G) Transgenic worm expressing split-wrmScarlet_1-10 in muscle and split-wrmScarlet₁₁::FIB-1. (H) Transgenic worm expressing split-wrmScarlet_1-10 in the germline and split-wrmScarlet₁₁::FIB-1. (A-H) Maximum intensity projections of 3D stacks shown. Scale bars, 50 µm.

Split-wrmScarlet₁₁-mediated tagging in the germline

Our initial attempt to use split-wrmScarlet in the germline failed. We made a single-copy integrated Psun-1::split-wrmScarlet_1-10::sun-1 3’UTR strain via MosSCI, but when we injected a plasmid encoding mNeonGreen::split-wrmScarlet₁₁, we observed green fluorescence, but no red fluorescence (Figure S7, A and B), suggesting the absence of split-wrmScarlet_1-10 expression. We suspected germline silencing of the germline-expressed split-wrmScarlet_1-10, so we attempted an alternative expression approach which we call “glonads”. The germline-helicase protein GLH-1 is highly expressed and germline-specific [Marnik 2019]. We fused a T2A::split-wrmScarlet_1-10 sequence to the C-terminus of the endogenous glh-1 gene using CRISPR/Cas9. The high expression of GLH-1 yielded high expression of split-wrmScarlet_1-10, and the T2A separated split-wrmScarlet_1-10 from GLH-1 [Liu 2017]. The glh-1::T2A::split-wrmScarlet_1-10 strain (https://cgc.umn.edu/strain/DUP237) can be used like our other tissue-specific strains for germline-specific tagging. To demonstrate this, we tagged the N-terminus of endogenous FIB-1 with split-wrmScarlet₁₁, and we observed red fluorescence localized to the nucleoli specifically in the germline and embryos, as we hoped (Figure 3H; Figure S8, A and C). Finally, we note that the strategy used to express split-wrmScarlet_1-10 or split-sGFP2_1-10 in the germline, by tagging the 3’ end of the endogenous glh-1 with T2A::FP_1-10 with CRISPR/Cas9, could be used to express any other protein of choice.

Split-wrmScarlet₁₁ tandem repeats increase fluorescence

To benchmark the fluorescence intensity of split-wrmScarlet against its full-length counterpart, we first generated wrmScarlet::vha-13 [El Mouridi 2017] transgenic animals and compared their fluorescence to that of split-wrmScarlet₁₁::vha-13 in worms expressing split-wrmScarlet_1-10 somatically (Figure 4, A and B). At the vha-13 locus, split-wrmScarlet was about half as bright as the full-length fluorophore (48%), a ratio comparable to that of split-mNeonGreen2 and its full-length counterpart in human cells [Feng 2017].

Since visualizing endogenous proteins of low abundance can be challenging, it is key to address this limitation. Increasing the number of FP₁₁ domains tagged to an endogenous protein multiplies the number of the corresponding FP_1-10s recruited, increasing the overall fluorescent signal in human cells [Leonetti 2016] and in C. elegans [He 2019, Hefel 2019]. To demonstrate that split-wrmScarlet fluorescence is enhanced by split-wrmScarlet₁₁ tandem repeats, we introduced two split-wrmScarlet₁₁ domains at the N-terminus of VHA-13 and three split-wrmScarlet₁₁ domains at the C-terminus of HIS-3 in animals expressing somatic split-wrmScarlet_1-10. Compared to animals carrying a single split-wrmScarlet₁₁ at the identical locus, carrying two split-wrmScarlet₁₁s increased overall fluorescence by 1.5-fold, while carrying three increased it by 2.3-fold (Figure 4, C and D). Note that our three-split-wrmScarlet₁₁ tandem sequence exceeds the 200 nt ssODN synthesis limit, so we used dsDNA donor templates for these constructions (Supplementary Material, Table S4).

sfGFP₁₁-mediated tagging in somatic cells

Split-sfGFP has been used successfully in worms before [Noma 2017, He 2019, Hefel 2019]. However, there is still a need for a strain that ubiquitously expresses sfGFP_1-10 in the soma from an integrated single-copy insertion in order to avoid heterogeneous expression and time-consuming manual maintenance. To build this strain, we codon-optimized the original sfGFP_1-10 sequence for C. elegans and included one intron [Cabantous 2004, Redeman 2011] (Supplementary Table S1). We initially generated a strain expressing sfGFP_1-10 driven by the let-858 promoter and unc-54 3’UTR using MosSCI. Because we observed that Peft-3 resulted in significantly higher levels of gene expression, we later replaced the let-858 promoter with the eft-3 promoter using CRISPR/Cas9 and a hybrid DNA donor template [Paix 2015, Dokshin 2018] (Supplementary Table S4). To validate this strain, we inserted sfGFP₁₁ at the N-terminus of lysosomal VHA-13 or at the C-terminus of nuclear-localized HIS-3 (Figure 5, A and B). Both strains yielded relatively bright signals in accordance with their predicted subcellular localization. We generated eGFP::VHA-13 transgenic animals and compared their fluorescence to that of sfGFP₁₁::VHA-13 in worms expressing sfGFP_1-10 somatically (Figure S9, A and B). At the vha-13 locus, split-sfGFP was about a third as bright as full-length eGFP. It is worth noting that this comparison is not perfect, in part due to the absence of the six superfolder mutations S30R, Y39N, N105T, Y145F, I171V and A206V in the eGFP fluorophore.

**Figure 5: Split-sfGFP and split-wrmScarlet dual color protein labeling.** Images of animals stably expressing sfGFP_1-10 in somatic tissues (A) CF4592 (*muIs253[Peft-3::sfGFP_1-10::unc-54 3'UTR, Cbr-unc-119(+)] II; unc-119(ed3) III; his-3(muIs255[his-3::sfGFP₁₁]) V*) or (B) CF4589 (*muIs253[Peft-3::sfGFP_1-10::unc-54 3'UTR, Cbr-unc-119(+)] II; unc-119(ed3) III; vha-13(muIs268[sfGFP₁₁::vha-13]) V*). (C) Dual color protein labeling with split-wrmScarlet and split-sfGFP in somatic cells. Composite display of red and green channels of animals expressing split-wrmScarlet_1-10 and sfGFP_1-10 in somatic tissues, HIS-3::sfGFP₁₁ and split-wrmScarlet₁₁::FIB-1; CF4602 (muIs253[Peft-3::sfGFP_1-10::unc-54 3'UTR, Cbr-unc-119(+)], muIs252[Peft-3::split-wrmScarlet_1-10::unc-54 3'UTR, Cbr-unc-119(+)] II; unc-119(ed3) III; fib-1(muIs254[split-wrmScarlet₁₁::fib-1]), his-3(muIs255[his-3::sfGFP₁₁]) V). Maximum intensity projections of 3D stacks shown. Scale bars, 50 µm.

sGFP2₁₁-mediated tagging in the germline

We also generated a germline-specific sGFP2_1-10 strain using a similar strategy (Figure S8B, Supplementary Material, Table S1). Split-sGFP2 is a split-superfolder GFP variant optimized for brightness and photostability [Köker 2018]. To test this germline-specific line DUP223, we tagged the C-terminus of endogenous PGL-1 with sGFP2₁₁ and confirmed the green fluorescence from P-granules (Figure S8B, lower panel).

Dual color protein labeling with split-wrmScarlet and split-sfGFP

Finally, to allow two-color imaging in the soma, we crossed the strains Peft-3::sfGFP_1-10; his-3::sfGFP₁₁ (CF4592) and Peft-3::split-wrmScarlet_1-10; split-wrmScarlet₁₁::fib-1 (CF4601). This cross resulted in the line Peft-3::sfGFP_1-10, Peft-3::split-wrmScarlet_1-10 (CF4588) as well as the dually labeled strain Peft-3::sfGFP_1-10, Peft-3::split-wrmScarlet_1-10; split-wrmScarlet₁₁::fib-1, his-3::sfGFP₁₁ (CF4602, Figure 5C). The fluorescent signals from both split-FPs appeared in their respective subcellular compartments, suggesting that the two systems are compatible. We note an additional advantage of the strain CF4588: the loci of split-wrmScarlet_1-10 and sfGFP_1-10 are genetically linked (only 0.96 cM apart), which facilitates outcrossing when needed. In addition, all our parental C. elegans lines expressing split-wrmScarlet_1-10 and sfGFP_1-10 are viable homozygotes, so the strains do not require special maintenance.
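To give a feel for what 0.96 cM of linkage means in practice, the Python sketch below estimates how often the two transgenes stay together. It assumes that, for such a short interval, the map distance in cM approximates the percentage of recombinant gametes, and that successive meioses are independent; both are simplifications, and the function names are hypothetical.

```python
def recombination_fraction(map_distance_cm):
    """Approximate recombinant fraction for a short interval (1 cM is about 1% recombinants)."""
    return map_distance_cm / 100.0

def prob_still_linked(map_distance_cm, meioses):
    """Probability that two loci in cis are never separated over the given number of meioses."""
    r = recombination_fraction(map_distance_cm)
    return (1.0 - r) ** meioses

DISTANCE_CM = 0.96  # distance between the two FP_1-10 insertions quoted in the text

for n in (1, 3, 6):
    p = prob_still_linked(DISTANCE_CM, n)
    print(f"{n} meioses: P(both transgenes stay together) = {p:.4f}")
```

Under these assumptions, more than 99% of gametes from a heterozygote carry both insertions or neither, so following one fluorescent marker during an outcross is usually enough to follow both.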

The current split-wrmScarlet is not detectable in mammalian cells

We failed to detect split-wrmScarlet in mammalian cells, despite our efforts to rescue its fluorescence by screening a mammalian-codon-optimized split-wrmScarlet₁₁ single/double mutant library in HEK293T cells (Figure S10, A and B, and supplementary text).

Discussion

In this study, we describe several new tools for rapid CRISPR-mediated labeling of endogenously-expressed proteins using split fluorophores. While these tools are powerful and relatively easy to implement, several considerations should be taken into account when using this method. First, as with all existing split-FP systems, detection of a given protein labeled with an FP₁₁ can only occur in a cellular compartment where the corresponding FP_1-10 is present. Proteins tagged with split-wrmScarlet₁₁ or sfGFP₁₁ generated in this work were either exposed to the cytosol or nucleoplasm (nuclei or nucleoli), where split-wrmScarlet_1-10 and/or sfGFP_1-10 were present. For proteins or epitopes located within the lumen of organelles, such as mitochondria or the endoplasmic reticulum, one might need to generate and validate C. elegans lines expressing split-wrmScarlet_1-10 or sfGFP_1-10 containing a mitochondrial localization sequence or ER signal peptide and retention signals, respectively. These approaches have been used successfully in mammalian cells with split-sfGFP when tagging ER-resident polypeptides [Kamiyama 2016] and with split-sfCherry2 to detect proteins present in the mitochondrial matrix [Ramadani-Muja 2019].

Second, when labeling proteins with split-wrmScarlet at the C-terminus, we recommend using the 24 a.a. split-wrmScarlet₁₁ sequence YTVVEQYEKSVARHCTGGMDELYK. As described in the Results section, our 18 a.a. split-wrmScarlet₁₁ fragment ends in gly-gly, which has been shown to be a degradation signal in mammalian cells. We cannot exclude the possibility that ending in gly-gly can be detrimental in C. elegans. The 24 a.a. split-wrmScarlet₁₁ still fits within a 200 nt ssODN donor template with a 12 nt linker and up to 58 nt homology arms and is at least as bright as the 18 a.a. split-wrmScarlet₁₁ (Figure S6).

Third, as for any other protein tag, it is important to select, when possible, a site that is unlikely to interfere with protein folding, function or localization [Snapp 2005, Nance 2019]. For example, N-termini of membrane- and organelle-resident proteins often contain signal peptides or localization signals, and C-termini may contain degron sequences that regulate protein turnover. Interestingly, there are examples of proteins that become toxic when tagged with a full-length GFP, but tolerate labeling with a split fluorescent protein. For example, SYP-4 was reported to be mostly functional when endogenously tagged with sfGFP₁₁ in a strain expressing sfGFP_1-10 specifically in the germline, but not functional when labeled with full-length GFP [Hefel 2019].

Fourth, for proteins of interest present at low levels, we provided an alternative protocol to insert an additional two or three split-wrmScarlet₁₁ fragments, which increases the overall fluorescence substantially. However, the number of split-wrmScarlet₁₁ fragments could likely be increased further, to at least seven tandem repeats, based on approaches used successfully with split-sfGFP in human cells [Feng 2017] and C. elegans [Noma 2017, He 2019, Hefel 2019].

Fifth, we would like to emphasize differences between our technique and the bimolecular fluorescence-complementation (BiFC) assay. When used together, high-affinity green and red split fluorescent proteins can provide information on co-localization, but unlike BiFC split proteins [Hu 2002], they are not intended to assess protein-protein interactions directly. This is because BiFC split proteins require finely tuned weak affinities that do not disrupt the underlying interaction being studied. In our approach, only the split-wrmScarlet₁₁ fragment is attached to a protein of interest; the split-wrmScarlet_1-10 fragment is expressed in excess and unattached.

Finally, we would like to note that despite being three times brighter than the latest split-sfCherry3 in worms, our current split-wrmScarlet was not visible in the mammalian cell line we examined (Figure S10). Its ability to fluoresce is not restricted to worms, because it can reach wild-type levels of brightness in yeast. We do not know the basis for this discrepancy. It is possible that the concentration of the split-wrmScarlet_1-10 fragment in mammalian cells is too low to drive complementation with split-wrmScarlet₁₁. This could potentially be overcome by further mutagenizing split-wrmScarlet and screening for fluorescence at low expression levels in mammalian cells.

We believe our system can substantially increase the speed, efficiency, and ease of in vivo microscopy studies in C. elegans. We expect it to facilitate two-color and co-localization experiments and to find wide use in the worm community. We believe that these strains could facilitate novel or large-scale experiments, such as efforts to tag the entire genome of C. elegans.

Materials and Methods

Mutagenesis and screening. For the initial screenings in E. coli, we introduced a 32 amino-acid spacer between the 10th and 11th β-strands of full-length mScarlet in a pRSET vector [Feng 2017]. This starting construct was nonfluorescent, but we restored low fluorescence levels by introducing the superfolder mutation G220A. Semi-random mutagenesis was carried out using rolling-circle amplification with NNK primers at positions I8, K10, F15, G32, Q43, A45, K46, L47, G52, G53, D60, S63, P64, Q65, F66, S70, R71, T74, K75, D79, Y84, W94, R96, T107, V108, Q110, E115, L125, R126, T128, K139, K140, W144, E145, S147, T148, E149, R150, I162, K163, M164, L175, F178, K179, K183, K185, K186, N195, R198, I202, T203, S204, D208, Y209, T210, V211, V212, E213, Q214, Y215, E216, R217, S218, E219, A220, H222, S223, T224, G225, G226, M227, D228, and E229 with Phusion polymerase (NEB) in GC buffer, followed by pooling of the PCR products, DpnI digestion and transformation into BL21(DE3) E. coli. These positions covered areas deemed important for brightness or stability, and the interface between FP₁₁ and FP_1-10. Primers were resynthesized if a mutation interfered with neighboring mutagenic primer binding. The brightest three to five colonies were identified using a Leica M165 FC fluorescent stereomicroscope, and their plasmid DNA was subjected to a new mutagenesis round. After five rounds, we separated the two fragments of a version of split-wrmScarlet (which had fluorescence comparable to mScarlet) into two S. cerevisiae plasmids to test for complementation. Because we did not detect fluorescence, we continued selection using two plasmids in yeast. For screening on two plasmids, a pRSET vector expressing split-wrmScarlet_1-10 and a pD881-MR vector (ATUM) expressing mTagBFP-split-wrmScarlet₁₁ (without the MDELYK tail from the C-terminus) were used to perform the semi-random mutagenesis. The libraries were co-electroporated into E. coli and expression was induced with 1% rhamnose and 1 mM IPTG. The library was enriched for fluorescent clones using FACS, and then subcloned to make pRS-GPD-split-wrmScarlet_1-10 and p416-TEF-membrane-mTagBFP-split-wrmScarlet₁₁. The yeast plasmids were co-transformed into a URA⁻, HIS⁻, LEU⁻, MET⁻ S. cerevisiae strain and selected for in SC media without uracil and histidine, and FACS was used again for enrichment of clones with the highest red to blue ratio. After three rounds of semi-random mutagenesis with the two-plasmid strategy, a final round of random mutagenesis was performed using the GeneMorph II kit (Agilent). Yeast plasmids are available through Addgene (https://www.addgene.org/158585/, https://www.addgene.org/158584/), and E. coli plasmid sequences are present in Supplementary Material, Table S5.
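As a brief illustration of why NNK primers suit semi-random mutagenesis, the following sketch enumerates the codons covered by the degenerate pattern NNK (N = A, C, G or T; K = G or T) and translates them with the standard genetic code. It is not part of the original screening pipeline; the function and variable names are chosen here for illustration only.

```python
from itertools import product

# Standard genetic code with codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def nnk_codons():
    """Return every codon matching NNK (N = A/C/G/T, K = G/T)."""
    return ["".join(c) for c in product("ACGT", "ACGT", "GT")]


codons = nnk_codons()
# Amino acids reachable at a single NNK position ("*" marks a stop codon).
amino_acids = sorted({CODON_TABLE[c] for c in codons} - {"*"})
stop_codons = [c for c in codons if CODON_TABLE[c] == "*"]

print(f"{len(codons)} codons, {len(amino_acids)} amino acids, stops: {stop_codons}")
```

Restricting the third base to G or T keeps every amino acid accessible at the targeted position while reducing the number of stop codons relative to a fully random NNN codon.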

C. elegans strains and maintenance. Animals were cultured under standard growth conditions with E. coli OP50 at 20°C [Brenner 1974]. Strains generated in this work are listed in the Supplementary Material, Table S3.

Nucleic acid reagents. Synthetic nucleic acids were purchased from Integrated DNA Technologies (IDT), GenScript or Genewiz. For knock-in of a single split-wrmScarlet₁₁ or sfGFP₁₁ sequence, 200-mer HDR templates were ordered in ssODN form (synthetic single-stranded oligodeoxynucleotide donors) from IDT. For knock-in of split-wrmScarlet₁₁ repeats, HDR templates were ordered in dsDNA form (plasmids) from GenScript or Genewiz. For plasmids injected as extrachromosomal arrays, sequences were synthesized and cloned into the pUC57 vector (Genewiz). The complete set of crRNAs and DNA sequences used for the experiments described here can be found in Supplementary Material, Tables S1, S4 and S5.
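The layout of a single-insert ssODN donor can be sketched as an insert flanked by two homology arms that together fill the 200-nt synthesis length. The snippet below is a minimal illustration under stated assumptions: the arms are split evenly, and the genomic sequence and insert are arbitrary placeholders rather than real C. elegans or split-wrmScarlet₁₁ sequences. The function name `build_ssodn` is hypothetical, and the sketch ignores further design considerations such as guide placement.

```python
def build_ssodn(genome: str, cut_index: int, insert: str, total_length: int = 200) -> str:
    """Flank `insert` with homology arms taken from either side of `cut_index`."""
    arm_total = total_length - len(insert)
    if arm_total < 2:
        raise ValueError("insert leaves no room for homology arms")
    left_len = arm_total // 2
    right_len = arm_total - left_len
    if cut_index - left_len < 0 or cut_index + right_len > len(genome):
        raise ValueError("genomic context too short for the requested arms")
    left_arm = genome[cut_index - left_len:cut_index]
    right_arm = genome[cut_index:cut_index + right_len]
    return left_arm + insert + right_arm


# Placeholder sequences for illustration only (not real genomic or tag sequences).
placeholder_genome = "ACGT" * 100   # 400 nt of dummy context
placeholder_insert = "G" * 60       # dummy 60-nt insert
donor = build_ssodn(placeholder_genome, cut_index=200, insert=placeholder_insert)
print(len(donor))
```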

Strain generation: CRISPR/Cas9-triggered homologous recombination. CRISPR insertions were performed using published protocols [Paix 2015, Paix 2016]. Ribonucleoprotein complexes (Cas9 protein, tracrRNA, crRNA) and DNA templates were microinjected into the gonad of young adults using standard methods [Evans 2016]. Injected worms were singled and placed at 25°C overnight. All crRNA and DNA template sequences used to generate the strains described in this work are listed in the Supplementary Material, Table S4. Split-wrmScarlet₁₁ and sfGFP₁₁ integrants were identified by screening for fluorescence in the F1 or F2 progeny of injected worms. The co-CRISPR dpy-10(cn64) mutation was used as a marker when generating nonfluorescent strains. The CF4582 strain muIs252[Peft-3::split-wrmScarlet_1-10::unc-54 3'UTR Cbr-unc-119(+)] II; unc-119(ed3) III was generated by replacing the tir-1::mRuby sequence from the strain CA1200 ieSi57 II; unc-119(ed3) III [Zhang 2015] with the split-wrmScarlet_1-10 sequence. The CF4587 strain muIs253[Peft-3::sfGFP_1-10::unc-54 3'UTR Cbr-unc-119(+)] II; unc-119(ed3) III was generated by replacing the let-858 promoter from the strain COP1795 knuSi785 [pNU1687(Plet-858::sfGFP_1-10::unc-54 3’UTR unc-119(+))] II; unc-119(ed3) III with the eft-3 (also known as eef-1A.1) promoter. Both CF4582 and CF4587 strains were generated using long, partially single-stranded DNA donors [Dokshin 2018]. The CF4610 strain muIs257[Pmyo-3::split-wrmScarlet_1-10::unc-54 3'UTR] I was generated by inserting the split-wrmScarlet_1-10 sequence in the WBM1126 strain following the SKI LODGE protocol [Silva-García 2019]. The strains PHX731 vha-13(syb731[wrmScarlet::vha-13]) V and PHX1049 vha-13(syb1049[gfp::vha-13]) V were generated by SunyBiotech's CRISPR services. Strains generated were genotyped by Sanger sequencing of purified PCR products (Genewiz).

Strain generation: Mos1-mediated single-copy insertion. The COP1795 strain was generated by NemaMetrix's MosSCI services. The PHX1797 strain was generated by SunyBiotech's MosSCI services, using a codon-optimized sequence of split-wrmScarlet_1-10 with three introns, engineered to avoid piRNA recognition and transgene silencing [Wu 2018, Zhang 2018] (Supplementary Material, Table S1).

Strain generation: genetic crosses. The following C. elegans strains were created by standard genetic crosses: CF4588 muIs253[Peft-3::sfGFP_1-10::unc-54 3'UTR Cbr-unc-119(+)] muIs252[Peft-3::split-wrmScarlet_1-10::unc-54 3'UTR Cbr-unc-119(+)] II; unc-119(ed3) III and CF4602 muIs253[Peft-3::sfGFP_1-10::unc-54 3'UTR Cbr-unc-119(+)] muIs252[Peft-3::split-wrmScarlet_1-10::unc-54 3'UTR Cbr-unc-119(+)] II; unc-119(ed3) III; fib-1(muIs254[split-wrmScarlet₁₁::fib-1]) his-3(muIs255[his-3::sfGFP₁₁]) V. Nonfluorescent parental lines CF4582, CF4587 and CF4610 generated using dpy-10(cn64) co-CRISPR were backcrossed at least once.

Strain generation: plasmid microinjection. Peft-3::3NLS::mTagBFP2::split-wrmScarlet₁₁::T2A::mNeonGreen::split-wrmScarlet_1-10::fib-1 3’UTR, Peft-3::3NLS::mTagBFP2::sfCherry3₁₁::T2A::mNeonGreen::sfCherry3_1-10::fib-1 3’UTR, Pmyo-3::mTagBFP2::split-wrmScarlet₁₁::T2A::mNeonGreen::split-wrmScarlet_1-10::fib-1 3’UTR, or Pmyo-3::mTagBFP2::sfCherry3₁₁::T2A::mNeonGreen::sfCherry3_1-10::fib-1 3’UTR constructs were microinjected at 20 ng/μL using a standard microinjection procedure [Mello 1991]. Germline gene expression was achieved using a microinjection-based protocol with diluted transgenic DNA [Kelly 1997]: the Psun-1::mNeonGreen::linker::split-wrmScarlet₁₁::tbb-2 3’UTR construct (5 ng/µL) was co-injected with PvuII-digested genomic DNA fragments from E. coli (100 ng/µL). Plasmid sequences are listed in Supplementary Material, Table S5.

Germline strain generation: glh-1::T2A::split-wrmScarlet_1-10 and glh-1::T2A::sGFP2_1-10. Using CRISPR/Cas9, the C-terminus of glh-1 was tagged with either T2A::split-wrmScarlet_1-11 or T2A::sGFP2_1-11, a split superfolder GFP variant optimized for brightness and photostability [Köker 2018]. Fluorescence originating from these full-length fusions was present throughout the cytoplasm and nuclei of adult germ cells and gametes, with the maternally deposited signal persisting through the early stages of embryogenesis and larval development (Figure S8A and S8B, top panels). After verifying fluorescence, we used a precise CRISPR/Cas9 deletion of either split-wrmScarlet₁₁ or sGFP2₁₁ to convert these FP_1-11 strains into FP_1-10 strains, DUP237 glh-1(sam140[glh-1::T2A::split-wrmScarlet_1-10]) I and DUP223 glh-1(sam129[glh-1::T2A::sGFP2_1-10]) I, and corroborated the absence of fluorescence (Figure S8A and S8B, middle panels). The crRNAs, ssDNAs and dsDNA template sequences are described in Supplementary Material, Tables S1-S4.

Microscopy. Confocal fluorescence imaging was performed using the NIS Elements imaging software on a Nikon confocal spinning disk system equipped with an Andor EMCCD camera, a CSU-X1 confocal scanner (Yokogawa), 405, 488, and 561 nm solid-state lasers, and 455/50, 525/26 and 605/70 nm emission filters. Transgenic animals expressing sfGFP₁₁ or split-wrmScarlet₁₁ were screened using a Leica M165 FC fluorescent stereomicroscope equipped with a Sola SE-V with GFP and mCherry filters.

Image analysis. Images were analyzed using Fiji. Image manipulations consisted of maximum intensity projections along the axial dimension, rolling ball radius background subtraction, smoothing, and LUT minimum and maximum adjustments. Masks were created by thresholding and setting the pixels under the threshold cutoff to NaN. Plotting of values per pixel was carried out in Python 3, using NumPy and Matplotlib. When performing normalizations for split-sfCherry3 versus split-wrmScarlet, the red channel was divided by the green channel (mNeonGreen::FP_1-10) because the localization of both fragments is expected to be the same (cytosolic). For normalization of signals where mTagBFP::FP₁₁ is targeted to the membrane, the blue channel was used instead of the green channel.
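The masking and channel normalization described above could look roughly like the following NumPy sketch. It assumes the threshold is applied to the reference channel (green or blue) and uses a tiny made-up array in place of real image data; the function name `masked_ratio` and the threshold value are illustrative and not taken from the original analysis code.

```python
import numpy as np


def masked_ratio(red, reference, threshold):
    """Pixel-wise red/reference ratio; pixels below `threshold` in the reference become NaN."""
    red = np.asarray(red, dtype=float)
    reference = np.asarray(reference, dtype=float)
    # Mask: keep reference pixels at or above the cutoff, set the rest to NaN.
    masked_reference = np.where(reference >= threshold, reference, np.nan)
    return red / masked_reference


# Toy 2x2 "images" (arbitrary numbers, for illustration only).
red_channel = [[10.0, 40.0], [5.0, 90.0]]
green_channel = [[2.0, 20.0], [1.0, 30.0]]

ratio = masked_ratio(red_channel, green_channel, threshold=10.0)
# NaN-aware statistics ignore the masked pixels.
mean_ratio = np.nanmean(ratio)
print(ratio)
print(mean_ratio)
```

Using NaN for masked pixels means that NaN-aware reductions such as `np.nanmean` skip the background automatically, so no separate boolean mask needs to be carried alongside the ratio image.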

Mounting worms for microscopy. Pads made of 3% agarose (GeneMate) were dried briefly on Kimwipes (Kimtech) and transferred to microscope slides. Around 10 μL of 2 mM levamisole (Sigma) was pipetted onto the center of the agarose pad. Animals were transferred to the levamisole drop, and a coverslip was placed on top before imaging.

Brood size analysis. Eight single synchronized adults grown at 20°C were transferred to fresh plates every 24 hours until cessation of reproduction, and the number of viable progeny produced by each worm was scored.

Developmental toxicity assay. Ten N2E wild-type animals were microinjected with either the Peft-3::3NLS::mTagBFP2::split-wrmScarlet₁₁::T2A::mNeonGreen::split-wrmScarlet_1-10::fib-1 3’UTR or the Peft-3::3NLS::mTagBFP2::sfCherry3₁₁::T2A::mNeonGreen::sfCherry3_1-10::fib-1 3’UTR construct at 20 ng/μL and were singled. mNeonGreen-positive F1 animals were scored and their development was monitored for up to five days from egg-laying. The numbers of fluorescent dead eggs, arrested larvae (i.e. animals never reaching adulthood) and adults were scored for each group.

Comparison of split-sfCherry3 to split-wrmScarlet in muscle. Ten N2E wild-type animals were microinjected with either the Pmyo-3::mTagBFP2::split-wrmScarlet₁₁::T2A::mNeonGreen::split-wrmScarlet_1-10::fib-1 3’UTR or the Pmyo-3::mTagBFP2::sfCherry3₁₁::T2A::mNeonGreen::sfCherry3_1-10::fib-1 3’UTR construct at 20 ng/μL. F1 animals expressing mNeonGreen in muscle were selected for comparison.

Lifespan assays. NGM plates were supplemented with 5-Fluorouracil (5-FU, Sigma, 15 μM) [Goudeau 2011] to prevent progeny from hatching and with kanamycin sulfate (Sigma, 25 μg/mL) to prevent bacterial contamination. Animals fed with kanamycin-resistant OP50 were scored manually as dead or alive, starting from the L4 larval stage, defined as day 0. A worm was considered alive if it moved spontaneously or, in cases where it was not moving, if it responded to a light touch stimulus with a platinum wire. Animals that crawled off the plates, had eggs that accumulated internally, burrowed or ruptured were censored and included in the analysis until the time of censoring.
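To make the handling of censored animals concrete, the following sketch computes a Kaplan-Meier survival estimate in plain Python. It is only an illustration of the estimator: the analysis in this study was performed in R, and the lifespans below are made-up numbers, not experimental data. Censored animals count as at risk up to and including their censoring day, after which they leave the risk set without contributing a death.

```python
def kaplan_meier(times, observed):
    """Return [(day, survival)] for each day with at least one observed death.

    times:    day of death or censoring for each animal
    observed: True if the death was observed, False if the animal was censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for day in sorted(set(times)):
        deaths = sum(1 for t, seen in zip(times, observed) if t == day and seen)
        leaving = sum(1 for t in times if t == day)  # deaths plus censored animals
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((day, survival))
        at_risk -= leaving
    return curve


# Hypothetical example: five animals, one censored on day 12.
example_days = [10, 12, 12, 15, 18]
example_observed = [True, False, True, True, True]
km_curve = kaplan_meier(example_days, example_observed)
for day, fraction in km_curve:
    print(day, round(fraction, 3))
```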

Structure prediction and rendering of split-wrmScarlet. Phyre2 was used in intensive mode with default parameters to predict the three-dimensional structure [Kelley 2015]. The resulting 3D model was visualized using PyMOL (v2.2.0).

Statistical analysis. Differences in fluorescence intensity between groups were compared using an unpaired t-test with Welch’s correction. Data are presented as means ± SD. Kaplan-Meier estimates of survival curves were calculated using the survival (v2.38–3) and rms (v4.5–0) R packages, and differences were tested using the log-rank test. The number of animals used in each experiment is indicated in the figure legends.
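For readers unfamiliar with Welch's correction, the sketch below computes the Welch t statistic and the Welch-Satterthwaite degrees of freedom using only the Python standard library. The input values are arbitrary illustrative numbers, not measurements from this study, and the software used for the original t-tests is not specified here. Obtaining a p-value additionally requires the t distribution, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom) for two samples."""
    # Squared standard errors of each mean, using the sample (n - 1) variance.
    se2_a = variance(a) / len(a)
    se2_b = variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


# Arbitrary illustrative intensities for two groups with unequal variances.
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [2.0, 4.0, 6.0, 8.0, 10.0]
t_stat, dof = welch_t(group_a, group_b)
print(round(t_stat, 4), round(dof, 4))
```

Unlike Student's t-test, this form does not pool the variances, which is why the degrees of freedom are generally non-integer and smaller than the pooled value.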

Data availability. Strains expressing a single copy of split-wrmScarlet_1-10 and/or sfGFP_1-10 (CF4582, CF4587, CF4588, CF4610, DUP223 and DUP237) are available via the Caenorhabditis Genetics Center (CGC). The vectors pJG100, carrying Peft-3::split-wrmScarlet_1-10::unc-54 3’UTR, and pJG103, carrying split-wrmScarlet₁₁ x3 tandem repeats, and the yeast plasmids p416-TEF-membrane localization signal-mTagBFP-split-wrmScarlet₁₁-TEF terminator and pRS423-GPD-split-wrmScarlet_1-10-CYC1 terminator are deposited, along with sequences and maps, at Addgene. Other strains and plasmids are available upon request. The authors state that all data necessary for confirming the conclusions presented here are represented fully within the article. A detailed protocol to generate C. elegans with sfGFP₁₁ and/or split-wrmScarlet₁₁ integrants is available at doi.org/10.17504/protocols.io.bamkic4w. Supplementary material is also available at Figshare.

Author contributions

M.I. developed split-wrmScarlet in the A.G.Y. laboratory. J.P. performed the cell sorting. J.G. performed C. elegans experiments in the C.K. laboratory. C.S. generated the two C. elegans germline strains in the D.U. laboratory. L.S. conducted the mammalian cell experiments in the M.D.L. laboratory. J.G. wrote the initial draft. All authors provided intellectual contributions to the collaboration.

Acknowledgements

We thank Katie Podshivalova, Rex Kerr, Calvin Jan and David Botstein for comments on the manuscript, Peichuan Zhang for experimental suggestions, Vikram Narayan for biochemistry advice, and members of the Kenyon lab for fruitful discussions. We would like to thank Abby Dernburg for bringing to our attention that a gly-gly C-terminus might be a degron. We also thank Behnom Farboud from Barbara Meyer's laboratory for discussing CRISPR/Cas9 protocols and Liangyu Zhang from Abby Dernburg's laboratory for sharing sequences of the CA1200 strain. Some strains were provided by the CGC, which is funded by NIH Office of Research Infrastructure Programs (P40 OD010440). A.G.Y. and C.K. are supported by Calico Life Sciences L.L.C., M.D.L. by the Chan Zuckerberg Biohub, and D.L.U. by NIH-NIGMS (R01 GM113933) with use of equipment supported by NIH-NIGMS (P20 GM103423).

Appendix

Additional details and discussion can be found in the appendix, which is also referenced via hyperlinks throughout this article.
