Case Study
EupDB provides data resources and analysis tools for mining Euphorbiaceae-related genes. Here, we present one example to illustrate how EupDB can be used to mine genes associated with economically important traits. The example starts from a castor bean GWAS locus controlling single seed weight (SSW):
Step 1: Starting from the SSW GWAS locus (Chr2:30626631), we used “GWAS & QTL Search” to retrieve the 13 candidate genes located within 50 kb of this locus (Fig. 1).
Fig. 1: GWAS locus of single seed weight in castor bean
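The window search in Step 1 can be sketched outside the web interface as a simple interval-overlap query. This is only an illustration, not EupDB's implementation: the `Gene` record, the `genes_near_locus` helper, and all gene names and coordinates below are hypothetical, and we assume "within 50 kb" means 50 kb on each side of the SNP.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    gene_id: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def genes_near_locus(genes, chrom, pos, flank=50_000):
    """Return genes on `chrom` that overlap [pos - flank, pos + flank]."""
    lo, hi = max(1, pos - flank), pos + flank
    return [g for g in genes
            if g.chrom == chrom and g.start <= hi and g.end >= lo]


# Hypothetical annotation records for illustration only (not EupDB data).
genes = [
    Gene("geneA", "Chr2", 30_600_000, 30_603_500),  # inside the window
    Gene("geneB", "Chr2", 30_675_000, 30_680_000),  # overlaps the right edge
    Gene("geneC", "Chr2", 30_700_000, 30_702_000),  # beyond the window
    Gene("geneD", "Chr1", 30_620_000, 30_630_000),  # different chromosome
]

# The locus position is the one given in Step 1.
hits = genes_near_locus(genes, "Chr2", 30_626_631)
print([g.gene_id for g in hits])
```

Genes that only partly overlap the window are kept here; whether the actual tool counts partial overlaps is not stated in this case study.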
Step 2: Using the “Heatmap” tool, we profiled the expression patterns of the 13 candidate genes across different tissues and stages of developing castor seeds. Nine of them were expressed (FPKM > 1) in developing seeds and embryos, suggesting that they may be involved in seed development (Fig. 2).
Fig. 2: Expression patterns of candidate genes
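The expression filter in Step 2 amounts to a threshold on FPKM values. The sketch below is hedged: the `expressed_genes` helper, the sample names, and the FPKM values are placeholders, and it assumes a gene counts as expressed when it exceeds the threshold in at least one seed or embryo sample, which the text does not specify.

```python
def expressed_genes(fpkm_by_gene, samples, threshold=1.0):
    """Keep genes whose FPKM exceeds `threshold` in at least one sample."""
    return [
        gene
        for gene, values in fpkm_by_gene.items()
        if any(values.get(sample, 0.0) > threshold for sample in samples)
    ]


# Placeholder expression values for illustration only (not EupDB data).
fpkm = {
    "geneA": {"seed_early": 12.4, "embryo": 8.1, "leaf": 0.3},
    "geneB": {"seed_early": 0.2, "embryo": 0.0, "leaf": 25.0},
    "geneC": {"seed_early": 0.9, "embryo": 1.5, "leaf": 0.0},
}

seed_samples = ["seed_early", "embryo"]  # hypothetical sample names
kept = expressed_genes(fpkm, seed_samples)
print(kept)
```

Replacing `any` with `all` would give the stricter reading, in which a gene must pass the threshold in every seed and embryo sample.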
Step 3: To learn more about these 9 genes, we used the “Orthologous Groups” analysis to retrieve their Arabidopsis homologs. We found that the candidate gene Rc02G003981 is homologous to Arabidopsis ARPN (AT2G02850) (Fig. 3), which, as a target gene of miR408, has been shown to play an important role in regulating biomass and seed yield in Arabidopsis and rice (Zhang et al., 2017; Song et al., 2018).
Fig. 3: Searching for homologous genes with “Orthologous Groups”
Step 4: We used “miRNA Target Prediction” to identify the target genes of miR408 in the castor bean genome. The candidate gene Rc02G003981 was the best hit of Rco-MIR408 (Fig. 4), suggesting that the regulation of seed mass and yield via miR408 and ARPN is evolutionarily conserved.
Fig. 4: Using “miRNA Target Prediction” to identify the target genes of miR408 in the castor bean genome
Step 5: By viewing the “Basic information” of Rc02G003981, we found that the gene is annotated with the GO term post-embryonic development (GO:0009791) and the Pfam term Cu_bind_like (Fig. 5), further implying that this gene is closely related to seed growth and development.
Fig. 5: “Basic information” of Rc02G003981
Step 6: Other important information. We further used the “SNP” tool in the “Variation” module to examine the SNPs in the genic region of Rc02G003981 and their genotypes across the population (Fig. 6). This provides valuable information for identifying seed traits, genotyping castor germplasm, screening important germplasm, and supporting further breeding and genetic improvement.
Fig. 6: Genetic variation and population information of Rc02G003981
In addition, we used the “DNA Methylation” tool to examine the DNA methylation level of Rc02G003981 in root, leaf, endosperm and embryo of castor bean (Fig. 7). We found that the gene body of Rc02G003981 is methylated, suggesting that the gene is constitutively expressed at a relatively high level, which is consistent with its expression pattern.
Fig. 7: The DNA methylation level of gene Rc02G003981 in root, leaf, endosperm and embryo of castor bean
Step 7: Compared with castor bean, collecting GWAS data for important perennial woody Euphorbiaceae such as rubber tree and tung oil tree is considerably more challenging. The gene we identified in castor bean therefore provides clues for exploring genes associated with seed traits in these plants.
First, we identified 137 sequences homologous to Rc02G003981 in other Euphorbiaceae species using the “BLAST” tool (BLASTP, E-value < 1e-10) (Fig. 8). The sequences most similar to Rc02G003981 are Manes.18G085600 and GH714_000666, both of which can also be identified with the more stringent “Orthologous Groups” tool.
Fig. 8: Using “BLAST” to identify homologous sequences of Rc02G003981
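For readers who run BLASTP locally instead of through the web tool, the E-value cutoff above can be applied to tabular results as in the sketch below. It assumes the standard 12-column BLAST+ tabular format (`-outfmt 6`); whether EupDB exports results in this format is not stated here. The `filter_hits` helper and the example rows are hypothetical placeholders.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
BLAST6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def filter_hits(handle, max_evalue=1e-10):
    """Return hits with E-value below `max_evalue`, best bit score first."""
    reader = csv.DictReader(handle, fieldnames=BLAST6_COLUMNS, delimiter="\t")
    hits = [row for row in reader if float(row["evalue"]) < max_evalue]
    hits.sort(key=lambda row: float(row["bitscore"]), reverse=True)
    return hits


# Placeholder rows for illustration only (not real BLAST results).
example = (
    "query1\tsubjectA\t85.0\t120\t18\t0\t1\t120\t1\t120\t3e-60\t190\n"
    "query1\tsubjectB\t40.0\t100\t60\t2\t5\t104\t10\t109\t2e-05\t45.1\n"
    "query1\tsubjectC\t70.0\t118\t35\t1\t2\t119\t3\t120\t1e-45\t150\n"
)

hits = filter_hits(io.StringIO(example))
print([hit["sseqid"] for hit in hits])
```

With a real results file, `open("results.tsv")` would take the place of the `io.StringIO` wrapper; the file name is an assumption.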
Then, we used the “miRNA Target Prediction” tool to identify the target genes of miR408 in Manihot esculenta, Hevea brasiliensis and Jatropha curcas (expectation value < 3). We found 6 candidate genes: Manes.18G085600, Manes.18G014300, GH714_004551, GH714_018482, GH714_018382, and jc014854 (Fig. 9), all of which are also included in the BLAST search results. In addition, we used the external tool MEGA to align the sequences of these genes (Fig. 10). The alignment indicates that the miR408 binding sites are similar and that the sequences of these target genes are conserved.
Fig. 9: Using “miRNA Target Prediction” to identify the target genes of miR408
Fig. 10: Multiple sequence alignment by MEGA
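The cross-check described above, keeping predicted miR408 targets that pass the expectation cutoff and also appear among the BLAST homologs, reduces to a filter followed by a set intersection. The sketch below is illustrative only: `conserved_targets`, the target names, and the scores are hypothetical and do not reproduce the results in Fig. 9.

```python
def conserved_targets(predictions, blast_hit_ids, max_expectation=3.0):
    """Return predicted targets below the cutoff that are also BLAST hits.

    `predictions` is an iterable of (target_id, expectation) pairs, where
    a lower expectation means a better predicted miRNA-target match.
    """
    passing = {target for target, score in predictions
               if score < max_expectation}
    return sorted(passing & set(blast_hit_ids))


# Placeholder predictions and BLAST hit IDs (not real results).
predictions = [
    ("targetA", 1.5),
    ("targetB", 2.5),
    ("targetC", 4.0),  # fails the expectation cutoff
    ("targetD", 2.0),  # passes the cutoff but is not a BLAST hit
]
blast_hit_ids = ["targetA", "targetB", "targetC"]

shared = conserved_targets(predictions, blast_hit_ids)
print(shared)
```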
Furthermore, these genes can be compared at both the gene and genome levels. Taking Manes.18G085600 and Rc02G003981 as an example, we found that their exon-intron structures are similar (Fig. 11). In addition, a whole-genome “Syntenic Gene” search showed that Manes.18G085600 and Rc02G003981 are located in syntenic blocks, which further supports the conservation of their sequences (Fig. 12).
Fig. 11: Gene Structure of Manes.18G085600 and Rc02G003981
Fig. 12: “Syntenic Gene” search in whole genome
In summary, EupDB not only supports functional genomics research on a particular species in the Euphorbiaceae family, but also provides a variety of comparative tools for multiple species within the family, so we can more easily integrate and analyze information among multiple species. We believe that these modules and tools in EupDB will greatly facilitate comparative and functional genomics research on Euphorbiaceae plants, and provide support for the genetic improvement of important economic traits such as starch, rubber and seed oil quality and yield.
References:
Song, Z., Zhang, L., Wang, Y., Li, H., Li, S., Zhao, H., & Zhang, H. (2018). Constitutive expression of miR408 improves biomass and seed yield in Arabidopsis. Frontiers in Plant Science, 8, 2114.
Zhang, J. P., Yu, Y., Feng, Y. Z., Zhou, Y. F., Zhang, F., Yang, Y. W., … & Chen, Y. Q. (2017). MiR408 regulates grain yield and photosynthesis via a phytocyanin protein. Plant Physiology, 175(3), 1175-1185.