Identification of SNPs in rice GPAT genes and in silico analysis of their functional impact on GPAT proteins

  • Imran SAFDER State Key Laboratory of Rice Biology and China National Center for Rice Improvement, China National Rice Research Institute, Hangzhou 310006 (CN)
  • Gaoneng SHAO State Key Laboratory of Rice Biology and China National Center for Rice Improvement, China National Rice Research Institute, Hangzhou 310006 (CN)
  • Zhonghua SHENG State Key Laboratory of Rice Biology and China National Center for Rice Improvement, China National Rice Research Institute, Hangzhou 310006 (CN)
  • Peisong HU State Key Laboratory of Rice Biology and China National Center for Rice Improvement, China National Rice Research Institute, Hangzhou 310006 (CN)
  • Shaoqing TANG State Key Laboratory of Rice Biology and China National Center for Rice Improvement, China National Rice Research Institute, Hangzhou 310006 (CN)
Keywords: 3000 Rice Genome project, functional SNPs, in silico analysis, nucleotide variation


Single nucleotide polymorphisms (SNPs) are the most common nucleotide variations in the genome. Functional SNPs in the coding region, known as nonsynonymous SNPs (nsSNPs), change amino acid residues and can affect protein function. Identifying functional SNPs is challenging because it is difficult to correlate nucleotide variation with phenotype in association studies. Computational in silico analysis provides an opportunity to understand the functional impact of SNPs on proteins and facilitates experimental approaches to understanding the relationship between genotype and phenotype. Advances in sequencing technologies have made it possible to sequence thousands of genomes. As a result, many public databases have been built on these sequence data to explore nucleotide variation. In this study, we explored functional SNPs in the rice GPAT family (as a model plant gene family) using data from the 3000 Rice Genome Sequencing Project. We identified 1056 SNPs in 26 GPAT genes among a hundred rice varieties and filtered out 98 nsSNPs. We further investigated the structural and functional impact of these nsSNPs using various computational tools and shortlisted 13 SNPs predicted to have highly damaging effects on protein structure. We found that rice GPAT genes can be influenced by nsSNPs and that these nsSNPs might have a major effect on the regulation and function of GPAT genes. This information will be useful for understanding the possible relationships between genetic mutation and phenotypic variation, and their functional implications for rice GPAT proteins. The study also provides a computational pathway for identifying SNPs in other rice gene families.
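The two filtering steps described above (separating nsSNPs from synonymous SNPs, then shortlisting those flagged by several prediction tools) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the tool labels (`tool_a`, `tool_b`, `tool_c`), the SNP identifiers, and the "flagged by at least N tools" consensus rule are all hypothetical and chosen only for illustration. The only fixed input is the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snp(codon, pos, alt):
    """Classify a coding SNP (hypothetical helper).

    codon: reference codon (three DNA letters)
    pos:   0-based position of the SNP inside the codon
    alt:   alternative base
    Returns (kind, reference amino acid, alternative amino acid).
    """
    codon = codon.upper()
    mutated = codon[:pos] + alt.upper() + codon[pos + 1:]
    ref_aa = CODON_TABLE[codon]
    alt_aa = CODON_TABLE[mutated]
    kind = "synonymous" if ref_aa == alt_aa else "nonsynonymous"
    return kind, ref_aa, alt_aa


def shortlist(predictions, min_tools):
    """Keep SNP IDs called damaging by at least `min_tools` tools.

    predictions maps a SNP ID to {tool name: True if called damaging}.
    The consensus threshold is an assumption, not the paper's criterion.
    """
    return [
        snp_id
        for snp_id, calls in predictions.items()
        if sum(calls.values()) >= min_tools
    ]


if __name__ == "__main__":
    print(classify_snp("GAA", 1, "T"))  # Glu codon changed to GTA
    print(classify_snp("CTT", 2, "C"))  # Leu codon changed to CTC
    # Hypothetical damaging/tolerated calls from three unnamed tools.
    calls = {
        "snp_1": {"tool_a": True, "tool_b": True, "tool_c": True},
        "snp_2": {"tool_a": True, "tool_b": False, "tool_c": False},
    }
    print(shortlist(calls, min_tools=3))
```

In practice the codon context would come from the gene annotation, and the damaging/tolerated calls from the outputs of the prediction servers. A stricter or looser consensus threshold changes how many candidates survive the shortlist.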




How to Cite
SAFDER, I., SHAO, G., SHENG, Z., HU, P., & TANG, S. (2021). Identification of SNPs in rice GPAT genes and in silico analysis of their functional impact on GPAT proteins. Notulae Botanicae Horti Agrobotanici Cluj-Napoca, 49(3), 12346.
Research Articles
DOI: 10.15835/nbha49312346