GOS 2138030

Warning: this metagenomic sequence has been carefully annotated by students during bioinformatics assignments. These quality annotations are therefore the result of a teaching exercise that you are most welcome to amend and extend if necessary!


Sequence
CAMERA AccNum : JCVI_READ_1091120566212
Annotathon code: GOS_2138030
Sample :
  • GPS: 0°35'38"S; 91°4'10"W
  • Galapagos Islands: Mangrove on Isabella Island - Ecuador
  • Mangrove (-0.1m, 25.4°C, 0.1-0.8 microns)
Authors
Team : Algarve 2011
Username : biotec_DIV
Annotated on : 2011-06-08 01:25:12
  • Afonso Vasco Duarte Vieira
  • Santos Daniel Filipe Matoso dos
  • Viegas Idalio de Jesus Contreiras

Synopsis

Genomic Sequence

>JCVI_READ_1091120566212 GOS_2138030 Genomic DNA
GGGGGGCCTCAGTATTGTTCGGGTCAAAATGGATCTGTACAAGGCCGAAATCGGCGGCATGGGAAAACACATTCTCCGGCGAGGGATGATGATCAGCAGG
CAGATCCTGATTTGTCGCAGCAGGACGATTCTGGCGGTTTCTCAGGTATTGTTCCATGAGGAGACGAGTTCTTTGATTCACATACTCCCGGATTTCATCA
AGTGTAGCTGATGCGATAATGGCCTCTCTACCGGGGATTGCAAGAGGGATAGACCGATCCGTCTGACGAAGGCCGAGGTGTTCTGCAGCCCGCATTCGGT
CTTCAAACACCGTCTCATCGGCATCAGATGCGTGAAATCGGTGAATAGCCCCGTAGGTGAGAATTTGCTTTGGGTCTTGTCCGATGGTTGGAAGCAGGAG
ATTTTGAATCATATGAATCCGGAATAACTGGCGCTCCAGATCTCGATCATATATGCCTGAATCATCGTTCTTCCAGTACCTCGTCAGTTCAATTATGTTT
TCATAACCTCGCCCGAAATTGAGGCTTCCATTCTCAAGCGCGTTTCGATCCGCTATCAGCTCCGTGGGCAGGGGTCCGGATTCGCCATTCCATAGTCGAT
ATACAGCCTGCAAGTGACTGACCGGGTCCGGTCCTGCAAAGTTGCCGGTGGGAAAGTTACCGTGGTCGGGGGGTTTGAGAGCTTCGGGTAAATCTTGCAT
GGATTCCGGTTCATAGATGACCCGGCGGCCACTGTATCCGACTTGATGAAAACCACGCGGAAGGAAGTCGGGGTGGTGCTCAAATATGATCCCAAATGGA
ATTAAGTCAGACGGTGATATCTGGGTTAAAAGATCCGTATCAAGATAAGCTCGAAGCGTCTTCAGGAGGCTGATGGCGATCAGTGTTCGCGGAGGGATGT
GCTGGATTCCCCAATAATCAGGATTAGTGTGGGGAGAAAAG

Translation

[2 - 940/941]   indirect strand
>GOS_2138030 Translation [2-940   indirect strand]
FSPHTNPDYWGIQHIPPRTLIAISLLKTLRAYLDTDLLTQISPSDLIPFGIIFEHHPDFLPRGFHQVGYSGRRVIYEPESMQDLPEALKPPDHGNFPTGN
FAGPDPVSHLQAVYRLWNGESGPLPTELIADRNALENGSLNFGRGYENIIELTRYWKNDDSGIYDRDLERQLFRIHMIQNLLLPTIGQDPKQILTYGAIH
RFHASDADETVFEDRMRAAEHLGLRQTDRSIPLAIPGREAIIASATLDEIREYVNQRTRLLMEQYLRNRQNRPAATNQDLPADHHPSPENVFSHAADFGL
VQIHFDPNNTEAP

[ Warning ] 5' incomplete: does not start with a Methionine
[ Warning ] 3' incomplete: following codon is not a STOP

Annotator commentaries

After review, we conclude that our sequence (GOS_2138030) likely encodes a protein, given the large size of its longest ORF (313 amino acids). However, BLAST found no homologous sequences (best e-value 0.63), so the ORF is classified as an ORFan. Furthermore, no known protein domains were found. Because BLAST returned no significant hits, it was not possible to perform a multiple alignment or construct a phylogenetic tree. As a result, its molecular function, biological process and taxonomic classification could not be determined.

ORF finding

PROTOCOL

a) SMS ORFinder / forward strand / frames 1, 2 & 3 / min 60 AA / 'any codon' initiation / 'standard' genetic code

b) SMS ORFinder / reverse strand / frames 1, 2 & 3 / min 60 AA / 'any codon' initiation / 'standard' genetic code
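The protocol above can be approximated in a few lines of Python. This is only a minimal sketch of an "any codon" ORF scan, not the SMS ORF Finder implementation: the function names (`find_orfs`, `scan_both_strands`) are hypothetical, and counting the stop codon toward the 60-codon minimum is an assumption made here for simplicity.

```python
from itertools import product

# Standard genetic code, built in TCAG order (the usual compact layout).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq, frame, min_codons=60):
    """List (start, end, protein) for open stretches in one frame (1, 2 or 3).

    Coordinates are 1-based and inclusive. With 'any codon' initiation, a
    stretch simply runs from one stop codon to the next (the stop is
    included) or to the end of the sequence.
    """
    orfs = []
    start = frame - 1          # 0-based start of the current stretch
    protein = []
    for i in range(frame - 1, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")   # X for ambiguous codons
        protein.append(aa)
        if aa == "*":
            if len(protein) >= min_codons:
                orfs.append((start + 1, i + 3, "".join(protein)))
            start = i + 3
            protein = []
    # Stretch still open at the end of the read (3' incomplete ORF).
    if len(protein) >= min_codons:
        orfs.append((start + 1, start + 3 * len(protein), "".join(protein)))
    return orfs


def scan_both_strands(seq, min_codons=60):
    """Run find_orfs on frames 1-3 of the direct and reverse strands."""
    strands = {"direct": seq, "reverse": reverse_complement(seq)}
    return {
        (name, frame): find_orfs(strand_seq, frame, min_codons)
        for name, strand_seq in strands.items()
        for frame in (1, 2, 3)
    }


# Small synthetic example: one leading base, five Ala codons, a stop, three Gly codons.
demo = "A" + "GCT" * 5 + "TAA" + "GGG" * 3
demo_orfs = find_orfs(demo, frame=2, min_codons=3)
```

Called as `scan_both_strands(read, min_codons=60)` on the full read, a scan of this kind should list the same kind of frame-by-frame stretches as the raw results below, although exact boundaries may differ from the tool's output.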



RESULTS ANALYSIS


Five ORFs were found in our DNA sequence (GOS_2138030). ORF number 1 in reading frame 2 on the reverse strand was selected for annotation; the criterion was size, as it is the largest ORF found. BLASTp and BLASTx searches found no similar sequences with a significant e-value (best e-value 0.63). However, given its size (313 amino acids), the ORF probably has a biological function, and it is therefore called an ORFan. The ORF lies on the reverse strand, starts at base 2 and ends at base 940, for a length of 939 base pairs (313 codons).


The ORF is incomplete: it neither starts with an initiation codon nor ends with a termination codon. It extends from the 2nd nucleotide (base 2) to the penultimate nucleotide (base 940) of the read, so the gene was probably broken during handling, isolation or sequencing, and only a portion of the ORF was sequenced.


However, as BLAST found no similar sequences, it was not possible to perform a multiple alignment and draw conclusions about the expected size of the sequence or the correct initiation and termination codons.


All other ORFs were also submitted to BLAST; no similar sequences were found for them either.
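The coordinates quoted in this analysis (reverse strand, base 2 to base 940 of a 941-base read) can be checked with simple arithmetic. The sketch below is illustrative only; the helper names are hypothetical, and `READ_TAIL` is the last 41 bases of the read as given in the Genomic Sequence section. An inclusive span from base 2 to base 940 covers 939 nucleotides, which is exactly 313 codons.

```python
READ_LENGTH = 941            # length of JCVI_READ_1091120566212
ORF_START, ORF_END = 2, 940  # reverse-strand coordinates, 1-based inclusive


def orf_length(start, end):
    """Number of bases in an inclusive 1-based interval."""
    return end - start + 1


def to_forward_coords(start, end, read_length):
    """Map an interval on the reverse strand onto direct-strand coordinates."""
    return read_length - end + 1, read_length - start + 1


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


length = orf_length(ORF_START, ORF_END)
codons, leftover = divmod(length, 3)
forward_span = to_forward_coords(ORF_START, ORF_END, READ_LENGTH)

# The reverse strand begins with the reverse complement of the read's tail;
# dropping its first base gives the first bases of the selected ORF.
READ_TAIL = "GCTGGATTCCCCAATAATCAGGATTAGTGTGGGGAGAAAAG"
orf_head = reverse_complement(READ_TAIL)[ORF_START - 1:]

print(length, codons, leftover)   # bases, whole codons, leftover bases
print(forward_span)               # same interval on the direct strand
print(orf_head[:20])              # compare with the start of the reverse ORF
```

Because the ORF leaves exactly one unread base at each end of the read, its coordinates are the same on both strands, which is consistent with an ORF that runs off both ends of the read.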

RAW RESULTS

a)forward strand

No ORFs were found in reading frame 1.

>ORF number 1 in reading frame 2 on the direct strand extends from base 2 to base 211.
GGGGGCCTCAGTATTGTTCGGGTCAAAATGGATCTGTACAAGGCCGAAATCGGCGGCATG
GGAAAACACATTCTCCGGCGAGGGATGATGATCAGCAGGCAGATCCTGATTTGTCGCAGC
AGGACGATTCTGGCGGTTTCTCAGGTATTGTTCCATGAGGAGACGAGTTCTTTGATTCAC
ATACTCCCGGATTTCATCAAGTGTAGCTGA

>Translation of ORF number 1 in reading frame 2 on the direct strand.
GGLSIVRVKMDLYKAEIGGMGKHILRRGMMISRQILICRSRTILAVSQVLFHEETSSLIH
ILPDFIKCS*

>ORF number 2 in reading frame 2 on the direct strand extends from base 212 to base 427.
TGCGATAATGGCCTCTCTACCGGGGATTGCAAGAGGGATAGACCGATCCGTCTGACGAAG
GCCGAGGTGTTCTGCAGCCCGCATTCGGTCTTCAAACACCGTCTCATCGGCATCAGATGC
GTGAAATCGGTGAATAGCCCCGTAGGTGAGAATTTGCTTTGGGTCTTGTCCGATGGTTGG
AAGCAGGAGATTTTGAATCATATGAATCCGGAATAA

>Translation of ORF number 2 in reading frame 2 on the direct strand.
CDNGLSTGDCKRDRPIRLTKAEVFCSPHSVFKHRLIGIRCVKSVNSPVGENLLWVLSDGW
KQEILNHMNPE*

No ORFs were found in reading frame 3.

-------------------------------------------------------------------------------------------------------------------------
b)reverse strand

No ORFs were found in reading frame 1.

>ORF number 1 in reading frame 2 on the reverse strand extends from base 2 to base 940.
TTTTCTCCCCACACTAATCCTGATTATTGGGGAATCCAGCACATCCCTCCGCGAACACTG
ATCGCCATCAGCCTCCTGAAGACGCTTCGAGCTTATCTTGATACGGATCTTTTAACCCAG
ATATCACCGTCTGACTTAATTCCATTTGGGATCATATTTGAGCACCACCCCGACTTCCTT
CCGCGTGGTTTTCATCAAGTCGGATACAGTGGCCGCCGGGTCATCTATGAACCGGAATCC
ATGCAAGATTTACCCGAAGCTCTCAAACCCCCCGACCACGGTAACTTTCCCACCGGCAAC
TTTGCAGGACCGGACCCGGTCAGTCACTTGCAGGCTGTATATCGACTATGGAATGGCGAA
TCCGGACCCCTGCCCACGGAGCTGATAGCGGATCGAAACGCGCTTGAGAATGGAAGCCTC
AATTTCGGGCGAGGTTATGAAAACATAATTGAACTGACGAGGTACTGGAAGAACGATGAT
TCAGGCATATATGATCGAGATCTGGAGCGCCAGTTATTCCGGATTCATATGATTCAAAAT
CTCCTGCTTCCAACCATCGGACAAGACCCAAAGCAAATTCTCACCTACGGGGCTATTCAC
CGATTTCACGCATCTGATGCCGATGAGACGGTGTTTGAAGACCGAATGCGGGCTGCAGAA
CACCTCGGCCTTCGTCAGACGGATCGGTCTATCCCTCTTGCAATCCCCGGTAGAGAGGCC
ATTATCGCATCAGCTACACTTGATGAAATCCGGGAGTATGTGAATCAAAGAACTCGTCTC
CTCATGGAACAATACCTGAGAAACCGCCAGAATCGTCCTGCTGCGACAAATCAGGATCTG
CCTGCTGATCATCATCCCTCGCCGGAGAATGTGTTTTCCCATGCCGCCGATTTCGGCCTT
GTACAGATCCATTTTGACCCGAACAATACTGAGGCCCCC

>Translation of ORF number 1 in reading frame 2 on the reverse strand.
FSPHTNPDYWGIQHIPPRTLIAISLLKTLRAYLDTDLLTQISPSDLIPFGIIFEHHPDFL
PRGFHQVGYSGRRVIYEPESMQDLPEALKPPDHGNFPTGNFAGPDPVSHLQAVYRLWNGE
SGPLPTELIADRNALENGSLNFGRGYENIIELTRYWKNDDSGIYDRDLERQLFRIHMIQN
LLLPTIGQDPKQILTYGAIHRFHASDADETVFEDRMRAAEHLGLRQTDRSIPLAIPGREA
IIASATLDEIREYVNQRTRLLMEQYLRNRQNRPAATNQDLPADHHPSPENVFSHAADFGL
VQIHFDPNNTEAP

>ORF number 1 in reading frame 3 on the reverse strand extends from base 141 to base 386.
TTCCATTTGGGATCATATTTGAGCACCACCCCGACTTCCTTCCGCGTGGTTTTCATCAAG
TCGGATACAGTGGCCGCCGGGTCATCTATGAACCGGAATCCATGCAAGATTTACCCGAAG
CTCTCAAACCCCCCGACCACGGTAACTTTCCCACCGGCAACTTTGCAGGACCGGACCCGG
TCAGTCACTTGCAGGCTGTATATCGACTATGGAATGGCGAATCCGGACCCCTGCCCACGG
AGCTGA

>Translation of ORF number 1 in reading frame 3 on the reverse strand.
FHLGSYLSTTPTSFRVVFIKSDTVAAGSSMNRNPCKIYPKLSNPPTTVTFPPATLQDRTR
SVTCRLYIDYGMANPDPCPRS*

>ORF number 2 in reading frame 3 on the reverse strand extends from base 534 to base 764.
TTCAAAATCTCCTGCTTCCAACCATCGGACAAGACCCAAAGCAAATTCTCACCTACGGGG
CTATTCACCGATTTCACGCATCTGATGCCGATGAGACGGTGTTTGAAGACCGAATGCGGG
CTGCAGAACACCTCGGCCTTCGTCAGACGGATCGGTCTATCCCTCTTGCAATCCCCGGTA
GAGAGGCCATTATCGCATCAGCTACACTTGATGAAATCCGGGAGTATGTGA

>Translation of ORF number 2 in reading frame 3 on the reverse strand.
FKISCFQPSDKTQSKFSPTGLFTDFTHLMPMRRCLKTECGLQNTSAFVRRIGLSLLQSPV
ERPLSHQLHLMKSGSM*

Multiple Alignment

PROTOCOL



RESULTS ANALYSIS


It was not possible to perform a multiple alignment: because BLAST returned no significant results, there are no known homologous sequences to align. As already mentioned, it was therefore not possible to determine the correct initiation and termination codons or the correct ORF size.

RAW RESULTS

Protein Domains

PROTOCOL


InterProScan, default parameters at EBI


RESULTS ANALYSIS


InterProScan found no known protein domains. It did predict a signal peptide in the protein, but this result is not conclusive because no e-value was available (NA).


A signal peptide is a short sequence of amino acids that targets a protein to a location within the cell or to the outside, controlling the fate of newly synthesized proteins in both eukaryotic and prokaryotic cells. The signal peptide is normally cleaved when the protein is translocated across a membrane (1,2).






1) Nielsen H., Engelbrecht J., Brunak S., von Heijne G. (1997) Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Protein Engineering, vol. 10, no. 1, pp. 1–6


2) Plewczynski D., Slabinski L., Ginalski K., Rychlewski L. (2008) Prediction of signal peptides in protein sequences by neural networks. Acta Biochimica Polonica, vol. 55, no. 2, pp. 261–267


RAW RESULTS

GOS_2138030	4D6A1AEEF2C16DE3	313	SignalPHMM	SignalP-NN(euk)	signal-peptide	1	31	NA	?	27-Apr-2011	NULL	NULL
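The raw InterProScan line above is tab-separated and can be read programmatically. The sketch below is a minimal parser; the field names (`method`, `status` and so on) are assumptions based on the usual layout of InterProScan raw output and are not stated in this report. An e-value of "NA" is mapped to `None`, which is why the prediction cannot be ranked by significance.

```python
from typing import NamedTuple, Optional

# The raw line from this report, with tabs written explicitly.
RAW_LINE = (
    "GOS_2138030\t4D6A1AEEF2C16DE3\t313\tSignalPHMM\tSignalP-NN(euk)\t"
    "signal-peptide\t1\t31\tNA\t?\t27-Apr-2011\tNULL\tNULL"
)


class Match(NamedTuple):
    # Field names are assumed for illustration.
    protein_id: str
    checksum: str
    length: int
    method: str
    entry: str
    description: str
    start: int
    end: int
    e_value: Optional[float]
    status: str
    date: str


def parse_raw_line(line):
    """Parse one tab-separated raw result line into a Match."""
    f = line.rstrip("\n").split("\t")
    e_value = None if f[8] == "NA" else float(f[8])
    return Match(f[0], f[1], int(f[2]), f[3], f[4], f[5],
                 int(f[6]), int(f[7]), e_value, f[9], f[10])


match = parse_raw_line(RAW_LINE)
span = match.end - match.start + 1   # residues covered by the prediction
print(match.description, match.start, match.end, match.e_value)
```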

Phylogeny

PROTOCOL



RESULTS ANALYSIS


It was not possible to construct a phylogenetic tree: BLAST found no homologous sequences, so there were no sequences from which to define the in-group and out-group for the multiple alignment that tree construction requires.

RAW RESULTS

Taxonomy report

PROTOCOL


BLASTp vs NR, default ncbi parameters + "1000 max target sequences"

BLASTp vs SWISSPROT, default ncbi parameters + "1000 max target sequences"

BLASTx vs NR, default ncbi parameters + "1000 max target sequences"



RESULTS ANALYSIS


It was not possible to carry out a reliable taxonomic analysis because BLAST found no similar sequences. Such an analysis requires homologous sequences for comparison, and the lack of results made it impossible to define in-groups and out-groups.

RAW RESULTS

BLAST

PROTOCOL

a)BLASTp vs NR, default ncbi parameters + "1000 max target sequences"

b)BLASTp vs SWISSPROT, default ncbi parameters + "1000 max target sequences"

c)BLASTx vs NR, default ncbi parameters + "1000 max target sequences"



RESULTS ANALYSIS


The search for similar sequences revealed no significant results in the databases. The best results were: e-value 1.6, score 38.1, with a total of 9 hits for BLASTp vs NR; e-value 2.2, score 33.1, with a total of 15 hits for BLASTp vs SWISSPROT; and e-value 0.63, score 39.7, with a total of 3 hits for BLASTx vs NR. These results have no biological significance, because a hit is only considered significant when its e-value is below 1e-4 and its score is above 100. Thus, it was not possible to put forward a hypothesis about the type of protein encoded, since no homologous sequences could be found.
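To make the significance criteria concrete, the sketch below applies the two cutoffs used in this analysis (e-value below 1e-4, score above 100) to the best hit of each search. It also uses the standard BLAST relation E = (effective search space) × 2^(−bit score) to estimate the e-value of the best BLASTp vs NR hit from the "Effective search space" reported in the raw output. Function and variable names are hypothetical, and because reported bit scores are rounded, the computed value is only approximate.

```python
import math

E_VALUE_CUTOFF = 1e-4   # cutoffs quoted in the analysis
SCORE_CUTOFF = 100

# Best hit of each search as (bit score, e-value), taken from the analysis.
BEST_HITS = {
    "BLASTp vs NR": (38.1, 1.6),
    "BLASTp vs SWISSPROT": (33.1, 2.2),
    "BLASTx vs NR": (39.7, 0.63),
}

# "Effective search space" from the BLASTp vs NR raw statistics.
SEARCH_SPACE_BLASTP_NR = 483369544875


def is_significant(score, e_value):
    """Apply both cutoffs used in this report."""
    return e_value < E_VALUE_CUTOFF and score > SCORE_CUTOFF


def expected_hits(bit_score, search_space):
    """E-value implied by a bit score in a given effective search space."""
    return search_space * 2.0 ** (-bit_score)


def bits_needed(e_value, search_space):
    """Bit score required to reach a target e-value in that search space."""
    return math.log2(search_space / e_value)


significant = {name: is_significant(*hit) for name, hit in BEST_HITS.items()}
e_best = expected_hits(38.1, SEARCH_SPACE_BLASTP_NR)
needed = bits_needed(E_VALUE_CUTOFF, SEARCH_SPACE_BLASTP_NR)

print(significant)
print(round(e_best, 2), round(needed, 1))
```

Under these assumptions, none of the three best hits passes the cutoffs. The bit score needed to reach an e-value of 1e-4 in this search space is well above the 38.1 bits of the best BLASTp hit.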

RAW RESULTS

a)BLASTp vs NR

                                                                   Score     E
Sequences producing significant alignments:                       (Bits)  Value

ref|NP_001067181.1|  Os12g0595300 [Oryza sativa Japonica Group...  38.1    1.6  
ref|YP_001893561.1|  CagE TrbE VirB component of type IV trans...  38.1    1.8  
gb|ABA99184.2|  F-box domain containing protein, expressed [Or...  37.7    2.2  
gb|AAV31177.2|  hypothetical protein STB1_57t00015 [Solanum tu...  37.7    2.2  
gb|EAZ21088.1|  hypothetical protein OsJ_36730 [Oryza sativa J...  37.4    2.6  
gb|ABA99183.1|  F-box domain containing protein [Oryza sativa ...  37.4    2.9  
ref|NP_001177040.1|  Os12g0595200 [Oryza sativa Japonica Group...  37.4    3.0  
ref|XP_001942539.1|  PREDICTED: similar to Short transient rec...  36.6    4.7  
gb|EAY83748.1|  hypothetical protein OsI_38965 [Oryza sativa I...  35.8    8.7  

ALIGNMENTS
>ref|NP_001067181.1| Os12g0595300 [Oryza sativa Japonica Group]
 dbj|BAF30200.1| Os12g0595300 [Oryza sativa Japonica Group]
Length=565

 Score = 38.1 bits (87),  Expect = 1.6, Method: Compositional matrix adjust.
 Identities = 24/74 (33%), Positives = 36/74 (49%), Gaps = 1/74 (1%)

Query  127  ELIADRNALENGSLNFGRGYENIIELTRYWKNDDSGIYDRDLERQLFRIHMIQNLLLPTI  186
            E+  D   ++N     GR Y   +E TR W  D    +  +L+R   R+ M+ NL +P  
Sbjct  192  EIYEDEGIIQNAGHEIGRVYCLRVETTR-WSLDHLVRWCAELQRGGARVLMLANLAIPEH  250

Query  187  GQDPKQILTYGAIH  200
             + P+ IL  GA H
Sbjct  251  PELPQAILNCGASH  264


>ref|YP_001893561.1| CagE TrbE VirB component of type IV transporter system [Burkholderia 
phytofirmans PsJN]
 gb|ACD21632.1| CagE TrbE VirB component of type IV transporter system [Burkholderia 
phytofirmans PsJN]
Length=849

 Score = 38.1 bits (87),  Expect = 1.8, Method: Compositional matrix adjust.
 Identities = 24/54 (45%), Positives = 33/54 (62%), Gaps = 5/54 (9%)

Query  101  FAGPDP-VSHLQAVY-RLWNGESGPLPTELIADRNALENGSLNFGRGYENIIEL  152
            F G  P +SHLQA Y RL N + GP+P +  + R A++   L+FGR    IIE+
Sbjct  182  FDGEKPAISHLQAFYGRLLNADVGPVPLDSYSVRFAIQRNELHFGR---EIIEI  232


>gb|ABA99184.2| F-box domain containing protein, expressed [Oryza sativa Japonica 
Group]
 dbj|BAG97850.1| unnamed protein product [Oryza sativa Japonica Group]
Length=526

 Score = 37.7 bits (86),  Expect = 2.2, Method: Compositional matrix adjust.
 Identities = 24/74 (33%), Positives = 36/74 (49%), Gaps = 1/74 (1%)

Query  127  ELIADRNALENGSLNFGRGYENIIELTRYWKNDDSGIYDRDLERQLFRIHMIQNLLLPTI  186
            E+  D   ++N     GR Y   +E TR W  D    +  +L+R   R+ M+ NL +P  
Sbjct  153  EIYEDEGIIQNAGHEIGRVYCLRVETTR-WSLDHLVRWCAELQRGGARVLMLANLAIPEH  211

Query  187  GQDPKQILTYGAIH  200
             + P+ IL  GA H
Sbjct  212  PELPQAILNCGASH  225


>gb|AAV31177.2| hypothetical protein STB1_57t00015 [Solanum tuberosum]
Length=443

 Score = 37.7 bits (86),  Expect = 2.2, Method: Compositional matrix adjust.
 Identities = 31/130 (24%), Positives = 57/130 (44%), Gaps = 7/130 (5%)

Query  156  WKNDDSGIYDRDLERQLFRIHMIQNLLLPTIGQDPKQILTYGAIHRFHASDADETVFEDR  215
            WK  +  I+  DL+  +FR H  Q+     +  D +++     I+   +S  DET  ED+
Sbjct  93   WKTSNESIFFDDLKNMIFRTHGNQHKFRNIVLTD-EELNAMDQINLHQSSSQDET--EDQ  149

Query  216  MRAAEHLGLRQTDRSIPLAIPGREAIIASATLDEIREYVNQRTRLLMEQYLRNRQNRPAA  275
              + +H      D         +E       + +++ YV+  T+L +E+ + +R     A
Sbjct  150  ATSEKH----DVDFDKKYVELKKEIAEVDKHMTDMKAYVDNSTKLTIEEIMSSRGQPSQA  205

Query  276  TNQDLPADHH  285
            T+Q   A  H
Sbjct  206  THQQDDARQH  215


>gb|EAZ21088.1| hypothetical protein OsJ_36730 [Oryza sativa Japonica Group]
Length=605

 Score = 37.4 bits (85),  Expect = 2.6, Method: Compositional matrix adjust.
 Identities = 24/74 (33%), Positives = 36/74 (49%), Gaps = 1/74 (1%)

Query  127  ELIADRNALENGSLNFGRGYENIIELTRYWKNDDSGIYDRDLERQLFRIHMIQNLLLPTI  186
            E+  D   ++N     GR Y   +E TR W  D    +  +L+R   R+ M+ NL +P  
Sbjct  153  EIYEDEGIIQNAGHEIGRVYCLRVETTR-WSLDHLVRWCAELQRGGARVLMLANLAIPEH  211

Query  187  GQDPKQILTYGAIH  200
             + P+ IL  GA H
Sbjct  212  PELPQAILNCGASH  225


>gb|ABA99183.1| F-box domain containing protein [Oryza sativa Japonica Group]
 gb|EAZ21087.1| hypothetical protein OsJ_36729 [Oryza sativa Japonica Group]
Length=530

 Score = 37.4 bits (85),  Expect = 2.9, Method: Compositional matrix adjust.
 Identities = 46/168 (28%), Positives = 70/168 (42%), Gaps = 21/168 (12%)

Query  120  ESGPLPTELIADRNALENGSLNFGRGYENIIELTRYWKNDDSGIYDRDLERQLFRIHMIQ  179
            ESG L  ++IA   A+ NG    GR +   +E TR W  +    +   L R   R+ ++ 
Sbjct  158  ESG-LCDDVIAHDAAMLNGGFEIGRVFCFRVETTR-WSLEQLNRWCAALHRGRARVIVVA  215

Query  180  NLLLPTIGQDPKQILTYGAIHRFH-------ASDADETVFEDRMRAAEHLGLRQTDRSI-  231
            NL LP   + P+ +L   ++   H       A   D  +       A  LG+   DR+I 
Sbjct  216  NLHLPGYPRFPQALLDCTSLLELHLFFFTVEAYRIDRLLVLGLYSCAWGLGM--IDRAIH  273

Query  232  ------PLAIPGREAI---IASATLDEIREYVNQRTRLLMEQYLRNRQ  270
                   LAI G E     +A   L  +R Y NQ   + ++   R R+
Sbjct  274  RESEIRELAIDGVEGSTFRLADTRLQTLRMYENQVGTVAVDNATRLRK  321


>ref|NP_001177040.1| Os12g0595200 [Oryza sativa Japonica Group]
 dbj|BAH95768.1| Os12g0595200 [Oryza sativa Japonica Group]
Length=533

 Score = 37.4 bits (85),  Expect = 3.0, Method: Compositional matrix adjust.
 Identities = 46/168 (28%), Positives = 70/168 (42%), Gaps = 21/168 (12%)

Query  120  ESGPLPTELIADRNALENGSLNFGRGYENIIELTRYWKNDDSGIYDRDLERQLFRIHMIQ  179
            ESG L  ++IA   A+ NG    GR +   +E TR W  +    +   L R   R+ ++ 
Sbjct  158  ESG-LCDDVIAHDAAMLNGGFEIGRVFCFRVETTR-WSLEQLNRWCAALHRGRARVIVVA  215

Query  180  NLLLPTIGQDPKQILTYGAIHRFH-------ASDADETVFEDRMRAAEHLGLRQTDRSI-  231
            NL LP   + P+ +L   ++   H       A   D  +       A  LG+   DR+I 
Sbjct  216  NLHLPGYPRFPQALLDCTSLLELHLFFFTVEAYRIDRLLVLGLYSCAWGLGM--IDRAIH  273

Query  232  ------PLAIPGREAI---IASATLDEIREYVNQRTRLLMEQYLRNRQ  270
                   LAI G E     +A   L  +R Y NQ   + ++   R R+
Sbjct  274  RESEIRELAIDGVEGSTFRLADTRLQTLRMYENQVGTVAVDNATRLRK  321


>ref|XP_001942539.1| PREDICTED: similar to Short transient receptor potential channel 
4 (TrpC4) (Trp-related protein 4) (hTrp-4) (hTrp4) [Acyrthosiphon 
pisum]
Length=747

 Score = 36.6 bits (83),  Expect = 4.7, Method: Compositional matrix adjust.
 Identities = 23/98 (24%), Positives = 47/98 (48%), Gaps = 7/98 (7%)

Query  151  ELTRYWKNDDSGIYDRDLERQLFR-------IHMIQNLLLPTIGQDPKQILTYGAIHRFH  203
            E+ +     ++G+   +L+  +FR       I++  N  + T  Q  KQ+L +G ++R  
Sbjct  182  EVCKSLAQLNNGLEASELKLSVFRALANPFYIYLTSNDPILTAFQLSKQLLEHGNVNRLF  241

Query  204  ASDADETVFEDRMRAAEHLGLRQTDRSIPLAIPGREAI  241
             SD +    + R+ A + +GL +T   + L +  +E  
Sbjct  242  KSDYESLNLQTRIFAVDLIGLCRTSDEVKLILTRKEGC  279


>gb|EAY83748.1| hypothetical protein OsI_38965 [Oryza sativa Indica Group]
Length=605

 Score = 35.8 bits (81),  Expect = 8.7, Method: Compositional matrix adjust.
 Identities = 23/72 (32%), Positives = 35/72 (49%), Gaps = 1/72 (1%)

Query  127  ELIADRNALENGSLNFGRGYENIIELTRYWKNDDSGIYDRDLERQLFRIHMIQNLLLPTI  186
            E+  D   ++N     GR Y   +E TR W  D    +  +L+R   R+ M+ NL +P  
Sbjct  153  EIYEDEGIIQNAGHEIGRVYCLRVETTR-WSLDHLVRWCAELQRGGARVLMLANLAIPEH  211

Query  187  GQDPKQILTYGA  198
             + P+ IL  GA
Sbjct  212  PELPQAILNCGA  223


  Database: All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF
excluding environmental samples from WGS projects
    Posted date:  Mar 22, 2011  4:36 PM
  Number of letters in database: 326,528,513
  Number of sequences in database:  13,473,798

Lambda     K      H
   0.320    0.139    0.428 
Gapped
Lambda     K      H
   0.267   0.0410    0.140 
Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Number of Sequences: 13473798
Number of Hits to DB: 160516757
Number of extensions: 7428067
Number of successful extensions: 14170
Number of sequences better than 100: 0
Number of HSP's better than 100 without gapping: 0
Number of HSP's gapped: 14170
Number of HSP's successfully gapped: 0
Length of query: 313
Length of database: 4621495809
Length adjustment: 138
Effective length of query: 175
Effective length of database: 2762111685
Effective search space: 483369544875
Effective search space used: 483369544875
T: 11
A: 40
X1: 16 (7.4 bits)
X2: 38 (14.6 bits)
X3: 64 (24.7 bits)
S1: 41 (20.4 bits)
S2: 72 (32.3 bits)


-------------------------------------------------------------------------------
b)BLASTp vs SWISSPROT:


                                                                   Score     E
Sequences producing significant alignments:                       (Bits)  Value

sp|A4TB76.1|RIR2H_MYCGI  RecName: Full=R2-like ligand binding ...  33.1    2.2  
sp|Q9PQV5.1|RPOC_UREPA  RecName: Full=DNA-directed RNA polymer...  33.1    2.7  
sp|P40343.3|VPS27_YEAST  RecName: Full=Vacuolar protein sortin...  32.3    4.2  
sp|B5ZAZ4.1|RPOC_UREU1  RecName: Full=DNA-directed RNA polymer...  32.0    4.7  
sp|Q2W2D1.1|PURA_MAGSA  RecName: Full=Adenylosuccinate synthet...  32.0    5.1  
sp|Q92FQ5.1|PURA_LISIN  RecName: Full=Adenylosuccinate synthet...  32.0    5.3  
sp|B8DAL0.1|PURA_LISMH  RecName: Full=Adenylosuccinate synthet...  32.0    5.3  
sp|Q725A8.1|PURA_LISMF  RecName: Full=Adenylosuccinate synthet...  32.0    5.3  
sp|Q8YAR1.1|PURA_LISMO  RecName: Full=Adenylosuccinate synthet...  32.0    5.3  
sp|A0AEN2.1|PURA_LISW6  RecName: Full=Adenylosuccinate synthet...  32.0    5.5  
sp|Q08972.1|NEW1_YEAST  RecName: Full=[NU+] prion formation pr...  32.0    5.6  
sp|B2GEZ1.1|PURA_LACF3  RecName: Full=Adenylosuccinate synthet...  32.0    5.9  
sp|A6VD51.1|PURA_PSEA7  RecName: Full=Adenylosuccinate synthet...  31.6    7.3  
sp|Q06853.1|SLAP2_CLOTH  RecName: Full=Cell surface glycoprote...  31.6    7.3  
sp|Q4KJ68.1|PURA_PSEF5  RecName: Full=Adenylosuccinate synthet...  31.2    9.1  

ALIGNMENTS
>sp|A4TB76.1|RIR2H_MYCGI RecName: Full=R2-like ligand binding oxidase; AltName: Full=Ribonucleotide 
reductase R2 subunit homolog; AltName: Full=Ribonucleotide 
reductase small subunit homolog
Length=312

 Score = 33.1 bits (74),  Expect = 2.2, Method: Compositional matrix adjust.
 Identities = 26/79 (33%), Positives = 40/79 (51%), Gaps = 7/79 (9%)

Query  182  LLPTIGQDPKQILTYGAI-HRFH--ASDADETVFEDRMRAAEHLGLRQTDRSIPLAIPGR  238
            L+  IG D ++ + +G    R H  A DA+ TVFEDRM     L L+ TD +  L     
Sbjct  194  LVRRIGDDERRHMAWGTFTCRRHVAADDANWTVFEDRMNELIPLALQNTDDAFAL----Y  249

Query  239  EAIIASATLDEIREYVNQR  257
            + I    T++E ++Y   +
Sbjct  250  DEIPFGLTIEEFQQYAADK  268


>sp|Q9PQV5.1|RPOC_UREPA RecName: Full=DNA-directed RNA polymerase subunit beta'; Short=RNAP 
subunit beta'; AltName: Full=RNA polymerase subunit beta'; 
AltName: Full=Transcriptase subunit beta'
 sp|B1AIH6.1|RPOC_UREP2 RecName: Full=DNA-directed RNA polymerase subunit beta'; Short=RNAP 
subunit beta'; AltName: Full=RNA polymerase subunit beta'; 
AltName: Full=Transcriptase subunit beta'
Length=1305

 Score = 33.1 bits (74),  Expect = 2.7, Method: Composition-based stats.
 Identities = 21/75 (28%), Positives = 32/75 (43%), Gaps = 6/75 (8%)

Query  224   LRQTDRSIPLAIPGREAIIASATLDEIREYVNQRTRLLMEQYLRNRQNRPAATNQDLPAD  283
             + Q    I +  PG   +    T+      +N+ T +  +  L N++  P+A NQ    D
Sbjct  1181  ISQLTNKITITNPGDSGLFVGETIS-----INEFTEV-AQNMLVNKKKPPSAINQVFGLD  1234

Query  284   HHPSPENVFSHAADF  298
             H PS    F  AA F
Sbjct  1235  HAPSKSGSFLSAASF  1249


>sp|P40343.3|VPS27_YEAST RecName: Full=Vacuolar protein sorting-associated protein 27; 
AltName: Full=Golgi retention defective protein 11
Length=622

 Score = 32.3 bits (72),  Expect = 4.2, Method: Compositional matrix adjust.
 Identities = 23/72 (32%), Positives = 34/72 (47%), Gaps = 6/72 (8%)

Query  129  IADRNALENGSLNFG-----RGYENIIELTRYWKNDDSGIYDRDLERQLFRIHMIQNLLL  183
            +A R       LN+      + Y  +IE+     ++   IYDR LE+QL  I++ Q   L
Sbjct  385  LAQRVFASKARLNYALNDKAQKYNTLIEMNGKI-SEIMNIYDRLLEQQLQSINLSQQYTL  443

Query  184  PTIGQDPKQILT  195
            P +  DP   LT
Sbjct  444  PQVPSDPYNYLT  455


>sp|B5ZAZ4.1|RPOC_UREU1 RecName: Full=DNA-directed RNA polymerase subunit beta'; Short=RNAP 
subunit beta'; AltName: Full=RNA polymerase subunit beta'; 
AltName: Full=Transcriptase subunit beta'
Length=1305

 Score = 32.0 bits (71),  Expect = 4.7, Method: Composition-based stats.
 Identities = 20/75 (27%), Positives = 32/75 (43%), Gaps = 6/75 (8%)

Query  224   LRQTDRSIPLAIPGREAIIASATLDEIREYVNQRTRLLMEQYLRNRQNRPAATNQDLPAD  283
             + Q    + +  PG   +    T+      +N+ T +  +  L N++  P+A NQ    D
Sbjct  1181  ISQLTNKVTITNPGDSGLFVGETIS-----INEFTEV-AQSMLVNKKKPPSAINQVFGLD  1234

Query  284   HHPSPENVFSHAADF  298
             H PS    F  AA F
Sbjct  1235  HAPSKSGSFLSAASF  1249


>sp|Q2W2D1.1|PURA_MAGSA RecName: Full=Adenylosuccinate synthetase; Short=AMPSase; Short=AdSS; 
AltName: Full=IMP--aspartate ligase
Length=429

 Score = 32.0 bits (71),  Expect = 5.1, Method: Compositional matrix adjust.
 Identities = 40/160 (25%), Positives = 67/160 (42%), Gaps = 28/160 (18%)

Query  38   LTQISPSDLIPFGIIFEHHPDFLPRGFHQVGYSGRRVIYEPE--SMQDLPEALKP-----  90
            L  ++P  ++PF  +   H D       +V ++ +RV++E    +M D+     P     
Sbjct  189  LLDVAPK-ILPFADVVWQHLD-------EVRHTRKRVLFEGAQGAMLDVDHGTYPYVTSS  240

Query  91   -PDHGNFPTGNFAGPDPVSHLQAVYRLWN---GESGPLPTELIADRNALENGSLNFGRGY  146
                GN  TG+  GP  V ++  + + +    GE GP PTEL       E G     RG+
Sbjct  241  NTVSGNAGTGSGVGPGQVGYVLGICKAYTTRVGE-GPFPTELFD-----EIGKSIGERGH  294

Query  147  ENIIELTRYWKNDDSGIYDRDLERQLFRIHMIQNLLLPTI  186
            E     T   +    G +D  + RQ  ++  I  + L  +
Sbjct  295  EF---GTVTGRPRRCGWFDAVMVRQAVKVGGITGIALTKL  331


>sp|Q92FQ5.1|PURA_LISIN RecName: Full=Adenylosuccinate synthetase; Short=AMPSase; Short=AdSS; 
AltName: Full=IMP--aspartate ligase
Length=430

 Score = 32.0 bits (71),  Expect = 5.3, Method: Compositional matrix adjust.
 Identities = 20/68 (29%), Positives = 31/68 (46%), Gaps = 10/68 (15%)

Query  71   GRRVIYEPES--MQDLPEALKP------PDHGNFPTGNFAGPDPVSHLQAVYRLWNGE--  120
            G+RV++E     M D+ +   P      P  G    G+  GP  ++H+  V + +     
Sbjct  214  GKRVLFEGAQGVMLDIDQGTYPFVTSSNPIAGGVTIGSGVGPSKINHVVGVAKAYTTRVG  273

Query  121  SGPLPTEL  128
             GP PTEL
Sbjct  274  DGPFPTEL  281


>sp|B8DAL0.1|PURA_LISMH RecName: Full=Adenylosuccinate synthetase; Short=AMPSase; Short=AdSS; 
AltName: Full=IMP--aspartate ligase
Length=430

 Score = 32.0 bits (71),  Expect = 5.3, Method: Compositional matrix adjust.
 Identities = 20/68 (29%), Positives = 31/68 (46%), Gaps = 10/68 (15%)

Query  71   GRRVIYEPES--MQDLPEALKP------PDHGNFPTGNFAGPDPVSHLQAVYRLWNGE--  120
            G+RV++E     M D+ +   P      P  G    G+  GP  ++H+  V + +     
Sbjct  214  GKRVLFEGAQGVMLDIDQGTYPFVTSSNPIAGGVTIGSGVGPSKINHVVGVAKAYTTRVG  273

Query  121  SGPLPTEL  128
             GP PTEL
Sbjct  274  DGPFPTEL  281


>sp|Q725A8.1|PURA_LISMF RecName: Full=Adenylosuccinate synthetase; Short=AMPSase; Short=AdSS; 
AltName: Full=IMP--aspartate ligase
 sp|C1KV62.1|PURA_LISMC RecName: Full=Adenylosuccinate synthetase; Short=AMPSase; Short=AdSS; 
AltName: Full=IMP--aspartate ligase
Length=430

 Score = 32.0 bits (71),  Expect = 5.3, Method: Compositional matrix adjust.
 Identities = 20/68 (29%), Positives = 31/68 (46%), Gaps = 10/68 (15%)

Query  71   GRRVIYEPES--MQDLPEALKP------PDHGNFPTGNFAGPDPVSHLQAVYRLWNGE--  120
            G+RV++E     M D+ +   P      P  G    G+  GP  ++H+  V + +     
Sbjct  214  GKRVLFEGAQGVMLDIDQGTYPFVTSSNPIAGGVTIGSGVGPSKINHVVGVAKAYTTRVG  273

Query  121  SGPLPTEL  128
             GP PTEL
Sbjct  274  DGPFPTEL  281


>sp|Q8YAR1.1|PURA_LISMO RecName: Full=Adenylosuccinate synthetase; Short=AMPSase; Short=AdSS; 
AltName: Full=IMP--aspartate ligase
Length=430

 Score = 32.0 bits (71),  Expect = 5.3, Method: Compositional matrix adjust.
 Identities = 20/68 (29%), Positives = 31/68 (46%), Gaps = 10/68 (15%)

Query  71   GRRVIYEPES--MQDLPEALKP------PDHGNFPTGNFAGPDPVSHLQAVYRLWNGE--  120
            G+RV++E     M D+ +   P      P  G    G+  GP  ++H+  V + +     
Sbjct  214  GKRVLFEGAQGVMLDIDQGTYPFVTSSNPIAGGVTIGSGVGPSKINHVVGVAKAYTTRVG  273

Query  121  SGPLPTEL  128
             GP PTEL
Sbjct  274  DGPFPTEL  281


>sp|A0AEN2.1|PURA_LISW6 RecName: Full=Adenylosuccinate synthetase; Short=AMPSase; Short=AdSS; 
AltName: Full=IMP--aspartate ligase
Length=430

 Score = 32.0 bits (71),  Expect = 5.5, Method: Compositional matrix adjust.
 Identities = 20/68 (29%), Positives = 31/68 (46%), Gaps = 10/68 (15%)

Query  71   GRRVIYEPES--MQDLPEALKP------PDHGNFPTGNFAGPDPVSHLQAVYRLWNGE--  120
            G+RV++E     M D+ +   P      P  G    G+  GP  ++H+  V + +     
Sbjct  214  GKRVLFEGAQGVMLDIDQGTYPFVTSSNPIAGGVTIGSGVGPSKINHVVGVAKAYTTRVG  273

Query  121  SGPLPTEL  128
             GP PTEL
Sbjct  274  DGPFPTEL  281


>sp|Q08972.1|NEW1_YEAST RecName: Full=[NU+] prion formation protein 1
Length=1196

 Score = 32.0 bits (71),  Expect = 5.6, Method: Composition-based stats.
 Identities = 13/42 (31%), Positives = 26/42 (62%), Gaps = 0/42 (0%)

Query  231  IPLAIPGREAIIASATLDEIREYVNQRTRLLMEQYLRNRQNR  272
            IPL +PG + ++ +A+L E+RE   +   +L E    +++N+
Sbjct  428  IPLLLPGIQKVVDTASLPEVRELAEKALNVLKEDDEADKENK  469


>sp|B2GEZ1.1|PURA_LACF3 RecName: Full=Adenylosuccinate synthetase; Short=AMPSase; Short=AdSS; 
AltName: Full=IMP--aspartate ligase
Length=429

 Score = 32.0 bits (71),  Expect = 5.9, Method: Compositional matrix adjust.
 Identities = 22/69 (32%), Positives = 33/69 (48%), Gaps = 12/69 (17%)

Query  71   GRRVIYEPE--SMQDLPEALKP------PDHGNFPTGNFAGPDPVSHLQAVYRLWN---G  119
            G +V++E    +M D+ E   P      P  G  PTG   GP+ +  +  + + +    G
Sbjct  214  GDKVLFEGAQGTMLDIDEGTYPYVTSSNPTAGGAPTGAGVGPNKIETVVGIAKAYTTRVG  273

Query  120  ESGPLPTEL  128
            E GP PTEL
Sbjct  274  E-GPFPTEL  281

------------------------------------------------------------------------------------------------------------------------
c)BLASTx vs NR


                                                                   Score     E
Sequences producing significant alignments:                       (Bits)  Value

ref|YP_002122139.1|  ABC transporter related [Hydrogenobaculum...  39.7    0.63 
gb|AAV31177.2|  hypothetical protein STB1_57t00015 [Solanum tu...  39.3    0.83 
ref|YP_001893561.1|  CagE TrbE VirB component of type IV trans...  38.5    1.4  

ALIGNMENTS
>ref|YP_002122139.1| ABC transporter related [Hydrogenobaculum sp. Y04AAS1]
 gb|ACG58161.1| ABC transporter related [Hydrogenobaculum sp. Y04AAS1]
Length=262

 Score = 39.7 bits (91),  Expect = 0.63
 Identities = 43/173 (25%), Positives = 72/173 (42%), Gaps = 23/173 (13%)
 Frame = -2

Query  574  PLPTELIADRNALENGSLNFGRGY-------ENIIELTRYWKNDDSGIYD------RDLE  434
            P    +I + +A EN  L  G+ Y       E +++    W   DS +        R LE
Sbjct  84   PQDISIIPNLSAYENLRL-IGKLYGVSKDSVEKMLKAVNLWDRKDSMVQTFSGGMIRKLE  142

Query  433  RQLFRIHMIQNLLL--PTIGQDPKQILTYGAIHRFHASDADETVFEDRMRAAEHLGLRQT  260
              +  +H  + LLL  P++G DP   L+  +  +    D    +    M  AE +     
Sbjct  143  IAMALLHSPKLLLLDEPSVGLDPNSKLSIWSFLKSKKQDMAVLITTHDMNEAERI----C  198

Query  259  DRSIPLAIPGREAIIASATLDEIREYVNQRTRLLMEQYLRNRQNRPAATNQDL  101
            DR   +AI  +  I+A  TL+E+RE +  +   L E +++          QD+
Sbjct  199  DR---IAIMKKGQIVAKGTLEELRELIGDKNASLEEIFIQLTGENIEEDTQDI  248


>gb|AAV31177.2| hypothetical protein STB1_57t00015 [Solanum tuberosum]
Length=443

 Score = 39.3 bits (90),  Expect = 0.83
 Identities = 31/130 (24%), Positives = 57/130 (44%), Gaps = 7/130 (5%)
 Frame = -2

Query  475  WKNDDSGIYDRDLERQLFRIHMIQNLLLPTIGQDPKQILTYGAIHRFHASDADETVFEDR  296
            WK  +  I+  DL+  +FR H  Q+     +  D +++     I+   +S  DET  ED+
Sbjct  93   WKTSNESIFFDDLKNMIFRTHGNQHKFRNIVLTD-EELNAMDQINLHQSSSQDET--EDQ  149

Query  295  MRAAEHLGLRQTDRSIPLAIPGREAIIASATLDEIREYVNQRTRLLMEQYLRNRQNRPAA  116
              + +H      D         +E       + +++ YV+  T+L +E+ + +R     A
Sbjct  150  ATSEKH----DVDFDKKYVELKKEIAEVDKHMTDMKAYVDNSTKLTIEEIMSSRGQPSQA  205

Query  115  TNQDLPADHH  86
            T+Q   A  H
Sbjct  206  THQQDDARQH  215


>ref|YP_001893561.1| CagE TrbE VirB component of type IV transporter system [Burkholderia 
phytofirmans PsJN]
 gb|ACD21632.1| CagE TrbE VirB component of type IV transporter system [Burkholderia 
phytofirmans PsJN]
Length=849

 Score = 38.5 bits (88),  Expect = 1.4
 Identities = 24/54 (44%), Positives = 33/54 (61%), Gaps = 5/54 (9%)
 Frame = -2

Query  640  FAGPDP-VSHLQAVY-RLWNGESGPLPTELIADRNALENGSLNFGRGYENIIEL  485
            F G  P +SHLQA Y RL N + GP+P +  + R A++   L+FGR    IIE+
Sbjct  182  FDGEKPAISHLQAFYGRLLNADVGPVPLDSYSVRFAIQRNELHFGR---EIIEI  232


  Database: All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF
excluding environmental samples from WGS projects
    Posted date:  Apr 21, 2011  4:12 PM
  Number of letters in database: 444,284,216
  Number of sequences in database:  13,804,263

Lambda     K      H
   0.318    0.134    0.401 
Gapped
Lambda     K      H
   0.267   0.0410    0.140 
Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Number of Sequences: 13804263
Number of Hits to DB: 450973826
Number of extensions: 11498552
Number of successful extensions: 28075
Number of sequences better than 10: 0
Number of HSP's better than 10 without gapping: 0
Number of HSP's gapped: 28069
Number of HSP's successfully gapped: 0
Length of query: 941
Length of database: 4739251512
Length adjustment: 138
Effective length of query: 803
Effective length of database: 2834263218
Effective search space: 495996063150
Effective search space used: 495996063150
T: 12
A: 40
X1: 16 (7.3 bits)
X2: 38 (14.6 bits)
X3: 64 (24.7 bits)
S1: 41 (20.4 bits)
S2: 81 (35.8 bits)