GOS 1438010

From Metagenes
Warning: this metagenomic sequence has been carefully annotated by students during bioinformatics assignments. These quality annotations are therefore the result of a teaching exercise that you are most welcome to amend and extend if necessary!


Sequence
CAMERA AccNum : JCVI_READ_1091143176889
Annotathon code: GOS_1438010
Sample :
  • GPS :5°33'10n; 87°5'16w
  • Eastern Tropical Pacific: Dirty Rock, Cocos Island - Costa Rica
  • Fringing Reef (-1.1m, 28.3°C, 0.8-3.0 microns)
Authors
Team : Algarve
Username : MIEBbestteam1
Annotated on : 2010-07-12 11:17:59
  • DianaFilipaNunesReis a36795@ualg.pt
  • RicardoJorgePereira a36809@ualg.pt
  • TiagoPereiraFernandes a36810@ualg.pt

Synopsis

Genomic Sequence

>JCVI_READ_1091143176889 GOS_1438010 Genomic DNA
GAGCGATTGGACAAGAACGATCAGCTTATCAAGGCGGGTCGTCAAGACGAAGCGACGGCCGCTACCGAAAGGCTAATCGATGGGCTAACGAAGTCGGGCT
TCACAGATCAGGAAATAGAGGCTGCGATGAAGGCGGTGGTTAAAGAGAAAAAGGAAAAGAGGGAAGCCGAGAGTTCTAAACAACTTATCGCGTATCAGAA
TGAGCTCTTGCGCCCCCAGTGCCCCGATGCGAAGGCGAAGCATGTCGCGAGCCAGAAAGTTCTGCTGGAACACGTGACCCCGGCGGTTATCGTTGCGGCG
AGCGCGGCCTGGGCGGCATCCGGCAAGACGAAGCCAAGCCACGGTGTCGACCCGGCTGAGTTTGGAATCGCGCCCGAAATGTTAGAGGGGGTTGAGAAGG
CTTTAGGTGCCGAGATCACGCCCGTAAGCGTGGCAAAACTTAGAGCGGAAGCGGTTTCGGGATCAGATTCAGCTGCAGGAAGTGATAAAACTAAAAGACA
GCAGGACTTAATGAAAGCCCTTCGCGAACAGGCGGAAATAAACCAGTTGCTTCTCGGGGCCGGCCATGAAGCCGAAGTGAATATCAGCATGAAAGAGCGC
TACGTTAGCCTAATGCAGCAGGGATACACTGTGCAGGAAATAGAGGATGGCCAGAAGGTTGTAGCCAAAGGACAACTTGTGCAAGCTCTTCGATTGCAAT
TGAAGGAGCGCGATCGGCTTATTAAGGCGGGCCGTGAAGCCGAAGCGGTAGCCGGCACCGATAAGCTAATCGATAAACTAAAGAAGCTGGGATTCACAGA
GCATGAATTAGCGGATGAGGTAGAGGCAGTGGTTAAAGAGAAAAAGGAGGCAGTGGTTAAAG

Translation

[ - /862]   unknown


Annotator commentaries

After analyzing our results, we conclude that our ORF is probably non-coding, or at least that the database used contains no known sequence with a similar biological function. Since all E-values were high and therefore inconclusive, we cannot deduce more information from these data.

ORF finding

PROTOCOL


a)SMS ORFinder / forward strand / frames 1, 2 & 3 / min 60 AA / 'any codon' initiation / 'standard' genetic code

b)SMS ORFinder / reverse strand / frames 1, 2 & 3 / min 60 AA / 'any codon' initiation / 'standard' genetic code
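The six-frame search done with SMS ORF Finder can be approximated by the Python sketch below. It is not the SMS source code. It assumes that 'any codon' initiation means an ORF is any stop-free run of codons, starting at the beginning of the frame or just after the previous stop codon, and that reverse-strand coordinates are counted on the reverse complement (which is how the reverse-strand positions in the raw results read). The names find_orfs, translate and revcomp are hypothetical.

```python
# Minimal six-frame ORF scan (a sketch, not the SMS ORF Finder source).
# Assumption: with 'any codon' initiation an ORF is any stop-free run of
# codons, starting at the frame start or just after a stop codon.

BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def revcomp(seq):
    """Reverse complement of a DNA sequence (non-ACGT letters become N)."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp.get(b, "N") for b in reversed(seq.upper()))


def translate(seq):
    """Translate codon by codon with the standard code; unknown codons become X."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def find_orfs(seq, min_aa=60):
    """Return (strand, frame, start, end, aa_length, has_stop) tuples.

    Frames and 1-based coordinates refer to the strand being scanned,
    so reverse-strand positions are counted on the reverse complement.
    aa_length excludes the stop codon.
    """
    orfs = []
    for strand, s in (("+", seq.upper()), ("-", revcomp(seq))):
        for offset in range(3):
            start = offset
            last = offset + 3 * ((len(s) - offset) // 3)  # end of last full codon
            for i in range(offset, last, 3):
                if CODON_TABLE.get(s[i:i + 3]) == "*":
                    n_aa = (i - start) // 3
                    if n_aa >= min_aa:
                        orfs.append((strand, offset + 1, start + 1, i + 3, n_aa, True))
                    start = i + 3
            # Trailing ORF that runs off the end of the read (no stop codon).
            n_aa = (last - start) // 3
            if n_aa >= min_aa:
                orfs.append((strand, offset + 1, start + 1, last, n_aa, False))
    return orfs
```

Under these assumptions, running find_orfs(read, min_aa=60) on the 862-bp read would be expected to report the forward frame-1 ORF as ("+", 1, 1, 861, 287, False), i.e. an ORF with no stop codon before the end of the read.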



RESULTS ANALYSIS


a) Forward strand: only one ORF was found.


b) Reverse strand: three ORFs were found.


We chose the ORF on the forward strand because it is the longest one. It extends from base 1 to base 861 but has neither a start codon nor a stop codon, i.e. it is open at both ends of the read.
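The choice of the longest ORF, and the check for start and stop codons, can be sketched as follows. The coordinates are copied from the raw SMS results below; the helper names are hypothetical, and only ATG is treated as a start codon, which is an assumption (bacteria also use GTG and TTG).

```python
# Hypothetical helpers: pick the longest ORF and check whether it is complete.
START_CODONS = {"ATG"}            # assumption: ATG only, alternative starts ignored
STOP_CODONS = {"TAA", "TAG", "TGA"}

# (strand, frame, start, end) from the SMS ORF Finder raw results;
# reverse-strand positions are counted on the reverse complement.
ORFS = [
    ("+", 1, 1, 861),
    ("-", 2, 482, 778),
    ("-", 3, 3, 263),
    ("-", 3, 264, 668),
]


def orf_length(orf):
    """Length in nucleotides (1-based, inclusive coordinates)."""
    strand, frame, start, end = orf
    return end - start + 1


def completeness(orf_dna):
    """Return (has_start, has_stop) for an in-frame ORF sequence."""
    orf_dna = orf_dna.upper()
    return orf_dna[:3] in START_CODONS, orf_dna[-3:] in STOP_CODONS


longest = max(ORFS, key=orf_length)
print(longest, orf_length(longest))   # ('+', 1, 1, 861) 861
```

Applied to the chosen ORF, which begins with GAG and ends with AAA, completeness() returns (False, False).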

RAW RESULTS:

a)forward strand

>ORF number 1 in reading frame 1 on the direct strand extends from base 1 to base 861.
GAGCGATTGGACAAGAACGATCAGCTTATCAAGGCGGGTCGTCAAGACGAAGCGACGGCC
GCTACCGAAAGGCTAATCGATGGGCTAACGAAGTCGGGCTTCACAGATCAGGAAATAGAG
GCTGCGATGAAGGCGGTGGTTAAAGAGAAAAAGGAAAAGAGGGAAGCCGAGAGTTCTAAA
CAACTTATCGCGTATCAGAATGAGCTCTTGCGCCCCCAGTGCCCCGATGCGAAGGCGAAG
CATGTCGCGAGCCAGAAAGTTCTGCTGGAACACGTGACCCCGGCGGTTATCGTTGCGGCG
AGCGCGGCCTGGGCGGCATCCGGCAAGACGAAGCCAAGCCACGGTGTCGACCCGGCTGAG
TTTGGAATCGCGCCCGAAATGTTAGAGGGGGTTGAGAAGGCTTTAGGTGCCGAGATCACG
CCCGTAAGCGTGGCAAAACTTAGAGCGGAAGCGGTTTCGGGATCAGATTCAGCTGCAGGA
AGTGATAAAACTAAAAGACAGCAGGACTTAATGAAAGCCCTTCGCGAACAGGCGGAAATA
AACCAGTTGCTTCTCGGGGCCGGCCATGAAGCCGAAGTGAATATCAGCATGAAAGAGCGC
TACGTTAGCCTAATGCAGCAGGGATACACTGTGCAGGAAATAGAGGATGGCCAGAAGGTT
GTAGCCAAAGGACAACTTGTGCAAGCTCTTCGATTGCAATTGAAGGAGCGCGATCGGCTT
ATTAAGGCGGGCCGTGAAGCCGAAGCGGTAGCCGGCACCGATAAGCTAATCGATAAACTA
AAGAAGCTGGGATTCACAGAGCATGAATTAGCGGATGAGGTAGAGGCAGTGGTTAAAGAG
AAAAAGGAGGCAGTGGTTAAA

>Translation of ORF number 1 in reading frame 1 on the direct strand.
ERLDKNDQLIKAGRQDEATAATERLIDGLTKSGFTDQEIEAAMKAVVKEKKEKREAESSK
QLIAYQNELLRPQCPDAKAKHVASQKVLLEHVTPAVIVAASAAWAASGKTKPSHGVDPAE
FGIAPEMLEGVEKALGAEITPVSVAKLRAEAVSGSDSAAGSDKTKRQQDLMKALREQAEI
NQLLLGAGHEAEVNISMKERYVSLMQQGYTVQEIEDGQKVVAKGQLVQALRLQLKERDRL
IKAGREAEAVAGTDKLIDKLKKLGFTEHELADEVEAVVKEKKEAVVK

No ORFs were found in reading frame 2.

No ORFs were found in reading frame 3.


b)reverse strand

No ORFs were found in reading frame 1.

>ORF number 1 in reading frame 2 on the reverse strand extends from base 482 to base 778.
CATTTCGGGCGCGATTCCAAACTCAGCCGGGTCGACACCGTGGCTTGGCTTCGTCTTGCC
GGATGCCGCCCAGGCCGCGCTCGCCGCAACGATAACCGCCGGGGTCACGTGTTCCAGCAG
AACTTTCTGGCTCGCGACATGCTTCGCCTTCGCATCGGGGCACTGGGGGCGCAAGAGCTC
ATTCTGATACGCGATAAGTTGTTTAGAACTCTCGGCTTCCCTCTTTTCCTTTTTCTCTTT
AACCACCGCCTTCATCGCAGCCTCTATTTCCTGATCTGTGAAGCCCGACTTCGTTAG

>Translation of ORF number 1 in reading frame 2 on the reverse strand.
HFGRDSKLSRVDTVAWLRLAGCRPGRARRNDNRRGHVFQQNFLARDMLRLRIGALGAQEL
ILIRDKLFRTLGFPLFLFLFNHRLHRSLYFLICEARLR*

>ORF number 1 in reading frame 3 on the reverse strand extends from base 3 to base 263.
TTAACCACTGCCTCCTTTTTCTCTTTAACCACTGCCTCTACCTCATCCGCTAATTCATGC
TCTGTGAATCCCAGCTTCTTTAGTTTATCGATTAGCTTATCGGTGCCGGCTACCGCTTCG
GCTTCACGGCCCGCCTTAATAAGCCGATCGCGCTCCTTCAATTGCAATCGAAGAGCTTGC
ACAAGTTGTCCTTTGGCTACAACCTTCTGGCCATCCTCTATTTCCTGCACAGTGTATCCC
TGCTGCATTAGGCTAACGTAG

>Translation of ORF number 1 in reading frame 3 on the reverse strand.
LTTASFFSLTTASTSSANSCSVNPSFFSLSISLSVPATASASRPALISRSRSFNCNRRAC
TSCPLATTFWPSSISCTVYPCCIRLT*

>ORF number 2 in reading frame 3 on the reverse strand extends from base 264 to base 668.
CGCTCTTTCATGCTGATATTCACTTCGGCTTCATGGCCGGCCCCGAGAAGCAACTGGTTT
ATTTCCGCCTGTTCGCGAAGGGCTTTCATTAAGTCCTGCTGTCTTTTAGTTTTATCACTT
CCTGCAGCTGAATCTGATCCCGAAACCGCTTCCGCTCTAAGTTTTGCCACGCTTACGGGC
GTGATCTCGGCACCTAAAGCCTTCTCAACCCCCTCTAACATTTCGGGCGCGATTCCAAAC
TCAGCCGGGTCGACACCGTGGCTTGGCTTCGTCTTGCCGGATGCCGCCCAGGCCGCGCTC
GCCGCAACGATAACCGCCGGGGTCACGTGTTCCAGCAGAACTTTCTGGCTCGCGACATGC
TTCGCCTTCGCATCGGGGCACTGGGGGCGCAAGAGCTCATTCTGA

>Translation of ORF number 2 in reading frame 3 on the reverse strand.
RSFMLIFTSASWPAPRSNWFISACSRRAFIKSCCLLVLSLPAAESDPETASALSFATLTG
VISAPKAFSTPSNISGAIPNSAGSTPWLGFVLPDAAQAALAATITAGVTCSSRTFWLATC
FAFASGHWGRKSSF*

Multiple Alignment

PROTOCOL



RESULTS ANALYSIS


Since we did not obtain significant E-values from BLAST, we could not select ingroup and outgroup sequences, and without them we cannot perform a multiple alignment with our ORF.
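Selecting ingroup candidates usually starts by filtering BLAST hits on an E-value cut-off. The sketch below parses one-line hit summaries like those in the BLAST raw results and keeps hits at or below a cut-off; the 1e-5 threshold and the function names are assumptions, not part of the Annotathon protocol.

```python
# Sketch: keep only BLAST hits significant enough to be candidate ingroup
# sequences. The 1e-5 cut-off is an assumed, commonly used value.

def parse_hits(lines):
    """Return (accession, bit_score, evalue) from BLAST one-line summaries."""
    hits = []
    for line in lines:
        tokens = line.split()
        if len(tokens) < 3:
            continue
        evalue_txt = tokens[-1]
        if evalue_txt.startswith("e"):      # older BLAST prints e.g. "e-150"
            evalue_txt = "1" + evalue_txt
        try:
            hits.append((tokens[0], float(tokens[-2]), float(evalue_txt)))
        except ValueError:
            continue                        # header or other non-hit line
    return hits


def candidate_ingroup(hits, max_evalue=1e-5):
    """Hits whose E-value is at or below the cut-off."""
    return [hit for hit in hits if hit[2] <= max_evalue]
```

With our hits, the best being 1.6 for BLASTp and 0.32 for BLASTx, candidate_ingroup() returns an empty list.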

RAW RESULTS

Protein Domains

PROTOCOL



RESULTS ANALYSIS


Since the E-values we obtained were inconclusive (e.g. 1.6), we could not find any homologous sequence, and consequently we obtained no results.

RAW RESULTS:

No results available.

Phylogeny

PROTOCOL



RESULTS ANALYSIS


Since we could not identify ingroup and outgroup sequences, we did not have the data needed to build a phylogenetic tree.

RAW RESULTS

Taxonomy report

PROTOCOL


a)BLASTp versus NR, NCBI default parameters apart from "1000 Max target sequences"

b)BLASTx versus NR, NCBI default parameters apart from "1000 Max target sequences"



RESULTS ANALYSIS


With E-values this high, we cannot confidently assign our sequence to any of the taxa listed below.
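One way to formalise this caution is a lowest-common-ancestor (LCA) assignment, the approach used by tools such as MEGAN: a read is placed at the deepest taxon shared by all hits that pass an E-value cut-off, and left unassigned if none pass. This is a sketch under assumptions (1e-5 cut-off, lineages simplified to two ranks), not the method used for this report; the five hits and their E-values are the top BLASTp hits from the raw results.

```python
# Sketch of an LCA taxonomic assignment (assumed cut-off and simplified lineages).

def lca(lineages):
    """Longest common prefix of several lineage tuples."""
    common = []
    for ranks in zip(*lineages):
        if all(r == ranks[0] for r in ranks):
            common.append(ranks[0])
        else:
            break
    return tuple(common)


def assign(hits, max_evalue=1e-5):
    """hits: list of (evalue, lineage). Returns the LCA taxon or 'unassigned'."""
    kept = [lineage for evalue, lineage in hits if evalue <= max_evalue]
    common = lca(kept) if kept else ()
    return common[-1] if common else "unassigned"


HITS = [
    (1.6, ("cellular organisms", "Bacteria")),   # Dethiosulfovibrio peptidovorans
    (2.5, ("cellular organisms", "Bacteria")),   # Burkholderia sp. CCGE1002
    (3.3, ("cellular organisms", "Bacteria")),   # Ruminococcus flavefaciens FD-1
    (3.5, ("cellular organisms", "Eukaryota")),  # Solanum demissum
    (4.1, ("cellular organisms", "Eukaryota")),  # Anopheles gambiae str. PEST
]
```

With the default cut-off the read stays unassigned; even if every hit were accepted (max_evalue=10), the LCA would only be "cellular organisms", because the hits are split between Bacteria and Eukaryota.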

RAW RESULTS:

a)
cellular organisms
. Bacteria           [bacteria]
. . Dethiosulfovibrio peptidovorans DSM 11002 -   37 2 hits [bacteria]            DNA polymerase III, beta subunit [Dethiosulfovibrio peptido
. . Burkholderia sp. CCGE1002 .................   36 2 hits [b-proteobacteria]    putative serine/threonine protein kinase [Burkholderia sp. 
. . Ruminococcus flavefaciens FD-1 ............   36 1 hit  [firmicutes]          carboxynorspermidine decarboxylase [Ruminococcus flavefacie
. . Bacteroides fragilis NCTC 9343 ............   36 2 hits [CFB group bacteria]  plasmid transfer protein [Bacteroides fragilis NCTC 9343] >
. . Nocardioides sp. JS614 ....................   35 2 hits [high GC Gram+]       acetyl-CoA acetyltransferases [Nocardioides sp. JS614] >gi|
. . Bacteroides sp. 1_1_6 .....................   35 2 hits [CFB group bacteria]  plasmid transfer protein [Bacteroides sp. 1_1_6] >gi|251837
. Solanum demissum ----------------------------   36 3 hits [eudicots]            Plant disease resistant protein, putative [Solanum demissum
. Anopheles gambiae str. PEST .................   36 2 hits [flies]               AGAP003166-PA [Anopheles gambiae str. PEST] >gi|157017731|g
. Branchiostoma floridae ......................   35 2 hits [lancelets]           hypothetical protein BRAFLDRAFT_213083 [Branchiostoma flori

---------------------------------------------------------------------------------------------------------------------------------------------


b)
cellular organisms
. Bacteria           [bacteria]
. . Streptococcus      [firmicutes]
. . . Streptococcus sanguinis SK36 ---------   40  2 hits [firmicutes]        platelet-binding glycoprotein [Streptococcus sanguinis SK36
. . . Streptococcus pyogenes ...............   38  1 hit  [firmicutes]        streptococcal protective antigen [Streptococcus pyogenes]
. . . Streptococcus pneumoniae TIGR4 .......   36  6 hits [firmicutes]        cell wall surface anchor family protein [Streptococcus pneu
. . . Streptococcus pneumoniae 70585 .......   35  2 hits [firmicutes]        cell wall surface anchor family protein [Streptococcus pneu
. . . Streptococcus pneumoniae CGSP14 ......   35  2 hits [firmicutes]        cell wall surface anchor family protein [Streptococcus pneu
. . Rhizobium etli GR56 --------------------   37  1 hit  [a-proteobacteria]  Mandelate racemase/muconate lactonizing protein [Rhizobium 
. . Synechococcus sp. RCC307 ...............   36  3 hits [cyanobacteria]     trigger factor [Synechococcus sp. RCC307] >gi|172047799|sp|
. . Edwardsiella tarda EIB202 ..............   36  2 hits [enterobacteria]    cell division protein [Edwardsiella tarda EIB202] >gi|26798
. . Salinispora tropica CNB-440 ............   36  4 hits [high GC Gram+]     hypothetical protein Strop_1435 [Salinispora tropica CNB-44
. Toxoplasma gondii RH ---------------------   38  1 hit  [apicomplexans]     hyothetical protein [Toxoplasma gondii RH]
. Caenorhabditis elegans (nematode) ........   38  2 hits [nematodes]         DumPY : shorter than wild-type family member (dpy-6) [Caeno
. Leishmania major strain Friedlin .........   38 18 hits [kinetoplastids]    proteophosphoglycan ppg4 [Leishmania major strain Friedlin]
. Pyrococcus horikoshii OT3 ................   38  1 hit  [euryarchaeotes]    173aa long hypothetical protein [Pyrococcus horikoshii OT3]
. Toxoplasma gondii ME49 ...................   37  2 hits [apicomplexans]     hypothetical protein TGME49_111270 [Toxoplasma gondii ME49]
. Toxoplasma gondii VEG ....................   37  1 hit  [apicomplexans]     hypothetical protein TGME49_111270 [Toxoplasma gondii ME49]
. Toxoplasma gondii GT1 ....................   36  1 hit  [apicomplexans]     conserved hypothetical protein [Toxoplasma gondii GT1]
. Leishmania braziliensis MHOM/BR/75/M2904 .   36  8 hits [kinetoplastids]    proteophosphoglycan ppg4 [Leishmania braziliensis MHOM/BR/7
. Leishmania braziliensis ..................   36  8 hits [kinetoplastids]    proteophosphoglycan ppg4 [Leishmania braziliensis MHOM/BR/7
. Danio rerio (zebra fish) .................   35  1 hit  [bony fishes]       PREDICTED: hypothetical protein, partial [Danio rerio]
. Ricinus communis .........................   35  2 hits [eudicots]          structural constituent of nuclear pore, putative [Ricinus c
. Drosophila erecta ........................   35  2 hits [flies]             GG24688 [Drosophila erecta] >gi|190659978|gb|EDV57170.1| GG
. Leishmania infantum JPCM5 ................   35  1 hit  [kinetoplastids]    proteophosphoglycan ppg4 [Leishmania infantum] >gi|13407323
. Leishmania infantum ......................   35  1 hit  [kinetoplastids]    proteophosphoglycan ppg4 [Leishmania infantum] >gi|13407323

BLAST

PROTOCOL


a)BLASTp versus NR, NCBI default parameters apart from "Number of descriptions" set to 1000

b)BLASTx versus NR, NCBI default parameters apart from "Number of descriptions" set to 1000
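For reproducibility, the web searches above could be expressed as BLAST+ command lines. This sketch assumes BLAST+ is installed, that the web "Number of descriptions" setting maps to the BLAST+ option -num_descriptions, and that -remote is used to search NCBI's nr; the FASTA file names are hypothetical.

```python
import shlex


def blast_command(program, query_fasta, descriptions=1000, remote=True):
    """Build a BLAST+ argument list for blastp/blastx against nr."""
    cmd = [program, "-query", query_fasta, "-db", "nr",
           "-num_descriptions", str(descriptions)]
    if remote:
        cmd.append("-remote")  # run the search on NCBI servers
    return cmd


# a) protein query (ORF translation), b) nucleotide query (the read)
for program, query in [("blastp", "orf1_protein.fasta"), ("blastx", "read.fasta")]:
    print(shlex.join(blast_command(program, query)))
```

The returned list can be passed directly to subprocess.run(cmd, check=True); building a list rather than a single string avoids shell-quoting problems with file names.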



RESULTS ANALYSIS


In the BLASTp analysis we obtained only a few hits, all with high E-values (e.g. 1.6, 2.5 and 3.3), so probably no sequence with a similar biological function was found. However, we cannot conclude whether or not the sequence has a biological function. We also ran BLASTx to obtain more information, but it was also inconclusive, since the best E-value was 0.32.
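To make "high E-value" concrete: in BLAST statistics the E-value is the number of hits with at least this score expected by chance in a search of this size, and it converts to a P-value by P = 1 - exp(-E). The short sketch below applies this to the E-values quoted above; the function name is hypothetical.

```python
import math


def evalue_to_pvalue(evalue):
    """P = 1 - exp(-E): probability of at least one chance hit scoring this well."""
    return 1.0 - math.exp(-evalue)


for label, e in [("BLASTp best", 1.6), ("BLASTp 2nd", 2.5),
                 ("BLASTp 3rd", 3.3), ("BLASTx best", 0.32)]:
    print(f"{label}: E = {e} -> P = {evalue_to_pvalue(e):.2f}")
```

For E = 1.6 this gives P of about 0.80, and even the best BLASTx hit (E = 0.32) gives P of about 0.27, roughly a one-in-four chance that such a hit arises at random.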

RAW RESULTS:

a)
                                                      Score     E
Sequences producing significant alignments:                       (Bits)  Value

ref|ZP_06392035.1|  DNA polymerase III, beta subunit [Dethiosu...  37.7    1.6  
ref|ZP_06224163.1|  putative serine/threonine protein kinase [...  37.0    2.5  
ref|ZP_06145743.1|  carboxynorspermidine decarboxylase [Rumino...  36.6    3.3  
gb|AAT40545.2|  Plant disease resistant protein, putative [Sol...  36.6    3.5  
ref|XP_312857.4|  AGAP003166-PA [Anopheles gambiae str. PEST] ...  36.2    4.1  
ref|YP_209738.1|  plasmid transfer protein [Bacteroides fragil...  36.2    4.7  
ref|XP_002589244.1|  hypothetical protein BRAFLDRAFT_213083 [B...  35.4    6.8  
gb|ABV29169.1|  disease resistance protein R3a-like protein [S...  35.4    7.5  
ref|YP_922219.1|  acetyl-CoA acetyltransferases [Nocardioides ...  35.4    7.8  
ref|ZP_04850359.1|  plasmid transfer protein [Bacteroides sp. ...  35.0    8.6  

ALIGNMENTS
>ref|ZP_06392035.1| DNA polymerase III, beta subunit [Dethiosulfovibrio peptidovorans 
DSM 11002]
 gb|EFC90976.1| DNA polymerase III, beta subunit [Dethiosulfovibrio peptidovorans 
DSM 11002]
Length=386

 Score = 37.7 bits (86),  Expect = 1.6, Method: Compositional matrix adjust.
 Identities = 28/81 (34%), Positives = 40/81 (49%), Gaps = 3/81 (3%)

Query  120  EFGIAPEMLEGVEKALGAEITPVSVAKLRAEAVSG---SDSAAGSDKTKRQQDLMKALRE  176
            E GIA  M E   K LGAE+  V   +    +  G   S S A  DK   +QD++  L  
Sbjct  142  EGGIAGSMGEEFPKYLGAELLQVKSGECHCVSTDGRRLSLSKAYVDKENPEQDMLLPLTS  201

Query  177  QAEINQLLLGAGHEAEVNISM  197
              E  ++L G G + +VN+S+
Sbjct  202  VREFLRILSGLGEDLQVNVSV  222


>ref|ZP_06224163.1| putative serine/threonine protein kinase [Burkholderia sp. CCGE1002]
 gb|EFA56731.1| putative serine/threonine protein kinase [Burkholderia sp. CCGE1002]
Length=361

 Score = 37.0 bits (84),  Expect = 2.5, Method: Compositional matrix adjust.
 Identities = 23/67 (34%), Positives = 34/67 (50%), Gaps = 2/67 (2%)

Query  92   VTPAVIVAASAAWA--ASGKTKPSHGVDPAEFGIAPEMLEGVEKALGAEITPVSVAKLRA  149
            VTP+   +ASA+W   A+G  KP   V   +    PEML  +E+ L   + P++   +R 
Sbjct  161  VTPSAYCSASASWPTHATGGGKPPVPVPSVQRSWPPEMLARIERQLATHMGPLASLLVRR  220

Query  150  EAVSGSD  156
             A   SD
Sbjct  221  AATQASD  227


>ref|ZP_06145743.1| carboxynorspermidine decarboxylase [Ruminococcus flavefaciens 
FD-1]
Length=378

 Score = 36.6 bits (83),  Expect = 3.3, Method: Compositional matrix adjust.
 Identities = 28/114 (24%), Positives = 52/114 (45%), Gaps = 13/114 (11%)

Query  101  SAAWAASGKTKPSHGVDPAEFGIAPEMLEGVEKALGAEI-------TPVSVAKLRAEAVS  153
            S  +  +G   P + +D A      E+L+GVE   GA+I       +   V  L AE +S
Sbjct  2    SVNFDTAGLPTPCYIIDEARLIHNLEILKGVEDRTGAKILLAQKAFSCYHVYPLIAEYIS  61

Query  154  GSDSAA------GSDKTKRQQDLMKALREQAEINQLLLGAGHEAEVNISMKERY  201
            G+  +       G ++  ++  +  A   ++EI++++   GH    + S  +RY
Sbjct  62   GTACSGLFEAKLGYEEMGKENHVFSAAYRESEIDEIISYCGHIIFNSFSQLDRY  115


>gb|AAT40545.2| Plant disease resistant protein, putative [Solanum demissum]
 gb|ABV29181.1| disease resistance protein R3a-like protein [Solanum demissum]
Length=1314

 Score = 36.6 bits (83),  Expect = 3.5, Method: Composition-based stats.
 Identities = 30/123 (24%), Positives = 57/123 (46%), Gaps = 3/123 (2%)

Query  146  KLRAEAVSGSDSAAGSDKTKRQQDLMKALREQAEINQLLLGAGHEAEV-NISMKERYVSL  204
            K   E +S   S + +D +K ++D++  L+    IN+L +G     +  N    + ++ L
Sbjct  729  KNHVEMLSLEWSRSIADNSKNEKDILDGLQPNTNINELQIGGYRGTKFPNWLADQSFLKL  788

Query  205  MQQGYTVQEIEDGQKVVAKGQLVQALRLQLKERDRLIKAGREAEAVAGTDKLIDKLKKLG  264
            +Q   ++   +D   + A GQL     L ++   R+I+   E      + K  + L+KL 
Sbjct  789  VQ--LSLSNCKDCDSLPALGQLPSLKFLAIRRMRRIIEVTEEFYGSLSSKKPFNSLEKLE  846

Query  265  FTE  267
            F E
Sbjct  847  FAE  849


>ref|XP_312857.4| AGAP003166-PA [Anopheles gambiae str. PEST]
 gb|EAA08490.4| AGAP003166-PA [Anopheles gambiae str. PEST]
Length=1556

 Score = 36.2 bits (82),  Expect = 4.1, Method: Compositional matrix adjust.
 Identities = 47/204 (23%), Positives = 81/204 (39%), Gaps = 24/204 (11%)

Query  20   AATERLIDGLTKSGFTDQEIEAAMKAVVKEKKEKREAESSKQLI-AYQNELLR---PQCP  75
             AT+RLI  LT     D      +K +  ++  +  ++   +++  ++N  L    PQ P
Sbjct  563  CATDRLICSLT----FDARFNGGIKQISLDRSSECVSKFETEIVRNFKNYFLLQPIPQSP  618

Query  76   DAKAKHVAS-----------QKVLLEHVTPAVIVAASAAWAASGKT-KPSHGVDPAEFG-  122
            +  +    S            + +++   P +  A    W   GK  +P H    AEFG 
Sbjct  619  NTMSNTSKSLPCEAENSSLLDETVIDAAVPWIEQAERERWEEKGKIYEPEHEAIFAEFGE  678

Query  123  ---IAPEMLEGVEKALGAEITPVSVAKLRAEAVSGSDSAAGSDKTKRQQDLMKALREQAE  179
                  E+L   E+A   E  P+ V  L AEA       A   K + ++ L + +  Q +
Sbjct  679  IKHQLKELLNHNERAPVEERFPLQVFNLNAEATEKLTQEANHTKDEEKKRLQQFIESQKD  738

Query  180  INQLLLGAGHEAEVNISMKERYVS  203
            IN+ L+            K RY++
Sbjct  739  INEKLIENCWMIMTKKPWKIRYLN  762


>ref|YP_209738.1| plasmid transfer protein [Bacteroides fragilis NCTC 9343]
 emb|CAH05759.1| plasmid transfer protein [Bacteroides fragilis NCTC 9343]
Length=946

 Score = 36.2 bits (82),  Expect = 4.7, Method: Composition-based stats.
 Identities = 19/58 (32%), Positives = 34/58 (58%), Gaps = 3/58 (5%)

Query  1    ERLDKNDQLIKAGRQDEATAATERLIDGLTKSGFTDQEI---EAAMKAVVKEKKEKRE  55
            E++DK  QL++ G   E T A   + + L + GFT QE+   E  + +V++ + +K+E
Sbjct  567  EKIDKLQQLVEKGEGGEKTNAERAIQNILIEKGFTRQELDNPETRLLSVIERRIQKKE  624


>ref|XP_002589244.1| hypothetical protein BRAFLDRAFT_213083 [Branchiostoma floridae]
 gb|EEN45255.1| hypothetical protein BRAFLDRAFT_213083 [Branchiostoma floridae]
Length=707

 Score = 35.4 bits (80),  Expect = 6.8, Method: Compositional matrix adjust.
 Identities = 33/129 (25%), Positives = 60/129 (46%), Gaps = 16/129 (12%)

Query  138  EITPVSVAKLRAEAVSGSDSAAGSDKTKRQQDLMKALREQAE-INQLLLGAGHEAEVNIS  196
             ++P +V  +++EA+     ++G D T    DL + +    +  N+ LLG    A  N  
Sbjct  261  NVSPKTVTMMQSEALHFDKQSSGKDVTIAPADLYRKIETVLQYFNKHLLGIDVGAPDN--  318

Query  197  MKERYVSLMQQGYTVQEIEDGQKVVAKGQLVQALRLQLKERDRLIKAGREAEAVAGTDKL  256
                  SLM   ++    ED  + V KG LV ++    +E +R +      E ++ T++ 
Sbjct  319  ------SLMSM-FSDFLWEDLSQAVIKGCLVHSIPRTSQELERYM------EVISATEEF  365

Query  257  IDKLKKLGF  265
            +  L+ LGF
Sbjct  366  VSSLENLGF  374


>gb|ABV29169.1| disease resistance protein R3a-like protein [Solanum demissum]
Length=1306

 Score = 35.4 bits (80),  Expect = 7.5, Method: Composition-based stats.
 Identities = 29/123 (23%), Positives = 58/123 (47%), Gaps = 3/123 (2%)

Query  146  KLRAEAVSGSDSAAGSDKTKRQQDLMKALREQAEINQLLLGAGHEAEV-NISMKERYVSL  204
            K   E +S   S + +D +K +++++  L+    IN+L +G     +  N    + ++ L
Sbjct  729  KNHVEMLSLEWSRSIADNSKNEKEILDGLQPNTNINELQIGGYRGTKFPNWLADQSFLKL  788

Query  205  MQQGYTVQEIEDGQKVVAKGQLVQALRLQLKERDRLIKAGREAEAVAGTDKLIDKLKKLG  264
            +Q   ++   +D   + A GQL     L ++   R+I+  +E      + K  + L+KL 
Sbjct  789  VQ--LSLSNCKDCDSLPALGQLPSLKFLAIRRMHRIIEVTQEFYGSLSSKKPFNSLEKLE  846

Query  265  FTE  267
            F E
Sbjct  847  FAE  849


>ref|YP_922219.1| acetyl-CoA acetyltransferases [Nocardioides sp. JS614]
 gb|ABL80532.1| acetyl-CoA acetyltransferases [Nocardioides sp. JS614]
Length=404

 Score = 35.4 bits (80),  Expect = 7.8, Method: Compositional matrix adjust.
 Identities = 36/129 (27%), Positives = 57/129 (44%), Gaps = 15/129 (11%)

Query  14   RQDEATAATERLIDGLTKSGFTDQEIEAAMKAVVKEKKEKREAESSKQLIAYQNELLRPQ  73
            RQDE  A + RL D    SGF D ++   ++ V  ++ E     SS + +A    + RP+
Sbjct  181  RQDEFAARSHRLADAAWTSGFYD-DLVVPVEGVDLDRDEGIRPGSSAERLAGLRPVFRPE  239

Query  74   CPDAKAKHVASQKVLLEHVTPAVIVAASAAWAASGK-------TKPSHGVDPAEFGIAPE  126
                     A     L     AV++ + AA A  G+        + +  V+P  FG AP 
Sbjct  240  -----GTITAGNASPLNDGASAVLLGSEAAAATIGRDPVARIAGRGASAVEPQWFGYAP-  293

Query  127  MLEGVEKAL  135
             +E  ++AL
Sbjct  294  -VEAADRAL  301


>ref|ZP_04850359.1| plasmid transfer protein [Bacteroides sp. 1_1_6]
 gb|EES65583.1| plasmid transfer protein [Bacteroides sp. 1_1_6]
Length=946

 Score = 35.0 bits (79),  Expect = 8.6, Method: Composition-based stats.
 Identities = 19/58 (32%), Positives = 33/58 (56%), Gaps = 3/58 (5%)

Query  1    ERLDKNDQLIKAGRQDEATAATERLIDGLTKSGFTDQEI---EAAMKAVVKEKKEKRE  55
            E++DK  QL++ G   E T A   + + L   GFT QE+   E  + +V++ + +K+E
Sbjct  567  EKIDKLQQLVEKGEGGEKTNAERAIQNILIGKGFTRQELDNPETRLLSVIERRIQKKE  624

----------------------------------------------------------------------------------

b)

                                                      Score     E
Sequences producing significant alignments:                       (Bits)  Value

ref|YP_001034807.1|  platelet-binding glycoprotein [Streptococ...  40.0    0.32 
gb|ACD81471.1|  streptococcal protective antigen [Streptococcu...  38.9    0.72 
emb|CAJ20290.1|  hyothetical protein [Toxoplasma gondii RH]        38.5    0.94 
ref|NP_509435.2|  DumPY : shorter than wild-type family member...  38.5    0.94 
ref|XP_843162.1|  proteophosphoglycan ppg4 [Leishmania major s...  38.1    1.2  
dbj|BAA29086.1|  173aa long hypothetical protein [Pyrococcus h...  38.1    1.2  
ref|XP_002364369.1|  hypothetical protein TGME49_111270 [Toxop...  37.7    1.6  
ref|XP_843163.1|  proteophosphoglycan 5 [Leishmania major stra...  37.7    1.6  
ref|ZP_03522814.1|  Mandelate racemase/muconate lactonizing pr...  37.4    2.1  
gb|EEE25672.1|  conserved hypothetical protein [Toxoplasma gon...  37.0    2.7  
ref|YP_001226323.1|  trigger factor [Synechococcus sp. RCC307]...  37.0    2.7  
ref|YP_003296245.1|  cell division protein [Edwardsiella tarda...  36.6    3.6  
ref|XP_001568166.1|  proteophosphoglycan ppg4 [Leishmania braz...  36.6    3.6  
ref|YP_001158279.1|  hypothetical protein Strop_1435 [Salinisp...  36.2    4.7  
ref|XP_001568167.1|  proteophosphoglycan ppg4 [Leishmania braz...  36.2    4.7  
ref|XP_001568165.1|  proteophosphoglycan ppg3 [Leishmania braz...  36.2    4.7  
ref|NP_346206.1|  cell wall surface anchor family protein [Str...  36.2    4.7  
ref|YP_002741036.1|  cell wall surface anchor family protein [...  35.8    6.1  
ref|XP_001923127.1|  PREDICTED: hypothetical protein, partial ...  35.8    6.1  
ref|XP_002511630.1|  structural constituent of nuclear pore, p...  35.4    8.0  
ref|XP_001968111.1|  GG24688 [Drosophila erecta] >gb|EDV57170....  35.4    8.0  
ref|YP_001836467.1|  cell wall surface anchor family protein [...  35.4    8.0  
ref|XP_001468867.1|  proteophosphoglycan ppg4 [Leishmania infa...  35.4    8.0  

ALIGNMENTS
>ref|YP_001034807.1| platelet-binding glycoprotein [Streptococcus sanguinis SK36]
 gb|ABN44257.1| Platelet-binding glycoprotein [Streptococcus sanguinis SK36]
Length=1625

 Score = 40.0 bits (92),  Expect = 0.32
 Identities = 42/148 (28%), Positives = 68/148 (45%), Gaps = 9/148 (6%)
 Frame = -2

Query  812   ANSCSVNPSFFSLSISLSVPATASASRPALISRSRSFNCNRRACTSCPLATTFWPSSISC  633
             + S SV+ S  S S S SV A+ SAS  A +S S S + +  A  S   +T+   S+   
Sbjct  1354  STSASVSAST-SASTSASVSASTSASTSASVSASESASTSSSAKASESASTSASVSASES  1412

Query  632   TVYPCCIRLT*RSFMLIFTSASWPAPRSNWFISACSRRAFIKSCCLLVLSLPAAESDPET  453
                   + ++         SAS  A  S    ++ S             S  A+ES    
Sbjct  1413  ASTSASVSVS--------ESASTSASVSTSTSASTSASVSASESASTSASASASESTSVR  1464

Query  452   ASALSFATLTGVISAPKAFSTPSNISGA  369
             ASAL+ A+++  +SA ++ ST +++SG+
Sbjct  1465  ASALALASISASVSASESASTSASLSGS  1492


>gb|ACD81471.1| streptococcal protective antigen [Streptococcus pyogenes]
Length=553

 Score = 38.9 bits (89),  Expect = 0.72
 Identities = 41/157 (26%), Positives = 73/157 (46%), Gaps = 16/157 (10%)
 Frame = +1

Query  343  GVDPAEFGIAPEMLEGVEKALGAEITPVSVAKLRAEAVsgsdsaagsdKTKRQQDLMKAL  522
            G+DPA FG +   LE   + LG+  + VS  +  +  +        + + K +Q L + L
Sbjct  98   GIDPARFGYSNSQLEFYSRQLGSLNSGVSDWQQGSVNLKTLLIEEFAKRIKSEQKLKEVL  157

Query  523  REQAEINQLLLGAGHEAEVNISMKERYVSLMQQ-----GYTVQEIEDGQKV--VAKGQLV  681
             EQA   +L L    E+E+  + KER  S ++          +E++D ++        + 
Sbjct  158  AEQAA--ELELRKSEESELK-TQKERLESKLENAEYATAIKQKELDDAKEANKTLSESIA  214

Query  682  QALRLQLKERDR----LIKAGREAEAVAGTDKLIDKL  780
            + L    KE+D+    L+K   +A  +A   +L+DKL
Sbjct  215  KTLSRSTKEKDKLKEELVKEKTKAAKIA--KELMDKL  249


>emb|CAJ20290.1| hyothetical protein [Toxoplasma gondii RH]
Length=1821

 Score = 38.5 bits (88),  Expect = 0.94
 Identities = 38/164 (23%), Positives = 65/164 (39%), Gaps = 21/164 (12%)
 Frame = -2

Query  800  SVNPSFFSLSISLSVPATASASRPALISRSRSFNCNRRACTSCPLATTFWPSSISCTVYP  621
            S +PS +  S+S S  A  S+   +    S SF+C+  AC+ C  ++    S   C    
Sbjct  502  SCHPSSYPPSVSPSFCAPCSSFCSSSSFSSSSFSCSSSACSGCSFSSCSSSSCSGCLFSS  561

Query  620  CCIRLT*RSFMLIFTSASW------PAPRSNWFISACSRRAFIKSCCLLVLSLPAAESDP  459
            C             +S+SW          S+   S+CS  +F            ++    
Sbjct  562  C-------------SSSSWSGCSFSSCSSSSCSSSSCSGCSFSSCSSSSCSGCSSSSCSS  608

Query  458  ETASALSFATLTGVISAPKAFS--TPSNISGAIPNSAGSTPWLG  333
             + S  SF++ +    +  +FS  + S+ SG   +S  S+ W G
Sbjct  609  SSWSGCSFSSCSSSSCSGCSFSSCSSSSCSGCSSSSCSSSSWSG  652


>ref|NP_509435.2| DumPY : shorter than wild-type family member (dpy-6) [Caenorhabditis 
elegans]
 gb|AAB07691.2| Dumpy : shorter than wild-type protein 6 [Caenorhabditis elegans]
Length=1254

 Score = 38.5 bits (88),  Expect = 0.94
 Identities = 44/146 (30%), Positives = 69/146 (47%), Gaps = 13/146 (8%)
 Frame = +1

Query  340   HGVDPAEFGIAPE--MLEGVEKALGAEITPVSVAKLRAEAVsgsdsaagsdKTKRQQDLM  513
             H    A F  A E  + +G +K +  E  P    + RA+  +  D     +K  R+Q + 
Sbjct  1048  HTTTDAAFVTATEASLNDGSDKKIIDEAQPTDEIR-RAQPTNEMDKEMEFEKRIREQRIQ  1106

Query  514   ----KALREQAEINQLLLGAGHEAEVNISMKERYVSLMQQGYTVQEIEDGQKVVAKGQLV  681
                 K LRE+  + + L     E +    M ER   ++QQ   ++E E+ Q+V     L+
Sbjct  1107  MEQAKRLREEELLEKQLQEQEIEEKARNEMIERKQKMLQQLEELKEAEERQRV-----LL  1161

Query  682   QALRLQLKERDRLIKAGREAEAVAGT  759
             +  RLQ +ER RLI A +EAE   G+
Sbjct  1162  EQERLQEQERQRLI-AEKEAEIAFGS  1186


>ref|XP_843162.1| proteophosphoglycan ppg4 [Leishmania major strain Friedlin]
 gb|AAZ14280.1| proteophosphoglycan ppg4 [Leishmania major strain Friedlin]
Length=7194

 Score = 38.1 bits (87),  Expect = 1.2
 Identities = 44/147 (29%), Positives = 68/147 (46%), Gaps = 20/147 (13%)
 Frame = -2

Query  773   SISLSVPATASASRPALISRSRSFNCNRR---ACTSCPLATTFWPSSISCTVYPCCIRLT  603
             S S S P+ +S+S P+  S + S + +     + +S PLA++    S S +  P      
Sbjct  3334  SSSSSAPSASSSSAPSSSSSAPSASSSSAPSSSSSSAPLASSSSAPSSSSSSAPSA----  3389

Query  602   *RSFMLIFTSASWPAPRSNWFISACSRRAFIKSCCLLVLSLPAAESDPETASALSFATLT  423
                     +S+S P+  S+   SA S  A   S      S P+A S    +S+ S A L 
Sbjct  3390  --------SSSSAPSSSSSSAPSASSSSAPSSSSS----SAPSASSSSAPSSSSSSAPLA  3437

Query  422   GVISAPKAFST-PSNISGAIPNSAGST  345
                SAP + ST PS  S + P+S+ S+
Sbjct  3438  SSSSAPSSSSTAPSASSSSAPSSSSSS  3464


 Score = 35.8 bits (81),  Expect = 6.1
 Identities = 45/158 (28%), Positives = 68/158 (43%), Gaps = 34/158 (21%)
 Frame = -2

Query  812  ANSCSVNPSFFSLSISLSVPATASASRPALISRSRSFNCNRRACTSCPLATTFWPSSISC  633
            +NSC  +P+    S S S P+ +S+S P+  S            +S P A++    S S 
Sbjct  308  SNSCEKHPT----SSSSSAPSASSSSAPSSSS------------SSAPSASSSSAPSSSS  351

Query  632  TVYPCCIRLT*RSFMLIFTSASWPAPRSNWFISACSRRAFIKSCCLLVLSLPAAESDPET  453
            +  P              +S+S P+  S+   SA S  A   S      S P+A S    
Sbjct  352  SSAPSA------------SSSSAPSSSSSSAPSASSSSAPSSSS-----SAPSASSSSAP  394

Query  452  ASALSFATLTGVISAPKAFST-PSNISGAIPNSAGSTP  342
            +S+ S A      SAP + S+ PS  S + P+S+ S P
Sbjct  395  SSSSSSAPSVSSSSAPSSSSSAPSASSSSAPSSSSSAP  432


 Score = 35.4 bits (80),  Expect = 8.0
 Identities = 45/153 (29%), Positives = 69/153 (45%), Gaps = 13/153 (8%)
 Frame = -2

Query  800   SVNPSFFSLSISLSVPATASASRPALISRSRSFNCNRRACTSCPLATTFWPSSISCTVYP  621
             S + S    S S S P+ +S+S P+    S S +    + +S P +++    S S +  P
Sbjct  1232  SASSSSAPSSSSSSAPSASSSSAPS----SSSSSAPSASSSSAPSSSSSTAPSASSSSAP  1287

Query  620   CCIRLT*RSFMLIFTSASWPAPRSNWFISACSRRAFIKSCCLLVLSLPAAESDPETASAL  441
                  T  S     +S+S P+  S+   SA S  A   S      S P+A S    +S+ 
Sbjct  1288  SSSSSTAPSA----SSSSAPSSSSSSAPSASSSSAPSSSSS----SAPSASSSSAPSSSS  1339

Query  440   SFATLTGVISAPKAFST-PSNISGAIPNSAGST  345
             S A      SAP + ST PS  S + P+S+ S+
Sbjct  1340  SSAPSASSSSAPSSSSTAPSASSSSAPSSSSSS  1372


>dbj|BAA29086.1| 173aa long hypothetical protein [Pyrococcus horikoshii OT3]
Length=173

 Score = 38.1 bits (87),  Expect = 1.2
 Identities = 33/113 (29%), Positives = 55/113 (48%), Gaps = 5/113 (4%)
 Frame = -2

Query  806  SCSVNPSFFSLSISLSVPATASASRPALISRSRSFNCNRRACTSCPLATTFWPSSISCTV  627
            +CS+ P+ F+ S SL + + + A  PA+I  S S +      +S  L+T    SS +C+V
Sbjct  43   NCSLPPTSFAYSSSLMLSSISLAPPPAIIFPSSSMSLTTLMASSRALST----SSTTCSV  98

Query  626  YPCCIRLT*RSFMLIFTSASW-PAPRSNWFISACSRRAFIKSCCLLVLSLPAA  471
             P    +T   F+   T   + PA   +   SA  R + ++S  L+ +  P A
Sbjct  99   PPLMSIVTALGFLQPSTKIMFSPATFLSSTSSASPRSSGVRSLMLVTILAPVA  151


>ref|XP_002364369.1| hypothetical protein TGME49_111270 [Toxoplasma gondii ME49]
 gb|EEA97228.1| hypothetical protein TGME49_111270 [Toxoplasma gondii ME49]
 gb|EEE32844.1| conserved hypothetical protein [Toxoplasma gondii VEG]
Length=1913

 Score = 37.7 bits (86),  Expect = 1.6
 Identities = 46/157 (29%), Positives = 72/157 (45%), Gaps = 9/157 (5%)
 Frame = -2

Query  809  NSCSVNPSFFSLSISLSVPATASASRPALISRSRSFNCNRRACTSCPLATTFWPSSISCT  630
            +S  ++ SFFS   S S  + + A R +L   SRS      A +S  L+    PS +S  
Sbjct  706  SSAPLSASFFSSEKSSSPASISPAVRASLSRASRSVRFRSCATSSVVLSA---PSRVSPF  762

Query  629  VYPCCIRLT*RSFMLIFTSASWPAPRSNWFISA---CSRRAFIKSCCLLVLSLPAAESDP  459
              P     +  S  L F++ S P+  S    SA    S RA++ S    + S  +  S P
Sbjct  763  SLPLSTGRS-SSPSLSFSAPSHPSSSSACDASASALSSSRAYLPSPSPALSS--SLSSCP  819

Query  458  ETASALSFATLTGVISAPKAFSTPSNISGAIPNSAGS  348
             + S+ S +  +   SAP  F + S+ S ++ +S GS
Sbjct  820  SSLSSTSSSAPSSASSAPSTFPSSSSASFSVYSSYGS  856


>ref|XP_843163.1| proteophosphoglycan 5 [Leishmania major strain Friedlin]
 gb|AAZ14281.1| proteophosphoglycan 5 [Leishmania major strain Friedlin]
Length=17392

 Score = 37.7 bits (86),  Expect = 1.6
 Identities = 44/149 (29%), Positives = 66/149 (44%), Gaps = 21/149 (14%)
 Frame = -2

Query  773    SISLSVPATASASRPALISRSR----SFNCNRRACTSCPLATTFWPSSISCTVYPCCIRL  606
              S S S P+ +S+S P+  S S     S +    + +S PLA++    S S +  P     
Sbjct  10033  SSSSSAPSASSSSAPSSSSSSAPSASSSSAPSSSSSSAPLASSSSAPSSSSSTAPSA---  10089

Query  605    T*RSFMLIFTSASWPAPRSNWFISACSRRAFIKSCCLLVLSLPAAESDPETASALSFATL  426
                       +S+S P+  S+   SA S  A   S      S P+A S    +S+ S A  
Sbjct  10090  ---------SSSSAPSSSSSSAPSASSSSAPSSSSS----SAPSASSSSAPSSSSSSAPS  10136

Query  425    TGVISAPKAFST-PSNISGAIPNSAGSTP  342
                  SAP + S+ PS  S + P+S+ S P
Sbjct  10137  ASSSSAPSSSSSAPSASSSSAPSSSSSAP  10165


 Score = 37.4 bits (85),  Expect = 2.1
 Identities = 44/149 (29%), Positives = 67/149 (44%), Gaps = 23/149 (15%)
 Frame = -2

Query  773   SISLSVPATASASRPALISRSRSFNCNRR---ACTSCPLATTFWPSSISCTVYPCCIRLT  603
             S S S P+ +S+S P+  S + S + +     + +S PLA++    S S T         
Sbjct  5858  SSSSSAPSASSSSAPSSSSSAPSASSSSAPSSSSSSAPLASSSSAPSSSSTAPSA-----  5912

Query  602   *RSFMLIFTSASWPAPRSNWFISACSRRAFIKSCCLLVLSLPAAESDPETASALSFATLT  423
                     +S+S P+  S+   SA S  A   S      S P+A S    +S+ S A L 
Sbjct  5913  --------SSSSAPSSSSSSAPSASSSSAPSSSS-----SAPSASSSSAPSSSSSSAPLA  5959

Query  422   GVISAPKAFST--PSNISGAIPNSAGSTP  342
                SAP + S+  PS  S + P+S+ S P
Sbjct  5960  SSSSAPSSSSSSAPSASSSSAPSSSSSAP  5988


 Score = 36.2 bits (82),  Expect = 4.7
 Identities = 45/157 (28%), Positives = 70/157 (44%), Gaps = 20/157 (12%)
 Frame = -2

Query  800    SVNPSFFSLSISLSVPATASASRPALISRSRSFNCNRR---ACTSCPLATTFWPSSISCT  630
              S + S    S S S P+ +S+S P+  S + S + +     + +S PLA++    S S +
Sbjct  12437  SASSSSAPSSSSSSAPSGSSSSAPSSSSSAPSASSSSAPSSSSSSAPLASSSSAPSSSSS  12496

Query  629    VYPCCIRLT*RSFMLIFTSASWPAPRSNWFISACSRRAFIKSCCLLVLSLPAAESDPETA  450
                P              +S+S P+  S+   SA S  A   S      S P+A S    +
Sbjct  12497  SAPSA------------SSSSAPSSSSSSAPSASSSSAPSSSSS----SAPSASSSSAPS  12540

Query  449    SALSFATLTGVISAPKAFST-PSNISGAIPNSAGSTP  342
              S+ S A      SAP + S+ PS  S + P+S+ S P
Sbjct  12541  SSSSSAPSASSSSAPSSSSSAPSASSSSAPSSSSSAP  12577