GOS 1371010

From Metagenes
Warning: this metagenomic sequence has been carefully annotated by students during bioinformatics assignments. These quality annotations are therefore the result of a teaching exercise that you are most welcome to amend and extend if necessary!


Sequence
CAMERA AccNum : JCVI_READ_1091142175437
Annotathon code: GOS_1371010
Sample :
  • GPS :5°38'24n; 86°33'55w
  • Eastern Tropical Pacific: 30 miles from Cocos Island - Costa Rica
  • Open Ocean (-2m, 28.7°C, 0.1-0.8 microns)
Authors
Team : Algarve
Username : MAM
Annotated on : 2010-07-06 17:05:07
  • a36997 AnaVanessaMendesConstantino
  • a37011 MafaldaSofiaReisinhoOliveira
  • a37014 MartaAlexandraRodriguesCasanova

Synopsis

Genomic Sequence

>JCVI_READ_1091142175437 GOS_1371010 Genomic DNA
TTTTTCGATTACACATGGCCCGTAAATGCATCTAATCTTCCAGTATCTTTAAGTAATAATATAAATACACAAGTAGAACTTTTAGATTTACCAACAATAA
CCACAACTGGACCATGGGGAAAAACTGGAAACCCAGCAGCAGTTAATGTGATATGCGAATTACCAGAATTAGGTTCCATTACTTCCAGTAGTCCATACAG
TTCATCCTATGATCACGCATCGATATTAAATGATAGTCAGGCTATGTGGTGTAATAAATCATTTGTTGGTAGTAATTTAGCAAGTGGTGTAAATAATCCA
TATATAAATTATGGAACTTATTTTGGTCAGGGCACTATAAATTATCAATTAAAAAATAATCTAGGAAGTGAAATAAATTTAGGAGGCTCAGGAGGTATAA
CAAATCGTGAGACCCAGGAGTTTCCTACTGCCCCAATAATTAATGAAACCGGTATAAAATGGATATTATTATCTTTAAAATCATCATTAACTGGTAGTAA
TAGTGGAAAGACAGAAGTTGAATTAAAATATAACGGAACTGCATTAAATTTAAAAGACGATTATTTGTTGTTTTATTGTGAACATAAACCTAATTCTAGT
AATAATCAGTATCCTTATACTTTAAATGGCACCACCAACAGCACTTATTCAACTTGGTTAAATGCTATAAGTAATAGTATTCCAAGTGCAACTTCAATAG
TAACTATTAATAATGCCCCGGATTCTGGCGGTAATAATGGTTGTTGGGCAGGGGGAACAAAAGATAAACCAATATTAGCGGATTTAGCTAAAAATGCTAC
AAACAAATACATATTAATTGGTTTATTACAAGGTGAAAAGGTTGATACAATTAATATTGCAAAAGTGCCATAATATAATATTTTATTTGATATAATATTA
TATATGTCTAGTAATTTATTAAACAATGATAACAAAACTAATTATTTGTTTAAAAAAGAA

Translation

[1 - 873/960]   direct strand


Annotator commentaries

We concluded that this ORF is non-coding because no protein domains were found. This led us to believe that it is an incomplete ORF: it has no start codon (it begins at the first base of the read), although it does end with a TAA stop codon. We therefore concluded that it does not encode any protein.


Given the length of this sequence, we expected informative results, but that was not the case. BLASTp and BLASTx returned only high E-values (best hits: 0.40 for BLASTp and 0.046 for BLASTx), too high to draw reliable conclusions. This was confirmed at the protein-domain step, which returned no results. We could not go any further: some steps gave no results, and for the others The Rule Book only considers hits with E-values below 10^-4 significant enough to build a tree.

ORF finding

PROTOCOL


a) SMS ORFinder / forward strand / frames 1, 2 & 3 / min 60 AA / 'any codon' initiation / 'standard' genetic code


b) SMS ORFinder / reverse strand / frames 1, 2 & 3 / min 60 AA / 'any codon' initiation / 'standard' genetic code
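The search settings above (six frames, 'any codon' initiation, minimum 60 amino acids, standard genetic code) can be illustrated with a short scan. This is a minimal sketch, not the SMS ORF Finder code: the function names are hypothetical, only stop-terminated ORFs are reported, and frame numbering and edge handling may differ from SMS.

```python
# Illustrative six-frame ORF scan loosely mirroring the SMS ORF Finder settings
# listed above. This is NOT the SMS implementation; edge handling may differ.

STOPS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def find_orfs(seq: str, min_aa: int = 60) -> list[tuple[str, int, int, int]]:
    """Return (strand, frame, start, end) for every stop-terminated ORF.

    'Any codon' initiation: an ORF may begin at the first codon of a frame or
    at the codon right after a stop. Coordinates are 1-based, include the
    stop codon, and are counted on the strand that was scanned.
    """
    orfs = []
    for strand, s in (("direct", seq.upper()), ("reverse", reverse_complement(seq))):
        for frame in range(3):
            start = frame
            for i in range(frame, len(s) - 2, 3):
                if s[i:i + 3] in STOPS:
                    n_aa = (i - start) // 3  # amino acids before the stop codon
                    if n_aa >= min_aa:
                        orfs.append((strand, frame + 1, start + 1, i + 3))
                    start = i + 3  # the next ORF may start right after this stop
    return orfs


orfs = find_orfs("ATG" + "GCT" * 60 + "TAA")
print(orfs)  # [('direct', 1, 1, 186)]
```

Taking the entry with the largest `end - start` (for example `max(orfs, key=lambda o: o[3] - o[2])`) corresponds to the longest-ORF criterion used in the results analysis.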




RESULTS ANALYSIS


We found several ORFs in this sequence. On the forward strand we found one ORF in frame 1, extending from base 1 to base 873; no ORFs were found in frames 2 and 3. On the reverse strand we found three ORFs, all in frame 2: from base 20 to base 316, from base 401 to base 616, and from base 755 to base 952. No ORFs were found in frames 1 and 3 of the reverse strand.


Assuming that the longest ORF is the most likely to be the real one, we chose the forward-strand, frame 1 ORF extending from base 1 to base 873 (290 amino acids).
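The translation of the chosen ORF with the standard genetic code (NCBI translation table 1) can be reproduced with a compact codon table. A minimal sketch; the names `CODON_TABLE` and `translate` are hypothetical, and the example checks the first and last codons against the translation in the raw results.

```python
# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(orf: str) -> str:
    """Translate codon by codon; '*' marks a stop, 'X' an ambiguous codon."""
    orf = orf.upper()
    return "".join(CODON_TABLE.get(orf[i:i + 3], "X") for i in range(0, len(orf) - 2, 3))


print(translate("TTTTTCGATTACACATGGCCC"))              # FFDYTWP (bases 1-21)
print(translate("GTTGATACAATTAATATTGCAAAAGTGCCATAA"))  # VDTINIAKVP* (bases 841-873)
```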


We considered this ORF non-coding because it has no start codon: it begins at the first base of the read (although it does end with a TAA stop codon), so it is incomplete at its 5' end. It was impossible to continue the study from the protein-domain step, either because this protein has no known function or because that information was not yet available in the databases.
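The start/stop reasoning above can be checked mechanically. A minimal sketch, assuming a hypothetical helper `orf_completeness` that only treats ATG as a start codon (bacteria also use alternative starts such as GTG and TTG). An ORF with a stop but no start is simply truncated at its 5' end, which is common when a metagenomic read begins inside a gene.

```python
# Hypothetical helper: classify an ORF by the presence of an ATG start codon
# and a terminal stop codon. Alternative bacterial starts (GTG, TTG) are ignored.

STOPS = {"TAA", "TAG", "TGA"}


def orf_completeness(orf: str) -> str:
    orf = orf.upper()
    has_start = orf[:3] == "ATG"
    has_stop = len(orf) >= 3 and orf[-3:] in STOPS
    if has_start and has_stop:
        return "complete"
    if has_stop:
        return "5'-truncated"  # e.g. the read begins inside the gene
    if has_start:
        return "3'-truncated"  # e.g. the read ends inside the gene
    return "no start, no stop"


# First and last codons of the chosen ORF (bases 1-21 and 841-873).
print(orf_completeness("TTTTTCGATTACACATGGCCC" + "GTTGATACAATTAATATTGCAAAAGTGCCATAA"))
# 5'-truncated
```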

RAW RESULTS

a) forward strand

>ORF number 1 in reading frame 1 on the direct strand extends from base 1 to base 873.
TTTTTCGATTACACATGGCCCGTAAATGCATCTAATCTTCCAGTATCTTTAAGTAATAAT
ATAAATACACAAGTAGAACTTTTAGATTTACCAACAATAACCACAACTGGACCATGGGGA
AAAACTGGAAACCCAGCAGCAGTTAATGTGATATGCGAATTACCAGAATTAGGTTCCATT
ACTTCCAGTAGTCCATACAGTTCATCCTATGATCACGCATCGATATTAAATGATAGTCAG
GCTATGTGGTGTAATAAATCATTTGTTGGTAGTAATTTAGCAAGTGGTGTAAATAATCCA
TATATAAATTATGGAACTTATTTTGGTCAGGGCACTATAAATTATCAATTAAAAAATAAT
CTAGGAAGTGAAATAAATTTAGGAGGCTCAGGAGGTATAACAAATCGTGAGACCCAGGAG
TTTCCTACTGCCCCAATAATTAATGAAACCGGTATAAAATGGATATTATTATCTTTAAAA
TCATCATTAACTGGTAGTAATAGTGGAAAGACAGAAGTTGAATTAAAATATAACGGAACT
GCATTAAATTTAAAAGACGATTATTTGTTGTTTTATTGTGAACATAAACCTAATTCTAGT
AATAATCAGTATCCTTATACTTTAAATGGCACCACCAACAGCACTTATTCAACTTGGTTA
AATGCTATAAGTAATAGTATTCCAAGTGCAACTTCAATAGTAACTATTAATAATGCCCCG
GATTCTGGCGGTAATAATGGTTGTTGGGCAGGGGGAACAAAAGATAAACCAATATTAGCG
GATTTAGCTAAAAATGCTACAAACAAATACATATTAATTGGTTTATTACAAGGTGAAAAG
GTTGATACAATTAATATTGCAAAAGTGCCATAA

>Translation of ORF number 1 in reading frame 1 on the direct strand.
FFDYTWPVNASNLPVSLSNNINTQVELLDLPTITTTGPWGKTGNPAAVNVICELPELGSI
TSSSPYSSSYDHASILNDSQAMWCNKSFVGSNLASGVNNPYINYGTYFGQGTINYQLKNN
LGSEINLGGSGGITNRETQEFPTAPIINETGIKWILLSLKSSLTGSNSGKTEVELKYNGT
ALNLKDDYLLFYCEHKPNSSNNQYPYTLNGTTNSTYSTWLNAISNSIPSATSIVTINNAP
DSGGNNGCWAGGTKDKPILADLAKNATNKYILIGLLQGEKVDTINIAKVP*

No ORFs were found in reading frame 2.

No ORFs were found in reading frame 3.

------------------------------------------------------------------------------------------------------------------------
b) reverse strand


No ORFs were found in reading frame 1.

>ORF number 1 in reading frame 2 on the reverse strand extends from base 20 to base 316.
TTAGTTTTGTTATCATTGTTTAATAAATTACTAGACATATATAATATTATATCAAATAAA
ATATTATATTATGGCACTTTTGCAATATTAATTGTATCAACCTTTTCACCTTGTAATAAA
CCAATTAATATGTATTTGTTTGTAGCATTTTTAGCTAAATCCGCTAATATTGGTTTATCT
TTTGTTCCCCCTGCCCAACAACCATTATTACCGCCAGAATCCGGGGCATTATTAATAGTT
ACTATTGAAGTTGCACTTGGAATACTATTACTTATAGCATTTAACCAAGTTGAATAA

>Translation of ORF number 1 in reading frame 2 on the reverse strand.
LVLLSLFNKLLDIYNIISNKILYYGTFAILIVSTFSPCNKPINMYLFVAFLAKSANIGLS
FVPPAQQPLLPPESGALLIVTIEVALGILLLIAFNQVE*

>ORF number 2 in reading frame 2 on the reverse strand extends from base 401 to base 616.
TCGTCTTTTAAATTTAATGCAGTTCCGTTATATTTTAATTCAACTTCTGTCTTTCCACTA
TTACTACCAGTTAATGATGATTTTAAAGATAATAATATCCATTTTATACCGGTTTCATTA
ATTATTGGGGCAGTAGGAAACTCCTGGGTCTCACGATTTGTTATACCTCCTGAGCCTCCT
AAATTTATTTCACTTCCTAGATTATTTTTTAATTGA

>Translation of ORF number 2 in reading frame 2 on the reverse strand.
SSFKFNAVPLYFNSTSVFPLLLPVNDDFKDNNIHFIPVSLIIGAVGNSWVSRFVIPPEPP
KFISLPRLFFN*

>ORF number 3 in reading frame 2 on the reverse strand extends from base 755 to base 952.
GATGAACTGTATGGACTACTGGAAGTAATGGAACCTAATTCTGGTAATTCGCATATCACA
TTAACTGCTGCTGGGTTTCCAGTTTTTCCCCATGGTCCAGTTGTGGTTATTGTTGGTAAA
TCTAAAAGTTCTACTTGTGTATTTATATTATTACTTAAAGATACTGGAAGATTAGATGCA
TTTACGGGCCATGTGTAA

>Translation of ORF number 3 in reading frame 2 on the reverse strand.
DELYGLLEVMEPNSGNSHITLTAAGFPVFPHGPVVVIVGKSKSSTCVFILLLKDTGRLDA
FTGHV*

No ORFs were found in reading frame 3.

Multiple Alignment

PROTOCOL


EBI / ClustalW2 / default values



RESULTS ANALYSIS


The Rule Book only allows a phylogenetic tree to be built from hits with E-values below 10^-4. Since all our E-values were far above that threshold, we did not select ingroups and outgroups, because no tree could be built, and therefore we did not perform the multiple alignment.
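The threshold rule can be applied directly to the E-values listed in the BLAST raw results. A minimal sketch; the constant and function names are hypothetical, and we assume the cut-off is strict (E < 10^-4).

```python
# E-values copied from the BLASTp and BLASTx hit lists in the BLAST raw results.
BLASTP_EVALUES = [0.40, 1.6, 2.6, 3.3, 3.8, 4.0, 4.3, 5.0, 5.1, 5.3, 6.6]
BLASTX_EVALUES = [0.046, 0.30, 0.30, 0.50, 0.50, 1.5, 1.9, 2.5, 2.5, 3.3, 3.3, 3.3,
                  4.3, 4.3, 5.6, 5.6, 5.6, 7.3, 7.3, 7.3, 9.5, 9.5, 9.5]
TREE_THRESHOLD = 1e-4  # assumed strict cut-off from The Rule Book


def usable_hits(evalues, threshold=TREE_THRESHOLD):
    """Keep only the E-values strictly below the threshold."""
    return [e for e in evalues if e < threshold]


for name, evalues in (("BLASTp", BLASTP_EVALUES), ("BLASTx", BLASTX_EVALUES)):
    print(name, len(evalues), "hits, best E =", min(evalues), "->", len(usable_hits(evalues)), "usable")
# BLASTp 11 hits, best E = 0.4 -> 0 usable
# BLASTx 23 hits, best E = 0.046 -> 0 usable
```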

RAW RESULTS

We did not build the multiple alignment because the E-values were too high for us to continue.

Protein Domains

PROTOCOL


a) InterPro, default parameters at EBI



RESULTS ANALYSIS


We could not identify any protein domains in this sequence: InterPro reported no hits. This is consistent with the very high BLAST E-values (the lowest is 0.046, for BLASTx, far too high to be significant). These databases grow constantly as new data are discovered and deposited every day, but at the time of our protein-domain search they apparently contained no homologous proteins.

RAW RESULTS

No hits reported

Phylogeny

PROTOCOL



RESULTS ANALYSIS


The Rule Book only allows a phylogenetic tree to be built from hits with E-values below 10^-4. Since all our E-values were far above that threshold, we did not select ingroups and outgroups, because no tree could be built.

RAW RESULTS


We did not build the multiple alignment because the E-values were too high for us to continue.

Taxonomy report

PROTOCOL


a) BLASTp versus NR, NCBI default parameters apart from "Number of descriptions" = 1000


b) BLASTx versus NR, NCBI default parameters apart from "Number of descriptions" = 1000



RESULTS ANALYSIS


The Rule Book only allows a phylogenetic tree to be built from hits with E-values below 10^-4. Since all our E-values were far above that threshold, we did not select ingroups and outgroups, because no tree could be built.

RAW RESULTS

a)BLASTp

cellular organisms
. Bacteria           [bacteria]
. . Aeromonas hydrophila subsp. hydrophila ATCC 7966 -   39 2 hits [g-proteobacteria]  M16B family peptidase [Aeromonas hydrophila subsp. hydrophi
. . Opitutaceae bacterium TAV2 .......................   37 2 hits [verrucomicrobia]   Type II secretory pathway pseudopilin PulG-like protein [Op
. . Synechococcus sp. WH 7803 ........................   36 2 hits [cyanobacteria]     putative rhodanese-related sulfurtransferase [Synechococcus
. . Enterococcus faecalis HIP11704 ...................   36 2 hits [firmicutes]        conserved hypothetical protein [Enterococcus faecalis HIP11
. . Kocuria rhizophila DC2201 ........................   36 2 hits [high GC Gram+]     2,5-diketo-D-gluconic acid reductase [Kocuria rhizophila DC
. . Butyrivibrio crossotus DSM 2876 ..................   35 1 hit  [firmicutes]        pilus assembly protein CpaF [Butyrivibrio crossotus DSM 287
. . Synechococcus sp. WH 7805 ........................   35 2 hits [cyanobacteria]     hypothetical protein WH7805_10973 [Synechococcus sp. WH 780
. Lachancea thermotolerans CBS 6340 ------------------   36 1 hit  [ascomycetes]       KLTH0F00990p [Lachancea thermotolerans] >gi|238935635|emb|C
. Lachancea thermotolerans ...........................   36 1 hit  [ascomycetes]       KLTH0F00990p [Lachancea thermotolerans] >gi|238935635|emb|C
. Magnaporthe oryzae 70-15 ...........................   36 2 hits [ascomycetes]       hypothetical protein MGG_05647 [Magnaporthe grisea 70-15] >
. Nematostella vectensis .............................   35 2 hits [sea anemones]      predicted protein [Nematostella vectensis] >gi|156228123|gb
. Metallosphaera sedula DSM 5348 .....................   35 2 hits [crenarchaeotes]    type II secretion system protein E [Metallosphaera sedula D

------------------------------------------------------------------------------------------------------------------------


b)BLASTx

cellular organisms
. Bacteria           [bacteria]
. . Opitutaceae bacterium TAV2 -----------------------   43 2 hits [verrucomicrobia]       Type II secretory pathway pseudopilin PulG-like protein [Op
. . Yersinia mollaretii ATCC 43969 ...................   37 2 hits [enterobacteria]        Large exoprotein involved in heme utilization or adhesion [
. . Leptospirillum ferrodiazotrophum .................   36 1 hit  [bacteria]              conserved hypothetical protein [Leptospirillum ferrodiazotr
. . Aeromonas hydrophila subsp. hydrophila ATCC 7966 .   36 2 hits [g-proteobacteria]      M16B family peptidase [Aeromonas hydrophila subsp. hydrophi
. . Butyrivibrio crossotus DSM 2876 ..................   36 1 hit  [firmicutes]            pilus assembly protein CpaF [Butyrivibrio crossotus DSM 287
. . Lactobacillus coleohominis 101-4-CHN .............   35 2 hits [firmicutes]            N-acetylmuramoyl-L-alanine amidase, family 2 [Lactobacillus
. . Yersinia kristensenii ATCC 33638 .................   35 2 hits [enterobacteria]        Outer membrane autotransporter barrel domain protein [Yersi
. Stachybotrys chartarum -----------------------------   40 1 hit  [ascomycetes]           endoglucanase [Stachybotrys chartarum]
. Stachybotrys echinata ..............................   40 1 hit  [ascomycetes]           endoglucanase [Stachybotrys echinata]
. Dictyostelium discoideum ...........................   39 2 hits [cellular slime molds]  SNF1/AMP-activated kinase [Dictyostelium discoideum]
. Cryptosporidium parvum Iowa II .....................   39 2 hits [apicomplexans]         hypothetical protein [Cryptosporidium parvum Iowa II] >gi|4
. Cryptosporidium hominis TU502 ......................   38 2 hits [apicomplexans]         hypothetical protein [Cryptosporidium hominis TU502] >gi|54
. Cryptosporidium hominis ............................   38 2 hits [apicomplexans]         hypothetical protein [Cryptosporidium hominis TU502] >gi|54
. Lachancea thermotolerans CBS 6340 ..................   37 1 hit  [ascomycetes]           KLTH0F00990p [Lachancea thermotolerans] >gi|238935635|emb|C
. Lachancea thermotolerans ...........................   37 1 hit  [ascomycetes]           KLTH0F00990p [Lachancea thermotolerans] >gi|238935635|emb|C
. Dictyostelium discoideum AX4 .......................   37 6 hits [cellular slime molds]  hypothetical protein DDB_G0277905 [Dictyostelium discoideum
. Naegleria gruberi strain NEG-M .....................   36 1 hit  [eukaryotes]            predicted protein [Naegleria gruberi] >gi|284083132|gb|EFC3
. Naegleria gruberi ..................................   36 1 hit  [eukaryotes]            predicted protein [Naegleria gruberi] >gi|284083132|gb|EFC3
. Metallosphaera sedula DSM 5348 .....................   36 2 hits [crenarchaeotes]        type II secretion system protein E [Metallosphaera sedula D
. Drosophila mojavensis ..............................   36 2 hits [flies]                 GI15579 [Drosophila mojavensis] >gi|193908279|gb|EDW07146.1
. Plasmodium berghei str. ANKA .......................   35 2 hits [apicomplexans]         hypothetical protein [Plasmodium berghei strain ANKA] >gi|5
. Plasmodium berghei .................................   35 2 hits [apicomplexans]         hypothetical protein [Plasmodium berghei strain ANKA] >gi|5
. Polysphondylium pallidum PN500 .....................   35 1 hit  [cellular slime molds]  peptidase C19 family protein [Polysphondylium pallidum PN50

BLAST

PROTOCOL


a) BLASTp versus NR, NCBI default parameters apart from "Number of descriptions" = 1000


b) BLASTx versus NR, NCBI default parameters apart from "Number of descriptions" = 1000



RESULTS ANALYSIS


None of the E-values are significant: the smallest is 0.40 for BLASTp and 0.046 for BLASTx.
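To see how far these hits are from significance, the Karlin-Altschul relation that NCBI BLAST uses to link a bit score S' to an E-value (m and n being the effective query and database lengths) can be rearranged. A rough worked example, assuming the effective search space m·n stays the same and treating the relation as approximate when composition-based statistics are used:

```latex
\begin{aligned}
E &= m\,n\,2^{-S'}
  \quad\Longrightarrow\quad
  S'_{\text{needed}} = S' + \log_2\frac{E}{E_{\text{target}}} \\
\text{BLASTp: } S'_{\text{needed}} &= 39.7 + \log_2\frac{0.40}{10^{-4}} = 39.7 + \log_2 4000 \approx 39.7 + 12.0 = 51.7\ \text{bits} \\
\text{BLASTx: } S'_{\text{needed}} &= 43.1 + \log_2\frac{0.046}{10^{-4}} = 43.1 + \log_2 460 \approx 43.1 + 8.8 = 51.9\ \text{bits}
\end{aligned}
```

In other words, the best hits would each need roughly 9 to 12 additional bits of alignment score to pass the 10^-4 cut-off.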


BLASTp returned only 11 alignments, fewer than the 12 requested; BLASTx returned 23. Even with more hits, the E-values remained high, so we concluded that these alignments reflect very little homology: the proteins BLAST matched to our sequence are most likely unrelated to it, and any local similarity is probably due to chance.


The Rule Book only allows a phylogenetic tree to be built from hits with E-values below 10^-4. Since all our E-values were far above that threshold, we did not select ingroups and outgroups, because no tree could be built.

RAW RESULTS

a) BLASTp
                                                                  Score     E
Sequences producing significant alignments:                       (Bits)  Value

ref|YP_855265.1|  M16B family peptidase [Aeromonas hydrophila ...  39.7    0.40 
ref|ZP_03724684.1|  Type II secretory pathway pseudopilin PulG...  37.7    1.6  
ref|YP_001226018.1|  putative rhodanese-related sulfurtransfer...  37.0    2.6  
ref|XP_002554252.1|  KLTH0F00990p [Lachancea thermotolerans] >...  36.6    3.3  
ref|ZP_05568403.1|  conserved hypothetical protein [Enterococc...  36.2    3.8  
ref|XP_360273.1|  hypothetical protein MGG_05647 [Magnaporthe ...  36.2    4.0  
ref|YP_001855451.1|  2,5-diketo-D-gluconic acid reductase [Koc...  36.2    4.3  
ref|ZP_05793210.1|  pilus assembly protein CpaF [Butyrivibrio ...  35.8    5.0  
ref|XP_001640986.1|  predicted protein [Nematostella vectensis...  35.8    5.1  
ref|ZP_01125400.1|  hypothetical protein WH7805_10973 [Synecho...  35.8    5.3  
ref|YP_001192168.1|  type II secretion system protein E [Metal...  35.4    6.6  

ALIGNMENTS
>ref|YP_855265.1| M16B family peptidase [Aeromonas hydrophila subsp. hydrophila 
ATCC 7966]
 gb|ABK38743.1| peptidase, M16B family [Aeromonas hydrophila subsp. hydrophila 
ATCC 7966]
Length=937

 Score = 39.7 bits (91),  Expect = 0.40, Method: Composition-based stats.
 Identities = 32/99 (32%), Positives = 44/99 (44%), Gaps = 12/99 (12%)

Query  100  PYINYGTYFGQGTINYQLKNNLGSEINL----------GGSGGIT-NRETQEFPT-APII  147
            P+   G YF  G +N+ L  N  S INL          G S G + NRE   F T A + 
Sbjct  756  PFDTTGDYFTAGLMNFNLGGNFNSRINLNLREDKGYTYGASSGFSANREAGTFATGANVR  815

Query  148  NETGIKWILLSLKSSLTGSNSGKTEVELKYNGTALNLKD  186
             +  +  I   LK       +G T VEL Y  +A++ +D
Sbjct  816  ADATVDAIRQFLKEMDNYCKNGPTPVELAYMRSAVSQQD  854


>ref|ZP_03724684.1| Type II secretory pathway pseudopilin PulG-like protein [Opitutaceae 
bacterium TAV2]
 gb|EEG21337.1| Type II secretory pathway pseudopilin PulG-like protein [Opitutaceae 
bacterium TAV2]
Length=240

 Score = 37.7 bits (86),  Expect = 1.6, Method: Compositional matrix adjust.
 Identities = 30/86 (34%), Positives = 42/86 (48%), Gaps = 2/86 (2%)

Query  197  PNSSNNQYPYTLNGTTNSTYSTWLNAISNSIPSATSIVTINNAPDSGGNNGCWAGGTKDK  256
            PN +   YPY +N +  S  ST + A S ++PS T ++T N+ P +GG  G W  G K  
Sbjct  133  PNGNAYGYPYAVNMSVISDTSTQVAANSITVPSQTVLMTDNSTPVTGG--GTWTYGFKFT  190

Query  257  PILADLAKNATNKYILIGLLQGEKVD  282
                D     T     +G L GEK +
Sbjct  191  ASGFDDPYWGTRIVERMGALHGEKTN  216


>ref|YP_001226018.1| putative rhodanese-related sulfurtransferase [Synechococcus sp. 
WH 7803]
 emb|CAK24721.1| Putative rhodanese-related sulfurtransferase [Synechococcus sp. 
WH 7803]
Length=116

 Score = 37.0 bits (84),  Expect = 2.6, Method: Compositional matrix adjust.
 Identities = 13/46 (28%), Positives = 25/46 (54%), Gaps = 0/46 (0%)

Query  204  YPYTLNGTTNSTYSTWLNAISNSIPSATSIVTINNAPDSGGNNGCW  249
            +PY +     S  + W++AI   +P++ ++V + +A     N GCW
Sbjct  40   FPYPVEHLPLSEAAQWMDAIDQRLPASQAVVVLCHAGVRSWNFGCW  85


>ref|XP_002554252.1| KLTH0F00990p [Lachancea thermotolerans]
 emb|CAR23815.1| KLTH0F00990p [Lachancea thermotolerans]
Length=785

 Score = 36.6 bits (83),  Expect = 3.3, Method: Compositional matrix adjust.
 Identities = 24/56 (42%), Positives = 35/56 (62%), Gaps = 6/56 (10%)

Query  92   NLASGVNNPYINYGTYFGQGTINYQ--LKNN---LGSEINLGGSGGITNRETQEFP  142
            NL+  VNN ++N+ TYF + TI+++  L NN   LGS+ NL    G +N +T  FP
Sbjct  487  NLSEIVNNIFLNFKTYFKEFTISFKNCLHNNQFALGSQENLNNGVGSSN-DTLAFP  541


>ref|ZP_05568403.1| conserved hypothetical protein [Enterococcus faecalis HIP11704]
 gb|EEU71360.1| conserved hypothetical protein [Enterococcus faecalis HIP11704]
Length=245

 Score = 36.2 bits (82),  Expect = 3.8, Method: Compositional matrix adjust.
 Identities = 32/117 (27%), Positives = 52/117 (44%), Gaps = 22/117 (18%)

Query  4    YTWPVNASNLPVSLSNNINTQVELLDLPTIT-----TTGPWGKTGNPAAVNVI-------  51
            +   VNA+++P  + + +  +   LD+P IT      TG +     P  V  I       
Sbjct  131  FDLIVNAADMPTDIVSVVANKAYELDVPFITGGVGLRTGTFTNVLLPKKVKSILKYYEGN  190

Query  52   -CELPELGSITSSSPYSSSYDHASILNDSQAMWCNKSFVGSNLASGVNNPYINYGTY  107
             CE P  GSI++++    S+    ILN     W N  F   N  S +N  ++++ TY
Sbjct  191  SCETPMKGSISTTNMLVGSFIANEILN----FWINDEF---NEQSHIN--FVDFDTY  238


>ref|XP_360273.1| hypothetical protein MGG_05647 [Magnaporthe grisea 70-15]
 gb|EDK06355.1| hypothetical protein MGG_05647 [Magnaporthe grisea 70-15]
Length=388

 Score = 36.2 bits (82),  Expect = 4.0, Method: Compositional matrix adjust.
 Identities = 22/59 (37%), Positives = 31/59 (52%), Gaps = 7/59 (11%)

Query  227  IPSATSIVTINNAPDSGGNNGCWAGGT------KDKPILADLAKNATNKYILIGLLQGE  279
            IPS T I  + +APD G    C + GT      KD+ + AD ++  T KY+ + L  GE
Sbjct  224  IPSQTMIYCVGSAPDRGAVF-CRSAGTYAVIVAKDEEVKADGSRVMTGKYVTVRLQSGE  281


>ref|YP_001855451.1| 2,5-diketo-D-gluconic acid reductase [Kocuria rhizophila DC2201]
 dbj|BAG29945.1| 2,5-diketo-D-gluconate reductase B [Kocuria rhizophila DC2201]
Length=278

 Score = 36.2 bits (82),  Expect = 4.3, Method: Compositional matrix adjust.
 Identities = 25/85 (29%), Positives = 35/85 (41%), Gaps = 2/85 (2%)

Query  147  INETGIKWILLSLKSSLTGSNSGKTEVELKYNGTALNLKDDYLLFYCEHKPNSSNNQYPY  206
            I  +GI    L L S + G + G+  V     GT   L  D+L  Y  H PN   +QY  
Sbjct  61   IRSSGIPRSELFLTSKIPGRDHGRASVRRSLEGTLTRLGTDHLDLYMIHWPNPGVDQYVQ  120

Query  207  TLNGTTNSTYSTWLNAI--SNSIPS  229
            T      +     + +I  SN +P 
Sbjct  121  TWREMMTARDEGLVRSIGVSNFLPE  145


>ref|ZP_05793210.1| pilus assembly protein CpaF [Butyrivibrio crossotus DSM 2876]
Length=398

 Score = 35.8 bits (81),  Expect = 5.0, Method: Compositional matrix adjust.
 Identities = 20/48 (41%), Positives = 31/48 (64%), Gaps = 3/48 (6%)

Query  202  NQYPYTLNGTTNSTYSTWLNAISNSIPSATSIVTINNAPD---SGGNN  246
            N+Y   ++G T S  +T+LNA+SN IPS   I+TI ++ +   SG +N
Sbjct  189  NKYNIFISGGTGSGKTTFLNALSNYIPSDERIITIEDSAELQLSGTDN  236


>ref|XP_001640986.1| predicted protein [Nematostella vectensis]
 gb|EDO48923.1| predicted protein [Nematostella vectensis]
Length=448

 Score = 35.8 bits (81),  Expect = 5.1, Method: Compositional matrix adjust.
 Identities = 26/87 (29%), Positives = 39/87 (44%), Gaps = 6/87 (6%)

Query  114  NYQLKNNLGSEINLGGSGGITNRETQEFPTAPIINETGIKWILL------SLKSSLTGSN  167
             +Q +N     ++L G+G       + F   P++ E GI  I+L      S K      +
Sbjct  106  KWQHRNRKPMCVHLAGTGDHFYWRRRNFMAKPLLKEHGIGSIILENPFYGSRKPKDQQRS  165

Query  168  SGKTEVELKYNGTALNLKDDYLLFYCE  194
            S K  V+L   GT L L+   LL +CE
Sbjct  166  SLKHVVDLFIMGTGLILESSVLLHWCE  192


>ref|ZP_01125400.1| hypothetical protein WH7805_10973 [Synechococcus sp. WH 7805]
 gb|EAR17439.1| hypothetical protein WH7805_10973 [Synechococcus sp. WH 7805]
Length=116

 Score = 35.8 bits (81),  Expect = 5.3, Method: Compositional matrix adjust.
 Identities = 13/46 (28%), Positives = 23/46 (50%), Gaps = 0/46 (0%)

Query  204  YPYTLNGTTNSTYSTWLNAISNSIPSATSIVTINNAPDSGGNNGCW  249
            +PY +     S    W++ I   +P++ ++V I +A     N GCW
Sbjct  40   FPYPVEHLPLSAAEQWMDEIDQRLPASRTVVVICHAGVRSWNFGCW  85


>ref|YP_001192168.1| type II secretion system protein E [Metallosphaera sedula DSM 
5348]
 gb|ABP96244.1| type II secretion system protein E [Metallosphaera sedula DSM 
5348]
Length=476

 Score = 35.4 bits (80),  Expect = 6.6, Method: Compositional matrix adjust.
 Identities = 29/107 (27%), Positives = 51/107 (47%), Gaps = 17/107 (15%)

Query  188  YLLFYCEHKPNSSNNQYPYTLNGTTNSTYSTWLNAISNSIPSATSIVTINNAPDSGGNNG  247
            YL F  ++KP        Y + G+T S  +T+LNA+ N       I++I + P+   +  
Sbjct  214  YLWFLLDYKPF-------YLIVGSTGSGKTTFLNALLNFANPDAKILSIEDTPELNLSGK  266

Query  248  CWAG--------GTKDKPI--LADLAKNATNKYILIGLLQGEKVDTI  284
             W           T D  I  L+ LA      Y++IG ++G++++T+
Sbjct  267  NWIRFFSRQSLVSTYDVTIGELSRLALRYRPDYLIIGEVRGKEIETL  313


----------------------------------------------------------------------------------------------------

b) BLASTx

                                                                   Score     E
Sequences producing significant alignments:                       (Bits)  Value

ref|ZP_03724684.1|  Type II secretory pathway pseudopilin PulG...  43.1    0.046
emb|CAL48345.1|  endoglucanase [Stachybotrys chartarum]            40.4    0.30 
gb|AAM77710.1|AF435067_1  endoglucanase [Stachybotrys echinata]    40.4    0.30 
gb|AAD30963.2|  SNF1/AMP-activated kinase [Dictyostelium disco...  39.7    0.50 
ref|XP_625708.1|  hypothetical protein [Cryptosporidium parvum...  39.7    0.50 
ref|XP_665360.1|  hypothetical protein [Cryptosporidium homini...  38.1    1.5  
ref|ZP_04642548.1|  Large exoprotein involved in heme utilizat...  37.7    1.9  
ref|XP_002554252.1|  KLTH0F00990p [Lachancea thermotolerans] >...  37.4    2.5  
ref|XP_642250.1|  hypothetical protein DDB_G0277905 [Dictyoste...  37.4    2.5  
ref|XP_002669583.1|  predicted protein [Naegleria gruberi] >gb...  37.0    3.3  
gb|EES53632.1|  conserved hypothetical protein [Leptospirillum...  37.0    3.3  
ref|YP_855265.1|  M16B family peptidase [Aeromonas hydrophila ...  37.0    3.3  
ref|ZP_05793210.1|  pilus assembly protein CpaF [Butyrivibrio ...  36.6    4.3  
ref|YP_001192168.1|  type II secretion system protein E [Metal...  36.6    4.3  
ref|XP_002009829.1|  GI15579 [Drosophila mojavensis] >gb|EDW07...  36.2    5.6  
ref|XP_640018.1|  hypothetical protein DDB_G0282385 [Dictyoste...  36.2    5.6  
ref|XP_643247.1|  hypothetical protein DDB_G0276133 [Dictyoste...  36.2    5.6  
ref|ZP_05553352.1|  N-acetylmuramoyl-L-alanine amidase, family...  35.8    7.3  
ref|XP_675219.1|  hypothetical protein [Plasmodium berghei str...  35.8    7.3  
ref|XP_665526.1|  hypothetical protein [Cryptosporidium homini...  35.8    7.3  
gb|EFA79650.1|  peptidase C19 family protein [Polysphondylium ...  35.4    9.5  
ref|ZP_04625405.1|  Outer membrane autotransporter barrel doma...  35.4    9.5  
ref|XP_675437.1|  hypothetical protein [Plasmodium berghei str...  35.4    9.5  

ALIGNMENTS
>ref|ZP_03724684.1| Type II secretory pathway pseudopilin PulG-like protein [Opitutaceae 
bacterium TAV2]
 gb|EEG21337.1| Type II secretory pathway pseudopilin PulG-like protein [Opitutaceae 
bacterium TAV2]
Length=240

 Score = 43.1 bits (100),  Expect = 0.046
 Identities = 30/86 (34%), Positives = 42/86 (48%), Gaps = 2/86 (2%)
 Frame = +1

Query  589  PNSSNNQYPYTLNGTTNSTYSTWLNAISNSIPSATSIVTINNAPDSGGNNGCWAGGTKDK  768
            PN +   YPY +N +  S  ST + A S ++PS T ++T N+ P +GG  G W  G K  
Sbjct  133  PNGNAYGYPYAVNMSVISDTSTQVAANSITVPSQTVLMTDNSTPVTGG--GTWTYGFKFT  190

Query  769  PILADLAKNATNKYILIGLLQGEKVD  846
                D     T     +G L GEK +
Sbjct  191  ASGFDDPYWGTRIVERMGALHGEKTN  216


>emb|CAL48345.1| endoglucanase [Stachybotrys chartarum]
Length=238

 Score = 40.4 bits (93),  Expect = 0.30
 Identities = 36/140 (25%), Positives = 51/140 (36%), Gaps = 24/140 (17%)
 Frame = +1

Query  310  YGTYFGQGT----INYQLKNNLGSEINLGGSGGITNRETQEFPTAPIINETGIKWIllsl  477
            +G   GQG     I+Y   N +G  +N   SGG  N ++  +    I  +  + WI    
Sbjct  39   WGRNSGQGNQCTYIDYSSSNGVGWRVNWNWSGGDNNVKSYPYSGRQIPTKRIVSWI----  94

Query  478  ksslTGSNSGKTEVELKYNGTALNLKDDYLLFYCEHKPNSSNNQYPYTLNGTTNSTYSTW  657
                    S  T V   Y G  +     Y LF   + PN S +   Y L          W
Sbjct  95   -------GSLPTTVSWNYQGNNIRANVAYDLFTASN-PNHSTSSGDYEL--------MIW  138

Query  658  LNAISNSIPSATSIVTINNA  717
            L  + N  P    + T+N A
Sbjct  139  LGRLGNVYPIGNQVATVNVA  158


>gb|AAM77710.1|AF435067_1 endoglucanase [Stachybotrys echinata]
Length=237

 Score = 40.4 bits (93),  Expect = 0.30
 Identities = 41/165 (24%), Positives = 61/165 (36%), Gaps = 27/165 (16%)
 Frame = +1

Query  235  SQAMWCNKSFVGSNLASGVNNPYINYGTYFGQGT----INYQLKNNLGSEINLGGSGGIT  402
            +Q++    S+  SN     NN +   G   GQG     ++Y   N +G  +N   SGG  
Sbjct  16   AQSLCDQYSYYSSNGYEFNNNMW---GRNSGQGNQCTYVDYSSPNGVGWRVNWNWSGGDN  72

Query  403  NRETQEFPTAPIINETGIKWIllslksslTGSNSGKTEVELKYNGTALNLKDDYLLFYCE  582
            N ++  +    +  +  + WI            S  T V   Y G  L     Y LF   
Sbjct  73   NVKSYPYSGRQLPTKRIVSWI-----------GSLPTTVSWNYQGNNLRANVAYDLFTAA  121

Query  583  HKPNSSNNQYPYTLNGTTNSTYSTWLNAISNSIPSATSIVTINNA  717
            + PN  N+   Y L          WL  + N  P    + T+N A
Sbjct  122  N-PNHPNSSGDYEL--------MIWLGRLGNVYPIGNQVATVNIA  157


>gb|AAD30963.2| SNF1/AMP-activated kinase [Dictyostelium discoideum]
Length=718

 Score = 39.7 bits (91),  Expect = 0.50
 Identities = 43/189 (22%), Positives = 74/189 (39%), Gaps = 16/189 (8%)
 Frame = +1

Query  253  NKSFVGSNLASGVNNPYINYGTYFGQGTINYQ--LKNNLGSEINLGGSGGITNRETQEFP  426
            N S + +N    +NN  IN         IN    + NN  +  N   +    N       
Sbjct  421  NNSIINNN---NINNNNINNNNNNNNNNINNNNIINNNNNNNNNNNNNNNNNNNNNNNNN  477

Query  427  TAPIINETGIKWIllslksslTGSNSGKTEVELKYNGTALNLKDDYLLFYCEHKPNSSNN  606
             + I   T +  I  +L +S   ++SG +      N +  N  +D       +  N++NN
Sbjct  478  NSSISGGTEVFSISPNLNNSYNSNSSGNSNGSNSNNNSNNNTNNDNN----NNNNNNNNN  533

Query  607  QYPYTLNGTTNSTYSTWLNAISNSIPSATSIVTI------NNAPDSGGNNGCWAGGTKDK  768
                  N   N+  +  +++++NS+ +   +         NN  D G NN  + GG  D 
Sbjct  534  NNNNNNNNNNNNNNNNCIDSVNNSLNNENDVNNSNINNNNNNNSDDGSNNNSYEGG-GDV  592

Query  769  PILADLAKN  795
             +L+DL  N
Sbjct  593  LLLSDLNGN  601


>ref|XP_625708.1| hypothetical protein [Cryptosporidium parvum Iowa II]
 gb|EAK87634.1| uncharacterized protein [Cryptosporidium parvum Iowa II]
Length=1159

 Score = 39.7 bits (91),  Expect = 0.50
 Identities = 41/162 (25%), Positives = 69/162 (42%), Gaps = 8/162 (4%)
 Frame = +1

Query  253  NKSFVGSNLASGVNNPYINY-GTYFGQGTIN---YQLKNNLGSE--INLGGSGGITNRET  414
            N+  +  N+   VNN    Y   Y G   I    + + NN G+   IN   + G +N   
Sbjct  374  NEKKLRDNVVEYVNNKEFKYHNNYSGSSGITNSCFNIFNNYGNSGIINPLNADGNSNNNN  433

Query  415  QEFPTAPIINETGIKWIllslksslTGSNS--GKTEVELKYNGTALNLKDDYLLFYCEHK  588
                ++ +IN  G  +  L+  S+   +NS  G   + ++ N   L L       Y  + 
Sbjct  434  NGNSSSSLINLNGSSFSALNGLSNNNNNNSTEGSQPILVQNNSYQLLLSQLMQYLYGGNN  493

Query  589  PNSSNNQYPYTLNGTTNSTYSTWLNAISNSIPSATSIVTINN  714
             NS+NN    +  GT NS+Y+  +   +++  + T  V  NN
Sbjct  494  NNSNNNNTNNSNTGTNNSSYNGLIGTNNSNFGANTGNVGTNN  535


>ref|XP_665360.1| hypothetical protein [Cryptosporidium hominis TU502]
 gb|EAL35131.1| hypothetical protein Chro.40131 [Cryptosporidium hominis]
Length=728

 Score = 38.1 bits (87),  Expect = 1.5
 Identities = 38/160 (23%), Positives = 69/160 (43%), Gaps = 6/160 (3%)
 Frame = +1

Query  253  NKSFVGSNLASGVNNPYINY-GTYFGQGTIN---YQLKNNLGSE--INLGGSGGITNRET  414
            N+  +  N+   VNN    Y   Y G   I    + + NN G+   IN   + G +N  +
Sbjct  381  NEKKLRDNVVEYVNNKEFKYHNNYSGSSGITNSCFNIFNNYGNSGIINPLNTDGNSNNNS  440

Query  415  QEFPTAPIINETGIKWIllslksslTGSNSGKTEVELKYNGTALNLKDDYLLFYCEHKPN  594
                ++ +IN TG  +  L+  S+   +++  ++  L  N     L    + +      N
Sbjct  441  NGNSSSSLINFTGSSFSALNGLSNNNNNSTEGSQPILVQNNNYQLLLSQLMQYLYGGNNN  500

Query  595  SSNNQYPYTLNGTTNSTYSTWLNAISNSIPSATSIVTINN  714
            +SNN       GT NS+Y+  ++  +++  + T  V  NN
Sbjct  501  NSNNNNTNNNTGTNNSSYNGLISTNNSNFGANTGNVGTNN  540


>ref|ZP_04642548.1| Large exoprotein involved in heme utilization or adhesion [Yersinia 
mollaretii ATCC 43969]
 gb|EEQ08930.1| Large exoprotein involved in heme utilization or adhesion [Yersinia 
mollaretii ATCC 43969]
Length=955

 Score = 37.7 bits (86),  Expect = 1.9
 Identities = 60/253 (23%), Positives = 96/253 (37%), Gaps = 17/253 (6%)
 Frame = +1

Query  94   TITTTGPW------GKTGNPAAVNVICELPELGsitssspysssydhasILNDSQAMWCN  255
            TI TTG        G +GN    +VI  L   G+  + +   +   +    N       N
Sbjct  543  TIATTGAASGLTFAGTSGNHTLADVILNLNGTGAAFTKNVGVNLLLNHVTFNTVAGTALN  602

Query  256  KSFVGSNLASGVN-NPYINY-GTYFGQGTINYQLKNNLGSEINLGGSGGITNRETQEFPT  429
             S  G   A+ VN N  IN  G   G  ++     +N   +IN+  S GI  +     PT
Sbjct  603  -SLAGLTFANSVNGNNIINVTGAGIGVTSVGGVDLSNAYLDINVTNSAGIGLQVADGTPT  661

Query  430  AP------IINETGIKWIllslksslTGSNSGKTEVELKYNGTALNLKDDYLLFYCEHKP  591
                    +IN TG   I  +  ++ T +N+G     + + G A N+ ++  +       
Sbjct  662  TTTIGTNSLINATGATAISFTGTTAKTLTNNGTINGAVTFAGAAANIINNNSILNGTLTT  721

Query  592  NSSNNQYPYTLNGTTNSTYSTWLNAISNSIPSATSIVTINN--APDSGGNNGCWAGGTKD  765
             S N+         +N T +    + + +I +   + TIN     D+   NG  AG T  
Sbjct  722  GSGNDSLVLGSGSESNGTINLGDGSNNVTIENGAQVSTINTGAGDDNFTINGMTAGSTYL  781

Query  766  KPILADLAKNATN  804
              + A    N  N
Sbjct  782  GSLNAGSGNNTLN  794


>ref|XP_002554252.1| KLTH0F00990p [Lachancea thermotolerans]
 emb|CAR23815.1| KLTH0F00990p [Lachancea thermotolerans]
Length=785

 Score = 37.4 bits (85),  Expect = 2.5
 Identities = 24/56 (42%), Positives = 35/56 (62%), Gaps = 6/56 (10%)
 Frame = +1

Query  274  NLASGVNNPYINYGTYFGQGTINYQ--LKNN---LGSEINLGGSGGITNRETQEFP  426
            NL+  VNN ++N+ TYF + TI+++  L NN   LGS+ NL    G +N +T  FP
Sbjct  487  NLSEIVNNIFLNFKTYFKEFTISFKNCLHNNQFALGSQENLNNGVGSSN-DTLAFP  541


>ref|XP_642250.1| hypothetical protein DDB_G0277905 [Dictyostelium discoideum AX4]
 sp|Q54YF2.1|AMPKA_DICDI RecName: Full=5'-AMP-activated serine/threonine-protein kinase 
catalytic subunit alpha; Short=AMPKA; AltName: Full=Protein 
kinase, AMP-activated, alpha subunit; AltName: Full=SNF1/AMP-activated 
kinase catalytic subunit; AltName: Full=Sucrose 
non-fermenting protein snfA
 gb|EAL68125.1| hypothetical protein DDB_G0277905 [Dictyostelium discoideum AX4]
Length=727

 Score = 37.4 bits (85),  Expect = 2.5
 Identities = 43/190 (22%), Positives = 73/190 (38%), Gaps = 17/190 (8%)
 Frame = +1

Query  253  NKSFVGSNLASGVNNPYINYGTYFGQGTINYQL---KNNLGSEINLGGSGGITNRETQEF  423
            N S + +N    +NN  IN         IN       NN  +  N   +    N      
Sbjct  421  NNSIINNN---NINNNNINNNNNNNNNNINNNNIINNNNNNNNNNNNNNNNNNNNNNNNN  477

Query  424  PTAPIINETGIKWIllslksslTGSNSGKTEVELKYNGTALNLKDDYLLFYCEHKPNSSN  603
              + I   T +  I  +L +S   ++SG +      N +  N  +D       +  N++N
Sbjct  478  NNSSISGGTEVFSISPNLNNSYNSNSSGNSNGSNSNNNSNNNTNNDNN----NNNNNNNN  533

Query  604  NQYPYTLNGTTNSTYSTWLNAISNSIPSATSIVTI------NNAPDSGGNNGCWAGGTKD  765
            N      N   N+  +  +++++NS+ +   +         NN  D G NN  + GG  D
Sbjct  534  NNNNNNNNNNNNNNNNNCIDSVNNSLNNENDVNNSNINNNNNNNSDDGSNNNSYEGG-GD  592

Query  766  KPILADLAKN  795
              +L+DL  N
Sbjct  593  VLLLSDLNGN  602


>ref|XP_002669583.1| predicted protein [Naegleria gruberi]
 gb|EFC36839.1| predicted protein [Naegleria gruberi]
Length=1487

 Score = 37.0 bits (84),  Expect = 3.3
 Identities = 39/154 (25%), Positives = 60/154 (38%), Gaps = 16/154 (10%)
 Frame = +1

Query  358   NLGSEINLGGSGGITNRETQEFPTAPIINETGIKWIllslksslTGSNSGKTEVELKYNG  537
             N  +  N GG+  I  RE    P     NET I +I      +   SN   T  +  Y  
Sbjct  909   NYVASTNNGGTHIIYRREPYVIPNIVGYNETAINFIEEGNLVADAFSNLLITVPQYNYYT  968

Query  538   TALNLKDDYLLFYCEHKPNSSNNQYPYTLNGTTNSTYSTWLNAISNSIPSATSIVT----  705
             ++L    D +++   ++ N++ N        T NSTY        NS PS  S+      
Sbjct  969   SSL----DRIIYQSIYEGNNTRN--------TCNSTYIDLTVVSYNSFPSVVSLTKRYDG  1016

Query  706   INNAPDSGGNNGCWAGGTKDKPILADLAKNATNK  807
             +N     G +N     GT D  +    + ++T K
Sbjct  1017  LNPVLLVGNDNSTGVNGTTDSILQISKSLSSTKK  1050


>gb|EES53632.1| conserved hypothetical protein [Leptospirillum ferrodiazotrophum]
Length=656

 Score = 37.0 bits (84),  Expect = 3.3
 Identities = 44/172 (25%), Positives = 67/172 (38%), Gaps = 22/172 (12%)
 Frame = +1

Query  247  WCNKSFVGSNLASG--VNNPYINYGTYFGQGTINYQ---LKNNL---------GSEINLG  384
            +C    VGSN  S   V N Y N    +G G I +      NN+         GS ++  
Sbjct  309  YCVGGVVGSNSGSAATVQNTYYNTSVAYGSGAIGHNSGTSTNNIGISGPSSGTGSNLSNL  368

Query  385  GSGGITNRETQEFPTAPIINETGIKWIllslksslTGSNSGKTEVELKYNGTALNLKDDY  564
            G+    N  T  F T P+   T   WI   + +  TG  S    + +   GTA    +  
Sbjct  369  GTFNTWNSSTGAFNTTPV---TSAPWI---MGTITTGGESIVAPILVSDMGTATVTANSV  422

Query  565  LLFYCEHKPNSSNNQYPYTLNGTTNSTYSTWLNAISNSIPSATSIVTINNAP  720
             + Y    P S  + Y  T   T + T +   ++++    + T +VT  N P
Sbjct  423  SMTY-SGLPYSLGSHYT-TTGATLSGTVADSTSSVNTGTFTITPVVTALNFP  472


>ref|YP_855265.1| M16B family peptidase [Aeromonas hydrophila subsp. hydrophila 
ATCC 7966]
 gb|ABK38743.1| peptidase, M16B family [Aeromonas hydrophila subsp. hydrophila 
ATCC 7966]
Length=937

 Score = 37.0 bits (84),  Expect = 3.3
 Identities = 32/99 (32%), Positives = 44/99 (44%), Gaps = 12/99 (12%)
 Frame = +1

Query  298  PYINYGTYFGQGTINYQLKNNLGSEINL----------GGSGGIT-NRETQEFPT-APII  441
            P+   G YF  G +N+ L  N  S INL          G S G + NRE   F T A + 
Sbjct  756  PFDTTGDYFTAGLMNFNLGGNFNSRINLNLREDKGYTYGASSGFSANREAGTFATGANVR  815

Query  442  NETGIKWIllslksslTGSNSGKTEVELKYNGTALNLKD  558
             +  +  I   LK       +G T VEL Y  +A++ +D
Sbjct  816  ADATVDAIRQFLKEMDNYCKNGPTPVELAYMRSAVSQQD  854