GOS 1395010

From Metagenes
Warning: this metagenomic sequence has been carefully annotated by students during bioinformatics assignments. These quality annotations are therefore the result of a teaching exercise that you are most welcome to amend and extend if necessary!


Sequence
CAMERA AccNum : JCVI_READ_1091142451834
Annotathon code: GOS_1395010
Sample :
  • GPS :38°56'49n; 76°25'2w
  • North American East Coast: Chesapeake Bay, MD - USA
  • Estuary (-13.2m, 1°C, 0.1-0.8 microns)
Authors
Team : Algarve
Username : DIH1
Annotated on : 2010-07-01 23:37:45
  • a33834 HélderFelicioFernandes
  • a33836 InesPereiraAlface
  • a35282 DianaMartaLuísMiguéns

Synopsis

  • Taxonomy: unidentified (NCBI info)
    Rank: species - Genetic Code: Standard - NCBI Identifier: 32644
    Kingdom: - Phylum: - Class: - Order:
    unclassified sequences;

Genomic Sequence

>JCVI_READ_1091142451834 GOS_1395010 Genomic DNA
AGAAGTACACGCGGCGCGGAGAACTTGCCGCCTTGCATAAACTACACAACGAAACAAATGGAGATGAACAACAAGCAATCGAAGCAATCCAACTCGCTAT
CGCGAACCAATGGCAAGGAATATACCCTCGACCAAAGAAGGCAGGAACAAGCGCACCGAACAGAGACCAGCTTGAAACGTATCTCAAGTACGGGACTCTT
TAAAACGACTCCGGCGGAAGCGTGGCACGAAGGGACAAACATCCGCACCGCTTTACGATGCTACCCGGAAGAGACGCGGGCGGAAGTGGTGAAACTCATT
AAAAAGACCGTCGATTTCATAGACGCGAAAAAGACGCTTACCAATTTCGAAGACGTAGCCCTTTGCGCAGAAATGATCTTCGAAATATTTCCCGTTTTGA
AGTTGGAGGAGTTGAGGTTAGTTTGTGACCGCATGAAACAAGGTCATTACGGCAAATTTTACGAGCGACTCAAAATCCAAGAGTTCCGGGAATGCATCCA
GAAACACGAAGAGGAACGCGCTCCGATCCTTGAGGAGATGCATAAGCAAATTTACCGGGGTACGGACAACCCTACGAACGTCCCCGAATACGACAAAGAA
GCGGCTCACCTCGCTTGGAAACTGAAGAATAACCCCTTTTTAATACCCGGAAAGAATGACAACGACTGAAAAAATAGAAGACCTCATCGGCAAAGAATGC
GTATACACCGCAGCAAAAAACATGATACCTTGCCGAATCAAAAAAGTATTCGTACAAAGGGAATTGAAATGGCCTTATGAGTCCGACATATTACTTGAAA
TAGAACCCTTACACCGTAGCAAGGTAAATGAAGTTGACTTGGAAGAGATGCAACAAGGCGT

Translation

[1 - 666/861]   direct strand
>GOS_1395010 Translation [1-666   direct strand]
RSTRGAENLPPCINYTTKQMEMNNKQSKQSNSLSRTNGKEYTLDQRRQEQAHRTETSLKRISSTGLFKTTPAEAWHEGTNIRTALRCYPEETRAEVVKLI
KKTVDFIDAKKTLTNFEDVALCAEMIFEIFPVLKLEELRLVCDRMKQGHYGKFYERLKIQEFRECIQKHEEERAPILEEMHKQIYRGTDNPTNVPEYDKE
AAHLAWKLKNNPFLIPGKNDND
[ Warning ] 5' incomplete: does not start with a Methionine

Annotator commentaries

No known functional domain was found, and the BLAST results are weak: only 21 hits were obtained, all with e-values of 0.98 or higher and score values of 37.7 or lower. The alignments are equally weak, and no Multiple Sequence Alignment could be built from hits this poor. Even so, there is a strong possibility that this sequence is coding, because the ORF is rather long. It may be an ORFan: the translation is longer than 200 amino acids and does not seem to have any known homologs.


Because the e-values are too high to be significant, none of the few hits obtained should be given much weight, and no firm conclusion can be drawn from them. The BLAST alignments show, as expected, very weak similarity, too weak to be taken into serious consideration. From these results we assume that this is a hypothetical protein, but nothing more can be said, since the search for functional domains returned no results. Its function, and the biological process in which it may be involved, remain unknown. For the same reason, it was not possible to build a meaningful Phylogenetic Tree or to submit the sequence to a Multiple Sequence Alignment. We therefore could not reach a decision on its taxonomic classification, as we cannot rely exclusively on the Lineage Report, which is itself not significant, being built from hits with such high e-values.


Therefore, we could not reach any well-founded conclusion about the molecular function, biological process or taxonomy of this sequence.


ORF finding

PROTOCOL

a) SMS ORFinder / direct strand / frames 1, 2 & 3 / min 60 AA / 'any codon' initiation / 'standard' genetic code

b) SMS ORFinder / reverse strand / frames 1, 2 & 3 / min 60 AA / 'any codon' initiation / 'standard' genetic code
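
The protocol above can be approximated offline. The following is a minimal Python sketch, not the SMS ORF Finder implementation: it translates all six frames with the standard genetic code, treats any codon as a possible start ('any codon' initiation), and keeps stop-free stretches of at least 60 amino acids. The function names (find_orfs, translate, reverse_complement) are hypothetical, and the end coordinate here excludes the stop codon, whereas the SMS output includes it.

```python
# Standard genetic code, built in TCAG order (assumption: NCBI translation table 1).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def find_orfs(dna, min_aa=60):
    """List (strand, frame, start, end, protein) for every stop-free stretch
    of at least min_aa residues. Coordinates are 1-based, relative to the
    strand being read, and exclude the stop codon."""
    orfs = []
    for strand, seq in (("direct", dna), ("reverse", reverse_complement(dna))):
        for frame in range(3):
            protein = translate(seq[frame:])
            residue = 0  # index of the first residue of the current stretch
            for stretch in protein.split("*"):
                if len(stretch) >= min_aa:
                    start = frame + 3 * residue + 1
                    end = start + 3 * len(stretch) - 1
                    orfs.append((strand, frame + 1, start, end, stretch))
                residue += len(stretch) + 1  # skip the stop codon
    return orfs


# Toy input (not the GOS read): 60 alanine codons, a stop, then 5 more codons.
toy = "GCT" * 60 + "TAA" + "GCT" * 5
for strand, frame, start, end, protein in find_orfs(toy):
    print(strand, frame, start, end, len(protein))
```

To apply it to this read, pass the genomic sequence as a single uppercase string with the line breaks removed.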




RESULTS ANALYSIS



a) On the direct strand, ORFs were found in reading frames 1, 2 and 3.

In frame 1, an incomplete ORF was found, with only a STOP codon at the 3' end.

In frame 2, two incomplete ORFs were found. The first has only a STOP codon at the 3' end; the second has neither a methionine at the 5' end nor a STOP codon at the 3' end.

In frame 3, an incomplete ORF was found, with only a STOP codon at the 3' end.


b) On the reverse strand, ORFs were found in reading frames 1, 2 and 3.

In frame 1, an incomplete ORF was found, with only a STOP codon at the 3' end.

In frame 2, two incomplete ORFs were found, both with a STOP codon at the 3' end and no methionine at the 5' end.

In frame 3, two incomplete ORFs were found. The first has only a STOP codon at the 3' end; the second has neither a methionine at the 5' end nor a STOP codon at the 3' end.



The chosen ORF is located in reading frame 1 on the direct strand. It is the longest ORF and, as such, has a high probability of being translated and of being biologically significant.


This ORF does not begin with a START codon (the read is incomplete at the 5' end) and it ends with a STOP codon at the 3' end.


The remaining ORFs are also long enough to be considered (at least 60 amino acids), but the chosen ORF is by far the longest (669 bases, against 222 for the next longest) and so has the highest probability of being translated.



RAW RESULTS

a)In the direct strand: 

>ORF number 1 in reading frame 1 on the direct strand extends from base 1 to base 669.
AGAAGTACACGCGGCGCGGAGAACTTGCCGCCTTGCATAAACTACACAACGAAACAAATG
GAGATGAACAACAAGCAATCGAAGCAATCCAACTCGCTATCGCGAACCAATGGCAAGGAA
TATACCCTCGACCAAAGAAGGCAGGAACAAGCGCACCGAACAGAGACCAGCTTGAAACGT
ATCTCAAGTACGGGACTCTTTAAAACGACTCCGGCGGAAGCGTGGCACGAAGGGACAAAC
ATCCGCACCGCTTTACGATGCTACCCGGAAGAGACGCGGGCGGAAGTGGTGAAACTCATT
AAAAAGACCGTCGATTTCATAGACGCGAAAAAGACGCTTACCAATTTCGAAGACGTAGCC
CTTTGCGCAGAAATGATCTTCGAAATATTTCCCGTTTTGAAGTTGGAGGAGTTGAGGTTA
GTTTGTGACCGCATGAAACAAGGTCATTACGGCAAATTTTACGAGCGACTCAAAATCCAA
GAGTTCCGGGAATGCATCCAGAAACACGAAGAGGAACGCGCTCCGATCCTTGAGGAGATG
CATAAGCAAATTTACCGGGGTACGGACAACCCTACGAACGTCCCCGAATACGACAAAGAA
GCGGCTCACCTCGCTTGGAAACTGAAGAATAACCCCTTTTTAATACCCGGAAAGAATGAC
AACGACTGA

>Translation of ORF number 1 in reading frame 1 on the direct strand.
RSTRGAENLPPCINYTTKQMEMNNKQSKQSNSLSRTNGKEYTLDQRRQEQAHRTETSLKR
ISSTGLFKTTPAEAWHEGTNIRTALRCYPEETRAEVVKLIKKTVDFIDAKKTLTNFEDVA
LCAEMIFEIFPVLKLEELRLVCDRMKQGHYGKFYERLKIQEFRECIQKHEEERAPILEEM
HKQIYRGTDNPTNVPEYDKEAAHLAWKLKNNPFLIPGKNDND*

>ORF number 1 in reading frame 2 on the direct strand extends from base 437 to base 625.
AACAAGGTCATTACGGCAAATTTTACGAGCGACTCAAAATCCAAGAGTTCCGGGAATGCA
TCCAGAAACACGAAGAGGAACGCGCTCCGATCCTTGAGGAGATGCATAAGCAAATTTACC
GGGGTACGGACAACCCTACGAACGTCCCCGAATACGACAAAGAAGCGGCTCACCTCGCTT
GGAAACTGA

>Translation of ORF number 1 in reading frame 2 on the direct strand.
NKVITANFTSDSKSKSSGNASRNTKRNALRSLRRCISKFTGVRTTLRTSPNTTKKRLTSL
GN*

>ORF number 2 in reading frame 2 on the direct strand extends from base 644 to base 859.
TACCCGGAAAGAATGACAACGACTGAAAAAATAGAAGACCTCATCGGCAAAGAATGCGTA
TACACCGCAGCAAAAAACATGATACCTTGCCGAATCAAAAAAGTATTCGTACAAAGGGAA
TTGAAATGGCCTTATGAGTCCGACATATTACTTGAAATAGAACCCTTACACCGTAGCAAG
GTAAATGAAGTTGACTTGGAAGAGATGCAACAAGGC

>Translation of ORF number 2 in reading frame 2 on the direct strand.
YPERMTTTEKIEDLIGKECVYTAAKNMIPCRIKKVFVQRELKWPYESDILLEIEPLHRSK
VNEVDLEEMQQG

>ORF number 1 in reading frame 3 on the direct strand extends from base 3 to base 203.
AAGTACACGCGGCGCGGAGAACTTGCCGCCTTGCATAAACTACACAACGAAACAAATGGA
GATGAACAACAAGCAATCGAAGCAATCCAACTCGCTATCGCGAACCAATGGCAAGGAATA
TACCCTCGACCAAAGAAGGCAGGAACAAGCGCACCGAACAGAGACCAGCTTGAAACGTAT
CTCAAGTACGGGACTCTTTAA

>Translation of ORF number 1 in reading frame 3 on the direct strand.
KYTRRGELAALHKLHNETNGDEQQAIEAIQLAIANQWQGIYPRPKKAGTSAPNRDQLETY
LKYGTL*


b)In the reverse strand: 

>ORF number 1 in reading frame 1 on the reverse strand extends from base 223 to base 444.
AAAGGGGTTATTCTTCAGTTTCCAAGCGAGGTGAGCCGCTTCTTTGTCGTATTCGGGGAC
GTTCGTAGGGTTGTCCGTACCCCGGTAAATTTGCTTATGCATCTCCTCAAGGATCGGAGC
GCGTTCCTCTTCGTGTTTCTGGATGCATTCCCGGAACTCTTGGATTTTGAGTCGCTCGTA
AAATTTGCCGTAATGACCTTGTTTCATGCGGTCACAAACTAA

>Translation of ORF number 1 in reading frame 1 on the reverse strand.
KGVILQFPSEVSRFFVVFGDVRRVVRTPVNLLMHLLKDRSAFLFVFLDAFPELLDFESLV
KFAVMTLFHAVTN*

>ORF number 1 in reading frame 2 on the reverse strand extends from base 419 to base 601.
CCTTGTTTCATGCGGTCACAAACTAACCTCAACTCCTCCAACTTCAAAACGGGAAATATT
TCGAAGATCATTTCTGCGCAAAGGGCTACGTCTTCGAAATTGGTAAGCGTCTTTTTCGCG
TCTATGAAATCGACGGTCTTTTTAATGAGTTTCACCACTTCCGCCCGCGTCTCTTCCGGG
TAG

>Translation of ORF number 1 in reading frame 2 on the reverse strand.
PCFMRSQTNLNSSNFKTGNISKIISAQRATSSKLVSVFFASMKSTVFLMSFTTSARVSSG
*

>ORF number 2 in reading frame 2 on the reverse strand extends from base 602 to base 820.
CATCGTAAAGCGGTGCGGATGTTTGTCCCTTCGTGCCACGCTTCCGCCGGAGTCGTTTTA
AAGAGTCCCGTACTTGAGATACGTTTCAAGCTGGTCTCTGTTCGGTGCGCTTGTTCCTGC
CTTCTTTGGTCGAGGGTATATTCCTTGCCATTGGTTCGCGATAGCGAGTTGGATTGCTTC
GATTGCTTGTTGTTCATCTCCATTTGTTTCGTTGTGTAG

>Translation of ORF number 2 in reading frame 2 on the reverse strand.
HRKAVRMFVPSCHASAGVVLKSPVLEIRFKLVSVRCACSCLLWSRVYSLPLVRDSELDCF
DCLLFISICFVV*

>ORF number 1 in reading frame 3 on the reverse strand extends from base 72 to base 290.
TATGTCGGACTCATAAGGCCATTTCAATTCCCTTTGTACGAATACTTTTTTGATTCGGCA
AGGTATCATGTTTTTTGCTGCGGTGTATACGCATTCTTTGCCGATGAGGTCTTCTATTTT
TTCAGTCGTTGTCATTCTTTCCGGGTATTAAAAAGGGGTTATTCTTCAGTTTCCAAGCGA
GGTGAGCCGCTTCTTTGTCGTATTCGGGGACGTTCGTAG

>Translation of ORF number 1 in reading frame 3 on the reverse strand.
YVGLIRPFQFPLYEYFFDSARYHVFCCGVYAFFADEVFYFFSRCHSFRVLKRGYSSVSKR
GEPLLCRIRGRS*

>ORF number 2 in reading frame 3 on the reverse strand extends from base 663 to base 860.
AGAGTCCCGTACTTGAGATACGTTTCAAGCTGGTCTCTGTTCGGTGCGCTTGTTCCTGCC
TTCTTTGGTCGAGGGTATATTCCTTGCCATTGGTTCGCGATAGCGAGTTGGATTGCTTCG
ATTGCTTGTTGTTCATCTCCATTTGTTTCGTTGTGTAGTTTATGCAAGGCGGCAAGTTCT
CCGCGCCGCGTGTACTTC

>Translation of ORF number 2 in reading frame 3 on the reverse strand.
RVPYLRYVSSWSLFGALVPAFFGRGYIPCHWFAIASWIASIACCSSPFVSLCSLCKAASS
PRRVYF

Multiple Alignment

PROTOCOL


Phylogeny.fr


RESULTS ANALYSIS



Because the e-values are too high to be significant, it was not possible to select sequences for a Multiple Sequence Alignment.



RAW RESULTS

No hits reported.

Protein Domains

PROTOCOL


InterProScan/default parameters at EBI


RESULTS ANALYSIS


No functional domains were found regarding this sequence.



RAW RESULTS

No hits reported.

Phylogeny

PROTOCOL


a) Phylogeny.fr / BioNJ method / default substitution model / out group: not found

b) Phylogeny.fr / PhyML method / default substitution model / out group: not found



RESULTS ANALYSIS



Because the e-values are too high to be significant, it was not possible to select the ingroup and outgroup sequences needed to build viable Phylogenetic Trees.

RAW RESULTS

No Phylogenetic Trees were constructed.

Taxonomy report

PROTOCOL

BLASTx versus NR, NCBI default parameters apart from "Number of descriptions_1000"



RESULTS ANALYSIS



According to the Lineage Report, the organism closest to the unknown sequence is Apis mellifera. The annotators strongly contest this result, not only because of the BLAST results (e-values too high to support any conclusion) but also because finding DNA from bees (or any other insect) in an estuarine water sample filtered to 0.1-0.8 microns is highly unlikely. Organisms from the Kingdom Bacteria also appear throughout the list, but once again their e-values are much too high to support any conclusion.


As it is, these results should not be taken into account.


RAW RESULTS

cellular organisms
. Eukaryota           [eukaryotes]
. . Fungi/Metazoa group [eukaryotes]
. . . Endopterygota       [insects]
. . . . Apis mellifera (bee) -------------------------------   37 1 hit  [bees]                PREDICTED: similar to CG5428-PA [Apis mellifera]
. . . . Drosophila persimilis ..............................   34 2 hits [flies]               GL21433 [Drosophila persimilis] >gi|194103674|gb|EDW25717.1
. . . Debaryomyces hansenii --------------------------------   36 1 hit  [ascomycetes]         DEHA2A14036p [Debaryomyces hansenii]
. . . Debaryomyces hansenii CBS767 .........................   36 1 hit  [ascomycetes]         hypothetical protein DEHA0A14454g [Debaryomyces hansenii CB
. . Plasmodium berghei str. ANKA ---------------------------   35 1 hit  [apicomplexans]       hypothetical protein [Plasmodium berghei strain ANKA] >gi|5
. . Plasmodium berghei .....................................   35 1 hit  [apicomplexans]       hypothetical protein [Plasmodium berghei strain ANKA] >gi|5
. . Plasmodium vivax SaI-1 .................................   35 1 hit  [apicomplexans]       hypothetical protein [Plasmodium vivax SaI-1] >gi|148802711
. . Plasmodium vivax .......................................   35 1 hit  [apicomplexans]       hypothetical protein [Plasmodium vivax SaI-1] >gi|148802711
. . Oryza sativa Japonica Group (Japanese rice) ............   34 1 hit  [monocots]            hypothetical protein OsJ_05377 [Oryza sativa Japonica Group]
. . Oryza sativa Indica Group (Indian rice) ................   34 1 hit  [monocots]            hypothetical protein OsI_05851 [Oryza sativa Indica Group]
. Bacteroides sp. 9_1_42FAA --------------------------------   36 2 hits [CFB group bacteria]  conserved hypothetical protein [Bacteroides sp. 9_1_42FAA] 
. Anoxybacillus flavithermus WK1 ...........................   36 2 hits [firmicutes]          GTPase, G3E family [Anoxybacillus flavithermus WK1] >gi|212
. Bacteroides sp. 2_2_4 ....................................   36 2 hits [CFB group bacteria]  conserved hypothetical protein [Bacteroides sp. 2_2_4] >gi|
. Bacteroides plebeius DSM 17135 ...........................   35 2 hits [CFB group bacteria]  hypothetical protein BACPLE_00682 [Bacteroides plebeius DSM
. uncultured Termite group 1 bacterium phylotype Rs-D17 ....   35 2 hits [bacteria]            CRISPR-associated protein Cas4 [uncultured Termite group 1 
. Syntrophobacter fumaroxidans MPOB ........................   35 2 hits [d-proteobacteria]    sensory histidine kinase CreC [Syntrophobacter fumaroxidans
. Bacteroides sp. 1_1_6 ....................................   35 2 hits [CFB group bacteria]  conserved hypothetical protein [Bacteroides sp. 1_1_6] >gi|
. Bacteroides ovatus ATCC 8483 .............................   35 2 hits [CFB group bacteria]  hypothetical protein BACOVA_03892 [Bacteroides ovatus ATCC 
. Bacteroides sp. D2 .......................................   35 1 hit  [CFB group bacteria]  hypothetical protein BACOVA_03892 [Bacteroides ovatus ATCC 
. Prevotella sp. oral taxon 472 str. F0295 .................   34 2 hits [CFB group bacteria]  conserved hypothetical protein [Prevotella sp. oral taxon 4
. Candidatus Accumulibacter phosphatis clade IIA str. UW-1 .   34 2 hits [b-proteobacteria]    exodeoxyribonuclease V, alpha subunit [Candidatus Accumulib
. Bacteroides sp. D1 .......................................   34 2 hits [CFB group bacteria]  conserved hypothetical protein [Bacteroides sp. D1] >gi|262
. Bacteroides sp. 2_1_22 ...................................   34 2 hits [CFB group bacteria]  conserved hypothetical protein [Bacteroides sp. D1] >gi|262
. Marinobacter algicola DG893 ..............................   34 2 hits [g-proteobacteria]    Twin-arginine translocation pathway signal [Marinobacter al
. Phyllobacterium brassicacearum ...........................   34 1 hit  [a-proteobacteria]    1-aminocyclopropan carboxylic acid deaminase [Phyllobacteri

BLAST

PROTOCOL


BLASTx versus NR, NCBI default parameters apart from "Number of descriptions_1000"



RESULTS ANALYSIS


The BLAST search returned few hits, and the similarity of those hits to the unknown sequence is very weak. All e-values are high (0.98 or above). The e-value is the number of hits with an equal or better score that is expected by chance in a database of this size, so only low e-values (under 1e-4) point to a meaningful match: the lower the e-value, the more significant the similarity. That is not the case with this sequence. Correspondingly, the score values are low (37.7 or lower), which means that even the best hit is not convincingly homologous. These hits are therefore of very low biological significance.
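
The relation between score and e-value can be checked numerically. In standard BLAST statistics, a hit with bit score S' has an e-value of approximately E = m*n * 2^(-S'), where m*n is the effective search space. That formula comes from general BLAST documentation and is an assumption here, not something stated in this report. The sketch below takes the distinct (score, e-value) pairs from the raw hit table, confirms that none passes the 1e-4 cutoff, and shows that every pair implies roughly the same search space.

```python
import math

# Distinct (bit score, E-value) pairs copied from the raw BLASTx hit table.
HITS = [(37.7, 0.98), (36.6, 2.2), (36.2, 2.8), (35.8, 3.7),
        (35.4, 4.8), (35.0, 6.3), (34.7, 8.3)]
E_THRESHOLD = 1e-4  # significance cutoff used by the annotators


def implied_search_space(bit_score, e_value):
    """Solve E = m*n * 2**(-S') for the effective search space m*n."""
    return e_value * 2 ** bit_score


def bit_score_needed(search_space, e_value=E_THRESHOLD):
    """Bit score at which a hit would reach the given E-value."""
    return math.log2(search_space / e_value)


significant = [hit for hit in HITS if hit[1] < E_THRESHOLD]
spaces = [implied_search_space(s, e) for s, e in HITS]
mean_space = sum(spaces) / len(spaces)
print("significant hits:", len(significant))
print("implied search space: %.2e to %.2e" % (min(spaces), max(spaces)))
print("bit score needed for E = 1e-4: about %.0f" % bit_score_needed(mean_space))
```

Under this approximation, a hit would need a bit score of about 51 to reach an e-value of 1e-4, well above the best score of 37.7 observed here.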


In the alignment list, the highest identity is 31% (every other being 30% or lower), so these alignments show very low similarity and, again, are of too little significance to be used in drawing conclusions.
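
The percentages follow directly from the counts on each "Identities" line of the BLAST output. A small sketch, using the counts of the alignments reproduced in the raw results; the reported values appear to be truncated rather than rounded (20/75 is shown as 26%), which is an inference from these figures and not documented behaviour. The function name percent_identity is hypothetical.

```python
# (identical positions, alignment length) from the "Identities" lines
# of the alignments reproduced in the raw results.
IDENTITIES = [(30, 94), (29, 107), (28, 106), (29, 107), (35, 129), (35, 129),
              (26, 116), (20, 75), (32, 110), (32, 124), (16, 52), (23, 108)]


def percent_identity(identical, length):
    """Whole-number percent identity, truncated (integer division)."""
    return 100 * identical // length


percents = sorted((percent_identity(i, n) for i, n in IDENTITIES), reverse=True)
print("highest:", percents[0], "% next:", percents[1], "%")
```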


Based on these results, the protein should be a hypothetical one, but with such high e-values and weak similarity, and with no functional domain found in the Protein Domains search, it is hard to reach a precise and confident conclusion.


These results were obtained with BLASTx, since the results obtained with BLASTp were even less satisfactory.


Because the e-values are so high (well above 1e-4), it was not possible to choose an ingroup or outgroup for the construction of a significant Phylogenetic Tree, or to make a significant multiple alignment.


RAW RESULTS

                                                                   Score     E
Sequences producing significant alignments:                       (Bits)  Value

ref|XP_001122579.1|  PREDICTED: similar to CG5428-PA [Apis mel...  37.7    0.98 
ref|ZP_04541069.1|  conserved hypothetical protein [Bacteroide...  36.6    2.2  
ref|YP_002315763.1|  GTPase, G3E family [Anoxybacillus flavith...  36.6    2.2  
ref|ZP_04551376.1|  conserved hypothetical protein [Bacteroide...  36.2    2.8  
emb|CAG84918.2|  DEHA2A14036p [Debaryomyces hansenii]              36.2    2.8  
ref|XP_456940.1|  hypothetical protein DEHA0A14454g [Debaryomy...  36.2    2.8  
ref|XP_677011.1|  hypothetical protein [Plasmodium berghei str...  35.8    3.7  
ref|ZP_03207066.1|  hypothetical protein BACPLE_00682 [Bactero...  35.4    4.8  
ref|YP_001956163.1|  CRISPR-associated protein Cas4 [unculture...  35.4    4.8  
ref|XP_001613837.1|  hypothetical protein [Plasmodium vivax Sa...  35.4    4.8  
ref|YP_846770.1|  sensory histidine kinase CreC [Syntrophobact...  35.4    4.8  
ref|ZP_04845357.1|  conserved hypothetical protein [Bacteroide...  35.0    6.3  
ref|ZP_02066890.1|  hypothetical protein BACOVA_03892 [Bactero...  35.0    6.3  
ref|ZP_05918539.1|  conserved hypothetical protein [Prevotella...  34.7    8.3  
ref|YP_003167631.1|  exodeoxyribonuclease V, alpha subunit [Ca...  34.7    8.3  
ref|ZP_04546666.1|  conserved hypothetical protein [Bacteroide...  34.7    8.3  
gb|EEE56301.1|  hypothetical protein OsJ_05377 [Oryza sativa J...  34.7    8.3  
gb|EEC72485.1|  hypothetical protein OsI_05851 [Oryza sativa I...  34.7    8.3  
ref|XP_002028488.1|  GL21433 [Drosophila persimilis] >gb|EDW25...  34.7    8.3  
ref|ZP_01894038.1|  Twin-arginine translocation pathway signal...  34.7    8.3  
gb|ABO31418.1|  1-aminocyclopropan carboxylic acid deaminase [...  34.7    8.3  

ALIGNMENTS
>ref|XP_001122579.1| PREDICTED: similar to CG5428-PA [Apis mellifera]
Length=148

 Score = 37.7 bits (86),  Expect = 0.98
 Identities = 30/94 (31%), Positives = 49/94 (52%), Gaps = 7/94 (7%)
 Frame = +1

Query  223  WHEGTNIRTALRCYPEETRAEVVKLIKKTVDFIDAKKTLTNFEDVALCAEMIFEIF---P  393
            W   TN       Y EE ++++ K+IKKT  F+D  K+LTN +  AL   + F+     P
Sbjct  33   WKRRTNPNILFLKY-EEMKSDLPKVIKKTAAFLD--KSLTNDQVDALAQHLSFDSMKSNP  89

Query  394  VLKLEELRLVCDRMKQGHY-GKFYERLKIQEFRE  492
             +  EE  ++  RMK  +  G+F    K+ +++E
Sbjct  90   AVNYEEHIILNKRMKLINVDGEFIRSGKVDQWKE  123


>ref|ZP_04541069.1| conserved hypothetical protein [Bacteroides sp. 9_1_42FAA]
 gb|EEO61031.1| conserved hypothetical protein [Bacteroides sp. 9_1_42FAA]
Length=148

 Score = 36.6 bits (83),  Expect = 2.2
 Identities = 29/107 (27%), Positives = 51/107 (47%), Gaps = 3/107 (2%)
 Frame = +1

Query  310  VDFIDAKKTLTNFEDVALCAEMIFEIFPVLKLEELRLVCDRMKQGHYGKFYERLKIQEFR  489
            + F +  KT+ + + VAL A++I + F  LKLEE++L C R      GK Y+RL      
Sbjct  1    MSFFNVGKTMNDVQ-VALTADLIIDRFYYLKLEEIKL-CFRNAMAS-GKIYDRLDGNIIL  57

Query  490  ECIQKHEEERAPILEEMHKQIYRGTDNPTNVPEYDKEAAHLAWKLKN  630
              + +++ +R  I+  +        +N      Y +   HL  + +N
Sbjct  58   GWLNEYDAQRDEIVSSLSINEAHEQNNDNTGMFYGEYIKHLTERSEN  104


>ref|YP_002315763.1| GTPase, G3E family [Anoxybacillus flavithermus WK1]
 gb|ACJ33778.1| GTPase, G3E family [Anoxybacillus flavithermus WK1]
Length=368

 Score = 36.6 bits (83),  Expect = 2.2
 Identities = 28/106 (26%), Positives = 48/106 (45%), Gaps = 3/106 (2%)
 Frame = +1

Query  178  RISSTGLFKTTPAEAWHEGTNIRTALRCY-PEETRAEVVKLIKKTVDFIDAKKTLTNFED  354
            RI   G+  T  A  W     +   L+    E+ R   V ++ KT      ++   +FE 
Sbjct  177  RIQMGGIITTIDATRWMNRQQLSIPLQMLLKEQVRHADVLIVNKTDILRPDEQAKLSFEL  236

Query  355  VALCAEM--IFEIFPVLKLEELRLVCDRMKQGHYGKFYERLKIQEF  486
              + +E   IF  F  +++E++ L+  RM+  H     ERL+IQ +
Sbjct  237  QTINSEARTIFTTFSNVRMEDIHLLSKRMRNEHERMNIERLRIQTY  282


>ref|ZP_04551376.1| conserved hypothetical protein [Bacteroides sp. 2_2_4]
 gb|EEO55521.1| conserved hypothetical protein [Bacteroides sp. 2_2_4]
Length=148

 Score = 36.2 bits (82),  Expect = 2.8
 Identities = 29/107 (27%), Positives = 52/107 (48%), Gaps = 3/107 (2%)
 Frame = +1

Query  310  VDFIDAKKTLTNFEDVALCAEMIFEIFPVLKLEELRLVCDRMKQGHYGKFYERLKIQEFR  489
            + F +  KT+ + + VAL A++I + F  LKLEE++L C R      GK Y+RL      
Sbjct  1    MSFFNVGKTMNDVQ-VALTADLIIDRFYYLKLEEIKL-CFRNAMAS-GKIYDRLDGNIIL  57

Query  490  ECIQKHEEERAPILEEMHKQIYRGTDNPTNVPEYDKEAAHLAWKLKN  630
              + +++ +R  I+  +        +N +    Y +   HL  + +N
Sbjct  58   GWLNEYDVQRDEIVSSLSINEAHEQNNNSTGMFYGEYIKHLTERSEN  104


>emb|CAG84918.2| DEHA2A14036p [Debaryomyces hansenii]
Length=295

 Score = 36.2 bits (82),  Expect = 2.8
 Identities = 35/129 (27%), Positives = 55/129 (42%), Gaps = 5/129 (3%)
 Frame = +1

Query  280  AEVVKLIKKTVDFIDAKKTLTNFEDVALCAEMIFEIFPVLKLEELRLVCDRMKQGHYGKF  459
            +++   I K  D +D   T+  F D   C E + +     K EEL+L  +  KQ    + 
Sbjct  49   SQLTNSITKFEDILD--DTICKFNDTKWCVEQMLQNRQ--KQEELKLKEEEEKQRRIKED  104

Query  460  YERLKIQEFRECIQKHEEERAPILEEMHKQIYRGTDNPTNVPEYDKEAAHLAWK-LKNNP  636
             ER +++E +   +K EEE A I +E   Q  +  +      +  KE      + LK N 
Sbjct  105  DERKRLEEEKTKKRKKEEEEARIKKEKEDQQAKEEELKKKKEQEQKEKEQKERESLKENE  164

Query  637  FLIPGKNDN  663
                  NDN
Sbjct  165  KKSDANNDN  173


>ref|XP_456940.1| hypothetical protein DEHA0A14454g [Debaryomyces hansenii CBS767]
Length=295

 Score = 36.2 bits (82),  Expect = 2.8
 Identities = 35/129 (27%), Positives = 55/129 (42%), Gaps = 5/129 (3%)
 Frame = +1

Query  280  AEVVKLIKKTVDFIDAKKTLTNFEDVALCAEMIFEIFPVLKLEELRLVCDRMKQGHYGKF  459
            +++   I K  D +D   T+  F D   C E + +     K EEL+L  +  KQ    + 
Sbjct  49   SQLTNSITKFEDILD--DTICKFNDTKWCVEQMLQNRQ--KQEELKLKEEEEKQRRIKED  104

Query  460  YERLKIQEFRECIQKHEEERAPILEEMHKQIYRGTDNPTNVPEYDKEAAHLAWK-LKNNP  636
             ER +++E +   +K EEE A I +E   Q  +  +      +  KE      + LK N 
Sbjct  105  DERKRLEEEKTKKRKKEEEEARIKKEKEDQQAKEEELKKKKEQEQKEKEQKERESLKENE  164

Query  637  FLIPGKNDN  663
                  NDN
Sbjct  165  KKSDANNDN  173


>ref|XP_677011.1| hypothetical protein [Plasmodium berghei strain ANKA]
 emb|CAH99266.1| conserved hypothetical protein [Plasmodium berghei]
Length=1563

 Score = 35.8 bits (81),  Expect = 3.7
 Identities = 26/116 (22%), Positives = 52/116 (44%), Gaps = 6/116 (5%)
 Frame = +1

Query  298  IKKTVDFIDAKKTLTNFEDVALCAEMIFEIFPVLKLEELRLVCDRMKQGHYGKFYERLKI  477
            I KT+D +   KT  N  D+  C ++ F+ +P L  E      +            +  I
Sbjct  860  ILKTLDILLHSKTKINDGDIIKCLQICFDKYPDLDTE-----LESFLYNFNSYLLNQKNI  914

Query  478  QEFRE-CIQKHEEERAPILEEMHKQIYRGTDNPTNVPEYDKEAAHLAWKLKNNPFL  642
            ++F + CI  +E+ R  I++ +HK +     +   V  Y+ ++     ++ N+ F+
Sbjct  915  KQFLQLCITDNEQRRFVIIKSLHKFVIDNFCSNIEVDIYNDKSKESIVEMFNDEFI  970


>ref|ZP_03207066.1| hypothetical protein BACPLE_00682 [Bacteroides plebeius DSM 17135]
 gb|EDY96857.1| hypothetical protein BACPLE_00682 [Bacteroides plebeius DSM 17135]
Length=159

 Score = 35.4 bits (80),  Expect = 4.8
 Identities = 20/75 (26%), Positives = 38/75 (50%), Gaps = 1/75 (1%)
 Frame = +1

Query  313  DFIDAKKTLTNFEDVALCAEMIFEIFPVLKLEELRLVCDRMKQGHYGKFYERLKIQEFRE  492
            +F++  K +T+ +     A +I + +  L + ++ L+  R K G+YG  Y+RL  Q    
Sbjct  45   EFVNVGKKMTDAQTFET-AMIILQDYKFLTIADINLLFKRAKSGYYGNLYDRLDGQIILG  103

Query  493  CIQKHEEERAPILEE  537
              +++  ER    EE
Sbjct  104  WFRRYFSERCGAAEE  118


>ref|YP_001956163.1| CRISPR-associated protein Cas4 [uncultured Termite group 1 bacterium 
phylotype Rs-D17]
 dbj|BAG13702.1| CRISPR-associated protein Cas4 [uncultured Termite group 1 bacterium 
phylotype Rs-D17]
Length=212

 Score = 35.4 bits (80),  Expect = 4.8
 Identities = 32/110 (29%), Positives = 48/110 (43%), Gaps = 14/110 (12%)
 Frame = +1

Query  292  KLIKKTVDFIDAKKTLTNFEDVALCAEMIFEIFPVLKLEELRLVCDRMKQGHYGKFYERL  471
            KLI   V++   K    N + V LCA+        L LEE+  V        YGK   RL
Sbjct  89   KLIPFPVEYKSGKAKSDNVDKVQLCAQ-------ALCLEEMMNVTIESGAIFYGKTRNRL  141

Query  472  KIQEFRECIQKHEEERAPILEEMHKQIYRGTDNPTNVPEYDKEAAHLAWK  621
             + EF + ++   EE   + +E H  +       T  PEY K+  + ++K
Sbjct  142  NV-EFNKSLR---EETFALAQEFHSLV---DSRETPKPEYSKKCYNCSFK  184


>ref|XP_001613837.1| hypothetical protein [Plasmodium vivax SaI-1]
 gb|EDL44110.1| hypothetical protein, conserved [Plasmodium vivax]
Length=580

 Score = 35.4 bits (80),  Expect = 4.8
 Identities = 32/124 (25%), Positives = 53/124 (42%), Gaps = 14/124 (11%)
 Frame = +1

Query  241  IRTALRCYPEETRAEVVKLIKKTVDFIDAKKTLTNFEDVALCAEMIFEIFPVLKLEELRL  420
            + T      E  R  ++KLI   V F+     L N +DV L  +++ +   + K E  +L
Sbjct  20   VNTLAHVREERRRRHLLKLITTFVGFLQRYVHLMNEQDVTLLLDVMAKC-SIRKAEICQL  78

Query  421  VCDRMKQGHYGKFYERLKIQEFRECIQKHEEERAPILEEMHKQIYRGTDNPTNVPEYDKE  600
            +  R+ +G     +  L  +    CI         IL  +HK   +G  +P +   YD+E
Sbjct  79   LIQRLGKGEKNTLFYSLNSKSV--CI---------ILNSLHKVTAQGNHHPHH--PYDRE  125

Query  601  AAHL  612
            A  L
Sbjct  126  ARRL  129


>ref|YP_846770.1| sensory histidine kinase CreC [Syntrophobacter fumaroxidans MPOB]
 gb|ABK18335.1| integral membrane sensor signal transduction histidine kinase 
[Syntrophobacter fumaroxidans MPOB]
Length=475

 Score = 35.4 bits (80),  Expect = 4.8
 Identities = 16/52 (30%), Positives = 29/52 (55%), Gaps = 0/52 (0%)
 Frame = +1

Query  430  RMKQGHYGKFYERLKIQEFRECIQKHEEERAPILEEMHKQIYRGTDNPTNVP  585
            R+ +    KF +R+++  F   I+   EE+ P+L + H Q+  GT++   VP
Sbjct  314  RLSELENKKFLDRVELVPFSVLIRTTVEEKEPLLSQKHLQVQIGTEDDVRVP  365


>ref|ZP_04845357.1| conserved hypothetical protein [Bacteroides sp. 1_1_6]
 gb|EES70099.1| conserved hypothetical protein [Bacteroides sp. 1_1_6]
Length=156

 Score = 35.0 bits (79),  Expect = 6.3
 Identities = 23/108 (21%), Positives = 51/108 (47%), Gaps = 5/108 (4%)
 Frame = +1

Query  286  VVKLIKKTVDFIDAKKTLTNFEDVALCAEMIFEIFPVLKLEELRLVCDRMKQGHYGKFYE  465
            +V LI   ++F +   T++  + VA   ++I E +P +K ++ +L      +  YG+ Y 
Sbjct  2    LVILIADALEFFNVSNTMSATQ-VATTVDLIIEEYPYMKTDDFKLCFKNAMKMKYGENYN  60

Query  466  RLKIQEFRECIQKHEEERAPILEEM----HKQIYRGTDNPTNVPEYDK  597
            R+        ++++ +ER  + +      HK    G  + T+   Y++
Sbjct  61   RIDGSIIMGWLREYNKERCAVADNQSWNTHKAKLSGETSFTSGLSYEE  108