GOS 806020

Warning: this metagenomic sequence has been carefully annotated by students during bioinformatics assignments. These quality annotations are therefore the result of a teaching exercise that you are most welcome to amend and extend if necessary!


Sequence
CAMERA AccNum : JCVI_READ_1091140850035
Annotathon code: GOS_806020
Sample :
  • GPS :1°12'58s; 90°25'22w
  • Galapagos Islands: Devil's Crown, Floreana Island - Ecuador
  • Coastal (-2.2m, 25.5°C, 0.1-0.8 microns)
Authors
Team : Biochimie 2009
Username : RAJESS
Annotated on : 2009-06-02 12:54:35
  • iorio jessica
  • lachaal raja

Synopsis

Genomic Sequence

>JCVI_READ_1091140850035 GOS_806020 Genomic DNA
TCAGAAGGTACTCCATGATCCCTGGTGTGTTCTCACTCGAGAAGGCGGCGATCAATTCCGGCACCAGCAAGAGGTAAGCTGCCTGGGGCACGGCCAAGAG
GCCTGCTCCAATTCGTCCCCTGGTGGAATATGCTCCGAAGACCAGCGCACCAATGATGGCGACTGGGATAAGGAAATACAACCCTTCCCAAACAAGAGCA
ACGTAGACGTGGAAAGCAACGATGATGCCGTTGGCCACGGCGGCCGTTCGCATATCTCGTATGATGTCCAGCATGATCATCCCTCCGAGCCCACAGTGGC
CTCGAGTGCGACTTCATCCGTCTGCCAGTTACCGTGCAACACGCCACTCAGTGCGAGAAGACCCCACAGAACCACAGGTGGAATGAACTGTGTTTGAAAT
CCATCCATCACGATCCAAATCACGAACATGACCCAAACGGGTCCAGCGATCACCGTGGCCATTTTTGCTTGAATCTCCTTCGTAAATGTAAGCGCGATGA
ATACCATTGGGATGATCATTGATACAACACGCCAAACGGCGTATTCGACCTCTGTTTCATCCAAGTTGAGGCCTTTCTCGCTCAGTACGGGTAGAATGGT
GGCTAGCGTATGTCCGACAGCCACAACGAGAAGCCAAATTTTTGGGTTAAGGATTTTCTTTAATTGATCCATACTCAAGCCTCAGTGGCCTCGGGTGCGG
CTGAATCATCCATGTTCCAACCCATCACACCGGACAAAGCAGGCATGCCCCAGAGGATCATGGGGATAATTGCTTCAGGGAACGGAGCGGCAGGCTCACC
AGCCATGTCCAATCCCATCGCAGCAGCAATCACAAACCAAACGAACACTGGCCCACAAAGAACTGCGGCAAGCCGAGACTGTTCTTGTCCTTCCGTCATG
AGAGCGGCATAGAAGAGATAGACGCCCACGGCTGCAAAGAACCCAATCATCGCAAGATGTTCGTTGCCCAACTGCGCATATGATGCACCCACGCCCATCA
GCATGTGCATGACGCGATGATGAGGAGCCAAATCTGATGTTGTGCAGATGTCAT

Translation

[374 - 775/1054]   indirect strand
>GOS_806020 Translation [374-775   indirect strand]
GLSMDQLKKILNPKIWLLVVAVGHTLATILPVLSEKGLNLDETEVEYAVWRVVSMIIPMVFIALTFTKEIQAKMATVIAGPVWVMFVIWIVMDGFQTQFI
PPVVLWGLLALSGVLHGNWQTDEVALEATVGSEG

[ Warning ] 5' incomplete: does not start with a Methionine

Annotator commentaries

The sequence I studied is 1054 bp long. I searched for potential ORFs and found several. I therefore chose, quite arbitrarily, to study the longest one. It extends from position 374 to 775 and lies on the indirect strand. However, it is incomplete at the 5' end, so I could not determine the molecular weight of the protein it may encode.


In the protein domain search, InterPro detects 5 potential features (4 transmembrane domains and one signal peptide). Given the lack of further information in InterPro and the nature of the homologs found (only unanalyzed hypothetical sequences), I could not at first consider these results significant. However, the multiple alignment recovered 3 of the 5 features detected by InterPro as conserved regions, so these domains can be considered probably significant.


BLASTp against nr gave no significant results. The low scores (about 33 to 36 bits) and high E-values (greater than 1) led me to consider the sequences found as false homologs.

I then ran a BLASTp against the environmental database (env_nr) and found homologous sequences with good E-values. The best hit has an E-value of 4e-69 and 99% identity.


Furthermore, the multiple alignment of the 10 best homologs with the studied sequence supports the idea that these sequences are truly homologous, since it shows 4 conserved regions.


My analysis stopped there for lack of information. Building a tree would not have told me more about the taxonomy of this sequence, since all the homologs are hypothetical sequences.



The study of this sequence only allows me to hypothesize that it is probably a coding sequence, and probably fairly widespread in the environment.

ORF finding

PROTOCOL: ORF Finder from the Sequence Manipulation Suite (SMS)


Experiment 1: SMS ORF Finder / direct strand / frames 1, 2 & 3 / min 60 AA / initiation 'any codon' / 'standard' genetic code

Experiment 2: SMS ORF Finder / indirect strand / frames 1, 2 & 3 / min 60 AA / initiation 'any codon' / 'standard' genetic code



RESULTS ANALYSIS:


I found several potential ORFs on both the direct and indirect strands using "any codon", several of them of non-negligible size. On the direct strand, ORF Finder detects 6 potential ORFs, the two longest being 396 and 279 nucleotides (counting both end coordinates). On the indirect strand, ORF Finder detects 7 potential ORFs, the longest being 312, 405, 378 and 315 nucleotides.

However, none of them starts with a methionine.


The longest ORF is found on the indirect strand using "any codon". It is 405 nucleotides long, stop codon included, from position 374 to 778 (ORF number 2 in reading frame 2 on the reverse strand).

Running a BLASTp of this ORF against the environmental database (since I found no true homolog with BLASTp against nr), I find a homologous sequence with 99% identity to my sequence. This homolog aligns perfectly starting from position 3 of my ORF.

Moreover, this ORF extends from position 374 to 778, so it lies entirely within my fragment. The start codon is probably located upstream of position 374.


The number of homologous sequences found leads me to think that my sequence is coding.


I therefore keep this ORF (positions 374 to 778) for the rest of the analysis, on the hypothesis that it is coding.


This ORF is incomplete at the 5' end, so I cannot determine its molecular weight.
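
The scan described in the protocol can be sketched in a few lines of Python. This is a minimal illustration, not the SMS ORF Finder implementation: it assumes the standard genetic code, treats any codon as a possible start (so an ORF is any stretch running up to a stop codon, beginning after the previous stop or at the start of the frame), and reports 1-based coordinates on the scanned strand. The names `Orf`, `reverse_complement`, `translate` and `find_orfs` are hypothetical helpers introduced here.

```python
from typing import Iterator, NamedTuple

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


class Orf(NamedTuple):
    strand: str   # "direct" or "reverse"
    frame: int    # 1, 2 or 3
    start: int    # 1-based, on the scanned strand
    end: int      # 1-based, inclusive; covers the stop codon when there is one
    protein: str  # translation without the stop


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    # Unknown codons (e.g. containing N) become "X"; a trailing partial codon is ignored.
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def find_orfs(seq: str, min_aa: int = 60) -> Iterator[Orf]:
    seq = seq.upper()
    for strand, s in (("direct", seq), ("reverse", reverse_complement(seq))):
        for frame in range(3):
            segments = translate(s[frame:]).split("*")
            pos = 0  # codon index (within this frame) where the current segment starts
            for n, segment in enumerate(segments):
                has_stop = n < len(segments) - 1  # the last segment runs to the sequence end
                if len(segment) >= min_aa:
                    start = frame + 3 * pos + 1
                    end = frame + 3 * (pos + len(segment) + has_stop)
                    yield Orf(strand, frame + 1, start, end, segment)
                pos += len(segment) + 1
```

Called as `list(find_orfs(read, min_aa=60))` on the 1054-nt read, a scan of this kind lists the open stretches in each of the six frames. The exact count may differ from the SMS output depending on how the minimum length and the sequence ends are handled.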



RAW RESULTS:

Experiment 1: direct strand, "any codon"


>ORF number 1 in reading frame 1 on the direct strand extends from base 277 to base 522.
TCATCCCTCCGAGCCCACAGTGGCCTCGAGTGCGACTTCATCCGTCTGCCAGTTACCGTG
CAACACGCCACTCAGTGCGAGAAGACCCCACAGAACCACAGGTGGAATGAACTGTGTTTG
AAATCCATCCATCACGATCCAAATCACGAACATGACCCAAACGGGTCCAGCGATCACCGT
GGCCATTTTTGCTTGAATCTCCTTCGTAAATGTAAGCGCGATGAATACCATTGGGATGAT
CATTGA

>Translation of ORF number 1 in reading frame 1 on the direct strand.
SSLRAHSGLECDFIRLPVTVQHATQCEKTPQNHRWNELCLKSIHHDPNHEHDPNGSSDHR
GHFCLNLLRKCKRDEYHWDDH*

>ORF number 2 in reading frame 1 on the direct strand extends from base 664 to base 912.
TTGATCCATACTCAAGCCTCAGTGGCCTCGGGTGCGGCTGAATCATCCATGTTCCAACCC
ATCACACCGGACAAAGCAGGCATGCCCCAGAGGATCATGGGGATAATTGCTTCAGGGAAC
GGAGCGGCAGGCTCACCAGCCATGTCCAATCCCATCGCAGCAGCAATCACAAACCAAACG
AACACTGGCCCACAAAGAACTGCGGCAAGCCGAGACTGTTCTTGTCCTTCCGTCATGAGA
GCGGCATAG

>Translation of ORF number 2 in reading frame 1 on the direct strand.
LIHTQASVASGAAESSMFQPITPDKAGMPQRIMGIIASGNGAAGSPAMSNPIAAAITNQT
NTGPQRTAASRDCSCPSVMRAA*

>ORF number 1 in reading frame 2 on the direct strand extends from base 2 to base 397.
CAGAAGGTACTCCATGATCCCTGGTGTGTTCTCACTCGAGAAGGCGGCGATCAATTCCGG
CACCAGCAAGAGGTAAGCTGCCTGGGGCACGGCCAAGAGGCCTGCTCCAATTCGTCCCCT
GGTGGAATATGCTCCGAAGACCAGCGCACCAATGATGGCGACTGGGATAAGGAAATACAA
CCCTTCCCAAACAAGAGCAACGTAGACGTGGAAAGCAACGATGATGCCGTTGGCCACGGC
GGCCGTTCGCATATCTCGTATGATGTCCAGCATGATCATCCCTCCGAGCCCACAGTGGCC
TCGAGTGCGACTTCATCCGTCTGCCAGTTACCGTGCAACACGCCACTCAGTGCGAGAAGA
CCCCACAGAACCACAGGTGGAATGAACTGTGTTTGA

>Translation of ORF number 1 in reading frame 2 on the direct strand.
QKVLHDPWCVLTREGGDQFRHQQEVSCLGHGQEACSNSSPGGICSEDQRTNDGDWDKEIQ
PFPNKSNVDVESNDDAVGHGGRSHISYDVQHDHPSEPTVASSATSSVCQLPCNTPLSARR
PHRTTGGMNCV*

>ORF number 2 in reading frame 2 on the direct strand extends from base 473 to base 667.
ATCTCCTTCGTAAATGTAAGCGCGATGAATACCATTGGGATGATCATTGATACAACACGC
CAAACGGCGTATTCGACCTCTGTTTCATCCAAGTTGAGGCCTTTCTCGCTCAGTACGGGT
AGAATGGTGGCTAGCGTATGTCCGACAGCCACAACGAGAAGCCAAATTTTTGGGTTAAGG
ATTTTCTTTAATTGA

>Translation of ORF number 2 in reading frame 2 on the direct strand.
ISFVNVSAMNTIGMIIDTTRQTAYSTSVSSKLRPFSLSTGRMVASVCPTATTRSQIFGLR
IFFN*

>ORF number 1 in reading frame 3 on the direct strand extends from base 207 to base 386.
ACGTGGAAAGCAACGATGATGCCGTTGGCCACGGCGGCCGTTCGCATATCTCGTATGATG
TCCAGCATGATCATCCCTCCGAGCCCACAGTGGCCTCGAGTGCGACTTCATCCGTCTGCC
AGTTACCGTGCAACACGCCACTCAGTGCGAGAAGACCCCACAGAACCACAGGTGGAATGA


>Translation of ORF number 1 in reading frame 3 on the direct strand.
TWKATMMPLATAAVRISRMMSSMIIPPSPQWPRVRLHPSASYRATRHSVREDPTEPQVE*


>ORF number 2 in reading frame 3 on the direct strand extends from base 705 to base 983.
ATCATCCATGTTCCAACCCATCACACCGGACAAAGCAGGCATGCCCCAGAGGATCATGGG
GATAATTGCTTCAGGGAACGGAGCGGCAGGCTCACCAGCCATGTCCAATCCCATCGCAGC
AGCAATCACAAACCAAACGAACACTGGCCCACAAAGAACTGCGGCAAGCCGAGACTGTTC
TTGTCCTTCCGTCATGAGAGCGGCATAGAAGAGATAGACGCCCACGGCTGCAAAGAACCC
AATCATCGCAAGATGTTCGTTGCCCAACTGCGCATATGA

>Translation of ORF number 2 in reading frame 3 on the direct strand.
IIHVPTHHTGQSRHAPEDHGDNCFRERSGRLTSHVQSHRSSNHKPNEHWPTKNCGKPRLF
LSFRHESGIEEIDAHGCKEPNHRKMFVAQLRI*



Experiment 2: indirect strand, "any codon"


>ORF number 1 in reading frame 1 on the reverse strand extends from base 499 to base 726.
AACAGAGGTCGAATACGCCGTTTGGCGTGTTGTATCAATGATCATCCCAATGGTATTCAT
CGCGCTTACATTTACGAAGGAGATTCAAGCAAAAATGGCCACGGTGATCGCTGGACCCGT
TTGGGTCATGTTCGTGATTTGGATCGTGATGGATGGATTTCAAACACAGTTCATTCCACC
TGTGGTTCTGTGGGGTCTTCTCGCACTGAGTGGCGTGTTGCACGGTAA

>Translation of ORF number 1 in reading frame 1 on the reverse strand.
NRGRIRRLACCINDHPNGIHRAYIYEGDSSKNGHGDRWTRLGHVRDLDRDGWISNTVHST
CGSVGSSRTEWRVAR*

>ORF number 2 in reading frame 1 on the reverse strand extends from base 742 to base 1053.
AGTCGCACTCGAGGCCACTGTGGGCTCGGAGGGATGATCATGCTGGACATCATACGAGAT
ATGCGAACGGCCGCCGTGGCCAACGGCATCATCGTTGCTTTCCACGTCTACGTTGCTCTT
GTTTGGGAAGGGTTGTATTTCCTTATCCCAGTCGCCATCATTGGTGCGCTGGTCTTCGGA
GCATATTCCACCAGGGGACGAATTGGAGCAGGCCTCTTGGCCGTGCCCCAGGCAGCTTAC
CTCTTGCTGGTGCCGGAATTGATCGCCGCCTTCTCGAGTGAGAACACACCAGGGATCATG
GAGTACCTTCTG

>Translation of ORF number 2 in reading frame 1 on the reverse strand.
SRTRGHCGLGGMIMLDIIRDMRTAAVANGIIVAFHVYVALVWEGLYFLIPVAIIGALVFG
AYSTRGRIGAGLLAVPQAAYLLLVPELIAAFSSENTPGIMEYLL

>ORF number 1 in reading frame 2 on the reverse strand extends from base 5 to base 259.
CATCTGCACAACATCAGATTTGGCTCCTCATCATCGCGTCATGCACATGCTGATGGGCGT
GGGTGCATCATATGCGCAGTTGGGCAACGAACATCTTGCGATGATTGGGTTCTTTGCAGC
CGTGGGCGTCTATCTCTTCTATGCCGCTCTCATGACGGAAGGACAAGAACAGTCTCGGCT
TGCCGCAGTTCTTTGTGGGCCAGTGTTCGTTTGGTTTGTGATTGCTGCTGCGATGGGATT
GGACATGGCTGGTGA

>Translation of ORF number 1 in reading frame 2 on the reverse strand.
HLHNIRFGSSSSRHAHADGRGCIICAVGQRTSCDDWVLCSRGRLSLLCRSHDGRTRTVSA
CRSSLWASVRLVCDCCCDGIGHGW*

>ORF number 2 in reading frame 2 on the reverse strand extends from base 374 to base 778.
GGCTTGAGTATGGATCAATTAAAGAAAATCCTTAACCCAAAAATTTGGCTTCTCGTTGTG
GCTGTCGGACATACGCTAGCCACCATTCTACCCGTACTGAGCGAGAAAGGCCTCAACTTG
GATGAAACAGAGGTCGAATACGCCGTTTGGCGTGTTGTATCAATGATCATCCCAATGGTA
TTCATCGCGCTTACATTTACGAAGGAGATTCAAGCAAAAATGGCCACGGTGATCGCTGGA
CCCGTTTGGGTCATGTTCGTGATTTGGATCGTGATGGATGGATTTCAAACACAGTTCATT
CCACCTGTGGTTCTGTGGGGTCTTCTCGCACTGAGTGGCGTGTTGCACGGTAACTGGCAG
ACGGATGAAGTCGCACTCGAGGCCACTGTGGGCTCGGAGGGATGA


>Translation of ORF number 2 in reading frame 2 on the reverse strand.
GLSMDQLKKILNPKIWLLVVAVGHTLATILPVLSEKGLNLDETEVEYAVWRVVSMIIPMV
FIALTFTKEIQAKMATVIAGPVWVMFVIWIVMDGFQTQFIPPVVLWGLLALSGVLHGNWQ
TDEVALEATVGSEG*

>ORF number 3 in reading frame 2 on the reverse strand extends from base 779 to base 1003.
TCATGCTGGACATCATACGAGATATGCGAACGGCCGCCGTGGCCAACGGCATCATCGTTG
CTTTCCACGTCTACGTTGCTCTTGTTTGGGAAGGGTTGTATTTCCTTATCCCAGTCGCCA
TCATTGGTGCGCTGGTCTTCGGAGCATATTCCACCAGGGGACGAATTGGAGCAGGCCTCT
TGGCCGTGCCCCAGGCAGCTTACCTCTTGCTGGTGCCGGAATTGA

>Translation of ORF number 3 in reading frame 2 on the reverse strand.
SCWTSYEICERPPWPTASSLLSTSTLLLFGKGCISLSQSPSLVRWSSEHIPPGDELEQAS
WPCPRQLTSCWCRN*

>ORF number 1 in reading frame 3 on the reverse strand extends from base 3 to base 380.
GACATCTGCACAACATCAGATTTGGCTCCTCATCATCGCGTCATGCACATGCTGATGGGC
GTGGGTGCATCATATGCGCAGTTGGGCAACGAACATCTTGCGATGATTGGGTTCTTTGCA
GCCGTGGGCGTCTATCTCTTCTATGCCGCTCTCATGACGGAAGGACAAGAACAGTCTCGG
CTTGCCGCAGTTCTTTGTGGGCCAGTGTTCGTTTGGTTTGTGATTGCTGCTGCGATGGGA
TTGGACATGGCTGGTGAGCCTGCCGCTCCGTTCCCTGAAGCAATTATCCCCATGATCCTC
TGGGGCATGCCTGCTTTGTCCGGTGTGATGGGTTGGAACATGGATGATTCAGCCGCACCC
GAGGCCACTGAGGCTTGA

>Translation of ORF number 1 in reading frame 3 on the reverse strand.
DICTTSDLAPHHRVMHMLMGVGASYAQLGNEHLAMIGFFAAVGVYLFYAALMTEGQEQSR
LAAVLCGPVFVWFVIAAAMGLDMAGEPAAPFPEAIIPMILWGMPALSGVMGWNMDDSAAP
EATEA*

>ORF number 2 in reading frame 3 on the reverse strand extends from base 708 to base 1022.
GTGGCGTGTTGCACGGTAACTGGCAGACGGATGAAGTCGCACTCGAGGCCACTGTGGGCT
CGGAGGGATGATCATGCTGGACATCATACGAGATATGCGAACGGCCGCCGTGGCCAACGG
CATCATCGTTGCTTTCCACGTCTACGTTGCTCTTGTTTGGGAAGGGTTGTATTTCCTTAT
CCCAGTCGCCATCATTGGTGCGCTGGTCTTCGGAGCATATTCCACCAGGGGACGAATTGG
AGCAGGCCTCTTGGCCGTGCCCCAGGCAGCTTACCTCTTGCTGGTGCCGGAATTGATCGC
CGCCTTCTCGAGTGA

>Translation of ORF number 2 in reading frame 3 on the reverse strand.
VACCTVTGRRMKSHSRPLWARRDDHAGHHTRYANGRRGQRHHRCFPRLRCSCLGRVVFPY
PSRHHWCAGLRSIFHQGTNWSRPLGRAPGSLPLAGAGIDRRLLE*




Multiple Alignment

PROTOCOL:


Phylogeny.fr / ClustalW


RESULTS ANALYSIS:


The multiple alignment shows many conserved positions. Four conserved regions can be observed: positions 24 to 46, 55 to 66, 72 to 100 and 126 to 141. These conserved regions do not correspond exactly to the domains found by InterPro, but they lie at roughly the same level along the protein sequence. Except for the last region found by the multiple alignment (126 to 141), each conserved region overlaps a domain found by InterPro.


These domains therefore appear to be significant.
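
The comparison between conserved alignment blocks and InterPro regions can be made explicit by computing interval overlaps. The sketch below is illustrative only: it copies the coordinates reported on this page and assumes, as the analysis does, that alignment columns and residue positions can be compared directly, although gaps in the alignment shift the two numbering systems. The function name `overlap` is a hypothetical helper.

```python
from typing import Tuple

Interval = Tuple[int, int]  # (start, end), 1-based and inclusive


def overlap(a: Interval, b: Interval) -> int:
    """Number of positions shared by two inclusive intervals (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


# Conserved blocks read from the multiple alignment (alignment columns).
alignment_blocks = [(24, 46), (55, 66), (72, 100), (126, 141)]
# Transmembrane regions predicted by TMHMM via InterProScan (residue positions).
tm_regions = [(15, 33), (47, 67), (72, 92), (98, 116)]

for block in alignment_blocks:
    shared = {tm: overlap(block, tm) for tm in tm_regions if overlap(block, tm) > 0}
    print(block, "overlaps", shared if shared else "no TMHMM region")
```

Converting alignment columns to ungapped residue positions before comparing would make the check stricter. This version only reproduces the direct comparison made in the analysis.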



RAW RESULTS:

CLUSTAL FORMAT: MUSCLE (3.7) multiple sequence alignment


gi|1437074      --------------------MDPLDPQIWLILVALGHTVPGVLMATNWA----DDTAKMV
gi|1396843      --------------VNLDQLKKILNPQIWLLIVAVGHTLATILPVLSDKGINMEDTEVEY
gi|1354711      --------MLQKKALEMDQLKKILNPKIWLIVVALGHTLATILPVLSDKGLNLDDTEVEY
gi|1442176      --------------MSMDQLKKILNPKIWLLVVAVGHTLATILPVLSEKGLNLDETEVEY
gi|1349134      -----------------------LNPKIWLLVVAGGHTLATILPVLSDKGLNLDETEVEY
gi|1381482      ---------IMDKESRSIDVKTILNPKAWLIITGVMHALAGVLAELDME----NETRVVI
gi|1435998      ----------MDKEAKYLDVILILNPKIWLIVTGLIHAGVGVLSELDMG----NETQVVV
gi|1374554      IKAEKVKRPRPRRVLEMDQLKKICNVKVWLILIAVIHTVVGVLAQTDFS----VDAEAEM
gi|1425328      -------------MSEVDWKTEALNPKWWLIILALGHTFMGVLLPMDPD----NDNELMV
gi|1353428      -------------MSDLDYKKEVLNPKWWLIVNAIGHTFIGVLLPLDPD----NDNELMI
                                        : : **:: .  *:   :*   .       :     

gi|1437074      AGWMLLTSVT-LVYAALGMDGEEQARLAVVLAGPVWIWFVVCITQGLEYTMGKEPITMNW
gi|1396843      AVWKIVSMIIPMVFIAFTFNKEIQAKLATAIAGPVWVMFVVMILME------------GF
gi|1354711      AVWRVVSMILPMVFIALTFTKEIQAKLATAIAGPIWVMFVVWNIVD------------GF
gi|1442176      AVWRVVSMIIPMVFIALTFTKEIQAKMATVIAGPVWVMFVIWIVMD------------GF
gi|1349134      AVWRVVSMIIPMVFIALTFTKEIQAKLATAIAGPVWVMFVVWTVMD------------GF
gi|1381482      SGWFLLTTVT-MLYAAFFTEGVQQARLATVIAGPIWVWFIICIAQGYTLDYEGE--TWTF
gi|1435998      SGFFLLTTFT-MLYAAFFTEGEQQARLATVIAGPIWVWFVICVAQGYTLDYEGE--TFTF
gi|1374554      AGVFLVISTY-LFYAAFFTTGEAQARLAAVLAGPIWVWFMICALFELE---SGTGFVWEL
gi|1425328      SGIFLVTTVY-MLYAIFLNEGQAQARLAAVIAGPVWVWFVICMALELEITYSTEPITWGF
gi|1353428      AGVFLVITVY-MLYAAFLTTGRSQARLAAVIAGPIWVWFLVCIALGLEITYTTDAAAQSF
                :   ::     :.:  :      **.:*..:***:*: *::                   

gi|1437074      K--DNGPPVVLWGVLALSGLLGSGWI--------------
gi|1396843      E-TLFIPPLVLWGLLALSGLLHGNFQNLMSKE--------
gi|1354711      Q-TLFIPPLVLWGLLALSGLLHGNFQNLLDSDSE------
gi|1442176      Q-TQFIPPVVLWGLLALSGVLHGNWQTDEVALEATVGSEG
gi|1349134      Q-TLLIPPLVLWGLLALSGVLHGNWQTNEAAPAEV-----
gi|1381482      ELGSNIPPLILWGMTALSGIIHGNFQELNK----------
gi|1435998      ELNTSIPPLIIWGMTALSGIVHGNFQELTK----------
gi|1374554      G-PELLPPLVFWGMTALSGLLHGNFHNLMSSQEA------
gi|1425328      S-SDFVPPMILWGMTALTGVL--GWNSED-----------
gi|1353428      DLADNVPPLIFWGMTALTGLL--GWNMEDE----------
                      **:::**: **:*::  .:               


Gblocks 0.91b Results
Processed file: input.fasta
Number of sequences: 10
Alignment assumed to be: Protein
New number of positions: 76

 10 160
gi|1437074 ---------- ---------- MDPLDPQIWL ILVALGHTVP GVLMATNWA-
gi|1396843 ---------- ----VNLDQL KKILNPQIWL LIVAVGHTLA TILPVLSDKG
gi|1354711 --------ML QKKALEMDQL KKILNPKIWL IVVALGHTLA TILPVLSDKG
gi|1442176 ---------- ----MSMDQL KKILNPKIWL LVVAVGHTLA TILPVLSEKG
gi|1349134 ---------- ---------- ---LNPKIWL LVVAGGHTLA TILPVLSDKG
gi|1381482 ---------I MDKESRSIDV KTILNPKAWL IITGVMHALA GVLAELDME-
gi|1435998 ---------- MDKEAKYLDV ILILNPKIWL IVTGLIHAGV GVLSELDMG-
gi|1374554 IKAEKVKRPR PRRVLEMDQL KKICNVKVWL ILIAVIHTVV GVLAQTDFS-
gi|1425328 ---------- ---MSEVDWK TEALNPKWWL IILALGHTFM GVLLPMDPD-
gi|1353428 ---------- ---MSDLDYK KEVLNPKWWL IVNAIGHTFI GVLLPLDPD-

           ---DDTAKMV AGWMLLTSVT -LVYAALGMD GEEQARLAVV LAGPVWIWFV
           INMEDTEVEY AVWKIVSMII PMVFIAFTFN KEIQAKLATA IAGPVWVMFV
           LNLDDTEVEY AVWRVVSMIL PMVFIALTFT KEIQAKLATA IAGPIWVMFV
           LNLDETEVEY AVWRVVSMII PMVFIALTFT KEIQAKMATV IAGPVWVMFV
           LNLDETEVEY AVWRVVSMII PMVFIALTFT KEIQAKLATA IAGPVWVMFV
           ---NETRVVI SGWFLLTTVT -MLYAAFFTE GVQQARLATV IAGPIWVWFI
           ---NETQVVV SGFFLLTTFT -MLYAAFFTE GEQQARLATV IAGPIWVWFV
           ---VDAEAEM AGVFLVISTY -LFYAAFFTT GEAQARLAAV LAGPIWVWFM
           ---NDNELMV SGIFLVTTVY -MLYAIFLNE GQAQARLAAV IAGPVWVWFV
           ---NDNELMI AGVFLVITVY -MLYAAFLTT GRSQARLAAV IAGPIWVWFL

           VCITQGLEYT MGKEPITMNW K--DNGPPVV LWGVLALSGL LGSGWI----
           VMILME---- --------GF E-TLFIPPLV LWGLLALSGL LHGNFQNLMS
           VWNIVD---- --------GF Q-TLFIPPLV LWGLLALSGL LHGNFQNLLD
           IWIVMD---- --------GF Q-TQFIPPVV LWGLLALSGV LHGNWQTDEV
           VWTVMD---- --------GF Q-TLLIPPLV LWGLLALSGV LHGNWQTNEA
           ICIAQGYTLD YEGE--TWTF ELGSNIPPLI LWGMTALSGI IHGNFQELNK
           ICVAQGYTLD YEGE--TFTF ELNTSIPPLI IWGMTALSGI VHGNFQELTK
           ICALFELE-- -SGTGFVWEL G-PELLPPLV FWGMTALSGL LHGNFHNLMS
           ICMALELEIT YSTEPITWGF S-SDFVPPMI LWGMTALTGV L--GWNSED-
           VCIALGLEIT YTTDAAAQSF DLADNVPPLI FWGMTALTGL L--GWNMEDE

           ---------- 
           KE-------- 
           SDSE------ 
           ALEATVGSEG 
           APAEV----- 
           ---------- 
           ---------- 
           SQEA

Protein Domains

PROTOCOL:


InterProScan / default parameters



RESULTS ANALYSIS:


The protein domain search returns 5 potential features: one signal peptide and 4 transmembrane domains. However, no additional information is given about these domains apart from their positions, and the analysis of the BLAST results does not allow me to consider these results significant.


The presence of a signal peptide and transmembrane domains indicates that the ORF probably encodes a protein that could belong to either a eukaryotic or a prokaryotic cell; these features alone do not distinguish between the two.

RAW RESULTS:

GOS_806020	7B9C53B068E8FE88	134	SignalPHMM	SignalP	signal-peptide	1	27	NA	?	25-Apr-2009	NULL	NULL
GOS_806020	7B9C53B068E8FE88	134	TMHMM	tmhmm	transmembrane_regions	15	33	NA	?	25-Apr-2009	NULL	NULL
GOS_806020	7B9C53B068E8FE88	134	TMHMM	tmhmm	transmembrane_regions	47	67	NA	?	25-Apr-2009	NULL	NULL
GOS_806020	7B9C53B068E8FE88	134	TMHMM	tmhmm	transmembrane_regions	72	92	NA	?	25-Apr-2009	NULL	NULL
GOS_806020	7B9C53B068E8FE88	134	TMHMM	tmhmm	transmembrane_regions	98	116	NA	?	25-Apr-2009	NULL	NULL


Phylogeny

PROTOCOL:


PhyML tree built with Phylogeny.fr in "à la carte" mode, default parameters.


RESULTS ANALYSIS:


According to the tree, the studied protein sequence appears to be abundant in the environment.


The tree shows that the studied sequence appears closest to 4 particular sequences: GOS_9190304, GOS_9793682, GOS_1644588 and GOS_4462809.




RAW RESULTS:


                                                                                                         ----0.1---
 
                                   +--------------------gi_142532837_gb_ECY76417.1_hypothetical_protein_GOS_2301336_mari
                  +----------------+
                  |                +------gi_135342819_gb_EBG99739.1_hypothetical_protein_GOS_9338418_mari
           +------+
           |      |
 +---------+      +-----------------------------gi_137455497_gb_EBU19997.1_hypothetical_protein_GOS_7151055_mari
 |         |
 |         |                    +---------gi_138148208_gb_EBY01539.1_hypothetical_protein_GOS_6042715_mari
 |         +--------------------+
 |                              +---------gi_143599849_gb_EDF80096.1_hypothetical_protein_GOS_891873_marin
 |
 |  +------------------------------------------gi_143707419_gb_EDG38170.1_hypothetical_protein_GOS_791552_marin
 |  |
 |  |
 |  |                                           +-------gi_139684350_gb_ECG75866.1_hypothetical_protein_GOS_4462809_mari
 |  |                                           |
 +--+                                       +---+     +gi_144217645_gb_EDJ72196.1_hypothetical_protein_GOS_1644588_mari
    |                                       |   | +---+
    |                                       |   | |   +GOS_806020_Traduction_374-775_sens_indirect
    |                                       |   +-+
    +---------------------------------------+     |
                                            |     +-gi_134913429_gb_EBE25933.1_hypothetical_protein_GOS_9793682_mari
                                            |
                                            +gi_135471197_gb_EBH85890.1_hypothetical_protein_GOS_9190304_mari

Taxonomy report

PROTOCOL:




RESULTS ANALYSIS:


I found no homolog in the nr database; the only homologs were found in the environmental database. I therefore cannot establish taxonomic relationships between my sequence and the homologs found, which are all hypothetical sequences.

RAW RESULTS:

BLAST

PROTOCOL:


1) BLASTp against nr, NCBI default parameters except "Max target sequences" set to 500


2) BLASTp against env_nr, NCBI default parameters except "Max target sequences" set to 500



RESULTS ANALYSIS:


BLASTp against nr returns very few sequences (7), with poor scores. The best E-value is 1.4, and when I examine the pairwise alignments, the "best" ones cover only short regions.

I therefore consider these sequences false positives.


I then run a second BLASTp of my sequence, this time against the environmental database, to try to find true homologs that would allow me to continue the analysis.

I obtain interesting results. The first sixty or so sequences have low E-values, high scores and good pairwise alignments. All of these sequences come from marine metagenomes.

I find one sequence with 99% identity. It aligns perfectly over the whole length of my ORF starting from ORF position 3, with an E-value of 4e-69 (gb|EDJ72196.1| hypothetical protein GOS_1644588 [marine metag... 261 4e-69).


The threshold between true and false positives can then be placed at the sequence with an E-value of 7e-04 (gb|ECZ18801.1| hypothetical protein GOS_2226749 [marine metag... 45.1 7e-04).


Given the number of homologous sequences, their E-values and the quality of the pairwise alignments, the sequence I am studying appears to be widespread in the environment even though it has not yet been studied.
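
The E-value cutoff described above can be applied mechanically to the BLAST summary lines. This is a minimal sketch, assuming each summary line ends with the bit score and the E-value as its last two whitespace-separated fields, as in the raw results on this page. `parse_hit` and `significant_hits` are hypothetical helper names, and the 7e-04 default simply reuses the threshold chosen in the analysis.

```python
from typing import List, Tuple

Hit = Tuple[str, float, float]  # (accession, bit score, E-value)


def parse_hit(line: str) -> Hit:
    # The last two fields are the bit score and the E-value; the rest is the description.
    description, score, evalue = line.rsplit(None, 2)
    accession = description.split()[0]
    return accession, float(score), float(evalue)


def significant_hits(lines: List[str], max_evalue: float = 7e-04) -> List[Hit]:
    hits = [parse_hit(line) for line in lines if line.strip()]
    return [hit for hit in hits if hit[2] <= max_evalue]


# Three summary lines copied from the env_nr results: best hit, threshold hit, first rejected hit.
summary = [
    "gb|EDJ72196.1|  hypothetical protein GOS_1644588 [marine metag...   261    4e-69",
    "gb|ECZ18801.1|  hypothetical protein GOS_2226749 [marine metag...  45.1    7e-04",
    "gb|ECO05907.1|  hypothetical protein GOS_3486373 [marine metag...  42.7    0.004",
]

for accession, score, evalue in significant_hits(summary):
    print(accession, score, evalue)
```

The cutoff itself remains a judgment call based on the pairwise alignments. The code only makes the chosen threshold reproducible.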



RAW RESULTS:

1) BLASTp against nr

                                                                   Score     E
Sequences producing significant alignments:                       (Bits)  Value

ref|XP_001796362.1|  hypothetical protein SNOG_05974 [Phaeosph...  35.8    1.4   
gb|EEH20263.1|  plasma membrane calcium-transporting ATPase [P...  35.4    1.7  
ref|XP_001935463.1|  malate dehydrogenase, mitochondrial precu...  34.7    2.9   
ref|XP_001468403.1|  malate dehydrogenase [Leishmania infantum...  34.3    3.6   
ref|XP_002172995.1|  malate dehydrogenase [Schizosaccharomyces...  34.3    3.6   
ref|XP_001686105.1|  malate dehydrogenase [Leishmania major st...  34.3    3.9   
gb|EEH36768.1|  calcium-transporting ATPase [Paracoccidioides ...  33.1    8.8  


>ref|XP_001796362.1|  hypothetical protein SNOG_05974 [Phaeosphaeria nodorum SN15]
 gb|EAT87038.2|  hypothetical protein SNOG_05974 [Phaeosphaeria nodorum SN15]
Length=339

 GENE ID: 5973239 SNOG_05974 | hypothetical protein [Phaeosphaeria nodorum SN15]

 Score = 35.8 bits (81),  Expect = 1.4, Method: Compositional matrix adjust.
 Identities = 19/49 (38%), Positives = 30/49 (61%), Gaps = 1/49 (2%)

Query  4    MDQLKKILNPKIWLLVVAVGHTLATILPVLSEKGLNLDETEVEYAVWRV  52
            + QLK   +P    + V  GH+ ATI+P+LS+ G NL+  +++  V RV
Sbjct  181  ISQLKNT-DPSSENITVVGGHSGATIVPLLSQSGYNLEGEKLDSYVNRV  228


>gb|EEH20263.1|  plasma membrane calcium-transporting ATPase [Paracoccidioides 
brasiliensis Pb03]
Length=1271

 Score = 35.4 bits (80),  Expect = 1.7, Method: Compositional matrix adjust.
 Identities = 36/113 (31%), Positives = 50/113 (44%), Gaps = 6/113 (5%)

Query  2    LSMDQLKKILNPKIWLLVVAVG--HTLATILPVLSEKGLNLDETEVEYAV-WRVVSMIIP  58
             S  QL K+LNPK     VA+G  H L   L      GL++DET++E  V +   +    
Sbjct  177  FSPGQLNKMLNPKSLNAFVALGGLHGLERGLRTNLTSGLSIDETKLEGTVTFDEATKNAA  236

Query  59   MVFIALTFTKEIQAKMATVIAGPVWVMFVIWIVMDGFQTQFIPPVVLWGLLAL  111
                   F  E+ AKM T    PV   FV  + +  +Q+  +P     G L L
Sbjct  237  SGKYQPEFKHEL-AKMPTEAGIPVESQFVDRLRV--YQSNKLPERKADGFLVL  286


>ref|XP_001935463.1|  malate dehydrogenase, mitochondrial precursor [Pyrenophora tritici-repentis 
Pt-1C-BFP]
 gb|EDU48037.1|  malate dehydrogenase, mitochondrial precursor [Pyrenophora tritici-repentis 
Pt-1C-BFP]
Length=339

 GENE ID: 6343377 PTRG_05130 | malate dehydrogenase, mitochondrial precursor
[Pyrenophora tritici-repentis Pt-1C-BFP]

 Score = 34.7 bits (78),  Expect = 2.9, Method: Compositional matrix adjust.
 Identities = 19/49 (38%), Positives = 30/49 (61%), Gaps = 1/49 (2%)

Query  4    MDQLKKILNPKIWLLVVAVGHTLATILPVLSEKGLNLDETEVEYAVWRV  52
            + QLK   +P    + V  GH+ ATI+P+LS+ G NL+  +++  V RV
Sbjct  181  ISQLKNT-DPANENITVIGGHSGATIVPLLSQSGHNLEGEQLKQYVHRV  228


>ref|XP_001468403.1|  malate dehydrogenase [Leishmania infantum]
 emb|CAM71487.1|  malate dehydrogenase, putative [Leishmania infantum]
Length=331

 GENE ID: 5072479 LinJ34.0160 | malate dehydrogenase [Leishmania infantum JPCM5]

 Score = 34.3 bits (77),  Expect = 3.6, Method: Compositional matrix adjust.
 Identities = 19/57 (33%), Positives = 33/57 (57%), Gaps = 6/57 (10%)

Query  2    LSMDQLKKIL------NPKIWLLVVAVGHTLATILPVLSEKGLNLDETEVEYAVWRV  52
            L+M + +K+L      +P++  + V  GH+  TI+P+ S  G+ L + +VEY   RV
Sbjct  155  LNMMRARKMLGDFTGQDPEMLDVPVIGGHSGQTIVPLFSHSGVELRQEQVEYLTHRV  211


>ref|XP_002172995.1|  malate dehydrogenase [Schizosaccharomyces japonicus yFS275]
 gb|EEB06702.1|  malate dehydrogenase [Schizosaccharomyces japonicus yFS275]
Length=344

 GENE ID: 7048184 SJAG_01753 | malate dehydrogenase
[Schizosaccharomyces japonicus yFS275]

 Score = 34.3 bits (77),  Expect = 3.6, Method: Compositional matrix adjust.
 Identities = 16/44 (36%), Positives = 27/44 (61%), Gaps = 0/44 (0%)

Query  9    KILNPKIWLLVVAVGHTLATILPVLSEKGLNLDETEVEYAVWRV  52
            K  +P+   + +  GH+ ATI+P+LS+ G+ L E E +  V R+
Sbjct  193  KGASPETIRVPIVGGHSGATIVPLLSQSGVQLSEKERDEIVHRI  236


>ref|XP_001686105.1|  malate dehydrogenase [Leishmania major strain Friedlin]
 emb|CAJ07716.1|  malate dehydrogenase, putative [Leishmania major]
Length=331

 GENE ID: 5654769 LmjF34.0150 | malate dehydrogenase
[Leishmania major strain Friedlin] (10 or fewer PubMed links)

 Score = 34.3 bits (77),  Expect = 3.9, Method: Compositional matrix adjust.
 Identities = 19/57 (33%), Positives = 33/57 (57%), Gaps = 6/57 (10%)

Query  2    LSMDQLKKIL------NPKIWLLVVAVGHTLATILPVLSEKGLNLDETEVEYAVWRV  52
            L+M + +K+L      +P++  + V  GH+  TI+P+ S  G+ L + +VEY   RV
Sbjct  155  LNMMRARKMLGDFTGQDPEMLDVPVIGGHSGQTIVPLFSHSGVELRQEQVEYLTHRV  211


>gb|EEH36768.1|  calcium-transporting ATPase [Paracoccidioides brasiliensis Pb01]
Length=1010

 Score = 33.1 bits (74),  Expect = 8.8, Method: Compositional matrix adjust.
 Identities = 35/113 (30%), Positives = 49/113 (43%), Gaps = 6/113 (5%)

Query  2    LSMDQLKKILNPKIWLLVVAVG--HTLATILPVLSEKGLNLDETEVEYAV-WRVVSMIIP  58
             S  QL K+LNPK     VA+G  H L   L      GL++DET++E  V +   +    
Sbjct  339  FSPGQLNKMLNPKSLNAFVALGGLHGLERGLRTNLTSGLSIDETKLEGTVTFDEATKYAA  398

Query  59   MVFIALTFTKEIQAKMATVIAGPVWVMFVIWIVMDGFQTQFIPPVVLWGLLAL  111
                   F  E+ AKM T     V   FV  + +  +Q+  +P     G L L
Sbjct  399  SGKYQPVFKHEL-AKMPTEAGFSVESQFVDRLRV--YQSNKLPEREADGFLVL  448


2) BLASTp against env_nr

                                                                   Score     E
Sequences producing significant alignments:                       (Bits)  Value

gb|EDJ72196.1|  hypothetical protein GOS_1644588 [marine metag...   261    4e-69
gb|EBH85890.1|  hypothetical protein GOS_9190304 [marine metag...   213    2e-54
gb|EBE25933.1|  hypothetical protein GOS_9793682 [marine metag...   209    3e-53
gb|ECG75866.1|  hypothetical protein GOS_4462809 [marine metag...   201    1e-50
gb|EBU19997.1|  hypothetical protein GOS_7151055 [marine metag...  86.7    2e-16
gb|EBY01539.1|  hypothetical protein GOS_6042715 [marine metag...  81.6    8e-15
gb|EDF80096.1|  hypothetical protein GOS_891873 [marine metage...  74.7    1e-12
gb|ECY76417.1|  hypothetical protein GOS_2301336 [marine metag...  70.5    2e-11
gb|EDG38170.1|  hypothetical protein GOS_791552 [marine metage...  68.9    5e-11
gb|EBG99739.1|  hypothetical protein GOS_9338418 [marine metag...  67.4    1e-10
gb|EBD82573.1|  hypothetical protein GOS_9866050 [marine metag...  66.6    3e-10
gb|EDC90382.1|  hypothetical protein GOS_1394185 [marine metag...  65.5    6e-10
gb|EBL69016.1|  hypothetical protein GOS_8530415 [marine metag...  63.2    3e-09
gb|ECV78615.1|  hypothetical protein GOS_2835637 [marine metag...  61.6    8e-09
gb|EBU94746.1|  hypothetical protein GOS_6980504 [marine metag...  59.3    4e-08
gb|EBU18242.1|  hypothetical protein GOS_7153853 [marine metag...  58.5    7e-08
gb|EDB04065.1|  hypothetical protein GOS_1891998 [marine metag...  58.2    9e-08
gb|ECV37858.1|  hypothetical protein GOS_2910426 [marine metag...  57.8    1e-07
gb|ECX28953.1|  hypothetical protein GOS_2565438 [marine metag...  57.8    1e-07
gb|ECV05432.1|  hypothetical protein GOS_2969774 [marine metag...  57.8    1e-07
gb|ECK56059.1|  hypothetical protein GOS_3382863 [marine metag...  57.8    1e-07
gb|ECX20625.1|  hypothetical protein GOS_2580111 [marine metag...  57.4    2e-07
gb|ECO08045.1|  hypothetical protein GOS_3404690 [marine metag...  57.4    2e-07
gb|ECV69331.1|  hypothetical protein GOS_2851916 [marine metag...  56.6    3e-07
gb|EBG83707.1|  hypothetical protein GOS_9365703 [marine metag...  56.2    3e-07
gb|EBV02827.1|  hypothetical protein GOS_6967844 [marine metag...  56.2    4e-07
gb|EDB01389.1|  hypothetical protein GOS_1896416 [marine metag...  55.8    4e-07
gb|EBL55586.1|  hypothetical protein GOS_8552148 [marine metag...  55.5    6e-07
gb|ECN81111.1|  hypothetical protein GOS_4432487 [marine metag...  55.5    7e-07
gb|EBK44380.1|  hypothetical protein GOS_8731870 [marine metag...  55.5    7e-07
gb|EDG38102.1|  hypothetical protein GOS_791688 [marine metage...  53.9    2e-06
gb|ECQ67652.1|  hypothetical protein GOS_4708443 [marine metag...  53.9    2e-06
gb|ECQ57176.1|  hypothetical protein GOS_5122772 [marine metag...  53.9    2e-06
gb|EBK34759.1|  hypothetical protein GOS_8747750 [marine metag...  53.5    2e-06
gb|EBN63093.1|  hypothetical protein GOS_8214358 [marine metag...  53.5    3e-06
gb|EDG31105.1|  hypothetical protein GOS_803888 [marine metage...  53.1    3e-06
gb|ECV71752.1|  hypothetical protein GOS_2847753 [marine metag...  52.4    5e-06
gb|EBN22498.1|  hypothetical protein GOS_8281441 [marine metag...  52.4    5e-06
gb|EBU05346.1|  hypothetical protein GOS_7173805 [marine metag...  52.0    7e-06
gb|ECF70709.1|  hypothetical protein GOS_5160956 [marine metag...  50.4    2e-05
gb|EBW26472.1|  hypothetical protein GOS_6774042 [marine metag...  50.1    3e-05
gb|ECD68331.1|  hypothetical protein GOS_5979347 [marine metag...  49.7    3e-05
gb|ECZ47740.1|  hypothetical protein GOS_2176829 [marine metag...  49.7    3e-05
gb|ECL71844.1|  hypothetical protein GOS_5772670 [marine metag...  49.7    3e-05
gb|EBD49035.1|  hypothetical protein GOS_9921088 [marine metag...  49.7    4e-05
gb|EDJ72197.1|  hypothetical protein GOS_1644589 [marine metag...  49.3    4e-05
gb|ECV25398.1|  hypothetical protein GOS_2934645 [marine metag...  48.5    7e-05
gb|ECL99192.1|  hypothetical protein GOS_4662701 [marine metag...  48.1    1e-04
gb|ECG75865.1|  hypothetical protein GOS_4462808 [marine metag...  48.1    1e-04
gb|ECJ39604.1|  hypothetical protein GOS_4485216 [marine metag...  47.8    1e-04
gb|EBD83094.1|  hypothetical protein GOS_9865265 [marine metag...  47.0    2e-04
gb|ECM50624.1|  hypothetical protein GOS_6159497 [marine metag...  46.6    3e-04
gb|EBV29125.1|  hypothetical protein GOS_6928663 [marine metag...  46.6    3e-04
gb|EBS24801.1|  hypothetical protein GOS_7467702 [marine metag...  46.2    4e-04
gb|ECF52662.1|  hypothetical protein GOS_5899309 [marine metag...  46.2    4e-04
gb|EBT88119.1|  hypothetical protein GOS_7200308 [marine metag...  45.4    6e-04
gb|EBB27693.1|  hypothetical protein GOS_260640 [marine metage...  45.4    7e-04
gb|ECY97721.1|  hypothetical protein GOS_2263227 [marine metag...  45.1    7e-04
gb|ECZ18801.1|  hypothetical protein GOS_2226749 [marine metag...  45.1    7e-04
gb|ECO05907.1|  hypothetical protein GOS_3486373 [marine metag...  42.7    0.004
gb|ECF67526.1|  hypothetical protein GOS_5284263 [marine metag...  40.4    0.021
gb|ECZ01833.1|  hypothetical protein GOS_2255888 [marine metag...  40.0    0.025
gb|ECK81821.1|  hypothetical protein GOS_5853425 [marine metag...  40.0    0.026
gb|EBD88874.1|  hypothetical protein GOS_9855929 [marine metag...  40.0    0.027
gb|ECP86490.1|  hypothetical protein GOS_4386120 [marine metag...  40.0    0.029
gb|ECK56061.1|  hypothetical protein GOS_3382865 [marine metag...  39.3    0.049
gb|ECL42175.1|  hypothetical protein GOS_3459065 [marine metag...  38.9    0.065
gb|ECI35764.1|  hypothetical protein GOS_5109482 [marine metag...  38.1    0.11 
gb|EBZ92223.1|  hypothetical protein GOS_3429998 [marine metag...  37.4    0.18 
gb|EBD46527.1|  hypothetical protein GOS_9925403 [marine metag...  37.4    0.19 
gb|EDC61070.1|  hypothetical protein GOS_1445911 [marine metag...  36.6    0.32 
gb|ECD84053.1|  hypothetical protein GOS_5320622 [marine metag...  35.4    0.75 
gb|EBY58334.1|  hypothetical protein GOS_5262896 [marine metag...  35.0    0.84 
gb|ECJ19139.1|  hypothetical protein GOS_5287540 [marine metag...  35.0    0.86 
gb|EDB04068.1|  hypothetical protein GOS_1892001 [marine metag...  35.0    0.87 
gb|EDH13074.1|  hypothetical protein GOS_660971 [marine metage...  34.7    1.1  
gb|EBT90634.1|  hypothetical protein GOS_7196504 [marine metag...  34.7    1.1  
gb|ECJ79464.1|  hypothetical protein GOS_6419768 [marine metag...  34.7    1.3  
gb|EDB63273.1|  hypothetical protein GOS_1621953 [marine metag...  33.9    2.1  
gb|EBX59412.1|  hypothetical protein GOS_6562991 [marine metag...  33.5    2.4  
gb|EBI24472.1|  hypothetical protein GOS_9125152 [marine metag...  33.5    2.8  
gb|ECY86065.1|  hypothetical protein GOS_2283901 [marine metag...  33.1    3.3  
gb|ECD72458.1|  hypothetical protein GOS_5812218 [marine metag...  32.7    3.8  
gb|ECP14843.1|  hypothetical protein GOS_6149478 [marine metag...  32.3    5.4  
gb|EBV23224.1|  hypothetical protein GOS_6937059 [marine metag...  32.3    5.4  
gb|EDG89090.1|  hypothetical protein GOS_703003 [marine metage...  32.3    5.6  
gb|EDF38260.1|  hypothetical protein GOS_964886 [marine metage...  31.6    8.6  
gb|ECM45020.1|  hypothetical protein GOS_6374403 [marine metag...  31.6    9.0  
gb|EBC12449.1|  hypothetical protein GOS_120273 [marine metage...  31.6    9.1 


>gb|EDJ72196.1|  hypothetical protein GOS_1644588 [marine metagenome]
Length=133

 Score =  261 bits (668),  Expect = 4e-69, Method: Compositional matrix adjust.
 Identities = 132/133 (99%), Positives = 133/133 (100%), Gaps = 0/133 (0%)

Query  2    LSMDQLKKILNPKIWLLVVAVGHTLATILPVLSEKGLNLDETEVEYAVWRVVSMIIPMVF  61
            +SMDQLKKILNPKIWLLVVAVGHTLATILPVLSEKGLNLDETEVEYAVWRVVSMIIPMVF
Sbjct  1    MSMDQLKKILNPKIWLLVVAVGHTLATILPVLSEKGLNLDETEVEYAVWRVVSMIIPMVF  60

Query  62   IALTFTKEIQAKMATVIAGPVWVMFVIWIVMDGFQTQFIPPVVLWGLLALSGVLHGNWQT  121
            IALTFTKEIQAKMATVIAGPVWVMFVIWIVMDGFQTQFIPPVVLWGLLALSGVLHGNWQT
Sbjct  61   IALTFTKEIQAKMATVIAGPVWVMFVIWIVMDGFQTQFIPPVVLWGLLALSGVLHGNWQT  120

Query  122  DEVALEATVGSEG  134
            DEVALEATVGSEG
Sbjct  121  DEVALEATVGSEG  133


>gb|EBH85890.1|  hypothetical protein GOS_9190304 [marine metagenome]
Length=133

 Score =  213 bits (542),  Expect = 2e-54, Method: Compositional matrix adjust.
 Identities = 102/121 (84%), Positives = 115/121 (95%), Gaps = 0/121 (0%)

Query  1    GLSMDQLKKILNPKIWLLVVAVGHTLATILPVLSEKGLNLDETEVEYAVWRVVSMIIPMV  60
             L MDQLKKILNPKIWL+VVA+GHTLATILPVLS+KGLNLD+TEVEYAVWRVVSMI+PMV
Sbjct  6    ALEMDQLKKILNPKIWLIVVALGHTLATILPVLSDKGLNLDDTEVEYAVWRVVSMILPMV  65

Query  61   FIALTFTKEIQAKMATVIAGPVWVMFVIWIVMDGFQTQFIPPVVLWGLLALSGVLHGNWQ  120
            FIALTFTKEIQAK+AT IAGP+WVMFV+W ++DGFQT FIPP+VLWGLLALSG+LHGN+Q
Sbjct  66   FIALTFTKEIQAKLATAIAGPIWVMFVVWNIVDGFQTLFIPPLVLWGLLALSGLLHGNFQ  125

Query  121  T  121
             
Sbjct  126  N  126


>gb|EBE25933.1|  hypothetical protein GOS_9793682 [marine metagenome]
Length=119

 Score =  209 bits (531),  Expect = 3e-53, Method: Compositional matrix adjust.
 Identities = 104/115 (90%), Positives = 109/115 (94%), Gaps = 0/115 (0%)

Query  11   LNPKIWLLVVAVGHTLATILPVLSEKGLNLDETEVEYAVWRVVSMIIPMVFIALTFTKEI  70
            LNPKIWLLVVA GHTLATILPVLS+KGLNLDETEVEYAVWRVVSMIIPMVFIALTFTKEI
Sbjct  1    LNPKIWLLVVAGGHTLATILPVLSDKGLNLDETEVEYAVWRVVSMIIPMVFIALTFTKEI  60

Query  71   QAKMATVIAGPVWVMFVIWIVMDGFQTQFIPPVVLWGLLALSGVLHGNWQTDEVA  125
            QAK+AT IAGPVWVMFV+W VMDGFQT  IPP+VLWGLLALSGVLHGNWQT+E A
Sbjct  61   QAKLATAIAGPVWVMFVVWTVMDGFQTLLIPPLVLWGLLALSGVLHGNWQTNEAA  115


>gb|ECG75866.1|  hypothetical protein GOS_4462809 [marine metagenome]
Length=125

 Score =  201 bits (510),  Expect = 1e-50, Method: Compositional matrix adjust.
 Identities = 94/119 (78%), Positives = 113/119 (94%), Gaps = 0/119 (0%)

Query  3    SMDQLKKILNPKIWLLVVAVGHTLATILPVLSEKGLNLDETEVEYAVWRVVSMIIPMVFI  62
            ++DQLKKILNP+IWLL+VAVGHTLATILPVLS+KG+N+++TEVEYAVW++VSMIIPMVFI
Sbjct  2    NLDQLKKILNPQIWLLIVAVGHTLATILPVLSDKGINMEDTEVEYAVWKIVSMIIPMVFI  61

Query  63   ALTFTKEIQAKMATVIAGPVWVMFVIWIVMDGFQTQFIPPVVLWGLLALSGVLHGNWQT  121
            A TF KEIQAK+AT IAGPVWVMFV+ I+M+GF+T FIPP+VLWGLLALSG+LHGN+Q 
Sbjct  62   AFTFNKEIQAKLATAIAGPVWVMFVVMILMEGFETLFIPPLVLWGLLALSGLLHGNFQN  120


>gb|EBU19997.1|  hypothetical protein GOS_7151055 [marine metagenome]
Length=145

 Score = 86.7 bits (213),  Expect = 2e-16, Method: Compositional matrix adjust.
 Identities = 53/136 (38%), Positives = 81/136 (59%), Gaps = 14/136 (10%)

Query  2    LSMDQLKKILNPKIWLLVVAVGHTLATILPVLSEKGLNLDETEVEYAVWRVVSMIIPMVF  61
            L MDQLKKI N K+WL+++AV H   T++ VL++   ++D       V+ V+S    + +
Sbjct  15   LEMDQLKKICNVKVWLILIAVIH---TVVGVLAQTDFSVDAEAEMAGVFLVISTY--LFY  69

Query  62   IALTFTKEIQAKMATVIAGPVWVMFVIWIVMD-----GF----QTQFIPPVVLWGLLALS  112
             A   T E QA++A V+AGP+WV F+I  + +     GF      + +PP+V WG+ ALS
Sbjct  70   AAFFTTGEAQARLAAVLAGPIWVWFMICALFELESGTGFVWELGPELLPPLVFWGMTALS  129

Query  113  GVLHGNWQTDEVALEA  128
            G+LHGN+     + EA
Sbjct  130  GLLHGNFHNLMSSQEA  145


>gb|EBY01539.1|  hypothetical protein GOS_6042715 [marine metagenome]
Length=134

 Score = 81.6 bits (200),  Expect = 8e-15, Method: Compositional matrix adjust.
 Identities = 51/128 (39%), Positives = 78/128 (60%), Gaps = 20/128 (15%)

Query  6    QLKKILNPKIWLLVVAVGHTLATILPVLSEKGLNLDETEVEYAVWRVVSMIIPMVFIALT  65
             +K ILNPK WL++  V H LA +L  L  +    +ET V  + W +++ +  M++ A  
Sbjct  10   DVKTILNPKAWLIITGVMHALAGVLAELDME----NETRVVISGWFLLTTV-TMLYAAF-  63

Query  66   FTKEIQ-AKMATVIAGPVWVMFVIWIVMDGFQTQF------------IPPVVLWGLLALS  112
            FT+ +Q A++ATVIAGP+WV F+I I   G+   +            IPP++LWG+ ALS
Sbjct  64   FTEGVQQARLATVIAGPIWVWFIICIA-QGYTLDYEGETWTFELGSNIPPLILWGMTALS  122

Query  113  GVLHGNWQ  120
            G++HGN+Q
Sbjct  123  GIIHGNFQ  130


>gb|EDF80096.1|  hypothetical protein GOS_891873 [marine metagenome]
Length=133

 Score = 74.7 bits (182),  Expect = 1e-12, Method: Compositional matrix adjust.
 Identities = 49/123 (39%), Positives = 74/123 (60%), Gaps = 18/123 (14%)

Query  10   ILNPKIWLLVVAVGHTLATILPVLSEKGLNLDETEVEYAVWRVVSMIIPMVFIALTFTKE  69
            ILNPKIWL+V  + H     + VLSE  +  +ET+V  + + +++    M++ A     E
Sbjct  13   ILNPKIWLIVTGLIHAG---VGVLSELDMG-NETQVVVSGFFLLTTF-TMLYAAFFTEGE  67

Query  70   IQAKMATVIAGPVWVMFVIWIVMDGFQTQF------------IPPVVLWGLLALSGVLHG  117
             QA++ATVIAGP+WV FVI  V  G+   +            IPP+++WG+ ALSG++HG
Sbjct  68   QQARLATVIAGPIWVWFVI-CVAQGYTLDYEGETFTFELNTSIPPLIIWGMTALSGIVHG  126

Query  118  NWQ  120
            N+Q
Sbjct  127  NFQ  129


>gb|ECY76417.1|  hypothetical protein GOS_2301336 [marine metagenome]
Length=128

 Score = 70.5 bits (171),  Expect = 2e-11, Method: Compositional matrix adjust.
 Identities = 44/127 (34%), Positives = 72/127 (56%), Gaps = 19/127 (14%)

Query  5    DQLKKILNPKIWLLVVAVGHT-LATILPVLSEKGLNLDETEVEYAVWRVVSMIIPMVFIA  63
            D   + LNPK WL+++A+GHT +  +LP+  +     D   +   ++ V +  + M++  
Sbjct  5    DWKTEALNPKWWLIILALGHTFMGVLLPMDPDN----DNELMVSGIFLVTT--VYMLYAI  58

Query  64   LTFTKEIQAKMATVIAGPVWVMFVIWIVMD------------GFQTQFIPPVVLWGLLAL  111
                 + QA++A VIAGPVWV FVI + ++            GF + F+PP++LWG+ AL
Sbjct  59   FLNEGQAQARLAAVIAGPVWVWFVICMALELEITYSTEPITWGFSSDFVPPMILWGMTAL  118

Query  112  SGVLHGN  118
            +GVL  N
Sbjct  119  TGVLGWN  125


>gb|EDG38170.1|  hypothetical protein GOS_791552 [marine metagenome]
Length=119

 Score = 68.9 bits (167),  Expect = 5e-11, Method: Compositional matrix adjust.
 Identities = 45/121 (37%), Positives = 69/121 (57%), Gaps = 18/121 (14%)

Query  11   LNPKIWLLVVAVGHTLATILPVLSEKGLNLDETEVEYAVWRVVSMIIPMVFIALTFTKEI  70
            L+P+IWL++VA+GHT+  +L   +      D+T    A W +++ +  +V+ AL    E 
Sbjct  4    LDPQIWLILVALGHTVPGVLMATNWA----DDTAKMVAGWMLLTSVT-LVYAALGMDGEE  58

Query  71   QAKMATVIAGPVWVMFVIWIVMDGFQTQF------------IPPVVLWGLLALSGVLHGN  118
            QA++A V+AGPVW+ FV+  +  G +                PPVVLWG+LALSG+L   
Sbjct  59   QARLAVVLAGPVWIWFVV-CITQGLEYTMGKEPITMNWKDNGPPVVLWGVLALSGLLGSG  117

Query  119  W  119
            W
Sbjct  118  W  118


>gb|EBG99739.1|  hypothetical protein GOS_9338418 [marine metagenome]
Length=130

 Score = 67.4 bits (163),  Expect = 1e-10, Method: Compositional matrix adjust.
 Identities = 44/129 (34%), Positives = 69/129 (53%), Gaps = 18/129 (13%)

Query  3    SMDQLKKILNPKIWLLVVAVGHTLATILPVLSEKGLNLDETEVEYAVWRVVSMIIPMVFI  62
             +D  K++LNPK WL+V A+GHT   +L  L       D   +   V+ V++  + M++ 
Sbjct  3    DLDYKKEVLNPKWWLIVNAIGHTFIGVLLPLDPDN---DNELMIAGVFLVIT--VYMLYA  57

Query  63   ALTFTKEIQAKMATVIAGPVWVMFVIWIVM--------DGFQTQF-----IPPVVLWGLL  109
            A   T   QA++A VIAGP+WV F++ I +        D     F     +PP++ WG+ 
Sbjct  58   AFLTTGRSQARLAAVIAGPIWVWFLVCIALGLEITYTTDAAAQSFDLADNVPPLIFWGMT  117

Query  110  ALSGVLHGN  118
            AL+G+L  N
Sbjct  118  ALTGLLGWN  126


>gb|EBD82573.1|  hypothetical protein GOS_9866050 [marine metagenome]
Length=130

 Score = 66.6 bits (161),  Expect = 3e-10, Method: Compositional matrix adjust.
 Identities = 45/129 (34%), Positives = 67/129 (51%), Gaps = 18/129 (13%)

Query  3    SMDQLKKILNPKIWLLVVAVGHTLATILPVLSEKGLNLDETEVEYAVWRVVSMIIPMVFI  62
             +D  K+ LNPK WL+V A+GHT   +L  L       D   +   V+ V++  + M++ 
Sbjct  3    ELDYKKEALNPKWWLIVNALGHTFIGVLLPLDPDN---DNELMIAGVFLVIT--VYMLYA  57

Query  63   ALTFTKEIQAKMATVIAGPVWVMFVIWIVM--------DGFQTQF-----IPPVVLWGLL  109
            A   T   QA++A VIAGP+WV F + I +        D     F     +PP++ WG+ 
Sbjct  58   AFLTTGRAQARLAAVIAGPIWVWFAVCIALGLEITYTTDAATQSFDLADNVPPLIFWGMT  117

Query  110  ALSGVLHGN  118
            AL+GVL  N
Sbjct  118  ALTGVLGWN  126


>gb|EDC90382.1|  hypothetical protein GOS_1394185 [marine metagenome]
Length=156

 Score = 65.5 bits (158),  Expect = 6e-10, Method: Compositional matrix adjust.
 Identities = 45/155 (29%), Positives = 77/155 (49%), Gaps = 29/155 (18%)

Query  2    LSMDQLKKILNPKIWLLVVAVGHTL-ATILPVLSEKGLNLDETEVEYAVWRVVSMIIPMV  60
            L MD L+KILNPK WLLVV + HT+  T++P+L+    + D  E + A +    ++I +V
Sbjct  8    LDMD-LRKILNPKFWLLVVLISHTIVGTLVPLLTT---DFDSVEFQAASY---GLVISVV  60

Query  61   FIALTFTKE--IQAKMATVIAGP--VWVMFVI-----------------WIVMDGFQTQF  99
              ++ F  E   QA++   IAG   +W++  +                 ++    F  + 
Sbjct  61   LASIYFMTEGQTQARLTATIAGATLIWILVTLVANPANNFDLSVNFEPPFLYKFSFDIEL  120

Query  100  IPPVVLWGLLALSGVLHGNWQTDEVALEATVGSEG  134
             PP+++W +L LSG+ + N +T     EA +    
Sbjct  121  APPIIVWAMLTLSGITYWNCETRSAVKEANLDDNA  155