GOS 1088020

From Metagenes
Warning: this metagenomic sequence has been carefully annotated by students during bioinformatics assignments. These quality annotations are therefore the result of a teaching exercise that you are most welcome to amend and extend if necessary!


Sequence
CAMERA AccNum : JCVI_READ_1091120566153
Annotathon code: GOS_1088020
Sample :
  • GPS :0°35'38s; 91°4'10w
  • Galapagos Islands: Mangrove on Isabella Island - Ecuador
  • Mangrove (-0.1m, 25.4°C, 0.1-0.8 microns)
Authors
Team : BioCell2009
Username : cinthom
Annotated on : 2010-01-21 18:31:34
  • BAGARRE Thomas
  • llas cynthia

Synopsis

Genomic Sequence

>JCVI_READ_1091120566153 GOS_1088020 Genomic DNA
CGACCCGCTTGCGACGACGACCGTGAAACGACTCCGGGAGACGAATGCCCGTTCGGGCGGACTGAACTTCATCAATCCGCGGTGTCCGCTCCACGGACCG
CTCGACTAACGATGCCGAGGCCTGATAGACCAGTTCATCCACGGAGACCAGCCGATCGCGACGCTTGCGATTGGCATTACTTAGCCGCCGTCGATAATCT
TGATCGGACAACAAGCATTTAACCTGGTCGGCGGCTTGCAACACGCGCCCCGGCTCGACGCCGCACGGGTCACCACTCATCAACCGTTCAACGGCCGGAT
TGGAGCGGAACCAAACGATGGGCTTCTCCGCCGCCAGGTACTCATTGATCGATGTGCTCCCCGAAGGCTCGCGATGTGTCATTAGCAGCACGTCGGCGAC
CCGCAGCAGGTCGCTCGCCTGCCCCTCATCTGCGGTCATCAAAACCCGATCTCCAAGTTCAGCGTCATGCAACTCCCTAACCAGCGTGGTTGTCTCGGAC
GCCGAATTATCAAGCGGTCCAACCCACAAGAATTGGACCGTTGGCTCGCAAGGGTCGGCGAGCGCGCAGCTCGCAATCGAAATGAACCGATCAAAGCCGC
CCGCCAGGTCCAGCCCTCCAGTACCAACCAGCAGCAAGCTATCGGGAGAGAGTCCCAGCTGTTGTTTGTTCAGCGCCGCATCTGTCACCGAAACGTGGTT
TGGCTCAGGACTGGAAACGCTTGCCAGTCGTTCGATGGTGTCGAATTCGGTATCGACAAAATGAAACTTCGCATCCGATGTAGTCAGGCTATTGGGCAAC
AGATGAGCGAACTTTGCCGGAAGTGAAACCGTTTGACAGACACTGGCGAAGTAGACAACTGATCGACGTCCTCGCCGCGCATCCGTGGGAGAATTGCCTG
AATCGGAATTCCCACTTCATCCAACCAATGGAGCAACTTTCCAGGATCATTGTCGAG

Translation

[42 - 956/957]   indirect strand
>GOS_1088020 Translation [42-956   indirect strand]
SGNSDSGNSPTDARRGRRSVVYFASVCQTVSLPAKFAHLLPNSLTTSDAKFHFVDTEFDTIERLASVSSPEPNHVSVTDAALNKQQLGLSPDSLLLVGTG
GLDLAGGFDRFISIASCALADPCEPTVQFLWVGPLDNSASETTTLVRELHDAELGDRVLMTADEGQASDLLRVADVLLMTHREPSGSTSINEYLAAEKPI
VWFRSNPAVERLMSGDPCGVEPGRVLQAADQVKCLLSDQDYRRRLSNANRKRRDRLVSVDELVYQASASLVERSVERTPRIDEVQSARTGIRLPESFHGR
RRKRV

[ Warning ] 5' incomplete: does not start with a Methionine
[ Warning ] 3' incomplete: following codon is not a STOP

Annotator commentaries

We selected an ORF of 915 nucleotides that has neither a start codon nor a stop codon, so it is incomplete at both the 3' and 5' ends. This is confirmed by the multiple alignment, which also shows several gaps within the sequence.

Our ORF lies on the reverse strand (frame -3) and encodes a polypeptide of 305 amino acids.
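The reverse-strand reading can be illustrated with a minimal sketch that reverse-complements a short stretch of the read and translates it with the standard genetic code. The 20-nucleotide fragment is copied from the 3' end of the genomic sequence above; the function names are illustrative and not part of any tool used in this annotation.

```python
# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna: str) -> str:
    """Translate complete codons from position 0; a trailing partial codon is dropped."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


# Forward-strand fragment taken from the 3' end of the read (illustrative excerpt).
fragment = "CCTGAATCGGAATTCCCACT"
peptide = translate(reverse_complement(fragment))
print(peptide)  # SGNSDS, the first residues of the translation shown above
```

Applying the same two steps to the whole ORF region should give the 305-residue translation, provided the frame offset is chosen accordingly.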


According to InterPro, the only protein domain match for the ORF is unintegrated: we therefore consider that our ORF has no functional protein domain.


After running several protocols (BLASTP, BLASTX and BLASTP against ENV_NR), the number of homologs obtained is too small to build a phylogenetic tree.


We cannot draw any conclusion about the taxonomic origin of our sequence, which comes from the Mangrove on Isabella Island (Ecuador).

ORF finding

PROTOCOL:


1) NCBI / copy and paste ORF sequence / default parameters / ORF Find /

2) SMS / ORF Finder / copy and paste ORF sequence / any codon / 1, 2 and 3 / direct strand / 60 codons long / standard genetic code / submit

3) SMS / ORF Finder / copy and paste ORF sequence / any codon / 1, 2 and 3 / reverse strand / 60 codons long / standard genetic code / submit /




ANALYSIS OF RESULTS:

NCBI finds 8 ORFs whereas SMS finds 7.

The results should be identical from one site to the other.




HYPOTHESIS: The difference is probably due to the different parameters on which NCBI and SMS rely:


. NCBI finds an ORF in reverse frame 2 (-2), from nucleotide 789 to nucleotide 950:


950 atgatcctggaaagttgctccattggttggatgaagtgggaattc

M I L E S C S I G W M K W E F

905 cgattcaggcaattctcccacggatgcgcggcgaggacgtcgatc

R F R Q F S H G C A A R T S I

860 agttgtctacttcgccagtgtctgtcaaacggtttcacttccggc

S C L L R Q C L S N G F T S G

815 aaagttcgctcatctgttgcccaatag 789



and another ORF in reverse frame 3 (-3):


802 ctgttgcccaatagcctgactacatcggatgcgaagtttcatttt

L L P N S L T T S D A K F H F

757 gtcgataccgaattcgacaccatcgaacgactggcaagcgtttcc

V D T E F D T I E R L A S V S

712 agtcctgagccaaaccacgtttcggtgacagatgcggcgctgaac

S P E P N H V S V T D A A L N

667 aaacaacagctgggactctctcccgatagcttgctgctggttggt

K Q Q L G L S P D S L L L V G

622 actggagggctggacctggcgggcggctttgatcggttcatttcg

T G G L D L A G G F D R F I S

577 attgcgagctgcgcgctcgccgacccttgcgagccaacggtccaa

I A S C A L A D P C E P T V Q

532 ttcttgtgggttggaccgcttgataattcggcgtccgagacaacc

F L W V G P L D N S A S E T T

487 acgctggttagggagttgcatgacgctgaacttggagatcgggtt

T L V R E L H D A E L G D R V

442 ttgatgaccgcagatgaggggcaggcgagcgacctgctgcgggtc

L M T A D E G Q A S D L L R V

397 gccgacgtgctgctaatgacacatcgcgagccttcggggagcaca

A D V L L M T H R E P S G S T

352 tcgatcaatgagtacctggcggcggagaagcccatcgtttggttc

S I N E Y L A A E K P I V W F

307 cgctccaatccggccgttgaacggttgatgagtggtgacccgtgc

R S N P A V E R L M S G D P C

262 ggcgtcgagccggggcgcgtgttgcaagccgccgaccaggttaaa

G V E P G R V L Q A A D Q V K

217 tgcttgttgtccgatcaagattatcgacggcggctaagtaatgcc

C L L S D Q D Y R R R L S N A

172 aatcgcaagcgtcgcgatcggctggtctccgtggatgaactggtc

N R K R R D R L V S V D E L V

127 tatcaggcctcggcatcgttagtcgagcggtccgtggagcggaca

Y Q A S A S L V E R S V E R T

82 ccgcggattgatgaagttcagtccgcccgaacgggcattcgtctc

P R I D E V Q S A R T G I R L

37 ccggagtcgtttcacggtcgtcgtcgcaagcgggtcg 1


. SMS finds no ORF in reverse frame 2.



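The latter program presumably treats the two NCBI ORFs listed above, which both lie on the reverse strand but in different reading frames, as better merged into one. This remains a hypothesis. The 60-codon minimum set in the SMS protocol may also play a part, since the -2 ORF reported by NCBI is only 162 nucleotides (54 codons) long.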




This leaves two ORFs of interest:


1) NCBI: an ORF from 1 to 802 in reverse frame 3

3) SMS: an ORF from 42 to 956 in reverse frame 3
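The sizes of these two candidates follow directly from their coordinates. A minimal sketch of the arithmetic (the helper name `orf_size` is illustrative):

```python
def orf_size(start: int, end: int):
    """Return (length in nucleotides, complete codons, leftover bases) for inclusive coordinates."""
    length = end - start + 1
    return length, length // 3, length % 3


print(orf_size(1, 802))   # NCBI candidate -> (802, 267, 1)
print(orf_size(42, 956))  # SMS candidate  -> (915, 305, 0)
```

The 267 and 305 complete codons match the "Length: 267 aa" reported by NCBI and the 305-residue translation obtained with SMS.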



We then run a blastx to identify the sequences closest to the sequence from the Mangrove on Isabella Island (see BLAST). The best-matching sequences all align in frame -3. This result is consistent with the frame of the two candidate ORFs.


Since we cannot discriminate further between them, we continue the analysis with the longest ORF, obtained with protocol 3).


Neither a stop codon nor a start codon can be identified: this ORF is incomplete at both the 3' and 5' ends.







RAW RESULTS:




______________________________________________________________________________________________________________________
                                                       1) with NCBI:
_______________________________________________________________________________________________________________________


Length: 267 aa

    802 ctgttgcccaatagcctgactacatcggatgcgaagtttcatttt
        L  L  P  N  S  L  T  T  S  D  A  K  F  H  F 
    757 gtcgataccgaattcgacaccatcgaacgactggcaagcgtttcc
        V  D  T  E  F  D  T  I  E  R  L  A  S  V  S 
    712 agtcctgagccaaaccacgtttcggtgacagatgcggcgctgaac
        S  P  E  P  N  H  V  S  V  T  D  A  A  L  N 
    667 aaacaacagctgggactctctcccgatagcttgctgctggttggt
        K  Q  Q  L  G  L  S  P  D  S  L  L  L  V  G 
    622 actggagggctggacctggcgggcggctttgatcggttcatttcg
        T  G  G  L  D  L  A  G  G  F  D  R  F  I  S 
    577 attgcgagctgcgcgctcgccgacccttgcgagccaacggtccaa
        I  A  S  C  A  L  A  D  P  C  E  P  T  V  Q 
    532 ttcttgtgggttggaccgcttgataattcggcgtccgagacaacc
        F  L  W  V  G  P  L  D  N  S  A  S  E  T  T 
    487 acgctggttagggagttgcatgacgctgaacttggagatcgggtt
        T  L  V  R  E  L  H  D  A  E  L  G  D  R  V 
    442 ttgatgaccgcagatgaggggcaggcgagcgacctgctgcgggtc
        L  M  T  A  D  E  G  Q  A  S  D  L  L  R  V 
    397 gccgacgtgctgctaatgacacatcgcgagccttcggggagcaca
        A  D  V  L  L  M  T  H  R  E  P  S  G  S  T 
    352 tcgatcaatgagtacctggcggcggagaagcccatcgtttggttc
        S  I  N  E  Y  L  A  A  E  K  P  I  V  W  F 
    307 cgctccaatccggccgttgaacggttgatgagtggtgacccgtgc
        R  S  N  P  A  V  E  R  L  M  S  G  D  P  C 
    262 ggcgtcgagccggggcgcgtgttgcaagccgccgaccaggttaaa
        G  V  E  P  G  R  V  L  Q  A  A  D  Q  V  K 
    217 tgcttgttgtccgatcaagattatcgacggcggctaagtaatgcc
        C  L  L  S  D  Q  D  Y  R  R  R  L  S  N  A 
    172 aatcgcaagcgtcgcgatcggctggtctccgtggatgaactggtc
        N  R  K  R  R  D  R  L  V  S  V  D  E  L  V 
    127 tatcaggcctcggcatcgttagtcgagcggtccgtggagcggaca
        Y  Q  A  S  A  S  L  V  E  R  S  V  E  R  T 
     82 ccgcggattgatgaagttcagtccgcccgaacgggcattcgtctc
        P  R  I  D  E  V  Q  S  A  R  T  G  I  R  L 
     37 ccggagtcgtttcacggtcgtcgtcgcaagcgggtcg 1      
        P  E  S  F  H  G  R  R  R  K  R  V 

	
Frame		from		to	Length
-3		1	..	802	802 
+3		318	..	764	447
+2		467	..	826	360
-2		789	..	950	162
+1		814	..	956	144
+1		352	..	480	129
+1		112	..	222	111
+2		2	..	109	108





________________________________________________________________________________________________________________________
                                                    2) with SMS, direct strand
______________________________________________________________________________________________________________________



>ORF number 1 in reading frame 1 on the direct strand extends from base 1 to base 222.
CGACCCGCTTGCGACGACGACCGTGAAACGACTCCGGGAGACGAATGCCCGTTCGGGCGG
ACTGAACTTCATCAATCCGCGGTGTCCGCTCCACGGACCGCTCGACTAACGATGCCGAGG
CCTGATAGACCAGTTCATCCACGGAGACCAGCCGATCGCGACGCTTGCGATTGGCATTAC
TTAGCCGCCGTCGATAATCTTGATCGGACAACAAGCATTTAA

>Translation of ORF number 1 in reading frame 1 on the direct strand.
RPACDDDRETTPGDECPFGRTELHQSAVSAPRTARLTMPRPDRPVHPRRPADRDACDWHY
LAAVDNLDRTTSI*

>ORF number 2 in reading frame 1 on the direct strand extends from base 586 to base 783.
ACCGATCAAAGCCGCCCGCCAGGTCCAGCCCTCCAGTACCAACCAGCAGCAAGCTATCGG
GAGAGAGTCCCAGCTGTTGTTTGTTCAGCGCCGCATCTGTCACCGAAACGTGGTTTGGCT
CAGGACTGGAAACGCTTGCCAGTCGTTCGATGGTGTCGAATTCGGTATCGACAAAATGAA
ACTTCGCATCCGATGTAG

>Translation of ORF number 2 in reading frame 1 on the direct strand.
TDQSRPPGPALQYQPAASYRERVPAVVCSAPHLSPKRGLAQDWKRLPVVRWCRIRYRQNE
TSHPM*

>ORF number 1 in reading frame 2 on the direct strand extends from base 185 to base 385.
CCGCCGTCGATAATCTTGATCGGACAACAAGCATTTAACCTGGTCGGCGGCTTGCAACAC
GCGCCCCGGCTCGACGCCGCACGGGTCACCACTCATCAACCGTTCAACGGCCGGATTGGA
GCGGAACCAAACGATGGGCTTCTCCGCCGCCAGGTACTCATTGATCGATGTGCTCCCCGA
AGGCTCGCGATGTGTCATTAG

>Translation of ORF number 1 in reading frame 2 on the direct strand.
PPSIILIGQQAFNLVGGLQHAPRLDAARVTTHQPFNGRIGAEPNDGLLRRQVLIDRCAPR
RLAMCH*

>ORF number 2 in reading frame 2 on the direct strand extends from base 386 to base 826.
CAGCACGTCGGCGACCCGCAGCAGGTCGCTCGCCTGCCCCTCATCTGCGGTCATCAAAAC
CCGATCTCCAAGTTCAGCGTCATGCAACTCCCTAACCAGCGTGGTTGTCTCGGACGCCGA
ATTATCAAGCGGTCCAACCCACAAGAATTGGACCGTTGGCTCGCAAGGGTCGGCGAGCGC
GCAGCTCGCAATCGAAATGAACCGATCAAAGCCGCCCGCCAGGTCCAGCCCTCCAGTACC
AACCAGCAGCAAGCTATCGGGAGAGAGTCCCAGCTGTTGTTTGTTCAGCGCCGCATCTGT
CACCGAAACGTGGTTTGGCTCAGGACTGGAAACGCTTGCCAGTCGTTCGATGGTGTCGAA
TTCGGTATCGACAAAATGAAACTTCGCATCCGATGTAGTCAGGCTATTGGGCAACAGATG
AGCGAACTTTGCCGGAAGTGA

>Translation of ORF number 2 in reading frame 2 on the direct strand.
QHVGDPQQVARLPLICGHQNPISKFSVMQLPNQRGCLGRRIIKRSNPQELDRWLARVGER
AARNRNEPIKAARQVQPSSTNQQQAIGRESQLLFVQRRICHRNVVWLRTGNACQSFDGVE
FGIDKMKLRIRCSQAIGQQMSELCRK*

>ORF number 1 in reading frame 3 on the direct strand extends from base 204 to base 764.
TCGGACAACAAGCATTTAACCTGGTCGGCGGCTTGCAACACGCGCCCCGGCTCGACGCCG
CACGGGTCACCACTCATCAACCGTTCAACGGCCGGATTGGAGCGGAACCAAACGATGGGC
TTCTCCGCCGCCAGGTACTCATTGATCGATGTGCTCCCCGAAGGCTCGCGATGTGTCATT
AGCAGCACGTCGGCGACCCGCAGCAGGTCGCTCGCCTGCCCCTCATCTGCGGTCATCAAA
ACCCGATCTCCAAGTTCAGCGTCATGCAACTCCCTAACCAGCGTGGTTGTCTCGGACGCC
GAATTATCAAGCGGTCCAACCCACAAGAATTGGACCGTTGGCTCGCAAGGGTCGGCGAGC
GCGCAGCTCGCAATCGAAATGAACCGATCAAAGCCGCCCGCCAGGTCCAGCCCTCCAGTA
CCAACCAGCAGCAAGCTATCGGGAGAGAGTCCCAGCTGTTGTTTGTTCAGCGCCGCATCT
GTCACCGAAACGTGGTTTGGCTCAGGACTGGAAACGCTTGCCAGTCGTTCGATGGTGTCG
AATTCGGTATCGACAAAATGA

>Translation of ORF number 1 in reading frame 3 on the direct strand.
SDNKHLTWSAACNTRPGSTPHGSPLINRSTAGLERNQTMGFSAARYSLIDVLPEGSRCVI
SSTSATRSRSLACPSSAVIKTRSPSSASCNSLTSVVVSDAELSSGPTHKNWTVGSQGSAS
AQLAIEMNRSKPPARSSPPVPTSSKLSGESPSCCLFSAASVTETWFGSGLETLASRSMVS
NSVSTK*





_______________________________________________________________________________________________________________________
                                                 3) with SMS, reverse strand
________________________________________________________________________________________________________________________



>ORF number 1 in reading frame 1 on the reverse strand extends from base 289 to base 519.
ACAAACAACAGCTGGGACTCTCTCCCGATAGCTTGCTGCTGGTTGGTACTGGAGGGCTGG
ACCTGGCGGGCGGCTTTGATCGGTTCATTTCGATTGCGAGCTGCGCGCTCGCCGACCCTT
GCGAGCCAACGGTCCAATTCTTGTGGGTTGGACCGCTTGATAATTCGGCGTCCGAGACAA
CCACGCTGGTTAGGGAGTTGCATGACGCTGAACTTGGAGATCGGGTTTTGA

>Translation of ORF number 1 in reading frame 1 on the reverse strand.
TNNSWDSLPIACCWLVLEGWTWRAALIGSFRLRAARSPTLASQRSNSCGLDRLIIRRPRQ
PRWLGSCMTLNLEIGF*

No ORFs were found in reading frame 2.

>ORF number 1 in reading frame 3 on the reverse strand extends from base 42 to base 956.
AGTGGGAATTCCGATTCAGGCAATTCTCCCACGGATGCGCGGCGAGGACGTCGATCAGTT
GTCTACTTCGCCAGTGTCTGTCAAACGGTTTCACTTCCGGCAAAGTTCGCTCATCTGTTG
CCCAATAGCCTGACTACATCGGATGCGAAGTTTCATTTTGTCGATACCGAATTCGACACC
ATCGAACGACTGGCAAGCGTTTCCAGTCCTGAGCCAAACCACGTTTCGGTGACAGATGCG
GCGCTGAACAAACAACAGCTGGGACTCTCTCCCGATAGCTTGCTGCTGGTTGGTACTGGA
GGGCTGGACCTGGCGGGCGGCTTTGATCGGTTCATTTCGATTGCGAGCTGCGCGCTCGCC
GACCCTTGCGAGCCAACGGTCCAATTCTTGTGGGTTGGACCGCTTGATAATTCGGCGTCC
GAGACAACCACGCTGGTTAGGGAGTTGCATGACGCTGAACTTGGAGATCGGGTTTTGATG
ACCGCAGATGAGGGGCAGGCGAGCGACCTGCTGCGGGTCGCCGACGTGCTGCTAATGACA
CATCGCGAGCCTTCGGGGAGCACATCGATCAATGAGTACCTGGCGGCGGAGAAGCCCATC
GTTTGGTTCCGCTCCAATCCGGCCGTTGAACGGTTGATGAGTGGTGACCCGTGCGGCGTC
GAGCCGGGGCGCGTGTTGCAAGCCGCCGACCAGGTTAAATGCTTGTTGTCCGATCAAGAT
TATCGACGGCGGCTAAGTAATGCCAATCGCAAGCGTCGCGATCGGCTGGTCTCCGTGGAT
GAACTGGTCTATCAGGCCTCGGCATCGTTAGTCGAGCGGTCCGTGGAGCGGACACCGCGG
ATTGATGAAGTTCAGTCCGCCCGAACGGGCATTCGTCTCCCGGAGTCGTTTCACGGTCGT
CGTCGCAAGCGGGTC

>Translation of ORF number 1 in reading frame 3 on the reverse strand.
SGNSDSGNSPTDARRGRRSVVYFASVCQTVSLPAKFAHLLPNSLTTSDAKFHFVDTEFDT
IERLASVSSPEPNHVSVTDAALNKQQLGLSPDSLLLVGTGGLDLAGGFDRFISIASCALA
DPCEPTVQFLWVGPLDNSASETTTLVRELHDAELGDRVLMTADEGQASDLLRVADVLLMT
HREPSGSTSINEYLAAEKPIVWFRSNPAVERLMSGDPCGVEPGRVLQAADQVKCLLSDQD
YRRRLSNANRKRRDRLVSVDELVYQASASLVERSVERTPRIDEVQSARTGIRLPESFHGR
RRKRV



Multiple Alignment

PROTOCOL:


EBI / ClustalW / copy and paste homologous sequences / result / view alignment file / copy and paste



ANALYSIS OF RESULTS:


Here we build a multiple alignment with only 4 homologous sequences, which is too few to guarantee reliable results.



Regions identical across the five sequences are identified: for example, positions 197 and 199 of the ORF carry E (glutamic acid) and P (proline) respectively. Many gaps are also observed, at positions 33, 47, 76... In total there are 5 gaps of more than 10 aa, so the ORF is missing amino acids.


Sequences Firmicut1 and Firmicut2 extend beyond the ORF on both sides. We deduce that the ORF is incomplete at both the 3' and 5' ends.


These observations confirm that the analysis of GOS_1088020 genomic DNA (Galapagos Islands: Mangrove on Isabella Island) cannot be taken further.
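Counting internal gaps in one row of an alignment can be sketched as follows. This is an illustrative helper on a toy row, not the procedure used by the annotators; leading and trailing dashes are ignored because they reflect sequence ends rather than internal gaps, and `gap_runs` is a hypothetical name.

```python
import re


def gap_runs(aligned_row: str, min_len: int = 1):
    """Return (0-based start column, length) for each internal run of '-' of at least min_len."""
    offset = len(aligned_row) - len(aligned_row.lstrip("-"))  # leading dashes skipped
    core = aligned_row.strip("-")
    return [
        (m.start() + offset, len(m.group()))
        for m in re.finditer(r"-+", core)
        if len(m.group()) >= min_len
    ]


# Toy aligned row: one 3-column gap and one 11-column gap.
row = "--AB" + "-" * 3 + "CD" + "-" * 11 + "EF" + "--"
print(gap_runs(row))              # all internal gaps
print(gap_runs(row, min_len=11))  # only gaps of more than 10 columns
```

Concatenating the ORF rows of every alignment block into one string before calling the helper would give the gap positions for the whole ORF.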


RAW RESULTS:


Sequence labels:


>Firmicut1 Ruminococcus gnavus ATCC 29149/Firmicutes/ref|ZP_02042715.1| gi|154505977|ref|ZP_02042715.1| hypothetical protein RUMGNA_03519 [Ruminococcus gnavus ATCC 29149] 
MKRDGISIENEACRSYVNMEQTGNMWELQEQDPQLHVYFTESVMGVRLSFFLCIEGIEALNAVLYYKGKGEVFSEKQAYH
FPLYLNRHIEKEIYFPYPAEAVRLDLSDQNVVVKIEKLEIASASSENALNALKENMGQLGRKEKIAVLTHDMSNTGAPLL
AYHIAEKLKENGRDVVVLAANHGDGFLEEKYAEKEIPIFYLHDNARNQISVGWCGTKGESKNWEAEEYLEHLMQLLREEG
FCRVITNTVVSGKYVPLLKQYGFFIVSLIHEMKTTIELYGFLSPGESIAYYSDYIVFPDYSVQKGFETLFPVIRGKCSIR
PQGVYMQVDEEGDSRFSFEEYGFSLDQKIVMCSGTCELRKGTDLFVSAAQILSRKEKTEEVHFVWTGKFSDPILEGWIYN
QIRQSNLEGKVHFIPFIKDKQKYHELLRHADVFWLTSREDPFPSVVLEAMKYEIPVVAFKNSGGVNTMLDEGRGNLISEF
DVEEMAKVTKQLLSERREEVLTGAKAWIEDKLKFDDYIRYLEQLFQNKVQIVPEIDLYEIITPKLHEYFQGEVSSQIEKE
RIHQIQIMKDSKKFKFSDDKIVFFDIADRTNYFEDKMVMKESGEICRDLFLDRPLICVPPYYCKKSAGELTGSRKILCGA
NVLSKQMEKSGQLLLLEQLMEYQGIYLMGAGVCDFTEGDQISDYTQNVLHFLLSSKGVHAVRDEKTRYFLEQIGIHNVIR
TGCPSLWSLTAQRCKKIPKEKGKEVLVVLEEKKDCREEDSRMLNMLREEYETVYIWIKDRKNLDYLQKLGDLNKYKLLPG
SISTLKKIFCETAELDYIGTNIQIGIQSLTMLKRSLILSDTDYAQALREEMHLPVLGRSMQKEVEAWITAPYETRVLLPE
ENIKYWKSQFVRKFARWYKKR

>Bacteroi1 Bacteroides caccae ATCC 43185/Bacteroidetes/ref|ZP_01959102.1| gi|153806434|ref|ZP_01959102.1| hypothetical protein BACCAC_00698 [Bacteroides caccae ATCC 43185] 
MKKKLIFICCEMGKGGVSKSLASLLNTIDYDVYNVDLFLFSRQGLFLDQVPSNVNILKETTTLRELIFTFKFIAAFKRLL
SIVLCKRITSLEKRWRLFWKLNKKSFRPNSKKYDCAISYNDGVELYYMVDCLCANVKIAWNHTNYTNSFTYKPTLDKFYY
DRVNYIVTISEECAYTLKRVFPENADKVRIIENIVLKDTLMSLAEVENPYDSYSIDRDTTVICTVAGLYVRKGFDYASVA
LGNLKKEGIKFLWFIVGGGPEENEIKDLVRNNNIENETFFLHQQSNPYKFVKWADIFLLTSHAEGKSIAIEEAKLLEKPI
LITHFASAFDQIESGKSGLVAEMTNDSVTEKLRMLIGDKDLQTNLSNYLKNNTHSNRDKNINALYSLINS

>Bacteroi2 Dyadobacter/Bacteroidetes/ref|YP_003087157.1| gi|255036536|ref|YP_003087157.1| glycosyl transferase group 1 [Dyadobacter fermentans DSM 18053] 
MEQKKILFVSHDANRAGSQLLLLQLLRLLKERGVPMHLLLCNGGELESEFEEVVGVTRLYHKKITTPPLTGKILRKTNLL
KMYEERSLQKGNERILAELEQQNIGLIFINSIANAEVYYDFLRPFHQLPLVLFVHELAMSVKIYTQEKQLAYLLKKTDHL
IAVSNAVADYYIRKYDFPGAHVSTFTLIDHEHIDQRLAAVQHDILEKTYKVPEDAIVIGGCGNAEWRKGNDIFNWIASRV
IRKTQPLPVYFVWVGAGPQHEIYELIASDIRQMGLSDKIILIPPTPRALDYINRFDVLLLSSREDPYPLVVMEAALQEIP
VVCFEDAGGAPELIEADAGFVVPYMDISAASDAIIQLILDPSLRNTMGQNARRKVLERHNTDKSVASVEAIIQKYLPLQV
SEQHNQ

>Firmicut2 Clostridium scindens ATCC 35704/Firmicutes/ref|ZP_02429964.1| gi|167757837|ref|ZP_02429964.1| hypothetical protein CLOSCI_00168 [Clostridium scindens ATCC 35704] 
MEEVKETLIIKTGKIVKTKGLSVEGKKWIFEDNDPQIYIRFDNAVYGIRITGNCYQADFKQSLSTVYYKGLEDCFTETKS
CRFLFEPDNQKVREIRFGFPINEIRFDPVEISGVCFVDGIYIEPIPKSYLIEDSLARRIEQERHRDKVIIVTHDLSATGA
PILACHIAKKMKQNKVEVVVLAGKLGNGYLEEQYKKWNIPVICLDNTQNEEFEYINIKNNSKKCGLNEREFLENLFRMLR
RQGFMTVITNTIVSGHYVELLKDYNFKIISLIHEMRASIELYGFVEPGRKIAENADYIVFPNQYVENDFRKIYPEIKGNT
LIKAQGVYLENVEEDANFNLEEYGINLNDLVIMSSGTCELRKGVDLFVNAALIFIDRNREKDVHFIWTGNFNNKELQCWI
INQLERSGSQRKIQFIPFIKEAAKYKTLLSRADAFWAMSREDPFPSTVLEAMKNEVPVVGFNGTGGIQVMLSNNRGILID
GFNLEKVVECTEKLVEQNEKNKVLIEEAKKYVDEMEFDSYVKFLRDFLSKPVKINPKLDIYKWSEKTCHYYDRLEETDCT
LKRKQIEWDRCKLLFKRKEISKEDMVYLDSAIATSDVEDEIIMDYCTNICEEVFPTVRKKHIPIRVNDEKFEDIEGFLKI
LCGNNLLSTRLEKSKQWLLPNNIWNYRNTCLLGAGISHFDTETSFSDFTKSFLRFILQSKYYHSVCDEQTKEILHSMGIK
NVINTSFPSMWKLTPDFCARIPVNKAENVLTTISGRPENVENDLFMLKILKENYKKVYIWIQKQFDYQYIRREMKLDDYT
IIPPSLSELDKILLIKDMDYIGARFHIAVRNLNYGHRSLIIGGDDNAKGIISETNLPVLQEHKIKSDLQDIIYQKWGVNK
IVIPLDKIKEWKDQFEL

>ORF 
SGNSDSGNSPTDARRGRRSVVYFASVCQTVSLPAKFAHLLPNSLTTSDAKFHFVDTEFDT
IERLASVSSPEPNHVSVTDAALNKQQLGLSPDSLLLVGTGGLDLAGGFDRFISIASCALA
DPCEPTVQFLWVGPLDNSASETTTLVRELHDAELGDRVLMTADEGQASDLLRVADVLLMT
HREPSGSTSINEYLAAEKPIVWFRSNPAVERLMSGDPCGVEPGRVLQAADQVKCLLSDQD
YRRRLSNANRKRRDRLVSVDELVYQASASLVERSVERTPRIDEVQSARTGIRLPESFHGR
RRKRV



Multiple alignment:



CLUSTAL 2.0.12 multiple sequence alignment


Firmicut1       --MKRDGISIENEACRSYVNMEQTGNMWELQEQDPQLHVYFTESVMGVRLSFFLCIEGIE 58
Firmicut2       MEEVKETLIIKTGKIVKTKGLSVEGKKWIFEDNDPQIYIRFDNAVYGIRITGNCYQADFK 60
Bacteroi2       ------------------------------------------------------------
Bacteroi1       ------------------------------------------------------------
ORF             ------------------------------------------------------------
                                                                            

Firmicut1       ALNAVLYYKGKGEVFSEKQAYHFPLYLNRHIEKEIYFPYPAEAVRLDLSDQNVVVKIEKL 118
Firmicut2       QSLSTVYYKGLEDCFTETKSCRFLFEPDNQKVREIRFGFPINEIRFDPVEISGVCFVDGI 120
Bacteroi2       ------------------------------------------------------------
Bacteroi1       ------------------------------------------------------------
ORF             ------------------------------------------------------------
                                                                            

Firmicut1       EIASASSENALN-ALKENMGQLGRKEKIAVLTHDMSNTGAPLLAYHIAEKLKENGRDVVV 177
Firmicut2       YIEPIPKSYLIEDSLARRIEQERHRDKVIIVTHDLSATGAPILACHIAKKMKQNKVEVVV 180
Bacteroi2       ----------------------MEQKKILFVSHDANRAGSQLLLLQLLRLLKERGVPMHL 38
Bacteroi1       -----------------------MKKKLIFICCEMGKGGVSKSLASLLNTIDYDVYNVDL 37
ORF             -----------------------------------------------SGNSDSGNSPTDA 13
                                                                   .        

Firmicut1       LAANHGDGFLEEKYAEKEIP--IFYLHDNARNQISVGWCGTKGESKNWEAEEYLEH---- 231
Firmicut2       LAGKLGNGYLEEQYKKWNIP--VICLDNTQNEEFEYINIKNNSKKCGLNEREFLEN---- 234
Bacteroi2       LLCNGGE--LESEFEEVVGV--TRLYHKKITTPPLTGKILRKTNLLKMYEERSLQKGNER 94
Bacteroi1       FLFSRQGLFLDQVPSNVNILKETTTLRELIFTFKFIAAFKRLLSIVLCKRITSLEKR-WR 96
ORF             RRGRRSVVYFASVCQTVSLP---------------------------------------A 34
                         : .                                                

Firmicut1       LMQLLREEGFCRVITNTVVSG-KYVPLLKQYG-FFIVSLIHEMKTTIELYG-----FLSP 284
Firmicut2       LFRMLRRQGFMTVITNTIVSG-HYVELLKDYN-FKIISLIHEMRASIELYG-----FVEP 287
Bacteroi2       ILAELEQQNIGLIFINSIANAEVYYDFLRPFHQLPLVLFVHELAMSVKIYT-----QEKQ 149
Bacteroi1       LFWKLNKKSFRPNSK-KYDCAISYNDGVELYYMVDCLCANVKIAWNHTNYTNSFTYKPTL 155
ORF             KFAHLLPNSLTTS----------------------------------------------- 47
                 :  *  :.:                                                  

Firmicut1       GESIAYYSDYIVFPDYSVQKGFETLFPVIRGKC---SIRPQGVYMQVDEEGDSRFSFEEY 341
Firmicut2       GRKIAENADYIVFPNQYVENDFRKIYPEIKGNT---LIKAQGVYLEN-VEEDANFNLEEY 343
Bacteroi2       LAYLLKKTDHLIAVSNAVADYYIRKYDFPGAHVSTFTLIDHEHIDQRLAAVQHDILEKTY 209
Bacteroi1       DKFYYDRVNYIVTISEECAYTLKRVFPENADKVR--IIENIVLKDTLMSLAEVENPYDSY 213
ORF             -----DAKFHFVDTEFDTIERLASVSSPEPNHVS---------------VTDAALNKQQL 87
                         :::  .                :                   :     .  

Firmicut1       GFSLDQKIVMCSGTCELRKGTDLFVSAAQILSRKEKTEEVHFVWTG-KFSDPILEGWIYN 400
Firmicut2       GINLNDLVIMSSGTCELRKGVDLFVNAALIFIDRNREKDVHFIWTG-NFNNKELQCWIIN 402
Bacteroi2       KVPEDAIVIGGCGNAEWRKGNDIFNWIASRVIRKTQPLPVYFVWVG-AGPQHEIYELIAS 268
Bacteroi1       SIDRDTTVICTVAGLYVRKG----FDYASVALGNLKKEGIKFLWFI-VGGGP-EENEIKD 267
ORF             GLSPDSLLLVGTGGLDLAGGFDRFISIASCALADPCEPTVQFLWVGPLDNSASETTTLVR 147
                 .  :  ::   .      *       *           : *:*             :  

Firmicut1       QIRQSNLEGKVHFIPFIKDKQKYHELLRHADVFWLTSREDPFPSVVLEAMKYEIPVVAFK 460
Firmicut2       QLERSGSQRKIQFIPFIKEAAKYKTLLSRADAFWAMSREDPFPSTVLEAMKNEVPVVGFN 462
Bacteroi2       DIRQMGLSDKIILIPPTPRALDY---INRFDVLLLSSREDPYPLVVMEAALQEIPVVCFE 325
Bacteroi1       LVRNNNIENETFFLHQQSNPYKF---VKWADIFLLTSHAEGKSIAIEEAKLLEKPILITH 324
ORF             ELHDAELGDRVLMTADEGQASDL---LRVADVLLMTHREPSGSTSINEYLAAEKPIVWFR 204
                 :.      .  :        .    :   * :    :    .  : *    * *::  .

Firmicut1       NSGGVNTMLDEGRGNLISEFDVEEMAKVTKQLLSER--REEVLTGAKAWIEDKLKFDDYI 518
Firmicut2       GTGGIQVMLSNNRGILIDGFNLEKVVECTEKLVEQNE-KNKVLIEEAKKYVDEMEFDSYV 521
Bacteroi2       DAGGAPELIEADAGFVVPYMDISAASDAIIQLILDPSLRNTMGQNARRKVLERHNTDKSV 385
Bacteroi1       FASAFDQIESGKSG-LVAEMTNDSVTEKLRMLIGDKD--LQTNLSNYLKNNTHSNRDKNI 381
ORF             SNPAVERLMSGDPCGVEPGRVLQAADQVKCLLSDQDYRRRLSNANRKRRDRLVSVDELVY 264
                   .   : .     :      .   .    *  :                     :   

Firmicut1       RYLEQLFQNKVQIVPEIDLYEIITPKLHEYFQGEVS-SQIEKERIHQIQIMKDSKKFKFS 577
Firmicut2       KFLRDFLSKPVKINPKLDIYKWSEKTCHYYDRLEETDCTLKRKQIEWDRCKLLFKRKEIS 581
Bacteroi2       ASVEAIIQKYLPLQVSEQHNQ--------------------------------------- 406
Bacteroi1       NALYSLINS--------------------------------------------------- 390
ORF             QASASLVERSVERTPRIDEVQSARTG----IRLPESFHGRRRKRV--------------- 305
                     :..                                                    

Firmicut1       DDKIVFFDIADRTNYFEDKMVMKESGEICRDLFLDRPLICVPPYYCKKSAGELTGSRKIL 637
Firmicut2       KEDMVYLDSAIATSDVEDEIIMDYCTNICEEVFPTVRKKHIPIRVNDEKFEDIEGFLKIL 641
Bacteroi2       ------------------------------------------------------------
Bacteroi1       ------------------------------------------------------------
ORF             ------------------------------------------------------------
                                                                            

Firmicut1       CGANVLSKQMEKSGQLLLLEQLMEYQGIYLMGAGVCDFTEGDQISDYTQNVLHFLLSSKG 697
Firmicut2       CGNNLLSTRLEKSKQWLLPNNIWNYRNTCLLGAGISHFDTETSFSDFTKSFLRFILQSKY 701
Bacteroi2       ------------------------------------------------------------
Bacteroi1       ------------------------------------------------------------
ORF             ------------------------------------------------------------
                                                                            

Firmicut1       VHAVRDEKTRYFLEQIGIHNVIRTGCPSLWSLTAQRCKKIPKEKGKEVLVVLEEKKDCRE 757
Firmicut2       YHSVCDEQTKEILHSMGIKNVINTSFPSMWKLTPDFCARIPVNKAENVLTTISGRPENVE 761
Bacteroi2       ------------------------------------------------------------
Bacteroi1       ------------------------------------------------------------
ORF             ------------------------------------------------------------
                                                                            

Firmicut1       EDSRMLNMLREEYETVYIWIKDRKNLDYLQKLGDLNKYKLLPGSISTLKKIFCETAELDY 817
Firmicut2       NDLFMLKILKENYKKVYIWIQKQFDYQYIRREMKLDDYTIIPPSLSELDKILLIK-DMDY 820
Bacteroi2       ------------------------------------------------------------
Bacteroi1       ------------------------------------------------------------
ORF             ------------------------------------------------------------
                                                                            

Firmicut1       IGTNIQIGIQSLTMLKRSLILSDTDYAQALREEMHLPVLGR-SMQKEVEAWITAPYET-R 875
Firmicut2       IGARFHIAVRNLNYGHRSLIIGGDDNAKGIISETNLPVLQEHKIKSDLQDIIYQKWGVNK 880
Bacteroi2       ------------------------------------------------------------
Bacteroi1       ------------------------------------------------------------
ORF             ------------------------------------------------------------
                                                                            

Firmicut1       VLLPEENIKYWKSQFVRKFARWYKKR 901
Firmicut2       IVIPLDKIKEWKDQFEL--------- 897
Bacteroi2       --------------------------
Bacteroi1       --------------------------
ORF

Protein Domains

PROTOCOL:


1) INTERPRO / copy and paste protein sequence / default parameters / submit job



ANALYSIS OF RESULTS:


We obtain a single result.

Since this match is unintegrated, we do not consider our ORF to have a functional protein domain.

RAW RESULTS:

GOS_1088020	917852CB52301BEE	305	superfamily	SSF53756	UDP-Glycosyltransferase/glycogen phosphorylase	84	268	1e-09	T	17-Nov-2009	NULL	NULL	
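The raw result line above is tab-separated. A minimal sketch of how it could be read, assuming the column meanings suggested by the values themselves: the field names below are illustrative labels rather than an official schema, and treating a NULL InterPro accession as "unintegrated" is an assumption based on the analysis above.

```python
# Hypothetical column labels, inferred from the values in the raw line.
FIELDS = [
    "protein_id", "checksum", "length", "analysis", "signature_acc",
    "signature_desc", "start", "end", "evalue", "status", "date",
    "interpro_acc", "interpro_desc",
]


def parse_raw_line(line: str) -> dict:
    """Split one tab-separated result line into a dictionary with typed fields."""
    record = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
    for key in ("length", "start", "end"):
        record[key] = int(record[key])
    record["evalue"] = float(record["evalue"])
    # Assumption: NULL in the InterPro accession column means no integrated entry.
    record["integrated"] = record["interpro_acc"] not in ("NULL", "", "-")
    return record


line = ("GOS_1088020\t917852CB52301BEE\t305\tsuperfamily\tSSF53756\t"
        "UDP-Glycosyltransferase/glycogen phosphorylase\t84\t268\t1e-09\tT\t"
        "17-Nov-2009\tNULL\tNULL")
hit = parse_raw_line(line)
print(hit["signature_acc"], hit["start"], hit["end"], hit["integrated"])
```

The match covers residues 84 to 268, that is 185 of the 305 residues of the ORF.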

Phylogeny

PROTOCOL:






ANALYSIS OF RESULTS:

RAW RESULTS:



Taxonomy report

PROTOCOL:


ANALYSIS OF RESULTS:

RAW RESULTS:




BLAST

PROTOCOL:



1) NCBI / blastp / copy and paste ORF sequence / database "nr" / BLAST /

a. copy and paste the complete hit list /

b. copy and paste the alignments with homologs


2) NCBI / blastp / copy and paste ORF sequence / database "swissprot" / BLAST /

a. copy and paste the complete hit list /

b. copy and paste the alignments with homologs


3) NCBI / blastx / copy and paste ORF sequence / default parameters / BLAST /

a. copy and paste the complete hit list /

b. copy and paste the alignments with homologs /


4) NCBI / blastx / copy and paste ORF sequence / database "swissprot" / BLAST /

a. copy and paste the complete hit list /

b. copy and paste the alignments with homologs


5) NCBI / blastp / copy and paste ORF sequence / database "env_nr" / BLAST /

a. copy and paste the complete hit list /

b. copy and paste the alignments with homologs /







ANALYSIS OF RESULTS:



Sequences whose e-value lies between 1e-10 and 1e-03 fall in what is considered a "twilight" zone, because it is difficult to set a threshold score there.



BLASTP DATABASE NR

In our case, there are 10 sequences with an e-value below 1e-02:


ref|ZP_02042715.1| hypothetical protein RUMGNA_03519 [Ruminoc... 55.1 1e-05

ref|ZP_01959102.1| hypothetical protein BACCAC_00698 [Bactero... 52.4 6e-05

ref|YP_003087157.1| glycosyl transferase group 1 [Dyadobacter... 48.9 6e-04

ref|ZP_02429964.1| hypothetical protein CLOSCI_00168 [Clostri... 48.5 8e-04

ref|YP_722826.1| glycosyl transferase family protein [Trichod... 48.1 0.001

ref|ZP_01468902.1| hypothetical protein BL107_05779 [Synechoc... 47.4 0.002

ref|ZP_01092016.1| glycosyl transferase, group 1 [Blastopirel... 47.0 0.003

ref|ZP_05737044.1| glycosyl transferase CpoA [Granulicatella ... 46.2 0.004

ref|ZP_01892940.1| Glycosyltransferase [Marinobacter algicola... 45.8 0.006

ref|YP_001233756.1| glycosyl transferase, group 1 [Acidiphili... 45.4 0.007


Beyond this value, sequences are no longer considered homologous.
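How many of the listed hits fall under a given e-value cut-off can be checked with a small sketch. The e-values are copied from the hit list above as printed (BLAST rounds them), and the function name `hits_below` is illustrative.

```python
# E-values as printed in the blastp "nr" hit list above.
HITS = {
    "ZP_02042715.1": 1e-05,
    "ZP_01959102.1": 6e-05,
    "YP_003087157.1": 6e-04,
    "ZP_02429964.1": 8e-04,
    "YP_722826.1": 0.001,
    "ZP_01468902.1": 0.002,
    "ZP_01092016.1": 0.003,
    "ZP_05737044.1": 0.004,
    "ZP_01892940.1": 0.006,
    "YP_001233756.1": 0.007,
}


def hits_below(hits: dict, cutoff: float) -> list:
    """Return the accessions whose e-value is strictly below the cut-off."""
    return [acc for acc, evalue in hits.items() if evalue < cutoff]


for cutoff in (1e-10, 1e-03, 1e-02):
    print(cutoff, len(hits_below(HITS, cutoff)))
```

Because the printed values are rounded, a hit shown as exactly 0.001 sits on the boundary; the strict comparison used here leaves it out of the 1e-03 group.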


Analysing the pairwise alignments can help refine our selection of homologous sequences.

Five criteria are evaluated for this:

. the alignment

. the % identity

. the e-value

. the proportion of the query aligned with the subject

. the position



1.b) example of the first pairwise alignment:



>ref|ZP_02042715.1| hypothetical protein RUMGNA_03519 [Ruminococcus gnavus ATCC 29149]

gb|EDN75896.1| hypothetical protein RUMGNA_03519 [Ruminococcus gnavus ATCC 29149]

Length=901


Score = 55.1 bits (131), Expect = 1e-05, Method: Compositional matrix adjust.

Identities = 58/227 (25%), Positives = 98/227 (43%), Gaps = 14/227 (6%)


Query 79 DAALNKQQLGLSPDSLLLVGTGGLDLAGGFDRFISIASCALADPCEPTVQFLWVGPLDNS 138

D+ + ++ G S D +++ +G +L G D F+S A V F+W G +

Sbjct 333 DSRFSFEEYGFSLDQKIVMCSGTCELRKGTDLFVSAAQILSRKEKTEEVHFVWTGKFSDP 392


Query 139 ASETTTLVRELHDAELGDRVL---MTADEGQASDLLRVADVLLMTHREPSGSTSINEYLA 195

E + ++ + L +V D+ + +LLR ADV +T RE + + E +

Sbjct 393 ILE-GWIYNQIRQSNLEGKVHFIPFIKDKQKYHELLRHADVFWLTSREDPFPSVVLEAMK 451


Query 196 AEKPIVWFRSNPAVERLM-SGDPCGVEPGRVLQAADQVKCLLSDQDYRRR--LSNANRKR 252

E P+V F+++ V ++ G + V + A K LLS+ RR L+ A

Sbjct 452 YEIPVVAFKNSGGVNTMLDEGRGNLISEFDVEEMAKVTKQLLSE---RREEVLTGAKAWI 508


Query 253 RDRLVSVDELVYQASASLVERSVERTPRIDEVQSARTGIRLPESFHG 299

D+L D + Y L + V+ P ID + +L E F G

Sbjct 509 EDKLKFDDYIRY--LEQLFQNKVQIVPEIDLYEIITP--KLHEYFQG 551



. As seen above, our ORF, 305 aa long, is incomplete at both the 3' and 5' ends. We therefore assume that the ORF extends on either side of the alignment with the subject, which supports the possibility that this subject sequence is homologous.


. The alignment shows 25% identity (identical amino acids) and 43% "positives" (identical or similar amino acids). Positives already include the identical residues, so the similarity over the aligned region is 43%.


. The e-value lies within the twilight zone (1e-10 < 1e-05 < 1e-03), so this sequence meets the homology criterion.


. The alignment spans 227 columns, covering query residues 79 to 299 (221 of 305 aa, about 72% of the ORF) and subject residues 333 to 551 (219 of 901 aa, about 24% of the subject). This coverage is acceptable.


. The query aligns between positions 333 and 551 of the subject. Since this pairwise alignment is the first in the list, the position of the query on the subject will serve as a selection criterion to decide whether the other aligned sequences qualify as homologs.


Based on these 5 criteria, this subject sequence is indeed considered homologous to the query.
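The figures used in this evaluation can be recomputed from the counts and coordinates printed in the BLAST alignment above. A minimal sketch (variable names are illustrative; truncating to an integer happens to reproduce the percentages BLAST prints for this alignment):

```python
def percent(part: int, total: int) -> float:
    """Return part/total as a percentage."""
    return 100.0 * part / total


# Values copied from the pairwise alignment with ZP_02042715.1.
alignment_columns = 227
identities, positives, gaps = 58, 98, 14
query_len, subject_len = 305, 901
query_span = 299 - 79 + 1      # query residues 79..299
subject_span = 551 - 333 + 1   # subject residues 333..551

print(int(percent(identities, alignment_columns)))  # 25
print(int(percent(positives, alignment_columns)))   # 43
print(int(percent(gaps, alignment_columns)))        # 6
print(query_span, round(percent(query_span, query_len)))        # 221 72
print(subject_span, round(percent(subject_span, subject_len)))  # 219 24
```

The 98 positives include the 58 identities, so the two percentages are not additive.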


For the following pairwise alignments, we retain the sequences that meet these criteria. The first 10 sequences qualify as homologs.





All pairwise alignments from each protocol are evaluated in the same way.



BLASTP DATABASE SWISSPROT

2.b) This BLAST against the swissprot database does not identify any homolog (the lowest e-value is 0.48).




The lack of homologous sequences leads us to run a blastx.



BLASTX DATABASE NR

3.b) We obtain six homologous sequences:


ref|YP_001233756.1| glycosyl transferase, group 1 [Acidiphili... 47.8 0.002

ref|ZP_04334866.1| glycosyltransferase [Nocardiopsis dassonvi... 47.0 0.003

ref|ZP_02042715.1| hypothetical protein RUMGNA_03519 [Ruminoc... 47.0 0.003

ref|ZP_01892940.1| Glycosyltransferase [Marinobacter algicola... 47.0 0.003

ref|NP_495358.1| hypothetical protein H43E16.1 [Caenorhabditi... 47.0 0.003

ref|ZP_03893052.1| glycosyltransferase [Geodermatophilus obsc... 45.4 0.008
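As an illustration, a hit list of this kind can be filtered programmatically. The sketch below is a minimal example, not the procedure actually used here: `Hit`, `parse_hit_line` and `select_hits` are hypothetical names, and the cutoff of 0.01 is an assumption chosen only because it separates these six hits from the next entry of the full blastx list (e-value 0.011, see the raw results).

```python
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    accession: str
    description: str
    bits: float
    evalue: float


def parse_hit_line(line: str) -> Hit:
    """Split one summary line into accession, description, bit score, e-value."""
    accession, rest = line.split(maxsplit=1)
    # The last two whitespace-separated fields are the score and the e-value.
    description, bits, evalue = rest.rsplit(maxsplit=2)
    return Hit(accession, description, float(bits), float(evalue))


def select_hits(lines: Iterable[str], max_evalue: float) -> List[Hit]:
    """Keep the hits whose e-value does not exceed max_evalue."""
    hits = [parse_hit_line(line) for line in lines if line.strip()]
    return [hit for hit in hits if hit.evalue <= max_evalue]


# The six hits listed above, plus the next entry of the full blastx list.
BLASTX_NR = """\
ref|YP_001233756.1| glycosyl transferase, group 1 [Acidiphili... 47.8 0.002
ref|ZP_04334866.1| glycosyltransferase [Nocardiopsis dassonvi... 47.0 0.003
ref|ZP_02042715.1| hypothetical protein RUMGNA_03519 [Ruminoc... 47.0 0.003
ref|ZP_01892940.1| Glycosyltransferase [Marinobacter algicola... 47.0 0.003
ref|NP_495358.1| hypothetical protein H43E16.1 [Caenorhabditi... 47.0 0.003
ref|ZP_03893052.1| glycosyltransferase [Geodermatophilus obsc... 45.4 0.008
ref|YP_425043.1| glycosyl transferase, group 1 [Rhodospirillu... 45.1 0.011
"""

ASSUMED_CUTOFF = 0.01  # assumption, not a threshold stated in the text

selected = select_hits(BLASTX_NR.splitlines(), ASSUMED_CUTOFF)
for hit in selected:
    print(hit.accession, hit.bits, hit.evalue)
```

Splitting from the right keeps descriptions that contain spaces or commas intact, since only the last two fields are numeric.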




BLASTX DATABASE SWISSPROT

4.b) A BLASTX against the SWISSPROT database is not useful and would tell us nothing more than a BLASTX against the NR database.




Since the results are not improving, we run a blastp against the environmental database.




BLASTP DATABASE ENV_NR

5.b) We obtain four homologous sequences:


gb|EDJ40535.1| hypothetical protein GOS_1701247 [marine metag... 47.8 3e-04

gb|EBV08418.1| hypothetical protein GOS_6958884 [marine metag... 47.4 4e-04

gb|EDJ21520.1| hypothetical protein GOS_1735167 [marine metag... 47.0 5e-04

gb|ECY60820.1| hypothetical protein GOS_2330446 [marine metag... 44.3 0.004




At the end of all these protocols, the maximum number of homologues obtained is 10.

A phylogeny built from only 10 homologues would not yield a coherent tree, so we stop the analysis of our sequence here.



We nevertheless perform a multiple alignment in order to estimate the proportion of amino acids missing from our ORF.
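One simple way to read such an estimate off a multiple alignment is to count the gap columns at either end of the query row. The sketch below assumes a gapped row using "-" as the gap character; the function names and the toy row are hypothetical and do not come from the alignment produced for this sequence.

```python
from typing import Tuple


def terminal_gaps(row: str, gap: str = "-") -> Tuple[int, int]:
    """Number of gap columns before the first and after the last residue."""
    leading = len(row) - len(row.lstrip(gap))
    trailing = len(row) - len(row.rstrip(gap))
    return leading, trailing


def missing_fraction(row: str, gap: str = "-") -> float:
    """Share of alignment columns lost to terminal gaps in this row."""
    if not row:
        raise ValueError("empty alignment row")
    leading, trailing = terminal_gaps(row, gap)
    if leading == len(row):
        # Row made only of gaps: avoid counting the same columns twice.
        return 1.0
    return (leading + trailing) / len(row)


# Toy query row (hypothetical): internal gaps are not counted as missing ends.
TOY_ROW = "----ACDEF--GHIK------"

print(terminal_gaps(TOY_ROW))
print(round(missing_fraction(TOY_ROW), 2))
```

Only terminal gaps are counted, because internal gaps reflect insertions or deletions rather than a truncated ORF.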



RAW RESULTS:

_______________________________________________________________________________________________________________________
                                            1) blastp against the "nr" database:
______________________________________________________________________________________________________________________

a) complete list of hits

                                                                   Score     E
Sequences producing significant alignments:                       (Bits)  Value

ref|ZP_02042715.1|  hypothetical protein RUMGNA_03519 [Ruminoc...  55.1    1e-05
ref|ZP_01959102.1|  hypothetical protein BACCAC_00698 [Bactero...  52.4    6e-05
ref|YP_003087157.1|  glycosyl transferase group 1 [Dyadobacter...  48.9    6e-04
ref|ZP_02429964.1|  hypothetical protein CLOSCI_00168 [Clostri...  48.5    8e-04
ref|YP_722826.1|  glycosyl transferase family protein [Trichod...  48.1    0.001
ref|ZP_01468902.1|  hypothetical protein BL107_05779 [Synechoc...  47.4    0.002
ref|ZP_01092016.1|  glycosyl transferase, group 1 [Blastopirel...  47.0    0.003
ref|ZP_05737044.1|  glycosyl transferase CpoA [Granulicatella ...  46.2    0.004
ref|ZP_01892940.1|  Glycosyltransferase [Marinobacter algicola...  45.8    0.006
ref|YP_001233756.1|  glycosyl transferase, group 1 [Acidiphili...  45.4    0.007
ref|YP_425043.1|  glycosyl transferase, group 1 [Rhodospirillu...  44.3    0.018
ref|ZP_04992095.1|  glycosyl transferase [Streptomyces sp. SPB...  43.9    0.023
ref|ZP_05249303.1|  predicted protein [Francisella philomiragi...  43.9    0.024
ref|ZP_04755636.1|  glycosyl transferase, group 1 [Francisella...  43.9    0.024
ref|ZP_05788400.1|  glycosyltransferase [Synechococcus sp. WH ...  43.1    0.033
ref|YP_001677988.1|  glycosyl transferase, group 1 [Francisell...  42.7    0.047
ref|ZP_04207354.1|  Glycosyltransferase [Bacillus cereus Rock4...  42.4    0.058
ref|ZP_04243363.1|  Glycosyltransferase [Bacillus cereus Rock1...  42.4    0.062
ref|YP_595623.1|  hypothetical protein LIC007 [Lawsonia intrac...  42.4    0.063
ref|ZP_02925713.1|  YjiB [Verrucomicrobium spinosum DSM 4136]      42.4    0.069
ref|ZP_04225980.1|  Glycosyltransferase [Bacillus cereus Rock3...  42.0    0.082
ref|YP_002381143.1|  glycosyl transferase group 1 [Cyanothece ...  42.0    0.082
ref|ZP_02931472.1|  glycosyl transferase, group 1 [Verrucomicr...  42.0    0.082
ref|YP_935069.1|  glucosyltransferase I [Azoarcus sp. BH72] >e...  42.0    0.084
ref|YP_190649.1|  hypothetical protein GOX0204 [Gluconobacter ...  42.0    0.089
ref|YP_629292.1|  glycosyl transferase, group 1 [Myxococcus xa...  42.0    0.091
ref|ZP_04231851.1|  Glycosyltransferase [Bacillus cereus Rock3...  41.6    0.10
ref|ZP_03452034.1|  glycosyl transferase, group 1 [Burkholderi...  41.6    0.10
ref|ZP_02491150.1|  glycosyl transferase, group 1 [Burkholderi...  41.6    0.11
ref|ZP_04492835.1|  glycosyltransferase [Spirosoma linguale DS...  40.4    0.25
ref|ZP_02033435.1|  hypothetical protein PARMER_03460 [Parabac...  40.4    0.26
gb|EES53934.1|  glycosyl transferase, group 1 [Leptospirillum ...  40.0    0.28
ref|NP_893920.1|  glycosyltransferase [Prochlorococcus marinus...  39.7    0.44
ref|NP_972037.1|  glycosyl transferase, group 1 family protein...  39.3    0.46
ref|YP_002371377.1|  glycosyl transferase group 1 [Cyanothece ...  39.3    0.53
ref|YP_941911.1|  glycosyl transferase, group 1 [Psychromonas ...  38.9    0.61
ref|ZP_03574134.1|  glycosyl transferase, group 1 [Burkholderi...  38.9    0.65
gb|AAN64562.1|  glycosyltransferase [Streptococcus gordonii]       38.9    0.65
ref|YP_711941.1|  putative phosphatidylinositol alpha-mannosyl...  38.9    0.74
ref|NP_861985.1|  rb110 [Ruegeria sp. PR1b] >gb|AAN05131.1| RB...  38.5    0.81
ref|ZP_05048197.1|  glycosyl transferase, group 1 family prote...  38.5    0.94
ref|YP_343532.1|  glycosyl transferase, group 1 [Nitrosococcus...  38.5    0.94
ref|YP_001374576.1|  glycosyl transferase group 1 [Bacillus ce...  38.5    0.99
ref|ZP_02918000.1|  hypothetical protein BIFDEN_01299 [Bifidob...  38.1    1.0
ref|YP_631816.1|  glycosyl transferase, group 1 [Myxococcus xa...  38.1    1.0
ref|ZP_04076649.1|  Glycosyltransferase [Bacillus thuringiensi...  38.1    1.2
ref|ZP_04106458.1|  Glycosyltransferase [Bacillus thuringiensi...  38.1    1.2
ref|ZP_04088616.1|  Glycosyltransferase [Bacillus thuringiensi...  38.1    1.2
ref|YP_002490316.1|  glycosyl transferase group 1 [Methylobact...  38.1    1.3
ref|ZP_03780933.1|  hypothetical protein RUMHYD_00363 [Blautia...  38.1    1.3
ref|ZP_06020428.1|  conserved hypothetical protein [Lactobacil...  38.1    1.3
ref|YP_001505215.1|  glycosyl transferase group 1 [Frankia sp....  38.1    1.3
ref|YP_002449306.1|  glycosyl transferase, group 1 family prot...  38.1    1.3
ref|ZP_05024229.1|  glycosyl transferase, group 1 family prote...  38.1    1.3
ref|ZP_04094669.1|  Glycosyltransferase [Bacillus thuringiensi...  37.7    1.3
ref|YP_934776.1|  glycosyltransferase [Azoarcus sp. BH72] >emb...  37.7    1.4
ref|ZP_00995807.1|  Lipopolysaccharide biosynthesis protein [J...  37.7    1.5
ref|ZP_03104309.1|  glycosyl transferase, group 1 family prote...  37.7    1.6
ref|NP_713804.1|  glycosyl transferase [Leptospira interrogans...  37.7    1.7
ref|YP_003185056.1|  glycosyl transferase group 1 [Alicyclobac...  37.4    1.8
ref|YP_000571.1|  glycosyl transferase [Leptospira interrogans...  37.4    1.8
ref|YP_001925497.1|  glycosyl transferase group 1 [Methylobact...  37.4    1.8
ref|YP_534040.1|  glycosyl transferase, group 1 [Rhodopseudomo...  37.4    1.8
ref|YP_002445321.1|  glycosyl transferase, group 1 family prot...  37.4    2.0
gb|AAS83018.1|  hypothetical protein pRhico010 [Azospirillum b...  37.4    2.1
ref|YP_531223.1|  glycosyl transferase, group 1 [Rhodopseudomo...  37.0    2.4
ref|ZP_02692503.1|  glycosyl transferase, group 1 [Epulopisciu...  37.0    2.7
ref|NP_105587.1|  contains weak similarity to sugar transferas...  37.0    2.7
ref|ZP_05746820.1|  group 1 glycosyl transferase [Lactobacillu...  37.0    2.9
ref|YP_001326936.1|  phosphatidylserine decarboxylase [Sinorhi...  37.0    2.9
ref|YP_003090436.1|  glycosyl transferase group 1 [Pedobacter ...  37.0    2.9
ref|ZP_02736596.1|  Glycosyltransferase-like protein [Gemmata ...  36.6    3.1
ref|ZP_02736667.1|  glycosyl transferase, group 1 [Gemmata obs...  36.6    3.3
ref|ZP_05519072.1|  glycosyl transferase [Streptomyces hygrosc...  36.6    3.4
ref|ZP_01011642.1|  putative glycosyltransferase [Rhodobactera...  36.6    3.4
ref|YP_001865020.1|  glycosyl transferase, group 1 [Nostoc pun...  36.6    3.5
ref|ZP_04773563.1|  glycosyl transferase group 1 [Allochromati...  36.6    3.5
ref|YP_460903.1|  glycosyltransferase [Syntrophus aciditrophic...  36.6    3.5
ref|ZP_03691867.1|  glycosyl transferase group 1 [Thioalkalivi...  36.6    3.6
ref|YP_003191439.1|  Ig domain protein group 2 domain protein ...  36.6    3.8
ref|YP_003051633.1|  glycosyl transferase group 1 [Methylovoru...  36.6    3.8
ref|YP_001433178.1|  glycosyl transferase group 1 [Roseiflexus...  36.2    4.3
ref|ZP_04184268.1|  Glycosyltransferase [Bacillus cereus AH127...  36.2    4.3
ref|YP_783233.1|  glycosyl transferase, group 1 [Rhodopseudomo...  36.2    4.3
ref|ZP_01312132.1|  glycosyl transferase, group 1 [Desulfuromo...  36.2    4.3
ref|ZP_04298709.1|  Glycosyltransferase [Bacillus cereus MM3] ...  36.2    4.5
ref|YP_001433991.1|  glycosyl transferase group 1 [Roseiflexus...  36.2    4.8
ref|ZP_05553372.1|  glycosyltransferase [Lactobacillus coleoho...  36.2    5.0
ref|YP_076536.1|  putative glycosyl transferase [Symbiobacteri...  35.8    5.2
ref|XP_002132604.1|  GA25805 [Drosophila pseudoobscura pseudoo...  35.8    5.7
ref|ZP_05062489.1|  glycosyl transferase, group 1 [gamma prote...  35.8    5.8
ref|NP_809521.1|  glycosyltransferase [Bacteroides thetaiotaom...  35.8    5.9
ref|ZP_03993747.1|  glycosyltransferase [Mobiluncus mulieris A...  35.8    6.0
ref|YP_002482481.1|  glycosyl transferase group 1 [Cyanothece ...  35.8    6.1
ref|YP_001275972.1|  glycosyl transferase, group 1 [Roseiflexu...  35.8    6.2
ref|YP_603946.1|  alpha amylase, catalytic region [Deinococcus...  35.8    6.2
ref|YP_002352489.1|  glycosyl transferase group 1 [Dictyoglomu...  35.8    6.4
ref|YP_159437.1|  glycosyl transferase group 1 [Aromatoleum ar...  35.4    7.2
ref|YP_001643154.1|  glycosyl transferase family protein [Baci...  35.4    7.2
ref|ZP_03493422.1|  glycosyl transferase group 1 [Alicyclobaci...  35.4    7.4  



b) pairwise alignments
                   

>ref|ZP_02042715.1|  hypothetical protein RUMGNA_03519 [Ruminococcus gnavus ATCC 29149]
 gb|EDN75896.1|  hypothetical protein RUMGNA_03519 [Ruminococcus gnavus ATCC 29149]
Length=901

 Score = 55.1 bits (131),  Expect = 1e-05, Method: Compositional matrix adjust.
 Identities = 58/227 (25%), Positives = 98/227 (43%), Gaps = 14/227 (6%)

Query  79   DAALNKQQLGLSPDSLLLVGTGGLDLAGGFDRFISIASCALADPCEPTVQFLWVGPLDNS  138
            D+  + ++ G S D  +++ +G  +L  G D F+S A           V F+W G   + 
Sbjct  333  DSRFSFEEYGFSLDQKIVMCSGTCELRKGTDLFVSAAQILSRKEKTEEVHFVWTGKFSDP  392

Query  139  ASETTTLVRELHDAELGDRVL---MTADEGQASDLLRVADVLLMTHREPSGSTSINEYLA  195
              E   +  ++  + L  +V       D+ +  +LLR ADV  +T RE    + + E + 
Sbjct  393  ILE-GWIYNQIRQSNLEGKVHFIPFIKDKQKYHELLRHADVFWLTSREDPFPSVVLEAMK  451

Query  196  AEKPIVWFRSNPAVERLM-SGDPCGVEPGRVLQAADQVKCLLSDQDYRRR--LSNANRKR  252
             E P+V F+++  V  ++  G    +    V + A   K LLS+   RR   L+ A    
Sbjct  452  YEIPVVAFKNSGGVNTMLDEGRGNLISEFDVEEMAKVTKQLLSE---RREEVLTGAKAWI  508

Query  253  RDRLVSVDELVYQASASLVERSVERTPRIDEVQSARTGIRLPESFHG  299
             D+L   D + Y     L +  V+  P ID  +      +L E F G
Sbjct  509  EDKLKFDDYIRY--LEQLFQNKVQIVPEIDLYEIITP--KLHEYFQG  551



>ref|ZP_01959102.1|  hypothetical protein BACCAC_00698 [Bacteroides caccae ATCC 43185]
 gb|EDM22317.1|  hypothetical protein BACCAC_00698 [Bacteroides caccae ATCC 43185]
Length=390

 Score = 52.4 bits (124),  Expect = 6e-05, Method: Compositional matrix adjust.
 Identities = 47/193 (24%), Positives = 81/193 (41%), Gaps = 27/193 (13%)

Query  59   DTIERLASVSSPEPNHVSVTDAALNKQQLGLSPDSLLLVGTGGLDLAGGFDRFISIASCA  118
            DT+  LA V +P  ++              +  D+ ++    GL +  GFD     AS A
Sbjct  198  DTLMSLAEVENPYDSY-------------SIDRDTTVICTVAGLYVRKGFDY----ASVA  240

Query  119  LADPCEPTVQFLWV----GPLDNSASETTTLVRELHDAELGDRVLMTADEGQASDLLRVA  174
            L +  +  ++FLW     GP +N   E   LVR   +  + +       +      ++ A
Sbjct  241  LGNLKKEGIKFLWFIVGGGPEEN---EIKDLVR---NNNIENETFFLHQQSNPYKFVKWA  294

Query  175  DVLLMTHREPSGSTSINEYLAAEKPIVWFRSNPAVERLMSGDPCGVEPGRVLQAADQVKC  234
            D+ L+T      S +I E    EKPI+      A +++ SG    V         ++++ 
Sbjct  295  DIFLLTSHAEGKSIAIEEAKLLEKPILITHFASAFDQIESGKSGLVAEMTNDSVTEKLRM  354

Query  235  LLSDQDYRRRLSN  247
            L+ D+D +  LSN
Sbjct  355  LIGDKDLQTNLSN  367


>ref|YP_003087157.1|  glycosyl transferase group 1 [Dyadobacter fermentans DSM 18053]
 gb|ACT93992.1|  glycosyl transferase group 1 [Dyadobacter fermentans DSM 18053]
Length=406

 GENE ID: 8226349 Dfer_2777 | glycosyl transferase group 1
[Dyadobacter fermentans DSM 18053]

 Score = 48.9 bits (115),  Expect = 6e-04, Method: Compositional matrix adjust.
 Identities = 46/167 (27%), Positives = 72/167 (43%), Gaps = 5/167 (2%)

Query  92   DSLLLVGTGGLDLAGGFDRFISIASCALADPCEPTVQFLWVGPLDNSASETTTLV-RELH  150
            D++++ G G  +   G D F  IAS  +       V F+WVG       E   L+  ++ 
Sbjct  214  DAIVIGGCGNAEWRKGNDIFNWIASRVIRKTQPLPVYFVWVGA--GPQHEIYELIASDIR  271

Query  151  DAELGDRVLMTADEGQASDLLRVADVLLMTHREPSGSTSINEYLAAEKPIVWFRSNPAVE  210
               L D++++     +A D +   DVLL++ RE      + E    E P+V F       
Sbjct  272  QMGLSDKIILIPPTPRALDYINRFDVLLLSSREDPYPLVVMEAALQEIPVVCFEDAGGAP  331

Query  211  RLMSGDPCGVEPGRVLQAA-DQVKCLLSDQDYRRRL-SNANRKRRDR  255
             L+  D   V P   + AA D +  L+ D   R  +  NA RK  +R
Sbjct  332  ELIEADAGFVVPYMDISAASDAIIQLILDPSLRNTMGQNARRKVLER  378


>ref|ZP_02429964.1|  hypothetical protein CLOSCI_00168 [Clostridium scindens ATCC 
35704]
 gb|EDS08621.1|  hypothetical protein CLOSCI_00168 [Clostridium scindens ATCC 
35704]
Length=897

 Score = 48.5 bits (114),  Expect = 8e-04, Method: Compositional matrix adjust.
 Identities = 41/211 (19%), Positives = 92/211 (43%), Gaps = 13/211 (6%)

Query  79   DAALNKQQLGLSPDSLLLVGTGGLDLAGGFDRFISIASCALADPCEPTVQFLWVGPLDNS  138
            DA  N ++ G++ + L+++ +G  +L  G D F++ A   +    E  V F+W G  +N 
Sbjct  335  DANFNLEEYGINLNDLVIMSSGTCELRKGVDLFVNAALIFIDRNREKDVHFIWTGNFNNK  394

Query  139  ASETTTLVRELHDAELGDRVL---MTADEGQASDLLRVADVLLMTHREPSGSTSINEYLA  195
              +   ++ +L  +    ++       +  +   LL  AD      RE    +++ E + 
Sbjct  395  ELQ-CWIINQLERSGSQRKIQFIPFIKEAAKYKTLLSRADAFWAMSREDPFPSTVLEAMK  453

Query  196  AEKPIVWFRSNPAVERLMSGDPCGVEPGRVLQAADQVKCLLSDQDYRRRLSNANRKRRDR  255
             E P+V F     ++ ++S +   +  G  L+   +    L +Q+ + ++     K+   
Sbjct  454  NEVPVVGFNGTGGIQVMLSNNRGILIDGFNLEKVVECTEKLVEQNEKNKVLIEEAKK---  510

Query  256  LVSVDELVYQASASLVE----RSVERTPRID  282
               VDE+ + +    +     + V+  P++D
Sbjct  511  --YVDEMEFDSYVKFLRDFLSKPVKINPKLD  539


>ref|YP_722826.1|  glycosyl transferase family protein [Trichodesmium erythraeum 
IMS101]
 gb|ABG52353.1|  glycosyl transferase, family 2 [Trichodesmium erythraeum IMS101]
Length=703

 GENE ID: 4243659 Tery_3238 | glycosyl transferase family protein
[Trichodesmium erythraeum IMS101]

 Score = 48.1 bits (113),  Expect = 0.001, Method: Compositional matrix adjust.
 Identities = 54/230 (23%), Positives = 100/230 (43%), Gaps = 12/230 (5%)

Query  84   KQQLGLSPDSLLLVGTGGLDLAGGFDRFISIASCALADPCEPTVQF--LWVGPLDNSASE  141
            +++L +S D+ ++   G LD   G D  + +    L     P  +F  +WVG  D+  S+
Sbjct  217  RKELSISEDTFVIACCGTLDWRKGGDLIVPLL-VILKKKLSPEKKFVCIWVGNWDSQLSQ  275

Query  142  TTTLVRELHDAELGDRVLMTADEGQASDLLRVADVLLMTHREPSGSTSINEYLAAEKPIV  201
               +   +  AEL + ++ T  +    + +  ADV L+  RE      + E    + P+V
Sbjct  276  LE-IEYTVEKAELENNIIFTGYQKSPLNYMSCADVFLLLSREDPFPLVMMEAGVCKLPVV  334

Query  202  WFRSNPAVERLMSGDPCGVEPGRVLQA-ADQVKCLLSDQDYRRRL-SNANRKRRD---RL  256
             F  +      +  +   + P   L+  A+++  L ++   R+ +  NA RK  +     
Sbjct  335  GFDGSGGATEFVESEAGLLAPYLNLEVMAEKIAILYNNTSLRKEMGENAYRKVNELYNET  394

Query  257  VSVDELVYQASASLVERSVERTPRIDEVQSARTGIRLPESFHG-RRRKRV  305
            VS  +++ Q   SLV +S E T    +    R  + +P   H    RKR+
Sbjct  395  VSAPKIL-QLIQSLVHKSSETTVTFKDF-VPRVSVIVPNYNHAPYLRKRL  442


>ref|ZP_01468902.1|  hypothetical protein BL107_05779 [Synechococcus sp. BL107]
 gb|EAU71015.1|  hypothetical protein BL107_05779 [Synechococcus sp. BL107]
Length=934

 Score = 47.4 bits (111),  Expect = 0.002, Method: Composition-based stats.
 Identities = 50/199 (25%), Positives = 86/199 (43%), Gaps = 16/199 (8%)

Query  84   KQQLGLSPDSLLLVGTGGLDLAGGFDRFISIASCALADPCEPTVQFLWVGPLDNSASETT  143
            + +LG+  DS +++G G ++   G D FI  A   L +  E  + FLW+G L  S  +  
Sbjct  479  RTELGIPKDSKIILGCGTIEPRKGVDIFIETADFFLNNSSE-NIFFLWIGDLPASKCDDR  537

Query  144  TLVRELHDAELGD----RVLMTADEGQASDLLRVADVLLMTHREPSGSTSINEYLAAEKP  199
                +L D +L D      L+     +A    +  DV  +T R+      + E +A +KP
Sbjct  538  NWAEQLLD-KLKDLPNSNCLILGGCTKADRYFQACDVFYLTSRKDPFPGVVLEAMACKKP  596

Query  200  IVWFRSNPAVERLMSGDPCG--VEPGRVLQAADQVKCLLSDQD--YRRRLSNANRKRRDR  255
            I+ F     V    +    G  +    V  AA+++   + +    Y   + N N  +RD 
Sbjct  597  IIAFEDATDVGNAFNDGTGGFLLSQFDVTLAAEKIYLYIKNPQLAYDAGIHNENVIQRDY  656

Query  256  LVSVDELVYQASASLVERS  274
            + S      + +  L+ERS
Sbjct  657  IFS------KYANFLIERS  669


>ref|ZP_01092016.1|  glycosyl transferase, group 1 [Blastopirellula marina DSM 3645]
 gb|EAQ79546.1|  glycosyl transferase, group 1 [Blastopirellula marina DSM 3645]
Length=217

 Score = 47.0 bits (110),  Expect = 0.003, Method: Compositional matrix adjust.
 Identities = 43/169 (25%), Positives = 73/169 (43%), Gaps = 7/169 (4%)

Query  96   LVGTGGLDLAGGFDRFISIASCALADPCEPTVQFLWVGPLDNSASETTTLVRELHDAELG  155
            ++  G LD   GFD  ++    +L    +  +Q +  GP +++  E     R+LH   L 
Sbjct  36   MIAVGRLDAQKGFDWLLNALRDSLLQADDWELQIVGAGPQESALKEQA---RQLH---LV  89

Query  156  DRVLMTADEGQASDLLRVADVLLMTHREPSGSTSINEYLAAEKPIVWFRSNPAVERLMSG  215
            DRV           LL  AD+ L++ R      ++ E +AA  P+V        + L   
Sbjct  90   DRVQFLGRRADVPQLLTDADLFLLSSRWEGMPNAMIEAMAARLPVVATDVEGVAQLLGPH  149

Query  216  DPCGVEP-GRVLQAADQVKCLLSDQDYRRRLSNANRKRRDRLVSVDELV  263
            +   + P G   +   +V  L +D   R  L  ANR R +   ++D++V
Sbjct  150  EETQLSPIGDAAEFVSRVSVLAADATLRVELGEANRARIEAHFTLDKMV  198


>ref|ZP_05737044.1|  glycosyl transferase CpoA [Granulicatella adiacens ATCC 49175]
 gb|EEW38063.1|  glycosyl transferase CpoA [Granulicatella adiacens ATCC 49175]
Length=344

 Score = 46.2 bits (108),  Expect = 0.004, Method: Compositional matrix adjust.
 Identities = 40/141 (28%), Positives = 62/141 (43%), Gaps = 18/141 (12%)

Query  72   PNHVS------VTDAALN--KQQLGLSPDSLLLVGTGGLDLAGGFDRFISIASCALADPC  123
            PN VS      +TD +    +++ GL+PD   +V  G L +  G   FI +A        
Sbjct  135  PNFVSRETFYKITDQSKRELRKKFGLNPDQFTVVSAGQLQMRKGVLEFIDLAEKM-----  189

Query  124  EPTVQFLWVGPLD-NSASETTTLVRELHDAELGDRV--LMTADEGQASDLLRVADVLLMT  180
             P +QF+W G       SE    + E H  +L D V  L   +  Q ++   +ADV+L  
Sbjct  190  -PEIQFVWAGDFAFKGISEGRKEIME-HMDDLPDNVHFLGLVERNQMNEFFNMADVMLQL  247

Query  181  HREPSGSTSINEYLAAEKPIV  201
              E     +I E + A  P++
Sbjct  248  SFEELFPMTILESMNANIPLL  268


>ref|ZP_01892940.1|  Glycosyltransferase [Marinobacter algicola DG893]
 gb|EDM48984.1|  Glycosyltransferase [Marinobacter algicola DG893]
Length=368

 Score = 45.8 bits (107),  Expect = 0.006, Method: Compositional matrix adjust.
 Identities = 53/211 (25%), Positives = 93/211 (44%), Gaps = 13/211 (6%)

Query  71   EPNHVSVTDAALNKQQLGLSPDSLLLVGTGGLDLAGGFDRFISIASCALADPCEPTVQFL  130
            +PN      A+  K++LGL P + ++V  G +  A G++  ++ A+  +A   +P V F+
Sbjct  167  DPNLFCEAPASGLKEELGLPPAATVVVSIGNIRPAKGYEHLVN-AAIKMAR-LDPGVHFV  224

Query  131  WVGPLDNSASETTTLVRELHDAELGDRVLMTADEGQASDLLRVADVLLMTHREPSGSTSI  190
             VG     AS    L  ++  AE    +         +D+LR AD+ L+       S S 
Sbjct  225  VVG--HQRASLFKQLETQIARAEEPPNIHWLGFRADVADILRQADIFLLPSVSEGFSIST  282

Query  191  NEYLAAEKPIVWFRSNPAVERLMSGDPCGVEPGR----VLQAADQVKCLLSDQDYRRRLS  246
             E + A  PI+  RS    E L  G+   + P +    ++ A +++K    D     ++ 
Sbjct  283  VEAMMAGVPIIATRSGGPEEILSDGETGLLIPTKDPDAIVSAVERLK----DPALSNKVI  338

Query  247  NANRKRRDRLVSVDELVYQASASLVERSVER  277
               R+      S+  ++ QA   L ER + R
Sbjct  339  EKARQNALERFSLGSML-QAYHGLYERFIRR  368


_____________________________________________________________________________________________________________________
                                       2) blastp against the "swissprot" database:
______________________________________________________________________________________________________________________


a) complete list of hits

                                                                   Score     E
Sequences producing significant alignments:                       (Bits)  Value

sp|P58188.1|LPXK_RICTY  RecName: Full=Tetraacyldisaccharide 4'...  35.4    0.48 
sp|Q74EG5.3|FPG_GEOSL  RecName: Full=Formamidopyrimidine-DNA g...  34.3    0.90 
sp|O32267.1|TUAH_BACSU  RecName: Full=Putative teichuronic aci...  34.3    0.91 
sp|Q07963.1|UBR2_YEAST  RecName: Full=E3 ubiquitin-protein lig...  33.9    1.4
sp|Q9R6W9.1|ATKC_ANASL  RecName: Full=Potassium-transporting A...  33.1    2.0
sp|Q0ACQ9.1|CH60_ALHEH  RecName: Full=60 kDa chaperonin; AltNa...  32.3    3.8
sp|Q9ZCL0.1|LPXK_RICPR  RecName: Full=Tetraacyldisaccharide 4'...  31.6    6.4
sp|Q9EPL6.1|MMP1B_MOUSE  RecName: Full=Interstitial collagenas...  31.6    6.5
sp|Q09699.1|SNF5_SCHPO  RecName: Full=SWI/SNF chromatin-remode...  31.2    7.8


b) pairwise alignments


>sp|P58188.1|LPXK_RICTY  RecName: Full=Tetraacyldisaccharide 4'-kinase; AltName: Full=Lipid 
A 4'-kinase
Length=321

 Score = 35.4 bits (80),  Expect = 0.48, Method: Compositional matrix adjust.
 Identities = 40/158 (25%), Positives = 65/158 (41%), Gaps = 12/158 (7%)

Query  91   PDSLLLVGTGGLDLAGGFDRFISIASCALADPCEPTVQFLWVGPLDNSASETTTLVRELH  150
            P  ++ VG   +   G     I +A    A      V F+ +     S  ++TT++++ H
Sbjct  44   PAKVICVGNCSVGGTGKTQIVIYLAKLLKAK----NVSFVIITKAYGSNIKSTTIIQKWH  99

Query  151  DA-ELGDRVLMTADEGQASDLLRVADVLLMTHREPSGSTSI-----NEYLAAEKPIVWFR  204
             A E+GD  +M A  G A     + D+L + +        +     N YL  +  IV   
Sbjct  100  TALEVGDEGIMLAKYGTAIAAKHIKDILPLINELKPDVIIVDDFLQNPYLHKDFTIVSVD  159

Query  205  SNPAVER--LMSGDPCGVEPGRVLQAADQVKCLLSDQD  240
            S        L+   P    P +VL AAD +  + S+QD
Sbjct  160  SQRLFGNRFLIPAGPLRQNPKQVLDAADLIFLVSSNQD  197


_______________________________________________________________________________________________________________________
                                               3) blastx against the "nr" database:
_______________________________________________________________________________________________________________________


a) complete list of hits


                                                                   Score     E
Sequences producing significant alignments:                       (Bits)  Value

ref|YP_001233756.1|  glycosyl transferase, group 1 [Acidiphili...  47.8    0.002
ref|ZP_04334866.1|  glycosyltransferase [Nocardiopsis dassonvi...  47.0    0.003
ref|ZP_02042715.1|  hypothetical protein RUMGNA_03519 [Ruminoc...  47.0    0.003
ref|ZP_01892940.1|  Glycosyltransferase [Marinobacter algicola...  47.0    0.003
ref|NP_495358.1|  hypothetical protein H43E16.1 [Caenorhabditi...  47.0    0.003
ref|ZP_03893052.1|  glycosyltransferase [Geodermatophilus obsc...  45.4    0.008
ref|YP_425043.1|  glycosyl transferase, group 1 [Rhodospirillu...  45.1    0.011
ref|YP_003087157.1|  glycosyl transferase group 1 [Dyadobacter...  44.3    0.018
ref|YP_935069.1|  glucosyltransferase I [Azoarcus sp. BH72] >e...  44.3    0.018
ref|XP_002494890.1|  ZYRO0A12100p [Zygosaccharomyces rouxii] >...  43.5    0.031
ref|YP_190649.1|  hypothetical protein GOX0204 [Gluconobacter ...  43.5    0.031
ref|ZP_03452034.1|  glycosyl transferase, group 1 [Burkholderi...  43.1    0.040
ref|XP_001919147.1|  PREDICTED: hypothetical protein, partial ...  43.1    0.040
ref|ZP_02491150.1|  glycosyl transferase, group 1 [Burkholderi...  43.1    0.040
ref|ZP_04992095.1|  glycosyl transferase [Streptomyces sp. SPB...  42.7    0.053
ref|ZP_01959102.1|  hypothetical protein BACCAC_00698 [Bactero...  42.7    0.053
ref|ZP_02925713.1|  YjiB [Verrucomicrobium spinosum DSM 4136]      42.4    0.069
gb|AAS83018.1|  hypothetical protein pRhico010 [Azospirillum b...  42.0    0.090
ref|NP_596690.1|  sequence orphan [Schizosaccharomyces pombe] ...  42.0    0.090
ref|XP_843163.1|  proteophosphoglycan 5 [Leishmania major stra...  42.0    0.090
ref|XP_002174158.1|  predicted protein [Schizosaccharomyces ja...  41.2    0.15
ref|NP_014487.1|  Haze-protective mannoprotein that reduces th...  41.2    0.15
ref|XP_843162.1|  proteophosphoglycan ppg4 [Leishmania major s...  41.2    0.15
ref|ZP_05788400.1|  glycosyltransferase [Synechococcus sp. WH ...  40.8    0.20
ref|YP_001505215.1|  glycosyl transferase group 1 [Frankia sp....  40.8    0.20
gb|ACZ22128.1|  uncharacterized membrane protein, putative vir...  40.0    0.34
ref|ZP_05249303.1|  predicted protein [Francisella philomiragi...  40.0    0.34
ref|ZP_04755636.1|  glycosyl transferase, group 1 [Francisella...  40.0    0.34
ref|XP_002548147.1|  predicted protein [Candida tropicalis MYA...  40.0    0.34
ref|ZP_04207354.1|  Glycosyltransferase [Bacillus cereus Rock4...  40.0    0.34
ref|ZP_04243363.1|  Glycosyltransferase [Bacillus cereus Rock1...  40.0    0.34
gb|EEE27844.1|  DNA mismatch repair protein, putative [Toxopla...  40.0    0.34
ref|YP_595623.1|  hypothetical protein LIC007 [Lawsonia intrac...  40.0    0.34
gb|ACU16297.1|  unknown [Glycine max]                              39.7    0.45
ref|ZP_04225980.1|  Glycosyltransferase [Bacillus cereus Rock3...  39.7    0.45
ref|ZP_04231851.1|  Glycosyltransferase [Bacillus cereus Rock3...  39.7    0.45
ref|ZP_03574134.1|  glycosyl transferase, group 1 [Burkholderi...  39.7    0.45
ref|ZP_02692503.1|  glycosyl transferase, group 1 [Epulopisciu...  39.7    0.45
ref|XP_001478231.1|  PREDICTED: hypothetical protein [Mus musc...  39.7    0.45
ref|XP_002618275.1|  hypothetical protein CLUG_01734 [Clavispo...  39.3    0.58
ref|YP_001677988.1|  glycosyl transferase, group 1 [Francisell...  39.3    0.58
ref|XP_843161.1|  proteophosphoglycan ppg3 [Leishmania major s...  39.3    0.58
ref|ZP_01092016.1|  glycosyl transferase, group 1 [Blastopirel...  39.3    0.58
gb|EEE29804.1|  conserved hypothetical protein [Toxoplasma gon...  38.9    0.76
ref|XP_001935126.1|  predicted protein [Pyrenophora tritici-re...  38.9    0.76
ref|ZP_02736596.1|  Glycosyltransferase-like protein [Gemmata ...  38.9    0.76
ref|XP_843164.1|  proteophosphoglycan ppg1 [Leishmania major s...  38.9    0.76
ref|XP_001965503.1|  GF22529 [Drosophila ananassae] >gb|EDV350...  38.5    0.99
ref|YP_001222343.1|  putative glycosyltransferase [Clavibacter...  38.5    0.99
gb|AAC48525.1|  gastric mucin [Sus scrofa]                         38.5    0.99
ref|NP_014442.1|  Anchorage subunit of a-agglutinin of a-cells...  38.5    0.99
gb|EEE34921.1|  zinc finger (C3HC4 type, RING finger) protein ...  38.1    1.3
ref|XP_002367888.1|  zinc finger (C3HC4 type RING finger) prot...  38.1    1.3
ref|ZP_05048197.1|  glycosyl transferase, group 1 family prote...  38.1    1.3
gb|EDV12223.1|  a-agglutinin anchorage subunit [Saccharomyces ...  38.1    1.3
gb|EDV11809.1|  conserved hypothetical protein [Saccharomyces ...  38.1    1.3
ref|XP_001729652.1|  hypothetical protein MGL_3196 [Malassezia...  38.1    1.3
ref|XP_001161935.1|  PREDICTED: hypothetical protein [Pan trog...  38.1    1.3
ref|YP_343532.1|  glycosyl transferase, group 1 [Nitrosococcus...  38.1    1.3
emb|CAA61860.1|  AOF1001 [Saccharomyces cerevisiae]                38.1    1.3
ref|NP_014050.1|  Putative protein of unknown function with so...  38.1    1.3
ref|NP_786208.1|  extracellular protein [Lactobacillus plantar...  38.1    1.3
ref|ZP_00995807.1|  Lipopolysaccharide biosynthesis protein [J...  38.1    1.3
ref|YP_722826.1|  glycosyl transferase family protein [Trichod...  38.1    1.3
ref|ZP_05542131.1|  glycosyl transferase [Streptomyces griseof...  37.7    1.7
gb|EDK37104.2|  hypothetical protein PGUG_01202 [Pichia guilli...  37.7    1.7
ref|ZP_02907008.1|  protein of unknown function DUF88 [Burkhol...  37.7    1.7
ref|YP_002490316.1|  glycosyl transferase group 1 [Methylobact...  37.7    1.7
ref|ZP_02033435.1|  hypothetical protein PARMER_03460 [Parabac...  37.7    1.7
ref|XP_001487825.1|  hypothetical protein PGUG_01202 [Pichia g...  37.7    1.7
ref|ZP_03691867.1|  glycosyl transferase group 1 [Thioalkalivi...  37.4    2.2
ref|XP_002372019.1|  protein kinase domain-containing protein ...  37.4    2.2
ref|ZP_02931472.1|  glycosyl transferase, group 1 [Verrucomicr...  37.4    2.2
gb|EDN62850.1|  a-agglutinin anchorage subunit [Saccharomyces ...  37.4    2.2
gb|EDL18121.1|  mCG142255, isoform CRA_b [Mus musculus]            37.4    2.2
gb|EDL18120.1|  mCG142255, isoform CRA_a [Mus musculus]            37.4    2.2
ref|YP_002462744.1|  glycosyl transferase group 1 [Chloroflexu...  37.4    2.2
ref|XP_516298.2|  PREDICTED: similar to TPRXL protein [Pan tro...  37.4    2.2
ref|XP_001001905.1|  PREDICTED: similar to secreted gel-formin...  37.4    2.2
sp|Q8TFG9.2|YL61_SCHPO  RecName: Full=Uncharacterized serine/t...  37.4    2.2
ref|XP_002648973.1|  Hypothetical protein CBG21295 [Caenorhabd...  37.4    2.2
ref|YP_003191439.1|  Ig domain protein group 2 domain protein ...  37.0    2.9
ref|ZP_05518933.1|  glycosyl transferase [Streptomyces hygrosc...  37.0    2.9
gb|EEE25262.1|  conserved hypothetical protein [Toxoplasma gon...  37.0    2.9
ref|XP_002366950.1|  zinc finger (C3HC4 type RING finger) prot...  37.0    2.9
ref|XP_001955715.1|  GF18902 [Drosophila ananassae] >gb|EDV442...  37.0    2.9
ref|ZP_02948666.1|  hypothetical protein CBY_0635 [Clostridium...  37.0    2.9
ref|YP_001827228.1|  putative glycosyl transferase [Streptomyc...  37.0    2.9
ref|XP_001371191.1|  PREDICTED: hypothetical protein [Monodelp...  37.0    2.9
ref|XP_001068112.1|  PREDICTED: hypothetical protein [Rattus n...  37.0    2.9
ref|XP_445190.1|  unnamed protein product [Candida glabrata] >...  37.0    2.9
ref|NP_012097.1|  Putative protein of unknown function; serine...  37.0    2.9
ref|YP_740981.1|  glycosyl transferase, group 1 [Alkalilimnico...  37.0    2.9
ref|YP_531223.1|  glycosyl transferase, group 1 [Rhodopseudomo...  37.0    2.9
ref|XP_002498218.1|  ZYRO0G05104p [Zygosaccharomyces rouxii] >...  36.6    3.8
gb|EEE34817.1|  PHD-finger domain-containing protein, putative...  36.6    3.8
gb|EEE24021.1|  zinc finger (C3HC4 type, RING finger) protein ...  36.6    3.8
ref|XP_002074189.1|  GK14511 [Drosophila willistoni] >gb|EDW85...  36.6    3.8
ref|XP_001609001.1|  hypothetical protein [Babesia bovis T2Bo]...  36.6    3.8
ref|XP_416173.2|  PREDICTED: hypothetical protein [Gallus gallus]  36.6    3.8



b) pairwise alignments



>ref|YP_001233756.1| glycosyl transferase, group 1 [Acidiphilium cryptum JF-5]
 gb|ABQ29837.1| glycosyl transferase, group 1 [Acidiphilium cryptum JF-5]
Length=1247

 GENE ID: 5161252 Acry_0615 | glycosyl transferase, group 1
[Acidiphilium cryptum JF-5]

 Score = 47.8 bits (112),  Expect = 0.002
 Identities = 45/154 (29%), Positives = 72/154 (46%), Gaps = 20/154 (12%)
 Frame = +1

Query  208  PEPNHVSVT-DAALNKQ---QLGLSPDSLLLVGTGGLDLAGGFDRFISIASCALADPCEP  375
            P+ N+  V  DAA  ++   +LGL+ D+  ++G G  DL  GFD F+   +  L     P
Sbjct  752  PQGNYQGVRFDAAARRRFRARLGLAVDAFAVLGVGFGDLRKGFDLFLQ--AFRLLAAARP  809

Query  376  TVQFLWVGPLDNSASETTTLVRELHDAEL------GDRVLMTADEGQASDLLRVADVLLM  537
             + F+W+G       ET   +R+   AE+      G   L+  DE  A      AD+  +
Sbjct  810  DIHFVWLG-------ETHLWIRDYLGAEIEAARATGRFHLLPFDEDVAPAYCG-ADLYAL  861

Query  538  THREPSGSTSINEYLAAEKPIVWFRSNPAVERLM  639
            T RE    T++ E +AA  P + F  +  +  L+
Sbjct  862  TSREDPLPTTVIEAMAAGLPAIAFAGSGGIPDLL  895


>ref|ZP_04334866.1|  glycosyltransferase [Nocardiopsis dassonvillei subsp. dassonvillei 
DSM 43111]
 gb|EEK33318.1|  glycosyltransferase [Nocardiopsis dassonvillei subsp. dassonvillei 
DSM 43111]
Length=265

 Score = 47.0 bits (110),  Expect = 0.003
 Identities = 51/176 (28%), Positives = 67/176 (38%), Gaps = 10/176 (5%)
 Frame = +1

Query  187  RLASVSSPEPNHVSVTDAALNKQQLGLSPDSLLLVGTGGLDLAGGFDRFISIASCALADP  366
            RLA V++PE     V      +  L + P+  LL+    L    G D  ++ A       
Sbjct  62   RLAVVAAPETG-APVNGREATRADLAVLPERPLLLTVARLAEQKGLDMLLAAAPAIADRR  120

Query  367  CEPTVQFLWVGPLDNSASETTTLVRELHDAELGDRVLMTADEGQASDLLRVADVLLMTHR  546
             EP V     GPL     +T         AE+   V M       +DLL  ADV  +T +
Sbjct  121  PEPVVAIAGDGPLWGQLHDTA--------AEMRADVRMLGHRADVADLLAAADVFCLTSQ  172

Query  547  EPSGSTSINEYLAAEKPIVWFRSNPAVERLMSGDPCGVEPGRVLQAADQVKCLLSD  714
                S  I E L A  P+V  R    +  L SG    V PG     A  V  +L D
Sbjct  173  WEGPSLVIMEALRAGLPVVSTRVG-GIPDLYSGTVLMVPPGDPAAFAAAVGRVLDD  227


>ref|ZP_02042715.1|  hypothetical protein RUMGNA_03519 [Ruminococcus gnavus ATCC 29149]
 gb|EDN75896.1|  hypothetical protein RUMGNA_03519 [Ruminococcus gnavus ATCC 29149]
Length=901

 Score = 47.0 bits (110),  Expect = 0.003
 Identities = 34/138 (24%), Positives = 64/138 (46%), Gaps = 4/138 (2%)
 Frame = +1

Query  235  DAALNKQQLGLSPDSLLLVGTGGLDLAGGFDRFISIASCALADPCEPTVQFLWVGPLDNS  414
            D+  + ++ G S D  +++ +G  +L  G D F+S A           V F+W G   + 
Sbjct  333  DSRFSFEEYGFSLDQKIVMCSGTCELRKGTDLFVSAAQILSRKEKTEEVHFVWTGKFSDP  392

Query  415  ASETTTLVRELHDAELGDRV---LMTADEGQASDLLRVADVLLMTHREPSGSTSINEYLA  585
              E   +  ++  + L  +V       D+ +  +LLR ADV  +T RE    + + E + 
Sbjct  393  ILE-GWIYNQIRQSNLEGKVHFIPFIKDKQKYHELLRHADVFWLTSREDPFPSVVLEAMK  451

Query  586  AEKPIVWFRSNPAVERLM  639
             E P+V F+++  V  ++
Sbjct  452  YEIPVVAFKNSGGVNTML  469


>ref|ZP_01892940.1|  Glycosyltransferase [Marinobacter algicola DG893]
 gb|EDM48984.1|  Glycosyltransferase [Marinobacter algicola DG893]
Length=368

 Score = 47.0 bits (110),  Expect = 0.003
 Identities = 44/168 (26%), Positives = 78/168 (46%), Gaps = 10/168 (5%)
 Frame = +1

Query  211  EPNHVSVTDAALNKQQLGLSPDSLLLVGTGGLDLAGGFDRFISIASCALADPCEPTVQFL  390
            +PN      A+  K++LGL P + ++V  G +  A G++  ++ A+  +A   +P V F+
Sbjct  167  DPNLFCEAPASGLKEELGLPPAATVVVSIGNIRPAKGYEHLVN-AAIKMA-RLDPGVHFV  224

Query  391  WVGPLDNSASETTTLVRELHDAELGDRVLMTADEGQASDLLRVADVLLMTHREPSGSTSI  570
             VG     AS    L  ++  AE    +         +D+LR AD+ L+       S S 
Sbjct  225  VVG--HQRASLFKQLETQIARAEEPPNIHWLGFRADVADILRQADIFLLPSVSEGFSIST  282

Query  571  NEYLAAEKPIVWFRSNPAVERLMSGDPCGV-----EPGRVLQAADQVK  699
             E + A  PI+  RS    E ++S    G+     +P  ++ A +++K
Sbjct  283  VEAMMAGVPIIATRSG-GPEEILSDGETGLLIPTKDPDAIVSAVERLK  329


>ref|NP_495358.1| hypothetical protein H43E16.1 [Caenorhabditis elegans]
 gb|AAF39909.1| Hypothetical protein H43E16.1 [Caenorhabditis elegans]
Length=1203

 GENE ID: 174101 H43E16.1 | hypothetical protein [Caenorhabditis elegans]

 Score = 47.0 bits (110),  Expect = 0.003
 Identities = 41/151 (27%), Positives = 70/151 (46%), Gaps = 10/151 (6%)
 Frame = -2

Query  500  ACPSSAVIKTRSPSSASCNSLTSVVVSDAELSSGPTHKNWTVGSQGSASAQLAIEMNRSK  321
            A  SS +  T+ PS +S +  ++V    + +++ P   + +  SQGS+SAQ     +   
Sbjct  689  ASSSSQMTSTQQPSGSSSSIGSTVNQGSSSVTTQPPASSRSTASQGSSSAQPIASSSTMG  748

Query  320  PPARSSPPVPTSSKLSGESPSCCLFSAASVTETWFGSGLETLASRSM---------VSNS  168
              A SS P PT+S  S  S +    S ++V  +  GS   +L S +M         V+N 
Sbjct  749  STAGSSSPQPTASSTSVPSSTGATSSGSTVGSSTMGSTQSSLPSSTMTNTGSTGSTVTNQ  808

Query  167  VSTK*NFASDVVRLLGNR*ANFAGSETV*QT  75
            +++   + +     + +  AN  GS T  QT
Sbjct  809  LASSSTYGASSTEPIASSTAN-PGSSTSGQT  838


>ref|ZP_03893052.1|  glycosyltransferase [Geodermatophilus obscurus DSM 43160]
 gb|EEI30591.1|  glycosyltransferase [Geodermatophilus obscurus DSM 43160]
Length=375

 Score = 45.4 bits (106),  Expect = 0.008
 Identities = 49/177 (27%), Positives = 75/177 (42%), Gaps = 12/177 (6%)
 Frame = +1

Query  187  RLASVSSPEPNHVSVTDAALNKQQLGLSPDSLLLVGTGGLDLAGGFDRFISIASCALADP  366
            R+A VS+P P   +   AA  + +LGL+    L++  G L    G+D  +  A+      
Sbjct  170  RVAPVSAP-PLPAAARTAAEVRAELGLADGRPLVLAVGRLHPQKGYDVLLDAAARWAGSS  228

Query  367  CEPTVQFLWVGPLDNSASETTTLVRELHDAELGDRVLMTADEGQASDLLRVADVLLMTHR  546
              P V     GPL +  +      R          V++       +DLL  AD+ ++  R
Sbjct  229  PPPLVAVAGDGPLQDELAARIAAERL--------PVVLLGRRSDVADLLAAADLAVLPSR  280

Query  547  EPSGSTSINEYLAAEKPIVWFRSNPAVERLMSGDPCGVEP-GRVLQAADQVKCLLSD  714
              + S +  E L A  P+V  R+    E L  GD   + P G  +  AD V  LL+D
Sbjct  281  WEARSLTAQEALRAGTPLVATRTGGLPELL--GDGAQLVPVGDPVALADSVTGLLAD  335
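
The percentages on the "Identities / Positives / Gaps" lines above can be re-derived from the raw counts. The sketch below is a minimal check, assuming that the reported percentage is the count divided by the alignment length and truncated to an integer. That truncation rule is an assumption inferred from the values shown here (72/154 is 46.75%, reported as 46%), not something stated in the output itself. The function name `truncated_percent` is illustrative.

```python
# Counts copied from the first pairwise alignment above
# (Acidiphilium cryptum JF-5):
# Identities = 45/154 (29%), Positives = 72/154 (46%), Gaps = 20/154 (12%)
ALIGNMENT_LENGTH = 154
counts = {"identities": 45, "positives": 72, "gaps": 20}


def truncated_percent(count, length):
    """Percentage of `length` covered by `count`, truncated to an integer.

    Truncation (rather than rounding) is an assumption inferred from the
    reported values, not a documented rule.
    """
    return (100 * count) // length


percentages = {
    name: truncated_percent(n, ALIGNMENT_LENGTH) for name, n in counts.items()
}
print(percentages)  # {'identities': 29, 'positives': 46, 'gaps': 12}
```

The same rule reproduces the second alignment (51/176 gives 28%, 67/176 gives 38%, 10/176 gives 5%).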


______________________________________________________________________________________________________________________
4) blastx against the swissprot database:
______________________________________________________________________________________________________________________
      
a) complete list of hits

                                                                  Score     E
Sequences producing significant alignments:                       (Bits)  Value

sp|O94317.1|YH5D_SCHPO  RecName: Full=Uncharacterized serine-r...  42.0    0.005
sp|Q05164.2|HPF1_YEAST  RecName: Full=Haze protective factor 1...  41.2    0.009
sp|P32323.1|AGA1_YEAST  RecName: Full=A-agglutinin anchorage s...  38.5    0.059
sp|Q04893.1|YM96_YEAST  RecName: Full=Uncharacterized protein ...  38.1    0.077
sp|Q8TFG9.2|YL61_SCHPO  RecName: Full=Uncharacterized serine/t...  37.4    0.13
sp|P40442.1|YIQ9_YEAST  RecName: Full=Putative uncharacterized...  37.0    0.17
sp|Q9UBL0.2|ARP21_HUMAN  RecName: Full=cAMP-regulated phosphop...  36.2    0.29
sp|P14328.2|SP96_DICDI  RecName: Full=Spore coat protein SP96      35.8    0.38
sp|P58188.1|LPXK_RICTY  RecName: Full=Tetraacyldisaccharide 4'...  34.7    0.85
sp|P98088.3|MUC5A_HUMAN  RecName: Full=Mucin-5AC; AltName: Ful...  34.3    1.1
sp|Q74EG5.3|FPG_GEOSL  RecName: Full=Formamidopyrimidine-DNA g...  34.3    1.1
sp|Q8TGE1.1|AWA1_YEAST  RecName: Full=Cell wall protein AWA1; ...  34.3    1.1
sp|Q54TA5.1|TBC5B_DICDI  RecName: Full=TBC1 domain family memb...  34.3    1.1
sp|Q17RH7.2|TPRXL_HUMAN  RecName: Full=Putative protein TPRXL      33.9    1.4
sp|O74346.1|MAP4_SCHPO  RecName: Full=P cell-type agglutinatio...  32.7    3.2
sp|Q4ZHG4.3|FNDC1_HUMAN  RecName: Full=Fibronectin type III do...  32.0    5.5
sp|Q4L9P0.1|SRAP_STAHJ  RecName: Full=Serine-rich adhesin for ...  32.0    5.5
sp|Q5FUR3.1|CLPP1_GLUOX  RecName: Full=ATP-dependent Clp prote...  32.0    5.5
sp|Q700K0.1|SSPO_RAT  RecName: Full=SCO-spondin; Flags: Precursor  32.0    5.5
sp|Q9SQI2.2|GIGAN_ARATH  RecName: Full=Protein GIGANTEA            32.0    5.5
sp|Q0ACQ9.1|CH60_ALHEH  RecName: Full=60 kDa chaperonin; AltNa...  32.0    5.5
sp|Q54S20.1|MED13_DICDI  RecName: Full=Mediator of RNA polymer...  32.0    5.5
sp|Q8JHV9.1|BIR7A_XENLA  RecName: Full=Baculoviral IAP repeat-...  31.6    7.2
sp|P40472.1|SIM1_YEAST  RecName: Full=Protein SIM1; Flags: Pre...  31.6    7.2
sp|Q9EPL6.1|MMP1B_MOUSE  RecName: Full=Interstitial collagenas...  31.6    7.2
sp|Q54LU8.1|Y8646_DICDI  RecName: Full=Probable serine/threoni...  31.6    7.2
sp|A2WPU3.2|AMYC1_ORYSI  RecName: Full=Alpha-amylase isozyme C...  31.2    9.4
sp|A2VEC9.1|SSPO_HUMAN  RecName: Full=SCO-spondin; Flags: Prec...  31.2    9.4
sp|A8LL57.1|ISPE_DINSH  RecName: Full=4-diphosphocytidyl-2-C-m...  31.2    9.4
sp|Q9Y597.2|KCTD3_HUMAN  RecName: Full=BTB/POZ domain-containi...  31.2    9.4
sp|Q569M3.1|CENPA_XENLA  RecName: Full=Histone H3-like centrom...  31.2    9.4
sp|Q0JMV4.1|AMYC1_ORYSJ  RecName: Full=Alpha-amylase isozyme C...  31.2    9.4
sp|Q6Z358.1|C3H49_ORYSJ  RecName: Full=Zinc finger CCCH domain...  31.2    9.4
sp|P03173.1|VGLC_HHV2G  RecName: Full=Glycoprotein C; AltName:...  31.2    9.4
sp|P10035.2|HMH2_DROME  RecName: Full=Homeobox protein H2.0        31.2    9.4
sp|Q9ZCL0.1|LPXK_RICPR  RecName: Full=Tetraacyldisaccharide 4'...  31.2    9.4
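
A hit list like the one above can be screened by E-value with a few lines of code. The sketch below is only illustrative: it assumes each hit line ends with a bit score followed by an E-value, and the helper name `parse_hit_line` and the 0.01 cut-off are hypothetical choices, not part of the original annotation. Only three lines of the list are used as sample input.

```python
# Three lines copied from the swissprot hit list above.
SAMPLE_HITS = """\
sp|O94317.1|YH5D_SCHPO  RecName: Full=Uncharacterized serine-r...  42.0    0.005
sp|Q05164.2|HPF1_YEAST  RecName: Full=Haze protective factor 1...  41.2    0.009 Gene info
sp|P32323.1|AGA1_YEAST  RecName: Full=A-agglutinin anchorage s...  38.5    0.059
"""


def parse_hit_line(line):
    """Return (accession, bit_score, e_value), or None if the line is not a hit."""
    # Drop link labels left over from the NCBI results page, if present.
    # "UniGene info" must go first because it contains "Gene info".
    for label in ("UniGene info", "Gene info"):
        line = line.replace(label, "")
    tokens = line.split()
    if len(tokens) < 3:
        return None
    try:
        # Assumption: the last two columns are the bit score and the E-value.
        return tokens[0], float(tokens[-2]), float(tokens[-1])
    except ValueError:
        return None  # header or separator line


E_VALUE_CUTOFF = 0.01  # hypothetical threshold chosen for illustration

hits = [h for h in map(parse_hit_line, SAMPLE_HITS.splitlines()) if h]
kept = [h for h in hits if h[2] <= E_VALUE_CUTOFF]
for accession, score, e_value in kept:
    print(accession, score, e_value)
```

Splitting on whitespace from the right avoids having to parse the truncated description column, which may itself contain digits.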



b) pairwise alignments


>sp|O94317.1|YH5D_SCHPO RecName: Full=Uncharacterized serine-rich protein C215.13; Flags: 
Precursor
Length=534

 GENE ID: 2540669 SPBC215.13 | sequence orphan [Schizosaccharomyces pombe]
(10 or fewer PubMed links)

 Score = 42.0 bits (97),  Expect = 0.005
 Identities = 36/123 (29%), Positives = 62/123 (50%), Gaps = 7/123 (5%)
 Frame = -2

Query  491  SSAVIKTRSPSSASCNSLTSVVVSDAELSSGPTHKNWTVGSQGSASAQLAIEMNR-----  327
            SS V  + SPSS+S ++LTS  +S + + S  +  + T  S  S+S+      +      
Sbjct  213  SSVVSSSSSPSSSSSSTLTSSSLSTSSIPSTSSSSSSTSSSLSSSSSSSTASSSSSSSSI  272

Query  326  -SKPPARSSPPVPTSSKLSGESPSCCLFSAASVTETWFGSGLETLASRSMVSNSVSTK*N  150
             S   + SS P  TSS +S  S S    ++ S T +   S   + +S ++ S+S+S+  +
Sbjct  273  ISSSSSSSSSPTSTSSTISSSSSSSSSPTSTSSTISSSSSSSSSFSS-TLSSSSMSSSSS  331

Query  149  FAS  141
            F+S
Sbjct  332  FSS  334


______________________________________________________________________________________________________________________
                                       5) blastp against the "env_nr" database
______________________________________________________________________________________________________________________
 
a) complete list of hits

                                                                  Score     E
Sequences producing significant alignments:                       (Bits)  Value

gb|EDJ40535.1|  hypothetical protein GOS_1701247 [marine metag...  47.8    3e-04
gb|EBV08418.1|  hypothetical protein GOS_6958884 [marine metag...  47.4    4e-04
gb|EDJ21520.1|  hypothetical protein GOS_1735167 [marine metag...  47.0    5e-04
gb|ECY60820.1|  hypothetical protein GOS_2330446 [marine metag...  44.3    0.004
gb|ECJ09441.1|  hypothetical protein GOS_5681052 [marine metag...  42.0    0.018
gb|EBU08120.1|  hypothetical protein GOS_7169411 [marine metag...  40.4    0.057
gb|ECX50005.1|  hypothetical protein GOS_2527321 [marine metag...  40.4    0.061
gb|EBE72341.1|  hypothetical protein GOS_9715930 [marine metag...  40.0    0.063
gb|ECB13204.1|  hypothetical protein GOS_5580058 [marine metag...  39.7    0.098
gb|EDE20847.1|  hypothetical protein GOS_1170778 [marine metag...  38.5    0.18 
gb|EBV66546.1|  hypothetical protein GOS_6869516 [marine metag...  38.5    0.21 
gb|ECL29089.1|  hypothetical protein GOS_3978974 [marine metag...  38.5    0.23 
gb|EBA94587.1|  hypothetical protein GOS_314838 [marine metage...  38.5    0.23 
gb|EDJ05125.1|  hypothetical protein GOS_1763384 [marine metag...  38.1    0.30 
gb|ECZ72804.1|  hypothetical protein GOS_2131654 [marine metag...  37.4    0.46 
gb|EBJ27822.1|  hypothetical protein GOS_8950960 [marine metag...  36.6    0.84 
gb|EBO67416.1|  hypothetical protein GOS_8040183 [marine metag...  36.2    0.92 
gb|EBL89679.1|  hypothetical protein GOS_8496424 [marine metag...  36.2    1.0  
gb|ECX44944.1|  hypothetical protein GOS_2536220 [marine metag...  36.2    1.1  
gb|EBD72952.1|  hypothetical protein GOS_9881768 [marine metag...  35.8    1.5  
gb|ECS91518.1|  hypothetical protein GOS_8932329 [marine metag...  35.4    1.7  
gb|EBZ04160.1|  hypothetical protein GOS_3447734 [marine metag...  35.4    1.8  
gb|ECO06356.1|  hypothetical protein GOS_3471318 [marine metag...  35.0    2.1  
gb|ECZ01619.1|  hypothetical protein GOS_2256350 [marine metag...  35.0    2.1  
gb|ECQ42432.1|  hypothetical protein GOS_5713727 [marine metag...  35.0    2.2  
gb|EDG44476.1|  hypothetical protein GOS_780794 [marine metage...  35.0    2.3  
gb|EBN14509.1|  hypothetical protein GOS_8294558 [marine metag...  35.0    2.3  
gb|ECH31853.1|  hypothetical protein GOS_5739219 [marine metag...  35.0    2.4  
gb|EBI08196.1|  hypothetical protein GOS_9152602 [marine metag...  35.0    2.4  
gb|EDD19728.1|  hypothetical protein GOS_1342414 [marine metag...  35.0    2.6  
gb|EBL14977.1|  hypothetical protein GOS_8615611 [marine metag...  35.0    2.6  
gb|ECO39520.1|  hypothetical protein GOS_5624697 [marine metag...  35.0    2.6  
gb|ECL74184.1|  hypothetical protein GOS_5676354 [marine metag...  34.7    2.8  
gb|EDI47895.1|  hypothetical protein GOS_425797 [marine metage...  34.7    2.8  
gb|EDD59641.1|  hypothetical protein GOS_1277409 [marine metag...  34.7    2.9  
gb|ECS93926.1|  hypothetical protein GOS_8928364 [marine metag...  34.7    3.0  
gb|EBN23182.1|  hypothetical protein GOS_8280386 [marine metag...  34.7    3.0  
gb|EBI24997.1|  hypothetical protein GOS_9124339 [marine metag...  34.7    3.0  
gb|EBM90446.1|  hypothetical protein GOS_8333749 [marine metag...  34.7    3.0  
gb|EDE79765.1|  hypothetical protein GOS_1068140 [marine metag...  34.7    3.1  
gb|EBV55734.1|  hypothetical protein GOS_6886842 [marine metag...  34.7    3.1  
gb|ECM77121.1|  hypothetical protein GOS_5080530 [marine metag...  34.7    3.1  
gb|EBN29639.1|  hypothetical protein GOS_8269885 [marine metag...  34.7    3.2  
gb|ECH92179.1|  hypothetical protein GOS_3353504 [marine metag...  34.7    3.2  
gb|EDG48816.1|  hypothetical protein GOS_773173 [marine metage...  34.7    3.3  
gb|ECQ29373.1|  hypothetical protein GOS_6230205 [marine metag...  34.7    3.3  
gb|EDH62935.1|  hypothetical protein GOS_570813 [marine metage...  34.7    3.3  
gb|ECO48815.1|  hypothetical protein GOS_5233834 [marine metag...  34.7    3.4  
gb|ECS93886.1|  hypothetical protein GOS_8928418 [marine metag...  34.7    3.5  
gb|EDA30341.1|  hypothetical protein GOS_2026200 [marine metag...  34.3    3.5  
gb|ECF07760.1|  hypothetical protein GOS_4161494 [marine metag...  34.3    3.6  
gb|ECK45717.1|  hypothetical protein GOS_3783849 [marine metag...  34.3    3.6  
gb|EBP03457.1|  hypothetical protein GOS_7978536 [marine metag...  34.3    3.6  
gb|EBU62722.1|  hypothetical protein GOS_7031138 [marine metag...  34.3    3.7  
gb|EDG06126.1|  hypothetical protein GOS_847253 [marine metage...  34.3    3.7  
gb|ECJ61446.1|  hypothetical protein GOS_3641965 [marine metag...  34.3    3.8  
gb|ECA46306.1|  hypothetical protein GOS_4758410 [marine metag...  34.3    3.9  
gb|EBA94061.1|  hypothetical protein GOS_315706 [marine metage...  34.3    3.9  
gb|ECN19099.1|  hypothetical protein GOS_3417676 [marine metag...  34.3    3.9  
gb|EBM79610.1|  hypothetical protein GOS_8351846 [marine metag...  34.3    3.9  
gb|EDA63871.1|  hypothetical protein GOS_1964896 [marine metag...  34.3    4.0  
gb|ECR17013.1|  hypothetical protein GOS_6255104 [marine metag...  34.3    4.1  
gb|ECA32807.1|  hypothetical protein GOS_5283709 [marine metag...  34.3    4.1  
gb|ECA58939.1|  hypothetical protein GOS_4253312 [marine metag...  34.3    4.1  
gb|EBV15964.1|  hypothetical protein GOS_6947085 [marine metag...  34.3    4.1  
gb|ECL34697.1|  hypothetical protein GOS_3762427 [marine metag...  34.3    4.2  
gb|EDG19173.1|  hypothetical protein GOS_824428 [marine metage...  34.3    4.2  
gb|EBG70792.1|  hypothetical protein GOS_9387673 [marine metag...  34.3    4.2  
gb|ECW50480.1|  hypothetical protein GOS_2709508 [marine metag...  34.3    4.3  
gb|EDD25300.1|  hypothetical protein GOS_1332888 [marine metag...  34.3    4.3  
gb|EBO50235.1|  hypothetical protein GOS_8069502 [marine metag...  34.3    4.3  
gb|ECA06277.1|  hypothetical protein GOS_6373056 [marine metag...  34.3    4.4  
gb|EBP20659.1|  hypothetical protein GOS_7948975 [marine metag...  34.3    4.4  
gb|EDC21480.1|  hypothetical protein GOS_1515846 [marine metag...  34.3    4.5  
gb|EBC38457.1|  hypothetical protein GOS_78649 [marine metagen...  34.3    4.5  
gb|ECM70718.1|  hypothetical protein GOS_5340113 [marine metag...  34.3    4.5  
gb|EBL74735.1|  hypothetical protein GOS_8520701 [marine metag...  33.9    4.6  
gb|EBJ60780.1|  hypothetical protein GOS_8870028 [marine metag...  33.9    4.6  
gb|EBJ38546.1|  hypothetical protein GOS_8907144 [marine metag...  33.9    4.7  
gb|ECG65837.1|  hypothetical protein GOS_4870410 [marine metag...  33.9    4.8  
gb|EBO25160.1|  hypothetical protein GOS_8111482 [marine metag...  33.9    4.8  
gb|ECX17008.1|  hypothetical protein GOS_2586568 [marine metag...  33.9    5.1  
gb|EBV77669.1|  hypothetical protein GOS_6852464 [marine metag...  33.9    5.2  
gb|EDH65548.1|  hypothetical protein GOS_565904 [marine metage...  33.9    5.3  
gb|ECC72461.1|  hypothetical protein GOS_6283700 [marine metag...  33.9    5.3  
gb|ECR37166.1|  hypothetical protein GOS_5445902 [marine metag...  33.9    5.6  
gb|EBF39944.1|  hypothetical protein GOS_9604041 [marine metag...  33.9    5.7  
gb|EBY95925.1|  hypothetical protein GOS_3762253 [marine metag...  33.9    5.7  
gb|EBY78224.1|  hypothetical protein GOS_4452567 [marine metag...  33.9    5.7  
gb|EBF05996.1|  hypothetical protein GOS_9659297 [marine metag...  33.5    6.0  
gb|EBK08090.1|  hypothetical protein GOS_8791689 [marine metag...  33.5    6.2  
gb|ECP73707.1|  hypothetical protein GOS_4897231 [marine metag...  33.5    6.2  
gb|EDH45672.1|  hypothetical protein GOS_602401 [marine metage...  33.5    6.3  
gb|ECN59035.1|  hypothetical protein GOS_5307759 [marine metag...  33.5    6.3  
gb|EBD87573.1|  hypothetical protein GOS_9858059 [marine metag...  33.5    6.3  
gb|ECO75524.1|  hypothetical protein GOS_4185795 [marine metag...  33.5    6.5  
gb|ECJ19122.1|  hypothetical protein GOS_5288444 [marine metag...  33.5    6.5  
gb|EBX11311.1|  hypothetical protein GOS_6638624 [marine metag...  33.5    6.5  
gb|ECK98200.1|  hypothetical protein GOS_5178716 [marine metag...  33.5    6.6  
gb|EBK98292.1|  hypothetical protein GOS_8643216 [marine metag...  33.5    6.6  



b) pairwise alignments



>gb|EDJ40535.1|  hypothetical protein GOS_1701247 [marine metagenome]
Length=439

 Score = 47.8 bits (112),  Expect = 3e-04, Method: Compositional matrix adjust.
 Identities = 49/148 (33%), Positives = 68/148 (45%), Gaps = 13/148 (8%)

Query  72   PNHVSV-----TDAALNKQQLGLSPDSLLLVGTGGLDLAGGFDRFISIASCALADPCEPT  126
            PN V +      D A  +Q+LGL PD L+LV  GGL    GF R I +   AL +   P 
Sbjct  229  PNGVDIEKFRPIDQATARQKLGLEPDDLVLVSVGGLVEGKGFHRVIEVLP-ALRERF-PA  286

Query  127  VQFLWVG---PLDNSASETTTLVRELHDAELGDRVLMTADEGQASDLLRVADVLLMTHRE  183
            + FL VG   P D+  +   + V EL  A+    +   A E Q    L  AD+ ++  R 
Sbjct  287  LNFLIVGKGWPWDDWEARLRSQVIELGLADCVKFLGPVAPE-QLHVPLSAADLFVLATRS  345

Query  184  PSGSTSINEYLAAEKPIVWFR--SNPAV  209
               +  + E +A   P+V  R   NP V
Sbjct  346  EGWANVLLEAMACGLPVVTTRVGGNPEV  373


>gb|EBV08418.1|  hypothetical protein GOS_6958884 [marine metagenome]
Length=232

 Score = 47.4 bits (111),  Expect = 4e-04, Method: Compositional matrix adjust.
 Identities = 42/160 (26%), Positives = 72/160 (45%), Gaps = 3/160 (1%)

Query  91   PDSLLLVGTGGLDLA-GGFDRFISIASCALADPCEPTVQFLWVGPLDNSASETTTLVREL  149
            P S  +VGT G  +   G D F++          E  + F+W G   N+AS     V EL
Sbjct  48   PISSFIVGTCGTPIWRKGPDIFLNTVKKIATKYSEENIFFIWFGGDQNTASFLDFNV-EL  106

Query  150  HDAELGDRVLMTADEGQASDLLRVADVLLMTHREPSGSTSINEYLAAEKPIVWFRSNPAV  209
             + EL + V +     +     +  D+L  T RE     ++ E   ++ P + F  +   
Sbjct  107  ENLELKNFVKVFPSSKELKFFYKSLDMLFCTSREEPFGLTLFEAGLSKIPCLAFEKSGGP  166

Query  210  ERLMSGDPCGVEP-GRVLQAADQVKCLLSDQDYRRRLSNA  248
            E ++S +   + P G   +AAD++  + +D+  R + SNA
Sbjct  167  EEILSDNKGIIIPYGDFSKAADEIVKIKNDKIIREKYSNA  206


>gb|EDJ21520.1|  hypothetical protein GOS_1735167 [marine metagenome]
Length=695

 Score = 47.0 bits (110),  Expect = 5e-04, Method: Compositional matrix adjust.
 Identities = 45/164 (27%), Positives = 66/164 (40%), Gaps = 5/164 (3%)

Query  85   QQLGLSPDSLLLVGTGGLDLAGGFDRFISIASCALADPCEPTVQFLWVGPLDNSASETTT  144
            + LG++ D+ +++G G  D   G D F  +    L     P  + +WVG LD S +E   
Sbjct  224  ETLGIAADAEVVLGVGYGDRRKGIDLFADMGCAVLQS--RPQTRCVWVGDLDASVAEDVR  281

Query  145  LVRELHDAELGDRVLMTADEGQASDLLRVADVLLMTHREPSGSTSINEYLAAEKPIVWFR  204
            L   +  A L  R      + + +     ADV  +T RE    + + E L A  P+V F 
Sbjct  282  LF--IAQAGLASRFHFLGWQNETALYYAGADVYALTSREDPFPSVVLEALHAGLPVVAFD  339

Query  205  SNPAVERLMSGDPCGVEPGRVLQA-ADQVKCLLSDQDYRRRLSN  247
                  RL       + P    +A    V  LL D   R RL  
Sbjct  340  GGGGYVRLAESGCVRLVPAEDAKAFGAAVVALLGDSPSRERLGQ  383


>gb|ECY60820.1|  hypothetical protein GOS_2330446 [marine metagenome]
Length=344

 Score = 44.3 bits (103),  Expect = 0.004, Method: Compositional matrix adjust.
 Identities = 51/177 (28%), Positives = 78/177 (44%), Gaps = 17/177 (9%)

Query  95   LLVGTGGLDLAGGFDRFISIASCALADPCEPTVQFLWVGPLDNSASETTTLVRELHDAEL  154
            +L+  G L    GFD  +     ALA    P V FLW   L     +   L+R   + ++
Sbjct  173  ILLALGRLHENKGFDVLLE----ALA--AVPDV-FLW---LAGDGPQKQNLIRIADNLKI  222

Query  155  GDRVLMTADEGQASDLLRVADVLLMTHR-EPSGSTSINEYLAAEKPIVWFRSNPAVERLM  213
             DRV         S+LL  AD L+   R EP G+  +  ++   KP++   S+   E + 
Sbjct  223  SDRVRFLGWRSDTSNLLAAADALICPSRHEPLGNVILESWVQG-KPVIAAASDGPKELIT  281

Query  214  SGDP---CGVEPGRVLQAADQVKCLLSDQDYRRRLSNANRKRRDRLVSVDELVYQAS  267
            SG     C ++    L  AD +K LL D +    L++  RK  +R  S   +V Q +
Sbjct  282  SGKNGFLCEIDNPTNL--ADTIKVLLGDANLAEELASEGRKTYERNFSKKIIVKQYT  336