GOS 1325040

From Metagenes
Warning: this metagenomic sequence has been carefully annotated by students during bioinformatics assignments. These quality annotations are therefore the result of a teaching exercise that you are most welcome to amend and extend if necessary!


Sequence
CAMERA AccNum : JCVI_READ_1092343626978
Annotathon code: GOS_1325040
Sample :
  • GPS :15°16'40s; 148°13'28w
  • Polynesia Archipelagos: Tikehau Lagoon - Fr. Polynesia
  • Coral Atoll (-1.2m, 27.8°C, 0.1-0.8 microns)
Authors
Team : Algarve
Username : malufa
Annotated on : 2010-07-04 23:58:29
  • FábioAlexandreBastos a36796@ualg.pt
  • LuísAndréRoqueFortes a36803@ualg.pt
  • MariaGonçalvesFernandes a36805@ualg.pt

Synopsis

Genomic Sequence

>JCVI_READ_1092343626978 GOS_1325040 Genomic DNA
TCGAGAGCAAAGTGAAAAGCAAAGAATTCTTTATCCATATTTAATATCAATAGGAAGGCAGACCTCATCGGATATGGTTGAGCAGGACTCCATCTCTCAC
AGTGCAGATACAAGAAGAATTTTAAGTCAAACAGTAGGAATCCAAGAACAATTGTATTTGAAAATGCGTCCTGAGAATAGCAATGACTCAGGGAGTGAAG
TTGTTAAGCATGAGCGTGTAAACCGTTCTGATTCGTTCCATCGCGATCAGTTATTTTTACAAGCATTGCGTCAAGGCGTATTAATTCAACAATTAGCTGC
AACGGCATTAAACCTTTTAGATGGCTTTGGTACAAATAACGTCAATTTGAATACGGTTGGGAGAGACCCAGCTAATTTGAATGCTCAGTCAATTTATAAT
CGTTTAATTGATCCTGATAATGGTCGATTAGCACAACTTTCAGAAGCTTTGTCTGTTTTGTCTAATAATACGGGTATTGATGAAGAAACAGCTATACAAA
TTAACGAATTATTGAATGTTGAAGTGGGTCAATCAGGATTGACTTTTAAGCAGTGGATTAGTGGCGATCCTGCTGTGTCTGAATATATAGAAAATAACAG
ATCCAACCCAGAAGCTGTTCTCAATGATTTGCGTGAAGTTCTGGTCTTAGCTCATCAATCAGCAAGTGAAATAGCTAATGACCTGTATACCCAGCTTGCC
AATCATGTCATTCTTTTAGAAGAAAATGATTTGGTTGAAAGTGGTGTAGAGGCGGCTGCAGAGAGTGCCGAAACATCAGAATCAGAAGTGATGCATTATT
TGAATCATCAGAACAGCAAGTTCAATTCAACACTAAACCCTCAGCATTCAGTGTAGATTCATGTTTTGTCGAAGATGACTCTTTTGATGAAAGTGATGAT
TCGGCATCTCTTGTCGATCAGCAATCGT

Translation

[2 - 853/928]   direct strand
>GOS_1325040 Translation [2-853   direct strand]
REQSEKQRILYPYLISIGRQTSSDMVEQDSISHSADTRRILSQTVGIQEQLYLKMRPENSNDSGSEVVKHERVNRSDSFHRDQLFLQALRQGVLIQQLAA
TALNLLDGFGTNNVNLNTVGRDPANLNAQSIYNRLIDPDNGRLAQLSEALSVLSNNTGIDEETAIQINELLNVEVGQSGLTFKQWISGDPAVSEYIENNR
SNPEAVLNDLREVLVLAHQSASEIANDLYTQLANHVILLEENDLVESGVEAAAESAETSESEVMHYLNHQNSKFNSTLNPQHSV

[ Warning ] 5' incomplete: does not start with a Methionine

Annotator commentaries

The ORF is well over 60 aa in length (284 aa in total). Because it is so long, it is likely that this ORF does code for a protein. However, the e-values are non-significant across all the BLAST searches (E = 0.048 on tBLASTx), so no credible homologous sequences were found and this ORF is an ORFan. In conclusion, the ORF is considered coding.


Because of the non-significant e-values, no conclusion can be drawn about the taxonomy, molecular function or biological process.


ORF finding

PROTOCOL


a) forward strand

Sequence Manipulation Suite (SMS) / ORF Finder / 'any codon' initiation / frames 1, 2 and 3 / forward strand / minimum length 60 codons / 'standard (1)' genetic code


b) reverse strand

Sequence Manipulation Suite (SMS) / ORF Finder / 'any codon' initiation / frames 1, 2 and 3 / reverse strand / minimum length 60 codons / 'standard (1)' genetic code



RESULTS ANALYSIS


The SMS search found three open reading frames (two on the forward strand and one on the reverse strand). The longest one, ORF number 1 in reading frame 2 on the forward strand, was selected for all subsequent analyses. This ORF extends from base 2 to base 856, including the TAG stop codon; the translated region therefore covers bases 2 to 853 (284 codons).
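The ORF search described in the protocol can be sketched in a few lines of Python. This is only an illustration of the 'any codon' idea (every stretch between two stop codons, or between a stop codon and the end of the read, counts as an ORF if it is long enough), not the SMS implementation. The function names `find_orfs` and `reverse_complement` are hypothetical, and the assumption that the minimum length is counted without the stop codon is ours.

```python
# Sketch of an 'any codon' ORF search with the standard genetic code.
# Not the SMS code: names and the exact length rule are assumptions.

BASES = "TCAG"
# Standard genetic code (table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq, min_codons=60):
    """Return (frame, first_base, last_base, protein) for each ORF.

    Coordinates are 1-based and inclusive on the strand that is passed in.
    When an ORF is closed by a stop codon, last_base includes that codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = frame          # 0-based start of the current open stretch
        protein = []
        for pos in range(frame, len(seq) - 2, 3):
            aa = CODON_TABLE.get(seq[pos:pos + 3], "X")  # X = ambiguous codon
            if aa == "*":
                if len(protein) >= min_codons:
                    orfs.append((frame + 1, start + 1, pos + 3, "".join(protein)))
                start = pos + 3
                protein = []
            else:
                protein.append(aa)
        # A stretch that runs to the end of the read (no stop codon seen).
        if len(protein) >= min_codons:
            orfs.append((frame + 1, start + 1, start + 3 * len(protein),
                         "".join(protein)))
    return orfs


# First 25 bases of the read; a low threshold is used only for this demo.
demo = "TCGAGAGCAAAGTGAAAAGCAAAGA"
print(find_orfs(demo, min_codons=8))  # [(2, 2, 25, 'REQSEKQR')]
```

For the reverse strand, the same function can be applied to `reverse_complement(read)`; the coordinates are then relative to the reverse-complemented sequence.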

RAW RESULTS

a) forward strand 

>ORF number 1 in reading frame 1 on the direct strand extends from base 739 to base 927.
AAGTGGTGTAGAGGCGGCTGCAGAGAGTGCCGAAACATCAGAATCAGAAGTGATGCATTA
TTTGAATCATCAGAACAGCAAGTTCAATTCAACACTAAACCCTCAGCATTCAGTGTAGAT
TCATGTTTTGTCGAAGATGACTCTTTTGATGAAAGTGATGATTCGGCATCTCTTGTCGAT
CAGCAATCG

>Translation of ORF number 1 in reading frame 1 on the direct strand.
KWCRGGCRECRNIRIRSDALFESSEQQVQFNTKPSAFSVDSCFVEDDSFDESDDSASLVD
QQS

>ORF number 1 in reading frame 2 on the direct strand extends from base 2 to base 856.
CGAGAGCAAAGTGAAAAGCAAAGAATTCTTTATCCATATTTAATATCAATAGGAAGGCAG
ACCTCATCGGATATGGTTGAGCAGGACTCCATCTCTCACAGTGCAGATACAAGAAGAATT
TTAAGTCAAACAGTAGGAATCCAAGAACAATTGTATTTGAAAATGCGTCCTGAGAATAGC
AATGACTCAGGGAGTGAAGTTGTTAAGCATGAGCGTGTAAACCGTTCTGATTCGTTCCAT
CGCGATCAGTTATTTTTACAAGCATTGCGTCAAGGCGTATTAATTCAACAATTAGCTGCA
ACGGCATTAAACCTTTTAGATGGCTTTGGTACAAATAACGTCAATTTGAATACGGTTGGG
AGAGACCCAGCTAATTTGAATGCTCAGTCAATTTATAATCGTTTAATTGATCCTGATAAT
GGTCGATTAGCACAACTTTCAGAAGCTTTGTCTGTTTTGTCTAATAATACGGGTATTGAT
GAAGAAACAGCTATACAAATTAACGAATTATTGAATGTTGAAGTGGGTCAATCAGGATTG
ACTTTTAAGCAGTGGATTAGTGGCGATCCTGCTGTGTCTGAATATATAGAAAATAACAGA
TCCAACCCAGAAGCTGTTCTCAATGATTTGCGTGAAGTTCTGGTCTTAGCTCATCAATCA
GCAAGTGAAATAGCTAATGACCTGTATACCCAGCTTGCCAATCATGTCATTCTTTTAGAA
GAAAATGATTTGGTTGAAAGTGGTGTAGAGGCGGCTGCAGAGAGTGCCGAAACATCAGAA
TCAGAAGTGATGCATTATTTGAATCATCAGAACAGCAAGTTCAATTCAACACTAAACCCT
CAGCATTCAGTGTAG

>Translation of ORF number 1 in reading frame 2 on the direct strand.
REQSEKQRILYPYLISIGRQTSSDMVEQDSISHSADTRRILSQTVGIQEQLYLKMRPENS
NDSGSEVVKHERVNRSDSFHRDQLFLQALRQGVLIQQLAATALNLLDGFGTNNVNLNTVG
RDPANLNAQSIYNRLIDPDNGRLAQLSEALSVLSNNTGIDEETAIQINELLNVEVGQSGL
TFKQWISGDPAVSEYIENNRSNPEAVLNDLREVLVLAHQSASEIANDLYTQLANHVILLE
ENDLVESGVEAAAESAETSESEVMHYLNHQNSKFNSTLNPQHSV*

No ORFs were found in reading frame 3.
-------------------------------------------------------------------------------

b) reverse strand

No ORFs were found in reading frame 1.

No ORFs were found in reading frame 2.

>ORF number 1 in reading frame 3 on the reverse strand extends from base 726 to base 926.
CAACTTCACTCCCTGAGTCATTGCTATTCTCAGGACGCATTTTCAAATACAATTGTTCTT
GGATTCCTACTGTTTGACTTAAAATTCTTCTTGTATCTGCACTGTGAGAGATGGAGTCCT
GCTCAACCATATCCGATGAGGTCTGCCTTCCTATTGATATTAAATATGGATAAAGAATTC
TTTGCTTTTCACTTTGCTCTC

>Translation of ORF number 1 in reading frame 3 on the reverse strand.
QLHSLSHCYSQDAFSNTIVLGFLLFDLKFFLYLHCERWSPAQPYPMRSAFLLILNMDKEF
FAFHFAL

Multiple Alignment

PROTOCOL



RESULTS ANALYSIS

RAW RESULTS

Protein Domains

PROTOCOL

INTERPRO scan



RESULTS ANALYSIS

InterProScan returned no results: the databases it searches do not yet appear to contain sequences or domains homologous to this ORF.

RAW RESULTS
No hits reported.

Phylogeny

PROTOCOL



RESULTS ANALYSIS

RAW RESULTS

Taxonomy report

PROTOCOL


a) BLASTp

BLASTp versus Non-redundant protein sequences (NR), NCBI default parameters apart from "Max target sequences" = 1000 under Algorithm parameters / Taxonomy Reports / Lineage Report


b) BLASTx

BLASTx versus Non-redundant protein sequences (NR), NCBI default parameters apart from "Max target sequences" = 1000 under Algorithm parameters / Taxonomy Reports / Lineage Report


c) tBLASTn

tBLASTn versus Non-redundant protein sequences (NR), NCBI default parameters apart from "Max target sequences" = 1000 under Algorithm parameters / Taxonomy Reports / Lineage Report


d) tBLASTx

tBLASTx versus Non-redundant protein sequences (NR), NCBI default parameters apart from "Max target sequences" = 1000 under Algorithm parameters / Taxonomy Reports / Lineage Report



RESULTS ANALYSIS

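We searched for homologs with all four BLAST programs, but none of the hits obtained is reliable: no hit has a significant e-value, and the hits are scattered across unrelated taxonomic groups. The taxonomic origin and the possible function of the protein therefore remain unknown.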
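The scattering of hits across taxonomic groups can be checked quickly by tallying the bracketed group labels in a lineage report. The sketch below assumes the plain-text layout shown in the raw results of this section ("N hits [group]" on each leaf line); the function name `tally_groups` is hypothetical.

```python
import re
from collections import Counter

# Matches e.g. "2 hits [ascomycetes]" or "1 hit  [firmicutes]".
HIT_PATTERN = re.compile(r"(\d+)\s+hits?\s+\[([^\]]+)\]")


def tally_groups(report_lines):
    """Sum the number of hits per bracketed taxonomic group."""
    counts = Counter()
    for line in report_lines:
        match = HIT_PATTERN.search(line)
        if match:  # internal nodes without a hit count are skipped
            counts[match.group(2)] += int(match.group(1))
    return counts


sample = [
    ". . . . . Candida albicans SC5314 -----   39 2 hits [ascomycetes]  likely long chain",
    ". . . Gallus gallus (bantam) ----------   36 2 hits [birds]        PREDICTED: similar",
    ". Clostridium difficile QCD-66c26 .....   35 1 hit  [firmicutes]   putative helicase",
    ". Eukaryota           [eukaryotes]",
]
for group, n_hits in sorted(tally_groups(sample).items()):
    print(group, n_hits)
```

A coherent taxonomic signal would show most hits in one group; a flat distribution over fungi, birds, bacteria and archaea, as here, is what chance matches tend to look like.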

RAW RESULTS

a) BLASTp

Lineage Report

cellular organisms
. Eukaryota           [eukaryotes]
. . Fungi/Metazoa group [eukaryotes]
. . . Candida             [ascomycetes]
. . . . Candida albicans    [ascomycetes]
. . . . . Candida albicans SC5314 -----   39 2 hits [ascomycetes]         likely long chain fatty acid-CoA synthetase Faa4p [Candida 
. . . . . Candida albicans WO-1 .......   39 1 hit  [ascomycetes]         likely long chain fatty acid-CoA synthetase Faa4p [Candida 
. . . . Candida dubliniensis CD36 -----   37 2 hits [ascomycetes]         long-chain acyl-coa synthetase, putative; long-chain-fatty-
. . . . Candida tropicalis MYA-3404 ...   35 2 hits [ascomycetes]         long-chain-fatty-acid--CoA ligase 1 [Candida tropicalis MYA
. . . Gallus gallus (bantam) ----------   36 2 hits [birds]               PREDICTED: similar to voltage-gated sodium channel type II 
. . . Hydra magnipapillata ............   35 1 hit  [hydrozoans]          PREDICTED: similar to transposase domain-containing protein
. . Sorghum bicolor (milo) ------------   36 2 hits [monocots]            hypothetical protein SORBIDRAFT_02g022180 [Sorghum bicolor]
. Methanobrevibacter smithii DSM 2374 -   38 2 hits [euryarchaeotes]      desulfoferrodoxin [Methanobrevibacter smithii DSM 2374] >gi
. Bacteroides sp. 4_3_47FAA ...........   37 2 hits [CFB group bacteria]  AcrB/AcrD family multidrug resistance protein [Bacteroides 
. Bacteroides vulgatus ATCC 8482 ......   37 2 hits [CFB group bacteria]  AcrB/AcrD family multidrug resistance protein [Bacteroides 
. Bordetella pertussis Tohama I .......   36 2 hits [b-proteobacteria]    hypothetical protein BP2583 [Bordetella pertussis Tohama I]
. Bacteroides dorei DSM 17855 .........   35 2 hits [CFB group bacteria]  hypothetical protein BACDOR_00220 [Bacteroides dorei DSM 17
. Bacteroides sp. 9_1_42FAA ...........   35 2 hits [CFB group bacteria]  hypothetical protein BACDOR_00220 [Bacteroides dorei DSM 17
. Bacteroides dorei 5_1_36/D4 .........   35 2 hits [CFB group bacteria]  hypothetical protein BACDOR_00220 [Bacteroides dorei DSM 17
. Bacteroides sp. 3_1_33FAA ...........   35 2 hits [CFB group bacteria]  hypothetical protein BACDOR_00220 [Bacteroides dorei DSM 17
. Clostridium difficile QCD-66c26 .....   35 1 hit  [firmicutes]          putative helicase [Clostridium difficile QCD-66c26] >gi|255
. Clostridium difficile QCD-37x79 .....   35 1 hit  [firmicutes]          putative helicase [Clostridium difficile QCD-66c26] >gi|255
. Clostridium difficile R20291 ........   35 2 hits [firmicutes]          putative helicase [Clostridium difficile QCD-66c26] >gi|255
. Clostridium difficile 630 ...........   35 2 hits [firmicutes]          putative conjugative transposon DNA recombination protein [
. Clostridium difficile QCD-32g58 .....   35 1 hit  [firmicutes]          hypothetical protein CdifQ_04002142 [Clostridium difficile 
. Colwellia psychrerythraea 34H .......   35 2 hits [g-proteobacteria]    putative MSHA biogenesis protein MshN [Colwellia psychreryt

----------------------------------------------------------------------------------------------------------------------------------------

b) BLASTx
Lineage Report

cellular organisms
. Bacteria           [bacteria]
. . Lactobacillus vaginalis ATCC 49540 -   38 2 hits [firmicutes]          flavin reductase [Lactobacillus vaginalis ATCC 49540] >gi|2
. . Colwellia psychrerythraea 34H ......   38 2 hits [g-proteobacteria]    putative MSHA biogenesis protein MshN [Colwellia psychreryt
. . Bordetella pertussis Tohama I ......   38 2 hits [b-proteobacteria]    hypothetical protein BP2583 [Bordetella pertussis Tohama I]
. . Bordetella parapertussis 12822 .....   37 1 hit  [b-proteobacteria]    hypothetical protein BPP2603 [Bordetella parapertussis 1282
. . Bordetella bronchiseptica RB50 .....   37 2 hits [b-proteobacteria]    hypothetical protein BPP2603 [Bordetella parapertussis 1282
. . Bordetella parapertussis ...........   37 1 hit  [b-proteobacteria]    hypothetical protein BPP2603 [Bordetella parapertussis 1282
. . Bacteroides sp. 4_3_47FAA ..........   36 2 hits [CFB group bacteria]  AcrB/AcrD family multidrug resistance protein [Bacteroides 
. . Bacteroides vulgatus ATCC 8482 .....   36 2 hits [CFB group bacteria]  AcrB/AcrD family multidrug resistance protein [Bacteroides 
. . Cyanothece sp. CCY0110 .............   36 2 hits [cyanobacteria]       hypothetical protein CY0110_06504 [Cyanothece sp. CCY0110] 
. Tetrahymena thermophila --------------   37 2 hits [ciliates]            hypothetical protein TTHERM_00689930 [Tetrahymena thermophi
. Tetrahymena thermophila SB210 ........   37 2 hits [ciliates]            hypothetical protein TTHERM_00689930 [Tetrahymena thermophi
. Methanobrevibacter smithii DSM 2374 ..   36 2 hits [euryarchaeotes]      desulfoferrodoxin [Methanobrevibacter smithii DSM 2374] >gi
. Sorghum bicolor (milo) ...............   36 2 hits [monocots]            hypothetical protein SORBIDRAFT_02g022180 [Sorghum bicolor]
. Candida albicans SC5314 ..............   36 2 hits [ascomycetes]         likely long chain fatty acid-CoA synthetase Faa4p [Candida 
. Candida albicans WO-1 ................   36 1 hit  [ascomycetes]         likely long chain fatty acid-CoA synthetase Faa4p [Candida 
. Lachancea thermotolerans CBS 6340 ....   35 1 hit  [ascomycetes]         KLTH0F02508p [Lachancea thermotolerans] >gi|238935703|emb|C
. Lachancea thermotolerans .............   35 1 hit  [ascomycetes]         KLTH0F02508p [Lachancea thermotolerans] >gi|238935703|emb|C
. Ricinus communis .....................   35 2 hits [eudicots]            translation initiation factor, putative [Ricinus communis] 
. Danio rerio (zebra fish) .............   35 2 hits [bony fishes]         PREDICTED: hypothetical protein [Danio rerio]
. Candida dubliniensis CD36 ............   35 2 hits [ascomycetes]         long-chain acyl-coa synthetase, putative; long-chain-fatty-
. Schizosaccharomyces pombe ............   35 3 hits [ascomycetes]         autophagy associated protein Aut12 [Schizosaccharomyces pom

--------------------------------------------------------------------------------

c) tBLASTn
Lineage Report

Candida [ascomycetes]
. Candida albicans SC5314 ---   39 1 hit  [ascomycetes]  Candida albicans SC5314 likely long chain fatty acid-CoA sy
. Candida dubliniensis CD36 .   38 2 hits [ascomycetes]  Candida dubliniensis CD36 long-chain acyl-coa synthetase, p
-----------------------------------------------------------------------------------------------------------------------------------

d) tBLASTx

Lineage Report

Bacteria [bacteria]
. Pirellula staleyi DSM 6068 -----      1 hit  [planctomycetes]    Pirellula staleyi DSM 6068, complete genome
. Bordetella pertussis Tohama I ..      1 hit  [b-proteobacteria]  Bordetella pertussis strain Tohama I, complete genome; segm
. Bordetella bronchiseptica RB50 .      1 hit  [b-proteobacteria]  Bordetella bronchiseptica strain RB50, complete genome; seg
. Bordetella parapertussis .......      1 hit  [b-proteobacteria]  Bordetella parapertussis strain 12822, complete genome; seg

BLAST

PROTOCOL


a) BLASTp

BLASTp versus Non-redundant protein sequences (NR), NCBI default parameters apart from "Max target sequences" = 1000 under Algorithm parameters


b) BLASTx

BLASTx versus Non-redundant protein sequences (NR), NCBI default parameters apart from "Max target sequences" = 1000 under Algorithm parameters


c) tBLASTn

tBLASTn versus Non-redundant protein sequences (NR), NCBI default parameters apart from "Max target sequences" = 1000 under Algorithm parameters


d) tBLASTx

tBLASTx versus Non-redundant protein sequences (NR), NCBI default parameters apart from "Max target sequences" = 1000 under Algorithm parameters



RESULTS ANALYSIS

The e-values obtained with BLASTp, BLASTx, tBLASTn and tBLASTx are not significant: the smallest values are E = 0.49, E = 0.83, E = 1.7 and E = 2.0, respectively. The most likely explanation is that the NCBI databases searched do not yet contain sequences homologous to this ORF, so no credible hits can be obtained.
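The check behind this conclusion can be sketched as a small filter over the BLAST summary table. The code below assumes the plain-text layout of the hit lists in the raw results (identifier, description, bit score, e-value). The cutoff of 1e-3 is only an illustrative threshold chosen here, not a value taken from the protocol, and the function names are hypothetical.

```python
def parse_hit_line(line):
    """Split one summary line into (identifier, description, bits, e-value)."""
    head, bits, evalue = line.strip().rsplit(None, 2)
    identifier, description = head.split(None, 1)
    return identifier, description.strip(), float(bits), float(evalue)


def significant_hits(lines, max_evalue=1e-3):
    """Return the hits whose e-value is at or below the chosen cutoff."""
    hits = [parse_hit_line(line) for line in lines if line.strip()]
    return [hit for hit in hits if hit[3] <= max_evalue]


blastp_top = [
    "ref|XP_719354.1|  likely long chain fatty acid-CoA synthetase ...  39.3    0.49 ",
    "ref|ZP_05974895.1|  desulfoferrodoxin [Methanobrevibacter smit...  38.1    1.2  ",
    "ref|XP_002422475.1|  long-chain acyl-coa synthetase, putative;...  37.7    1.3  ",
]
print(significant_hits(blastp_top))  # [] : no hit passes the cutoff
```

An e-value near 1 means that about one alignment with that score is expected by chance alone in a search of this size, which is why the best hits here carry no real evidence of homology.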

RAW RESULTS

a) BLASTp
                                                                   Score     E
Sequences producing significant alignments:                       (Bits)  Value

ref|XP_719354.1|  likely long chain fatty acid-CoA synthetase ...  39.3    0.49 
ref|ZP_05974895.1|  desulfoferrodoxin [Methanobrevibacter smit...  38.1    1.2  
ref|XP_002422475.1|  long-chain acyl-coa synthetase, putative;...  37.7    1.3  
ref|ZP_05257114.1|  AcrB/AcrD family multidrug resistance prot...  37.4    1.8  
ref|YP_001297587.1|  AcrB/AcrD family multidrug resistance pro...  37.4    1.8  
ref|XP_422025.2|  PREDICTED: similar to voltage-gated sodium c...  37.0    2.5  
ref|XP_001233892.1|  PREDICTED: similar to voltage-gated sodiu...  37.0    2.5  
ref|XP_002462231.1|  hypothetical protein SORBIDRAFT_02g022180...  36.6    3.5  
ref|NP_881210.1|  hypothetical protein BP2583 [Bordetella pert...  36.6    3.6  
ref|XP_002169261.1|  PREDICTED: similar to transposase domain-...  35.4    6.3  
ref|ZP_03298861.1|  hypothetical protein BACDOR_00220 [Bactero...  35.4    7.1  
ref|XP_002546351.1|  long-chain-fatty-acid--CoA ligase 1 [Cand...  35.4    7.8  
ref|ZP_05273446.1|  putative helicase [Clostridium difficile Q...  35.0    8.9  
ref|YP_001088370.1|  putative conjugative transposon DNA recom...  35.0    9.3  
ref|ZP_01803368.1|  hypothetical protein CdifQ_04002142 [Clost...  35.0    9.7  
ref|YP_271222.1|  putative MSHA biogenesis protein MshN [Colwe...  35.0    9.7  

ALIGNMENTS
>ref|XP_719354.1| likely long chain fatty acid-CoA synthetase Faa4p [Candida albicans 
SC5314]
 gb|EAL00466.1| likely long chain fatty acid-CoA synthetase Faa4p [Candida albicans 
SC5314]
 gb|EEQ44017.1| long-chain-fatty-acid-CoA ligase 4 [Candida albicans WO-1]
Length=696

 Score = 39.3 bits (90),  Expect = 0.49, Method: Compositional matrix adjust.
 Identities = 31/104 (29%), Positives = 42/104 (40%), Gaps = 19/104 (18%)

Query  120  GRDPANLNAQSIYNRLIDPDNGRLAQLSEALSVLSNNTGIDEETAIQINELLN-------  172
            G  P +++AQ   + LI P       L   L+    N  I E T  QI  L         
Sbjct  431  GGSPISVDAQVFISTLIAP-----MLLGYGLTETCANAAILEHTHFQIGTLGTLVGSITA  485

Query  173  --VEVGQSGLTFKQ-----WISGDPAVSEYIENNRSNPEAVLND  209
              V+V  +G   K      W+ G P VSEY +N +   EA  +D
Sbjct  486  KLVDVADAGYYAKNNQGEIWLKGGPVVSEYYKNEKETKEAFTDD  529


>ref|ZP_05974895.1| desulfoferrodoxin [Methanobrevibacter smithii DSM 2374]
 gb|EFC94139.1| desulfoferrodoxin [Methanobrevibacter smithii DSM 2374]
Length=208

 Score = 38.1 bits (87),  Expect = 1.2, Method: Compositional matrix adjust.
 Identities = 26/77 (33%), Positives = 40/77 (51%), Gaps = 10/77 (12%)

Query  158  GIDEETAIQINELLNVEVGQSGLTFKQWISGDPAVSEYIENNRSNPEA-VLNDLREV---  213
            G+ E+T  +I    ++ +G+    F+QW+  +P  SE+I+ N S PE    +D+R V   
Sbjct  43   GLYEKTMDEIYSKRHISIGEDFEDFRQWVKDNPEYSEFIDEN-STPEIDDYSDMRPVGGL  101

Query  214  ---LVLAHQSASEIAND  227
                 L H  A EI ND
Sbjct  102  NVDYFLEH--AEEIIND  116


>ref|XP_002422475.1| long-chain acyl-coa synthetase, putative; long-chain-fatty-acid-CoA 
ligase, putative [Candida dubliniensis CD36]
 emb|CAX40483.1| long-chain acyl-coa synthetase, putative; long-chain-fatty-acid-CoA 
ligase, putative [Candida dubliniensis CD36]
Length=696

 Score = 37.7 bits (86),  Expect = 1.3, Method: Compositional matrix adjust.
 Identities = 36/149 (24%), Positives = 56/149 (37%), Gaps = 28/149 (18%)

Query  83   QLFLQALRQGVLIQQLAATALNLLDGFGTNNVNLNTVGR--------DPANLNAQSIYNR  134
            ++F  A +    ++       ++ D F    V   T G+         P +++AQ   + 
Sbjct  387  KIFWGAFKAKTTLKHFGIPGGDMFD-FVFKKVKSATGGQLRYVLNGGSPISVDAQVFIST  445

Query  135  LIDPDNGRLAQLSEALSVLSNNTGIDEETAIQINELLN---------VEVGQSGLTFKQ-  184
            LI P       L   L+    N  I E T  QI  L           V+V  +G   K  
Sbjct  446  LIAP-----MLLGYGLTETCANAAILEHTHFQIGTLGTLVGSITAKLVDVADAGYYAKNN  500

Query  185  ----WISGDPAVSEYIENNRSNPEAVLND  209
                W+ G P VS+Y +N +   EA  +D
Sbjct  501  QGEIWLKGGPVVSQYYKNEKETKEAFTDD  529


>ref|ZP_05257114.1| AcrB/AcrD family multidrug resistance protein [Bacteroides sp. 
4_3_47FAA]
 gb|EET17506.1| AcrB/AcrD family multidrug resistance protein [Bacteroides sp. 
4_3_47FAA]
Length=1072

 Score = 37.4 bits (85),  Expect = 1.8, Method: Composition-based stats.
 Identities = 27/112 (24%), Positives = 57/112 (50%), Gaps = 3/112 (2%)

Query  141  GRLAQLSEALSVLSNNTGIDEETAIQINELLNVEVGQSGLTFKQWISGDPAVSEYI-ENN  199
            GRL Q +E   ++     + +   +++ ++ ++E+G+   TF   ++G  AVS  + +  
Sbjct  236  GRLQQTTEFEDIVIK--ALPDGNVLRLGDVADIELGRLAYTFNNMVNGHKAVSCIVYQMA  293

Query  200  RSNPEAVLNDLREVLVLAHQSASEIANDLYTQLANHVILLEENDLVESGVEA  251
             SN    ++DL +VL  A +S     N    Q AN  +    ++++++ +EA
Sbjct  294  GSNATETISDLEKVLAKAQESLPTGLNINIAQNANDFLFASIHEVIKTLIEA  345


>ref|YP_001297587.1| AcrB/AcrD family multidrug resistance protein [Bacteroides vulgatus 
ATCC 8482]
 gb|ABR37965.1| AcrB/AcrD family multidrug resistance protein [Bacteroides vulgatus 
ATCC 8482]
Length=1072

 Score = 37.4 bits (85),  Expect = 1.8, Method: Composition-based stats.
 Identities = 27/112 (24%), Positives = 57/112 (50%), Gaps = 3/112 (2%)

Query  141  GRLAQLSEALSVLSNNTGIDEETAIQINELLNVEVGQSGLTFKQWISGDPAVSEYI-ENN  199
            GRL Q +E   ++     + +   +++ ++ ++E+G+   TF   ++G  AVS  + +  
Sbjct  236  GRLQQTTEFEDIVIK--ALPDGNVLRLGDVADIELGRLAYTFNNMVNGHKAVSCIVYQMA  293

Query  200  RSNPEAVLNDLREVLVLAHQSASEIANDLYTQLANHVILLEENDLVESGVEA  251
             SN    ++DL +VL  A +S     N    Q AN  +    ++++++ +EA
Sbjct  294  GSNATETISDLEKVLAKAQESLPTGLNINIAQNANDFLFASIHEVIKTLIEA  345


>ref|XP_422025.2| PREDICTED: similar to voltage-gated sodium channel type II alpha 
subunit isoform 2 [Gallus gallus]
Length=2006

 Score = 37.0 bits (84),  Expect = 2.5, Method: Composition-based stats.
 Identities = 24/106 (22%), Positives = 54/106 (50%), Gaps = 9/106 (8%)

Query  168   NELLNVEVG----QSGLTFKQWISGDPAVSEYIENNRSNPEAVLNDLREVLVLAHQSASE  223
             NE+ N+++     Q G+ F +       V E+I+      + VL++++ +  L ++  S 
Sbjct  999   NEMNNLQIAVARIQKGIDFVK-----RKVREFIQKAFVRKQKVLDEIKPLEDLNNKKDSC  1053

Query  224   IANDLYTQLANHVILLEENDLVESGVEAAAESAETSESEVMHYLNH  269
             I+N    ++  ++  L+E +   SG+ ++ E     ES+ M ++N+
Sbjct  1054  ISNHTIVEIGKNLAYLKEGNGTTSGIGSSVEKYVVDESDYMSFINN  1099


>ref|XP_001233892.1| PREDICTED: similar to voltage-gated sodium channel type II alpha 
subunit isoform 1 [Gallus gallus]
Length=2006

 Score = 37.0 bits (84),  Expect = 2.5, Method: Composition-based stats.
 Identities = 24/106 (22%), Positives = 54/106 (50%), Gaps = 9/106 (8%)

Query  168   NELLNVEVG----QSGLTFKQWISGDPAVSEYIENNRSNPEAVLNDLREVLVLAHQSASE  223
             NE+ N+++     Q G+ F +       V E+I+      + VL++++ +  L ++  S 
Sbjct  999   NEMNNLQIAVARIQKGIDFVK-----RKVREFIQKAFVRKQKVLDEIKPLEDLNNKKDSC  1053

Query  224   IANDLYTQLANHVILLEENDLVESGVEAAAESAETSESEVMHYLNH  269
             I+N    ++  ++  L+E +   SG+ ++ E     ES+ M ++N+
Sbjct  1054  ISNHTIVEIGKNLAYLKEGNGTTSGIGSSVEKYVVDESDYMSFINN  1099


>ref|XP_002462231.1| hypothetical protein SORBIDRAFT_02g022180 [Sorghum bicolor]
 gb|EER98752.1| hypothetical protein SORBIDRAFT_02g022180 [Sorghum bicolor]
Length=1479

 Score = 36.6 bits (83),  Expect = 3.5, Method: Compositional matrix adjust.
 Identities = 30/89 (33%), Positives = 44/89 (49%), Gaps = 10/89 (11%)

Query  114   VNLNTVGRDPANLNAQSIYNRLIDPDNGRLAQLSEALSVLSNNTGIDEETAIQINELLNV  173
              NLN   R  A L   ++YN +IDP+N R+        V+S    +DE +  QI     V
Sbjct  936   CNLNL--RFKAGLPPDAVYNIIIDPENKRV--FKNIKEVISRKVLLDEGSR-QI-----V  985

Query  174   EVGQSGLTFKQWISGDPAVSEYIENNRSN  202
             EV Q+ +    W SG  +V  +++ NR N
Sbjct  986   EVEQAAIWKFLWWSGILSVHVFVDQNRKN  1014


>ref|NP_881210.1| hypothetical protein BP2583 [Bordetella pertussis Tohama I]
 emb|CAE42858.1| hypothetical protein [Bordetella pertussis Tohama I]
Length=252

 Score = 36.6 bits (83),  Expect = 3.6, Method: Compositional matrix adjust.
 Identities = 50/206 (24%), Positives = 78/206 (37%), Gaps = 30/206 (14%)

Query  73   VNRSDSFHRDQLFLQALRQGVLIQQLAATALNLLDGFGTNNVNLNTVGRDPANLNAQSIY  132
            VN+     RD+     LRQ + +    A  L +  G G+N ++L  +G DPANL    + 
Sbjct  50   VNQRIQHERDRALANMLRQHLSVPPAQARVLEVGCGSGSNLLDLIRLGFDPANLMGNDLM  109

Query  133  NRLIDPDNGRLAQLSEALSVLSNNT--------GIDE-ETAIQINELLNVEVGQSGLTFK  183
            +  ++    RL Q   A+ +L+ N         G D    +     +L+ +  +  L  +
Sbjct  110  DSRVEQARHRLPQ---AVPILAGNALDLDLEREGFDVLYQSTVFTSILDAQ-ARRDLAQR  165

Query  184  QWISGDPA----VSEYIENNRSNPEAVLNDLREV-------------LVLAHQSASEIAN  226
             W    P       ++  NN  NP      LREV             + LA   A  +  
Sbjct  166  MWHWTKPGGGVLWYDFAYNNPRNPNVRGVPLREVRALFPHARIHTARVTLAPPLARRLPA  225

Query  227  DLYTQLANHVILLEENDLVESGVEAA  252
             LY  + N    L  + L   G  AA
Sbjct  226  TLYAPVNNLCPFLRTHLLCWIGKPAA  251


>ref|XP_002169261.1| PREDICTED: similar to transposase domain-containing protein, 
partial [Hydra magnipapillata]
Length=706

 Score = 35.4 bits (80),  Expect = 6.3, Method: Compositional matrix adjust.
 Identities = 21/64 (32%), Positives = 33/64 (51%), Gaps = 1/64 (1%)

Query  202  NPEAVLNDLREVLVLAHQSASEIANDLYTQLANHVILLEENDLVESGVEAAAESAETSES  261
            N +++L D+R + VL HQ+  E  NDL T L NH   L ++     G     ++ E +E 
Sbjct  84   NSQSLLKDIR-IWVLKHQTTRECVNDLLTILRNHGHQLPKDTRTVLGTLNKVDTTEMNEG  142

Query  262  EVMH  265
            E  +
Sbjct  143  EYKY  146


>ref|ZP_03298861.1| hypothetical protein BACDOR_00220 [Bacteroides dorei DSM 17855]
 ref|ZP_04540251.1| AcrB/AcrD family multidrug resistance protein [Bacteroides sp. 
9_1_42FAA]
 ref|ZP_04555560.1| AcrB/AcrD family multidrug resistance protein [Bacteroides sp. 
D4]
 ref|ZP_06089590.1| AcrB/AcrD family multidrug resistance protein [Bacteroides sp. 
3_1_33FAA]
 gb|EEB27279.1| hypothetical protein BACDOR_00220 [Bacteroides dorei DSM 17855]
 gb|EEO46422.1| AcrB/AcrD family multidrug resistance protein [Bacteroides sp. 
D4]
 gb|EEO62127.1| AcrB/AcrD family multidrug resistance protein [Bacteroides sp. 
9_1_42FAA]
 gb|EEZ20665.1| AcrB/AcrD family multidrug resistance protein [Bacteroides sp. 
3_1_33FAA]
Length=1072

 Score = 35.4 bits (80),  Expect = 7.1, Method: Composition-based stats.
 Identities = 26/112 (23%), Positives = 57/112 (50%), Gaps = 3/112 (2%)

Query  141  GRLAQLSEALSVLSNNTGIDEETAIQINELLNVEVGQSGLTFKQWISGDPAVSEYI-ENN  199
            GRL Q +E   ++     + +   +++ ++ ++E+G+   TF   ++G  AVS  + +  
Sbjct  236  GRLQQTNEFEDIVIK--ALPDGNVLRLGDVADIELGRLAYTFNNTVNGHKAVSCIVYQMA  293

Query  200  RSNPEAVLNDLREVLVLAHQSASEIANDLYTQLANHVILLEENDLVESGVEA  251
             +N    +++L EVL  A +S     N    Q AN  +    ++++++ +EA
Sbjct  294  GTNATETISNLEEVLAKAQESLPTGLNINIAQNANDFLFASIHEVIKTLIEA  345


>ref|XP_002546351.1| long-chain-fatty-acid--CoA ligase 1 [Candida tropicalis MYA-3404]
 gb|EER30430.1| long-chain-fatty-acid--CoA ligase 1 [Candida tropicalis MYA-3404]
Length=696

 Score = 35.4 bits (80),  Expect = 7.8, Method: Compositional matrix adjust.
 Identities = 29/104 (27%), Positives = 40/104 (38%), Gaps = 19/104 (18%)

Query  120  GRDPANLNAQSIYNRLIDPDNGRLAQLSEALSVLSNNTGIDEETAIQINELLN-------  172
            G  P +++AQ   + L+ P       L   L+    NT I E T  QI  L         
Sbjct  430  GGSPISIDAQVFISTLLAP-----MLLGYGLTETCANTTITEHTRFQIGTLGALVGSVTA  484

Query  173  --VEVGQSGLTFKQ-----WISGDPAVSEYIENNRSNPEAVLND  209
              V+V  +G   K      W+ G P V EY +N      A  +D
Sbjct  485  KLVDVADAGYFAKNNQGEIWLKGGPVVKEYYKNEEETKAAFTDD  528

------------------------------------------------------------------------------------------------------------------------

b) BLASTx


                                                                   Score     E
Sequences producing significant alignments:                       (Bits)  Value

ref|ZP_03959799.1|  flavin reductase [Lactobacillus vaginalis ...  38.9    0.83 
ref|YP_271222.1|  putative MSHA biogenesis protein MshN [Colwe...  38.9    0.83 
ref|NP_881210.1|  hypothetical protein BP2583 [Bordetella pert...  38.1    1.4  
ref|NP_884827.1|  hypothetical protein BPP2603 [Bordetella par...  37.7    1.8  
ref|XP_001026996.1|  hypothetical protein TTHERM_00689930 [Tet...  37.4    2.4  
ref|ZP_05974895.1|  desulfoferrodoxin [Methanobrevibacter smit...  37.0    3.1  
ref|XP_001030107.1|  IQ calmodulin-binding motif family protei...  37.0    3.1  
ref|XP_002462231.1|  hypothetical protein SORBIDRAFT_02g022180...  36.6    4.1  
ref|XP_719354.1|  likely long chain fatty acid-CoA synthetase ...  36.6    4.1  
ref|ZP_05257114.1|  AcrB/AcrD family multidrug resistance prot...  36.2    5.4  
ref|YP_001297587.1|  AcrB/AcrD family multidrug resistance pro...  36.2    5.4  
ref|ZP_01732050.1|  hypothetical protein CY0110_06504 [Cyanoth...  36.2    5.4  
ref|XP_002554320.1|  KLTH0F02508p [Lachancea thermotolerans] >...  35.8    7.0  
ref|XP_002532478.1|  translation initiation factor, putative [...  35.8    7.0  
ref|XP_002666118.1|  PREDICTED: hypothetical protein [Danio re...  35.4    9.1  
ref|XP_682936.4|  PREDICTED: hypothetical protein isoform 2 [D...  35.4    9.1  
ref|XP_002422475.1|  long-chain acyl-coa synthetase, putative;...  35.4    9.1  
ref|NP_593016.1|  autophagy associated protein Aut12 [Schizosa...  35.4    9.1  

ALIGNMENTS
>ref|ZP_03959799.1| flavin reductase [Lactobacillus vaginalis ATCC 49540]
 gb|EEJ40642.1| flavin reductase [Lactobacillus vaginalis ATCC 49540]
Length=250

 Score = 38.9 bits (89),  Expect = 0.83
 Identities = 35/121 (28%), Positives = 51/121 (42%), Gaps = 15/121 (12%)
 Frame = +2

Query  41   LISIGRQTSSDM-VEQDSISHSAD------TRRILSQT-VGIQEQLYLKMRPENSNDSGS  196
            L    R TSS M ++Q SI H  D       R I  Q+ VG    L++ +     N    
Sbjct  32   LYEAARHTSSSMFLQQFSILHLTDPKKRAAVREISKQSYVGANGDLFIFIADLYRNQQ--  89

Query  197  EVVKHERVNRSDSFHRDQLFLQALRQGVLIQQLAATALNLLDGFGTNNVNLNTVGRDPAN  376
              ++H+  N     H   +F+QA    +L  Q A TA   ++  G   V L ++  DP  
Sbjct  90   --IRHQLGNDDGRLHTTDIFMQAAEDAILALQNALTA---IESMGLGGVVLGSIKNDPEK  144

Query  377  L  379
            L
Sbjct  145  L  145


>ref|YP_271222.1| putative MSHA biogenesis protein MshN [Colwellia psychrerythraea 
34H]
 gb|AAZ25814.1| putative MSHA biogenesis protein MshN [Colwellia psychrerythraea 
34H]
Length=403

 Score = 38.9 bits (89),  Expect = 0.83
 Identities = 23/64 (35%), Positives = 34/64 (53%), Gaps = 1/64 (1%)
 Frame = +2

Query  344  NLNTVGRDPANLNAQSIYNRL-IDPDNGRLAQLSEALSVLSNNTGIDEETAIQINELLNV  520
            N NTV  DP N++   IY++  I P NG++A+ ++   VLSNN+       I    L+N 
Sbjct  102  NENTVQNDPINVHVTKIYDQQEIAPINGQIAEPTDVNKVLSNNSAETTAKLITAKPLVNN  161

Query  521  EVGQ  532
               Q
Sbjct  162  SASQ  165


>ref|NP_881210.1| hypothetical protein BP2583 [Bordetella pertussis Tohama I]
 emb|CAE42858.1| hypothetical protein [Bordetella pertussis Tohama I]
Length=252

 Score = 38.1 bits (87),  Expect = 1.4
 Identities = 25/84 (29%), Positives = 41/84 (48%), Gaps = 3/84 (3%)
 Frame = +2

Query  218  VNRSDSFHRDQLFLQALRQGVLIQQLAATALNLLDGFGTNNVNLNTVGRDPANLNAQSIY  397
            VN+     RD+     LRQ + +    A  L +  G G+N ++L  +G DPANL    + 
Sbjct  50   VNQRIQHERDRALANMLRQHLSVPPAQARVLEVGCGSGSNLLDLIRLGFDPANLMGNDLM  109

Query  398  NRLIDPDNGRLAQLSEALSVLSNN  469
            +  ++    RL Q   A+ +L+ N
Sbjct  110  DSRVEQARHRLPQ---AVPILAGN  130


>ref|NP_884827.1| hypothetical protein BPP2603 [Bordetella parapertussis 12822]
 ref|NP_888589.1| hypothetical protein BB2046 [Bordetella bronchiseptica RB50]
 emb|CAE37895.1| hypothetical protein [Bordetella parapertussis]
 emb|CAE32542.1| hypothetical protein [Bordetella bronchiseptica RB50]
Length=237

 Score = 37.7 bits (86),  Expect = 1.8
 Identities = 25/84 (29%), Positives = 41/84 (48%), Gaps = 3/84 (3%)
 Frame = +2

Query  218  VNRSDSFHRDQLFLQALRQGVLIQQLAATALNLLDGFGTNNVNLNTVGRDPANLNAQSIY  397
            VN+     RD+     LRQ + +    A  L +  G G+N ++L  +G DPANL    + 
Sbjct  35   VNQRIQHERDRALAYMLRQHLSVPPAQARVLEVGCGSGSNLLDLIRLGFDPANLMGNDLM  94

Query  398  NRLIDPDNGRLAQLSEALSVLSNN  469
            +  ++    RL Q   A+ +L+ N
Sbjct  95   DSRVEQARHRLPQ---AVPILAGN  115


>ref|XP_001026996.1| hypothetical protein TTHERM_00689930 [Tetrahymena thermophila]
 gb|EAS06754.1| hypothetical protein TTHERM_00689930 [Tetrahymena thermophila 
SB210]
Length=844

 Score = 37.4 bits (85),  Expect = 2.4
 Identities = 36/158 (22%), Positives = 69/158 (43%), Gaps = 23/158 (14%)
 Frame = +2

Query  80   EQDSISHSADTRRILSQTVGIQEQLYLKMRPENSNDSGSEVVKHERVNRSDSFHRDQLFL  259
            ++DS     + + I +Q   I +  ++     + +   +  ++   +N+  +F ++QLF 
Sbjct  605  QEDSQKTKVNYQLIQNQ---INQNNFISKNQYDQSKQNTYNIQPNELNQQPNFQKNQLFQ  661

Query  260  QALRQGVLIQQLAATALNLLDGFGTNNVNLNTVGRDPANLNAQSI------YNRLIDPDN  421
             A +     QQ             +NN+N NT+ +   N N QS       Y   I+ D 
Sbjct  662  FASKAH--FQQ------------SSNNLNQNTLNKQSNNNNVQSNIFSTSQYPEQINEDP  707

Query  422  GRLAQLSEALSVLSNNTGIDEETAIQINELLNVEVGQS  535
             +L  +S+    L + T ID++   Q N++   + GQS
Sbjct  708  SQLNHISKDECFLQDQTYIDQDIYQQKNKIQKSDYGQS  745


>ref|ZP_05974895.1| desulfoferrodoxin [Methanobrevibacter smithii DSM 2374]
 gb|EFC94139.1| desulfoferrodoxin [Methanobrevibacter smithii DSM 2374]
Length=208

 Score = 37.0 bits (84),  Expect = 3.1
 Identities = 23/76 (30%), Positives = 38/76 (50%), Gaps = 8/76 (10%)
 Frame = +2

Query  473  GIDEETAIQINELLNVEVGQSGLTFKQWISGDPAVSEYIENNRSNPEAVLNDLREV----  640
            G+ E+T  +I    ++ +G+    F+QW+  +P  SE+I+ N +      +D+R V    
Sbjct  43   GLYEKTMDEIYSKRHISIGEDFEDFRQWVKDNPEYSEFIDENSTPEIDDYSDMRPVGGLN  102

Query  641  --LVLAHQSASEIAND  682
                L H  A EI ND
Sbjct  103  VDYFLEH--AEEIIND  116


>ref|XP_001030107.1| IQ calmodulin-binding motif family protein [Tetrahymena thermophila]
 gb|EAR82444.1| IQ calmodulin-binding motif family protein [Tetrahymena thermophila 
SB210]
Length=3792

 Score = 37.0 bits (84),  Expect = 3.1
 Identities = 46/209 (22%), Positives = 85/209 (40%), Gaps = 44/209 (21%)
 Frame = +2

Query  182   NDSGSEVVKHERVNRSD-SFHRDQLFL-----QALRQGVLIQQLAATALNLLDGFG----  331
             N   + +  HE +N++  S  RD+L       Q     +        ++N  D F     
Sbjct  2087  NFENNSIQTHENINKNQTSQQRDELLKEFSVSQHSENQIFNNSNQNQSINSQDNFQNGNN  2146

Query  332   ---TNNVNLNTVGRDPANLNAQSIYNRL-IDPDNGRLAQLSEALSVLSNNTGIDEETAIQ  499
                TN  NLN   ++  NLN Q+  N + I+P++      S   ++ ++N   +     Q
Sbjct  2147  LNKTNQNNLNISNQNNFNLNNQNFNNLIQINPNSNSQNSNSRNQNLNNSNQNFNSPQKSQ  2206

Query  500   I----NELLNVEVGQSGLT--------------------FKQWISGDPAVSEYIENNRSN  607
                  NE+LN +  Q+ L+                     KQW      + EY +N   N
Sbjct  2207  TQNFNNEILNSDFKQNDLSQISYNEFKQLQSDLKTKQEQIKQWADIVSMLQEYDQN---N  2263

Query  608   PEAVLNDLREVLVLAHQSASEIANDLYTQ  694
             P++++  ++   + + Q+  ++AN+L  Q
Sbjct  2264  PQSLIEQIK---LKSQQNFQQLANELKNQ  2289


>ref|XP_002462231.1| hypothetical protein SORBIDRAFT_02g022180 [Sorghum bicolor]
 gb|EER98752.1| hypothetical protein SORBIDRAFT_02g022180 [Sorghum bicolor]
Length=1479

 Score = 36.6 bits (83),  Expect = 4.1
 Identities = 30/88 (34%), Positives = 44/88 (50%), Gaps = 10/88 (11%)
 Frame = +2

Query  344   NLNTVGRDPANLNAQSIYNRLIDPDNGRLAQLSEALSVLSNNTGIDEETAIQINELLNVE  523
             NLN   R  A L   ++YN +IDP+N R+        V+S    +DE +  QI     VE
Sbjct  937   NLNL--RFKAGLPPDAVYNIIIDPENKRV--FKNIKEVISRKVLLDEGSR-QI-----VE  986

Query  524   VGQSGLTFKQWISGDPAVSEYIENNRSN  607
             V Q+ +    W SG  +V  +++ NR N
Sbjct  987   VEQAAIWKFLWWSGILSVHVFVDQNRKN  1014


>ref|XP_719354.1| likely long chain fatty acid-CoA synthetase Faa4p [Candida albicans 
SC5314]
 gb|EAL00466.1| likely long chain fatty acid-CoA synthetase Faa4p [Candida albicans 
SC5314]
 gb|EEQ44017.1| long-chain-fatty-acid-CoA ligase 4 [Candida albicans WO-1]
Length=696

 Score = 36.6 bits (83),  Expect = 4.1
 Identities = 31/104 (29%), Positives = 42/104 (40%), Gaps = 19/104 (18%)
 Frame = +2

Query  359  GRDPANLNAQSIYNRLIDPDNGRLAQLSEALSVLSNNTGIDEETAIQINELLN-------  517
            G  P +++AQ   + LI P       L   L+    N  I E T  QI  L         
Sbjct  431  GGSPISVDAQVFISTLIAP-----MLLGYGLTETCANAAILEHTHFQIGTLGTLVGSITA  485

Query  518  --VEVGQSGLTFKQ-----WISGDPAVSEYIENNRSNPEAVLND  628
              V+V  +G   K      W+ G P VSEY +N +   EA  +D
Sbjct  486  KLVDVADAGYYAKNNQGEIWLKGGPVVSEYYKNEKETKEAFTDD  529


>ref|ZP_05257114.1| AcrB/AcrD family multidrug resistance protein [Bacteroides sp. 
4_3_47FAA]
 gb|EET17506.1| AcrB/AcrD family multidrug resistance protein [Bacteroides sp. 
4_3_47FAA]
Length=1072

 Score = 36.2 bits (82),  Expect = 5.4
 Identities = 25/95 (26%), Positives = 47/95 (49%), Gaps = 3/95 (3%)
 Frame = +2

Query  422  GRLAQLSEALSVLSNNTGIDEETAIQINELLNVEVGQSGLTFKQWISGDPAVSEYI-ENN  598
            GRL Q +E   ++     + +   +++ ++ ++E+G+   TF   ++G  AVS  + +  
Sbjct  236  GRLQQTTEFEDIVIK--ALPDGNVLRLGDVADIELGRLAYTFNNMVNGHKAVSCIVYQMA  293

Query  599  RSNPEAVLNDLREVLVLAHQSASEIANDLYTQLAN  703
             SN    ++DL +VL  A +S     N    Q AN
Sbjct  294  GSNATETISDLEKVLAKAQESLPTGLNINIAQNAN  328


>ref|YP_001297587.1| AcrB/AcrD family multidrug resistance protein [Bacteroides vulgatus 
ATCC 8482]
 gb|ABR37965.1| AcrB/AcrD family multidrug resistance protein [Bacteroides vulgatus 
ATCC 8482]
Length=1072

 Score = 36.2 bits (82),  Expect = 5.4
 Identities = 25/95 (26%), Positives = 47/95 (49%), Gaps = 3/95 (3%)
 Frame = +2

Query  422  GRLAQLSEALSVLSNNTGIDEETAIQINELLNVEVGQSGLTFKQWISGDPAVSEYI-ENN  598
            GRL Q +E   ++     + +   +++ ++ ++E+G+   TF   ++G  AVS  + +  
Sbjct  236  GRLQQTTEFEDIVIK--ALPDGNVLRLGDVADIELGRLAYTFNNMVNGHKAVSCIVYQMA  293

Query  599  RSNPEAVLNDLREVLVLAHQSASEIANDLYTQLAN  703
             SN    ++DL +VL  A +S     N    Q AN
Sbjct  294  GSNATETISDLEKVLAKAQESLPTGLNINIAQNAN  328


>ref|ZP_01732050.1| hypothetical protein CY0110_06504 [Cyanothece sp. CCY0110]
 gb|EAZ88521.1| hypothetical protein CY0110_06504 [Cyanothece sp. CCY0110]
Length=721

 Score = 36.2 bits (82),  Expect = 5.4
 Identities = 34/125 (27%), Positives = 55/125 (44%), Gaps = 16/125 (12%)
 Frame = +2

Query  356  VGRDPANLNAQSIYNRLIDPDNGRLAQLSEALSVLSNNTGIDEETAI--QINELLNVEVG  529
            +G DPA+L   S ++ +   D  R+ Q  E++ + +N + I  E  +    N  L +E  
Sbjct  196  LGYDPASLIGCSFFDLIHQADQQRIRQSFESIKIEANQSTILSEFRVCSHDNRWLMLEAI  255

Query  530  QSGLTFKQWISGDPAVSEYIENNRSNPEAVLNDLREVLVLAHQSASEIANDLYTQLANHV  709
               L        DPAV+ ++ N         +D+ E    A Q   +  +D  T LAN  
Sbjct  256  AKNL------EDDPAVAGFVIN--------CHDITERHYTAQQLRYDAYHDKLTGLANRS  301

Query  710  ILLEE  724
             LLE+
Sbjct  302  ALLEQ  306
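
The percentages on the `Identities`, `Positives` and `Gaps` lines above can be rechecked from the raw counts. In every record listed here the printed value matches the ratio rounded down, not rounded to nearest (25/84 is 29.76% but is shown as 29%). This is an observation about this output only, not a statement about how BLAST is specified. A minimal sketch, with the hypothetical helper name `blast_percent`:

```python
def blast_percent(count: int, alignment_length: int) -> int:
    """Percentage as printed in the records above: the ratio rounded down.

    `blast_percent` is a hypothetical helper name; the floor behaviour is
    inferred from the numbers shown in this output.
    """
    if alignment_length <= 0:
        raise ValueError("alignment length must be positive")
    return (100 * count) // alignment_length


# Bordetella parapertussis record: Identities = 25/84, Positives = 41/84, Gaps = 3/84
identities = blast_percent(25, 84)
positives = blast_percent(41, 84)
gaps = blast_percent(3, 84)
print(identities, positives, gaps)
```

Using integer arithmetic avoids floating-point rounding surprises when the ratio sits close to a whole percent.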

-----------------------------------------------------------------------------------------------------

c) tBLASTn

                                                                   Score     E
Sequences producing significant alignments:                       (Bits)  Value

ref|XM_714261.1|  Candida albicans SC5314 likely long chain fa...  39.7    1.7  
ref|XM_002422430.1|  Candida dubliniensis CD36 long-chain acyl...  38.1    4.2  
emb|FM992695.1|  Candida dubliniensis CD36 chromosome R, compl...  38.1    4.2  

ALIGNMENTS
>ref|XM_714261.1| Candida albicans SC5314 likely long chain fatty acid-CoA synthetase 
Faa4p (FAA4) mRNA, complete cds
Length=2091

 Score = 39.7 bits (91),  Expect = 1.7, Method: Compositional matrix adjust.
 Identities = 31/104 (29%), Positives = 42/104 (40%), Gaps = 19/104 (18%)
 Frame = +1

Query  120   GRDPANLNAQSIYNRLIDPDNGRLAQLSEALSVLSNNTGIDEETAIQINELLN-------  172
             G  P +++AQ   + LI P       L   L+    N  I E T  QI  L         
Sbjct  1291  GGSPISVDAQVFISTLIAP-----MLLGYGLTETCANAAILEHTHFQIGTLGTLVGSITA  1455

Query  173   --VEVGQSGLTFKQ-----WISGDPAVSEYIENNRSNPEAVLND  209
               V+V  +G   K      W+ G P VSEY +N +   EA  +D
Sbjct  1456  KLVDVADAGYYAKNNQGEIWLKGGPVVSEYYKNEKETKEAFTDD  1587


>ref|XM_002422430.1| Candida dubliniensis CD36 long-chain acyl-coa synthetase, putative 
(CD36_35160) mRNA, complete cds
Length=2091

 Score = 38.1 bits (87),  Expect = 4.2, Method: Compositional matrix adjust.
 Identities = 30/104 (28%), Positives = 42/104 (40%), Gaps = 19/104 (18%)
 Frame = +1

Query  120   GRDPANLNAQSIYNRLIDPDNGRLAQLSEALSVLSNNTGIDEETAIQINELLN-------  172
             G  P +++AQ   + LI P       L   L+    N  I E T  QI  L         
Sbjct  1291  GGSPISVDAQVFISTLIAP-----MLLGYGLTETCANAAILEHTHFQIGTLGTLVGSITA  1455

Query  173   --VEVGQSGLTFKQ-----WISGDPAVSEYIENNRSNPEAVLND  209
               V+V  +G   K      W+ G P VS+Y +N +   EA  +D
Sbjct  1456  KLVDVADAGYYAKNNQGEIWLKGGPVVSQYYKNEKETKEAFTDD  1587


>emb|FM992695.1| Candida dubliniensis CD36 chromosome R, complete sequence
Length=2267510

 Features in this part of subject sequence:
   long-chain acyl-coa synthetase, putative

 Score = 38.1 bits (87),  Expect = 4.2, Method: Compositional matrix adjust.
 Identities = 30/104 (28%), Positives = 42/104 (40%), Gaps = 19/104 (18%)
 Frame = +2

Query  120      GRDPANLNAQSIYNRLIDPDNGRLAQLSEALSVLSNNTGIDEETAIQINELLN-------  172
                G  P +++AQ   + LI P       L   L+    N  I E T  QI  L         
Sbjct  2144924  GGSPISVDAQVFISTLIAP-----MLLGYGLTETCANAAILEHTHFQIGTLGTLVGSITA  2145088

Query  173      --VEVGQSGLTFKQ-----WISGDPAVSEYIENNRSNPEAVLND  209
                  V+V  +G   K      W+ G P VS+Y +N +   EA  +D
Sbjct  2145089  KLVDVADAGYYAKNNQGEIWLKGGPVVSQYYKNEKETKEAFTDD  2145220
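
The tBLASTn query coordinates above are amino-acid positions in the translated ORF (120 to 209), whereas the BLASTx hit against the same Candida albicans protein reports nucleotide positions on the read (359 to 628). The two agree once the reading frame is taken into account. The sketch below assumes the ORF is in forward frame 2 with amino acid 1 starting at nucleotide 2, as the translation header `[2 - 853/928] direct strand` suggests; the function names are hypothetical.

```python
def codon_span(aa_pos: int, frame: int) -> tuple[int, int]:
    """1-based nucleotide span of the codon for a 1-based amino-acid position.

    Assumes a forward reading frame (1, 2 or 3) in which amino acid 1
    starts at nucleotide `frame`.
    """
    if frame not in (1, 2, 3):
        raise ValueError("forward frame must be 1, 2 or 3")
    if aa_pos < 1:
        raise ValueError("amino-acid positions are 1-based")
    first = frame + 3 * (aa_pos - 1)
    return first, first + 2


def aa_range_to_nt(aa_start: int, aa_end: int, frame: int) -> tuple[int, int]:
    """Nucleotide span covering an inclusive amino-acid range."""
    return codon_span(aa_start, frame)[0], codon_span(aa_end, frame)[1]


# tBLASTn query range 120-209 mapped back onto the read in frame +2
nt_start, nt_end = aa_range_to_nt(120, 209, 2)
print(nt_start, nt_end)  # 359 628
```

The same mapping places amino acids 73 to 132 at nucleotides 218 to 397, which is the first query line of the Bordetella BLASTx alignments.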



--------------------------------------------------------------------------------------------

d) tBLASTx

                                                                   Score     E
Sequences producing significant alignments:                       (Bits)  Value  N

gb|CP001848.1|  Pirellula staleyi DSM 6068, complete genome        29.5    2.0    1
emb|BX640418.1|  Bordetella pertussis strain Tohama I, complet...  40.8    2.5    1
emb|BX640443.1|  Bordetella bronchiseptica strain RB50, comple...  40.3    3.4    1
emb|BX640431.1|  Bordetella parapertussis strain 12822, comple...  40.3    3.4    1
emb|FN357331.1|  Schistosoma mansoni genome sequence supercont...  26.3    4.9    1
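
Although the table is headed "Sequences producing significant alignments", every E-value in it lies between 2.0 and 4.9. The sketch below parses the five summary lines and applies a cutoff. The cutoff of 1e-3 is an assumption chosen for illustration, not a value taken from this annotation, and `parse_summary` is a hypothetical helper.

```python
# Summary lines copied from the tBLASTx table above (descriptions are
# truncated by BLAST itself and are left as printed).
HITS = """\
gb|CP001848.1|  Pirellula staleyi DSM 6068, complete genome        29.5    2.0    1
emb|BX640418.1|  Bordetella pertussis strain Tohama I, complet...  40.8    2.5    1
emb|BX640443.1|  Bordetella bronchiseptica strain RB50, comple...  40.3    3.4    1
emb|BX640431.1|  Bordetella parapertussis strain 12822, comple...  40.3    3.4    1
emb|FN357331.1|  Schistosoma mansoni genome sequence supercont...  26.3    4.9    1
"""


def parse_summary(text: str) -> list[tuple[str, float, float]]:
    """Return (accession field, bit score, E-value) for each summary line.

    Relies on the last three whitespace-separated fields being
    score, E-value and N, as in the table above.
    """
    hits = []
    for line in text.splitlines():
        if not line.strip():
            continue
        description, score, evalue, _n = line.rsplit(None, 3)
        hits.append((description.split()[0], float(score), float(evalue)))
    return hits


E_CUTOFF = 1e-3  # assumed threshold, for illustration only

hits = parse_summary(HITS)
significant = [hit for hit in hits if hit[2] <= E_CUTOFF]
print(len(hits), "hits parsed;", len(significant), "at or below the cutoff")
```

With this assumed cutoff none of the five hits is retained, which is consistent with treating these alignments as weak matches.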

ALIGNMENTS
>gb|CP001848.1| Pirellula staleyi DSM 6068, complete genome
Length=6196199

 Features in this part of subject sequence:
   heme-binding protein

 Score = 29.5 bits (58),  Expect = 2.0
 Identities = 9/23 (39%), Positives = 14/23 (60%), Gaps = 0/23 (0%)
 Frame = +3/-3

Query  576      CLNI*KITDPTQKLFSMICVKFW  644
                CLN+  +  PT ++F+M C   W
Sbjct  3404472  CLNVLPLIAPTHRMFTMTCTT*W  3404404


 Features in this part of subject sequence:
   heme-binding protein

 Score = 28.5 bits (56),  Expect = 2.0
 Identities = 15/62 (24%), Positives = 32/62 (51%), Gaps = 0/62 (0%)
 Frame = +2/-2

Query  341      VNLNTVGRDPANLNAQSIYNRLIDPDNGRLAQLSEALSVLSNNTGIDEETAIQINELLNV  520
                ++L  +  DP   +  + Y+  +DP+  R+A+++    +  +    DEE   ++  LL+V
Sbjct  3404674  LSLGDLDLDPKTRDVIAGYSVAVDPNTLRVARIAFGKRIADSFPASDEELNRELARLLSV  3404495

Query  521      EV  526
                 V
Sbjct  3404494  LV  3404489


>emb|BX640418.1| Bordetella pertussis strain Tohama I, complete genome; segment 
8/12
Length=349346

 Features in this part of subject sequence:
   hypothetical protein

 Score = 40.8 bits (83),  Expect = 2.5
 Identities = 22/73 (30%), Positives = 35/73 (47%), Gaps = 0/73 (0%)
 Frame = +2/-3

Query  218     VNRSDSFHRDQLFLQALRQGVLIQQLAATALNLLDGFGTNNVNLNTVGRDPANLNAQSIY  397
               VN+     RD+     LRQ + +    A  L +  G G+N ++L  +G DPANL    + 
Sbjct  303480  VNQRIQHERDRALANMLRQHLSVPPAQARVLEVGCGSGSNLLDLIRLGFDPANLMGNDLM  303301

Query  398     NRLIDPDNGRLAQ  436
               +  ++    RL Q
Sbjct  303300  DSRVEQARHRLPQ  303262


>emb|BX640443.1| Bordetella bronchiseptica strain RB50, complete genome; segment 
7/16
Length=346274

 Features in this part of subject sequence:
   hypothetical protein

 Score = 40.3 bits (82),  Expect = 3.4
 Identities = 22/73 (30%), Positives = 35/73 (47%), Gaps = 0/73 (0%)
 Frame = +2/-3

Query  218     VNRSDSFHRDQLFLQALRQGVLIQQLAATALNLLDGFGTNNVNLNTVGRDPANLNAQSIY  397
               VN+     RD+     LRQ + +    A  L +  G G+N ++L  +G DPANL    + 
Sbjct  105939  VNQRIQHERDRALAYMLRQHLSVPPAQARVLEVGCGSGSNLLDLIRLGFDPANLMGNDLM  105760

Query  398     NRLIDPDNGRLAQ  436
               +  ++    RL Q
Sbjct  105759  DSRVEQARHRLPQ  105721


>emb|BX640431.1| Bordetella parapertussis strain 12822, complete genome; segment 
9/14
Length=347894

 Features in this part of subject sequence:
   hypothetical protein

 Score = 40.3 bits (82),  Expect = 3.4
 Identities = 22/73 (30%), Positives = 35/73 (47%), Gaps = 0/73 (0%)
 Frame = +2/-1

Query  218    VNRSDSFHRDQLFLQALRQGVLIQQLAATALNLLDGFGTNNVNLNTVGRDPANLNAQSIY  397
              VN+     RD+     LRQ + +    A  L +  G G+N ++L  +G DPANL    + 
Sbjct  21932  VNQRIQHERDRALAYMLRQHLSVPPAQARVLEVGCGSGSNLLDLIRLGFDPANLMGNDLM  21753

Query  398    NRLIDPDNGRLAQ  436
              +  ++    RL Q
Sbjct  21752  DSRVEQARHRLPQ  21714


>emb|FN357331.1| Schistosoma mansoni genome sequence supercontig Smp_scaff000040
Length=1760029

 Features flanking this part of subject sequence:
   16951 bp at 5' side: expressed protein
   109604 bp at 3' side: hypothetical protein

 Score = 26.3 bits (51),  Expect = 4.9
 Identities = 8/18 (44%), Positives = 10/18 (55%), Gaps = 0/18 (0%)
 Frame = -1/-1

Query  904      PNHHFHQKSHLRQNMNLH  851
                PNHH H  +H   + N H
Sbjct  1173367  PNHHHHHPNHHHHHPNHH  1173314


 Features flanking this part of subject sequence:
   16765 bp at 5' side: expressed protein
   109775 bp at 3' side: hypothetical protein

 Score = 23.5 bits (45),  Expect = 4.9
 Identities = 9/23 (39%), Positives = 10/23 (43%), Gaps = 0/23 (0%)
 Frame = -3/-1

Query  746      HHFQPNHFLLKE*HDWQAGYTGH  678
                HH  PNH   +  H  Q  Y  H
Sbjct  1173196  HHHHPNHHHHQHHHQQQQHYHHH  1173128


 Features flanking this part of subject sequence:
   16843 bp at 5' side: expressed protein
   109724 bp at 3' side: hypothetical protein

 Score = 22.1 bits (42),  Expect = 4.9
 Identities = 7/14 (50%), Positives = 7/14 (50%), Gaps = 0/14 (0%)
 Frame = -3/-1

Query  767      HSLQPPLHHFQPNH  726
                H   P  HH  PNH
Sbjct  1173247  HHHHPNHHHHHPNH  1173206