GOS 3000010

Warning: this metagenomic sequence has been carefully annotated by students during bioinformatics assignments. These quality annotations are therefore the result of a teaching exercise that you are most welcome to amend and extend if necessary!


Sequence
CAMERA AccNum : JCVI_READ_1091118858977
Annotathon code: GOS_3000010
Sample :
  • GPS: 24°10'29"N; 84°20'40"W
  • Caribbean Sea: Gulf of Mexico - USA
  • Coastal Sea (-2m, 26.4°C, 0.1-0.8 microns)
Authors
Team : Algarve 2012
Username : DavidBrito1
Annotated on : 2013-03-10 01:44:16
  • Brito David

Synopsis

  • Taxonomy: Viruses (NCBI info)
    Rank: superkingdom - Genetic Code: Standard - NCBI Identifier: 10239
    Kingdom: Viruses - Phylum: - Class: - Order:
    Viruses;

Genomic Sequence

>JCVI_READ_1091118858977 GOS_3000010 Genomic DNA
AGTTATAGTCAATACTCTATGTGGGCACAATGCCCTCACAAATGGAAAACTGCATATGTAGATGGTCATAGAAAATTTACAGATAGTATCCACACAATGT
TTGGTACTTCAATGCACGAAGTAATACAGACATTTCTAACTGTAATGTATAATGACACAGCAAAATTAGCTGAACAATTACCATTAGAAGATATGTTACT
TACAAGAATGAAACGCAACTTTGAAGAGATTGTAAAAGCCAATGGCGGTGAGATGTTTTGTGAAGAAAAGGATATGGTTGAGTTCTACAGACACGGTGTA
GAGATACTCAAATTCATAAGAAAGAAGAGAGCTCAATACTTCAGTAAGAAAGGTTATGAATTAGTTGGTATTGAAACTCCTATAGATTATGATCTGCCTA
ATAAGATAAAGTTTGTAGGTTTTTTAGATGTAGTGATTAGGGATACAGTTAGAGATGTTATTAAGATATATGATATTAAAACCTCTACAATGGGTTGGAA
CAAATGGATGAAGGCTGACAAACTGAAGAGTGACCAATTATTACTTTACAAACAATTCTACTCTAAGCAATACAATCATCCTTTGGATAAGATTGAGGTT
GAATTCTTTATTGTAAAGAGAAAGCTATATGAGAACACAGACTTTCCGCAAAAAAGAGTTCAGAAGTTTGTGCCAGCAAATGGTAAACCATCGATTAATA
AAGTGGTTGCTAGATTGAATGAATTTATGACAGAATGTTTTGATTCTGATGGAGAATACAATATTGAGCATATTTATAGAAAAGAAGCATCTAAAAAGAA
TTGCAAATTTTGTGAATTCAATCAAACAGAATATTGTGACGCAGGAGTTAAGTAATGGTTAGATTAAGTTTAAGGATGAACCTGGCGGATATCCTAAATA
ATGAAGAAATTATAATAGAAAA

Translation

[1 - 852/922]   direct strand
>GOS_3000010 Translation [1-852   direct strand]
SYSQYSMWAQCPHKWKTAYVDGHRKFTDSIHTMFGTSMHEVIQTFLTVMYNDTAKLAEQLPLEDMLLTRMKRNFEEIVKANGGEMFCEEKDMVEFYRHGV
EILKFIRKKRAQYFSKKGYELVGIETPIDYDLPNKIKFVGFLDVVIRDTVRDVIKIYDIKTSTMGWNKWMKADKLKSDQLLLYKQFYSKQYNHPLDKIEV
EFFIVKRKLYENTDFPQKRVQKFVPANGKPSINKVVARLNEFMTECFDSDGEYNIEHIYRKEASKKNCKFCEFNQTEYCDAGVK

[ Warning ] 5' incomplete: does not start with a Methionine

Annotator commentaries





The DNA sequence GOS_3000010.81, sampled in the Caribbean Sea (Gulf of Mexico), is probably coding. The ORF chosen for analysis is 852 bp long and lies in reading frame 1 of the forward strand; its size and its InterPro and BLAST results support the coding hypothesis. Two other ORFs were identified, but unlike the chosen ORF they did not produce good BLAST results, so the chosen ORF is the one most likely to encode a protein with a known function.


The InterPro results suggest a function related to nuclease activity (see the Protein Domains analysis for details). The BLAST[12] results were limited, since only one hit was significant (E-value 8e-05, score 51.6); however, that hit agrees with the InterPro[13] results regarding molecular function, since gp37 has an exonucleolytic role. It is also worth noting that the selected sequence has 284 amino acids (a lower bound, since the ORF is probably 5´ incomplete) and gp37 has 309. The similar length is consistent with homology, although length alone does not establish a shared function.


One reason for the scarcity of results is that good protein sequence analysis is extremely difficult[14] in the superfamily identified in Protein Domains, so [i]in silico[/i] methods may be insufficient. The BLASTp results were inadequate for a multiple alignment and for building phylogenetic trees. Nevertheless, the Taxonomy report suggests that the sequence under study is more likely to belong to a virus than to a bacterium. This hypothesis is supported by the fact that viruses are more abundant than bacteria in the oceans[15], and the scarcity of viral sequence data may explain the lack of results.


It should also be mentioned that the gene name ‘PaP-PAD20_gp37’[16], although unusual, was used because it was the only one found. In short, the GOS_3000010.81 DNA sequence is very probably coding, it probably encodes a domain with a function related to nuclease activity, and it probably belongs to a virus. Further studies are nevertheless needed to confirm these conclusions.




References


PROGRAM_blastp&BLAST_PROGRAMS_blastp&PAGE_TYPE_BlastSearch&SHOW_DEFAULTS_on&LINK_LOC_blasthome [Accessed on: 21.04.2012 12:58].



  • 14 Kosinski J., Feder M., Bujnicki J.M. (2005). The PD-(D/E)XK superfamily revisited: identification of new members among proteins involved in DNA metabolism and functional predictions for domains of (hitherto) unknown function. BMC Bioinformatics. Vol. 6: 172.


  • 15 Suttle C. (2007). Marine viruses — major players in the global ecosystem. Nat Rev Microbiol. Vol. 5(10): 801-12.


ORF finding

PROTOCOL


a) SMS ORFinder[1] / forward strand / frames 1, 2 & 3 / min 60 AA / 'any codon' initiation / 'standard' genetic code

b) SMS ORFinder[1] / reverse strand / frames 1, 2 & 3 / min 60 AA / 'any codon' initiation / 'standard' genetic code
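
The protocol above can be approximated in a few lines of Python. This is only a minimal sketch and not the SMS ORFinder implementation: the function names (`reverse_complement`, `find_orfs`, `six_frame_orfs`) are hypothetical, 'any codon' initiation is interpreted here as "an ORF is any STOP-free run of codons", and coordinates are 1-based on the strand being read, so reverse-strand coordinates refer to the reverse complement.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the conventional TCAG codon ordering.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(dna, min_aa=60):
    """List (frame, start, end, protein) for every STOP-free codon run
    of at least min_aa codons. start/end are 1-based, inclusive, and
    do not include the STOP codon."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        run_start, protein = frame, []
        for pos in range(frame, len(dna) - 2, 3):
            aa = CODON_TABLE.get(dna[pos:pos + 3], "X")  # X = ambiguous codon
            if aa == "*":
                if len(protein) >= min_aa:
                    orfs.append((frame + 1, run_start + 1, pos, "".join(protein)))
                run_start, protein = pos + 3, []
            else:
                protein.append(aa)
        # A run that reaches the end of the read has no STOP (3' incomplete).
        if len(protein) >= min_aa:
            end = run_start + 3 * len(protein)
            orfs.append((frame + 1, run_start + 1, end, "".join(protein)))
    return orfs


def six_frame_orfs(dna, min_aa=60):
    """Scan the direct strand and its reverse complement."""
    return {
        "direct": find_orfs(dna, min_aa),
        "reverse": find_orfs(reverse_complement(dna), min_aa),
    }
```

Calling `six_frame_orfs(read, min_aa=60)` on the genomic read would be the closest equivalent of steps a) and b); small differences from the SMS ORFinder output are possible because the handling of read ends is an assumption of this sketch.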




RESULTS ANALYSIS


Using SMS ORFinder with the parameters above, the following ORFs were found in "Genomic DNA GOS_3000010":


- One in the forward strand in reading frame 1 with 852 bp* (1-852)

- Two in the reverse strand in reading frames 2 and 3, with 231 bp* (65-295) and 216 bp* (639-854), respectively.


No ORFs were found in reading frames 2 and 3 of the forward strand or in reading frame 1 of the reverse strand.



Because the ORFs lie in different reading frames and on both strands, they were checked for overlap.

The following was found:


- ORF from reading frame 1 of the forward strand versus ORF from reading frame 2 of the reverse strand: overlap from base 628 to base 852 of the forward strand.

- ORF from reading frame 1 of the forward strand versus ORF from reading frame 3 of the reverse strand: overlap from base 69 to base 284 of the forward strand.

Reverse-strand coordinates were converted to forward-strand coordinates with forward = 923 - reverse (the read is 922 bp long); STOP codons are excluded.
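
SMS ORFinder numbers reverse-strand ORFs along the reverse complement, so their coordinates must be converted before they can be compared with the forward ORF. The following is a minimal sketch of that conversion and of the overlap test. The helper names are hypothetical; coordinates are 1-based and inclusive, taken from the raw ORFinder output with the STOP codon removed, and the read length is 922 bp.

```python
def reverse_to_forward(start, end, seq_len):
    """Map a 1-based inclusive interval on the reverse complement
    onto forward-strand coordinates."""
    return seq_len - end + 1, seq_len - start + 1


def overlap(a, b):
    """Return the shared (start, end) of two inclusive intervals, or None."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


SEQ_LEN = 922
forward_orf = (1, 852)                                 # frame 1, direct strand
rev_frame2 = reverse_to_forward(65, 295, SEQ_LEN)      # -> (628, 858)
rev_frame3 = reverse_to_forward(639, 854, SEQ_LEN)     # -> (69, 284)

print(overlap(forward_orf, rev_frame2))  # (628, 852)
print(overlap(forward_orf, rev_frame3))  # (69, 284)
```

The conversion can be checked against the sequences themselves: the first 15 bases of the reverse frame 2 ORF are the reverse complement of forward bases 844-858.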



The ORF selected for analysis was the one in reading frame 1 of the forward strand, since at 852 bp it is by far the longest of the three (each of the two reverse-strand ORFs is shorter than 240 bp). Length is used as a selection criterion because long open stretches with no biological function tend to accumulate STOP codons over evolutionary time, as no selective pressure keeps them open; a long ORF is therefore more likely to have a biological function.

In addition, this is the only ORF reported by SMS ORFinder that is probably coding, since it is longer than 200 amino acids and has BLAST and InterProScan results with acceptable E-values (8e-05 and 5.6e-23, respectively).


Regarding the ends of the ORF, the 3´ end is complete because the ORF is terminated by a STOP codon (TAA). The 5´ end is less clear: the ORF starts at position 1 of the read, so there is no room for an upstream in-frame STOP codon. Had the ORF started at position 4 or later, an upstream STOP codon would have delimited it. The 5´ end is therefore probably incomplete, although this cannot be confirmed from the read alone. No internal STOP codons were detected in this ORF.
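
The reasoning about the two ends can be written as a small check. This is a hedged sketch with a hypothetical function name and hypothetical status labels, assuming 1-based inclusive ORF coordinates that exclude the STOP codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def end_status(read, orf_start, orf_end):
    """Classify the 5' and 3' ends of an ORF found with 'any codon' initiation.

    orf_start / orf_end: 1-based inclusive coding coordinates on `read`,
    without the terminating STOP codon.
    """
    read = read.upper()
    # 3' end: complete only if the codon right after the ORF is a STOP.
    next_codon = read[orf_end:orf_end + 3]
    three_prime = "complete" if next_codon in STOP_CODONS else "incomplete"

    if orf_start <= 3:
        # No room for an in-frame upstream codon: the ORF runs off the read.
        five_prime = "probably incomplete"
    else:
        prev_codon = read[orf_start - 4:orf_start - 1]
        five_prime = ("upstream stop present" if prev_codon in STOP_CODONS
                      else "undetermined")
    return five_prime, three_prime
```

For the ORF analysed here the call would be `end_status(read, 1, 852)`, where `read` is the 922 bp genomic sequence.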


* The bp values do not include the STOP codon.

References



a) forward strand


>ORF number 1 in reading frame 1 on the direct strand extends from base 1 to base 855.
AGTTATAGTCAATACTCTATGTGGGCACAATGCCCTCACAAATGGAAAACTGCATATGTA
GATGGTCATAGAAAATTTACAGATAGTATCCACACAATGTTTGGTACTTCAATGCACGAA
GTAATACAGACATTTCTAACTGTAATGTATAATGACACAGCAAAATTAGCTGAACAATTA
CCATTAGAAGATATGTTACTTACAAGAATGAAACGCAACTTTGAAGAGATTGTAAAAGCC
AATGGCGGTGAGATGTTTTGTGAAGAAAAGGATATGGTTGAGTTCTACAGACACGGTGTA
GAGATACTCAAATTCATAAGAAAGAAGAGAGCTCAATACTTCAGTAAGAAAGGTTATGAA
TTAGTTGGTATTGAAACTCCTATAGATTATGATCTGCCTAATAAGATAAAGTTTGTAGGT
TTTTTAGATGTAGTGATTAGGGATACAGTTAGAGATGTTATTAAGATATATGATATTAAA
ACCTCTACAATGGGTTGGAACAAATGGATGAAGGCTGACAAACTGAAGAGTGACCAATTA
TTACTTTACAAACAATTCTACTCTAAGCAATACAATCATCCTTTGGATAAGATTGAGGTT
GAATTCTTTATTGTAAAGAGAAAGCTATATGAGAACACAGACTTTCCGCAAAAAAGAGTT
CAGAAGTTTGTGCCAGCAAATGGTAAACCATCGATTAATAAAGTGGTTGCTAGATTGAAT
GAATTTATGACAGAATGTTTTGATTCTGATGGAGAATACAATATTGAGCATATTTATAGA
AAAGAAGCATCTAAAAAGAATTGCAAATTTTGTGAATTCAATCAAACAGAATATTGTGAC
GCAGGAGTTAAGTAA

>Translation of ORF number 1 in reading frame 1 on the direct strand.
SYSQYSMWAQCPHKWKTAYVDGHRKFTDSIHTMFGTSMHEVIQTFLTVMYNDTAKLAEQL
PLEDMLLTRMKRNFEEIVKANGGEMFCEEKDMVEFYRHGVEILKFIRKKRAQYFSKKGYE
LVGIETPIDYDLPNKIKFVGFLDVVIRDTVRDVIKIYDIKTSTMGWNKWMKADKLKSDQL
LLYKQFYSKQYNHPLDKIEVEFFIVKRKLYENTDFPQKRVQKFVPANGKPSINKVVARLN
EFMTECFDSDGEYNIEHIYRKEASKKNCKFCEFNQTEYCDAGVK*

No ORFs were found in reading frame 2.

No ORFs were found in reading frame 3.


---------------------------------------------------------------------------------------------------------------

b) reverse strand


No ORFs were found in reading frame 1.

>ORF number 1 in reading frame 2 on the reverse strand extends from base 65 to base 298.
CCATTACTTAACTCCTGCGTCACAATATTCTGTTTGATTGAATTCACAAAATTTGCAATT
CTTTTTAGATGCTTCTTTTCTATAAATATGCTCAATATTGTATTCTCCATCAGAATCAAA
ACATTCTGTCATAAATTCATTCAATCTAGCAACCACTTTATTAATCGATGGTTTACCATT
TGCTGGCACAAACTTCTGAACTCTTTTTTGCGGAAAGTCTGTGTTCTCATATAG

>Translation of ORF number 1 in reading frame 2 on the reverse strand.
PLLNSCVTIFCLIEFTKFAILFRCFFSINMLNIVFSIRIKTFCHKFIQSSNHFINRWFTI
CWHKLLNSFLRKVCVLI*

>ORF number 1 in reading frame 3 on the reverse strand extends from base 639 to base 857.
AACTCAACCATATCCTTTTCTTCACAAAACATCTCACCGCCATTGGCTTTTACAATCTCT
TCAAAGTTGCGTTTCATTCTTGTAAGTAACATATCTTCTAATGGTAATTGTTCAGCTAAT
TTTGCTGTGTCATTATACATTACAGTTAGAAATGTCTGTATTACTTCGTGCATTGAAGTA
CCAAACATTGTGTGGATACTATCTGTAAATTTTCTATGA

>Translation of ORF number 1 in reading frame 3 on the reverse strand.
NSTISFSSQNISPPLAFTISSKLRFILVSNISSNGNCSANFAVSLYITVRNVCITSCIEV
PNIVWILSVNFL*

Multiple Alignment

PROTOCOL



RESULTS ANALYSIS




Regrettably, a multiple alignment could not be performed: the BLASTp results were inconclusive, so there are not enough sequences to align.


RAW RESULTS

Protein Domains

InterPro[2], default parameters at EBI


RESULTS ANALYSIS


InterProScan found a single, unintegrated domain, PDDEXK_1. As it was the only result, it was used in this line of investigation, as recommended in the FAQ. Its E-value of 5.6e-23 is significant, since it is well below the 1e-4 threshold. With only one domain found, there is no overlap between domains to consider.

The results indicate that the ORF under study contains a domain belonging to the PD-(D/E)XK nuclease superfamily, which is a member of the PDDEXK (CL0236) clan. This clan has a total of 60 members [3].


Members of this superfamily are known for hydrolysing the phosphodiester bond of nucleic acids (4,11,12) and for the conserved motif that marks their active site[4]. The superfamily is used in genetic engineering and molecular medicine to investigate the mechanisms of phosphodiester hydrolysis and of protein-DNA interactions. The domain was first identified in type II restriction enzymes, by far the best studied restriction enzymes[5].


Good protein sequence analysis and structure prediction are extremely difficult in this superfamily, because sequence similarity between its members is usually so low that methods more sensitive than PSI-BLAST searches are required[6]. This suggests that the BLASTp search recommended in the Rule Book may be insufficient for a good analysis. PSI-BLAST was nevertheless not used in this line of research: it was advised against, given the lack of familiarity with how it works.


No GO terms were associated with this protein domain, so the Gene Ontology site was not used.



References




  • 4 Kinch L. N., Ginalski K., Rychlewski L., Grishin N. V. (2005). Identification of novel restriction endonuclease-like fold families among hypothetical proteins. Nucleic Acids Res. Vol. 33: 3598–3605.


  • 5 Bujnicki J.M., Rychlewski L. (2001). Grouping together highly diverged PD-(D/E)XK nucleases and identification of novel superfamily members using structure-guided alignment of sequence profiles. J Mol Microbiol Biotechnol. Vol. 3(1): 69-72.


  • 6 Kosinski J., Feder M., Bujnicki J.M. (2005). The PD-(D/E)XK superfamily revisited: identification of new members among proteins involved in DNA metabolism and functional predictions for domains of (hitherto) unknown function. BMC Bioinformatics. Vol. 6: 172.
RAW RESULTS

GOS_3000010	ACA09E3CA7E33670	284	HMMPfam	PF12705	PDDEXK_1	1	275	5.6e-23	T	10-Mar-2012	NULL	NULL	
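
The raw InterProScan line above can be unpacked programmatically. The sketch below is illustrative only: the meaning of each column (protein, checksum, length, method, accession, name, start, end, E-value, ...) is assumed from the order of the fields, and the 1e-4 cutoff is the one quoted in the analysis.

```python
RAW = ("GOS_3000010 ACA09E3CA7E33670 284 HMMPfam PF12705 PDDEXK_1 "
       "1 275 5.6e-23 T 10-Mar-2012 NULL NULL")


def parse_interpro_line(line):
    """Split one whitespace/tab separated raw result line into named fields."""
    f = line.split()
    return {
        "protein": f[0],
        "length": int(f[2]),      # protein length in amino acids
        "method": f[3],
        "accession": f[4],
        "name": f[5],
        "start": int(f[6]),       # domain start on the protein
        "end": int(f[7]),         # domain end on the protein
        "evalue": float(f[8]),
    }


hit = parse_interpro_line(RAW)
# Fraction of the translated ORF covered by the domain match.
coverage = (hit["end"] - hit["start"] + 1) / hit["length"]
significant = hit["evalue"] < 1e-4

print(f'{hit["name"]} ({hit["accession"]}): '
      f'{coverage:.1%} of the ORF, significant={significant}')
```

Under these assumptions the PDDEXK_1 match spans residues 1 to 275 of the 284 residue translation, that is, almost the whole ORF.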

Phylogeny

PROTOCOL



RESULTS ANALYSIS


Since a multiple alignment could not be performed, for the reasons given in the BLAST and Taxonomy report analyses, no phylogenetic tree could be built.

RAW RESULTS

Taxonomy report

PROTOCOL


1) BLASTp versus SWISSPROT, NCBI default parameters "1000 max target sequences" [10]

2) BLASTp versus NR, NCBI default parameters "1000 max target sequences" [10]


RESULTS ANALYSIS


The BLASTp results were poor, for the following reasons:


- The top hits differ at the superkingdom rank between the SWISSPROT and NR searches (Bacteria and Viruses, respectively). In other words, they disagree at the highest taxonomic rank, which suggests low reliability;

- The number of hits per organism is low (mostly one or two);

- The number of significant results is very limited.


Unfortunately, because of these factors it is not possible to perform a multiple alignment or to construct a phylogenetic tree.


Nevertheless, the previous analyses make it more likely that the sequence belongs to a virus. Viruses are also the most abundant biological entities in the oceans and a major reservoir of genetic diversity[11], which supports the hypothesis that the sequence under analysis is viral rather than bacterial. Little sequence information is available for marine viruses, which would also explain the nearly uninformative results obtained.



References



  • 11 Suttle C. (2007). Marine viruses — major players in the global ecosystem. Nat Rev Microbiol. Vol. 5(10): 801-12.
RAW RESULTS

1) BLASTp versus SWISSPROT:

Lineage Report
root
. Bacteria              [bacteria]
. . Bacilli               [firmicutes]
. . . Bacillales            [firmicutes]
. . . . Bacillaceae           [firmicutes]
. . . . . Bacillus              [firmicutes]
. . . . . . Bacillus cereus group [firmicutes]
. . . . . . . Bacillus cytotoxicus NVH 391-98 ---------------------   36 1 hit  [firmicutes]     RecName: Full=6-phosphofructokinase; Short=Phosphofructokin
. . . . . . . Bacillus weihenstephanensis KBAB4 ...................   34 1 hit  [firmicutes]     RecName: Full=6-phosphofructokinase; Short=Phosphofructokin
. . . . . . . Bacillus thuringiensis serovar konkukian str. 97-27 .   34 1 hit  [firmicutes]     RecName: Full=6-phosphofructokinase; Short=Phosphofructokin
. . . . . . . Bacillus cereus ATCC 10987 ..........................   34 1 hit  [firmicutes]     RecName: Full=6-phosphofructokinase; Short=Phosphofructokin
. . . . . . . Bacillus cereus E33L ................................   34 1 hit  [firmicutes]     RecName: Full=6-phosphofructokinase; Short=Phosphofructokin
. . . . . . . Bacillus anthracis (anthrax) ........................   34 1 hit  [firmicutes]     RecName: Full=6-phosphofructokinase; Short=Phosphofructokin
. . . . . . . Bacillus cereus .....................................   34 1 hit  [firmicutes]     RecName: Full=6-phosphofructokinase; Short=Phosphofructokin
. . . . . . . Bacillus thuringiensis str. Al Hakam ................   34 1 hit  [firmicutes]     RecName: Full=6-phosphofructokinase; Short=Phosphofructokin
. . . . . . . Bacillus cereus AH820 ...............................   34 1 hit  [firmicutes]     RecName: Full=6-phosphofructokinase; Short=Phosphofructokin
. . . . . . . Bacillus cereus AH187 ...............................   34 1 hit  [firmicutes]     RecName: Full=6-phosphofructokinase; Short=Phosphofructokin
. . . . . . . Bacillus anthracis str. A0248 .......................   34 1 hit  [firmicutes]     RecName: Full=6-phosphofructokinase; Short=Phosphofructokin
. . . . . . . Bacillus anthracis str. CDC 684 .....................   34 1 hit  [firmicutes]     RecName: Full=6-phosphofructokinase; Short=Phosphofructokin
. . . . . . . Bacillus cereus 03BB102 .............................   34 1 hit  [firmicutes]     RecName: Full=6-phosphofructokinase; Short=Phosphofructokin
. . . . . . . Bacillus cereus Q1 ..................................   34 1 hit  [firmicutes]     RecName: Full=6-phosphofructokinase; Short=Phosphofructokin
. . . . . . . Bacillus cereus G9842 ...............................   34 1 hit  [firmicutes]     RecName: Full=6-phosphofructokinase; Short=Phosphofructokin
. . . . . . . Bacillus cereus B4264 ...............................   34 1 hit  [firmicutes]     RecName: Full=6-phosphofructokinase; Short=Phosphofructokin
. . . . . . . Bacillus cereus ATCC 14579 ..........................   34 1 hit  [firmicutes]     RecName: Full=6-phosphofructokinase; Short=Phosphofructokin
. . . . . . Bacillus clausii KSM-K16 ------------------------------   35 1 hit  [firmicutes]     RecName: Full=6-phosphofructokinase; Short=Phosphofructokin
. . . . . . Bacillus licheniformis ATCC 14580 .....................   33 1 hit  [firmicutes]     RecName: Full=6-phosphofructokinase; Short=Phosphofructokin
. . . . . . Bacillus halodurans C-125 .............................   31 1 hit  [firmicutes]     RecName: Full=6-phosphofructokinase; Short=Phosphofructokin
. . . . . . Bacillus subtilis .....................................   31 1 hit  [firmicutes]     RecName: Full=6-phosphofructokinase; Short=Phosphofructokin
. . . . . Anoxybacillus flavithermus WK1 --------------------------   35 1 hit  [firmicutes]     RecName: Full=6-phosphofructokinase; Short=Phosphofructokin
. . . . . Geobacillus stearothermophilus ..........................   35 1 hit  [firmicutes]     RecName: Full=6-phosphofructokinase; Short=Phosphofructokin
. . . . . Geobacillus kaustophilus HTA426 .........................   34 1 hit  [firmicutes]     RecName: Full=6-phosphofructokinase; Short=Phosphofructokin
. . . . . Geobacillus sp. WCH70 ...................................   33 1 hit  [firmicutes]     RecName: Full=6-phosphofructokinase; Short=Phosphofructokin
. . . . . Lysinibacillus sphaericus ...............................   33 1 hit  [firmicutes]     RecName: Full=6-phosphofructokinase; Short=Phosphofructokin
. . . . Brevibacillus brevis NBRC 100599 --------------------------   33 1 hit  [firmicutes]     RecName: Full=6-phosphofructokinase; Short=Phosphofructokin
. . . Enterococcus faecalis ---------------------------------------   33 1 hit  [firmicutes]     RecName: Full=6-phosphofructokinase; Short=Phosphofructokin
. . . Lactobacillus gasseri ATCC 33323 ............................   31 1 hit  [firmicutes]     RecName: Full=6-phosphofructokinase; Short=Phosphofructokin
. . . Lactobacillus johnsonii NCC 533 .............................   31 1 hit  [firmicutes]     RecName: Full=6-phosphofructokinase; Short=Phosphofructokin
. . Streptomyces caeruleus ----------------------------------------   33 1 hit  [high GC Gram+]  RecName: Full=DNA gyrase subunit B, novobiocin-resistant
. Influenza B virus (STRAIN B/LENINGRAD/179/86) -------------------   30 1 hit  [viruses]        RecName: Full=Glycoprotein NB >gi|54039851|sp|P67909.1|VNB_
. Influenza B virus (STRAIN B/MEMPHIS/6/86) .......................   30 1 hit  [viruses]        RecName: Full=Glycoprotein NB >gi|54039851|sp|P67909.1|VNB_
. Influenza B virus (B/Singapore/222/79) ..........................   30 1 hit  [viruses]        RecName: Full=Glycoprotein NB
. Influenza B virus (STRAIN B/HONG KONG/8/73) .....................   30 1 hit  [viruses]        RecName: Full=Glycoprotein NB
. Influenza B virus (B/USSR/100/83) ...............................   30 1 hit  [viruses]        RecName: Full=Glycoprotein NB
. Influenza B virus (STRAIN B/VICTORIA/3/85) ......................   30 1 hit  [viruses]        RecName: Full=Glycoprotein NB
. Influenza B virus (B/Memphis/3/89) ..............................   30 1 hit  [viruses]        RecName: Full=Glycoprotein NB

-------------------------------------------------------------------------------------------------------------------------------------------------------------

2) BLASTp versus NR, NCBI 


root
. Viruses                   [viruses]
. . unclassified Siphoviridae [viruses]
. . . Propionibacterium phage PAD20 ---------------------   51 2 hits [viruses]           Gp37 [Propionibacterium phage PAD20] >gi|260066552|gb|ACX30
. . . Propionibacterium phage PA6 .......................   48 2 hits [viruses]           gp37 [Propionibacterium phage PA6] >gi|91982943|gb|ABE68606
. . . Propionibacterium phage PAS50 .....................   47 2 hits [viruses]           Gp37 [Propionibacterium phage PAS50] >gi|260066598|gb|ACX30
. . unidentified phage ----------------------------------   43 1 hit  [viruses]           hypothetical protein 1013_scaffold3125_00045 [unidentified 
. Propionibacterium acnes HL096PA2 ----------------------   45 1 hit  [high GC Gram+]     hypothetical protein HMPREF9338_01036 [Propionibacterium ac
. Propionibacterium acnes HL096PA3 ......................   45 1 hit  [high GC Gram+]     hypothetical protein HMPREF9338_01036 [Propionibacterium ac
. Lactobacillus mali KCTC 3596 = DSM 20444 ..............   44 1 hit  [firmicutes]        hypothetical protein LmalK35_11471 [Lactobacillus mali KCTC
. Desulfovibrio magneticus RS-1 .........................   42 4 hits [d-proteobacteria]  hypothetical protein DMR_27350 [Desulfovibrio magneticus RS
. uncultured Termite group 1 bacterium phylotype Rs-D17 .   41 2 hits [bacteria]          hypothetical protein TGRD_565 [uncultured Termite group 1 b
. Desulfatibacillum alkenivorans AK-01 ..................   41 2 hits [d-proteobacteria]  unnamed protein product [Desulfatibacillum alkenivorans AK-
. Lachnospiraceae bacterium 7_1_58FAA ...................   39 2 hits [firmicutes]        hypothetical protein HMPREF0995_02017 [Lachnospiraceae bact
. Bacillus sp. SG-1 .....................................   38 2 hits [firmicutes]        6-phosphofructokinase [Bacillus sp. SG-1] >gi|148851226|gb|
. Pirellula staleyi DSM 6068 ............................   38 2 hits [planctomycetes]    hypothetical protein Psta_3407 [Pirellula staleyi DSM 6068]
. Vibrio splendidus 12B01 ...............................   38 2 hits [g-proteobacteria]  NADH oxidase [Vibrio splendidus 12B01] >gi|84378606|gb|EAP9
. Corynebacterium matruchotii ATCC 14266 ................   38 2 hits [high GC Gram+]     conserved hypothetical protein [Corynebacterium matruchotii
. Bacillus coahuilensis m4-4 ............................   38 1 hit  [firmicutes]        6-phosphofructokinase [Bacillus coahuilensis m4-4]
. Anaeromyxobacter dehalogenans 2CP-C ...................   38 2 hits [d-proteobacteria]  hypothetical protein Adeh_3553 [Anaeromyxobacter dehalogena
. Corynebacterium matruchotii ATCC 33806 ................   38 2 hits [high GC Gram+]     hypothetical protein CORMATOL_01472 [Corynebacterium matruc
. Aggregatibacter actinomycetemcomitans RhAA1 ...........   37 1 hit  [g-proteobacteria]  hypothetical protein RHAA1_05568 [Aggregatibacter actinomyc
. Bacillus sp. 2_A_57_CT2 ...............................   37 2 hits [firmicutes]        6-phosphofructokinase [Bacillus sp. 2_A_57_CT2] >gi|3173964
. Paenibacillus sp. Aloe-11 .............................   37 2 hits [firmicutes]        hypothetical protein WG8_1100 [Paenibacillus sp. Aloe-11] >
. Corynebacterium efficiens YS-314 ......................   37 4 hits [high GC Gram+]     RecB family exonuclease [Corynebacterium efficiens YS-314] 
. Desulfosporosinus meridiei DSM 13257 ..................   37 2 hits [firmicutes]        6-phosphofructokinase [Desulfosporosinus meridiei DSM 13257
. Paenibacillus polymyxa E681 ...........................   36 2 hits [firmicutes]        hypothetical protein PPE_01015 [Paenibacillus polymyxa E681
. Desulfitobacterium metallireducens DSM 15288 ..........   36 2 hits [firmicutes]        6-phosphofructokinase [Desulfitobacterium metallireducens D
. Dehalococcoides sp. CBDB1 .............................   36 2 hits [GNS bacteria]      cbdbA602 gene product [Dehalococcoides sp. CBDB1] >gi|73660
. Dehalococcoides sp. BAV1 ..............................   36 2 hits [GNS bacteria]      exonuclease-like protein [Dehalococcoides sp. BAV1] >gi|146
. Vibrio splendidus LGP32 ...............................   36 2 hits [g-proteobacteria]  NADH oxidase [Vibrio splendidus LGP32] >gi|218324160|emb|CA
. Bacillus sp. m3-13 ....................................   36 1 hit  [firmicutes]        6-phosphofructokinase [Bacillus sp. m3-13]
. Desmospora sp. 8437 ...................................   36 2 hits [firmicutes]        6-phosphofructokinase [Desmospora sp. 8437] >gi|332968085|g
. Bacillus cytotoxicus NVH 391-98 .......................   36 3 hits [firmicutes]        6-phosphofructokinase [Bacillus cytotoxicus NVH 391-98] >gi
. Desulfosporosinus youngiae DSM 17734 ..................   36 2 hits [firmicutes]        ABC-type sugar transport system, periplasmic component [Des
. Vibrio sp. MED222 .....................................   36 2 hits [g-proteobacteria]  NADH oxidase [Vibrio sp. MED222] >gi|85837587|gb|EAQ55699.1
. Bacillus sp. NRRL B-14911 .............................   36 2 hits [firmicutes]        6-phosphofructokinase [Bacillus sp. NRRL B-14911] >gi|89084
. Dehalococcoides sp. VS ................................   36 2 hits [GNS bacteria]      hypothetical protein DhcVS_558 [Dehalococcoides sp. VS] >gi
. Desulfosporosinus orientis DSM 765 ....................   36 2 hits [firmicutes]        unnamed protein product [Desulfosporosinus orientis DSM 765
. Streptomyces zinciresistens K42 .......................   36 2 hits [high GC Gram+]     ATP-dependent DNA helicase [Streptomyces zinciresistens K42
. Bacillus sp. 1NLA3E ...................................   35 2 hits [firmicutes]        6-phosphofructokinase [Bacillus sp. 1NLA3E] >gi|372452667|g
. Methanocella sp. HZ254 ................................   35 1 hit  [euryarchaeotes]    putative RecB family exonuclease [Methanocella sp. HZ254]
. Mannheimia haemolytica PHL213 .........................   33 2 hits [g-proteobacteria]  hypothetical protein MHA_2093 [Mannheimia haemolytica PHL21
. Diplosphaera colitermitum TAV2 ........................   36 2 hits [verrucomicrobia]   hypothetical protein ObacDRAFT_9284 [Diplosphaera colitermi

BLAST

PROTOCOL


1) BLASTp versus SWISSPROT, NCBI default parameters "1000 max target sequences" [7]

2) BLASTp versus NR, NCBI default parameters "1000 max target sequences" [7]

3) BLASTx versus NR, NCBI default parameters "1000 max target sequences" [8]


RESULTS ANALYSIS


Using BLASTp against the SWISSPROT database, the results had poor scores (≤ 36.6), few hits (1 hit each) and poor E-values (≥ 0.17) compared with BLASTp against NR (best E-value of 8e-05, score of 51.6 and a maximum of 2 hits). This is to be expected, since NR merges sequences from several source databases, whereas SwissProt is a single, smaller curated database.


However, an E-value of 8e-05 is not low enough to be completely reliable, so BLASTX was used in an attempt to improve the results. Unfortunately, its best results had E-values ≥ 0.011 and were hypothetical proteins. The BLASTP vs NR results were therefore regarded as the most reliable, although they do not establish homology with great certainty.
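
The comparison between the three searches comes down to ranking their best E-values against a significance cutoff. A minimal sketch follows, using the best E-values quoted in this analysis; the 1e-4 cutoff is borrowed from the Protein Domains section and is an assumption here, and the function name is hypothetical.

```python
E_VALUE_CUTOFF = 1e-4  # assumed cutoff, as used for the InterProScan result

# Best E-value reported for each search in this analysis.
best_hits = {
    "BLASTp vs SWISSPROT": 0.17,
    "BLASTp vs NR": 8e-05,
    "BLASTx vs NR": 0.011,
}


def rank_searches(hits, cutoff=E_VALUE_CUTOFF):
    """Sort searches by best E-value (lowest first) and flag significance."""
    ordered = sorted(hits.items(), key=lambda item: item[1])
    return [(name, evalue, evalue < cutoff) for name, evalue in ordered]


for name, evalue, ok in rank_searches(best_hits):
    print(f"{name:22s} E = {evalue:<8g} significant: {ok}")
```

Only the BLASTp vs NR search passes the cutoff, and only narrowly, which is why its result is treated as the most reliable without being taken as proof of homology.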


A BLASTp search was also performed for the other ORFs detected with SMS ORFinder (raw results in the Notepad), with the following outcome:


- The ORF in reading frame 2 on the reverse strand had poor E-values (≥ 4.6); furthermore, two of its four results are hypothetical proteins.

- The ORF in reading frame 3 on the reverse strand had a single result, a hypothetical protein with a poor E-value (8.5).


These results indicate that the selected ORF is the most appropriate for this line of research, since it produces the best results.


The best result, and the only one with a significant E-value, is gp37 (gene product 37) of Propionibacterium phage PAD20. gp37 has been described as having an exonucleolytic role[9].


The Lineage Report results are very poor: they do not show enough diversity to define outgroups or ingroups for a multiple alignment or to build a scientifically sound phylogenetic tree.


References




  • 9 Lood R., Collin M. (2011). Characterization and genome sequencing of two Propionibacterium acnes phages displaying pseudolysogeny. BMC Genomics. Vol. 12: 198.
RAW RESULTS

1) BLASTp versus SWISSPROT:

                                                                   Score     E
Sequences producing significant alignments:                       (Bits)  Value

sp|A7GTP4.1|K6PF_BACCN  RecName: Full=6-phosphofructokinase; S...  36.6    0.17  
sp|B7GGT1.1|K6PF_ANOFW  RecName: Full=6-phosphofructokinase; S...  35.8    0.33  
sp|Q5WEF6.1|K6PF_BACSK  RecName: Full=6-phosphofructokinase; S...  35.4    0.38  
sp|P00512.2|K6PF_BACST  RecName: Full=6-phosphofructokinase; S...  35.0    0.58 
sp|A9VJR1.1|K6PF_BACWK  RecName: Full=6-phosphofructokinase; S...  34.7    0.63  
sp|Q5KWB1.1|K6PF_GEOKA  RecName: Full=6-phosphofructokinase; S...  34.7    0.64  
sp|Q6HCT3.1|K6PF_BACHK  RecName: Full=6-phosphofructokinase; S...  34.7    0.66 
sp|B7IJZ8.1|K6PF_BACC2  RecName: Full=6-phosphofructokinase; S...  34.7    0.68  
sp|Q817F3.1|K6PF_BACCR  RecName: Full=6-phosphofructokinase; S...  34.7    0.70  
sp|C5D663.1|K6PF_GEOSW  RecName: Full=6-phosphofructokinase; S...  33.9    1.1   
sp|Q65G82.1|K6PF_BACLD  RecName: Full=6-phosphofructokinase; S...  33.9    1.2   
sp|P50074.1|GYRBR_STRSH  RecName: Full=DNA gyrase subunit B, n...  33.5    1.8  
sp|C0Z7W7.1|K6PF_BREBN  RecName: Full=6-phosphofructokinase; S...  33.1    1.8   
sp|Q836R3.1|K6PF_ENTFA  RecName: Full=6-phosphofructokinase; S...  33.1    1.9  
sp|Q93LR4.1|K6PF_BACSH  RecName: Full=6-phosphofructokinase; S...  33.1    2.3  
sp|P67908.1|VNB_INBLN  RecName: Full=Glycoprotein NB >sp|P6790...  30.8    4.0  
sp|P16204.1|VNB_INBSI  RecName: Full=Glycoprotein NB               30.4    4.2  
sp|P16192.1|VNB_INBHK  RecName: Full=Glycoprotein NB               30.4    4.2  
sp|P16206.1|VNB_INBUS  RecName: Full=Glycoprotein NB               30.4    4.3  
sp|P16208.1|VNB_INBVI  RecName: Full=Glycoprotein NB               30.4    4.4  
sp|Q043V2.1|K6PF_LACGA  RecName: Full=6-phosphofructokinase; S...  32.0    5.0   
sp|P16200.1|VNB_INBMF  RecName: Full=Glycoprotein NB               30.4    5.2  
sp|Q9K843.1|K6PF_BACHD  RecName: Full=6-phosphofructokinase; S...  32.0    5.3  
sp|Q74JM8.1|K6PF_LACJO  RecName: Full=6-phosphofructokinase; S...  32.0    5.5   
sp|O34529.1|K6PF_BACSU  RecName: Full=6-phosphofructokinase; S...  31.6    5.8  

ALIGNMENTS

>sp|A7GTP4.1|K6PF_BACCN  RecName: Full=6-phosphofructokinase; Short=Phosphofructokinase; 
AltName: Full=Phosphohexokinase
Length=319

 GENE ID: 5343947 Bcer98_3283 | 6-phosphofructokinase
[Bacillus cereus subsp. cytotoxis NVH 391-98] (10 or fewer PubMed links)

 Score = 36.6 bits (83),  Expect = 0.17, Method: Compositional matrix adjust.
 Identities = 23/78 (29%), Positives = 39/78 (50%), Gaps = 5/78 (6%)

Query  89   EKDMVEFYRHGVEILKFIRK----KRAQYFSKKGYELVGIETPIDYDLPNKIKFVGFLDV  144
            EK + +  +HG+E L  I      + A+  +++G+  VG+   ID D+P     +GF D 
Sbjct  83   EKGIEQLKKHGIEGLVVIGGDGSYQGAKKLTEQGFPCVGVPGTIDNDIPGTDFTIGF-DT  141

Query  145  VIRDTVRDVIKIYDIKTS  162
             +   +  + KI D  TS
Sbjct  142  ALNTVIDAIDKIRDTATS  159


>sp|B7GGT1.1|K6PF_ANOFW  RecName: Full=6-phosphofructokinase; Short=Phosphofructokinase; 
AltName: Full=Phosphohexokinase
Length=319

 GENE ID: 7036756 pfkA | 6-phosphofructokinase [Anoxybacillus flavithermus WK1]
(10 or fewer PubMed links)

 Score = 35.8 bits (81),  Expect = 0.33, Method: Compositional matrix adjust.
 Identities = 23/77 (30%), Positives = 37/77 (48%), Gaps = 5/77 (6%)

Query  90   KDMVEFYRHGVEILKFIRK----KRAQYFSKKGYELVGIETPIDYDLPNKIKFVGFLDVV  145
            K + +  +HG+E L  I      + A+  ++ GY  VG+   ID D+P     +GF D  
Sbjct  84   KGIEQLKKHGIEGLVVIGGDGSYQGAKKLTEHGYPCVGVPGTIDNDIPGTDFTIGF-DTA  142

Query  146  IRDTVRDVIKIYDIKTS  162
            +   +  + KI D  TS
Sbjct  143  LNTVIDAIDKIRDTATS  159


>sp|Q5WEF6.1|K6PF_BACSK  RecName: Full=6-phosphofructokinase; Short=Phosphofructokinase; 
AltName: Full=Phosphohexokinase
Length=319

 GENE ID: 3202098 pfkA | 6-phosphofructokinase [Bacillus clausii KSM-K16]
(10 or fewer PubMed links)

 Score = 35.4 bits (80),  Expect = 0.38, Method: Compositional matrix adjust.
 Identities = 28/97 (29%), Positives = 42/97 (43%), Gaps = 17/97 (18%)

Query  82   GGEMF----CEE--------KDMVEFYRHGVEILKFIRK----KRAQYFSKKGYELVGIE  125
            GG M     CEE        K + +  + G+E L  I      + AQ  +K G+  +G+ 
Sbjct  64   GGTMLYTARCEEFKTLEGQQKGIEQLKKFGIEGLVVIGGDGSYRGAQQLTKHGFPTIGVP  123

Query  126  TPIDYDLPNKIKFVGFLDVVIRDTVRDVIKIYDIKTS  162
              ID D+P     +GF D  +   +  + KI D  TS
Sbjct  124  GTIDNDIPGTDFTIGF-DTALNTVIDAIDKIRDTATS  159


>sp|P00512.2|K6PF_BACST  RecName: Full=6-phosphofructokinase; Short=Phosphofructokinase; 
AltName: Full=Phosphohexokinase
Length=319

 Score = 35.0 bits (79),  Expect = 0.58, Method: Compositional matrix adjust.
 Identities = 22/79 (28%), Positives = 39/79 (49%), Gaps = 5/79 (6%)

Query  88   EEKDMVEFYRHGVEILKFIRK----KRAQYFSKKGYELVGIETPIDYDLPNKIKFVGFLD  143
            ++K + +  +HG+E L  I      + A+  ++ G+  VG+   ID D+P     +GF D
Sbjct  82   QKKGIEQLKKHGIEGLVVIGGDGSYQGAKKLTEHGFPCVGVPGTIDNDIPGTDFTIGF-D  140

Query  144  VVIRDTVRDVIKIYDIKTS  162
              +   +  + KI D  TS
Sbjct  141  TALNTVIDAIDKIRDTATS  159


>sp|A9VJR1.1|K6PF_BACWK  RecName: Full=6-phosphofructokinase; Short=Phosphofructokinase; 
AltName: Full=Phosphohexokinase
Length=319

 GENE ID: 5844644 BcerKBAB4_4429 | 6-phosphofructokinase
[Bacillus weihenstephanensis KBAB4]

 Score = 34.7 bits (78),  Expect = 0.63, Method: Compositional matrix adjust.
 Identities = 22/77 (29%), Positives = 38/77 (49%), Gaps = 5/77 (6%)

Query  90   KDMVEFYRHGVEILKFIRK----KRAQYFSKKGYELVGIETPIDYDLPNKIKFVGFLDVV  145
            K + +  +HG+E L  I      + A+  +++G+  VG+   ID D+P     +GF D  
Sbjct  84   KGIEQLKKHGIEGLVVIGGDGSYQGAKKLTEQGFPCVGVPGTIDNDIPGTDFTIGF-DTA  142

Query  146  IRDTVRDVIKIYDIKTS  162
            +   +  + KI D  TS
Sbjct  143  LNTVIDAIDKIRDTATS  159


>sp|Q5KWB1.1|K6PF_GEOKA  RecName: Full=6-phosphofructokinase; Short=Phosphofructokinase; 
AltName: Full=Phosphohexokinase
Length=319

 GENE ID: 3184307 pfkA | 6-phosphofructokinase [Geobacillus kaustophilus HTA426]
(10 or fewer PubMed links)

 Score = 34.7 bits (78),  Expect = 0.64, Method: Compositional matrix adjust.
 Identities = 22/79 (28%), Positives = 39/79 (49%), Gaps = 5/79 (6%)

Query  88   EEKDMVEFYRHGVEILKFIRK----KRAQYFSKKGYELVGIETPIDYDLPNKIKFVGFLD  143
            ++K + +  +HG+E L  I      + A+  ++ G+  VG+   ID D+P     +GF D
Sbjct  82   QKKGIEQLKKHGIEGLVVIGGDGSYQGAKKLTEHGFPCVGVPGTIDNDIPGTDFTIGF-D  140

Query  144  VVIRDTVRDVIKIYDIKTS  162
              +   +  + KI D  TS
Sbjct  141  TALNTVIDAIDKIRDTATS  159


>sp|Q6HCT3.1|K6PF_BACHK  RecName: Full=6-phosphofructokinase; Short=Phosphofructokinase; 
AltName: Full=Phosphohexokinase
 sp|Q72ZD8.1|K6PF_BACC1  RecName: Full=6-phosphofructokinase; Short=Phosphofructokinase; 
AltName: Full=Phosphohexokinase
 sp|Q633J8.1|K6PF_BACCZ  RecName: Full=6-phosphofructokinase; Short=Phosphofructokinase; 
AltName: Full=Phosphohexokinase
 9 more sequence titles
 Length=319

 Score = 34.7 bits (78),  Expect = 0.66, Method: Compositional matrix adjust.
 Identities = 22/77 (29%), Positives = 38/77 (49%), Gaps = 5/77 (6%)

Query  90   KDMVEFYRHGVEILKFIRK----KRAQYFSKKGYELVGIETPIDYDLPNKIKFVGFLDVV  145
            K + +  +HG+E L  I      + A+  +++G+  VG+   ID D+P     +GF D  
Sbjct  84   KGIEQLKKHGIEGLVVIGGDGSYQGAKKLTEQGFPCVGVPGTIDNDIPGTDFTIGF-DTA  142

Query  146  IRDTVRDVIKIYDIKTS  162
            +   +  + KI D  TS
Sbjct  143  LNTVIDAIDKIRDTATS  159


>sp|B7IJZ8.1|K6PF_BACC2  RecName: Full=6-phosphofructokinase; Short=Phosphofructokinase; 
AltName: Full=Phosphohexokinase
 sp|B7HFB3.1|K6PF_BACC4  RecName: Full=6-phosphofructokinase; Short=Phosphofructokinase; 
AltName: Full=Phosphohexokinase
Length=319

 GENE ID: 7183255 pfkA | 6-phosphofructokinase [Bacillus cereus G9842]

 Score = 34.7 bits (78),  Expect = 0.68, Method: Compositional matrix adjust.
 Identities = 22/77 (29%), Positives = 38/77 (49%), Gaps = 5/77 (6%)

Query  90   KDMVEFYRHGVEILKFIRK----KRAQYFSKKGYELVGIETPIDYDLPNKIKFVGFLDVV  145
            K + +  +HG+E L  I      + A+  +++G+  VG+   ID D+P     +GF D  
Sbjct  84   KGIEQLKKHGIEGLVVIGGDGSYQGAKKLTEQGFPCVGVPGTIDNDIPGTDFTIGF-DTA  142

Query  146  IRDTVRDVIKIYDIKTS  162
            +   +  + KI D  TS
Sbjct  143  LNTVIDAIDKIRDTATS  159


>sp|Q817F3.1|K6PF_BACCR  RecName: Full=6-phosphofructokinase; Short=Phosphofructokinase; 
AltName: Full=Phosphohexokinase
Length=319

 GENE ID: 1206945 pfkA | 6-phosphofructokinase [Bacillus cereus ATCC 14579]
(10 or fewer PubMed links)

 Score = 34.7 bits (78),  Expect = 0.70, Method: Compositional matrix adjust.
 Identities = 22/77 (29%), Positives = 38/77 (49%), Gaps = 5/77 (6%)

Query  90   KDMVEFYRHGVEILKFIRK----KRAQYFSKKGYELVGIETPIDYDLPNKIKFVGFLDVV  145
            K + +  +HG+E L  I      + A+  +++G+  VG+   ID D+P     +GF D  
Sbjct  84   KGIEQLKKHGIEGLVVIGGDGSYQGAKKLTEQGFPCVGVPGTIDNDIPGTDFTIGF-DTA  142

Query  146  IRDTVRDVIKIYDIKTS  162
            +   +  + KI D  TS
Sbjct  143  LNTVIDAIDKIRDTATS  159


>sp|C5D663.1|K6PF_GEOSW  RecName: Full=6-phosphofructokinase; Short=Phosphofructokinase; 
AltName: Full=Phosphohexokinase
Length=319

 GENE ID: 7976508 GWCH70_2686 | 6-phosphofructokinase [Geobacillus sp. WCH70]

 Score = 33.9 bits (76),  Expect = 1.1, Method: Compositional matrix adjust.
 Identities = 22/77 (29%), Positives = 37/77 (48%), Gaps = 5/77 (6%)

Query  90   KDMVEFYRHGVEILKFIRK----KRAQYFSKKGYELVGIETPIDYDLPNKIKFVGFLDVV  145
            K + +  +HG+E L  I      + A+  ++ G+  VG+   ID D+P     +GF D  
Sbjct  84   KGIEQLKKHGIEGLVVIGGDGSYQGAKKLTEHGFPCVGVPGTIDNDIPGTDFTIGF-DTA  142

Query  146  IRDTVRDVIKIYDIKTS  162
            +   +  + KI D  TS
Sbjct  143  LNTVIDAIDKIRDTATS  159


>sp|Q65G82.1|K6PF_BACLD  RecName: Full=6-phosphofructokinase; Short=Phosphofructokinase; 
AltName: Full=Phosphohexokinase
Length=319

 GENE ID: 3028939 pfkA | 6-phosphofructokinase
[Bacillus licheniformis ATCC 14580] (10 or fewer PubMed links)
 GENE ID: 3098675 pfkA | 6-phosphofructokinase
[Bacillus licheniformis ATCC 14580] (10 or fewer PubMed links)

 Score = 33.9 bits (76),  Expect = 1.2, Method: Compositional matrix adjust.
 Identities = 23/78 (29%), Positives = 36/78 (46%), Gaps = 5/78 (6%)

Query  89   EKDMVEFYRHGVEILKFIRKKRAQYFSKK----GYELVGIETPIDYDLPNKIKFVGFLDV  144
            EK +    ++G+E L  I    +   +KK    G+  VG+   ID D+P     +GF D 
Sbjct  83   EKGIANLKKYGIEGLVVIGGDGSYMGAKKLTEHGFPCVGVPGTIDNDIPGTDLTIGF-DT  141

Query  145  VIRDTVRDVIKIYDIKTS  162
             +   +  + KI D  TS
Sbjct  142  ALNTVIDAIDKIRDTATS  159


>sp|P50074.1|GYRBR_STRSH  RecName: Full=DNA gyrase subunit B, novobiocin-resistant
Length=677

 Score = 33.5 bits (75),  Expect = 1.8, Method: Compositional matrix adjust.
 Identities = 30/122 (25%), Positives = 51/122 (42%), Gaps = 4/122 (3%)

Query  7    MWAQCPHKWKTAYVDGHRKFTDSIHTMFGTSMHEVIQTFLTVMYNDTAKLAEQLPLEDML  66
            + A+   +W   Y D    + ++IHT  G +  E  +T LT + N  A+    L  +D  
Sbjct  288  LSAEIALQWNGQYTDSVYSYANAIHTHEGGTHEEGFRTALTTVVNRYAREKRLLRDKDAN  347

Query  67   LTR--MKRNFEEIVKANGGEMFCEEKDMVEFYRHGVEIL--KFIRKKRAQYFSKKGYELV  122
            L+   ++     I+  N GE   E +   +     V  L  K + +  A +F +   E V
Sbjct  348  LSGEDIREGLTAIISVNVGEPQFEGQTKTKLGNTEVRTLLQKIVHEHLADWFDRNPNEAV  407

Query  123  GI  124
             I
Sbjct  408  DI  409

------------------------------------------------------------------------------------------------------------------------

2) BLASTp versus NR

                                                                   Score     E
Sequences producing significant alignments:                       (Bits)  Value

ref|YP_004414787.1|  Gp37 [Propionibacterium phage PAD20] >gb|...  51.6    8e-05 
ref|YP_001285613.1|  gp37 [Propionibacterium phage PA6] >gb|AB...  48.1    0.001 
ref|YP_004414741.1|  Gp37 [Propionibacterium phage PAS50] >gb|...  47.8    0.002 
gb|EGE72742.1|  hypothetical protein HMPREF9338_01036 [Propion...  45.1    0.010
ref|ZP_09449261.1|  hypothetical protein LmalK35_11471 [Lactob...  44.7    0.016
gb|AFB75615.1|  hypothetical protein 1013_scaffold3125_00045 [...  43.9    0.021
ref|YP_002954112.1|  hypothetical protein DMR_27350 [Desulfovi...  42.4    0.070 
ref|YP_002955568.1|  hypothetical protein DMR_41910 [Desulfovi...  42.4    0.072 
ref|YP_001956509.1|  hypothetical protein TGRD_565 [uncultured...  41.6    0.13  
ref|YP_002430514.1|  unnamed protein product [Desulfatibacillu...  41.2    0.16  
ref|ZP_09531181.1|  hypothetical protein HMPREF0995_02017 [Lac...  39.3    0.71 
ref|ZP_01859639.1|  6-phosphofructokinase [Bacillus sp. SG-1] ...  38.9    0.93 
ref|YP_003371931.1|  hypothetical protein Psta_3407 [Pirellula...  38.9    1.0   
ref|ZP_00989528.1|  NADH oxidase [Vibrio splendidus 12B01] >gb...  38.9    1.3  
ref|ZP_07403575.1|  conserved hypothetical protein [Corynebact...  38.5    1.4  
ref|ZP_03227332.1|  6-phosphofructokinase [Bacillus coahuilens...  38.5    1.4  
ref|YP_466756.1|  hypothetical protein Adeh_3553 [Anaeromyxoba...  38.1    1.5   
ref|ZP_03710644.1|  hypothetical protein CORMATOL_01472 [Coryn...  38.1    1.6  
gb|EHK90219.1|  hypothetical protein RHAA1_05568 [Aggregatibac...  37.4    1.7  
ref|ZP_08005947.1|  6-phosphofructokinase [Bacillus sp. 2_A_57...  37.7    2.2  
ref|ZP_09772576.1|  hypothetical protein WG8_1100 [Paenibacill...  37.7    2.5  
ref|ZP_05750143.1|  RecB family exonuclease [Corynebacterium e...  37.7    2.6  
ref|NP_738239.1|  unnamed protein product [Corynebacterium eff...  37.4    2.7   
ref|ZP_08980738.1|  6-phosphofructokinase [Desulfosporosinus m...  37.4    3.6  
ref|YP_003869401.1|  hypothetical protein PPE_01015 [Paenibaci...  37.0    4.1   
ref|ZP_08976929.1|  6-phosphofructokinase [Desulfitobacterium ...  37.0    4.6  
ref|YP_307697.1|  cbdbA602 gene product [Dehalococcoides sp. C...  37.0    5.0   
ref|YP_001214056.1|  exonuclease-like protein [Dehalococcoides...  37.0    5.1   
ref|YP_002394711.1|  NADH oxidase [Vibrio splendidus LGP32] >e...  37.0    5.3   
ref|ZP_07709432.1|  6-phosphofructokinase [Bacillus sp. m3-13]     36.6    5.6  
ref|ZP_08466187.1|  6-phosphofructokinase [Desmospora sp. 8437...  36.6    5.7  
ref|YP_001376497.1|  6-phosphofructokinase [Bacillus cytotoxic...  36.6    6.0   
ref|ZP_09653703.1|  ABC-type sugar transport system, periplasm...  36.6    6.7  
ref|ZP_01063020.1|  NADH oxidase [Vibrio sp. MED222] >gb|EAQ55...  36.6    6.8  
ref|ZP_01173236.1|  6-phosphofructokinase [Bacillus sp. NRRL B...  36.2    6.8  
ref|YP_003330038.1|  hypothetical protein DhcVS_558 [Dehalococ...  36.6    7.0   
ref|YP_004971370.1|  unnamed protein product [Desulfosporosinu...  36.2    7.3   
ref|ZP_08804370.1|  ATP-dependent DNA helicase [Streptomyces z...  36.6    8.6  
ref|ZP_09600276.1|  6-phosphofructokinase [Bacillus sp. 1NLA3E...  35.8    9.0  
gb|AFC99724.1|  putative RecB family exonuclease [Methanocella...  35.8    9.1  
ref|ZP_04978594.1|  hypothetical protein MHA_2093 [Mannheimia ...  33.9    9.2  
ref|ZP_03724162.1|  hypothetical protein ObacDRAFT_9284 [Diplo...  36.2    9.7  
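
Note on reading the hit table above: the last two columns of each row are the bit score and the E-value, and only the first few rows have E-values well below 1. The following is a minimal sketch, not part of the original annotation, of how such summary rows could be parsed and filtered. The function names and the cutoff of 0.005 are illustrative assumptions, and the sample rows are copied from the table above.

```python
def parse_hit_line(line):
    """Split one BLAST summary row into (accession, description, bits, evalue).

    Returns None for header or blank lines, whose last two fields are not numbers.
    """
    fields = line.split()
    if len(fields) < 4:
        return None
    try:
        bits, evalue = float(fields[-2]), float(fields[-1])
    except ValueError:
        return None
    return fields[0], " ".join(fields[1:-2]), bits, evalue


def significant_hits(lines, cutoff):
    """Keep the rows whose E-value is at or below the (assumed) cutoff."""
    hits = []
    for line in lines:
        hit = parse_hit_line(line)
        if hit is not None and hit[3] <= cutoff:
            hits.append(hit)
    return hits


# Rows copied from the BLASTp versus NR table; the cutoff is an arbitrary example.
rows = [
    "Sequences producing significant alignments:                       (Bits)  Value",
    "ref|YP_004414787.1|  Gp37 [Propionibacterium phage PAD20] >gb|...  51.6    8e-05",
    "ref|YP_001285613.1|  gp37 [Propionibacterium phage PA6] >gb|AB...  48.1    0.001",
    "ref|YP_004414741.1|  Gp37 [Propionibacterium phage PAS50] >gb|...  47.8    0.002",
    "ref|ZP_01859639.1|  6-phosphofructokinase [Bacillus sp. SG-1] ...  38.9    0.93",
]

for accession, description, bits, evalue in significant_hits(rows, cutoff=0.005):
    print(accession, bits, evalue)
```

With this example cutoff only the three Propionibacterium phage Gp37 rows are kept; the 6-phosphofructokinase row (E = 0.93) is dropped.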

ALIGNMENTS

>ref|YP_004414787.1|  Gp37 [Propionibacterium phage PAD20]
 gb|ACX30831.1|  Gp37 [Propionibacterium phage PAD20]
Length=309

 GENE ID: 10498672 PaP-PAD20_gp37 | Gp37 [Propionibacterium phage PAD20]

 Score = 51.6 bits (122),  Expect = 8e-05, Method: Compositional matrix adjust.
 Identities = 48/184 (26%), Positives = 81/184 (44%), Gaps = 26/184 (14%)

Query  1    SYSQYSMWAQCPHKWKTAYVDGHRKFTDSIHTMFGTSMHEVIQTFLTVMYNDTAKLAEQL  60
            SYS  S WA+C  KW+  +   H +      T+ G+++H + + +   +YN     AE  
Sbjct  17   SYSSLSQWAECGEKWRLQH-GYHTQHHTWYATIAGSAIHHITEQYDLHLYNP----AEYP  71

Query  61   PLEDMLLTRMKRNFE---EIVKANGGEM---------FCE-----EKDMVEFYRHG---V  100
             L D L +  K  F     + ++ G E+          CE     +KD   +  +G   V
Sbjct  72   ALPDKLAS-FKNIFATQVALAESEGTEIKPSGRICKNMCESGGPHKKDYNWWMMYGPTFV  130

Query  101  EILKFIRKKRAQYFSKKGYELVGIETPIDYDLPNKIKFVGFLDVVIRDTVRDVIKIYDIK  160
            +  K  R+   +Y +       GIE P++  LP+  + VG++D +  DT      I D+K
Sbjct  131  DRWKTWRRNHPEYITAILDGKPGIEYPVETTLPDGTQIVGYIDRIFTDTDTGETFILDLK  190

Query  161  TSTM  164
            T  +
Sbjct  191  TGRL  194
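
Note on the statistics line of the alignment above: the denominator in "Identities", "Positives" and "Gaps" is the number of alignment columns (gaps included), not the length of the query or of the subject. The sketch below is an illustration added here, not part of the BLAST output; it recomputes the percentages from the raw counts. The helper name `alignment_stats` is hypothetical.

```python
import re

# Matches e.g. "Identities = 48/184" and captures the label and both integers.
STAT_PATTERN = re.compile(r"(Identities|Positives|Gaps) = (\d+)/(\d+)")


def alignment_stats(line):
    """Return {label: (count, alignment_columns)} for one BLAST statistics line."""
    return {name: (int(n), int(d)) for name, n, d in STAT_PATTERN.findall(line)}


# Statistics line copied from the Gp37 [Propionibacterium phage PAD20] alignment.
line = " Identities = 48/184 (26%), Positives = 81/184 (44%), Gaps = 26/184 (14%)"

for name, (count, columns) in alignment_stats(line).items():
    print(f"{name}: {count}/{columns} = {100 * count / columns:.1f}%")
```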


>ref|YP_001285613.1|  gp37 [Propionibacterium phage PA6]
 gb|ABE68606.1|  gp37 [Propionibacterium phage PA6]
Length=315

 GENE ID: 5247065 PaP-PA6_gp37 | gp37 [Propionibacterium phage PA6]
(10 or fewer PubMed links)

 Score = 48.1 bits (113),  Expect = 0.001, Method: Compositional matrix adjust.
 Identities = 47/183 (26%), Positives = 79/183 (43%), Gaps = 24/183 (13%)

Query  1    SYSQYSMWAQCPHKWKTAYVDGHRKFTDSIHTMFGTSMHEVIQTFLTVMYNDTAKLAEQL  60
            SYS  + WA+C  KW+ A+   H +      T+ G+++H + + +   +YN     AE  
Sbjct  17   SYSSLTQWAECGEKWRLAH-GYHAQHHTWYATIAGSAIHHITEQYDLHLYNP----AEYP  71

Query  61   PLEDML--LTRMKRNFEEIVKANGGEM---------FCE-----EKDMVEFYRHG---VE  101
             L D L   T +      + ++ G  +          CE     +KD   +  +G   V+
Sbjct  72   ALPDKLSSFTNIFDTQIALAESEGTNIKPSGRICKNMCESGGPHKKDYNWWMMYGPTFVD  131

Query  102  ILKFIRKKRAQYFSKKGYELVGIETPIDYDLPNKIKFVGFLDVVIRDTVRDVIKIYDIKT  161
              K  R+   +Y +       GIE P++  L +  K VG++D V  DT      I D+KT
Sbjct  132  RWKTWRRNHPEYATAVIDGQPGIEYPVETTLDDDTKIVGYIDRVFTDTDTGETFILDLKT  191

Query  162  STM  164
              +
Sbjct  192  GRL  194


>ref|YP_004414741.1|  Gp37 [Propionibacterium phage PAS50]
 gb|ACX30876.1|  Gp37 [Propionibacterium phage PAS50]
Length=343

 GENE ID: 10498625 PaP-PAS50_gp37 | Gp37 [Propionibacterium phage PAS50]

 Score = 47.8 bits (112),  Expect = 0.002, Method: Compositional matrix adjust.
 Identities = 47/184 (26%), Positives = 79/184 (43%), Gaps = 26/184 (14%)

Query  1    SYSQYSMWAQCPHKWKTAYVDGHRKFTDSIHTMFGTSMHEVIQTFLTVMYNDTAKLAEQL  60
            SYS  S WA+C  KW+ ++   H +      T+ G+++H + + +   +Y+     AE L
Sbjct  17   SYSSLSQWAECGEKWRLSH-GYHAQHHTWYATIAGSAIHRITEQYDLHLYSP----AEYL  71

Query  61   PLEDMLLTRMKRNFEEIVKANGGEM------------FCE-----EKDMVEFYRHG---V  100
             L D L +  K  F+  V     E              CE     +KD   +  +G   V
Sbjct  72   ALPDKLAS-FKNVFDTQVALAESEGTRHKPSGRICKNMCESGGPHKKDYNWWMMYGPTFV  130

Query  101  EILKFIRKKRAQYFSKKGYELVGIETPIDYDLPNKIKFVGFLDVVIRDTVRDVIKIYDIK  160
            +  K  R+   +Y +       GIE P++  L +  + VG++D +  DT      I D+K
Sbjct  131  DRWKTWRRNHPEYITAVIDGKPGIEYPVETTLDDGTQIVGYIDRIFTDTDTGETFILDLK  190

Query  161  TSTM  164
            T  +
Sbjct  191  TGRL  194


>gb|EGE72742.1|  hypothetical protein HMPREF9338_01036 [Propionibacterium acnes 
HL096PA2]
 gb|EGE74950.1|  hypothetical protein HMPREF9337_00814 [Propionibacterium acnes 
HL096PA3]
Length=315

 Score = 45.1 bits (105),  Expect = 0.010, Method: Compositional matrix adjust.
 Identities = 45/179 (25%), Positives = 83/179 (46%), Gaps = 16/179 (9%)

Query  1    SYSQYSMWAQCPHKWKTAYVDGHRKFTDSIHTMFGTSMHEVIQTFLTVMYN--DTAKLAE  58
            SYS  S WA+C  KW+  +   H +      T+ G+++H + + +   +YN  +  +L +
Sbjct  17   SYSSLSQWAECGEKWRLQH-GYHTQHHTWYATIAGSAIHHITEQYDLHLYNPDEYPELPD  75

Query  59   QLP-LEDMLLTRMKRNFEEI--VKANG--GEMFCE-----EKDMVEFYRHG---VEILKF  105
            +L    ++  T++     E   +K +G   +  CE     +KD   +  +G   V+  K 
Sbjct  76   KLSSFTNIFDTQVALAESEGTNIKPSGRVCKNMCESGGPNKKDYNWWMMYGPIFVDRWKT  135

Query  106  IRKKRAQYFSKKGYELVGIETPIDYDLPNKIKFVGFLDVVIRDTVRDVIKIYDIKTSTM  164
             R+   +Y +       GIE P++  L +  + VG++D V  DT      I D+KT  +
Sbjct  136  WRRNHPEYATAVIDGQPGIEYPVETTLQDGTQIVGYIDRVFTDTNTGETFILDLKTGRL  194


>ref|ZP_09449261.1|  hypothetical protein LmalK35_11471 [Lactobacillus mali KCTC 3596 
= DSM 20444]
Length=386

 Score = 44.7 bits (104),  Expect = 0.016, Method: Compositional matrix adjust.
 Identities = 52/212 (25%), Positives = 96/212 (45%), Gaps = 38/212 (18%)

Query  1    SYSQYSMWAQCPHKWKTAYVDGHRKFTDSIHTMFGTSMHEVIQTFLTVMYNDTAKLAEQL  60
            S+S+ S + +CP  +   Y++  R  TD+++T+FG+  HE+IQ +L       AK  ++ 
Sbjct  22   SFSRISTFLECPWAYNMLYIEKRRINTDNVYTIFGSECHEIIQDYL-------AKKIDRA  74

Query  61   PLEDMLLTRMKR------NFE-EIVKANGGEMFCEEKDMVEFYRHGVEILKFIRKKR---  110
             +     + ++R      +F+ +  K   G +     ++  +++H   +   +R ++   
Sbjct  75   EMSKSWASFVERWEDDPTSFQFDTKKIESGYL----DNLTHYFKHTEGLNYPVRNEKPVI  130

Query  111  AQYFSKKGYELVGIETPIDYDLPNKIKFVGFLDVVIRDTVRDVIKIYDIKTSTMGWNKWM  170
            A+   K G  LV               FVG++D    D   + + + D KTS+       
Sbjct  131  AKLHDKDGKLLV---------------FVGYVDSEYTDEDGN-LHLIDFKTSSKTTFTPK  174

Query  171  KADKLKSDQLLLYKQFYSKQYNHPLDKIEVEF  202
               K KS QLLLY     ++   P DKI+ +F
Sbjct  175  NLPK-KSMQLLLYAIAEHQRTGIPYDKIKCKF  205


>gb|AFB75615.1|  hypothetical protein 1013_scaffold3125_00045 [unidentified phage]
Length=261

 Score = 43.9 bits (102),  Expect = 0.021, Method: Compositional matrix adjust.
 Identities = 41/147 (28%), Positives = 72/147 (49%), Gaps = 29/147 (20%)

Query  1    SYSQYSMWAQCPHKWKTAYVDGHRKF--TDSIHTMFGTSMHEVIQTFLTVMYNDTAKLAE  58
            SYS+   +  CP++W   Y+   +KF   D   + +GT MH++I+     +Y+   K   
Sbjct  16   SYSRIKAFEDCPYRWYLKYI---KKFHGKDMFFSSYGTFMHKLIE-----LYHKGEKTPR  67

Query  59   QLPLEDMLLTRMKRNFEEIVKANGGEMFCEEKDMVEFYRHGVEILKFIRKKRAQYFSKKG  118
            Q  + DM L   K   E + +A   ++F        ++  G++ LK +     Q F    
Sbjct  68   Q--IVDMYLQDFKT--EVVGRAPNRKVFS------SYFTGGLQYLKAL-----QPFP---  109

Query  119  YELVGIETPIDYDLPNKIKFVGFLDVV  145
            Y +VG+E  +D+ + N I FVG++D +
Sbjct  110  YGMVGVEKKVDF-VVNGIPFVGYIDFL  135


>ref|YP_002954112.1|  hypothetical protein DMR_27350 [Desulfovibrio magneticus RS-1]
 dbj|BAH76226.1|  hypothetical protein [Desulfovibrio magneticus RS-1]
Length=265

 GENE ID: 7981838 DMR_27350 | hypothetical protein
[Desulfovibrio magneticus RS-1] (10 or fewer PubMed links)

 Score = 42.4 bits (98),  Expect = 0.070, Method: Compositional matrix adjust.
 Identities = 44/184 (24%), Positives = 80/184 (43%), Gaps = 14/184 (8%)

Query  1    SYSQYSMWAQCPHKWKTAYVDGHRKFTDSIHTMFGTSMHEVIQTFLTVMYNDTAKLAEQL  60
            S S  + +  C  ++K A +D       +   +FGT++H++I+     +Y+    +  +L
Sbjct  11   SASSINSYLDCGLQFKFAKIDKREPEAIAEALIFGTTIHKIIE-----LYHHERSIGTRL  65

Query  61   PLEDMLLTRMKRNFEEIVKANGGEMFCEEKDMVEFYRHGVEILKFIRKKRAQYFSKKGYE  120
             + D+     K   E   +  G   F E KD       G  +L          F + G +
Sbjct  66   SVMDVTAAFEKCYTEAFEQQAGKIQFKEGKDFDSTLLEGKSLL----ATYVTQFPETGLK  121

Query  121  LVGIETPIDYDLPN-KIKFVGFLDVVIRDTVRDVIKIYDIKTSTMGWNKWMKADKLKSDQ  179
            ++G+E    + +    I  VG +D+V  D   +++ I D KTS+  ++     D  KS Q
Sbjct  122  VIGLEKAFSFQIEGVPIPIVGVMDMVEEDAGGNIV-IVDHKTSSRSYS---SDDIDKSLQ  177

Query  180  LLLY  183
            L +Y
Sbjct  178  LTIY  181


>ref|YP_002955568.1|  hypothetical protein DMR_41910 [Desulfovibrio magneticus RS-1]
 dbj|BAH77682.1|  hypothetical protein [Desulfovibrio magneticus RS-1]
Length=265

 GENE ID: 7980637 DMR_41910 | hypothetical protein
[Desulfovibrio magneticus RS-1] (10 or fewer PubMed links)

 Score = 42.4 bits (98),  Expect = 0.072, Method: Compositional matrix adjust.
 Identities = 44/184 (24%), Positives = 80/184 (43%), Gaps = 14/184 (8%)

Query  1    SYSQYSMWAQCPHKWKTAYVDGHRKFTDSIHTMFGTSMHEVIQTFLTVMYNDTAKLAEQL  60
            S S  + +  C  ++K A +D       +   +FGT++H++I+     +Y+    +  +L
Sbjct  11   SASSINSYLDCGLQFKFAKIDKREPEAIAEALIFGTTIHKIIE-----LYHHERSIGTRL  65

Query  61   PLEDMLLTRMKRNFEEIVKANGGEMFCEEKDMVEFYRHGVEILKFIRKKRAQYFSKKGYE  120
             + D+     K   E   +  G   F E KD       G  +L          F + G +
Sbjct  66   SVMDVTAAFEKCYTEAFEQQAGKIQFKEGKDFDSTLLEGKSLL----ATYVTQFPETGLK  121

Query  121  LVGIETPIDYDLPN-KIKFVGFLDVVIRDTVRDVIKIYDIKTSTMGWNKWMKADKLKSDQ  179
            ++G+E    + +    I  VG +D+V  D   +++ I D KTS+  ++     D  KS Q
Sbjct  122  VIGLEKAFSFQIEGVPIPIVGVMDMVEEDAGGNIV-IVDHKTSSRSYS---SDDIDKSLQ  177

Query  180  LLLY  183
            L +Y
Sbjct  178  LTIY  181


>ref|YP_001956509.1|  hypothetical protein TGRD_565 [uncultured Termite group 1 bacterium 
phylotype Rs-D17]
 dbj|BAG14048.1|  conserved hypothetical protein [uncultured Termite group 1 bacterium 
phylotype Rs-D17]
Length=248

 GENE ID: 6373291 TGRD_565 | hypothetical protein
[uncultured Termite group 1 bacterium phylotype Rs-D17]
(10 or fewer PubMed links)

 Score = 41.6 bits (96),  Expect = 0.13, Method: Compositional matrix adjust.
 Identities = 57/212 (27%), Positives = 90/212 (42%), Gaps = 31/212 (15%)

Query  1    SYSQYSMWAQCPHKWKTAYVDG-HRKFTDSIHTMFGTSMHEVIQTFLTVMYNDTAKLAEQ  59
            SYS+ +M+  CP+K+K  Y+D  H      I  +FG  +H+ ++ F            EQ
Sbjct  8    SYSRVNMYLFCPYKYKLMYLDNLHIPINADI--IFGHIIHKALEKFHVG--------KEQ  57

Query  60   LPLEDMLLTRMKRNFEEIVKANGGEMFCEEKDMVEFYRHGVEILKFIRKKRAQYFSKKGY  119
                DML       FE    A   + F + + + E+Y  G  +L    K     F+    
Sbjct  58   --SYDML-------FECYDDAWRNDGFADPQQIFEYYECGRRMLASYYKS----FNASDT  104

Query  120  ELVGIETPIDYDLPNKIKFVGFLDVVIRDTVRDVIKIYDIKTSTMGWNKWMKADKLKSD-  178
            E++ +E   D ++  K KF+G +D V R       +I D KT    W +    +++  D 
Sbjct  105  EVIYVEKAFDANI-GKYKFIGIIDRVDR-YPDGKYEIVDYKTHAKIWEQ----ERVDKDL  158

Query  179  QLLLYKQFYSKQYNHPLDKIEVEFFIVKRKLY  210
            QL  Y       +    DKI V F    +K+Y
Sbjct  159  QLSFYVYACKNVFGFDPDKISVYFLSENKKIY  190


>ref|YP_002430514.1|  unnamed protein product [Desulfatibacillum alkenivorans AK-01]
 gb|ACL03046.1|  conserved hypothetical protein [Desulfatibacillum alkenivorans 
AK-01]
Length=271

 GENE ID: 7165261 Dalk_1345 | hypothetical protein
[Desulfatibacillum alkenivorans AK-01] (10 or fewer PubMed links)

 Score = 41.2 bits (95),  Expect = 0.16, Method: Compositional matrix adjust.
 Identities = 38/167 (23%), Positives = 76/167 (46%), Gaps = 21/167 (13%)

Query  1    SYSQYSMWAQCPHKWKTAYVDGHRKFTDSIHTMFGTSMHEVIQTFLTVMYNDTAKLAEQL  60
            S S  S++  C   +K  YVD  +  + S   +FG+++H V++ F    YN +  + E +
Sbjct  13   SCSSISLYLDCSLAYKFRYVDRLKSESVSDALVFGSAIHAVLERF----YN-SLMIGEVI  67

Query  61   PLEDMLLTRMKRNFEEIVKANGGEMFCEEKDMVEFYRH-----GVEILKFIRKKRAQYFS  115
            P+ D L+   +  ++E           E +D++++ R      G+E    + +  +Q F 
Sbjct  68   PV-DALVDLWELTWKE---------HAEGRDIIDWKRGNDFDKGLETGAGLLRAFSQKFE  117

Query  116  KKGYELVGIETPIDYDLPN-KIKFVGFLDVVIRDTVRDVIKIYDIKT  161
                 ++ +E      +    I  +G  D+V++D    ++ I D KT
Sbjct  118  VVDTAIICVEEAFSLTIDGLDIPVIGVFDLVLQDLASGLVTIVDHKT  164


>ref|ZP_09531181.1|  hypothetical protein HMPREF0995_02017 [Lachnospiraceae bacterium 
7_1_58FAA]
 gb|EHO34210.1|  hypothetical protein HMPREF0995_02017 [Lachnospiraceae bacterium 
7_1_58FAA]
Length=251

 Score = 39.3 bits (90),  Expect = 0.71, Method: Compositional matrix adjust.
 Identities = 35/153 (23%), Positives = 68/153 (44%), Gaps = 29/153 (19%)

Query  1    SYSQYSMWAQCPHKWKTAYV----DGHR-KFTDSIHTMFGTSMHEVIQTFLTVMYNDTAK  55
            SYS+ + +  CP+KW  +Y+    +G   K        FG+ MH+++Q +L+        
Sbjct  4    SYSRVASFDDCPYKWFLSYLYRDENGRSLKKKSGFFAEFGSYMHKILQMYLS-------G  56

Query  56   LAEQLPLEDMLLTRMKRNFEEIVKANGGEMFCEEKDMVEFYRHGVEILKFIRKKRAQYFS  115
            L E+  L    +   K N     KA   +++      + +++ G   L          FS
Sbjct  57   LLEKERLSTYYVAHFKENV--FSKAPNSKIY------MNYFQQGFHYL--------DDFS  100

Query  116  KKGYELVGIETPIDYDLPNKIKFVGFLDVVIRD  148
                 ++G+E  +D+    + KF GF+D++ ++
Sbjct  101  FPSRTIIGVEEKVDFMFAGR-KFTGFVDLISKN  132


>ref|ZP_01859639.1|  6-phosphofructokinase [Bacillus sp. SG-1]
 gb|EDL65376.1|  6-phosphofructokinase [Bacillus sp. SG-1]
Length=319

 Score = 38.9 bits (89),  Expect = 0.93, Method: Compositional matrix adjust.
 Identities = 50/184 (27%), Positives = 72/184 (39%), Gaps = 35/184 (19%)

Query  82   GGEMF----CEE--------KDMVEFYRHGVEILKFIRK----KRAQYFSKKGYELVGIE  125
            GG M     CEE        K + +  +HG+E L  I      + A+  ++ GY  VG+ 
Sbjct  64   GGTMLYTARCEEFKTKEGQKKGIEQLNKHGIEGLVVIGGDGSYRGAKALTELGYPCVGVP  123

Query  126  TPIDYDLPNKIKFVGFLDVVIRDTVRDVIKIYDIKTS--------TMGWNKWMKADKLKS  177
              ID D+P     +GF D  +   +  + KI D  TS         MG N    A  L S
Sbjct  124  GTIDNDIPGTEYTIGF-DTALNTVIDAIDKIRDTATSHERTFIIEVMGRNAGDLA--LWS  180

Query  178  DQLLLYKQFYSKQYNHPLDKIEVEFFIVKRKLYENTDFPQKRVQKFVPANGKPSINKVVA  237
                  +     +  H +D I        R+L +  +  +K     V A G  S N+  A
Sbjct  181  GLAGGAETILIPEDKHDMDDI-------ARRLKKGQERGKKH-SIIVVAEGVMSGNEFAA  232

Query  238  RLNE  241
            RL E
Sbjct  233  RLKE  236

------------------------------------------------------------------------------------------------------------------

3) BLASTx versus NR

                                                                   Score     E
Sequences producing significant alignments:                       (Bits)  Value

ref|YP_001956509.1|  hypothetical protein TGRD_565 [uncultured...  45.1    0.011
ref|ZP_09449261.1|  hypothetical protein LmalK35_11471 [Lactob...  44.7    0.021
gb|AFB75615.1|  hypothetical protein 1013_scaffold3125_00045 [...  43.9    0.027
ref|YP_004414787.1|  Gp37 [Propionibacterium phage PAD20] >gb|...  43.1    0.056
ref|YP_004414741.1|  Gp37 [Propionibacterium phage PAS50] >gb|...  41.2    0.25 
ref|YP_001285613.1|  gp37 [Propionibacterium phage PA6] >gb|AB...  40.8    0.31 
ref|ZP_09772576.1|  hypothetical protein WG8_1100 [Paenibacill...  38.1    2.3  
gb|EGE72742.1|  hypothetical protein HMPREF9338_01036 [Propion...  37.4    4.1  
ref|ZP_09531181.1|  hypothetical protein HMPREF0995_02017 [Lac...  37.0    4.7  
ref|XP_002489617.1|  hypothetical protein [Komagataella pastor...  36.6    6.4  
emb|CCA36439.1|  putative membrane protein [Komagataella pasto...  36.6    6.6  
ref|YP_003869401.1|  hypothetical protein PPE_01015 [Paenibaci...  36.6    7.1  
ref|YP_846148.1|  hypothetical protein Sfum_2030 [Syntrophobac...  34.7    7.7  
ref|YP_307697.1|  cbdbA602 gene product [Dehalococcoides sp. C...  36.6    8.0  
ref|YP_001214056.1|  exonuclease-like protein [Dehalococcoides...  36.6    8.0  
ref|ZP_09653703.1|  ABC-type sugar transport system, periplasm...  36.6    8.6  
ref|XP_003114498.1|  hypothetical protein CRE_27182 [Caenorhab...  36.2    9.1  

ALIGNMENTS

>ref|YP_001956509.1| hypothetical protein TGRD_565 [uncultured Termite group 1 bacterium 
phylotype Rs-D17]
 dbj|BAG14048.1| conserved hypothetical protein [uncultured Termite group 1 bacterium 
phylotype Rs-D17]
Length=248

 Score = 45.1 bits (105),  Expect = 0.011
 Identities = 52/212 (25%), Positives = 86/212 (41%), Gaps = 31/212 (15%)
 Frame = +1

Query  1    SYSQYSMWAQCPHKWKTAYVDG-HRKFTDSIHTMFGTSMHEVIQTFLTVMYNDTAKLAEQ  177
            SYS+ +M+  CP+K+K  Y+D  H      I  +FG  +H+ ++ F              
Sbjct  8    SYSRVNMYLFCPYKYKLMYLDNLHIPINADI--IFGHIIHKALEKFHVGKEQS-------  58

Query  178  LPLEDMLLTRMKRNFEEIVKANGGEMFCEEKDMVEFYRHGVEILKFIRKKRAQYFSKKGY  357
                DML       FE    A   + F + + + E+Y  G  +L    K     F+    
Sbjct  59   ---YDML-------FECYDDAWRNDGFADPQQIFEYYECGRRMLASYYKS----FNASDT  104

Query  358  ELVGIETPIDYDLPNKIKFVGFLdvvirdtvrdviKIYDIKTSTMGWNKWMKADKLKSD-  534
            E++ +E   D ++  K KF+G +   +        +I D KT    W +    +++  D 
Sbjct  105  EVIYVEKAFDANI-GKYKFIGII-DRVDRYPDGKYEIVDYKTHAKIWEQ----ERVDKDL  158

Query  535  QLLLYKQFYSKQYNHPLDKIEVEFFIVKRKLY  630
            QL  Y       +    DKI V F    +K+Y
Sbjct  159  QLSFYVYACKNVFGFDPDKISVYFLSENKKIY  190
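
Note on BLASTx coordinates: in the alignment above the Query positions are nucleotide positions on the DNA read (Frame = +1), whereas the BLASTp alignments number the query in amino acids. The sketch below is an illustration added here, valid for forward frames only; the function names are hypothetical. It converts a nucleotide range into the corresponding residue range, so that BLASTx and BLASTp rows for the same hit can be compared.

```python
def nt_to_aa(position, frame):
    """Residue number of the codon containing a nucleotide position.

    Assumes a forward reading frame (+1, +2 or +3) and 1-based coordinates.
    """
    if frame not in (1, 2, 3):
        raise ValueError("only forward frames +1, +2, +3 are handled in this sketch")
    return (position - frame) // 3 + 1


def nt_range_to_aa(start, end, frame):
    """Convert a BLASTx query range (nucleotides) to a residue range."""
    return nt_to_aa(start, frame), nt_to_aa(end, frame)


# Query ranges copied from the TGRD_565 BLASTx alignment (Frame = +1).
for start, end in [(1, 177), (178, 357), (358, 534), (535, 630)]:
    print((start, end), "->", nt_range_to_aa(start, end, frame=1))
```

The converted ranges (1-59, 60-119, 120-178, 179-210) match the Query numbering of the same hit in the BLASTp versus NR section.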


>ref|ZP_09449261.1| hypothetical protein LmalK35_11471 [Lactobacillus mali KCTC 3596 
= DSM 20444]
Length=386

 Score = 44.7 bits (104),  Expect = 0.021
 Identities = 17/46 (37%), Positives = 31/46 (67%), Gaps = 0/46 (0%)
 Frame = +1

Query  1    SYSQYSMWAQCPHKWKTAYVDGHRKFTDSIHTMFGTSMHEVIQTFL  138
            S+S+ S + +CP  +   Y++  R  TD+++T+FG+  HE+IQ +L
Sbjct  22   SFSRISTFLECPWAYNMLYIEKRRINTDNVYTIFGSECHEIIQDYL  67


>gb|AFB75615.1| hypothetical protein 1013_scaffold3125_00045 [unidentified phage]
Length=261

 Score = 43.9 bits (102),  Expect = 0.027
 Identities = 40/144 (28%), Positives = 70/144 (49%), Gaps = 29/144 (20%)
 Frame = +1

Query  1    SYSQYSMWAQCPHKWKTAYVDGHRKF--TDSIHTMFGTSMHEVIQTFLTVMYNDTAKLAE  174
            SYS+   +  CP++W   Y+   +KF   D   + +GT MH++I+     +Y+   K   
Sbjct  16   SYSRIKAFEDCPYRWYLKYI---KKFHGKDMFFSSYGTFMHKLIE-----LYHKGEKTPR  67

Query  175  QLPLEDMLLTRMKRNFEEIVKANGGEMFCEEKDMVEFYRHGVEILKFIRKKRAQYFSKKG  354
            Q  + DM L   K   E + +A   ++F        ++  G++ LK +     Q F    
Sbjct  68   Q--IVDMYLQDFKT--EVVGRAPNRKVFS------SYFTGGLQYLKAL-----QPFP---  109

Query  355  YELVGIETPIDYDLPNKIKFVGFL  426
            Y +VG+E  +D+ + N I FVG++
Sbjct  110  YGMVGVEKKVDF-VVNGIPFVGYI  132


>ref|YP_004414787.1| Gp37 [Propionibacterium phage PAD20]
 gb|ACX30831.1| Gp37 [Propionibacterium phage PAD20]
Length=309

 Score = 43.1 bits (100),  Expect = 0.056
 Identities = 48/184 (26%), Positives = 81/184 (44%), Gaps = 26/184 (14%)
 Frame = +1

Query  1    SYSQYSMWAQCPHKWKTAYVDGHRKFTDSIHTMFGTSMHEVIQTFLTVMYNDTAKLAEQL  180
            SYS  S WA+C  KW+  +   H +      T+ G+++H + + +   +YN     AE  
Sbjct  17   SYSSLSQWAECGEKWRLQH-GYHTQHHTWYATIAGSAIHHITEQYDLHLYNP----AEYP  71

Query  181  PLEDMLLTRMKRNFE---EIVKANGGEM---------FCE-----EKDMVEFYRHG---V  300
             L D L +  K  F     + ++ G E+          CE     +KD   +  +G   V
Sbjct  72   ALPDKLAS-FKNIFATQVALAESEGTEIKPSGRICKNMCESGGPHKKDYNWWMMYGPTFV  130

Query  301  EILKFIRKKRAQYFSKKGYELVGIETPIDYDLPNKIKFVGFLdvvirdtvrdviKIYDIK  480
            +  K  R+   +Y +       GIE P++  LP+  + VG++D +  DT      I D+K
Sbjct  131  DRWKTWRRNHPEYITAILDGKPGIEYPVETTLPDGTQIVGYIDRIFTDTDTGETFILDLK  190

Query  481  TSTM  492
            T  +
Sbjct  191  TGRL  194


>ref|YP_004414741.1| Gp37 [Propionibacterium phage PAS50]
 gb|ACX30876.1| Gp37 [Propionibacterium phage PAS50]
Length=343

 Score = 41.2 bits (95),  Expect = 0.25
 Identities = 48/184 (26%), Positives = 82/184 (45%), Gaps = 26/184 (14%)
 Frame = +1

Query  1    SYSQYSMWAQCPHKWKTAYVDGHRKFTDSIHTMFGTSMHEVIQTFLTVMYNDTAKLAEQL  180
            SYS  S WA+C  KW+ ++   H +      T+ G+++H + + +   +Y+     AE L
Sbjct  17   SYSSLSQWAECGEKWRLSH-GYHAQHHTWYATIAGSAIHRITEQYDLHLYSP----AEYL  71

Query  181  PLEDMLLTRMKRNFEEIV----------KANGG--EMFCE-----EKDMVEFYRHG---V  300
             L D L +  K  F+  V          K +G   +  CE     +KD   +  +G   V
Sbjct  72   ALPDKLAS-FKNVFDTQVALAESEGTRHKPSGRICKNMCESGGPHKKDYNWWMMYGPTFV  130

Query  301  EILKFIRKKRAQYFSKKGYELVGIETPIDYDLPNKIKFVGFLdvvirdtvrdviKIYDIK  480
            +  K  R+   +Y +       GIE P++  L +  + VG++D +  DT      I D+K
Sbjct  131  DRWKTWRRNHPEYITAVIDGKPGIEYPVETTLDDGTQIVGYIDRIFTDTDTGETFILDLK  190

Query  481  TSTM  492
            T  +
Sbjct  191  TGRL  194


>ref|YP_001285613.1| gp37 [Propionibacterium phage PA6]
 gb|ABE68606.1| gp37 [Propionibacterium phage PA6]
Length=315

 Score = 40.8 bits (94),  Expect = 0.31
 Identities = 47/183 (26%), Positives = 79/183 (43%), Gaps = 24/183 (13%)
 Frame = +1

Query  1    SYSQYSMWAQCPHKWKTAYVDGHRKFTDSIHTMFGTSMHEVIQTFLTVMYNDTAKLAEQL  180
            SYS  + WA+C  KW+ A+   H +      T+ G+++H + + +   +YN     AE  
Sbjct  17   SYSSLTQWAECGEKWRLAH-GYHAQHHTWYATIAGSAIHHITEQYDLHLYNP----AEYP  71

Query  181  PLEDML--LTRMKRNFEEIVKANGGEM---------FCE-----EKDMVEFYRHG---VE  303
             L D L   T +      + ++ G  +          CE     +KD   +  +G   V+
Sbjct  72   ALPDKLSSFTNIFDTQIALAESEGTNIKPSGRICKNMCESGGPHKKDYNWWMMYGPTFVD  131

Query  304  ILKFIRKKRAQYFSKKGYELVGIETPIDYDLPNKIKFVGFLdvvirdtvrdviKIYDIKT  483
              K  R+   +Y +       GIE P++  L +  K VG++D V  DT      I D+KT
Sbjct  132  RWKTWRRNHPEYATAVIDGQPGIEYPVETTLDDDTKIVGYIDRVFTDTDTGETFILDLKT  191

Query  484  STM  492
              +
Sbjct  192  GRL  194


>ref|ZP_09772576.1| hypothetical protein WG8_1100 [Paenibacillus sp. Aloe-11]
 gb|EHS58851.1| hypothetical protein WG8_1100 [Paenibacillus sp. Aloe-11]
Length=305

 Score = 38.1 bits (87),  Expect = 2.3
 Identities = 31/102 (30%), Positives = 43/102 (42%), Gaps = 13/102 (13%)
 Frame = +1

Query  529  SDQLLLYKQFYSKQYNHPLDKIEVEFFIVKRKLYENTDFPQKRVQKFVPANGKPSINKVV  708
            SDQL LY  +  + Y  PL+KIEV    +    +E     Q+ + K V         + V
Sbjct  207  SDQLFLYASYVQEHYQLPLEKIEVRVEYLMTGEHEVYRPVQEDIDKVV---------RNV  257

Query  709  ARLNEFMTECFDSDGEYN---IEHIYRKEASKKNCKFCEFNQ  825
             R  E M  C D D  YN    E  +    S++ C  C F +
Sbjct  258  GRYIEEMKSCLDDD-YYNRPKPESFFTPMPSRRACGGCNFRE  298


>gb|EGE72742.1| hypothetical protein HMPREF9338_01036 [Propionibacterium acnes 
HL096PA2]
 gb|EGE74950.1| hypothetical protein HMPREF9337_00814 [Propionibacterium acnes 
HL096PA3]
Length=315

 Score = 37.4 bits (85),  Expect = 4.1
 Identities = 43/181 (24%), Positives = 77/181 (43%), Gaps = 20/181 (11%)
 Frame = +1

Query  1    SYSQYSMWAQCPHKWKTAYVDGHRKFTDSIHTMFGTSMHEVIQTFLTVMYNDTAKLAEQL  180
            SYS  S WA+C  KW+  +   H +      T+ G+++H + + +   +YN       +L
Sbjct  17   SYSSLSQWAECGEKWRLQH-GYHTQHHTWYATIAGSAIHHITEQYDLHLYNPDE--YPEL  73

Query  181  PLEDMLLTRMKRNFEEIVKANGGEM---------FCE-----EKDMVEFYRHG---VEIL  309
            P +    T +      + ++ G  +          CE     +KD   +  +G   V+  
Sbjct  74   PDKLSSFTNIFDTQVALAESEGTNIKPSGRVCKNMCESGGPNKKDYNWWMMYGPIFVDRW  133

Query  310  KFIRKKRAQYFSKKGYELVGIETPIDYDLPNKIKFVGFLdvvirdtvrdviKIYDIKTST  489
            K  R+   +Y +       GIE P++  L +  + VG++D V  DT      I D+KT  
Sbjct  134  KTWRRNHPEYATAVIDGQPGIEYPVETTLQDGTQIVGYIDRVFTDTNTGETFILDLKTGR  193

Query  490  M  492
            +
Sbjct  194  L  194


>ref|ZP_09531181.1| hypothetical protein HMPREF0995_02017 [Lachnospiraceae bacterium 
7_1_58FAA]
 gb|EHO34210.1| hypothetical protein HMPREF0995_02017 [Lachnospiraceae bacterium 
7_1_58FAA]
Length=251

 Score = 37.0 bits (84),  Expect = 4.7
 Identities = 34/147 (23%), Positives = 63/147 (43%), Gaps = 29/147 (20%)
 Frame = +1

Query  1    SYSQYSMWAQCPHKWKTAYV----DGHR-KFTDSIHTMFGTSMHEVIQTFLTVMYNDTAK  165
            SYS+ + +  CP+KW  +Y+    +G   K        FG+ MH+++Q +L       + 
Sbjct  4    SYSRVASFDDCPYKWFLSYLYRDENGRSLKKKSGFFAEFGSYMHKILQMYL-------SG  56

Query  166  LAEQLPLEDMLLTRMKRNFEEIVKANGGEMFCEEKDMVEFYRHGVEILKFIRKKRAQYFS  345
            L E+  L    +   K N     KA   +++      + +++ G   L          FS
Sbjct  57   LLEKERLSTYYVAHFKENV--FSKAPNSKIY------MNYFQQGFHYL--------DDFS  100

Query  346  KKGYELVGIETPIDYDLPNKIKFVGFL  426
                 ++G+E  +D+    + KF GF+
Sbjct  101  FPSRTIIGVEEKVDFMFAGR-KFTGFV  126


>ref|XP_002489617.1| hypothetical protein [Komagataella pastoris GS115]
 emb|CAY67336.1| hypothetical protein PAS_chr1-3_0311 [Komagataella pastoris GS115]
Length=259

 Score = 36.6 bits (83),  Expect = 6.4
 Identities = 19/57 (33%), Positives = 27/57 (47%), Gaps = 3/57 (5%)
 Frame = -2

Query  843  CVTIFCLIEFTKFAILFRCFFSINMLNIVFSIRIK---TFCHKFIQSSNHFINRWFT  682
            C+      + T F  LFRCF S+ +  I+FS  IK      H F   S+ F+  + T
Sbjct  50   CIMYLSFYQMTAFLALFRCFMSLTITCIIFSASIKYQEKIYHNFNYFSSIFLEEYIT  106


>emb|CCA36439.1| putative membrane protein [Komagataella pastoris CBS 7435]
Length=273

 Score = 36.6 bits (83),  Expect = 6.6
 Identities = 19/57 (33%), Positives = 27/57 (47%), Gaps = 3/57 (5%)
 Frame = -2

Query  843  CVTIFCLIEFTKFAILFRCFFSINMLNIVFSIRIK---TFCHKFIQSSNHFINRWFT  682
            C+      + T F  LFRCF S+ +  I+FS  IK      H F   S+ F+  + T
Sbjct  64   CIMYLSFYQMTAFLALFRCFMSLTITCIIFSASIKYQEKIYHNFNYFSSIFLEEYIT  120


>ref|YP_003869401.1| hypothetical protein PPE_01015 [Paenibacillus polymyxa E681]
 gb|ADM68863.1| Conserved hypothetical protein [Paenibacillus polymyxa E681]
Length=305

 Score = 36.6 bits (83),  Expect = 7.1
 Identities = 30/105 (29%), Positives = 44/105 (42%), Gaps = 19/105 (18%)
 Frame = +1

Query  529  SDQLLLYKQFYSKQYNHPLDKIEVEFFIVKRKLYENTDFPQKRVQKFVPANGKPSINKVV  708
            SDQL LY  +  + Y  PL+KIEV    +    +E            V    +  I+KVV
Sbjct  207  SDQLFLYASYVQEHYQLPLEKIEVRVEYLMTGEHE------------VYRPAQEDIDKVV  254

Query  709  ARLNEFMTE---CFDSDGEYN---IEHIYRKEASKKNCKFCEFNQ  825
              +  ++ E   C D D  YN    E  +    S++ C  C F +
Sbjct  255  GNVGRYIDEMKSCLDDD-YYNRPKPESFFTPMPSRRACGGCNFRE  298