GOS 2168020

Warning: this metagenomic sequence has been carefully annotated by students during bioinformatics assignments. These quality annotations are therefore the result of a teaching exercise that you are most welcome to amend and extend if necessary!


Sequence
CAMERA AccNum : JCVI_READ_1092344195271
Annotathon code: GOS_2168020
Sample :
  • GPS :15°16'40s; 148°13'28w
  • Polynesia Archipelagos: Tikehau Lagoon - Fr. Polynesia
  • Coral Atoll (-1.2m, 27.8°C, 0.1-0.8 microns)
Authors
Team : Algarve 2011
Username : ZAF
Annotated on : 2011-05-24 23:03:22
  • Matos Ana
  • Rodrigues José
  • Rodrigues Filipa

Synopsis

Genomic Sequence

>JCVI_READ_1092344195271 GOS_2168020 Genomic DNA
CTCCCCTTATTAATATCCTTCTGCATAACTTTTAACTTAGCTTTAGACACAAAGCTTGTTACATCAGCCATAGCGTCCACAAGTTCATTCACAAGACTTC
TCTGATCGCTTAATTTACCTGCTCTTTCTTTGAAGCCTGCATCGGCAGAAAGAGCATCTTTTATTTCTTCAAAGACTCTTAAGTCAAAAGCAGTAATTTG
AACATGAGATCCAACTTTACCTGTTGATCTCCCTGCATCAAGTTGTTTTGCTGTGAGAGGATAATCATGAATACTGTAGCTCATTAAAGGATGTGAACCC
TCTAACTTTGACCCACCTGACCATTCATCATCTTCAGTGGGTTTGTATGAATTCTTACTTGAAAGTAGACTTACTATTTCTTTTCTCACATCTAATAAGT
CGTTGTACATTGCAGTAAGTGAAGAAATATCATTCTCTACTTCACTTTGTTCATCTTGAAGAAGAGCAGGAAGCAAATCGTCTTGTATATCTTCACTATT
CTGAAAAGTATTCAACCACTCCCCAATAAGATCAAGTACTTCATCTTGTTCTTTTTGTAAGGTTTCAGCTAGGTATTCGTAAGTAGCAACATGGTTTTTT
AGATGTAAGTTACTTGGTGCATATTTTAGAATATCTTTAAGCCCTGAAAGAAAGTCATAGATGCTACTAACATCATTAGAAAGAGAAGTAATCGCTTCCA
TGAGAGTAGCAAGACAAGAAATCCCATCAAATCCGAAGTCATCATCAAAGTACTTAACGAACCCTTTCATATCAAAAATCTTAGCGATCTTTTGCTTTTC
TTTCCAAAGCTCTCTTTCACTGTAAACCCCATCCAAAAGCTGAACTTCAGAGTCAGTT

Translation

[2 - 856/858]   indirect strand
>GOS_2168020 Translation [2-856   indirect strand]
TDSEVQLLDGVYSERELWKEKQKIAKIFDMKGFVKYFDDDFGFDGISCLATLMEAITSLSNDVSSIYDFLSGLKDILKYAPSNLHLKNHVATYEYLAETL
QKEQDEVLDLIGEWLNTFQNSEDIQDDLLPALLQDEQSEVENDISSLTAMYNDLLDVRKEIVSLLSSKNSYKPTEDDEWSGGSKLEGSHPLMSYSIHDYP
LTAKQLDAGRSTGKVGSHVQITAFDLRVFEEIKDALSADAGFKERAGKLSDQRSLVNELVDAMADVTSFVSKAKLKVMQKDINKG

[ Warning ] 5' incomplete: does not start with a Methionine
[ Warning ] 3' incomplete: following codon is not a STOP

Annotator commentaries

The chosen ORF lies on the reverse strand, from base 2 to base 856 (855 bases), which gives a translation of 285 a.a. It was chosen because it extends far beyond the 60 a.a. minimum; all the other ORFs found were barely above that minimum or even below it.


BLAST searches returned only very high E-values (an acceptable E-value is at most 10^-6), even after searching 3 different databases. It is therefore reasonable to assume that there are no known homologs for this protein.


Given the size of the ORF, it is very likely to be coding. Since no homologs were found even so, it is possible that this is an ORFan.


Given the poor BLAST results, the Interpro search for protein domains returned no hits, as expected. With so little data, multiple alignments, a phylogenetic tree and a taxonomy report would have little or no significance, so none of them were created. For the same reason it was not possible to locate the START and STOP codons at the 5' and 3' extremities, so the ORF remains incomplete.



ORF finding

PROTOCOL


a) SMS ORFinder / forward strand / frames 1, 2 & 3 / min 60 AA / 'any codon' initiation / 'standard' genetic code

b) SMS ORFinder / reverse strand / frames 1, 2 & 3 / min 60 AA / 'any codon' initiation / 'standard' genetic code



RESULTS ANALYSIS


Two ORFs were identified on the direct strand, in reading frames 1 and 3, and one ORF on the reverse strand, in reading frame 2.

On the direct strand, the ORF in reading frame 1 extends from base 280 to base 459, and the ORF in reading frame 3 extends from base 648 to base 857.


The longest ORF is on the reverse strand in reading frame 2, with 855 bases (base 2 to base 856); this is therefore the ORF that will be used.
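The ORF sizes follow directly from the reported coordinates. The short Python sketch below recomputes them; the helper name `orf_length` is hypothetical and is not part of SMS ORF Finder, and the coordinates are simply those listed in this section (1-based, inclusive).

```python
# ORF coordinates as reported in this section (1-based, inclusive).
orfs = {
    "direct strand, frame 1": (280, 459),
    "direct strand, frame 3": (648, 857),
    "reverse strand, frame 2": (2, 856),
}


def orf_length(start, end):
    """Return (bases, codons) for a 1-based inclusive interval."""
    bases = end - start + 1
    return bases, bases // 3


for name, (start, end) in orfs.items():
    bases, codons = orf_length(start, end)
    print(f"{name}: {bases} bases, {codons} codons")
```

Note that the codon count for the frame 1 ORF includes its terminal stop codon, so that ORF encodes 59 amino acids, whereas the reverse-strand ORF has no stop codon within the read and all of its codons are translated.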


However, the BLASTp searches against Swissprot, NR and Environmental samples did not produce any significant results: the lowest E-values in the raw results were 1.2 (NR), 0.19 (Swissprot) and 1.3 (Environmental samples), far from the 10^-6 that constitutes an acceptable E-value.


Nevertheless, given its size (it extends on the reverse strand from base 2 to base 856), it is unlikely that this ORF is non-coding.



a) Direct strand

>ORF number 1 in reading frame 1 on the direct strand extends from base 280 to base 459.
CTCATTAAAGGATGTGAACCCTCTAACTTTGACCCACCTGACCATTCATCATCTTCAGTG
GGTTTGTATGAATTCTTACTTGAAAGTAGACTTACTATTTCTTTTCTCACATCTAATAAG
TCGTTGTACATTGCAGTAAGTGAAGAAATATCATTCTCTACTTCACTTTGTTCATCTTGA


>Translation of ORF number 1 in reading frame 1 on the direct strand.
LIKGCEPSNFDPPDHSSSSVGLYEFLLESRLTISFLTSNKSLYIAVSEEISFSTSLCSS*


No ORFs were found in reading frame 2.

>ORF number 1 in reading frame 3 on the direct strand extends from base 648 to base 857.
AAGAAAGTCATAGATGCTACTAACATCATTAGAAAGAGAAGTAATCGCTTCCATGAGAGT
AGCAAGACAAGAAATCCCATCAAATCCGAAGTCATCATCAAAGTACTTAACGAACCCTTT
CATATCAAAAATCTTAGCGATCTTTTGCTTTTCTTTCCAAAGCTCTCTTTCACTGTAAAC
CCCATCCAAAAGCTGAACTTCAGAGTCAGT

>Translation of ORF number 1 in reading frame 3 on the direct strand.
KKVIDATNIIRKRSNRFHESSKTRNPIKSEVIIKVLNEPFHIKNLSDLLLFFPKLSFTVN
PIQKLNFRVS

-----------------------------------------------------------------------------------------------

b) Reverse strand

No ORFs were found in reading frame 1.

>ORF number 1 in reading frame 2 on the reverse strand extends from base 2 to base 856.
ACTGACTCTGAAGTTCAGCTTTTGGATGGGGTTTACAGTGAAAGAGAGCTTTGGAAAGAA
AAGCAAAAGATCGCTAAGATTTTTGATATGAAAGGGTTCGTTAAGTACTTTGATGATGAC
TTCGGATTTGATGGGATTTCTTGTCTTGCTACTCTCATGGAAGCGATTACTTCTCTTTCT
AATGATGTTAGTAGCATCTATGACTTTCTTTCAGGGCTTAAAGATATTCTAAAATATGCA
CCAAGTAACTTACATCTAAAAAACCATGTTGCTACTTACGAATACCTAGCTGAAACCTTA
CAAAAAGAACAAGATGAAGTACTTGATCTTATTGGGGAGTGGTTGAATACTTTTCAGAAT
AGTGAAGATATACAAGACGATTTGCTTCCTGCTCTTCTTCAAGATGAACAAAGTGAAGTA
GAGAATGATATTTCTTCACTTACTGCAATGTACAACGACTTATTAGATGTGAGAAAAGAA
ATAGTAAGTCTACTTTCAAGTAAGAATTCATACAAACCCACTGAAGATGATGAATGGTCA
GGTGGGTCAAAGTTAGAGGGTTCACATCCTTTAATGAGCTACAGTATTCATGATTATCCT
CTCACAGCAAAACAACTTGATGCAGGGAGATCAACAGGTAAAGTTGGATCTCATGTTCAA
ATTACTGCTTTTGACTTAAGAGTCTTTGAAGAAATAAAAGATGCTCTTTCTGCCGATGCA
GGCTTCAAAGAAAGAGCAGGTAAATTAAGCGATCAGAGAAGTCTTGTGAATGAACTTGTG
GACGCTATGGCTGATGTAACAAGCTTTGTGTCTAAAGCTAAGTTAAAAGTTATGCAGAAG
GATATTAATAAGGGG

>Translation of ORF number 1 in reading frame 2 on the reverse strand.
TDSEVQLLDGVYSERELWKEKQKIAKIFDMKGFVKYFDDDFGFDGISCLATLMEAITSLS
NDVSSIYDFLSGLKDILKYAPSNLHLKNHVATYEYLAETLQKEQDEVLDLIGEWLNTFQN
SEDIQDDLLPALLQDEQSEVENDISSLTAMYNDLLDVRKEIVSLLSSKNSYKPTEDDEWS
GGSKLEGSHPLMSYSIHDYPLTAKQLDAGRSTGKVGSHVQITAFDLRVFEEIKDALSADA
GFKERAGKLSDQRSLVNELVDAMADVTSFVSKAKLKVMQKDINKG

No ORFs were found in reading frame 3.
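
As a cross-check of the reverse-strand result, the start of the translation can be reproduced by hand. The sketch below is only an illustration, assuming the standard genetic code and using just the last 31 bases of the read rather than the full sequence; the function names are hypothetical and this is not how SMS ORF Finder is implemented.

```python
# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna, frame):
    """Translate complete codons starting at reading frame 1, 2 or 3."""
    start = frame - 1
    codons = (dna[i:i + 3] for i in range(start, len(dna) - 2, 3))
    # Codons with non-ACGT letters are rendered as 'X'.
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


# Last 31 bases of the read, i.e. the first 31 bases of the reverse strand.
read_tail = "CCCATCCAAAAGCTGAACTTCAGAGTCAGTT"
print(translate(reverse_complement(read_tail), frame=2))
```

The printed peptide should match the first ten residues of the reverse-strand translation given above (TDSEVQLLDG).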

Multiple Alignment

PROTOCOL



RESULTS ANALYSIS


Because of the high E-values (the acceptable threshold being 10^-6), multiple alignments would not have any significance.

RAW RESULTS

Protein Domains

PROTOCOL


a) Interpro




RESULTS ANALYSIS


Given the BLAST results, it was expected that Interpro would produce no hits. It is likely that the domains contained in this ORF have not yet been characterised and thus do not appear in any database.

a)No hits found.

Phylogeny

PROTOCOL



RESULTS ANALYSIS


Because of the high E-values obtained in the BLAST results (the acceptable threshold being 10^-6), it is not possible to build a meaningful phylogenetic tree.

RAW RESULTS

Taxonomy report

PROTOCOL


a)BLASTp versus NR, NCBI default parameters apart from "Number of descriptions_1000"

b)BLASTp versus Swissprot, NCBI default parameters apart from "Number of descriptions_1000"

c)BLASTp versus Environmental samples, NCBI default parameters apart from "Number of descriptions_1000"



RESULTS ANALYSIS


Because the E-values are far from the acceptable 10^-6, choosing in-groups and out-groups would have no significance.

RAW RESULTS

a)Lineage Report NR

cellular organisms
. Eukaryota           [eukaryotes]
. . Fungi/Metazoa group [eukaryotes]
. . . Bilateria           [animals]
. . . . Coelomata           [animals]
. . . . . Culex quinquefasciatus ------------   38 2 hits [flies]                 guanylate cyclase [Culex quinquefasciatus] >gi|167881247|gb
. . . . . Oryzias latipes (medaka) ..........   37 3 hits [bony fishes]           soluble guanylyl cyclase alpha subunit [Oryzias latipes]
. . . . . Oryzias curvinotus ................   37 1 hit  [bony fishes]           soluble guanylyl cyclase alpha1 subunit [Oryzias curvinotus]
. . . . Caenorhabditis briggsae AF16 --------   35 1 hit  [nematodes]             hypothetical protein CBG_12313 [Caenorhabditis briggsae AF1
. . . . Caenorhabditis briggsae .............   35 1 hit  [nematodes]             Hypothetical protein CBG12313 [Caenorhabditis briggsae]
. . . Talaromyces stipitatus ATCC 10500 -----   36 2 hits [ascomycetes]           cell polarity protein (Tea1), putative [Talaromyces stipita
. . Ricinus communis ------------------------   37 2 hits [eudicots]              conserved hypothetical protein [Ricinus communis] >gi|22353
. . Vitis vinifera (wine grape) .............   36 2 hits [eudicots]              PREDICTED: hypothetical protein [Vitis vinifera] >gi|296081
. . Dictyostelium discoideum AX4 ............   36 2 hits [cellular slime molds]  hypothetical protein DDB_G0283893 [Dictyostelium discoideum
. . Dictyostelium discoideum ................   36 1 hit  [cellular slime molds]  hypothetical protein DDB_G0283893 [Dictyostelium discoideum
. . Populus trichocarpa (black cottonwood) ..   35 2 hits [eudicots]              predicted protein [Populus trichocarpa] >gi|222851577|gb|EE
. . Hyacinthus orientalis (common hyacinth) .   35 1 hit  [monocots]              auxin-independent growth protein [Hyacinthus orientalis]
. Nitrococcus mobilis Nb-231 ----------------   38 2 hits [g-proteobacteria]      two-component response regulator NtrC [Nitrococcus mobilis 
. Collinsella aerofaciens ATCC 25986 ........   37 2 hits [high GC Gram+]         Hypothetical protein COLAER_00711 [Collinsella aerofaciens 
. Pseudomonas aeruginosa LESB58 .............   36 2 hits [g-proteobacteria]      PvdP [Pseudomonas aeruginosa LESB58] >gi|218771858|emb|CAW2
. Enterococcus faecalis V583 ................   36 2 hits [firmicutes]            adenylosuccinate lyase [Enterococcus faecalis V583] >gi|227
. Enterococcus faecalis TX0104 ..............   36 2 hits [firmicutes]            adenylosuccinate lyase [Enterococcus faecalis V583] >gi|227
. Enterococcus faecalis HH22 ................   36 2 hits [firmicutes]            adenylosuccinate lyase [Enterococcus faecalis V583] >gi|227
. Enterococcus faecalis T2 ..................   36 2 hits [firmicutes]            adenylosuccinate lyase [Enterococcus faecalis V583] >gi|227
. Enterococcus faecalis CH188 ...............   36 2 hits [firmicutes]            adenylosuccinate lyase [Enterococcus faecalis V583] >gi|227
. Enterococcus faecalis TX0860 ..............   36 2 hits [firmicutes]            adenylosuccinate lyase [Enterococcus faecalis V583] >gi|227
. Enterococcus faecalis TX0635 ..............   36 2 hits [firmicutes]            adenylosuccinate lyase [Enterococcus faecalis V583] >gi|227
. Enterococcus faecalis TX0309B .............   36 1 hit  [firmicutes]            adenylosuccinate lyase [Enterococcus faecalis V583] >gi|227
. Enterococcus faecalis TX0630 ..............   36 1 hit  [firmicutes]            adenylosuccinate lyase [Enterococcus faecalis V583] >gi|227
. Enterococcus faecalis TX0309A .............   36 1 hit  [firmicutes]            adenylosuccinate lyase [Enterococcus faecalis V583] >gi|227
. Enterococcus faecalis TX1322 ..............   36 2 hits [firmicutes]            adenylosuccinate lyase [Enterococcus faecalis TX1322] >gi|2
. Enterococcus faecalis ATCC 29200 ..........   36 2 hits [firmicutes]            adenylosuccinate lyase [Enterococcus faecalis TX1322] >gi|2
. Enterococcus faecalis T1 ..................   36 2 hits [firmicutes]            adenylosuccinate lyase [Enterococcus faecalis TX1322] >gi|2
. Enterococcus faecalis ATCC 4200 ...........   36 2 hits [firmicutes]            adenylosuccinate lyase [Enterococcus faecalis TX1322] >gi|2
. Enterococcus faecalis T3 ..................   36 2 hits [firmicutes]            adenylosuccinate lyase [Enterococcus faecalis TX1322] >gi|2
. Enterococcus faecalis T8 ..................   36 2 hits [firmicutes]            adenylosuccinate lyase [Enterococcus faecalis TX1322] >gi|2
. Enterococcus faecalis DS5 .................   36 2 hits [firmicutes]            adenylosuccinate lyase [Enterococcus faecalis TX1322] >gi|2
. Enterococcus faecalis Merz96 ..............   36 2 hits [firmicutes]            adenylosuccinate lyase [Enterococcus faecalis TX1322] >gi|2
. Enterococcus faecalis HIP11704 ............   36 2 hits [firmicutes]            adenylosuccinate lyase [Enterococcus faecalis TX1322] >gi|2
. Enterococcus faecalis JH1 .................   36 2 hits [firmicutes]            adenylosuccinate lyase [Enterococcus faecalis TX1322] >gi|2
. Enterococcus faecalis E1Sol ...............   36 2 hits [firmicutes]            adenylosuccinate lyase [Enterococcus faecalis TX1322] >gi|2
. Enterococcus faecalis Fly1 ................   36 2 hits [firmicutes]            adenylosuccinate lyase [Enterococcus faecalis TX1322] >gi|2
. Enterococcus faecalis D6 ..................   36 2 hits [firmicutes]            adenylosuccinate lyase [Enterococcus faecalis TX1322] >gi|2
. Enterococcus faecalis ARO1/DG .............   36 2 hits [firmicutes]            adenylosuccinate lyase [Enterococcus faecalis TX1322] >gi|2
. Enterococcus faecalis T11 .................   36 2 hits [firmicutes]            adenylosuccinate lyase [Enterococcus faecalis TX1322] >gi|2
. Enterococcus faecalis X98 .................   36 2 hits [firmicutes]            adenylosuccinate lyase [Enterococcus faecalis TX1322] >gi|2
. Enterococcus faecalis R712 ................   36 2 hits [firmicutes]            adenylosuccinate lyase [Enterococcus faecalis TX1322] >gi|2
. Enterococcus faecalis S613 ................   36 2 hits [firmicutes]            adenylosuccinate lyase [Enterococcus faecalis TX1322] >gi|2
. Enterococcus faecalis PC1.1 ...............   36 2 hits [firmicutes]            adenylosuccinate lyase [Enterococcus faecalis TX1322] >gi|2
. Enterococcus faecalis TUSoD Ef11 ..........   36 2 hits [firmicutes]            adenylosuccinate lyase [Enterococcus faecalis TX1322] >gi|2
. Enterococcus faecalis TX4248 ..............   36 2 hits [firmicutes]            adenylosuccinate lyase [Enterococcus faecalis TX1322] >gi|2
. Enterococcus faecalis TX0855 ..............   36 2 hits [firmicutes]            adenylosuccinate lyase [Enterococcus faecalis TX1322] >gi|2
. Enterococcus faecalis TX2134 ..............   36 2 hits [firmicutes]            adenylosuccinate lyase [Enterococcus faecalis TX1322] >gi|2
. Enterococcus faecalis TX0109 ..............   36 2 hits [firmicutes]            adenylosuccinate lyase [Enterococcus faecalis TX1322] >gi|2
. Enterococcus faecalis TX0411 ..............   36 2 hits [firmicutes]            adenylosuccinate lyase [Enterococcus faecalis TX1322] >gi|2
. Enterococcus faecalis TX0470 ..............   36 2 hits [firmicutes]            adenylosuccinate lyase [Enterococcus faecalis TX1322] >gi|2
. Enterococcus faecalis DAPTO 512 ...........   36 2 hits [firmicutes]            adenylosuccinate lyase [Enterococcus faecalis TX1322] >gi|2
. Enterococcus faecalis DAPTO 516 ...........   36 2 hits [firmicutes]            adenylosuccinate lyase [Enterococcus faecalis TX1322] >gi|2
. Enterococcus faecalis TX0102 ..............   36 2 hits [firmicutes]            adenylosuccinate lyase [Enterococcus faecalis TX1322] >gi|2
. Enterococcus sp. 7L76 .....................   36 1 hit  [firmicutes]            adenylosuccinate lyase [Enterococcus faecalis TX1322] >gi|2
. Enterococcus faecalis TX2137 ..............   36 1 hit  [firmicutes]            adenylosuccinate lyase [Enterococcus faecalis TX1322] >gi|2
. Enterococcus faecalis TX4000 ..............   36 1 hit  [firmicutes]            adenylosuccinate lyase [Enterococcus faecalis TX1322] >gi|2
. Enterococcus faecalis TX0027 ..............   36 1 hit  [firmicutes]            adenylosuccinate lyase [Enterococcus faecalis TX1322] >gi|2
. Enterococcus faecalis TX2141 ..............   36 1 hit  [firmicutes]            adenylosuccinate lyase [Enterococcus faecalis TX1322] >gi|2
. Enterococcus faecalis TX0031 ..............   36 1 hit  [firmicutes]            adenylosuccinate lyase [Enterococcus faecalis TX1322] >gi|2
. Enterococcus faecalis TX0043 ..............   36 1 hit  [firmicutes]            adenylosuccinate lyase [Enterococcus faecalis TX1322] >gi|2
. Enterococcus faecalis TX0312 ..............   36 1 hit  [firmicutes]            adenylosuccinate lyase [Enterococcus faecalis TX1322] >gi|2
. Enterococcus faecalis TX0645 ..............   36 1 hit  [firmicutes]            adenylosuccinate lyase [Enterococcus faecalis TX1322] >gi|2
. Enterococcus faecalis TX1302 ..............   36 1 hit  [firmicutes]            adenylosuccinate lyase [Enterococcus faecalis TX1322] >gi|2
. Enterococcus faecalis TX1341 ..............   36 1 hit  [firmicutes]            adenylosuccinate lyase [Enterococcus faecalis TX1322] >gi|2
. Enterococcus faecalis TX1342 ..............   36 1 hit  [firmicutes]            adenylosuccinate lyase [Enterococcus faecalis TX1322] >gi|2
. Enterococcus faecalis TX1346 ..............   36 1 hit  [firmicutes]            adenylosuccinate lyase [Enterococcus faecalis TX1322] >gi|2
. Enterococcus faecalis 62 ..................   36 1 hit  [firmicutes]            adenylosuccinate lyase [Enterococcus faecalis TX1322] >gi|2
. Enterococcus faecalis TX4244 ..............   36 1 hit  [firmicutes]            adenylosuccinate lyase [Enterococcus faecalis TX4244]
. Pseudomonas aeruginosa ....................   36 2 hits [g-proteobacteria]      PvdD/PvdJ [Pseudomonas aeruginosa]
. Streptomyces sp. AA4 ......................   36 2 hits [high GC Gram+]         predicted protein [Streptomyces sp. AA4] >gi|302435423|gb|E
. Pseudomonas putida BIRD-1 .................   35 1 hit  [g-proteobacteria]      TonB-dependent siderophore receptor [Pseudomonas putida BIR
. Pseudomonas putida F1 .....................   35 2 hits [g-proteobacteria]      TonB-dependent siderophore receptor [Pseudomonas putida F1]
. Pseudomonas putida KT2440 .................   35 2 hits [g-proteobacteria]      TonB-dependent siderophore receptor [Pseudomonas putida KT2

________________________________________________________________________________________________________________________________________________________=

b)Lineage Report Swissprot

cellular organisms
. Eukaryota          [eukaryotes]
. . Dictyostelium discoideum ------------------------   36 1 hit  [cellular slime molds]  RecName: Full=Probable E3 ubiquitin-protein ligase DDB_G028
. . Arabidopsis thaliana (thale-cress) ..............   32 2 hits [eudicots]              RecName: Full=Syntaxin-72; Short=AtSYP72
. . Homo sapiens (man) ..............................   31 2 hits [primates]              RecName: Full=Brain-specific angiogenesis inhibitor 2; Flag
. . Pongo abelii (Orang-utan) .......................   31 1 hit  [primates]              RecName: Full=Brain-specific angiogenesis inhibitor 2; Flag
. . Saccharomyces cerevisiae (yeast) ................   31 1 hit  [ascomycetes]           RecName: Full=Mitochondrial distribution and morphology pro
. . Rattus norvegicus (brown rat) ...................   31 2 hits [rodents]               RecName: Full=Delta(3,5)-Delta(2,4)-dienoyl-CoA isomerase, 
. . Mus musculus (mouse) ............................   31 1 hit  [rodents]               RecName: Full=Guanylyl cyclase GC-E; AltName: Full=Guanylat
. Pectobacterium atrosepticum -----------------------   33 1 hit  [enterobacteria]        RecName: Full=UPF0061 protein ECA1842
. Chloroflexus aurantiacus J-10-fl ..................   32 1 hit  [GNS bacteria]          RecName: Full=DNA replication and repair protein recF >gi|2
. Chloroflexus sp. Y-400-fl .........................   32 1 hit  [GNS bacteria]          RecName: Full=DNA replication and repair protein recF >gi|2
. Pectobacterium carotovorum subsp. carotovorum PC1 .   32 1 hit  [enterobacteria]        RecName: Full=UPF0061 protein PC1_2463
. Chloroflexus aggregans DSM 9485 ...................   31 1 hit  [GNS bacteria]          RecName: Full=DNA replication and repair protein recF
. Mycoplasma mobile .................................   31 1 hit  [mycoplasmas]           RecName: Full=UPF0082 protein MMOB1910

__________________________________________________________________________________________________________________________________________________________=

c)Lineage Report Environmental Samples

                                                                  (Bits)  Value

gb|EBG98654.1|  hypothetical protein GOS_9340206 [marine metag...  35.8    1.3  
gb|EDH73298.1|  hypothetical protein GOS_551721 [marine metage...  33.9    4.7  
gb|EBA86708.1|  hypothetical protein GOS_328313 [marine metage...  33.5    6.5  
gb|EBJ87123.1|  hypothetical protein GOS_8826099 [marine metag...  33.5    6.8  
gb|EBF66105.1|  hypothetical protein GOS_9561335 [marine metag...  33.5    6.8  

ALIGNMENTS
>gb|EBG98654.1| hypothetical protein GOS_9340206 [marine metagenome]
Length=310

 Score = 35.8 bits (81),  Expect = 1.3, Method: Compositional matrix adjust.
 Identities = 18/54 (33%), Positives = 28/54 (52%), Gaps = 0/54 (0%)

Query  206  LDAGRSTGKVGSHVQITAFDLRVFEEIKDALSADAGFKERAGKLSDQRSLVNEL  259
             D    T +  S + +TAFDLR+++  KD    +  F+E   ++ D   L NEL
Sbjct  54   FDQQEKTNETTSKIDLTAFDLRIYDLEKDIKKLNNNFEELIFQIDDLNQLYNEL  107


>gb|EDH73298.1| hypothetical protein GOS_551721 [marine metagenome]
Length=292

 Score = 33.9 bits (76),  Expect = 4.7, Method: Compositional matrix adjust.
 Identities = 19/57 (33%), Positives = 29/57 (51%), Gaps = 3/57 (5%)

Query  33  FVKYFDDDFGFDGISCLATLMEAITSLSNDVSS--IYDFLSGLKDILKYA-PSNLHL  86
           FV   +++FGF G     TL+      S+ V+   +    SGL D++  A P+ LHL
Sbjct  24  FVASVNEEFGFSGAKAFTTLLRTCGRESDKVAEELVRSVSSGLPDMVVVAEPTGLHL  80


>gb|EBA86708.1| hypothetical protein GOS_328313 [marine metagenome]
Length=352

 Score = 33.5 bits (75),  Expect = 6.5, Method: Compositional matrix adjust.
 Identities = 24/65 (37%), Positives = 32/65 (49%), Gaps = 7/65 (11%)

Query  226  LRVFEEIKDALSADAGFK-----ERAGKLSDQRSLVNELV--DAMADVTSFVSKAKLKVM  278
            L VF  I D L  D  F      E  G +SD  +L+ E +  D + DV+ FVSK  LK+ 
Sbjct  283  LEVFGNINDGLQYDKFFISGEDLEVIGTVSDDEALILERLFRDGILDVSGFVSKDGLKIT  342

Query  279  QKDIN  283
              + N
Sbjct  343  GGEFN  347


>gb|EBJ87123.1| hypothetical protein GOS_8826099 [marine metagenome]
Length=360

 Score = 33.5 bits (75),  Expect = 6.8, Method: Compositional matrix adjust.
 Identities = 24/65 (37%), Positives = 32/65 (49%), Gaps = 7/65 (11%)

Query  226  LRVFEEIKDALSADAGFK-----ERAGKLSDQRSLVNELV--DAMADVTSFVSKAKLKVM  278
            L VF  I D L  D  F      E  G +SD  +L+ E +  D + DV+ FVSK  LK+ 
Sbjct  184  LEVFGNINDGLQYDKFFISGEDLEVIGTVSDDEALILERLFRDGILDVSGFVSKDGLKIT  243

Query  279  QKDIN  283
              + N
Sbjct  244  GGEFN  248


>gb|EBF66105.1| hypothetical protein GOS_9561335 [marine metagenome]
Length=268

 Score = 33.5 bits (75),  Expect = 6.8, Method: Compositional matrix adjust.
 Identities = 22/59 (37%), Positives = 29/59 (49%), Gaps = 4/59 (7%)

Query  56   ITSLSNDVSSIYDFLSGLKD-ILKYAPSNLHL---KNHVATYEYLAETLQKEQDEVLDL  110
            I S  N V   Y+ L+G  D  LK++    HL   +  VA  EYL +T    Q+ VLD 
Sbjct  172  IGSRPNQVGKEYEALTGFPDTCLKFSKDQPHLHPTQKPVALMEYLIKTYTNPQETVLDF  230



BLAST

PROTOCOL


a)BLASTp versus NR, NCBI default parameters apart from "Number of descriptions_1000"

b)BLASTp versus Swissprot, NCBI default parameters apart from "Number of descriptions_1000"

c)BLASTp versus Environmental samples, NCBI default parameters apart from "Number of descriptions_1000"



RESULTS ANALYSIS


After searching 3 different databases, the results are still not significant, with E-values far from the acceptable 10^-6, which means that there are no known homologous proteins.
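
The significance test applied here amounts to comparing each best hit against the 10^-6 cutoff. A minimal Python sketch is given below; the best E-values are copied from the top line of each raw hit table on this page, and the names `E_VALUE_CUTOFF` and `is_significant` are hypothetical.

```python
# Cutoff used throughout this annotation: hits need an E-value of at most 1e-6.
E_VALUE_CUTOFF = 1e-6

# Best (lowest) E-value per database, taken from the raw hit tables on this page.
best_hits = {
    "NR": 1.2,
    "Swissprot": 0.19,
    "Environmental samples": 1.3,
}


def is_significant(e_value, cutoff=E_VALUE_CUTOFF):
    """A hit counts as significant when its E-value does not exceed the cutoff."""
    return e_value <= cutoff


significant = {db: e for db, e in best_hits.items() if is_significant(e)}
print(significant or "no significant hits")
```

With the values above, no database passes the filter, which is consistent with the conclusion that no homologs were found.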


Because of the very high E-values obtained, it is not possible to build a phylogenetic tree.



a)                                                                   Score     E
Sequences producing significant alignments:                       (Bits)  Value

ref|XP_001846805.1|  guanylate cyclase [Culex quinquefasciatus...  38.1    1.2  
ref|ZP_01128278.1|  two-component response regulator NtrC [Nit...  38.1    1.3  
dbj|BAA76690.1|  soluble guanylyl cyclase alpha subunit [Oryzi...  37.7    1.7  
dbj|BAC80220.1|  soluble guanylyl cyclase alpha1 subunit [Oryz...  37.7    1.9  
ref|ZP_01771723.1|  Hypothetical protein COLAER_00711 [Collins...  37.4    2.5  
ref|XP_002530048.1|  conserved hypothetical protein [Ricinus c...  37.4    2.7  
ref|YP_002440499.1|  PvdP [Pseudomonas aeruginosa LESB58] >emb...  37.0    2.8  
ref|XP_002285492.1|  PREDICTED: hypothetical protein [Vitis vi...  37.0    3.3  
ref|NP_001098122.1|  soluble guanylyl cyclase alpha subunit [O...  37.0    3.4  
ref|NP_816013.1|  adenylosuccinate lyase [Enterococcus faecali...  36.6    3.7  
ref|ZP_04433950.1|  adenylosuccinate lyase [Enterococcus faeca...  36.6    3.7  
gb|EFT90659.1|  adenylosuccinate lyase [Enterococcus faecalis ...  36.6    3.8  
ref|XP_638906.2|  hypothetical protein DDB_G0283893 [Dictyoste...  36.6    4.1  
ref|XP_002480614.1|  cell polarity protein (Tea1), putative [T...  36.6    4.4  
gb|AAX16335.1|  PvdD/PvdJ [Pseudomonas aeruginosa]                 36.6    4.5  
ref|ZP_07278870.1|  predicted protein [Streptomyces sp. AA4] >...  36.2    5.0  
gb|AAX16306.1|  PvdD/PvdJ [Pseudomonas aeruginosa]                 36.2    5.3  
ref|XP_002311757.1|  predicted protein [Populus trichocarpa] >...  35.8    6.9  
gb|ADR58081.1|  TonB-dependent siderophore receptor [Pseudomon...  35.8    7.5  
ref|YP_001265733.1|  TonB-dependent siderophore receptor [Pseu...  35.4    7.9  
ref|NP_742517.1|  TonB-dependent siderophore receptor [Pseudom...  35.4    8.7  
emb|CAP31309.2|  hypothetical protein CBG_12313 [Caenorhabditi...  35.4    8.7  
ref|XP_002639604.1|  Hypothetical protein CBG12313 [Caenorhabd...  35.4    8.8  
gb|AAT08765.1|  auxin-independent growth protein [Hyacinthus o...  35.4    9.1  

ALIGNMENTS
>ref|XP_001846805.1| guanylate cyclase [Culex quinquefasciatus]
 gb|EDS44630.1| guanylate cyclase [Culex quinquefasciatus]
Length=694

 Score = 38.1 bits (87),  Expect = 1.2, Method: Compositional matrix adjust.
 Identities = 28/89 (32%), Positives = 45/89 (51%), Gaps = 5/89 (5%)

Query  15   RELWKEKQKIAKIFDMKGFVKYFDDDFGFDGISCLATLMEAITSLSNDVSSIYDFLSGLK  74
            R+L +++Q  A+ FD      +F D  GF  IS +++ ME +T L N +  ++D +    
Sbjct  447  RQLKQQRQVPAETFD--SVTIFFSDIVGFTYISAVSSAMEVVTML-NTLYRLFDSIILKY  503

Query  75   DILKYAPSNLHLKNHVATYEYLAETLQKE  103
            D+ K +PS   L+ HV        T Q E
Sbjct  504  DVYKVSPS--LLRWHVVVTTPPPHTPQVE  530


>ref|ZP_01128278.1| two-component response regulator NtrC [Nitrococcus mobilis Nb-231]
 gb|EAR20794.1| two-component response regulator NtrC [Nitrococcus mobilis Nb-231]
Length=465

 Score = 38.1 bits (87),  Expect = 1.3, Method: Compositional matrix adjust.
 Identities = 34/120 (29%), Positives = 55/120 (46%), Gaps = 8/120 (6%)

Query  37   FDDDFGFDGISCLATLMEAITSLSNDVSSIYD-FLSGLKDILKYAPSNLHLKNHVATYEY  95
            F +D  F  ++ +   +  +   S+D+ ++ + FL+     LK  P  L      A   Y
Sbjct  292  FREDL-FHRLNVIRIHVCTLRERSSDIPALAEHFLARAARELKVEPKKLS----PAVERY  346

Query  96   LAETLQKEQDEVLDLIGEWLNTFQNSEDIQDDLLPALLQDEQSEVE--NDISSLTAMYND  153
              +         L+ I  WL    +S+DI+ + LPA L+ E S+VE  ND  SL A + D
Sbjct  347  FMQQPWPGNVRELENICRWLTVMASSQDIELEDLPAELRAEPSQVEIGNDWESLLACWAD  406


>dbj|BAA76690.1| soluble guanylyl cyclase alpha subunit [Oryzias latipes]
Length=678

 Score = 37.7 bits (86),  Expect = 1.7, Method: Compositional matrix adjust.
 Identities = 30/111 (28%), Positives = 51/111 (46%), Gaps = 13/111 (11%)

Query  15   RELWKEKQKIAKIFDMKGFVKYFDDDFGFDGISCLATLMEAITSLSNDVSSIYDFLSGLK  74
            ++LW+ +   AK F+    +  F D  GF  +  L T M+ IT L N++ + +D+  G  
Sbjct  473  QQLWQGETVQAKKFNQVTML--FSDIVGFTAVCSLCTPMQVITML-NELYTKFDYQCGEL  529

Query  75   DILK--------YAPSNLHLKN--HVATYEYLAETLQKEQDEVLDLIGEWL  115
            D+ K             LH ++  H     ++A  + +  DEVL   GE +
Sbjct  530  DVYKVETIGDAYCVAGGLHRESETHAVEIAFMALKMMELSDEVLTPTGEPI  580


>dbj|BAC80220.1| soluble guanylyl cyclase alpha1 subunit [Oryzias curvinotus]
Length=678

 Score = 37.7 bits (86),  Expect = 1.9, Method: Compositional matrix adjust.
 Identities = 30/111 (28%), Positives = 51/111 (46%), Gaps = 13/111 (11%)

Query  15   RELWKEKQKIAKIFDMKGFVKYFDDDFGFDGISCLATLMEAITSLSNDVSSIYDFLSGLK  74
            ++LW+ +   AK F+    +  F D  GF  +  L T M+ IT L N++ + +D+  G  
Sbjct  473  QQLWQGETVQAKKFNQVTML--FSDIVGFTAVCSLCTPMQVITML-NELYTKFDYQCGEL  529

Query  75   DILK--------YAPSNLHLKN--HVATYEYLAETLQKEQDEVLDLIGEWL  115
            D+ K             LH ++  H     ++A  + +  DEVL   GE +
Sbjct  530  DVYKVETIGDAYCVAGGLHRESDTHAVEIAFMALKMMELSDEVLTPTGEPI  580


>ref|ZP_01771723.1| Hypothetical protein COLAER_00711 [Collinsella aerofaciens ATCC 
25986]
 gb|EBA40187.1| Hypothetical protein COLAER_00711 [Collinsella aerofaciens ATCC 
25986]
Length=853

 Score = 37.4 bits (85),  Expect = 2.5, Method: Compositional matrix adjust.
 Identities = 20/69 (29%), Positives = 36/69 (53%), Gaps = 8/69 (11%)

Query  41   FGFDGISCLATLMEAITSLSNDVSSIYDFLSGLKDILKYAPSNLHLK--------NHVAT  92
            FG  G SCLA++ + + +     + +  FL+G +D L + P+ L  +        + VA 
Sbjct  515  FGEGGGSCLASMTDDVAATVKANADVVSFLAGYRDALAHLPAELRKRLVDDSSQVDAVAE  574

Query  93   YEYLAETLQ  101
             +Y+AE L+
Sbjct  575  EDYIAERLR  583


>ref|XP_002530048.1| conserved hypothetical protein [Ricinus communis]
 gb|EEF32348.1| conserved hypothetical protein [Ricinus communis]
Length=552

 Score = 37.4 bits (85),  Expect = 2.7, Method: Compositional matrix adjust.
 Identities = 24/82 (30%), Positives = 41/82 (50%), Gaps = 8/82 (9%)

Query  7    LLDGVYSERELWKEKQKIAKIFDMKGFVKYFDDDFGFDGISCLATLMEAITSLSNDVSSI  66
            L+  V  + ++WK++ K   IFD+  F+ Y  DD     +  +  + E  T  +   SSI
Sbjct  186  LILPVLKQDQIWKDQTKFEDIFDVDHFIDYLKDD-----VRIVRDIPEWFTDKAELFSSI  240

Query  67   YDFLSGLKDILKYAPSNLHLKN  88
                  +K+I KYAP+  ++ N
Sbjct  241  R---RTVKNIPKYAPAQFYIDN  259


>ref|YP_002440499.1| PvdP [Pseudomonas aeruginosa LESB58]
 emb|CAW27637.1| PvdP [Pseudomonas aeruginosa LESB58]
Length=537

 Score = 37.0 bits (84),  Expect = 2.8, Method: Compositional matrix adjust.
 Identities = 35/138 (26%), Positives = 59/138 (43%), Gaps = 20/138 (14%)

Query  18   WKE--KQKIAKIFDMKGFVKYFDDDFGFDGISCLATLMEAITSLSNDVSSIYDFLSGLKD  75
            WK   +  +   FD  GF++YFD+  GF       ++  A  ++ +D  S  ++L GLK 
Sbjct  280  WKRLPRPAVPLEFDRPGFIRYFDNPDGF-------SVPPAWVAVGDDEYS--EWLRGLKS  330

Query  76   ILKYAPSNLHLKNHVATYEYLAE-TLQKEQDEVLDLIGEWLNTF--------QNSEDIQD  126
               Y  + L  ++     EYLA+ TL +   E+   + +WL+           N   +  
Sbjct  331  AEAYHSNFLAWESQYQDPEYLAKLTLGQFGSEMELGMHDWLHMRWATVTRDPSNGSPVMG  390

Query  127  DLLPALLQDEQSEVENDI  144
            D +P+         END 
Sbjct  391  DRVPSDFSPRWFRPENDF  408


>ref|XP_002285492.1| PREDICTED: hypothetical protein [Vitis vinifera]
 emb|CBI16804.3| unnamed protein product [Vitis vinifera]
Length=552

 Score = 37.0 bits (84),  Expect = 3.3, Method: Compositional matrix adjust.
 Identities = 22/73 (31%), Positives = 38/73 (53%), Gaps = 8/73 (10%)

Query  16   ELWKEKQKIAKIFDMKGFVKYFDDDFGFDGISCLATLMEAITSLSNDVSSIYDFLSGLKD  75
            ++WK++ K   IFD+  F+ Y  DD     +  +  + E  T  S  ++SI      +K+
Sbjct  195  QIWKDQTKFEDIFDVDHFIDYLKDD-----VRIVRDIPEWFTDKSELLTSIR---RTVKN  246

Query  76   ILKYAPSNLHLKN  88
            I KYAP+  ++ N
Sbjct  247  IPKYAPAQFYIDN  259


>ref|NP_001098122.1| soluble guanylyl cyclase alpha subunit [Oryzias latipes]
 dbj|BAA19198.1| soluble guanylyl cyclase alpha subunit [Oryzias latipes]
Length=678

 Score = 37.0 bits (84),  Expect = 3.4, Method: Compositional matrix adjust.
 Identities = 30/111 (28%), Positives = 51/111 (46%), Gaps = 13/111 (11%)

Query  15   RELWKEKQKIAKIFDMKGFVKYFDDDFGFDGISCLATLMEAITSLSNDVSSIYDFLSGLK  74
            ++LW+ +   AK F+    +  F D  GF  +  L T M+ IT L N++ + +D+  G  
Sbjct  473  QQLWQGETVQAKKFNQVTML--FSDIVGFTAVCSLCTPMQVITML-NELYTKFDYQCGEL  529

Query  75   DILK--------YAPSNLHLKN--HVATYEYLAETLQKEQDEVLDLIGEWL  115
            D+ K             LH ++  H     ++A  + +  DEVL   GE +
Sbjct  530  DVYKVETIGDAYCVAGGLHRESETHAVEIAFMALKMIELSDEVLTPTGEPI  580

______________________________________________________________________________________________

b)

                                                                     Score     E
Sequences producing significant alignments:                       (Bits)  Value

sp|Q54QG5.2|Y3893_DICDI  RecName: Full=Probable E3 ubiquitin-p...  36.6    0.19 
sp|Q6D646.1|Y1842_ERWCT  RecName: Full=UPF0061 protein ECA1842     33.1    1.9  
sp|Q94KK6.1|SYP72_ARATH  RecName: Full=Syntaxin-72; Short=AtSYP72  32.7    2.5  
sp|A9WDD4.1|RECF_CHLAA  RecName: Full=DNA replication and repa...  32.7    2.6  
sp|C6DKP3.1|Y2463_PECCP  RecName: Full=UPF0061 protein PC1_2463    32.3    3.4  
sp|B8G3J6.1|RECF_CHLAD  RecName: Full=DNA replication and repa...  32.0    4.2  
sp|O60241.2|BAI2_HUMAN  RecName: Full=Brain-specific angiogene...  32.0    5.0  
sp|Q5R7Y0.2|BAI2_PONAB  RecName: Full=Brain-specific angiogene...  32.0    5.0  
sp|Q9FJE8.1|H2A7_ARATH  RecName: Full=Probable histone H2A.7; ...  32.0    5.3  
sp|Q05930.1|MDM30_YEAST  RecName: Full=Mitochondrial distribut...  31.6    6.0  
sp|Q6KI99.1|Y1910_MYCMO  RecName: Full=UPF0082 protein MMOB1910    31.6    6.5  
sp|O15550.2|KDM6A_HUMAN  RecName: Full=Lysine-specific demethy...  31.2    8.0  
sp|Q62651.2|ECH1_RAT  RecName: Full=Delta(3,5)-Delta(2,4)-dien...  31.2    8.4  
sp|P52785.1|GUC2E_MOUSE  RecName: Full=Guanylyl cyclase GC-E; ...  31.2    8.5  
sp|P20814.1|CP2CD_RAT  RecName: Full=Cytochrome P450 2C13, mal...  30.8    9.5  
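A hit table like the one above can be screened against an E-value cutoff with a few lines of code. The sketch below is illustrative only: the helper names `parse_hits` and `below_threshold` are hypothetical, and the 1e-3 cutoff is an assumed convention, not a value stated in this annotation. It parses three rows copied from the table and reports which, if any, fall at or below the cutoff.

```python
import re

# Hypothetical helper names; the regex assumes the classic BLAST summary layout:
# identifier, free-text description, bit score, E-value.
HIT_RE = re.compile(
    r"^(\S+)\s+(.*?)\s+(\d+(?:\.\d+)?)\s+(\d+(?:\.\d+)?(?:e[-+]?\d+)?)\s*$"
)

def parse_hits(text):
    """Return (identifier, bit_score, e_value) tuples for each hit row."""
    hits = []
    for line in text.splitlines():
        match = HIT_RE.match(line)
        if match:
            identifier, _description, bits, evalue = match.groups()
            hits.append((identifier, float(bits), float(evalue)))
    return hits

def below_threshold(hits, max_evalue=1e-3):
    """Keep hits whose E-value is at or below max_evalue (assumed cutoff)."""
    return [hit for hit in hits if hit[2] <= max_evalue]

# Three rows copied from the table above.
table = """\
sp|Q54QG5.2|Y3893_DICDI  RecName: Full=Probable E3 ubiquitin-p...  36.6    0.19
sp|Q6D646.1|Y1842_ERWCT  RecName: Full=UPF0061 protein ECA1842     33.1    1.9
sp|Q94KK6.1|SYP72_ARATH  RecName: Full=Syntaxin-72; Short=AtSYP72  32.7    2.5
"""

hits = parse_hits(table)
retained = below_threshold(hits)
print(hits)
print(retained)
```

With the assumed cutoff, the best of these rows (E = 0.19) is not retained. A looser or stricter cutoff can be passed as `max_evalue`.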

ALIGNMENTS
>sp|Q54QG5.2|Y3893_DICDI RecName: Full=Probable E3 ubiquitin-protein ligase DDB_G0283893
Length=5875

 Score = 36.6 bits (83),  Expect = 0.19, Method: Compositional matrix adjust.
 Identities = 19/66 (29%), Positives = 36/66 (55%), Gaps = 0/66 (0%)

Query  12    YSERELWKEKQKIAKIFDMKGFVKYFDDDFGFDGISCLATLMEAITSLSNDVSSIYDFLS  71
             YS R+ +  K+K A I D++G  K F D+ G+D ++ L + +  +  +++D    + F  
Sbjct  3653  YSIRDEFLLKKKFAGILDLEGKTKGFSDEIGYDHLAKLISYLTLMLEVASDRPKSWQFFC  3712

Query  72    GLKDIL  77
                D+L
Sbjct  3713  AHNDVL  3718


>sp|Q6D646.1|Y1842_ERWCT RecName: Full=UPF0061 protein ECA1842
Length=483

 Score = 33.1 bits (74),  Expect = 1.9, Method: Compositional matrix adjust.
 Identities = 19/46 (42%), Positives = 23/46 (50%), Gaps = 1/46 (2%)

Query  167  SKNSYKPTEDDEWSGGSKLEGSHPLMS-YSIHDYPLTAKQLDAGRS  211
            S + + P +DD WSG   L G  PL   YS H +   A QL  GR 
Sbjct  46   SSDWFTPEQDDVWSGTRLLPGMEPLAQVYSGHQFGSWAGQLGDGRG  91


>sp|Q94KK6.1|SYP72_ARATH RecName: Full=Syntaxin-72; Short=AtSYP72
Length=267

 Score = 32.7 bits (73),  Expect = 2.5, Method: Compositional matrix adjust.
 Identities = 24/76 (32%), Positives = 48/76 (64%), Gaps = 8/76 (10%)

Query  98   ETLQKEQDEVLDLIGEWLNTFQN-SEDIQDDL---LPALLQDEQSEVENDISSLTAMYND  153
            E  +K+QDE LD+I E L+  +N + D+ ++L   +P L+++ +++V+   S L    N 
Sbjct  173  EMRRKKQDEGLDIISEGLDALKNLARDMNEELDKQVP-LMEEMETKVDGATSDLK---NT  228

Query  154  LLDVRKEIVSLLSSKN  169
             + ++K++V + SS+N
Sbjct  229  NVRLKKQLVQMRSSRN  244


>sp|A9WDD4.1|RECF_CHLAA RecName: Full=DNA replication and repair protein recF
 sp|B9LH68.1|RECF_CHLSY RecName: Full=DNA replication and repair protein recF
Length=392

 Score = 32.7 bits (73),  Expect = 2.6, Method: Compositional matrix adjust.
 Identities = 19/67 (29%), Positives = 36/67 (54%), Gaps = 4/67 (5%)

Query  18   WKEKQKIAKIFDMKGFVKYFDDDFGFDGISCLATLMEAITSLSNDVSSIYDFLSGLKDIL  77
            W+E++++ +  D +  + Y+D +    G   LA  + AI  L++    +Y  +SG +D L
Sbjct  184  WREQRRVPRHVDAE--LAYWDQELAAAGGYLLAERLRAIVELNDLAGPLYQEMSGGEDRL  241

Query  78   K--YAPS  82
            +  YA S
Sbjct  242  QIEYAAS  248


>sp|C6DKP3.1|Y2463_PECCP RecName: Full=UPF0061 protein PC1_2463
Length=483

 Score = 32.3 bits (72),  Expect = 3.4, Method: Compositional matrix adjust.
 Identities = 18/46 (40%), Positives = 23/46 (50%), Gaps = 1/46 (2%)

Query  167  SKNSYKPTEDDEWSGGSKLEGSHPLMS-YSIHDYPLTAKQLDAGRS  211
            S + + P +D  WSG   L G  PL   YS H + + A QL  GR 
Sbjct  46   SSDWFTPEQDAVWSGERLLPGMEPLAQVYSGHQFGMWAGQLGDGRG  91


>sp|B8G3J6.1|RECF_CHLAD RecName: Full=DNA replication and repair protein recF
Length=392

 Score = 32.0 bits (71),  Expect = 4.2, Method: Compositional matrix adjust.
 Identities = 16/55 (30%), Positives = 29/55 (53%), Gaps = 2/55 (3%)

Query  18   WKEKQKIAKIFDMKGFVKYFDDDFGFDGISCLATLMEAITSLSNDVSSIYDFLSG  72
            W+E++++ +  D +  + Y+D +    G   LA  + A+  LS    SIY  +SG
Sbjct  184  WREQRRLPRNVDAE--LGYWDQELAAAGGYLLAERLRAVVELSALAGSIYRKISG  236


>sp|O60241.2|BAI2_HUMAN RecName: Full=Brain-specific angiogenesis inhibitor 2; Flags: Precursor
Length=1585

 Score = 32.0 bits (71),  Expect = 5.0, Method: Composition-based stats.
 Identities = 26/121 (22%), Positives = 51/121 (43%), Gaps = 21/121 (17%)

Query  46   ISCLATLMEAITSLSNDVSSIYDFLSGLKDILK---YAPSNLHLKNHVATYEYLAETLQK  102
            +  L  L+   T  S D+    D L  + D  K   Y PS   ++       ++ +   K
Sbjct  619  VRSLQELLARRTYYSGDLLFSVDILRNVTDTFKRATYVPSADDVQRFFQVVSFMVDAENK  678

Query  103  EQ------------------DEVLDLIGEWLNTFQNSEDIQDDLLPALLQDEQSEVENDI  144
            E+                  ++ + L+G+ L  FQ+S  + D+L+ ++ ++  S V +DI
Sbjct  679  EKWDDAQQVSPGSVHLLRVVEDFIHLVGDALKAFQSSLIVTDNLVISIQREPVSAVSSDI  738

Query  145  S  145
            +
Sbjct  739  T  739


>sp|Q5R7Y0.2|BAI2_PONAB RecName: Full=Brain-specific angiogenesis inhibitor 2; Flags: Precursor
Length=1485

 Score = 32.0 bits (71),  Expect = 5.0, Method: Composition-based stats.
 Identities = 26/121 (22%), Positives = 51/121 (43%), Gaps = 21/121 (17%)

Query  46   ISCLATLMEAITSLSNDVSSIYDFLSGLKDILK---YAPSNLHLKNHVATYEYLAETLQK  102
            +  L  L+   T  S D+    D L  + D  K   Y PS   ++       ++ +   K
Sbjct  552  VRSLQELLARRTYYSGDLLFSVDILRNVTDTFKRATYVPSADDVQRFFQVVSFMVDAENK  611

Query  103  EQ------------------DEVLDLIGEWLNTFQNSEDIQDDLLPALLQDEQSEVENDI  144
            E+                  ++ + L+G+ L  FQ+S  + D+L+ ++ ++  S V +DI
Sbjct  612  EKWDDAQQVSPGSVHLLRVVEDFIHLVGDALKAFQSSLIVTDNLVISIQREPVSAVSSDI  671

Query  145  S  145
            +
Sbjct  672  T  672


>sp|Q9FJE8.1|H2A7_ARATH RecName: Full=Probable histone H2A.7; AltName: Full=HTA6
Length=150

 Score = 32.0 bits (71),  Expect = 5.3, Method: Compositional matrix adjust.
 Identities = 25/83 (31%), Positives = 39/83 (47%), Gaps = 4/83 (4%)

Query  95   YLAETLQKEQDEVLDLIGEWLNTFQNSEDIQDDLLPALLQDEQSEVENDISSLTAMYNDL  154
            Y+A  L+    EVL+L G      + S  I   LL A+  DE  E+   +S +T  +  +
Sbjct  60   YMAAVLEYLAAEVLELAGNAARDNKKSRIIPRHLLLAIRNDE--ELGKLLSGVTIAHGGV  117

Query  155  LDVRKEIVSLLSSKNSYKPTEDD  177
            L     +  LL  K++ KP E+ 
Sbjct  118  LPNINSV--LLPKKSATKPAEEK  138


>sp|Q05930.1|MDM30_YEAST RecName: Full=Mitochondrial distribution and morphology protein 30
Length=598

 Score = 31.6 bits (70),  Expect = 6.0, Method: Compositional matrix adjust.
 Identities = 12/26 (47%), Positives = 18/26 (70%), Gaps = 0/26 (0%)

Query  109  DLIGEWLNTFQNSEDIQDDLLPALLQ  134
            DL+  W N F N+E+ +  +LPALL+
Sbjct  348  DLLSVWSNNFHNAENFESTVLPALLE  373


>sp|Q6KI99.1|Y1910_MYCMO RecName: Full=UPF0082 protein MMOB1910
Length=244

 Score = 31.6 bits (70),  Expect = 6.5, Method: Compositional matrix adjust.
 Identities = 26/110 (24%), Positives = 53/110 (49%), Gaps = 11/110 (10%)

Query  24   IAKIFDMKGFVKYFDDDFGFDGISCLATLMEAITSLSNDVSSIYDF------LSGLKDIL  77
            I  IF+ KG +    +++     +    ++EA+ + + DV +  DF       S  +++ 
Sbjct  132  IPYIFERKGVLDILKEEYE----NSDQLMLEALDAGAEDVQTFEDFSRIITNPSNFQEVK  187

Query  78   KYAPSNLHLKNHV-ATYEYLAETLQKEQDEVLDLIGEWLNTFQNSEDIQD  126
                  L L+N+  A  +YL  T    + E L+ +  W+ T +++ED+Q+
Sbjct  188  DKIDKALSLENYATAEIQYLPNTTVSFEKEKLEKLETWIETLEDNEDVQE  237


>sp|O15550.2|KDM6A_HUMAN RecName: Full=Lysine-specific demethylase 6A; AltName: Full=Histone demethylase UTX; AltName: Full=Ubiquitously-transcribed TPR protein on the X chromosome; AltName: Full=Ubiquitously-transcribed X chromosome tetratricopeptide repeat protein
Length=1401

 Score = 31.2 bits (69),  Expect = 8.0, Method: Composition-based stats.
 Identities = 16/52 (31%), Positives = 26/52 (50%), Gaps = 2/52 (3%)

Query  155  LDVRKEIVSLLSSKNSYKPTEDDEWSGGSKLEGSHPLMSYSIHDYPLTAKQL  206
            L +  E+ S   + N+ +    D WSGG  +  SHP +    H + LT ++L
Sbjct  425  LPIPAELTSRQGAMNTAQQNTSDNWSGGHAV--SHPPVQQQAHSWCLTPQKL  474

__________________________________________________________________________________________________

c)

                                                                     Score     E
Sequences producing significant alignments:                       (Bits)  Value

gb|EBG98654.1|  hypothetical protein GOS_9340206 [marine metag...  35.8    1.3  
gb|EDH73298.1|  hypothetical protein GOS_551721 [marine metage...  33.9    4.7  
gb|EBA86708.1|  hypothetical protein GOS_328313 [marine metage...  33.5    6.5  
gb|EBJ87123.1|  hypothetical protein GOS_8826099 [marine metag...  33.5    6.8  
gb|EBF66105.1|  hypothetical protein GOS_9561335 [marine metag...  33.5    6.8  

ALIGNMENTS
>gb|EBG98654.1| hypothetical protein GOS_9340206 [marine metagenome]
Length=310

 Score = 35.8 bits (81),  Expect = 1.3, Method: Compositional matrix adjust.
 Identities = 18/54 (34%), Positives = 28/54 (52%), Gaps = 0/54 (0%)

Query  206  LDAGRSTGKVGSHVQITAFDLRVFEEIKDALSADAGFKERAGKLSDQRSLVNEL  259
             D    T +  S + +TAFDLR+++  KD    +  F+E   ++ D   L NEL
Sbjct  54   FDQQEKTNETTSKIDLTAFDLRIYDLEKDIKKLNNNFEELIFQIDDLNQLYNEL  107


>gb|EDH73298.1| hypothetical protein GOS_551721 [marine metagenome]
Length=292

 Score = 33.9 bits (76),  Expect = 4.7, Method: Compositional matrix adjust.
 Identities = 19/57 (34%), Positives = 29/57 (51%), Gaps = 3/57 (5%)

Query  33  FVKYFDDDFGFDGISCLATLMEAITSLSNDVSS--IYDFLSGLKDILKYA-PSNLHL  86
           FV   +++FGF G     TL+      S+ V+   +    SGL D++  A P+ LHL
Sbjct  24  FVASVNEEFGFSGAKAFTTLLRTCGRESDKVAEELVRSVSSGLPDMVVVAEPTGLHL  80


>gb|EBA86708.1| hypothetical protein GOS_328313 [marine metagenome]
Length=352

 Score = 33.5 bits (75),  Expect = 6.5, Method: Compositional matrix adjust.
 Identities = 24/65 (37%), Positives = 32/65 (50%), Gaps = 7/65 (10%)

Query  226  LRVFEEIKDALSADAGFK-----ERAGKLSDQRSLVNELV--DAMADVTSFVSKAKLKVM  278
            L VF  I D L  D  F      E  G +SD  +L+ E +  D + DV+ FVSK  LK+ 
Sbjct  283  LEVFGNINDGLQYDKFFISGEDLEVIGTVSDDEALILERLFRDGILDVSGFVSKDGLKIT  342

Query  279  QKDIN  283
              + N
Sbjct  343  GGEFN  347


>gb|EBJ87123.1| hypothetical protein GOS_8826099 [marine metagenome]
Length=360

 Score = 33.5 bits (75),  Expect = 6.8, Method: Compositional matrix adjust.
 Identities = 24/65 (37%), Positives = 32/65 (50%), Gaps = 7/65 (10%)

Query  226  LRVFEEIKDALSADAGFK-----ERAGKLSDQRSLVNELV--DAMADVTSFVSKAKLKVM  278
            L VF  I D L  D  F      E  G +SD  +L+ E +  D + DV+ FVSK  LK+ 
Sbjct  184  LEVFGNINDGLQYDKFFISGEDLEVIGTVSDDEALILERLFRDGILDVSGFVSKDGLKIT  243

Query  279  QKDIN  283
              + N
Sbjct  244  GGEFN  248


>gb|EBF66105.1| hypothetical protein GOS_9561335 [marine metagenome]
Length=268

 Score = 33.5 bits (75),  Expect = 6.8, Method: Compositional matrix adjust.
 Identities = 22/59 (38%), Positives = 29/59 (50%), Gaps = 4/59 (6%)

Query  56   ITSLSNDVSSIYDFLSGLKD-ILKYAPSNLHL---KNHVATYEYLAETLQKEQDEVLDL  110
            I S  N V   Y+ L+G  D  LK++    HL   +  VA  EYL +T    Q+ VLD 
Sbjct  172  IGSRPNQVGKEYEALTGFPDTCLKFSKDQPHLHPTQKPVALMEYLIKTYTNPQETVLDF  230