GOS 2140030

From Metagenes
Warning: this metagenomic sequence has been carefully annotated by students during bioinformatics assignments. These quality annotations are therefore the result of a teaching exercise that you are most welcome to amend and extend if necessary!


Sequence
CAMERA AccNum : JCVI_READ_1091118858891
Annotathon code: GOS_2140030
Sample :
  • GPS :24°10'29n; 84°20'40w
  • Caribbean Sea: Gulf of Mexico - USA
  • Coastal Sea (-2m, 26.4°C, 0.1-0.8 microns)
Authors
Team : Algarve 2011
Username : ccj2011
Annotated on : 2011-07-19 01:22:09
  • Domingues Joana Manuel Portela
  • Guerreiro Carla Sofia de Jesus
  • Serra Carolina Alves

Synopsis

Genomic Sequence

>JCVI_READ_1091118858891 GOS_2140030 Genomic DNA
CCTATCACTCCATCGCTTAACAAAGTATGGAATCGGGTCCGCCTCGACCACTGTATGCCTAAATCGTAATTGGCATACTGCTCCCTGTAATCCAACCGGC
AATGTTTCCATCGGGTCCGGATTGGGGGCTATTGTCAGGAGTGGTATCCAAATCTGATTATTCGCATCCCACCTGGCCCAAGCCGCATGAGGCATTAACC
CAAATGGGATGCTTAAGGAATCATTTGCGCGAACACCTGACACTCCCAGTGGACGAACTCTCGTCCACACACTCCCGCTCGTATCTAATTGAGGCGCCCC
TACCGGCATCCAGGGATCGCCCGCTTGCGCCTTTTGAACTATTCGCAATCCCTGCACCCCGCTGATTCGAAGTTCTACAAGATCAGCAGCCCGCAGTCGG
TGAGGCACTTGCGTTTGAAGCAACCCGCCCGGCCCCAAAAGCTGCAACCATGCGCTCTTCATTTCATCATCACTCTGTATGGAAGATGAAATAATGGTGA
AATCGATCGGATCCCAAATCATTGGCCGAACGTTTCCCGTGAGCTCCAATTTCTGCCCGGACTTCAAGCCGTGCGGATACAAGAATTCCAAAAAATGTTC
TCCGCCCTGTTGCGCAACAATTCCAGCAAACCCATGAGTAGGCTCATCATATATTGACGATAAATCCACTCCGCCTTTTTCCTCCGGCAATTCGGCCGCA
GAGAAAGTTTCCATGAATTGATAAAACCCATACGACCGATTCCCCAACCCGTCACCACCTAACCAATCCTTGAGAGAATTGCGATTGTTAGATGTCTGCA
GCCAATAAGCCACCTGAGCCGGCATACCTCCTGCAGTTCGATTCTCCCCCAATGCTTCGTTGATACCCCGTGCCGGCCGACTCAAATGAAAGGCATCTCC
TATGGCCGCTCCCATTTCATATAGGGCACCGCTATATCGATATGGGCTATATGACGGATTGGAAAAGAGGGCTGGATCCATGTGGT

Translation

[3 - 986/986]   indirect strand
>GOS_2140030 Translation [3-986   indirect strand]
HMDPALFSNPSYSPYRYSGALYEMGAAIGDAFHLSRPARGINEALGENRTAGGMPAQVAYWLQTSNNRNSLKDWLGGDGLGNRSYGFYQFMETFSAAELP
EEKGGVDLSSIYDEPTHGFAGIVAQQGGEHFLEFLYPHGLKSGQKLELTGNVRPMIWDPIDFTIISSSIQSDDEMKSAWLQLLGPGGLLQTQVPHRLRAA
DLVELRISGVQGLRIVQKAQAGDPWMPVGAPQLDTSGSVWTRVRPLGVSGVRANDSLSIPFGLMPHAAWARWDANNQIWIPLLTIAPNPDPMETLPVGLQ
GAVCQLRFRHTVVEADPIPYFVKRWSDR

[ Warning ] 5' incomplete: does not start with a Methionine
[ Warning ] 3' incomplete: following codon is not a STOP
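The start of this translation can be checked by hand. The sketch below is only an illustration, not the tool used for the annotation: it takes the last 20 bases of the genomic read, reverse-complements them, and translates from base 3 with the standard genetic code. The helper names (`reverse_complement`, `translate`) and the choice of a 20-base fragment are assumptions made for the example.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna):
    """Return the reverse complement of an uppercase ACGT string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons only; a trailing partial codon is ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Last 20 bases of the genomic read (end of the sequence shown above).
read_tail = "GAGGGCTGGATCCATGTGGT"

reverse = reverse_complement(read_tail)
# Reading frame 3 on the reverse strand starts at base 3, i.e. index 2.
peptide = translate(reverse[2:])
print(peptide)
```

The printed peptide should match the first six residues of the translation above (HMDPAL). The first codon is CAC (His) rather than ATG, which is why the 5' end is flagged as incomplete.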

Annotator commentaries

The sequence under study comes from the Caribbean Sea (Gulf of Mexico).


The chosen ORF, ORF number 1 in reading frame 3 on the reverse strand, extends from base 3 to base 986 and encodes 328 amino acids. It is treated as an ORFan: it has no significant homologs, but because it is longer than 200 amino acids it can still be considered coding.


The ORFan has no known homologous sequences: BLASTp and BLASTx searches returned no significant results. The e-values are too high (1.7 to 6.3) and the scores too low (36.2 to 38.1) to reach a conclusive result.


Since no homologs of the ORF under study were found, its taxonomy and phylogeny cannot be defined, so its origin remains unknown. Without knowing which organisms related proteins come from, a phylogenetic tree cannot be built.


Consequently, the protein cannot be classified in terms of biological process or function.


ORF finding

PROTOCOL


a) SMS ORFinder / forward strand / frames 1, 2 & 3 / min 60 AA / 'any codon' initiation / 'standard' genetic code

b) SMS ORFinder / reverse strand / frames 1, 2 & 3 / min 60 AA / 'any codon' initiation / 'standard' genetic code
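The protocol above can be approximated in a few lines. This is a simplified sketch, not SMS ORFinder itself: it assumes that with 'any codon' initiation an ORF is any stretch between two stop codons (or a sequence end) in one frame, and that coordinates are 1-based on the scanned strand and include the closing stop codon. The function name `find_orfs` and its return format are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def reverse_complement(dna):
    """Return the reverse complement of an uppercase ACGT string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(seq, min_aa=60):
    """Scan all six frames; return (strand, frame, start, end, n_aa) tuples.

    start/end are 1-based positions on the scanned strand; end includes
    the stop codon when the ORF is closed by one.
    """
    hits = []
    for strand, s in (("direct", seq), ("reverse", reverse_complement(seq))):
        for frame in range(3):
            start = frame
            for pos in range(frame, len(s) - 2, 3):
                is_stop = s[pos:pos + 3] in STOPS
                is_last = pos + 6 > len(s)  # no further complete codon
                if is_stop or is_last:
                    # The stop codon itself is not counted as an amino acid.
                    n_aa = (pos - start) // 3 + (0 if is_stop else 1)
                    if n_aa >= min_aa:
                        hits.append((strand, frame + 1, start + 1, pos + 3, n_aa))
                    start = pos + 3
    return hits


# Toy example with a low threshold so that the short sequence yields a hit.
toy_hits = find_orfs("ATGAAATAGCCC", min_aa=4)
print(toy_hits)
```

On the full read, calling `find_orfs(read)` with the default 60 amino acid minimum would be the equivalent of steps a) and b) together. Boundary conventions may differ slightly from those of SMS ORFinder.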


RESULTS ANALYSIS


a) On the forward strand, 3 ORFs were found: one in reading frame 1, one in reading frame 2 and one in reading frame 3.


b) On the reverse strand, 2 ORFs were found: one in reading frame 1 and one in reading frame 3. No ORFs were found in reading frame 2.


1) The chosen ORF is ORF number 1 in reading frame 3 on the reverse strand, extending from base 3 to base 986. It was chosen because it is the longest ORF, far above the minimum of sixty amino acids, which makes it the most likely to have a described function.


This ORF is treated as an ORFan: it has no significant homologs, yet at 328 amino acids it exceeds the 200 amino acid threshold, so it is considered coding.


It can be concluded that the ORF has no homologous sequences: in the BLASTp vs NR results the e-values are too high (1.7 to 6.3) and the scores are low, so the results are not conclusive for identifying homologs.



The remaining ORFs have no homologous sequences either, because their e-values were too high and their scores too low.
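The length arithmetic and the 200 amino acid rule used above can be written out as a short sketch. The function names and the category labels are hypothetical, chosen only for this illustration. The 200 amino acid threshold is the one quoted in the text.

```python
def orf_length_aa(start, end, includes_stop=False):
    """Number of amino acids encoded between two 1-based, inclusive bases."""
    n_bases = end - start + 1
    if n_bases % 3 != 0:
        raise ValueError("ORF span is not a whole number of codons")
    codons = n_bases // 3
    return codons - 1 if includes_stop else codons


def classify(length_aa, n_significant_hits, min_coding_aa=200):
    """Apply the rule from the text; labels are illustrative only."""
    if n_significant_hits > 0:
        return "has homologs"
    if length_aa > min_coding_aa:
        return "coding ORFan"
    return "short ORF without homologs"


# Chosen ORF: reverse strand, frame 3, bases 3 to 986, no stop codon at the end.
chosen_len = orf_length_aa(3, 986)
print(chosen_len, classify(chosen_len, n_significant_hits=0))

# ORF 2: direct strand, frame 1, bases 289 to 489, stop codon included.
orf2_len = orf_length_aa(289, 489, includes_stop=True)
print(orf2_len, classify(orf2_len, n_significant_hits=0))
```

Only the chosen ORF passes the length criterion. The other ORFs are all well under 200 amino acids.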


2) ORF number 1 in reading frame 1 on the direct strand extends from base 289 to base 489.


Results from BLASTp vs NR for e-values were: "no significant similarity found".


Results from BLASTp vs SWISSPROT for e-values were: 1.9 to 7.2



3) ORF number 1 in reading frame 2 on the direct strand extends from base 419 to base 637


Results from BLASTp vs NR for e-values were: "no significant similarity found".


Results from BLASTp vs SWISSPROT for e-values were: 0.63



4) ORF number 1 in reading frame 3 on the direct strand extends from base 63 to base 365


Results from BLASTp vs NR for e-values were: 5.3 to 8.5


Results from BLASTp vs SWISSPROT for e-values were: 1.1 to 7.3



5) ORF number 1 in reading frame 1 on the reverse strand extends from base 229 to base 423.


Results from BLASTp vs NR for e-values were: "no significant similarity found".


Results from BLASTp vs SWISSPROT for e-values were: "no significant similarity found".



RAW RESULTS

a) forward strand 
 
>ORF number 1 in reading frame 1 on the direct strand extends from base 289 to base 489.
TTGAGGCGCCCCTACCGGCATCCAGGGATCGCCCGCTTGCGCCTTTTGAACTATTCGCAA
TCCCTGCACCCCGCTGATTCGAAGTTCTACAAGATCAGCAGCCCGCAGTCGGTGAGGCAC
TTGCGTTTGAAGCAACCCGCCCGGCCCCAAAAGCTGCAACCATGCGCTCTTCATTTCATC
ATCACTCTGTATGGAAGATGA

>Translation of ORF number 1 in reading frame 1 on the direct strand.
LRRPYRHPGIARLRLLNYSQSLHPADSKFYKISSPQSVRHLRLKQPARPQKLQPCALHFI
ITLYGR*

>ORF number 1 in reading frame 2 on the direct strand extends from base 419 to base 637.
AGCAACCCGCCCGGCCCCAAAAGCTGCAACCATGCGCTCTTCATTTCATCATCACTCTGT
ATGGAAGATGAAATAATGGTGAAATCGATCGGATCCCAAATCATTGGCCGAACGTTTCCC
GTGAGCTCCAATTTCTGCCCGGACTTCAAGCCGTGCGGATACAAGAATTCCAAAAAATGT
TCTCCGCCCTGTTGCGCAACAATTCCAGCAAACCCATGA

>Translation of ORF number 1 in reading frame 2 on the direct strand.
SNPPGPKSCNHALFISSSLCMEDEIMVKSIGSQIIGRTFPVSSNFCPDFKPCGYKNSKKC
SPPCCATIPANP*

>ORF number 1 in reading frame 3 on the direct strand extends from base 63 to base 365.
ATCGTAATTGGCATACTGCTCCCTGTAATCCAACCGGCAATGTTTCCATCGGGTCCGGAT
TGGGGGCTATTGTCAGGAGTGGTATCCAAATCTGATTATTCGCATCCCACCTGGCCCAAG
CCGCATGAGGCATTAACCCAAATGGGATGCTTAAGGAATCATTTGCGCGAACACCTGACA
CTCCCAGTGGACGAACTCTCGTCCACACACTCCCGCTCGTATCTAATTGAGGCGCCCCTA
CCGGCATCCAGGGATCGCCCGCTTGCGCCTTTTGAACTATTCGCAATCCCTGCACCCCGC
TGA

>Translation of ORF number 1 in reading frame 3 on the direct strand.
IVIGILLPVIQPAMFPSGPDWGLLSGVVSKSDYSHPTWPKPHEALTQMGCLRNHLREHLT
LPVDELSSTHSRSYLIEAPLPASRDRPLAPFELFAIPAPR*

---------------------------------------------------------------------------------------------------------------

b) reverse strand

>ORF number 1 in reading frame 1 on the reverse strand extends from base 229 to base 423.
GTGGTGACGGGTTGGGGAATCGGTCGTATGGGTTTTATCAATTCATGGAAACTTTCTCTG
CGGCCGAATTGCCGGAGGAAAAAGGCGGAGTGGATTTATCGTCAATATATGATGAGCCTA
CTCATGGGTTTGCTGGAATTGTTGCGCAACAGGGCGGAGAACATTTTTTGGAATTCTTGT
ATCCGCACGGCTTGA

>Translation of ORF number 1 in reading frame 1 on the reverse strand.
VVTGWGIGRMGFINSWKLSLRPNCRRKKAEWIYRQYMMSLLMGLLELLRNRAENIFWNSC
IRTA*

No ORFs were found in reading frame 2.

>ORF number 1 in reading frame 3 on the reverse strand extends from base 3 to base 986.
CACATGGATCCAGCCCTCTTTTCCAATCCGTCATATAGCCCATATCGATATAGCGGTGCC
CTATATGAAATGGGAGCGGCCATAGGAGATGCCTTTCATTTGAGTCGGCCGGCACGGGGT
ATCAACGAAGCATTGGGGGAGAATCGAACTGCAGGAGGTATGCCGGCTCAGGTGGCTTAT
TGGCTGCAGACATCTAACAATCGCAATTCTCTCAAGGATTGGTTAGGTGGTGACGGGTTG
GGGAATCGGTCGTATGGGTTTTATCAATTCATGGAAACTTTCTCTGCGGCCGAATTGCCG
GAGGAAAAAGGCGGAGTGGATTTATCGTCAATATATGATGAGCCTACTCATGGGTTTGCT
GGAATTGTTGCGCAACAGGGCGGAGAACATTTTTTGGAATTCTTGTATCCGCACGGCTTG
AAGTCCGGGCAGAAATTGGAGCTCACGGGAAACGTTCGGCCAATGATTTGGGATCCGATC
GATTTCACCATTATTTCATCTTCCATACAGAGTGATGATGAAATGAAGAGCGCATGGTTG
CAGCTTTTGGGGCCGGGCGGGTTGCTTCAAACGCAAGTGCCTCACCGACTGCGGGCTGCT
GATCTTGTAGAACTTCGAATCAGCGGGGTGCAGGGATTGCGAATAGTTCAAAAGGCGCAA
GCGGGCGATCCCTGGATGCCGGTAGGGGCGCCTCAATTAGATACGAGCGGGAGTGTGTGG
ACGAGAGTTCGTCCACTGGGAGTGTCAGGTGTTCGCGCAAATGATTCCTTAAGCATCCCA
TTTGGGTTAATGCCTCATGCGGCTTGGGCCAGGTGGGATGCGAATAATCAGATTTGGATA
CCACTCCTGACAATAGCCCCCAATCCGGACCCGATGGAAACATTGCCGGTTGGATTACAG
GGAGCAGTATGCCAATTACGATTTAGGCATACAGTGGTCGAGGCGGACCCGATTCCATAC
TTTGTTAAGCGATGGAGTGATAGG

>Translation of ORF number 1 in reading frame 3 on the reverse strand.
HMDPALFSNPSYSPYRYSGALYEMGAAIGDAFHLSRPARGINEALGENRTAGGMPAQVAY
WLQTSNNRNSLKDWLGGDGLGNRSYGFYQFMETFSAAELPEEKGGVDLSSIYDEPTHGFA
GIVAQQGGEHFLEFLYPHGLKSGQKLELTGNVRPMIWDPIDFTIISSSIQSDDEMKSAWL
QLLGPGGLLQTQVPHRLRAADLVELRISGVQGLRIVQKAQAGDPWMPVGAPQLDTSGSVW
TRVRPLGVSGVRANDSLSIPFGLMPHAAWARWDANNQIWIPLLTIAPNPDPMETLPVGLQ
GAVCQLRFRHTVVEADPIPYFVKRWSDR

Multiple Alignment

PROTOCOL



RESULTS ANALYSIS


A multiple alignment could not be performed: the BLASTp search for homologous sequences was not conclusive, so there are no sequences to align.

RAW RESULTS

Protein Domains

PROTOCOL


InterPro, default parameters at EBI


RESULTS ANALYSIS


The InterProScan results are consistent with the BLAST results: no annotated protein domains were found.

RAW RESULTS

No hits found.

Phylogeny

PROTOCOL



RESULTS ANALYSIS


The tree could not be built because the phylogeny/taxonomy of the ORF was not determined: this ORF does not have homologous sequences. Homologs are essential for building the tree, because it is constructed by comparing proteins from various organisms.

RAW RESULTS

Taxonomy report

PROTOCOL


BLASTp vs NR, default NCBI parameters + "1000 Max target sequences"


RESULTS ANALYSIS


As the BLASTp vs NR results show, the phylogeny and taxonomy cannot be determined. The search for homologous sequences was not conclusive, so the ORF cannot be compared with homologs, which is the only way to infer which organism it belongs to.

RAW RESULTS

cellular organisms
. Bacteria           [bacteria]
. . Arthrobacter chlorophenolicus A6 ------------   38 2 hits [high GC Gram+]     GAF sensor signal transduction histidine kinase [Arthrobact
. . Burkholderia gladioli BSR3 ..................   36 1 hit  [b-proteobacteria]  NAD-dependent epimerase/dehydratase [Burkholderia gladioli 
. . Pelagibaca bermudensis HTCC2601 .............   36 2 hits [a-proteobacteria]  hypothetical protein 1100011001290_R2601_08106 [Pelagibaca 
. Strongylocentrotus purpuratus (purple urchin) -   37 2 hits [sea urchins]       PREDICTED: hypothetical protein, partial [Strongylocentrotu

BLAST

PROTOCOL


1) BLASTp vs NR, default NCBI parameters + "1000 Max target sequences"


2) BLASTp vs SWISSPROT, default NCBI parameters + "1000 Max target sequences"




RESULTS ANALYSIS


BLASTp vs NR was run first because it is the most comprehensive database.


The results were not significant, in terms of either e-values or scores.

The scores were between 36.2 and 38.1 (too low).

The e-values were between 1.7 and 6.3 (too high).

It can be concluded that there are no homologous sequences for the ORF under study.



BLASTp vs SWISSPROT found no significant similarity.
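The significance check described above can be sketched as a simple filter over the four NR hits listed in the raw results. The cutoff `MAX_EVALUE = 1e-5` is an assumption made for illustration and is not a value stated in this annotation. Any commonly used cutoff gives the same outcome here, since every e-value is above 1.

```python
# (accession, bit score, e-value) taken from the BLASTp vs NR hit table.
nr_hits = [
    ("ref|YP_002486247.1|", 38.1, 1.7),
    ("ref|XP_001191599.1|", 37.7, 2.5),
    ("gb|AEA59851.1|", 36.6, 5.3),
    ("ref|ZP_01445303.1|", 36.2, 6.3),
]

MAX_EVALUE = 1e-5  # hypothetical cutoff, assumed for this example


def significant_hits(hits, max_evalue=MAX_EVALUE):
    """Keep only hits whose e-value is at or below the cutoff."""
    return [hit for hit in hits if hit[2] <= max_evalue]


kept = significant_hits(nr_hits)
best_evalue = min(hit[2] for hit in nr_hits)
print(f"{len(kept)} significant hits; best e-value = {best_evalue}")
```

With no hit passing the filter, there is nothing to feed into the multiple alignment, phylogeny or taxonomy steps.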

RAW RESULTS

1) BLASTp vs NR

                                                                  Score     E
Sequences producing significant alignments:                       (Bits)  Value

ref|YP_002486247.1|  GAF sensor signal transduction histidine ...  38.1    1.7  
ref|XP_001191599.1|  PREDICTED: hypothetical protein, partial ...  37.7    2.5  
gb|AEA59851.1|  NAD-dependent epimerase/dehydratase [Burkholde...  36.6    5.3  
ref|ZP_01445303.1|  hypothetical protein 1100011001290_R2601_0...  36.2    6.3  

ALIGNMENTS
>ref|YP_002486247.1| GAF sensor signal transduction histidine kinase [Arthrobacter 
chlorophenolicus A6]
 gb|ACL38158.1| GAF sensor signal transduction histidine kinase [Arthrobacter 
chlorophenolicus A6]
Length=404

 Score = 38.1 bits (87),  Expect = 1.7, Method: Compositional matrix adjust.
 Identities = 34/106 (32%), Positives = 47/106 (44%), Gaps = 1/106 (1%)

Query  16   RYSGALYEMGAAIGDAFHLSRPARGINEALGENRTAGGMPAQVAYWLQTSNNRNSLKDWL  75
            R    L  + A I D  HL R AR +++ALGE R A  + A+ A  + + + RN L   L
Sbjct  147  RQRHGLEILAAEIVDVLHLERRARQLSDALGEARRANALLAEFAGRI-SHDLRNPLTSVL  205

Query  76   GGDGLGNRSYGFYQFMETFSAAELPEEKGGVDLSSIYDEPTHGFAG  121
            G   LG  S G     +     E+    G   LS I D  ++   G
Sbjct  206  GYVELGQMSDGAASDPDLAGYLEVAAGSGQRMLSMIEDVLSYATTG  251


>ref|XP_001191599.1| PREDICTED: hypothetical protein, partial [Strongylocentrotus 
purpuratus]
 ref|XP_787543.2| PREDICTED: hypothetical protein, partial [Strongylocentrotus 
purpuratus]
Length=1221

 Score = 37.7 bits (86),  Expect = 2.5, Method: Composition-based stats.
 Identities = 22/48 (46%), Positives = 25/48 (52%), Gaps = 1/48 (2%)

Query  87   FYQFMETFSAAELPEEKGGVDLS-SIYDEPTHGFAGIVAQQGGEHFLE  133
            FY FM  F  A LP  +     S SIYDEPT    GIV+  G E  +E
Sbjct  118  FYTFMAAFVYAVLPPSQVAARSSRSIYDEPTQACDGIVSDSGSETDIE  165


>gb|AEA59851.1| NAD-dependent epimerase/dehydratase [Burkholderia gladioli BSR3]
Length=213

 Score = 36.6 bits (83),  Expect = 5.3, Method: Compositional matrix adjust.
 Identities = 30/89 (34%), Positives = 45/89 (51%), Gaps = 9/89 (10%)

Query  150  GNVRPMIWDPIDFTIISSSIQSDDEMKSAWLQLLGPGGLLQTQVPHRLRAADLVE-LRIS  208
            GN+     D  D   +++++Q  D   SA+    GPGG   +QVP  +RA  LV+  R +
Sbjct  44   GNITAKTADLFDAASVAAALQGQDVAASAY----GPGGGDASQVPKAVRA--LVDGARAA  97

Query  209  GVQGLRIVQKAQAGDPWMPVGAPQLDTSG  237
            GV+ L +V    AG   +  G   +DT G
Sbjct  98   GVKRLLVV--GGAGSLEVAPGTQLVDTEG  124


>ref|ZP_01445303.1| hypothetical protein 1100011001290_R2601_08106 [Pelagibaca bermudensis 
HTCC2601]
 gb|EAU44480.1| hypothetical protein R2601_08106 [Roseovarius sp. HTCC2601]
Length=292

 Score = 36.2 bits (82),  Expect = 6.3, Method: Compositional matrix adjust.
 Identities = 21/55 (38%), Positives = 27/55 (49%), Gaps = 2/55 (4%)

Query  111  IYDEPTHGFAGIVAQQGGEHFLEFLYPHGLKSGQKLELTGNVRP--MIWDPIDFT  163
            IYD   H  +G+  QQ G+  L F   HGL S + L   G  +P  M  DP  F+
Sbjct  166  IYDTGDHQISGVSQQQSGDQSLTFTSQHGLVSLRDLPRVGEAQPQEMPEDPEAFS  220


---------------------------------------------------------------------------------------------------

2) BLASTp vs SWISSPROT


No significant similarity found