GOS 1347030

From Metagenes
Warning: this metagenomic sequence has been carefully annotated by students during bioinformatics assignments. These quality annotations are therefore the result of a teaching exercise that you are most welcome to amend and extend if necessary!


Sequence
CAMERA AccNum : JCVI_READ_1092343626976
Annotathon code: GOS_1347030
Sample :
  • GPS: 15°16'40"S; 148°13'28"W
  • Polynesia Archipelagos: Tikehau Lagoon - Fr. Polynesia
  • Coral Atoll (-1.2m, 27.8°C, 0.1-0.8 microns)
Authors
Team : Algarve
Username : KONG
Annotated on : 2010-07-05 00:51:13
  • 37007 FilipaLopesCoelho
  • 37009 JoaoDiogoBaiaCoelhoAlmeida
  • 37018 RicardoSilvaBrandao

Synopsis

Genomic Sequence

>JCVI_READ_1092343626976 GOS_1347030 Genomic DNA
TGCGCTCGACGCGTCTCCGCTCCGACGTGTGGAGGCGCCGTGATGGAGGAAAGGAGGGCGAGGTGATCTCACCCTGGAGATTTGCGCGTTTTTTTCTGCG
CCGCGCGGTGCGCCGCGCGGTGCGCGCCGCAGTCGCCGTTCATCGTTGCGCACGATGCAGCACTCCCGCCCACCCCAGCTTCCACACTCCTCAACCACAG
GAGTGCGACCAACCCACCTCAACCACACCCTCATCGGCCGCCGCGTCGGCGCCAGCCACGCAACCTCTTCCCACCACTCGCACTCGATACCACCACCGCC
ACCGCCACCGTCTCCACCACCGCCGCCGTCGCCTCCGCCGCCTTTTCAATCACCATTGCCGCCGCCGCTGCCGCCGCCGTCGCCGCCGCCGCCATCGCCG
CCGCTGCCAATCCCGCCGCCGTACGCGCCGTTCTGGTTCTGGCAGCGCGACGACGTGCTGGATCGGCCGTGGTACTCGAGCCGCGAGTACTTGCTCGAGT
GGTTCGAGCGCGACGCGACGACGCTGAAGCCGGAGGACGGGCTGCACAAGCTCTCGGGCTGGGGCTACCTCAACACCACCCAGTACCAGGAGATCGTGCG
CGCGGC

Translation

[44 - 604/606]   direct strand
>GOS_1347030 Translation [44-604   direct strand]
WRKGGRGDLTLEICAFFSAPRGAPRGARRSRRSSLRTMQHSRPPQLPHSSTTGVRPTHLNHTLIGRRVGASHATSSHHSHSIPPPPPPPSPPPPPSPPPP
FQSPLPPPLPPPSPPPPSPPLPIPPPYAPFWFWQRDDVLDRPWYSSREYLLEWFERDATTLKPEDGLHKLSGWGYLNTTQYQEIVRA

[ Warning ] 5' incomplete: does not start with a Methionine
[ Warning ] 3' incomplete: following codon is not a STOP
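
The translation and the two warnings above can be reproduced with a short script. This is only a minimal sketch, assuming the standard genetic code and 1-based inclusive coordinates; the names `translate` and `end_warnings` are illustrative and are not part of the Annotathon tools. Only the first 70 bases of the read are used here to keep the example short.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, start, end):
    """Translate dna[start..end] (1-based, inclusive).

    A trailing incomplete codon is dropped; unknown codons become 'X'.
    """
    region = dna[start - 1:end].upper()
    usable = len(region) - len(region) % 3
    return "".join(
        CODON_TABLE.get(region[i:i + 3], "X") for i in range(0, usable, 3)
    )


def end_warnings(dna, start, end):
    """Return the 5'/3' completeness warnings for the ORF dna[start..end]."""
    first_codon = dna[start - 1:start + 2].upper()
    following_codon = dna[end:end + 3].upper()  # may be shorter than 3 bases
    warnings = []
    if first_codon != "ATG":
        warnings.append("5' incomplete: does not start with a Methionine")
    if CODON_TABLE.get(following_codon) != "*":
        warnings.append("3' incomplete: following codon is not a STOP")
    return warnings


# First 70 bases of JCVI_READ_1092343626976, copied from the genomic sequence.
READ_PREFIX = (
    "TGCGCTCGACGCGTCTCCGCTCCGACGTGTGGAGGCGCCG"
    "TGATGGAGGAAAGGAGGGCGAGGTGATCTC"
)

print(translate(READ_PREFIX, 44, 70))      # first residues of the frame 2 ORF
print(end_warnings(READ_PREFIX, 44, 70))
```

Run on the full 606-base read with `start=44` and `end=604`, the same functions should give the complete translation, since only two bases remain after position 604 and no stop codon can follow.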

Annotator commentaries

The GOS_1347030 sequence encodes a protein that does not match any known protein in current databases.

The only E-value obtained with BLASTX is high (E = 7.0), and the identity between our protein and the Chlorobium limicola DSM 245 hit is only 42%.

It is not possible to draw conclusions about the molecular function, the biological process, or where translation of the protein starts, as there are no sequences to align with the methionine of our protein.

In the future, as the databases grow, these results may change. Homologous sequences may then be available, allowing us to determine more about the features and functions of this protein.






ORF finding

PROTOCOL


a) SMS ORFinder / forward strand / frames 1, 2 & 3 / min 60 AA / 'any codon' initiation / 'standard' genetic code

b) SMS ORFinder / reverse strand / frames 1, 2 & 3 / min 60 AA / 'any codon' initiation / 'standard' genetic code
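
The settings above ('any codon' initiation, minimum length of 60 codons, three frames per strand) can be sketched as follows. This is an illustrative re-implementation, not the SMS ORF Finder code. It assumes that an ORF is any stretch between two stop codons (or a sequence end), that the closing stop codon is counted as part of the ORF, and that reverse-strand coordinates are given on the reverse-complemented read, which appears consistent with the raw results reported on this page. The function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def reverse_complement(dna):
    """Return the reverse complement of an unambiguous DNA string."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_orfs(dna, min_codons=60):
    """List (frame, start, end) for ORFs in the three frames of one strand.

    'Any codon' initiation: an ORF begins right after the previous stop
    codon (or at the start of the frame) and runs up to and including the
    next stop codon, or up to the last complete codon of the sequence.
    Coordinates are 1-based and inclusive.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        orf_start = frame
        for pos in range(frame, len(dna) - 2, 3):
            is_stop = dna[pos:pos + 3] in STOP_CODONS
            is_last_codon = pos + 3 > len(dna) - 3
            if is_stop or is_last_codon:
                orf_end = pos + 3
                if (orf_end - orf_start) // 3 >= min_codons:
                    orfs.append((frame + 1, orf_start + 1, orf_end))
                orf_start = orf_end
    return orfs


def six_frame_orfs(dna, min_codons=60):
    """Scan both strands; reverse coordinates refer to the reverse complement."""
    return {
        "direct": find_orfs(dna, min_codons),
        "reverse": find_orfs(reverse_complement(dna), min_codons),
    }


# Toy example with a low threshold so that the short sequence yields ORFs.
print(find_orfs("ATGAAATGACCCGGGTTT", min_codons=4))
# → [(2, 5, 16), (3, 3, 17)]
```

Calling `six_frame_orfs` on the full read with the default `min_codons=60` corresponds to running steps a) and b) together.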




RESULTS ANALYSIS


The selected ORF does not start with a methionine and does not end with a stop codon. It starts at base 44.

The ORF retained as coding is ORF number 1 in reading frame 2 on the direct strand, which extends from base 44 to base 604. It is the longest ORF found on the direct strand. On the reverse strand, the longest ORF is in reading frame 1 and extends from base 262 to base 606.

We found 3 ORFs on the forward strand and 4 ORFs on the reverse strand.


RAW RESULTS

a) forward strand

>ORF number 1 in reading frame 1 on the direct strand extends from base 67 to base 606.
TCTCACCCTGGAGATTTGCGCGTTTTTTTCTGCGCCGCGCGGTGCGCCGCGCGGTGCGCG
CCGCAGTCGCCGTTCATCGTTGCGCACGATGCAGCACTCCCGCCCACCCCAGCTTCCACA
CTCCTCAACCACAGGAGTGCGACCAACCCACCTCAACCACACCCTCATCGGCCGCCGCGT
CGGCGCCAGCCACGCAACCTCTTCCCACCACTCGCACTCGATACCACCACCGCCACCGCC
ACCGTCTCCACCACCGCCGCCGTCGCCTCCGCCGCCTTTTCAATCACCATTGCCGCCGCC
GCTGCCGCCGCCGTCGCCGCCGCCGCCATCGCCGCCGCTGCCAATCCCGCCGCCGTACGC
GCCGTTCTGGTTCTGGCAGCGCGACGACGTGCTGGATCGGCCGTGGTACTCGAGCCGCGA
GTACTTGCTCGAGTGGTTCGAGCGCGACGCGACGACGCTGAAGCCGGAGGACGGGCTGCA
CAAGCTCTCGGGCTGGGGCTACCTCAACACCACCCAGTACCAGGAGATCGTGCGCGCGGC


>Translation of ORF number 1 in reading frame 1 on the direct strand.
SHPGDLRVFFCAARCAARCAPQSPFIVAHDAALPPTPASTLLNHRSATNPPQPHPHRPPR
RRQPRNLFPPLALDTTTATATVSTTAAVASAAFSITIAAAAAAAVAAAAIAAAANPAAVR
AVLVLAARRRAGSAVVLEPRVLARVVRARRDDAEAGGRAAQALGLGLPQHHPVPGDRARG


>ORF number 1 in reading frame 2 on the direct strand extends from base 44 to base 604.
TGGAGGAAAGGAGGGCGAGGTGATCTCACCCTGGAGATTTGCGCGTTTTTTTCTGCGCCG
CGCGGTGCGCCGCGCGGTGCGCGCCGCAGTCGCCGTTCATCGTTGCGCACGATGCAGCAC
TCCCGCCCACCCCAGCTTCCACACTCCTCAACCACAGGAGTGCGACCAACCCACCTCAAC
CACACCCTCATCGGCCGCCGCGTCGGCGCCAGCCACGCAACCTCTTCCCACCACTCGCAC
TCGATACCACCACCGCCACCGCCACCGTCTCCACCACCGCCGCCGTCGCCTCCGCCGCCT
TTTCAATCACCATTGCCGCCGCCGCTGCCGCCGCCGTCGCCGCCGCCGCCATCGCCGCCG
CTGCCAATCCCGCCGCCGTACGCGCCGTTCTGGTTCTGGCAGCGCGACGACGTGCTGGAT
CGGCCGTGGTACTCGAGCCGCGAGTACTTGCTCGAGTGGTTCGAGCGCGACGCGACGACG
CTGAAGCCGGAGGACGGGCTGCACAAGCTCTCGGGCTGGGGCTACCTCAACACCACCCAG
TACCAGGAGATCGTGCGCGCG

>Translation of ORF number 1 in reading frame 2 on the direct strand.
WRKGGRGDLTLEICAFFSAPRGAPRGARRSRRSSLRTMQHSRPPQLPHSSTTGVRPTHLN
HTLIGRRVGASHATSSHHSHSIPPPPPPPSPPPPPSPPPPFQSPLPPPLPPPSPPPPSPP
LPIPPPYAPFWFWQRDDVLDRPWYSSREYLLEWFERDATTLKPEDGLHKLSGWGYLNTTQ
YQEIVRA

>ORF number 1 in reading frame 3 on the direct strand extends from base 3 to base 527.
CGCTCGACGCGTCTCCGCTCCGACGTGTGGAGGCGCCGTGATGGAGGAAAGGAGGGCGAG
GTGATCTCACCCTGGAGATTTGCGCGTTTTTTTCTGCGCCGCGCGGTGCGCCGCGCGGTG
CGCGCCGCAGTCGCCGTTCATCGTTGCGCACGATGCAGCACTCCCGCCCACCCCAGCTTC
CACACTCCTCAACCACAGGAGTGCGACCAACCCACCTCAACCACACCCTCATCGGCCGCC
GCGTCGGCGCCAGCCACGCAACCTCTTCCCACCACTCGCACTCGATACCACCACCGCCAC
CGCCACCGTCTCCACCACCGCCGCCGTCGCCTCCGCCGCCTTTTCAATCACCATTGCCGC
CGCCGCTGCCGCCGCCGTCGCCGCCGCCGCCATCGCCGCCGCTGCCAATCCCGCCGCCGT
ACGCGCCGTTCTGGTTCTGGCAGCGCGACGACGTGCTGGATCGGCCGTGGTACTCGAGCC
GCGAGTACTTGCTCGAGTGGTTCGAGCGCGACGCGACGACGCTGA

>Translation of ORF number 1 in reading frame 3 on the direct strand.
RSTRLRSDVWRRRDGGKEGEVISPWRFARFFLRRAVRRAVRAAVAVHRCARCSTPAHPSF
HTPQPQECDQPTSTTPSSAAASAPATQPLPTTRTRYHHRHRHRLHHRRRRLRRLFNHHCR
RRCRRRRRRRHRRRCQSRRRTRRSGSGSATTCWIGRGTRAASTCSSGSSATRRR*

----------------------------------------------------------------------------------------

b) reverse strand

>ORF number 1 in reading frame 1 on the reverse strand extends from base 43 to base 261.
CCCCAGCCCGAGAGCTTGTGCAGCCCGTCCTCCGGCTTCAGCGTCGTCGCGTCGCGCTCG
AACCACTCGAGCAAGTACTCGCGGCTCGAGTACCACGGCCGATCCAGCACGTCGTCGCGC
TGCCAGAACCAGAACGGCGCGTACGGCGGCGGGATTGGCAGCGGCGGCGATGGCGGCGGC
GGCGACGGCGGCGGCAGCGGCGGCGGCAATGGTGATTGA

>Translation of ORF number 1 in reading frame 1 on the reverse strand.
PQPESLCSPSSGFSVVASRSNHSSKYSRLEYHGRSSTSSRCQNQNGAYGGGIGSGGDGGG
GDGGGSGGGNGD*

>ORF number 2 in reading frame 1 on the reverse strand extends from base 262 to base 606.
AAAGGCGGCGGAGGCGACGGCGGCGGTGGTGGAGACGGTGGCGGTGGCGGTGGTGGTATC
GAGTGCGAGTGGTGGGAAGAGGTTGCGTGGCTGGCGCCGACGCGGCGGCCGATGAGGGTG
TGGTTGAGGTGGGTTGGTCGCACTCCTGTGGTTGAGGAGTGTGGAAGCTGGGGTGGGCGG
GAGTGCTGCATCGTGCGCAACGATGAACGGCGACTGCGGCGCGCACCGCGCGGCGCACCG
CGCGGCGCAGAAAAAAACGCGCAAATCTCCAGGGTGAGATCACCTCGCCCTCCTTTCCTC
CATCACGGCGCCTCCACACGTCGGAGCGGAGACGCGTCGAGCGCA

>Translation of ORF number 2 in reading frame 1 on the reverse strand.
KGGGGDGGGGGDGGGGGGGIECEWWEEVAWLAPTRRPMRVWLRWVGRTPVVEECGSWGGR
ECCIVRNDERRLRRAPRGAPRGAEKNAQISRVRSPRPPFLHHGASTRRSGDASSA

>ORF number 1 in reading frame 2 on the reverse strand extends from base 38 to base 376.
GGTAGCCCCAGCCCGAGAGCTTGTGCAGCCCGTCCTCCGGCTTCAGCGTCGTCGCGTCGC
GCTCGAACCACTCGAGCAAGTACTCGCGGCTCGAGTACCACGGCCGATCCAGCACGTCGT
CGCGCTGCCAGAACCAGAACGGCGCGTACGGCGGCGGGATTGGCAGCGGCGGCGATGGCG
GCGGCGGCGACGGCGGCGGCAGCGGCGGCGGCAATGGTGATTGAAAAGGCGGCGGAGGCG
ACGGCGGCGGTGGTGGAGACGGTGGCGGTGGCGGTGGTGGTATCGAGTGCGAGTGGTGGG
AAGAGGTTGCGTGGCTGGCGCCGACGCGGCGGCCGATGA

>Translation of ORF number 1 in reading frame 2 on the reverse strand.
GSPSPRACAARPPASASSRRARTTRASTRGSSTTADPARRRAARTRTARTAAGLAAAAMA
AAATAAAAAAAMVIEKAAEATAAVVETVAVAVVVSSASGGKRLRGWRRRGGR*

>ORF number 1 in reading frame 3 on the reverse strand extends from base 3 to base 257.
CGCGCGCACGATCTCCTGGTACTGGGTGGTGTTGAGGTAGCCCCAGCCCGAGAGCTTGTG
CAGCCCGTCCTCCGGCTTCAGCGTCGTCGCGTCGCGCTCGAACCACTCGAGCAAGTACTC
GCGGCTCGAGTACCACGGCCGATCCAGCACGTCGTCGCGCTGCCAGAACCAGAACGGCGC
GTACGGCGGCGGGATTGGCAGCGGCGGCGATGGCGGCGGCGGCGACGGCGGCGGCAGCGG
CGGCGGCAATGGTGA

>Translation of ORF number 1 in reading frame 3 on the reverse strand.
RAHDLLVLGGVEVAPARELVQPVLRLQRRRVALEPLEQVLAARVPRPIQHVVALPEPERR
VRRRDWQRRRWRRRRRRRQRRRQW*

Multiple Alignment

PROTOCOL



RESULTS ANALYSIS


It is impossible to align our protein with others, since it shows no homology with known proteins in the database. We therefore cannot select an in-group and an out-group to build the multiple alignment.

RAW RESULTS

No hits reported.

Protein Domains

PROTOCOL


InterPro, default parameters at EBI



RESULTS ANALYSIS


We did not find any protein domains. This can happen for several reasons: the protein may be incomplete, it may lack conserved domains, or the database may contain no homologous structures, which prevents comparing known protein domains with our "unknown" protein.

RAW RESULTS


No hits reported.

Phylogeny

PROTOCOL



RESULTS ANALYSIS


Since the protein GOS_1347030 has no match among known proteins, it is not possible to construct a tree.

RAW RESULTS

Taxonomy report

PROTOCOL


1)BLASTp versus NR, NCBI default parameters + "1000 Max target sequences"



RESULTS ANALYSIS


We obtained only one hit, with a high E-value (E = 7.0), so it is impossible to associate our protein with any known protein.

This single hit belongs to a green sulfur bacterium, but because of its high E-value we cannot assign our protein to this group.
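
To illustrate why an E-value of 7.0 is considered too high, the relation used by BLAST statistics, E = m·n·2^(−S'), links the E-value to the bit score S' and the effective search space m·n. The sketch below back-computes the search space from the values reported for this hit (bit score 34.3, E-value 7.0) and then estimates the bit score that would be needed to reach a stricter cutoff. The cutoff of 1e-5 is an assumption chosen for illustration, not a threshold stated in this annotation, and the result is approximate because the reported score and E-value are rounded.

```python
import math


def effective_search_space(bit_score, e_value):
    """Solve E = (m*n) * 2**(-S') for the effective search space m*n."""
    return e_value * 2.0 ** bit_score


def bit_score_needed(search_space, target_e_value):
    """Bit score S' required to reach target_e_value in the same search space."""
    return math.log2(search_space / target_e_value)


# Values reported by BLASTX for the Chlorobium limicola hit.
space = effective_search_space(bit_score=34.3, e_value=7.0)

# Hypothetical significance cutoff, used here only as an example.
needed = bit_score_needed(space, target_e_value=1e-5)

print(f"effective search space: about {space:.2e}")
print(f"bit score needed for E = 1e-5: about {needed:.1f}")
```

Under these assumptions the hit would need a bit score of roughly 54 instead of 34.3 to be considered significant, which is why it cannot be used to assign the protein to a taxonomic group.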

RAW RESULTS

1)

Chlorobium limicola DSM 245 [green sulfur bacteria]
. Chlorobium limicola DSM 245 -   34 2 hits [green sulfur bacteria]  5 nucleotidase, deoxy, cytosolic type C [Chlorobium limicol

BLAST

PROTOCOL


1)BLASTp versus NR, NCBI default parameters apart from "Number of descriptions_500"

2)BLASTx versus NR, NCBI default parameters apart from "Number of descriptions_500"



RESULTS ANALYSIS


The results are not significant. This may be because the sequence has more than 150 amino acids, or possibly because there are currently no known homologous sequences in the database.

RAW RESULTS
1)

No significant similarity found

2)
                                                                  Score     E
Sequences producing significant alignments:                       (Bits)  Value

ref|YP_001943142.1|  5 nucleotidase, deoxy, cytosolic type C [...  34.3    7.0  



ALIGNMENTS
>ref|YP_001943142.1| 5 nucleotidase, deoxy, cytosolic type C [Chlorobium limicola 
DSM 245]
 gb|ACD90163.1| 5 nucleotidase, deoxy, cytosolic type C [Chlorobium limicola 
DSM 245]
Length=197

 Score = 34.3 bits (77),  Expect = 7.0
 Identities = 17/40 (42%), Positives = 23/40 (57%), Gaps = 1/40 (2%)
 Frame = +1

Query  439  REYLLEWFERDATTLKPEDGLHKLSGWGYLNTTQYQEIVR  558
            RE   EWFE+    L P D  + LS WG L+ +QY+ + R
Sbjct  25   REIAAEWFEKSIEELPP-DVSYGLSEWGVLDRSQYESLHR  63
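
The identity and gap counts reported for this alignment can be checked directly from the two aligned strings. This is a minimal sketch; the function name `alignment_stats` is illustrative. The "Positives" count is not recomputed here because it depends on the substitution matrix used by BLAST.

```python
def alignment_stats(query, subject):
    """Count identities and gap columns in a pairwise alignment.

    Both strings must have the same length, with '-' marking gaps.
    """
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    columns = list(zip(query, subject))
    identities = sum(q == s and q != "-" for q, s in columns)
    gaps = sum(q == "-" or s == "-" for q, s in columns)
    return {
        "length": len(columns),
        "identities": identities,
        "gaps": gaps,
        "percent_identity": 100 * identities / len(columns),
    }


# Aligned segments copied from the BLASTX output above.
QUERY = "REYLLEWFERDATTLKPEDGLHKLSGWGYLNTTQYQEIVR"
SBJCT = "REIAAEWFEKSIEELPP-DVSYGLSEWGVLDRSQYESLHR"

stats = alignment_stats(QUERY, SBJCT)
print(stats)
# → {'length': 40, 'identities': 17, 'gaps': 1, 'percent_identity': 42.5}
```

The 17 identities over 40 columns give 42.5%, which matches the 42% shown by BLAST if the percentage is truncated rather than rounded.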