GOS 1508020

From Metagenes
Warning: this metagenomic sequence has been carefully annotated by students during bioinformatics assignments. These quality annotations are therefore the result of a teaching exercise that you are most welcome to amend and extend if necessary!


Sequence
CAMERA AccNum : JCVI_READ_1092351624807
Annotathon code: GOS_1508020
Sample :
  • GPS :1°23'21n; 91°49'1w
  • Galapagos Islands: Wolf Island - Ecuador
  • Coastal (-1.7m, 21.8°C, 0.1-0.8 microns)
Authors
Team : Algarve
Username : AMS1
Annotated on : 2010-07-14 20:45:21
  • a27930 MónicaAlexandraIsidoroGomes
  • a28985 SusanaFilipaJordãoViegas
  • a34705 AndréJerónimoGuerra

Synopsis

Genomic Sequence

>JCVI_READ_1092351624807 GOS_1508020 Genomic DNA
GAGCATTCCATCGTACTCGGGATTGTAAGCATCCGAGGGCGATGCCGGGTCGAGGTCCGAGTAGAAATCAGGTCGTTGCTCAATGGAAATGTCGATGGGT
TCGCTCAAATCCAAGGTGAGTGTAACATCAACCGTGTAGGGAATTCGAGATTCAAAGAACTCAGGATCATTAAGGATACCTAGGATCGTGAATGCTTGAC
CGGTGACATCGAAATGCGCCCATGAGTTCCAGTGCGATGATTCAACATCGAGGTCGCAAATCCATGCGTCCGAAACACGTGTGCACGCTCCTTGTGCATC
GTTGCTTCCATCTTCTCCGAGATCGATTTCAAGTTGCAAAGCACCTCCTTGTGCGGTATCCAAATCAAACTCTGTTGGAACGAAGCGAGCCTCCATTTGA
GTTGTGCCGTTTGGAATCCACACATAATGCGAATCTTCTGTGTAATACACAGCTCCAGTGGCACCGTTGTTGAAATGATTCCAATCGCCCTTCCAACTAT
GGCTTACTCGATCAGTATTGACTGGAATAGATGTTTCAACCATCATGGATTCGTAAATATCCCATGCATCCCAAACTGTGTATTCTGGATGATCAACAAA
ACCGTCCTCATCTTGGTCGCGCATGAGTTGAAGCGTTCGTGCCAATGCCATGGCTGCATCAACATCGGTCAAACCATGACCAACCCTCCAATCATGGCAT
TCACCCGTTGCCGAGCGATAACATTCCTCGGGAATATCGTTGCATGAATCTTCGTCGTTACCTTCTTCGCACGCATTGTCCATTCCTTCGTAACGTGCCG

Translation

[3 - 800/800]   indirect strand
>GOS_1508020 Translation [3-800   indirect strand]
ARYEGMDNACEEGNDEDSCNDIPEECYRSATGECHDWRVGHGLTDVDAAMALARTLQLMRDQDEDGFVDHPEYTVWDAWDIYESMMVETSIPVNTDRVSH
SWKGDWNHFNNGATGAVYYTEDSHYVWIPNGTTQMEARFVPTEFDLDTAQGGALQLEIDLGEDGSNDAQGACTRVSDAWICDLDVESSHWNSWAHFDVTG
QAFTILGILNDPEFFESRIPYTVDVTLTLDLSEPIDISIEQRPDFYSDLDPASPSDAYNPEYDGML

[ Warning ] 5' incomplete: does not start with a Methionine
[ Warning ] 3' incomplete: following codon is not a STOP
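
As a rough illustration of how this translation and the 5' warning can be reproduced, the sketch below reverse-complements the last 12 bases of the read and translates them starting at base 3 of the indirect strand. The function names are hypothetical, and the standard genetic code is assumed, as in the ORF finding protocol.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(dna):
    """Translate complete codons only; trailing bases are ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

read_end = "CGTAACGTGCCG"            # last 12 bases of the read shown above
rc = reverse_complement(read_end)    # start of the indirect strand
protein_start = translate(rc[2:])    # base 3 corresponds to index 2

print(rc)                                             # CGGCACGTTACG
print(protein_start)                                  # ARY
print("5' incomplete:", not protein_start.startswith("M"))
```

The first three residues (ARY) agree with the start of the translation given above.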

Annotator commentaries

No reliable homology with any known protein or protein domain was found. We therefore cannot assign the sequence to any known organism or family of organisms, nor deduce anything about the protein's molecular function or biological process.


ORF finding

PROTOCOL


a) SMS ORFinder / forward strand / frames 1, 2 & 3 / min 60 AA / 'any codon' initiation / 'standard' genetic code

b) SMS ORFinder / reverse strand / frames 1, 2 & 3 / min 60 AA / 'any codon' initiation / 'standard' genetic code
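
The search described by this protocol can be sketched roughly as follows. This is not the SMS implementation: the function name `find_orfs` is hypothetical, and the sketch assumes that with 'any codon' initiation an ORF runs from the codon after the previous stop to the next stop, and that the minimum length counts residues without the stop codon.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

def find_orfs(dna, frame, min_aa=60):
    """Return (start, end, protein) tuples for one reading frame (1, 2 or 3).

    Coordinates are 1-based and inclusive; 'end' includes the stop codon
    when the ORF is terminated by one.
    """
    orfs = []
    start = frame - 1          # 0-based index of the current ORF's first base
    residues = []
    for i in range(frame - 1, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            if len(residues) >= min_aa:
                orfs.append((start + 1, i + 3, "".join(residues)))
            start, residues = i + 3, []
        else:
            residues.append(aa)
    # An ORF that runs off the end of the read has no stop codon.
    if len(residues) >= min_aa:
        orfs.append((start + 1, start + 3 * len(residues), "".join(residues)))
    return orfs

# Toy sequence with a lowered threshold, just to show the output shape.
toy = "ATGAAATAGGGGCCC"
toy_orfs = find_orfs(toy, frame=1, min_aa=2)
print(toy_orfs)  # [(1, 9, 'MK'), (10, 15, 'GP')]
```

For the reverse strand, the same function would be applied to the reverse complement of the read.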



RESULTS ANALYSIS


In our nucleotide sequence analysis we found two ORFs in reading frame 1 on the direct strand. The first extends from base 1 to base 183 (183 nt, 60 aa); it begins with the codon GAG, which encodes a glutamic acid (E), and ends with the codon ACC, which encodes a threonine (T), followed by the stop codon TAG. The second extends from base 241 to base 747 (507 nt, 168 aa); it begins with the codon TTC, which encodes a phenylalanine (F), and ends with the codon GCA, which encodes an alanine (A), followed by the stop codon TGA.

We also obtained one ORF in reading frame 2 on the direct strand. It extends from base 200 to base 400 (201 nt, 66 aa), begins with the codon CCG, which encodes a proline (P), and ends with the codon ATT, which encodes an isoleucine (I), followed by the stop codon TGA.

One ORF was found in reading frame 3 on the direct strand. It extends from base 207 to base 473 (267 nt, 88 aa), begins with the codon CAT, which encodes a histidine (H), and ends with the codon TGT, which encodes a cysteine (C), followed by the stop codon TGA.

On the reverse strand two ORFs were found. The one in reading frame 1 extends from base 298 to base 621 (324 nt, 107 aa), begins with the codon GCC, which encodes an alanine (A), and ends with the codon TCC, which encodes a serine (S), followed by the stop codon TAG. The one in reading frame 3 extends from base 3 to base 800 (798 nt, 266 aa), begins with the codon GCA, which encodes an alanine (A), and ends with the codon CTC, which encodes a leucine (L); no stop codon occurs before the end of the read.

The ORF retained for annotation is the one in reading frame 3 on the reverse strand, which is the longest of the six.
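
The base coordinates reported by the ORF finder are nucleotide positions, so the protein length of each ORF follows from the span divided by three, minus one codon when the ORF ends with a stop. A minimal sketch of that arithmetic (the function name `orf_lengths` is hypothetical; the coordinates are those listed in the raw results):

```python
def orf_lengths(start, end, ends_with_stop=True):
    """Return (nucleotides, residues) for 1-based inclusive coordinates."""
    nt = end - start + 1
    codons = nt // 3
    aa = codons - 1 if ends_with_stop else codons
    return nt, aa

# (start, end, terminated by a stop codon?)
orfs = {
    "direct frame 1, ORF 1": (1, 183, True),
    "direct frame 1, ORF 2": (241, 747, True),
    "direct frame 2": (200, 400, True),
    "direct frame 3": (207, 473, True),
    "reverse frame 1": (298, 621, True),
    "reverse frame 3": (3, 800, False),
}

for name, (start, end, stop) in orfs.items():
    nt, aa = orf_lengths(start, end, stop)
    print(f"{name}: {nt} nt, {aa} aa")
```

The residue counts obtained this way can be checked against the lengths of the translations in the raw results.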

RAW RESULTS

a) forward strand

>ORF number 1 in reading frame 1 on the direct strand extends from base 1 to base 183.
GAGCATTCCATCGTACTCGGGATTGTAAGCATCCGAGGGCGATGCCGGGTCGAGGTCCGA
GTAGAAATCAGGTCGTTGCTCAATGGAAATGTCGATGGGTTCGCTCAAATCCAAGGTGAG
TGTAACATCAACCGTGTAGGGAATTCGAGATTCAAAGAACTCAGGATCATTAAGGATACC
TAG

>Translation of ORF number 1 in reading frame 1 on the direct strand.
EHSIVLGIVSIRGRCRVEVRVEIRSLLNGNVDGFAQIQGECNINRVGNSRFKELRIIKDT
*

>ORF number 2 in reading frame 1 on the direct strand extends from base 241 to base 747.
TTCAACATCGAGGTCGCAAATCCATGCGTCCGAAACACGTGTGCACGCTCCTTGTGCATC
GTTGCTTCCATCTTCTCCGAGATCGATTTCAAGTTGCAAAGCACCTCCTTGTGCGGTATC
CAAATCAAACTCTGTTGGAACGAAGCGAGCCTCCATTTGAGTTGTGCCGTTTGGAATCCA
CACATAATGCGAATCTTCTGTGTAATACACAGCTCCAGTGGCACCGTTGTTGAAATGATT
CCAATCGCCCTTCCAACTATGGCTTACTCGATCAGTATTGACTGGAATAGATGTTTCAAC
CATCATGGATTCGTAAATATCCCATGCATCCCAAACTGTGTATTCTGGATGATCAACAAA
ACCGTCCTCATCTTGGTCGCGCATGAGTTGAAGCGTTCGTGCCAATGCCATGGCTGCATC
AACATCGGTCAAACCATGACCAACCCTCCAATCATGGCATTCACCCGTTGCCGAGCGATA
ACATTCCTCGGGAATATCGTTGCATGA

>Translation of ORF number 2 in reading frame 1 on the direct strand.
FNIEVANPCVRNTCARSLCIVASIFSEIDFKLQSTSLCGIQIKLCWNEASLHLSCAVWNP
HIMRIFCVIHSSSGTVVEMIPIALPTMAYSISIDWNRCFNHHGFVNIPCIPNCVFWMINK
TVLILVAHELKRSCQCHGCINIGQTMTNPPIMAFTRCRAITFLGNIVA*

>ORF number 1 in reading frame 2 on the direct strand extends from base 200 to base 400.
CCGGTGACATCGAAATGCGCCCATGAGTTCCAGTGCGATGATTCAACATCGAGGTCGCAA
ATCCATGCGTCCGAAACACGTGTGCACGCTCCTTGTGCATCGTTGCTTCCATCTTCTCCG
AGATCGATTTCAAGTTGCAAAGCACCTCCTTGTGCGGTATCCAAATCAAACTCTGTTGGA
ACGAAGCGAGCCTCCATTTGA

>Translation of ORF number 1 in reading frame 2 on the direct strand.
PVTSKCAHEFQCDDSTSRSQIHASETRVHAPCASLLPSSPRSISSCKAPPCAVSKSNSVG
TKRASI*

>ORF number 1 in reading frame 3 on the direct strand extends from base 207 to base 473.
CATCGAAATGCGCCCATGAGTTCCAGTGCGATGATTCAACATCGAGGTCGCAAATCCATG
CGTCCGAAACACGTGTGCACGCTCCTTGTGCATCGTTGCTTCCATCTTCTCCGAGATCGA
TTTCAAGTTGCAAAGCACCTCCTTGTGCGGTATCCAAATCAAACTCTGTTGGAACGAAGC
GAGCCTCCATTTGAGTTGTGCCGTTTGGAATCCACACATAATGCGAATCTTCTGTGTAAT
ACACAGCTCCAGTGGCACCGTTGTTGA

>Translation of ORF number 1 in reading frame 3 on the direct strand.
HRNAPMSSSAMIQHRGRKSMRPKHVCTLLVHRCFHLLRDRFQVAKHLLVRYPNQTLLERS
EPPFELCRLESTHNANLLCNTQLQWHRC*

---------------------------------------------------------------------------------------------------
b) reverse strand

>ORF number 1 in reading frame 1 on the reverse strand extends from base 298 to base 621.
GCCATAGTTGGAAGGGCGATTGGAATCATTTCAACAACGGTGCCACTGGAGCTGTGTATT
ACACAGAAGATTCGCATTATGTGTGGATTCCAAACGGCACAACTCAAATGGAGGCTCGCT
TCGTTCCAACAGAGTTTGATTTGGATACCGCACAAGGAGGTGCTTTGCAACTTGAAATCG
ATCTCGGAGAAGATGGAAGCAACGATGCACAAGGAGCGTGCACACGTGTTTCGGACGCAT
GGATTTGCGACCTCGATGTTGAATCATCGCACTGGAACTCATGGGCGCATTTCGATGTCA
CCGGTCAAGCATTCACGATCCTAG

>Translation of ORF number 1 in reading frame 1 on the reverse strand.
AIVGRAIGIISTTVPLELCITQKIRIMCGFQTAQLKWRLASFQQSLIWIPHKEVLCNLKS
ISEKMEATMHKERAHVFRTHGFATSMLNHRTGTHGRISMSPVKHSRS*

No ORFs were found in reading frame 2.

>ORF number 1 in reading frame 3 on the reverse strand extends from base 3 to base 800.
GCACGTTACGAAGGAATGGACAATGCGTGCGAAGAAGGTAACGACGAAGATTCATGCAAC
GATATTCCCGAGGAATGTTATCGCTCGGCAACGGGTGAATGCCATGATTGGAGGGTTGGT
CATGGTTTGACCGATGTTGATGCAGCCATGGCATTGGCACGAACGCTTCAACTCATGCGC
GACCAAGATGAGGACGGTTTTGTTGATCATCCAGAATACACAGTTTGGGATGCATGGGAT
ATTTACGAATCCATGATGGTTGAAACATCTATTCCAGTCAATACTGATCGAGTAAGCCAT
AGTTGGAAGGGCGATTGGAATCATTTCAACAACGGTGCCACTGGAGCTGTGTATTACACA
GAAGATTCGCATTATGTGTGGATTCCAAACGGCACAACTCAAATGGAGGCTCGCTTCGTT
CCAACAGAGTTTGATTTGGATACCGCACAAGGAGGTGCTTTGCAACTTGAAATCGATCTC
GGAGAAGATGGAAGCAACGATGCACAAGGAGCGTGCACACGTGTTTCGGACGCATGGATT
TGCGACCTCGATGTTGAATCATCGCACTGGAACTCATGGGCGCATTTCGATGTCACCGGT
CAAGCATTCACGATCCTAGGTATCCTTAATGATCCTGAGTTCTTTGAATCTCGAATTCCC
TACACGGTTGATGTTACACTCACCTTGGATTTGAGCGAACCCATCGACATTTCCATTGAG
CAACGACCTGATTTCTACTCGGACCTCGACCCGGCATCGCCCTCGGATGCTTACAATCCC
GAGTACGATGGAATGCTC

>Translation of ORF number 1 in reading frame 3 on the reverse strand.
ARYEGMDNACEEGNDEDSCNDIPEECYRSATGECHDWRVGHGLTDVDAAMALARTLQLMR
DQDEDGFVDHPEYTVWDAWDIYESMMVETSIPVNTDRVSHSWKGDWNHFNNGATGAVYYT
EDSHYVWIPNGTTQMEARFVPTEFDLDTAQGGALQLEIDLGEDGSNDAQGACTRVSDAWI
CDLDVESSHWNSWAHFDVTGQAFTILGILNDPEFFESRIPYTVDVTLTLDLSEPIDISIE
QRPDFYSDLDPASPSDAYNPEYDGML

Multiple Alignment

PROTOCOL



RESULTS ANALYSIS


No multiple alignment could be built: the best BLASTp hit has a very high E-value (0.82), so the results are inconclusive and there are no reliable homologs to align.

RAW RESULTS

Protein Domains

PROTOCOL


InterProScan : default parameters at EBI


RESULTS ANALYSIS


The only InterProScan result obtained for our ORF is a PatternScan match to an EF-hand calcium-binding site (PS00018, residues 61 to 73). This isolated pattern match is not a reliable result, so we cannot predict any protein domain for this ORF.

RAW RESULTS

Sequence_1	96013972668C5881	266	PatternScan	PS00018	EF_HAND_1	61	73	NA	?	26-Mar-2010	IPR018247	EF-Hand 1, calcium-binding site	
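
To show what this PatternScan hit amounts to, the sketch below searches the beginning of the ORF translation with a regular expression. The pattern is the PROSITE PS00018 (EF_HAND_1) signature as we recall it, converted by hand to regex syntax; it is an assumption that should be checked against the PROSITE entry before being relied on.

```python
import re

# Assumed PROSITE PS00018 pattern:
# D-{W}-[DNS]-{ILVFYW}-[DENSTG]-[DNQGHRK]-{GP}-[LIVMC]-[DENQSTAGC]-x(2)-[DE]-[LIVMFYW]
EF_HAND_1 = re.compile(
    r"D[^W][DNS][^ILVFYW][DENSTG][DNQGHRK][^GP][LIVMC][DENQSTAGC].{2}[DE][LIVMFYW]"
)

# First 100 residues of the ORF translation.
protein = (
    "ARYEGMDNACEEGNDEDSCNDIPEECYRSATGECHDWRVGHGLTDVDAAMALARTLQLMR"
    "DQDEDGFVDHPEYTVWDAWDIYESMMVETSIPVNTDRVSH"
)

match = EF_HAND_1.search(protein)
# Convert the 0-based match span to 1-based inclusive residue positions.
hit = (match.start() + 1, match.end(), match.group())
print(hit)  # (61, 73, 'DQDEDGFVDHPEY')
```

A 13-residue pattern of this kind can match by chance, which is consistent with treating the hit as unreliable.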

Phylogeny

PROTOCOL



RESULTS ANALYSIS


Neither the taxonomy report nor the multiple alignment could be produced, because the best BLAST hit has a very high E-value (0.82), which makes the results inconclusive. We therefore have no data from which to build a phylogenetic tree.

RAW RESULTS

Taxonomy report

PROTOCOL




RESULTS ANALYSIS


No taxonomy report could be produced because the best BLAST hit has a very high E-value (0.82), which makes the results inconclusive.

RAW RESULTS

BLAST

PROTOCOL


a)BLASTp versus NR, NCBI default parameters apart from "Number of descriptions_1000"

b)BLASTx versus NR, NCBI default parameters apart from "Number of descriptions_1000"


RESULTS ANALYSIS


Since the E-values from BLASTp (0.82) and BLASTx (4.1) are very high, none of the analyses based on this ORF can be considered reliable. The first BLASTp hit is annotated as a conserved hypothetical protein, so even the protein most similar to ours has no known function.
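
The reasoning above amounts to filtering hits by E-value. The sketch below applies such a filter to the three BLASTp hits from the raw results. The cutoff of 1e-5 is an illustrative assumption, not a value used in this annotation, and the function name is hypothetical.

```python
E_VALUE_CUTOFF = 1e-5  # assumed threshold, for illustration only

# (accession, description, bit score, E-value) copied from the BLASTp hit table.
blastp_hits = [
    ("ZP_06288220.1", "conserved hypothetical protein [Prevotella timonensis]", 38.5, 0.82),
    ("XP_001990575.1", "GH18166 [Drosophila grimshawi]", 37.7, 1.2),
    ("YP_003088182.1", "glucosamine-6-phosphate isomerase [Dyadobacter fermentans]", 35.4, 6.0),
]

def reliable_hits(hits, cutoff=E_VALUE_CUTOFF):
    """Keep only the hits whose E-value is at or below the cutoff."""
    return [hit for hit in hits if hit[3] <= cutoff]

usable = reliable_hits(blastp_hits)
print(len(usable))  # 0
```

With no hit passing the filter, there is nothing to feed into the multiple alignment, taxonomy, or phylogeny steps.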



RAW RESULTS

a)BLASTp

                                                                   Score     E
Sequences producing significant alignments:                       (Bits)  Value

ref|ZP_06288220.1|  conserved hypothetical protein [Prevotella...  38.5    0.82 
ref|XP_001990575.1|  GH18166 [Drosophila grimshawi] >gb|EDV936...  37.7    1.2  
ref|YP_003088182.1|  glucosamine-6-phosphate isomerase [Dyadob...  35.4    6.0  

ALIGNMENTS
>ref|ZP_06288220.1| conserved hypothetical protein [Prevotella timonensis CRIS 5C-B1]
 gb|EFA98659.1| conserved hypothetical protein [Prevotella timonensis CRIS 5C-B1]
Length=670

 Score = 38.5 bits (88),  Expect = 0.82, Method: Compositional matrix adjust.
 Identities = 27/92 (29%), Positives = 44/92 (47%), Gaps = 14/92 (15%)

Query  3    YEGMDNACEEGNDEDSCNDIPE--ECYRSATGECHDWRV---GHGLTDVDAAMALARTLQ  57
            YE M +   +G +    ++I E  +  +S  G     RV     GLTD  A++A      
Sbjct  102  YEAMHDQLFKGQNVQPTDEIVEVAKLVKSTKGNISQLRVYIITDGLTDPTASIA------  155

Query  58   LMRDQDEDGFVDHPEYTVWDAWDIYESMMVET  89
               + +E+GFV   EY +WD W +Y+   ++T
Sbjct  156  -PEENEEEGFV--IEYNIWDMWRVYQQHNIKT  184


>ref|XP_001990575.1| GH18166 [Drosophila grimshawi]
 gb|EDV93637.1| GH18166 [Drosophila grimshawi]
Length=518

 Score = 37.7 bits (86),  Expect = 1.2, Method: Compositional matrix adjust.
 Identities = 20/68 (29%), Positives = 31/68 (45%), Gaps = 0/68 (0%)

Query  140  VPTEFDLDTAQGGALQLEIDLGEDGSNDAQGACTRVSDAWICDLDVESSHWNSWAHFDVT  199
            +P  F L    G  L L   L E+ S  +  +  R  DA IC++ +  +H+    HF+  
Sbjct  91   IPVFFRLPALHGMGLSLTQSLLEEHSVQSLLSQNRTFDAVICEVFMNEAHFGFAEHFNAP  150

Query  200  GQAFTILG  207
               F+ LG
Sbjct  151  LITFSTLG  158


>ref|YP_003088182.1| glucosamine-6-phosphate isomerase [Dyadobacter fermentans DSM 
18053]
 gb|ACT95017.1| glucosamine-6-phosphate isomerase [Dyadobacter fermentans DSM 
18053]
Length=646

 Score = 35.4 bits (80),  Expect = 6.0, Method: Compositional matrix adjust.
 Identities = 18/52 (34%), Positives = 28/52 (53%), Gaps = 4/52 (7%)

Query  76   WDAWDIYESMMVETSIPVNTDRVSHSWKGDWNHFNNGATGAVYYTEDSHYVW  127
            W  WDIYE   +E ++P++ D+V    +G + H  +   G V+  EDS   W
Sbjct  566  WAEWDIYE---IEMAVPMSPDQVLKKRQGIFKH-QSQKDGVVFQGEDSREFW  613


b)BLASTx


                                                                   Score     E
Sequences producing significant alignments:                       (Bits)  Value

ref|YP_003088182.1|  glucosamine-6-phosphate isomerase [Dyadob...  36.2    4.1  

ALIGNMENTS
>ref|YP_003088182.1| glucosamine-6-phosphate isomerase [Dyadobacter fermentans DSM 
18053]
 gb|ACT95017.1| glucosamine-6-phosphate isomerase [Dyadobacter fermentans DSM 
18053]
Length=646

 Score = 36.2 bits (82),  Expect = 4.1
 Identities = 18/52 (34%), Positives = 28/52 (53%), Gaps = 4/52 (7%)
 Frame = +1

Query  226  WDAWDIYESMMVETSIPVNTDRVSHSWKGDWNHFNNGATGAVYYTEDSHYVW  381
            W  WDIYE   +E ++P++ D+V    +G + H  +   G V+  EDS   W
Sbjct  566  WAEWDIYE---IEMAVPMSPDQVLKKRQGIFKH-QSQKDGVVFQGEDSREFW  613
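
The BLASTx alignment above covers query nucleotides 226 to 381 in frame +1, which appears to be the same region as residues 76 to 127 in the BLASTp alignment to the same Dyadobacter protein. A minimal sketch of the coordinate conversion, for positive frames only (the function name is hypothetical):

```python
def nt_to_aa_range(nt_start, nt_end, frame=1):
    """Map 1-based nucleotide coordinates to 1-based residue coordinates.

    Only positive frames (+1, +2, +3) are handled in this sketch.
    """
    offset = frame - 1
    aa_start = (nt_start - 1 - offset) // 3 + 1
    aa_end = (nt_end - offset) // 3
    return aa_start, aa_end

print(nt_to_aa_range(226, 381, frame=1))  # (76, 127)
```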