GOS 2146020

Warning: this metagenomic sequence has been carefully annotated by students during bioinformatics assignments. These quality annotations are therefore the result of a teaching exercise that you are most welcome to amend and extend if necessary!


Sequence
CAMERA AccNum : JCVI_READ_1091143176922
Annotathon code: GOS_2146020
Sample :
  • GPS: 5°33'10"N; 87°5'16"W
  • Eastern Tropical Pacific: Dirty Rock, Cocos Island - Costa Rica
  • Fringing Reef (-1.1m, 28.3°C, 0.8-3.0 microns)
Authors
Team : Algarve 2011
Username : SGIA
Annotated on : 2011-05-24 21:24:03
  • Alves Iolanda
  • Gonçalves Sylvie

Synopsis

Genomic Sequence

>JCVI_READ_1091143176922 GOS_2146020 Genomic DNA
CTCGCTTCGGACGTTTCGACTGTTGATCACGAGAGCATCGAGCAGGCGTCACCCGTATCGGACCGTGAGTATGGACCTGCCGTCACGCCAATGCGCGAGG
TGCCGGTGTCGTGCAGCAGCGAGCCAGAGGATCACGGGACGGCAGCGACTCCGACACCGCTCTCGAGTGCGGCAGCGTGGTGCACGAGGGCGATGCTCAG
TTCGTCGTCCAACCCACAGCTCCTCTCCAGCTCGTCCACCTCGATGTCTCCGGCGCTGCCGTCCGGAAGTGTGGCCATGTCGCCGTCGCCTGCAGTAGGC
CGCCCGTCACATGTGGCCAGGGCGATTGCGGAGGAGGCAGCTGCACCAGACGAGACCCTTGGCATGATTGGGGACCTGAGAGAGCTCGCTGCCCGCCTCA
ACGGTGGAGTCGCCGATCCGAACACACCGCAGCAGGAGGCCATGTACATGAGAGGCCTCGAGCTTCCTTTCGATGACAGCGAGCTGACTCCAGCCGACAT
GACTCCGACGCAAACGGGAGGGCTGACGAGCCCTCCCACTACCGCTGGCTCGAGCGGCAAGATGCGGCGCATCGTGTCTGCGAGCGATAGCGAGACGACC
TACGAGACCGAGGACGAGCAGGGCAGACCGCTCTCTGTGCGCAGGGCGACGACGCACACAGCCACAGTGACGGTCTCCACCCCGTCGGAAGTAGTCGTCA
ATGTTGGGTCTCCACCCCGTCAGGCGGCTCCTCGCGCGGCAGAGGCAGCGACGGCGATTGCACAGGCGGCTCCTCGCGCGGCAGAGGCAGTGGCAAGCAC
CAGCGACGACCAGGCCCCGGCGATGTCGGCAGAGGCAAGGGCGCTCATCGCGTCCACTTCACAGAGCACGCTCGAGCGCACGCGCCGAGACGAGGAGAAG
ATGCGTGTGTGATTTTGTGGAGAGGCA

Translation

[1 - 909/927]   direct strand
>GOS_2146020 Translation [1-909   direct strand]
LASDVSTVDHESIEQASPVSDREYGPAVTPMREVPVSCSSEPEDHGTAATPTPLSSAAAWCTRAMLSSSSNPQLLSSSSTSMSPALPSGSVAMSPSPAVG
RPSHVARAIAEEAAAPDETLGMIGDLRELAARLNGGVADPNTPQQEAMYMRGLELPFDDSELTPADMTPTQTGGLTSPPTTAGSSGKMRRIVSASDSETT
YETEDEQGRPLSVRRATTHTATVTVSTPSEVVVNVGSPPRQAAPRAAEAATAIAQAAPRAAEAVASTSDDQAPAMSAEARALIASTSQSTLERTRRDEEK
MRV

[ Warning ] 5' incomplete: does not start with a Methionine

Annotator commentaries

We assigned the status "coding" because the chosen ORF is 304 codons long (303 amino acids plus the stop codon). It is an ORFan, but an ORF of this length probably codes for a protein.


However, the BLAST analysis returned only high E-values (0.41) and very low scores (40), which means that our ORF probably has no known homologs.


In the further analyses we could not draw conclusions about the function of the protein, define an ingroup and an outgroup to predict its taxonomic classification, or build a multiple alignment. These results, although negative, are consistent with the BLAST results. We therefore conclude that this ORF has no statistically or biologically significant similarity to any known sequence.






ORF finding

PROTOCOL


a) SMS ORFinder / forward strand / frames 1, 2, 3 / min 60 AA / 'any codon' initiation / 'standard' genetic code

b) SMS ORFinder / reverse strand / frames 1, 2, 3 / min 60 AA / 'any codon' initiation / 'standard' genetic code
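
The protocol above can be approximated with a short script. This is a minimal sketch, not the SMS implementation: it assumes that with 'any codon' initiation an ORF is any stretch of codons between two stop codons (or between a sequence end and a stop codon), and that the 60-codon minimum counts the stop codon when one is present. The names find_orfs, translate and reverse_complement are hypothetical helpers introduced for illustration.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def find_orfs(dna, min_codons=60):
    """Yield (strand, frame, start, end, protein) for every open stretch.

    Coordinates are 1-based and relative to the strand being scanned.
    Assumption: the length threshold includes the stop codon if present.
    """
    for strand, seq in (("direct", dna), ("indirect", reverse_complement(dna))):
        for frame in range(3):
            protein = translate(seq[frame:])
            start = 0  # index of the first codon of the current open stretch
            for i in range(len(protein) + 1):
                at_end = i == len(protein)
                if at_end or protein[i] == "*":
                    n_codons = i - start + (0 if at_end else 1)
                    if n_codons >= min_codons:
                        nt_start = frame + 3 * start + 1
                        nt_end = frame + 3 * (start + n_codons)
                        yield strand, frame + 1, nt_start, nt_end, protein[start:i]
                    start = i + 1


if __name__ == "__main__":
    # Toy input; replace with the 927-nt read to scan the real sequence.
    toy = "ATG" + "GCT" * 5 + "TAA" + "CC"
    for orf in find_orfs(toy, min_codons=5):
        print(orf)
```

Splitting each frame at stop codons, rather than searching for ATG, is what allows 5'-incomplete ORFs such as the one in frame +1 to be reported.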


RESULTS ANALYSIS


Five ORFs were found on the direct strand: one in reading frame +1, two in reading frame +2 and two in reading frame +3. Five ORFs were found on the indirect strand: two in reading frame -1, two in reading frame -2 and one in reading frame -3.


Our longest ORF is in reading frame +1 on the direct strand; it spans 912 nucleotides, from base 1 to base 912. We chose this ORF for further analysis because it is 304 codons long (303 amino acids plus the stop codon): it is an ORFan, but at over 200 amino acids it probably codes for a protein. Taking this into consideration, we opted for the status "coding". However, the BLASTp results showed a high E-value (0.41) and a low score (40.0); in other words, the sequence has no detectable homologs. This ORF is complete at the 3' end because it ends with a stop codon, and it is incomplete at the 5' end because it does not start with a methionine.
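
The length and completeness statements above can be checked mechanically. The sketch below is illustrative only: describe_orf is a hypothetical helper, it assumes the standard genetic code (stop codons TAA, TAG, TGA; ATG as the only start), and the example input is a shortened stand-in made of the first 9 and the last 12 nucleotides of the chosen ORF rather than the full 912-nt sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def describe_orf(nt_seq):
    """Summarise an in-frame nucleotide sequence.

    Returns the number of codons, the number of amino acids (the stop
    codon is not counted), and whether each end looks complete.
    """
    n_codons = len(nt_seq) // 3
    first_codon = nt_seq[:3]
    last_codon = nt_seq[3 * (n_codons - 1):3 * n_codons]
    has_stop = last_codon in STOP_CODONS
    return {
        "codons": n_codons,
        "amino_acids": n_codons - 1 if has_stop else n_codons,
        "five_prime_complete": first_codon == "ATG",
        "three_prime_complete": has_stop,
    }


if __name__ == "__main__":
    # Shortened stand-in: start and end of the chosen ORF, not the whole ORF.
    stand_in = "CTCGCTTCG" + "ATGCGTGTGTGA"
    print(describe_orf(stand_in))
    # For the real ORF: 912 nucleotides / 3 = 304 codons.
    print(912 // 3)
```

Applied to the full ORF, the same arithmetic gives 304 codons, of which the last is the TGA stop, which is why the translation is 303 residues long.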


Concerning the remaining nine ORFs, four were not chosen because BLASTp found no significant similarity. For the other five ORFs, BLASTp returned high E-values (0.90; 1.9; 0.26; 1.9; 0.50) and low scores (37.0; 35.8; 39.3; 35.8; 37.79), which shows that these ORFs have no detectable homologs either.



RAW RESULTS

a) direct strand
>ORF number 1 in reading frame +1 on the direct strand extends from base 1 to base 912.
CTCGCTTCGGACGTTTCGACTGTTGATCACGAGAGCATCGAGCAGGCGTCACCCGTATCG
GACCGTGAGTATGGACCTGCCGTCACGCCAATGCGCGAGGTGCCGGTGTCGTGCAGCAGC
GAGCCAGAGGATCACGGGACGGCAGCGACTCCGACACCGCTCTCGAGTGCGGCAGCGTGG
TGCACGAGGGCGATGCTCAGTTCGTCGTCCAACCCACAGCTCCTCTCCAGCTCGTCCACC
TCGATGTCTCCGGCGCTGCCGTCCGGAAGTGTGGCCATGTCGCCGTCGCCTGCAGTAGGC
CGCCCGTCACATGTGGCCAGGGCGATTGCGGAGGAGGCAGCTGCACCAGACGAGACCCTT
GGCATGATTGGGGACCTGAGAGAGCTCGCTGCCCGCCTCAACGGTGGAGTCGCCGATCCG
AACACACCGCAGCAGGAGGCCATGTACATGAGAGGCCTCGAGCTTCCTTTCGATGACAGC
GAGCTGACTCCAGCCGACATGACTCCGACGCAAACGGGAGGGCTGACGAGCCCTCCCACT
ACCGCTGGCTCGAGCGGCAAGATGCGGCGCATCGTGTCTGCGAGCGATAGCGAGACGACC
TACGAGACCGAGGACGAGCAGGGCAGACCGCTCTCTGTGCGCAGGGCGACGACGCACACA
GCCACAGTGACGGTCTCCACCCCGTCGGAAGTAGTCGTCAATGTTGGGTCTCCACCCCGT
CAGGCGGCTCCTCGCGCGGCAGAGGCAGCGACGGCGATTGCACAGGCGGCTCCTCGCGCG
GCAGAGGCAGTGGCAAGCACCAGCGACGACCAGGCCCCGGCGATGTCGGCAGAGGCAAGG
GCGCTCATCGCGTCCACTTCACAGAGCACGCTCGAGCGCACGCGCCGAGACGAGGAGAAG
ATGCGTGTGTGA

>Translation of ORF number 1 in reading frame +1 on the direct strand.
LASDVSTVDHESIEQASPVSDREYGPAVTPMREVPVSCSSEPEDHGTAATPTPLSSAAAW
CTRAMLSSSSNPQLLSSSSTSMSPALPSGSVAMSPSPAVGRPSHVARAIAEEAAAPDETL
GMIGDLRELAARLNGGVADPNTPQQEAMYMRGLELPFDDSELTPADMTPTQTGGLTSPPT
TAGSSGKMRRIVSASDSETTYETEDEQGRPLSVRRATTHTATVTVSTPSEVVVNVGSPPR
QAAPRAAEAATAIAQAAPRAAEAVASTSDDQAPAMSAEARALIASTSQSTLERTRRDEEK
MRV*

>ORF number 1 in reading frame 2 on the direct strand extends from base 2 to base 298.
TCGCTTCGGACGTTTCGACTGTTGATCACGAGAGCATCGAGCAGGCGTCACCCGTATCGG
ACCGTGAGTATGGACCTGCCGTCACGCCAATGCGCGAGGTGCCGGTGTCGTGCAGCAGCG
AGCCAGAGGATCACGGGACGGCAGCGACTCCGACACCGCTCTCGAGTGCGGCAGCGTGGT
GCACGAGGGCGATGCTCAGTTCGTCGTCCAACCCACAGCTCCTCTCCAGCTCGTCCACCT
CGATGTCTCCGGCGCTGCCGTCCGGAAGTGTGGCCATGTCGCCGTCGCCTGCAGTAG

>Translation of ORF number 1 in reading frame 2 on the direct strand.
SLRTFRLLITRASSRRHPYRTVSMDLPSRQCARCRCRAAASQRITGRQRLRHRSRVRQRG
ARGRCSVRRPTHSSSPARPPRCLRRCRPEVWPCRRRLQ*

>ORF number 2 in reading frame 2 on the direct strand extends from base 695 to base 925.
TCGTCAATGTTGGGTCTCCACCCCGTCAGGCGGCTCCTCGCGCGGCAGAGGCAGCGACGG
CGATTGCACAGGCGGCTCCTCGCGCGGCAGAGGCAGTGGCAAGCACCAGCGACGACCAGG
CCCCGGCGATGTCGGCAGAGGCAAGGGCGCTCATCGCGTCCACTTCACAGAGCACGCTCG
AGCGCACGCGCCGAGACGAGGAGAAGATGCGTGTGTGATTTTGTGGAGAGG

>Translation of ORF number 2 in reading frame 2 on the direct strand.
SSMLGLHPVRRLLARQRQRRRLHRRLLARQRQWQAPATTRPRRCRQRQGRSSRPLHRARS
SARAETRRRCVCDFVER

>ORF number 1 in reading frame 3 on the direct strand extends from base 69 to base 476.
GTATGGACCTGCCGTCACGCCAATGCGCGAGGTGCCGGTGTCGTGCAGCAGCGAGCCAGA
GGATCACGGGACGGCAGCGACTCCGACACCGCTCTCGAGTGCGGCAGCGTGGTGCACGAG
GGCGATGCTCAGTTCGTCGTCCAACCCACAGCTCCTCTCCAGCTCGTCCACCTCGATGTC
TCCGGCGCTGCCGTCCGGAAGTGTGGCCATGTCGCCGTCGCCTGCAGTAGGCCGCCCGTC
ACATGTGGCCAGGGCGATTGCGGAGGAGGCAGCTGCACCAGACGAGACCCTTGGCATGAT
TGGGGACCTGAGAGAGCTCGCTGCCCGCCTCAACGGTGGAGTCGCCGATCCGAACACACC
GCAGCAGGAGGCCATGTACATGAGAGGCCTCGAGCTTCCTTTCGATGA

>Translation of ORF number 1 in reading frame 3 on the direct strand.
VWTCRHANARGAGVVQQRARGSRDGSDSDTALECGSVVHEGDAQFVVQPTAPLQLVHLDV
SGAAVRKCGHVAVACSRPPVTCGQGDCGGGSCTRRDPWHDWGPERARCPPQRWSRRSEHT
AAGGHVHERPRASFR*

>ORF number 2 in reading frame 3 on the direct strand extends from base 591 to base 926.
CGAGACGACCTACGAGACCGAGGACGAGCAGGGCAGACCGCTCTCTGTGCGCAGGGCGAC
GACGCACACAGCCACAGTGACGGTCTCCACCCCGTCGGAAGTAGTCGTCAATGTTGGGTC
TCCACCCCGTCAGGCGGCTCCTCGCGCGGCAGAGGCAGCGACGGCGATTGCACAGGCGGC
TCCTCGCGCGGCAGAGGCAGTGGCAAGCACCAGCGACGACCAGGCCCCGGCGATGTCGGC
AGAGGCAAGGGCGCTCATCGCGTCCACTTCACAGAGCACGCTCGAGCGCACGCGCCGAGA
CGAGGAGAAGATGCGTGTGTGATTTTGTGGAGAGGC

>Translation of ORF number 2 in reading frame 3 on the direct strand.
RDDLRDRGRAGQTALCAQGDDAHSHSDGLHPVGSSRQCWVSTPSGGSSRGRGSDGDCTGG
SSRGRGSGKHQRRPGPGDVGRGKGAHRVHFTEHARAHAPRRGEDACVILWRG


b) indirect strand
>ORF number 1 in reading frame 1 on the indirect strand extends from base 70 to base 621.
AGTGGACGCGATGAGCGCCCTTGCCTCTGCCGACATCGCCGGGGCCTGGTCGTCGCTGGT
GCTTGCCACTGCCTCTGCCGCGCGAGGAGCCGCCTGTGCAATCGCCGTCGCTGCCTCTGC
CGCGCGAGGAGCCGCCTGACGGGGTGGAGACCCAACATTGACGACTACTTCCGACGGGGT
GGAGACCGTCACTGTGGCTGTGTGCGTCGTCGCCCTGCGCACAGAGAGCGGTCTGCCCTG
CTCGTCCTCGGTCTCGTAGGTCGTCTCGCTATCGCTCGCAGACACGATGCGCCGCATCTT
GCCGCTCGAGCCAGCGGTAGTGGGAGGGCTCGTCAGCCCTCCCGTTTGCGTCGGAGTCAT
GTCGGCTGGAGTCAGCTCGCTGTCATCGAAAGGAAGCTCGAGGCCTCTCATGTACATGGC
CTCCTGCTGCGGTGTGTTCGGATCGGCGACTCCACCGTTGAGGCGGGCAGCGAGCTCTCT
CAGGTCCCCAATCATGCCAAGGGTCTCGTCTGGTGCAGCTGCCTCCTCCGCAATCGCCCT
GGCCACATGTGA

>Translation of ORF number 1 in reading frame 1 on the indirect strand.
SGRDERPCLCRHRRGLVVAGACHCLCRARSRLCNRRRCLCRARSRLTGWRPNIDDYFRRG
GDRHCGCVRRRPAHRERSALLVLGLVGRLAIARRHDAPHLAARASGSGRARQPSRLRRSH
VGWSQLAVIERKLEASHVHGLLLRCVRIGDSTVEAGSELSQVPNHAKGLVWCSCLLRNRP
GHM*

>ORF number 2 in reading frame 1 on the indirect strand extends from base 622 to base 879.
CGGGCGGCCTACTGCAGGCGACGGCGACATGGCCACACTTCCGGACGGCAGCGCCGGAGA
CATCGAGGTGGACGAGCTGGAGAGGAGCTGTGGGTTGGACGACGAACTGAGCATCGCCCT
CGTGCACCACGCTGCCGCACTCGAGAGCGGTGTCGGAGTCGCTGCCGTCCCGTGATCCTC
TGGCTCGCTGCTGCACGACACCGGCACCTCGCGCATTGGCGTGACGGCAGGTCCATACTC
ACGGTCCGATACGGGTGA

>Translation of ORF number 2 in reading frame 1 on the indirect strand.
RAAYCRRRRHGHTSGRQRRRHRGGRAGEELWVGRRTEHRPRAPRCRTRERCRSRCRPVIL
WLAAARHRHLAHWRDGRSILTVRYG*

>ORF number 1 in reading frame 2 on the indirect strand extends from base 2 to base 208.
GCCTCTCCACAAAATCACACACGCATCTTCTCCTCGTCTCGGCGCGTGCGCTCGAGCGTG
CTCTGTGAAGTGGACGCGATGAGCGCCCTTGCCTCTGCCGACATCGCCGGGGCCTGGTCG
TCGCTGGTGCTTGCCACTGCCTCTGCCGCGCGAGGAGCCGCCTGTGCAATCGCCGTCGCT
GCCTCTGCCGCGCGAGGAGCCGCCTGA

>Translation of ORF number 1 in reading frame 2 on the indirect strand.
ASPQNHTRIFSSSRRVRSSVLCEVDAMSALASADIAGAWSSLVLATASAARGAACAIAVA
ASAARGAA*

>ORF number 2 in reading frame 2 on the indirect strand extends from base 329 to base 796.
GTCGTCTCGCTATCGCTCGCAGACACGATGCGCCGCATCTTGCCGCTCGAGCCAGCGGTA
GTGGGAGGGCTCGTCAGCCCTCCCGTTTGCGTCGGAGTCATGTCGGCTGGAGTCAGCTCG
CTGTCATCGAAAGGAAGCTCGAGGCCTCTCATGTACATGGCCTCCTGCTGCGGTGTGTTC
GGATCGGCGACTCCACCGTTGAGGCGGGCAGCGAGCTCTCTCAGGTCCCCAATCATGCCA
AGGGTCTCGTCTGGTGCAGCTGCCTCCTCCGCAATCGCCCTGGCCACATGTGACGGGCGG
CCTACTGCAGGCGACGGCGACATGGCCACACTTCCGGACGGCAGCGCCGGAGACATCGAG
GTGGACGAGCTGGAGAGGAGCTGTGGGTTGGACGACGAACTGAGCATCGCCCTCGTGCAC
CACGCTGCCGCACTCGAGAGCGGTGTCGGAGTCGCTGCCGTCCCGTGA

>Translation of ORF number 2 in reading frame 2 on the indirect strand.
VVSLSLADTMRRILPLEPAVVGGLVSPPVCVGVMSAGVSSLSSKGSSRPLMYMASCCGVF
GSATPPLRRAASSLRSPIMPRVSSGAAASSAIALATCDGRPTAGDGDMATLPDGSAGDIE
VDELERSCGLDDELSIALVHHAAALESGVGVAAVP*

>ORF number 1 in reading frame 3 on the indirect strand extends from base 531 to base 731.
GGCGGGCAGCGAGCTCTCTCAGGTCCCCAATCATGCCAAGGGTCTCGTCTGGTGCAGCTG
CCTCCTCCGCAATCGCCCTGGCCACATGTGACGGGCGGCCTACTGCAGGCGACGGCGACA
TGGCCACACTTCCGGACGGCAGCGCCGGAGACATCGAGGTGGACGAGCTGGAGAGGAGCT
GTGGGTTGGACGACGAACTGA

>Translation of ORF number 1 in reading frame 3 on the indirect strand.
GGQRALSGPQSCQGSRLVQLPPPQSPWPHVTGGLLQATATWPHFRTAAPETSRWTSWRGA
VGWTTN*

Multiple Alignment

PROTOCOL



RESULTS ANALYSIS


It was impossible to build a multiple alignment because the best BLAST E-value is too high (0.41) and the score too low (40); such hits are too unreliable to draw any conclusions from.
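
The decision above amounts to filtering the BLAST hits by E-value before alignment. The following is a minimal sketch of that step using the three BLASTp vs NR hits reported on this page. The cutoff of 1e-5 is an assumption chosen for illustration (the page does not state the threshold the annotators used), and select_homologs is a hypothetical helper.

```python
# (accession, bit score, E-value) for the BLASTp vs NR hits reported on this page.
NR_HITS = [
    ("XP_764562.1", 40.0, 0.41),
    ("XP_002122704.1", 36.2, 5.9),
    ("XP_002180624.1", 36.2, 6.0),
]


def select_homologs(hits, max_evalue=1e-5):
    """Keep only hits whose E-value is at or below the cutoff.

    The default cutoff is an illustrative assumption, not a value
    taken from the annotation protocol.
    """
    return [hit for hit in hits if hit[2] <= max_evalue]


if __name__ == "__main__":
    candidates = select_homologs(NR_HITS)
    if not candidates:
        print("No hit passes the E-value cutoff: nothing to align.")
```

With these values no hit survives the filter, so there are no candidate homologs to feed into a multiple alignment.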

RAW RESULTS

Protein Domains

PROTOCOL


InterPro, default parameters at EBI


RESULTS ANALYSIS



No domains were found, which means that none of the domain signatures in the InterPro database matches this protein. This is consistent with the BLAST results: a high E-value (0.41) and a very low score (40).


Thus, it is impossible to draw any conclusions about the function of this protein. Repeating the analysis in the future might provide more definitive results.

RAW RESULTS

Phylogeny

PROTOCOL



RESULTS ANALYSIS



The best BLAST alignment for our protein has too high an E-value (0.41), so the taxonomy report has no statistical or biological significance. Consequently, it is not possible to build a multiple alignment, nor to define the ingroup and the outgroup needed to build a phylogenetic tree.


Taxonomy report

PROTOCOL

1) BLASTp vs NR default NCBI parameters "1000 max target sequences"

2) BLASTp vs SWISSPROT, default NCBI parameters

3) BLASTx vs NR default NCBI parameters "1000 max target sequences"



RESULTS ANALYSIS


It is not possible to define an ingroup and an outgroup for a phylogenetic analysis, because the best BLAST alignment for our protein has too high an E-value (0.41), which means it has no statistical and therefore no biological significance. The dissimilarity between the hits obtained in the three BLAST searches supports this.
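
The dissimilarity between the three hit lists can be made explicit by comparing the species reported by each search. The sketch below uses the species names from the taxonomy reports on this page, with strain designations dropped as a simplification. The helper shared_species is hypothetical.

```python
# Species reported by each search on this page (strain names dropped).
BLASTP_NR = {
    "Theileria parva", "Ciona intestinalis", "Phaeodactylum tricornutum",
}
BLASTP_SWISSPROT = {
    "Agrobacterium vitis", "Geobacillus thermodenitrificans",
    "Arabidopsis thaliana", "Solanum tuberosum", "Plasmodium falciparum",
    "Oryza sativa", "Homo sapiens", "Aspergillus terreus",
}
BLASTX_NR = {
    "Micromonas pusilla", "Drosophila yakuba", "Tetraodon nigroviridis",
    "Gallus gallus", "Capsaspora owczarzaki", "Homo sapiens",
    "Streptomyces albus", "Frankia symbiont of Datisca glomerata",
    "Burkholderia cepacia", "Magnetospirillum magnetotacticum",
    "Pseudomonas putida",
}


def shared_species(first, second):
    """Return the species present in both hit lists."""
    return first & second


if __name__ == "__main__":
    searches = {
        "BLASTp vs NR": BLASTP_NR,
        "BLASTp vs SWISSPROT": BLASTP_SWISSPROT,
        "BLASTx vs NR": BLASTX_NR,
    }
    names = list(searches)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            print(a, "/", b, "->", sorted(shared_species(searches[a], searches[b])))
```

The only species common to any two searches is Homo sapiens (SWISSPROT and BLASTx), and the hits span bacteria, plants, animals, fungi and protists, which is the pattern expected from chance matches rather than from true homologs.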



RAW RESULTS

1) BLASTp vs NR default NCBI parameters "1000 max target sequences"

Eukaryota       [eukaryotes]
. Theileria       [apicomplexans]
. . Theileria parva [apicomplexans]
. . . Theileria parva strain Muguga -----   40 1 hit  [apicomplexans]  hypothetical protein [Theileria parva strain Muguga] >gi|68
. . . Theileria parva ...................   40 1 hit  [apicomplexans]  hypothetical protein [Theileria parva strain Muguga] >gi|68
. Ciona intestinalis --------------------   36 1 hit  [tunicates]      PREDICTED: similar to glioma tumor suppressor candidate reg
. Phaeodactylum tricornutum CCAP 1055/1 .   36 2 hits [diatoms]        predicted protein [Phaeodactylum tricornutum CCAP 1055/1] >

-------------------------------------


2) BLASTp vs SWISSPROT, default NCBI parameters

cellular organisms
. Bacteria           [bacteria]
. . Agrobacterium vitis ------------------------   39 5 hits [a-proteobacteria]  RecName: Full=Probable tartrate dehydrogenase/decarboxylase
. . Geobacillus thermodenitrificans NG80-2 .....   34 1 hit  [firmicutes]        RecName: Full=Septation ring formation regulator EzrA
. Arabidopsis thaliana (thale-cress) -----------   36 1 hit  [eudicots]          RecName: Full=Aconitate hydratase 2, mitochondrial; Short=A
. Solanum tuberosum (potatoes) .................   33 1 hit  [eudicots]          RecName: Full=Aconitate hydratase, cytoplasmic; Short=Aconi
. Plasmodium falciparum RO-33 ..................   32 1 hit  [apicomplexans]     RecName: Full=Merozoite surface protein 1; AltName: Full=Me
. Plasmodium falciparum Mad20/Papua New Guinea .   32 1 hit  [apicomplexans]     RecName: Full=Merozoite surface protein 1; AltName: Full=Me
. Plasmodium falciparum FC27/Papua New Guinea ..   32 1 hit  [apicomplexans]     RecName: Full=Merozoite surface protein 1; AltName: Full=Me
. Plasmodium falciparum Palo Alto/Uganda .......   32 1 hit  [apicomplexans]     RecName: Full=Merozoite surface protein 1; AltName: Full=Gp
. Plasmodium falciparum CAMP/Malaysia ..........   32 1 hit  [apicomplexans]     RecName: Full=Merozoite surface protein 1; AltName: Full=Me
. Oryza sativa Japonica Group (Japanese rice) ..   31 1 hit  [monocots]          RecName: Full=Putative leucine aminopeptidase 1; AltName: F
. Homo sapiens (man) ...........................   31 1 hit  [primates]          RecName: Full=Myosin-If; AltName: Full=Myosin-Ie
. Aspergillus terreus NIH2624 ..................   31 1 hit  [ascomycetes]       RecName: Full=Adenylosuccinate synthetase; Short=AMPSase; S




3) BLASTx vs NR default NCBI parameters "1000 max target sequences"

cellular organisms
. Eukaryota          [eukaryotes]
. . Micromonas pusilla CCMP1545 ---------   43 6 hits [green algae]       predicted protein [Micromonas pusilla CCMP1545] >gi|2264605
. . Drosophila yakuba ...................   40 1 hit  [flies]             similar to Drosophila melanogaster mtacp1 [Drosophila yakub
. . Tetraodon nigroviridis ..............   40 2 hits [bony fishes]       unnamed protein product [Tetraodon nigroviridis]
. . Gallus gallus (bantam) ..............   37 1 hit  [birds]             PREDICTED: laminin, gamma 1 (formerly LAMB2) [Gallus gallus]
. . Capsaspora owczarzaki ATCC 30864 ....   36 1 hit  [eukaryotes]        FAM72A protein [Capsaspora owczarzaki ATCC 30864]
. . Homo sapiens (man) ..................   36 2 hits [primates]          PREDICTED: hypothetical protein LOC100506187, partial [Homo
. Streptomyces albus J1074 --------------   40 2 hits [high GC Gram+]     oxygenase [Streptomyces albus J1074] >gi|291352844|gb|EFE79
. Frankia symbiont of Datisca glomerata .   36 2 hits [high GC Gram+]     hypothetical protein FsymDgDRAFT_0602 [Frankia symbiont of 
. Burkholderia cepacia ..................   36 1 hit  [b-proteobacteria]  unknown [Burkholderia cepacia]
. Magnetospirillum magnetotacticum MS-1 .   36 1 hit  [a-proteobacteria]  COG0514: Superfamily II DNA helicase [Magnetospirillum magn
. Pseudomonas putida W619 ...............   35 2 hits [g-proteobacteria]  hypothetical protein PputW619_2853 [Pseudomonas putida W619

BLAST

PROTOCOL

1) BLASTp vs NR default NCBI parameters "1000 max target sequences"

2) BLASTp vs SWISSPROT, default NCBI parameters

3) BLASTx vs NR default NCBI parameters "1000 max target sequences"



RESULTS ANALYSIS



In the BLASTp vs NR and BLASTp vs SWISSPROT results, the best hits have high E-values (0.41; 0.029) and low scores (40.0; 39.3), so they have no statistical or biological meaning. We therefore ran BLASTx on our genomic DNA to check whether sequencing errors had affected the first results; the best E-value was again high (0.24) and the score low (40.8), in agreement with the first results.


We can conclude that our protein has no known homologs.
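
One point worth making explicit is why nearly identical bit scores (40.0 against NR, 39.3 against SWISSPROT) give different E-values (0.41 and 0.029). In BLAST statistics the expected number of chance hits with bit score S' in a search space of size m·n is E = m·n·2^(-S'). The sketch below inverts this relation to estimate the search space implied by each reported pair. It uses the rounded values printed in the reports, so the numbers are approximate, and the function names are hypothetical.

```python
def expected_hits(bit_score, search_space):
    """E-value for a given bit score: E = search_space * 2 ** (-bit_score)."""
    return search_space * 2.0 ** (-bit_score)


def implied_search_space(bit_score, e_value):
    """Search space (m * n) implied by a reported bit score and E-value."""
    return e_value * 2.0 ** bit_score


if __name__ == "__main__":
    nr_space = implied_search_space(40.0, 0.41)      # BLASTp vs NR best hit
    sp_space = implied_search_space(39.3, 0.029)     # BLASTp vs SWISSPROT best hit
    print(f"NR:        {nr_space:.2e}")
    print(f"SWISSPROT: {sp_space:.2e}")
    # The SWISSPROT-level score, evaluated in the larger NR search space:
    print(f"E-value of a 39.3-bit hit in NR: {expected_hits(39.3, nr_space):.2f}")
```

The implied SWISSPROT search space is more than an order of magnitude smaller than the NR one, so the lower E-value of 0.029 reflects the smaller database rather than a better alignment.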




RAW RESULTS

1) BLASTp vs NR

                                                                  Score     E
Sequences producing significant alignments:                       (Bits)  Value

ref|XP_764562.1|  hypothetical protein [Theileria parva strain...  40.0    0.41 
ref|XP_002122704.1|  PREDICTED: similar to glioma tumor suppre...  36.2    5.9  
ref|XP_002180624.1|  predicted protein [Phaeodactylum tricornu...  36.2    6.0  

ALIGNMENTS
>ref|XP_764562.1| hypothetical protein [Theileria parva strain Muguga]
 gb|EAN32279.1| hypothetical telomeric SfiI fragment 20 protein 3 [Theileria 
parva]
Length=3300

 Score = 40.0 bits (92),  Expect = 0.41, Method: Composition-based stats.
 Identities = 56/245 (23%), Positives = 80/245 (33%), Gaps = 18/245 (7%)

Query  26   PAVTPMREVPVSCSSEPEDHGTAATPTPLSSAAAWCTRAMLSSSSNPQLLSSSSTSMSPA  85
            P +T   E      + PE  G   TP   + AA   T A   S   P    +  T+   A
Sbjct  608  PPLTAPPEAKPIIPTTPEVSGEVVTP---AKAATVTTPAKAPSPKVP----TPPTADESA  660

Query  86   LPSGSVAMSPSPAVGRPSHVARAIAEEAAAPDETLGMIGDLRELAARLNGGVADPNTPQQ  145
             PS +   S +P V  P+    A       PDE+   +                P  P  
Sbjct  661  TPSTTPDESATPVVTTPAKAPDAKVTTPPTPDESATPV-------VTTPAKAPSPKVPTP  713

Query  146  EAMYMRGLELPFDDSELTPADMTPTQT--GGLTSPPTTAGSSGKMRRI-VSASDSE-TTY  201
                         D   TP   TP +     +T+PPT   S+  +      A D++ TT 
Sbjct  714  PTADESATPSTTPDESATPVVTTPAKAPDAKVTTPPTPDESATPVVTTPAKAPDAKVTTP  773

Query  202  ETEDEQGRPLSVRRATTHTATVTVSTPSEVVVNVGSPPRQAAPRAAEAATAIAQAAPRAA  261
             T DE   P +    +   A  T S       +  + P    P+AA     +    P  A
Sbjct  774  PTPDESATPSTTADESATPAKATPSATPSTTADESATPVVTTPKAAPKTEVVKSTDPVPA  833

Query  262  EAVAS  266
            +   S
Sbjct  834  KVSVS  838


>ref|XP_002122704.1| PREDICTED: similar to glioma tumor suppressor candidate region 
gene 1 [Ciona intestinalis]
Length=996

 Score = 36.2 bits (82),  Expect = 5.9, Method: Compositional matrix adjust.
 Identities = 27/118 (23%), Positives = 56/118 (48%), Gaps = 3/118 (2%)

Query  182  AGSSGKMRR-IVSASDSETTYETEDEQGRPLSVRRATTHTATVTVSTPSEVVVNVGSPPR  240
            +GS G +   IV ++  +TT ++ ++   P+ + R    +  + +STP  + +NV + P+
Sbjct  443  SGSEGHVNNTIVQSTRLQTTTQSGNKSNSPVPINRIQIASVNIPISTPQNIQINVTALPK  502

Query  241  QAAPRAAEAATAIAQAAPRAAEAVASTSDDQAPAMSAEARALIAST--SQSTLERTRR  296
             +         +IAQA P A     + S      +++  +  + +   SQ T E+ +R
Sbjct  503  SSFSTDTTKTVSIAQAIPIAVTTATTVSSALVKQLASNLKLKLTTQQASQLTKEQLQR  560


>ref|XP_002180624.1| predicted protein [Phaeodactylum tricornutum CCAP 1055/1]
 gb|EEC48032.1| predicted protein [Phaeodactylum tricornutum CCAP 1055/1]
Length=707

 Score = 36.2 bits (82),  Expect = 6.0, Method: Compositional matrix adjust.
 Identities = 20/58 (35%), Positives = 29/58 (50%), Gaps = 2/58 (3%)

Query  84   PALPSGSVAMSPSPAVG--RPSHVARAIAEEAAAPDETLGMIGDLRELAARLNGGVAD  139
            P  P G + MSPSP  G   P  V  ++ E   A ++   +  DL+EL  R+ G + D
Sbjct  649  PVAPRGCMIMSPSPLFGTPHPGPVYGSVKENPFADEDDEQIAADLQELGGRMVGSILD  706


---------------------------------------------------------------------------------


2) BLASTp vs SWISSPROT
               
                                                                  Score     E
Sequences producing significant alignments:                       (Bits)  Value

sp|P70792.1|TTUC4_AGRVI  RecName: Full=Probable tartrate dehyd...  39.3    0.029
sp|P70787.1|TTUC2_AGRVI  RecName: Full=Probable tartrate dehyd...  39.3    0.033
sp|O34296.1|TTUC3_AGRVI  RecName: Full=Probable tartrate dehyd...  39.3    0.036
sp|O34295.1|TTUC5_AGRVI  RecName: Full=Probable tartrate dehyd...  38.9    0.037
sp|Q44471.1|TTUC1_AGRVI  RecName: Full=Probable tartrate dehyd...  38.9    0.038
sp|Q9SIB9.2|ACO2M_ARATH  RecName: Full=Aconitate hydratase 2, ...  36.6    0.21 
sp|A4IRU1.1|EZRA_GEOTN  RecName: Full=Septation ring formation...  34.7    0.73 
sp|O04916.1|ACOC_SOLTU  RecName: Full=Aconitate hydratase, cyt...  33.1    2.6  
sp|P19598.2|MSP1_PLAF3  RecName: Full=Merozoite surface protei...  32.7    2.7  
sp|P08569.3|MSP1_PLAFM  RecName: Full=Merozoite surface protei...  32.7    2.7  
sp|P13819.1|MSP1_PLAFF  RecName: Full=Merozoite surface protei...  32.7    2.7  
sp|P50495.1|MSP1_PLAFP  RecName: Full=Merozoite surface protei...  32.7    2.7  
sp|P04934.2|MSP1_PLAFC  RecName: Full=Merozoite surface protei...  32.7    2.7  
sp|Q2QSB9.2|AMPL1_ORYSJ  RecName: Full=Putative leucine aminop...  32.0    4.9  
sp|O00160.3|MYO1F_HUMAN  RecName: Full=Myosin-If; AltName: Ful...  31.6    7.1  
sp|Q0CM45.1|PURA_ASPTN  RecName: Full=Adenylosuccinate synthet...  31.6    7.4  

ALIGNMENTS
>sp|P70792.1|TTUC4_AGRVI RecName: Full=Probable tartrate dehydrogenase/decarboxylase ttuC'; 
Short=TDH; AltName: Full=D-malate dehydrogenase [decarboxylating]
Length=358

 Score = 39.3 bits (90),  Expect = 0.029, Method: Compositional matrix adjust.
 Identities = 36/150 (24%), Positives = 53/150 (35%), Gaps = 17/150 (11%)

Query  126  LRELAARLNGGVADPNTPQQEAMYMRGLELPFDDSELTPADMTPTQTGGLTSPPTTAGSS  185
            L++  A   G V  P+ P    ++  GL LP        A++ PT+     +PP      
Sbjct  65   LKKFDAIFFGAVGAPDVPDHITLW--GLRLPICQGFDQYANVRPTKVLPGITPPLRNCGP  122

Query  186  GKMRRIVSASDSETTYETEDEQGRPLSVRRATTHTATVTVSTPSEVVVNVGSPPRQAAPR  245
            G +  ++   +SE  Y               + H        P EV   V    R    R
Sbjct  123  GDLDWVIVRENSEGEY---------------SGHGGRAHKGLPEEVGTEVAIFTRVGVTR  167

Query  246  AAEAATAIAQAAPRAAEAVASTSDDQAPAM  275
                A  +AQA PR    V + S+ Q   M
Sbjct  168  IMRYAFKLAQARPRKLLTVVTKSNAQRHGM  197


>sp|P70787.1|TTUC2_AGRVI RecName: Full=Probable tartrate dehydrogenase/decarboxylase ttuC; 
Short=TDH; AltName: Full=D-malate dehydrogenase [decarboxylating]
Length=364

 Score = 39.3 bits (90),  Expect = 0.033, Method: Compositional matrix adjust.
 Identities = 36/150 (24%), Positives = 53/150 (35%), Gaps = 17/150 (11%)

Query  126  LRELAARLNGGVADPNTPQQEAMYMRGLELPFDDSELTPADMTPTQTGGLTSPPTTAGSS  185
            L++  A   G V  P+ P    ++  GL LP        A++ PT+     +PP      
Sbjct  65   LKKFDAIFFGAVGAPDVPDHITLW--GLRLPICQGFDQYANVRPTKVLPGITPPLRNCGP  122

Query  186  GKMRRIVSASDSETTYETEDEQGRPLSVRRATTHTATVTVSTPSEVVVNVGSPPRQAAPR  245
            G +  ++   +SE  Y               + H        P EV   V    R    R
Sbjct  123  GDLDWVIVRENSEGEY---------------SGHGGRAHKGLPEEVGTEVAIFTRVGVTR  167

Query  246  AAEAATAIAQAAPRAAEAVASTSDDQAPAM  275
                A  +AQA PR    V + S+ Q   M
Sbjct  168  IMRYAFKLAQARPRKLLTVVTKSNAQRHGM  197


>sp|O34296.1|TTUC3_AGRVI RecName: Full=Probable tartrate dehydrogenase/decarboxylase ttuC'; 
Short=TDH; AltName: Full=D-malate dehydrogenase [decarboxylating]
Length=358

 Score = 39.3 bits (90),  Expect = 0.036, Method: Compositional matrix adjust.
 Identities = 36/150 (24%), Positives = 53/150 (35%), Gaps = 17/150 (11%)

Query  126  LRELAARLNGGVADPNTPQQEAMYMRGLELPFDDSELTPADMTPTQTGGLTSPPTTAGSS  185
            L++  A   G V  P+ P    ++  GL LP        A++ PT+     +PP      
Sbjct  65   LKKFDAIFFGAVGAPDVPDHITLW--GLRLPICQGFDQYANVRPTKILPGITPPLRNCGP  122

Query  186  GKMRRIVSASDSETTYETEDEQGRPLSVRRATTHTATVTVSTPSEVVVNVGSPPRQAAPR  245
            G +  ++   +SE  Y               + H        P EV   V    R    R
Sbjct  123  GDLDWVIVRENSEGEY---------------SGHGGRAHRGLPEEVGTEVAIFTRVGVTR  167

Query  246  AAEAATAIAQAAPRAAEAVASTSDDQAPAM  275
                A  +AQA PR    V + S+ Q   M
Sbjct  168  IMRYAFKLAQARPRKLLTVVTKSNAQRHGM  197


>sp|O34295.1|TTUC5_AGRVI RecName: Full=Probable tartrate dehydrogenase/decarboxylase ttuC'; 
Short=TDH; AltName: Full=D-malate dehydrogenase [decarboxylating]
Length=358

 Score = 38.9 bits (89),  Expect = 0.037, Method: Compositional matrix adjust.
 Identities = 36/150 (24%), Positives = 53/150 (35%), Gaps = 17/150 (11%)

Query  126  LRELAARLNGGVADPNTPQQEAMYMRGLELPFDDSELTPADMTPTQTGGLTSPPTTAGSS  185
            L++  A   G V  P+ P    ++  GL LP        A++ PT+     +PP      
Sbjct  65   LKKFDAIFFGAVGAPDVPDHITLW--GLRLPICQGFDQYANVRPTKILPGITPPLRNCGP  122

Query  186  GKMRRIVSASDSETTYETEDEQGRPLSVRRATTHTATVTVSTPSEVVVNVGSPPRQAAPR  245
            G +  ++   +SE  Y               + H        P EV   V    R    R
Sbjct  123  GDLDWVIVRENSEGEY---------------SGHGGRAHRGLPEEVGTEVAIFTRVGVTR  167

Query  246  AAEAATAIAQAAPRAAEAVASTSDDQAPAM  275
                A  +AQA PR    V + S+ Q   M
Sbjct  168  IMRYAFKLAQARPRKLLTVVTKSNAQRHGM  197


>sp|Q44471.1|TTUC1_AGRVI RecName: Full=Probable tartrate dehydrogenase/decarboxylase ttuC; 
Short=TDH; AltName: Full=D-malate dehydrogenase [decarboxylating]
Length=364

 Score = 38.9 bits (89),  Expect = 0.038, Method: Compositional matrix adjust.
 Identities = 36/150 (24%), Positives = 53/150 (35%), Gaps = 17/150 (11%)

Query  126  LRELAARLNGGVADPNTPQQEAMYMRGLELPFDDSELTPADMTPTQTGGLTSPPTTAGSS  185
            L++  A   G V  P+ P    ++  GL LP        A++ PT+     +PP      
Sbjct  65   LKKFDAIFFGAVGAPDVPDHITLW--GLRLPICQGFDQYANVRPTKILPGITPPLRNCGP  122

Query  186  GKMRRIVSASDSETTYETEDEQGRPLSVRRATTHTATVTVSTPSEVVVNVGSPPRQAAPR  245
            G +  ++   +SE  Y               + H        P EV   V    R    R
Sbjct  123  GDLDWVIVRENSEGEY---------------SGHGGRAHRGLPEEVGTEVAIFTRVGVTR  167

Query  246  AAEAATAIAQAAPRAAEAVASTSDDQAPAM  275
                A  +AQA PR    V + S+ Q   M
Sbjct  168  IMRYAFKLAQARPRKLLTVVTKSNAQRHGM  197


>sp|Q9SIB9.2|ACO2M_ARATH RecName: Full=Aconitate hydratase 2, mitochondrial; Short=Aconitase 
2; AltName: Full=Citrate hydro-lyase 2; Flags: Precursor
Length=990

 Score = 36.6 bits (83),  Expect = 0.21, Method: Compositional matrix adjust.
 Identities = 22/58 (38%), Positives = 26/58 (45%), Gaps = 4/58 (7%)

Query  104  HVARAIAEEAAAPDETLGMIGDLRELAARLNGGVADPNTPQQEAMYMRGLELPFDDSE  161
            HV     +     DET+ MI    E   R N    D N PQQ+ +Y   LEL  DD E
Sbjct  410  HVTLQYLKLTGRSDETVAMI----EAYLRANNMFVDYNEPQQDRVYSSYLELNLDDVE  463


>sp|A4IRU1.1|EZRA_GEOTN RecName: Full=Septation ring formation regulator EzrA
Length=567

 Score = 34.7 bits (78),  Expect = 0.73, Method: Compositional matrix adjust.
 Identities = 22/60 (37%), Positives = 29/60 (48%), Gaps = 5/60 (8%)

Query  72   PQLLSSSSTSMSPA----LPSGSVAMSPSPAVGRPSHVARAIAEEAAAPDETLGMIGDLR  127
            P LLS   TS+ PA    L  G   M  S  +    H+ R +AE+     + L MIG+LR
Sbjct  216  PALLSECQTSL-PAQLAELADGYREMEQSGYILDHLHIERTLAEKQEKIGQCLAMIGELR  274


>sp|O04916.1|ACOC_SOLTU RecName: Full=Aconitate hydratase, cytoplasmic; Short=Aconitase; 
AltName: Full=Citrate hydro-lyase
Length=616

 Score = 33.1 bits (74),  Expect = 2.6, Method: Compositional matrix adjust.
 Identities = 20/58 (34%), Positives = 24/58 (41%), Gaps = 4/58 (7%)

Query  104  HVARAIAEEAAAPDETLGMIGDLRELAARLNGGVADPNTPQQEAMYMRGLELPFDDSE  161
            HV     +     DE +GM+    E   R N    D N PQQE +Y   L L   D E
Sbjct  36   HVTLEYLKLTGRSDEIVGMV----EAYLRANNMFVDYNEPQQEKVYSSYLNLDLADVE  89


>sp|P19598.2|MSP1_PLAF3 RecName: Full=Merozoite surface protein 1; AltName: Full=Merozoite 
surface antigens; AltName: Full=PMMSA; AltName: Full=p190; 
Flags: Precursor
Length=1682

 Score = 32.7 bits (73),  Expect = 2.7, Method: Composition-based stats.
 Identities = 33/117 (28%), Positives = 53/117 (45%), Gaps = 24/117 (21%)

Query  41   EPED-HGTAATPTPLSSAAAWCTRAMLSSSSNPQLLSSSSTSMSPALPSGSVAMSPSPAV  99
            EP+   GT++T +P ++       A  S+S N Q  +SS+ + +       VA+S  PAV
Sbjct  866  EPKQITGTSSTSSPGNTTVNTAQSATHSNSQNQQSNASSTNTQN------GVAVSSGPAV  919

Query  100  GRPSHVARAIAEEAAAPDETLGMIGDLRELAARLNGG----VADP---NTPQQEAMY  149
                       EE+  P   L +  DL+ + + LN G    V +P   +T + E  Y
Sbjct  920  ----------VEESHDPLTVLSISNDLKGIVSLLNLGNKTKVPNPLTISTTEMEKFY  966


>sp|P08569.3|MSP1_PLAFM RecName: Full=Merozoite surface protein 1; AltName: Full=Merozoite 
surface antigens; AltName: Full=PMMSA; AltName: Full=p190; 
Flags: Precursor
Length=1701

 Score = 32.7 bits (73),  Expect = 2.7, Method: Composition-based stats.
 Identities = 31/111 (28%), Positives = 50/111 (45%), Gaps = 23/111 (21%)

Query  46   GTAATPTPLSSAAAWCTRAMLSSSSNPQLLSSSSTSMSPALPSGSVAMSPSPAVGRPSHV  105
            GT++T +P ++       A  S+S N Q  +SS+ + +       VA+S  PAV      
Sbjct  890  GTSSTSSPGNTTVNTAQSATHSNSQNQQSNASSTNTQN------GVAVSSGPAV------  937

Query  106  ARAIAEEAAAPDETLGMIGDLRELAARLNGG----VADP---NTPQQEAMY  149
                 EE+  P   L +  DL+ + + LN G    V +P   +T + E  Y
Sbjct  938  ----VEESHDPLTVLSISNDLKGIVSLLNLGNKTKVPNPLTISTTEMEKFY  984


>sp|P13819.1|MSP1_PLAFF RecName: Full=Merozoite surface protein 1; AltName: Full=Merozoite 
surface antigens; AltName: Full=PMMSA; Flags: Precursor
Length=1701

 Score = 32.7 bits (73),  Expect = 2.7, Method: Composition-based stats.
 Identities = 31/111 (28%), Positives = 50/111 (45%), Gaps = 23/111 (21%)

Query  46   GTAATPTPLSSAAAWCTRAMLSSSSNPQLLSSSSTSMSPALPSGSVAMSPSPAVGRPSHV  105
            GT++T +P ++       A  S+S N Q  +SS+ + +       VA+S  PAV      
Sbjct  890  GTSSTSSPGNTTVNTAQSATHSNSQNQQSNASSTNTQN------GVAVSSGPAV------  937

Query  106  ARAIAEEAAAPDETLGMIGDLRELAARLNGG----VADP---NTPQQEAMY  149
                 EE+  P   L +  DL+ + + LN G    V +P   +T + E  Y
Sbjct  938  ----VEESHDPLTVLSISNDLKGIVSLLNLGNKTKVPNPLTISTTEMEKFY  984


>sp|P50495.1|MSP1_PLAFP RecName: Full=Merozoite surface protein 1; AltName: Full=Gp195; 
AltName: Full=Merozoite surface antigens; AltName: Full=PMMSA; 
Flags: Precursor
Length=1726

 Score = 32.7 bits (73),  Expect = 2.7, Method: Composition-based stats.
 Identities = 31/111 (28%), Positives = 50/111 (45%), Gaps = 23/111 (21%)

Query  46    GTAATPTPLSSAAAWCTRAMLSSSSNPQLLSSSSTSMSPALPSGSVAMSPSPAVGRPSHV  105
             GT++T +P ++       A  S+S N Q  +SS+ + +       VA+S  PAV      
Sbjct  915   GTSSTSSPGNTTVNTAQSATHSNSQNQQSNASSTNTQN------GVAVSSGPAV------  962

Query  106   ARAIAEEAAAPDETLGMIGDLRELAARLNGG----VADP---NTPQQEAMY  149
                  EE+  P   L +  DL+ + + LN G    V +P   +T + E  Y
Sbjct  963   ----VEESHDPLTVLSISNDLKGIVSLLNLGNKTKVPNPLTISTTEMEKFY  1009


>sp|P04934.2|MSP1_PLAFC RecName: Full=Merozoite surface protein 1; AltName: Full=Merozoite 
surface antigens; AltName: Full=PMMSA; AltName: Full=p195; 
Flags: Precursor
Length=1726

 Score = 32.7 bits (73),  Expect = 2.7, Method: Composition-based stats.
 Identities = 31/111 (28%), Positives = 50/111 (45%), Gaps = 23/111 (21%)

Query  46    GTAATPTPLSSAAAWCTRAMLSSSSNPQLLSSSSTSMSPALPSGSVAMSPSPAVGRPSHV  105
             GT++T +P ++       A  S+S N Q  +SS+ + +       VA+S  PAV      
Sbjct  915   GTSSTSSPGNTTVNTAQSATHSNSQNQQSNASSTNTQN------GVAVSSGPAV------  962

Query  106   ARAIAEEAAAPDETLGMIGDLRELAARLNGG----VADP---NTPQQEAMY  149
                  EE+  P   L +  DL+ + + LN G    V +P   +T + E  Y
Sbjct  963   ----VEESHDPLTVLSISNDLKGIVSLLNLGNKTKVPNPLTISTTEMEKFY  1009


>sp|Q2QSB9.2|AMPL1_ORYSJ RecName: Full=Putative leucine aminopeptidase 1; AltName: Full=Leucyl 
aminopeptidase 1; Short=LAP 1; AltName: Full=Proline 
aminopeptidase 1; AltName: Full=Prolyl aminopeptidase 1
Length=542

 Score = 32.0 bits (71),  Expect = 4.9, Method: Compositional matrix adjust.
 Identities = 23/68 (34%), Positives = 32/68 (47%), Gaps = 0/68 (0%)

Query  235  VGSPPRQAAPRAAEAATAIAQAAPRAAEAVASTSDDQAPAMSAEARALIASTSQSTLERT  294
            +G    Q   R  + A  ++ A   A E V S ++   PA+ AE  + IAS+    L  T
Sbjct  186  IGFGSGQEMGRKLQYANHVSSAVIFAKELVNSPANVLTPAVLAEEASNIASSYSDVLTAT  245

Query  295  RRDEEKMR  302
              DEEK R
Sbjct  246  ILDEEKCR  253


>sp|O00160.3|MYO1F_HUMAN RecName: Full=Myosin-If; AltName: Full=Myosin-Ie
Length=1098

 Score = 31.6 bits (70),  Expect = 7.1, Method: Composition-based stats.
 Identities = 30/114 (26%), Positives = 41/114 (36%), Gaps = 2/114 (2%)

Query  21    DREYGPAVTPMREVPVSCSSEPEDHGTAATPTPLSSAAAWCTRAMLSSSSNPQLLSSSST  80
             DR   P       +P+   S    H     P   S  A+   RA   S  N + L+    
Sbjct  954   DRNGVPPSARGGPLPLEIMSGGGTHRPPRGPPSTSLGASRRPRARPPSEHNTEFLNVPDQ  1013

Query  81    SMSPALPSGSVAMSPSPAVGRPSHVARAIAEEAAAPDETLGMIGDLRELAARLN  134
              M+      SV   P P VGRP    R       A  + +G   D+ EL+  +N
Sbjct  1014  GMAGMQRKRSVGQRPVPGVGRPKPQPRTHGPRCRALYQYVGQ--DVDELSFNVN  1065


>sp|Q0CM45.1|PURA_ASPTN RecName: Full=Adenylosuccinate synthetase; Short=AMPSase; Short=AdSS; 
AltName: Full=IMP--aspartate ligase
Length=424

 Score = 31.6 bits (70),  Expect = 7.4, Method: Compositional matrix adjust.
 Identities = 24/89 (27%), Positives = 37/89 (42%), Gaps = 2/89 (2%)

Query  34   VPVSCSSEPEDHGTAATPTPLSSAAAWCTRAMLSSSSNPQLLSSSSTSMSPALPSGSVAM  93
            V +   S+  D G       LS  A  C RA    ++   ++  S+T     LPSG ++ 
Sbjct  3    VTIVLGSQWGDEGKGKITDMLSQQATLCCRAAGGHNAGHTIVHGSNTYDFHILPSGLISP  62

Query  94   SPSPAVGRPS--HVARAIAEEAAAPDETL  120
            S    +G  +  HV     E AA  ++ L
Sbjct  63   SCVNLIGAGTVVHVPSFFKELAALEEKGL  91


---------------------------------------------------------------------------------------------------

3) BLASTx vs NR 

                                                                   Score     E
Sequences producing significant alignments:                       (Bits)  Value

ref|XP_003057870.1|  predicted protein [Micromonas pusilla CCM...  43.9    0.033
ref|ZP_06589285.1|  oxygenase [Streptomyces albus J1074] >gb|E...  40.8    0.28 
gb|AAR10035.1|  similar to Drosophila melanogaster mtacp1 [Dro...  40.0    0.48 
emb|CAF93350.1|  unnamed protein product [Tetraodon nigroviridis]  40.0    0.48 
ref|XP_003058352.1|  chloroplast envelope protein translocase ...  38.1    1.8  
ref|XP_001234659.1|  PREDICTED: laminin, gamma 1 (formerly LAM...  37.4    3.1  
gb|EFW46586.1|  FAM72A protein [Capsaspora owczarzaki ATCC 30864]  37.0    4.0  
ref|ZP_06473344.1|  hypothetical protein FsymDgDRAFT_0602 [Fra...  37.0    4.0  
ref|XP_003059090.1|  predicted protein [Micromonas pusilla CCM...  37.0    4.0  
gb|AAK81683.1|  unknown [Burkholderia cepacia]                     37.0    4.0  
ref|ZP_00047615.1|  COG0514: Superfamily II DNA helicase [Magn...  37.0    4.0  
ref|XP_003120506.1|  PREDICTED: hypothetical protein LOC100506...  36.2    6.9  
ref|XP_003118575.1|  PREDICTED: hypothetical protein LOC100506...  36.2    6.9  
ref|YP_001749714.1|  hypothetical protein PputW619_2853 [Pseud...  35.8    9.0  
emb|CAG06733.1|  unnamed protein product [Tetraodon nigroviridis]  35.8    9.0  
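
The summary table can be screened programmatically. The following is a minimal sketch, not part of the original annotation: it parses a few of the summary lines above and keeps the hits below an E-value cutoff. The cutoff of 1e-3 and the helper names `parse_hits` and `significant` are assumptions made for illustration only.

```python
# Three summary lines copied from the BLASTx hit table above.
HIT_TABLE = """\
ref|XP_003057870.1|  predicted protein [Micromonas pusilla CCM...  43.9    0.033
ref|ZP_06589285.1|  oxygenase [Streptomyces albus J1074] >gb|E...  40.8    0.28
gb|AAR10035.1|  similar to Drosophila melanogaster mtacp1 [Dro...  40.0    0.48
"""


def parse_hits(text):
    """Return (accession, bit_score, e_value) tuples from summary lines."""
    hits = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # The last two whitespace-separated fields are bit score and E-value.
        rest, score, evalue = line.rsplit(None, 2)
        accession = rest.split()[0]
        hits.append((accession, float(score), float(evalue)))
    return hits


def significant(hits, cutoff=1e-3):
    """Keep hits whose E-value is below the (assumed) cutoff."""
    return [hit for hit in hits if hit[2] < cutoff]


hits = parse_hits(HIT_TABLE)
best = min(hits, key=lambda hit: hit[2])
print(best[0], best[2])
print(significant(hits))
```

With this assumed cutoff no hit is retained, since the lowest E-value in the table is 0.033.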

ALIGNMENTS
>ref|XP_003057870.1| predicted protein [Micromonas pusilla CCMP1545]
 gb|EEH57821.1| predicted protein [Micromonas pusilla CCMP1545]
Length=307

 Score = 43.9 bits (102),  Expect = 0.033
 Identities = 31/95 (33%), Positives = 40/95 (42%), Gaps = 12/95 (13%)
 Frame = +3

Query  183  HEGDAQFVVQPTAPLQLVHLDVSGAAVRKCGHVAVACSRPPVTCGQGDCGGGSCTRRDPW  362
            H G  + V  P   ++  H D S A +  C  V V      ++  Q  CGGG   R  P 
Sbjct  212  HGGPGERVDCPCFGVRDSHYDNSCAELHACEDVKV------LSAAQTGCGGGGPPRPGPG  265

Query  363  HDWGPER------ARCPPQRWSRRSEHTAAGGHVH  449
               GP        ARC P    RR  H+++GGH H
Sbjct  266  PGPGPHHRTCRDGARCCPDSGCRRCSHSSSGGHRH  300
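
The Frame value reported with each BLASTx alignment can be cross-checked against the nucleotide coordinates on the Query lines. This is a minimal sketch. It assumes the usual convention that frames +1 to +3 are counted from the first base of the read and frames -1 to -3 from the last base, read on the reverse complement. The function name `blastx_frame` is hypothetical, and the read length of 927 nt is taken from the genomic sequence given earlier on this page.

```python
READ_LENGTH = 927  # length of JCVI_READ_1091143176922 in nucleotides


def blastx_frame(q_from, q_to, read_length=READ_LENGTH):
    """Reading frame implied by the query coordinates of a BLASTx alignment.

    q_from and q_to are the first and last nucleotide positions printed on
    the Query lines (1-based). q_from > q_to means the reverse strand.
    """
    if q_from <= q_to:
        # Forward strand: offset of the first codon from base 1.
        return (q_from - 1) % 3 + 1
    # Reverse strand: offset of the first codon from the last base.
    return -((read_length - q_from) % 3 + 1)


# Coordinates taken from alignments in this section.
print(blastx_frame(183, 449))  # Micromonas pusilla predicted protein
print(blastx_frame(388, 92))   # Streptomyces albus oxygenase
print(blastx_frame(251, 48))   # Frankia symbiont hypothetical protein
```

The three calls should agree with the Frame lines printed for those alignments (+3, -3 and -2 respectively).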


>ref|ZP_06589285.1| oxygenase [Streptomyces albus J1074]
 gb|EFE79746.1| oxygenase [Streptomyces albus J1074]
Length=388

 Score = 40.8 bits (94),  Expect = 0.28
 Identities = 28/99 (28%), Positives = 37/99 (37%), Gaps = 26/99 (26%)
 Frame = -3

Query  388  RALSGPQSCQGSRLVQLPPPQSPWPHVTGGLLQATATWPHFRTAAPETSRWTSWRGAVGW  209
            R  SG  S +G+R    PPP                     R + P +   T+WRG   W
Sbjct  275  RCASGGASSRGTRPTSTPPPPG-------------------RASTPASRTRTTWRGNWPW  315

Query  208  TTN*ASPSCTTLPHSRAVSESLPSRDPLARCCTTPAPRA  92
                    CT  P  R  + + PS  P AR C+ P  R+
Sbjct  316  W-------CTATPTPRCSTPTTPSASPSARPCSPPPARS  347


>gb|AAR10035.1| similar to Drosophila melanogaster mtacp1 [Drosophila yakuba]
Length=110

 Score = 40.0 bits (92),  Expect = 0.48
 Identities = 23/69 (33%), Positives = 33/69 (48%), Gaps = 0/69 (0%)
 Frame = -3

Query  316  PHVTGGLLQATATWPHFRTAAPETSRWTSWRGAVGWTTN*ASPSCTTLPHSRAVSESLPS  137
            P +T  L QA++T     +   +   WT+WR +  W T+  S S T +P S     +L S
Sbjct  42   PSMTSQLSQASSTLSRTSSTTWDWIPWTTWRSSWPWRTSLDSRSPTLMPRSCLNLPTLLS  101

Query  136  RDPLARCCT  110
              P  R CT
Sbjct  102  TSPTRRMCT  110


>emb|CAF93350.1| unnamed protein product [Tetraodon nigroviridis]
Length=1350

 Score = 40.0 bits (92),  Expect = 0.48
 Identities = 39/136 (29%), Positives = 50/136 (37%), Gaps = 18/136 (13%)
 Frame = -3

Query  496   RLESARCHRKEARGLSCTWPPAAVCSDRRLH--R*GGQRALSGPQSCQGSRLVQLPPPQS  323
             RL   + HR+  R L    PP  +C DRR    R  G   + GP   +   L  L     
Sbjct  934   RLGFGKVHRRRLRPLGTADPPERLCVDRRGRPARVSGAGGVPGPVVPERRCLPGLLAVPP  993

Query  322   PWPHVTGGLLQATATWPHFRTAAPETSRWTSWRGAVGWTTN*ASPSCTTLPHSRAVSESL  143
              W  V+G LL  T+T    R  AP    W        W++   S S  ++P         
Sbjct  994   RWSTVSGALLARTST----RAWAPGCDTWGKTSSRPSWSSLACSFSGRSVPR--------  1041

Query  142   PSRDPLARCCTTPAPR  95
                 P AR C    PR
Sbjct  1042  ----PCARRCCGAWPR  1053


>ref|XP_003058352.1| chloroplast envelope protein translocase family [Micromonas pusilla 
CCMP1545]
 gb|EEH56807.1| chloroplast envelope protein translocase family [Micromonas pusilla 
CCMP1545]
Length=573

 Score = 38.1 bits (87),  Expect = 1.8
 Identities = 22/72 (31%), Positives = 32/72 (44%), Gaps = 6/72 (8%)
 Frame = -3

Query  223  GAVGWTTN*ASPSCTTLPHSRAVSESLPSRDPLARCCTTPAPRALA*RQVHTHGPIRVTP  44
            G  GWT + AS  CT++P   A   ++ +     R C TP P A+A       G + V P
Sbjct  173  GYHGWTFDGASGKCTSIPQLPATGNAIDTALASPRSCVTPYPTAVA------QGMLFVLP  226

Query  43   ARCSRDQQSKRP  8
                +   + RP
Sbjct  227  VSALKSDPASRP  238


>ref|XP_001234659.1| PREDICTED: laminin, gamma 1 (formerly LAMB2) [Gallus gallus]
Length=1604

 Score = 37.4 bits (85),  Expect = 3.1
 Identities = 37/135 (27%), Positives = 48/135 (36%), Gaps = 41/135 (30%)
 Frame = -3

Query  493  LESARCHRKEARGLSCTWPPAAVCSDRRLHR*GGQRALSGPQSCQGSRLVQLPPPQSPW-  317
            L + R HR+     +C  PP+  C                             PP SP  
Sbjct  71   LRADRRHRRNPSRATCATPPSPTCG--------------------------TAPPSSPTT  104

Query  316  ---PHVTGGLLQATATWPHFRTAAPETSRWTSWRGAVGWTTN*ASPSC--TTLPHSRAVS  152
               P   GG  +A A WP   T  P TS  TS R          SP+C  ++ P  +  S
Sbjct  105  TDRPTAPGG--RARACWPACSTPPPSTSPCTSERPL-------TSPTCGXSSTPAGQRAS  155

Query  151  ESLPSRDPLARCCTT  107
             S  +R  +AR C T
Sbjct  156  PSTNARGRMARGCPT  170


>gb|EFW46586.1| FAM72A protein [Capsaspora owczarzaki ATCC 30864]
Length=360

 Score = 37.0 bits (84),  Expect = 4.0
 Identities = 26/93 (28%), Positives = 39/93 (42%), Gaps = 9/93 (10%)
 Frame = -3

Query  355  SRLVQLPPPQSPWPHVTGGLLQATATWPHFRTAAPETSRWTSWRGAVGWTTN*ASPSCTT  176
            +RL +LP P +P  +  GG+   T+  PH R        W  W G   W          T
Sbjct  10   TRLARLPQPSTPCSN--GGVADETSLQPHMRRHLEHQEHW--WSGVSAWQQQ--QQQQHT  63

Query  175  LPH---SRAVSESLPSRDPLARCCTTPAPRALA  86
            +     S +V+     ++P  R   TPAP ++A
Sbjct  64   IGRDVLSSSVTNQQARQEPSPRARATPAPASVA  96


>ref|ZP_06473344.1| hypothetical protein FsymDgDRAFT_0602 [Frankia symbiont of Datisca 
glomerata]
 gb|EFD29871.1| hypothetical protein FsymDgDRAFT_0602 [Frankia symbiont of Datisca 
glomerata]
Length=155

 Score = 37.0 bits (84),  Expect = 4.0
 Identities = 24/68 (35%), Positives = 33/68 (49%), Gaps = 0/68 (0%)
 Frame = -2

Query  251  GDIEVDELERSCGLDDELSIALVHHAAALESGVGVAAVP*SSGSLLHDTGTSRIGVTAGP  72
            G ++VD +  S G  D   +AL  H   L+  V V     S G +L   GT+++GV   P
Sbjct  83   GRLQVDSIRISTGSPDWAFVALRPHEYDLDPAVAVLRWNVSGGWVLDQVGTAQVGVGRVP  142

Query  71   YSRSDTGD  48
             S SD  D
Sbjct  143  VSVSDEFD  150


>ref|XP_003059090.1| predicted protein [Micromonas pusilla CCMP1545]
 gb|EEH56222.1| predicted protein [Micromonas pusilla CCMP1545]
Length=408

 Score = 37.0 bits (84),  Expect = 4.0
 Identities = 48/178 (27%), Positives = 69/178 (39%), Gaps = 20/178 (11%)
 Frame = +3

Query  96   RGAGVVQQRARGS-------RDGSDSDTALEC----GSVVHEGDAQFVVQPTAPLQLVHL  242
            RGA V    A G+       R  +  D A+E     G+  H  +A+    PTA  +  H 
Sbjct  87   RGADVAAVDADGNTPAHLLARHAAVRDRAVERLRKHGASAHARNARGET-PTALAERAH-  144

Query  243  DVSGAAVRKCGHVAVACSRPPVTCGQGDCGGGSCTRRDPWHDWGPERARCPP---QRWSR  413
            +   A+ R   H      R     G    GGG     D +  +G   AR      +RW  
Sbjct  145  EAHAASAR--AHAERRAERRRAESGGRGLGGGGGDD-DGYEGFGDASARAAANEARRWRE  201

Query  414  RSEHTA-AGGHVHERPRASFR*QRADSSRHDSDANGRADEPSHYRWLERQDAAHRVCE  584
            +    A AGG + E        +  +S R   ++ G  D+P  YRW E ++A  R  E
Sbjct  202  KLMDAAFAGGEIDEFGGGGGGVEAGESHRAFFESYGDEDDPDAYRWREEEEAYRRYVE  259


>gb|AAK81683.1| unknown [Burkholderia cepacia]
Length=176

 Score = 37.0 bits (84),  Expect = 4.0
 Identities = 37/149 (25%), Positives = 55/149 (37%), Gaps = 20/149 (13%)
 Frame = -3

Query  565  ASCRSSQR*WEGSSALPFASESCR-------LESARCHRKEARGLSCTWPPAAVCSDRRL  407
            A+ R ++  W  S  +P    +C        + +  C    ARG  C  P A  C+ R  
Sbjct  26   AAARPTRSGWRASWPVPCRRRACMCWSTPRPIRAIVCATTRARGCRCR-PSACTCAPR--  82

Query  406  HR*GGQRALSGPQSCQGSRLVQLPPPQSPWPHVTGGLLQATATWPHFRTAAPETSRWTSW  227
                 QRA  GP+    S  +  P P          + ++T   P    AA  TSRW   
Sbjct  83   -----QRASYGPRPSNCSAAISWPTPSCRCL-----ISRSTFPAPASTGAAWATSRWVPA  132

Query  226  RGAVGWTTN*ASPSCTTLPHSRAVSESLP  140
               +      ASP  T+     A + ++P
Sbjct  133  SPGITRRNGTASPRWTSCCRRAATTRTMP  161


>ref|ZP_00047615.1| COG0514: Superfamily II DNA helicase [Magnetospirillum magnetotacticum 
MS-1]
Length=387

 Score = 37.0 bits (84),  Expect = 4.0
 Identities = 31/97 (32%), Positives = 44/97 (45%), Gaps = 7/97 (7%)
 Frame = +3

Query  261  VRKCGHVAVACSRPPVTCGQGDCGGGSCTRRDPWHDWGPERARCPPQRWSRRSEHTA--A  434
            +R+ GH A+  S    T G+G     + T  +   +    RA     R  RR +  A   
Sbjct  282  IRRNGHAALKTSASGATAGRGWAARCAGTADEACRNTMASRATDRKARTGRRQDRAANQQ  341

Query  435  GGHVHERPRASFR*QRADSSRHDSDANGRADEPSHYR  545
             G  H RPR     +RA S  ++S+ANGRA  P+  R
Sbjct  342  KGSEH-RPRR----KRAKSKAYNSEANGRASLPTEPR  373


>ref|XP_003120506.1| PREDICTED: hypothetical protein LOC100506187, partial [Homo sapiens]
Length=120

 Score = 36.2 bits (82),  Expect = 6.9
 Identities = 33/90 (37%), Positives = 39/90 (43%), Gaps = 7/90 (8%)
 Frame = +3

Query  261  VRKCGHVAVACSRPPVTCGQGDCGGGSCTRRDPWHDWGPERAR---CPPQRWSRRSEHTA  431
            V +CG     CS  P T      GGGS  R DP   +G E AR       R   R+EH A
Sbjct  3    VLRCGPQRTWCSANPETQTPCPGGGGSGGRGDPHAVFGSEAARRAGSAGDRCGNRAEH-A  61

Query  432  AGGHVHER--PRASFR*QRADSSRHDSDAN  515
             GGH   R  P+  F  ++    R DS  N
Sbjct  62   LGGHFVTRIYPQNQFE-KKGGKKRPDSVLN  90


>ref|XP_003118575.1| PREDICTED: hypothetical protein LOC100506187, partial [Homo sapiens]
Length=120

 Score = 36.2 bits (82),  Expect = 6.9
 Identities = 33/90 (37%), Positives = 39/90 (43%), Gaps = 7/90 (8%)
 Frame = +3

Query  261  VRKCGHVAVACSRPPVTCGQGDCGGGSCTRRDPWHDWGPERAR---CPPQRWSRRSEHTA  431
            V +CG     CS  P T      GGGS  R DP   +G E AR       R   R+EH A
Sbjct  3    VLRCGPQRTWCSANPETQTPCPGGGGSGGRGDPHAVFGSEAARRAGSAGDRCGNRAEH-A  61

Query  432  AGGHVHER--PRASFR*QRADSSRHDSDAN  515
             GGH   R  P+  F  ++    R DS  N
Sbjct  62   LGGHFVTRIYPQNQFE-KKGGKKRPDSVLN  90


>ref|YP_001749714.1| hypothetical protein PputW619_2853 [Pseudomonas putida W619]
 gb|ACA73345.1| conserved hypothetical protein [Pseudomonas putida W619]
Length=173

 Score = 35.8 bits (81),  Expect = 9.0
 Identities = 26/88 (30%), Positives = 34/88 (39%), Gaps = 11/88 (13%)
 Frame = +3

Query  267  KCGHVAVACSRPPVTCGQGDCGGGSCTRRDPWHDWGPERAR----CPPQRWSRRSEHTAA  434
            KCG  A   ++    CG+G CG  S  R D  HD    RA      P  +    +     
Sbjct  77   KCGGSAAKATQAEGKCGEGKCGDASFARTDTDHDGRVSRAELLAVAPDAKAEFATIDANQ  136

Query  435  GGHVHERPRASFR*QRADSSRHDSDANG  518
             G++ E     FR       +H  DANG
Sbjct  137  DGYLSEGEVYQFR-------KHQFDANG  157


>emb|CAG06733.1| unnamed protein product [Tetraodon nigroviridis]
Length=2303

 Score = 35.8 bits (81),  Expect = 9.0
 Identities = 32/113 (28%), Positives = 41/113 (36%), Gaps = 12/113 (11%)
 Frame = +3

Query  135  RDGSDSDTALECGSVVHEGDAQFVVQPTAP--LQLVHLDVSGAAVRKCGHVAVACSRPPV  308
            RDGS    + +C   V   DA   +  TA       HL V G   +KC      C  P  
Sbjct  383  RDGSCISNSSKCDQKVDCEDAGDEMNCTATDCSSYFHLGVKGVTFQKC-EFTTLCYAPSW  441

Query  309  TC-GQGDCGG--------GSCTRRDPWHDWGPERARCPPQRWSRRSEHTAAGG  440
             C G  DCG         GS   + P   +     RC P+ W+   E+    G
Sbjct  442  LCDGANDCGDFSDERNCPGSSKEKCPTPFFACPSGRCIPKSWTCDKENDCENG  494