GOS 2144010

From Metagenes
Warning: this metagenomic sequence has been carefully annotated by students during bioinformatics assignments. These quality annotations are therefore the result of a teaching exercise that you are most welcome to amend and extend if necessary!


Sequence
CAMERA AccNum : JCVI_READ_1092343626163
Annotathon code: GOS_2144010
Sample :
  • GPS :15°8'37s; 147°26'6w
  • Polynesia Archipelagos: Rangirora Atoll - Fr. Polynesia
  • Coral Reef Atoll (-1m, 27.3°C, 0.1-0.8 microns)
Authors
Team : Algarve 2011
Username : andfil
Annotated on : 2011-05-30 16:23:05
  • Nunes André Filipe da Conceição Marçal Alves
  • Nunes David Ricardo da Conceição Marçal Alves

Synopsis

  • Taxonomy: Viruses (NCBI info)
    Rank: no rank - Genetic Code: Standard - NCBI Identifier: 10239
    Kingdom: - Phylum: - Class: - Order:
    Viruses;

Genomic Sequence

>JCVI_READ_1092343626163 GOS_2144010 Genomic DNA
TTCTTTAAGCAACTCTTTGTCGTCTTGATCAAAGAAATCATCAATGAAATCTAAGTCTTTCATTCGCGGTACTCTTGAAGGATGTCTAGAACTCGGTTGA
GAGCATCATGTGCCCCTGCTTTCTTCTCGTCGGAGAAGCAGTTGAACATGTCAGTGCTATCATACAGAGCGGTCTTGAGTTTGTAAACTCTAGTTTGAAT
CTCTTGTTTGTTCACATGTCCTCTCGGCATGGGTTTGCTTCTATCATACACGTATGTATAAAAAAAGGGGACCCTGTTGGGGTCCCCTGACATTTCGTGG
AATCGAGATCACATGAGGTTGGTGACCTGGACGCGACGATAGTACATGTTCGCATTTGCGGTGAGGGTTTCGCCATCGGGGGTGCCGTTGTAAGCGCCGT
TGGTGGTGACGAATGGGTTGCTGACCATGCCGTAACGAGTCTTGAAACCAATTTTTGGTTGGAAGTTGTTAGGATCGATCGAGCGAACCATCTGGAGGGG
AACATATGGGCAGTAGAATAGACCTGCGTCATATGGGGAGGTGCCCTTATAACCTACGACATAGTAGTGCTTGTCGCTGAGGTTAGCAGCATAAGGATCA
ACATAGACCTTGATGCGTCCGTTGATGGTGCCAACTGCGAGGTTGCCAGTGTCATCAACTGTACCGATTGCAGGACCGCCTGCACCAGTTAGACCGCTGC
TGTAGTCGAGAACGCCTGCCATTGCCAGTGCGGATGCAACGTCAGCAGAGCAGATCAGGAAGTTGCCCTTTCCTCTACGAGTCTCTTGTGCGATTGCGTT
GCAGTCGCGCTCGATTTGGAAGAGAAGTCCCTTGAATTTCTCAACAGACCAACGACCA

Translation

[137 - 547/858]   indirect strand
>GOS_2144010 Translation [137-547   indirect strand]
MAGVLDYSSGLTGAGGPAIGTVDDTGNLAVGTINGRIKVYVDPYAANLSDKHYYVVGYKGTSPYDAGLFYCPYVPLQMVRSIDPNNFQPKIGFKTRYGMV
SNPFVTTNGAYNGTPDGETLTANANMYYRRVQVTNLM

Annotator commentaries

The GOS_2144010 genomic DNA sequence contains 7 ORFs. The selected ORF is considered to be coding because it is the longest ORF, because it is more than 60 amino acids long, and because it has homologs among the sequences of the NR database, with an E-value of 1e-72 and a score of 275 bits.


The protein encoded by the selected ORF has homologs with an E-value of 1e-72 and a score of 275 bits, and its single protein domain hit has an E-value of 1e-24. However, we could not assign a biological process or a molecular function to this protein, because the domain found belongs to a family of major capsid Gp23 proteins from T4-like bacteriophages, and none of the terms offered in the Annotathon fields for biological process and molecular function is suitable for a capsid protein.


Nevertheless, the protein encoded by the selected ORF can be associated with the GO term GO:0019028 (viral capsid) through the domain hit with an E-value of 1e-24. Note that this term describes a cellular component rather than a molecular function. Because the biological source of our ORF is a virus, which is not a living organism, we did not designate a biological process.


In the taxonomy report, we can see that the best homolog of the selected ORF has an E-value of 1e-72 and a score of 275 bits, which suggests that the selected ORF belongs to the same taxon as the Prochlorococcus phage Syn1.


In the tree drawn by the PhyML method, the branch to which the GOS_2144010 sequence belongs has a branch support value of 0.87, but the branch immediately before it has a branch support value of 0. This tells us that the grouping cannot be resolved at that level: the sequences of the GOS_2144010 branch and the sequences of the branch before should be treated as belonging to one and the same branch. In that case, the branch support value is 0.79.
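
The reasoning above (merging a branch with zero support into the branch before it) can be sketched in a few lines of Python. This is only an illustration: the tree layout below is a hypothetical toy arrangement, not the real PhyML topology, and the 0.5 cut-off is an arbitrary assumption. Only the support values 0.87, 0 and 0.79 come from the text.

```python
# A node is either a leaf name (str) or a tuple (support, [children]).
def collapse(node, min_support):
    """Merge every internal branch whose support is below min_support
    into its parent, turning it into a multifurcation."""
    if isinstance(node, str):
        return node
    support, children = node
    merged = []
    for child in children:
        child = collapse(child, min_support)
        if not isinstance(child, str) and child[0] < min_support:
            merged.extend(child[1])   # unsupported branch: lift its children up
        else:
            merged.append(child)
    return (support, merged)

# Hypothetical toy topology (assumption), using the support values quoted above.
toy = (0.79, [
    (0.0, [
        (0.87, ["GOS_2144010", "Cyanophage_Syn1"]),
        "Cyanophage_Syn19",
    ]),
    "Synechococcus_phage_S-PM2",
])

print(collapse(toy, 0.5))
```

After collapsing, the branch with support 0 disappears and its members hang directly from the branch with support 0.79, which is the situation described in the paragraph above.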


As for the tree drawn with the BioNJ method, the branch to which the GOS_2144010 sequence belongs has a branch support value of 0.8. As the branch immediately before has a branch support value of 0.9, we can say with a high degree of certainty that the branch to which the GOS_2144010 sequence belongs is significant.


In the tree drawn by PhyML, the GOS_2144010 sequence appears with the Cyanophage_Syn19 sequence, the Synechococcus_phage_S-PM2 sequence and the Cyanophage_Syn1 sequence.


In the tree drawn by BioNJ, the GOS_2144010 sequence appears only with the Cyanophage_Syn1 sequence.

In both trees, the GOS_2144010 sequence is always grouped with the Cyanophage_Syn1 sequence, which means that the biological source of the selected ORF probably belongs to the same taxon as the Cyanophage_Syn1 sequence.


However, we note that both trees had to be left unrooted because we could not choose sequences for the outgroups. We therefore cannot say whether the trees reliably represent the true evolutionary history of the taxon to which the biological source of the selected ORF belongs.

ORF finding

PROTOCOL

a) SMS ORFinder / forward strand / frames 1, 2 & 3 / min 60 AA / 'any codon' initiation / 'standard' genetic code

b) SMS ORFinder / reverse strand / frames 1, 2 & 3 / min 60 AA / 'any codon' initiation / 'standard' genetic code
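
The protocol above can be approximated with a short Python sketch. It is not a reimplementation of the SMS tool: the function names are our own, 'any codon' initiation is modelled simply as "every stretch between two stop codons", and only ORFs closed by a stop codon are reported. The reverse strand is handled by scanning the reverse complement of the read.

```python
# Standard genetic code, listed in the conventional TCAG codon order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))

def reverse_complement(dna):
    """Return the reverse strand, read 5' to 3'."""
    return dna.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]

def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))

def find_orfs(dna, frame, min_aa=60):
    """List (first_base, last_base, protein) for one frame (1, 2 or 3).
    Positions are 1-based and last_base includes the stop codon."""
    offset = frame - 1
    protein = translate(dna[offset:])
    orfs, start = [], 0
    for i, aa in enumerate(protein):
        if aa == "*":
            if i - start >= min_aa:
                orfs.append((offset + 3 * start + 1,
                             offset + 3 * (i + 1),
                             protein[start:i]))
            start = i + 1
    return orfs

# The last 17 bases of the read give the first bases of the reverse strand.
read_tail = "CAACAGACCAACGACCA"
reverse_start = reverse_complement(read_tail)
print(reverse_start)                  # TGGTCGTTGGTCTGTTG
print(translate(reverse_start[1:]))   # frame 2 starts at base 2: GRWSV
```

Calling find_orfs(reverse_complement(read), frame=2) on the full read would scan the frame that contains the selected ORF.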


RESULTS ANALYSIS


The results provided by the SMS ORFinder indicate that there are 3 ORFs on the direct strand (1 ORF in frame 2 and 2 ORFs in frame 3). On the reverse strand there are 4 ORFs (1 ORF in frame 1, 2 ORFs in frame 2 and 1 ORF in frame 3).

The selected ORF was ORF 1 in frame 2 on the reverse strand because it was the longest ORF (well above the 60 aa minimum). This ORF was also selected because it is complete: it has a START codon at the 5' end, it has a STOP codon at the 3' end, and it lacks STOP codons inside it.

The START codon of the selected ORF is at position 137. The STOP codon (TGA) is at position 548, and the selected ORF is 411 bases long (137 codons, STOP codon excluded).
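
These coordinates can be cross-checked against the raw SMS output with a little arithmetic. The sketch below assumes what can be read from the raw results: the reverse-strand frame 2 ORF starts at base 2, its translation is 182 residues long, and the first methionine is residue 46. The function name and its arguments are our own.

```python
def orf_coordinates(orf_first_base, first_met_residue, total_residues):
    """Return (start, last_coding_base, stop_codon_span, coding_nt, coding_aa).
    All positions are 1-based on the strand the ORF is read from."""
    start = orf_first_base + 3 * (first_met_residue - 1)
    last_coding = orf_first_base + 3 * total_residues - 1
    stop_span = (last_coding + 1, last_coding + 3)
    coding_nt = last_coding - start + 1
    return start, last_coding, stop_span, coding_nt, coding_nt // 3

print(orf_coordinates(2, 46, 182))
# (137, 547, (548, 550), 411, 137)
```

The result agrees with the translation header [137 - 547/858] and with the end of the SMS ORF at base 550.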

Since the BLAST search with the ORF under study returned an E-value of 1e-72 and a score of 275 bits, we can say that its homology with sequences of the NR database is very unlikely to be due to chance, which means that the ORF under study has a biological meaning.
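
To give a feeling for why such a score is unlikely to arise by chance, the sketch below uses the usual relation between a bit score S' and the expected number of chance hits, E = m * n * 2^(-S'), where m and n are the effective lengths of the query and of the database. The BLAST report does not give us these effective lengths, so the database size used here is a purely hypothetical placeholder and the result is not meant to reproduce the reported E-value.

```python
def expected_chance_hits(bit_score, query_len, db_len):
    """E = m * n * 2**(-S') for a bit score S' (Karlin-Altschul statistics)."""
    return query_len * db_len * 2.0 ** (-bit_score)

# 137 residues is the length of our translation; 1e9 is an assumed,
# illustrative database size, not a value taken from the BLAST output.
e_value = expected_chance_hits(275, 137, 1e9)
print(e_value < 1e-4)   # True: far below the usual significance threshold
```

Each additional bit of score halves the expected number of chance hits, which is why a score of 275 bits leaves essentially no room for a random match.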

The remaining ORFs were judged to have no biological significance because, when we compared them using BLAST, their E-values were larger and their scores smaller than the E-value and score of the selected ORF.

The selected ORF is considered to be coding because it is more than 60 amino acids long and because it has convincing homologs when using BLAST.

RAW RESULTS

a) forward strand

No ORFs were found in reading frame 1.

>ORF number 1 in reading frame 2 on the direct strand extends from base 626 to base 835.
TGGTGCCAACTGCGAGGTTGCCAGTGTCATCAACTGTACCGATTGCAGGACCGCCTGCAC
CAGTTAGACCGCTGCTGTAGTCGAGAACGCCTGCCATTGCCAGTGCGGATGCAACGTCAG
CAGAGCAGATCAGGAAGTTGCCCTTTCCTCTACGAGTCTCTTGTGCGATTGCGTTGCAGT
CGCGCTCGATTTGGAAGAGAAGTCCCTTGA

>Translation of ORF number 1 in reading frame 2 on the direct strand.
WCQLRGCQCHQLYRLQDRLHQLDRCCSRERLPLPVRMQRQQSRSGSCPFLYESLVRLRCS
RARFGREVP*

>ORF number 1 in reading frame 3 on the direct strand extends from base 90 to base 290.
AACTCGGTTGAGAGCATCATGTGCCCCTGCTTTCTTCTCGTCGGAGAAGCAGTTGAACAT
GTCAGTGCTATCATACAGAGCGGTCTTGAGTTTGTAAACTCTAGTTTGAATCTCTTGTTT
GTTCACATGTCCTCTCGGCATGGGTTTGCTTCTATCATACACGTATGTATAAAAAAAGGG
GACCCTGTTGGGGTCCCCTGA

>Translation of ORF number 1 in reading frame 3 on the direct strand.
NSVESIMCPCFLLVGEAVEHVSAIIQSGLEFVNSSLNLLFVHMSSRHGFASIIHVCIKKG
DPVGVP*

>ORF number 2 in reading frame 3 on the direct strand extends from base 291 to base 521.
CATTTCGTGGAATCGAGATCACATGAGGTTGGTGACCTGGACGCGACGATAGTACATGTT
CGCATTTGCGGTGAGGGTTTCGCCATCGGGGGTGCCGTTGTAAGCGCCGTTGGTGGTGAC
GAATGGGTTGCTGACCATGCCGTAACGAGTCTTGAAACCAATTTTTGGTTGGAAGTTGTT
AGGATCGATCGAGCGAACCATCTGGAGGGGAACATATGGGCAGTAGAATAG

>Translation of ORF number 2 in reading frame 3 on the direct strand.
HFVESRSHEVGDLDATIVHVRICGEGFAIGGAVVSAVGGDEWVADHAVTSLETNFWLEVV
RIDRANHLEGNIWAVE*

------------------------------------------------------------------------------------------------------------------------
b) reverse strand

>ORF number 1 in reading frame 1 on the reverse strand extends from base 391 to base 615.
CAACTTCCAACCAAAAATTGGTTTCAAGACTCGTTACGGCATGGTCAGCAACCCATTCGT
CACCACCAACGGCGCTTACAACGGCACCCCCGATGGCGAAACCCTCACCGCAAATGCGAA
CATGTACTATCGTCGCGTCCAGGTCACCAACCTCATGTGATCTCGATTCCACGAAATGTC
AGGGGACCCCAACAGGGTCCCCTTTTTTTATACATACGTGTATGA

>Translation of ORF number 1 in reading frame 1 on the reverse strand.
QLPTKNWFQDSLRHGQQPIRHHQRRLQRHPRWRNPHRKCEHVLSSRPGHQPHVISIPRNV
RGPQQGPLFLYIRV*

>ORF number 1 in reading frame 2 on the reverse strand extends from base 2 to base 550.
GGTCGTTGGTCTGTTGAGAAATTCAAGGGACTTCTCTTCCAAATCGAGCGCGACTGCAAC
GCAATCGCACAAGAGACTCGTAGAGGAAAGGGCAACTTCCTGATCTGCTCTGCTGACGTT
GCATCCGCACTGGCAATGGCAGGCGTTCTCGACTACAGCAGCGGTCTAACTGGTGCAGGC
GGTCCTGCAATCGGTACAGTTGATGACACTGGCAACCTCGCAGTTGGCACCATCAACGGA
CGCATCAAGGTCTATGTTGATCCTTATGCTGCTAACCTCAGCGACAAGCACTACTATGTC
GTAGGTTATAAGGGCACCTCCCCATATGACGCAGGTCTATTCTACTGCCCATATGTTCCC
CTCCAGATGGTTCGCTCGATCGATCCTAACAACTTCCAACCAAAAATTGGTTTCAAGACT
CGTTACGGCATGGTCAGCAACCCATTCGTCACCACCAACGGCGCTTACAACGGCACCCCC
GATGGCGAAACCCTCACCGCAAATGCGAACATGTACTATCGTCGCGTCCAGGTCACCAAC
CTCATGTGA

>Translation of ORF number 1 in reading frame 2 on the reverse strand.
GRWSVEKFKGLLFQIERDCNAIAQETRRGKGNFLICSADVASALAMAGVLDYSSGLTGAG
GPAIGTVDDTGNLAVGTINGRIKVYVDPYAANLSDKHYYVVGYKGTSPYDAGLFYCPYVP
LQMVRSIDPNNFQPKIGFKTRYGMVSNPFVTTNGAYNGTPDGETLTANANMYYRRVQVTN
LM*

>ORF number 2 in reading frame 2 on the reverse strand extends from base 551 to base 799.
TCTCGATTCCACGAAATGTCAGGGGACCCCAACAGGGTCCCCTTTTTTTATACATACGTG
TATGATAGAAGCAAACCCATGCCGAGAGGACATGTGAACAAACAAGAGATTCAAACTAGA
GTTTACAAACTCAAGACCGCTCTGTATGATAGCACTGACATGTTCAACTGCTTCTCCGAC
GAGAAGAAAGCAGGGGCACATGATGCTCTCAACCGAGTTCTAGACATCCTTCAAGAGTAC
CGCGAATGA

>Translation of ORF number 2 in reading frame 2 on the reverse strand.
SRFHEMSGDPNRVPFFYTYVYDRSKPMPRGHVNKQEIQTRVYKLKTALYDSTDMFNCFSD
EKKAGAHDALNRVLDILQEYRE*

>ORF number 1 in reading frame 3 on the reverse strand extends from base 306 to base 647.
GTTATAAGGGCACCTCCCCATATGACGCAGGTCTATTCTACTGCCCATATGTTCCCCTCC
AGATGGTTCGCTCGATCGATCCTAACAACTTCCAACCAAAAATTGGTTTCAAGACTCGTT
ACGGCATGGTCAGCAACCCATTCGTCACCACCAACGGCGCTTACAACGGCACCCCCGATG
GCGAAACCCTCACCGCAAATGCGAACATGTACTATCGTCGCGTCCAGGTCACCAACCTCA
TGTGATCTCGATTCCACGAAATGTCAGGGGACCCCAACAGGGTCCCCTTTTTTTATACAT
ACGTGTATGATAGAAGCAAACCCATGCCGAGAGGACATGTGA

>Translation of ORF number 1 in reading frame 3 on the reverse strand.
VIRAPPHMTQVYSTAHMFPSRWFARSILTTSNQKLVSRLVTAWSATHSSPPTALTTAPPM
AKPSPQMRTCTIVASRSPTSCDLDSTKCQGTPTGSPFFIHTCMIEANPCREDM*

Multiple Alignment

PROTOCOL

MUSCLE / default parameters at EBI / output order: input


RESULTS ANALYSIS

The GOS sequence is considerably shorter than the sequences selected as ingroups: in the alignment, the GOS sequence only begins at position 434 and ends at position 630.

In the MSA, the program marks 4 conserved blocks instead of the single domain reported by InterProScan. Although the number of conserved regions found by the two programs is different, the length of the protein domain found by InterProScan is approximately the same as the combined length of the conserved blocks marked in the MSA, which suggests that they all belong to the same conserved protein domain.

The MSA does not indicate a common start position, because the sequences do not all start at the same alignment position.

One sequence starts at position 1, two sequences start at position 4, four sequences start at position 5, six sequences start at position 8, three sequences start at position 9 and one sequence starts at position 12.

The end position is not the same for all the sequences either, but there is much more consensus on the end than on the start.

Six sequences end at position 629, nine sequences end at position 630 and three sequences end at position 631.

The GOS sequence ends at position 630.
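
The start and end positions quoted in this section can be read directly from the alignment rows. As a small sketch (the helper name is ours, and the two rows are copied from the GOS_2144010 lines of the raw alignment below), the following code returns the alignment columns of the first and last residue in one block:

```python
def residue_span(block_start, row):
    """Return the 1-based alignment columns of the first and last
    non-gap characters of one alignment block row, or None if the
    row contains only gaps. block_start is the column of row[0]."""
    columns = [block_start + i for i, ch in enumerate(row) if ch != "-"]
    return (columns[0], columns[-1]) if columns else None

# GOS_2144010 rows of the block starting at column 421 and of the last
# block, which starts at column 601.
first_block = "-" * 13 + "GRWSVEK---FKGLLFQIERDCNAIAQETRRGKGNFLICSADVASAL"
last_block = "GTPDGETLTA-----NANMYYRRVQVTNLM-"

print(residue_span(421, first_block)[0])   # 434
print(residue_span(601, last_block)[1])    # 630
```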

Although the sequences chosen for the multiple alignment are much longer than the GOS sequence, the beginning of the multiple alignment contains many gaps and is poorly aligned.

However, the alignment improves considerably from the point where the GOS sequence starts and is aligned with the remaining sequences: the number of gaps decreases sharply and, above all, the conserved blocks only appear in the multiple alignment after the GOS sequence has started.
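
How good the alignment becomes in the region covered by the GOS sequence can be illustrated by counting identical residues between two rows. The sketch below is only an example on a single block (columns 541 to 600), comparing the GOS_2144010 row with the row of gi|326784102 copied from the raw results. The function name is ours, and columns where either sequence has a gap are ignored.

```python
def pairwise_identity(row_a, row_b):
    """Return (identical residues, compared columns) for two aligned
    rows, skipping every column where either row has a gap."""
    pairs = [(a, b) for a, b in zip(row_a, row_b) if a != "-" and b != "-"]
    matches = sum(a == b for a, b in pairs)
    return matches, len(pairs)

gos  = "GYKG-TSPYDAGLFYCPYVPLQMVRSI-DPNNFQPKIGFKTRYGMVSNPFVTTNGAYN--"
syn1 = "GYKG-TSPYDAGLFYCPYVPLQMVRSI-DPNNFQPKIGFKTRYGMVSNPFVTTNGLYS--"

matches, compared = pairwise_identity(gos, syn1)
print(matches, compared)   # 54 56
```

In this block only two of the 56 compared positions differ, which is consistent with the close relationship between the GOS sequence and this homolog.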




RAW RESULTS

                         10        20        30        40        50        60
                 =========+=========+=========+=========+=========+=========+
gi|30044131|ref  ----MSQPKINEQLIEKWQPLLEG--------CRNDWERHTLATLLENQYREAKKH----
gi|294338215|em  -------MDKNVSLNEKVESYIK--DS--RYAALNESEAVLMSTLLSNTALASQ------
gi|312262644|gb  --------MSKKSLLKKWQPLVES-EG--MPAIASMQRKDIVARIFENQDEDIAHNEGGV
gi|34419596|ref  -----------MNLTEKWKDLLEA-EGADMPEIATATKQKIMSKIFENQDRDINNDP--M
gi|294661606|re  -------MSKKNELMEKWNDLLESQEG--LPDIATKSKKQLIAAIMEAQEKDAEVDP--V
gi|311992934|re  ---MSTQIKTKAQLVADWKPLLEA-EG--APEI-AQGKHAIIAKMFENQEADIKSDA--A
gi|311993169|re  -------MKKINPLVEKWTPLLEN-EA--LPEIVGAGKKALIAKIMENQESAIKTEP--A
gi|308814537|re  --------MKKNALVQKWSALLEN-EA--LPEIVGASKQAIIAKIFENQEQDILTAP--E
gi|282598976|re  ---MTQRQLVTEKMRQDWAPVLDN-KD--EYKSKALSRRDIMTRLLENQSQWCNEN----
gi|290457632|sp  ----MSKKLVTEEMRTQWLPVLEK-KS---EQIQPLTAENVSVRLLQNQAEWNAKN----
gi|326804652|re  ----MTKKLVTEEMRKQWLPVLQK-ES---EAIQPLSAENVTIRLMQNQAEWNAKN----
gi|282599333|re  ----MAKKLVTEQMREQWLPVLQK-ES---ESIQPLSAENVAVRLLQNQAEWNAKN----
gi|86372247|gb|  MSDNTEQKSHSELLLEKWDGVLHA-KG--EQEIVSTHRQKVTAQLLENTQEYLAES----
gi|326783094|re  -------MFNAESLQKKWAPVLSH-DG--LPEIKDNYRKSVTAILLENQEKALREE----
gi|326783543|re  --------MSLQQLQEKWAPVLNH-ES--LPEIEDTHKRGVVAQLLENQEKAITEE----
gi|58532919|ref  -------MFNAEHLQEKWSPVLNH-GE--APAIGDRYKRAVTSVLLENQERFLREE----
GOS_2144010      ------------------------------------------------------------
gi|326784102|re  -------MFNAEHLQEKWSPVLNN-EA--ANPIADRYKKAVTSVLLENQERFLREE----
                                                                             


                         70        80        90       100       110       120
                 =========+=========+=========+=========+=========+=========+
gi|30044131|ref  ----------------------LME-------------------TTQTTEVDGWN-LALP
gi|294338215|em  -------------------GALVGE-------------------SVISSDIAKFTPILMP
gi|312262644|gb  YTDQVVVNSMVDVKGRLEEAKALQEANI---GGDHGYDATKIASGEMSGSITNVGPAVMG
gi|34419596|ref  YRDPQLVEAF---------NAGLNEAVV---NGDHGYDPANIAQGVTTGAVTNIGPTVMG
gi|294661606|re  YRDEKIVESF---------GGFLAEAEI---AGDHGYDATKIASGNSSGAITNIGPAVIG
gi|311992934|re  YRDEKLAEAF---------GGFLTEAEI---GGDHGYDPQNIAAGQTSGAVTQIGPAVMG
gi|311993169|re  FRDEKIAEAF---------GSFLTEAEI---GGDHGYDAQNIAAGQTSGAVTQIGPAVMG
gi|308814537|re  YRDEKISEAF---------GSFLTEAEI---GGDHGYDATNIAAGQTSGAVTQIGPAVMG
gi|282598976|re  -------------------RRFLGE---------------AAAPGNSTGAVATWSPVLIS
gi|290457632|sp  ----------------------LGE---------------SEGPSSVNANVGKWQPVLID
gi|326804652|re  ----------------------LGE---------------SDAPGSVNSTVGKWQPVLID
gi|282599333|re  ----------------------LGE---------------SDAPGSVNNSVGKWQPVLID
gi|86372247|gb|  --------------------------------------------SNVTSGVQNWDPVLIA
gi|326783094|re  -------------------RAVLTE--------APTNVGPINTQTTSAGAVDGFDPILIS
gi|326783543|re  -------------------ASVLNE---------TLQTTGYTGASTATGPVAGFDPVLIS
gi|58532919|ref  -------------------RGMLNEVAVNSLGAGTIAPAGSALGSANTGGLAGFDPVLIS
GOS_2144010      ------------------------------------------------------------
gi|326784102|re  -------------------RGMLQEVAVNSLGAGTVSPGGSALGSANTAGLAGFDPVLIS
                                                                             


                        130       140       150       160       170       180
                 =========+=========+=========+=========+=========+=========+
gi|30044131|ref  IVRRVFANLRATDLVSVQPLSLPTGLVFYLDFKSPELPGNGSVYGGTGLTTDTATGGLYD
gi|294338215|em  IVRRVYPALVANQLLGIQPLTMPTGYIYALV----------NRYTGNKKDGAVSPVG---
gi|312262644|gb  LVRRAIPQLIAFDICGVQPMTSSTGQVFTLR----------AIYGGDSQDANAR--E---
gi|34419596|ref  MVRRAIPQLIAFDIAGVQPMTGPTSQVFTLR----------SVYGKDPLTGAEA------
gi|294661606|re  MVRRAIPNLIAFDICGVQPMTGPTGQVFALR----------AVYGKDPLASGAK--E---
gi|311992934|re  MVRRAIPNLIAFDICGVQPMQGPTGQVFALR----------AVYGKDPIAAGAK--E---
gi|311993169|re  MVRRAIPNLIAFDICGVQPMSSPTGQVFALR----------AVYGKDPLAAGAK--E---
gi|308814537|re  MVRRAIPHLIAFDICGVQPLNNPTGQVFALR----------AVYGKDPVAAGAKE-----
gi|282598976|re  MVKRSTPNLIALDFMGTQPLGTPDGLIFAMR----------ARYNNQT--------G---
gi|290457632|sp  MAKRLAPNNIAMDFFGVQPLAGPDGQIFALR----------ARQGVGD-ASNTQQSR---
gi|326804652|re  MAKRLAPINIAMDFFGVQPLSGPDGQIFALR----------ARQGVGD-GSTTAQAR---
gi|282599333|re  MAKRLAPINIAMDFFGVQPLSGPDGQIFALR----------ARQGVGD-SSNTQQSR---
gi|86372247|gb|  MVRRMAPKLIAYDFLGVQPMSAPTGLIFSLK----------ARAAQTPPNGNPQGGP---
gi|326783094|re  LIRRAMPKLIAYDIAGVQPMSGPTGLIFAMR----------SQYTNQS--------G---
gi|326783543|re  LIRRSMPQLIAYDIAGVQPMTGPTGLIFAMR----------TNYGAERDPAASGY-----
gi|58532919|ref  LVRRAMPNLMAYDVCGVQPMSGPTGLIFAMR----------SRYENQA--------G---
GOS_2144010      ------------------------------------------------------------
gi|326784102|re  LVRRAMPNLMAYDVCGVQPMSGPTGLIFAMR----------SRYENQA--------G---
                                                                             


                        190       200       210       220       230       240
                 =========+=========+=========+=========+=========+=========+
gi|30044131|ref  ENARLSRREYETTITVDLATAQQATMRDVGFDTGIASLVSSGAVYYVDVPVASLPGVA--
gi|294338215|em  --------KAQILVFEANVTKGDTVTGTTSTATGKIIHVEKDGK----------TALV--
gi|312262644|gb  --------AFHPAFSPDADFSGRGAQVK--IAEFARGTAFANGA-FAHLFIEAATGVQAG
gi|34419596|ref  ---------FHPTRQADASFSGQAAAST--IADFPTTGAATDGTPYKAEVTTSG-GDV--
gi|294661606|re  --------AFHPMFSPDSMYSGQGAAPSNGFTKLTSAQAIADGAIVFHDFVE--TGRV--
gi|311992934|re  --------AFHPMYAPDAMFSGRGSHEV--FAPLASGTVVAQGTIYKHEFVA--TGTA--
gi|311993169|re  --------AFHPMYAPDAMFSGQGAAEK--FAAVKAADVLTVGTIVVHDFAD--VGRA--
gi|308814537|re  --------AFHPMYAPNAMFSGQGAAQT--FEALAASKVLEVGKIYSHFFEA--TGSA--
gi|282598976|re  --------DEAFYQDIKSGHSGD--------------------------------GTV--
gi|290457632|sp  --------KELFMEEAQTNYSGD-------------------------------QTTV--
gi|326804652|re  --------KELFMNEADSGYSGD--------------------------------GTV--
gi|282599333|re  --------KELFMQEADSGYSGD--------------------------------GTV--
gi|86372247|gb|  --------EILGLDEANTAYSGR--------------------------------GTH--
gi|326783094|re  --------NEAFFDEPDAQFSGTKG------------------------------GTP--
gi|326783543|re  --------DEAFFNEPNAGFSGGPGAYD--------------------------PGAS--
gi|58532919|ref  --------EEALFNEPDTGFTGGYDASQ---------------------------GDY--
GOS_2144010      ------------------------------------------------------------
gi|326784102|re  --------EEALFNEPDAGFTAGLDATT---------------------------GAY--
                                                                             


                        250       260       270       280       290       300
                 =========+=========+=========+=========+=========+=========+
gi|30044131|ref  --DVNTVRFWQYDDASGDPENTVAYPLPRYNRIVGAVGSALYARLFFVTGSDFATVAGGT
gi|294338215|em  --QLTNDKKFQNEEANKGTR-----------IVNVYSNEATFHKILETYSGPYSTA----
gi|312262644|gb  TKTVQFIKDYAIDALPAEQV-----------EAGLAYKWLLAQGYAVETSSAMATAFA--
gi|34419596|ref  --SMRYFLALGAVTLAVAGQ-----------MTATEYTDGVAGGLLVEIDAGMATSQA--
gi|294661606|re  --FLQNVSGAPVTVTGSTDD-----------ALDAAVIAEQEKGTLAEISYGMATSVA--
gi|311992934|re  --FLQATGAVTL-ATTADAA-----------ELDAEVIKQMDAGILVEIAEGMATSIA--
gi|311993169|re  --YFQVAEGFTVDAGATDAE-----------KLDKAVKAAEEAGQLVEIAEGMATSVA--
gi|308814537|re  --HFQAVEAVTVDAAATDAA-----------KLDAAVTALLEAGKLAELAEGMATSIA--
gi|282598976|re  --DAGDPSGFTKA------------------LVDQFTGGT--SGADPAYGKGMTTAEM--
gi|290457632|sp  --HSGDPSGFSQADIEGSGT-----------EVSS-------------YGKAMDTVKA--
gi|326804652|re  --QAGDPSGFSQAEIEGSGS-----------VVTT-------------IGKGMPSTDA--
gi|282599333|re  --QAGDPSGFTQAEIEGSGA-----------GVTT-------------IGKGMPTTDA--
gi|86372247|gb|  --------------AGDDPF-------------------------DAAFETGFTKTTA--
gi|326783094|re  ----------PTATTEKNPG-----------LINDATGGGTTEGNYDLASSKFTTSEQ--
gi|326783543|re  --------DATNDAEGTNPA-----------LLNDSPAGTYEQTAD---ATGMTTATA--
gi|58532919|ref  --AVRTGAGVGGDSEGNNPA-----------LLNDAAPGT------YEVGSKMPREDL--
GOS_2144010      ------------------------------------------------------------
gi|326784102|re  --TPRTGAGVGGDAEGNNPA-----------LLNDSSPGT------YETPRGFSREDL--
                                                                             


                        310       320       330       340       350       360
                 =========+=========+=========+=========+=========+=========+
gi|30044131|ref  PSTQDLDLVYYIDARNDFEDQSTDPDYPDPGFQSLDIPEINLELRSRPVATKTRKLRAAW
gi|294338215|em  ------------------DGEKLAEDMNTVGFG----------IEKDTVEAKTRKLKAEY
gi|312262644|gb  ------------ELQQGFNGSTGNE-WNEMSFR----------IDKQVVEAKSRQLKAQY
gi|34419596|ref  ------------ELQENFNGSSNNE-WNEMSFR----------IDKQVVEAKSRQLKAQY
gi|294661606|re  ------------ELQEQFNGSTGNP-WNEMGFR----------IDKQVIEARSRQLKAQY
gi|311992934|re  ------------ELQEGFNGSNNNP-WNEMGFR----------IDKQVIEAKSRQLKAQY
gi|311993169|re  ------------ELQENFNGSTDNP-WNEMGFR----------IDKQVIEAKSRQLKAQY
gi|308814537|re  ------------ELQEGFNGSQDNP-WNEMGFR----------IDKQVIEAKSRQLKASY
gi|282598976|re  ------------ET----LGTTGGKVWGKMAVT----------VEKQSVSAKGRGLYADY
gi|290457632|sp  ------------EQ----LGSPTQP-WARVGIT----------IQKATVTAKSRGLYADY
gi|326804652|re  ------------EL----LGTTTNP-WARVGIT----------VQKATVTAKSRGLYADY
gi|282599333|re  ------------EL----LGTTTNP-WARVGIT----------VQKATVTAKSRGLYADY
gi|86372247|gb|  ------------------AGEVDL--WNSVGVT----------IDKVSITAGTRQLRADY
gi|326783094|re  ------------ES---LGDSTSNA-FMEMAFS----------IDRIAVEAKGRALRADY
gi|326783543|re  ---------------EALDDSSSNTAFREMGFS----------IEKVTVTARARALKAEY
gi|58532919|ref  ------------ER----MGEANRL-FREMSFS----------IEKTSVTAQSRALKAEY
GOS_2144010      ------------------------------------------------------------
gi|326784102|re  ------------EQ----AGDAGKL-FREMSFS----------IEKTSVTAKSRALKAEY
                                                                             


                        370       380       390       400       410       420
                 =========+=========+=========+=========+=========+=========+
gi|30044131|ref  TPEAMQDLAAYHKGVDLENEIVTLMSQYIAREIDLEILSTIMAHARRTDNY--GFWSEVV
gi|294338215|em  TLEMYEDLKNQH-GVLADEHLANLIAAEMQTEIDREIINFVNNTATVVADTLSPGHEHKE
gi|312262644|gb  SIEMAQDLRAVH-GLDADSELSSILANEIMHEINREMVLWINATAKVGKTGWTNMHGGKS
gi|34419596|ref  SIELAQDLRAVH-GLDADAELSGILANEVMVELNREIVNLVNSQAQIGKSGWTQG-AGAA
gi|294661606|re  SVELAQDLRAVH-GMDADAELSAILATEIMLEINREIVDMINYTAQVGKTGFTQTVGSKA
gi|311992934|re  SIELAQDLRAVH-GMDADAELANILATEIMLEINREVIDWINYSAQVGKTGQTLTVGSKA
gi|311993169|re  SIELAQDLRAVH-GMDADAELSGILATEIMLEINREVVDWINYSAQLGKTGMTQTVGSKA
gi|308814537|re  SIELAQDLRAVH-GMDADAELSGILATEIMLEINREVIDWINYSAQVGKSGMTNTVGAKA
gi|282598976|re  SHELRQDMAAVH-GEDVDAILSDMLVNEIQAEMNREFIRTLNVSSKLGF--------GGA
gi|290457632|sp  SHELRQDMMAIH-GEDVDAILSDVMVTEIQAEMNREFIRTMNFTAVRFKKF------GTN
gi|326804652|re  SHELRQDMMAIH-GEDVDNILSDVMVTEIQAEMNREFIRTMNFSAVRFKKF------GTN
gi|282599333|re  SHELRQDMMAIH-GEDVDSILSDVMVTEIQAEMNREFIRTMNFSAVRFKKF------GAN
gi|86372247|gb|  SIELAQDMKRVH-GLDADAELVNILSSEIIAEMNREAVRTCYSAAKLGAQF-----ATTA
gi|326783094|re  SVELAQDLKAIH-GLDAESELANILSTEILAEINREVVRTVYRGAKPGA----QANVANA
gi|326783543|re  SIELAQDLKAIH-GLDAEQELANILSTEILAEINREVVRTIYTNAVKGA----QNNTATA
gi|58532919|ref  TLELAQDLKAIH-GLDAEQELANILSSEVLAEINREVVRRVYTVAKKGA----QNNVANA
GOS_2144010      ------------------------------------------------------------
gi|326784102|re  TLELAQDLKAIH-GLDAEQELANILSSEVLAEINREVVRRVYSVAKPGA----ANNVANA
                                                                             


                        430       440       450       460       470       480
                 =========+=========+=========+=========+=========+=========+
gi|30044131|ref  GEYYDETSGNFVAGNFYGSKQEWLATLMIELNKVSNRIQQKTAVAGANFLVTSPQVAALL
gi|294338215|em  A------------GRWEIER---YRCNAIKIDLEARNIGLMTRRGSGNTLLVSPKVATML
gi|312262644|gb  GVFDFQDTKDIRGARWAGES---YKALVVQIDKEANEIARQTGRGQGNFIICSRNVAAAL
gi|34419596|ref  GVFDFSDAVDVKGARWAGEA---YKALLIQIEKEANEIGRQTGRGNGNFIIASRNVVSAL
gi|294661606|re  GAFDFQDPIDVRGARWAGES---YKALLIQIDKEANEIARQTGRGAGNFIIASRNVVSAL
gi|311992934|re  GVFDFQDPIDVRGARWAGES---FKALLFQIDKESAEIARQTGRGAGNFIIASRNVVNVL
gi|311993169|re  GVFDFQDPVDIRGARWAGES---FKALLFQIDKEAAEIARQTGRGAGNFIIASRNVVNAL
gi|308814537|re  GVFDFQDPIDIRGARWAGES---FKALLFQIDKEAAEIARQTGRGAGNFIIASRNVVNVL
gi|282598976|re  GIFDLMTDTD---GRWLVER---LKGLMFRIELEANAIAIDTRRGKGNRLLCSANVASAL
gi|290457632|sp  GVVDVAADVS---GRWALEK---WKYLVFMLEVEANGVGVDTRRGKANRVLCSPNVASAL
gi|326804652|re  GVVDIAQDIS---GRWALEK---WKFLTFMLEVEANGIGVDTRRGKGNRVLCSPNVASAL
gi|282599333|re  GVVDISTDIS---GRWALEK---WKYMTFMLEVEANGIGVDTRRGKGNRVLCSPNVASAL
gi|86372247|gb|  GTFDLNADAD---GRWSVER---FKGLMFAIERDANRIAIESRRGKGNFLIVSSDVASAL
gi|326783094|re  GVFDLDVDSN---GRWSVEK---FKGLMFQIERDANAIAQETRRGKGNVIITSADVASAL
gi|326783543|re  GVFDLDVDSN---GRWSVEK---FKGLLFQIERDANAIGQQTRRGKGNILICSADVASAL
gi|58532919|ref  GIFDLDVDSN---GRWSVEK---FKGLLFQVERDANAIAQETRRGKGNFLICSADVASAL
GOS_2144010      -------------GRWSVEK---FKGLLFQIERDCNAIAQETRRGKGNFLICSADVASAL
gi|326784102|re  GIFDLDVDSN---GRWSVEK---FKGLLFQIERDCNAIAQDTRRGKGNFLICSADVASAL
                                         ####################################


                        490       500       510       520       530       540
                 =========+=========+=========+=========+=========+=========+
gi|30044131|ref  E--SMPGFTPGNDNRDG-------GTGIFYVGMVQ---GRYRLYKNIYQNQ----PVIIM
gi|294338215|em  DQIGTFKFASSSSNIAT----------DVFTGNVGTYDGRYNVIVDQYAKS----DYITV
gi|312262644|gb  GHTDMM-VTPAAQGANT-T-MNTDTTSSLFAGVLA---GKYRVYIDQYAVE----DYFTV
gi|34419596|ref  SMTDTL-VGPAAQGMQD-GSMNTDTNQTVFAGVLG---GRFKVYIDQYAVN----DYFTV
gi|294661606|re  ARIDSG-ITPAGQGLQK-T-LNVDTTKAVFAGVLG---GVYKVYIDQYARG----DYFTV
gi|311992934|re  ASVDTS-VTPAAQGLAR-G-LNTDTTKAVFAGILG---GRYKVYIDQYARQ----DYFTI
gi|311993169|re  AAVDTG-VTPAAQGLGQ-G-FNADTTKTVFAGILG---GRYKVYIDQYARQ----DYFTI
gi|308814537|re  AAVDTS-VSYAAQGLGK-G-FNVDTTKAVFAGVLG---GKYRVYIDQYARA----DYFTI
gi|282598976|re  AMAGMLDFTPAMAANAG---LDVDATGQTFAGVLA---NGMKVFIDPYAVL----DYINI
gi|290457632|sp  AMAGMLDYSPALNVQAQ---LAVDPTGQTFAGVLS---NGMRVYIDPYAVA----EYITL
gi|326804652|re  AMSGMLDYAPVLQENTK---LAVDPTGQTFAGVLS---NGMRVYVDPYAVA----EYITL
gi|282599333|re  AMSGMLDYAPALQENTK---LAIDPTGQTFAGVLS---NGMRVYIDPYAVA----EYITL
gi|86372247|gb|  AMAGILDYAPALNSQVN---LEVDATGATYAGN-A---GRYRVYIDPYLGV----DAYLV
gi|326783094|re  AMSGVLDYDSGISGAVG-GIGEIDDTGNTFVGTLN---GRFKVYIDPYSANVSDNQYYVV
gi|326783543|re  GMAGVLDYSPALNGNNA--LTGVDDTSSTLVGTLN---GRIKVYVDPYSANVADKHYYVA
gi|58532919|ref  AMAGVLDYSSGLNGAGGPSIGEVDDTGNLAVGTIN---GRIKVFVDPYAANLSDKHYYVI
GOS_2144010      AMAGVLDYSSGLTGAGGPAIGTVDDTGNLAVGTIN---GRIKVYVDPYAANLSDKHYYVV
gi|326784102|re  AMAGVLDYSSGLTGAGGPAIGTVDDTGNLAVGTIN---GRIKVYVDPYAANLSDKHYYVV
                                                       ###########           


                        550       560       570       580       590       600
                 =========+=========+=========+=========+=========+=========+
gi|30044131|ref  GNQDLNTPWQTGAVYAPYVPLLFTPTIVDPVNFSYRRGLMTRYALEVV------------
gi|294338215|em  LYKG-STAQDSLGFFCPYVPLSFQKVM-NQESGQPGMIARTRYGLATNPLEPENYART--
gi|312262644|gb  GYKG-SSEMDAGLFYCPYVALTPLRGT-DPKNFQPVLGFKTRYGVKLHPMADSMQNKG--
gi|34419596|ref  GFKG-STEMDAGVFYSPYVPLTPLRGS-DSKNFQPVIGFKTRYGVQVNPFADPTASAT--
gi|294661606|re  GYKG-DNEMDAGIYYAPYVALTPLRGS-DPKNFQPVMGFKTRYGVGINPFANSRSQAP--
gi|311992934|re  GYKG-DNEMDAGIYYAPYVALTPLRGA-DPKNFQPVLGFKTRYGIGINPLADTAAQQPAG
gi|311993169|re  GYKG-ANEMDAGIYYAPYVALTPLRGS-DPKNFQPVMGFKTRYGIGINPFADSASQQP--
gi|308814537|re  GYKG-ANEMDAGIYYAPYVALTPLRGS-DPKNFQPVMGFKTRYGIGINPFADPAAQAP--
gi|282598976|re  VYKG-ESELDAGIFYAPYTPLEMYRGL-GEDSMNPRLAFKTRYGVVANPFYAQKADGT--
gi|290457632|sp  AYKG-ATALDAGIYFAPYVPLEMYRTQ-GETTFAPRMAFKTRYGIAANPFVQIPANQDPQ
gi|326804652|re  AYKG-ATALDAGIFFAPYVPLEMYRTQ-GETTFSPRMAFKTRYGICANPFVQIPANQDPQ
gi|282599333|re  AYKG-ATALDAGIFFAPYVPLEMYRTQ-GETTFSPRMAFKTRYGICANPFVQIPANQDPQ
gi|86372247|gb|  GYKG-SSAYDAGLYFAPYVPLELFRAT-DPTNFHPALGFKTRYAMAENPFVGNSTNIK--
gi|326783094|re  GYKG-SNAYDAGLFYCPYVPLQMYRAI-GQDTFQPRIGFKTRYGMVLNPFAKGLTALS--
gi|326783543|re  GYKG-TSPYDAGLFYCPYVPLQQVRAI-NPDTFQPKIGFKTRYGMVSNPFAQGLTQGS--
gi|58532919|ref  GYKG-TSPYDAGLFYCPYVPLQMVRSI-DPNTFQPKIGFKTRYGMVSNPFVTTNGLYN--
GOS_2144010      GYKG-TSPYDAGLFYCPYVPLQMVRSI-DPNNFQPKIGFKTRYGMVSNPFVTTNGAYN--
gi|326784102|re  GYKG-TSPYDAGLFYCPYVPLQMVRSI-DPNNFQPKIGFKTRYGMVSNPFVTTNGLYS--
                          ################   ####################            


                        610       620       630
                 =========+=========+=========+=
gi|30044131|ref  ----------------RPEFYGLLYVKLLQP
gi|294338215|em  --------------------FGVDLTGTILA
gi|312262644|gb  FAKITNGMPQHTNMFGKNAFFRRVLVAGV--
gi|34419596|ref  --KVGNGAPV-AASMGKNAYFRRVFVKGL--
gi|294661606|re  SDRITSGMIT-KEMFGKNAYFRKVYVKGL--
gi|311992934|re  NARIANGMPSIANSVGKNGYFRRVLVKGI--
gi|311993169|re  NARIQSGMPSIVNSVGKNAYFRRIWVKGL--
gi|308814537|re  TKRIQNGMPDIVNSLGLNGYFRRVYVKGI--
gi|282598976|re  -KPTGLGLGQ-----GENGYGRKLLIKGITA
gi|290457632|sp  VYVTEDGIAK-----DTNVYFRKGLIKNLY-
gi|326804652|re  VYVTADGIAQ-----DSNPYFRKGLIKSLF-
gi|282599333|re  VYVTADGIAQ-----DSNPYFRKGLIKGLF-
gi|86372247|gb|  --------------ANNNFYYRKAKVTNLL-
gi|326783094|re  -----NSDPQHSTNLNANAYYRRVRVANLM-
gi|326783543|re  -----GALTA-----NTNRYYRRVQVANLM-
gi|58532919|ref  GTPDGEALTP-----NANMYYRRVQVTNLM-
GOS_2144010      GTPDGETLTA-----NANMYYRRVQVTNLM-
gi|326784102|re  GTPDGETLTP-----STNMYYRRVQVTNLM

Protein Domains

PROTOCOL


InterProScan, default parameters at EBI


RESULTS ANALYSIS


When using InterProScan, we observed that the selected ORF encodes a protein from a family of major capsid Gp23 proteins from T4-like bacteriophages. The proteins of this family are approximately 500 residues long.

T4-like bacteriophages have several proteins that constitute the capsid: Gp20, Gp23 and Gp24. The Gp23 capsid protein forms the hexagonal capsid lattice.

For the Gp23 capsid protein to function properly, it needs the aid of two chaperones (the E. coli chaperone GroEL and the phage-encoded Gp23-specific chaperone) so that Gp23 acquires the proper structure to play its role as part of the virus capsid.

The capsid of T4-like bacteriophages also has two additional proteins called HOC and SOC, which are responsible for reinforcing the structure of the virus capsid by binding to the Gp23 proteins.

The conserved protein domain found by InterProScan has an E-value of 1e-24, which is below the 1e-4 threshold, so the domain hit is significant.

In the MSA, the program marks 4 conserved blocks instead of the single domain reported by InterProScan. Although the number of conserved regions found by the two programs is different, the length of the protein domain found by InterProScan is approximately the same as the combined length of the conserved blocks marked in the MSA, which suggests that they all belong to the same conserved protein domain.




RAW RESULTS

Sequence_1	6CD382DA10ED5973	137	HMMPfam	PF07068	Gp23	21	107	1e-24	T	23-Mar-2011	IPR010762	Major capsid Gp23	
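
The raw line above is tab-separated, so the figures used in the analysis can be extracted with a few lines of Python. This is only a sketch: the field names in FIELDS are our own labels for the columns, not official names, and the 1e-4 threshold is the one used in the analysis above.

```python
RAW = ("Sequence_1\t6CD382DA10ED5973\t137\tHMMPfam\tPF07068\tGp23\t21\t107\t"
       "1e-24\tT\t23-Mar-2011\tIPR010762\tMajor capsid Gp23\t")

# Column labels chosen for this example (assumption).
FIELDS = ["query", "checksum", "length", "method", "accession", "name",
          "start", "end", "evalue", "status", "date",
          "interpro_id", "interpro_name"]

def parse_hit(line):
    """Turn one raw result line into a dictionary with typed values."""
    hit = dict(zip(FIELDS, line.rstrip("\t\n").split("\t")))
    for key in ("length", "start", "end"):
        hit[key] = int(hit[key])
    hit["evalue"] = float(hit["evalue"])
    return hit

hit = parse_hit(RAW)
domain_length = hit["end"] - hit["start"] + 1
print(hit["accession"], hit["start"], hit["end"], domain_length)
# PF07068 21 107 87
print(hit["evalue"] < 1e-4)   # True
```

The Gp23 domain therefore covers residues 21 to 107 of the 137-residue translation, that is 87 residues.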

Phylogeny

PROTOCOL

a) Phylogeny.fr / PhyML method / out group: none

b) Phylogeny.fr / BioNJ method / out group: none



RESULTS ANALYSIS

As explained further below, we could not choose an outgroup for building the tree; as a result, both trees are unrooted and therefore unstable.

In both the PhyML and the BioNJ tree, the GOS_2144010 Genomic DNA sequence consistently groups with Synechococcus_phage_S-PM2_gi_58532919, Cyanophage_Syn19_gi_326783543 and Cyanophage_Syn1_gi_326784102. The two trees are therefore concordant in that the GOS sequence always appears on the same branch with the same sequences.

Nevertheless, because the trees are unrooted, and because the bootstrap value on the GOS branch of the PhyML tree is 0.87 while the value on the branch before it is 0, no conclusion can be drawn from the PhyML tree.

In the BioNJ tree, however, the bootstrap value is 0.8 and the value on the branch before it is 0.9. It is therefore possible to say that the branch containing the GOS_2144010 Genomic DNA sequence has a much better probability of belonging to a Synechococcus phage or another cyanophage.

Therefore the GOS_2144010 Genomic DNA sequence has a high probability of belonging to a Synechococcus phage or another cyanophage.
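
The reasoning about branch support can be sketched as a simple check over the two values read from each tree. This is only an illustration: the cutoff `MIN_SUPPORT = 0.7` and the function name `clade_is_supported` are assumptions made here, not values or methods stated in the analysis.

```python
MIN_SUPPORT = 0.7  # hypothetical cutoff, not taken from the analysis


def clade_is_supported(supports, min_support=MIN_SUPPORT):
    """Return True only if every branch leading to the clade meets the cutoff."""
    return all(value >= min_support for value in supports)


# Support on the GOS branch and on the branch before it, as read from the trees.
supports_by_method = {
    "PhyML": [0.87, 0.0],
    "BioNJ": [0.8, 0.9],
}

for method, supports in supports_by_method.items():
    print(method, clade_is_supported(supports))
```

Under this assumed cutoff the PhyML placement fails because of the unsupported parent branch, while the BioNJ placement passes, which matches the conclusion drawn above.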

When the sequences were submitted to Phylogeny.fr to draw the PhyML and BioNJ trees, the program renamed them using the GenBank database. As a result, the Prochlorococcus phage Syn1 sequence was renamed in both trees and appears as Cyanophage Syn1.

Apart from the placement of the GOS sequence, the two trees do not seem to be very similar to one another.

When selecting the ingroups and outgroups, we found that the ingroups were from the order Caudovirales, so the outgroups would have to come from another order.

However, although five outgroup sequences were required, only two sequences were available as outgroups; the remaining candidates did not have E-values significant enough to be considered.

When the phylogenetic tree was built, the outgroup sequences did not fall on the same branch and therefore could not be treated as outgroups. Furthermore, one of them was on the same branch as the GOS sequence.

Consequently, these sequences were treated as ingroup sequences instead. Because none of the remaining candidate sequences had significant E-values, no outgroups were left and the tree is unrooted.

We therefore decided to use only ingroups, all belonging to the viruses, which leaves the tree unrooted.


RAW RESULTS

a)PhyML method

                                                                                                   -------0.2------
 
       +----------------------------------------------------------------------Campylobacter_phage_CPt10_gi_294338215               [Virus]      
       |
       |               +---------------------Stenotrophomonas_phage_Smp14_gi_86372247                                              [Virus]      
       |
       |               |
       |               |           +-------Cyanophage_Syn19_gi_326783543                                                           [Virus]      
       |
       |               |         0 |
       |               |      +----+ 0
       |   0.93        |      |    |--Synechococcus_phage_S-PM2_gi_58532919                                                        [Virus]
 +-----+---------------+      |    |
 |     |               |      |    | +-Cyanophage_Syn1_gi_326784102                                                                [Virus]
 |     |               |      |    +-+
 |     |               |      |  0.87+GOS_2144010_Genomic_DNA
 |     |               | 0.79 |
 |     |               +------+                                            0 +--Salmonella_phage_Vi01_gi_326804652                 [Virus]
 |     |                      |                                              |
 |     |                      |                                  0.99        |-Shigella_phage_phiSboM-AG3_gi_282599333             [Virus]
 |     |                      |                            +-----------------+
 |     |                      |               1            |                 +------Serratia_phage_KSP90_gi_290457632              [Virus]
 |     |                      |    +-----------------------+
 |     |                      |0.86|                       |
 |     |                      +----+                       +------Deftia_phage_phiW-14_gi_282598976                                [Virus]
 |     |                           |
 |     |                           +-----Cyanophage_M4-259_gi_326783094                                                            [Virus]
 |     |
 |     |            +---------Aeromonas_phage_PX29_gi_312262644                                                                    [Virus]
 |     |    0.94    |
 |     +------------+     +---------Vibrio_phage_KVP40_gi_34419596                                                                 [Virus]
 |                  | 0.6 |
 |                  +-----+    +-Klebsiella_phage_KP15_gi_294661606                                                                [Virus]
 |                        |0.84|
 |                        +----+  +Enterobacteria_phage_CC31_gi_311993169                                                          [Virus]
 |                        0.95 |  |
 |                             +--+
 |                            0.77|+--Acinetobacter_phage_Acj61_gi_311992934                                                       [Virus]
 |                                ++
 |                                 +--Shigella_phage_SP18_gi_308814537                                                             [Virus]
 |
 +----------------------------------------------------------------------------------Rhodothermus_phage_RM378_gi_30044131           [Virus]


------------------------------------------------------------------------------------------------------------------------

b)BioNJ method

                                                                                                          ----0.1--
 
                     +-------Vibrio_phage_KVP40_gi_34419596                                                                        [Virus]
                0.9  |
            +--------+ +---------Aeromonas_phage_PX29_gi_312262644                                                                 [Virus]
            |  0.56  | |
            |        +-+      +--Klebsiella_phage_KP15_gi_294661606                                                                [Virus]
            |          | 0.97 |
            |          +------+
            |            0.89 |  +Enterobacteria_phage_CC31_gi_311993169                                                           [Virus]
            |                 +--+
            |               0.59 |+--Acinetobacter_phage_Acj61_gi_311992934                                                        [Virus]
            |                    ++
            |                     +--Shigella_phage_SP18_gi_308814537                                                              [Virus]
            |
            |           +--------------------Stenotrophomonas_phage_Smp14_gi_86372247                                              [Virus]
 +----------+           |
 |          |           |      +--------Cyanophage_M4-259_gi_326783094                                                             [Virus]
 |          |           | 0.78 |
 |          |     0.44  |+-----+   +-----Cyanophage_Syn19_gi_326783543                                                             [Virus]
 |          |   +-------+| 0.86|   |
 |          |   |       ||     +---+
 |          |   |       ||      0.9|  +--Synechococcus_phage_S-PM2_gi_58532919                                                     [Virus]
 |          |   |       ||         +--+
 |          |   |       ||         0.8| +-Cyanophage_Syn1_gi_326784102                                                             [Virus]
 |          |   |       ++            +-+
 |          |   |    0.34|              +GOS_2144010_Genomic_DNA
 |          |   |        |
 |          +---+        |                   +-----------Deftia_phage_phiW-14_gi_282598976                                         [Virus]
 |          0.34|        |      0.98         |
 |              |        +-------------------+           +-----Serratia_phage_KSP90_gi_290457632                                   [Virus]
 |              |                            |     1     |
 |              |                            +-----------+  +-Salmonella_phage_Vi01_gi_326804652                                   [Virus]
 |              |                                    0.8 |  |
 |              |                                        +--+
 |              |                                           +--Shigella_phage_phiSboM-AG3_gi_282599333                             [Virus]
 |              |
 |              +-----------------------------------------------------------------Campylobacter_phage_CPt10_gi_294338215           [Virus]
 |
 +--------------------------------------------------------------------------------Rhodothermus_phage_RM378_gi_30044131             [Virus]

Taxonomy report

PROTOCOL

BLASTp versus NR, NCBI default parameters apart from "Number of descriptions = 1000"


RESULTS ANALYSIS


By analysing the lineage report and the organism report, we concluded that the selected ORF belongs to the superkingdom Viruses and to the order Caudovirales.

Because the E-values and scores of the best hits are so similar, we can say that our ORF belongs to the order Caudovirales, but we cannot say anything about the other taxonomic levels. Below the order, the E-values and scores obtained do not discriminate well enough to exclude other biological sources for our DNA sequence.



ref|YP_004324495.1| precursor of major head subunit [Prochlorococcus phage Syn1]

ref|YP_195142.1| precursor of major head subunit [Synechococcus phage S-PM2]

gb|ABC95191.1| GP23-major capsid protein [Stenotrophomonas phage Smp14]

ref|YP_003358893.1| gp23 major head protein [Deftia phage phiW-14]

sp|P85989.2|CAPSD_BPSK9 RecName: Full=Major capsid protein; AltName: Full=Virion protein D; Flags: Precursor

ref|YP_004327523.1| Gp23-Major capsid protein [Salmonella phage Vi01]

ref|YP_003358645.1| Gp23 major head protein [Shigella phage phiSboM-AG3]

ref|NP_899609.1| gp23 [Vibrio phage KVP40]

gb|ADQ52939.1| gp23 major head protein [Aeromonas phage PX29]

ref|YP_004010035.1| gp23 major head protein [Enterobacteria phage CC31]

ref|YP_003934811.1| major head protein [Shigella phage SP18]

ref|YP_003580059.1| gp23 major capsid protein [Klebsiella phage KP15]

ref|YP_004009801.1| gp23 major head protein [Acinetobacter phage Acj61]

emb|CBJ94252.1| Possible phage major capsid protein [Campylobacter phage CPt10]

ref|NP_835728.1| similar to major head protein [Rhodothermus phage RM378]

ref|YP_004323954.1| precursor of major head subunit [Synechococcus phage Syn19]

ref|YP_004323491.1| precursor of major head subunit [Prochlorococcus phage P-HM2]


RAW RESULTS

Lineage Report

root
. Viruses                   [viruses]
. . Caudovirales              [viruses]
. . . unclassified Caudovirales [viruses]
. . . . Prochlorococcus phage Syn1 -------------  275  1 hit  [viruses]           precursor of major head subunit [Prochlorococcus phage Syn1]
. . . . Prochlorococcus phage Syn33 ............  196  1 hit  [viruses]           precursor of major head subunit [Prochlorococcus phage Syn3
. . . . Enterobacteria phage IP008 .............   50  1 hit  [viruses]           major capsid protein [Enterobacteria phage IP008]
. . . Synechococcus phage S-PM2 ----------------  270  3 hits [viruses]           precursor of major head subunit [Synechococcus phage S-PM2]
. . . Synechococcus phage S-RSM4 ...............  228  2 hits [viruses]           major capsid protein [Synechococcus phage S-RSM4] >gi|25570
. . . Prochlorococcus phage P-SSM4 .............  194  2 hits [viruses]           precursor of major head subunit [Prochlorococcus phage P-SS
. . . Synechococcus phage S-ShM2 ...............  190  1 hit  [viruses]           precursor of major head subunit [Synechococcus phage S-ShM2]
. . . Prochlorococcus phage P-RSM4 .............  189  1 hit  [viruses]           precursor of major head subunit [Prochlorococcus phage P-RS
. . . Synechococcus phage syn9 .................  189  2 hits [viruses]           major capsid [Synechococcus phage syn9] >gi|76574538|gb|ABA
. . . Prochlorococcus phage P-SSM7 .............  189  1 hit  [viruses]           precursor of major head subunit [Prochlorococcus phage P-SS
. . . Synechococcus phage S-SM1 ................  188  1 hit  [viruses]           precursor of major head subunit [Synechococcus phage S-SM1]
. . . Synechococcus phage S-SSM5 ...............  185  1 hit  [viruses]           precursor of major head subunit [Synechococcus phage S-SSM5]
. . . Prochlorococcus phage P-HM1 ..............  181  1 hit  [viruses]           precursor of major head subunit [Prochlorococcus phage P-HM
. . . Synechococcus phage S-SSM7 ...............  163  1 hit  [viruses]           precursor of major head subunit [Synechococcus phage S-SSM7]
. . . Synechococcus phage S-SM2 ................  159  1 hit  [viruses]           precursor of major head subunit [Synechococcus phage S-SM2]
. . . Prochlorococcus phage P-SSM2 .............  155  3 hits [viruses]           major capsid [Prochlorococcus phage P-SSM2]
. . . uncultured Myoviridae ....................  131 20 hits [viruses]           gp23 major capsid protein [uncultured Myoviridae]
. . . Stenotrophomonas phage Smp14 .............  115  1 hit  [viruses]           GP23-major capsid protein [Stenotrophomonas phage Smp14]
. . . Vibrio phage KVP40 .......................   98  3 hits [viruses]           gp23 [Vibrio phage KVP40] >gi|34333277|gb|AAQ64432.1| gp23 
. . . Vibrio phage KVP20 .......................   98  1 hit  [viruses]           major capsid protein precursor [Bacteriophage KVP20]
. . . Aeromonas phage PX29 .....................   97  1 hit  [viruses]           gp23 major head protein [Aeromonas phage PX29]
. . . Serratia phage KSP90 .....................   96  2 hits [viruses]           RecName: Full=Major capsid protein; AltName: Full=Virion pr
. . . Aeromonas phage 25 .......................   95  4 hits [viruses]           gp23 precursor of major head subunit [Aeromonas phage 25] >
. . . Aeromonas phage Aeh1 .....................   95  2 hits [viruses]           gp23 major head protein [Aeromonas phage Aeh1] >gi|33414960
. . . Aeromonas phage phiAS4 ...................   94  4 hits [viruses]           precursor of major head subunit [Aeromonas phage phiAS4] >g
. . . Aeromonas phage phiAS5 ...................   94  2 hits [viruses]           major head protein [Aeromonas phage phiAS5] >gi|306021328|g
. . . Acinetobacter phage Acj9 .................   93  4 hits [viruses]           gp23 major head subunit precursor [Acinetobacter phage Acj9
. . . Enterobacteria phage IME08 ...............   92  4 hits [viruses]           gp23 major head protein [Enterobacteria phage IME08] >gi|29
. . . Enterobacteria phage JS98 ................   92  4 hits [viruses]           gp23 major head protein [Enterobacteria phage JS98] >gi|521
. . . Aeromonas phage 31 .......................   91  4 hits [viruses]           gp23 [Aeromonas phage 31] >gi|62114800|gb|AAX63648.1| gp23 
. . . Shigella phage phiSboM-AG3 ...............   91  2 hits [viruses]           Gp23 major head protein [Shigella phage phiSboM-AG3] >gi|22
. . . Shigella phage SP18 ......................   91  2 hits [viruses]           major head protein [Shigella phage SP18] >gi|308206129|gb|A
. . . Enterobacteria phage vB_EcoM-VR7 .........   91  2 hits [viruses]           major capsid protein [Enterobacteria phage vB_EcoM-VR7] >gi
. . . Aeromonas phage 65 .......................   91  4 hits [viruses]           gp23 major head protein [Aeromonas phage 65] >gi|312262834|
. . . Salmonella phage Vi01 ....................   91  1 hit  [viruses]           Gp23-Major capsid protein [Salmonella phage Vi01]
. . . Aeromonas phage 44RR2.8t .................   90  4 hits [viruses]           gp23 [Aeromonas phage 44RR2.8t] >gi|34850740|gb|AAF61693.2|
. . . Enterobacteria phage CC31 ................   90  2 hits [viruses]           gp23 major head protein [Enterobacteria phage CC31] >gi|284
. . . Enterobacteria phage JS10 ................   90  2 hits [viruses]           gp23 major head protein [Enterobacteria phage JS10] >gi|220
. . . Deftia phage phiW-14 .....................   89  2 hits [viruses]           gp23 major head protein [Deftia phage phiW-14] >gi|25721904
. . . Enterobacteria phage RB69 ................   89  3 hits [viruses]           major capsid protein [Enterobacteria phage RB69] >gi|323504
. . . Acinetobacter phage Acj61 ................   88  4 hits [viruses]           gp23 major head protein [Acinetobacter phage Acj61] >gi|295
. . . Acinetobacter phage Ac42 .................   88  4 hits [viruses]           gp23 precursor of major head subunit [Acinetobacter phage A
. . . Enterobacteria phage SV14 ................   87  1 hit  [viruses]           major capsid protein [Enterobacteria phage SV14]
. . . Acinetobacter phage 133 ..................   86  4 hits [viruses]           gp23 major head protein [Acinetobacter phage 133] >gi|29948
. . . Enterobacteria phage T4 ..................   86  5 hits [viruses]           gp23 major head protein [Enterobacteria phage T4] >gi|20141
. . . Enterobacteria phage RB43 ................   86  4 hits [viruses]           gp23 precursor of major head subunit [Enterobacteria phage 
. . . Enterobacteria phage RB16 ................   86  4 hits [viruses]           gp23 major capsid protein [Enterobacteria phage RB16] >gi|2
. . . Klebsiella phage KP15 ....................   86  4 hits [viruses]           gp23 major capsid protein [Klebsiella phage KP15] >gi|29266
. . . Enterobacteria phage RB14 ................   85  2 hits [viruses]           gp23 precursor of major head subunit [Enterobacteria phage 
. . . Enterobacteria phage T4T .................   85  1 hit  [viruses]           major capsid protein (g23) [Enterobacteria phage T4] >gi|29
. . . Enterobacteria phage AR1 .................   85  3 hits [viruses]           RecName: Full=Major capsid protein; AltName: Full=Protein G
. . . Enterobacteria phage RB32 ................   85  2 hits [viruses]           major head subunit precursor [Enterobacteria phage RB32] >g
. . . Enterobacteria phage RB51 ................   85  2 hits [viruses]           gp23 precursor of major head subunit [Enterobacteria phage 
. . . Enterobacteria phage T6 ..................   84  3 hits [viruses]           RecName: Full=Major capsid protein; AltName: Full=Protein G
. . . Enterobacteria phage JSE .................   84  4 hits [viruses]           major capsid protein [Enterobacteria phage JSE] >gi|2200291
. . . Enterobacteria phage Phi1 ................   84  4 hits [viruses]           major capsid protein [Enterobacteria phage Phi1] >gi|149380
. . . Enterobacteria phage RB49 ................   84  4 hits [viruses]           major capsid protein [Enterobacteria phage RB49] >gi|333481
. . . Enterobacteria phage MV13 ................   83  1 hit  [viruses]           major capsid protein [Enterobacteria phage MV13]
. . . Enterobacteria phage MV9 .................   80  1 hit  [viruses]           major capsid protein [Enterobacteria phage MV9] >gi|9593275
. . . Enterobacteria phage MV12 ................   80  1 hit  [viruses]           major capsid protein [Enterobacteria phage MV9] >gi|9593275
. . . Campylobacter phage CPt10 ................   62  1 hit  [viruses]           Possible phage major capsid protein [Campylobacter phage CP
. . . Campylobacter phage CP220 ................   62  1 hit  [viruses]           Possible phage major capsid protein [Campylobacter phage CP
. . . Enterobacteria phage T2L .................   51  1 hit  [viruses]           major head protein [Enterobacteria phage T2L] >gi|270272127
. . . Enterobacteria phage RB1 .................   51  1 hit  [viruses]           major head protein [Enterobacteria phage T2L] >gi|270272127
. . . Enterobacteria phage KEP10 ...............   48  1 hit  [viruses]           major capsid protein [Enterobacteria phage KEP10]
. . . Enterobacteria phage T2 ..................   47  1 hit  [viruses]           major capsid protein [Enterobacteria phage T2]
. . . Rhodothermus phage RM378 .................   41  1 hit  [viruses]           similar to major head protein [Rhodothermus phage RM378]
. . . Enterobacteria phage RB2 .................   40  1 hit  [viruses]           major head protein [Enterobacteria phage RB2] >gi|270272132
. . . Enterobacteria phage RB3 .................   40  1 hit  [viruses]           major head protein [Enterobacteria phage RB2] >gi|270272132
. . . Enterobacteria phage RB6 .................   40  1 hit  [viruses]           major head protein [Enterobacteria phage RB2] >gi|270272132
. . . Enterobacteria phage RB8 .................   40  1 hit  [viruses]           major head protein [Enterobacteria phage RB2] >gi|270272132
. . . Enterobacteria phage RB9 .................   40  1 hit  [viruses]           major head protein [Enterobacteria phage RB2] >gi|270272132
. . . Enterobacteria phage RB10 ................   40  1 hit  [viruses]           major head protein [Enterobacteria phage RB2] >gi|270272132
. . . Enterobacteria phage RB15 ................   40  1 hit  [viruses]           major head protein [Enterobacteria phage RB2] >gi|270272132
. . Synechococcus phage Syn19 ------------------  192  1 hit  [viruses]           precursor of major head subunit [Synechococcus phage Syn19]
. . Prochlorococcus phage P-HM2 ................  182  1 hit  [viruses]           precursor of major head subunit [Prochlorococcus phage P-HM
. . Enterobacteria phage RB55 ..................   41  1 hit  [viruses]           major head protein [Enterobacteria phage T6] >gi|270272158|
. . Enterobacteria phage RB59 ..................   41  1 hit  [viruses]           major head protein [Enterobacteria phage T6] >gi|270272158|
. . Enterobacteria phage RB4 ...................   40  1 hit  [viruses]           major head protein [Enterobacteria phage RB2] >gi|270272132
. . Enterobacteria phage RB7 ...................   40  1 hit  [viruses]           major head protein [Enterobacteria phage RB2] >gi|270272132
. Vitis vinifera (wine grape) ------------------   34  2 hits [eudicots]          unnamed protein product [Vitis vinifera]
. Labrenzia aggregata IAM 12614 ................   33  2 hits [a-proteobacteria]  hypothetical protein SIAM614_31391 [Stappia aggregata IAM 1
. Pediculus humanus corporis (human body lice) .   33  2 hits [lice]              leucine-rich transmembrane protein, putative [Pediculus hum
. Physcomitrella patens subsp. patens ..........   33  2 hits [mosses]            predicted protein [Physcomitrella patens subsp. patens] >gi
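
To compare the scores in the lineage report side by side, the lines can be parsed into (organism, score, hit count, taxonomic group) tuples. This is a minimal sketch, assuming each scored line follows the layout shown above (dots for depth, organism name, filler of dashes or dots, score, hit count, group in brackets); the regular expression and the function name `parse_lineage_report` are illustrative, and the excerpt is a shortened copy of four lines from the report.

```python
import re

# Depth dots, organism name, filler, score, hit count, [group].
LINE_RE = re.compile(
    r"^(?:\. )+(?P<name>.+?) [-.]+\s+(?P<score>\d+)\s+(?P<hits>\d+) hits?\s+"
    r"\[(?P<group>[^\]]+)\]"
)


def parse_lineage_report(text):
    """Return (organism, score, hit count, group) for every scored line."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            rows.append((match["name"], int(match["score"]),
                         int(match["hits"]), match["group"]))
    return rows


excerpt = """\
. . . . Prochlorococcus phage Syn1 -------------  275  1 hit  [viruses]
. . . Synechococcus phage S-PM2 ----------------  270  3 hits [viruses]
. . . uncultured Myoviridae ....................  131 20 hits [viruses]
. Vitis vinifera (wine grape) ------------------   34  2 hits [eudicots]
"""

rows = sorted(parse_lineage_report(excerpt), key=lambda r: r[1], reverse=True)
for name, score, hits, group in rows:
    print(f"{score:>4}  {group:<10} {name}")
```

Sorting by score makes it easy to see how close the two best viral hits are to each other and how far the non-viral hits fall behind.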

BLAST

PROTOCOL


1)BLASTp versus SWISSPROT, default NCBI parameters

2)BLASTp versus NR, default NCBI parameters + "1000 Max target sequences"


RESULTS ANALYSIS


With BLASTp versus SWISSPROT, the best hit had an E-value of 4e-20 and a score of 96.3 bits. The minimum criteria are an E-value below 1e-4 and a score of at least 100 bits. Because the score falls just short of 100 bits, we concluded that the SWISSPROT search gave no biologically significant result for the selected ORF.

With BLASTp versus NR, a biologically significant result was found, with an E-value of 1e-72 and a score of 275 bits. By the same criteria, we can therefore conclude that the protein encoded by the selected ORF is homologous to sequences found in the NR database.

The homologs found in the NR database are annotated either as precursor of major head subunit or as major capsid protein gp23. This tells us that the biological source of the protein encoded by the selected ORF is probably a virus, because the protein is part of a viral capsid.
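
The two criteria applied above (E-value below 1e-4 and score of at least 100 bits) can be written as a single check. This is a minimal sketch using only the best-hit values quoted in the analysis; the names `MAX_EVALUE`, `MIN_BITS` and `is_significant` are hypothetical.

```python
MAX_EVALUE = 1e-4  # E-value must be below this
MIN_BITS = 100.0   # bit score must be at least this


def is_significant(evalue, bits):
    """Apply both criteria; a hit must satisfy each of them."""
    return evalue < MAX_EVALUE and bits >= MIN_BITS


# Best hit per database as (E-value, score in bits).
best_hits = {
    "SWISSPROT": (4e-20, 96.3),
    "NR": (1e-72, 275.0),
}

for database, (evalue, bits) in best_hits.items():
    print(database, is_significant(evalue, bits))
```

The SWISSPROT hit passes the E-value criterion but fails on the bit score, which is why only the NR result is treated as significant.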

RAW RESULTS 

1)BLASTp versus SWISSPROT
                                                                   Score     E
Sequences producing significant alignments:                       (Bits)  Value

sp|P85989.2|CAPSD_BPSK9  RecName: Full=Major capsid protein; A...  96.3    4e-20
sp|P04535.2|COAT_BPT4  RecName: Full=Major capsid protein; Alt...  86.7    3e-17
sp|Q9ZXI0.1|COAT_BPAR1  RecName: Full=Major capsid protein; Al...  85.9    5e-17
sp|Q38055.1|COAT_BPT6  RecName: Full=Major capsid protein; Alt...  84.7    1e-16
sp|P19896.2|VG24_BPT4  RecName: Full=Head vertex protein Gp24      31.6    1.3  
sp|Q0KEP6.1|LOLB_RALEH  RecName: Full=Outer-membrane lipoprote...  30.0    3.5  
sp|Q9YN02.3|RPOA_PRRS1  RecName: Full=Replicase polyprotein 1a...  29.6    4.5  
sp|P15575.1|B3AT_CHICK  RecName: Full=Band 3 anion transport p...  29.6    5.1  
sp|B2AGU0.1|LOLB_CUPTR  RecName: Full=Outer-membrane lipoprote...  29.6    5.3  
sp|P03728.1|VHTJ_BPT7  RecName: Full=Head-to-tail joining protein  29.3    5.6  
sp|Q9WJB2.2|RPOA_PRRSR  RecName: Full=Replicase polyprotein 1a...  29.3    5.7  
sp|Q8B912.3|RPOA_PRRSB  RecName: Full=Replicase polyprotein 1a...  29.3    6.1  
sp|Q04561.3|RPOA_PRRSL  RecName: Full=Replicase polyprotein 1a...  29.3    7.1  
sp|A7TKW8.1|GET1_VANPO  RecName: Full=Golgi to ER traffic prot...  29.3    7.1  

ALIGNMENTS
>sp|P85989.2|CAPSD_BPSK9 RecName: Full=Major capsid protein; AltName: Full=Virion protein 
D; Flags: Precursor
Length=441

 Score = 96.3 bits (238),  Expect = 4e-20, Method: Compositional matrix adjust.
 Identities = 54/140 (39%), Positives = 78/140 (56%), Gaps = 13/140 (9%)

Query  1    MAGVLDYSSGLTGAGGPAIGTVDDTGNLAVGTINGRIKVYVDPYAANLSDKHYYVVGYKG  60
            MAG+LDYS  L      A   VD TG    G ++  ++VY+DPYA       Y  + YKG
Sbjct  310  MAGMLDYSPALNVQAQLA---VDPTGQTFAGVLSNGMRVYIDPYAV----AEYITLAYKG  362

Query  61   TSPYDAGLFYCPYVPLQMVRSIDPNNFQPKIGFKTRYGMVSNPFVTTNGAYNG----TPD  116
             +  DAG+++ PYVPL+M R+     F P++ FKTRYG+ +NPFV      +     T D
Sbjct  363  ATALDAGIYFAPYVPLEMYRTQGETTFAPRMAFKTRYGIAANPFVQIPANQDPQVYVTED  422

Query  117  GETLTANANMYYRRVQVTNL  136
            G  +  + N+Y+R+  + NL
Sbjct  423  G--IAKDTNVYFRKGLIKNL  440


>sp|P04535.2|COAT_BPT4 RecName: Full=Major capsid protein; AltName: Full=Protein Gp23
Length=521

 Score = 86.7 bits (213),  Expect = 3e-17, Method: Compositional matrix adjust.
 Identities = 52/130 (40%), Positives = 72/130 (55%), Gaps = 12/130 (9%)

Query  14   AGGPAIG-TVDDTGNLAVGTINGRIKVYVDPYAANLSDKHYYVVGYKGTSPYDAGLFYCP  72
            A G A G + D T ++  G + G+ +VY+D YA     + Y+ VGYKG +  DAG++Y P
Sbjct  397  AQGLATGFSTDTTKSVFAGVLGGKYRVYIDQYA----KQDYFTVGYKGPNEMDAGIYYAP  452

Query  73   YVPLQMVRSIDPNNFQPKIGFKTRYGMVSNPFVTTNGA------YNGTPDGETLTANANM  126
            YV L  +R  DP NFQP +GFKTRYG+  NPF  +          +G P     +   N 
Sbjct  453  YVALTPLRGSDPKNFQPVMGFKTRYGIGINPFAESAAQAPASRIQSGMPSILN-SLGKNA  511

Query  127  YYRRVQVTNL  136
            Y+RRV V  +
Sbjct  512  YFRRVYVKGI  521


>sp|Q9ZXI0.1|COAT_BPAR1 RecName: Full=Major capsid protein; AltName: Full=Protein Gp23
Length=521

 Score = 85.9 bits (211),  Expect = 5e-17, Method: Compositional matrix adjust.
 Identities = 48/120 (40%), Positives = 67/120 (56%), Gaps = 11/120 (9%)

Query  23   DDTGNLAVGTINGRIKVYVDPYAANLSDKHYYVVGYKGTSPYDAGLFYCPYVPLQMVRSI  82
            D T ++  G + G+ +VY+D YA     + Y+ VGYKG +  DAG++Y PYV L  +R  
Sbjct  407  DTTKSVFAGVLGGKYRVYIDQYA----KQDYFTVGYKGPNEMDAGIYYAPYVALTPLRGS  462

Query  83   DPNNFQPKIGFKTRYGMVSNPFVTTNGA------YNGTPDGETLTANANMYYRRVQVTNL  136
            DP NFQP +GFKTRYG+  NPF  +          +G P     +   N Y+RRV V  +
Sbjct  463  DPKNFQPVMGFKTRYGIGINPFAESAAQAPASRIQSGMPSILN-SLGKNAYFRRVYVKGI  521


>sp|Q38055.1|COAT_BPT6 RecName: Full=Major capsid protein; AltName: Full=Protein Gp23
Length=502

 Score = 84.7 bits (208),  Expect = 1e-16, Method: Compositional matrix adjust.
 Identities = 40/85 (47%), Positives = 55/85 (65%), Gaps = 4/85 (5%)

Query  23   DDTGNLAVGTINGRIKVYVDPYAANLSDKHYYVVGYKGTSPYDAGLFYCPYVPLQMVRSI  82
            D T ++  G + G+ +VY+D YA     + Y+ VGYKG +  DAG++Y PYV L  +R  
Sbjct  407  DTTKSVFAGVLGGKYRVYIDQYA----KQDYFTVGYKGPNEMDAGIYYAPYVALTPLRGS  462

Query  83   DPNNFQPKIGFKTRYGMVSNPFVTT  107
            DP NFQP +GFKTRYG+  NPF  +
Sbjct  463  DPKNFQPVMGFKTRYGIGINPFAES  487


>sp|P19896.2|VG24_BPT4 RecName: Full=Head vertex protein Gp24
Length=427

 Score = 31.6 bits (70),  Expect = 1.3, Method: Composition-based stats.
 Identities = 10/23 (43%), Positives = 15/23 (65%), Gaps = 0/23 (0%)

Query  82   IDPNNFQPKIGFKTRYGMVSNPF  104
            +DP + QP IG   RY + +NP+
Sbjct  364  VDPESLQPSIGLLVRYALSANPY  386


>sp|Q0KEP6.1|LOLB_RALEH RecName: Full=Outer-membrane lipoprotein lolB; Flags: Precursor
Length=197

 Score = 30.0 bits (66),  Expect = 3.5, Method: Compositional matrix adjust.
 Identities = 23/76 (30%), Positives = 31/76 (41%), Gaps = 1/76 (1%)

Query  1    MAGVLDYSSGLTGAGGPAIGTVDDTGNLAVGTINGRIKVYVDPY-AANLSDKHYYVVGYK  59
            +AG+ D+  G    G PA  T D+ G LA    NG    YV    AA    +   +    
Sbjct  122  VAGMRDWLHGRATQGAPARTTRDEQGRLATLAQNGWTVRYVAWQDAAAQVPRRIDLARDA  181

Query  60   GTSPYDAGLFYCPYVP  75
            G++P    L   P  P
Sbjct  182  GSNPLSVRLVIDPRTP  197


>sp|Q9YN02.3|RPOA_PRRS1 RecName: Full=Replicase polyprotein 1ab; AltName: Full=ORF1ab 
polyprotein; Contains: RecName: Full=Nsp1-alpha papain-like 
cysteine proteinase; AltName: Full=PCP1-alpha; Contains: RecName: 
Full=Nsp1-beta papain-like cysteine proteinase; AltName: 
Full=PCP1-beta; Contains: RecName: Full=Nsp2 cysteine proteinase; 
AltName: Full=CP2; Short=CP; Contains: RecName: Full=Non-structural 
protein 3; Short=Nsp3; Contains: RecName: 
Full=3C-like serine proteinase; Short=3CLSP; AltName: Full=Nsp4; 
Contains: RecName: Full=Non-structural protein 5-6-7; Short=Nsp5-6-7; 
Contains: RecName: Full=Non-structural protein 
8; Short=Nsp8; Contains: RecName: Full=RNA-directed RNA polymerase; 
Short=Pol; Short=RdRp; AltName: Full=Nsp9; Contains: 
RecName: Full=Helicase; Short=Hel; AltName: Full=Nsp10; Contains: 
RecName: Full=Non-structural protein 11; Short=Nsp11; 
Contains: RecName: Full=Non-structural protein 12; Short=Nsp12
Length=3961

 Score = 29.6 bits (65),  Expect = 4.5, Method: Composition-based stats.
 Identities = 21/65 (32%), Positives = 27/65 (42%), Gaps = 9/65 (14%)

Query  59    KGTSPYDAGLFYCPYVPLQMV--------RSIDPNNFQPKIGF-KTRYGMVSNPFVTTNG  109
             KGTSP D  L   PY P + V          +DP  +Q + G    R G+  N     +G
Sbjct  3203  KGTSPLDEVLEQVPYKPPRTVIMHVEQGLTPLDPGRYQTRRGLVSVRRGIRGNEVELPDG  3262

Query  110   AYNGT  114
              Y  T
Sbjct  3263  DYAST  3267


>sp|P15575.1|B3AT_CHICK RecName: Full=Band 3 anion transport protein; AltName: Full=Solute 
carrier family 4 member 1
Length=922

 Score = 29.6 bits (65),  Expect = 5.1, Method: Composition-based stats.
 Identities = 11/27 (41%), Positives = 17/27 (63%), Gaps = 0/27 (0%)

Query  80   RSIDPNNFQPKIGFKTRYGMVSNPFVT  106
            RS+DP  ++   G +T  G +SNP V+
Sbjct  14   RSLDPEGYEDTKGSRTSLGTMSNPLVS  40


>sp|B2AGU0.1|LOLB_CUPTR RecName: Full=Outer-membrane lipoprotein lolB; Flags: Precursor
Length=203

 Score = 29.6 bits (65),  Expect = 5.3, Method: Compositional matrix adjust.
 Identities = 16/41 (39%), Positives = 20/41 (49%), Gaps = 0/41 (0%)

Query  1    MAGVLDYSSGLTGAGGPAIGTVDDTGNLAVGTINGRIKVYV  41
            +AG+ D+  G    G PA  T D+ G LA    NG    YV
Sbjct  122  VAGMRDWLHGRATQGSPARATRDEQGRLATLAQNGWTVRYV  162


>sp|P03728.1|VHTJ_BPT7 RecName: Full=Head-to-tail joining protein
Length=536

 Score = 29.3 bits (64),  Expect = 5.6, Method: Composition-based stats.
 Identities = 13/26 (50%), Positives = 17/26 (65%), Gaps = 4/26 (15%)

Query  55   VVGYKGTSPYDAGLFYCPYVPLQMVR  80
            V G  GT P +A    CPY+P++MVR
Sbjct  237  VQGSDGTYPKEA----CPYIPIRMVR  258


>sp|Q9WJB2.2|RPOA_PRRSR RecName: Full=Replicase polyprotein 1ab; AltName: Full=ORF1ab 
polyprotein; Contains: RecName: Full=Nsp1-alpha papain-like 
cysteine proteinase; AltName: Full=PCP1-alpha; Contains: RecName: 
Full=Nsp1-beta papain-like cysteine proteinase; AltName: 
Full=PCP1-beta; Contains: RecName: Full=Nsp2 cysteine proteinase; 
AltName: Full=CP2; Short=CP; Contains: RecName: Full=Non-structural 
protein 3; Short=Nsp3; Contains: RecName: 
Full=3C-like serine proteinase; Short=3CLSP; AltName: Full=Nsp4; 
Contains: RecName: Full=Non-structural protein 5-6-7; Short=Nsp5-6-7; 
Contains: RecName: Full=Non-structural protein 
8; Short=Nsp8; Contains: RecName: Full=RNA-directed RNA polymerase; 
Short=Pol; Short=RdRp; AltName: Full=Nsp9; Contains: 
RecName: Full=Helicase; Short=Hel; AltName: Full=Nsp10; Contains: 
RecName: Full=Non-structural protein 11; Short=Nsp11; 
Contains: RecName: Full=Non-structural protein 12; Short=Nsp12
Length=3960

 Score = 29.3 bits (64),  Expect = 5.7, Method: Composition-based stats.
 Identities = 21/65 (32%), Positives = 27/65 (42%), Gaps = 9/65 (14%)

Query  59    KGTSPYDAGLFYCPYVPLQMV--------RSIDPNNFQPKIGF-KTRYGMVSNPFVTTNG  109
             KGTSP D  L   PY P + V          +DP  +Q + G    R G+  N     +G
Sbjct  3202  KGTSPLDEVLEQVPYKPPRTVIMHVEQGLTPLDPGRYQTRRGLVSVRRGIRGNEVGLPDG  3261

Query  110   AYNGT  114
              Y  T
Sbjct  3262  DYAST  3266


>sp|Q8B912.3|RPOA_PRRSB RecName: Full=Replicase polyprotein 1ab; AltName: Full=ORF1ab 
polyprotein; Contains: RecName: Full=Nsp1-alpha papain-like 
cysteine proteinase; AltName: Full=PCP1-alpha; Contains: RecName: 
Full=Nsp1-beta papain-like cysteine proteinase; AltName: 
Full=PCP1-beta; Contains: RecName: Full=Nsp2 cysteine proteinase; 
AltName: Full=CP2; Short=CP; Contains: RecName: Full=Non-structural 
protein 3; Short=Nsp3; Contains: RecName: 
Full=3C-like serine proteinase; Short=3CLSP; AltName: Full=Nsp4; 
Contains: RecName: Full=Non-structural protein 5-6-7; Short=Nsp5-6-7; 
Contains: RecName: Full=Non-structural protein 
8; Short=Nsp8; Contains: RecName: Full=RNA-directed RNA polymerase; 
Short=Pol; Short=RdRp; AltName: Full=Nsp9; Contains: 
RecName: Full=Helicase; Short=Hel; AltName: Full=Nsp10; Contains: 
RecName: Full=Non-structural protein 11; Short=Nsp11; 
Contains: RecName: Full=Non-structural protein 12; Short=Nsp12
Length=3961

 Score = 29.3 bits (64),  Expect = 6.1, Method: Composition-based stats.
 Identities = 21/65 (32%), Positives = 27/65 (42%), Gaps = 9/65 (14%)

Query  59    KGTSPYDAGLFYCPYVPLQMV--------RSIDPNNFQPKIGF-KTRYGMVSNPFVTTNG  109
             KGTSP D  L   PY P + V          +DP  +Q + G    R G+  N     +G
Sbjct  3203  KGTSPLDEVLEQVPYKPPRTVIMHVEQGLTPLDPGRYQTRRGLVSVRRGIRGNEVDLPDG  3262

Query  110   AYNGT  114
              Y  T
Sbjct  3263  DYAST  3267


2)BLASTp versus NR   
                                                                   Score    E
Sequences producing significant alignments:                       (Bits)  Value

gb|ADO99225.1|  precursor of major head subunit [Prochlorococc...   275    1e-72
ref|YP_195142.1|  precursor of major head subunit [Synechococc...   270    3e-71
gb|AAL09972.1|AF363675_7  major capsid protein gp23 [Synechoco...   259    7e-68
ref|YP_003097339.1|  major capsid protein [Synechococcus phage...   228    3e-58
gb|ADO99659.1|  precursor of major head subunit [Prochlorococc...   196    9e-49
ref|YP_214669.1|  precursor of major head subunit [Prochloroco...   194    4e-48
gb|ADO99436.1|  precursor of major head subunit [Synechococcus...   192    2e-47
gb|ADO97735.1|  precursor of major head subunit [Synechococcus...   190    5e-47
gb|ADO98523.1|  precursor of major head subunit [Prochlorococc...   189    7e-47
ref|YP_717802.1|  major capsid [Synechococcus phage syn9] >gb|...   189    1e-46
gb|ADO98979.1|  precursor of major head subunit [Prochlorococc...   189    1e-46
gb|ADO97217.1|  precursor of major head subunit [Synechococcus...   188    2e-46
gb|ADO97944.1|  precursor of major head subunit [Synechococcus...   185    1e-45
gb|ADO99900.1|  precursor of major head subunit [Prochlorococc...   182    2e-44
gb|ADO98744.1|  precursor of major head subunit [Prochlorococc...   181    4e-44
gb|ADO98214.1|  precursor of major head subunit [Synechococcus...   163    7e-39
gb|ADO97461.1|  precursor of major head subunit [Synechococcus...   159    1e-37
gb|ACY76014.1|  major capsid [Prochlorococcus phage P-SSM2]         155    2e-36
ref|YP_214367.1|  precursor of major head subunit [Prochloroco...   155    2e-36
gb|ABW90949.1|  gp23 major capsid protein [uncultured Myoviridae]   131    4e-29
gb|ABC95191.1|  GP23-major capsid protein [Stenotrophomonas ph...   115    2e-24
ref|NP_899609.1|  gp23 [Vibrio phage KVP40] >gb|AAQ64432.1| gp...  99.0    2e-19
dbj|BAA25567.1|  major capsid protein precursor [Vibrio phage ...  99.0    2e-19
dbj|BAA25880.1|  major capsid protein precursor [Bacteriophage...  98.2    3e-19
gb|ADQ52939.1|  gp23 major head protein [Aeromonas phage PX29]     97.1    7e-19
sp|P85989.2|CAPSD_BPSK9  RecName: Full=Major capsid protein; A...  96.3    1e-18
ref|YP_656387.1|  gp23 precursor of major head subunit [Aeromo...  95.9    2e-18
ref|NP_944113.1|  gp23 major head protein [Aeromonas phage Aeh...  95.9    2e-18
ref|YP_003969123.1|  precursor of major head subunit [Aeromona...  94.4    4e-18
ref|YP_003969308.1|  major head protein [Aeromonas phage phiAS...  94.4    4e-18
ref|YP_004010322.1|  gp23 major head subunit precursor [Acinet...  93.2    1e-17
ref|YP_003734314.1|  gp23 major head protein [Enterobacteria p...  92.4    2e-17
ref|YP_001595301.1|  gp23 major head protein [Enterobacteria p...  92.4    2e-17
ref|YP_238888.1|  gp23 [Aeromonas phage 31] >gb|AAX63648.1| gp...  91.7    3e-17
ref|YP_003358645.1|  Gp23 major head protein [Shigella phage p...  91.7    3e-17
ref|YP_003934811.1|  major head protein [Shigella phage SP18] ...  91.3    3e-17
ref|YP_004063870.1|  major capsid protein [Enterobacteria phag...  91.3    3e-17
ref|YP_004300919.1|  gp23 major head protein [Aeromonas phage ...  91.3    4e-17
emb|CBW38020.1|  Gp23-Major capsid protein [Salmonella phage V...  91.3    4e-17
ref|NP_932516.1|  gp23 [Aeromonas phage 44RR2.8t] >gb|AAF61693...  90.9    5e-17
ref|YP_004010035.1|  gp23 major head protein [Enterobacteria p...  90.5    6e-17
ref|YP_002922518.1|  gp23 major head protein [Enterobacteria p...  90.1    9e-17
ref|YP_003358893.1|  gp23 major head protein [Deftia phage phi...  89.7    1e-16
ref|NP_861877.1|  major capsid protein [Enterobacteria phage R...  89.4    1e-16
ref|YP_004009801.1|  gp23 major head protein [Acinetobacter ph...  89.0    2e-16
ref|YP_004009546.1|  gp23 precursor of major head subunit [Aci...  89.0    2e-16
emb|CAB01542.1|  major capsid protein [Enterobacteria phage SV14]  87.0    7e-16
gb|AAF61699.1|AF222002_1  major capsid protein [Enterobacteria...  87.0    8e-16
ref|YP_004300759.1|  gp23 major head protein [Acinetobacter ph...  86.7    8e-16
ref|NP_049787.1|  gp23 major head protein [Enterobacteria phag...  86.7    9e-16
ref|YP_239203.1|  gp23 precursor of major head subunit [Entero...  86.7    1e-15
ref|YP_003858514.1|  gp23 major capsid protein [Enterobacteria...  86.3    1e-15
ref|YP_003580059.1|  gp23 major capsid protein [Klebsiella pha...  86.3    1e-15
ref|YP_002854508.1|  gp23 precursor of major head subunit [Ent...  85.9    1e-15
gb|AAA32503.1|  major capsid protein (g23) [Enterobacteria pha...  85.9    1e-15
sp|Q9ZXI0.1|COAT_BPAR1  RecName: Full=Major capsid protein; Al...  85.9    2e-15
ref|YP_803115.1|  major head subunit precursor [Enterobacteria...  85.9    2e-15
ref|YP_002854130.1|  gp23 precursor of major head subunit [Ent...  85.9    2e-15
sp|Q38055.1|COAT_BPT6  RecName: Full=Major capsid protein; Alt...  84.7    3e-15
ref|YP_002922237.1|  major capsid protein [Enterobacteria phag...  84.7    3e-15
ref|YP_001469506.1|  major capsid protein [Enterobacteria phag...  84.7    3e-15
ref|NP_891732.1|  major capsid protein [Enterobacteria phage R...  84.7    3e-15
gb|ABF57699.1|  major capsid protein [Enterobacteria phage MV13]   83.2    1e-14
gb|ABF57697.1|  major capsid protein [Enterobacteria phage MV9...  80.1    9e-14
emb|CBJ94252.1|  Possible phage major capsid protein [Campylob...  62.4    2e-08
emb|CBJ93860.1|  Possible phage major capsid protein [Campylob...  62.4    2e-08
gb|ABW90953.1|  gp23 major capsid protein [uncultured Myoviridae]  54.7    4e-06
gb|ABW90951.1|  gp23 major capsid protein [uncultured Myoviridae]  54.3    5e-06
gb|ABW90943.1|  gp23 major capsid protein [uncultured Myoviridae]  54.3    5e-06
gb|ABW90957.1|  gp23 major capsid protein [uncultured Myoviridae]  54.3    6e-06
gb|ABW90955.1|  gp23 major capsid protein [uncultured Myoviridae]  53.9    7e-06
gb|ABW90948.1|  gp23 major capsid protein [uncultured Myoviridae]  52.4    2e-05
gb|ABW90939.1|  gp23 major capsid protein [uncultured Myoviridae]  52.4    2e-05
gb|ABW90956.1|  gp23 major capsid protein [uncultured Myoviridae]  52.4    2e-05
gb|ABW90938.1|  gp23 major capsid protein [uncultured Myovirid...  52.0    2e-05
gb|ABW90941.1|  gp23 major capsid protein [uncultured Myoviridae]  52.0    2e-05
gb|ABW90950.1|  gp23 major capsid protein [uncultured Myoviridae]  52.0    3e-05
gb|ACZ67493.1|  major head protein [Enterobacteria phage T2L] ...  51.6    3e-05
gb|ABW90940.1|  gp23 major capsid protein [uncultured Myoviridae]  51.6    3e-05
gb|ABW90947.1|  gp23 major capsid protein [uncultured Myoviridae]  51.6    4e-05
gb|ABW90942.1|  gp23 major capsid protein [uncultured Myoviridae]  51.6    4e-05
gb|ABW90944.1|  gp23 major capsid protein [uncultured Myoviridae]  50.8    5e-05
gb|ADD10654.1|  major capsid protein [Enterobacteria phage IP008]  50.1    1e-04
gb|ABW90954.1|  gp23 major capsid protein [uncultured Myoviridae]  49.3    2e-04
dbj|BAF95749.1|  major capsid protein [Enterobacteria phage KE...  48.9    2e-04
gb|ABW90952.1|  gp23 major capsid protein [uncultured Myoviridae]  48.9    2e-04
gb|ADD10656.1|  major capsid protein [Enterobacteria phage T2]     47.4    6e-04
ref|YP_003969120.1|  precursor of head vertex subunit [Aeromon...  43.1    0.011
ref|YP_656390.1|  gp24 precursor of head vertex subunit [Aerom...  43.1    0.011
ref|YP_003580062.1|  gp24 capsid vertex protein [Klebsiella ph...  42.0    0.026
ref|NP_835728.1|  similar to major head protein [Rhodothermus ...  41.6    0.036
ref|YP_004300761.1|  gp24 head vertex protein [Acinetobacter p...  41.2    0.045
gb|ACZ67496.1|  major head protein [Enterobacteria phage T6] >...  41.2    0.051
gb|ACZ67499.1|  major head protein [Enterobacteria phage RB2] ...  40.8    0.054
ref|NP_932519.1|  gp24 [Aeromonas phage 44RR2.8t] >ref|YP_2388...  40.8    0.058
ref|YP_239204.1|  gp24 precursor of head vertex subunit [Enter...  39.3    0.18 
ref|YP_004009547.1|  gp24 head vertex protein [Acinetobacter p...  39.3    0.20 
ref|NP_891734.1|  vertex head subunit [Enterobacteria phage RB...  38.9    0.20 
ref|YP_001469509.1|  capsid vertex protein [Enterobacteria pha...  38.9    0.21 
ref|YP_002922240.1|  vertex head subunit [Enterobacteria phage...  38.9    0.22 
ref|YP_003858516.1|  gp24 capsid vertex protein [Enterobacteri...  38.9    0.22 
ref|YP_004300918.1|  gp24 head vertex protein [Aeromonas phage...  36.6    1.3  
ref|YP_004010323.1|  gp24 head vertex protein [Acinetobacter p...  35.4    2.3  
ref|YP_004009802.1|  gp24 head vertex protein [Acinetobacter p...  35.4    2.4  
emb|CBI24268.3|  unnamed protein product [Vitis vinifera]          34.3    4.9  
ref|XP_002262814.1|  PREDICTED: hypothetical protein [Vitis vi...  34.3    4.9  
ref|YP_001595304.1|  gp24 head vertex protein [Enterobacteria ...  33.9    6.7  
ref|ZP_01549642.1|  hypothetical protein SIAM614_31391 [Stappi...  33.9    6.7  
ref|XP_002423671.1|  leucine-rich transmembrane protein, putat...  33.9    7.1  
ref|YP_003734318.1|  gp24 head vertex protein [Enterobacteria ...  33.9    7.7  
ref|XP_001778923.1|  predicted protein [Physcomitrella patens ...  33.5    8.3  
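
The hit summary above can be screened programmatically. The following is a minimal sketch, not part of the original annotation: it assumes each summary row ends with the bit score and the E-value as its last two whitespace-separated fields, and the function names and the 1e-5 cutoff are hypothetical choices made for illustration.

```python
# Sketch: parse rows of a BLAST text hit summary and keep those below an
# E-value cutoff. The cutoff (1e-5) is an illustrative assumption, not a
# value taken from the annotation.

def parse_hit_line(line):
    """Split one summary row into (identifier, description, bit_score, e_value)."""
    # The last two whitespace-separated fields are the bit score and E-value.
    head, score, evalue = line.rstrip().rsplit(None, 2)
    # The first token is the database identifier, e.g. "gb|ADO99225.1|".
    identifier, _, description = head.partition(" ")
    return identifier, description.strip(), float(score), float(evalue)


def significant_hits(lines, max_evalue=1e-5):
    """Return parsed hits whose E-value is at or below max_evalue."""
    hits = []
    for line in lines:
        if "|" not in line:
            continue  # skip headers and blank lines
        try:
            hit = parse_hit_line(line)
        except ValueError:
            continue  # row does not end in two numeric fields
        if hit[3] <= max_evalue:
            hits.append(hit)
    return hits


if __name__ == "__main__":
    rows = [
        "Sequences producing significant alignments:                       (Bits)  Value",
        "gb|ADO99225.1|  precursor of major head subunit [Prochlorococc...   275    1e-72",
        "gb|ACY76014.1|  major capsid [Prochlorococcus phage P-SSM2]         155    2e-36",
        "ref|YP_239204.1|  gp24 precursor of head vertex subunit [Enter...  39.3    0.18 ",
    ]
    for identifier, description, score, evalue in significant_hits(rows):
        print(identifier, score, evalue, description)
```

With this cutoff, the gp23 major capsid hits would be kept and the weaker gp24 head vertex hits would be dropped; a different threshold may be more appropriate depending on the database size.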

ALIGNMENTS
>gb|ADO99225.1| precursor of major head subunit [Prochlorococcus phage Syn1]
Length=468

 Score =  275 bits (704),  Expect = 1e-72, Method: Compositional matrix adjust.
 Identities = 132/137 (96%), Positives = 134/137 (98%), Gaps = 0/137 (0%)

Query  1    MAGVLDYSSGLTGAGGPAIGTVDDTGNLAVGTINGRIKVYVDPYAANLSDKHYYVVGYKG  60
            MAGVLDYSSGLTGAGGPAIGTVDDTGNLAVGTINGRIKVYVDPYAANLSDKHYYVVGYKG
Sbjct  332  MAGVLDYSSGLTGAGGPAIGTVDDTGNLAVGTINGRIKVYVDPYAANLSDKHYYVVGYKG  391

Query  61   TSPYDAGLFYCPYVPLQMVRSIDPNNFQPKIGFKTRYGMVSNPFVTTNGAYNGTPDGETL  120
            TSPYDAGLFYCPYVPLQMVRSIDPNNFQPKIGFKTRYGMVSNPFVTTNG Y+GTPDGETL
Sbjct  392  TSPYDAGLFYCPYVPLQMVRSIDPNNFQPKIGFKTRYGMVSNPFVTTNGLYSGTPDGETL  451

Query  121  TANANMYYRRVQVTNLM  137
            T + NMYYRRVQVTNLM
Sbjct  452  TPSTNMYYRRVQVTNLM  468
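
The "Identities" and "Gaps" figures reported for each alignment can be rechecked from the aligned strings themselves. The sketch below is an illustration only, with a hypothetical function name: it assumes the Query and Sbjct strings have been copied with their gap characters (`-`) and concatenated across blocks. It does not reproduce the "Positives" count, which depends on the substitution matrix used by BLAST.

```python
# Sketch: count identical positions and gap columns in a pairwise alignment.
# Both strings must have the same length and use "-" for gaps.

def alignment_stats(query, subject):
    """Return (identities, gap_columns, alignment_length) for two aligned strings."""
    if len(query) != len(subject):
        raise ValueError("aligned strings must have the same length")
    identities = 0
    gaps = 0
    for q, s in zip(query, subject):
        if q == "-" or s == "-":
            gaps += 1          # column containing a gap in either sequence
        elif q == s:
            identities += 1    # identical residues
    return identities, gaps, len(query)


if __name__ == "__main__":
    # Last block of the alignment against Prochlorococcus phage Syn1.
    query = "TANANMYYRRVQVTNLM"
    subject = "TPSTNMYYRRVQVTNLM"
    identities, gaps, length = alignment_stats(query, subject)
    print(f"Identities = {identities}/{length} "
          f"({100 * identities / length:.0f}%), Gaps = {gaps}/{length}")
```

Summing the per-block counts over all blocks of one alignment should match the totals in its header line.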


>ref|YP_195142.1| precursor of major head subunit [Synechococcus phage S-PM2]
 emb|CAF34172.1| precursor of major head subunit [Synechococcus phage S-PM2]
Length=468

 Score =  270 bits (691),  Expect = 3e-71, Method: Compositional matrix adjust.
 Identities = 128/137 (93%), Positives = 131/137 (96%), Gaps = 0/137 (0%)

Query  1    MAGVLDYSSGLTGAGGPAIGTVDDTGNLAVGTINGRIKVYVDPYAANLSDKHYYVVGYKG  60
            MAGVLDYSSGL GAGGP+IG VDDTGNLAVGTINGRIKV+VDPYAANLSDKHYYV+GYKG
Sbjct  332  MAGVLDYSSGLNGAGGPSIGEVDDTGNLAVGTINGRIKVFVDPYAANLSDKHYYVIGYKG  391

Query  61   TSPYDAGLFYCPYVPLQMVRSIDPNNFQPKIGFKTRYGMVSNPFVTTNGAYNGTPDGETL  120
            TSPYDAGLFYCPYVPLQMVRSIDPN FQPKIGFKTRYGMVSNPFVTTNG YNGTPDGE L
Sbjct  392  TSPYDAGLFYCPYVPLQMVRSIDPNTFQPKIGFKTRYGMVSNPFVTTNGLYNGTPDGEAL  451

Query  121  TANANMYYRRVQVTNLM  137
            T NANMYYRRVQVTNLM
Sbjct  452  TPNANMYYRRVQVTNLM  468


>gb|AAL09972.1|AF363675_7 major capsid protein gp23 [Synechococcus phage S-PM2]
Length=476

 Score =  259 bits (663),  Expect = 7e-68, Method: Compositional matrix adjust.
 Identities = 127/144 (88%), Positives = 130/144 (90%), Gaps = 7/144 (5%)

Query  1    MAGVLDYSSGLTGAGGPAIGTVDDTGNLAVGTINGRIKVYVDPYAANLSDKHYYVVGYKG  60
            MAGVLDYSSGL GAGGP+IG VDDTGNLAVGTINGRIKV+V PYAANLSDKHYYV+GYKG
Sbjct  333  MAGVLDYSSGLNGAGGPSIGEVDDTGNLAVGTINGRIKVFVVPYAANLSDKHYYVIGYKG  392

Query  61   TSPYDAGLFYCPYVPLQMVRSI-------DPNNFQPKIGFKTRYGMVSNPFVTTNGAYNG  113
            TSPYDAGLFYCPYVPLQMVRSI       DPN FQPKIGFKTRYGMVSNPFVTTNG YNG
Sbjct  393  TSPYDAGLFYCPYVPLQMVRSIDPEWFAYDPNTFQPKIGFKTRYGMVSNPFVTTNGLYNG  452

Query  114  TPDGETLTANANMYYRRVQVTNLM  137
            TPDGE LT NANMYYRRVQVTNLM
Sbjct  453  TPDGEALTPNANMYYRRVQVTNLM  476


>ref|YP_003097339.1| major capsid protein [Synechococcus phage S-RSM4]
 emb|CAR63302.1| major capsid protein [Synechococcus phage S-RSM4]
Length=465

 Score =  228 bits (580),  Expect = 3e-58, Method: Compositional matrix adjust.
 Identities = 108/137 (79%), Positives = 119/137 (87%), Gaps = 3/137 (2%)

Query  1    MAGVLDYSSGLTGAGGPAIGTVDDTGNLAVGTINGRIKVYVDPYAANLSDKHYYVVGYKG  60
            M+G LDYSSGL+GAGGP+IG VDDTGNL VGT+NGRIKVYVDPY+AN+S+ HYYVVGYKG
Sbjct  332  MSGTLDYSSGLSGAGGPSIGEVDDTGNLLVGTMNGRIKVYVDPYSANVSNNHYYVVGYKG  391

Query  61   TSPYDAGLFYCPYVPLQMVRSIDPNNFQPKIGFKTRYGMVSNPFVTTNGAYNGTPDGETL  120
            +SPYDAGLFYCPYVPLQM+RSIDP  FQPKIGFKTRYGMV+NPFV    A  GTPD E L
Sbjct  392  SSPYDAGLFYCPYVPLQMLRSIDPETFQPKIGFKTRYGMVANPFVE---ASAGTPDAEAL  448

Query  121  TANANMYYRRVQVTNLM  137
            TA+ N YYRRV V NLM
Sbjct  449  TASKNQYYRRVLVKNLM  465


>gb|ADO99659.1| precursor of major head subunit [Prochlorococcus phage Syn33]
Length=459

 Score =  196 bits (498),  Expect = 9e-49, Method: Compositional matrix adjust.
 Identities = 98/137 (72%), Positives = 109/137 (80%), Gaps = 7/137 (5%)

Query  1    MAGVLDYSSGLTGAGGPAIGTVDDTGNLAVGTINGRIKVYVDPYAANLSDKHYYVVGYKG  60
            MAGVLDY+  L G  G  +  VDDT +  VGT+NGRIKVYVDPY+AN+SDKH+YV GYKG
Sbjct  330  MAGVLDYTPALAGNNG--LAAVDDTSSTLVGTLNGRIKVYVDPYSANVSDKHFYVAGYKG  387

Query  61   TSPYDAGLFYCPYVPLQMVRSIDPNNFQPKIGFKTRYGMVSNPFVTTNGAYNGTPDGETL  120
            TSPYDAGLFYCPYVPLQ VR+I+PN FQPKIGFKTRYGMVSNPF  + G   G+     L
Sbjct  388  TSPYDAGLFYCPYVPLQQVRAINPNTFQPKIGFKTRYGMVSNPF--SGGLTQGSG---AL  442

Query  121  TANANMYYRRVQVTNLM  137
            TANAN YYRRVQV NLM
Sbjct  443  TANANKYYRRVQVANLM  459


>ref|YP_214669.1| precursor of major head subunit [Prochlorococcus phage P-SSM4]
 gb|AAX46909.1| precursor of major head subunit [Prochlorococcus phage P-SSM4]
Length=462

 Score =  194 bits (493),  Expect = 4e-48, Method: Compositional matrix adjust.
 Identities = 97/137 (71%), Positives = 110/137 (80%), Gaps = 7/137 (5%)

Query  1    MAGVLDYSSGLTGAGGPAIGTVDDTGNLAVGTINGRIKVYVDPYAANLSDKHYYVVGYKG  60
            MAGVLDY+ GL G    A+  VDDT +  VGT+NGRIKVYVDPY++N++DKH+YV GYKG
Sbjct  333  MAGVLDYAPGLQG--NSALTGVDDTSSTLVGTLNGRIKVYVDPYSSNVADKHFYVAGYKG  390

Query  61   TSPYDAGLFYCPYVPLQMVRSIDPNNFQPKIGFKTRYGMVSNPFVTTNGAYNGTPDGETL  120
            TSPYDAGLFYCPYVPLQ VR+I+PN FQPKIGFKTRYGMVSNPF  + G   G+     L
Sbjct  391  TSPYDAGLFYCPYVPLQQVRAINPNTFQPKIGFKTRYGMVSNPF--SGGLTQGSG---AL  445

Query  121  TANANMYYRRVQVTNLM  137
            TANAN YYRRVQV NLM
Sbjct  446  TANANKYYRRVQVANLM  462


>gb|ADO99436.1| precursor of major head subunit [Synechococcus phage Syn19]
Length=457

 Score =  192 bits (487),  Expect = 2e-47, Method: Compositional matrix adjust.
 Identities = 97/137 (71%), Positives = 106/137 (77%), Gaps = 7/137 (5%)

Query  1    MAGVLDYSSGLTGAGGPAIGTVDDTGNLAVGTINGRIKVYVDPYAANLSDKHYYVVGYKG  60
            MAGVLDYS  L G    A+  VDDT +  VGT+NGRIKVYVDPY+AN++DKHYYV GYKG
Sbjct  328  MAGVLDYSPALNGNN--ALTGVDDTSSTLVGTLNGRIKVYVDPYSANVADKHYYVAGYKG  385

Query  61   TSPYDAGLFYCPYVPLQMVRSIDPNNFQPKIGFKTRYGMVSNPFVTTNGAYNGTPDGETL  120
            TSPYDAGLFYCPYVPLQ VR+I+P+ FQPKIGFKTRYGMVSNPF     A   T     L
Sbjct  386  TSPYDAGLFYCPYVPLQQVRAINPDTFQPKIGFKTRYGMVSNPF-----AQGLTQGSGAL  440

Query  121  TANANMYYRRVQVTNLM  137
            TAN N YYRRVQV NLM
Sbjct  441  TANTNRYYRRVQVANLM  457


>gb|ADO97735.1| precursor of major head subunit [Synechococcus phage S-ShM2]
Length=465

 Score =  190 bits (483),  Expect = 5e-47, Method: Compositional matrix adjust.
 Identities = 97/137 (71%), Positives = 107/137 (78%), Gaps = 8/137 (6%)

Query  1    MAGVLDYSSGLTGAGGPAIGTVDDTGNLAVGTINGRIKVYVDPYAANLSDKHYYVVGYKG  60
            MAGVLDY+  L G  G      DDT +  VGT+NGRIKVYVDPY+AN+SDKH+YVVGYKG
Sbjct  337  MAGVLDYAPALNGNNGL---IPDDTSSTLVGTLNGRIKVYVDPYSANISDKHFYVVGYKG  393

Query  61   TSPYDAGLFYCPYVPLQMVRSIDPNNFQPKIGFKTRYGMVSNPFVTTNGAYNGTPDGETL  120
            T+ YDAGLFYCPYVPLQMVR+I+PN FQPKIGFKTRYGMVSNPF   NG   G+     L
Sbjct  394  TNAYDAGLFYCPYVPLQMVRAINPNTFQPKIGFKTRYGMVSNPFA--NGLTQGSG---AL  448

Query  121  TANANMYYRRVQVTNLM  137
            TANAN YYRR QV NLM
Sbjct  449  TANANRYYRRTQVANLM  465


>gb|ADO98523.1| precursor of major head subunit [Prochlorococcus phage P-RSM4]
Length=465

 Score =  189 bits (481),  Expect = 7e-47, Method: Compositional matrix adjust.
 Identities = 97/137 (71%), Positives = 105/137 (77%), Gaps = 8/137 (6%)

Query  1    MAGVLDYSSGLTGAGGPAIGTVDDTGNLAVGTINGRIKVYVDPYAANLSDKHYYVVGYKG  60
            MAGVLDY+ GL G  G      DDT +  VGT+NGRIKVYVDPY+AN+SDKHYYV GYKG
Sbjct  337  MAGVLDYAPGLQGNNGL---VPDDTSSTLVGTLNGRIKVYVDPYSANVSDKHYYVAGYKG  393

Query  61   TSPYDAGLFYCPYVPLQMVRSIDPNNFQPKIGFKTRYGMVSNPFVTTNGAYNGTPDGETL  120
            TSPYDAGLFYCPYVPLQ VR+I+P+ FQPKIGFKTRYGMVSNPF     A   T     L
Sbjct  394  TSPYDAGLFYCPYVPLQQVRAINPDTFQPKIGFKTRYGMVSNPF-----AQGLTQGSGAL  448

Query  121  TANANMYYRRVQVTNLM  137
            TAN N YYRRVQV NLM
Sbjct  449  TANTNKYYRRVQVANLM  465


>ref|YP_717802.1| major capsid [Synechococcus phage syn9]
 gb|ABA47103.1| major capsid [Synechococcus phage syn9]
Length=457

 Score =  189 bits (479),  Expect = 1e-46, Method: Compositional matrix adjust.
 Identities = 95/137 (69%), Positives = 107/137 (78%), Gaps = 7/137 (5%)

Query  1    MAGVLDYSSGLTGAGGPAIGTVDDTGNLAVGTINGRIKVYVDPYAANLSDKHYYVVGYKG  60
            MAGVLDY+  L G  G  +  VDDT +  VGT+NGRIKVYVDPY+AN++DKH+YV GYKG
Sbjct  328  MAGVLDYTPALNGNNG--LAGVDDTSSTLVGTLNGRIKVYVDPYSANVADKHFYVAGYKG  385

Query  61   TSPYDAGLFYCPYVPLQMVRSIDPNNFQPKIGFKTRYGMVSNPFVTTNGAYNGTPDGETL  120
            TSPYDAGLFYCPYVPLQ VR+I+P+ FQPKIGFKTRYGMVSNPF    G   G+     L
Sbjct  386  TSPYDAGLFYCPYVPLQQVRAINPDTFQPKIGFKTRYGMVSNPFA--GGLTQGSG---AL  440

Query  121  TANANMYYRRVQVTNLM  137
            T NAN YYRRVQV NLM
Sbjct  441  TVNANKYYRRVQVANLM  457


>gb|ADO98979.1| precursor of major head subunit [Prochlorococcus phage P-SSM7]
Length=462

 Score =  189 bits (479),  Expect = 1e-46, Method: Compositional matrix adjust.
 Identities = 94/137 (69%), Positives = 109/137 (80%), Gaps = 7/137 (5%)

Query  1    MAGVLDYSSGLTGAGGPAIGTVDDTGNLAVGTINGRIKVYVDPYAANLSDKHYYVVGYKG  60
            MAGVLDY+ GL G    A+  VDDT +  VGT+NG+IKVYVDPY+AN++DKH+YV GYKG
Sbjct  333  MAGVLDYAPGLQG--NNALTGVDDTSSTLVGTLNGKIKVYVDPYSANVADKHFYVAGYKG  390

Query  61   TSPYDAGLFYCPYVPLQMVRSIDPNNFQPKIGFKTRYGMVSNPFVTTNGAYNGTPDGETL  120
            TSPYDAGLFYCPYVPLQ VR+I+P+ FQPKIGFKTRYGMVSNPF  + G   G+     L
Sbjct  391  TSPYDAGLFYCPYVPLQQVRAINPDTFQPKIGFKTRYGMVSNPF--SGGLTQGSG---AL  445

Query  121  TANANMYYRRVQVTNLM  137
            TANAN YYRR QV N+M
Sbjct  446  TANANKYYRRTQVANIM  462


>gb|ADO97217.1| precursor of major head subunit [Synechococcus phage S-SM1]
Length=458

 Score =  188 bits (477),  Expect = 2e-46, Method: Compositional matrix adjust.
 Identities = 95/137 (69%), Positives = 104/137 (76%), Gaps = 8/137 (6%)

Query  1    MAGVLDYSSGLTGAGGPAIGTVDDTGNLAVGTINGRIKVYVDPYAANLSDKHYYVVGYKG  60
            MAGVLDY+  L G  G      DD  +  VGT+NGRIKVYVDPY+AN++DKHYYV GYKG
Sbjct  330  MAGVLDYAPALAGNNGL---IPDDNSSTLVGTLNGRIKVYVDPYSANVADKHYYVAGYKG  386

Query  61   TSPYDAGLFYCPYVPLQMVRSIDPNNFQPKIGFKTRYGMVSNPFVTTNGAYNGTPDGETL  120
            TSPYDAGLFYCPYVPLQ VR+I+PN FQPKIGFKTRYGMVSNPF     A   T     L
Sbjct  387  TSPYDAGLFYCPYVPLQQVRAINPNTFQPKIGFKTRYGMVSNPF-----AQGLTQGSGAL  441