Forum "General Annotathon issues, using the interface, bugs etc."

Thread subject: This sequence TO36D_5877010.1

This sequence TO36D_5877010.1
CamaliaTan
31 May 2016 10:53
Non evaluated contribution

This sequence is taken by Camalia Tan 

CamaliaTan
24 Jun 2016 21:05
Non evaluated contribution

A total of 3 ORFs were found, in frames +1, -1 and -2.

FRAME +1

The first sequence, the longest ORF, has a length of 903 bp. The 5’ end is present but there is no stop codon, so it is not a complete sequence. BLASTp was run against the NR database using the FASTA protein sequence from the longest ORF. The molecule type is amino acid, and the sequence is part of the OM_Channel_Superfamilies (outer membrane channel superfamilies). The top 2 most similar homologous proteins were selected: ‘TonB-dependent siderophore receptor [SAR86 cluster bacterium SAR86B]’ with Max score: 266, Total score: 266, Query cover: 88%, E-value: 1e-80, Identity: 50%; and ‘TonB-dependent receptor [alpha proteobacterium IMCC14465]’ with Max score: 266, Total score: 266, Query cover: 88%, E-value: 1e-80, Identity: 50%.
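The completeness check described above (5’ end present, stop codon present or missing) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: "5’ end present" is taken to mean the ORF begins with an ATG start codon, only the three standard stop codons are considered, and the function name and toy sequences are hypothetical, not the real TO36D_5877010.1 ORFs.

```python
# Hypothetical helper for illustration; not part of Annotathon.
START_CODONS = {"ATG"}                 # assumption: only ATG counts as a start
STOP_CODONS = {"TAA", "TAG", "TGA"}    # stop codons of the standard genetic code


def orf_status(orf_dna: str) -> dict:
    """Report the length of an ORF and whether it has a start and a stop codon."""
    seq = orf_dna.upper()
    # Split into whole codons, ignoring any trailing 1-2 bases.
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    has_start = bool(codons) and codons[0] in START_CODONS
    has_stop = bool(codons) and codons[-1] in STOP_CODONS
    return {
        "length_bp": len(seq),
        "codons": len(codons),
        "has_start": has_start,
        "has_stop": has_stop,
        "complete": has_start and has_stop,
    }


# Toy sequences only.
print(orf_status("ATGGCTGATTCA"))  # starts with ATG, last codon is not a stop
print(orf_status("ATGGCTGATTAA"))  # starts with ATG, ends with the stop TAA
```

An ORF with a start but no stop, like the frame +1 ORF here, would be reported with "complete" set to False.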

For the E-value, E < 1e-50 indicates that the match was not found by chance. As shown above, the E-values of both homologs are far below this threshold, which gives extremely high confidence that the database match reflects a homologous relationship. The identity of these two homologous proteins is only 50%; this low percentage could be due to the lack of research on this new marine microbe species and does not indicate that the ORF protein sequence is not genuine.
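As a small illustration of the threshold rule above, the two reported hits can be written down and filtered by E-value. The numbers are copied from the BLASTp results in this post; the field names and the function name are made up for the example.

```python
# Values copied from the two BLASTp hits reported in this post.
# The dictionary keys are illustrative, not a real BLAST output format.
hits = [
    {
        "description": "TonB-dependent siderophore receptor [SAR86 cluster bacterium SAR86B]",
        "max_score": 266,
        "query_cover": 88,
        "evalue": 1e-80,
        "identity": 50,
    },
    {
        "description": "TonB-dependent receptor [alpha proteobacterium IMCC14465]",
        "max_score": 266,
        "query_cover": 88,
        "evalue": 1e-80,
        "identity": 50,
    },
]


def significant_hits(hit_list, max_evalue=1e-50):
    """Keep only hits whose E-value is below the chosen threshold."""
    return [h for h in hit_list if h["evalue"] < max_evalue]


for h in significant_hits(hits):
    print(f'{h["description"]}: E = {h["evalue"]:.0e}, identity = {h["identity"]}%')
```

Both hits pass the 1e-50 cut-off, which matches the reasoning in the paragraph above.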

The GenPept records of the two chosen homologs of the putative sample protein show that ‘TonB-dependent siderophore receptor [SAR86 cluster bacterium SAR86B]’ comes from a planktonic marine bacterium, SAR86 cluster bacterium SAR86B (Proteobacteria, marine metagenome), and that ‘TonB-dependent receptor [alpha proteobacterium IMCC14465]’ comes from alpha proteobacterium IMCC14465, which was isolated in the East Sea and belongs to the PS1 clade of Alphaproteobacteria. For ‘TonB-dependent siderophore receptor [SAR86 cluster bacterium SAR86B]’ there is one published article, “Genomic insights to SAR86, an abundant and uncultivated marine bacterial lineage.”, but no 3D structure of the TonB-dependent receptor was found. For ‘TonB-dependent receptor [alpha proteobacterium IMCC14465]’ there is no published article and no 3D structure of the TonB-dependent receptor available.

The conserved domain regions (specific hits, multi-domains) and conserved domain sites of TonB-dependent siderophore receptor [SAR86 cluster bacterium SAR86B] and TonB-dependent receptor [alpha proteobacterium IMCC14465] are the same. The conserved domain regions consist of Ligand_Gate_Channel (cd01347) and Fiu (COG4774); the conserved domain sites consist of the N-terminal plug and the ligand-binding site. Both sites lie within the Ligand_Gate_Channel region, which falls under the superfamily OM_channels (outer membrane channels, cl21487). Both hits indicate the presence of TonB-dependent receptors, but in different marine bacterial species. Unfortunately, no 3D structure of a TonB-dependent receptor was found for either, so a visual comparison of the receptor from the two species is not possible.

The putative conserved domains consist of Ligand_Gate_Channel (cd01347) and Fiu (COG4774). Ligand_Gate_Channel, also known as the TonB-dependent channel, is formed from a monomeric 22-stranded anti-parallel beta-barrel and has multiple large extracellular loops that bind ligands; a TonB-dependent conformational change of the channel allows the passage of ligands. Fiu works as an outer membrane receptor for monomeric catechols; however, it does not belong to any superfamily and the structure of this receptor is unknown. The conserved domain sites are the N-terminal plug and the ligand-binding site. Both sites are part of the Ligand_Gate_Channel region, which falls under the superfamily OM_channels (outer membrane channels, cl21487). The N-terminal plug and the ligand-binding site are situated in the ligand-gated channel and assist in the conformational change of the channel when the specific ligand binds, allowing ligands to pass into the periplasm and preventing free diffusion of small molecules when no ligand is present. The taxonomy of ligand_gated_channel is Bacteria, Proteobacteria. This suggests that TonB-dependent receptors are found only in bacteria, or more specifically in the Proteobacteria. There is a high chance that the putative sample protein is a TonB-dependent receptor as well, and the sample organism could possibly be a kind of proteobacterium. However, due to the lack of an actual 3D structure of the TonB-dependent receptor and the huge range of proteobacterial species that possess this receptor, it is impossible to pinpoint the exact organism of the sample just from the information that the putative protein is a TonB-dependent receptor.

The phylogenetic tree of the longest ORF does not reveal much about the earliest origin of the species the DNA comes from. However, from the third level of branching onwards, most of the bacterial lineages fall within the gamma- (blue) or alpha- (green) proteobacteria, and at the tips of the tree the proteobacteria are linked to TonB-dependent receptors. It appears that both the gamma- (blue) and alpha- (green) proteobacteria have developed relatively similar TonB-dependent receptors over the course of evolution. This may indicate that TonB-dependent receptors are present only in the descendants of this particular lineage, since the proteobacteria developed from this common origin and most of its descendants have TonB-dependent receptors. The phylogenetic tree therefore makes sense, and from the lineages of the microbes it links together, the DNA sequence of the putative protein sample could also come from a descendant of the proteobacteria, given the presence of a TonB-dependent receptor.

The Swiss-Prot (ExPASy ProtParam) result shows that the amino acids with the highest percentages in the composition are Asp (11.6%), Ser (10.3%) and Leu (8.0%). The protein is classified as stable. The grand average of hydropathicity (GRAVY) is -0.512. Knowing whether the protein is hydrophobic or hydrophilic can help indicate the identity of the protein sequence and its subcellular location. The high proportions of Asp, Leu and Ser show that the protein is charged and has both hydrophobic and hydrophilic components. The GRAVY of -0.512 indicates a hydrophilic protein, since the more negative the value, the more hydrophilic the protein, which means that the protein could be on the surface of the membrane. In conclusion, it is deduced that the protein is a peripheral membrane protein, with some part of its peptide sequence embedded in the hydrophobic region of the membrane.
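To show where the composition percentages and the GRAVY value come from, here is a minimal Python sketch. GRAVY is the sum of the Kyte-Doolittle hydropathy values of all residues divided by the sequence length. The function names and the toy peptide are hypothetical; the real frame +1 protein sequence is not reproduced here.

```python
from collections import Counter

# Kyte-Doolittle hydropathy scale (one-letter amino acid codes).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(protein: str) -> dict:
    """Return the percentage of each amino acid in the sequence."""
    seq = protein.upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}


def gravy(protein: str) -> float:
    """Grand average of hydropathicity: mean Kyte-Doolittle value per residue."""
    seq = protein.upper()
    if not seq:
        raise ValueError("empty sequence")
    # A KeyError here means the sequence contains a non-standard residue code.
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


toy_peptide = "MDSLDDSKLD"  # toy example, not the real ORF protein
print(composition(toy_peptide))
print(round(gravy(toy_peptide), 3))
```

A negative GRAVY means the hydrophilic residues outweigh the hydrophobic ones on average, which is the basis of the reasoning in the paragraph above.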

FRAME -1 

The second ORF is accepted. The ORF in frame -1 has a length of 9135 bp. The 5’ end is present but there is no stop codon, so it is not a complete sequence. BLASTp was run against the NR database with the second ORF sequence. The molecule type is amino acid, but no putative conserved domains or homologous proteins were found. This protein sequence has not been studied or researched.

FRAME -2

The third ORF is accepted. The ORF in frame -2 has a length of 177 bp. The 5’ end is present and there is a stop codon, so it is complete. BLASTp was run against the NR database with the third ORF sequence. The molecule type is amino acid, no putative conserved domains were found, and one homologous protein was found: “FS general substrate transporter [Coniophora puteana RWD-64-598 SS2]”. The FASTA protein sequences of ‘FS general substrate transporter [Coniophora puteana RWD-64-598 SS2]’ and ‘TonB-dependent receptor [alpha proteobacterium IMCC14465]’ were then aligned against each other with BLASTp. There are no similarities between these two proteins, and no published study was found.

The FASTA protein sequence of the third ORF was used to find the protein properties in Swiss-Prot (ExPASy ProtParam). The result shows that the amino acids with the highest percentages in the composition are Leu (20.7%), Cys (8.6%), Phe (8.6%) and Gln (8.6%). The protein does not contain any Trp residue, which could result in more than 10% error in the computed extinction coefficient. The ProtParam result for this protein is therefore not taken into account, because of this high level of error. The only homologous protein was aligned by BLASTp with one of the two homologs chosen for ORF frame +1 to look for any possible link between the 2 proteins; however, no similarities were found. The two are completely different protein sequences with no common match. All the evidence analysed indicates that ORF frame -2 is not an ORF sequence to be used.
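The Trp warning above is easier to follow with the formula behind the extinction coefficient at 280 nm: each Tyr contributes 1490, each Trp 5500 and each cystine (a pair of Cys) 125, in M-1 cm-1. The sketch below is a hedged illustration of that formula; the function name and the toy sequences are hypothetical, and the real frame -2 protein is not reproduced here.

```python
# Molar extinction contributions at 280 nm (M^-1 cm^-1) per residue type.
EXT_TYR = 1490
EXT_TRP = 5500
EXT_CYSTINE = 125  # per disulfide-bonded pair of Cys


def extinction_coefficient(protein: str, cystines_formed: bool = True) -> int:
    """Estimate the 280 nm extinction coefficient from the Tyr, Trp and Cys counts."""
    seq = protein.upper()
    n_tyr = seq.count("Y")
    n_trp = seq.count("W")
    # Either assume every pair of Cys forms a cystine, or that all Cys are reduced.
    n_cystine = seq.count("C") // 2 if cystines_formed else 0
    return EXT_TYR * n_tyr + EXT_TRP * n_trp + EXT_CYSTINE * n_cystine


# Toy sequences only: the same peptide with and without a Trp residue.
print(extinction_coefficient("MLLCWYCQF"))
print(extinction_coefficient("MLLCYCQF"))
```

Because Trp has by far the largest contribution, a protein without Trp depends on the much weaker Tyr and cystine terms, which is why the estimate is flagged as less reliable in that case.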

Conclusion
ORF frame +1 is a true positive, given the existence of protein homologs and a decent-sized ORF. From the results shown above, both homologous proteins have exactly the same conserved domains. In the phylogenetic tree, despite the difference in proteobacterial species, TonB-dependent receptors are present in almost all the descendants of the proteobacteria. From the Swiss-Prot (ExPASy ProtParam) result, the protein is deduced to be a peripheral membrane protein. The protein sequence in ORF frame +1 therefore has a high probability of being a ‘TonB-dependent receptor’ that is a peripheral membrane protein, which also helps identify the DNA genome of the sample as belonging to a kind of marine bacterium that is a direct descendant of the proteobacteria. ORF frame -1 has no evidence to support it as a true ORF, because there are no existing protein homologs and the ORF is not of a convincing size. No studies of this protein sequence were found. ORF frame -2 likewise has no evidence to support it as a true ORF, because of the lack of existing protein homologs and the unconvincing size of the ORF. Few studies relating to this protein sequence were found, and the ProtParam result does not strongly support the properties of the frame -2 protein sequence, since there may be more than 10% error in the computed extinction coefficient, as mentioned before.

Thus, only the longest ORF, in frame +1, is taken into account in deducing the family of origin of the protein sequence. However, it is not yet possible to determine accurately which proteobacterial species the sample is, or even whether it is a gamma- (blue) or alpha- (green) proteobacterium, from the information obtained and the protein sequence provided. It is nevertheless concluded that the sample contains a peripheral membrane protein, the ‘TonB-dependent receptor’, and is one of the descendants of the Proteobacteria.