A BLAST search is then performed on the frame +1 ORF to find other proteins with similar sequences, which can suggest the identity of the protein that our DNA sequence codes for. The search is run with blastp against a protein database. The database used is NR (non-redundant), as it is the largest and can therefore produce more comprehensive results. Databases such as SwissProt are not used because they are small, although they are highly detailed and accurate.
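The same kind of search could also be scripted instead of being launched from a web page. The sketch below only assembles a command line for the NCBI BLAST+ `blastp` program; it assumes BLAST+ is installed locally, and the file names `orf_frame1.fasta` and `blast_hits.tsv`, the E-value cutoff and the hit limit are illustrative assumptions, not settings taken from this report.

```python
def build_blastp_command(query_fasta, out_path, evalue=1e-8, max_hits=5):
    """Assemble a BLAST+ blastp command that searches NR on the NCBI servers."""
    return [
        "blastp",
        "-query", query_fasta,              # protein sequence of the frame +1 ORF
        "-db", "nr",                        # non-redundant protein database
        "-remote",                          # run on NCBI servers, no local copy of NR
        "-evalue", str(evalue),             # assumed cutoff, adjust as needed
        "-max_target_seqs", str(max_hits),  # assumed limit on reported hits
        "-outfmt", "6",                     # tab-separated output
        "-out", out_path,
    ]


# Hypothetical file names, used only for illustration.
command = build_blastp_command("orf_frame1.fasta", "blast_hits.tsv")
print(" ".join(command))
```

The list could then be passed to `subprocess.run(command, check=True)` on a machine where BLAST+ is available.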
BLAST directly from ORF results page
Report of BLAST:
Results with the highest scores were chosen for subsequent analysis
Results from BLAST search. Top 5 results were chosen. Note the max score and E-values of results.
From the BLAST search, the top 5 sequences with the highest alignment scores were chosen for subsequent analysis. The max score is the score of the best-scoring aligned segment between the query and a database sequence, so a higher value indicates greater similarity. The E-value is the number of alignments with an equal or better score that would be expected by chance in a database of this size, so it indicates how reliable the score S is. An E-value below 10^-8 is considered good: the lower the E-value, the more confident we can be that the protein is correctly identified and that the alignment is not due to chance. Normally, several different proteins would be selected to assess how likely each is to match the query sequence. However, as seen above, the results of this particular query are all the same protein: CTP synthetase. Hence the top 5 were chosen, with each selected protein coming from a different species for a more informative analysis.
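The selection rule described above (significant E-value, highest scores, one protein per species) can be sketched in a few lines. This is a minimal illustration rather than the procedure actually used, which was done by hand on the results page; the field names and the example records are made up and are not the values from this report.

```python
def select_top_hits(hits, max_evalue=1e-8, top_n=5):
    """Keep significant hits, one per species, ranked by max score."""
    significant = [h for h in hits if h["evalue"] < max_evalue]
    ranked = sorted(significant, key=lambda h: h["max_score"], reverse=True)
    chosen, seen_species = [], set()
    for hit in ranked:
        if hit["species"] in seen_species:
            continue  # a better-scoring hit from this species is already kept
        seen_species.add(hit["species"])
        chosen.append(hit)
        if len(chosen) == top_n:
            break
    return chosen


# Made-up illustrative records, not the hits obtained in this report.
example_hits = [
    {"species": "Species A", "max_score": 1100.0, "evalue": 0.0},
    {"species": "Species A", "max_score": 1090.0, "evalue": 0.0},
    {"species": "Species B", "max_score": 1050.0, "evalue": 1e-150},
    {"species": "Species C", "max_score": 40.0, "evalue": 0.5},
]
selected = select_top_hits(example_hits)
print([h["species"] for h in selected])  # ['Species A', 'Species B']
```

The Species C record is dropped because its E-value is above the cutoff, and the second Species A record is dropped as a duplicate species.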
A multiple alignment analysis is then performed on the selected sequences.
Results of multiple alignment analysis. Red signifies the similar regions.
The multiple alignment results are mostly red, showing that the sequences are highly similar. This strongly suggests that our query protein belongs to the same protein family as the results. Since our protein sequence aligns well with multiple proteins, we can conclude that it contains conserved domains. Conserved domains remain largely unchanged during evolution, which is why they appear here even though the sequences come from different species.
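To put a number on "mostly red", one could count the alignment columns in which every sequence carries the same residue. The sketch below assumes the aligned sequences are available as equal-length strings with `-` for gaps; the toy alignment is invented for illustration and is not taken from the CTP synthetase results.

```python
def conserved_columns(aligned_seqs):
    """Return indices of columns where all sequences share one residue (no gaps)."""
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("aligned sequences must have equal length")
    conserved = []
    for i in range(length):
        column = {s[i] for s in aligned_seqs}
        if len(column) == 1 and "-" not in column:
            conserved.append(i)
    return conserved


# Invented toy alignment, for illustration only.
toy_alignment = ["MKYIFV", "MKYLFV", "MK-IFV"]
cols = conserved_columns(toy_alignment)
print(cols)  # [0, 1, 4, 5]
print(len(cols) / len(toy_alignment[0]))  # fraction of fully conserved columns
```

Alignment viewers usually colour by similarity rather than strict identity, so this identity-based count would be a conservative estimate of the red regions.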
With possible conserved domains preserved throughout evolution, the multiple alignment results serve as a basis for a phylogenetic tree.
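Tree-building methods that work from distances start by converting the alignment into pairwise distances. As a minimal sketch, the p-distance (the fraction of differing residues over ungapped columns) can be computed as below; the sequence names and toy alignment are invented, and the tool used for the tree in this report may calculate distances differently.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Fraction of differing residues over columns where neither sequence has a gap."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def distance_matrix(alignment):
    """Map each pair of sequence names to its p-distance."""
    return {
        (n1, n2): p_distance(alignment[n1], alignment[n2])
        for n1, n2 in combinations(alignment, 2)
    }


# Invented toy alignment, for illustration only.
toy = {"query": "MKYIFV", "sp_A": "MKYLFV", "sp_B": "MK-IFV"}
distances = distance_matrix(toy)
for pair, d in distances.items():
    print(pair, round(d, 3))
```

Smaller distances place sequences closer together in the tree, so highly conserved proteins from closely related species would be expected to cluster.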