14 Jun 2010 18:33
I'm a complete newbie to annotation, and this is my first sequence to annotate.
I started by finding ORFs with the Glimmer3 application, but the iterated
approach (the g3-iterated.csh script) fails. Everything is fine when I use the simplest
approach: building a training set with the long-orfs program and then running glimmer3
with an ICM built from that training set. When I try to use g3-iterated.csh I get:
> Step 6 of 8: Making PWM from upstream regions
> Sum count of frequencies is 0!
> Failed to create PWM
Does anyone have any experience using g3-iterated.csh?
16 Jun 2010 15:09
I haven't used g3-iterated.csh, but I guess it's about determining codon usage biases of sorts? If so, then this method is indeed extremely powerful on whole genomes (or at least on large contigs with significant numbers of ORFs). However, the approach isn't applicable (at least not easily!) to metagenome sequence data, since one can't calculate background/coding nucleotide stats from essentially single 800 bp reads with no matched training sets... The rudimentary approach of selecting the largest STOP-STOP ORF based on the most relevant genetic code (e.g. bacterial, or the wrongly named "universal" code) is probably still the easiest way to start the search for a protein coding region in a single sequence read?
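To make the STOP-STOP idea concrete, here is a minimal Python sketch of it. It is not part of Glimmer3; the function names are made up for illustration. It assumes the stop codons TAA, TAG and TGA (shared by the standard and bacterial codes), treats a stretch that runs off the end of the read as open, and reports coordinates on whichever strand the stretch was found (so "-" coordinates refer to the reverse complement).

```python
# Stop codons shared by the standard and bacterial genetic codes (assumption:
# adjust this set if your organism uses a different code).
STOPS = {"TAA", "TAG", "TGA"}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def longest_stop_to_stop_orf(read):
    """Find the longest stop-free stretch over all six reading frames.

    Returns (length, strand, frame, start, end), where start/end are 0-based,
    end-exclusive positions on the scanned strand. Stretches that reach the
    edge of the read without hitting a stop are kept as open-ended candidates.
    """
    read = read.upper()
    best = (0, "+", 0, 0, 0)
    for strand, seq in (("+", read), ("-", reverse_complement(read))):
        for frame in range(3):
            start = frame  # first base after the previous stop (or frame start)
            pos = frame
            while pos + 3 <= len(seq):
                if seq[pos:pos + 3] in STOPS:
                    if pos - start > best[0]:
                        best = (pos - start, strand, frame, start, pos)
                    start = pos + 3
                pos += 3
            # Stretch still open when the read ends.
            if pos - start > best[0]:
                best = (pos - start, strand, frame, start, pos)
    return best


if __name__ == "__main__":
    print(longest_stop_to_stop_orf("ATGAAATAA"))
```

On short reads the winner is often an open-ended stretch on an unexpected strand, so the result is only a starting point for a similarity search, not a gene call.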