This document describes the experience of the Marseille University bioinformatics lecturing team in running the Annotathon. We hope this will help interested readers evaluate how the Annotathon can blend into their own curricula, as well as provide an operational guide for those who wish to join in. These guidelines are meant to evolve and adapt to your specific environment!
- 1 Course Organization
- 2 Annotathon Team Management
- 3 Bioinformatics Analyses
- 4 Links, References & Contacts
The Annotathon is run as part of an obligatory course for biology undergraduate students. The course is mostly run in the third year of preparation for the Bachelor's degree (French Licence) with majors in either Molecular Biology or Biochemistry. We have also run the Annotathon with fourth year students preparing a Science Master's degree. Because the course is obligatory in the core BSc curriculum, cohorts are usually quite large, varying from 30 to 120 students. The course accounts for 3 credits out of a total of 30 for that semester (European Credit Transfer and Accumulation System, ECTS).
Prerequisite: students are required to have taken an introductory bioinformatics course, in our case an obligatory 30 hr course during their second year. This 101 course introduces resource centers, life science databases and similarity-based sequence analysis; a 4x4 hrs hands-on practical in the computer center ensures students have used basic tools at least once in canned exercises (GENBANK, SWISSPROT, PUBMED, dot plots, ORF finder, BLASTp & BLASTx).
For each cohort, the Annotathon is run half a day a week over a relatively compact 8 week period which starts with three 3 hrs lectures covering:
- week 1: Identification of conserved protein domains (patterns, profiles, PROSITE, INTERPRO)
- week 2: Introduction to molecular phylogeny (evolution models, homo-, para-, ortho-logy, tree inference, NJ algorithm)
- week 3: Presentation of the Annotathon: origin of metagenomic sequences, tour of the Annotathon environment, description of annotation fields, practical work organization
The lectures are immediately followed by four practicals (at a rate of one 4 hr session per week) in the computer center. Because of cohort sizes and limited computing equipment, students are split up into 2 to 4 subgroups and work in pairs during the practicals. Each student pair has undivided access to a computer terminal during the 16 hrs of supervised practicals.
We start the first practical session by having student pairs create a new Annotathon account in the relevant team, and then spend half an hour reading the "Rule Book". The rest of the first session is then devoted to explaining the analysis workflow step by step (going through each editable field of the sequence annotation form), with regular assemblies around the video projector screen for general briefings and demos. The last three practicals are unstructured and entirely devoted to helping individual students with their specific analysis results and interpretations. We have also found it useful to start the last three sessions with a 20 minute review of interesting annotation issues faced by students (video projection of annotations).
Each student pair is assigned three metagenomic sequences to annotate over the course of the 4 week practical. It is made very clear to students that to complete their annotation assignments, they will be required to work outside supervised practical sessions. Two half days of homework is a suggested minimum, with Annotathon computer logs showing that students are connected on average for 46 hrs (min=16hrs, max=109hrs), representing an average of 30 hrs homework.
The computer center is thus open for free access for two hours every lunch time and two hours every evening (longer evening access would be much preferable). We have found that with the generalization of home broadband Internet connections, students tend to prefer working on private computer equipment (their own laptops, parents' or friends' PCs, or even Internet cafés).
An Annotathon closing date and time is agreed on at the beginning of the course, usually around 10 days after the last practical session. Annotations can no longer be edited after the closing date.
Other than access to an amphitheater for the 10hrs of lectures (preferably with Internet access and video projector), the course requires access to a networked computer room with one terminal per student (or pair of students) for the supervised practicals, as well as ample computer center free access outside teaching hours. A terminal should also be available for one of the instructors for live team management tasks (see Team Management) and demos (see video projection below).
A video projector connected to the instructor's terminal is most useful in the computer center during the first practical session, preferably with a switch allowing either ordinary monitor display or video projection.
Our experience suggests that a minimum of one instructor per 20 students is necessary for a successful Annotathon-based course. Instructors not only supervise the practical sessions, but are also each assigned an equal quota of the student annotations to evaluate over the 5 week duration of the Annotathon (see Evaluation below). Instructors are university bioinformatics professors and assistant professors. We also seek the assistance of postgraduate students (usually bioinformatics PhD students) to help with the practical sessions.
We have found that adequate instructor to student ratios are of paramount importance during the practical sessions, especially the first and second sessions, where students need to become confident with the interface and analysis workflow quickly. Although the intensity of direct student questions returns to more reasonable levels after that, the online evaluation of student annotations, which picks up at that point, is also highly demanding for instructors. According to Annotathon connection logs, instructors spend on average 30 hrs evaluating student annotations (20 student pairs per instructor, three sequences evaluated twice per student pair, see Evaluation below). The team leader instructor is usually connected a further 20 hrs for account management, forum questions etc.
We recommend that each student annotation be evaluated twice (allowing students to improve their annotations following initial instructor comments), although the Annotathon can also be configured for single pass annotations/evaluations (see below "Annotathon Team Management"). Each instructor evaluation pass of each annotated sequence consists of two parts:
- instructor free text comments
- a numerical mark between 1 and 10
In the case of three sequence assignments per student combined with iterative double evaluations, each student will receive 6 numerical marks by the end of the course. After Annotathon closure, numerical marks are first normalized across instructors (median subtracted and divided by standard deviation). The final Annotathon grade for each student is then computed as the average of the 6 instructor normalized individual evaluations.
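The normalization and averaging described above can be sketched as follows. This is a minimal illustration, not the Annotathon's actual implementation: the function names, the input layout (instructor, student pair, mark) and the use of the population standard deviation are assumptions, as is the fallback used when an instructor gave identical marks throughout. The text does not say how normalized scores are mapped back to a grading scale, so the sketch stops at the mean of the normalized marks.

```python
from statistics import mean, median, pstdev

def normalize_by_instructor(marks):
    """marks: list of (instructor, student_pair, mark) tuples.
    Returns a list of (student_pair, normalized_mark) tuples."""
    by_instructor = {}
    for instructor, _pair, mark in marks:
        by_instructor.setdefault(instructor, []).append(mark)

    stats = {}
    for instructor, values in by_instructor.items():
        spread = pstdev(values)  # assumption: population standard deviation
        # Assumption: avoid dividing by zero when all marks are identical.
        stats[instructor] = (median(values), spread if spread > 0 else 1.0)

    return [
        (pair, (mark - stats[instructor][0]) / stats[instructor][1])
        for instructor, pair, mark in marks
    ]

def final_grades(marks):
    """Average the normalized marks received by each student pair."""
    per_pair = {}
    for pair, normalized in normalize_by_instructor(marks):
        per_pair.setdefault(pair, []).append(normalized)
    return {pair: mean(values) for pair, values in per_pair.items()}
```

With this scheme a lenient and a strict instructor who rank the same annotations in the same order contribute comparable normalized marks, which is the purpose of the normalization step.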
In addition, instructors can grade the intrinsic difficulty of each sequence on a scale from 1 (very easy) to 5 (very difficult). These difficulty factors are used to give more weight in the final student grade to annotations that required extensive work, e.g. ORFs with numerous homologs presenting complex phylogenetic relationships are more difficult and represent more work than a non-coding sequence with no detectable homologs. Each individual evaluation is simply repeated as many times as its difficulty factor in the computation of the evaluation mean used to derive each student grade.
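As a rough sketch of this weighting, each evaluation can be repeated according to its difficulty factor before taking the mean. The function name and the input layout (pairs of mark and integer difficulty) are hypothetical and chosen only for illustration.

```python
def difficulty_weighted_mean(evaluations):
    """evaluations: list of (mark, difficulty) pairs, where difficulty is an
    integer from 1 (very easy) to 5 (very difficult)."""
    repeated = []
    for mark, difficulty in evaluations:
        if not 1 <= difficulty <= 5:
            raise ValueError("difficulty must be between 1 and 5")
        # Repeat the mark 'difficulty' times, as described in the text.
        repeated.extend([mark] * difficulty)
    if not repeated:
        raise ValueError("no evaluations to average")
    return sum(repeated) / len(repeated)
```

Repeating a mark d times is arithmetically the same as a weighted average with weight d, i.e. sum(mark * d) / sum(d), so either form gives the same grade.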
It is important that students submit their annotations for evaluation regularly rather than wait until the deadline. One measure that helps enforce regular submissions is to only allow new sequences to be added to the student cart after the preceding sequence annotations have been submitted for evaluation. This ensures that students get evaluation feedback early on in the practicals, allowing them to take comments into account with subsequent sequences. This also helps instructors spread out their evaluation duties.
With some teams we also added an extra session where each student pair presented one of their three annotated sequences to the class with the help of a video projector. The 5 minute presentation would be followed by 5 minutes of questions from instructors or fellow classmates. The presentation would give rise to an additional individual mark for each student.
Because our students work in pairs during practicals, the overall course grade is composed 50% of the Annotathon grade generated for the pair and 50% of an individual written examination for each student.
Annotathon Team Management
An Annotathon team consists of:
- annotators (a student cohort)
- instructors (guide students and evaluate annotations)
- a team leader (instructor who manages the team)
Students, instructors and the team leader each have individual Annotathon accounts with specific roles and privileges. Student annotators essentially
Instructors are each assigned an equal quota of the annotations to evaluate. As well as being an instructor, the team leader has special privileges (such as changing the team's configuration parameters, inviting new instructors into the team, or responsibility for producing the final grade).
Creating a new team
To create a new team for you and your students, please contact the Annotathon webmaster who will open a team leader account for you (follow the "Contact" link in the "Help" tab on the online Annotathon portal). Provide the webmaster with the following information for team creation:
- team name (20 characters, will appear on the Annotathon website)
- name of host institution (e.g. university or college name).
- team leader preferred username
- team leader email address
The team leader will receive account details (including password) by email. We normally open new teams by the next day.
General team configuration
Log into your team leader account and click on the "Team" tab. Your team will be initially configured in the same way as the "Open Access" generic team. At the top of the "Team" tab, a short summary recalls essential data about the team (number of registered students, number of opened accounts, number of instructors etc.)
Review and amend as necessary the following parameters of the "Team" tab form:
- Public team name
- Official affiliation (e.g. university or college name)
- start date for student annotations
- closing date for student annotations
- Three introductory texts that will appear in the student "Rule Book" ("Rational", "Team information" & "Evaluation")
Remember to click on "Update Team Parameters" to save your modifications.
Inviting new instructors
Under "Team->Instructors" are listed the instructors currently associated with the team. To invite a new instructor, enter their username and email in the fields provided. They will automatically receive their login parameters by email.
Instructors whose usernames are ticked are considered active evaluators, i.e. will be assigned their share of annotations to evaluate as students add new sequences to their carts. Deselect an instructor so that they receive no additional sequences to evaluate (annotations previously assigned to them remain their responsibility).
Enable the "Team->Predefined list of student names" in order to make students select their names from a predefined list (after saving this modification, you will then need to upload a CSV-formatted file of student names). If left disabled, students are allowed to enter their names as free text at account creation.
The "Team->Number of student work groups" is only used if a large student cohort is split into subgroups during practicals. This is informational unless you choose the RECYCLE sequence dispatch scheme (see below).
Sequence data source
Choose from the "Team->Sequence data set" list which dataset students will be allowed to pick sequences from when adding sequences to their carts. The "Global Ocean Sampling expedition" dataset is selected by default. Because the GOS dataset has not been uploaded in full, click on the "Sequence reserve" link to see how many sequences have been uploaded and how many are still available (i.e. not annotated yet).
If you wish to upload more sequences to an existing dataset or create an entirely new dataset, click on the "To add more sequences to the database..." link. Create a tab delimited text file with one sequence per line as shown below. If necessary, this file can also contain sample definitions (lines starting with #SAMPLE). The format is:
#SAMPLE DATASET_NAME SAMPLE_ID SAMPLE_LOCATION COUNTRY DATE TIME GPS DEPTH TEMP SALINITY PORE_SIZE HABITAT GEO_LOC
SAMPLE_ID SEQ_ACC_NUM SEQ_START SEQ_END DNA_SEQUENCE
A valid dataset file defining two samples and 3 sequences could look like this:
#SAMPLE Global Ocean Sampling expedition JCVI_SMPL_1103283000001 Sargasso Sea, Station 11 Bermuda (UK) 02/26/03 10:10:00 31°10'30n; 64°19'27.6w 5 20.5 36.7 0.1-0.8 Open Ocean Sargasso Sea
#SAMPLE Global Ocean Sampling expedition JCVI_SMPL_1103283000021 Off Key West, FL USA 01/08/04 06:25:00 24°29'18n; 83°4'12w 1.7 25 36 0.1-0.8 Coastal Caribbean Sea
JCVI_SMPL_1103283000001 JCVI_READ_1098127013050 1 938 TGCTCGCCGCGCTCTGGGGTTCA[...snip...]ACCCTGATGGACTTAGCCCTTGGCTGGACTAC
JCVI_SMPL_1103283000021 JCVI_READ_1098127013057 1 927 TGGACCTCTTCCAATGTCTGCTC[...snip...]CCATCCGGTCTAAATGTACAAGTTTCCACT
JCVI_SMPL_1103283000021 JCVI_READ_1098127013056 1 942 CCAATGTCTGCTCCACCACCTTT[...snip...]TTTGCTACTTGCACCCATATCTCACCTTGATAACCT
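Before uploading, it can help to check that every line of the file has the expected number of tab-separated fields. The sketch below is a hypothetical local checker, not part of the Annotathon: the function name is invented, and it assumes that "#SAMPLE" is itself the first tab-separated field of a sample line and that the field order is exactly the one given in the format description.

```python
SAMPLE_FIELDS = [
    "DATASET_NAME", "SAMPLE_ID", "SAMPLE_LOCATION", "COUNTRY", "DATE", "TIME",
    "GPS", "DEPTH", "TEMP", "SALINITY", "PORE_SIZE", "HABITAT", "GEO_LOC",
]
SEQUENCE_FIELDS = ["SAMPLE_ID", "SEQ_ACC_NUM", "SEQ_START", "SEQ_END", "DNA_SEQUENCE"]

def parse_dataset(text):
    """Split a tab-delimited dataset file into sample and sequence records."""
    samples, sequences = [], []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if fields[0] == "#SAMPLE":
            values, names, target = fields[1:], SAMPLE_FIELDS, samples
        else:
            values, names, target = fields, SEQUENCE_FIELDS, sequences
        if len(values) != len(names):
            raise ValueError(
                f"line {number}: expected {len(names)} fields, got {len(values)}"
            )
        target.append(dict(zip(names, values)))
    return samples, sequences
```

A further check worth adding is that every SAMPLE_ID used on a sequence line is defined by a #SAMPLE line in the same file or already exists in the database.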
Make sure you respect the exact GPS syntax (31°10'30n; 64°19'27.6w) for correct sample map positioning. If you specify existing SAMPLE_IDs or existing SEQ_ACC_NUMs (which you created), then the corresponding samples or sequences will be updated accordingly (otherwise they are created).
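One way to check the GPS syntax before uploading is to match it against a strict pattern and convert it to decimal degrees. This is only a sketch inferred from the two example coordinates in this manual: the pattern, the function name, and the assumption that "s" and "e" hemispheres are written in the same lowercase style as "n" and "w" are not confirmed by the Annotathon itself.

```python
import re

# Degrees, minutes and (possibly decimal) seconds, followed by a lowercase
# hemisphere letter; latitude and longitude are separated by "; ".
GPS_PATTERN = re.compile(
    r"(\d{1,2})°(\d{1,2})'(\d{1,2}(?:\.\d+)?)([ns]); "
    r"(\d{1,3})°(\d{1,2})'(\d{1,2}(?:\.\d+)?)([ew])"
)

def gps_to_decimal(gps):
    """Convert e.g. "31°10'30n; 64°19'27.6w" to (latitude, longitude)."""
    match = GPS_PATTERN.fullmatch(gps)
    if match is None:
        raise ValueError(f"unexpected GPS syntax: {gps!r}")
    lat_d, lat_m, lat_s, lat_h, lon_d, lon_m, lon_s, lon_h = match.groups()
    latitude = int(lat_d) + int(lat_m) / 60 + float(lat_s) / 3600
    longitude = int(lon_d) + int(lon_m) / 60 + float(lon_s) / 3600
    # South and west are negative in decimal degrees.
    if lat_h == "s":
        latitude = -latitude
    if lon_h == "w":
        longitude = -longitude
    return latitude, longitude
```

Plotting the resulting decimal coordinates on any map is a quick way to confirm that a sample will appear where expected.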
To create a new dataset, just specify a new DATASET_NAME that isn't present in the database (use the "Sequence reserve" link to list available dataset and sample names).
If the dataset selected for the team has several defined samples, then students will be asked to select the sample location from which they wish to pick a new sequence when adding new items to their sequence cart.
Student sequence carts
The "Team->Cart sequence dispatch" parameter allows you to switch between NOVEL and RECYCLE modes. The default NOVEL mode ensures that only new sequences (never before annotated by any Annotathon team) will be added to student carts. In RECYCLE mode, sequences are only ever dispatched once per team (so that a given sequence is only annotated by one student pair inside a given team); however, the same sequences are redundantly dispatched across distinct teams (this allows instructors to compare parallel annotations during evaluations).
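The difference between the two modes can be illustrated with a small eligibility filter. This is a simplified model written for this manual, not the Annotathon's dispatch code: the function name and data structures are hypothetical, and the sketch ignores work groups and any preference RECYCLE mode may give to sequences already annotated elsewhere.

```python
def eligible_sequences(mode, team, reserve, history):
    """Return the sequences that may still be dispatched to a team.

    reserve: list of sequence identifiers in the dataset.
    history: set of (team, sequence_id) pairs already dispatched.
    """
    if mode == "NOVEL":
        # Exclude anything already dispatched to any team.
        taken = {sequence for _team, sequence in history}
    elif mode == "RECYCLE":
        # Exclude only what this team has already received.
        taken = {sequence for other, sequence in history if other == team}
    else:
        raise ValueError(f"unknown dispatch mode: {mode}")
    return [sequence for sequence in reserve if sequence not in taken]
```

For example, if team A has annotated s1 and team B has annotated s2, team A can only receive s3 in NOVEL mode but can receive s2 or s3 in RECYCLE mode.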
A third CONVERGENT sequence dispatch mode is under development. This mode will be identical to the RECYCLE mode, except that sequences will only be recycled through distinct teams until the main controlled vocabulary annotations converge. Convergence is reached when GO Biological Process, Molecular Function and Taxonomic Classification are identically assigned to the same sequence independently by three distinct teams.
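Since this mode is still under development, the following is only one possible reading of the convergence criterion: a sequence is considered converged as soon as any three distinct teams agree on all three controlled vocabulary fields. The function name and the dictionary keys are hypothetical.

```python
def has_converged(annotations, required_teams=3):
    """annotations: list of dicts for ONE sequence, each with the keys
    'team', 'biological_process', 'molecular_function' and 'taxonomy'."""
    teams_by_assignment = {}
    for annotation in annotations:
        assignment = (
            annotation["biological_process"],
            annotation["molecular_function"],
            annotation["taxonomy"],
        )
        # A set ensures that a team annotating twice is only counted once.
        teams_by_assignment.setdefault(assignment, set()).add(annotation["team"])
    return any(
        len(teams) >= required_teams for teams in teams_by_assignment.values()
    )
```

Under this reading, a sequence with two agreeing teams and one dissenting team would keep being recycled until a third team matches the majority assignment.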
Compiling individual annotation Evaluations into an overall grade
->tool synopsis (table)
Protein Functional Roles
Links, References & Contacts
- Annotathon portal: http://annotathon.univ-mrs.fr/
- Instructor manual: http://annotathon.univ-mrs.fr/Metagenes/index.php/Annotathon_Instructor_Manual
- Student user guide: http://annotathon.univ-mrs.fr/?actionjs=rules
- About the Annotathon, contacts: http://annotathon.univ-mrs.fr/Metagenes/index.php/Annotathon