How to handle data returned from the MR

Sequence analysis of PSI plasmids

As you know, the PSI:Biology-MR sequence-verifies all PSI plasmids. Using our Automated Clone Evaluation (ACE) software, we compare the clone's actual sequence (determined at the MR) to the reference sequence provided by the PSI center. Each difference between the actual and expected sequences is recorded as a discrepancy object, which is categorized according to its type: silent mutation, amino acid substitution, frameshift, linker mutation, etc. Clones are then accepted or rejected based on criteria developed from the questionnaire sent out last year and from an analysis of discrepancy frequency for plasmids already processed in ACE.

After the ACE analysis, clones will be sorted into several "bins":

  1. Completed and accepted clones. For these clones, the plasmid information will be uploaded into DNASU and samples will be prepared for distribution.
  2. Incomplete clones. These are clones that require additional sequencing or evaluation because sequencing with at least one primer failed, because sequencing did not result in full coverage of the insert, or because regions of sequence with low confidence scores led to discrepancies. These clones will be put in our salvage pathway for further processing at the MR.
  3. "No match clones". The clone sequence does not align with the sequence expected for that clone. We will handle these in the same way as "rejected" clones.
  4. "Rejected" clones. These are clones that exceed at least one high-confidence (i.e., clearly not due to sequencing error) threshold for reject criteria in ACE. It is worth noting that the ACE algorithm is designed to conserve resources. Thus, as soon as a clone surpasses a high-confidence rejection threshold, the clone is marked "rejected" and all further work on the clone stops, even if additional sequencing reads are needed to assemble a complete sequence. This avoids spending additional sequencing, processing and personnel costs on a clone that is clearly not useful. However, such clones are not physically "discarded"; they are retained until a person reviews them and "signs off" that they should indeed be removed from the list. For this reason, each month the MR will return data to the PSI center for all rejected clones as a spreadsheet containing all of the data for your recently analyzed clones.

Data you will receive from the MR

  1. Clone Identifiers such as our clone ID, your original Clone ID, the vector, and other clone annotations you sent us.
  2. The "Group", which will tell you whether the clone is a "no match" or "rejected". The rejected clones exceed at least one high-confidence threshold for reject criteria in ACE (i.e., the region of the discrepancy has a strong phred score, so the discrepancy is clearly not due to sequencing error). Because ACE stops work on a clone as soon as it crosses a rejection threshold, some rejected clones do not have assembled sequence available in the spreadsheet, but there will always be a high quality discrepancy in the available sequence that forces the clone's rejection. It is also worth noting that we never reject a clone because of a low quality discrepancy (so you do not need to spend time looking at these).
  3. Actual sequence information, which includes our internal sequence ID, the location of your trace files, the actual assembled clone sequence from the MR and its start and stop locations. Note that no match clones do not have assembled sequence information; however, you will be able to review the sequence by looking directly at the trace files. We have set up an FTP site for centers to grab the trace files (http://tracefiles.dnasu.asu.edu). The trace file location can be found in the "Clone Directory Name" column, as in the following example: D:\trace_files_root\clone_samples\329082\502750 (where the trace files will be in folder 329082 > folder 502750 > .ab1 files).
  4. The list of discrepancies. This includes the type of 5' linker, CDS and 3' linker discrepancy (e.g., substitution, deletion, insertion, truncation) and the detailed list of discrepancies, with abbreviations as follows:
     Discrepancy              Abbreviation   Description
     Silent Substitution      a87g,E29E      nucleotide 87 mutated from an "a" to a "g"; amino acid 29 unchanged
     Missense Substitution    t8c,L3P        nucleotide 8 mutated from a "t" to a "c"; amino acid 3 mutated from an "L" to a "P"
     5' Linker Substitution   a-26g          nucleotide at position -26 mutated from an "a" to a "g"
     Insertion                ins@369,3      insertion of 3 nucleotides at position 369
     Deletion                 del@19,3       deletion of 3 nucleotides at position 19
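
If you want to filter or sort the spreadsheet programmatically, the abbreviations and the "Clone Directory Name" value can be decoded with a short script. The following is only a minimal sketch, not part of ACE: the function names parse_discrepancy and trace_folders are hypothetical, and the patterns assume that every abbreviation follows one of the five example forms listed above. Other forms that may appear in your spreadsheet (for example truncations) are not covered and will raise an error.

```python
import re
from pathlib import PureWindowsPath

# Substitution such as "a87g,E29E", "t8c,L3P" or "a-26g" (amino acid part optional).
# Assumption: nucleotides are lowercase a/c/g/t/n, amino acids are uppercase letters.
_SUBSTITUTION = re.compile(
    r"^([acgtn])(-?\d+)([acgtn])(?:,([A-Z*])(\d+)([A-Z*]))?$"
)
# Insertion or deletion such as "ins@369,3" or "del@19,3".
_INDEL = re.compile(r"^(ins|del)@(-?\d+),(\d+)$")


def parse_discrepancy(abbreviation):
    """Decode one discrepancy abbreviation into a dictionary (hypothetical helper)."""
    text = abbreviation.strip()

    match = _INDEL.match(text)
    if match:
        kind = "insertion" if match.group(1) == "ins" else "deletion"
        return {
            "kind": kind,
            "position": int(match.group(2)),
            "length": int(match.group(3)),
        }

    match = _SUBSTITUTION.match(text)
    if match:
        result = {
            "kind": "substitution",
            "position": int(match.group(2)),  # negative for the 5' linker example
            "ref_nt": match.group(1),
            "obs_nt": match.group(3),
        }
        if match.group(4):  # amino acid change is present
            result["aa_position"] = int(match.group(5))
            result["ref_aa"] = match.group(4)
            result["obs_aa"] = match.group(6)
            result["silent"] = match.group(4) == match.group(6)
        return result

    raise ValueError(f"Unrecognized discrepancy abbreviation: {abbreviation!r}")


def trace_folders(clone_directory_name):
    """Return the last two folder names of a 'Clone Directory Name' value."""
    parts = PureWindowsPath(clone_directory_name).parts
    return parts[-2], parts[-1]


if __name__ == "__main__":
    for example in ["a87g,E29E", "t8c,L3P", "a-26g", "ins@369,3", "del@19,3"]:
        print(example, parse_discrepancy(example))
    print(trace_folders(r"D:\trace_files_root\clone_samples\329082\502750"))
```

The two folder names returned by trace_folders correspond to the "folder 329082 > folder 502750" part of the example above; how they map onto the layout of the trace file site is not shown here.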

What to do with this returned data

Once you have reviewed your discrepancy spreadsheet, you have several options for handling these clones:

  1. Confirm that the clone should be rejected
  2. Argue that processing should continue on a clone that was rejected before being fully assembled, explaining why the fatal discrepancy or discrepancies are in error
  3. Recommend that any fully sequenced but rejected clones be accepted despite their discrepancies, and include a disclaimer that explains why. We will post the actual sequence and disclaimer on DNASU.
  4. If you determine that the clone is actually correct and the original predicted annotation was in error, you can update the clone's annotation. To do this, return a spreadsheet containing the Clone ID, a column with the original data (headed, for example, NTSeqOld) and a column containing a corrected version of the data (headed, for example, NTSeqNew). We will then update our database and reanalyze the clone with the new annotation, which should then lead to acceptance of the plasmid.
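
The annotation update described in option 4 can be prepared by hand in any spreadsheet program, but if you have many clones to correct, a short script may help. The following is only a sketch under stated assumptions: the function name build_annotation_update is hypothetical, the output is written as CSV text (the MR may prefer a different spreadsheet format), the headers NTSeqOld and NTSeqNew are the examples given above, and the clone ID and sequences in the usage example are made up.

```python
import csv
import io


def build_annotation_update(corrections):
    """Build CSV text with one row per corrected clone (hypothetical helper).

    corrections is an iterable of (clone_id, old_sequence, new_sequence) tuples.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Header names follow the examples in the text; adjust them to the data being corrected.
    writer.writerow(["Clone ID", "NTSeqOld", "NTSeqNew"])
    for clone_id, old_sequence, new_sequence in corrections:
        if old_sequence == new_sequence:
            # A row without a change would not alter the reanalysis.
            raise ValueError(f"No change given for clone {clone_id!r}")
        writer.writerow([clone_id, old_sequence, new_sequence])
    return buffer.getvalue()


if __name__ == "__main__":
    # Made-up clone ID and sequences, for illustration only.
    text = build_annotation_update([("EXAMPLE-CLONE-1", "ATGAAA", "ATGAAG")])
    with open("annotation_update.csv", "w", newline="") as handle:
        handle.write(text)
```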