Triage your Results

As you perform your initial "sanity check" on each sequence record, you can also perform an initial triage of your results. For some of your specimens, it will be very easy to make an accurate species determination. Other records will be much more difficult. This page discusses how to approach a "first look" at the BLAST results for your specimen.

While looking through your BLAST results for the first time, note whether the specimen has any reference sequences (from GenBank or Mycoflora BLAST) with a high identity value (98-100%) to your new sequence. If these reference sequences use the same species name you believed your specimen to be, additional analysis may not be necessary for this specimen, and you can use this name for it. Always look a bit further down in the BLAST results as well: if there are sequences under the same name with lower levels of sequence similarity (<97%), more analysis will be necessary later. Read below for more considerations to take into account during your initial triage, particularly the query cover percentage.

What is a "high likelihood match"?

When performing an initial triage, we are only looking to make a species determination for records where the DNA results support that determination with a reasonably high level of certainty. This initial step begins to whittle down the number of specimens that will need more advanced analysis, which can often be quite time consuming.

There is no set quantitative method for making this initial assessment across all fungal species, but a good heuristic is a quick review of four properties of the reference sequences in your BLAST results: a high "Identity" value, a high "Query Cover" value, the number of reference sequences matching this closely, and whether the species name comes from a trusted source.
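The four-part heuristic above can be sketched as a small filter over a list of hits. This is a minimal sketch, not a MycoMap feature: the `Hit` fields, the function name, and the default of three supporting hits are hypothetical choices, while the 98% identity and 70% query cover cutoffs come from the guidance on this page. It also assumes you have already decided, by hand, which reference names you trust.

```python
from dataclasses import dataclass


# Hypothetical record for one BLAST hit; the field names are illustrative,
# not the column names used by MycoMap or NCBI.
@dataclass
class Hit:
    species: str         # species name attached to the reference sequence
    identity: float      # percent identity, 0-100
    query_cover: float   # percent query cover, 0-100
    trusted: bool        # your own judgment, e.g. a type specimen or revisionary paper


def is_high_likelihood_match(hits, expected_species,
                             min_identity=98.0, min_cover=70.0, min_hits=3):
    """Return True only if the strong hits all carry the expected name,
    there are enough of them, and at least one comes from a trusted source."""
    # Strong hits: high identity AND high query cover.
    strong = [h for h in hits
              if h.identity >= min_identity and h.query_cover >= min_cover]
    # A strong hit under a different name makes the call ambiguous.
    if any(h.species != expected_species for h in strong):
        return False
    return len(strong) >= min_hits and any(h.trusted for h in strong)


if __name__ == "__main__":
    # Made-up hits for illustration only.
    example = [
        Hit("Species A", 99.6, 95.0, True),
        Hit("Species A", 99.1, 92.0, False),
        Hit("Species A", 98.4, 88.0, False),
        Hit("Species B", 93.0, 90.0, False),  # below 98%, so it is ignored
    ]
    print(is_high_likelihood_match(example, "Species A"))  # True
```

Anything this filter rejects is simply left for later analysis rather than verified; the thresholds are parameters because, as noted below, appropriate cutoffs vary between groups of fungi.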

For many groups of fungi, two specimens with a sequence similarity of 97% or above are considered to be a single species. Keep in mind, however, that this number is highly variable between different groups. For some groups, two specimens may have 99% sequence similarity, but they may be different species. The 97% number is a general rule-of-thumb for species-level determination in fungi; it is not a hard and fast rule.

A high Identity value, for the purposes of the initial triage, generally falls in the range of 98-100% sequence similarity. Identity is the percentage of aligned base pairs that are identical between your sequence and the reference. If 99 out of 100 base pairs match, your results show a 99% identity value. If a large number of reference sequences fall into the 98-100% range in your results, all with the species name you believed your specimen to be, you likely will not need to review that sequence much further, and it can be verified for inclusion in the final flora.
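The identity calculation described above is simple arithmetic over an alignment. The sketch below assumes you already have the two sequences aligned (for example, copied from the alignment view of a BLAST result) with gaps written as `-`; the function name is hypothetical. Gap columns count against identity, in line with how BLAST divides the number of identical positions by the full alignment length.

```python
def percent_identity(query_aln: str, subject_aln: str) -> float:
    """Percent of alignment columns where query and subject share the same base.

    Both strings must come from the same alignment, with gaps written as '-'.
    A column containing a gap never counts as a match.
    """
    if len(query_aln) != len(subject_aln):
        raise ValueError("aligned sequences must have the same length")
    if not query_aln:
        raise ValueError("alignment is empty")
    matches = sum(
        1
        for q, s in zip(query_aln.upper(), subject_aln.upper())
        if q == s and q != "-"
    )
    return 100.0 * matches / len(query_aln)


if __name__ == "__main__":
    query = "ACGT" * 25                        # a made-up 100-base sequence
    subject = query[:50] + "A" + query[51:]    # base 51 changed from G to A
    print(percent_identity(query, subject))    # 99.0
```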

A high Query Cover value for the initial triage is 70% or higher. If the top results fall below this, it is generally a good idea to review the sequence further later rather than verify it during your initial triage. Query cover is the percentage of the query sequence that overlaps the subject (reference) sequence. BLAST does not typically attempt to align the full length of a sequence; often, the results report only on the single segment that aligns most closely. It is therefore possible to have a 99% identity match across only 35% of your sequence (query cover). In that case you would have no information on how closely the other 65% of your sequence matches. This is why reviewing the query cover percentage of a BLAST search is just as important as reviewing the identity percentage, especially during an initial triage.
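Query cover can likewise be worked out from the query coordinates of the aligned segments. In this sketch (the function name is hypothetical), overlapping or adjacent segments are merged so that no base is counted twice, which approximates NCBI's per-subject query cover. The coordinates are assumed to be 1-based and inclusive, as in BLAST's query start and end fields.

```python
def query_cover(query_length: int, segments) -> float:
    """Percent of the query covered by the union of aligned segments.

    segments: iterable of (start, end) query positions, 1-based, inclusive.
    """
    if query_length <= 0:
        raise ValueError("query_length must be positive")
    covered = 0
    cur_start = cur_end = None
    # Normalize each pair so start <= end, then sweep left to right.
    for start, end in sorted((min(a, b), max(a, b)) for a, b in segments):
        if cur_end is None or start > cur_end + 1:
            # Gap before this segment: close out the previous merged run.
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching: extend the current run.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return 100.0 * covered / query_length


if __name__ == "__main__":
    # A made-up 600-base query aligned only over bases 1-210.
    print(query_cover(600, [(1, 210)]))               # 35.0
    # Two overlapping segments are counted once: bases 1-450.
    print(query_cover(600, [(1, 300), (250, 450)]))   # 75.0
```

Paired with the identity calculation, the first case shows how a hit can be 99% identical yet cover only 35% of the query, well short of the 70% cover suggested for the initial triage.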

You can see this visually from your MycoMap BLAST results. NCBI stores the results for 3 days from the time the BLAST search is initiated; during that window, click the button that opens the NCBI results.

This takes you to the NCBI BLAST results page, where the red lines give a visual representation of the query cover your results are based on.

The final thing to keep in mind is that the species names associated with reference sequences in your search may not be accurate. It all comes down to how much you trust the identifier who originally proposed the name for the reference sequence. If the sequence is from a type specimen or from a paper that reviews the group, that certainly carries a lot of weight. Most reference sequences, however, will have little supporting provenance on how the species determination was made. We will discuss this more in the future, but always keep this in mind, especially when you are new to analyzing sequence data.

I have a great match. What now?

Each record line on your dashboard has a checkmark icon (highlighted in red below). Clicking this checkmark turns it green. A green checkmark marks the record as "verified" for the given project, meaning the project owner believes the species name in use for the record accurately reflects the best name available for it. Verifying a record flags it for inclusion in the resulting flora for the project, as well as in the resulting comprehensive flora for North America.

You can filter the records in a project to include or exclude these assessed (verified) records.

The ultimate goal of a project owner is to verify all of the records in their projects with the "correct" species name. Future sections on sequence analysis will go into more detail about this process for more difficult specimens, but for your initial triage, only verify the sequence records you are certain represent the appropriate species name.

About North American Mycoflora Project

We are working towards a single goal - the development of the first comprehensive mycoflora of North America. This project is a consortium of citizen scientists and professional mycologists performing a biological survey of all the macrofungi that occur in North America.

©2017 North American Mycoflora Project. All Rights Reserved.