Thursday, December 6, 2018

Is this a new Blastocystis subtype? Maybe not! Here's Why!

The genetic diversity of Blastocystis is becoming comparable to the universe! Seventeen subtypes (which are likely separate species or even genera) have been acknowledged so far, but quite a few more have been mentioned.

However, before assigning new Blastocystis subtype numbers to your SSU rDNA sequences, you need to do some QC work on your data. Sometimes we notice sequences deposited in the NCBI Database or included in articles that may look like new Blastocystis subtypes... but they're most likely not!

I asked Prof Graham Clark from London School of Hygiene and Tropical Medicine, who has more than 20 years' experience in the Blasto business, to give a couple of examples, explaining where issues may arise. He says:


'One of the tasks I do when I have a few minutes to spare is to look at new Blastocystis sequences that have been deposited into GenBank. I am always hoping to stumble across some exciting new subtypes or new hosts that will expand our understanding of diversity in Blastocystis. Only rarely does this happen, however. I do, occasionally, come across sequences that are problematic and it is these that I want to focus on.

Chimaeras: This problem occurs during PCR amplification when one primer binds to a Blastocystis subtype DNA and the other primer binds to a different source of DNA. In the first case I came across, the other source was a different Blastocystis subtype, meaning that the sequence at one end of the PCR product matched one subtype and the sequence at the other end matched a different subtype. This observation is mentioned in the paper describing barcoding of Blastocystis (Scicluna et al., 2006). Since then I have seen other chimaeric sequences: one recently was a mixture of Blastocystis plus a plant while another was Blastocystis plus a free-living protist.
Chimaeras are produced when there is incomplete replication of a DNA strand during a cycle. After denaturation in the next cycle, the single stranded partial product can bind to another single stranded product from a different source and synthesis results in a product combining sequences from two sources. The conservation of ribosomal RNA genes means there can be sufficient similarity to allow binding between sequences from distantly related organisms.
Chimaeras are generally only found when the sequences are from cloned ribosomal RNA gene sequences obtained by PCR, although they also occur in some forms of Next Generation Sequencing. When mixed PCR products are sequenced directly the sequence obtained is the average of all the products in that reaction, and so chimaera sequences will usually be ‘diluted out’ by the major product of the reaction. Only when a single sequence from that mixture is isolated and studied will chimaeras be detected.
If the ‘alien’ region makes up a significant percentage of the sequence then the result of BLAST analysis will show a percentage divergence from known subtypes that indicates it may represent a new subtype. A quick way to evaluate this is to compare the BLAST results using the first and last thirds of the sequence. If it is a new subtype the results should be similar. In a recently detected chimaera, the first third was a 100% match to a known Blastocystis subtype while the last third was a 95% match to asparagus. This approach is an easy way to check whether there is something to get excited about.
A chimaera sequence can sometimes be detected because of its impact on phylogenetic trees. The sequence will be on its own branch, often at the base of a clade containing the subtype found at the Blastocystis-matching end.

Non-Blastocystis Blastocystis sequences: Like chimaeras, these are often PCR artefacts, most commonly encountered when amplifying from stool DNA, especially if the stool is non-human. There is an expectation that Blastocystis-specific primers will only amplify Blastocystis DNA but, sadly, that is not always the case. I have personally seen this many times - if Blastocystis DNA is a minority of the eukaryotic DNA in the sample then the likelihood of artefacts increases greatly. These are generally identified easily if the sequence is compared using BLAST against the full nr/nt nucleotide collection in GenBank. However, there is a temptation to limit the search to the genus Blastocystis to speed up the identification process, because that is what you expect it to be. Again because of the conservation of ribosomal RNA genes, if ribosomal RNA genes are amplified there will be a match to Blastocystis, and the divergence will likely suggest, again, a new subtype. Comparing against the full nucleotide collection will always show whether the sequence is of Blastocystis origin.

Both chimaeras and non-Blastocystis products are easily identified if the correct steps are taken. In conclusion, be suspicious of anything that is significantly divergent to known Blastocystis – it could be an indication of an artefact.'
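
For anyone who likes to script this kind of check, here is a minimal Python sketch of Graham's second point: look at the best hit from an unrestricted BLAST search and see whether it is actually Blastocystis. The function name, the 'title'/'bitscore' layout and the example hits are all made up for illustration; you would fill the list from your own BLAST output.

```python
def classify_top_hit(hits, genus="Blastocystis"):
    """Say whether the best-scoring hit of an UNRESTRICTED search is the expected genus.

    `hits` is a list of dicts with 'title' and 'bitscore' keys. This layout is
    an assumption for illustration, not a format produced by BLAST itself.
    """
    if not hits:
        return "no hits"
    # The hit with the highest bit score is treated as the top hit.
    best = max(hits, key=lambda h: h["bitscore"])
    if genus.lower() in best["title"].lower():
        return "top hit is " + genus
    return "top hit is NOT " + genus + ": " + best["title"]


# Hypothetical hits, invented purely to show the idea.
example_hits = [
    {"title": "Blastocystis sp. small subunit ribosomal RNA gene", "bitscore": 310.0},
    {"title": "Asparagus officinalis 18S ribosomal RNA gene", "bitscore": 900.0},
]
print(classify_top_hit(example_hits))
```

Had the search been limited to the genus Blastocystis, only the first of these two hits would have been returned, and the sequence could have been mistaken for a divergent new subtype.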
Fig. 1. A 'Blastaragus' (a chimaera of a Blastocystis and an asparagus)

Fig. 2. An example of a chimaeric DNA sequence (the 'Blastaragus' from Fig. 1). Notice how the consensus sequence starts out as Blastocystis ST14, shifts to asparagus, and then shifts back again to Blastocystis ST14.
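
The 'first third versus last third' check that Graham describes is easy to prepare with a few lines of Python. The sketch below only cuts a sequence into thirds and writes the two ends as FASTA records that can be pasted into BLAST separately; it does not run BLAST itself. The function names, the record labels and the toy sequence are my own choices for illustration.

```python
def split_into_thirds(seq):
    """Return the first, middle and last thirds of a nucleotide sequence."""
    seq = "".join(seq.split()).upper()  # drop whitespace and line breaks
    n = len(seq)
    a, b = n // 3, (2 * n) // 3
    return seq[:a], seq[a:b], seq[b:]


def thirds_as_fasta(name, seq):
    """Build FASTA text holding only the first and last thirds of `seq`."""
    first, _middle, last = split_into_thirds(seq)
    records = [(name + "_first_third", first), (name + "_last_third", last)]
    return "".join(">" + label + "\n" + part + "\n" for label, part in records)


# Toy sequence, invented for illustration (not real Blastocystis data).
toy_sequence = "ACGT" * 3 + "GGCC" * 3 + "TTAA" * 3
print(thirds_as_fasta("query1", toy_sequence))
```

If the two ends give very different top hits (say, a known subtype at one end and a plant at the other), you are probably looking at a chimaera rather than a new subtype.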



I thank Graham, and I really hope that this information will be picked up by many of our colleagues. And please share! Research into Blastocystis is rapidly expanding, and we should all take on the responsibility of QCing our data.

Thanks for listening!

By the way... if you're interested in tutorials on Blastocystis subtyping from our recent workshop in Colombia, please look up Workshop Session 4 in the manual available at this link. 

Hope to be back before Christmas!