Showing posts with label phylogeny.

Thursday, January 30, 2020

Pre-empting Pandora's Box - Update on Blastocystis Subtypes and Reference Data

Back in 2006, when we came up with the subtype terminology for Blastocystis, the spectrum of Blastocystis subtypes and the boundaries between them were quite clear and distinct. Since then, the genetic make-up of Blastocystis has turned out to be an even bigger universe than we (or at least I) expected, and we may be far from having explored the entire 'galaxy' yet.

New technologies make it easier to sequence DNA, and sequences attributed to Blastocystis are accumulating in the publicly available databases with great speed. While this situation is one of the things that stimulate research (genetic diversity, co-evolution, host specificity, parasite-host-microbiome interaction, etc.), issues have emerged when it comes to quality-controlling DNA sequences and assigning taxonomic identifiers to them.

For Blastocystis, the main taxonomic identifier is the 'subtype'. In 2013, 17 subtypes of Blastocystis had been acknowledged based on SSU rDNA analysis, and since then, quite a few more have been suggested by independent researchers all around the world. While it's great to see the field advance and more and more researchers 'checking in' on Blastocystis, care should be taken to ensure that Blastocystis terminology remains useful. And this... is not an easy task!

Some things are relatively straightforward, though, such as sequence quality control. A simple BLAST query in GenBank (NCBI Database) should tell you whether your sequence is Blastocystis or something else. Like banana. Or asparagus. DNA sequence chimeras are sequences in which a piece of DNA from one strain/species/genus/etc. is joined to a piece of DNA from another; this can happen during PCR-based amplification of DNA. Suppose you have a sequence that is 75% Blastocystis and 25% banana. If you BLAST such a sequence, you might get Blastocystis as the top hit, but with only modest sequence identity, maybe 85%. If you're not cautious, you might jump to the conclusion that this is a new subtype, since 85% is a lot less than the 95-97% similarity that is used pragmatically to delimit the boundary between subtypes. But if you look carefully at the alignment of the query sequence and the reference sequence, you'll probably note that a large part of the query aligns very well to the most similar reference sequence, while a minor part of it is highly dissimilar. This should be a warning sign, and you should BLAST only the part of the sequence that does not align well... and when you do this, you might end up with... banana! In which case you would have to discard this part of the sequence. Please also see one of my recent posts for more on this. If you do not check for chimeras, you might end up including chimeric DNA sequences in your phylogenetic analyses, which will distort and confuse the interpretation and, in the worst case, lead to the erroneous designation of new subtypes.
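The 'look carefully at the alignment' step can be roughed out in code. The sketch below is a hypothetical helper, not a published chimera-detection tool: it assumes you already have a gapped pairwise alignment of your query and its closest Blastocystis reference as two equal-length strings, scans it in windows, and reports windows whose identity drops below a cut-off. The window size, step and 80% cut-off are arbitrary illustration values, and a flagged window is only a hint; you would still BLAST that stretch on its own, as described above.

```python
from typing import List, Tuple


def window_identity(query_aln: str, ref_aln: str,
                    window: int = 50, step: int = 25) -> List[Tuple[int, float]]:
    """Identity (0-1) in sliding windows along a gapped pairwise alignment.

    Both strings must be the same length (one alignment column per character).
    Columns with a gap in either sequence are skipped within each window.
    Returns (0-based start column, identity) for each window.
    """
    if len(query_aln) != len(ref_aln):
        raise ValueError("aligned sequences must have the same length")
    q_aln, r_aln = query_aln.upper(), ref_aln.upper()
    last_start = max(len(q_aln) - window, 0)
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)  # make sure the 3' end is covered too
    results = []
    for start in starts:
        pairs = [(q, r) for q, r in zip(q_aln[start:start + window],
                                        r_aln[start:start + window])
                 if q != "-" and r != "-"]
        if not pairs:
            continue  # window consists of gaps only
        matches = sum(q == r for q, r in pairs)
        results.append((start, matches / len(pairs)))
    return results


def suspect_windows(query_aln: str, ref_aln: str, window: int = 50,
                    step: int = 25, min_identity: float = 0.80) -> List[Tuple[int, float]]:
    """Windows whose identity falls below min_identity (a possible 'alien' segment)."""
    return [(start, ident)
            for start, ident in window_identity(query_aln, ref_aln, window, step)
            if ident < min_identity]
```

With a 75% Blastocystis / 25% 'banana' query, you would expect the flagged windows to cluster at one end of the alignment rather than be scattered along its whole length.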

What is less easy is to set a 'one-size-fits-all' threshold for sequence similarity: how similar must Blastocystis DNA sequences be to be considered the same subtype? When do you have evidence of a 'new' subtype? It's difficult to know as long as the data available in public databases remain as limited as they are. Moreover, researchers do not always use the same genetic markers. It's still common practice to amplify and sequence only about 1/3 of the SSU rRNA gene and use that as a taxonomic identifier, but if two studies have not sequenced the same 1/3, comparing their data gets tricky. What we actually need are near-complete SSU rDNA sequences (at least 1600 bp or so) to be able to infer robust phylogenetic relationships between reference sequences and sequences potentially reflecting new subtypes, because variation can exist across the entire SSU rRNA gene.
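To make the '1/3 of the gene' problem concrete: if two partial sequences are placed on the coordinates of a common near-complete reference, the stretch over which they can actually be compared is just the intersection of the two intervals. The sketch below assumes such 1-based, inclusive coordinates are already known (for instance from aligning both sequences to the same reference); the function names and example coordinates are made up for illustration, and the 1600 bp default simply mirrors the rough figure given above.

```python
from typing import Tuple

Region = Tuple[int, int]  # (start, end) on a shared reference, 1-based, inclusive


def comparable_bp(region_a: Region, region_b: Region) -> int:
    """Number of reference positions covered by both partial sequences."""
    start = max(region_a[0], region_b[0])
    end = min(region_a[1], region_b[1])
    return max(0, end - start + 1)


def is_near_complete(seq: str, min_length: int = 1600) -> bool:
    """True if the sequence holds at least min_length unambiguous bases (A/C/G/T)."""
    return sum(base in "ACGT" for base in seq.upper()) >= min_length


# Hypothetical coordinates: two amplicons covering different parts of the gene.
print(comparable_bp((1, 600), (500, 1100)))   # 101 positions in common
print(comparable_bp((1, 600), (1200, 1800)))  # 0: nothing to compare
```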

One subtype that has proven particularly challenging is ST14, which is common in larger herbivorous mammals and is very difficult to delimit. It may easily be confused with other subtypes if sufficiently long sequences are not used for investigation. For this reason, we try to keep a pragmatic approach to Blastocystis subtype terminology, and it may turn out to be more practical and relevant to refer to ST24 and ST25 as ST14 (see figure below). For now, we suggest keeping them as separate subtypes. Near-complete Blastocystis SSU rDNA sequences from many more larger herbivorous mammals will help us resolve the taxonomy in the top part of the tree.

In terms of acquiring near-complete SSU rDNA sequences, I would personally recommend MinION sequencing of PCR products obtained with the universal eukaryotic primers RD5 + RD3. And if DNA from cultures is used (yes, it IS possible to culture Blastocystis not only from human hosts, but also from a variety of animals), then MinION sequencing and analysis of the data output should be a straightforward and relatively cost-effective task.
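Once MinION reads are in hand, one simple first step is to keep only reads long enough to represent near-complete SSU rDNA. The sketch below is a minimal, dependency-free length filter, assuming standard four-line FASTQ records; the function names are hypothetical and the 1600 bp minimum echoes the figure mentioned earlier rather than a validated cut-off. Real nanopore workflows would normally also filter on read quality and then build a consensus, which is not shown here.

```python
from typing import Iterable, Iterator, Optional, Tuple

Record = Tuple[str, str, str]  # (header, sequence, quality)


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Parse four-line FASTQ records from any iterable of lines (e.g. an open file)."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        try:
            seq = next(it).rstrip("\n")
            plus = next(it)
            qual = next(it).rstrip("\n")
        except StopIteration:
            raise ValueError(f"truncated FASTQ record: {header!r}") from None
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield header, seq, qual


def filter_by_length(lines: Iterable[str], min_len: int = 1600,
                     max_len: Optional[int] = None) -> Iterator[Record]:
    """Yield only reads whose length lies within [min_len, max_len]."""
    for header, seq, qual in read_fastq(lines):
        if len(seq) >= min_len and (max_len is None or len(seq) <= max_len):
            yield header, seq, qual
```

Because the parser accepts any iterable of lines, it can be used directly on a file, e.g. `with open("reads.fastq") as fh: kept = list(filter_by_length(fh))` (the file name is hypothetical).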

Figure. As of January 2020, 'real' Blastocystis subtypes are most likely subtypes 1–17, 21, 23–26. This simplified phylogeny gives an indication of the relatedness of the subtypes and the relative host specificity. Humans can host subtypes 1–9 and also 12; when subtypes other than 1–4 are encountered in human samples, this may reflect cases of zoonotic transmission.


Graham Clark and I just published an article in Trends in Parasitology on this, and we concluded that some of the newly proposed subtypes are in fact invalid. Invalid subtypes (subtypes 18, 19, 20, 22) typically reflected DNA sequence chimeras.

In the figure above, you can see the subtypes identified to date that we consider valid.

We also provided updated guidelines on Blastocystis subtyping. One very important element here is reference sequence data. It would be very useful if our wonderful Blasto colleagues could all use the same reference sequences when they build multiple sequence alignments for phylogenetic analyses. We have already done all the work for you, so all you need to do is download the sequences from the London School of Hygiene and Tropical Medicine's server (available here) and align them with your own DNA sequences. It would make life easier for all of us!
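In practice, aligning the reference sequences with your own starts with putting both into one FASTA file. Here is a small sketch of that step, assuming both are plain FASTA text; the helper names are hypothetical, and the alignment itself is left to whatever aligner you normally use. Checking for duplicate sequence IDs up front can save trouble later, since some alignment and phylogeny programs reject duplicate names.

```python
from typing import Iterable, List, Tuple

FastaRecord = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(lines: Iterable[str]) -> List[FastaRecord]:
    """Parse FASTA text (sequences may be wrapped over several lines)."""
    records: List[FastaRecord] = []
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def combine_for_alignment(references: List[FastaRecord],
                          own: List[FastaRecord]) -> List[FastaRecord]:
    """References first, then your own sequences; upper-cased, duplicate IDs rejected."""
    seen = set()
    combined: List[FastaRecord] = []
    for header, seq in references + own:
        if not header.strip():
            raise ValueError("empty FASTA header")
        seq_id = header.split()[0]  # ID = first word of the header
        if seq_id in seen:
            raise ValueError(f"duplicate sequence ID: {seq_id}")
        seen.add(seq_id)
        combined.append((header, seq.upper()))
    return combined


def to_fasta(records: List[FastaRecord], width: int = 70) -> str:
    """Write records back out as FASTA, wrapping sequences at `width` characters."""
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```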

🌞🌞🌞🌞🌞🌞🌞🌞🌞🌞🌞🌞🌞🌞🌞🌞🌞🌞🌞

Corrected proofs of the article can be downloaded here.

Thanks for reading!

Thursday, December 6, 2018

Is this a new Blastocystis subtype? Maybe not! Here's Why!

The genetic diversity of Blastocystis is becoming comparable to the universe! Seventeen subtypes (which are likely separate species or even genera) have been acknowledged so far, but quite a few more have been mentioned.

However, before assigning new Blastocystis subtype numbers to your SSU rDNA sequences, you'd need to do some QC work on your data. Sometimes we notice sequences deposited in the NCBI Database or included in articles that may look like new Blastocystis subtypes.... but they're most likely not!

I asked Prof Graham Clark from London School of Hygiene and Tropical Medicine, who has more than 20 years' experience in the Blasto business, to give a couple of examples, explaining where issues may arise. He says:


'One of the tasks I do when I have a few minutes to spare is to look at new Blastocystis sequences that have been deposited into GenBank. I am always hoping to stumble across some exciting new subtypes or new hosts that will expand our understanding of diversity in Blastocystis. Only rarely does this happen, however. I do, occasionally, come across sequences that are problematic and it is these that I want to focus on.

Chimaeras: This problem occurs during PCR amplification when one primer binds to a Blastocystis subtype DNA and the other primer binds to a different source of DNA. In the first case I came across, the other source was a different Blastocystis subtype, meaning that the sequence at one end of the PCR product matched one subtype and the sequence at the other end matched a different subtype. This observation is mentioned in the paper describing barcoding of Blastocystis (Scicluna et al., 2006). Since then I have seen other chimaeric sequences: one recently was a mixture of Blastocystis plus a plant while another was Blastocystis plus a free-living protist.
Chimaeras are produced when there is incomplete replication of a DNA strand during a cycle. After denaturation in the next cycle, the single stranded partial product can bind to another single stranded product from a different source and synthesis results in a product combining sequences from two sources. The conservation of ribosomal RNA genes means there can be sufficient similarity to allow binding between sequences from distantly related organisms.
Chimaeras are generally only found when the sequences are from cloned ribosomal RNA gene sequences obtained by PCR, although they also occur in some forms of Next Generation Sequencing. When mixed PCR products are sequenced directly, the sequence obtained is the average of all the products in that reaction, and so chimaera sequences will usually be ‘diluted out’ by the major product of the reaction. Only when a single sequence from that mixture is isolated and studied will chimaeras be detected.
If the ‘alien’ region makes up a significant percentage of the sequence then the result of BLAST analysis will show a percentage divergence from known subtypes that indicates it may represent a new subtype. A quick way to evaluate this is to compare the BLAST results using the first and last thirds of the sequence. If it is a new subtype the results should be similar. In a recently detected chimaera, the first third was a 100% match to a known Blastocystis subtype while the last third was a 95% match to asparagus. This approach is an easy way to check whether there is something to get excited about.
A chimaera sequence can sometimes be detected because of its impact on phylogenetic trees. The sequence will be on its own branch, often at the base of a clade containing the subtype found at the Blastocystis-matching end.

Non-Blastocystis Blastocystis sequences: Like chimaeras these are often PCR artefacts, most commonly encountered when amplifying from stool DNA, especially if the stool is non-human. There is an expectation that Blastocystis-specific primers will only amplify Blastocystis DNA but, sadly, that is not always the case. I have personally seen this many times: if Blastocystis DNA is a minority of the eukaryotic DNA in the sample, then the likelihood of artefacts increases greatly. These are generally identified easily if the sequence is compared using BLAST against the full nr/nt nucleotide collection in GenBank. However, there is a temptation to limit the search to the genus Blastocystis to speed up the identification process, because that is what you expect it to be. Again, because of the conservation of ribosomal RNA genes, if ribosomal RNA genes are amplified there will be a match to Blastocystis, and the divergence will likely suggest, again, a new subtype. Comparing against the full nucleotide collection will always show whether the sequence is of Blastocystis origin.

Both chimaeras and non-Blastocystis products are easily identified if the correct steps are taken. In conclusion, be suspicious of anything that is significantly divergent from known Blastocystis – it could be an indication of an artefact.'
Fig. 1. A 'Blastaragus' (a chimaera of a Blastocystis and an asparagus)

Fig. 2. An example of a chimaeric DNA sequence (the 'Blastaragus' from Fig. 1). Notice how the consensus sequence starts out as Blastocystis ST14, shifts to asparagus, and then shifts back again to Blastocystis ST14.
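Graham's 'first third versus last third' check is easy to prepare for BLAST. The sketch below (hypothetical helper names) cuts a sequence into its first and last thirds and writes them as two FASTA records that can be pasted into a nucleotide BLAST search against the full nr/nt collection; the middle third is simply left out, as in the quick check described above. If both parts return the same Blastocystis subtype at similar identity, there may be something to get excited about; if one end hits asparagus, you are probably looking at a 'Blastaragus'.

```python
from typing import Tuple


def first_and_last_thirds(seq: str) -> Tuple[str, str]:
    """Return the first and the last third of a sequence (middle third dropped)."""
    seq = "".join(seq.split()).upper()  # remove line breaks/spaces, normalise case
    third = len(seq) // 3
    return seq[:third], seq[len(seq) - third:]


def thirds_as_fasta(seq_id: str, seq: str, width: int = 70) -> str:
    """Two FASTA records, <seq_id>_first_third and <seq_id>_last_third."""
    first, last = first_and_last_thirds(seq)
    lines = []
    for label, part in (("first_third", first), ("last_third", last)):
        lines.append(f">{seq_id}_{label}")
        lines.extend(part[i:i + width] for i in range(0, len(part), width))
    return "\n".join(lines) + "\n"
```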



I thank Graham, and I really hope that this information will be picked up by many of our colleagues. And please share! Research into Blastocystis is rapidly expanding, and we should all take responsibility for QCing our data.

Thanks for listening!

By the way... if you're interested in tutorials on Blastocystis subtyping from our recent workshop in Colombia, please look up Workshop Session 4 in the manual available at this link. 

Hope to be back before Christmas!

Thursday, April 10, 2014

Resources For Blastocystis Epidemiology Research

 I often get questions related to Blastocystis epidemiology research, and many of these are 'how-to' questions.

And as announced, I've chosen to dedicate a separate post listing some easy-to-use tools for subtyping Blastocystis from humans and animals.

First, I want to guide your attention to the YouTube video that I made; it takes you through various important steps of subtyping and introduces you to the online database that can be used to call subtypes by BLASTing batches of fasta files - provided that they are the right ones! And what do I mean by 'right ones'? Well, in order to get subtype information in a split second you need to have DNA sequences covering the first 500 base pairs (5'-end) of the Blastocystis small subunit (SSU) rRNA gene.


The online query database can be found here, and as you can see, it has a 'Sequence and profiles definition' section and an 'Isolates database' section; for now, never mind the latter. Now, to test this, click 'Sequence and profiles definition', then click the 'Sequence query' link, copy the following fasta file and paste it into the query box:

>gi|359391562|gb|JN682513.1|
CTGCCAGTAGTCATACGCTCGTCTCAAAGATTAAGCCATGCATGTGTAAGTATAAATATTTGACTTTGAA
ACTGCGAATGGCTCATTATATCAGTTATAGTTTATTTGATGAACAATACTACTTGGATAACCGTAGTAAT
TCTAGAGCTAATACATGACAAAATCCTCGACTTTGAAGAGGTGTATTTATTAGAATGAAACCAAGAGACT
TCGGTCTATTTGTGAGTAATAATAACTAATCGTATCGCATGCTTAGGTAGCGATATGTCTTTCAAGTTTC
TGCCCTATCAGCTTTGGATGGTAGTGTATTGGACTACCATGGCAGTAACGGGTAACGAAGAATTTGGGTT
CGATTTCGGAGAGGGAGCCTGAGAGATGGCTACCACATCCAAGGAAGGCAGCAGGCGCGTAAATTACCCA
ATCCTGACATAGGGAGGTAGTGACAATAAATCACAATGCGGAACTATTAGTTTTGCAATTGGATTGAGAA
CAATGTACAAATGTTATCGATAAACAATTGGAGGGCAAGTCTGGTGCCAGCAGCCGCGGTAATTCCAGCT
CCAATAGCGTATATTAACGTTGTTGCAGTTAAAAAGCTCGTAGTTGAATTGAAGTGAACTTGGATTGATG
TGATCTTCGGATGACGTGAATCAAAGTTGACTCTTTCCAAAGTCAATACATTGGTATTCATTTATCTTTG
TAT

 Submit your query, and then what you see is this:

This means that a 100% identity was found and that the sequence you pasted in was ST4, allele no. 94. This allele belongs to the rare genotype of Blastocystis sp. ST4.

Now, even if you have a non-Blastocystis sequence, you will sometimes get a result, provided the gene region is the correct one, and this is where you need to be particularly careful. Below is a sequence of Saccharomyces cerevisiae, which may be amplified by the barcoding primers; try pasting it into the query box and submitting it for analysis:

>Saccharomyces_cerevisiae_(J01353)
TATCTGGTTGATCCTGCCAGTAGTCATATGCTTGTCTCAAAGATTAAGCCATGCATGTCTAAGTATAAGCAATTTATACAGTGAAACTGCGAATGGCTCATTAAATCAGTTATCGTTTATTTGATAGTTCCTTTACTACA
TGGTATAACCGTGGTAATTCTAGAGCTAATACATGCTTAAAATCTCGACCCTTTGGAAGAGATGTATTTATTAGATAAAAAATCAATGTCTTCGGACTCTTTGATGATTCATAATAACTTTTCGAATCGCATGGCCTTGT
GCTGGCGATGGTTCATTCAAATTTCTGCCCTATCAACTTTCGATGGTAGGATAGTGGCCTACCATGGTTTCAACGGGTAACGGGGAATAAGGGTTCGATTCCGGAGAGGGAGCCTGAGAAACGGCTACCACATCCAAGGA
AGGCAGCAGGCGCGCAAATTACCCAATCCTAATTCAGGGAGGTAGTGACAATAAATAACGATACAGGGCCCATTCGGGTCTTGTAATTGGAATGAGTACAATGTAAATACCTTAACGAGGAACAATTGGAGGGCAAGTCT
GGTGCCAGCAGCCGCGGTAATTCCAGCTCCAATAGCGTATATTAAAGTTGTTGCAGTTAAAAAGCTCGTAGTTGAACTTTGGGCCCGGTTGGCCGGTCCGATTTTTTCGTGTACTGGATTTCCAACGGGGCCTTTCCTTC


What you'll see is this:


As you can see, there are many mismatches in the alignment, so this is not allele 42 (ST4); of course not, it's not even Blastocystis! This is why I suggest you always run a nucleotide BLAST of your fasta files against the NCBI database (use this link). Only if they match Blastocystis should you go ahead and call the subtype and the allele using the pubmlst.org/blastocystis database.
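If you run BLAST from the command line rather than through the web page, the 'does it match Blastocystis?' check can be scripted. The sketch below assumes tabular output for a single query, requested with a custom column list such as `-outfmt "6 qseqid sseqid pident length evalue bitscore sscinames"`; the `sscinames` column relies on NCBI taxonomy information being available to BLAST, so treat this column layout as an assumption to adapt to your own setup. The helper names are hypothetical; the code simply picks the highest-bitscore hit and checks whether its scientific name mentions Blastocystis.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Hit:
    sseqid: str
    pident: float
    bitscore: float
    sciname: str


def top_hit(lines: Iterable[str]) -> Optional[Hit]:
    """Highest-bitscore hit from tab-separated BLAST output for ONE query.

    Assumed columns: qseqid sseqid pident length evalue bitscore sscinames
    """
    best: Optional[Hit] = None
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines (as written by -outfmt 7)
        fields = line.split("\t")
        if len(fields) < 7:
            raise ValueError(f"expected 7 tab-separated columns, got: {line!r}")
        hit = Hit(sseqid=fields[1], pident=float(fields[2]),
                  bitscore=float(fields[5]), sciname=fields[6])
        if best is None or hit.bitscore > best.bitscore:
            best = hit
    return best


def top_hit_is_blastocystis(lines: Iterable[str]) -> bool:
    """True only if the best-scoring hit is annotated as Blastocystis."""
    hit = top_hit(lines)
    return hit is not None and "blastocystis" in hit.sciname.lower()
```

Note that a Blastocystis top hit at modest identity is exactly the situation in which the chimaera checks described earlier are still needed.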

If you have a Blastocystis sequence that exhibits polymorphism compared to the reference sequences in the Blastocystis database, it may be due to one of two reasons: 1) The sequence may be unclear and/or edited erroneously, or 2) the sequence represents a new allele or a new subtype.

This means that if your sequence does not match those in the database 100%, I suggest you take a meticulous look at it, and if there are unclear sections, re-sequence the whole lot, preferably bidirectionally. If you end up with a clear sequence which still exhibits one or more polymorphisms, then please submit it to the database; you can do so by contacting the curator, who is basically me.
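Before re-sequencing or submitting, it helps to list exactly where your sequence departs from the closest reference allele. The sketch below assumes you already have the two sequences aligned to equal length (gaps as '-'); it reports each differing column and separately flags IUPAC ambiguity codes such as N or R, which often point to an unclear chromatogram (or a mixed template) rather than a clean polymorphism. The function names are hypothetical.

```python
from typing import List, Tuple

IUPAC_AMBIGUOUS = set("NRYKMSWBDHV")


def differing_columns(query_aln: str, ref_aln: str) -> List[Tuple[int, str, str]]:
    """(1-based alignment column, reference base, query base) for each mismatch or gap."""
    if len(query_aln) != len(ref_aln):
        raise ValueError("aligned sequences must have the same length")
    return [(col, r, q)
            for col, (q, r) in enumerate(zip(query_aln.upper(), ref_aln.upper()), start=1)
            if q != r]


def ambiguous_positions(query_aln: str) -> List[int]:
    """1-based columns holding IUPAC ambiguity codes: candidates for re-sequencing."""
    return [col for col, base in enumerate(query_aln.upper(), start=1)
            if base in IUPAC_AMBIGUOUS]
```

If the differences coincide with ambiguity codes, re-sequencing is the natural next step; clean differences in a clear, bidirectionally confirmed sequence are what would be worth submitting.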

What you want is sequences looking like this:



For sequence editing you may want to use CHROMAS or FinchTv. These are good for editing single sequence reads. If I do bidirectional sequencing, or when I have multiple sequences covering a gene (for instance when I'm sequencing complete SSU rRNA genes), I use STADEN Package; installing it may be a pain, though (make sure you use the right browser, for starters). Once it has been installed, it works brilliantly, and the SOP I made for it is available below (please note that I made this SOP a couple of years ago; more recent software versions are available).




When is a subtype a novel subtype? Well, we addressed this question in our recent review in Advances in Parasitology. If you cannot access this journal, I suggest you look it up in the LSHTM Online Library, where you can find the pre-print version (go here to download). If you think you're dealing with a new subtype (less than 97-98% identity to reference sequences in GenBank), I suggest you look up this blog post. Importantly, please note that an alignment of reference sequences (representing all 17 subtypes currently known) is available here; however, it requires access to the journal (look up 'Supplementary content', where there's a notepad file you can download). I hope colleagues will use this alignment for phylogenetic analysis of Blastocystis SSU rRNA genes, since this is one important step towards further standardisation of Blastocystis terminology.
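The '97-98% identity' rule of thumb can be written down as a simple triage step. The sketch below assumes a gapped pairwise alignment given as equal-length strings and computes identity as identical columns divided by all columns except gap-gap ones (similar to BLAST's percent identity over an alignment); the function names are hypothetical and the 97% default is a parameter, not an official cut-off. Falling below it is a prompt to run the chimaera and non-Blastocystis checks and a full phylogenetic analysis with the reference alignment, not evidence of a new subtype in itself.

```python
from typing import Tuple


def percent_identity(query_aln: str, ref_aln: str) -> float:
    """Identical columns / alignment columns (gap-gap columns ignored), in percent."""
    if len(query_aln) != len(ref_aln):
        raise ValueError("aligned sequences must have the same length")
    columns = [(q, r) for q, r in zip(query_aln.upper(), ref_aln.upper())
               if not (q == "-" and r == "-")]
    if not columns:
        raise ValueError("alignment contains no informative columns")
    matches = sum(q == r for q, r in columns)  # q == r can no longer be gap-gap here
    return 100.0 * matches / len(columns)


def triage(query_aln: str, ref_aln: str, threshold: float = 97.0) -> Tuple[float, str]:
    """Return the identity and a suggested next step based on the threshold."""
    identity = percent_identity(query_aln, ref_aln)
    if identity >= threshold:
        return identity, "consistent with the reference subtype"
    return identity, ("below threshold: rule out chimaeras and non-Blastocystis "
                      "sequences, then run a phylogenetic analysis")
```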

Other useful free online software:

For quick nucleotide alignments (which group your sequences in clusters) you can use MultAlin; choose the 'DNA - 5-0' option from the alignment parameters drop-down menu. Trick: I usually do alignments in MultAlin, and once I get the alignment, I choose the 'Results as fasta files' option (scroll to the bottom of the page). This gives you an inventory of aligned fasta files that you can copy and paste directly into the 'build DNA alignment' function in MEGA6... now you can, for instance, search for specific DNA signatures (unfortunately, this option is not available in the MultAlin output), and you can do phylogeny too.
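Pasting aligned fasta output into MEGA goes more smoothly if you first confirm that it really is an alignment, i.e. that every sequence has the same length once line breaks are removed. The sketch below is a hypothetical pre-flight check, not part of MultAlin or MEGA; it reports the common alignment length and rejects duplicate sequence names.

```python
from typing import Iterable, List, Tuple


def read_aligned_fasta(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse FASTA text into (name, sequence) pairs, joining wrapped lines."""
    records: List[Tuple[str, str]] = []
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def check_alignment(records: List[Tuple[str, str]]) -> int:
    """Return the common alignment length; raise if lengths differ or names repeat."""
    if not records:
        raise ValueError("no sequences found")
    names = [name for name, _ in records]
    if len(set(names)) != len(names):
        raise ValueError("duplicate sequence names")
    lengths = {name: len(seq) for name, seq in records}
    if len(set(lengths.values())) != 1:
        raise ValueError(f"sequences differ in length: {lengths}")
    return len(records[0][1])
```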

And so, for alignment and phylogeny, I recommend MEGA6 or any more recent version.

Useful papers:

Scicluna SM, Tawari B, & Clark CG (2006). DNA barcoding of Blastocystis. Protist, 157 (1), 77-85 PMID: 16431158 

Stensvold CR (2013). Comparison of sequencing (barcode region) and sequence-tagged-site PCR for Blastocystis subtyping. Journal of Clinical Microbiology, 51 (1), 190-4 PMID: 23115257 

Alfellani MA, Taner-Mulla D, Jacob AS, Imeede CA, Yoshikawa H, Stensvold CR, & Clark CG (2013). Genetic diversity of Blastocystis in livestock and zoo animals. Protist, 164 (4), 497-509 PMID: 23770574 

Stensvold CR (2013). Blastocystis: Genetic diversity and molecular methods for diagnosis and epidemiology. Tropical Parasitology, 3 (1), 26-34 PMID: 23961438 

Alfellani MA, Stensvold CR, Vidal-Lapiedra A, Onuoha ES, Fagbenro-Beyioku AF, & Clark CG (2013). Variable geographic distribution of Blastocystis subtypes and its potential implications. Acta Tropica, 126 (1), 11-8 PMID: 23290980 

Clark CG, van der Giezen M, Alfellani MA, & Stensvold CR (2013). Recent developments in Blastocystis research. Advances in Parasitology, 82, 1-32 PMID: 23548084

Stensvold CR, Ahmed UN, Andersen LO, & Nielsen HV (2012). Development and evaluation of a genus-specific, probe-based, internal-process-controlled real-time PCR assay for sensitive and specific detection of Blastocystis spp. Journal of Clinical Microbiology, 50 (6), 1847-51 PMID: 22422846

Stensvold CR, Suresh GK, Tan KS, Thompson RC, Traub RJ, Viscogliosi E, Yoshikawa H, & Clark CG (2007). Terminology for Blastocystis subtypes – a consensus. Trends in Parasitology, 23 (3), 93-6 PMID: 17241816

Moreover, the London School of Hygiene and Tropical Medicine Online Library currently comprises 25 papers on Blastocystis, most of which can be accessed for free (pre-print versions) here.

This blog post might be updated later on, so you may want to subscribe to blog updates; you can do so using the designated function in the sidebar. If you have any suggestions on how to improve this post, feel free to contact me.