Showing posts with label reference data. Show all posts

Thursday, January 30, 2020

Pre-empting Pandora's Box - Update on Blastocystis Subtypes and Reference Data

Back in 2006, when we came up with the subtype terminology for Blastocystis, the spectrum of and boundaries between Blastocystis subtypes were quite clear and distinct. Since then, the genetic make-up of Blastocystis has appeared to be an even bigger universe than we (or at least I) expected, and we may be far from having explored the entire 'galaxy' yet.

New technologies make it easier to sequence DNA, and sequences attributed to Blastocystis are accumulating in the publicly available databases with great speed. While this situation is one of the things that stimulate research (genetic diversity, co-evolution, host specificity, parasite-host-microbiome interaction, etc.), issues have emerged when it comes to quality-controlling DNA sequences and putting taxonomic identifiers on these sequences.

For Blastocystis, the main taxonomic identifier is the 'subtype'. In 2013, 17 subtypes of Blastocystis had been acknowledged based on SSU rDNA analysis, and since then, quite a few more have been suggested by independent researchers all around the world. While it's great to see the field advance and more and more researchers 'checking in' on Blastocystis, care should be taken to ensure that Blastocystis terminology remains a useful one. And this... is not an easy task!

Some things are relatively straightforward though. Sequence quality control, for instance. A simple BLAST query in GenBank (NCBI Database) should tell you whether your sequence is Blastocystis or something else. Like banana. Or asparagus. DNA sequence chimeras are sequences where one piece of DNA is combined with a piece of DNA from another strain/species/genus/etc., which can happen during PCR-based amplification of DNA. Suppose you have a sequence that is 75% Blastocystis and 25% banana. If you BLAST such a sequence, you might get Blastocystis as the top hit, but with a modest amount of sequence identity - maybe 85%. If you're not cautious, you might jump to the conclusion that this is a new subtype, since 85% similarity is a lot less than the 95-97% similarity that is used pragmatically to delimit the boundary between subtypes. But if you look carefully at the alignment of the query sequence and the reference sequence, you'll probably note that a large part of the sequence aligns very well to the most similar reference sequence, while a minor part of it shows great dissimilarity. This should be a warning sign, and you should try to BLAST only the bit of the sequence that does not align well... and when you do this, you might end up with... banana! In which case you would have to discard this part of the sequence. Please also see one of my recent posts for more on this. If you do not check for chimeras, you might end up including chimeric DNA sequences in your phylogenetic analyses, which will distort and confuse the interpretation and - in the worst case - lead to erroneous calling of new subtypes.
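To make the 'look carefully at the alignment' step a bit more concrete, here is a minimal Python sketch that computes sequence identity in sliding windows along a pairwise alignment and flags the windows where identity drops. It assumes that you already have the query and the best-hit reference as two aligned strings of equal length (for instance copied from a pairwise BLAST alignment). The function names, the window size and the 90% cut-off are illustrative assumptions on my part, not established values; flagged regions are simply candidates for a separate BLAST query.

```python
def windowed_identity(query_aln, ref_aln, window=50, step=25):
    """Percent identity per window over two aligned strings of equal length.

    Columns where both sequences have a gap ('-') are ignored.
    Returns a list of (start_column, percent_identity) tuples.
    """
    if len(query_aln) != len(ref_aln):
        raise ValueError("aligned sequences must have equal length")
    results = []
    last_start = max(len(query_aln) - window, 0)
    for start in range(0, last_start + 1, step):
        q = query_aln[start:start + window].upper()
        r = ref_aln[start:start + window].upper()
        cols = [(a, b) for a, b in zip(q, r) if not (a == "-" and b == "-")]
        if not cols:
            continue
        matches = sum(1 for a, b in cols if a == b)
        results.append((start, 100.0 * matches / len(cols)))
    return results


def suspicious_windows(query_aln, ref_aln, threshold=90.0, window=50, step=25):
    """Windows whose identity falls below the (assumed) threshold."""
    return [
        (start, identity)
        for start, identity in windowed_identity(query_aln, ref_aln, window, step)
        if identity < threshold
    ]


if __name__ == "__main__":
    # Toy data: the last quarter of the query does not match the reference.
    reference = "ACGT" * 50
    query = reference[:150] + "T" * 50
    for start, identity in suspicious_windows(query, reference):
        print(f"columns {start}-{start + 49}: {identity:.1f}% identity")
```

A sequence that matches well over most of its length but shows a block of low-identity windows at one end is the pattern described above; that block is the bit worth BLASTing on its own.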

What is less easy is to set a 'one-size-fits-all' threshold for sequence similarity... how similar must Blastocystis DNA sequences be to be considered the same subtype? When do you have evidence of a 'new' subtype? It's difficult to know, as long as the data available in public databases is as limited as it is. Moreover, researchers do not always use the same genetic markers. It's still common practice to amplify and sequence only about 1/3 of the SSU rRNA gene and use that as a taxonomic identifier. But if it's not the same 1/3, then it gets tricky to compare data. Moreover, we actually need near-complete SSU rDNA sequences (at least 1600 bp or so) to be able to infer robust phylogenetic relationships between reference sequences and sequences potentially reflecting new subtypes. Obviously, this is because variation can exist across the entire SSU rRNA gene.
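As a rough illustration of how these two rules of thumb could be combined, the Python sketch below triages an aligned query sequence against a set of aligned reference sequences. The 1600 bp minimum and the 95% boundary are taken from the figures mentioned in this post; the function names, the reference labels and the wording of the verdicts are hypothetical. This is a screening aid under those assumptions and not a method for calling subtypes, which still requires a proper phylogenetic analysis.

```python
MIN_LENGTH_BP = 1600       # "at least 1600 bp or so", as mentioned in the post
BOUNDARY_IDENTITY = 95.0   # lower end of the pragmatic 95-97% range (assumed cut-off)


def percent_identity(seq_a, seq_b):
    """Identity over alignment columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != "-" and b != "-"
    ]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


def triage(query_aln, references):
    """Compare an aligned query with aligned references.

    references: dict mapping a label (e.g. a subtype name) to an aligned sequence.
    Returns (best_label, best_identity, verdict).
    """
    ungapped_length = len(query_aln.replace("-", ""))
    best_label, best_identity = max(
        ((label, percent_identity(query_aln, ref)) for label, ref in references.items()),
        key=lambda item: item[1],
    )
    if ungapped_length < MIN_LENGTH_BP:
        verdict = "too short for robust phylogenetic inference"
    elif best_identity >= BOUNDARY_IDENTITY:
        verdict = "within the pragmatic range of the nearest reference"
    else:
        verdict = "below the pragmatic range: check for chimeras, then run a phylogeny"
    return best_label, best_identity, verdict
```

Called as `triage(my_aligned_sequence, {"ST1": st1_aligned, "ST2": st2_aligned})`, the function returns the nearest reference, the identity to it, and one of the three verdicts. Note that a short fragment is reported as too short regardless of how similar it is to a reference.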

One subtype that has proven particularly challenging is ST14. This subtype, which is common in larger herbivorous mammals, is very difficult to delimit, and it may easily be confused with other subtypes if sufficiently long sequences are not used for investigation. We therefore try to keep a pragmatic approach to Blastocystis subtype terminology, and it may turn out that it would be more practical and relevant to refer to ST24 and ST25 as ST14 (see figure below). For now, we suggest keeping them as separate subtypes. Near-complete Blastocystis SSU rDNA sequences from a lot of larger herbivorous mammals will help us resolve the taxonomy in the top part of the tree shown in the figure.

In terms of acquiring near-complete SSU rDNA sequences, I would personally recommend MinION sequencing of PCR products obtained with the universal eukaryotic primers RD5 + RD3. And if DNA from cultures is used (yes, it IS possible to culture Blastocystis not only from human hosts, but also from a variety of animals), then MinION sequencing and analysis of the data output should be a straightforward and relatively cost-effective task.

Figure. As of January 2020, 'real' Blastocystis subtypes are most likely subtypes 1–17, 21, 23–26. This simplified phylogeny gives an indication of the relatedness of the subtypes and their relative host specificity. Humans can host subtypes 1–9 and also 12; when subtypes other than 1–4 are encountered in human samples, this may reflect cases of zoonotic transmission.

Graham Clark and I just published an article in Trends in Parasitology on this, and we concluded that some of the newly proposed subtypes are in fact invalid. Invalid subtypes (subtypes 18, 19, 20, 22) typically reflected DNA sequence chimeras.

In the figure above, you can see the subtypes identified to date that we consider valid.
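For those who handle subtype calls in scripts, the status of each subtype number as described in this post (as of January 2020) can be written down as a small Python lookup. The sets are taken directly from the text and figure legend above; the function name and the returned fields are hypothetical, and the 'possibly zoonotic' flag only mirrors the cautious wording of the legend and is not a diagnosis.

```python
# Status as of January 2020, as stated in this post.
VALID_SUBTYPES = set(range(1, 18)) | {21} | set(range(23, 27))
INVALID_SUBTYPES = {18, 19, 20, 22}          # typically DNA sequence chimeras
HUMAN_SUBTYPES = set(range(1, 10)) | {12}    # subtypes that humans can host
COMMON_HUMAN_SUBTYPES = {1, 2, 3, 4}


def describe_subtype(number):
    """Summarise what this post says about a given subtype number."""
    if number in VALID_SUBTYPES:
        valid = True
    elif number in INVALID_SUBTYPES:
        valid = False
    else:
        valid = None  # not covered by the January 2020 overview
    in_humans = number in HUMAN_SUBTYPES
    return {
        "label": f"ST{number}",
        "valid": valid,
        "reported_in_humans": in_humans,
        # Subtypes other than 1-4 in human samples may reflect zoonotic transmission.
        "possibly_zoonotic_in_humans": in_humans and number not in COMMON_HUMAN_SUBTYPES,
    }
```

A value of `None` for `valid` means that the subtype number is not covered here, so the lookup would need updating as new subtypes are proposed and evaluated.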

We also provided updated guidelines on Blastocystis subtyping. One very important thing to include here is reference sequence data. It would be very useful if our wonderful Blasto colleagues could all try to use the same reference sequences when they develop multiple sequence alignments for phylogenetic analyses. We have already done all the work for you, so all you need to do is download the sequences from London School of Hygiene and Tropical Medicine's server available here and align them with your own DNA sequences. It would make life easier for all of us!


Corrected proofs of the article can be downloaded here.

Thanks for reading!

Wednesday, December 11, 2013

Molecular Epidemiology: Developing a Language

Initiatives towards standardising diagnostic methods and agreeing on taxonomy and reference data are extremely important in a world where multiple research teams independently carry out research using molecular markers to identify and differentiate species and genotypes of infectious organisms; such activity is crucial to identify patterns of transmission, differences in virulence, and opportunities for control and intervention. Without such standards, efforts to survey and surveil such organisms would be more or less futile, and so they are the backbone of molecular epidemiology.

Having seen that a variety of morphologically similar but genetically diverse Blastocystis organisms found in humans could in fact colonise a range of different hosts, we realised back in 2006 that these variants could not all be 'Blastocystis hominis', which was then the species name used for Blastocystis found in humans, and together with colleagues we took to revisiting Blastocystis terminology. We recognised that we did not know enough about host specificity and genetic diversity to be able to come up with relevant species names, and so we invented (or maybe not invented, but at least 'formalised') the subtype system, a sort of barcode system, where genetically similar (typically 98-100%) organisms are assigned to the same subtype, hence ST1, ST2, ST3, etc., which we now know so well.

Slapeta now suggests a barcoding system for Cryptosporidium. This single-celled parasite takes a major toll on the health of infants and toddlers in developing countries (in some places surpassed only by norovirus), and may also cause debilitating disease in immunocompromised individuals. The nomenclature for Cryptosporidium is very complicated for those of us who are not experts; for instance, I only recently realised that C. parvum may now only refer to the Mouse I genotype and not the 'common' or 'traditional' C. parvum (which now appears to be C. pestis), which is common in both humans and cattle. However, there is a debate going on as to which taxonomy should be followed, and whether this novel leap in 'Cryptosporidium taxonomy revision' can be endorsed by Slapeta's fellow Crypto experts remains to be seen. Contentiousness aside, barcoding Cryptosporidium does seem relevant because the host specificity of Cryptosporidium is relatively loose; for instance, humans and cattle are known to share at least 9 species of Cryptosporidium...

In his paper, Jan Slapeta lists all the known species of Cryptosporidium (in the 'revised' terminology), and even includes GenBank reference strains for common molecular markers such as actin, HSP70 and COWP1 used for genotyping. Interestingly, he does not include the GP60 marker, a molecular marker for which the terminology is also discordant.

Slapeta moreover includes a file with reference SSU rDNA sequences that enable a standardisation of genetic analyses. This year, we did in fact do a similar thing for Blastocystis: Along with our 2013 Protist paper surveying Blastocystis subtypes in animals (including the identification of a couple of new subtypes!), we uploaded a reference alignment consisting of some complete SSU rRNA gene sequences present in GenBank, with one or more for each of the 17 subtypes now known; more will be added as more subtypes are discovered. The file can be downloaded when accessing the online version of the paper, and we hope that everyone interested in analysing sequences that represent potentially novel subtypes will use this reference alignment (which has been edited to eliminate regions of ambiguous base alignment); it should be quite helpful. Again, I also bring your attention to the pubmlst Blastocystis database, where FASTA files obtained by Blastocystis barcoding can be queried in batches for quick analysis of large amounts of sequence data. There's a YouTube video here on Blastocystis barcoding and how to use the pubmlst database.

Consensus on methods, terminology and diagnostic algorithms is essential to developing a common language and understanding of how infectious organisms impact our lives; without it, confusion wreaks havoc with our efforts.


Alfellani MA, Taner-Mulla D, Jacob AS, Imeede CA, Yoshikawa H, Stensvold CR, & Clark CG (2013). Genetic diversity of Blastocystis in livestock and zoo animals. Protist, 164 (4), 497-509 PMID: 23770574

Kotloff KL, Nataro JP, Blackwelder WC, Nasrin D, Farag TH, Panchalingam S, Wu Y, Sow SO, Sur D, Breiman RF, Faruque AS, Zaidi AK, Saha D, Alonso PL, Tamboura B, Sanogo D, Onwuchekwa U, Manna B, Ramamurthy T, Kanungo S, Ochieng JB, Omore R, Oundo JO, Hossain A, Das SK, Ahmed S, Qureshi S, Quadri F, Adegbola RA, Antonio M, Hossain MJ, Akinsola A, Mandomando I, Nhampossa T, Acácio S, Biswas K, O'Reilly CE, Mintz ED, Berkeley LY, Muhsen K, Sommerfelt H, Robins-Browne RM, & Levine MM (2013). Burden and aetiology of diarrhoeal disease in infants and young children in developing countries (the Global Enteric Multicenter Study, GEMS): a prospective, case-control study. Lancet, 382 (9888), 209-22 PMID: 23680352

Šlapeta J (2013). Cryptosporidiosis and Cryptosporidium species in animals and humans: a thirty colour rainbow? International Journal for Parasitology, 43 (12-13), 957-70 PMID: 23973380

Stensvold CR, Suresh GK, Tan KS, Thompson RC, Traub RJ, Viscogliosi E, Yoshikawa H, & Clark CG (2007). Terminology for Blastocystis subtypes--a consensus. Trends in Parasitology, 23 (3), 93-6 PMID: 17241816

Striepen B (2013). Parasitic infections: Time to tackle cryptosporidiosis. Nature, 503 (7475), 189-91 PMID: 24236315

Xiao L, Ryan UM, Fayer R, Bowman DD, & Zhang L (2012). Cryptosporidium tyzzeri and Cryptosporidium pestis: which name is valid? Experimental Parasitology, 130 (3), 308-9 PMID: 22230707