Showing posts with label GenBank. Show all posts

Wednesday, May 2, 2012

Blastocystis Sequence Typing Home Page

Last year, we launched the Blastocystis Sequence Typing Home Page, which is a publicly accessible resource including two major facilities: 1) A sequence database and 2) An isolate database.
The databases cover both SSU-rDNA data and Multilocus Sequence Typing (MLST) data. For those interested in MLST, please visit this paper. The rest of this post will be about SSU-rDNA sequences.

The database has a BLAST function. Barcoding sequences (i.e. sequences which include the 500 5'-most bases in the SSU-rDNA) can be submitted individually or in bulk, and the output file will include information on subtype (ST) and allele. The number of alleles in ST3 is huge (currently n=38) compared to other subtypes, for some of which only 2-3 alleles have been identified (e.g. ST8). If you submit a sequence that does not match an allele already present in the database, I suggest that you do an individual sequence query; this generates an alignment that shows you the polymorphism(s). If a new allele is identified, I suggest that we submit it to the sequence database.
We strongly encourage using this BLAST feature for quick and standardised subtype and allele identification, and we also encourage submitting isolate data, i.e. barcode sequences with provenance data (data on host, symptoms, geographical origin, etc.). Both new alleles and isolate data can be submitted by contacting the curator (me); please look up the site for more information.
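To make the idea of reading polymorphisms off an alignment concrete, here is a minimal Python sketch. It is not how the database works internally; it simply assumes you already have a query barcode aligned to its closest reference allele (same length, gaps written as "-"). The function name `list_polymorphisms` and both toy sequences are invented for illustration.

```python
def list_polymorphisms(reference, query):
    """Return (position, reference_base, query_base) for every column
    where two pre-aligned sequences differ. Positions are 1-based
    alignment columns, and a gap ("-") counts as a difference."""
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    pairs = zip(reference.upper(), query.upper())
    return [(i + 1, r, q) for i, (r, q) in enumerate(pairs) if r != q]


# Toy sequences, made up for this example (not real Blastocystis alleles).
reference_allele = "ACGTTGCA-GGT"
query_barcode = "ACGTCGCAAGGT"

diffs = list_polymorphisms(reference_allele, query_barcode)
for position, ref_base, query_base in diffs:
    print(f"column {position}: {ref_base} -> {query_base}")
```

A query with no differences to any known allele is an exact match; one or more differences to the closest allele may point to a candidate new allele worth sending to the curator.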

Our goal is to produce a database which accommodates large sets of data that are open to scrutiny by everyone. The isolate database currently holds almost 700 isolates with 118 unique alleles - I hope this can be expanded much, much more. Also, data can be extracted at any time, and below is a random example of an extract from human and non-human data from France downloaded from GenBank:
The colours indicate different alleles in different hosts (see legend to the right). A file with all alleles in fasta format is available here. You can paste them into the search field here for a total list of alleles currently in the database; try clicking on a couple to familiarise yourself with the system... One of the things that we can see here is that alleles 34, 36, 37 (ST3) and allele 4 (ST1) are the most common alleles in humans in France. It may seem a bit confusing to speak of both subtypes AND alleles. However, alleles are a good proxy for MLST data, and hence, looking at alleles is useful, e.g. in terms of transmission studies.
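If you export isolate data yourself, a tally like the one described above is easy to reproduce. The sketch below assumes each exported row can be reduced to a (subtype, allele, host) tuple; the rows and counts shown are invented placeholders, not data from the isolate database, and `most_common_alleles` is a hypothetical helper name.

```python
from collections import Counter

# Invented example rows: (subtype, allele, host). Not real database content.
records = [
    ("ST3", 34, "human"),
    ("ST3", 34, "human"),
    ("ST3", 34, "human"),
    ("ST1", 4, "human"),
    ("ST1", 4, "human"),
    ("ST3", 36, "human"),
    ("ST1", 4, "non-human"),
    ("ST8", 21, "non-human"),
]


def most_common_alleles(rows, host, top=3):
    """Count (subtype, allele) pairs for one host group and return
    the `top` most frequent pairs with their counts."""
    counts = Counter((subtype, allele) for subtype, allele, h in rows if h == host)
    return counts.most_common(top)


for (subtype, allele), n in most_common_alleles(records, "human"):
    print(f"{subtype} allele {allele}: {n} isolate(s)")
```

Keeping subtype and allele together as the counting key avoids mixing up alleles from different subtypes.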

There are many other ways of extracting and visualising data from the isolate database. For more information on barcoding, subtypes, alleles, and the databases, please do not hesitate to contact me. I emphasise that the database only works with sequences that include the barcode region; multiple SSU-rDNA targets have been used for subtyping, but because this database is based on barcode data, we recommend that subtyping be done by barcoding (see references).

Useful literature:

Stensvold, C., Alfellani, M., & Clark, C. (2012). Levels of genetic diversity vary dramatically between Blastocystis subtypes. Infection, Genetics and Evolution, 12 (2), 263-273 DOI: 10.1016/j.meegid.2011.11.002

Scicluna SM, Tawari B, & Clark CG (2006). DNA barcoding of Blastocystis. Protist, 157 (1), 77-85 PMID: 16431158