The National Center for Genome Analysis Support (NCGAS) offers this three-day, online, semi-asynchronous workshop on high-performance computing (HPC) usage and transcriptome assembly, annotation, and analysis. The workshop consists of discussions, lectures, and hands-on tutorials and activities covering topics important to getting started with constructing and analyzing transcriptomes. While the focus is largely on de novo assembly, genome-guided transcriptome assembly and analysis are also discussed, and demo code is provided. Material covers both the availability and use of HPC resources and the task of assembling a new transcriptome, in order to provide a more comprehensive preparation for this and future bioinformatic tasks. The main case study uses four separate assemblers (Trinity, SOAP de novo, Velvet Oases, and TransABySS), each run with multiple kmers, whose outputs are combined and curated with Evigenes. An assembly combined from multiple assemblers and parameters is considered much more robust than one produced by a single assembler, and the NCGAS pipeline streamlines the process while allowing customization if desired. Downstream analyses such as differential expression, generating KEGG pathway images, and annotation using Trinotate will also be discussed. While the material makes heavy use of XSEDE and IU machines, it is transferable to any cluster.
The NCGAS was funded by the National Science Foundation under Grant Nos. DBI-1062432 2011, ABI-1458641 2015, and ABI-1759906 2018 to Indiana University.
Objectives
Participants should leave with the following knowledge:
- Familiarity with nationally available compute resources
- An understanding of the differences, pros, and cons of VMs, Gateways, Clusters, and Clouds
- How to run and optimize a job submission on a cluster
- How to manage large data sets and move data between resources
- How to run NCGAS’s transcriptome tools to produce robust transcriptomes
- How to check quality and clean up a de novo transcriptome
- Familiarity with some of the considerations in downstream analyses
- How to get help for both genomic and computational questions
Participant data will not be assembled during the workshop, but the entire pipeline will be used by participants with smaller scale demo data. There will be a limited number of slots to meet with NCGAS personnel to discuss your data. Online meetings after the workshop can also be scheduled.
Prerequisites
This workshop is geared toward beginners, but basic Unix commands are not covered. Participants must therefore have basic Linux proficiency (signing in, moving around the file system, etc.), though expertise is not required. It would be helpful if participants had some exposure to using a cluster for compute jobs and an idea of the end goals for their data.
Beginner tutorials in Linux (bash) can be found here:
Apply for the July 26-28, 2021 online workshop (applications due June 13, 2021)
Go to application (via REDCap survey)
Agenda
Day 1:
- Introduction to NCGAS staff
- Introduction to clusters and usage
- Optimizing jobs
- Overview of de novo transcriptome analysis pipeline
- Data management and movement tutorial
- National HPC resource availability discussion
Day 2:
- Introduction to the case study
- Introduction to RNAseq data and considerations
- QC of raw data
- Using and troubleshooting the workflow
- Annotation using Trinotate
- Introduction to KEGG pathway analysis
Day 3:
- Differential expression discussion and demo
- Genome-guided analysis discussion and demo
- Case study wrap up
Archived workshop: De novo Assembly of Transcriptomes using HPC resources
The National Center for Genome Analysis Support (NCGAS) offered this three-day workshop on HPC usage and de novo transcriptome assembly. The workshop consisted of discussions, lectures, and hands-on tutorials covering topics important to getting started with constructing and analyzing transcriptomes without the use of a genome. Material covered both the availability and use of high-performance computing (HPC) resources and the task of assembling a new transcriptome, in order to provide a more comprehensive preparation for this and future bioinformatic tasks. Transcriptome assembly used four separate assemblers (Trinity, SOAP de novo, Velvet Oases, and TransABySS), each run with multiple kmers, whose outputs were combined and curated with Evigenes. An assembly combined from multiple assemblers and parameters is considered much more robust than one produced by a single assembler, and the NCGAS pipeline streamlines the process while allowing customization if desired. The workshop also covered downstream analyses such as differential expression and annotation. While the material made heavy use of XSEDE and IU machines, it is transferable to any cluster.
The NCGAS was funded by the National Science Foundation under Grant Nos. DBI-1062432 2011, ABI-1458641 2015, and ABI-1759906 2018 to Indiana University.
Access the course materials
De novo Assembly of Transcriptomes (on GitHub)
Go to the archive of talks from the workshop
De novo Assembly of Transcriptomes (on YouTube)
The Supercomputing for Everyone Series (SC4ES) aims to bring more users into the realm of advanced computing, whether it be visualization, computation, analytics, storage, or any related discipline. Research Technologies can take you to the next level of computing.
Supercomputing for Everyone Series workshops and seminars are led by personnel from Research Technologies, a division of University Information Technology Services and a center in the Pervasive Technology Institute at Indiana University.