The Sequence Read Archive (SRA) hosts ~11 PB of data from across the world. Mining this database is useful for research and lets researchers add more datasets to their work at no cost. The goal of this workshop is to help researchers who work with SRA, or are interested in doing so, run their bioinformatics workflows efficiently using computational resources available through NCGAS/XSEDE.
By the end of this workshop, participants should leave with the following knowledge:
- Introduction to NCGAS and HPC
- Bioinformatics programs available to mine SRA
- R to visualize the data
- Basic familiarity with Linux (signing in, moving around the file system, etc.) is expected, but expertise is not required. If you need an introduction, take the Unix the Basics course.
- For online courses, you will need your own computer and access to the internet
- An SSH client is required. This knowledge base article is helpful: https://kb.iu.edu/d/ahjh
- Mac and Linux machines should have built-in SSH clients. On Windows, check whether you have a program called PuTTY. If you have Windows 10, you may also consider installing the Windows Subsystem for Linux: https://docs.microsoft.com/en-us/windows/wsl/install-win10
- For in-person workshops, a laptop is required. If you do not have one, contact the organizer to borrow one
- Introduction to HPC and nationally available resources
- Tools available to mine SRA
- Visualization using R
The Supercomputing for Everyone Series (SC4ES) aims to bring more users into the realm of advanced computing, whether it be visualization, computation, analytics, storage, or any related discipline. Research Technologies can take you to the next level of computing.
Supercomputing for Everyone Series workshops and seminars are led by personnel from Research Technologies, a division of University Information Technology Services and a center in the Pervasive Technology Institute at Indiana University.