Intro to R for Biologists

Objectives

  • Navigate and use RStudio, both on and off Jetstream: load files, export graphs, and more.
  • Understand how to install, load, and use new libraries.
  • Become familiar with the Bioconductor project.
  • Understand basic data types, functions, objects, and classes in R.
  • Write and use a function.

Prerequisites

  • Unix familiarity is a plus, but not required.
  • A laptop is required; if you do not have one, contact the organizers to borrow one.

Daily Agenda & Requirements

  • Day 1: Introduction
    The goal of this section is to get you acquainted with R, both the environment and the language. We’ll discuss data types, data manipulation, the structure of commands, how to get help and more information, how to load packages, and how to use the environment. The aim is to help you use R more intuitively. We will discuss some common errors and troubleshooting strategies during the recitation meeting.

    This section does not focus on any individual analysis or demonstration; rather, it focuses on reading and making sense of the language. This is especially helpful for new users, or for anyone who currently copies and pastes commands and hopes they will work.

    Requirements: There are no requirements for this section. Basic Unix skills (how variables work, cat, pwd, etc.) are helpful; we will not be using the command line, but we will reference these concepts throughout.

  • Introduction lab
    A guided activity to practice your skills from Day 1. It will give you practice using R and working with sequence data and vectors with a bit more independence. We will answer questions and help troubleshoot the activity during the recitation meeting.

  • Day 2: Introduction to visualization
    We will build on the basic data types and syntax of R to explore visualization of geographical data. The two main families of plotting (base plot style and ggplot style) will be introduced, with examples of how to plot various types of data on geographical maps. This is a useful skill for ecologists and geneticists alike. During the online recitation meeting, we will further discuss graphing options, troubleshoot setting up Google Maps, and share some helpful tutorials and cheat sheets for plotting in R.

    Requirements: This section builds on the material covered in Day 1, so familiarity with that material will be useful. Day 1 material will be available online.

  • Introduction to visualization lab
    This activity will extend the same plotting syntax to a different kind of data: ordination plots (PCA, PCoA, and nMDS) for exploring multivariate data you may have, such as microbiome, ecological, or population genetics data. During the recitation meeting, we will discuss ordination, when to use each type, and some of the finer points of choosing packages.

  • Day 3: Making your own scripts and functions
    The goal of this section is to go into more depth on how to read, understand, and troubleshoot R code by introducing classes and functions. Classes and functions are central to R, so understanding them is a large part of understanding the syntax and behavior of the language. We will walk through creating your own function for summarizing tables of data (both ecological and genetic data sets are available for use). We will discuss more tips for designing and writing R code during the recitation.

    Requirements: This material assumes basic usage of R covered in the previous two days, or a moderate familiarity with R basics.

  • Making your own scripts and functions lab
    This activity builds on Day 2's lab: you will create a function that graphs a sliding window plot of GC content. The activity is meant to give you practice building functions, but this particular example can easily be adapted to visualize variation across any continuous data, such as ecological measurements through time or population variation over a genome. We will help answer questions and troubleshoot this activity during the online recitation.

Online Course Available via Expand

The Supercomputing for Everyone Series (S4ES) aims to bring more users into the realm of advanced computing, whether it be visualization, computation, analytics, storage, or any related discipline. Let Research Technologies staff take you to the next level of computing.

Supercomputing for Everyone Series workshops and seminars are led by personnel from Research Technologies, a division of University Information Technology Services and a center in the Pervasive Technology Institute at Indiana University.