Biol 697: An Introduction to Computational Data Analysis for Biology


Weekly Schedule: Tuesday & Thursday 11-12:30

Office Hours: Prof. Byrnes will hold office hours Wednesday from 10:00-11:30. If you need to schedule a different time, email him.


Overview:
This course will cover the basic statistical knowledge necessary for a graduate student to design, execute, and analyze a basic research project. The course aims to have students focus on thinking about the biological processes that they are studying in their research and how to translate them into statistical models. The course will take a hands-on computational approach, teaching students the statistical programming language R. In addition to teaching the fundamentals of data analysis, we will emphasize several key concepts of efficient computer programming that students can use in a variety of other areas outside of data analysis.

Prerequisites:
I will assume a basic knowledge of algebra and introductory calculus (although no calculus will be used). Undergraduate courses in probability theory and computer science are useful, but not required. Students who are new to programming should skim chapter 1 of Adler before beginning the course.

Also, you must install R and RStudio on your laptop before the first day of class. Please bring your laptop to all classes.

Required Texts:
Adler, J. (2009) R in a Nutshell: A Desktop Quick Reference. O'Reilly Media. [amazon]

Vickers, A. (2009) What is a p-value anyway? 34 Stories to Help You Actually Understand Statistics. Addison Wesley. [amazon]

Whitlock, M.C. and Schluter, D. (2008) The Analysis of Biological Data. Roberts and Company Publishers. [amazon]

Recommended Texts:
I will be drawing on examples and materials from a few other sources. They include wonderful examples of R code in the context of data analysis. You are not required to have these, but you will find them useful either in this course or in future endeavors.

Bolker, B. (2009) Ecological Models and Data in R. Princeton University Press. [link with preprint]

Matloff, N. (2011) The Art of R Programming: A Tour of Statistical Software Design. No Starch Press. [no starch]

Qian, S.S. (2009) Environmental and Ecological Statistics with R. Chapman and Hall/CRC Press, London. [amazon]

Assignments and Exams:
Your grade will be determined by a combination of weekly homework, a midterm, and a final exam. Homework will consist of a problem set and a short response to a chapter from Vickers. Homework will be worth 50% of your course grade. All exams will be take-home. The midterm will be worth 20% and the final will be worth 30%. Additionally, students may earn extra credit for a statistical write-up of their own research data to be turned in during the finals period.

All homework is open book and open internet (except for looking up answer keys). Feel free to discuss the work with your classmates, but your answers must be your own. Exams will also be open book and open internet, but collaboration between classmates is not allowed.

Course Content:
Every week we will have assigned readings from the above texts, along with other papers and chapters at my discretion. PDFs will be provided as needed. Assignments for that week will be due the following Tuesday.

Week Topic Readings Assignments Due Next Tues.
1. Introduction, Data & Data Management
course syllabus
lecture 1 handout
pop quiz!
lecture 2 handout

Vickers Ch. 1, 32-34

Adler Chapter 3
Homework 1
homework 1 ponds data
homework 1 solution code
2. Biological Processes & Sampling Populations
lecture 3 handout
lecture 3 code

lecture 4 handout
lecture 4 code
desert bird census data

Whitlock & Schluter Chapters 1 [pdf], 3-4

Bolker's Distributions Bestiary [pdf]

Vickers Chapters 2-5
Homework 2
homework 2 solution code
3. Statistical Distributions & Data Visualization
lecture 5 handout
lecture 5 code

lecture 6 handout
lecture 6 code
data for lecture and homework (original source)

Whitlock & Schluter Chapters 2, 10
A Layered Grammar of Graphics
Bolker's Distributions Bestiary [pdf]
(which you should have read last week)
Adler Chapters 6 and 14 for reference
Vickers Chapters 6-8, 11-12

Homework 3
Homework 3 Solution
4. Probability and Hypothesis Testing
lecture 7 handout
lecture 7 code

lecture 8 handout
lecture 8 code
Power via Simulation Handout
code for Power handout

Whitlock & Schluter Chapters 5-7
Vickers Chapters 13-15
Google's R Style Guide

Homework 4
Homework 4 Solution
5. Writing Functions to Test Hypotheses
lecture 9 handout
lecture 10 handout

Whitlock & Schluter Chapters 9 & 11
Adler Chapter 9
Hurlbert & Lombardi 2009
No Vickers. Use Hurlbert & Lombardi as a prompt (if you need one) for next week's reflection

Homework 5
Homework 5 Solution
6. Fitting Linear Models: Least Squares
lecture 11 handout
lecture 11 code
wolf inbreeding data
pufferfish data

lecture 12 handout
lecture 12 code

Whitlock & Schluter Chapters 16 - 17.6
Vickers Chapters 17-19

Homework 6
Homework 6 Data
Homework 6 Solution
7. Fitting Linear Models: Likelihood
lecture 13 handout
lecture 13 code
Bee Lifespan Data

lecture 14 handout
lecture 14 code

Whitlock & Schluter Chapter 20
Bolker 2012
Vickers Chapters 20-22

Homework 7
Spider Data
Bird Data
8. Generalized Linear Models
lecture 15 handout
lecture 15 code
lecture 16 handout
lecture 16 code

Whitlock & Schluter 17.8
O'Hara 2009 (skip part on mixed models)
Do not log transform count data
The arcsine is asinine
Optional Nonlinear Least Squares & Power Laws
Vickers Chapters 23-24

No Reflection Due
9. Experiments & the Linear Model (ANOVA)
lecture 17 handout
lecture 17 code
LTER Kelp Data
Daphnia Resistance Data
Disorders and Gene Expression

lecture 18 handout
lecture 18 code

Whitlock & Schluter 15
Wickham 2011 (skim)

10. Multiple Predictors
lecture 19 handout
lecture 19 code
Zooplankton Predation Data
Bee Gene Expression

lecture 20 handout
lecture 20 code
Fire Recovery Data


Whitlock & Schluter 18
No Vickers, but write a reflection on how you have changed the way you think about data & models

Homework 8
Homework 8 Solution
Seed Survival Data
11. Interactions, Covariates & Experiments
lecture 21 handout
lecture 21 code
Neanderthal Brain Size
Intertidal Algae

lecture 22 handout
lecture 22 code

Whitlock & Schluter 14
Hurlbert 1984
Vickers Ch. 25-26
12. Confounding Variables
lecture 23 handout

Selections from Pearl Ch. 3
Cottingham et al. 2005
Vickers Ch. 27-28

No homework: Thanksgiving
13. Nesting, Random Effects, and Hierarchical Models
lecture 24 handout
lecture 24 code
Intertidal Richness Data
Nested Plant Growth Data

lecture 25 handout
lecture 25 code
Hawaiian Bird Data
Averaged Plankton Timeseries Data


Schielzeth and Nakagawa 2012
Schielzeth and Nakagawa Appendix
Bolker et al 2009
Optional (but recommended): Making Sense of Random Effects
Optional (but recommended): Zuur 5.1-5.6
Optional: Bolker Examples
Vickers Ch 29-31

No homework: Work on Exams
14. Information Theoretic & Bayesian Approaches
lecture 26 handout
lecture 26 code

lecture 27 handout
lecture 27 code


Hobbs and Hilborn 2001
Symonds and Moussalli 2010
Ellison 1996

Optional (but recommended) Ripley's Introduction to AIC
Vickers 32-34

Exam Part I Due on the 6th
15. Wrap Up and Concluding Discussion
Murtaugh 2007
Final Reflection! Wrap-Up of Your Thoughts!
Final due on the 18th, Extra Credit on the 20th.