Applied biostatistics
MATH-493
Media
- Announcements (Forum)
- CRAN: The Comprehensive R Archive Network (URL)
- R Studio downloads page (URL)
- Quick-R home page (URL)
- R Markdown: The Definitive Guide (URL)
- R Markdown lessons from R Studio (URL)
- knitr with R Markdown - Karl Broman's website (URL)
- Forum: Finding group members (Forum)
17 February
Organization: you will write a short group report (~5-7 pages; a 'group' can be 1-4 people), a short group article critique (1 page; it can be in question/answer format), and a longer individual report (up to ~7-10 pages). Both reports will be about data analyses that you carry out. The group data set will be assigned to you. For the individual report, you can choose a topic from a list that I will provide once we have covered all the eligible topics in lecture. I will announce when you can email me your choice, so please do not send me an email before then. Once you email me your choice, I will assign you a data set on that topic.
The purpose of this course is to help you learn something without too much stress!! That is why you can do each of the 2 reports twice: first a preliminary version, on which you will receive comments based on the posted criteria, then a final version, due at the end of the semester, in which you can incorporate those comments. Only the final version will count towards your course grade. The deadlines will be posted on the course Moodle page.
For the article critique, you will get the 1/2 point (full credit) as long as you submit it by the deadline - you don't need to do a preliminary version.
In order to give you time to work on your reports, there will be no in-person lectures toward the end of the course, and the topics covered then are mainly optional. These 'extra' topics are NOT required; slides (and possibly videos) are available in case you are interested. There is no penalty for not following them.
Grading
- 1/2 point: short report 1 (either regression or anova, will be assigned to you), can be in a group of up to 4 people
- 1/2 point: short critique on a scientific article (will be assigned to you), can be in a group of up to 4 people
- 5 points: individual analysis report (your choice among a number of topics)
- Video Lecture 1a (URL)
- Lecture 1a slides - Reproducible Research (File)
- Video Lecture 1b (from end of Prob-Stat I) (URL)
- Lecture 1b slides - Hypothesis testing review (File)
- Power example (File)
- Lab 1 - EDA (exploratory data analysis) (File)
- Installing R and RStudio (URL)
- Getting started with R/RStudio (URL)
- TP 0 (File)
- Q+A Week 1 (Forum)
24 February
You can now email me the members of your group (1 email per group); remember, each group can contain 1-4 people. Each group will be assigned to analyze EITHER a regression data set OR an anova data set.
- Video Lecture 2a (URL)
- Video Lecture 2b (URL)
- Lecture 2 slides (File)
- Lab 2a (File)
- Lab 2a - responses to questions in class (File)
- Linear Models in R (File)
- Linear Statistical Models (MASS ch. 6) (File)
- Lab 2b (optional but strongly suggested - tutorial on R Markdown) (URL)
- Q+A Week 2 (Forum)
3 March
Report 1: (initial/preliminary deadline Friday 11 April - any time)
The purpose of this assignment is to give you practice writing a scientific report. Report writing is an extremely important skill, regardless of whether you continue in an academic career, in government or in industry.
You should analyze your data in an appropriate manner (as in Lab 2 for regression or Lab 3 for anova, or a combination of the two if you have both factor and continuous explanatory variables) and write a short report of ~5 pages (7 pages max).
The goal is NOT to replicate the analysis presented in the paper corresponding to the data set, so don't worry if you do something different, or if you obtain results that differ from the paper's even when you seem to be doing what the paper describes. YOU are in charge of the analyses you carry out!!
Please submit your report as a .pdf file (NOT .doc, etc.) in the Moodle assignment space, one per group. The spaces will be labeled R1, R2, A1, A2, for regression problems 1-2 and anova problems 1-2. Name your file XX-##.pdf, where XX is your assigned problem (R1, R2, A1, or A2) and ## is your group number.
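As a rough illustration of the 'combination' case (not a required template), the sketch below fits a linear model with one continuous and one factor explanatory variable using base R's lm(). It uses the built-in mtcars data purely as a stand-in; the variables mpg, wt and cyl are assumptions for illustration and have nothing to do with your assigned data set.

```r
# Sketch only: mtcars is a built-in R data set standing in for your own data.
dat <- mtcars
dat$cyl <- factor(dat$cyl)   # number of cylinders treated as a factor (3 levels)

# Additive model: common slope for wt, separate intercept for each cyl level
fit_add <- lm(mpg ~ wt + cyl, data = dat)

# Interaction model: the wt slope may differ between cyl levels
fit_int <- lm(mpg ~ wt * cyl, data = dat)

# Sequential (type I) F tests for the terms of the additive model
print(anova(fit_add))

# F test of the nested models: does the interaction improve the fit?
cmp <- anova(fit_add, fit_int)
print(cmp)
```

The order of terms matters for the sequential tests from anova(fit_add); the nested-model comparison at the end is one common way to decide whether an interaction term is worth keeping.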
Your report should contain a short background/introduction to the problem (including the aim of the original study); a presentation of the results of your statistical analyses, including exploratory data analysis, model fitting and the final model; a short discussion of any shortcomings of the final model; and your conclusions. Include relevant graphics and tables, but DO NOT include any raw R code or output (you will be penalized if you do). Your graphs should be 'pretty': if you copy/paste a graph from the screen, it will most likely appear blurry (as a png file), and you will be penalized for this. It is easiest to include nice-looking graphs if you save a pdf version and use R Markdown, but this is not the only way.
Please use a 12-point font size and 2.5 cm margins. Please remember to number each page at the bottom (including page 1). Inside the top margin of each page, please include the surnames of all group members (separated by commas).
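One way to avoid blurry screen captures, sketched here with base R graphics, is to write the plot directly to a pdf file. The file name and the built-in cars data are placeholders for illustration only.

```r
# Sketch: write the figure straight to a pdf file (vector graphics stay sharp
# at any zoom) instead of copy/pasting it from the screen.
out_file <- file.path(tempdir(), "stopping-distance.pdf")  # placeholder file name

pdf(out_file, width = 6, height = 4)   # width and height are in inches
plot(dist ~ speed, data = cars,        # 'cars' is a built-in R data set
     xlab = "Speed (mph)", ylab = "Stopping distance (ft)")
abline(lm(dist ~ speed, data = cars))  # add the least-squares line
invisible(dev.off())                   # close the device so the file is written
```

When knitting an R Markdown document to PDF, knitr typically uses the pdf graphics device by default, so figures produced in code chunks are already vector graphics.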
Do not include a cover page, abstract, table of contents, or EPFL logo, and do not exceed 7 pages (not including any references) or you will be penalized.
Your report will also be graded based on language use and overall presentation. (It can be in either English or French.)
As a reminder, this report counts for 1/2 point (out of 6) of your course grade.
The initial deadline (11 April, any time) is for your preliminary report. The final version is due by 30 June (any time).
If you turn in your report before the initial deadline, we will be able to comment on it and you can redo it before the final deadline. If you need to turn it in later, that's OK; I should still have enough time to comment on it, so... NO STRESS!!!!!
When you email me the names of your group members, I will send you the dataset (after Lab 3).
- Report 1 - Group / Data assignments (File)
- Report 1 criteria (File)
- Reports - ADDITIONAL TIPS (File)
- Q+A Report 1 (Forum)
- Video Lecture 3a (URL)
- Video Lecture 3b (URL)
- Lecture 3 slides (File)
- Lab 3 (File)
- Lab 3 - responses to questions in class (File)
- Q+A Week 3 (Forum)
- Supporting papers for Report 1 (Folder)
10 March
- Video Lecture 4a (URL)
- Video Lecture 4b (URL)
- Lecture 4 slides (File)
- Lab 4 (File)
- Lab 4 data (File)
- Q+A Week 4 (Forum)
17 March
- Video Lecture 5a (URL)
- Video Lecture 5b (URL)
- Video Lecture 5c (URL)
- Lecture 5 slides (File)
- Titanic example of logistic regression in R (URL)
- Evaluating logistic regression models (URL)
- Poisson regression in R (URL)
- Lab 5 (File)
- Lab 5 - responses to questions in class (File)
- Q + A Week 5 (Forum)
24 March
Second assignment: this assignment is a statistical critique of a published paper. Your report can be written either as a full review or in question/answer format, simply by responding to each question. Your report should be no more than 1 page.
You can turn in this report any time before the final deadline (30 June 2025). You will get full credit (i.e. 1/2 point toward your course grade) for turning in a reasonable effort.
There is a deposit slot near the bottom of the course Moodle page for you to submit your report.
Groups who worked on regression problems:
L1: http://www.jcancer.org/v09p1421.htm
Groups who worked on anova problems:
L2: https://www.sciencedirect.com/science/article/pii/S1743919118307337
A guide sheet (study assessment questions) has been uploaded to help you address the statistical issues.
The file contains a longer list of questions to consider when evaluating a study in your future career. As a guide for your second assignment, please make sure that you address in particular the following questions (numbers in parentheses are points out of 6):
(1) 1. Briefly give the biomedical background for the paper. What question/hypothesis is being investigated?
(1) 2. What data are collected (include how many individuals, what variables, inclusion / exclusion criteria for the study)?
(1) 3. What analyses were carried out? Are these analyses appropriate for the problem?
(1) 4. What other analyses should have been done (or might have been done but not shown)? Explain.
(1) 5. Is there any mention of power of the analyses? How would you go about trying to estimate power?
(NOTE: you do NOT have to actually give power estimates, just say how you might go about it.)
(1) 6. What conclusions do the authors draw? Are these conclusions substantiated by the results? Explain.
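For question 5 you only need to describe an approach, not give numbers. As one possible approach, sketched under assumed values (30 individuals per group, a true difference in means of 0.5 standard deviations, a two-sided 5% test), power can be estimated by simulating many data sets under the assumed effect and counting how often the test rejects; base R's power.t.test() gives the analytical answer for comparison. These settings are hypothetical and not taken from either paper.

```r
# Sketch: simulation-based power for a two-sample comparison of means.
# All settings below are hypothetical, chosen only for illustration.
set.seed(1)
n_per_group <- 30     # assumed number of individuals per group
delta       <- 0.5    # assumed true difference in means
sd_common   <- 1      # assumed common standard deviation
alpha       <- 0.05   # two-sided significance level
n_sim       <- 2000   # number of simulated data sets

# Simulate data under the assumed effect and record the t-test p-value each time
p_values <- replicate(n_sim, {
  x <- rnorm(n_per_group, mean = 0,     sd = sd_common)
  y <- rnorm(n_per_group, mean = delta, sd = sd_common)
  t.test(x, y)$p.value   # Welch t-test (the t.test() default)
})

# Estimated power = proportion of simulated data sets in which H0 is rejected
power_sim <- mean(p_values < alpha)

# Analytical power from base R (pooled two-sample t-test) for comparison
power_exact <- power.t.test(n = n_per_group, delta = delta, sd = sd_common,
                            sig.level = alpha)$power

cat("Simulated power:", power_sim, "\nAnalytical power:", power_exact, "\n")
```

The simulation route is the more flexible of the two: the same loop can be adapted to regression, logistic or survival models by changing how the data are generated and which test is applied.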
- Study assessment questions (File)
- Video Lecture 6a (URL)
- Video Lecture 6b (URL)
- Lecture 6 slides (File)
- R Data Camp survival analysis tutorial (URL)
- Cox ph modeling in R (URL)
- Q+A Week 6 (Forum)
- My.stepwise R package (URL)
- Lab 6 (File)
- Comments Lab 6 (File)
- pbc paper (survival) (URL)
31 March
- Video Lecture 7a (URL)
- Video Lecture 7b (URL)
- Lecture 7 slides (File)
- Discrete data analysis with R (URL)
- Lab 7 (File)
- Additional comments labs 6+7 (File)
- More on mosaic plots (URL)
- Q+A Week 7 (Forum)
- Categorical Data Analysis (A. Agresti, 2013) (File)
- Book by Michael Friendly (URL)
31 March - Individual Report Topic Choice
- survival analysis
- logistic regression
- generalized linear model (other than logistic, e.g. Poisson)
- discrete data / contingency table analysis
- genome-wide association study (GWAS)
Once you have chosen one of these topics, EMAIL ME your choice (please follow the email instructions in the announcement). I will then send you a dataset for analysis (or, if you are doing a GWAS, you can start working on the GWAS tutorial; just let me know).
Your final report should be ~7-10 pages, with 10 pages as an absolute maximum (not including references); fewer pages is better if you can be concise.
The preliminary deadline is Friday 16 May (any time), after which I should be able to give you feedback within 1-2 weeks. You will then have a few more weeks to work on it before the final deadline of Monday 30 June (any time).
- Survival criteria (File)
- Logistic criteria (File)
- GLM criteria (File)
- Discrete criteria (File)
- GWAS criteria (File)
7 April (Note: NO CLASS AND NO LAB)
NOTE: There will be NO CLASS today and NO LAB tomorrow.
This week's labs are OPTIONAL and there will be NO LAB MEETING; you may want to look at them, though, if you choose to do a GWAS as your individual report.
NOTE: The GWAS tutorial uses biocLite to install BioConductor packages; this is the older, now deprecated, method. The current method is to install BioConductor packages with BiocManager.
First install BiocManager:
if (!require("BiocManager", quietly = TRUE)) install.packages("BiocManager")
Once that is installed, you can install any BioConductor package (e.g. GWASTools) as follows:
BiocManager::install("GWASTools")
- Video Lecture 8a (same as Statistical Genetics Lecture 4a) (URL)
- Video Lecture 8b (same as Statistical Genetics Lecture 4b) (URL)
- Video Lecture 8c (URL)
- Lecture 8 slides (File)
- Lab 8a - PCA (OPTIONAL) (File)
- Food data (File)
- Lab 8b - Multiple testing (OPTIONAL) (URL)
- GWAS-extras + data files (Folder)
- PCs in GWAS - Nature Genetics paper (File)
- PCs in GWAS - slide presentation (File)
- Manhattan plot with -log10p on y-axis (URL)
- GWASTools BioConductor R package (data cleaning/qc tools, maybe interesting) (URL)
- Other tutorial for SNPRelate (possibly interesting, completely optional) (URL)
- SNPassoc paper (optional but maybe useful) (URL)
- SNPassoc R package (contains functions for eigenstrat, Cochran-Armitage (trend) test, etc.) (URL)
- Q+A week 8 (Forum)
11 April - Deposit Prelim Report 1
14 April
NO CLASS OR LAB - time to work on reports; if you have any questions, please visit me during office hours or make an appointment with me.
- Lecture 9 slides (OPTIONAL) (File)
- Quick introduction to pwr package in R (OPTIONAL) (URL)
- Lab 9 (OPTIONAL) - power tutorial (File)
18 - 27 April - PÂQUES / EASTER
NO CLASS OR LAB - PÂQUES / EASTER
28 April
NO CLASS OR LAB - time to work on reports; if you have any questions, please visit me during office hours or make an appointment with me.
5 May
NO CLASS OR LAB - time to work on reports; if you have any questions, please visit me during office hours or make an appointment with me.
12 May
NO CLASS OR LAB - time to work on reports; if you have any questions, please visit me during office hours or make an appointment with me.
- Video Lecture 12a (OPTIONAL) (URL)
- Video lecture 12b (OPTIONAL) (URL)
- Lecture 12 slides (OPTIONAL) (File)
- Lab 12 (old lab 6) - OPTIONAL (URL)
- Hierarchical clustering in R (URL)
- Heatmaps in R (URL)
- Heatmaps in R - short tutorial (URL)
- R graph gallery - heatmaps (URL)
19 May - Deposit Prelim Report 3 (Individual)
Your report MUST be in pdf format. Please name your preliminary report EXACTLY as follows:
lastname-topic-prelim.pdf
(for example, if you were doing survival analysis, your report would be named yourlastname-survival-prelim.pdf)
As a reminder, the possible topic names are:
logistic; glm; survival; discrete; gwas.
- survival - comments A-K (File)
- survival - comments L-Z (File)
- logistic - comments A-C (File)
- logistic - comments D-K (File)
- logistic - comments L-P (File)
- logistic - comments Q-Z (File)
- GLM - comments (File)
- discrete - comments (File)
- GWAS - comments (File)
- Late discrete comments (File)
- Late comments (all topics) - 26 June 2025 (File)