Exploratory data analysis in environmental health
ENV-444
Introduction
- Theoretical lectures, including a section dedicated to the writing/structure of scientific articles
- Practical exercises
- A semester project consisting of writing a scientific article (group work)
Keywords: Exploratory spatial data analysis; Geocomputation; EDA; ESDA; Geovisualization; GIS; Geoda; Thematic mapping; Semiology of graphics; Spatial statistics; Principal Component Analysis; Rate smoothing; Spatial regression; Logistic regression; Scientific paper writing; Open access; Open source
Software used: Geoda (v1.22), QGIS (v3.34 LTR) and RStudio (version to be communicated later)
Organization: the course consists of:
- Theoretical lectures
- Exercises in room GRB330 on personal computers (only free and open-source software will be used) and/or in the IT/TP room GRB001.
- Semester project: scientific article (work in groups)
- Technical questions can be asked at any time on the ED forum.
- Organizational and administrative questions about the course can be sent directly by email to Stéphane Joost.
- 7/9 exercises (individual short reports or "comptes-rendus") = 20% of the final grade
- 1 scientific article (semester project, group work) = 30%
- 1 individual oral exam = 50%. During the oral exam, each student will be asked questions about the research carried out by their group, as well as about the theoretical content of the course.
- Submission of the scientific article (semester project): Friday, January 10, 2025, at 23h59
- Oral exam: Monday 13.01.2025, 08h15 to 18h15 (BC01); Tuesday 14.01.2025, 08h15 to 18h15 (BC01); Wednesday 15.01.2025, 08h15 to 18h15 (BC01)
- Schedule of the exam
Guidance on the composition of groups will be given by the teachers after week 4, based on the total number of students.
Groups will have to produce:
- A project proposal, i.e. a description of the content of the scientific article (semester project) to be produced. It will present the research idea and the working hypotheses to be tested on a topic defined by the group (examples: relationship between obesity and night-time road traffic noise; relationship between Body Mass Index and green areas; relationship between the frequency of sugar-sweetened beverage intake and estimated soil temperature, etc.).
- The scientific article (8-10 pages max.)
Data available for the project:
- ~20 environmental variables that each student will produce in the context of an exercise and that will constitute an open reference dataset
- Health data: Body Mass Index (BMI) and frequency of sugar-sweetened beverage intake, made available for ~14'000 Geneva citizens
Students will have the opportunity to download additional data to characterize the territory of the State of Geneva according to the research hypotheses developed. The main geodata source is the Système d'Information du Territoire Genevois (SITG).
Documentation - Course material
- Geoda Workbook
- Slides of the theoretical lectures (distributed under the corresponding weeks below)
Week 1
- Lecture 1 - Introduction to exploratory data analysis in environmental health (v2 updated) (File)
- Exercise 1 - Readings Morgenthaler and Anselin (File)
- Article 1 - Morgenthaler (2009) (File)
- Article 2 - Anselin et al. (2006) (File)
- Article 3 - for comparison purposes - Anselin et al. (2022) (File)
- Solution Exercise 1 - Readings Morgenthaler and Anselin (File)
Lundi du Jeûne (public holiday)
Week 2
For the exercises, if you do not want to install the software (Geoda, QGIS, RStudio) on your computer, you can use the ENAC-SSIE virtual environment (https://vdi.epfl.ch/portal/webclient/#/home).
Exercise 2b is a tutorial for learning to use several of Geoda's exploratory tools, so no solution file is provided for this part.
- Lecture 2 - EDA approaches and cognitive processes for data exploration (File)
- Exercise 2a - Chicago - Statement (File)
- Exercise 2a - Data Chicago (File)
- Exercise 2a - Chicago - Solution (File)
- Exercise 2b - New York - Statement (File)
- Exercise 2b - Data New York (File)
Week 3
- Lecture 3a - Relationship between health & place - exposome (File)
- 3b. Introduction to population epidemiology (File)
- Exercise 3 - Environmental dataset for Geneva (File)
- Exercise 3 - Data Geneva (File)
Week 4
- Lecture 4. Introduction to spatial epidemiology (File)
- Exercise 4 - Geneva Health data handling and aggregation (File)
- SQL theory - Support for exercise 4 (File)
- Health data description (exercise 4) (File)
- codebook.health.data (exercise 4) (File)
- gva.health.data.bmi.ssb-lv03.corr v2 (File)
- Joost et al. - 2019 - Overlapping spatial clusters of sugar-sweetened beverages (File)
- Exercise 4 - Health data Geneva solution v3 (File)
Week 5
- Lecture 5. Order stats, rate smoothing and confounding factors (File)
- Exercise 5 - Confounding factors variable adjustment (File)
- Exercise 5 - Data (File)
- Exercise 5 - Confounding factors variable adjustment - Solution (File)
- Groups for semester project (Group choice)
Vacation
Week 6
- Lecture 6. Publishing scientific articles & data (File)
- Exercise 6. Prepare and upload an open data set (deadline November 15) (File)
- Instructions for the description of the project (deadline November 8) (File)
Week 7
- Lecture #7a - Geographically Weighted Regression (File)
- Lecture #7b - Medical cohorts - Bus Santé and Specchio (File)
- Exercise #7 - Geographically Weighted Regression (GWR; deadline November 22, 23h59) (File)
- Data for exercise #7 (GWR) (File)
- Exercise #7 - Setup instructions (File)
- Exercise #7 - Base Python code (GWR.ipynb) (File)
- Exercise #7 - Base Python code (GWR.ipynb, zipped) (File)
- Exercise #7 - GWR - Solution (File)
Week 8
- 8a. PCA and HAC in territorial health diagnostics (File)
- 8b. Principal Component Analysis (PCA) - Theory (File)
- 8c. Hierarchical Ascendant Classification (HAC) - Theory (File)
- 8d. Hierarchical Ascendant Classification - illustrated example (File)
- Exercise #8 - HAC and PCA with Geoda.v2 (deadline November 29, 23h59) (File)
- Data for exercise #8 (.zip file) (File)
- Exercise #8 - HAC and PCA with Geoda - Solution (File)
Week 9
- Lecture #9 - Spatial relative risk (File)
- Lecture #9 - SPARR - Detailed technical information (File)
- Exercise #9 - sparr (deadline December 6, 23h59) (File)
- Exercise #9 - SPARR (File)
- Exercise #9 - sparr - Solution (File)
Week 10
- Lecture #10 - Environment and metabolic syndrome (File)
- Additional geocomputation tools (File)
- Data for geocomputational tools (URL)
Week 11
At 9h15, Dr Anaïs Ladoy, who is responsible for geographic information at the Pôle santé numérique et qualité of the Direction Générale de la Santé of the canton of Vaud, will give a talk on how geographic information can be used in developing and implementing public health policies.
- Lecture #11a - Anaïs Ladoy DGS Vaud en geographic information (File)
- Lecture #11b - Figures, maps and legends + reminder of rules for the paper (File)
- Models of papers for your semester project. The two ... (Text and media area)
Week 12
Week 13