First we load the data:
We then list all variables present in our environment:
## [1] "expression" "metadata"
We check the class of expression and metadata:
## [1] "data.frame"
## [1] "data.frame"
Both are data frames.
We can easily see that the sample IDs are not in the same order in the metadata (rows) and in the expression data (columns). This is a problem for statistical tests in R, which usually assume that column i of the expression data and row i of the metadata describe the same sample.
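A quick way to confirm the mismatch is to compare the two ID vectors directly. The sketch below runs on a small toy pair of tables (the IDs and values are made up for illustration); with the real data the same calls would apply to `metadata$ID` and `colnames(expression)`.

```r
# Toy stand-ins for the real objects (hypothetical IDs and values)
metadata <- data.frame(ID = c("GSM1", "GSM2", "GSM3"),
                       Strain = c("B6", "D2", "B6"),
                       stringsAsFactors = FALSE)
expression <- data.frame(GSM3 = c(5.1, 6.2),
                         GSM1 = c(7.0, 4.3),
                         GSM2 = c(8.4, 9.9))

# FALSE: the same samples, but not in the same order
same_order <- identical(metadata$ID, colnames(expression))
same_order

# Position of each metadata ID among the expression columns
match(metadata$ID, colnames(expression))
```

`match()` returns, for each metadata ID, the column where that sample sits in the expression table, which is exactly the index needed later to reorder one table against the other.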
Using summary(), we don't see any obvious problems. However, Strain, Condition and Tissu are stored as character vectors; converting them into factors should help us detect typos.
## ID SampleName Time ZTime
## Length:71 Length:71 Min. : 0.00 Min. : 0.000
## Class :character Class :character 1st Qu.: 9.00 1st Qu.: 3.000
## Mode :character Mode :character Median :18.00 Median : 6.000
## Mean :23.75 Mean : 8.873
## 3rd Qu.:36.00 3rd Qu.:12.000
## Max. :48.00 Max. :18.000
## Condition Tissu Strain
## Length:71 Length:71 Length:71
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
Having a quick look at the expression data, we see that GSM239935 and GSM239892 are of character type, while they should be numeric like the others:
## GSM239929 GSM239904 GSM239898 GSM239901 GSM239873
## Min. : 3.5 Min. : 3.5 Min. : 3.4 Min. : 3.4 Min. : 3.4
## 1st Qu.: 5.2 1st Qu.: 5.3 1st Qu.: 5.2 1st Qu.: 5.3 1st Qu.: 5.3
## Median : 6.7 Median : 7.0 Median : 7.0 Median : 7.0 Median : 7.1
## Mean : 6.8 Mean : 6.9 Mean : 7.0 Mean : 6.9 Mean : 7.0
## 3rd Qu.: 8.1 3rd Qu.: 8.6 3rd Qu.: 8.6 3rd Qu.: 8.6 3rd Qu.: 8.3
## Max. :11.2 Max. :11.4 Max. :11.4 Max. :11.3 Max. :11.2
## GSM239934 GSM239887 GSM239927 GSM239882 GSM239935
## Min. : 3.4 Min. : 3.4 Min. : 3.3 Min. : 3.4 Length:100
## 1st Qu.: 5.1 1st Qu.: 5.4 1st Qu.: 5.1 1st Qu.: 5.4 Class :character
## Median : 6.9 Median : 7.0 Median : 6.9 Median : 6.9 Mode :character
## Mean : 6.8 Mean : 7.0 Mean : 6.8 Mean : 7.0
## 3rd Qu.: 8.1 3rd Qu.: 8.4 3rd Qu.: 8.1 3rd Qu.: 8.3
## Max. :11.2 Max. :11.3 Max. :11.2 Max. :11.2
## GSM239909 GSM239890 GSM239879 GSM239893 GSM239908
## Min. : 3.4 Min. : 3.2 Min. : 3.4 Min. : 3.5 Min. : 3.4
## 1st Qu.: 5.3 1st Qu.: 5.5 1st Qu.: 5.3 1st Qu.: 5.3 1st Qu.: 5.3
## Median : 7.0 Median : 7.0 Median : 7.1 Median : 7.0 Median : 7.0
## Mean : 6.9 Mean : 7.0 Mean : 7.0 Mean : 6.9 Mean : 7.0
## 3rd Qu.: 8.6 3rd Qu.: 8.6 3rd Qu.: 8.4 3rd Qu.: 8.6 3rd Qu.: 8.6
## Max. :11.3 Max. :11.3 Max. :11.2 Max. :11.2 Max. :11.1
## GSM239868 GSM239869 GSM239891 GSM239928 GSM239892
## Min. : 3.2 Min. : 3.5 Min. : 3.5 Min. : 3.5 Length:100
## 1st Qu.: 5.4 1st Qu.: 5.4 1st Qu.: 5.3 1st Qu.: 5.1 Class :character
## Median : 6.9 Median : 7.0 Median : 7.0 Median : 6.8 Mode :character
## Mean : 7.0 Mean : 7.0 Mean : 6.9 Mean : 6.8
## 3rd Qu.: 8.5 3rd Qu.: 8.5 3rd Qu.: 8.6 3rd Qu.: 8.1
## Max. :11.3 Max. :11.3 Max. :11.3 Max. :11.3
## GSM239895 GSM239910 GSM239881 GSM239919 GSM239921
## Min. : 3.5 Min. : 3.5 Min. : 3.4 Min. : 3.3 Min. : 3.4
## 1st Qu.: 5.3 1st Qu.: 5.3 1st Qu.: 5.3 1st Qu.: 5.1 1st Qu.: 5.0
## Median : 7.0 Median : 7.0 Median : 7.0 Median : 6.8 Median : 6.6
## Mean : 7.0 Mean : 6.9 Mean : 7.0 Mean : 6.8 Mean : 6.8
## 3rd Qu.: 8.6 3rd Qu.: 8.6 3rd Qu.: 8.6 3rd Qu.: 8.0 3rd Qu.: 8.0
## Max. :11.3 Max. :11.2 Max. :11.3 Max. :11.3 Max. :11.2
## GSM239936 GSM239899 GSM239894 GSM239877 GSM239917
## Min. : 3.2 Min. : 3.2 Min. : 3.3 Min. : 3.4 Min. : 3.5
## 1st Qu.: 5.2 1st Qu.: 5.3 1st Qu.: 5.3 1st Qu.: 5.4 1st Qu.: 5.2
## Median : 6.9 Median : 7.0 Median : 7.0 Median : 7.0 Median : 6.6
## Mean : 6.8 Mean : 7.0 Mean : 7.0 Mean : 7.0 Mean : 6.8
## 3rd Qu.: 8.2 3rd Qu.: 8.6 3rd Qu.: 8.6 3rd Qu.: 8.5 3rd Qu.: 8.0
## Max. :11.3 Max. :11.3 Max. :11.3 Max. :11.2 Max. :11.2
## GSM239937 GSM239930 GSM239872 GSM239902 GSM239915
## Min. : 3.4 Min. : 3.4 Min. : 3.3 Min. : 3.5 Min. : 3.3
## 1st Qu.: 5.2 1st Qu.: 4.8 1st Qu.: 5.5 1st Qu.: 5.4 1st Qu.: 5.1
## Median : 6.8 Median : 6.8 Median : 7.0 Median : 7.0 Median : 6.9
## Mean : 6.8 Mean : 6.8 Mean : 7.0 Mean : 7.0 Mean : 6.8
## 3rd Qu.: 8.1 3rd Qu.: 8.1 3rd Qu.: 8.4 3rd Qu.: 8.5 3rd Qu.: 8.1
## Max. :11.2 Max. :11.2 Max. :11.3 Max. :11.3 Max. :11.3
## GSM239903 GSM239923 GSM239900 GSM239933 GSM239870
## Min. : 3.4 Min. : 3.5 Min. : 3.6 Min. : 3.5 Min. : 3.3
## 1st Qu.: 5.2 1st Qu.: 5.1 1st Qu.: 5.3 1st Qu.: 5.1 1st Qu.: 5.3
## Median : 7.0 Median : 6.8 Median : 7.0 Median : 6.8 Median : 7.0
## Mean : 7.0 Mean : 6.8 Mean : 6.9 Mean : 6.8 Mean : 7.0
## 3rd Qu.: 8.6 3rd Qu.: 8.0 3rd Qu.: 8.6 3rd Qu.: 8.1 3rd Qu.: 8.4
## Max. :11.3 Max. :11.3 Max. :11.3 Max. :11.2 Max. :11.0
## GSM239889 GSM239924 GSM239880 GSM239878 GSM239931
## Min. : 3.5 Min. : 3.4 Min. : 3.2 Min. : 3.6 Min. : 3.3
## 1st Qu.: 5.4 1st Qu.: 5.1 1st Qu.: 5.4 1st Qu.: 5.3 1st Qu.: 5.1
## Median : 7.0 Median : 6.8 Median : 7.0 Median : 7.0 Median : 6.9
## Mean : 7.0 Mean : 6.8 Mean : 7.0 Mean : 7.0 Mean : 6.8
## 3rd Qu.: 8.5 3rd Qu.: 8.1 3rd Qu.: 8.4 3rd Qu.: 8.4 3rd Qu.: 8.1
## Max. :11.2 Max. :11.2 Max. :11.3 Max. :11.3 Max. :11.1
## GSM239884 GSM239896 GSM239871 GSM239920 GSM239916
## Min. : 3.4 Min. : 3.5 Min. : 3.5 Min. : 3.4 Min. : 3.3
## 1st Qu.: 5.4 1st Qu.: 5.3 1st Qu.: 5.3 1st Qu.: 5.0 1st Qu.: 5.2
## Median : 6.9 Median : 7.0 Median : 7.0 Median : 6.7 Median : 6.8
## Mean : 7.0 Mean : 7.0 Mean : 7.0 Mean : 6.8 Mean : 6.9
## 3rd Qu.: 8.5 3rd Qu.: 8.6 3rd Qu.: 8.3 3rd Qu.: 7.9 3rd Qu.: 8.1
## Max. :11.2 Max. :11.4 Max. :11.3 Max. :11.3 Max. :11.3
## GSM239918 GSM239876 GSM239897 GSM239914 GSM239938
## Min. : 3.4 Min. : 3.6 Min. : 3.5 Min. : 3.5 Min. : 3.3
## 1st Qu.: 5.0 1st Qu.: 5.3 1st Qu.: 5.4 1st Qu.: 5.3 1st Qu.: 5.1
## Median : 6.6 Median : 7.0 Median : 7.0 Median : 7.0 Median : 6.8
## Mean : 6.8 Mean : 7.0 Mean : 7.0 Mean : 7.0 Mean : 6.8
## 3rd Qu.: 8.1 3rd Qu.: 8.4 3rd Qu.: 8.5 3rd Qu.: 8.7 3rd Qu.: 8.1
## Max. :11.3 Max. :11.4 Max. :11.4 Max. :11.3 Max. :11.2
## GSM239907 GSM239925 GSM239922 GSM239926 GSM239885
## Min. : 3.3 Min. : 3.5 Min. : 3.3 Min. : 3.2 Min. : 3.5
## 1st Qu.: 5.2 1st Qu.: 5.1 1st Qu.: 5.1 1st Qu.: 5.0 1st Qu.: 5.3
## Median : 7.0 Median : 6.9 Median : 6.8 Median : 7.0 Median : 7.0
## Mean : 6.9 Mean : 6.8 Mean : 6.8 Mean : 6.8 Mean : 7.0
## 3rd Qu.: 8.5 3rd Qu.: 8.1 3rd Qu.: 8.0 3rd Qu.: 8.1 3rd Qu.: 8.4
## Max. :11.3 Max. :11.2 Max. :11.3 Max. :11.2 Max. :11.2
## GSM239888 GSM239906 GSM239883 GSM239913 GSM239912
## Min. : 3.2 Min. : 3.1 Min. : 3.4 Min. : 3.4 Min. : 3.6
## 1st Qu.: 5.4 1st Qu.: 5.2 1st Qu.: 5.3 1st Qu.: 5.3 1st Qu.: 5.4
## Median : 7.0 Median : 7.1 Median : 7.0 Median : 7.0 Median : 7.0
## Mean : 7.0 Mean : 6.9 Mean : 7.0 Mean : 6.9 Mean : 7.0
## 3rd Qu.: 8.4 3rd Qu.: 8.6 3rd Qu.: 8.5 3rd Qu.: 8.7 3rd Qu.: 8.6
## Max. :11.2 Max. :11.2 Max. :11.2 Max. :11.3 Max. :11.2
## GSM239911 GSM239886 GSM239875 GSM239874 GSM239932
## Min. : 3.4 Min. : 3.3 Min. : 3.2 Min. : 3.6 Min. : 3.5
## 1st Qu.: 5.2 1st Qu.: 5.4 1st Qu.: 5.4 1st Qu.: 5.4 1st Qu.: 5.0
## Median : 7.2 Median : 7.0 Median : 7.0 Median : 7.0 Median : 6.9
## Mean : 7.0 Mean : 7.0 Mean : 7.0 Mean : 7.0 Mean : 6.8
## 3rd Qu.: 8.7 3rd Qu.: 8.4 3rd Qu.: 8.3 3rd Qu.: 8.2 3rd Qu.: 8.0
## Max. :11.2 Max. :11.2 Max. :11.4 Max. :11.3 Max. :11.2
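Scanning 70 summary columns by eye is error-prone. A minimal sketch of a programmatic check, assuming a toy table in which one hypothetical sample was read as character, lists every column that is not numeric:

```r
# Toy expression table: GSM2 was read as character (hypothetical values)
expression <- data.frame(GSM1 = c(5.1, 7.2, 3.9),
                         GSM2 = c("6.0", "NA", "8.1"),
                         GSM3 = c(4.4, 9.0, 6.6),
                         stringsAsFactors = FALSE)

# TRUE for numeric columns, FALSE otherwise
is_num <- vapply(expression, is.numeric, logical(1))

# Names of the columns that need fixing
char_cols <- names(expression)[!is_num]
char_cols
```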
We convert the categorical columns of the metadata into factors:
metadata$Strain <- as.factor(metadata$Strain)
metadata$Condition <- as.factor(metadata$Condition)
metadata$Tissu <- as.factor(metadata$Tissu)
Running summary() again makes potential problems easier to spot:
## ID SampleName Time ZTime
## Length:71 Length:71 Min. : 0.00 Min. : 0.000
## Class :character Class :character 1st Qu.: 9.00 1st Qu.: 3.000
## Mode :character Mode :character Median :18.00 Median : 6.000
## Mean :23.75 Mean : 8.873
## 3rd Qu.:36.00 3rd Qu.:12.000
## Max. :48.00 Max. :18.000
## Condition Tissu Strain
## Ctr:36 Brain:71 AK :23
## SD :35 B6 :22
## d2 : 1
## D2 :23
## NaN : 1
## NA's: 1
Two samples have an unknown mouse strain (“NaN” and NA); they will be removed later. We also see a typo: one mouse was annotated “d2” instead of “D2”.
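The three as.factor() calls above can also be written as a single lapply() over the column names, and table() with `useNA = "ifany"` shows the typo and the missing value side by side. A sketch on a toy table with hypothetical values that mimic the problems seen here:

```r
# Toy metadata with the same kinds of problems (hypothetical values)
meta <- data.frame(Condition = c("Ctr", "SD", "Ctr", "SD", "Ctr"),
                   Strain    = c("B6", "d2", "D2", "NaN", NA),
                   stringsAsFactors = FALSE)

# Convert several character columns to factors in one step
cols <- c("Condition", "Strain")
meta[cols] <- lapply(meta[cols], as.factor)

# Each distinct spelling is its own level, so "d2" appears next to "D2";
# useNA = "ifany" also counts the true missing value
table(meta$Strain, useNA = "ifany")
```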
We have to rename it:
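One way to do this, sketched on a toy factor rather than the real metadata$Strain, is to overwrite the wrong value with the existing “D2” level and then drop the empty “d2” level. which() is used so that the NA entry does not end up in the subscript.

```r
# Toy strain factor containing the typo and a missing value (hypothetical)
strain <- factor(c("B6", "D2", "d2", "AK", NA, "D2"))

# Replace the mis-typed value with the already existing level "D2"
strain[which(strain == "d2")] <- "D2"

# Remove the now-empty "d2" level so it no longer appears in summaries
strain <- droplevels(strain)
table(strain)
```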
For the expression data, we can print the two problematic samples:
## [1] "7.1516016314504" "9.44083982847215" "5.58214299030138"
## [4] "8.77636374861736" "NA" "7.01073848225408"
## [7] "7.51168547926991" "10.9942396307951" "3.5426477506181"
## [10] "11.2689940905563" "4.53300849132838" "3.93627124437631"
## [13] "4.22529429703503" "9.64249266538495" "4.75981844248776"
## [16] "7.90611853773516" "8.43996954444962" "9.31335735689715"
## [19] "7.21183422219714" "9.43023479379811" "7.56275579853497"
## [22] "6.2637628719418" "4.55087222927257" "6.36221424813594"
## [25] "5.30887673968548" "6.65887027754669" "3.98358500197667"
## [28] "5.63816902313448" "6.54750271986372" "3.76705969902158"
## [31] "7.13752470247227" "6.60720091584353" "6.86501210504626"
## [34] "6.09345185819473" "3.78261561756992" "10.3716916857919"
## [37] "9.8656180529183" "7.88514755796799" "4.68059094256405"
## [40] "7.20394953277159" "9.48601693614409" "3.58116580206789"
## [43] "8.40294133877664" "3.91343915415062" "6.17218208925419"
## [46] "4.71936252106112" "7.84634257411119" "7.07133973499474"
## [49] "7.48043881913625" "5.38381073453594" "6.4117148219353"
## [52] "10.8638181809604" "8.53203369897089" "9.71412964400098"
## [55] "4.41553452782242" "7.52919043197241" "9.24428105566894"
## [58] "7.76586241594676" "3.48750361484482" "5.42885051906792"
## [61] "5.98329549404109" "3.57968102404718" "6.94199154485471"
## [64] "7.76108577095632" "4.41344177163489" "6.44354872772903"
## [67] "5.41883662252547" "7.59619062955516" "4.81955276503251"
## [70] "7.59033448049269" "7.35294344938346" "5.02283676418099"
## [73] "8.03051275657304" "3.99357055438367" "4.39597142665914"
## [76] "5.53312866101338" "9.04157200655132" "6.33434060568062"
## [79] "7.60263538244219" "4.74136709397252" "6.20835220106041"
## [82] "10.2598337137952" "8.61491530308567" "5.24883741386779"
## [85] "7.64891616685063" "7.17972217549752" "7.43210738888375"
## [88] "10.2739252040157" "4.96486634234627" "8.46040562013735"
## [91] "8.85002885322223" "5.33339849536578" "10.1298644723963"
## [94] "10.7028549569907" "9.28967262285694" "5.43844387208594"
## [97] "9.93303086975839" "10.2353270381247" "7.43120585339832"
## [100] "8.13002510919751"
## [1] "7.05629185690423" "NA" "3.62109088251407"
## [4] "8.707745577559" "5.39374970472041" "7.08153511654968"
## [7] "7.74787747159035" "10.9680636213069" "3.80060486399479"
## [10] "11.1107013306742" "4.71197277197259" "3.53585641389607"
## [13] "4.01707996453672" "9.71463446247705" "4.6415635029193"
## [16] "7.78199353116428" "8.46257287969743" "7.73506505192023"
## [19] "4.09695801725552" "9.15576828743918" "7.35418669669635"
## [22] "6.22118577527544" "4.44705855050427" "10.4432610264594"
## [25] "5.35815577184569" "6.52081278399767" "3.92886780339795"
## [28] "5.73817284791818" "4.60829120282569" "3.63324879156339"
## [31] "8.47825705219818" "5.91755868384873" "7.102553765068"
## [34] "5.62531797842723" "3.90491726990032" "8.96141708728896"
## [37] "10.0578429941317" "5.05643766900686" "4.72305652738904"
## [40] "5.27180439729869" "9.85817578542504" "3.67150002114846"
## [43] "8.56246753514483" "3.77349036449541" "6.23681159172155"
## [46] "4.71184400723521" "8.06911097400442" "7.32468290670881"
## [49] "7.47600606461747" "5.38473025175546" "6.14774027107141"
## [52] "10.8409328004626" "8.92639609384679" "9.77784799335737"
## [55] "4.57294092686079" "7.73700514863385" "9.18907529161975"
## [58] "7.49225626759818" "3.62792684046304" "7.81764997988425"
## [61] "5.83829787733053" "6.40617750266875" "6.82146934203883"
## [64] "7.68448055830424" "4.37590105568283" "6.3576105661357"
## [67] "6.08551924346881" "7.74021184367948" "4.69291810629799"
## [70] "7.54517212525693" "7.28068017269519" "5.00653506646486"
## [73] "7.9484165215413" "3.95159948552454" "8.81059463856966"
## [76] "5.57324942755916" "6.18176633687667" "6.34246068904597"
## [79] "7.49337681197434" "5.00314068374953" "5.99878340297803"
## [82] "10.0377220337655" "10.4104735822674" "5.13086146584666"
## [85] "7.82249697648517" "6.74553140259875" "7.06255159524849"
## [88] "10.1775570310208" "5.11567778914582" "6.85919004528415"
## [91] "10.4412623020071" "5.43434853189694" "7.49498210719405"
## [94] "10.4708706063564" "4.70574696727817" "5.76291643609249"
## [97] "8.06910207153329" "9.87247755896656" "7.37119130778341"
## [100] "6.32179850070519"
We see that the string “NA” caused these columns to be stored as character. We can simply convert them into numeric values:
## Warning: NAs introduced by coercion
## Warning: NAs introduced by coercion
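A sketch of the conversion, assuming a toy table with one hypothetical character column; it reuses the non-numeric-column detection so that only the affected samples are touched:

```r
# Toy table: GSM2 was read as character because of the string "NA"
expression <- data.frame(GSM1 = c(5.1, 7.2, 3.9),
                         GSM2 = c("6.0", "NA", "8.1"),
                         stringsAsFactors = FALSE)

# Find the non-numeric columns and convert them with as.numeric();
# the string "NA" becomes a real missing value (any other non-numeric
# string would also become NA, with a coercion warning)
char_cols <- names(expression)[!vapply(expression, is.numeric, logical(1))]
expression[char_cols] <- lapply(expression[char_cols], as.numeric)

str(expression)
```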
We can then store our data in a matrix: if all columns are numeric, as.matrix() produces a numeric matrix.
All values are now numeric:
## [1] "double"
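The check above presumably comes from typeof() on the converted matrix. The toy sketch below (hypothetical values) also shows why the character columns had to be fixed first: a single non-numeric column turns the whole matrix into character.

```r
# All-numeric toy table -> numeric (double) matrix
expression <- data.frame(GSM1 = c(5.1, 7.2), GSM2 = c(6.0, NA))
expr_mat <- as.matrix(expression)
typeof(expr_mat)

# One character column is enough to make the whole matrix character
mixed <- data.frame(a = 1.5, b = "x", stringsAsFactors = FALSE)
typeof(as.matrix(mixed))
```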
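With the %in% operator, we test which samples belong to the B6 or D2 strain and keep only the rows that are TRUE. The AK samples and the two samples with an unknown strain (“NaN” and NA) all return FALSE: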
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
## [25] TRUE TRUE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE
## [37] TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [49] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [61] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
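A minimal sketch of this filter on a toy table (hypothetical IDs). Note that `NA %in% c("B6", "D2")` is FALSE rather than NA, so the missing strain is dropped together with the others; droplevels() then removes the levels that no longer occur.

```r
# Toy metadata (hypothetical IDs and strains)
metadata <- data.frame(ID = paste0("GSM", 1:6),
                       Strain = factor(c("AK", "B6", "D2", "NaN", NA, "D2")))

# TRUE only for B6 and D2; AK, "NaN" and NA all give FALSE
keep <- metadata$Strain %in% c("B6", "D2")
keep

metadata <- metadata[keep, ]
metadata$Strain <- droplevels(metadata$Strain)
```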
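Directly reordering the expression columns by the metadata IDs does not work, because one sample from the metadata is missing in the expression data (70 columns for 71 metadata rows).
We therefore keep only the samples shared by both tables: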
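A sketch of this step on toy tables (hypothetical IDs), assuming sample IDs are stored in `metadata$ID` and as expression column names. intersect() keeps the order of its first argument, so both tables end up in metadata order.

```r
# Toy tables: GSM4 exists in the metadata but has no expression column
metadata <- data.frame(ID = c("GSM1", "GSM2", "GSM3", "GSM4"),
                       Strain = c("B6", "D2", "B6", "D2"),
                       stringsAsFactors = FALSE)
expression <- data.frame(GSM3 = c(5.1, 6.2),
                         GSM1 = c(7.0, 4.3),
                         GSM2 = c(8.4, 9.9))

# expression[, metadata$ID] would fail with "undefined columns selected"

# Samples present in both tables, in metadata order
shared <- intersect(metadata$ID, colnames(expression))

# Subset and reorder both tables with the same ID vector
expression <- expression[, shared, drop = FALSE]
metadata <- metadata[match(shared, metadata$ID), ]
```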
Finally, we check that the samples are now in the same order in both tables:
##
## TRUE
## 45
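The output above looks like a table() of an element-wise comparison, with all 45 remaining samples matching. A sketch of that check on toy data, together with a stricter alternative: identical() also fails when the two vectors differ in length, whereas `==` would silently recycle the shorter one.

```r
# Toy tables after alignment (hypothetical IDs)
metadata <- data.frame(ID = c("GSM1", "GSM2", "GSM3"),
                       stringsAsFactors = FALSE)
expression <- data.frame(GSM1 = 1.2, GSM2 = 3.4, GSM3 = 5.6)

# Count of matching positions
table(metadata$ID == colnames(expression))

# Stricter: stop the script if order or length differ
stopifnot(identical(metadata$ID, colnames(expression)))
```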