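### reload the data saved during the exercise
logdata = read.table("testoutput.txt")
data = readRDS("leukemiaExpressionSubset.rds")
### build the sample annotations from the column names:
### leukemia type from characters 1-3, row names from characters 10-13
annotations = data.frame(LeukemiaType = substr(colnames(data),1,3),
                         row.names = substr(colnames(data),10,13))
### extract the BCL2A1 row by its Ensembl identifier
geneid = "ENSG00000140379"
bcl2a1_expression = as.numeric(logdata[geneid,])
### compare BCL2A1 expression across leukemia types
boxplot(bcl2a1_expression~annotations$LeukemiaType)

1. Introduction to R - solutions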
Exercise 3
We are interested in the gene BCL2A1 because it has been implicated in many cancers, including leukemia. We would like to see whether it displays an interesting signal in our data.
The Ensembl identifier of this gene is ENSG00000140379. Use this identifier to extract the corresponding row from the log-data matrix, and show that it is dysregulated in acute leukemia (ALL, AML).
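As a minimal sketch of the row-extraction step, the snippet below builds a small simulated log-expression matrix and pulls out one gene by its row name before comparing groups. The matrix values, the sample names, the `CTL` control label and the upward shift in the acute samples are all invented for illustration; they are not the course data. The Kruskal-Wallis test is likewise an assumption of this sketch, offered as one possible numerical complement to a boxplot rather than as part of the exercise.

```r
# Simulated stand-in for the log-data matrix: 3 genes x 12 samples.
# All values are made up for illustration; this is NOT the course data.
set.seed(1)
types = rep(c("ALL", "AML", "CTL"), each = 4)   # "CTL" is a hypothetical control label
toy_logdata = matrix(rnorm(36, mean = 5), nrow = 3,
                     dimnames = list(c("ENSG00000140379", "geneB", "geneC"),
                                     paste0(types, "_", 1:12)))

# Shift the first gene upwards in the acute leukemia samples (invented effect)
acute = types %in% c("ALL", "AML")
toy_logdata["ENSG00000140379", acute] = toy_logdata["ENSG00000140379", acute] + 3

# Extract one row by its identifier; on a matrix this gives a named numeric vector
geneid = "ENSG00000140379"
expr = toy_logdata[geneid, ]

# Recover the group labels from the first three characters of the sample names
leukemia_type = substr(names(expr), 1, 3)

# Per-group medians and a non-parametric test across the groups
group_medians = tapply(expr, leukemia_type, median)
test_result = kruskal.test(expr, factor(leukemia_type))
print(group_medians)
print(test_result$p.value)
```

Indexing by row name only works if the identifiers were kept as row names when the table was saved and reloaded; on a data frame the same indexing returns a one-row data frame, which is why the solution wraps it in `as.numeric()`.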
Using the UCSC genome browser we find:
- Human BCL2A1 is on the reverse strand
- There are 2 isoforms according to NCBI RefSeq and 3 according to GENCODE
- The next protein-coding gene upstream of BCL2A1 (relative to the direction of transcription) is ZFAND6, and downstream is MTHFS.
- There is a binding site for NFKB1 (nuclear factor kappa B subunit 1, a transcription factor) less than 10 kb upstream of BCL2A1, within a DNase I hypersensitive site bearing an H3K27ac mark.
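Because BCL2A1 is on the reverse strand, "upstream" corresponds to higher genomic coordinates, the opposite of what the browser's left-to-right layout suggests. The sketch below illustrates that convention on a toy gene table. The gene names, the coordinates and the `neighbours` helper are hypothetical and were chosen only to make the strand logic visible.

```r
# Hypothetical gene table: names and coordinates are invented for illustration
genes = data.frame(name  = c("geneLeft", "geneOfInterest", "geneRight"),
                   start = c(1000, 5000, 9000),
                   end   = c(2000, 6000, 9500),
                   stringsAsFactors = FALSE)

# Return the nearest neighbours of `target`, labelled relative to its strand.
# On the "+" strand, upstream is the lower-coordinate side;
# on the "-" strand, upstream is the higher-coordinate side.
neighbours = function(genes, target, strand) {
  g = genes[order(genes$start), ]
  i = which(g$name == target)
  left  = if (i > 1) g$name[i - 1] else NA_character_
  right = if (i < nrow(g)) g$name[i + 1] else NA_character_
  if (strand == "+") {
    c(upstream = left, downstream = right)
  } else {
    c(upstream = right, downstream = left)
  }
}

fwd_nb = neighbours(genes, "geneOfInterest", "+")
rev_nb = neighbours(genes, "geneOfInterest", "-")
print(fwd_nb)
print(rev_nb)
```

For the reverse-strand call, the gene drawn to the right in coordinate order is reported as upstream, the same convention that places ZFAND6 upstream of BCL2A1.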