R in Action
Efficient data science with R
A demonstration by Md. Aminul Islam Shazid.
Grammar of graphics with ggplot2
ggplot2 is an R package that implements the grammar of graphics.
More informative: gives a sense of the density too!
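One plot type that shows density alongside the usual summary statistics is a violin plot with a narrow boxplot drawn inside it. The sketch below is only an illustration of layering in the grammar of graphics: the choice of the built-in `iris` data and of the violin geometry are assumptions, not necessarily what the demonstration plotted.

```r
library(ggplot2)

# Assumption: iris stands in for the demonstration's data.
p <- ggplot(iris, aes(x = Species, y = Sepal.Length, fill = Species)) +
  geom_violin(alpha = 0.4) +      # outline shows the density of the values
  geom_boxplot(width = 0.15) +    # median and quartiles drawn inside the violin
  labs(x = "Species", y = "Sepal length (cm)") +
  theme_minimal() +
  theme(legend.position = "none")

p
```

Each `+` adds one layer or setting, so swapping `geom_violin()` for another geometry changes the plot type without touching the data or the aesthetic mapping.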
To show trend or evolution.
LOESS smoother added as a trend line.
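A line chart with a LOESS trend line could be sketched as follows. The `economics` data set that ships with ggplot2 and the `span` value are assumptions chosen for illustration.

```r
library(ggplot2)

# Assumption: ggplot2's built-in `economics` data (monthly US figures).
p <- ggplot(economics, aes(x = date, y = unemploy)) +
  geom_line(colour = "grey40") +
  # LOESS smoother as the trend line; a smaller span follows the data more closely.
  geom_smooth(method = "loess", formula = y ~ x, span = 0.3, se = FALSE) +
  labs(x = NULL, y = "Unemployed (thousands)")

p
```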
Fast data exploration with DataExplorer
Boxplots of all continuous variables, grouped by a categorical variable
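A grouped boxplot of this kind takes a single call in DataExplorer. This is a minimal sketch; using `iris` with `Species` as the grouping variable is an assumption.

```r
library(DataExplorer)

# Assumption: iris stands in for the data; Species is the grouping variable.
overview <- introduce(iris)   # one-row summary: rows, columns, missing values, ...
overview

# One boxplot panel per continuous variable, split by Species.
plot_boxplot(iris, by = "Species")
```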
Publication-ready tables with gtsummary
This is the so-called "Table 1".
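library(gtsummary)

# Summarise the selected columns of the built-in `trial` data.
tbl_summary(
  data = trial,
  missing_text = "NA",  # label used for the missing-value rows
  include = c("age", "trt", "marker", "stage", "grade", "death")
) |>
  bold_labels()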
Characteristic | N = 200¹ |
---|---|
Age | 47 (38, 57) |
NA | 11 |
Chemotherapy Treatment | |
Drug A | 98 (49%) |
Drug B | 102 (51%) |
Marker Level (ng/mL) | 0.64 (0.22, 1.39) |
NA | 10 |
T Stage | |
T1 | 53 (27%) |
T2 | 54 (27%) |
T3 | 43 (22%) |
T4 | 50 (25%) |
Grade | |
I | 68 (34%) |
II | 68 (34%) |
III | 64 (32%) |
death | 112 (56%) |

¹ Median (IQR); n (%)
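# Same summary, split by death status, with row percentages and p-values.
tbl_summary(
  data = trial,
  include = c("age", "trt", "marker", "stage", "grade"),
  by = "death",
  percent = "row",
  missing_text = "NA"
) |>
  add_p() |>    # test for a difference between the two groups
  bold_p() |>   # bold p-values below the default 0.05 threshold
  bold_labels() |>
  modify_spanning_header(all_stat_cols() ~ "**Death**")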
Characteristic | Death: No, N = 88¹ | Death: Yes, N = 112¹ | p-value² |
---|---|---|---|
Age | 47 (36, 57) | 48 (38, 58) | 0.5 |
NA | 2 | 9 | |
Chemotherapy Treatment | | | 0.4 |
Drug A | 46 (47%) | 52 (53%) | |
Drug B | 42 (41%) | 60 (59%) | |
Marker Level (ng/mL) | 0.73 (0.23, 1.33) | 0.57 (0.20, 1.45) | 0.6 |
NA | 2 | 8 | |
T Stage | | | 0.004 |
T1 | 29 (55%) | 24 (45%) | |
T2 | 27 (50%) | 27 (50%) | |
T3 | 21 (49%) | 22 (51%) | |
T4 | 11 (22%) | 39 (78%) | |
Grade | | | 0.080 |
I | 35 (51%) | 33 (49%) | |
II | 32 (47%) | 36 (53%) | |
III | 21 (33%) | 43 (67%) | |

¹ Median (IQR); n (%)
² Wilcoxon rank sum test; Pearson’s Chi-squared test
gtsummary supports many different kinds of statistical models. Adding support for new models is also very easy.
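# Logistic regression of death on the patient characteristics.
logit_model <- glm(death ~ age + trt + marker + stage + grade,
                   data = trial, family = binomial)

# exponentiate = TRUE reports odds ratios instead of log-odds.
tbl_regression(
  logit_model,
  exponentiate = TRUE
) |>
  bold_p() |>
  bold_labels()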
Characteristic | OR¹ | 95% CI¹ | p-value |
---|---|---|---|
Age | 1.01 | 0.99, 1.03 | 0.3 |
Chemotherapy Treatment | | | |
Drug A | — | — | |
Drug B | 1.42 | 0.76, 2.67 | 0.3 |
Marker Level (ng/mL) | 0.95 | 0.65, 1.38 | 0.8 |
T Stage | | | |
T1 | — | — | |
T2 | 1.53 | 0.67, 3.55 | 0.3 |
T3 | 1.46 | 0.59, 3.67 | 0.4 |
T4 | 5.34 | 2.13, 14.3 | <0.001 |
Grade | | | |
I | — | — | |
II | 1.07 | 0.49, 2.34 | 0.9 |
III | 2.07 | 0.97, 4.52 | 0.062 |

¹ OR = Odds Ratio, CI = Confidence Interval
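The odds ratios in the table are the exponentiated coefficients of the logistic model. As a rough cross-check, the same quantities can be computed directly from the fitted `glm` object; the sketch refits the model so that it runs on its own, and the name `odds_ratios` is introduced here only for illustration.

```r
library(gtsummary)  # provides the `trial` data

logit_model <- glm(death ~ age + trt + marker + stage + grade,
                   data = trial, family = binomial)

# glm() estimates log-odds; exponentiating gives odds ratios.
odds_ratios <- exp(coef(logit_model))
round(odds_ratios, 2)
```

The intercept appears in this vector but not in the table, and the reference levels (Drug A, T1, grade I) have no coefficient, which is why the table shows a dash for them.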
Decision tree classifier in R
Classifying disease outcome using a decision tree
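A classification tree for a disease outcome might be fitted along these lines. The rpart package and its built-in `kyphosis` data are assumptions standing in for whatever package and data the demonstration used.

```r
library(rpart)

# Assumption: rpart's built-in kyphosis data stands in for the disease data.
tree_fit <- rpart(Kyphosis ~ Age + Number + Start,
                  data = kyphosis, method = "class")

# Predicted class for each child, compared with the observed outcome.
pred <- predict(tree_fit, type = "class")
table(observed = kyphosis$Kyphosis, predicted = pred)

# Draw the fitted tree with base graphics.
plot(tree_fit, margin = 0.1)
text(tree_fit, use.n = TRUE)
```

The confusion table here is computed on the training data, so it overstates how well the tree would do on new patients.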
Hierarchical clustering in R
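Hierarchical clustering needs only base R. The following sketch assumes the built-in `USArrests` data, Ward linkage, and a cut into four groups; all three are illustrative choices.

```r
# Assumption: the built-in USArrests data stands in for the demonstration's data.
scaled <- scale(USArrests)                       # put variables on a common scale
hc <- hclust(dist(scaled), method = "ward.D2")   # agglomerative clustering

plot(hc, cex = 0.6, main = "Ward clustering of US states")

clusters <- cutree(hc, k = 4)                    # cut the dendrogram into 4 groups
table(clusters)
```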
KNN clustering in R
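k-nearest neighbours assigns each observation the majority label of its closest labelled neighbours, so the sketch below treats the task as predicting `Species` on a held-out part of `iris`. That reading, the class package, the 100/50 split, and `k = 5` are all assumptions.

```r
library(class)

set.seed(42)
train_idx <- sample(nrow(iris), 100)   # 100 rows to train on, 50 held out

train_x <- scale(iris[train_idx, 1:4])
# Scale the held-out rows with the training means and standard deviations.
test_x <- scale(iris[-train_idx, 1:4],
                center = attr(train_x, "scaled:center"),
                scale = attr(train_x, "scaled:scale"))

pred <- knn(train_x, test_x, cl = iris$Species[train_idx], k = 5)

# Share of held-out flowers whose species was predicted correctly.
accuracy <- mean(pred == iris$Species[-train_idx])
accuracy
```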
iris dataset
Time series analysis in R
Decomposing the AirPassengers data into trend, seasonality, etc.
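A classical decomposition of this series can be sketched with base R's `decompose()`. The multiplicative model is an assumption, chosen because the seasonal swings in `AirPassengers` grow with the level of the series.

```r
# AirPassengers ships with R: monthly airline passenger totals, 1949 to 1960.
# Assumption: a multiplicative model (series = trend * seasonal * random).
parts <- decompose(AirPassengers, type = "multiplicative")

# Four panels: observed, trend, seasonal, random.
plot(parts)

# Seasonal factors for the 12 months; values above 1 mark busier months.
round(parts$figure, 2)
```

`stl()` is a base R alternative that uses LOESS smoothing; it fits an additive model, so it is usually applied to `log(AirPassengers)`.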