Streamline Your Research: Creating Publication-Ready Demographics Tables in R

Author

Jason Gavrilis

Introduction

When publishing studies involving human participants, a demographics table is a vital tool for summarizing the characteristics of the population under investigation.

Figure 1: Demographics are the characteristics of a population that have been categorized by distinct criteria.

However, creating a publication-ready demographics table can be a challenging and time-consuming task. If you’ve ever struggled with this, fear not! The following article will guide you through the process of creating a publication-ready demographics table in R, using the gtsummary and flextable packages. These tools will save you time and effort, allowing you to focus on the insights from your data rather than the intricacies of table formatting.

Prerequisites

Load required packages:

Code
if (!require(pacman)) install.packages("pacman")
library(pacman)
p_load(
  tidyverse, flextable, gtsummary, labelled, officer
)
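
If pacman is not an option (for example, on a machine where you prefer not to install helper packages), a similar check can be sketched in base R. The helper name `find_missing_packages` is hypothetical and not part of the workflow above; it only reports which packages still need installing and leaves the actual installation to you.

```r
# Hypothetical helper (not from the article): return the packages in `pkgs`
# that cannot be loaded, without attaching anything to the search path.
find_missing_packages <- function(pkgs) {
  is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  unname(pkgs[!is_installed])
}

required <- c("tidyverse", "flextable", "gtsummary", "labelled", "officer")
missing_pkgs <- find_missing_packages(required)

if (length(missing_pkgs) > 0) {
  message("Install first: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install from CRAN
}
```

Once nothing is reported as missing, the packages can be attached with ordinary `library()` calls.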

Demographics Table

Figure 2 shows an example demographics table published by Ching et al. (2022) in the journal Ear and Hearing.

Figure 2: Image of Table 1 taken from Ching et al. (2022).

Let’s create our own demographics table using a similar, but custom data set.

Data set

First, we build a custom data frame containing thirty observations of six demographic variables:

Code
data <- data.frame(
  sex = c(
    "Male", "Female", "Female",
    "Female", "Female", "Female", "Female", "Male", "Female",
    "Female", "Male", "Female", "Female", "Female", "Male",
    "Female", "Female", "Male", "Female", "Male", "Male",
    "Female", "Male", "Female", "Male", "Male", "Male", "Male",
    "Female", "Male"
  ),
  mode_hearing = c(
    "Bimodal", "Bilateral",
    "Bilateral", "Bimodal", "Bimodal", "Bilateral", "Bimodal",
    "Bilateral", "Unilateral", "Bimodal", "Unilateral",
    "Bilateral", "Unilateral", "Bilateral", "Unilateral", "Bilateral",
    "Bilateral", "Bilateral", "Bimodal", "Bimodal",
    "Bilateral", "Bimodal", "Bilateral", "Unilateral", "Bimodal",
    "Bilateral", "Bilateral", "Bimodal", "Bilateral",
    "Unilateral"
  ),
  age = c(
    71L, 52L, 47L, 44L, 68L, 57L,
    77L, 58L, 76L, 79L, 62L, 57L, 76L, 68L, 65L, 43L, 78L, 73L,
    58L, 45L, 76L, 44L, 66L, 42L, 76L, 63L, 76L, 54L, 50L,
    56L
  ),
  hearing_loss = c(
    98L, 108L, 66L, 99L, 115L, 95L,
    98L, 110L, 94L, 110L, 75L, 64L, 81L, 107L, 109L, 118L,
    75L, 79L, 119L, 67L, 73L, 73L, 86L, 97L, 72L, 82L, 91L,
    94L, 94L, 80L
  ),
  age_first_ci = c(
    63L, 42L, 36L, 41L, 57L, 51L,
    68L, 44L, 59L, 65L, 48L, 45L, 58L, 59L, 59L, 24L, 59L, 60L,
    52L, 36L, 73L, 24L, 48L, 37L, 64L, 46L, 62L, 38L, 40L,
    43L
  ),
  duration_use = c(
    8L, 10L, 11L, 3L, 11L, 6L, 9L,
    14L, 17L, 14L, 14L, 12L, 18L, 9L, 6L, 19L, 19L, 13L, 6L,
    9L, 3L, 20L, 18L, 5L, 12L, 17L, 14L, 16L, 10L, 13L
  )
)

flextable::flextable(head(data))

sex     mode_hearing  age  hearing_loss  age_first_ci  duration_use
Male    Bimodal       71   98            63            8
Female  Bilateral     52   108           42            10
Female  Bilateral     47   66            36            11
Female  Bimodal       44   99            41            3
Female  Bimodal       68   115           57            11
Female  Bilateral     57   95            51            6

Then we add nicely formatted labels to all variables in the data frame using the labelled package (see footnote 1):

Code
labelled::var_label(data) <-
  list(
    sex = "Sex",
    mode_hearing = "Mode of hearing",
    age = "Age at assessment (Years)",
    hearing_loss = "Hearing Loss (4FA)",
    age_first_ci = "Age at first CI (Years)",
    duration_use = "Duration of use (Years)"
  )

flextable::flextable(var_label(data) %>%
  as.data.frame())

sex  mode_hearing     age                        hearing_loss        age_first_ci             duration_use
Sex  Mode of hearing  Age at assessment (Years)  Hearing Loss (4FA)  Age at first CI (Years)  Duration of use (Years)
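
To see why labelling does not disturb the data itself, here is a minimal sketch on a toy data frame (the `toy` object is hypothetical and much smaller than the data set above). As far as I know, labelled stores each label as a `"label"` attribute on the column, so column names and values stay untouched.

```r
library(labelled)

# Toy data frame (hypothetical; three rows only, for illustration)
toy <- data.frame(
  age = c(71L, 52L, 47L),
  sex = c("Male", "Female", "Female")
)

# Attach labels the same way as in the article
var_label(toy) <- list(
  age = "Age at assessment (Years)",
  sex = "Sex"
)

# The label lives in an attribute on the column
attr(toy$age, "label")

# Column names are unchanged, and the values can still be used as usual
names(toy)
mean(toy$age)

# Read a single label back
var_label(toy$sex)
```

Because the original variable names are preserved, the rest of the code can keep referring to `age`, `sex`, and so on, while the table shows the readable labels.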

Basic Demographics Table

We then create the demographics table with the tbl_summary() function from the gtsummary package:

Code
# set gtsummary themes
gtsummary::reset_gtsummary_theme()
#theme_gtsummary_journal(journal = "nejm", set_theme = TRUE)

# create demographics table
basic <- data %>%
  tbl_summary(
    type = all_continuous() ~ "continuous2",
    statistic = list(
      all_continuous() ~ c(
        "{mean} ({sd})",
        "{median}",
        "{min}, {max}"
      ),
      all_categorical() ~ "{n} ({p}%)"
    ),
    digits = list(all_continuous() ~ 1),
    missing = "no"
  ) %>%
  bold_labels() %>%
  modify_header(
    label = "",
    all_stat_cols() ~ "**{level}**, N = {n} ({style_percent(p)}%)"
  ) %>%
  gtsummary::as_flex_table()

basic %>% 
  flextable::autofit() %>% 
  flextable::width(width = 3) %>% 
  flextable::line_spacing(space = 0.35, part = "body")

Table 1: Participant Demographics

                            Overall, N = 30 (100%)¹
Sex
  Female                    17 (57%)
  Male                      13 (43%)
Mode of hearing
  Bilateral                 14 (47%)
  Bimodal                   10 (33%)
  Unilateral                6 (20%)
Age at assessment (Years)
  Mean (SD)                 61.9 (12.3)
  Median                    62.5
  Range                     42.0, 79.0
Hearing Loss (4FA)
  Mean (SD)                 91.0 (16.4)
  Median                    94.0
  Range                     64.0, 119.0
Age at first CI (Years)
  Mean (SD)                 50.0 (12.5)
  Median                    49.5
  Range                     24.0, 73.0
Duration of use (Years)
  Mean (SD)                 11.9 (4.9)
  Median                    12.0
  Range                     3.0, 20.0

¹ n (%)
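
As a sanity check on what the `statistic` templates mean, the cells for age and sex in Table 1 can be recomputed by hand. The following is a base R sketch, not what gtsummary runs internally; the vectors are copied from the data frame above, and the variable names (`mean_sd`, `sex_cells`, and so on) are my own. Rounding in gtsummary may differ from `sprintf()` in edge cases.

```r
# Values copied from the `age` and `sex` columns of the article's data frame
age <- c(
  71, 52, 47, 44, 68, 57, 77, 58, 76, 79, 62, 57, 76, 68, 65,
  43, 78, 73, 58, 45, 76, 44, 66, 42, 76, 63, 76, 54, 50, 56
)
sex <- c(
  "Male", "Female", "Female", "Female", "Female", "Female", "Female",
  "Male", "Female", "Female", "Male", "Female", "Female", "Female",
  "Male", "Female", "Female", "Male", "Female", "Male", "Male",
  "Female", "Male", "Female", "Male", "Male", "Male", "Male",
  "Female", "Male"
)

# "{mean} ({sd})", "{median}" and "{min}, {max}" with one decimal place
mean_sd    <- sprintf("%.1f (%.1f)", mean(age), sd(age))
median_age <- sprintf("%.1f", median(age))
range_age  <- sprintf("%.1f, %.1f", min(age), max(age))

# "{n} ({p}%)" for a categorical variable
n_sex     <- table(sex)
pct_sex   <- round(100 * prop.table(n_sex))
sex_cells <- sprintf("%d (%.0f%%)", as.integer(n_sex), as.numeric(pct_sex))
names(sex_cells) <- names(n_sex)

mean_sd     # "61.9 (12.3)"
median_age  # "62.5"
range_age   # "42.0, 79.0"
sex_cells   # Female "17 (57%)", Male "13 (43%)"
```

These values agree with the Sex and Age at assessment rows of Table 1.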

And there we have a demographics table that is ready to publish!

Advanced Demographics Table

Sometimes we need to display demographic variables across a specific study group, such as comparing participant characteristics based on different modes of hearing.

With a few adjustments to the tbl_summary() function, we can make a new demographics table that allows us to quickly observe the differences in participant characteristics across the various hearing modes:

Code
# create demographics table with grouping variable
gtsummary::theme_gtsummary_compact(set_theme = TRUE, font_size = 10)

adv <- data %>%
  tbl_summary(
    by = mode_hearing,
    type = all_continuous() ~ "continuous2",
    statistic = list(
      all_continuous() ~ c(
        "{mean} ({sd})",
        "{median}",
        "{min}, {max}"
      ),
      all_categorical() ~ "{n} ({p}%)"
    ),
    digits = list(all_continuous() ~ 1),
    missing = "no"
  ) %>%
  add_overall() %>%
  add_p() %>%
  bold_labels() %>%
  modify_header(
    label = "",
    all_stat_cols() ~ "**{level}**, N = {n} ({style_percent(p)}%)"
  ) %>%
  gtsummary::as_flex_table()

adv %>% 
  flextable::line_spacing(space = 1.1, part = "body") %>% 
  flextable::autofit()

Table 2: Participant Demographics by Mode of Hearing

                            Overall           Bilateral         Bimodal           Unilateral        p-value²
                            N = 30 (100%)¹    N = 14 (47%)¹     N = 10 (33%)¹     N = 6 (20%)¹
Sex                                                                                                 >0.9
  Female                    17 (57%)          8 (57%)           6 (60%)           3 (50%)
  Male                      13 (43%)          6 (43%)           4 (40%)           3 (50%)
Age at assessment (Years)                                                                           >0.9
  Mean (SD)                 61.9 (12.3)       61.7 (11.5)       61.6 (14.3)       62.8 (12.9)
  Median                    62.5              60.5              63.0              63.5
  Range                     42.0, 79.0        43.0, 78.0        44.0, 79.0        42.0, 76.0
Hearing Loss (4FA)                                                                                  0.6
  Mean (SD)                 91.0 (16.4)       89.1 (17.1)       94.5 (18.3)       89.3 (12.9)
  Median                    94.0              88.5              98.0              87.5
  Range                     64.0, 119.0       64.0, 118.0       67.0, 119.0       75.0, 109.0
Age at first CI (Years)                                                                             >0.9
  Mean (SD)                 50.0 (12.5)       49.2 (12.5)       50.8 (15.1)       50.7 (9.4)
  Median                    49.5              47.0              54.5              53.0
  Range                     24.0, 73.0        24.0, 73.0        24.0, 68.0        37.0, 59.0
Duration of use (Years)                                                                             0.6
  Mean (SD)                 11.9 (4.9)        12.5 (4.8)        10.8 (5.0)        12.2 (5.5)
  Median                    12.0              12.5              10.0              13.5
  Range                     3.0, 20.0         3.0, 19.0         3.0, 20.0         5.0, 18.0

¹ n (%)
² Fisher's exact test; Kruskal-Wallis rank sum test
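
The footnote of Table 2 names the tests behind the p-value column. As a rough cross-check, the same tests can be run directly in base R for one categorical and one continuous variable. This is a sketch under the assumption that `add_p()` used default settings for these tests; which test gtsummary picks can depend on the data and the package version. The vectors are copied from the data frame above.

```r
# Values copied from the article's data frame
sex <- c(
  "Male", "Female", "Female", "Female", "Female", "Female", "Female",
  "Male", "Female", "Female", "Male", "Female", "Female", "Female",
  "Male", "Female", "Female", "Male", "Female", "Male", "Male",
  "Female", "Male", "Female", "Male", "Male", "Male", "Male",
  "Female", "Male"
)
mode_hearing <- c(
  "Bimodal", "Bilateral", "Bilateral", "Bimodal", "Bimodal", "Bilateral",
  "Bimodal", "Bilateral", "Unilateral", "Bimodal", "Unilateral",
  "Bilateral", "Unilateral", "Bilateral", "Unilateral", "Bilateral",
  "Bilateral", "Bilateral", "Bimodal", "Bimodal", "Bilateral", "Bimodal",
  "Bilateral", "Unilateral", "Bimodal", "Bilateral", "Bilateral",
  "Bimodal", "Bilateral", "Unilateral"
)
age <- c(
  71, 52, 47, 44, 68, 57, 77, 58, 76, 79, 62, 57, 76, 68, 65,
  43, 78, 73, 58, 45, 76, 44, 66, 42, 76, 63, 76, 54, 50, 56
)

# Cross-tabulate sex by hearing mode (2 x 3 table of counts)
sex_by_mode <- table(sex, mode_hearing)

# Categorical variable across groups: Fisher's exact test
fisher_sex <- fisher.test(sex_by_mode)

# Continuous variable across groups: Kruskal-Wallis rank sum test
kruskal_age <- kruskal.test(x = age, g = factor(mode_hearing))

sex_by_mode
fisher_sex$p.value
kruskal_age$p.value
```

Table 2 reports both of these p-values as >0.9, so the unrounded values printed here are the ones to compare against.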

Exporting tables to Word

Typically, a demographics table will need to be inserted into a Microsoft Word document where it can join the rest of a manuscript or report.

The gtsummary package - in combination with the flextable and officer packages - makes exporting demographics tables to Microsoft Word very easy.

Let’s export our advanced demographics table to Microsoft Word using the flextable and officer packages:

Code
flextable::save_as_docx(adv,
  path = "articles/demographics_table_r/demo_table_adv.docx",
  pr_section =
    officer::prop_section(
      page_size = officer::page_size(
        orient = "landscape",
        width = 8.3, height = 11.7
      ),
      type = "continuous",
      page_margins = officer::page_mar()
    )
)

Screenshot of the advanced demographics table exported to Microsoft Word.

Once exported, the table can be edited just like any other Microsoft Word table.
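
If the table needs to sit inside a document that you assemble step by step (for example, with a caption line above it), one possible alternative is to build the file with officer directly. The sketch below uses a stand-in flextable built from `mtcars` rather than the `adv` object, and writes to a temporary folder; the file name `demo_table_sketch.docx` is hypothetical.

```r
library(flextable)
library(officer)

# Stand-in table (hypothetical); in the article this would be `adv`
ft <- flextable(head(mtcars[, 1:4]))
ft <- autofit(ft)

# Write to a temporary folder so the sketch does not depend on an
# existing project directory
out_path <- file.path(tempdir(), "demo_table_sketch.docx")

# Start from an empty Word document, add a caption line, then the table
doc <- read_docx()
doc <- body_add_par(doc, "Table 2: Participant Demographics by Mode of Hearing")
doc <- body_add_flextable(doc, value = ft)

# Save the document to disk
print(doc, target = out_path)

file.exists(out_path)
```

For a single table, `save_as_docx()` as shown above is shorter; the officer route mainly helps when several tables or paragraphs go into the same file.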

Final Thoughts

  • When publishing study results, creating comprehensive and well-formatted demographics tables is essential, but it can also be time-consuming. 

  • The gtsummary package in R drastically simplifies the process of generating publication-ready demographics tables.

  • With the help of flextable and officer packages, the gtsummary tables can be easily exported to Microsoft Word.

References

Ching, Teresa Y. C., Harvey Dillon, Sanna Hou, Mark Seeto, Ana Sodan, and Nicky Chong-White. 2022. “Development and Evaluation of a Language-Independent Test of Auditory Discrimination for Referrals for Cochlear Implant Candidacy Assessment.” Ear and Hearing 43 (4): 1151. https://doi.org/10.1097/AUD.0000000000001166.
Gohel, David. 2023. “Officer: Manipulation of Microsoft Word and PowerPoint Documents.” https://CRAN.R-project.org/package=officer.
Gohel, David, and Panagiotis Skintzos. 2023. “Flextable: Functions for Tabular Reporting.” https://CRAN.R-project.org/package=flextable.
Larmarange, Joseph. 2022. “Labelled: Manipulating Labelled Data.” https://CRAN.R-project.org/package=labelled.
Sjoberg, Daniel D., Karissa Whiting, Michael Curry, Jessica A. Lavery, and Joseph Larmarange. 2021. “Reproducible Summary Tables with the Gtsummary Package.” The R Journal 13 (1): 570–80. https://doi.org/10.32614/RJ-2021-053.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the Tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.

Footnotes

  1. When creating the demographics table, these formatted labels will be shown in the table instead of the actual variable names. This is an important step which will save you a lot of effort renaming individual variables within the table.