Background
Polypharmacy, defined as the use of multiple drugs for a single patient, can pose risk to adverse drug reactions especially for older adults with dementia. Growdon, et al. investigated the way adults 65 and older viewed their routine medication intake. The National Health and Aging Trends Study (NHATS) is a nationally representative survey of older adults in the United States. Using the NHATS Round 6 Medications Attitudes module, we focused on how adults with dementia subscribed to 3 main survey attitudes:
- Believe to be taking at least 1 prescribed medication no longer needed,
- Willing to deprescribe from medication under doctor’s advice, and
- Uncomfortable to take 5 or more pills daily.
To supplement the paper, we had to design a visualization summarizing 3 outcomes and 17 sociodemographic and clinical covariates. Undergoing what must have been 30 or more plot revisions, our paper was published along with the final plot in Journal of the American Geriatrics Society. To make a plot like this of your own, the most pivotal step is transforming your data.
Data
Using NHATS Round 6, we fit logistic regression models on MICE-imputed version of our dataset. From those models, we estimated the marginal predicted probabilities. These analyses were all done in Stata 17 software. I wrangled the Stata output data in R to create a long-format dataset with the Outcome, Covariate and Category/Level as the ID variables. This data cleaning process will vary between projects.
For each combination of Outcome, Covariate, Category/Level, these data have corresponding model estimates. I created a column called Highlight which defaults to 0 if the p-value for that row is greater than 0.05.
plot_data
To impose the ordering of our outcome variables and covariates, we have convert our columns Outcome and Category to factor variables.
# Manually written order of variables
cov_levels <- c('Age',
'Sex',
'Race/Ethnicity',
'Education',
'Married/Partnered',
'Medicaid',
'Chronic Conditions',
'Regular Medications',
'Self-Rated Health',
'Dementia Classification',
'Reported Dementia Diagnosis',
'Proxy Status',
'Hospitalized in Past Year',
'Seen Regular Doctor in Past Year',
'ADL Difficulties',
'Difficulty Tracking Medications',
'Fall in Past Month')
cat_levels <- c('no', 'yes',
'65-74', '75-84', '85+',
'male', 'female',
'white, non-hispanic', 'black, non-hispanic', 'hispanic', 'other',
'below high school', 'high school', 'beyond high school',
'<6', '6+',
'0-1', '2-3', '>3',
'possible dementia', 'probable dementia',
'<2', '2+',
'fair/poor', 'good', 'excellent/very good',
'sample person', 'proxy respondent')
# Converting to factor variables
plot_data <- plot_data %>%
mutate(Covariate=factor(Covariate, levels=cov_levels)) %>%
mutate(Category=factor(Category, levels=cat_levels)) %>%
mutate(Level=factor(Level))
Next, we’ll need reformat some of our string variables using functions from the stringr library. This is so that our plot labels have a “text wrap” style to them as opposed to them overflowing/overlapping. I used two wrapped functions that Hadley Wickham posted on a tidyverse issue.
# Wrapper Functions
library(stringr)
str_wrap_factor <- function(x, width) {
levels(x) <- stringr::str_wrap(levels(x), width)
x
}
str_to_title_factor <- function(x, width) {
levels(x) <- stringr::str_to_title(levels(x))
x
}
# Converting the variables
plot_data <- plot_data %>%
mutate(Covariate=str_wrap_factor(Covariate, width=20)) %>%
mutate(Category=str_wrap_factor(Category, width=25)) %>%
mutate(Category=str_to_title_factor(Category)) %>%
mutate(Outcome=str_wrap_factor(Outcome, width=35))Plot
Finally, we can run our plotting code! I used the ggplot2 library to make this with several layers. If you have gotten as far as to format your data as I described above, then this step should be quick. This was my procedure.
- Using our clean long-format dataset above, I started with a blank plot with the aesthetics:
- x: Coefficient
- y: Category
- col: Level
- I used
geom_rectto set a background color for the groupings that were desginated as statistically-significant.
- I used
geom_pointto plot the predicted marginal probability of agreement within that group as a square point (to match thelineranges in the next step). - Using the columns
LBandUB, we created the colorful horizontal bars representing the 95% confidence intervals around each estimate. - To separate the estimates into their own boxes, we used
facet_grid. I made some design decisions!
library(ggplot2)
# Take the data we just cleaned
p <- plot_data %>%
# Set the inner x-axes (Coefficient) and y-axes (Category)
ggplot(aes(x=Coefficient, y=Category, col=Level)) +
# Conditionally set background color when P-value is < 0.05
geom_rect(data=subset(plot_data, Highlight==1),
aes(fill=Highlight), xmin=-Inf,xmax=Inf, ymin=-Inf, ymax=Inf, alpha=0.1) +
# Leave a square point at the Coefficient value
geom_point(shape='square', col='black') +
# Create a bar from the LB to UB of the 95% confidence interval
geom_linerange(aes(xmin=LB, xmax=UB), alpha=0.7, lwd=2) +
# Create a grid using Covariate and Outcome
facet_grid(Covariate~Outcome, scales='free_y', switch='both') +
# Set the overall ggplot2 theme
theme_bw() +
# Adjust where the labels are and remove the legend
theme(strip.text.y.left=element_text(angle=0),
strip.placement='outside',
legend.position='none') +
# Correct the y-scaling, otherwise would come out backward
scale_y_discrete(limits=rev) +
# Remove the words "Outcome" and "Covariate" from the plot
ylab('') +
xlab('')
ggsave(filename='attitude-agreements.png', width=12, height=9, dpi=300)