
lbenz730/semiparametric_missing_elig

Robust Causal Inference for Point Exposures with Missing Eligibility Criteria

Benz, L., Mukherjee, R., Wang, R., Arterburn, D., Fischer, H., Lee, C., Shortreed, S.M., Haneuse, S., and Levis, A.W. "Robust Causal Inference for EHR-based Studies of Point Exposures with Missingness in Eligibility Criteria" Under Review (Pre-Print)

R Scripts (scripts/)

  • helpers.R: File of helper functions

Data (data/)

Folder of scripts used to clean and process EHR data for use in data application

  • rygb_vsg_data_prep.R: This script is used to prep analysis dataset(s) for data application presented in the paper. It calls several raw EHR files which are not directly sharable due to data use agreements with Kaiser Permanente. Nevertheless, this script is commented with specific details on how the underlying cohort was created including application of the eligibility criteria to the entire cohort across all 40 operationalizations.
  • surgical_px_cleaning.R: Cleans chart review data for surgical procedure types and corrects a few cases that were incorrectly tagged in the original EHR files.
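As an illustration of how a single eligibility criterion can be applied under several operationalizations, here is a minimal Python sketch (the repository's own scripts are in R). It assumes that an operationalization corresponds to a lookback window for the most recent BMI measurement before surgery. The function name `eligibility_status`, the BMI threshold of 35, the lookback values, and the toy measurements are all hypothetical and are not taken from rygb_vsg_data_prep.R.

```python
from datetime import date, timedelta

def eligibility_status(surgery_date, bmi_measurements, lookback_days, bmi_threshold=35.0):
    """Return (R, E) for one subject under one lookback window.

    R = 1 if a BMI measurement exists inside the window (eligibility is
    ascertainable), else 0. E is the eligibility indicator, or None when R = 0.
    The threshold and window lengths are illustrative assumptions.
    """
    window_start = surgery_date - timedelta(days=lookback_days)
    in_window = [(d, v) for d, v in bmi_measurements if window_start <= d <= surgery_date]
    if not in_window:
        return 0, None  # eligibility cannot be determined: treated as missing
    # Use the most recent measurement inside the window
    _, latest_bmi = max(in_window, key=lambda m: m[0])
    return 1, int(latest_bmi >= bmi_threshold)

# Hypothetical subject: two BMI measurements before a surgery date
surgery = date(2015, 6, 1)
measurements = [(date(2014, 1, 10), 41.2), (date(2015, 3, 15), 36.0)]

for lookback in (30, 90, 365):
    print(lookback, eligibility_status(surgery, measurements, lookback))
```

A shorter lookback leaves more subjects with undetermined eligibility, which is the kind of missingness the estimators in this repository are designed to handle.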

Analysis (analysis/aligned_t0)

Folder of scripts used for the data application analysis. For each of the two outcomes examined, there is one script for each of the four estimators explored in this work. Where used, functions in each script are commented with descriptions of their inputs and outputs. Because the underlying EHR data cannot be shared, this analysis cannot be reproduced locally. For a detailed reproducible example, please refer to the Worked OMOP Example (worked_omop_example).

Weight Change Analysis

  • fit_CC_outcome_regression_estimator.R: Naive ATT analysis (with $\hat\theta_\text{CC}$) for weight change outcome
  • fit_iwor_estimator.R: IWOR ATT analysis (with $\hat\theta_\text{IWOR}$) for weight change outcome
  • fit_IF_estimator.R: IF ATT analysis (with $\hat\theta_\text{IF}$) for weight change outcome
  • fit_EIF_estimator.R: EIF ATT analysis (with $\hat\theta_\text{EIF}$) for weight change outcome

T2DM Remission Analysis

  • fit_CC_outcome_remission: Naive ATT analysis (with $\hat\theta_\text{CC}$) for diabetes remission outcome
  • fit_iwor_estimato_remission.R: IWOR ATT analysis (with $\hat\theta_\text{IWOR}$) for diabetes remission outcome
  • fit_IF_estimator_remission.R: IF ATT analysis (with $\hat\theta_\text{IF}$) for diabetes remission outcome
  • fit_EIF_estimator_remission.R: EIF ATT analysis (with $\hat\theta_\text{EIF}$) for diabetes remission outcome
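To make the idea of a naive complete-case analysis concrete, here is a minimal Python sketch of a generic complete-case outcome-regression ATT estimate (the repository's own scripts are in R). It is not the implementation in the scripts above. It assumes each record carries an indicator `R` for whether eligibility was ascertained, an eligibility indicator `E`, a treatment indicator `A`, a single covariate `L`, and an outcome `Y`. These field names and the toy records are hypothetical.

```python
def cc_outcome_regression_att(records):
    """Generic complete-case outcome-regression ATT (illustrative only).

    1. Keep subjects with observed eligibility (R = 1) who are eligible (E = 1).
    2. Fit a simple linear regression of Y on L among untreated subjects.
    3. Average Y minus the predicted untreated outcome over treated subjects.
    """
    cc = [r for r in records if r["R"] == 1 and r["E"] == 1]
    controls = [r for r in cc if r["A"] == 0]
    treated = [r for r in cc if r["A"] == 1]

    # Ordinary least squares with one covariate, fit among controls
    n0 = len(controls)
    mean_l = sum(r["L"] for r in controls) / n0
    mean_y = sum(r["Y"] for r in controls) / n0
    sxx = sum((r["L"] - mean_l) ** 2 for r in controls)
    sxy = sum((r["L"] - mean_l) * (r["Y"] - mean_y) for r in controls)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_l

    # ATT: observed outcome minus predicted untreated outcome among the treated
    return sum(r["Y"] - (intercept + slope * r["L"]) for r in treated) / len(treated)

# Hypothetical toy records; the last two are dropped by the complete-case restriction
toy_records = [
    {"R": 1, "E": 1, "A": 0, "L": 1.0, "Y": 2.0},
    {"R": 1, "E": 1, "A": 0, "L": 2.0, "Y": 4.0},
    {"R": 1, "E": 1, "A": 0, "L": 3.0, "Y": 6.0},
    {"R": 1, "E": 1, "A": 1, "L": 2.0, "Y": 1.0},
    {"R": 1, "E": 1, "A": 1, "L": 4.0, "Y": 5.0},
    {"R": 0, "E": None, "A": 1, "L": 3.0, "Y": 0.0},
    {"R": 1, "E": 0, "A": 0, "L": 5.0, "Y": 9.0},
]
att = cc_outcome_regression_att(toy_records)
print(att)
```

Discarding subjects with undetermined eligibility is what makes this analysis naive: it is only appropriate when that missingness is unrelated to the outcome and treatment in the relevant sense.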

Figures

  • diabetes_figure.R: Plots the frequency of certain measurements for select surgical patients (Generates Figure 1)
  • elig_figures.R: Plot eligibility distributions (Generates Figure 2 and S2)
  • measure_times_figure.R: Plots distribution of time between date of surgery and most recent measure of BMI/A1c (Generates Figure 3)
  • plot_nuisances.R: Plotting code for distributions of nuisance functions (Generates Figures 4, S3, and S4)
  • plot_results.R: Plot point estimates and 95% confidence intervals (Generates Figure 4)
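For readers unfamiliar with how a point estimate and 95% confidence interval are commonly obtained from an influence-function-based estimator, here is a minimal Python sketch of a standard Wald construction (the repository's own scripts are in R). Whether the scripts above compute intervals in exactly this way is an assumption. The function name `wald_interval` and the toy values are hypothetical.

```python
import math
import statistics

def wald_interval(if_values, level=0.95):
    """Point estimate and Wald interval from per-subject (uncentered)
    influence-function contributions.

    The estimate is the sample mean of the contributions, and the standard
    error is their sample standard deviation divided by sqrt(n).
    """
    n = len(if_values)
    estimate = statistics.fmean(if_values)
    std_error = statistics.stdev(if_values) / math.sqrt(n)
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return estimate, estimate - z * std_error, estimate + z * std_error

# Hypothetical contributions for five subjects
estimate, lower, upper = wald_interval([1.0, 2.0, 3.0, 4.0, 5.0])
print(estimate, lower, upper)
```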

Worked OMOP Example (worked_omop_example)

Because the underlying EHR data cannot be shared, the example presented in the main text cannot be reproduced locally. For a detailed reproducible example, we have created a worked example based on OMOP-CDM formatted data. In particular, we use the omock package to create a shareable synthetic dataset based on OMOP standards. We then demonstrate how to turn this data into an analytical dataset suitable for $\widehat\theta_\text{EIF}$, and analyze the synthetic dataset.

  • build_omop_example.R: This script creates a synthetic EHR dataset using the omock package. It then illustrates how to clean this example dataset and prepare the dataset for analysis by our EIF-based estimator, $\widehat\theta_\text{EIF}$. The final output of this script is the dataset scripts/worked_omop_example/analysis_dataset.csv.
  • EIF_omop_example.R: This script contains a documented function eif_estimator which implements $\widehat\theta_\text{EIF}$. The script loads in the prepared dataset scripts/worked_omop_example/analysis_dataset.csv and applies eif_estimator to that worked data example.

Simulations (simulations/aligned_t0)

Folder of scripts for a setting where time zero ($t_0$) is aligned for all subjects, so we only need to consider eligibility, missingness, etc. at a single time per subject, and matching is not needed (insofar as it is a mechanism for establishing time zero). Contains implementations of $\widehat\theta_\text{EIF}$, $\widehat\theta_\text{IF}$, $\widetilde\theta_\text{IF}$, $\widehat\theta_\text{CC}$, and $\widehat\theta_\text{IWOR}$.

  • estimators.R: Script contains functions which implement each of the four estimators explored in this work, in the context of the simulation study. Specific parameters used in simulations and estimator instructions are downloadable in simulations/aligned_t0/inputs.
  • generate_data.R: Script contains function generate_data to generate simulated datasets given a list of instructions.
  • inform_sims.R: Exploratory analysis to guide range of models for consideration in simulated datasets. This script is how the coefficient values used in generating simulated datasets (those in Table S2) were chosen.
  • run_simulation.R: This script calls all functions to generate and analyze simulated datasets. In other words, this is the main simulation wrapper which controls the simulations.
  • pnp.R: Script to examine how well $\mu_0$ is calibrated in a single simulated dataset, comparing a correctly specified parametric model with 6 non-parametric models based on SuperLearner (Generates Figure S1)
  • specify_inputs.R: Script which specifies simulation parameters and estimators for consideration. This script is specifically used to generate params used by generate_data function in generate_data.R.
  • latex_tables.R: Generates all tables for describing simulation results (Table S1) and parameters (Table S2).
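The generate, estimate, and summarise pattern that a simulation wrapper follows can be sketched in a few lines of Python (the repository's own scripts are in R). This is a toy stand-in, not the repository's `generate_data` function or its estimators. The names `generate_toy_data`, `difference_in_means`, `run_toy_simulation`, and `TRUE_EFFECT`, as well as the data-generating model, are hypothetical.

```python
import random
import statistics

TRUE_EFFECT = -2.0  # hypothetical true treatment effect used by the toy generator

def generate_toy_data(n, rng):
    """Simulate (A, Y) pairs with Y = TRUE_EFFECT * A + standard normal noise."""
    data = []
    for _ in range(n):
        a = 1 if rng.random() < 0.5 else 0
        y = TRUE_EFFECT * a + rng.gauss(0.0, 1.0)
        data.append((a, y))
    return data

def difference_in_means(data):
    """Toy estimator: mean outcome among treated minus mean among controls."""
    treated = [y for a, y in data if a == 1]
    controls = [y for a, y in data if a == 0]
    return statistics.fmean(treated) - statistics.fmean(controls)

def run_toy_simulation(n_reps, n, seed=1):
    """Generate n_reps datasets, apply the estimator, and summarise performance."""
    rng = random.Random(seed)  # a single seeded generator keeps runs reproducible
    estimates = [difference_in_means(generate_toy_data(n, rng)) for _ in range(n_reps)]
    return {
        "bias": statistics.fmean(estimates) - TRUE_EFFECT,
        "empirical_se": statistics.stdev(estimates),
    }

summary = run_toy_simulation(n_reps=200, n=500)
print(summary)
```

Keeping data generation, estimation, and summarisation in separate functions mirrors the split across generate_data.R, estimators.R, and run_simulation.R described above, so that estimators can be swapped without touching the data-generating code.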

Data (data/)

  • Simulation inputs + results
  • Data application results

Figures (figures/)

Figures saved out from various analyses

Tables (tables/)

Tables saved out from various analyses

Jobs (jobs/)

.sh files for batch jobs on the cluster

  • aligned_t0_sims_loop.sh: SBATCH job file for running simulations for each combination of estimator/simulation parameters
  • run_aligned_t0_loop.sh: Wrapper for fully 2-D job array for aligned_t0_sims_loop.sh.
  • data_application.sh: Wrapper for submitting all the jobs for the data application, in the application/ sub-directory.
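One common way to cover a two-dimensional grid of estimator and simulation-parameter settings with cluster array jobs is to map a flat task index to a pair of grid indices. The Python sketch below illustrates that mapping only (the repository's own scripts are in R and shell). It is not taken from the job files above, which may organise the grid differently. `SLURM_ARRAY_TASK_ID` is the environment variable Slurm sets for array tasks; the function name `task_to_grid_index` and the grid sizes are hypothetical.

```python
import os

def task_to_grid_index(task_id, n_estimators):
    """Map a zero-based flat task index to (parameter index, estimator index)."""
    param_idx, estimator_idx = divmod(task_id, n_estimators)
    return param_idx, estimator_idx

# Hypothetical grid: 3 parameter settings x 4 estimators = 12 array tasks
N_PARAMS, N_ESTIMATORS = 3, 4

# Falls back to task 0 when run outside of a Slurm array job
task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
print(task_to_grid_index(task_id, N_ESTIMATORS))

# Every cell of the grid is visited exactly once across the array
all_cells = [task_to_grid_index(t, N_ESTIMATORS) for t in range(N_PARAMS * N_ESTIMATORS)]
```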

About

Code/analysis/simulations for robust and efficient causal inference from electronic health record based (observational) cohort studies with missing study eligibility criteria.
