Prepare ADPKD cohort data for survival modelling
This helper reproduces the preprocessing pipeline that was previously in temp.R. It reads the baseline and follow-up Excel files, performs column harmonisation, derives follow-up times and laboratory summaries, and returns a cleaned dataset split into training and validation subsets.
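The follow-up derivation could plausibly look like the minimal sketch below. It is not the helper's actual code. The function derive_followup_days(), the inputs baseline_date and rrt_start, and the event indicator are all hypothetical. The sketch also assumes that followup_reference acts as the censoring date when the RRT start date is missing.

```r
# Hypothetical sketch, not the package's implementation: follow-up time in days,
# using the reference date when the RRT start date is missing (assumed censoring).
derive_followup_days <- function(baseline_date, rrt_start,
                                 followup_reference = as.Date("2025-08-01")) {
  followup_reference <- as.Date(followup_reference)  # accept a Date or coercible input
  end_date <- as.Date(rrt_start)
  event <- !is.na(end_date)                          # assumed: observed RRT start = event
  end_date[!event] <- followup_reference             # fill missing RRT dates
  data.frame(
    time  = as.numeric(end_date - as.Date(baseline_date)),  # difference in days
    event = as.integer(event)
  )
}

fu <- derive_followup_days(
  baseline_date = c("2020-01-01", "2021-06-15"),
  rrt_start     = c("2023-01-01", NA)
)
fu  # time is c(1096, 1508); event is c(1, 0)
```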
Usage
prepare_adpkd_dataset(
baseline_path,
followup_path,
followup_reference = as.Date("2025-08-01"),
train_size = 300,
seed = 123456L
)
Arguments
- baseline_path
Path to the baseline Excel file (sheet 1, skip = 1).
- followup_path
Path to the follow-up Excel file (sheet 1).
- followup_reference
Reference date used in place of a missing RRT (renal replacement therapy) start date. Either a Date or something coercible via as.Date(). Defaults to "2025-08-01".
- train_size
Number of subjects to sample into the training set. If this exceeds the number of rows, it falls back to nrow(data).
- seed
Integer seed for the train/validation split (default 123456).
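The interaction between train_size and seed can be illustrated with a short sketch. The function split_train_validation() and its list return shape are hypothetical and do not reflect the helper's internals. The sketch caps the training size at nrow(data) and sets the seed before sampling so that the split is reproducible.

```r
# Hypothetical sketch of a seeded train/validation split, not the package's code.
split_train_validation <- function(data, train_size = 300, seed = 123456L) {
  n_train <- min(train_size, nrow(data))      # fall back to nrow(data) if too large
  set.seed(seed)                              # note: changes the global RNG state
  train_idx <- sample.int(nrow(data), n_train)
  in_train <- seq_len(nrow(data)) %in% train_idx  # logical mask avoids the -integer(0) pitfall
  list(
    train      = data[in_train, , drop = FALSE],
    validation = data[!in_train, , drop = FALSE]
  )
}

toy <- data.frame(id = 1:10, x = (1:10) / 10)
s_all <- split_train_validation(toy, train_size = 300)
nrow(s_all$train)       # 10: train_size exceeds nrow(toy), so every row is used
nrow(s_all$validation)  # 0
```

Calling the function twice with the same seed and train_size returns the same training rows. This is the reproducibility that the seed argument is meant to provide.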