AKI-1 and baseline SCr definition
The study workflow is outlined in Fig. 1. Following the AKI clinical guideline established by KDIGO11, we categorized AKI severity according to the SCr-based criteria. AKI-1 was defined as an increase in SCr to ≥1.5 times baseline within 7 days or an increase in SCr of ≥0.3 mg/dL within 48 h. Because there is no standard definition for estimating baseline SCr, baseline SCr was determined with the three-step approach described in Supplementary Fig. 1: SCr measurements documented within 1 week before admission had the strongest level of evidence, followed by records from 365 to 7 days before admission; if neither was available, baseline SCr was inferred using the Modification of Diet in Renal Disease (MDRD) formula39.
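The AKI-1 definition and the baseline hierarchy above can be sketched in Python as follows. This is a minimal illustration, not the study's code: all function names are hypothetical, the MDRD back-calculation assumes a target eGFR of 75 mL/min/1.73 m² (a common convention that the text does not state), using the most recent value within a tier is our assumption, and the supplied SCr measurements are assumed to already lie within the 7-day window.

```python
from typing import Sequence, Tuple

EPS = 1e-9  # tolerance so that, e.g., 1.4 - 1.1 is not rejected by float rounding


def mdrd_backcalc_baseline(age: float, female: bool, black: bool = False,
                           assumed_egfr: float = 75.0) -> float:
    """Solve the 4-variable MDRD equation
    eGFR = 175 * SCr**-1.154 * age**-0.203 * 0.742 (if female) * 1.212 (if Black)
    for SCr (mg/dL) at an ASSUMED eGFR (75 mL/min/1.73 m2 by default)."""
    factor = 175.0 * age ** -0.203
    if female:
        factor *= 0.742
    if black:
        factor *= 1.212
    return (assumed_egfr / factor) ** (-1.0 / 1.154)


def select_baseline(pre_admission: Sequence[Tuple[float, float]], age: float,
                    female: bool, black: bool = False) -> float:
    """pre_admission: (days_before_admission, scr) pairs.
    Tier 1: within 7 days before admission; tier 2: more than 7 and up to 365 days;
    tier 3: MDRD back-calculation.
    ASSUMPTION: within a tier, the most recent measurement is used."""
    tier1 = [(d, v) for d, v in pre_admission if 0 < d <= 7]
    tier2 = [(d, v) for d, v in pre_admission if 7 < d <= 365]
    for tier in (tier1, tier2):
        if tier:
            return min(tier)[1]  # smallest days_before_admission = most recent value
    return mdrd_backcalc_baseline(age, female, black)


def meets_aki1(times_h: Sequence[float], scr: Sequence[float], baseline: float) -> bool:
    """times_h: ascending measurement times in hours, assumed to span the 7-day window."""
    for i, (t, v) in enumerate(zip(times_h, scr)):
        if v >= 1.5 * baseline - EPS:                  # ratio criterion vs. baseline
            return True
        for t0, v0 in zip(times_h[:i], scr[:i]):       # absolute rise within 48 h
            if t - t0 <= 48 and v - v0 >= 0.3 - EPS:
                return True
    return False
```

For example, `meets_aki1([0, 72], [1.0, 1.35], 1.0)` is `False`: the 0.35 mg/dL rise occurs over 72 h and SCr never reaches 1.5 times baseline.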

Fig. 1: Study workflow. This study comprises four main steps. First, we compute baseline SCr levels for each patient and identify cases of AKI-1. Second, we apply consensus k-means clustering to the SCr trajectories of AKI-1 patients to derive novel AKI-1 subphenotypes. Third, each AKI-1 patient is matched to a clinically similar non-AKI patient to form a control group. Finally, we perform a comprehensive analysis of clinical outcomes for both AKI-1 and non-AKI patients. SCr, serum creatinine; AKI, acute kidney injury; KDIGO, Kidney Disease: Improving Global Outcomes.
Study population
This retrospective study included adult patients hospitalized between February 1, 2009, and February 1, 2022, at eight academic hospitals in seven US states. The study population comprised patients who did and did not experience an AKI-1 episode during their hospital stay. Participating institutions included the University of Pittsburgh/University of Pittsburgh Medical Center (UPMC), University of Texas Health Science Center at San Antonio (UTHSCSA), University of Iowa (UIOWA), University of Texas Southwestern Medical Center (UTSW), Medical College of Wisconsin (MCW), University of Missouri Health Care (MUHC), University of Utah (UofU), and University of Kansas Medical Center (KUMC).
The detailed eligibility criteria for patient selection were as follows: (1) exclusion of patients with preexisting end-stage renal disease (ESRD); (2) exclusion of patients with a history of dialysis or renal replacement therapy (RRT); (3) exclusion of patients with an estimated glomerular filtration rate (eGFR) <15 mL/min/1.73 m² or a baseline SCr >3.5 mg/dL; (4) a minimum hospital stay of 2 days; (5) AKI-1 as the only AKI stage during the encounter (i.e., no AKI stage progression during hospitalization); (6) at least one SCr measurement on each of three days: 2 days before AKI-1 onset, 1 day before AKI-1 onset, and the day of AKI-1 onset; (7) no more than two missing SCr measurements within the seven-day observation window; and (8) retention of only the earliest encounter of each patient, so that each enrolled patient was unique. A graphical depiction of the cohort entry process is provided in Supplementary Fig. 2.
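A minimal sketch of how criteria (1) to (8) could be applied to AKI-1 encounters, assuming a hypothetical `Encounter` record whose daily SCr values are keyed by day offset (-6 to 0, with 0 the AKI-1 onset day). Applying criterion (8) after the other seven is our assumption, and the non-AKI pool would need its own filter since it has no onset day.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Encounter:
    # Hypothetical record layout; all field names are illustrative.
    patient_id: str
    admit_day: int                 # admission date as a day index (used for "earliest")
    length_of_stay_days: float
    esrd: bool
    prior_dialysis_or_rrt: bool
    egfr: float                    # mL/min/1.73 m2
    baseline_scr: float            # mg/dL
    max_aki_stage: int             # highest AKI stage reached (0 = no AKI)
    daily_scr: Dict[int, Optional[float]] = field(default_factory=dict)  # day -6..0


def is_eligible_aki1(e: Encounter) -> bool:
    if e.esrd or e.prior_dialysis_or_rrt:                     # criteria (1), (2)
        return False
    if e.egfr < 15 or e.baseline_scr > 3.5:                   # criterion (3)
        return False
    if e.length_of_stay_days < 2:                             # criterion (4)
        return False
    if e.max_aki_stage != 1:                                  # criterion (5)
        return False
    if any(e.daily_scr.get(d) is None for d in (-2, -1, 0)):  # criterion (6)
        return False
    missing = sum(e.daily_scr.get(d) is None for d in range(-6, 1))
    return missing <= 2                                       # criterion (7)


def earliest_eligible(encounters: List[Encounter]) -> List[Encounter]:
    """Criterion (8), ASSUMED to be applied after criteria (1)-(7)."""
    best: Dict[str, Encounter] = {}
    for e in encounters:
        if not is_eligible_aki1(e):
            continue
        if e.patient_id not in best or e.admit_day < best[e.patient_id].admit_day:
            best[e.patient_id] = e
    return list(best.values())
```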
To minimize confounding, we devised a three-step matching framework to pair each AKI-1 patient with a non-AKI counterpart. Matching was based on demographics, baseline SCr, and comorbidities, as detailed in Supplementary Data 1. Before matching began, all encounters belonging to AKI-1 patients were removed from the non-AKI candidate pool, so that no AKI-1 patient could be matched to one of their own encounters without AKI onset. In step one, we applied rule-based matching on demographics and baseline SCr to identify all eligible non-AKI candidates for each AKI-1 patient. These variables were matched first because they are closely associated with kidney function and influence kidney-related clinical outcomes40.
In step two, each eligible non-AKI candidate was assigned a score based on their comorbidities, with each comorbidity weighted according to its impact on kidney-related clinical outcomes2. In step three, the best-ranked candidate by this comorbidity-based score was selected as the non-AKI counterpart of each AKI-1 patient. Matching was performed without replacement, so each AKI-1 patient was paired with a unique, clinically similar non-AKI patient.
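The three-step matching can be sketched as a greedy procedure. Everything concrete here is hypothetical: the `Patient` fields, the age and baseline-SCr tolerances, the comorbidity weights, and the agreement-based score (higher means more similar) stand in for the actual rules and weights in Supplementary Data 1.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Patient:
    pid: str
    sex: str
    race: str
    age: int
    baseline_scr: float
    comorbidities: FrozenSet[str]


# HYPOTHETICAL weights and tolerances, for illustration only
WEIGHTS = {"ckd": 3.0, "diabetes": 2.0, "heart_failure": 2.0, "hypertension": 1.0}
AGE_TOL, SCR_TOL = 5, 0.1


def rule_match(case: Patient, ctrl: Patient) -> bool:
    """Step 1: hard rules on demographics and baseline SCr."""
    return (case.sex == ctrl.sex and case.race == ctrl.race
            and abs(case.age - ctrl.age) <= AGE_TOL
            and abs(case.baseline_scr - ctrl.baseline_scr) <= SCR_TOL + 1e-9)


def comorbidity_score(case: Patient, ctrl: Patient) -> float:
    """Step 2: +w when the two patients agree on a comorbidity, -w when they differ."""
    return sum(w if (c in case.comorbidities) == (c in ctrl.comorbidities) else -w
               for c, w in WEIGHTS.items())


def match(cases: List[Patient], pool: List[Patient]) -> Dict[str, str]:
    aki_ids = {c.pid for c in cases}
    # Drop AKI-1 patients' own non-AKI encounters from the candidate pool
    available = {p.pid: p for p in pool if p.pid not in aki_ids}
    pairs: Dict[str, str] = {}
    for case in cases:
        candidates = [p for p in available.values() if rule_match(case, p)]
        if not candidates:
            continue  # unmatched; the text does not say how such cases were handled
        best = max(candidates, key=lambda p: comorbidity_score(case, p))  # Step 3
        pairs[case.pid] = best.pid
        del available[best.pid]  # matching without replacement
    return pairs
```

Because candidates are removed once used, the result of this greedy sketch depends on the order in which AKI-1 patients are processed; the text does not specify an order.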
This resulted in a cohort of AKI-1 patients aged 18–89 years (55.57% male) and a matched cohort of non-AKI patients aged 18–90 years (55.57% male).
Data processing
We collected SCr measurements over the seven-day window leading up to AKI-1 onset. SCr measurements were used regardless of their source: pre-admission and post-admission values were considered equally valid. Multiple SCr measurements on the same day were averaged. Days without an SCr measurement were imputed by linear interpolation, with the first and last observed values held constant before the first and after the last measurement, respectively. This approach has been shown to be effective for handling short, unevenly sampled time series41.
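Daily averaging plus linear interpolation with constant values at the edges maps directly onto `numpy.interp`, which returns the first or last observed value for points outside the observed range. A minimal sketch, assuming measurements arrive as hypothetical (day offset, SCr) pairs relative to AKI-1 onset:

```python
from collections import defaultdict
from typing import Iterable, Tuple

import numpy as np

WINDOW = list(range(-6, 1))  # day offsets relative to AKI-1 onset (day 0)


def daily_scr_series(measurements: Iterable[Tuple[int, float]]) -> np.ndarray:
    """measurements: (day_offset, scr) pairs, possibly several per day."""
    by_day = defaultdict(list)
    for day, value in measurements:
        if day in WINDOW:  # ignore values outside the seven-day window
            by_day[day].append(value)
    days = sorted(by_day)
    daily_means = [float(np.mean(by_day[d])) for d in days]  # average same-day values
    # np.interp is linear between observed days and returns the first/last observed
    # value for days before the first / after the last measurement
    return np.interp(WINDOW, days, daily_means)
```

For example, observations on days -5, -3 (two values averaging 1.3), -1, and 0 yield a full seven-value series in which day -6 simply repeats the day -5 value.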
To characterize the identified AKI-1 subphenotypes and quantify their differences and similarities, we extracted comorbidities of interest, family history, laboratory test results, mortality, and post-hospitalization dialysis, RRT, and chronic kidney disease (CKD) diagnoses from the electronic health record (EHR) data of the study population. For comorbidities of interest and family history, data collection covered records before the admission date of the index hospitalization; the medical codes used to extract them are presented in Supplementary Data 2. For laboratory test results, we collected results from the seven-day window before AKI-1 onset. For each non-AKI patient, the date of the last SCr measurement served as a surrogate for the AKI-1 onset date when collecting laboratory test results. Variables with a missing rate >30% were excluded, and missing values in the remaining variables were imputed using the Multivariate Imputation by Chained Equations (MICE) algorithm42. For mortality, dialysis, RRT, and CKD diagnoses, data collection spanned 1 year after discharge from the index hospitalization.
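The missing-rate filter and a simplified chained-equations imputation can be sketched with NumPy. This is not the MICE implementation used in the study: it is a deterministic, single-imputation variant (ordinary least-squares regression of each incomplete variable on all others, repeated for a fixed number of rounds). In practice one would use a dedicated package such as R's `mice` or scikit-learn's `IterativeImputer`; the function names below are hypothetical.

```python
from typing import List, Tuple

import numpy as np


def drop_sparse(X: np.ndarray, names: List[str],
                max_missing: float = 0.30) -> Tuple[np.ndarray, List[str]]:
    """Drop variables whose missing rate exceeds max_missing (30% in the text)."""
    keep = np.isnan(X).mean(axis=0) <= max_missing
    return X[:, keep], [n for n, k in zip(names, keep) if k]


def chained_imputation(X: np.ndarray, n_iter: int = 10) -> np.ndarray:
    """Deterministic, single-imputation simplification of MICE: each incomplete
    variable is repeatedly regressed (OLS) on all others and its missing entries
    are replaced by the predictions. Real MICE adds random draws and produces
    multiple imputed datasets."""
    X = X.astype(float).copy()
    miss = np.isnan(X)
    X[miss] = np.take(np.nanmean(X, axis=0), np.nonzero(miss)[1])  # start from column means
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            m = miss[:, j]
            if not m.any():
                continue
            A = np.column_stack([np.ones(len(X)), np.delete(X, j, axis=1)])
            coef, *_ = np.linalg.lstsq(A[~m], X[~m, j], rcond=None)
            X[m, j] = A[m] @ coef  # only originally missing entries are overwritten
    return X
```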
Clustering
Rather than using raw SCr trajectories as clustering inputs, we derived four trajectory-based features, which in preliminary experiments produced clearer cluster separation than the raw SCr time series. The four features were defined as follows: (1) SCr level at AKI-1 onset, measuring the absolute level of kidney impairment at that point; (2) change in SCr from baseline to AKI-1 onset, measuring the absolute reduction in kidney function; (3) change in SCr from 48 h before AKI-1 onset to onset, measuring the short-term reduction in kidney function; and (4) the difference between the average SCr over the first four days of the seven-day window and baseline SCr, measuring preexisting kidney impairment before AKI-1 onset. Each feature was min-max normalized to the range [0, 1] so that no single feature dominated the distance measure between SCr trajectories during clustering.
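The four features and the min-max scaling can be sketched as follows, assuming an interpolated daily series for days -6 to 0 (index 6 is onset) and, with daily resolution, taking day -2 as the 48-h reference point. Function names are hypothetical.

```python
import numpy as np


def trajectory_features(scr7, baseline: float) -> np.ndarray:
    """scr7: 7 daily SCr values for days -6..0 (index 6 = AKI-1 onset)."""
    scr7 = np.asarray(scr7, dtype=float)
    onset = scr7[6]
    return np.array([
        onset,                       # (1) SCr at onset
        onset - baseline,            # (2) change from baseline to onset
        onset - scr7[4],             # (3) change from day -2 (about 48 h earlier) to onset
        scr7[:4].mean() - baseline,  # (4) mean of days -6..-3 minus baseline
    ])


def min_max(F: np.ndarray) -> np.ndarray:
    """Scale each column of a (patients x features) matrix to [0, 1]."""
    lo, hi = F.min(axis=0), F.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # constant columns map to 0 instead of dividing by 0
    return (F - lo) / span
```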
The objective of clustering was to partition patients into a small number of groups, or subphenotypes, based solely on their derived trajectory features, without incorporating laboratory test results or subsequent clinical outcomes. We employed consensus k-means clustering, a resampling-based extension of the traditional Euclidean-distance k-means algorithm. Euclidean k-means suited these data because the input trajectory features were outlier-free, normalized, low-dimensional, and independent, consistent with the algorithm's statistical assumptions. Consensus k-means runs k-means many times on resampled data with different initializations and aggregates the results into a consensus clustering. We ran 100 iterations on the four SCr trajectory features, sampling 80% of the patients and 3 of the 4 features in each iteration. This approach reduces the sensitivity of traditional k-means to initial conditions and yields more stable, reliable clustering results.
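Below is a minimal consensus k-means sketch in NumPy that follows the resampling scheme described above (100 runs, 80% of patients and 3 of 4 features per run). The consensus matrix records, for each pair of patients, the fraction of co-sampled runs in which they fell in the same cluster. The text does not state how final labels were derived from the consensus matrix; running k-means on its rows is our assumption (hierarchical clustering on 1 minus consensus is a common alternative).

```python
import numpy as np


def kmeans(X, k, rng, n_iter=100):
    """Lloyd's k-means with k-means++ seeding; returns cluster labels."""
    C = X[[rng.integers(len(X))]]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - C[None]) ** 2).sum(-1).min(1)
        p = d2 / d2.sum() if d2.sum() > 0 else None  # uniform pick if all points coincide
        C = np.vstack([C, X[rng.choice(len(X), p=p)]])
    for _ in range(n_iter):
        labels = ((X[:, None, :] - C[None]) ** 2).sum(-1).argmin(1)
        new_C = np.array([X[labels == j].mean(0) if np.any(labels == j) else C[j]
                          for j in range(k)])
        if np.allclose(new_C, C):
            break
        C = new_C
    return labels


def consensus_kmeans(F, k, n_runs=100, row_frac=0.8, n_feats=3, seed=0):
    rng = np.random.default_rng(seed)
    n, p = F.shape
    together = np.zeros((n, n))  # runs in which i and j shared a cluster
    sampled = np.zeros((n, n))   # runs in which i and j were both subsampled
    for _ in range(n_runs):
        rows = rng.choice(n, size=int(round(row_frac * n)), replace=False)
        cols = rng.choice(p, size=n_feats, replace=False)
        labels = kmeans(F[np.ix_(rows, cols)], k, rng)
        idx = np.ix_(rows, rows)
        sampled[idx] += 1
        together[idx] += labels[:, None] == labels[None, :]
    M = np.divide(together, sampled, out=np.zeros_like(together), where=sampled > 0)
    # ASSUMPTION: final labels come from k-means on the rows of the consensus matrix
    return M, kmeans(M, k, rng)
```

A heatmap of `M` with rows and columns sorted by the final labels is the kind of consensus-matrix display the text uses to judge cluster separation.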
To determine the optimal number of clusters, we considered four metrics: the silhouette score, the Bayesian information criterion, the Davies–Bouldin index, and the Calinski–Harabasz index. To assess clustering robustness, we additionally applied hierarchical clustering and examined the overlap between its results and those of consensus k-means using a confusion matrix and visual inspection of t-distributed stochastic neighbor embedding (t-SNE) plots43. The optimal number of clusters was chosen based on cluster sizes, clustering quality, clear separation in the consensus-matrix heatmaps, and the structure of the dendrogram produced by hierarchical clustering.
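Three of the four cluster-quality metrics and the confusion matrix used for the robustness check are short enough to write out directly; a NumPy sketch follows. The BIC is omitted because it requires a likelihood model (e.g., a Gaussian mixture) that the text does not specify. scikit-learn provides library equivalents as `silhouette_score`, `davies_bouldin_score`, and `calinski_harabasz_score` in `sklearn.metrics`; the function names below are our own.

```python
import numpy as np


def silhouette(X, labels):
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    s = np.zeros(len(X))
    for i in range(len(X)):
        own = labels == labels[i]
        if own.sum() < 2:
            continue  # singleton clusters get s = 0 by convention
        a = D[i, own].sum() / (own.sum() - 1)  # self-distance is 0, so divide by n - 1
        b = min(D[i, labels == g].mean() for g in np.unique(labels) if g != labels[i])
        s[i] = (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return float(s.mean())


def davies_bouldin(X, labels):
    groups = np.unique(labels)
    C = np.array([X[labels == g].mean(0) for g in groups])
    S = np.array([np.linalg.norm(X[labels == g] - C[i], axis=1).mean()
                  for i, g in enumerate(groups)])
    return float(np.mean([max((S[i] + S[j]) / np.linalg.norm(C[i] - C[j])
                              for j in range(len(groups)) if j != i)
                          for i in range(len(groups))]))


def calinski_harabasz(X, labels):
    groups = np.unique(labels)
    n, k, c = len(X), len(groups), X.mean(0)
    between = sum((labels == g).sum() * ((X[labels == g].mean(0) - c) ** 2).sum() for g in groups)
    within = sum(((X[labels == g] - X[labels == g].mean(0)) ** 2).sum() for g in groups)
    return float((between / (k - 1)) / (within / (n - k)))


def confusion(a, b):
    """Rows: labels from method a (e.g., consensus k-means); columns: method b."""
    return np.array([[np.sum((a == i) & (b == j)) for j in np.unique(b)]
                     for i in np.unique(a)])
```

Higher silhouette and Calinski–Harabasz values and lower Davies–Bouldin values indicate better-separated clusters; a confusion matrix with one dominant cell per row indicates that the two methods agree up to a relabeling.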
Statistics and reproducibility
Once the optimal number of clusters was determined, AKI-1 patients were clustered accordingly, and each subphenotype was paired with a non-AKI group consisting of the matched non-AKI counterparts of its AKI-1 patients. Each AKI-1 subphenotype was compared with the other subphenotypes and with its non-AKI counterparts in terms of demographics, comorbidities, laboratory test results, and short-term and long-term clinical outcomes.
We further estimated the odds ratios (ORs) of adverse clinical outcomes for AKI-1 subphenotypes and their non-AKI counterparts using logistic regression (LR) models. The outcomes evaluated included acute kidney disease (AKD), post-discharge 30-day and 1-year all-cause mortality, the need for dialysis or RRT within 1 year, and incident CKD within 1 year. The Acute Dialysis Quality Initiative defines AKD as a condition in which criteria for AKI-1 or greater persist ≥7 days after an exposure44. Accordingly, we defined AKD as an SCr level that did not return to below 1.5 times baseline SCr within 7 days after AKI-1 onset.
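The operational AKD definition can be expressed as a small function. This sketch assumes daily SCr values for days 1 to 7 after onset, with unmeasured days given as `None`; returning `None` when no follow-up SCr exists is our assumption, since the text does not say how such patients were handled.

```python
from typing import Optional, Sequence


def has_akd(post_onset_scr: Sequence[Optional[float]], baseline: float) -> Optional[bool]:
    """post_onset_scr: daily SCr for days 1..7 after AKI-1 onset (None = not measured).
    True  -> no measured value fell below 1.5 x baseline (AKD),
    False -> SCr returned below 1.5 x baseline on some day,
    None  -> no measurement available (ASSUMED to be undetermined)."""
    observed = [v for v in post_onset_scr[:7] if v is not None]
    if not observed:
        return None
    return all(v >= 1.5 * baseline for v in observed)
```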
The AKI-1 subphenotype with the most favorable outcomes served as the reference group for comparisons among subphenotypes, while each AKI-1 subphenotype's corresponding non-AKI group served as the reference for comparisons between AKI-1 and non-AKI patients. Comparisons among the remaining AKI-1 subphenotypes (i.e., those not involving the reference subphenotype) were derived from contrasts of estimated coefficients within the same fitted model rather than by refitting separate models. All LR models were fitted at four levels of adjustment: (1) unadjusted; (2) Model 1, adjusted for age and sex; (3) Model 2, further adjusted for the most severe CKD stage, cardiovascular diseases, chronic liver diseases, and diabetes mellitus; and (4) Model 3, further adjusted for baseline SCr. In addition to the LR models, we fitted Cox proportional hazards models as an independent risk analysis and verified the robustness of the LR conclusions by assessing their consistency with the Cox results.
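Two pieces of the OR analysis lend themselves to a short sketch. At the unadjusted level with a single binary exposure, the logistic-regression OR equals the 2×2 cross-product ratio, with Wald standard error sqrt(1/a + 1/b + 1/c + 1/d) on the log scale. A comparison between two non-reference subphenotypes i and j comes from the same fit as exp(beta_i - beta_j), with variance Var(beta_i) + Var(beta_j) - 2 Cov(beta_i, beta_j). Function names and the example coefficients in the usage note are hypothetical, not study results.

```python
import math
from typing import Sequence, Tuple

Z95 = 1.959964  # two-sided 95% normal quantile


def odds_ratio_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float, float]:
    """a/b = events/non-events in the exposed group; c/d = in the reference group.
    Same point estimate and Wald CI as an unadjusted logistic regression."""
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return or_, math.exp(math.log(or_) - Z95 * se), math.exp(math.log(or_) + Z95 * se)


def contrast_or(beta: Sequence[float], cov: Sequence[Sequence[float]],
                i: int, j: int) -> Tuple[float, float, float]:
    """OR (with 95% CI) of group i vs. group j from one fitted model whose reference
    category is a third group; beta and cov are the fitted coefficients and their
    covariance matrix."""
    diff = beta[i] - beta[j]
    se = math.sqrt(cov[i][i] + cov[j][j] - 2 * cov[i][j])
    return math.exp(diff), math.exp(diff - Z95 * se), math.exp(diff + Z95 * se)
```

For instance, with illustrative coefficients 0.5 and 0.2 for two subphenotypes relative to the reference, `contrast_or` returns a point estimate of exp(0.3) ≈ 1.35 for the first versus the second.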
Demographics, comorbidities, and laboratory test results were compared using the chi-squared test for categorical variables and the Mann–Whitney U test for continuous variables. These tests were conducted pairwise, i.e., between AKI-1 subphenotypes and between each AKI-1 subphenotype and its non-AKI counterparts. Statistical significance was set at p < 0.05. Differences in laboratory test results were summarized with a table reporting the percentage of patients with abnormal values for each laboratory test, together with plots ranking variables by the mean standardized difference between pairs of subphenotypes.
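The laboratory-test summaries can be sketched with the standard library. We assume the ranking statistic is the usual standardized mean difference with pooled SD sqrt((s1² + s2²)/2), and that abnormality is defined by a per-test reference range supplied by the caller; the hypothesis tests themselves are available as `scipy.stats.chi2_contingency` and `scipy.stats.mannwhitneyu`. Function names are hypothetical.

```python
import statistics
from typing import Dict, List, Sequence, Tuple


def smd(x: Sequence[float], y: Sequence[float]) -> float:
    """Standardized mean difference with pooled SD sqrt((s1^2 + s2^2) / 2)."""
    pooled = ((statistics.variance(x) + statistics.variance(y)) / 2) ** 0.5
    return (statistics.mean(x) - statistics.mean(y)) / pooled if pooled > 0 else 0.0


def rank_by_smd(group_a: Dict[str, List[float]],
                group_b: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """group_a/group_b: {lab_name: values}; returns (name, SMD), largest |SMD| first."""
    scores = {k: smd(group_a[k], group_b[k]) for k in group_a if k in group_b}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


def pct_abnormal(values: Sequence[float], low: float, high: float) -> float:
    """Percentage of values outside an ASSUMED reference range [low, high]."""
    return 100.0 * sum(v < low or v > high for v in values) / len(values)
```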
Ethics statement
The institutional review boards (IRBs) of the University of Florida, the University of Pittsburgh Medical Center, and the University of Missouri determined this study to be nonhuman subject research because it involved only the collection of existing, deidentified patient medical data. Data use agreements were executed with both the Greater Plains Collaborative (GPC) and the University of Pittsburgh. The requirement for written informed consent was waived owing to the retrospective nature of the study. Because the study was approved under an exempt/nonhuman subjects research determination, no formal IRB protocol number was assigned.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
