
Table 2 Characteristics of Members in the Dataset Subsets

From: Incorporating machine learning and social determinants of health indicators into prospective risk adjustment for health plan payments

Characteristic | Training Set | Test Set
--- | --- | ---
Members, total No. | 1,058,479 | 117,616
Female, No. (%) | 517,364 (48.9%) | 57,469 (48.9%)
Members from ZIP codes without measured SDH variablesᵃ, No. (%) | 1074 (0.1%) | 115 (0.1%)
Population statistics, mean [median] (SD) | |
  Age, y | 41.1 [41.0] (13.1) | 41.1 [41.0] (13.1)
  2017 annual cost, $ | 6946 [861] (28,240) | 6868 [855] (27,826)
  2017 top-coded annual costᵇ, $ | 6762 [861] (23,822) | 6677 [855] (23,536)
Note: The training set was used to develop the models, and the test set was used to evaluate them.
ᵃ For these members, each SDH variable was imputed with its median value over all ZIP codes, and an additional indicator variable flagged members in this category.
ᵇ Cost statistics after top-coding at $400,000: annual costs above $400,000 were replaced with $400,000.
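The imputation described in footnote a (median fill plus a missing-data indicator) can be sketched in Python as follows. This is a minimal illustration, not the authors' code: it assumes SDH values are stored per ZIP code in a dictionary and that a ZIP code is either fully measured or not measured at all. The function name `impute_sdh` and the indicator name `sdh_missing` are hypothetical.

```python
from statistics import median
from typing import Dict, List


def impute_sdh(
    member_zips: List[str],
    zip_sdh: Dict[str, Dict[str, float]],
    sdh_names: List[str],
) -> List[Dict[str, float]]:
    """Return one feature row per member: SDH values plus a missing-SDH flag."""
    # Median of each SDH variable over all ZIP codes with measured values
    # (unweighted by the number of members living in each ZIP code).
    medians = {
        name: median(values[name] for values in zip_sdh.values())
        for name in sdh_names
    }
    rows = []
    for zip_code in member_zips:
        measured = zip_sdh.get(zip_code)  # None if the ZIP code has no SDH data
        row = {
            name: measured[name] if measured is not None else medians[name]
            for name in sdh_names
        }
        # Hypothetical indicator: 1.0 when this member's SDH values were imputed
        row["sdh_missing"] = 0.0 if measured is not None else 1.0
        rows.append(row)
    return rows
```

Keeping the indicator alongside the imputed values lets a model learn a separate offset for the roughly 0.1% of members whose SDH values are not actually observed.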
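The top-coding in footnote b and the "mean [median] (SD)" summaries can be sketched in Python as follows. The $400,000 cap comes from the footnote; the helper names are hypothetical, and the use of the sample standard deviation (`statistics.stdev`) is an assumption, since the table does not say which SD was reported (with more than 100,000 members per subset the difference is negligible).

```python
import statistics
from typing import List, Tuple

TOP_CODE_CAP = 400_000.0  # top-coding threshold stated in footnote b


def top_code(costs: List[float], cap: float = TOP_CODE_CAP) -> List[float]:
    """Replace every annual cost above the cap with the cap itself."""
    return [min(cost, cap) for cost in costs]


def summarize(values: List[float]) -> Tuple[float, float, float]:
    """Return (mean, median, SD) for one 'mean [median] (SD)' cell."""
    # Assumption: sample standard deviation (n - 1 denominator).
    return (
        statistics.mean(values),
        statistics.median(values),
        statistics.stdev(values),
    )


def format_stats(values: List[float]) -> str:
    """Render the statistics in the table's 'mean [median] (SD)' style."""
    mean, med, sd = summarize(values)
    return f"{mean:,.0f} [{med:,.0f}] ({sd:,.0f})"
```

Because top-coding only alters values above the cap, it lowers the mean and SD but leaves the median unchanged whenever fewer than half of the members exceed the cap, which is consistent with the identical medians in the two cost rows.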