
Table 1 Demographic information and geometric mean of biomarkers for the exposure biomarkers classification model

From: Binary classification of users of electronic cigarettes and smokeless tobacco through biomarkers to assess similarity with current and former smokers: machine learning applied to the population assessment of tobacco and health study

   

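Column groups: Training data (Full CS, CS, ExSM); Test data (test-CS, test-ExSM); Products (dual EPRODS, EPRODS, dual SMKLS, SMKLS).

| Variable | Statistic | Full CS¹ | CS² | ExSM | test-CS² | test-ExSM | dual EPRODS | EPRODS | dual SMKLS | SMKLS |
|---|---|---|---|---|---|---|---|---|---|---|
| Participants (N) | | 2189 | 81 | 81 | 21 | 21 | 68 | 142 | 37 | 169 |
| Age | Mean | 39.1 | 38.8 | 30.1 | 38.7 | 31.1 | 36.0 | 38.1 | 32.1 | 42.0 |
| | SD | 15.0 | 14.6 | 14.8 | 13.8 | 14.2 | 14.2 | 14.2 | 15.5 | 17.0 |
| Gender (N) | Male | 1080 | 35 | 56 | 8 | 12 | 42 | 61 | - | 159 |
| | Female | 1109 | 46 | 25 | 13 | 9 | 26 | 81 | - | 10 |
| Ethnicity (N) | White | 1681 | 65 | 43 | - | - | - | 127 | - | 148 |
| | Black | 299 | 8 | 27 | - | - | - | 5 | - | 6 |
| | Other | 209 | 8 | 11 | - | - | - | 10 | - | 15 |
| Alcohol (N)³ | | 1980 | 76 | 70 | 19 | 19 | 61 | 134 | 32 | 155 |
| Urban (N)³ | | 1957 | 75 | 77 | 18 | 20 | 62 | 134 | 23 | 136 |
| High blood pressure (N)³ | | 547 | 23 | 14 | - | - | 15 | 23 | 9 | 56 |
| High cholesterol (N)³ | | 386 | 12 | 15 | 0 | - | 9 | 27 | - | 41 |
| Diabetes (N)³ | | 289 | 14 | 6 | - | - | 6 | 11 | - | 23 |
| Cardiovascular disease (N)⁴ | | 230 | 10 | 5 | - | - | - | 10 | - | 13 |
| Respiratory disease (N)⁴ | | 551 | 24 | 11 | 6 | - | 10 | 29 | 5 | 28 |
| CEMA (ug/g*CRE) | Mean | 368.2 | 361.0 | 131.0 | 278.5 | 96.0 | 287.9 | 169.4 | 261.1 | 126.8 |
| | SD | 281.0 | 272.2 | 97.7 | 132.3 | 44.9 | 186.2 | 203.7 | 162.2 | 93.7 |
| HEMA (ug/g*CRE) | Mean | 5.2 | 5.6 | 1.7 | 4.3 | 1.2 | 3.9 | 1.8 | 3.0 | 1.2 |
| | SD | 6.6 | 9.9 | 1.7 | 3.2 | 0.8 | 5.3 | 1.6 | 2.6 | 1.1 |
| HPMA (ug/g*CRE) | Mean | 1708.8 | 1646.3 | 364.5 | 1378.5 | 251.6 | 1412.2 | 564.2 | 1075.3 | 406.7 |
| | SD | 1452.9 | 1130.8 | 430.9 | 1141.4 | 190.4 | 1039.8 | 660.5 | 740.5 | 674.5 |
| NNAL (ng/g*CRE) | Mean | 410.4 | 366.8 | 94.7 | 355.4 | 69.7 | 290.6 | 76.1 | 912.5 | 1480.8 |
| | SD | 365.9 | 247.7 | 266.6 | 289.3 | 159.5 | 209.8 | 186.6 | 1044.4 | 3142.6 |
| NNNT (ng/g*CRE) | Mean | 25.7 | 24.0 | 5.6 | 14.0 | 3.5 | 19.2 | 9.9 | 40.1 | 53.1 |
| | SD | 78.9 | 69.7 | 14.8 | 12.4 | 7.6 | 22.1 | 16.8 | 40.3 | 87.0 |
| P01 (ug/g*CRE) | Mean | 86.1 | 14.5 | 3.1 | 12.3 | 2.1 | 62.6 | 44.3 | 10.3 | 4.4 |
| | SD | 768.7 | 8.7 | 4.1 | 8.0 | 1.9 | 375.1 | 349.9 | 7.1 | 20.7 |
| P02 (ug/g*CRE) | Mean | 17.2 | 16.9 | 7.0 | 15.5 | 4.6 | 15.3 | 11.5 | 12.5 | 8.3 |
| | SD | 10.9 | 7.6 | 5.7 | 8.5 | 2.3 | 8.1 | 35.7 | 5.2 | 13.4 |
| P10 (ng/g*CRE) | Mean | 471.1 | 386.9 | 202.0 | 344.4 | 354.1 | 401.8 | 234.8 | 399.8 | 245.4 |
| | SD | 1420.5 | 232.9 | 154.1 | 167.6 | 611.3 | 410.2 | 196.2 | 396.4 | 236.0 |
| PMA (ng/g*CRE) | Mean | 1311.2 | 1150.4 | 947.0 | 1681.7 | 846.2 | 1268.6 | 1543.7 | 1151.0 | 1226.2 |
| | SD | 1448.9 | 732.3 | 817.9 | 1797.6 | 510.9 | 1294.2 | 1792.3 | 1028.2 | 1081.3 |
| TNE7 (ug/g*CRE) | Mean | 77.9 | 73.2 | 12.1 | 67.7 | 6.0 | 80.2 | 64.5 | 98.6 | 119.4 |
| | SD | 51.7 | 41.8 | 32.2 | 37.5 | 10.6 | 53.2 | 53.5 | 66.8 | 100.5 |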
  1. Abbreviations: CEMA N-acetyl-S-(2-carboxyethyl)-L-cysteine, CS cigarette smoker, CRE creatinine, dual-EPRODS user of both conventional and electronic cigarettes, dual-SMKLS user of both conventional cigarettes and smokeless tobacco, ExSM former smoker, EPRODS electronic cigarettes user, test-CS test dataset of current smokers, test-ExSM test dataset of former smokers, HEMA N-acetyl-S-(2-hydroxyethyl)-L-cysteine, HPMA N-acetyl-S-(3-hydroxypropyl)-L-cysteine, NNAL 4-(methylnitrosamino)-4-(3-pyridyl)-1-butanol, NNNT N’-nitrosonornicotine, SD standard deviation, SMKLS user of smokeless tobacco products, TNE7 total nicotine equivalents, PMA N-acetyl-S-(phenyl)-L-cysteine, P01 1-naphthol or 1-hydroxynaphthalene, P02 2-naphthol or 2-hydroxynaphthalene, P10 1-hydroxypyrene
  2. ¹ Full data of current smokers
  3. ² Random selection from full data of current smokers to match the number of former smokers for machine learning
  4. ³ Number of participants who answered “Yes” in the questionnaire
  5. ⁴ Number of participants who answered “Yes” in the relevant disease questionnaire
  6. - In accordance with ICPSR data release rules, tables with cell sizes smaller than the threshold for the specific dataset will not be released
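
The random selection described in footnote 2 (drawing a subset of current smokers equal in size to the former-smoker group) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' procedure: the function name `undersample`, the fixed seed, and the integer participant identifiers are hypothetical, and only the counts 2189 and 81 come from Table 1.

```python
import random


def undersample(majority_ids, n_minority, seed=0):
    """Randomly pick n_minority items from majority_ids without replacement.

    Hypothetical helper: the source does not state how the draw was made.
    """
    majority_ids = list(majority_ids)
    if n_minority > len(majority_ids):
        raise ValueError("minority class is larger than the majority class")
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return rng.sample(majority_ids, n_minority)


# Placeholder identifiers; only the group sizes are taken from Table 1.
full_cs = list(range(2189))  # full set of current smokers (Full CS)
n_exsm = 81                  # former smokers in the training data (ExSM)

cs_subset = undersample(full_cs, n_exsm)
print(len(cs_subset))
```

Sampling without replacement keeps each selected participant unique, so the balanced CS group has the same size as the ExSM group.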