Cardiovascular Disease Analysis 04 - Attribute Noise Check
In the previous post, we looked at the range of the blood pressure attributes, the only ones that were not classified into categories, and preprocessed them. In this post, let's check whether the remaining classified attributes contain any noise.
Of the 70,000 objects (rows) in the original data, 64,500 remained after the Systolic / Diastolic Blood Pressure preprocessing. We therefore judge whether there is noisy data by checking, for each attribute, that the counts of its classes add up to 64,500. If any row held a value outside the expected classes, the sum would fall short.
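The check described above can be sketched as a small helper. This is a minimal sketch, assuming the data is a pandas DataFrame like cardio; the helper name class_sum_matches and the toy data are hypothetical and not part of the project.

```python
import pandas as pd

def class_sum_matches(df: pd.DataFrame, column: str, classes, total: int) -> bool:
    """Return True if the rows whose `column` value is in `classes` add up to `total`."""
    # Count rows for each expected class one at a time, like the print statements below
    counts = [int((df[column] == c).sum()) for c in classes]
    return sum(counts) == total

# Hypothetical toy data: the last row holds the unexpected value 4
toy = pd.DataFrame({"cholesterol": [1, 2, 3, 1, 4]})
print(class_sum_matches(toy, "cholesterol", [1, 2, 3], len(toy)))   # False
print(class_sum_matches(toy.iloc[:4], "cholesterol", [1, 2, 3], 4))  # True
```

A NaN also fails every equality comparison, so missing values make the sum fall short in the same way as out-of-range values.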
Admittedly, this is a rather brute-force approach. The .unique() function, which lists the distinct values each attribute takes, would reveal the same thing more easily, but I used this approach so that the project's code alone explains how the analysis proceeds. It also has the added advantage of showing the distribution of each attribute's classes at the same time.
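For comparison, the .unique() alternative mentioned above could look like the sketch below. The helper name unexpected_values and the expected set are hypothetical. Unlike the sum check, it reports which values are wrong, but it does not show how many rows fall into each class.

```python
import pandas as pd

def unexpected_values(series: pd.Series, expected: set) -> list:
    """Return the distinct values in `series` that are not in `expected`."""
    # .unique() lists each distinct value once; .tolist() turns them into plain Python values
    return [v for v in series.unique().tolist() if v not in expected]

toy = pd.Series([0, 1, 1, 0, 2, 1])  # hypothetical binary column with one bad value
print(unexpected_values(toy, {0, 1}))  # [2]
```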
3.1. Label (cardio) - Binary
print('cardiovascular : ', len(cardio[cardio['cardio']==1]))
print('Non-cardiovascular : ', len(cardio[cardio['cardio']==0]))
print('Cardio summation : ', len(cardio[cardio['cardio']==1]) + len(cardio[cardio['cardio']==0]))
>>
cardiovascular : 32146
Non-cardiovascular : 32354
Cardio summation : 64500
The class counts add up to the total number of rows, so there are no values outside the expected classes, i.e. no noisy data.
We can also see that the target classes we want to predict are split almost 1:1.
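To put a number on "almost 1:1", the class shares can be computed from the counts printed above. With the DataFrame itself, cardio['cardio'].value_counts(normalize=True) should give the same shares; this sketch uses only the reported numbers so that it runs on its own.

```python
# Counts reported in the output above (1 = cardiovascular, 0 = non-cardiovascular)
counts = {1: 32146, 0: 32354}
total = sum(counts.values())  # 64500

# Share of each class in the label column
shares = {label: n / total for label, n in counts.items()}
for label, share in shares.items():
    print(label, round(share, 4))
# 1 0.4984
# 0 0.5016
```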
3.2. Cholesterol - Ordinal
print('normal-cholesterol : ', len(cardio[cardio['cholesterol']==1]))
print('above normal-cholesterol : ', len(cardio[cardio['cholesterol']==2]))
print('well above normal-cholesterol : ', len(cardio[cardio['cholesterol']==3]))
print('Cholesterol summation : ', len(cardio[cardio['cholesterol']==1]) + len(cardio[cardio['cholesterol']==2]) + len(cardio[cardio['cholesterol']==3]))
>>
normal-cholesterol : 48461
above normal-cholesterol : 8583
well above normal-cholesterol : 7456
Cholesterol summation : 64500
The class counts add up to the total number of rows, so there are no values outside the expected classes, i.e. no noisy data.
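Because cholesterol is ordinal (1 = normal, 2 = above normal, 3 = well above normal), another option is to store it as an ordered categorical. pandas converts any value that is not among the declared categories to NaN, so noise would surface as missing values. A sketch on hypothetical toy data:

```python
import pandas as pd

levels = [1, 2, 3]  # normal, above normal, well above normal
raw = pd.Series([1, 3, 2, 1, 5])  # toy data; 5 is deliberately out of range

# Values outside `levels` become NaN when building the Categorical
chol = pd.Series(pd.Categorical(raw, categories=levels, ordered=True))
print(chol.isna().sum())          # 1
print(chol.cat.codes.tolist())    # [0, 2, 1, 0, -1]  (-1 marks the NaN)
```

The ordered=True flag also keeps the level ordering, so comparisons such as chol < 3 respect normal < above normal < well above normal.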
3.3. Glucose - Ordinal
print('normal-Glucose : ', len(cardio[cardio['gluc']==1]))
print('above normal-Glucose : ', len(cardio[cardio['gluc']==2]))
print('well above normal-Glucose : ', len(cardio[cardio['gluc']==3]))
print('Glucose summation : ', len(cardio[cardio['gluc']==1]) + len(cardio[cardio['gluc']==2]) + len(cardio[cardio['gluc']==3]))
>>
normal-Glucose : 54886
above normal-Glucose : 4673
well above normal-Glucose : 4941
Glucose summation : 64500
The class counts add up to the total number of rows, so there are no values outside the expected classes, i.e. no noisy data.
3.4. Smoke / Non-smoke - Binary
print('Smoke : ', len(cardio[cardio['smoke']==1]))
print('Non-smoke : ', len(cardio[cardio['smoke']==0]))
print('Smoke summation : ', len(cardio[cardio['smoke']==1]) + len(cardio[cardio['smoke']==0]))
>>
Smoke : 5651
Non-smoke : 58849
Smoke summation : 64500
The class counts add up to the total number of rows, so there are no values outside the expected classes, i.e. no noisy data.
3.5. Alcohol / Non-Alcohol - Binary
print('alcohol : ', len(cardio[cardio['alco']==1]))
print('Non-alcohol : ', len(cardio[cardio['alco']==0]))
print('Alcohol summation : ', len(cardio[cardio['alco']==1]) + len(cardio[cardio['alco']==0]))
>>
alcohol : 3422
Non-alcohol : 61078
Alcohol summation : 64500
The class counts add up to the total number of rows, so there are no values outside the expected classes, i.e. no noisy data.
3.6. Active - Binary
print('Active : ', len(cardio[cardio['active']==1]))
print('Non-Active : ', len(cardio[cardio['active']==0]))
print('Active summation : ', len(cardio[cardio['active']==1]) + len(cardio[cardio['active']==0]))
>>
Active : 51825
Non-Active : 12675
Active summation : 64500
The class counts add up to the total number of rows, so there are no values outside the expected classes, i.e. no noisy data.
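The six per-attribute checks above follow the same pattern, so they could be collapsed into a single loop. This sketch assumes the expected classes listed below (0/1 for the binary columns, 1 to 3 for the ordinal ones); the expected dict, the helper noise_report and the toy frame are hypothetical.

```python
import pandas as pd

# Expected classes per attribute: 0/1 for binary, 1-3 for ordinal
expected = {
    "cardio": [0, 1],
    "cholesterol": [1, 2, 3],
    "gluc": [1, 2, 3],
    "smoke": [0, 1],
    "alco": [0, 1],
    "active": [0, 1],
}

def noise_report(df: pd.DataFrame) -> dict:
    """Map each column to True if its expected-class counts add up to len(df)."""
    report = {}
    for col, classes in expected.items():
        counts = df[col].value_counts()  # NaN is excluded by default
        # reindex keeps only the expected classes; a class that never appears counts as 0
        in_range = int(counts.reindex(classes, fill_value=0).sum())
        report[col] = in_range == len(df)
    return report

toy = pd.DataFrame({
    "cardio": [0, 1, 1], "cholesterol": [1, 2, 3], "gluc": [1, 1, 4],
    "smoke": [0, 0, 1], "alco": [0, 0, 0], "active": [1, 0, 1],
})
print(noise_report(toy))
# gluc -> False (the value 4 is outside 1-3); every other column -> True
```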
We have confirmed that every attribute is clean, with no noise. To wrap up the data preprocessing, let's check the info of the data we will finally use.
cardio
cardio.info()
>>
<class 'pandas.core.frame.DataFrame'>
Int64Index: 64500 entries, 0 to 69999
Data columns (total 11 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   age          64500 non-null  int64
 1   gender       64500 non-null  int64
 2   ap_hi        64500 non-null  int64
 3   ap_lo        64500 non-null  int64
 4   cholesterol  64500 non-null  int64
 5   gluc         64500 non-null  int64
 6   smoke        64500 non-null  int64
 7   alco         64500 non-null  int64
 8   active       64500 non-null  int64
 9   cardio       64500 non-null  int64
 10  BMI          64500 non-null  float64
dtypes: float64(1), int64(10)
memory usage: 5.9 MB
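info() is a visual check. If the same conditions should be asserted in code (no missing values, every column numeric), a sketch along these lines would work; the small toy frame below is a hypothetical stand-in for cardio.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical stand-in for the cardio DataFrame
toy = pd.DataFrame({"age": [50, 55], "BMI": [21.9, 34.9], "cardio": [0, 1]})

# Total number of missing cells across all columns
n_missing = int(toy.isna().sum().sum())
# True only if every column has a numeric dtype (int64 / float64 here)
all_numeric = all(is_numeric_dtype(dt) for dt in toy.dtypes)
print(n_missing, all_numeric)  # 0 True
```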