🚨 In the previous post I covered the final preprocessing of the data. In this post I will use that preprocessed data to analyze the correlation between attributes.

🚨 Starting with this post, I plan to analyze how the attributes of the data relate to one another. These were probably the topics covered most heavily in my data mining class, so there will be a lot to go through.


🫀 1. Importing the preprocessed cardio data

🚨 First, I gathered the code that loads and preprocesses the data in one place. Let's take a quick look.

# ๋ฐ์ดํ„ฐ ์ž„ํฌํŠธ ๋ฐ ์ „์ฒ˜๋ฆฌ๋ฅผ ์œ„ํ•œ pandas / numpy library ์ž„ํฌํŠธ
import pandas as pd
import numpy as np

# ์‹œ๊ฐํ™”๋ฅผ ์œ„ํ•œ seaborn, matplotlib ์ž„ํฌํŠธ
import seaborn as sns
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.font_manager as fm
%matplotlib inline

# Raw string so the backslashes in the Windows path are not treated as escape sequences
cardio = pd.read_csv(r'C:\Users\mingu\Desktop\cardio_train.csv', sep=';')

# Convert age from days to years: divide by 365 and round to the nearest whole year
cardio['age'] = (cardio['age'] / 365).round().astype('int64')

# Create the BMI attribute (height is divided by 100 first to convert cm to m)
cardio['height'] = cardio['height'] / 100
cardio['BMI'] = (cardio['weight'] / cardio['height']**2).round(2)

# Drop the id, height, and weight attributes
cardio = cardio.drop(['id','height','weight'], axis = 1)

# Drop rows where systolic pressure (ap_hi) is lower than diastolic pressure (ap_lo)
low_drop_index = cardio[(cardio['ap_hi'] < cardio['ap_lo'])].index
cardio = cardio.drop(low_drop_index).copy()

# Remove ap_hi (systolic) values below 90 mmHg or above 170 mmHg
drop_index_sys = cardio[(cardio['ap_hi'] < 90) | (cardio['ap_hi'] > 170)].index
cardio = cardio.drop(drop_index_sys).copy()

# Remove ap_lo (diastolic) values below 65 mmHg or above 105 mmHg
drop_index_dias = cardio[(cardio['ap_lo'] < 65) | (cardio['ap_lo'] > 105)].index
cardio = cardio.drop(drop_index_dias).copy()

cardio  


๐Ÿซ€ 2. Correlation Analysis

🚨 For the correlation analysis I will use the .corr() function. Visualizing the result requires Python's seaborn library, so I import that first. Correlation analysis does not take much code: a single function call is enough to see how the attributes relate to each other.

# ์‹œ๊ฐํ™”๋ฅผ ์œ„ํ•œ seaborn ์ž„ํฌํŠธ
import seaborn as sns 

plt.figure(figsize = (20,12))
ax = sns.heatmap(cardio.corr(), annot=True, annot_kws=dict(color='r'), cmap='Greens')
plt.show()


📌 Correlation analysis between attributes using .corr()
📌 Visualization through the seaborn module
📌 The target is the cardio attribute, so the focus is on the correlation between cardio and the other features
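
For reference, .corr() computes the Pearson correlation coefficient by default (method='pearson'). The following is a minimal sketch of what that means for a single pair of columns. The toy DataFrame and the helper name pearson_r are made up for illustration and are not part of the cardio data.

```python
import numpy as np
import pandas as pd

def pearson_r(x, y):
    """Pearson r: covariance of x and y divided by the product of their spreads."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

# Hypothetical toy data, only used to compare the manual value with pandas
toy = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [2, 1, 4, 3, 6]})
manual = pearson_r(toy["a"], toy["b"])
from_pandas = toy.corr().loc["a", "b"]
print(manual, from_pandas)
```

The two printed numbers should agree up to floating-point error, which is all the heatmap cells are: one Pearson r per pair of columns.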


🚨 Correlation results

  • age & cardio : 0.23
  • gender & cardio : -0.0022
  • ap_hi & cardio : 0.43
  • ap_lo & cardio : 0.34
  • cholesterol & cardio : 0.22
  • gluc & cardio : 0.086
  • smoke & cardio : -0.019
  • alco & cardio : -0.011
  • active & cardio : -0.037
  • BMI & cardio : 0.15

🚨 Sign of the correlation between cardio and each attribute

  • (+) : age, ap_hi, ap_lo, cholesterol, gluc, BMI
  • (-) : gender, smoke, alco, active
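
Instead of reading the values off the heatmap, the target column of the correlation matrix can be pulled out and split by sign. This is a rough sketch under assumptions: the helper name target_correlations and the demo DataFrame are hypothetical stand-ins, not the code used in the project.

```python
import pandas as pd

def target_correlations(df, target):
    """Return each feature's Pearson correlation with `target`, largest first."""
    corr = df.corr()[target].drop(target)
    return corr.sort_values(ascending=False)

# Hypothetical stand-in for the preprocessed cardio DataFrame
demo = pd.DataFrame({
    "up":     [1, 2, 3, 4, 5, 6],
    "down":   [6, 5, 4, 3, 2, 1],
    "target": [0, 0, 0, 1, 1, 1],
})

corr_with_target = target_correlations(demo, "target")
positive = corr_with_target[corr_with_target > 0].index.tolist()
negative = corr_with_target[corr_with_target < 0].index.tolist()
print(corr_with_target.round(3))
print("(+):", positive, "(-):", negative)
```

With the real data the call would presumably be target_correlations(cardio, 'cardio').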

🚨 cholesterol and glucose are fairly strongly correlated with each other (0.45), and ap_hi and ap_lo show an even stronger correlation (0.71).

  • Intuitively, one would expect cholesterol, glucose, and blood pressure to all be strongly correlated with one another, but in this data the values are not that large.
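
Feature pairs like these can also be listed directly from the correlation matrix rather than spotted by eye. Below is a small sketch, assuming a hypothetical helper top_feature_pairs and made-up demo columns. It keeps only the upper triangle of the matrix so each pair shows up once.

```python
import numpy as np
import pandas as pd

def top_feature_pairs(df, target, n=3):
    """List feature pairs (target excluded) by absolute Pearson correlation, strongest first."""
    corr = df.drop(columns=[target]).corr()
    # Strict upper triangle: drops the diagonal and the mirrored duplicates
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().dropna()
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False).head(n)

# Hypothetical demo data: x and y move together, z is only loosely related
demo = pd.DataFrame({
    "x": [1, 2, 3, 4, 5, 6],
    "y": [2, 4, 5, 8, 10, 13],
    "z": [3, 1, 4, 1, 5, 9],
    "target": [0, 0, 0, 1, 1, 1],
})

top = top_feature_pairs(demo, "target")
print(top.round(3))
```

On the real data, top_feature_pairs(cardio, 'cardio') would be the corresponding call.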

🚨 Based on the correlation results, the attributes whose absolute correlation with cardio is large enough that they cannot be ignored are [age, ap_hi, ap_lo, cholesterol].
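
The post does not state a numeric cutoff for "large enough". As an illustration only, an assumed threshold of |r| >= 0.2 applied to the values reported above gives the same four attributes.

```python
import pandas as pd

# Correlations with cardio, copied from the results listed above
reported = pd.Series({
    "age": 0.23, "gender": -0.0022, "ap_hi": 0.43, "ap_lo": 0.34,
    "cholesterol": 0.22, "gluc": 0.086, "smoke": -0.019,
    "alco": -0.011, "active": -0.037, "BMI": 0.15,
})

THRESHOLD = 0.2  # assumed cutoff, not stated in the post
selected = reported[reported.abs() >= THRESHOLD].index.tolist()
print(selected)
```

A different cutoff would change the list. For example, BMI (0.15) would be included at 0.15.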

🚩 Added 2022.09.08 - Correlation is a method for examining the relationship between two numeric variables. During the project I also ran correlation analysis on the categorical variables, so I hope anyone reading this post does not make the same mistake 🤥.
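
For two categorical variables, one commonly used alternative is a chi-square-based measure such as Cramér's V. This is not what the project did. It is only a sketch of one option, with a hypothetical helper cramers_v and made-up binary columns, and it omits the bias correction that some implementations apply.

```python
import numpy as np
import pandas as pd

def cramers_v(a, b):
    """Cramér's V between two categorical Series: 0 = no association, 1 = perfect."""
    observed = pd.crosstab(a, b).to_numpy(dtype=float)
    n = observed.sum()
    # Counts expected if the two variables were independent
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / n
    chi2 = ((observed - expected) ** 2 / expected).sum()
    k = min(observed.shape) - 1
    if k == 0:
        # One of the variables has a single category, so V is undefined
        return float("nan")
    return float(np.sqrt(chi2 / (n * k)))

# Hypothetical binary columns, not taken from the cardio data
flag = pd.Series([0, 0, 1, 1] * 25)
same_flag = flag.copy()                    # identical to flag
unrelated = pd.Series([0, 1, 0, 1] * 25)   # balanced against flag

v_same = cramers_v(flag, same_flag)
v_indep = cramers_v(flag, unrelated)
print(v_same, v_indep)
```

Unlike Pearson's r, V has no sign, so it only says how strongly two categorical attributes are associated, not in which direction.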

