🧩 This is my first post in almost 20 days. There were the usual reasons for the gap, like the Chuseok holiday and the start of the semester, but the biggest one is that I took some time to think about what I really want to do 😥. Last semester I took an AI course related to biomedical engineering and a data mining course, and I went into the summer break thinking that I would have to go to graduate school to make use of my major. Even at the start of this semester I was convinced that graduate school was a must. I'm in the second semester of my third year, and most of my classmates advised me that this is a good time to look for a graduate lab I'd like to join, so I naturally started looking around. But what I wanted to do was extract insights from medical data and apply them to patients or to people with the same disease, and there were few labs working on that to begin with, and even fewer that made it their main research topic.

🧩 As mentioned above, what I wanted to do was derive insights through data analysis and visualization, so I simply googled "data analysis". That search ended up more or less deciding my career path. Through it I learned about the DA (Data Analyst) and BA (Business Analyst) roles, and found, somewhat to my surprise, that the Python, R, statistics, and data mining I had learned as an undergraduate cover much of what those roles need. I will still have to learn SQL and some BI tools, of course, but right now I can't think of a job that interests me more. So for the few days after settling on this path, I was busy scouring the internet for what it takes to become a data analyst 😀😀.

🧩 That is why the gap between posts grew so long. Future posts will be more about SQL and pure data analysis 📊 than about bio-related topics, so after a bit of preparation time I will probably post in a burst 🏃‍♂️🏃‍♂️🏃‍♂️. I still plan to put up data mining posts little by little while I study 😊.

๐Ÿงฉ ๋ป˜๊ธ€์ด ๊ธธ์–ด์กŒ๋‹ค!! ์ด๋ฒˆ ํฌ์ŠคํŒ…์—์„œ๋Š” ์ €๋ฒˆ ํ•™๊ธฐ์— ์ˆ˜ํ–‰ํ•œ ํ”„๋กœ์ ํŠธ์—์„œ support, confidence, lift๋ฅผ ํ†ตํ•ด ํŒจํ„ด์„ ๋ถ„์„ํ•˜๋Š” ์ฒซ๋ฒˆ์งธ ์‹œ๊ฐ„์„ ๊ฐ€์งˆ ๊ฒƒ์ด๋‹ค. ๋จผ์ €, ์›๋ž˜ ์žˆ๋Š” ์›๋ณธ ๋ฐ์ดํ„ฐ๋กœ๋ถ€ํ„ฐ ๋ฒ”์ฃผํ˜• ๋ฐ์ดํ„ฐ์™€ ํŠธ๋žœ์žญ์…˜ ๋ฐ์ดํ„ฐ๋ฅผ ๋งŒ๋“œ๋Š” ๊ณผ์ •์„ ์‚ดํŽด๋ณด์ž.


1. Theoretical background of support / confidence / lift

  • ๐Ÿ Support
    • ์ง€์ง€๋„
    • x์™€ y๋ฅผ ๋™์‹œ์— ํฌํ•จํ•˜๋Š” ๋น„์œจ
    • ์‹ ๋ขฐ๋„(Confidence)๋ฅผ ์ง€์ง€ํ•˜๋Š” ์ฒ™๋„
    • confidence์— ์˜ํ•œ ๊ทœ์น™์ด ์ง€์ง€๋ฐ›๊ธฐ ์œ„ํ•ด์„œ๋Š” support ๊ฐ’์ด ๋†’์•„์•ผ ํ•จ.
  • ๐Ÿ Confidence
    • ์‹ ๋ขฐ๋„
    • x๋ฅผ ํฌํ•จํ•˜๋Š” ๊ฑฐ๋ž˜ ๋‚ด์—ญ ์ค‘, y๊ฐ€ ํฌํ•จ๋œ ๋น„์œจ
    • ๊ทœ์น™์˜ ์‹ ๋ขฐ๋„์— ๋Œ€ํ•œ ์ฒ™๋„
    • P(Y|X)
  • ๐Ÿ Lift
    • ์‹ ๋ขฐ๋„์˜ ๊ฒฐ๊ณผ๊ฐ€ 0.9๋ผ๊ณ  ๊ฐ€์ •ํ•˜์˜€์„ ๋•Œ p(Y)๊ฐ€ 0.9๋ฉด x,y๊ฐ€ ์„œ๋กœ ๋…๋ฆฝ์ด ๋˜๊ธฐ ๋•Œ๋ฌธ์— x๋Š” y๋ฅผ ์„ค๋ช…ํ•˜๋Š” ๋ฐ์— ์•„๋ฌด๋Ÿฐ ๋„์›€์„ ์ค„์ˆ˜ ์—†์Œ
    • ๊ทœ์น™์ด ์ง„์งœ ์˜๋ฏธ๊ฐ€ ์žˆ๋Š”์ง€ ํ™•์ธํ•˜๊ธฐ ์œ„ํ•œ ์ฒ™๋„
    • P(Y|X) / P(Y)
      • Lift = 1 : x์™€ y๋Š” ์•„๋ฌด ๊ด€๊ณ„๊ฐ€ ์—†์Œ. ๋…๋ฆฝ.
      • Lift > 1 : x๊ฐ€ y์˜ ๋ฐœ์ƒ ์ฆ๊ฐ€๋ฅผ ์˜ˆ์ธกํ•˜๋Š” ๋ฐ์— ๋„์›€์ด ๋จ. (์–‘์˜ ์ƒ๊ด€๊ด€๊ณ„).
      • Lift < 1 : x๊ฐ€ y์˜ ๋ฐœ์ƒ ๊ฐ์†Œ๋ฅผ ์˜ˆ์ธกํ•˜๋Š” ๋ฐ์— ๋„์›€์ด ๋จ. (์Œ์˜ ์ƒ๊ด€๊ด€๊ณ„).
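To make the three definitions concrete, here is a minimal sketch that computes them by hand on five made-up transactions. The item names are borrowed from later in this post, but the transactions themselves are hypothetical and are not taken from the cardio data; the helper names `support`, `confidence`, and `lift` are my own.

```python
# Five hypothetical transactions (illustration only, not the cardio data)
transactions = [
    {"Smoke", "HBP_SYS", "Cardio"},
    {"HBP_SYS", "Cardio"},
    {"HBP_SYS", "No_cardio"},
    {"Smoke", "NBP_SYS", "No_cardio"},
    {"NBP_SYS", "Cardio"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(x, y, transactions):
    """P(Y|X) = support(X and Y together) / support(X)."""
    return support(x | y, transactions) / support(x, transactions)


def lift(x, y, transactions):
    """P(Y|X) / P(Y)."""
    return confidence(x, y, transactions) / support(y, transactions)


x, y = {"HBP_SYS"}, {"Cardio"}
sup = support(x | y, transactions)
conf = confidence(x, y, transactions)
lft = lift(x, y, transactions)
print(f"support={sup:.3f} confidence={conf:.3f} lift={lft:.3f}")
# prints: support=0.400 confidence=0.667 lift=1.111
```

Here HBP_SYS appears in 3 of 5 transactions and Cardio in 3 of 5, but they appear together in 2, so the lift comes out slightly above 1: in this toy data, seeing HBP_SYS makes Cardio a little more likely than its base rate.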

2. Preprocessing

๐Ÿ ์šฐ๋ฆฌ๊ฐ€ ๊ฐ€์ง€๊ณ  ์žˆ๋Š” ๋ฐ์ดํ„ฐ๋Š” ๋ฒ”์ฃผํ˜• ์ž๋ฃŒ์™€ ์ˆ˜์น˜ํ˜• ์ž๋ฃŒ๊ฐ€ ์ด๊ฒƒ์ €๊ฒƒ ์„ž์—ฌ์žˆ๋‹ค. ํŒจํ„ด ๋ถ„์„์„ ํ†ตํ•ด ๊ทœ์น™์„ ์ฐพ๊ธฐ ์œ„ํ•ด์„œ๋Š” ๋ฐ์ดํ„ฐ๊ฐ€ ํŠธ๋žœ์žญ์…˜ ๋ฐ์ดํ„ฐ, ์ฆ‰ Boolean ํ˜•ํƒœ๋กœ ๊ตฌ์„ฑ๋œ ๋ฐ์ดํ„ฐ์—ฌ๋งŒ ํ•œ๋‹ค. ๋”ฐ๋ผ์„œ ์šฐ๋ฆฌ๋Š” ๊ฐ attribute๋“ค์„ ์ผ์ •ํ•œ ๊ธฐ์ค€์„ ๊ฐ€์ง€๊ณ  ๋ชจ๋‘ ๋ฒ”์ฃผํ™” ํ•œ ๋’ค, ์ตœ์ข…์ ์œผ๋กœ ์ด๋ ‡๊ฒŒ ๋ฒ”์ฃผํ™”๋œ ๋ฐ์ดํ„ฐ๋ฅผ Boolean ํ‘œํ˜„ํ˜•์œผ๋กœ ๋ฐ”๊ฟ” ํŠธ๋žœ์žญ์…˜ ๋ฐ์ดํ„ฐ๋ฅผ ๊ตฌํ•  ๊ฒƒ์ด๋‹ค.

  • ๐Ÿ Support, Confidence ๊ณ„์‚ฐ์„ ์œ„ํ•ด ๋ฐ์ดํ„ฐ๋ฅผ transaction table ํ˜•ํƒœ๋กœ ๋ณ€๊ฒฝ
    • pre_tran : ๊ฐ attribute์˜ binary ๊ฐ’์„ category ํ˜•ํƒœ๋กœ ๋ฐ”๊พผ dataframe ์ƒ์„ฑ
    • transaction : mlxtend ๋ฉ”์†Œ๋“œ์˜ transform ํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ boolean dataframe ์ƒ์„ฑ
  • ๐Ÿ Support, Confidence ๊ณ„์‚ฐ
    • mlxtend.frequent_patterns ๋ชจ๋“ˆ์˜ apriori, association_rules ํ•จ์ˆ˜
    • apriori() : itemsets ๊ฐ„์˜ Support๋ฅผ ๊ณ„์‚ฐํ•˜์—ฌ dataframe์œผ๋กœ ๋ฐ˜ํ™˜
      • ์„ค์ •ํ•œ min_support๋ฅผ ๋งŒ์กฑํ•˜๋Š” ๊ฒฝ์šฐ๋งŒ ๋ฐ˜ํ™˜
    • association_rules() ํ•จ์ˆ˜์˜ metric, min_threshold ์˜ต์…˜
      • ์„ค์ •ํ•œ metric์ด min_threshold ์ด์ƒ์ธ ๊ฒฝ์šฐ๋งŒ ๋ฐ˜ํ™˜
  • ๐Ÿ ์šฐ๋ฆฌ๊ฐ€ ์ฐพ๊ณ ์ž ํ•˜๋Š” ๊ฒƒ์€ cardio์™€ ์˜ํ–ฅ์„ ๋ฏธ์น˜๋Š” attribute ๊ฐ„์˜ ์ธ๊ณผ๊ด€๊ณ„์ด๊ธฐ ๋•Œ๋ฌธ์— cardio๋ฅผ consequents๋กœ ํ•˜๋Š” ๊ฒฝ์šฐ๋ฅผ ์ฃผ๋กœ ์‚ดํŽด๋ณผ ์˜ˆ์ •์ž„
    • confidence, Lift, support ์ˆœ์„œ๋กœ ์šฐ์„ ์ˆœ์œ„๋ฅผ ์„ค์ •
    • min_confidence = 0.6 / Lift > 1 / min_support = 0.01
    • support๋ฅผ ๋‚ฎ๊ฒŒ ์„ค์ •ํ•œ ์ด์œ ๋Š” confidence์™€ Lift๋ฅผ ๋งŒ์กฑํ•˜๋Š” ๊ฒฝ์šฐ์— antecedents์˜ support๊ฐ€ ๋„ˆ๋ฌด ์ž‘์•„ ์ „์ฒด์ ์ธ support๊ฐ€ ๋‚ฎ๊ฒŒ ๋‚˜์˜ค๋Š” ๊ฒฝ์šฐ๋ฅผ ๊ณ ๋ คํ•œ ๊ฒƒ์ด๋‹ค.
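The selection criteria above can be sketched in plain Python. Everything in this block is an assumption for illustration: the rule list and its metric values are invented, not results from the cardio data, and `select_rules` is a hypothetical helper rather than part of mlxtend. It only shows the filtering (cardio as consequent, confidence of at least 0.6, lift above 1, support of at least 0.01) and the confidence, lift, support priority order.

```python
# Hypothetical rules with made-up metric values (illustration only)
rules = [
    {"antecedents": {"HBP_SYS", "OBESITY"}, "consequents": {"Cardio"},
     "support": 0.08, "confidence": 0.85, "lift": 1.70},
    {"antecedents": {"HBP_DIAS"}, "consequents": {"Cardio"},
     "support": 0.15, "confidence": 0.78, "lift": 1.56},
    {"antecedents": {"Smoke"}, "consequents": {"Cardio"},
     "support": 0.04, "confidence": 0.47, "lift": 0.94},   # fails confidence and lift
    {"antecedents": {"Active"}, "consequents": {"No_cardio"},
     "support": 0.40, "confidence": 0.65, "lift": 1.05},   # consequent is not Cardio
    {"antecedents": {"HBP_SYS", "Well_Above_Normal_cho"}, "consequents": {"Cardio"},
     "support": 0.005, "confidence": 0.90, "lift": 1.80},  # below min_support
]

MIN_CONFIDENCE = 0.6
MIN_SUPPORT = 0.01


def select_rules(rules, target="Cardio"):
    """Keep rules that predict `target` and pass all thresholds, best first."""
    kept = [
        r for r in rules
        if r["consequents"] == {target}
        and r["confidence"] >= MIN_CONFIDENCE
        and r["lift"] > 1
        and r["support"] >= MIN_SUPPORT
    ]
    # Priority order: confidence, then lift, then support (all descending)
    return sorted(
        kept,
        key=lambda r: (r["confidence"], r["lift"], r["support"]),
        reverse=True,
    )


selected = select_rules(rules)
for r in selected:
    print(sorted(r["antecedents"]), "->", sorted(r["consequents"]),
          r["confidence"], r["lift"], r["support"])
```

On the low support floor: since the support of a rule equals its confidence times the support of its antecedents, a rule with confidence 0.6 whose antecedents occur in only 2% of the rows has a support of just 0.012. A floor much higher than 0.01 would throw such rules away before confidence and lift are even looked at.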

3. Code

🚩 3.1. Categorical Data Code

# Categorize function for the BMI attribute
# BMI < 18.5       : underweight
# 18.5 <= BMI < 25 : normal
# 25 <= BMI < 30   : overweight
# 30 <= BMI < 39.9 : obese
# 39.9 <= BMI      : severely obese
def bmi(x):
    # x is a single number, so plain chained comparisons are used instead of bitwise &
    if x < 18.5:
        return 'LOW'
    elif 18.5 <= x < 25:
        return 'NORMAL'
    elif 25 <= x < 30:
        return 'OVER'
    elif 30 <= x < 39.9:
        return 'OBESITY'
    else:
        return 'HIGH_OBESITY'
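# Categorize the cardio data
import numpy as np  # needed for np.where below

# cardio is the source DataFrame (assumed to be loaded already)
pre_tran = cardio.copy()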

# gender : 1 2
pre_tran = pre_tran.replace({'gender':1},'Women')
pre_tran = pre_tran.replace({'gender':2},'Men')

# cholesterol : 1 2 3
pre_tran = pre_tran.replace({'cholesterol':1},'Normal_cho')
pre_tran = pre_tran.replace({'cholesterol':2},'Above_Normal_cho')
pre_tran = pre_tran.replace({'cholesterol':3},'Well_Above_Normal_cho')

# gluc : 1 2 3
pre_tran = pre_tran.replace({'gluc':1},'Normal_gluc')
pre_tran = pre_tran.replace({'gluc':2},'Above_Normal_gluc')
pre_tran = pre_tran.replace({'gluc':3},'Well_Above_Normal_gluc')

# smoke : 0 1
pre_tran = pre_tran.replace({'smoke':0},'No_Smoke')
pre_tran = pre_tran.replace({'smoke':1},'Smoke')

# alco : 0 1
pre_tran = pre_tran.replace({'alco':0},'No_Alcohol')
pre_tran = pre_tran.replace({'alco':1},'Alcohol')

# active : 0 1
pre_tran = pre_tran.replace({'active':0},'No_Active')
pre_tran = pre_tran.replace({'active':1},'Active')

# cardio : 0 1, target
pre_tran = pre_tran.replace({'cardio':0},'No_cardio')
pre_tran = pre_tran.replace({'cardio':1},'Cardio')

# ap_hi >= 140 -> HBP_SYS (high blood pressure), otherwise NBP_SYS (normal)
# ap_lo >= 90 -> HBP_DIAS (high blood pressure), otherwise NBP_DIAS (normal)
pre_tran["ap_hi"] = np.where(pre_tran["ap_hi"] >= 140, 'HBP_SYS', 'NBP_SYS')
pre_tran["ap_lo"] = np.where(pre_tran["ap_lo"] >= 90, 'HBP_DIAS', 'NBP_DIAS')

# age : bucket into decades (30s, 40s, 50s, 60s)
pre_tran.loc[pre_tran['age'] // 10 == 3, 'age'] = 30
pre_tran.loc[pre_tran['age'] // 10 == 4, 'age'] = 40
pre_tran.loc[pre_tran['age'] // 10 == 5, 'age'] = 50
pre_tran.loc[pre_tran['age'] // 10 == 6, 'age'] = 60

# BMI : apply the bmi function defined above
pre_tran['BMI'] = pre_tran['BMI'].apply(bmi)
print('row : ', len(pre_tran))
print('columns : ', len(pre_tran.columns))
pre_tran.head()
>>
row :  64500
columns :  11
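As a side note, the threshold-based binning in bmi() can also be written in a table-driven way. This is just a sketch using the standard library's bisect module, not what the project actually used; `BMI_EDGES`, `BMI_LABELS`, and `bmi_category` are names I made up, and the cut points simply mirror the ones in bmi() above.

```python
from bisect import bisect_right

# Cut points and labels mirroring the thresholds used in bmi()
BMI_EDGES = [18.5, 25, 30, 39.9]
BMI_LABELS = ["LOW", "NORMAL", "OVER", "OBESITY", "HIGH_OBESITY"]


def bmi_category(value):
    """Return the label of the bin that `value` falls into."""
    # bisect_right counts how many edges are <= value, which is the bin index
    return BMI_LABELS[bisect_right(BMI_EDGES, value)]


for sample in (17.0, 18.5, 27.3, 30, 39.9):
    print(sample, bmi_category(sample))
```

The benefit is that adding or moving a boundary only means editing the two lists, and each edge value lands in the upper bin, the same way the >= comparisons in bmi() treat it.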


📌 After these steps, all of our data is categorical. Now we just need to turn this categorical data into transaction data.


🚩 3.2. Transaction Data Code

# Build the transaction table
# Import TransactionEncoder from mlxtend.preprocessing (plus NumPy and pandas, used below)
import numpy as np
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder

# Create the transaction data
# Convert the categorical data to a list-like form for TransactionEncoder's transform : trans_data
# Round-tripping through a list makes NumPy coerce every cell to a string (e.g. 30 -> '30'),
# so all items share one type
trans_data = np.array(pre_tran)
trans_data = np.array(trans_data.tolist())

# transform() returns trans_data one-hot encoded as a boolean array : te_ary
# Convert te_ary to a dataframe to create the transaction data
# transaction : each category value of each attribute becomes a column
te = TransactionEncoder()
te_ary = te.fit(trans_data).transform(trans_data)
transaction = pd.DataFrame(te_ary, columns=te.columns_)
transaction


😥 Oops, the picture is hard to read... I took the capture this way because I wanted to show as many attributes as possible at once, which is a shame. If you'd like a closer look, clicking the picture once more should help!!


๐Ÿ ๋“œ๋””์–ด ์›ํ•˜๋Š” ํ˜•ํƒœ์˜ ํŠธ๋žœ์žญ์…˜ ๋ฐ์ดํ„ฐ๋ฅผ ๋งŒ๋“ค์—ˆ๋‹ค. ์ด์ œ๋ถ€ํ„ฐ๋Š” ์ด๋ ‡๊ฒŒ ๋งŒ๋“ค์–ด์ง„ ๋ฐ์ดํ„ฐํ”„๋ ˆ์ž„์— ๋Œ€ํ•œ ํŒจํ„ด์„ ๋ถ„์„ํ•ด์„œ ๊ฐ™์ด ๋‚˜์˜ค๋Š” ์นœ๊ตฌ๋“ค์ด ๋ฌด์—‡์ด ์žˆ๋Š”์ง€, ๊ทธ ์ˆ˜์น˜๋Š” ์–ด๋– ํ•œ์ง€์— ๋Œ€ํ•ด์„œ ๋ถ„์„ํ•˜๋ฉด ๋œ๋‹ค.

๐Ÿ ๋ฐ์ดํ„ฐ์—์„œ ๊ทธ ์ƒ๊ด€๊ด€๊ณ„๋ฅผ ๋ฝ‘์•„๋‚ด์„œ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ ์—ญ์‹œ ์ค‘์š”ํ•˜์ง€๋งŒ, ์ด๋ฅผ ์œ„ํ•ด์„œ ์ „์ฒ˜๋ฆฌ ๋‹จ๊ณ„๋ฅผ ์ง„ํ–‰ํ•˜๋ฉด์„œ ๋ฐ์ดํ„ฐ ๋ถ„์„์— ๋Œ€ํ•œ ์ข€ ๋” ๋„“์€ ์‹œ์•ผ๋ฅผ ๊ฐ€์ง€๊ฒŒ ๋˜์—ˆ๋˜ ๊ฒƒ ๊ฐ™๋‹ค. ์ˆ˜์—…์‹œ๊ฐ„์— ๋ฐฐ์šด ์ด๋ก ๋งŒ์„ ๋ฐ”ํƒ•์œผ๋กœ ์ •๋ง ๋งŽ์€ ์‚ฝ์งˆ์„ ํ•˜๋ฉด์„œ ๋ฐฐ์šด ๋ฐฉ๋ฒ•๋“ค์ด๊ธฐ ๋•Œ๋ฌธ์—, ์•„๋งˆ ๋‘๊ณ ๋‘๊ณ  ์ƒ๊ฐ๋‚˜์ง€ ์•Š์„๊นŒ ์‹ถ๋‹ค๐Ÿ˜€๐Ÿ˜€.

๐Ÿ ๋‹ค์Œ ๊ธ€์—์„œ๋Š” ํŠธ๋žœ์žญ์…˜ ๋ฐ์ดํ„ฐ์—์„œ support, confidence, lift๋ฅผ ๊ตฌํ•˜๊ณ  ์‹œ๊ฐํ™”ํ•˜๋Š” ๋ถ€๋ถ„์„ ๋‹ค๋ฃฐ ๊ฒƒ์ด๋‹ค.

