🧩 Starting with this post, I'll cover Pattern Discovery: finding patterns in a dataset. This topic carries a lot of weight in bioinformatics, and many important data mining techniques show up along the way, so it's worth reading carefully (naturally, there's also a lot of it…😨😨).

🧩 In this post, let's go over the basic concepts behind Pattern Discovery.


1. What is Pattern Discovery?

  • Patterns : things in a dataset that occur together or are strongly related.
    • ex) items sold together / words that appear together / sequences that appear together
  • Pattern Discovery : finding the inherent regularities in a dataset, i.e., the rules intrinsic to the data.

  • It can be seen as groundwork for data mining.
    • Association / Correlation / Causality analysis
    • Mining Sequential / Structural Patterns
    • Pattern analysis : spatiotemporal / multimedia / time-series / stream data
    • Classification : pattern-based analysis - Discriminative
    • Cluster analysis : pattern-based clustering - subspace
  • Application areas : Market basket / Cross marketing / Catalog design / Biological Sequence

🧩 That's a brief look at the concept of Pattern Discovery. Now let's go through the basic terminology.

2. Pattern Discovery Basics

🚩 2.1. K-itemsets and Support

  • itemset : a set of one or more items
  • K-itemset : an itemset made up of K items

  • absolute-support

    • sup{X}.
    • How often itemset X occurs, i.e., the number of transactions that contain X.
    • Frequency
  • relative-support
    • s{X}.
    • The fraction of transactions that contain itemset X.
    • $s\{X\}=\frac{sup\{X\}}{total\;transactions}$

🧩 To sum up, absolute-support is the count of itemset X, and relative-support is the proportion of itemset X. The difference between the two may not be obvious from those definitions alone, so here is an example we covered in class.


$sup\{Beer\}=3\;\;\;\;,\;\;\;\;s\{Beer\}=3/5=60\%$


$sup\{Diaper\}=4\;\;\;\;,\;\;\;\;s\{Diaper\}=4/5=80\%$


$sup\{Beer,Diaper\}=3\;\;\;\;,\;\;\;\;s\{Beer,Diaper\}=3/5=60\%$


$sup\{Beer,Eggs\}=1\;\;\;\;,\;\;\;\;s\{Beer,Eggs\}=1/5=20\%$


🧩 Data like the table attached above, which records the items purchased at a market, is called a Transaction DB.
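The support figures above can be reproduced with a few lines of Python. This is only a minimal sketch, not the lecture's code: `TRANSACTIONS` is a hypothetical five-row database made up so that its counts match the numbers quoted in this post (the rows of the actual table may differ), and the function names `absolute_support` and `relative_support` are chosen here for illustration.

```python
# Hypothetical transaction DB: the rows are invented so that the counts
# match the figures quoted in this post; the real table may differ.
TRANSACTIONS = [
    {"Beer", "Diaper", "Nuts"},
    {"Beer", "Diaper", "Milk"},
    {"Beer", "Diaper", "Eggs"},
    {"Nuts", "Eggs", "Milk"},
    {"Diaper", "Nuts", "Eggs"},
]


def absolute_support(itemset, transactions):
    """sup{X}: number of transactions that contain every item of X."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def relative_support(itemset, transactions):
    """s{X}: fraction of transactions that contain every item of X."""
    return absolute_support(itemset, transactions) / len(transactions)


for x in [{"Beer"}, {"Diaper"}, {"Beer", "Diaper"}, {"Beer", "Eggs"}]:
    print(sorted(x),
          absolute_support(x, TRANSACTIONS),
          f"{relative_support(x, TRANSACTIONS):.0%}")
# ['Beer'] 3 60%
# ['Diaper'] 4 80%
# ['Beer', 'Diaper'] 3 60%
# ['Beer', 'Eggs'] 1 20%
```

Representing each transaction as a Python `set` keeps the containment test (`items <= t`) close to the definition: X is counted once for every transaction that includes all of its items.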


🚩 2.2. Frequent Itemsets (Patterns)

  • minsup : a user-chosen threshold on relative-support

  • If the relative-support s{X} of itemset X is at least the chosen minsup, X is said to be frequent.

  • In other words, this is about how to find the K-itemsets that appear together, and often, in a Transaction DB.

🧩 Let's look at an example for this concept too!!


$Let\;\,minsup\;\;\sigma=50\%$


$s\{Beer\}=60\%\;\;\;,\;\;\;s\{Nuts\}=60\%$


$s\{Diaper\}=80\%\;\;\;,\;\;\;s\{Eggs\}=60\%$


$s\{Milk\}=40\%\;\;\;,\;\;\;s\{Beer,Diaper\}=60\%\;\;\;,\;\;\;s\{Nuts,Diaper\}=40\%$


👉 In the example above, with $minsup=50\%$, the frequent itemsets X are {Beer} {Nuts} {Diaper} {Eggs} {Beer,Diaper}. {Milk}, on the other hand, is at $40\%$, which is below $minsup$, so it is not frequent. There is one point worth noting here: the 2-itemset {Beer,Diaper} being frequent also means that the 1-itemsets {Beer} and {Diaper} are each frequent, because every transaction that contains {Beer,Diaper} also contains each of them. So establishing that a 2-itemset is frequent also establishes that each of its sub-itemsets is frequent🙃🙃. In the example above, no three items appear together in two or more transactions, so only itemsets up to size 2 were computed.
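The frequent-itemset check above can be sketched as a brute-force search. The snippet below is an illustration under assumptions, not an efficient miner: `TRANSACTIONS` is a hypothetical database invented to match the supports quoted in this post, and `frequent_itemsets` is a name made up here. It simply tries every candidate of size 1, 2, ... and stops once a size yields nothing frequent, which is safe because every subset of a frequent itemset is itself frequent.

```python
from itertools import combinations

# Hypothetical transaction DB: the rows are invented so that the supports
# match the figures quoted in this post; the real table may differ.
TRANSACTIONS = [
    {"Beer", "Diaper", "Nuts"},
    {"Beer", "Diaper", "Milk"},
    {"Beer", "Diaper", "Eggs"},
    {"Nuts", "Eggs", "Milk"},
    {"Diaper", "Nuts", "Eggs"},
]


def frequent_itemsets(transactions, minsup):
    """Return {itemset: relative support} for every itemset with s >= minsup."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    result = {}
    for k in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, k):
            count = sum(1 for t in transactions if set(candidate) <= t)
            if count / n >= minsup:
                result[frozenset(candidate)] = count / n
                found_any = True
        if not found_any:
            # No frequent k-itemset means no larger itemset can be frequent.
            break
    return result


freq = frequent_itemsets(TRANSACTIONS, 0.5)
for itemset, s in sorted(freq.items(),
                         key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), f"{s:.0%}")
# ['Beer'] 60%
# ['Diaper'] 80%
# ['Eggs'] 60%
# ['Nuts'] 60%
# ['Beer', 'Diaper'] 60%
```

Enumerating all candidates is exponential in the number of items, so this only makes sense for a toy table like this one.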


🚩 2.3. Association Rule Mining

🧩 Before getting into Association Rule Mining, there is a concept we need to know first. Let's take a quick look at it before moving on.


📍 Support

  • The probability that a transaction contains $X\cup{Y}$, i.e., that it contains both itemsets $X$ and $Y$.
  • ex) s{Beer,Diaper}=60%

📍 Confidence

  • conditional probability : the probability that a transaction containing $X$ also contains $Y$
  • $c=sup(X\cup{Y})/sup(X)$
  • Note that the support used to compute confidence is the absolute-support.

  • A rule is written as $X\rightarrow{Y}(support,confidence)$.
    • $X\rightarrow{Y}(s,c) : c=sup(X\cup{Y})/sup(X)$
    • $Y\rightarrow{X}(s,c) : c=sup(X\cup{Y})/sup(Y)$
    • In $X\rightarrow{Y}(s,c)$, the itemset X at the tail of the arrow is the condition of the confidence.
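To see why confidence is a conditional probability, the definition can be written out step by step. Here $N$ is a symbol introduced only for this derivation, standing for the total number of transactions, and $s(\cdot)$ is the relative-support from section 2.1:

```latex
c(X \rightarrow Y)
  = P(Y \mid X)
  = \frac{s(X \cup Y)}{s(X)}
  = \frac{sup(X \cup Y) / N}{sup(X) / N}
  = \frac{sup(X \cup Y)}{sup(X)}
```

Since $N$ cancels, the ratio of absolute-supports and the ratio of relative-supports give the same confidence; working with counts just avoids the extra division.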

🧩 We've now covered support and confidence. Next, let's look at Association Rule Mining🙄.


🧩 Association Rule Mining uses two thresholds: the minsup used earlier, and minconf, the threshold on confidence. The goal is to identify associations that satisfy both minsup and minconf, and ultimately to find every rule that expresses such an association.

📍 Association Rule Mining

  • Uses two thresholds : minsup, minconf
  • Since the goal is to identify associations between items that appear together, there must be a frequent itemset with at least two items.

  • Find all rules : $X\rightarrow{Y}(s,c)\;\;that\;\;s\geq{minsup}\;\;and\;\;c\geq{minconf}$

🧩 Let's find the Association Rules in the Transaction Data used above.

📌 First, let's find the itemsets that satisfy minsup.

$Let\;\,minsup\;\;\sigma=50\%$


$s\{Beer\}=60\%\;\;\;,\;\;\;s\{Nuts\}=60\%$


$s\{Diaper\}=80\%\;\;\;,\;\;\;s\{Eggs\}=60\%$


$s\{Milk\}=40\%\;\;\;,\;\;\;s\{Beer,Diaper\}=60\%\;\;\;,\;\;\;s\{Nuts,Diaper\}=40\%$


📌 Here, the itemsets at or above $minsup = 50\%$ are {Beer} {Nuts} {Diaper} {Eggs} {Beer,Diaper}.


📌 Now that we have the itemsets that satisfy minsup, we need to find the rules among them that also satisfy minconf. A rule we are looking for must satisfy both thresholds, so we only need to compute confidence over the itemsets found above. Only {Beer,Diaper} has two items, so it is the only one that can be split into a rule.

$Let\;\,minconf\;\;\sigma=50\%$


$Beer\rightarrow{Diaper} : c=sup(Beer\cup{Diaper})/sup(Beer)=3/3=1$


$Diaper\rightarrow{Beer} : c=sup(Beer\cup{Diaper})/sup(Diaper)=3/4=0.75$


📌 Expressing the Association Rules with the support and confidence obtained this way gives:

$Beer\rightarrow{Diaper}(s,c)=(60\%,100\%)$


$Diaper\rightarrow{Beer}(s,c)=(60\%,75\%)$



📌 Both the rule with $Beer$ as the antecedent and the rule with $Diaper$ as the antecedent satisfy minconf, so the Association Rules in the Transaction Data above are:

$Beer\rightarrow{Diaper}\;\;\;\;\;and\;\;\;\;\;Diaper\rightarrow{Beer}$
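The whole walkthrough (filter by minsup, then split each surviving itemset into X → Y and filter by minconf) can be sketched in Python. This is an illustrative brute-force version under assumptions: `TRANSACTIONS` is a hypothetical database invented to match the supports quoted in this post, and `sup` and `association_rules` are names made up here, not part of any library.

```python
from itertools import combinations

# Hypothetical transaction DB: the rows are invented so that the supports
# match the figures quoted in this post; the real table may differ.
TRANSACTIONS = [
    {"Beer", "Diaper", "Nuts"},
    {"Beer", "Diaper", "Milk"},
    {"Beer", "Diaper", "Eggs"},
    {"Nuts", "Eggs", "Milk"},
    {"Diaper", "Nuts", "Eggs"},
]


def sup(itemset, transactions):
    """Absolute support: number of transactions containing the itemset."""
    return sum(1 for t in transactions if set(itemset) <= t)


def association_rules(transactions, minsup, minconf):
    """Return (X, Y, s, c) for every rule X -> Y with s >= minsup, c >= minconf."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    rules = []
    for k in range(2, len(items) + 1):           # a rule needs >= 2 items
        for candidate in combinations(items, k):
            count_xy = sup(candidate, transactions)
            if count_xy == 0 or count_xy / n < minsup:
                continue                          # fails the minsup threshold
            for r in range(1, k):                 # every non-empty proper X
                for x in combinations(candidate, r):
                    y = tuple(i for i in candidate if i not in x)
                    c = count_xy / sup(x, transactions)
                    if c >= minconf:
                        rules.append((x, y, count_xy / n, c))
    return rules


rules = association_rules(TRANSACTIONS, minsup=0.5, minconf=0.5)
for x, y, s, c in rules:
    print(f"{x} -> {y} (s={s:.0%}, c={c:.0%})")
# ('Beer',) -> ('Diaper',) (s=60%, c=100%)
# ('Diaper',) -> ('Beer',) (s=60%, c=75%)
```

Raising `minconf` above 0.75 in this sketch would leave only the Beer → Diaper rule, which shows how the two directions of the same itemset can be accepted or rejected independently.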


3. Summary

🧩 In this post we covered:

  • itemset
  • absolute-support
  • relative-support
  • Frequent Itemsets
  • Confidence
  • Association Rule Mining

🧩 In short, the goal of today's material is to compute support and confidence from the K-itemsets of the Transaction Data and, based on those values, find the rules that satisfy the Association Rule Mining thresholds. The example used today only went up to 2-itemsets, though, so in the next post let's look at the case where there are too many frequent patterns🏃‍♂️🏃‍♂️.

🧩 Preprocessing, Classification, and Clustering felt fairly familiar from my usual studying, but pattern analysis was something I learned for the very first time last semester, and coming back to it after a while I got confused a lot. It felt like learning it all over again (…😨), so writing this post took a really long time, but putting it together like this did help things fall into place. The upcoming pattern analysis posts build on the basic concepts covered here, so I tried to be thorough, though I'm not sure how it turned out. Either way, I hope it helps 🙂🙂.


💡This post is based on the lecture [Data Mining for Bioinformatics] by Professor 고윤희 of the Division of Biomedical Engineering at Hankuk University of Foreign Studies.
