🧩 In the previous post, we looked at distance measures for categorical data. This time, let's learn about Minkowski Distance, a measure for numerical data.

1. Basic Minkowski Distance

👉 Minkowski Distance is likewise used to compute the distance between two objects.

For data in which each object has $l$ features, the Basic Minkowski Distance is computed over all of those features and is defined as follows.

$d(i,j)=\sqrt[p]{|x_{i1}-x_{j1}|^p+|x_{i2}-x_{j2}|^p+\cdots+|x_{il}-x_{jl}|^p}$


For a given value of $p$, the Minkowski Distance is called the $L_p$ norm (L-p norm).
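To make the definition concrete, here is a minimal Python sketch of the formula above. The function name `minkowski_distance`, the sample objects `x_i` and `x_j`, and the check that rejects $p<1$ are my own assumptions for illustration and do not come from the lecture.

```python
def minkowski_distance(x, y, p):
    """Basic Minkowski distance between two equal-length numeric sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of features")
    if p < 1:
        # Assumption: restrict to p >= 1, where the result is a proper metric.
        raise ValueError("p must be at least 1")
    # Sum |x_k - y_k|^p over all l features, then take the p-th root.
    total = sum(abs(a - b) ** p for a, b in zip(x, y))
    return total ** (1.0 / p)


# Two hypothetical objects with l = 2 features each.
x_i = [1.0, 2.0]
x_j = [4.0, 6.0]

print(minkowski_distance(x_i, x_j, 1))
print(minkowski_distance(x_i, x_j, 2))
```

The feature-wise differences are 3 and 4, so the two printed values should come out to 7 and 5, up to floating-point rounding.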

Minkowski Distance has several properties:

  • $d(i,j)>0$ when $i\neq j$, and $d(i,i)=0$ (positivity)
  • $d(i,j)=d(j,i)$ (symmetry)
  • $d(i,j)\leq d(i,k)+d(k,j)$ (triangle inequality)

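The last property may be hard to grasp at first. Think of the three points $i,j,k$ as the corners of a triangle: a triangle can only exist if the length of any one side is less than the sum of the lengths of the other two sides. Going directly from $i$ to $j$ is therefore never longer than taking a detour through $k$. Equality can occur only in the degenerate case where the three points do not form a proper triangle.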
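The properties above can be spot-checked numerically. The following is only a sanity check on random sample points, not a proof. The helper name `satisfies_metric_properties`, the random data, and the tolerance `tol` are assumptions I introduce for illustration.

```python
import itertools
import math
import random


def minkowski_distance(x, y, p):
    """Basic Minkowski distance between two equal-length numeric sequences."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def satisfies_metric_properties(points, p, tol=1e-9):
    """Return True if positivity, symmetry and the triangle inequality hold on the sample."""
    # d(i, i) = 0
    for a in points:
        if minkowski_distance(a, a, p) != 0.0:
            return False
    # d(i, j) > 0 for distinct points, and d(i, j) = d(j, i)
    for a, b in itertools.combinations(points, 2):
        d_ab = minkowski_distance(a, b, p)
        d_ba = minkowski_distance(b, a, p)
        if d_ab <= 0.0 or not math.isclose(d_ab, d_ba):
            return False
    # d(i, j) <= d(i, k) + d(k, j), with a small tolerance for rounding
    for a, b, c in itertools.permutations(points, 3):
        direct = minkowski_distance(a, b, p)
        detour = minkowski_distance(a, c, p) + minkowski_distance(c, b, p)
        if direct > detour + tol:
            return False
    return True


random.seed(0)
# 20 hypothetical objects with 3 numerical features each.
points = [[random.uniform(-10, 10) for _ in range(3)] for _ in range(20)]

for p in (1, 2, 3):
    print(p, satisfies_metric_properties(points, p))
```

Note that the triangle inequality is guaranteed only for $p\geq 1$, which is why the check uses those values.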

๐Ÿงฉ ์œ„์—์„œ ์–ธ๊ธ‰ํ–ˆ๋“ฏ์ด Minkowski distance๋Š” $p$๊ฐ’์— ์˜ํ•ด ์ˆ˜์‹๊ณผ ์ด๋ฆ„์ด ๋‹ฌ๋ผ์ง„๋‹ค. ์ด์ œ๋Š” ๊ทธ ๊ฒฝ์šฐ์— ๋Œ€ํ•ด ์•Œ์•„๋ณด๋„๋ก ํ•˜์ž.


2. L-p Norm

  • When p = 1
    • L1 Norm, Manhattan Distance
    • Simply the sum of the absolute differences across the features.
$d(i,j)=|x_{i1}-x_{j1}|+|x_{i2}-x_{j2}|+\cdots+|x_{il}-x_{jl}|$
  • When p = 2
    • L2 Norm, Euclidean Distance
    • The familiar formula from mathematics for the distance between two points.
$d(i,j)=\sqrt{|x_{i1}-x_{j1}|^2+|x_{i2}-x_{j2}|^2+\cdots+|x_{il}-x_{jl}|^2}$


  • When p $\rightarrow$ ∞
    • Lmax Norm, L∞ Norm, Supremum Distance
    • Takes the largest of the absolute differences across the features.
$d(i,j)=\max(|x_{i1}-x_{j1}|,\,|x_{i2}-x_{j2}|,\,\ldots,\,|x_{il}-x_{jl}|)$
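The three cases above can be written out directly, and it is instructive to watch the general formula approach the supremum distance as $p$ grows. This is a minimal sketch. The function names and the sample objects `x_i` and `x_j` are hypothetical and chosen only for illustration.

```python
import math


def minkowski_distance(x, y, p):
    """General L-p distance for a finite p."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def manhattan_distance(x, y):
    """p = 1: sum of the absolute differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def euclidean_distance(x, y):
    """p = 2: square root of the sum of squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def supremum_distance(x, y):
    """p -> infinity: largest absolute difference over all features."""
    return max(abs(a - b) for a, b in zip(x, y))


# Hypothetical objects with l = 3 features; the differences are 3, 4 and 0.
x_i = [1.0, 2.0, 3.0]
x_j = [4.0, 6.0, 3.0]

print("L1  :", manhattan_distance(x_i, x_j))
print("L2  :", euclidean_distance(x_i, x_j))
print("Lmax:", supremum_distance(x_i, x_j))

# As p grows, the largest difference dominates the sum.
for p in (1, 2, 4, 16, 64):
    print(p, minkowski_distance(x_i, x_j, p))
```

With differences of 3, 4 and 0, the L1, L2 and Lmax distances are 7, 5 and 4. The loop shows the general formula moving from 7 down toward 4 as $p$ increases.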


🧩 Naturally, the results of the Minkowski Distance can also be arranged in the form of a Distance Matrix. I have attached a related link, so feel free to refer to it if you need it 😊.

๐Ÿ“ Distance Matrix ๊ด€๋ จ ํฌ์ŠคํŒ….


๐Ÿงฉ ์ด๋ฒˆ ํฌ์ŠคํŒ…์—์„œ๋Š” Numerical Attribute์˜ distance measure๋ฅผ ๋‹ค๋ค„๋ณด์•˜๋‹ค. ์ˆ˜์‹์— ๋ฃจํŠธ๋„ ๋“ค์–ด๊ฐ€ ์žˆ์–ด์„œ ์•ฝ๊ฐ„ ๊ท€์ฐฎ์•„๋ณด์ผ ์ˆ˜ ์žˆ์ง€๋งŒ, ๊ทธ ๋ฐฉ์‹์€ ์ƒ๊ฐ๋ณด๋‹ค ๊ฐ„๋‹จํ•˜๊ธฐ ๋•Œ๋ฌธ์— ์ง์ ‘ ๊ตฌํ˜„ํ•ด ๋ณด๋Š” ๊ฒƒ๋„ ์–ด๋ ต์ง€ ์•Š์„ ๊ฒƒ์ด๋ผ ์ƒ๊ฐํ•œ๋‹ค. ๋‹ค์Œ ํฌ์ŠคํŒ…์—์„œ๋Š” Document frequency๋ฅผ ์œ„ํ•œ distance measure๋ฅผ ๋ฐฐ์›Œ๋ณด์ž๐Ÿƒโ€โ™‚๏ธ๐Ÿƒโ€โ™‚๏ธ.


💡 This post is based on the lecture [생명정보학을 위한 데이터마이닝] (Data Mining for Bioinformatics) given by Professor 고윤희 of the Division of Biomedical Engineering at Hankuk University of Foreign Studies.
