🧩 Up through the last post, we covered Data Integration. From here on, let's look at Data Reduction, arguably the most important step in data preprocessing.

🧩 This post takes a light look at why Data Reduction is needed, why it matters, and what kinds of it exist.


1. What is Data Reduction?

🧩 The data we actually work with contains a lot of unnecessary information, and sometimes it stores values it already has a second time. There are also cases where several attributes mean nearly the same thing and could be merged, yet the raw data keeps them as separate attributes. Analyzing data this complex takes a lot of time and effort, so we need a step that simplifies the data to some degree beforehand. This process of cutting unnecessary attributes or objects to shrink the size and dimension of the data is called Data Reduction.


2. Data Reduction Methods

🧩 To reduce the complexity of data, we can either reduce the number of objects or reduce the number of attributes, that is, the dimensions. We can also simply compress the data. Let's look briefly at each.

📍 1. Reducing objects : Numerosity Reduction

- Parametric Methods
  - Methods that have parameters to update (estimate)
  - An assumption is required for the reduction
  - That is, we assume some model that the data will fit, and proceed on that basis
  - ex) Linear Regression

- Non-Parametric Methods
  - Methods that have no parameters
  - No assumption is made
  - Harder, because no model is assumed
  - ex) Histogram, Clustering, Sampling
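
To make the contrast concrete, here is a minimal sketch using NumPy. The synthetic data, the bin count, and the sample size are all assumptions chosen for illustration, not values from the lecture. The parametric route assumes a straight-line model and keeps only its two fitted parameters; the non-parametric routes keep a histogram summary or a random sample instead.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 10,000 (x, y) objects that roughly follow a line.
x = rng.uniform(0.0, 10.0, size=10_000)
y = 3.0 * x + 2.0 + rng.normal(0.0, 1.0, size=x.size)

# Parametric: assume y ~ slope * x + intercept and keep only the two fitted parameters.
slope, intercept = np.polyfit(x, y, deg=1)

# Non-parametric (histogram): keep bin edges and counts instead of the raw values.
counts, edges = np.histogram(y, bins=20)

# Non-parametric (sampling): keep a simple random sample without replacement.
sample_idx = rng.choice(x.size, size=500, replace=False)
x_sample, y_sample = x[sample_idx], y[sample_idx]

print(f"parametric: 2 numbers (slope={slope:.2f}, intercept={intercept:.2f})")
print(f"histogram:  {counts.size} counts + {edges.size} edges")
print(f"sampling:   {x_sample.size} of {x.size} objects")
```

The parametric summary is by far the smallest, but it is only as good as the linear assumption behind it; the histogram and the sample make no such assumption and keep more numbers in exchange.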

📍 2. Reducing attributes : Dimensionality Reduction

- Principal Component Analysis (PCA)
  - Creates new dimensions that are combinations of the original attributes
  - Describes the existing data using those new dimensions as its axes

- Subset Selection
  - Reduces the dimension by selecting the subset of attributes (the subset model) that best explains the data
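
The two ideas above can be sketched with NumPy as follows. The data is synthetic and hypothetical, `k = 2` and the 0.95 correlation threshold are arbitrary assumptions, and the correlation filter is only a simplified stand-in for subset selection rather than the procedure used in the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 objects, 3 attributes; the third is nearly a copy of the first.
a = rng.normal(size=200)
b = rng.normal(size=200)
X = np.column_stack([a, b, a + 0.01 * rng.normal(size=200)])

# PCA: center the data, then use the top-k right singular vectors as the new axes.
Xc = X - X.mean(axis=0)
_, s, Vt = np.linalg.svd(Xc, full_matrices=False)
k = 2
X_pca = Xc @ Vt[:k].T                  # coordinates of each object on the new axes
explained = s**2 / np.sum(s**2)        # share of total variance per component

# Subset selection (simplified stand-in): keep an attribute only if it is not
# almost perfectly correlated with an attribute that was already kept.
corr = np.corrcoef(X, rowvar=False)
keep = []
for j in range(X.shape[1]):
    if all(abs(corr[j, i]) < 0.95 for i in keep):
        keep.append(j)
X_subset = X[:, keep]

print("variance kept by first", k, "components:", round(float(explained[:k].sum()), 4))
print("attributes kept by subset selection:", keep)
```

Both routes end with two columns instead of three. PCA builds new axes that mix the original attributes, while subset selection keeps original attributes untouched, which is usually easier to interpret.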

📍 3. Data Compression

- String Compression

- Audio / Video Compression
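
As a small illustration of string compression, the sketch below uses Python's standard `zlib` module on a hypothetical, highly repetitive string. This is lossless compression, so the original comes back exactly; audio and video compression is typically lossy and is not shown here.

```python
import zlib

# Hypothetical input: a highly repetitive 1,000-byte string.
text = ("ACGT" * 250).encode("ascii")

compressed = zlib.compress(text, level=9)   # lossless DEFLATE compression
restored = zlib.decompress(compressed)      # exact reconstruction of the input

print(len(text), "->", len(compressed), "bytes")
print("restored exactly:", restored == text)
```

How much the data shrinks depends heavily on how repetitive it is; data with little redundancy compresses far less.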


🧩 That wraps up our quick look at Data Reduction. Starting with Parametric Methods in the next post, let's dig into each of these properly 🏃‍♂️🏃‍♂️.


💡 This post is based on the [Data Mining for Bioinformatics] (생명정보학을 위한 데이터마이닝) lecture by Professor 고윤희 of the Division of Biomedical Engineering, Hankuk University of Foreign Studies.
