🧩 λ°μ΄ν„°λ§ˆμ΄λ‹μ˜ 첫번째 ν¬μŠ€νŒ…μž…λ‹ˆλ‹€. 이번 ν¬μŠ€νŒ…μ—μ„œλŠ” κ°„λ‹¨ν•˜κ²Œ λ°μ΄ν„°λ§ˆμ΄λ‹μ΄ 무엇인지 κ°œλ… μœ„μ£Όλ‘œ μ‚΄νŽ΄λ³΄κ³ μž ν•©λ‹ˆλ‹€.


1. What is data mining?

  • 기쑴의 λ°μ΄ν„°μ—μ„œ μ˜λ―ΈμžˆλŠ” νŒ¨ν„΄μ΄λ‚˜ 지식을 μ–»λŠ” κ²ƒμž…λ‹ˆλ‹€.

  • λ‹€λ§Œ, κ°„λ‹¨ν•œ κ²€μƒ‰μ΄λ‚˜ μ •ν˜•ν™”λœ κ·œμΉ™μ„ 기반으둜 μž‘μ—…ν•˜λŠ” 것은 λ°μ΄ν„°λ§ˆμ΄λ‹μ΄λΌκ³  보기 μ–΄λ ΅μŠ΅λ‹ˆλ‹€.


2. λ°μ΄ν„°λ§ˆμ΄λ‹ μˆœμ„œ

  • Choose the data : time-series / sequence / text / graphs / social…

  • Choose the insight to derive : classification / clustering / trend / deviation…

  • Choose the technique to apply : machine learning / deep learning / statistics / pattern…

  • Choose the application domain : retail / banking / bio-data / stock / text…

  • Mining has to be carried out with methods that suit the data type.

👉 Data Cleaning ▶ Integration ▶ Selection ▶ Transform ▶ Data Mining ▶ Pattern Evaluation ▶ Knowledge Presentation
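
To make the seven stages a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the toy shopping-basket data, the function names, and the choice of "counting item pairs" as the mining step are not from the lecture, and real pipelines are far more involved at every stage.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy inputs: two sources of shopping baskets, with some noise.
source_a = [["milk", "bread", None], ["milk", "eggs"], []]
source_b = [["Bread", "milk "], ["eggs", "milk", "bread"]]

def clean(baskets):
    """Data Cleaning: drop missing items, then drop empty baskets."""
    cleaned = [[item for item in basket if item is not None] for basket in baskets]
    return [basket for basket in cleaned if basket]

def integrate(*sources):
    """Integration: merge several sources into one collection."""
    return [basket for source in sources for basket in source]

def select(baskets, min_items=2):
    """Selection: keep only baskets relevant to the task (here: 2+ items)."""
    return [basket for basket in baskets if len(basket) >= min_items]

def transform(baskets):
    """Transform: normalize item names and represent each basket as a set."""
    return [frozenset(item.strip().lower() for item in basket) for basket in baskets]

def mine(baskets):
    """Data Mining: count how often each pair of items appears together."""
    counts = Counter()
    for basket in baskets:
        counts.update(combinations(sorted(basket), 2))
    return counts

def evaluate(pair_counts, n_baskets, min_support=0.5):
    """Pattern Evaluation: keep pairs whose support reaches the threshold."""
    return {pair: count / n_baskets
            for pair, count in pair_counts.items()
            if count / n_baskets >= min_support}

def present(patterns):
    """Knowledge Presentation: format the surviving patterns as readable lines."""
    return [f"{a} & {b}: support {support:.2f}"
            for (a, b), support in sorted(patterns.items())]

baskets = transform(select(integrate(clean(source_a), clean(source_b))))
patterns = evaluate(mine(baskets), len(baskets))
report = present(patterns)
print("\n".join(report))
```

Each function corresponds to one arrow in the chain above, so swapping in a different technique for a stage (for example, a different mining step) does not force changes to the other stages.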

👉 The techniques used at each step will be covered in upcoming posts!


3. λ°μ΄ν„°λ§ˆμ΄λ‹μ˜ Function

  • λ°μ΄ν„°λ§ˆμ΄λ‹μ€ μœ μ˜λ―Έν•œ νŒ¨ν„΄μ„ μ•Œμ•„λ‚΄λŠ” κ²ƒμž…λ‹ˆλ‹€. λ”°λΌμ„œ 이λ₯Ό μœ„ν•œ function듀이 μ‘΄μž¬ν•©λ‹ˆλ‹€.

  • Generalization
    • Information Integration & Data warehouse construction
    • Data Cube
    • Generalizes the data from multiple angles using the information at hand.
    • Mainly used when working with large volumes of data.
  • Pattern Discovery
    • Finds relationships and patterns among data items / attributes
    • ex) Frequent Patterns / Correlation Analysis
  • Classification
    • Supervised learning with training data (examples)
    • Predicts the unknown class (label) of new data
    • support vector machine / deep learning / Bayesian / decision tree / logistic regression…
  • Clustering
    • Unsupervised learning
    • Groups similar data together without knowing the classes
    • The characteristics of each group are worked out after the grouping, by looking at what its members have in common
    • rule : Maximizing intraclass similarity & Minimizing interclass similarity
      👉 Members of the same group should be as alike as possible, and different groups as unlike as possible
  • Outlier Analysis
    • Distinguishes whether a data point is noise or an unusual case that arose for a genuine reason
    • fraud detection / rare events analysis
  • Time & Ordering
    • Analyzes patterns that recur over time, as with stock prices or typhoons
    • Sequential patterns, which take ordering in time into account
    • Gene sequence analysis, similarity detection
  • Structure & Network
    • Graph mining, network analysis, web mining
  • Major Issues
    • Is the algorithm efficient?
    • Does it still work well as the amount of data grows?
    • Does it behave differently depending on the form of the data?
    • Is the data socially acceptable to use? ex) privacy
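
The clustering rule listed above (maximize intraclass similarity, minimize interclass similarity) can be checked on a tiny example. The sketch below is only an illustration under stated assumptions: the four sample points, the helper names `similarity` and `intra_inter`, and the similarity definition 1 / (1 + Euclidean distance) are all made up here, and other similarity or distance measures are equally valid.

```python
from itertools import combinations
from math import dist  # Euclidean distance, available in Python 3.8+

def similarity(p, q):
    """Assumed similarity in (0, 1]: 1 / (1 + Euclidean distance)."""
    return 1.0 / (1.0 + dist(p, q))

def intra_inter(points, labels):
    """Return (mean similarity within clusters, mean similarity across clusters).

    Assumes at least one same-cluster pair and one cross-cluster pair exist.
    """
    intra, inter = [], []
    for i, j in combinations(range(len(points)), 2):
        s = similarity(points[i], points[j])
        (intra if labels[i] == labels[j] else inter).append(s)
    return sum(intra) / len(intra), sum(inter) / len(inter)

# Hypothetical data: two tight pairs of points that lie far from each other.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
good = [0, 0, 1, 1]  # groups the nearby points together
bad = [0, 1, 0, 1]   # splits each nearby pair across clusters

good_intra, good_inter = intra_inter(points, good)
bad_intra, bad_inter = intra_inter(points, bad)

print(f"good: intra={good_intra:.3f}, inter={good_inter:.3f}")
print(f"bad:  intra={bad_intra:.3f}, inter={bad_inter:.3f}")
```

The grouping that keeps nearby points together scores higher on intraclass similarity and lower on interclass similarity than the grouping that splits them, which is exactly what the rule asks for.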

🧩 μ΄λ ‡κ²Œ ν•΄μ„œ κ°„λ‹¨ν•˜κ²Œ λ°μ΄ν„°λ§ˆμ΄λ‹μ΄ 뭔지, μš°λ¦¬κ°€ 신경써야 ν•  뢀뢄이 어디인지, 그리고 μ–Όλ§ˆλ‚˜ λ‹€μ–‘ν•œ 상황에 μ‚¬μš©λ  수 μžˆλŠ”μ§€λ₯Ό κ°„λž΅ν•˜κ²Œ μ•Œμ•„λ³΄μ•˜μŠ΅λ‹ˆλ‹€. λ‹€μŒ ν¬μŠ€νŒ…μ—μ„œλŠ” 데이터와 object κ°„μ˜ μœ μ‚¬μ„±μ„ νŒλ‹¨ν•˜κΈ° μœ„ν•œ Distance measure에 λŒ€ν•΄ μ•Œμ•„λ³Ό κ²ƒμž…λ‹ˆλ‹€πŸ™‚.


💡 This post is based on the lecture [Data Mining for Bioinformatics] by Professor Ko Yun-hee of the Division of Biomedical Engineering, Hankuk University of Foreign Studies.
