🏆 1. Loading the data

📈 The data are KOSPI index values up to 2022-09-13, extracted using the code distributed by DACON.

📈 Data description

  • Close : closing price
  • Open : opening price
  • High : daily high
  • Low : daily low
  • Volume : trading volume
  • Change : rate of change versus the previous day -> Change = (current close - previous close) / previous close
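The Change formula above can be sketched in pandas as follows. The closing prices here are made up purely for illustration (they are not KOSPI values), and the comparison with `pct_change()` assumes Change is stored as a fraction rather than a percentage, which is consistent with the sample row shown later in this post.

```python
import pandas as pd

# Hypothetical closing prices, used only to illustrate the formula.
close = pd.Series([100.0, 102.0, 99.96], name="Close")

# Change = (current close - previous close) / previous close
manual = (close - close.shift(1)) / close.shift(1)

# pandas' built-in equivalent of the same calculation
builtin = close.pct_change()

print(manual.round(4).tolist())
print(builtin.round(4).tolist())
```

The first entry is NaN because the first day has no previous close. On the real data, comparing `kospi['Close'].pct_change()` against `kospi['Change']` would be a quick way to confirm the column follows this definition.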

📈 Inspecting the data

# Suppress warnings that are printed unnecessarily while the code runs
import warnings
warnings.filterwarnings('ignore')

import pandas as pd
import numpy as np

# Import datetime for time-series handling
from datetime import datetime

# Import seaborn and matplotlib for visualization
import seaborn as sns
import matplotlib as mpl
import matplotlib.pyplot as plt

# Import plotly for visualization
import plotly.graph_objects as go
import plotly.offline as pyo
pd.options.plotting.backend = 'plotly'
import plotly.io as pio
pio.renderers.default = "notebook_connected"  
kospi = pd.read_csv('kospi_data.csv')
kospi
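Since this is time-series data, it may be convenient to have the Date column parsed as datetimes at load time. The snippet below is only a sketch: it reads a made-up two-row CSV from memory instead of `kospi_data.csv`, and the values in it are placeholders, not real index levels.

```python
import io
import pandas as pd

# Hypothetical two-row CSV standing in for kospi_data.csv
# (column names follow the post, values are placeholders).
csv_text = "Date,Close\n2021-02-18,100.0\n2021-02-19,101.0\n"

# parse_dates converts the Date column to datetime64 while reading
demo = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

print(demo.dtypes)
```

With the real file, the equivalent call would be `pd.read_csv('kospi_data.csv', parse_dates=['Date'])`, assuming the column is named Date as in the output shown in this post.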


🏆 2. Checking the KOSPI data

  1. Check for missing values
  2. Check for outliers
  3. Check for duplicate dates

2.1. Checking for missing values

kospi.isnull().sum()
>> Out[26]

Date      0
Close     0
Open      0
High      0
Low       0
Volume    0
Change    0
dtype: int64

📈 This confirms that there are no missing values.

2.2. Checking for outliers

# Check for outliers with box plots
fig = go.Figure()
fig.add_trace(go.Box(x = kospi['Close'], name = 'Close'))
fig.add_trace(go.Box(x = kospi['Open'], name = 'Open'))
fig.add_trace(go.Box(x = kospi['High'], name = 'High'))
fig.add_trace(go.Box(x = kospi['Low'], name = 'Low'))
fig.show()


# Check the Change attribute - nothing unusual
fig = go.Figure()
fig.add_trace(go.Box(x = kospi['Change'], name = 'Change'))
fig.show()  


# Check the Volume attribute - the max value looks suspicious
fig = go.Figure()
fig.add_trace(go.Box(x = kospi['Volume'], name = 'Volume'))
fig.show()  
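A box plot flags points visually, but the same idea can be put into numbers with the common 1.5 × IQR rule. The helper below is a hypothetical sketch, not part of the original notebook, and the sample volumes are invented. The fences it computes may differ slightly from the ones plotly draws, since quartile conventions vary between libraries.

```python
import pandas as pd

def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return values[(values < lower) | (values > upper)]

# Hypothetical volumes for illustration only.
sample = pd.Series([10, 12, 11, 13, 12, 11, 100], name="Volume")
print(iqr_outliers(sample))
```

On the real data, `iqr_outliers(kospi['Volume'])` would list the days the rule flags, which can then be checked by hand as done below for the maximum.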


📈 While checking for outliers, the maximum of the Volume attribute looked excessively large, so I looked into it. Checking against Naver Finance showed that the figure is correct. My guess is that volume was this high because it was the only session that rose compared with the adjacent days.

kospi[kospi['Volume']==kospi['Volume'].max()]
>> Out[30]

            Date    Close    Open    High     Low       Volume  Change
------------------------------------------------------------------------
10644 2021-02-19  3107.62 3089.96 3109.67 3040.28 3.460000e+09  0.0068
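To compare the peak-volume day with its neighbours, as described above, one option is to slice a few rows on either side of the maximum. The function name `rows_around_max` and the demo frame are assumptions made for this sketch and do not come from the original notebook.

```python
import pandas as pd

def rows_around_max(df: pd.DataFrame, column: str, window: int = 3) -> pd.DataFrame:
    """Return the row where `column` peaks plus up to `window` rows on each side."""
    pos = int(df[column].to_numpy().argmax())  # positional index of the maximum
    start = max(pos - window, 0)               # avoid a negative slice start
    return df.iloc[start : pos + window + 1]

# Hypothetical frame for illustration only.
demo = pd.DataFrame({
    "Date": pd.date_range("2021-01-04", periods=7, freq="B"),
    "Volume": [5, 6, 7, 30, 6, 5, 4],
})
print(rows_around_max(demo, "Volume", window=2))
```

Calling `rows_around_max(kospi, 'Volume')` on the real data would show Close, Change, and Volume for the surrounding sessions side by side.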


📈 The checks turned up no problems such as missing values or outliers, so I decided to analyze the data without any additional preprocessing.
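The checklist at the start of this section also lists duplicate dates, and the code for that check is not shown in this post. A minimal sketch of how it could be done is below. The helper `duplicated_dates` is hypothetical, the demo rows are invented, and the column name Date is taken from the output shown earlier.

```python
import pandas as pd

def duplicated_dates(df: pd.DataFrame, date_col: str = "Date") -> pd.DataFrame:
    """Return every row whose date appears more than once."""
    dates = pd.to_datetime(df[date_col])
    return df[dates.duplicated(keep=False)]

# Hypothetical frame for illustration only.
demo = pd.DataFrame({
    "Date": ["2021-02-18", "2021-02-19", "2021-02-19", "2021-02-22"],
    "Close": [1.0, 2.0, 2.0, 3.0],
})
print(duplicated_dates(demo))
```

An empty result from `duplicated_dates(kospi)` would mean each trading day appears exactly once.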


🏆 In the next post, I will analyze KOSPI using moving averages!

