πŸ† μ €λ²ˆ ν¬μŠ€νŒ…κ³Ό 같은 planets 데이터λ₯Ό 가지고 μ „μ²˜λ¦¬λ₯Ό μ§„ν–‰ν•΄λ³΄μž.

1. Loading the data

import pandas as pd

import seaborn as sns
planets = sns.load_dataset('planets')
planets.shape
>> (1035, 6)
planets.head()
>>
            method  number  orbital_period   mass  distance  year
0  Radial Velocity       1         269.300   7.10     77.40  2006
1  Radial Velocity       1         874.774   2.21     56.95  2008
2  Radial Velocity       1         763.000   2.60     19.84  2011
3  Radial Velocity       1         326.030  19.40    110.62  2007
4  Radial Velocity       1         516.220  10.50    119.47  2009
planets.info()
>>
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1035 entries, 0 to 1034
Data columns (total 6 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   method          1035 non-null   object 
 1   number          1035 non-null   int64  
 2   orbital_period  992 non-null    float64
 3   mass            513 non-null    float64
 4   distance        808 non-null    float64
 5   year            1035 non-null   int64  
dtypes: float64(3), int64(2), object(1)
memory usage: 48.6+ KB

The orbital_period, mass, and distance columns contain NaN values (992, 513, and 808 non-null entries out of 1035).
We will deal with them later, at the point where we select the data we actually need.
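
Before deciding how to handle them, the missing values can be counted per column. This is only a minimal sketch: `toy` is a hypothetical hand-made frame with made-up values, not the real planets data, so that it runs without downloading anything.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a few rows of planets; the values are made up.
toy = pd.DataFrame({
    "method": ["Radial Velocity", "Transit", "Imaging"],
    "orbital_period": [269.3, np.nan, np.nan],
    "mass": [7.1, np.nan, 2.2],
    "year": [2006, 2010, 2008],
})

# isna() marks the missing cells; sum() counts them column by column.
nan_counts = toy.isna().sum()
print(nan_counts)
```

On the real data, `planets.isna().sum()` should carry the same information as the Non-Null Count column of `planets.info()`, just expressed as missing counts.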

2. Finding numbers per year by Radial Velocity since 1995

πŸ† 이전 ν¬μŠ€νŒ…μ—μ„œ 행성을 κ°€μž₯ 많이 λ°œκ²¬ν•œ 관츑방법은 Radial Velocity μž„μ„ 확인할 수 μžˆμ—ˆλ‹€.
πŸ† 이제 이 방법이 년도당 λ°œκ²¬ν•œ ν–‰μ„± μˆ˜λŠ” λͺ‡κ°œμΈμ§€ κ·Έ 좔이λ₯Ό ν•œλ²ˆ μ•Œμ•„λ³΄μž.

planets[planets['method'] == "Radial Velocity"].sort_values('year').head()
>>
              method  number  orbital_period     mass  distance  year
441  Radial Velocity       1       83.888000  11.6800     40.57  1989
16   Radial Velocity       1        4.230785   0.4720     15.36  1995
62   Radial Velocity       1        3.313500   3.9000     15.60  1996
64   Radial Velocity       4        4.617033   0.6876     13.47  1996
25   Radial Velocity       1      116.688400      NaN     18.11  1996

Planets were discovered with this method from 1989 through 2014.
However, there is a long gap between 1989 and 1995, and only one planet was found in 1989, so starting the count at 1995 looks cleaner.
First, let's take only the method, number, and year columns we need from the planets DataFrame and filter the rows down to Radial Velocity.

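planets_number_year = planets[['method', 'number', 'year']].copy()
# .copy() makes the filtered frame independent, so deleting a column from it later is safe
planets_number_year_Radial = planets_number_year[planets_number_year['method'] == 'Radial Velocity'].copy()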
planets_number_year_Radial.head()
>>
            method  number  year
0  Radial Velocity       1  2006
1  Radial Velocity       1  2008
2  Radial Velocity       1  2011
3  Radial Velocity       1  2007
4  Radial Velocity       1  2009
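
The column selection and the row filter can also be done in a single `.loc` call. The following is just a sketch on a hypothetical `toy` frame with made-up values; the names `is_radial` and `radial` are mine, not from the original code.

```python
import pandas as pd

# Hypothetical miniature of the planets frame; the values are made up.
toy = pd.DataFrame({
    "method": ["Radial Velocity", "Transit", "Radial Velocity"],
    "number": [1, 2, 3],
    "mass": [7.1, None, 2.2],
    "year": [2006, 2010, 2008],
})

# Boolean mask for the rows to keep, list of labels for the columns to keep.
is_radial = toy["method"] == "Radial Velocity"
radial = toy.loc[is_radial, ["method", "number", "year"]].copy()
print(radial)
```

Whether this reads better than two separate steps is a matter of taste; the result should be the same either way.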

Now it is time to keep only the data we need.
planets_number_year_Radial has already been filtered down to Radial Velocity, so the method column carries no information anymore; let's delete it first to simplify the frame.
Next, group by the year column and aggregate with sum.
Finally, sort the index with .sort_index() to get a tidy result.

del planets_number_year_Radial['method']
planets_number_year_Radial = planets_number_year_Radial.groupby('year').sum().sort_index()
planets_number_year_Radial.head()
>>
      number
year
1989       1
1995       1
1996      15
1997       1
1998      11
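
The three steps above (drop the column, group, sum) can be sketched as one method chain. This is only an illustration on a hypothetical `radial` frame with made-up values; `per_year` is a name I introduce here.

```python
import pandas as pd

# Hypothetical Radial Velocity rows; the values are made up.
radial = pd.DataFrame({
    "method": ["Radial Velocity"] * 4,
    "number": [1, 2, 3, 1],
    "year": [1996, 1995, 1996, 1989],
})

per_year = (
    radial.drop(columns="method")  # method is constant, so it adds nothing
          .groupby("year")         # one group per discovery year
          .sum()                   # total planets found in each year
          .sort_index()            # make the year order explicit
)
print(per_year)
```

By default `groupby` already sorts the group keys, so `.sort_index()` mostly serves to make the intent explicit.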

Finally, let's slice the frame to drop the years before 1995.

# Keep 1995 onward by label rather than relying on 1989 being the first row
planets_number_year_Radial = planets_number_year_Radial.loc[1995:]
planets_number_year_Radial.head()
>>
      number
year
1995       1
1996      15
1997       1
1998      11
1999      24
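
Dropping the early years can be done by row position or by year label. A small sketch contrasting the two on a hypothetical year-indexed frame (values copied from the head shown above, the frame itself is hand-made):

```python
import pandas as pd

# Hand-made frame indexed by year, mimicking the grouped result.
per_year = pd.DataFrame(
    {"number": [1, 1, 15, 1]},
    index=pd.Index([1989, 1995, 1996, 1997], name="year"),
)

by_position = per_year[1:]      # drops the first row, whatever year it holds
by_label = per_year.loc[1995:]  # keeps every year from 1995 onward

print(by_position.equals(by_label))
```

Both give the same rows here because 1989 is the only year before 1995. The label-based form states the intent directly and would still be right if more early years were present.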

2.1. Visualization with iplot


πŸ† 년도에 λ”°λ₯Έ μΆ”μ„Έλ₯Ό ν•œλˆˆμ— μ•Œμ•„λ³΄κΈ° μ‰½κ²Œ line κ·Έλž˜ν”„λ₯Ό 그렀보자.

import chart_studio.plotly as py
import cufflinks as cf
cf.go_offline(connected = True)

2.1.1. Line graph


layout = {
    'title' : {'text' : '<b>Finding numbers per year in Radial Velocity since 1995</b>', 
    'font' : {'size':25}, 'x':0.5, 'y':0.9},
    
    'xaxis' : {'showticklabels' : True, 'dtick': 1, 'title' : {'text' : 'Year', 'font' : {'size' : 15}}},
    
    'yaxis' : {'showticklabels' : True, 'title' : {'text' : 'Number', 'font' : {'size' : 15}}}
}

planets_number_year_Radial.iplot(kind = 'scatter', mode = 'lines+markers', layout = layout)

2.2. Visualization with plotly


πŸ† κ°€μž₯ λ§Žμ€ 행성을 λ°œκ²¬ν•œ ν•΄λ₯Ό μ•Œμ•„λ³΄κΈ° μ‰½κ²Œ bar κ·Έλž˜ν”„μ™€ line κ·Έλž˜ν”„λ₯Ό λ§Œλ“€μ–΄λ³΄μž.

import plotly.graph_objects as go
import plotly.offline as pyo
pyo.init_notebook_mode()

2.2.1. Bar graph

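colors = ['#03658C'] * len(planets_number_year_Radial.index)
# Highlight 2011 by looking up its position instead of hard-coding index 16
colors[planets_number_year_Radial.index.get_loc(2011)] = '#F29F05'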
fig = go.Figure()
fig.add_trace(
    go.Bar(
        x = planets_number_year_Radial.index, y = planets_number_year_Radial['number'],
        text = planets_number_year_Radial['number'], 
        textposition='inside', texttemplate = '%{text}', textfont=dict(color = 'white', size = 10),
        marker_color = colors))

fig.update_layout({
    'title' : {'text' : '<b>Finding numbers per year in Radial Velocity since 1995</b>', 
    'font' : {'size':25}, 'x':0.5, 'y':0.9},

    'xaxis' : {'showticklabels' : True, 'dtick' : 1, 'title' : {'text' : 'Year', 'font' : {'size' :  15}}},
    'yaxis' : {'showticklabels' : True, 'title' : {'text' : 'Number', 'font' : {'size' :  15}}},
    'template' : 'plotly_white'
})

fig.add_annotation(
            x = 2011, y = 180,
            text = '<b>2011 : 176</b>',
            showarrow = True,
            font = {'size' : 10, 'color' : '#ffffff'},
            align = 'center',
            arrowhead = 2,
            arrowsize = 1,
            arrowwidth = 2,
            arrowcolor = '#F29F05',
            ax = 40, ay = -30,
            bordercolor = '#F29F05',
            borderwidth = 2,
            borderpad = 4,
            bgcolor = '#F29F05',
            opacity = 0.8
)
fig.show()
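
The highlighted bar and the annotation text are hard-coded above. As a rough sketch, they could instead be derived from the data with `idxmax()`. The frame below is hypothetical with made-up values, and the names `peak_year`, `peak_count`, `peak_pos`, and `label` are mine.

```python
import pandas as pd

# Hypothetical per-year totals; the values are made up.
per_year = pd.DataFrame(
    {"number": [5, 12, 9]},
    index=pd.Index([2009, 2010, 2011], name="year"),
)

peak_year = per_year["number"].idxmax()       # index label of the largest value
peak_count = per_year["number"].max()         # the largest value itself
peak_pos = per_year.index.get_loc(peak_year)  # position of that year in the index

# Same colors as the bar chart: one base color, one highlight.
colors = ["#03658C"] * len(per_year)
colors[peak_pos] = "#F29F05"

# Text that could be passed to fig.add_annotation(text=...).
label = f"<b>{peak_year} : {peak_count}</b>"
print(label, colors)
```

With this, `x = peak_year` and a `y` slightly above `peak_count` could replace the fixed coordinates in `fig.add_annotation`.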

2.2.2. Line graph

fig = go.Figure()
fig.add_trace(
    go.Scatter(
        x = planets_number_year_Radial.index, y = planets_number_year_Radial['number'], mode = 'lines+markers'))

fig.update_layout({
    'title' : {'text' : '<b>Finding numbers per year in Radial Velocity since 1995</b>', 
    'font' : {'size' : 25}, 'x' : 0.5, 'y' : 0.9},
    
    'xaxis' : {'showticklabels' : True, 'dtick' : 1, 'title' : {'text' : 'Year', 'font' : {'size' :  15}}},
    'yaxis' : {'showticklabels' : True, 'title' : {'text' : 'Number', 'font' : {'size' :  15}}},
    'template' : 'ggplot2'
})

fig.add_annotation(
            x = 2011, y = 180,
            text = '<b>2011 : 176</b>',
            showarrow = True,
            font = {'size' : 10, 'color' : '#ffffff'},
            align = 'center',
            arrowhead = 2,
            arrowsize = 1,
            arrowwidth = 2,
            arrowcolor = '#F22E62',
            ax = 40, ay = -30,
            bordercolor = '#F22E62',
            borderwidth = 2,
            borderpad = 4,
            bgcolor = '#F22E62',
            opacity = 0.8
)

fig.show()

πŸ† μ΄λ ‡κ²Œ ν¬μŠ€νŒ… λ‘κ°œλ₯Ό ν†΅ν•΄μ„œ ν–‰μ„± 데이터λ₯Ό μ „μ²˜λ¦¬ν•˜κ³  μ‹œκ°ν™”ν•˜λŠ” 과정을 ν™•μΈν•΄λ³΄μ•˜λ‹€.
πŸ† 이 데이터λ₯Ό μ²˜λ¦¬ν•˜λ©΄μ„œ μ•„μ‰¬μš΄ 점은 우주 ν–‰μ„±μ˜ 데이터이닀 λ³΄λ‹ˆ 각 νŠΉμ§•λ“€ κ°„μ˜ 관계가 거의 μ—†λ‹€λŠ” μ μ΄μ—ˆλ‹€. μ§ˆλŸ‰μ΄ μ–΄λŠ μ •λ„μΌλ•Œ κΆ€λ„μ£ΌκΈ°λŠ” μ–΄λŠ 정도이닀 λΌλŠ” 관계가 μžˆμ„ 쀄 μ•Œμ•˜λŠ”λ° λ°μ΄ν„°μ˜ ν—€λ“œλ§Œ 보아도 그렇지 μ•Šλ‹€λŠ” 것을 μ‰½κ²Œ μ•Œ 수 μžˆμ—ˆλ‹€. λ¬Όλ‘  λͺ©μ μ— ν•„μš”ν•œ μ—΄μ—λŠ” κ²°μΈ‘μΉ˜κ°€ μ—†λŠ” κΉ”λ”ν•œ λ°μ΄ν„°μ˜€μ§€λ§Œ, κ³΅λΆ€ν•˜λŠ” μž…μž₯μ—μ„œλŠ” μ•½κ°„ 아쉬움이 μžˆμ—ˆλ‹€.
πŸ† ν•˜μ§€λ§Œ Line κ·Έλž˜ν”„λ₯Ό 그리고 직접 annotation을 ν•˜λŠ” 과정이 μž¬λ°Œμ—ˆλ‹€γ…Žγ…Ž
