This is my second data analysis practice post, following the earlier one on U.S. population data.
This time I'll work through the planets dataset that ships with the seaborn library.

1. Loading the data

import pandas as pd

import seaborn as sns
planets = sns.load_dataset('planets')
planets.shape
>> (1035, 6)
planets.head()
>>
            method  number  orbital_period   mass  distance  year
0  Radial Velocity       1         269.300   7.10     77.40  2006
1  Radial Velocity       1         874.774   2.21     56.95  2008
2  Radial Velocity       1         763.000   2.60     19.84  2011
3  Radial Velocity       1         326.030  19.40    110.62  2007
4  Radial Velocity       1         516.220  10.50    119.47  2009

The columns describe each planet's discovery method, number, orbital period, mass, distance, and discovery year.

planets.info()
>>
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1035 entries, 0 to 1034
Data columns (total 6 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   method          1035 non-null   object 
 1   number          1035 non-null   int64  
 2   orbital_period  992 non-null    float64
 3   mass            513 non-null    float64
 4   distance        808 non-null    float64
 5   year            1035 non-null   int64  
dtypes: float64(3), int64(2), object(1)
memory usage: 48.6+ KB

The info() output shows that the orbital_period, mass, and distance columns contain NaN values (43, 522, and 227 missing entries out of 1035 rows, respectively).
We'll deal with these values later, at the point where we select the data we actually need.
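As a quick aside, here is a minimal sketch of how the missing values could be counted and then handled only for the columns a given analysis needs. The `sample` frame is a hypothetical miniature of the planets table (the NaN positions are made up for illustration), so the snippet runs without downloading anything.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the planets table; NaN positions are invented.
sample = pd.DataFrame({
    "method": ["Radial Velocity", "Radial Velocity", "Transit", "Imaging"],
    "number": [1, 1, 2, 1],
    "orbital_period": [269.300, 874.774, np.nan, np.nan],
    "mass": [7.10, np.nan, np.nan, np.nan],
    "distance": [77.40, 56.95, 19.84, np.nan],
    "year": [2006, 2008, 2011, 2007],
})

# Count the missing values in each column.
missing_counts = sample.isna().sum()
print(missing_counts)

# Drop rows only when the column we care about is missing,
# instead of dropping every row that has a NaN anywhere.
mass_rows = sample.dropna(subset=["mass"])
print(mass_rows)
```

Since method and number have no missing values in the real dataset, the per-method totals in the next section do not need any NaN handling.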

2. Finding numbers per method

Let's find out how many were discovered with each observation method.

To get the number discovered per method, we only need the method and number columns.
Let's pull out just these two columns with .copy() and call the result planets_number.

planets_number = planets[['method', 'number']].copy() 

Group planets_number by the method column and aggregate with sum.
To list the methods from most to fewest discoveries, use .sort_values().
We want descending order, so pass ascending = False.

planets_number = planets_number.groupby('method').sum().sort_values('number', ascending=False)
planets_number.head()
>>
                           number
method
Radial Velocity               952
Transit                       776
Imaging                        50
Microlensing                   27
Eclipse Timing Variations      15
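To make the group-then-sort step easier to follow, here is a small sketch of the same pattern on a toy frame. The `toy` data is hypothetical and only there so the result can be checked by hand.

```python
import pandas as pd

# Hypothetical toy data with the same two columns as planets_number.
toy = pd.DataFrame({
    "method": ["Transit", "Radial Velocity", "Transit", "Imaging", "Radial Velocity"],
    "number": [2, 1, 3, 1, 5],
})

# Group by method, sum the number column, then sort from largest to smallest.
totals = (
    toy.groupby("method")["number"]
       .sum()
       .sort_values(ascending=False)
)
print(totals)
```

Selecting `["number"]` before `.sum()` returns a Series rather than a one-column DataFrame, which is why `sort_values` needs no column name here.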

2.1. Visualization with iplot


Since this is simple categorical data, a bar graph and a pie graph give an intuitive view of each method's share.

import chart_studio.plotly as py
import cufflinks as cf
cf.go_offline(connected = True)

2.1.1. Bar graph


layout = {
    'title' : {'text' : '<b>Finding numbers per method</b>',
               'font' : {'size' : 25},
               'x' : 0.5, 'y' : 0.9
              },
    
    'xaxis' : {'showticklabels' : True,
              'title' : {'text' : 'Method', 'font' : {'size' : 20}}},
    
    'yaxis' : {'showticklabels' : True,
              'dtick' : 100,
              'title' : {'text' : 'numbers', 'font' : {'size' : 20}}}   
}
planets_number.iplot(kind = 'bar', theme = 'white', layout = layout)

2.1.2. Pie graph


planets_number_df = planets[['method', 'number']]
planets_number_df.iplot(kind = 'pie', theme = 'white', labels = 'method', values = 'number')

2.2. Visualization with plotly


import plotly.graph_objects as go
import plotly.offline as pyo
pyo.init_notebook_mode()

2.2.1. Bar graph


fig = go.Figure()
fig.add_trace(
    go.Bar(
    x = planets_number.index, y = planets_number['number']))

fig.update_layout(
    {
    'title' : {'text' : '<b>Finding numbers per method</b>', 'font' : {'size' : 25}, 'x' : 0.5, 'y' : 0.9},
    'xaxis' : {'showticklabels' : True, 'title' : {'text' : 'Method', 'font' : {'size' : 15}}},
    'yaxis' : {'showticklabels' : True, 'dtick' : 100, 'title' : {'text' : 'Number', 'font' : {'size' : 15}}},
    'template' : 'plotly_white'
    })

fig.show()             

👉 As the bar graph above shows, Radial Velocity accounts for the most observations, followed by Transit, while the remaining methods make up only a small share. I think the best way to use this data is to organize the observations by year and observation method.
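One possible way to organize the observations by year and method is a pivot table. The sketch below is only an illustration on a hypothetical `toy` frame, not the analysis from this post; the column names match the planets dataset, but the values are made up.

```python
import pandas as pd

# Hypothetical toy data using the planets column names.
toy = pd.DataFrame({
    "method": ["Radial Velocity", "Transit", "Transit", "Radial Velocity", "Imaging"],
    "year": [2008, 2008, 2009, 2009, 2009],
    "number": [1, 2, 3, 1, 1],
})

# One row per year, one column per method, cells hold the summed number.
# fill_value=0 replaces year/method combinations that never occur.
by_year_method = toy.pivot_table(
    index="year",
    columns="method",
    values="number",
    aggfunc="sum",
    fill_value=0,
)
print(by_year_method)
```

A table in this shape can be fed to a stacked bar or line chart, with one trace per method.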

2.2.2. Pie graph


fig = go.Figure()
fig.add_trace(
    go.Pie(
        labels = planets_number_df['method'], values = planets_number_df['number']
    ))

fig.show()

As the pie graph shows, readability drops sharply once too many categories are packed into it.
So when drawing a pie graph, I think it is better to bring in only the few categories that matter.
In the next post, I'll focus on a single observation method and cover preprocessing and visualizing its data.
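For reference, one way to keep a pie graph readable is to keep the largest few categories and fold the rest into a single slice. The sketch below uses the five per-method totals shown earlier in this post; the helper `top_n_with_other` and the "Other" label are my own hypothetical names, not part of pandas or plotly.

```python
import pandas as pd

# Per-method totals taken from the groupby output earlier in this post.
totals = pd.Series(
    {
        "Radial Velocity": 952,
        "Transit": 776,
        "Imaging": 50,
        "Microlensing": 27,
        "Eclipse Timing Variations": 15,
    },
    name="number",
)

def top_n_with_other(series, n):
    """Keep the n largest entries and sum everything else into 'Other'."""
    ordered = series.sort_values(ascending=False)
    top = ordered.iloc[:n].copy()
    rest = ordered.iloc[n:].sum()
    if rest > 0:
        top["Other"] = rest  # hypothetical label for the folded categories
    return top

pie_data = top_n_with_other(totals, 2)
print(pie_data)
```

The result can then be passed to go.Pie as `labels = pie_data.index, values = pie_data.values`, in the same way as the pie graph above.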
