3. Exploring US Birth Counts by Year (1)
Let's analyze the birth-count data published by the US Centers for Disease Control and Prevention (CDC).
This post focuses on how to use the pd.to_datetime() function and how to slice rows out of the middle of a DataFrame.
1. Loading the data
import pandas as pd
birth = pd.read_csv("Data/births.csv", encoding = 'utf-8-sig')
birth.head()
>>
year month day gender births
0 1969 1 1.0 F 4046
1 1969 1 1.0 M 4440
2 1969 1 2.0 F 4454
3 1969 1 2.0 M 4548
4 1969 1 3.0 F 4548
Note that the data has columns for year, month, day, gender, and number of births.
2. Preprocessing the data
birth.info()
>>
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15547 entries, 0 to 15546
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 year 15547 non-null int64
1 month 15547 non-null int64
2 day 15067 non-null float64
3 gender 15547 non-null object
4 births 15547 non-null int64
dtypes: float64(1), int64(3), object(1)
memory usage: 607.4+ KB
The info() output above shows that day has only 15067 non-null values out of 15547 rows, so some rows have no day information. To see which range contains the missing values, let's use .isnull().
birth[birth['day'].isnull()]
>>
year month day gender births
15067 1989 1 NaN F 156749
15068 1989 1 NaN M 164052
15069 1989 2 NaN F 146710
15070 1989 2 NaN M 154047
15071 1989 3 NaN F 165889
... ... ... ... ... ...
15542 2008 10 NaN M 183219
15543 2008 11 NaN F 158939
15544 2008 11 NaN M 165468
15545 2008 12 NaN F 173215
15546 2008 12 NaN M 181235
480 rows × 5 columns
This confirms that the day column is empty for the period 1989 to 2008. The rest of the preprocessing therefore splits the birth DataFrame into a part before 1989 and a part from 1989 onward.
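The range of years with missing day values can also be checked directly rather than read off the printed rows. The following is a minimal sketch on a small made-up DataFrame (`toy` and its values are assumptions for illustration, not the real births.csv):

```python
import numpy as np
import pandas as pd

# Toy stand-in for the births data; the values are made up.
toy = pd.DataFrame({
    "year":   [1987, 1988, 1988, 1989, 1989, 1990],
    "month":  [12, 1, 1, 1, 1, 2],
    "day":    [31.0, 1.0, 2.0, np.nan, np.nan, np.nan],
    "gender": ["F", "F", "M", "F", "M", "F"],
    "births": [45, 46, 47, 1500, 1600, 1550],
})

# Number of missing values in each column.
missing_per_column = toy.isnull().sum()

# First and last year in which `day` is missing.
missing_years = toy.loc[toy["day"].isnull(), "year"]
first_missing_year = missing_years.min()
last_missing_year = missing_years.max()

print(missing_per_column)
print(first_missing_year, last_missing_year)
```

Applied to the real `birth` DataFrame, the same two lines should report the first and last year without day information.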
Splitting the DataFrame is not difficult. The rows already have a default integer index, so we can simply slice by index.
Index 15067 is the first row for 1989, so slicing up to (but not including) that index gives all the data through 1988.
birth_til_1988 = birth[:15067]
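Slicing at 15067 relies on the row order of this particular file. As an alternative sketch, the split can be expressed as a condition on the year column, which does not depend on a hard-coded position. The DataFrame `toy` and the names `til_1988` and `after_1989` below are hypothetical and only illustrate the idea:

```python
import numpy as np
import pandas as pd

# Toy stand-in for the births data; the values are made up.
toy = pd.DataFrame({
    "year":   [1988, 1988, 1989, 1989],
    "month":  [12, 12, 1, 1],
    "day":    [31.0, 31.0, np.nan, np.nan],
    "gender": ["F", "M", "F", "M"],
    "births": [40, 42, 1500, 1600],
})

# Split on the year column instead of a hard-coded row position.
til_1988 = toy[toy["year"] <= 1988]
after_1989 = toy[toy["year"] >= 1989]

# Sanity checks: the two parts cover every row exactly once, and
# `day` is present only in the first part.
assert len(til_1988) + len(after_1989) == len(toy)
assert til_1988["day"].notnull().all()
assert after_1989["day"].isnull().all()
```

Both approaches should give the same result on this dataset; the mask version simply keeps working if the rows are reordered.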
Next, we split this data into male and female births.
Female births, 1969-1988
birth_til_1988_F = birth_til_1988[birth_til_1988['gender'] == 'F'].copy()
birth_til_1988_F = birth_til_1988_F.groupby(by = ['year', 'month']).aggregate({'births' : 'sum'})
birth_til_1988_F = birth_til_1988_F.reset_index()
birth_til_1988_F.tail()
>>
year month births
235 1988 8 173088
236 1988 9 169923
237 1988 10 162361
238 1988 11 153134
239 1988 12 157444
Male births, 1969-1988
birth_til_1988_M = birth_til_1988[birth_til_1988['gender'] == 'M'].copy()
birth_til_1988_M = birth_til_1988_M.groupby(by = ['year', 'month']).aggregate({'births' : 'sum'})
birth_til_1988_M = birth_til_1988_M.reset_index()
birth_til_1988_M.tail()
>>
year month births
235 1988 8 181511
236 1988 9 177354
237 1988 10 169272
238 1988 11 161532
239 1988 12 164883
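The female and male blocks above repeat the same filter, group, and reset steps. As a sketch of one possible shortcut, gender can be added to the groupby keys so that both monthly totals come out of a single aggregation. The DataFrame `toy` and the names `monthly`, `monthly_F`, and `monthly_M` are hypothetical:

```python
import pandas as pd

# Toy daily data for two months; the values are made up.
toy = pd.DataFrame({
    "year":   [1988, 1988, 1988, 1988, 1988, 1988],
    "month":  [11, 11, 11, 11, 12, 12],
    "day":    [1.0, 1.0, 2.0, 2.0, 1.0, 1.0],
    "gender": ["F", "M", "F", "M", "F", "M"],
    "births": [10, 11, 20, 21, 30, 31],
})

# Sum daily births into monthly totals for both genders at once.
monthly = toy.groupby(["gender", "year", "month"], as_index=False)["births"].sum()

# Pull out one table per gender with the same columns as in the post.
monthly_F = (
    monthly[monthly["gender"] == "F"]
    .drop(columns="gender")
    .reset_index(drop=True)
)
monthly_M = (
    monthly[monthly["gender"] == "M"]
    .drop(columns="gender")
    .reset_index(drop=True)
)

print(monthly_F)
print(monthly_M)
```

This is equivalent in result to the two separate blocks; which form reads better is a matter of taste.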
Female births, 1989-2008
birth_after_1989 = birth[15067:]
birth_after_1989_F = birth_after_1989[birth_after_1989['gender']=="F"].copy()
birth_after_1989_F = birth_after_1989_F[['year','month','births']].copy()
birth_after_1989_F.head()
>>
year month births
15067 1989 1 156749
15069 1989 2 146710
15071 1989 3 165889
15073 1989 4 155689
15075 1989 5 163800
Male births, 1989-2008
birth_after_1989_M = birth_after_1989[birth_after_1989['gender']=="M"].copy()
birth_after_1989_M = birth_after_1989_M[['year','month','births']].copy()
birth_after_1989_M.head()
>>
year month births
15068 1989 1 164052
15070 1989 2 154047
15072 1989 3 174433
15074 1989 4 163432
15076 1989 5 172892
Finally, concatenate the two DataFrames for each gender.
birth_F = pd.concat([birth_til_1988_F, birth_after_1989_F])
birth_M = pd.concat([birth_til_1988_M, birth_after_1989_M])
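One detail to be aware of: the 1969-1988 tables are indexed from 0 to 239 after reset_index(), while the 1989-2008 tables keep their original labels (15067, 15069, ...), so the concatenated index is not contiguous. The sketch below, on made-up data with hypothetical names `early`, `late`, and `combined`, shows how `ignore_index=True` renumbers the rows, and how pd.to_datetime() could build a date column from year and month. Setting the day to 1 is an assumption made here only so that a full date can be formed.

```python
import pandas as pd

# Toy monthly tables; the values are made up. `late` keeps
# non-contiguous labels, like the 1989-2008 slices in the post.
early = pd.DataFrame(
    {"year": [1988, 1988], "month": [11, 12], "births": [100, 110]}
)
late = pd.DataFrame(
    {"year": [1989, 1989], "month": [1, 2], "births": [120, 130]},
    index=[15067, 15069],
)

# ignore_index=True discards the old labels and renumbers from 0.
combined = pd.concat([early, late], ignore_index=True)

# pd.to_datetime accepts a DataFrame with year, month, and day columns.
# The day is fixed to 1 here (an assumption) because the data is monthly.
combined["date"] = pd.to_datetime(combined[["year", "month"]].assign(day=1))

print(combined)
```

A proper datetime column like this makes later resampling or plotting by time more convenient.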
With the steps above, we now have the data in the shape we wanted. In the next post, we will use it for the following analyses:
- Comparing male and female birth counts by year
- Checking the birth trend by day of the week
- Examining the daily birth trend in 1969
- Quarterly birth trend (1969-2008)