1. DataFrame

🌵 A DataFrame is a collection of Series: each column is a Series, and all columns share the same index.
🌵 It is a table made up of rows and columns.
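Here is a minimal sketch of the "made of Series" idea, assuming a recent pandas version. A dict of Series can be passed to pd.DataFrame, and selecting a single column gives a Series back. The names s_a, s_b and col are illustrative only.

```python
import pandas as pd

# Two Series that share the same default index labels 0, 1, 2
s_a = pd.Series([1, 2, 3])
s_b = pd.Series([4, 5, 6])

# A dict of Series becomes a DataFrame; each key becomes a column name
df = pd.DataFrame({"a": s_a, "b": s_b})

# Selecting a single column gives back a Series
col = df["a"]
print(type(col))   # <class 'pandas.core.series.Series'>
print(df.shape)    # (3, 2): 3 rows, 2 columns
```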

2. Creating a DataFrame - the pd.DataFrame() function

2.1. Creating a DataFrame with an explicit index


import pandas as pd

🌵 pd.DataFrame(dictionary, index)

df1 = pd.DataFrame({"a" : [1,2,3],
                    "b" : [4,5,6],
                    "c" : [7,8,9]}, index=[1,2,3])
df1
>> 
    a   b   c
1   1   4   7
2   2   5   8
3   3   6   9

2.2. Creating a DataFrame without specifying an index


🌵 pd.DataFrame(dictionary)

df2 = pd.DataFrame({"a" : [1,2,3],
                    "b" : [4,5,6],
                    "c" : [7,8,9]})
df2
>> 
    a   b   c
0   1   4   7
1   2   5   8
2   3   6   9

If you don't specify an index, pandas automatically assigns the default labels 0, 1, 2, …
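A quick sketch to check that default: the automatic labels are stored as a RangeIndex, while passing index= replaces them with your own labels.

```python
import pandas as pd

df2 = pd.DataFrame({"a": [1, 2, 3],
                    "b": [4, 5, 6],
                    "c": [7, 8, 9]})

# With no index argument, pandas attaches a RangeIndex starting at 0
print(df2.index)          # RangeIndex(start=0, stop=3, step=1)
print(list(df2.index))    # [0, 1, 2]

# Passing index= replaces those default labels with your own
df1 = pd.DataFrame({"a": [1, 2, 3]}, index=[1, 2, 3])
print(list(df1.index))    # [1, 2, 3]
```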

2.3. Creating a DataFrame from a list


🌵 pd.DataFrame(list, columns, index)

df3 = pd.DataFrame(data = [[1,4,7],[2,5,8],[3,6,9]], 
                   columns = ['a','b','c'], 
                   index = [1,2,3])
df3
>> 
    a   b   c
1   1   4   7
2   2   5   8
3   3   6   9
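As a sketch comparing the two constructors: the column-wise dict from 2.1 and the row-wise list here should produce the same DataFrame. DataFrame.equals compares shape, labels, values and dtypes. The variable names are illustrative.

```python
import pandas as pd

# Dict form: each key is a column, each list holds that column's values
from_dict = pd.DataFrame({"a": [1, 2, 3],
                          "b": [4, 5, 6],
                          "c": [7, 8, 9]}, index=[1, 2, 3])

# List form: each inner list is one row, so column names come from columns=
from_list = pd.DataFrame(data=[[1, 4, 7], [2, 5, 8], [3, 6, 9]],
                         columns=["a", "b", "c"],
                         index=[1, 2, 3])

# Both constructions yield identical data, labels and dtypes
print(from_dict.equals(from_list))   # True
```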

2.4. Creating a MultiIndex


🌵 pd.MultiIndex.from_tuples(list of index tuples, names), passed as index=

df4 = pd.DataFrame({"a" : [1,2,3],
                    "b" : [4,5,6],
                    "c" : [7,8,9]},
                    index = pd.MultiIndex.from_tuples(
                    [('d',1), ('d',2), ('e',1)], names=['n','v']))
df4
>> 
        a   b   c
n   v
d   1   1   4   7
    2   2   5   8
e   1   3   6   9

I had rarely seen a MultiIndex set up right when a DataFrame is created, but it was new to me, so I included it.

Since pandas supports this format directly, it can be worth using when you need it!
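If you want to try it, here is a small sketch (variable names are illustrative) that builds the same MultiIndex on its own, selects rows with .loc by the outer label or by a full tuple, and shows that set_index on ordinary columns gives the same frame.

```python
import pandas as pd

# Build the two-level index on its own first
idx = pd.MultiIndex.from_tuples([("d", 1), ("d", 2), ("e", 1)],
                                names=["n", "v"])

df4 = pd.DataFrame({"a": [1, 2, 3],
                    "b": [4, 5, 6],
                    "c": [7, 8, 9]}, index=idx)

# Selecting an outer label returns every row under it (the inner level stays as the index)
print(df4.loc["d"])

# A full tuple selects exactly one row
print(df4.loc[("e", 1)].tolist())   # [3, 6, 9]

# The same frame can be built from ordinary columns with set_index
flat = pd.DataFrame({"n": ["d", "d", "e"], "v": [1, 2, 1],
                     "a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})
print(flat.set_index(["n", "v"]).equals(df4))   # True
```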

3. Accessing a DataFrame

df = pd.DataFrame({"a" : [1,2,3],
                   "b" : [4,5,6],
                   "c" : [7,8,9]})
df
>> 
    a   b   c
0   1   4   7
1   2   5   8
2   3   6   9

3.1. Getting information about a DataFrame


🌵 df.index - the index (row labels)

df.index
>> RangeIndex(start=0, stop=3, step=1)

🌵 df.columns - the column labels

df.columns
>> Index(['a', 'b', 'c'], dtype='object')

🌵 df.values - the values

df.values
>> array([[1, 4, 7],
          [2, 5, 8],
          [3, 6, 9]], dtype=int64)

🌵 df.shape - the shape (rows, columns)

df.shape
>> (3, 3)
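These attributes fit together: shape is (number of index labels, number of column labels), and values is a 2-D NumPy array with that same shape. The sketch below uses DataFrame.to_numpy(), which the pandas documentation recommends over .values in newer code; here both return the same array.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3],
                   "b": [4, 5, 6],
                   "c": [7, 8, 9]})

n_rows, n_cols = df.shape              # unpack (rows, columns)
print(n_rows == len(df.index))         # True: one label per row
print(n_cols == len(df.columns))       # True: one label per column

arr = df.to_numpy()                    # same 2-D array as df.values
print(arr.shape == df.shape)           # True
print(arr[1].tolist())                 # [2, 5, 8]: the row at position 1
```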

🌵 df.info() - a summary of the DataFrame

df.info()
>> <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 3 entries, 0 to 2
    Data columns (total 3 columns):
     #   Column  Non-Null Count  Dtype
    ---  ------  --------------  -----
     0   a       3 non-null      int64
     1   b       3 non-null      int64
     2   c       3 non-null      int64
    dtypes: int64(3)
    memory usage: 200.0 bytes

In particular, df.info() is very useful during data preprocessing, because its Non-Null Count column shows whether each column has missing values.
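To see this in action, the sketch below puts one NaN into column b (the data are made up for illustration). info() should then report 2 non-null values for b, and that column's dtype becomes float64 because NaN is a float. isna().sum() and notna().sum() give the same counts as a Series you can use in code.

```python
import numpy as np
import pandas as pd

# Column "b" has one missing value (NaN)
df_na = pd.DataFrame({"a": [1, 2, 3],
                      "b": [4, np.nan, 6],
                      "c": [7, 8, 9]})

df_na.info()   # "b" should show 2 non-null out of 3 entries

# The same information as Series you can use programmatically
print(df_na.notna().sum().tolist())   # [3, 2, 3]
print(df_na.isna().sum()["b"])        # 1
```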

3.2. Turning a column into the index


🌵 df.set_index(column)

df.set_index('a')
>> 
    b   c
a
1   4   7
2   5   8
3   6   9

You can pass a list to use several columns together as the index.

df.set_index(['a', 'b'])
>> 
       c
a  b
1  4   7
2  5   8
3  6   9
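One detail worth knowing, shown in the sketch below: set_index returns a new DataFrame and leaves the original untouched unless you reassign the result. Passing drop=False keeps the column in the data as well as using it as the index.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3],
                   "b": [4, 5, 6],
                   "c": [7, 8, 9]})

indexed = df.set_index("a")        # returns a NEW DataFrame
print(list(df.columns))            # ['a', 'b', 'c']: original is unchanged
print(list(indexed.columns))       # ['b', 'c']
print(indexed.index.tolist())      # [1, 2, 3]: values of 'a' are now row labels

# drop=False keeps 'a' as a regular column as well as using it as the index
kept = df.set_index("a", drop=False)
print(list(kept.columns))          # ['a', 'b', 'c']
```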

3.3. Resetting the index


🌵 df.reset_index()

The drop option defaults to False, so the existing index is inserted back as the first column.

df.reset_index(drop = False)
>> 
    index   a   b   c
0       0   1   4   7
1       1   2   5   8
2       2   3   6   9

With drop=True, the existing index is discarded instead of being inserted as a column, and a new default index 0, 1, 2, … is assigned.

df.reset_index(drop = True)
>> 
    a   b   c
0   1   4   7
1   2   5   8
2   3   6   9
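In the example above, df already has the default index, so reset_index changes little. Here is a sketch of the more typical round trip: set 'a' as the index, then reset it with and without drop.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3],
                   "b": [4, 5, 6],
                   "c": [7, 8, 9]})

indexed = df.set_index("a")

# drop=False (default): the 'a' index goes back to being the first column
restored = indexed.reset_index()
print(list(restored.columns))      # ['a', 'b', 'c']
print(restored.equals(df))         # True

# drop=True: the 'a' labels are thrown away, only b and c remain
discarded = indexed.reset_index(drop=True)
print(list(discarded.columns))     # ['b', 'c']
print(list(discarded.index))       # [0, 1, 2]
```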
