Posts tagged 'pandas'


Post list: pandas (7)

기계는 거짓말하지 않는다 (Machines Do Not Lie)

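Python Pandas: Extracting a Fixed Proportion of the Rows That Match a Condition

This is an example of extracting a fixed proportion of the DataFrame rows that satisfy a condition. To randomly extract only 40% of the rows whose count is less than 50, you can do the following.

    import pandas as pd
    import random

    df = pd.read_csv("custom_data.csv", encoding="utf-8")
    # Index of the rows whose count is less than 50
    rows_to_select = df[df["count"] < 50].index
    print(rows_to_select)
    # .loc selects by index label, which is what .index returns
    print(df.loc[rows_to_select])
    print("-" * 50)
    # Randomly extract 40% of the indices that match the condition
    rows_to_select = list(rows_to_select)
    random.seed(..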
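The excerpt above is cut off before the sampling step, so the following is only a minimal sketch of one way to take 40% of the matching rows. It swaps in `DataFrame.sample` rather than the `random` module the post starts to use. The inline DataFrame is a hypothetical stand-in for `custom_data.csv`, whose contents are not shown.

```python
import pandas as pd

# Hypothetical stand-in for custom_data.csv (the real file is not shown in the post).
df = pd.DataFrame({
    "count": [10, 60, 20, 30, 80, 40, 5, 45, 90, 15, 25, 35, 49],
    "price": [100, 500, 300, 1000, 700, 200, 900, 400, 600, 800, 150, 250, 350],
})

# Rows that satisfy the condition (count < 50): 10 rows in this sample data.
matching = df[df["count"] < 50]

# Draw 40% of those rows at random; random_state makes the draw repeatable.
sampled = matching.sample(frac=0.4, random_state=0)

print(sampled)
```

Using `sample(frac=...)` keeps the filtering and the sampling in pandas, so there is no need to convert the index to a list first.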

Python 2023. 5. 6. 00:01

Python Pandas: Extracting Rows by Conditions on Multiple Columns

    import pandas as pd

    df = pd.read_csv("custom_data.csv", encoding="utf-8")
    # & (and), | (or)
    print(df[(df["count"] >= 200) & (df["price"] >= 1000)])
    # ~ (not)
    print(df[~(df["price"] >= 500) | ~(df["count"] > 10)])
    '''
    query can handle multiple conditions in a single expression.
    Column names are wrapped in ` (backticks) because a column name that
    contains special characters or spaces would otherwise cause an error.
    '''
    query_string = "`count` >= 200 & `price` >= 1000"
    print(df.query(..
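As a rough illustration of the point about `query` and backticks, the sketch below compares a boolean-mask filter with the equivalent `query` string and then filters on a column whose name contains a space. The inline data and the `unit price` column are assumptions made up for this example; they are not from the post's CSV file.

```python
import pandas as pd

# Hypothetical data; "unit price" is an invented column with a space in its name.
df = pd.DataFrame({
    "count": [250, 5, 300, 120, 8],
    "price": [1200, 300, 800, 1500, 700],
    "unit price": [10, 20, 30, 40, 50],
})

# Boolean-mask version: each condition needs its own parentheses.
mask_result = df[(df["count"] >= 200) & (df["price"] >= 1000)]

# query version of the same filter.
query_result = df.query("`count` >= 200 and `price` >= 1000")

# Backticks are required here because the column name contains a space.
spaced = df.query("`unit price` > 25")

print(mask_result)
print(query_result)
print(spaced)
```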

Python 2023. 1. 21. 20:23

Python Pandas: Extracting a Specific Value or Multiple Values from a Specific Column

    import pandas as pd

    df = pd.read_csv("custom_data.csv", encoding="utf-8")
    print(df[df["price"] == 500])  # price equals 500
    # price equals 500 or 100
    print(df[df["price"].isin([500, 100])])
    # the same selection written as explicit conditions
    print(df[(df["price"] == 500) | (df["price"] == 100)])
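A small check, on hypothetical inline data, that the `isin` form and the explicit `|` form above select the same rows:

```python
import pandas as pd

# Hypothetical data standing in for custom_data.csv.
df = pd.DataFrame({
    "price": [500, 100, 300, 500, 1000],
    "count": [1, 2, 3, 4, 5],
})

by_isin = df[df["price"].isin([500, 100])]
by_or = df[(df["price"] == 500) | (df["price"] == 100)]

# Both filters keep the rows at index 0, 1 and 3.
print(by_isin.equals(by_or))
```

`isin` tends to stay readable as the list of accepted values grows, whereas the `|` form needs one more comparison per value.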

Python 2023. 1. 21. 19:47

Python Pandas: Basic Statistics

Basic statistics

    import pandas as pd

    data = pd.read_csv("임의데이터.csv", encoding="euc-kr", index_col="번호")
    print(data)
    print("=" * 30)
    print(data.describe().round(3))  # summary
    print("-" * 30)
    print(data[["수량", "단가"]].mean())  # mean of 수량 (quantity) and 단가 (unit price)
    print("-" * 30)
    print(data[["수량", "단가"]].max())  # maximum of 수량 and 단가
    print("-" * 30)
    print(data[["수량", "단가"]].min())  # minimum of 수량 and 단가
    print("-" * 30)
    print(data.loc[[1, 3]].mean())  # after selecting rows..
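Since `임의데이터.csv` is not available, here is a minimal sketch of the same calls on a small inline DataFrame. The values are invented for illustration; only the column names (`번호`, `수량`, `단가`) are taken from the excerpt.

```python
import pandas as pd

# Invented values; column names follow the excerpt above.
data = pd.DataFrame({
    "번호": [1, 2, 3, 4],
    "수량": [10, 20, 30, 40],
    "단가": [100, 200, 300, 400],
}).set_index("번호")

summary = data.describe().round(3)     # count, mean, std, min, quartiles, max
means = data[["수량", "단가"]].mean()   # column-wise mean
maxima = data[["수량", "단가"]].max()   # column-wise maximum
minima = data[["수량", "단가"]].min()   # column-wise minimum
selected_mean = data.loc[[1, 3]].mean()  # mean over the rows labeled 1 and 3 only

print(summary)
print(means)
print(selected_mean)
```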

Python 2021. 7. 10. 17:23

Python Pandas: Sorting

Series

    import pandas as pd

    fruit = pd.Series([2500, 3800, 1200, 6000], index=['apple', 'banana', 'peer', 'cherry'])
    print(fruit)
    print('=' * 20)
    print(fruit.sort_values(ascending=False))
    print('-' * 20)
    print(fruit.sort_index())

A Series can be sorted by values or by index, and both axis and ascending (ascending or descending order) can be set.

DataFrame

axis and ascending can be set in the same way.

Sorting by index

    fruitData = {'fruitName':['peer','banana','apple','cherry'..
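The DataFrame part of the excerpt is truncated, so the sketch below only illustrates the general idea. The `fruit` Series is the one from the post; the DataFrame's `price` column and its row labels are assumptions added for this example.

```python
import pandas as pd

fruit = pd.Series([2500, 3800, 1200, 6000],
                  index=['apple', 'banana', 'peer', 'cherry'])

by_value = fruit.sort_values(ascending=False)  # largest value first
by_index = fruit.sort_index()                  # alphabetical by label

# Hypothetical DataFrame: the 'price' column and the row labels are invented.
fruit_df = pd.DataFrame(
    {'fruitName': ['peer', 'banana', 'apple', 'cherry'],
     'price': [1200, 3800, 2500, 6000]},
    index=[3, 1, 0, 2],
)

by_price = fruit_df.sort_values(by='price', ascending=False)  # sort rows by a column
by_row_label = fruit_df.sort_index()                          # sort rows by index label
by_col_label = fruit_df.sort_index(axis=1, ascending=False)   # sort columns by name

print(by_value)
print(by_price)
print(by_row_label)
```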

Python 2021. 7. 3. 14:56

Python Pandas: Basic Operations

Series operations

    from pandas import Series

    fruit1 = Series([5, 9, 10, 3], index=['apple', 'banana', 'cherry', 'peer'])
    fruit2 = Series([3, 2, 9, 5, 10], index=['apple', 'orange', 'banana', 'cherry', 'mango'])
    print(fruit1)
    print('-' * 20)
    print(fruit2)
    print('=' * 20)
    print(fruit1 + fruit2)

+, -, *, /, %, and // (floor division) are all supported. Values are matched by index label, so a label that appears in only one of the two Series is not computed and shows up as NaN in the result.

DataFrame operations

    fruitData1 = {'Ohio':[4, 8, 3, 5], 'Texas':[0, 1, 2, 3..
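To make the NaN behavior concrete, the sketch below adds the two Series from the excerpt and then shows one possible alternative, `Series.add` with `fill_value=0`, which treats a missing label as 0 instead of producing NaN. The `fill_value` variant is not from the post; it is included only as an illustration.

```python
import pandas as pd

fruit1 = pd.Series([5, 9, 10, 3], index=['apple', 'banana', 'cherry', 'peer'])
fruit2 = pd.Series([3, 2, 9, 5, 10],
                   index=['apple', 'orange', 'banana', 'cherry', 'mango'])

# Labels are aligned first; labels present in only one Series give NaN.
total = fruit1 + fruit2

# Alternative (not in the post): treat a missing label as 0 before adding.
filled = fruit1.add(fruit2, fill_value=0)

print(total)
print(filled)
```

With this data, `apple`, `banana`, and `cherry` get real sums, while `mango`, `orange`, and `peer` are NaN in `total` and keep their single available value in `filled`.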

Python 2021. 7. 3. 13:44

Python Pandas(Panel Data)

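Pandas

Pandas is a Python library for data analysis and manipulation. It provides two data structures: Series and DataFrame. A Series is similar to a time series (a sequence of data points placed at regular time intervals) and consists of an index and values. A DataFrame has a frame that looks like dictionary data arranged as a matrix. With these structures, time-series and non-time-series data can be handled together.

Install

Run pip install pandas in a command prompt (the pip package manager must be installed).

To use Pandas, import it with import pandas. By convention it is imported under the alias pd..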
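As a minimal sketch of the two structures described above, the following builds one Series and one DataFrame from a dictionary. The fruit names and prices are made-up sample values.

```python
import pandas as pd  # conventional alias

# A Series: one column of values, each with an index label.
prices = pd.Series([2500, 3800, 1200], index=['apple', 'banana', 'cherry'])

# A DataFrame: dictionary keys become columns, list items become rows.
table = pd.DataFrame({
    'name': ['apple', 'banana', 'cherry'],
    'price': [2500, 3800, 1200],
})

print(prices['banana'])  # look up a value by its index label
print(table.shape)       # (number of rows, number of columns)
```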

Python 2021. 7. 1. 23:22
