기계는 거짓말하지 않는다
Python Scikit Learn Iris Flower Analysis, Classification
This post uses the Iris flower data, one of the datasets provided by the Scikit Learn library.
from sklearn.datasets import load_iris
iris_dataset = load_iris()
print(type(iris_dataset))
# sklearn.utils.Bunch
print(iris_dataset.keys())
# dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename'])
print(iris_dataset['DESCR'][:193])
'''
.. _iris_dataset:
Iris plants dataset
--------------------
**Data Set Characteristics:**
:Number of Instances: 150 (50 in each of three classes)
:Number of Attributes: 4 numeric, pre
'''
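The object returned by load_iris() behaves like a dictionary, and its keys can also be read as attributes. A minimal sketch of both access styles follows; the variable names data_by_key and data_by_attr are made up for illustration.

```python
from sklearn.datasets import load_iris

iris_dataset = load_iris()

# Bunch is a dict subclass, so key access and attribute access
# return the same underlying object
data_by_key = iris_dataset['data']
data_by_attr = iris_dataset.data
print(data_by_key is data_by_attr)

# DESCR is a plain string, so it can be sliced or searched like any other string
print(type(iris_dataset['DESCR']))
```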
The target names and the feature names can be checked as follows.
print('Target names: ', iris_dataset['target_names'])
# Target names: ['setosa' 'versicolor' 'virginica']
print('Features names: \n', iris_dataset['feature_names'])
'''
Features names:
['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
'''
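To see how the feature names line up with the columns of the data, the names can be paired with the values of one sample. This is only a small sketch; first_sample is a name introduced here for illustration.

```python
from sklearn.datasets import load_iris

iris_dataset = load_iris()

# Pair each feature name with the matching column value of the first sample
first_sample = dict(zip(iris_dataset['feature_names'], iris_dataset['data'][0]))
for name, value in first_sample.items():
    print(f'{name}: {value}')
```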
Target, Feature Data, Shape
print(iris_dataset['data'])
'''
array([[5.1, 3.5, 1.4, 0.2],
[4.9, 3. , 1.4, 0.2],
[4.7, 3.2, 1.3, 0.2],
[4.6, 3.1, 1.5, 0.2],
[5. , 3.6, 1.4, 0.2],
[5.4, 3.9, 1.7, 0.4],
[4.6, 3.4, 1.4, 0.3],
...
'''
print('target:\n', iris_dataset['target'])
'''
target:
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2]
'''
print(iris_dataset['data'].shape, iris_dataset['target'].shape)
# (150, 4) (150,)
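The target array holds integer labels, so the number of samples per class can be counted with np.bincount and matched to the class names. A short sketch, assuming NumPy is installed alongside Scikit Learn:

```python
import numpy as np
from sklearn.datasets import load_iris

iris_dataset = load_iris()

# Count how many samples carry each integer label (0, 1, 2)
counts = np.bincount(iris_dataset['target'])
for name, count in zip(iris_dataset['target_names'], counts):
    print(name, count)
```

This agrees with the dataset description, which states 50 samples in each of the three classes.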
Split Data
from sklearn.model_selection import train_test_split
# The default is test_size=0.25. shuffle defaults to True; set it to False to keep the data in its original order
# Setting stratify=iris_dataset['target'] gives the train and test datasets the same class proportions
# Changing random_state shuffles the data in a different order
x_train, x_test, y_train, y_test = \
train_test_split(iris_dataset['data'], iris_dataset['target'], test_size=0.25, random_state = 0)
print('x_train shape: ', x_train.shape)
print('y_train shape: ', y_train.shape)
'''
x_train shape: (112, 4)
y_train shape: (112,)
'''
print('x_test shape: ', x_test.shape)
print('y_test shape: ', y_test.shape)
'''
x_test shape: (38, 4)
y_test shape: (38,)
'''
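The effect of stratify can be checked by counting the classes in the test set with and without it. The following is a sketch only; the names y_test_plain and y_test_strat are introduced here and are not part of the original code.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris_dataset = load_iris()
X, y = iris_dataset['data'], iris_dataset['target']

# Same split settings, once without and once with stratification
_, _, _, y_test_plain = train_test_split(X, y, test_size=0.25, random_state=0)
_, _, _, y_test_strat = train_test_split(X, y, test_size=0.25, random_state=0, stratify=y)

# Number of test samples per class in each case
plain_counts = np.bincount(y_test_plain, minlength=3)
strat_counts = np.bincount(y_test_strat, minlength=3)
print('without stratify:', plain_counts)
print('with stratify:   ', strat_counts)
```

With stratify the per-class counts in the test set differ by at most one, since 38 samples cannot be divided evenly among three classes.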
Model Select
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=1)
logistic = LogisticRegression(max_iter=300)
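With n_neighbors=1, the classifier gives each test sample the label of its single closest training sample. The idea can be sketched by hand with NumPy, assuming Euclidean distance (the default metric of KNeighborsClassifier). This is an illustration of the technique, not the library's internal implementation, and ties may be resolved differently.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris_dataset = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris_dataset['data'], iris_dataset['target'], test_size=0.25, random_state=0)

# Pairwise Euclidean distances between test and training samples,
# resulting shape: (n_test, n_train)
diff = x_test[:, np.newaxis, :] - x_train[np.newaxis, :, :]
distances = np.sqrt((diff ** 2).sum(axis=2))

# Index of the closest training sample for every test sample
nearest = distances.argmin(axis=1)

# The prediction is simply the label of that closest training sample
manual_pred = y_train[nearest]
manual_accuracy = np.mean(manual_pred == y_test)
print('manual 1-NN accuracy:', manual_accuracy)
```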
Training, Prediction
# training
knn.fit(x_train, y_train)
logistic.fit(x_train, y_train)
# prediction
logistic_y_pred = logistic.predict(x_test)
knn_y_pred = knn.predict(x_test)
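The trained models can also predict a single new flower. The measurements below are hypothetical values chosen for illustration, not data from this post. Note that predict expects a 2D array of shape (n_samples, n_features).

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris_dataset = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris_dataset['data'], iris_dataset['target'], test_size=0.25, random_state=0)

knn = KNeighborsClassifier(n_neighbors=1).fit(x_train, y_train)
logistic = LogisticRegression(max_iter=300).fit(x_train, y_train)

# Hypothetical new sample: sepal length, sepal width, petal length, petal width (cm)
x_new = np.array([[5.0, 2.9, 1.0, 0.2]])

# predict returns the integer label; target_names maps it to the species name
knn_name = iris_dataset['target_names'][knn.predict(x_new)[0]]
logistic_name = iris_dataset['target_names'][logistic.predict(x_new)[0]]
print('knn:', knn_name)
print('logistic:', logistic_name)

# LogisticRegression can also report a probability for each class
proba = logistic.predict_proba(x_new)
print(proba)
```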
Score
import numpy as np  # needed for np.mean below
print('Test set score(logistic): ', np.mean(logistic_y_pred == y_test))
print('Test set score(knn): ', np.mean(knn_y_pred == y_test))
'''
Test set score(logistic): 0.9736842105263158
Test set score(knn): 0.9736842105263158
# When stratify=iris_dataset['target'] is set during the split
Test set score(logistic): 1.0
Test set score(knn): 0.9736842105263158
'''
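The mean of the element-wise comparison above is the accuracy, so the same number can be obtained from the estimator's score method or from accuracy_score. A short sketch for the knn model, repeating the split so that it runs on its own:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris_dataset = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris_dataset['data'], iris_dataset['target'], test_size=0.25, random_state=0)

knn = KNeighborsClassifier(n_neighbors=1).fit(x_train, y_train)
knn_y_pred = knn.predict(x_test)

# Three equivalent ways to compute the accuracy on the test set
manual = np.mean(knn_y_pred == y_test)
via_score = knn.score(x_test, y_test)
via_metric = accuracy_score(y_test, knn_y_pred)
print(manual, via_score, via_metric)
```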