기계는 거짓말하지 않는다
Python Scikit Learn Iris Flower Analysis, Classification
This post uses the Iris flower data, one of the datasets provided by the Scikit Learn library.
    from sklearn.datasets import load_iris

    iris_dataset = load_iris()

    print(type(iris_dataset))
    # sklearn.utils.Bunch

    print(iris_dataset.keys())
    # dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename'])

    # Print only the first 193 characters of the dataset description
    print(iris_dataset['DESCR'][:193])
    '''
    .. _iris_dataset:

    Iris plants dataset
    --------------------

    **Data Set Characteristics:**

        :Number of Instances: 150 (50 in each of three classes)
        :Number of Attributes: 4 numeric, pre
    '''

You can check what the targets and features are.

    print('Target names: ', iris_dataset['target_names'])
    # Target names: ['setosa' 'versicolor' 'virginica']

    print('Features names: \n', iris_dataset['feature_names'])
    '''
    Features names:
    ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
    '''
Target, Feature Data, Shape
    print(iris_dataset['data'])
    '''
    array([[5.1, 3.5, 1.4, 0.2],
           [4.9, 3. , 1.4, 0.2],
           [4.7, 3.2, 1.3, 0.2],
           [4.6, 3.1, 1.5, 0.2],
           [5. , 3.6, 1.4, 0.2],
           [5.4, 3.9, 1.7, 0.4],
           [4.6, 3.4, 1.4, 0.3],
           ...
    '''

    print('target:\n', iris_dataset['target'])
    '''
    target:
    [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
     0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
     1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
     2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
     2 2]
    '''

    print(iris_dataset['data'].shape, iris_dataset['target'].shape)
    # (150, 4) (150,)
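The integer labels in `target` index into `target_names`, so the class balance stated in `DESCR` (50 samples per class) can be checked directly. A minimal sketch; the variable name `class_counts` is introduced here for illustration only.

```python
import numpy as np
from sklearn.datasets import load_iris

iris_dataset = load_iris()

# Count how many samples carry each integer label (0, 1, 2)
class_counts = np.bincount(iris_dataset['target'])

# Pair each count with the species name at the same index in target_names
for name, count in zip(iris_dataset['target_names'], class_counts):
    print(name, count)
```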
Split Data
    from sklearn.model_selection import train_test_split

    # Default test_size=0.25. shuffle defaults to True; set it to False to keep the data in its original order
    # Setting stratify=iris_dataset['target'] keeps the class ratio the same in the train and test datasets
    # Changing random_state shuffles the data in a different order
    x_train, x_test, y_train, y_test = \
        train_test_split(iris_dataset['data'], iris_dataset['target'],
                         test_size=0.25, random_state=0)

    print('x_train shape: ', x_train.shape)
    print('y_train shape: ', y_train.shape)
    '''
    x_train shape: (112, 4)
    y_train shape: (112,)
    '''

    print('x_test shape: ', x_test.shape)
    print('y_test shape: ', y_test.shape)
    '''
    x_test shape: (38, 4)
    y_test shape: (38,)
    '''
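To see what the `stratify` option mentioned in the comments does, one way is to compare the per-class counts of the test labels with and without it. This is only a sketch; the names `X`, `y`, `y_test_plain`, and `y_test_strat` are introduced here and are not part of the original code.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris_dataset = load_iris()
X, y = iris_dataset['data'], iris_dataset['target']

# Plain shuffled split: class counts in the test set depend on the shuffle
_, _, _, y_test_plain = train_test_split(
    X, y, test_size=0.25, random_state=0)

# Stratified split: class proportions of y are preserved in both parts
_, _, _, y_test_strat = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# minlength=3 guarantees one count per class even if a class were missing
print('without stratify:', np.bincount(y_test_plain, minlength=3))
print('with stratify:   ', np.bincount(y_test_strat, minlength=3))
```

With 38 test samples and three equally sized classes, the stratified counts can differ from each other by at most one.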
Model Select
    from sklearn.linear_model import LogisticRegression
    from sklearn.neighbors import KNeighborsClassifier

    knn = KNeighborsClassifier(n_neighbors=1)
    logistic = LogisticRegression(max_iter=300)
Training, Prediction
    # training
    knn.fit(x_train, y_train)
    logistic.fit(x_train, y_train)

    # prediction
    logistic_y_pred = logistic.predict(x_test)
    knn_y_pred = knn.predict(x_test)
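The same `predict` call also works on a single new measurement, as long as it is passed as a 2D array with one row and four columns. The sketch below uses a hypothetical flower (`x_new`, values made up for illustration, not taken from the dataset) and maps the predicted integer label back to its species name.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris_dataset = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris_dataset['data'], iris_dataset['target'],
    test_size=0.25, random_state=0)

knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(x_train, y_train)

# Hypothetical measurement (assumption, not from the dataset):
# sepal length, sepal width, petal length, petal width in cm
x_new = np.array([[5.0, 2.9, 1.0, 0.2]])

# predict returns an array of integer labels, one per input row
pred_label = knn.predict(x_new)

# Use the label as an index into target_names to get the species name
pred_name = iris_dataset['target_names'][pred_label][0]
print('Predicted label:', pred_label[0])
print('Predicted species:', pred_name)
```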
Score
    import numpy as np

    # Accuracy = fraction of test samples whose prediction matches the true label
    print('Test set score(logistic): ', np.mean(logistic_y_pred == y_test))
    print('Test set score(knn): ', np.mean(knn_y_pred == y_test))
    '''
    Test set score(logistic): 0.9736842105263158
    Test set score(knn): 0.9736842105263158

    # When stratify=iris_dataset['target'] is set at split time
    Test set score(logistic): 1.0
    Test set score(knn): 0.9736842105263158
    '''
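The `np.mean(pred == y_test)` expression above is plain accuracy, which scikit-learn also exposes as `accuracy_score` and as the classifier's own `score` method. A short sketch showing that the three agree for the KNN model; the names `manual`, `via_metric`, and `via_score` are introduced here for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris_dataset = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris_dataset['data'], iris_dataset['target'],
    test_size=0.25, random_state=0)

knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(x_train, y_train)
knn_y_pred = knn.predict(x_test)

# Three equivalent ways to compute test accuracy
manual = np.mean(knn_y_pred == y_test)          # fraction of correct predictions
via_metric = accuracy_score(y_test, knn_y_pred)  # metrics helper
via_score = knn.score(x_test, y_test)            # predicts internally, then scores

print(manual, via_metric, via_score)
```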