Python Scikit Learn Iris Flower Analysis, Classification

기계는 거짓말하지 않는다 (Machines Do Not Lie)

KillinTime 2021. 8. 12. 20:18

This post works with the Iris flower data, one of the datasets bundled with the Scikit Learn library.

from sklearn.datasets import load_iris

iris_dataset = load_iris()
print(type(iris_dataset))
# sklearn.utils.Bunch

print(iris_dataset.keys())
# dict_keys(['data', 'target', 'frame', 'target_names', 'DESCR', 'feature_names', 'filename'])

print(iris_dataset['DESCR'][:193])
'''
.. _iris_dataset:

Iris plants dataset
--------------------

**Data Set Characteristics:**

    :Number of Instances: 150 (50 in each of three classes)
    :Number of Attributes: 4 numeric, pre
'''

The dataset object shows what the targets and the features are.

print('Target names: ', iris_dataset['target_names'])
# Target names:  ['setosa' 'versicolor' 'virginica']
print('Features names: \n', iris_dataset['feature_names'])
'''
Features names: 
 ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
'''
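To make the link between the integer targets and the target names concrete, here is a minimal sketch. The variable `species` is a name introduced for this example only. It indexes the `target_names` array with the `target` array, which gives one species name per sample.

```python
from sklearn.datasets import load_iris

iris_dataset = load_iris()

# target holds integer class labels (0, 1, 2); target_names holds the matching strings.
# Indexing the names array with the label array yields one name per sample.
# `species` is a name chosen for this sketch.
species = iris_dataset['target_names'][iris_dataset['target']]

# Samples 0, 50 and 100 are the first sample of each class in this dataset.
print(species[0], species[50], species[100])
```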

Target, Feature Data, Shape

print(iris_dataset['data'])
'''
array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       [4.6, 3.1, 1.5, 0.2],
       [5. , 3.6, 1.4, 0.2],
       [5.4, 3.9, 1.7, 0.4],
       [4.6, 3.4, 1.4, 0.3],
       ...
'''

print('target:\n', iris_dataset['target'])
'''
target:
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2]
'''

print(iris_dataset['data'].shape, iris_dataset['target'].shape)
# (150, 4) (150,)
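The target array printed above is sorted by class, which is why shuffling matters in the next step. As a small sketch (the name `class_counts` is introduced here and is not part of the original post), `np.bincount` can confirm the "50 in each of three classes" statement from the dataset description:

```python
import numpy as np
from sklearn.datasets import load_iris

iris_dataset = load_iris()

# np.bincount counts how often each non-negative integer label occurs.
# `class_counts` is a name chosen for this sketch.
class_counts = np.bincount(iris_dataset['target'])

for name, count in zip(iris_dataset['target_names'], class_counts):
    print(name, count)
```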

Split Data

from sklearn.model_selection import train_test_split

# test_size defaults to 0.25. shuffle defaults to True; set it to False to keep the original (sequential) order
# Passing stratify=iris_dataset['target'] keeps the class proportions identical in the train and test sets
# Changing random_state shuffles the data in a different order
x_train, x_test, y_train, y_test = \
    train_test_split(iris_dataset['data'], iris_dataset['target'], test_size=0.25, random_state=0)

print('x_train shape: ', x_train.shape)
print('y_train shape: ', y_train.shape)
'''
x_train shape:  (112, 4)
y_train shape:  (112,)
'''

print('x_test shape: ', x_test.shape)
print('y_test shape: ', y_test.shape)
'''
x_test shape:  (38, 4)
y_test shape:  (38,)
'''
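The `stratify` option mentioned in the comments can be checked directly. The following is a sketch only, and the `_s` suffixed names are introduced here to avoid clashing with the variables above. It repeats the split with `stratify` set and counts the classes on each side.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris_dataset = load_iris()

# Same split as above, but with stratify set so each class keeps
# roughly the same share in the train and test sets.
x_train_s, x_test_s, y_train_s, y_test_s = train_test_split(
    iris_dataset['data'], iris_dataset['target'],
    test_size=0.25, random_state=0, stratify=iris_dataset['target'])

train_counts = np.bincount(y_train_s)
test_counts = np.bincount(y_test_s)
print('train class counts:', train_counts)
print('test class counts: ', test_counts)
```

Because 38 test samples cannot be divided evenly among three classes, the per-class counts differ by at most one.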

Model Select

from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=1)
logistic = LogisticRegression(max_iter=300)

Training, Prediction

# training
knn.fit(x_train, y_train)
logistic.fit(x_train, y_train)

# prediction
logistic_y_pred = logistic.predict(x_test)
knn_y_pred = knn.predict(x_test)
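The same `predict` call also works for a single new flower. The sketch below assumes a made-up measurement (`new_sample` and its values are hypothetical, not from the original post). Note that `predict` expects a 2D array of shape (n_samples, n_features), so the single sample is wrapped in an outer list.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris_dataset = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris_dataset['data'], iris_dataset['target'], test_size=0.25, random_state=0)

knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(x_train, y_train)

# Hypothetical measurements: sepal length, sepal width, petal length, petal width (cm).
new_sample = np.array([[5.0, 2.9, 1.0, 0.2]])

predicted_label = knn.predict(new_sample)[0]
predicted_name = iris_dataset['target_names'][predicted_label]
print(predicted_label, predicted_name)
```

With a petal length of 1.0 cm the sample sits well inside the setosa range, so its nearest training neighbor is a setosa flower.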

Score

import numpy as np

# accuracy = fraction of test samples whose prediction matches the true label
print('Test set score(logistic): ', np.mean(logistic_y_pred == y_test))
print('Test set score(knn): ', np.mean(knn_y_pred == y_test))

'''
Test set score(logistic):  0.9736842105263158
Test set score(knn):  0.9736842105263158

# when stratify=iris_dataset['target'] is set for the split
Test set score(logistic):  1.0
Test set score(knn):  0.9736842105263158
'''
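A score of 0.9736842105263158 equals 37/38, meaning one of the 38 test samples was misclassified. The manual `np.mean` comparison used above should agree with scikit-learn's own accuracy helpers. The sketch below checks this for the KNN model, and the three `*_accuracy` names are introduced here for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris_dataset = load_iris()
x_train, x_test, y_train, y_test = train_test_split(
    iris_dataset['data'], iris_dataset['target'], test_size=0.25, random_state=0)

knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(x_train, y_train)
knn_y_pred = knn.predict(x_test)

# Three ways to compute the same accuracy value.
manual_accuracy = np.mean(knn_y_pred == y_test)        # fraction of matching labels
method_accuracy = knn.score(x_test, y_test)            # classifier's built-in score
metric_accuracy = accuracy_score(y_test, knn_y_pred)   # metrics helper

print(manual_accuracy, method_accuracy, metric_accuracy)
```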
