ch1 Iris Species Classification Example


ML (Machine Learning) / Book: 파이썬 라이브러리를 활용한 머신러닝 (Introduction to Machine Learning with Python), 2nd ed.


Hippo's data 2023. 9. 5. 18:27


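In this post we work through iris species classification, an example that shows up in almost every introduction to machine learning.

Goal: use data that is already labeled by species to predict the species of newly collected irises (supervised learning & classification)

Classes - iris species (3): 'setosa' 'versicolor' 'virginica'
Features (4): 'sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)'
-> the width and length of each flower's petal and sepal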

# Import libraries & check versions

import sys
print("Python version:", sys.version)

import pandas as pd
print("pandas version:", pd.__version__)

import matplotlib
print("matplotlib version:", matplotlib.__version__)

import numpy as np
print("NumPy version:", np.__version__)

import scipy as sp
print("SciPy version:", sp.__version__)

import IPython
print("IPython version:", IPython.__version__)

import sklearn
print("scikit-learn version:", sklearn.__version__)

import mglearn
print("mglearn version:", mglearn.__version__)

# mglearn is a helper library the book's authors wrote to keep the book's code short

# Load the data

-> The dataset ships with sklearn's datasets module (it only needs to be loaded)

from sklearn.datasets import load_iris

iris_dataset = load_iris()

print(type(iris_dataset))

-> <class 'sklearn.utils._bunch.Bunch'> : a Bunch object made up of keys and values (similar to a dictionary)

print(iris_dataset['DESCR'][:193])

-> The 'DESCR' key holds a description of the dataset
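To make the "similar to a dictionary" point concrete, here is a minimal sketch (the variable name `same` is mine, not from the book). It lists the keys of the Bunch and checks that a value can be read either by key or as an attribute.

```python
from sklearn.datasets import load_iris

iris_dataset = load_iris()

# A Bunch behaves like a dict, so its keys can be listed
print(list(iris_dataset.keys()))

# The same value is reachable by key or by attribute
same = iris_dataset['feature_names'] is iris_dataset.feature_names
print(same)
```

The exact list of keys can differ between scikit-learn versions, so it is safer to look them up this way than to rely on a fixed list.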

print(iris_dataset['target_names'])

# Names of the iris species we want to predict

-> ['setosa' 'versicolor' 'virginica']

print(iris_dataset['feature_names'])

# The features

-> ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']

print(iris_dataset['data'][:5])

-> Check the first 5 rows

[[5.1 3.5 1.4 0.2]
[4.9 3. 1.4 0.2]
[4.7 3.2 1.3 0.2]
[4.6 3.1 1.5 0.2]
[5. 3.6 1.4 0.2]]

print(iris_dataset['data'].shape)

-> Check the size of the iris data (shape is an attribute of the array, not a function)

(150, 4) -> 150 rows, 4 columns

print(iris_dataset['target'])

->[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
2 2]

-> The target values to predict (0 -> setosa / 1 -> versicolor / 2 -> virginica)

print(iris_dataset['target_names'])

-> Check which species each target index stands for: 0, 1, 2 -> ['setosa' 'versicolor' 'virginica']
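The index-to-name mapping can also be applied to every sample at once. This is a small sketch, not code from the book: it indexes the name array with the integer labels and counts the samples per class with `np.bincount`. The variable names `species` and `counts` are my own.

```python
import numpy as np
from sklearn.datasets import load_iris

iris_dataset = load_iris()
target = iris_dataset['target']
names = iris_dataset['target_names']

# Indexing the name array with the integer labels gives one name per sample
species = names[target]
print(species[[0, 50, 100]])  # one sample from each block of 50 rows

# Number of samples in each class (index 0, 1, 2)
counts = np.bincount(target)
print(counts)  # [50 50 50]
```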

# Split the dataset

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(

iris_dataset['data'], iris_dataset['target'], random_state=0

)

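# Capital X -> data (input), a 2D array (matrix) / lowercase y -> labels (target), a 1D array (vector)

train_test_split -> shuffles the data randomly, then splits the dataset

-> By default 75% goes to the training set and 25% to the test set / X_train, y_train -> 112 rows / X_test, y_test -> 38 rows (a 75:25 split of 150 rows)

-> Without shuffling, the data stays sorted by label, so the last 25% would contain only class 2

random_state -> sets the seed for the random shuffle, so the split comes out the same on every run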
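The three points about train_test_split (default sizes, the fixed seed, and why shuffling matters) can be checked directly. A minimal sketch; the names `same_split` and `y_test_unshuffled` are mine, and `shuffle=False` is used only to show what the split looks like without shuffling.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X, y = iris['data'], iris['target']

# Default split: 75% train / 25% test
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
print(X_train.shape, X_test.shape)  # (112, 4) (38, 4)

# Same seed -> identical split
_, _, _, y_test_again = train_test_split(X, y, random_state=0)
same_split = np.array_equal(y_test, y_test_again)
print(same_split)  # True

# No shuffling -> the test set is simply the last 38 rows, all of class 2
_, _, _, y_test_unshuffled = train_test_split(X, y, shuffle=False)
print(np.unique(y_test_unshuffled))  # [2]
```

A test set with only one class could not tell us how well the model handles the other two, which is why the shuffle is on by default.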

# Explore the data - visualization (scatter plot)

iris_dataframe = pd.DataFrame(X_train, columns=iris_dataset.feature_names)

pd.plotting.scatter_matrix(iris_dataframe, c=y_train, figsize=(15, 15), marker='o',

hist_kwds={'bins': 20}, s=60, alpha=.8, cmap=mglearn.cm3)

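-> Uses pandas' scatter_matrix function

-> Visualizes the X_train data / convert it to a DataFrame first, then draw the scatter plots

-> Points are colored by y_train (the 3 iris species)

-> cmap -> uses a colormap defined in the mglearn library

-> The three classes (iris species) separate well on the petal and sepal features -> a machine learning model should be able to learn to tell the classes apart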
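The same separation can be checked numerically, without a plot. The sketch below is my own addition and uses the full dataset rather than X_train: it prints the minimum and maximum petal measurements per species. Setosa's petal range does not overlap with the other two species, while the versicolor and virginica ranges overlap somewhat.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(iris['data'], columns=iris['feature_names'])
# Attach the species name of each row
df['species'] = iris['target_names'][iris['target']]

# Range of the two petal features within each species
petal_cols = ['petal length (cm)', 'petal width (cm)']
summary = df.groupby('species')[petal_cols].agg(['min', 'max'])
print(summary)
```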

from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=1)

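-> k-NN (k-nearest neighbors) classifier -> finds the k closest training points / n_neighbors=1 sets the number of neighbors to 1

-> Create the knn object so the model can be used

knn.fit(X_train, y_train)

-> The fit method builds the model / it uses the training data X_train, y_train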
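To show what "find the closest neighbor" means, here is a rough from-scratch sketch of 1-nearest-neighbor prediction in NumPy. It is an illustration, not how scikit-learn implements it. `predict_1nn` is a hypothetical helper name, and I assume Euclidean distance, which is the default metric of KNeighborsClassifier.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris['data'], iris['target'], random_state=0
)

def predict_1nn(X_train, y_train, X_query):
    """Return the label of the closest training point for each query row."""
    # Differences between every query row and every training row,
    # shape (n_query, n_train, n_features)
    diffs = X_query[:, np.newaxis, :] - X_train[np.newaxis, :, :]
    # Euclidean distances, shape (n_query, n_train)
    distances = np.sqrt((diffs ** 2).sum(axis=2))
    # Index of the nearest training point for each query row
    nearest = distances.argmin(axis=1)
    return y_train[nearest]

y_pred = predict_1nn(X_train, y_train, X_test)
accuracy = np.mean(y_pred == y_test)
print(accuracy)
```

With 1-NN, "training" is just storing the data; all the work happens at prediction time, when the distances are computed.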

# Make predictions

X_new = np.array([[5, 2.9, 1, 0.2]])

print(X_new.shape)

-> (1, 4)

*** scikit-learn always expects the input data to be a 2D array

-> Input for the new iris: sepal length 5 cm / sepal width 2.9 cm / petal length 1 cm / petal width 0.2 cm -> predict its species
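If the measurements of a single flower arrive as a flat 1D array, they have to be turned into one row of a 2D array before being passed to predict. A minimal sketch; the name `measurements` is my own.

```python
import numpy as np

# One flower as a flat 1D array, shape (4,)
measurements = np.array([5, 2.9, 1, 0.2])

# reshape(1, -1) -> 1 row, as many columns as needed: shape (1, 4)
X_new = measurements.reshape(1, -1)

print(measurements.shape, X_new.shape)  # (4,) (1, 4)
```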

prediction = knn.predict(X_new)

print(prediction, iris_dataset['target_names'][prediction])

-> [0] ['setosa']

-> Predict with the predict method / the model predicts class 0, the 'setosa' species

# Evaluate the model

-> Can we trust the model's predictions?

-> Use the test set, which has not been touched so far

y_pred = knn.predict(X_test)

print(y_pred)

print(np.mean(y_pred == y_test)) # fraction of matches, using ==

print(knn.score(X_test, y_test)) # using the score method

-> [2 1 0 2 0 2 0 1 1 1 2 1 1 1 1 0 1 1 0 0 2 1 0 0 2 0 0 1 1 0 2 1 0 2 2 1 0 2] -> predicted class indices
-> 0.9736842105263158 -> fraction of matches
-> 0.9736842105263158 -> score method / about 97% of the test species (37 of 38) predicted correctly
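The same accuracy can also be computed with sklearn.metrics, and a confusion matrix shows which classes the errors fall in. This goes slightly beyond the book's chapter 1 code, so treat it as an optional sketch; it rebuilds the same split and model so that it runs on its own.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris['data'], iris['target'], random_state=0
)

knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

# Same number as np.mean(y_pred == y_test) and knn.score(X_test, y_test)
acc = accuracy_score(y_test, y_pred)
print(acc)

# Rows: true class, columns: predicted class
cm = confusion_matrix(y_test, y_pred)
print(cm)
```

The diagonal of the matrix holds the correct predictions, so any off-diagonal entry points to the pair of species that got confused.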
