Supervised Learning: Classification Models - Logistic Regression


최 수빈 2025. 3. 10. 19:42

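Logistic Regression

A regression analysis technique used when the dependent variable is binary (0 or 1)

Uses the sigmoid function to convert the output into a probability between 0 and 1

Despite the word "regression" in its name, it is used in practice for classification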

*Sigmoid Function

A function that converts an input value into a probability between 0 and 1

σ(z) = 1 / (1 + e^(-z))

Here, z is given by the linear regression equation:

z = β₀ + β₁x₁ + β₂x₂ + ... + βₙxₙ
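
As a quick illustration of the two formulas above, the following minimal sketch computes z for a single sample and passes it through the sigmoid. The coefficient values and the sample are made up for illustration and are not fitted to any dataset.

```python
import numpy as np

def sigmoid(z):
    """Map any real number (or array) to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical coefficients: beta[0] is the intercept, the rest are feature weights
beta = np.array([-1.0, 2.0, 0.5])
# One hypothetical sample with two features x1, x2
x = np.array([0.8, -0.4])

# Linear part: z = beta0 + beta1*x1 + beta2*x2
z = beta[0] + beta[1:] @ x
p = sigmoid(z)

print(f"z = {z:.2f}, sigmoid(z) = {p:.4f}")
# Large negative inputs approach 0, zero maps to 0.5, large positive inputs approach 1
print(sigmoid(np.array([-5.0, 0.0, 5.0])))
```

Because sigmoid(0) is exactly 0.5, the sign of z decides which side of the 0.5 probability a sample falls on.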

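Predicts the probability that a data point belongs to a specific class
Examples:
- Breast cancer data: predict the probability that a patient has cancer
- Titanic data: predict the probability that a passenger survived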
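
To make "predicting the probability of belonging to a class" concrete, here is a small sketch that uses scikit-learn's built-in breast cancer dataset and `predict_proba`. One detail worth keeping in mind: in this particular dataset, label 0 means malignant and label 1 means benign, so the probability of cancer is the class 0 column. The variable names are chosen for illustration only.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit the scaler on the training split only, then reuse it on the test split
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

model = LogisticRegression().fit(X_train, y_train)

# One row per sample, one column per class, in the order of model.classes_
proba = model.predict_proba(X_test)
p_malignant = proba[:, 0]   # class 0 = malignant in this dataset
p_benign = proba[:, 1]      # class 1 = benign
print("first five P(malignant):", np.round(p_malignant[:5], 3))

# predict() matches thresholding P(class 1) at 0.5
manual_pred = (p_benign > 0.5).astype(int)
print("same as predict():", np.array_equal(model.predict(X_test), manual_pred))
```

If a different trade-off between false positives and false negatives is needed, the 0.5 threshold can be replaced with another value applied to the same probabilities.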

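*Cost Function (Log Loss, Cross-Entropy Loss)

Measures the difference between the model's predicted probabilities and the actual labels

J(θ) = -(1/m) ∑_{i=1}^{m} [ y⁽ⁱ⁾ log(h_θ(x⁽ⁱ⁾)) + (1 - y⁽ⁱ⁾) log(1 - h_θ(x⁽ⁱ⁾)) ]

Here m is the number of samples, y⁽ⁱ⁾ is the actual label of sample i, and h_θ(x⁽ⁱ⁾) = σ(z⁽ⁱ⁾) is the predicted probability; θ denotes the same coefficients written as β in the equation for z

Called the log loss function (Log Loss) or the cross-entropy loss function (Cross-Entropy Loss)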
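
The cost function can be checked by hand on a few numbers. The sketch below implements J directly with NumPy and compares it with `sklearn.metrics.log_loss`. The labels and probabilities are invented for illustration, and `cross_entropy` is a helper name introduced here, not part of any library.

```python
import numpy as np
from sklearn.metrics import log_loss

def cross_entropy(y_true, p_pred):
    """Average log loss J for binary labels y_true and predicted P(y=1) p_pred."""
    y_true = np.asarray(y_true, dtype=float)
    p_pred = np.asarray(p_pred, dtype=float)
    return -np.mean(y_true * np.log(p_pred) + (1 - y_true) * np.log(1 - p_pred))

# Hypothetical labels and predicted probabilities of class 1
y_true = [1, 0, 1, 1, 0]
confident_right = [0.9, 0.1, 0.8, 0.95, 0.2]
confident_wrong = [0.1, 0.9, 0.2, 0.05, 0.8]

loss_good = cross_entropy(y_true, confident_right)
loss_bad = cross_entropy(y_true, confident_wrong)
print(f"good predictions: {loss_good:.4f}")
print(f"bad predictions:  {loss_bad:.4f}")

# scikit-learn's log_loss computes the same quantity
print(f"sklearn log_loss: {log_loss(y_true, confident_right):.4f}")
```

Confident but wrong predictions are penalized much more heavily than confident correct ones, which is why the second loss comes out far larger than the first.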

Logistic Regression Examples

Breast Cancer Data Analysis

Data loading and preprocessing

import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load the data (in this dataset, target 0 = malignant and 1 = benign)
cancer_data = load_breast_cancer()
X = cancer_data.data
y = cancer_data.target

# Split into training and test sets (80% / 20%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale the features: fit on the training set only, then apply to the test set
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

Model training and evaluation

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Create and train the model
model = LogisticRegression()
model.fit(X_train, y_train)

# Predict on the test set
y_pred = model.predict(X_test)

# Evaluate
print(f"Accuracy: {accuracy_score(y_test, y_pred)}")
print(f"Classification Report:\n{classification_report(y_test, y_pred)}")
print(f"Confusion Matrix:\n{confusion_matrix(y_test, y_pred)}")

"""
Accuracy: 0.9649122807017544
Classification Report:
              precision    recall  f1-score   support

           0       0.97      0.92      0.95        39
           1       0.96      0.99      0.97        75

    accuracy                           0.96       114
   macro avg       0.97      0.95      0.96       114
weighted avg       0.97      0.96      0.96       114

Confusion Matrix:
[[36  3]
 [ 1 74]]
"""

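The numbers in the report can be traced back to the confusion matrix. The sketch below recomputes accuracy and the class 1 precision and recall from the four counts printed above, assuming scikit-learn's convention that rows are true labels and columns are predicted labels.

```python
import numpy as np

# Confusion matrix copied from the output above
# rows = true class (0, 1), columns = predicted class (0, 1)
cm = np.array([[36, 3],
               [1, 74]])

# For a 2x2 matrix, ravel() yields the counts in this order
tn, fp, fn, tp = cm.ravel()

accuracy = (tp + tn) / cm.sum()
precision_1 = tp / (tp + fp)   # of samples predicted as class 1, how many were right
recall_1 = tp / (tp + fn)      # of true class 1 samples, how many were found

print(f"accuracy    = {accuracy:.4f}")
print(f"precision_1 = {precision_1:.2f}")
print(f"recall_1    = {recall_1:.2f}")
```

The three values agree with the accuracy line and the class 1 row of the classification report.
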
Titanic Data Analysis

Data loading and preprocessing

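import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load the data
titanic = sns.load_dataset('titanic')

# Select the needed columns and drop rows with missing values
titanic = titanic[['survived', 'pclass', 'sex', 'age', 'sibsp', 'parch', 'fare', 'embarked']].dropna()

# Encode sex and port of embarkation as numbers
titanic['sex'] = titanic['sex'].map({'male': 0, 'female': 1})
titanic['embarked'] = titanic['embarked'].map({'C': 0, 'Q': 1, 'S': 2})

# Separate the features and the target
X = titanic.drop('survived', axis=1)
y = titanic['survived']

# Split into training and test sets (80% / 20%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale the features: fit on the training set only, then apply to the test set
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)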
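
The mapping above turns `embarked` into the numbers 0, 1, 2, which implies an order among the ports that does not really exist. One common alternative is one-hot encoding. The sketch below shows the idea with `pandas.get_dummies` on a tiny hand-made DataFrame; the four rows are hypothetical and are not taken from the real Titanic data.

```python
import pandas as pd

# Tiny hypothetical sample that mimics two Titanic columns
df = pd.DataFrame({
    "sex": ["male", "female", "female", "male"],
    "embarked": ["S", "C", "Q", "S"],
})

# Binary feature: a single 0/1 column is enough
df["sex"] = df["sex"].map({"male": 0, "female": 1})

# Nominal feature with three values: one indicator column per port.
# drop_first=True removes one redundant column (here "C" becomes the baseline).
encoded = pd.get_dummies(df, columns=["embarked"], drop_first=True, dtype=int)

print(encoded)
```

Swapping this encoding into the Titanic example would change the feature matrix, so the results printed in this post would no longer be reproduced exactly.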

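Model training and evaluation

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Create and train the model
model = LogisticRegression()
model.fit(X_train, y_train)

# Predict on the test set
y_pred = model.predict(X_test)

# Evaluate
print(f"Accuracy: {accuracy_score(y_test, y_pred)}")
print(f"Classification Report:\n{classification_report(y_test, y_pred)}")
print(f"Confusion Matrix:\n{confusion_matrix(y_test, y_pred)}")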

"""
Accuracy: 0.8251748251748252
Classification Report:
              precision    recall  f1-score   support

           0       0.84      0.89      0.87        91
           1       0.79      0.71      0.75        52

    accuracy                           0.83       143
   macro avg       0.82      0.80      0.81       143
weighted avg       0.82      0.83      0.82       143

Confusion Matrix:
[[81 10]
 [15 37]]
"""

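Logistic regression: well suited to binary classification problems

Uses the sigmoid function to predict probabilities and sets a decision boundary

Optimizes model performance by minimizing the cross-entropy loss function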
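
To show what "optimizing through the cross-entropy loss" can look like, here is a minimal gradient descent sketch in NumPy. It is only an illustration under stated assumptions: the data is synthetic, the learning rate and step count are arbitrary choices, and scikit-learn's `LogisticRegression` uses a different solver with L2 regularization by default, so this is not what `model.fit` does internally.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(y, p):
    """Average cross-entropy between labels y and predicted probabilities p."""
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

# Synthetic, made-up data: one feature; label 1 is more likely when x is large
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = (x + 0.5 * rng.normal(size=200) > 0).astype(float)

# Design matrix with a leading column of ones for the intercept
X = np.column_stack([np.ones_like(x), x])
theta = np.zeros(2)
learning_rate = 0.1   # hypothetical choice

losses = []
for step in range(500):
    p = sigmoid(X @ theta)
    losses.append(log_loss(y, p))
    # Gradient of the average cross-entropy: (1/m) * X^T (p - y)
    gradient = X.T @ (p - y) / len(y)
    theta -= learning_rate * gradient

print(f"loss at start: {losses[0]:.4f}")
print(f"loss at end:   {losses[-1]:.4f}")
print("theta:", theta)
```

With all coefficients at zero every prediction is 0.5, so the starting loss equals log(2); each update then moves the coefficients in the direction that lowers the loss.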
