
12. Evaluation & Ensemble

by Jaeguk 2024. 4. 16.

This post covers how to measure the performance of a classification model, and then ensemble methods.

 

Accuracy Evaluation


Evaluating a model's accuracy

  • The model's classification accuracy is evaluated on test data
  • Test Data
    • The data set used to measure accuracy
    • Each row has the form <Feat 1, Feat 2, ..., Feat n, Class Label>
    • The class label is withheld when a row is fed to the model; the prediction is then compared with the given Class Label
  • Accuracy = (number of test rows the model classifies correctly) / (total number of test rows)
  • Test data must be independent data that was not used for training
    • Testing on the training data is like solving a problem whose answer you already know
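
As a minimal sketch of the procedure above, the function below strips the label from each test row, asks the model for a prediction, and reports the fraction of matches. The names `accuracy`, `toy_model`, and the four test rows are made up for illustration and are not from the lecture.

```python
def accuracy(model, test_data):
    """Fraction of test rows whose predicted label equals the given label.

    Each row is (feat_1, ..., feat_n, class_label); the label is removed
    before the features are handed to the model.
    """
    correct = 0
    for row in test_data:
        features, label = row[:-1], row[-1]
        if model(features) == label:
            correct += 1
    return correct / len(test_data)


# Hypothetical toy model: predicts "yes" when the first feature exceeds 5.
def toy_model(features):
    return "yes" if features[0] > 5 else "no"


# Made-up test rows: two features followed by the class label.
test_data = [(7, 1, "yes"), (2, 0, "no"), (9, 3, "no"), (4, 2, "no")]
print(accuracy(toy_model, test_data))  # 3 of 4 rows match -> 0.75
```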

 

Confusion Matrix


A performance metric for classification models

  • Each entry is the number of data points of class i that the model predicted as class j
    • True Positive: an actual Yes classified as Yes (correct)
    • True Negative: an actual No classified as No (correct)
    • False Negative: an actual Yes classified as No (error)
    • False Positive: an actual No classified as Yes (error)
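
The four cells can be counted directly from paired actual and predicted labels. This is a small sketch with made-up labels; `confusion_matrix` here is a hypothetical helper, not a library call.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count how often each (actual class, predicted class) pair occurs."""
    return Counter(zip(actual, predicted))


# Made-up labels for six data points.
actual    = ["yes", "yes", "no", "no",  "no", "yes"]
predicted = ["yes", "no",  "no", "yes", "no", "yes"]

cm = confusion_matrix(actual, predicted)
tp = cm[("yes", "yes")]  # actual Yes, predicted Yes
fn = cm[("yes", "no")]   # actual Yes, predicted No
fp = cm[("no", "yes")]   # actual No, predicted Yes
tn = cm[("no", "no")]    # actual No, predicted No
print(tp, fn, fp, tn)    # 2 1 1 2
```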

 

Example

  • The model's accuracy is the proportion of correctly classified data
    • (2588 + 6954) / 10000 = 0.9542
  • The model's error rate is (1.0 - Accuracy), here 0.0458

 

Alternative Accuracy


Measures used as alternatives to Accuracy

  • Sensitivity (Recall) = true positive / positive
    • Of the data that is actually positive, the proportion the model recognizes as positive
    • Sensitivity is also called Recall, and Recall is the more common name
  • Specificity = true negative / negative
    • Of the data that is actually negative, the proportion the model recognizes as negative
  • Precision = true positive / (true positive + false positive)
    • Of the data classified as Yes (positive), the proportion that is actually positive
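
The three ratios translate directly into code. The sketch below takes the confusion-matrix counts as inputs; the counts themselves are invented for illustration.

```python
def sensitivity(tp, fn):
    """Recall: share of actual positives the model labels positive."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Share of actual negatives the model labels negative."""
    return tn / (tn + fp)


def precision(tp, fp):
    """Share of predicted positives that are actually positive."""
    return tp / (tp + fp)


# Made-up counts: 100 actual positives, 900 actual negatives.
tp, fn, fp, tn = 90, 10, 30, 870
print(sensitivity(tp, fn))  # 90 / 100 = 0.9
print(specificity(tn, fp))  # 870 / 900
print(precision(tp, fp))    # 90 / 120 = 0.75
```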

 

Recall and Precision are used to measure the performance of models where the positive class matters most

However, the two are in a trade-off relationship

=> The F1-score is the evaluation metric that balances the two

 

F1-score

  • The F1-score is also widely used to measure model performance
  • Precision and Recall are dependent on each other
  • Each covers the other's blind spots
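
The formula is not written out at this point in the post. For reference, the standard definition of the F1-score is the harmonic mean of Precision and Recall:

```latex
F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
```

Because it is a harmonic mean, F1 is high only when both Precision and Recall are high; a very low value in either one pulls the score down.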

 

What goes wrong if only one of Recall or Precision is used as the performance metric?

  • If the model classifies every data point as Yes (Positive), Recall is always 100%
    • Every positive data point does get classified as Positive, after all

 

  • Suppose the model classifies just one data point as Yes, and that point is actually positive
  • Only one positive was found, yet Precision still comes out as 100%

 

Used alone, Recall and Precision each have this kind of problem,
so the two are used together to make up for each other's weaknesses
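
Both degenerate cases can be reproduced in a few lines. The sketch below uses a made-up data set of 10 positives and 90 negatives and the standard F1 definition; `precision_recall_f1` is a hypothetical helper written for this example.

```python
def precision_recall_f1(actual, predicted, positive="yes"):
    """Return (precision, recall, F1) for the given positive label."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Made-up data: 10 actual positives followed by 90 actual negatives.
actual = ["yes"] * 10 + ["no"] * 90

# Case 1: the model says Yes to everything, so Recall is perfect.
all_yes = ["yes"] * 100
print(precision_recall_f1(actual, all_yes))

# Case 2: the model says Yes once and is right, so Precision is perfect.
one_yes = ["yes"] + ["no"] * 99
print(precision_recall_f1(actual, one_yes))
```

In both cases one of the two metrics is 100% while F1 stays below 0.2, which is the weakness the single metrics hide.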

 

Evaluation Protocols


Methods for evaluating performance

 

Holdout Method


Randomly split the given data into two independent partitions

  • The partitions are used as the Training set and the Test set
    • Training set: used to train the model
    • Test set: used to evaluate the model's performance
  • Performance is measured by repeating this K times
    • Because the partitions are split randomly, a single run is not enough
    • The average over the repeated runs is used
    • The larger K is, the more reliable the estimate
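
A minimal sketch of the repeated holdout, assuming a 70/30 split and a trivial majority-class learner. The split ratio, the learner `train_majority`, and the data are all assumptions made for illustration.

```python
import random


def holdout_split(data, test_ratio, rng):
    """Shuffle a copy of the data and cut it into (train, test)."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


def repeated_holdout(data, train_fn, k=10, test_ratio=0.3, seed=0):
    """Average test accuracy over k independent random splits."""
    rng = random.Random(seed)
    scores = []
    for _ in range(k):
        train, test = holdout_split(data, test_ratio, rng)
        model = train_fn(train)
        correct = sum(model(x) == y for x, y in test)
        scores.append(correct / len(test))
    return sum(scores) / k


# Hypothetical learner: always predicts the most common training label.
def train_majority(train):
    labels = [y for _, y in train]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority


# Made-up data: 25 "yes" rows and 75 "no" rows.
data = [(i, "yes" if i % 4 == 0 else "no") for i in range(100)]
print(repeated_holdout(data, train_majority, k=10))
```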

 

Cross-validation


Split the data into K subsets of equal size

  • Also called K-fold Cross-validation
  • K is commonly set to 5 or 10
  • In the i-th iteration, the i-th Fold is used as test data and the remaining Folds are used for training
  • The average of the results from the K iterations is used as the performance
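
The loop over folds can be sketched as follows. The 1-nearest-neighbor learner `train_1nn` and the 20-point data set are assumptions for illustration; real data would normally be shuffled before being cut into folds.

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(data, train_fn, k=5):
    """Use fold i as test data in iteration i and average the accuracies."""
    folds = k_fold_indices(len(data), k)
    scores = []
    for i in range(k):
        test_idx = set(folds[i])
        train = [row for j, row in enumerate(data) if j not in test_idx]
        test = [data[j] for j in folds[i]]
        model = train_fn(train)
        scores.append(sum(model(x) == y for x, y in test) / len(test))
    return sum(scores) / k


# Hypothetical learner: 1-nearest-neighbor on a single numeric feature.
def train_1nn(train):
    def predict(x):
        nearest = min(train, key=lambda row: abs(row[0] - x))
        return nearest[1]
    return predict


# Made-up data: label is "yes" from x = 10 upward.
data = [(x, "yes" if x >= 10 else "no") for x in range(20)]
print(cross_validate(data, train_1nn, k=5))
```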

 

Leave-one-out


The extreme case of K-fold Cross-validation

  • K is set to the number of data points
  • That is, each Fold contains exactly one data point
  • Used when the amount of data is very small
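
With one data point per fold, the loop simplifies to holding out each row in turn. The learner and the six data points below are hypothetical.

```python
def leave_one_out(data, train_fn):
    """Hold out each row once, train on the rest, and average the hits."""
    correct = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]   # everything except row i
        model = train_fn(train)
        correct += model(x) == y
    return correct / len(data)


# Hypothetical learner: 1-nearest-neighbor on a single numeric feature.
def train_1nn(train):
    def predict(x):
        nearest = min(train, key=lambda row: abs(row[0] - x))
        return nearest[1]
    return predict


# Made-up small data set, the setting where leave-one-out is typically used.
data = [(1, "no"), (2, "no"), (3, "no"), (8, "yes"), (9, "yes"), (10, "yes")]
print(leave_one_out(data, train_1nn))
```

The model is retrained once per data point, which is why this variant is reserved for small data sets.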

 

Stratified Cross-validation


Another special case of K-fold Cross-validation

  • Folds are split so that the Class distribution of the data stays the same in every Fold (each Fold has roughly the same class proportions as the full data set)
  • For example, if the classes are meant to follow a Gaussian distribution, the classes of the data in every Fold must follow that Gaussian distribution
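
One simple way to keep the class proportions equal is to deal each class's rows round-robin across the folds. This is only a sketch of that idea; `stratified_folds` and the 20/80 label mix are assumptions, and libraries implement more careful variants.

```python
from collections import Counter, defaultdict


def stratified_folds(labels, k):
    """Assign row indices to k folds, keeping class ratios roughly equal."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        # Deal the rows of each class round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


# Made-up labels: 20% "yes", 80% "no".
labels = ["yes"] * 20 + ["no"] * 80
folds = stratified_folds(labels, k=5)
for fold in folds:
    print(Counter(labels[i] for i in fold))  # 4 "yes" and 16 "no" per fold
```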

 

Ensemble


Ensemble learning

  • Multiple models are used to improve performance
  • The predictions of K trained models are combined into the final prediction
  • Popular Ensemble Methods
    • Bagging: the final result is simply the majority vote over the predictions of the K models
    • Boosting: the final result takes each model's Weight into account
    • Model Ensemble: models of different kinds are combined
      • Ex) SVM + Decision Tree + Neural Network + ...

This concept already came up earlier, in the Random Forest material

 

Bagging


Bootstrap Aggregation

  • Training
    • Each Iteration uses a data set sampled at random from the Original Data
    • The Sample Data (Bootstrap) must be the same size as the original data set, so it is drawn with replacement
    • The i-th model Mi is trained on the i-th Bootstrap
  • Classification
    • Each model returns a prediction for the test data
    • Bagging picks the most frequent prediction as the final result (majority vote)
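
The training and voting steps can be sketched as below. The base learner `train_1nn`, the seed, and the data are assumptions for illustration; any learner that returns a predict function would fit the same structure.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) rows with replacement from the original data."""
    return [rng.choice(data) for _ in data]


def train_bagging(data, train_fn, k, seed=0):
    """Train model M_i on the i-th bootstrap sample, for i = 1..k."""
    rng = random.Random(seed)
    return [train_fn(bootstrap_sample(data, rng)) for _ in range(k)]


def bagging_predict(models, x):
    """Return the label predicted by the most models (majority vote)."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


# Hypothetical base learner: 1-nearest-neighbor on one numeric feature.
def train_1nn(train):
    def predict(x):
        nearest = min(train, key=lambda row: abs(row[0] - x))
        return nearest[1]
    return predict


# Made-up data: label is "yes" from x = 10 upward.
data = [(x, "yes" if x >= 10 else "no") for x in range(20)]
models = train_bagging(data, train_1nn, k=11)
print(bagging_predict(models, 2), bagging_predict(models, 17))
```

An odd number of models is used here so that a two-class vote cannot tie.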

 

Boosting


An ensemble that decides the final value using Weights based on each model's performance

  • As in Bagging, a Sample (Bootstrap) is generated in each Iteration and K models are trained

 

The procedure is as follows

  1. Initially, each data point is given a Weight of 1/d (d = number of data points)
    • This Weight also applies when sampling the data
    • Data with a higher Weight is more likely to be sampled
  2. K models are trained over Iterations 1 to K, and each model's performance is evaluated on test data
    • This test data must consist of data that was not sampled
  3. The Weights of the data the i-th model misclassified are increased, raising the chance they are sampled and learned by the (i+1)-th model
  4. The Weight each model gets in the final Vote is determined from the Accuracy it achieved
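
The four steps can be sketched in an AdaBoost-like form. Several details below are assumptions of this sketch rather than the lecture's exact method: the weak learner `train_stump`, the log-odds vote weight, skipping a round whose error reaches 0.5, and measuring each model's error as the weighted error over all d rows (the post instead evaluates on rows left out of the sample).

```python
import math
import random


def train_stump(sample):
    """Hypothetical weak learner: best single threshold on a 1-D feature."""
    best = None
    for threshold, _ in sample:
        for above, below in (("yes", "no"), ("no", "yes")):
            predict = lambda x, t=threshold, a=above, b=below: a if x >= t else b
            errors = sum(predict(x) != y for x, y in sample)
            if best is None or errors < best[0]:
                best = (errors, predict)
    return best[1]


def boost(data, k, seed=0):
    rng = random.Random(seed)
    d = len(data)
    weights = [1 / d] * d                      # step 1: uniform weights
    ensemble = []                              # (vote weight, model) pairs
    for _ in range(k):
        # Step 2: sample d rows, favoring rows with a higher weight.
        picked = rng.choices(range(d), weights=weights, k=d)
        model = train_stump([data[i] for i in picked])
        wrong = [model(x) != y for x, y in data]
        error = sum(w for w, miss in zip(weights, wrong) if miss)
        if error >= 0.5:                       # too weak: reset and retry
            weights = [1 / d] * d
            continue
        error = max(error, 1e-10)              # avoid division by zero
        # Step 4: more accurate models get a larger vote.
        ensemble.append((math.log((1 - error) / error), model))
        # Step 3: raise the weight of misclassified rows, then normalize.
        factor = (1 - error) / error
        weights = [w * factor if miss else w for w, miss in zip(weights, wrong)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def boost_predict(ensemble, x):
    """Weighted vote: sum each model's vote weight per predicted label."""
    totals = {}
    for vote, model in ensemble:
        label = model(x)
        totals[label] = totals.get(label, 0.0) + vote
    return max(totals, key=totals.get)


# Made-up data: label is "yes" from x = 10 upward.
data = [(x, "yes" if x >= 10 else "no") for x in range(20)]
ensemble = boost(data, k=5)
print(boost_predict(ensemble, 2), boost_predict(ensemble, 17))
```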