All posts (204)

12. Evaluation & Ensemble. This post covers how to measure the performance of a classification model, and ensembles. Accuracy Evaluation: assess how accurate the model is; its classification accuracy is measured on test data. Test Data: the dataset used to measure accuracy. When a test sample is fed to the model, its ground-truth class is withheld, and the model's prediction is then compared against the given class label. Accuracy = (number of test samples the model classifies correctly) / (total number of test samples). Test data must be independent data that was not used for training; testing on the training data is like solving a problem whose answer you already know. Confusion Matrix: a performance metric for classification models. Each entry (i, j) is the number of samples of class i that the model predicted as class j. Tru.. 2024. 4. 16.
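The accuracy and confusion-matrix definitions in the excerpt above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not code from the post: the function names, the `yes`/`no` labels, and the six sample predictions are all hypothetical.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Return a matrix where entry [i][j] counts samples of class
    labels[i] that the model predicted as class labels[j]."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(actual, predicted)] for predicted in labels]
            for actual in labels]


def accuracy(y_true, y_pred):
    """Fraction of test samples whose prediction equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical held-out test labels and model predictions.
labels = ["yes", "no"]
y_true = ["yes", "yes", "no", "no", "yes", "no"]
y_pred = ["yes", "no", "no", "no", "yes", "yes"]

matrix = confusion_matrix(y_true, y_pred, labels)
acc = accuracy(y_true, y_pred)

for label, row in zip(labels, matrix):
    print(label, row)
print(f"accuracy = {acc:.3f}")
```

The diagonal of the matrix holds the correctly classified samples, so accuracy can also be read off as the diagonal sum divided by the total number of samples.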
9. Adding Data and Integration. Summary of this activity: I added data to the server built last time and connected it to the frontend. Up to this point I had set up the backend server and implemented basic CRUD. This time I added dummy data to the DB and got as far as loading that data from the frontend through the API. Adding data: I moved the data that the frontend had been storing and using directly into the DB. To seed the initial data, I parsed the data the frontend held as a String and saved it on the server. At this step I also ran clustering, assigned a color to each cluster in advance, and stored the color values together with the data. Originally the frontend ran this logic itself before drawing the data; now it receives the result from the DB. Because the frontend fetches and renders data that is already fully prepared, the initial screen setup should have become faster.. 2024. 4. 14.
11. Rule Based Classification. A classifier based on rules. The basic idea is to use IF-THEN rules. Ex) IF age = youth AND student = false THEN buys_computer = no. It can be used when the amount of data is not very large. The rules are written by domain experts (human experts). When a data point matches several rules, conflict resolution is needed. Size Ordering: the size of a rule is the number of features tested in its IF part. A large size therefore means the rule is specific and has the toughest requirements. Rules with a larger size are given higher priority. Class-based Ordering: Miscla.. 2024. 4. 14.
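As a rough sketch of size ordering, the Python below picks, among all rules that match a record, the one whose IF part tests the most features. The first rule is the example from the excerpt; the second rule, the `Rule` class, and the `classify` helper are hypothetical and exist only to create a conflict to resolve.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    conditions: dict  # IF part: feature name -> required value
    label: str        # THEN part: predicted class

    @property
    def size(self):
        # Rule size = number of features tested in the IF part.
        return len(self.conditions)

    def matches(self, record):
        return all(record.get(f) == v for f, v in self.conditions.items())


def classify(record, rules, default=None):
    """Resolve conflicts by size ordering: the largest matching rule wins."""
    matching = [r for r in rules if r.matches(record)]
    if not matching:
        return default
    return max(matching, key=lambda r: r.size).label


rules = [
    # From the excerpt: IF age = youth AND student = false THEN buys_computer = no
    Rule({"age": "youth", "student": False}, "no"),
    # Hypothetical, more general rule added to trigger a conflict.
    Rule({"age": "youth"}, "yes"),
]

non_student = {"age": "youth", "student": False}
student = {"age": "youth", "student": True}

print(classify(non_student, rules))  # both rules match; the size-2 rule wins
print(classify(student, rules))      # only the size-1 rule matches
```

Ties between rules of equal size are not handled specially here; `max` simply returns the first such rule in list order.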
10. Overfitting. We saw how a Decision Tree is learned top-down from training data. When should training stop? Suppose we keep splitting until all the data in each node share the same class. That yields a Decision Tree with 100% classification accuracy on the training data. Is that actually a good thing? No. The tree has overfit. Overfitting of Decision Tree Models: Decision Trees are prone to overfitting. Why is overfitting a problem? Too many branches are created, and.. 2024. 4. 14.
9. Decision Tree. What is a Decision Tree? A Decision Tree is a graphical representation that lays out all possible outcomes of a decision based on given conditions. An internal node represents a choice among several alternatives. A leaf node represents the final decision. In the figure, a person aged 30 or under who is not a student is predicted not to buy a computer. Algorithm Overview: the rough procedure of the Decision Tree algorithm. The tree is built top-down using divide and conquer. At the start, all training data sit at the root. The data are then partitioned recursively on the feature selected at the current step. Which feature to select at the current step is, Heuri.. 2024. 4. 13.
8. Classification. Classification means predicting the class label of given data. Classification vs Regression: how do the two differ? Classification: predicting a categorical class label for given data. A classifier is trained on training data, and new data is then fed to the model to predict the result. Ex) A model that decides whether the weather is cold or not. Regression: training a model that outputs continuous values. The model is used to predict unknown or missing values. In short, it means building a model that predicts a continuous value. Ex) A model that predicts the temperature. Classification: how is a model that performs classification.. 2024. 4. 13.
7. Association Rules. Now that frequent patterns have been extracted, we need to use them to build and evaluate association rules. Association Rules: Mining Multilevel Association, Mining Multidimensional Association, Mining Quantitative Association, Mining Interesting Correlation Patterns. Mining Multilevel Association Rules: items often form a hierarchy. For example, Milk and 2% Milk: Milk can be seen as the higher-level concept of 2% Milk. The minimum support needs to be set flexibly for each level. The lower the level, the smaller the support naturally becomes.. 2024. 4. 13.
6. Miner Improvements. So far we have looked at various methods for Frequent Pattern Mining. Apriori reduces the number of candidates, but there are still too many of them, and it also needs too many DB accesses. To address this we looked at the Improving Apriori algorithms. Even so, generating and testing candidates remains heavy work, so we also looked at FP-growth, which avoids it altogether. Here we look at other ways to improve mining. MaxMiner: Mining Max Patterns. Recall Max Pattern: if no frequent pattern exists among the supersets Y of X (X ⊂ Y), itemset X is.. 2024. 4. 13.
5. FP-growth. Frequent Pattern Growth. Until now we have seen how to do Frequent Pattern Mining with the Apriori algorithm. We also looked at the Improving Apriori methods that address Apriori's limitations: DIC, Partition, Sampling, DHP. They are still slow, however. The process of generating and testing candidates is itself time-consuming (the bottleneck). FP-growth: Mining Frequent Patterns without Candidate Generation. A method that does not generate candidates at all. Using local frequent items, it grows long patterns from short ones.. 2024. 4. 13.