
All posts (207)

7. Association Rules: Now that the frequent patterns have been extracted, we need to use them to build and evaluate association rules. Association Rules Mining: Multilevel Association Mining, Multidimensional Association Mining, Quantitative Association Mining, Interesting Correlation Patterns. Mining Multilevel Association Rules: Items often form a hierarchy. Take Milk and 2% Milk, for example: Milk can be seen as the parent concept of 2% Milk. The minimum support needs to be set flexibly for each level. As we move down the hierarchy, support naturally gets smaller.. 2024. 4. 13.
6. Miner Improvements: So far we have looked at various methods for frequent pattern mining. Apriori does reduce the number of candidates, but there are still too many of them, and too many DB accesses as well. So we looked at the Improving Apriori algorithms that address this. Even so, generating and testing candidates remains heavy work, so we also looked at FP-growth, a method that does not need to do it. Beyond these, we will now look at other ways to improve mining. MaxMiner: Mining Max Patterns. Recall Max Pattern: if no frequent pattern exists among the supersets Y of X (X ⊂ Y), itemset X is.. 2024. 4. 13.
5. FP-growth (Frequent Pattern Growth): Until now we have seen how to do frequent pattern mining with the Apriori algorithm. We also looked at the Improving Apriori methods that address Apriori's limitations: DIC, Partition, Sampling, DHP. Even so, the problem remains that it is slow: the process of generating and testing candidates is itself time-consuming (the bottleneck). FP-growth: Mining Frequent Patterns without Candidate Generation. A method that does not generate candidates at all. It uses local frequent items to grow long patterns from short ones.. 2024. 4. 13.
4. Improving Apriori: We said that the Apriori algorithm has several limitations. Multiple scans of DB (k times): roughly k DB scans occur, where k is the length of the max pattern. DB access is very slow, so this needs improvement. Huge number of candidates: there are too many candidates. To find the max pattern {i1, i2, ..., i100}: # of scans (k): 100, # of candidates: 2^100 - 1. Tedious workload of candidate generation and test: counting the support of the candidates is quite costly. Improving Apriori: To improve Apriori.. 2024. 4. 13.
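The 2^100 - 1 figure in the excerpt above can be checked with a little arithmetic. If {i1, ..., i100} is frequent, every non-empty subset of it is frequent too, so a level-wise miner has to generate all of them as candidates. The sketch below sums the number of k-item subsets for k = 1..100; the function name `count_nonempty_subsets` is made up for illustration and does not come from the post.

```python
from math import comb


def count_nonempty_subsets(n: int) -> int:
    """Number of non-empty subsets of an n-item set, summed level by level.

    Level k contributes C(n, k) candidate itemsets of length k.
    """
    return sum(comb(n, k) for k in range(1, n + 1))


if __name__ == "__main__":
    n = 100
    total = count_nonempty_subsets(n)
    # The level-wise sum equals the closed form 2^n - 1.
    print(total == 2**n - 1)
```

The sum has one term per level, which matches the 100 scans mentioned in the excerpt: one DB scan per pattern length.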
3. Apriori: One of the scalable mining methods, i.e. a method that finds frequent patterns while scaling down the search. Apriori: Candidate Generation and Test Approach. Apriori scales down by using the fact that if a pattern is infrequent, none of its supersets can ever be frequent (the downward closure property). Such supersets therefore do not need to be generated and tested, which reduces the number of patterns to check. The method in brief: 1. Scan the DB to find the frequent patterns of size 1. 2. Repeat the steps below. 2-1. Frequent Patt.. of length K 2024. 4. 13.
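Since the excerpt above is cut off partway through the steps, here is a minimal sketch of the generic textbook form of Apriori (generate candidates, prune by downward closure, test against the DB). It is not necessarily the exact procedure of the post. The function name `apriori`, the absolute `min_support` count, and the list-of-sets transaction format are assumptions made for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset.

    transactions: iterable of item collections (assumed format).
    min_support: absolute count threshold (assumed convention).
    """
    transactions = [frozenset(t) for t in transactions]

    # Step 1: one DB scan to find the frequent patterns of size 1.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 1
    while frequent:
        prev = set(frequent)
        # Generate: join frequent k-itemsets into (k+1)-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k + 1}
        # Prune: a candidate with any infrequent k-subset cannot be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k))
        }
        # Test: one DB scan to count the support of the survivors.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


if __name__ == "__main__":
    db = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
    for itemset, support in sorted(apriori(db, 2).items(),
                                   key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), support)
```

In the small example, {A, B} is infrequent, so the candidate {A, B, C} is pruned without ever being counted against the DB. This is the saving that the downward closure property buys.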
2. Frequent Patterns: Mining Frequent Patterns, Association and Correlations. Frequent Pattern Mining: a technique for analyzing patterns that appear frequently in data. Frequent pattern?: a pattern that appears frequently in a dataset, for example products that are often bought together. Motivation?: to find the patterns inherent in the data. Which products are bought together? (This will be the main running example.) Beers and Diapers: diapers and beer tend to be bought together, so knowing this and placing them together should raise sales. After buying a particular product, what do people tend to buy next? After buying a digital camera, buying an SD card (memory) a while later.. 2024. 4. 13.
1. Introduction: What is Data Mining? The process of automatically extracting interesting and important data from a large volume of data. What data is interesting and important? Information that is non-trivial, implicit, previously unknown, and potentially useful. We now live in an era of massive data, and data keeps piling up, so we have to find the important meaning within it. Knowledge Discovery Process: the process of finding meaningful information in a large volume of data. Data Cleaning: removing noise, errors, and the like mixed into the data. Data Warehouse: a repository where large volumes of data are stored. Task-relevant Data: the Task currently being carried out.. 2024. 4. 13.
Redis? The image that usually comes to mind with Redis is a key-value store that acts as a cache, holding part of the DB's data because accessing the DB is slow. I knew roughly that much but not the details, so this time I want to write it up properly. Recent issue: Redis was originally open source, but its license was recently changed and it is no longer open source. Fortunately(?), some of the existing developers forked it and created an open-source project called Valkey. Valkey is currently managed by the Linux Foundation, and many users are already using it. Because it is a fork, there is so far no major difference in usage. Redis? What is Redis: In-Memory Cache.. 2024. 4. 13.
8. Building the Database: Summary of this activity: This week we created a database for building the backend server. Before setting up the server on the server machine, we planned to do development work locally first. Still, since two of us are currently working together, we thought it would be good to share the database, so we decided to install only the database on the server machine first. Each of us would then connect to that database from a local environment to do the backend work. Database: We installed a new database on the server machine. First we considered which DB to use. Broadly, the options split into SQL and NoSQL. SQL can mostly be thought of as relational databases, and NoSQL as the opposite, non-relational databases. The advantages of NoSQL are fast reads, large-volume.. 2024. 3. 31.