HYU39

5. FP-growth Frequent Pattern Growth. So far we have seen how to do Frequent Pattern Mining with the Apriori algorithm. We also looked at the Improving Apriori methods that address Apriori's limitations: DIC, Partition, Sampling, DHP. Even so, they are still slow. Generating candidates and testing them is itself time-consuming (the bottleneck). FP-growth: Mining Frequent Patterns without Candidate Generation. A method that does not generate candidates at all. Using local frequent items, it grows long patterns from short patterns.. 2024. 4. 13.
4. Improving Apriori The Apriori algorithm was said to have several limitations. Multiple scans of DB (k times): roughly k DB scans occur, where k is the length of the max pattern. DB access is very slow, so this needs improvement. Huge number of candidates: there are too many candidates. To find the max pattern {i1, i2, ..., i100}: # of scans (k): 100, # of candidates: 2^100 - 1. Tedious workload of candidate generation and test: counting the support of the candidates is quite costly. Improving Apriori: To improve Apriori.. 2024. 4. 13.
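The candidate count quoted in the excerpt above can be checked with a short calculation. This is a minimal sketch, assuming the candidates are all non-empty subsets of a 100-item max pattern; the function names `binomial` and `countNonEmptySubsets` are illustrative and do not come from the original post.

```javascript
// Binomial coefficient C(n, k), computed with BigInt so large values stay exact.
function binomial(n, k) {
  let c = 1n;
  for (let i = 1n; i <= BigInt(k); i++) {
    // After this step c equals C(n, i), so the division is always exact.
    c = (c * (BigInt(n) - i + 1n)) / i;
  }
  return c;
}

// Sum C(n, 1) + ... + C(n, n): the number of non-empty subsets of n items.
function countNonEmptySubsets(n) {
  let total = 0n;
  for (let k = 1; k <= n; k++) {
    total += binomial(n, k);
  }
  return total;
}

console.log(countNonEmptySubsets(100) === 2n ** 100n - 1n); // true
console.log(countNonEmptySubsets(100).toString());
```

Summing the subsets of every size gives the same total as 2^100 - 1, which is why enumerating candidates for a long max pattern is impractical.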
3. Apriori One of the scalable mining methods: a method that finds frequent patterns while scaling down the search. Apriori: Candidate Generation and Test Approach. The scale-down principle in Apriori is that if a pattern is infrequent, none of its supersets can ever be frequent (the downward closure property). Those supersets therefore never need to be generated and tested, which reduces the number of patterns to check. The method in brief: 1. Scan the DB to find the frequent patterns of size 1. 2. Repeat the steps below: 2-1. Length-K Frequent Patt.. 2024. 4. 13.
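The pruning idea in the excerpt above can be sketched in a few lines. This is a minimal sketch, not the post's own code: the excerpt cuts off at step 2-1, so the join-and-prune loop follows the common textbook formulation of Apriori. The function name `apriori`, the absolute `minSupport` count, and the sample transactions are all assumptions made for illustration. Item names are assumed not to contain commas, because itemsets are keyed by joining items with ",".

```javascript
// Minimal Apriori sketch. Each transaction is an array of item strings;
// minSupport is an absolute count. All names are illustrative.
function apriori(transactions, minSupport) {
  const txSets = transactions.map((t) => new Set(t));
  const countSupport = (itemset) =>
    txSets.filter((tx) => itemset.every((item) => tx.has(item))).length;

  // Step 1: scan once to find the frequent patterns of size 1.
  const items = [...new Set(transactions.flat())].sort();
  let frequent = items
    .map((item) => [item])
    .filter((s) => countSupport(s) >= minSupport);
  const result = [];

  // Step 2: grow length-(k+1) candidates from the frequent length-k patterns.
  while (frequent.length > 0) {
    result.push(...frequent);
    const frequentKeys = new Set(frequent.map((s) => s.join(",")));
    const candidates = [];
    for (let a = 0; a < frequent.length; a++) {
      for (let b = a + 1; b < frequent.length; b++) {
        const x = frequent[a];
        const y = frequent[b];
        const k = x.length;
        // Join only patterns that share their first k-1 items.
        const samePrefix = x.slice(0, k - 1).every((item, i) => item === y[i]);
        if (!samePrefix) continue;
        const candidate = [...x, y[k - 1]];
        // Prune: a candidate with any infrequent length-k subset cannot be frequent.
        const allSubsetsFrequent = candidate.every((_, skip) =>
          frequentKeys.has(candidate.filter((_, i) => i !== skip).join(","))
        );
        if (allSubsetsFrequent) candidates.push(candidate);
      }
    }
    // Test: count support only for the candidates that survived pruning.
    frequent = candidates.filter((c) => countSupport(c) >= minSupport);
  }
  return result;
}

// Made-up sample data, used only to exercise the sketch.
const sampleTransactions = [
  ["A", "C", "D"],
  ["B", "C", "E"],
  ["A", "B", "C", "E"],
  ["B", "E"],
];
console.log(apriori(sampleTransactions, 2));
```

Because candidates with an infrequent subset are dropped before their support is counted, the expensive pass over the transactions runs only for the patterns that could still be frequent.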
2. Frequent Patterns Mining Frequent Patterns, Association and Correlations. Frequent Pattern Mining: a technique for analyzing patterns that appear frequently in data. Frequent pattern?: a pattern that occurs frequently within a dataset, for example products that are often purchased together. Motivation? To find the patterns inherent in the data. Which products are purchased together? (This will be the main running example.) Beers and Diapers: diapers and beer tend to be purchased together. Knowing this, placing diapers and beer together should raise sales. After buying a particular product, what do customers tend to buy next? After buying a digital camera, buying an SD card (memory) a while later.. 2024. 4. 13.
1. Introduction What is Data Mining? The process of automatically extracting interesting and important data from large volumes of data. Which data is interesting and important? Information that is non-trivial, implicit, previously unknown, and potentially useful. We live in an era of massive data, and data keeps accumulating, so we need to find the important meaning within it. Knowledge Discovery Process: the process of finding meaningful information in large volumes of data. Data Cleaning: the process of removing noise, errors, and the like mixed into the data. Data Warehouse: a repository where large volumes of data are stored. Task-relevant Data: the currently ongoing Task.. 2024. 4. 13.
8. Building the Database. Summary of this activity: this week I created a database for building the backend server. Before setting up the server on the server computer, I planned to do the development work locally first. However, since two of us are currently working on this, I thought it would be better to share the database, so we decided to install only the database on the server computer first. Each of us would then connect to that database from our local machines and proceed with the backend work. Database: I installed a new database on the server computer. First I considered which DB to use. Broadly, the options divide into SQL and NoSQL. SQL can mostly be thought of as relational databases, and NoSQL as the opposite, non-relational databases. The advantages of NoSQL are fast reads and large-volume.. 2024. 3. 31.
7. Planning the Web Server. Summary of this activity: this week I planned how to structure the backend server. Frontend development was set aside for now, and this week I worked on backend server development. Under normal circumstances I would have developed locally and deployed to AWS. However, I was told that the lab has a server computer that can run around the clock, so I decided to make use of it. I have little backend development experience, and operating a server on a server computer was a first for me, so I had to look into various things. Remote access: I cannot keep visiting the lab to work, so I set things up so that I can access the server computer remotely. The teaching assistant did the basic setup, and I only visited the lab to register my account. The remote desktop provided by Chrome.. 2024. 3. 20.
6. Improving Node Deletion and Edge Addition. Feedback from the previous meeting: features improved based on the feedback received at the previous meeting. When deleting a node, the user can choose whether to delete its neighbor nodes as well. The color changes when an edge is added between different clusters. Union-Find algorithm performance improvement. Applying the feedback: 1. Deciding whether to delete neighbor nodes: improved so that, when deleting a specific node, the user can choose whether to also delete the nodes adjacent to it. const handleSubmit = useCallback(() => { if (filter === "") { return; } setRerender(false); setGraphData((prev) => { let filteredNodes = prev.nodes; let filteredEdges = prev.edges; if (deleteNeighbor) { filtere.. 2024. 3. 17.
5. Adding and Deleting Vertices and Edges. Summary of this activity: this week I looked into whether Cytoscape.js provides Measure functionality, and partially implemented adding and deleting vertices and edges. Community Detection is working to some extent, so to give users information about the network, such as node degree and path length, I looked into whether it provides features for measuring the network. These features are already implemented on the Python side, since Python provides them, but computing them on the frontend and showing them right away would deliver results faster, so I checked whether Cytoscape.js offers such features. Network Analyzer: in Cytoscape, statistical information about the Network.. 2024. 3. 4.