Summary 🤙


โ€œ๋ชจ๋ธ์˜ ์„ฑ๋Šฅ์ด ์ข‹๋‹คโ€๋ผ๋Š” ๊ธฐ์ค€๋“ค์„ ์•Œ์•„๋ณด๊ณ  ์–ด๋–ป๊ฒŒ ํ–ฅ์ƒ์‹œํ‚ฌ ์ˆ˜ ์žˆ๋Š”์ง€ ์•Œ์•„๋ณธ๋‹ค.

Note, however, that whatever the method, making any use of the test data is cheating. Keep that in mind.



Index 👀
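
Generalization
Underfitting & Overfitting
Cross-validation
Bias-Variance tradeoff
Bootstrapping
Bagging & Boosting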



Generalization


Generalization refers to the gap between training error and test error, and the goal is to reduce this gap.
It indicates how well the weights obtained from the training data carry over to unseen data (test data).
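
Written out, the gap is simply the difference between the two errors:

$\text{Generalization gap} = \text{Test Error} - \text{Training Error}$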



Underfitting & Overfitting


Underfitting means the parameters are only loosely fitted to the training data, so they cannot be considered adequate parameters.
Overfitting means the model is excessively optimized to the training data, which can widen the generalization gap mentioned above.
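
As a quick illustration (not from the original note), here is a minimal numpy sketch: polynomials of increasing degree are fitted to noisy samples of a sine curve, and the training error keeps shrinking while the test error grows once the model overfits. The sample size, noise level, and degrees are arbitrary choices.

```python
# Under/overfitting sketch: polynomial fits of increasing degree to noisy sine data.
# Sample size, noise level, and the degrees tried are illustrative choices.
import numpy as np

rng = np.random.default_rng(0)

def noisy_sine(n=20, noise=0.2):
    x = np.sort(rng.uniform(0, 1, n))
    y = np.sin(2 * np.pi * x) + rng.normal(0, noise, n)
    return x, y

x_train, y_train = noisy_sine()
x_test, y_test = noisy_sine()

for degree in (1, 4, 9):  # underfit / reasonable / overfit
    coef = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coef, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coef, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```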



Cross-validation


It is also called K-fold cross-validation.
Split the dataset into k equal-sized groups (folds), then use 1 fold as the validation set (never the test set!) and the remaining k-1 folds as the training set.

For example, split a dataset of 100 samples into 10 folds. For model A, use folds 1–9 as the training set and fold 10 as the validation set; for models B and C, validate on different folds. Comparing the resulting scores lets you choose the most suitable model.
It is mainly used to evaluate and compare model choices such as hyperparameters (learning rate, …).
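
A minimal sketch of the 10-fold setup above, assuming scikit-learn is available; the dataset and the logistic-regression model are placeholders standing in for models A, B, C:

```python
# 10-fold cross-validation sketch for the 100-sample example above.
# scikit-learn is assumed available; the dataset and the logistic-regression
# model are placeholders standing in for models A, B, C.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = make_classification(n_samples=100, random_state=0)  # stand-in dataset

kf = KFold(n_splits=10, shuffle=True, random_state=0)
scores = []
for fold, (train_idx, val_idx) in enumerate(kf.split(X), start=1):
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx], y[train_idx])               # train on 9 folds
    scores.append(model.score(X[val_idx], y[val_idx]))  # validate on the held-out fold
    print(f"fold {fold:2d}: val accuracy {scores[-1]:.2f}")

print(f"mean val accuracy: {np.mean(scores):.2f}")  # compare this number across models
```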



Bias-Variance tradeoff


When the given data contains noise, the following relationship makes it hard to find a model that reduces both bias and variance at the same time.

$\text{Cost} = \text{Bias}^2 + \text{Variance} + \text{Noise}$

Here $t = f + \epsilon$ is the noisy target, $f$ the true function, $\hat{f}$ the model's estimate, and $\epsilon$ zero-mean noise:

$\begin{aligned} \mathbb{E}\left[(t-\hat{f})^{2}\right] &=\mathbb{E}\left[(t-f+f-\hat{f})^{2}\right] \\ &=\cdots \\ &=\mathbb{E}\left[\left(f-\mathbb{E}[\hat{f}]\right)^{2}\right]+\mathbb{E}\left[\left(\mathbb{E}[\hat{f}]-\hat{f}\right)^{2}\right]+\mathbb{E}\left[\epsilon^{2}\right] \end{aligned}$
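
For reference, the elided step can be expanded as follows, under the usual assumptions (not stated in the original note) that $\mathbb{E}[\epsilon]=0$ and $\epsilon$ is independent of $\hat{f}$:

$\begin{aligned} \mathbb{E}\left[(t-\hat{f})^{2}\right] &=\mathbb{E}\left[(f+\epsilon-\hat{f})^{2}\right] \\ &=\mathbb{E}\left[(f-\hat{f})^{2}\right]+2\,\mathbb{E}\left[\epsilon\,(f-\hat{f})\right]+\mathbb{E}\left[\epsilon^{2}\right] \\ &=\mathbb{E}\left[\left(f-\mathbb{E}[\hat{f}]\right)^{2}\right]+\mathbb{E}\left[\left(\mathbb{E}[\hat{f}]-\hat{f}\right)^{2}\right]+\mathbb{E}\left[\epsilon^{2}\right] \end{aligned}$

The cross terms vanish: $\mathbb{E}[\epsilon(f-\hat{f})]=0$ because $\epsilon$ is zero-mean and independent of $\hat{f}$, and the cross term hidden in the last step vanishes because $\mathbb{E}[\hat{f}-\mathbb{E}[\hat{f}]]=0$.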



Bootstrapping


ํ•™์Šต๋ฐ์ดํ„ฐ๊ฐ€ ์žˆ์„ ๋•Œ sub sampling์„ ํ†ตํ•ด ๋ฐ์ดํ„ฐ๋ฅผ ๋‚˜๋ˆ„๋Š” ๊ฒƒ์ด๋‹ค.



Bagging & Boosting


Bagging (bootstrap aggregating): bootstrap the data and train several models on the resamples, each independently.
Discrete predictions are combined by voting; continuous outputs are combined by averaging (see the sketch below).

Related: Random Forest
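
A short sketch of bagging, assuming scikit-learn is installed (the dataset and n_estimators are placeholders); BaggingClassifier trains each base model on a bootstrap resample and aggregates predictions by majority vote:

```python
# Bagging sketch: independent models on bootstrap resamples, aggregated by vote.
# scikit-learn is assumed available; the dataset and n_estimators are placeholders.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# The default base estimator is a decision tree; bootstrap=True resamples with replacement.
bag = BaggingClassifier(n_estimators=10, bootstrap=True, random_state=0)
bag.fit(X_tr, y_tr)
print(f"bagging test accuracy: {bag.score(X_te, y_te):.2f}")
```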

Boosting: like bagging, except the models are not independent; they are arranged sequentially, so that many weak learners, each focusing on what the previous ones got wrong, combine into a single strong learner.
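
And a matching sketch of boosting, again assuming scikit-learn; AdaBoost is just one concrete boosting algorithm (the original note does not name one):

```python
# Boosting sketch: weak learners trained sequentially, each reweighting the
# examples the previous ones got wrong. AdaBoost is one concrete choice of
# boosting algorithm; the dataset and n_estimators are placeholders.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# The default weak learner is a depth-1 decision tree (a "stump").
boost = AdaBoostClassifier(n_estimators=50, random_state=0)
boost.fit(X_tr, y_tr)
print(f"boosting test accuracy: {boost.score(X_te, y_te):.2f}")
```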