Summary 🤙


Generalization์„ ๋†’์ด๊ธฐ ์œ„ํ•ด ํ•™์Šต์„ ๋ฐฉํ•ดํ•˜๋Š” ๋ฐฉ๋ฒ•์ด๋‹ค. ๋‹ค์‹œ ๋งํ•ด test set์—์„œ ์ž˜ ๋™์ž‘ํ•˜๊ฒŒ ํ•˜๊ธฐ ์œ„ํ•œ ๋ฐฉ๋ฒ•์ด๋‹ค.



Index 👀


Early Stopping


์ถ”๊ฐ€์ ์ธ Validation Data๋ฅผ ๊ตฌ์„ฑํ•˜๊ณ , Training Error์™€ Validation Error์˜ ์ฐจ์ด๊ฐ€ ์ฆ๊ฐ€ํ•  ๋•Œ ํ•™์Šต์„ ๋ฉˆ์ถ”๋Š” ๋ฐฉ๋ฒ•์ด๋‹ค.



Parameter Norm Penalty


\[\text { total } \operatorname{cost}=\operatorname{loss}(\mathcal{D} ; W)+\frac{\alpha}{2}\|W\|_{2}^{2}\]

Weight Parameter์˜ ํฌ๊ธฐ(์ ˆ๋Œ€๊ฐ’)๋ฅผ ์ •๊ทœํ™”ํ•˜์—ฌ ์ž‘๊ฒŒ ๋งŒ๋“œ๋Š” ๊ฒƒ์ด๋‹ค.
๋ถ€๋“œ๋Ÿฌ์šด ํ•จ์ˆ˜๊ฐ€ Generalization ์„ฑ๋Šฅ์ด ๋†’์„ ๊ฒƒ์ด๋ผ๋Š” ๊ฐ€์ •์—์„œ ์ถœ๋ฐœํ•œ๋‹ค.
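The total cost above can be computed directly. This is a sketch with NumPy; the function name and the example values of `alpha` and `W` are hypothetical.

```python
import numpy as np

# L2 (weight-decay) penalty added to a base loss, as in the formula above:
# total cost = loss + (alpha / 2) * ||W||_2^2

def total_cost(base_loss, W, alpha=0.1):
    return base_loss + 0.5 * alpha * np.sum(W ** 2)

W = np.array([1.0, -2.0, 3.0])          # ||W||_2^2 = 1 + 4 + 9 = 14
print(total_cost(1.0, W, alpha=0.1))    # 1.0 + 0.05 * 14 = 1.7
```

Gradient descent on this cost shrinks every weight toward zero a little at each step, which is why the same idea appears in optimizers as "weight decay".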



Data Augmentation


๋ฐ์ดํ„ฐ๊ฐ€ ๋งŽ์„ ์ˆ˜๋ก ์„ฑ๋Šฅ์ด ๋†’์•„์ง„๋‹ค. ๋‹จ, ๋ฐ์ดํ„ฐ๊ฐ€ ํ•œ์ •์ ์ธ ๊ฒฝ์šฐ์— Label์ด ๋ณ€ํ•˜์ง€ ์•Š๋Š” ์„ ์—์„œ ๋ฐ์ดํ„ฐ๋ฅผ ๋ณ€ํ˜•ํ•˜์—ฌ ๋ฐ์ดํ„ฐ ์…‹์„ ์ฆ๊ฐ€์‹œํ‚ฌ ์ˆ˜ ์žˆ๋‹ค.



Noise Robustness


์ž…๋ ฅ์ด๋‚˜ Weight์— Noise๋ฅผ ์ค‘๊ฐ„์— ์ง€์†์ ์œผ๋กœ ๋„ฃ์–ด์ฃผ๋ฉด ๋” ํ•™์Šต๋ฅ ์ด ๋†’์•„์งˆ ์ˆ˜ ์žˆ๋‹ค. (์ด์œ ๋Š” ์•„์ง ๋ชจ๋ฆ„)



Label smoothing


  • Mixup: blend two images together and blend their labels in the same proportion.
  • Cutout: remove a random region from an image.
  • CutMix: cut a region out of one image and paste in a patch from another, mixing the labels accordingly.
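The first of these can be sketched directly: Mixup takes a convex combination of two images and of their one-hot labels with the same coefficient. The `lam` value is fixed here for illustration; in practice it is typically drawn from a Beta distribution.

```python
import numpy as np

# Mixup sketch: convex-combine two images and their one-hot labels
# with the same coefficient lam.

def mixup(x1, y1, x2, y2, lam=0.7):
    x = lam * x1 + (1 - lam) * x2
    y = lam * y1 + (1 - lam) * y2
    return x, y

x1, y1 = np.full((2, 2), 1.0), np.array([1.0, 0.0])  # class-0 image
x2, y2 = np.full((2, 2), 0.0), np.array([0.0, 1.0])  # class-1 image
x, y = mixup(x1, y1, x2, y2, lam=0.7)
print(y)  # [0.7 0.3]
```

The mixed label `[0.7, 0.3]` is a softened target rather than a hard one-hot vector, which is the connection to label smoothing.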



Dropout


weight์˜ ์ผ๋ถ€ ํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ 0์œผ๋กœ ์ดˆ๊ธฐํ™” ํ•˜์—ฌ ํŠน์ • feature์— ๊ตญํ•œ๋˜์ง€ ์•Š๋„๋ก ํ•œ๋‹ค.



Batch Normalization


\[\begin{aligned} \mu_{B} &=\frac{1}{m} \sum_{i=1}^{m} x_{i} \\ \sigma_{B}^{2} &=\frac{1}{m} \sum_{i=1}^{m}\left(x_{i}-\mu_{B}\right)^{2} \\ \hat{x}_{i} &=\frac{x_{i}-\mu_{B}}{\sqrt{\sigma_{B}^{2}+\epsilon}} \end{aligned}\]

layer์˜ weight ํŒŒ๋ผ๋ฏธํ„ฐ๋ฅผ ์ •๊ทœํ™”ํ•œ๋‹ค. (๋…ผ๋ž€์ด ๋งŽ์œผ๋‚˜ ๋Œ€๋ถ€๋ถ„์˜ ๊ฒฝ์šฐ์— ์„ฑ๋Šฅ์ด ํ–ฅ์ƒ๋œ๋‹ค.)