AdaGrad - Adaptive Stochastic Gradient Method
AdaGrad adapts the update using historical gradient information, so that frequently occurring
features in the gradients receive small learning rates and infrequently occurring features receive
larger ones. The learner learns slowly from frequent features but "pays attention" to rare but
informative features. In practice, this means that infrequently occurring features can be learned
effectively alongside more frequently occurring features.
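
To make the adaptive update concrete, the following is a minimal, self-contained sketch, not this
class's actual API; the names AdaGradSketch and step are hypothetical. Each coordinate keeps a
running sum of squared gradients, and its effective learning rate is the base rate divided by the
square root of that sum, so heavily updated coordinates slow down while rare ones stay responsive.

    /** Minimal AdaGrad sketch (hypothetical names; not this library's API). */
    public final class AdaGradSketch {
        private final double learningRate;
        private final double epsilon;
        private final double[] sumSquaredGradients;

        public AdaGradSketch(int dimension, double learningRate, double epsilon) {
            this.learningRate = learningRate;
            this.epsilon = epsilon;
            this.sumSquaredGradients = new double[dimension];
        }

        /** Applies one AdaGrad step to {@code weights} in place, given the current gradient. */
        public void step(double[] weights, double[] gradient) {
            for (int i = 0; i < weights.length; i++) {
                // Accumulate the squared gradient for this coordinate.
                sumSquaredGradients[i] += gradient[i] * gradient[i];
                // Frequently updated coordinates have a large accumulator and thus a small
                // effective learning rate; rarely updated coordinates keep a large one.
                double adjustedRate = learningRate / (Math.sqrt(sumSquaredGradients[i]) + epsilon);
                weights[i] -= adjustedRate * gradient[i];
            }
        }

        public static void main(String[] args) {
            AdaGradSketch opt = new AdaGradSketch(2, 1.0, 1e-8);
            double[] w = {0.0, 0.0};
            for (int t = 0; t < 1000; t++) {
                // Gradient of f(w) = (w0 - 3)^2 + (w1 + 1)^2, minimized at (3, -1).
                double[] g = {2 * (w[0] - 3), 2 * (w[1] + 1)};
                opt.step(w, g);
            }
            System.out.printf("w = (%.3f, %.3f)%n", w[0], w[1]); // approaches (3, -1)
        }
    }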
A good literature reference is: Duchi, John, Elad Hazan, and Yoram Singer.
"Adaptive subgradient methods for online learning and stochastic optimization." Journal of
Machine Learning Research 12 (2011): 2121-2159. http://www.magicbroom.info/Papers/DuchiHaSi10.pdf