An implementation of the training methods of a BalancedWinnow
on-line classifier. Given a labeled instance (x, y), the algorithm
computes dot(x, w_k) for each of w_1, ..., w_c, where w_k is the weight
vector for class k. The instance is classified as class j
if dot(x, w_j) is the largest among the c dot products.
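The classification step is an argmax over per-class dot products. The sketch below shows one way to express it. It assumes a hypothetical sparse representation (an instance `x` is a dict mapping feature index to value, and each class weight vector is a dense list of floats). The names `dot` and `classify` are illustrative, not part of any library API.

```python
from typing import Dict, List

# Hypothetical sparse instance: feature index -> feature value.
SparseVector = Dict[int, float]


def dot(x: SparseVector, w: List[float]) -> float:
    """Dot product of a sparse instance with a dense weight vector."""
    return sum(value * w[i] for i, value in x.items())


def classify(x: SparseVector, weights: List[List[float]]) -> int:
    """Return the index of the class whose weight vector has the largest dot product with x."""
    scores = [dot(x, w) for w in weights]
    # max() returns the first index among tied maxima.
    return max(range(len(scores)), key=scores.__getitem__)


weights = [[1.0, 1.0, 1.0], [1.0, 2.0, 1.0], [0.5, 0.5, 3.0]]
print(classify({1: 1.0}, weights))  # scores are 1.0, 2.0, 0.5
```

Ties go to the lowest class index here. The source does not say how ties are broken, so that choice is an assumption of the sketch.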
The weight vectors are updated whenever the classifier
makes a mistake or only barely gets the correct answer (the highest
dot product exceeds the second highest by less than delta percent).
Suppose the classifier guessed j and the correct answer was j'. For each
feature i that is present in x, multiply w_ji (the weight of feature i
in w_j) by (1-epsilon) and multiply w_j'i by (1+epsilon).
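A single update step might look like the following minimal sketch. It makes several assumptions that the description does not pin down:

- `delta` is given in percent, so the margin test is `best >= second * (1 + delta / 100)`.
- Scores are nonnegative, as with positive Winnow weights and nonnegative feature values.
- There are at least two classes.
- When the guess is correct but the margin is too small (so j equals j'), the runner-up class is the one demoted.

The function name `update` is hypothetical.

```python
from typing import Dict, List

SparseVector = Dict[int, float]  # hypothetical: feature index -> value


def dot(x: SparseVector, w: List[float]) -> float:
    return sum(value * w[i] for i, value in x.items())


def update(x: SparseVector, y: int, weights: List[List[float]],
           epsilon: float, delta: float) -> bool:
    """Apply one multiplicative update for labeled instance (x, y).

    Returns True if any weights changed. Assumes at least two classes.
    """
    scores = [dot(x, w) for w in weights]
    ranked = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    best, second = ranked[0], ranked[1]
    # No update when correct with a clear margin (delta treated as a percentage).
    if best == y and scores[best] >= scores[second] * (1 + delta / 100.0):
        return False
    # Demote the wrong guess, or (assumption) the runner-up when barely correct.
    j = best if best != y else second
    for i in x:  # only features present in x are touched
        weights[j][i] *= 1 - epsilon
        weights[y][i] *= 1 + epsilon
    return True
```

The update ignores the feature's value and uses only its presence, matching the description. Applying the update twice to the same instance typically stops once the margin is large enough.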
The above procedure is repeated over the training examples
for several iterations (default is 5), and epsilon is multiplied by
the cooling rate after each iteration (by default, epsilon is halved).
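Putting the pieces together, a training loop with cooling could be sketched as below. The defaults `iterations=5` and `cooling_rate=0.5` follow the description. Several other choices are assumptions not stated in the source:

- The starting values `epsilon=0.5` and `delta=10.0` (read as a percentage).
- Initial weights of 1.0.
- Demoting the runner-up when the classifier is barely correct.

The function `train` and its parameters are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

SparseVector = Dict[int, float]  # hypothetical: feature index -> value


def dot(x: SparseVector, w: List[float]) -> float:
    return sum(value * w[i] for i, value in x.items())


def train(data: Sequence[Tuple[SparseVector, int]], num_features: int,
          num_classes: int, epsilon: float = 0.5, delta: float = 10.0,
          iterations: int = 5, cooling_rate: float = 0.5) -> List[List[float]]:
    """Multiplicative-update training with epsilon cooling. Assumes num_classes >= 2."""
    weights = [[1.0] * num_features for _ in range(num_classes)]  # assumed init
    for _ in range(iterations):
        for x, y in data:
            scores = [dot(x, w) for w in weights]
            ranked = sorted(range(num_classes), key=lambda k: scores[k], reverse=True)
            best, second = ranked[0], ranked[1]
            # Skip when correct and ahead of the runner-up by at least delta percent.
            if best == y and scores[best] >= scores[second] * (1 + delta / 100.0):
                continue
            j = best if best != y else second  # class to demote
            for i in x:
                weights[j][i] *= 1 - epsilon
                weights[y][i] *= 1 + epsilon
        epsilon *= cooling_rate  # cool epsilon after each pass over the data
    return weights


data = [({0: 1.0}, 0), ({1: 1.0}, 1)]
print(train(data, num_features=2, num_classes=2))
```

On this toy data, the first pass separates the two classes. Later passes make no further updates because both instances clear the margin.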