Weight initialization scheme
DISTRIBUTION: Sample weights from a provided distribution
ZERO: Generate weights as zeros
SIGMOID_UNIFORM: A version of XAVIER_UNIFORM for sigmoid activation functions. U(-r,r) with r=4*sqrt(6/(fanIn + fanOut))
UNIFORM: Uniform U[-a,a] with a=1/sqrt(fanIn). "Commonly used heuristic" as per Glorot and Bengio 2010
XAVIER: As per Glorot and Bengio 2010: Gaussian distribution with mean 0, variance 2.0/(fanIn + fanOut)
XAVIER_UNIFORM: As per Glorot and Bengio 2010: Uniform distribution U(-s,s) with s = sqrt(6/(fanIn + fanOut))
XAVIER_FAN_IN: Similar to Xavier, but 1/fanIn -> Caffe originally used this.
XAVIER_LEGACY: Xavier weight init in DL4J up to 0.6.0. XAVIER should be preferred.
RELU: He et al. (2015), "Delving Deep into Rectifiers". Normal distribution with variance 2.0/nIn
RELU_UNIFORM: He et al. (2015), "Delving Deep into Rectifiers". Uniform distribution U(-s,s) with s = sqrt(6/fanIn)