组件优化

"组件优化"

Posted by zwt on March 4, 2024

RMSNorm

layerNorm计算如下: \(\begin{aligned} &a_i=\sum_{j=1}^m w_{i j} x_j, \quad y_i=f\left(a_i+b_i\right),\\ &\bar{a}_i=\frac{a_i-\mu}{\sigma} g_i, \quad y_i=f\left(\bar{a}_i+b_i\right),\\ &\mu=\frac{1}{n} \sum_{i=1}^n a_i, \quad \sigma=\sqrt{\frac{1}{n} \sum_{i=1}^n\left(a_i-\mu\right)^2} . \end{aligned}\) 第一行表示未经过归一化。第二行表示经过layerNorm,第三行表示均值和方差的计算。 rmsnorm计算如下: \(\bar{a}_i=\frac{a_i}{\operatorname{RMS}(\mathbf{a})} g_i, \quad \text { where } \operatorname{RMS}(\mathbf{a})=\sqrt{\frac{1}{n} \sum_{i=1}^n a_i^2} .\)

AdamW

SwiGLU

causal multi-head attention

all_reduce

a cosine learning rate schedule