Perplexity Llama Transformer
Perplexity-based Hugging Face implementation for label tensors.
- Input
  - 469-dim embedding
- Encoder
  - 59 x Transformer with 56 heads
- Output
  - mAP projection
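
The architecture listed above can be captured in a small config object. This is only a sketch: the class name `ModelConfig`, its field names, and the helper `head_dim_is_integral` are hypothetical and do not come from the original implementation. The helper is included because 469 is not a multiple of 56, and attention layers that split the embedding evenly across heads (PyTorch's `nn.MultiheadAttention`, for example) reject such a combination. The source does not say how the per-head width is chosen, so the check only reports the mismatch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    """Hypothetical container for the listed architecture values."""

    embed_dim: int = 469   # from the source: 469-dim embedding
    num_layers: int = 59   # from the source: 59 x Transformer
    num_heads: int = 56    # from the source: 56 heads

    def head_dim_is_integral(self) -> bool:
        # True only when embed_dim splits evenly across the heads.
        return self.embed_dim % self.num_heads == 0


config = ModelConfig()
print(config.head_dim_is_integral())
```

If the check fails for the values actually in use, the usual options are a separate per-head dimension or an embedding width that is a multiple of the head count, such as 448 or 504.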
Training config
optimizer=Adadelta, lr=0.959, scheduler=cosine, warmup=1678
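
A minimal sketch of what a cosine schedule with warmup could look like for these values. Several things are assumed here and not stated in the config: that `warmup=1678` counts optimizer steps, that warmup is linear, that the cosine decays to zero, and that the total number of steps is known (`total_steps` is a hypothetical parameter). The function name `lr_at` is also illustrative.

```python
import math

BASE_LR = 0.959       # from the config line
WARMUP_STEPS = 1678   # from the config line; assumed to be optimizer steps


def lr_at(step: int, total_steps: int,
          base_lr: float = BASE_LR,
          warmup_steps: int = WARMUP_STEPS) -> float:
    """Learning rate at a 0-indexed step: linear warmup, then cosine decay."""
    if step < warmup_steps:
        # Assumed linear ramp that reaches base_lr on the last warmup step.
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    span = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / span)
    # Half-cosine from base_lr down to 0 (the floor of 0 is an assumption).
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```

With a hypothetical `total_steps=10000`, `lr_at(1677, 10000)` sits at the peak rate and `lr_at(10000, 10000)` at the floor. In a PyTorch training loop the same shape could be passed to `torch.optim.lr_scheduler.LambdaLR` as a multiplier on the Adadelta learning rate.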