Transformers Resources
Papers
- Attention Is All You Need (Vaswani 2017); see the attention sketch after this list
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (Devlin 2018)
- GPT-2: Language Models are Unsupervised Multitask Learners (Radford 2019)
- GPT-3: Language Models are Few-Shot Learners (Brown 2020)
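
For quick reference, a minimal sketch of the scaled dot-product attention from Attention Is All You Need, written in plain PyTorch. The function name and tensor shapes are illustrative and not taken from any of the repositories listed below.

```python
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V (Vaswani 2017)."""
    d_k = q.size(-1)
    # Similarity of every query with every key, scaled by sqrt(d_k)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        # Masked positions (e.g. padding or future tokens) get -inf before softmax
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)  # attention weights sum to 1 over keys
    return weights @ v                       # weighted sum of value vectors

# Example: batch of 2 sequences, 4 tokens, dimension 8
q = k = v = torch.randn(2, 4, 8)
out = scaled_dot_product_attention(q, k, v)  # shape (2, 4, 8)
```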
Code
- Attention implementation by sooftware
- Notebook on the Transformers library by Leis et al.
- minGPT by Karpathy
- Transformers library (Hugging Face); see the usage sketch after this list
- Transformers in PyTorch
- gpt-2-simple Python package
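
A minimal usage sketch for the Hugging Face Transformers library listed above, assuming `transformers` and a backend such as PyTorch are installed. The "gpt2" model name is one publicly available checkpoint, chosen here only for illustration.

```python
from transformers import pipeline

# Text-generation pipeline backed by the GPT-2 checkpoint from the model hub
generator = pipeline("text-generation", model="gpt2")

# Generate a short continuation; max_new_tokens caps the number of added tokens
result = generator("Transformers are", max_new_tokens=20)
print(result[0]["generated_text"])
```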