Skip to main content

Pretrained Model

Large model published in huggingface model hub.

Training Details

  • Sentencepiece, Word2Vec, Fasttext, GloVe model trained with Bengali Wikipedia Dump Dataset
  • SentencePiece Training Vocab Size=50000
  • Fasttext trained with total words = 20M, vocab size = 1171011, epoch=50, embedding dimension = 300 and the training loss = 0.318668,
  • Word2Vec word embedding dimension = 100, min_count=5, window=5, epochs=10
  • To Know Bengali GloVe Wordvector and training process follow this repository
  • Bengali CRF POS Tagging was training with nltr dataset with 80% accuracy.
  • Bengali CRF NER Tagging was train with this data with 90% accuracy.
  • Bengali news article doc2vec model train with 8 jsons of this corpus with epochs 40 vector size 100 min_count=2, total news article 400013
  • Bengali wikipedia doc2vec model trained with wikipedia dump datasets. Total articles 110448, epochs: 40, vector_size: 100, min_count: 2