Pretrained Model

Download Links

Large models are published on the Hugging Face model hub.

SentencePiece, Word2Vec, fastText, and GloVe models were trained on the Bengali Wikipedia dump dataset:
- Bengali Wiki Dump
SentencePiece was trained with vocab size = 50000.
fastText was trained on 20M total words with vocab size = 1171011, epochs = 50, and embedding dimension = 300; the training loss was 0.318668.
Word2Vec was trained with word embedding dimension = 100, min_count = 5, window = 5, epochs = 10.
For details on the Bengali GloVe word vectors and their training process, see this repository.
The Bengali CRF POS tagger was trained on the nltr dataset with 80% accuracy.
The Bengali CRF NER tagger was trained on this data with 90% accuracy.
The Bengali news article doc2vec model was trained on 8 JSON files of this corpus (400013 news articles in total) with epochs: 40, vector_size: 100, min_count: 2.
The Bengali Wikipedia doc2vec model was trained on the Wikipedia dump dataset. Total articles: 110448, epochs: 40, vector_size: 100, min_count: 2.
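
The hyperparameters listed above could be reproduced roughly as follows. This is a minimal sketch, not the project's actual training script: it assumes the `sentencepiece`, `fasttext`, and `gensim` (4.0 or later) Python packages, which this section does not confirm were the tools used. The function names and the `corpus_path`, `model_prefix`, `sentences`, and `token_lists` arguments are hypothetical, and every setting not listed above is left at the library default.

```python
# Hyperparameters copied from the training details above.
# Anything not listed there is left at the library default.
SENTENCEPIECE_PARAMS = {"vocab_size": 50000}
FASTTEXT_PARAMS = {"dim": 300, "epoch": 50}
WORD2VEC_PARAMS = {"vector_size": 100, "min_count": 5, "window": 5, "epochs": 10}
DOC2VEC_PARAMS = {"vector_size": 100, "min_count": 2, "epochs": 40}


def train_sentencepiece(corpus_path, model_prefix):
    """Train a SentencePiece model on a plain-text file (one sentence per line)."""
    import sentencepiece as spm  # assumption: the sentencepiece package

    # Writes <model_prefix>.model and <model_prefix>.vocab to disk.
    spm.SentencePieceTrainer.train(
        input=corpus_path, model_prefix=model_prefix, **SENTENCEPIECE_PARAMS
    )


def train_fasttext(corpus_path):
    """Train unsupervised fastText vectors on a plain-text file."""
    import fasttext  # assumption: the official fastText Python bindings

    return fasttext.train_unsupervised(corpus_path, **FASTTEXT_PARAMS)


def train_word2vec(sentences):
    """Train Word2Vec on an iterable of token lists."""
    from gensim.models import Word2Vec  # assumption: gensim >= 4.0

    return Word2Vec(sentences=sentences, **WORD2VEC_PARAMS)


def train_doc2vec(token_lists):
    """Train Doc2Vec, tagging each document with its position in the input."""
    from gensim.models.doc2vec import Doc2Vec, TaggedDocument

    documents = [
        TaggedDocument(words=tokens, tags=[index])
        for index, tokens in enumerate(token_lists)
    ]
    return Doc2Vec(documents=documents, **DOC2VEC_PARAMS)
```

Each library is imported inside its own function, so only the package for the model you actually train needs to be installed. The same `DOC2VEC_PARAMS` apply to both the news article and the Wikipedia doc2vec models, since the two share the listed settings.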