mySentence

The corpus and models for Burmese (Myanmar language) Sentence Tokenization

Stars: 5
Forks: 1
Language: Python
Last Updated: Jan 17, 2024

Similar Repos

Repo	Language	Stars	Description	Updated At
myNER	None	2	Named Entity Recognition (NER) corpus for Burmese (Myanmar language)	Feb 13, 2023
myParaphrase	Jupyter Notebook	4	Paraphrase Dataset for Burmese (Myanmar Language)	Apr 15, 2023
myWord	Python	43	syllable, word and phrase segmenter for Burmese (Myanmar language)	Apr 22, 2023
sylbreak	HTML	47	Syllable segmentation tool for Myanmar language (Burmese) by Ye.	Apr 12, 2023
MSL4Emergency	None	7	Myanmar Sign Language Corpus for Emergency Domain	Aug 26, 2022
myRoman	None	6	Burmese Romanization Corpora (for both word & sentence)	Feb 14, 2023
myanmar-text-extractor-js	TypeScript	5	Burmese language (Myanmar text) extractor JavaScript library for word segmentation, text extraction or syllable break.	Jun 08, 2022
myPOS	Python	57	myPOS (Myanmar Part-of-Speech) Corpus for Myanmar NLP Research and Developments	Apr 24, 2023
myG2P	Perl	46	Myanmar (Burmese) Language Grapheme to Phoneme (myG2P) Conversion Dictionary for speech recognition (ASR) and speech …	Feb 12, 2023
myanmar-sort	Ruby	4	Big module for NodeJS to sort any Myanmar/Burmese text	May 14, 2021
langumo	Python	8	The unified corpus building environment for Language Models.	Jan 13, 2022
muse-as-service	Python	47	REST API for sentence tokenization and embedding using Multilingual Universal Sentence Encoder.	Sep 05, 2022
syntok	Python	168	Text tokenization and sentence segmentation (segtok v2)	Jul 31, 2022
CADSR-LM	Shell	2	Language models baseline Kaldi script for TORGO corpus	Apr 21, 2022
NLP-Cube	Python	486	Natural Language Processing Pipeline - Sentence Splitting, Tokenization, Lemmatization, Part-of-speech Tagging and Dependency Parsing	Aug 06, 2022
myth	Shell	7	Myanmar and Thai Language Resources	Aug 01, 2022
blbooks-lms	Python	2	Pretrained Language Models on British Library Corpus	Aug 07, 2023
sscorpus	None	11	A monolingual parallel corpus for sentence simplification	Jun 11, 2022
sentence-embd-fusion	Python	7	Sentence embeddings as artefacts fused to language models	Mar 22, 2023
myUDTree	None	8	Universal Dependency Tree for Myanmar Language	Mar 26, 2023
japanese-bert	Python	11	BERT models with tokenization for Japanese texts.	Apr 27, 2022
sherlock	HTML	16	Code, data, models for the Sherlock corpus	Jul 30, 2022
Myanmar-States-Districts-Townships-JSON	None	2	States Districts Townships Data in Myanmar as JSON format. ( Both Myanmar and English Language …	Feb 05, 2023
spacy-sentence-bert	Python	70	Sentence transformers models for SpaCy	Aug 30, 2022
ChineseGLUE	Python	1633	Language Understanding Evaluation benchmark for Chinese: datasets, baselines, pre-trained models, corpus and leaderboard	Sep 12, 2022
chineseGLUE	Python	2	Language Understanding Evaluation benchmark for Chinese: datasets, baselines, pre-trained models, corpus and leaderboard	Feb 26, 2023
encodechka	Python	36	The tiniest sentence encoder for Russian language	May 12, 2023
N-Grams	Jupyter Notebook	2	Find the probability of test sentence given the corpus using n-grams	Jun 18, 2021
BERT-flow	Python	483	TensorFlow implementation of On the Sentence Embeddings from Pre-trained Language Models (EMNLP 2020)	Sep 09, 2022
ffwpu-myanmar	HTML	5	Myanmar- Family Federation for World Peace and Unification Myanmar	Mar 08, 2023
Kor-Sentence-Similarity	Python	4	Sentence/Text Similarity Models for Korean	Dec 04, 2020
Russian-Flair-LM	Jupyter Notebook	2	Pretrained version of Flair char-level language models for the Russian language (currently, on the Lenta news corpus)	Jul 30, 2020
code_tokenize	Python	10	Fast tokenization and structural analysis of any programming language	Jul 20, 2022
DroppedText_Corpus	None	3	A text corpus collection for the DroppedText language.	Aug 28, 2022
cltk_readers	Jupyter Notebook	4	Corpus reader extension for the Classical Language Toolkit	Apr 09, 2023
LARC	JavaScript	47	Language-annotated Abstraction and Reasoning Corpus	Apr 24, 2023
Meedan-Open-Translation-Memory--ar-en-	None	5	The central repository of the Meedan.net open source translation corpus of Arabic and English sentence …	Dec 02, 2021
SeedLing	Python	8	Building and Using A Seed Corpus for the Human Language Project	Apr 30, 2019
fi-sentence-embeddings-eval	Python	7	Comparison of sentence embedding models for Finnish	Nov 07, 2021
indonesian-nlp-datasets	Python	6	Indonesian corpus for Natural Language Processing	Dec 14, 2019
MultilingualShareGPT	None	44	MultilingualShareGPT, the free multi-language corpus for LLM training	Apr 12, 2023
tlingit-corpus	Shell	5	Text corpus of the Tlingit language for linguistic research.	May 21, 2023
ua-gec	Python	197	UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language	Jul 21, 2022
ParaBART	Python	21	Code for our NAACL-2021 paper "Disentangling Semantics and Syntax in Sentence Embeddings with Pre-trained Language …	May 03, 2023
Tokenization-Practice	Jupyter Notebook	2	Practice for different types of manual tokenization and tokenization with NLTK	Mar 04, 2023
KabyleNLP	Python	7	NLP tools for the Kabyle language: Lemmatization, Stemming, Tokenization, Text 2 Speech, SpellCheck	May 15, 2022
lstm_sentence_classifier	Python	67	LSTM-based Models for Sentence Classification in PyTorch	Jul 21, 2022
sentence-transformers-example	Jupyter Notebook	5	HuggingFace's Transformer models for sentence / text embedding generation.	Aug 05, 2022
vocabsieve	Python	73	Simple sentence mining tool for language learning	Jun 28, 2022
hacker_news_topic_modelling	None	19	Topic models (just LDA for now) on the Hacker News corpus	Sep 04, 2021