Stars
5
Forks
1
Language
Python
Last Updated
Jan 17, 2024
Similar Repos
Language | Stars | Description | Updated At |
---|---|---|---|
None | 2 | Named Entity Recognition (NER) corpus for Burmese (Myanmar language) | Feb 13, 2023 | |
Jupyter Notebook | 4 | Paraphrase Dataset for Burmese (Myanmar Language) | Apr 15, 2023 | |
Python | 43 | syllable, word and phrase segmenter for Burmese (Myanmar language) | Apr 22, 2023 | |
HTML | 47 | Syllable segmentation tool for Myanmar language (Burmese) by Ye. | Apr 12, 2023 | |
None | 7 | Myanmar Sign Language Corpus for Emergency Domain | Aug 26, 2022 | |
None | 6 | Burmese Romanization Corpora (for both word & sentence) | Feb 14, 2023 | |
TypeScript | 5 | Burmese language (Myanmar text) extractor JavaScript library for word segmentation, text extraction or syllable break. | Jun 08, 2022 | |
Python | 57 | myPOS (Myanmar Part-of-Speech) Corpus for Myanmar NLP Research and Developments | Apr 24, 2023 | |
Perl | 46 | Myanmar (Burmese) Language Grapheme to Phoneme (myG2P) Conversion Dictionary for speech recognition (ASR) and speech … | Feb 12, 2023 | |
Ruby | 4 | Big module for NodeJS to sort any Myanmar/Burmese text | May 14, 2021 | |
Python | 8 | The unified corpus building environment for Language Models. | Jan 13, 2022 | |
Python | 47 | REST API for sentence tokenization and embedding using Multilingual Universal Sentence Encoder. | Sep 05, 2022 | |
Python | 168 | Text tokenization and sentence segmentation (segtok v2) | Jul 31, 2022 | |
Shell | 2 | Language models baseline Kaldi script for TORGO corpus | Apr 21, 2022 | |
Python | 486 | Natural Language Processing Pipeline - Sentence Splitting, Tokenization, Lemmatization, Part-of-speech Tagging and Dependency Parsing | Aug 06, 2022 | |
Shell | 7 | Myanmar and Thai Language Resources | Aug 01, 2022 | |
Python | 2 | Pretrained Language Models on British Library Corpus | Aug 07, 2023 | |
None | 11 | A monolingual parallel corpus for sentence simplification | Jun 11, 2022 | |
Python | 7 | Sentence embeddings as artefacts fused to language models | Mar 22, 2023 | |
None | 8 | Universal Dependency Tree for Myanmar Language | Mar 26, 2023 | |
Python | 11 | BERT models with tokenization for Japanese texts. | Apr 27, 2022 | |
HTML | 16 | Code, data, models for the Sherlock corpus | Jul 30, 2022 | |
None | 2 | States Districts Townships Data in Myanmar as JSON format. ( Both Myanmar and English Language … | Feb 05, 2023 | |
Python | 70 | Sentence transformers models for SpaCy | Aug 30, 2022 | |
Python | 1633 | Language Understanding Evaluation benchmark for Chinese: datasets, baselines, pre-trained models, corpus and leaderboard | Sep 12, 2022 | |
Python | 2 | Language Understanding Evaluation benchmark for Chinese: datasets, baselines, pre-trained models, corpus and leaderboard | Feb 26, 2023 | |
Python | 36 | The tiniest sentence encoder for Russian language | May 12, 2023 | |
Jupyter Notebook | 2 | Find the probability of test sentence given the corpus using n-grams | Jun 18, 2021 | |
Python | 483 | TensorFlow implementation of On the Sentence Embeddings from Pre-trained Language Models (EMNLP 2020) | Sep 09, 2022 | |
HTML | 5 | Myanmar- Family Federation for World Peace and Unification Myanmar | Mar 08, 2023 | |
Python | 4 | Sentence/Text Similarity Models for Korean | Dec 04, 2020 | |
Jupyter Notebook | 2 | Pretrained version of Flair char-level language models for Russian language (currently, on lenta news corpus) | Jul 30, 2020 | |
Python | 10 | Fast tokenization and structural analysis of any programming language | Jul 20, 2022 | |
None | 3 | A text corpus collection for the DroppedText language. | Aug 28, 2022 | |
Jupyter Notebook | 4 | Corpus reader extension for the Classical Language Toolkit | Apr 09, 2023 | |
JavaScript | 47 | Language-annotated Abstraction and Reasoning Corpus | Apr 24, 2023 | |
None | 5 | The central repository of the Meedan.net open source translation corpus of Arabic and English sentence … | Dec 02, 2021 | |
Python | 8 | Building and Using A Seed Corpus for the Human Language Project | Apr 30, 2019 | |
Python | 7 | Comparison of sentence embedding models for Finnish | Nov 07, 2021 | |
Python | 6 | Indonesian corpus for Natural Language Processing | Dec 14, 2019 | |
None | 44 | MultilingualShareGPT, the free multi-language corpus for LLM training | Apr 12, 2023 | |
Shell | 5 | Text corpus of the Tlingit language for linguistic research. | May 21, 2023 | |
Python | 197 | UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language | Jul 21, 2022 | |
Python | 21 | Code for our NAACL-2021 paper "Disentangling Semantics and Syntax in Sentence Embeddings with Pre-trained Language … | May 03, 2023 | |
Jupyter Notebook | 2 | Practice for different types of manual tokenization and tokenization with NLTK | Mar 04, 2023 | |
Python | 7 | NLP tools for the Kabyle language: Lemmatization, Stemming, Tokenization, Text 2 Speech, SpellCheck | May 15, 2022 | |
Python | 67 | LSTM-based Models for Sentence Classification in PyTorch | Jul 21, 2022 | |
Jupyter Notebook | 5 | HuggingFace's Transformer models for sentence / text embedding generation. | Aug 05, 2022 | |
Python | 73 | Simple sentence mining tool for language learning | Jun 28, 2022 | |
None | 19 | Topic models (just LDA for now) on the Hacker News corpus | Sep 04, 2021 |