Stars: 41
Forks: 8
Language: Python
Last Updated: Dec 04, 2023
Similar Repos
Language | Stars | Description | Updated At | |
---|---|---|---|---|
Python | 14 | Vietnamese Wikipedia Corpus | May 17, 2022 | |
None | 3 | Comparable Wikipedia Corpus (aligned documents) | Apr 03, 2023 | |
JavaScript | 17 | Daily online magazine for Chinese Wikipedia | Jan 28, 2023 | |
None | 51 | Pre-trained Wikipedia corpus by MITIE | Jun 28, 2022 | |
Python | 53 | Yet Another Chinese Learner Corpus | Apr 13, 2023 | |
Python | 715 | Collections of Chinese NLP corpus | Oct 10, 2022 | |
None | 5 | Collections of Chinese NLP corpus | Sep 05, 2022 | |
Python | 6 | Wikipedia text corpus for self-supervised NLP model training | Apr 26, 2022 | |
Python | 2 | BERT fine-tuning with Chinese corpus | Sep 21, 2021 | |
JavaScript | 12 | The Chinese Wikipedia twinkle javascript helper | Jul 07, 2022 | |
None | 2 | A Swahili corpus made from Swahili Wikipedia articles | May 05, 2022 | |
Python | 2 | [After ChatGPT NLP]Chinese Question Matching Corpus | Mar 24, 2023 | |
None | 34 | A large-scale cleaned Chinese chitchat corpus and Chinese dialogpt models | Jun 19, 2022 | |
SCSS | 3 | Instant Messaging Code (Chinese Wikipedia Telegram Group) | May 18, 2022 | |
Python | 70 | A pipeline for training word embeddings using word2vec on wikipedia corpus. | Mar 29, 2023 | |
Go | 18 | A corpus builder for Tamil by analyzing wordpress, blogger, wikipedia dumps | Apr 18, 2022 | |
HTML | 77 | Text corpus calculation in Javascript. Supports Chinese, English. | Apr 28, 2023 | |
Python | 14 | The New York Times English-Chinese parallel corpus | May 02, 2023 | |
Python | 2 | early chinese text corpus sourced from kanseki repository | Dec 10, 2021 | |
Python | 2 | Train model based on Wikipedia English corpus with gensim package | May 13, 2023 | |
Python | 163 | chinese and english corpus process script, python, c++, java | Apr 23, 2023 | |
Python | 60 | Word Cloud for Chinese Text Corpus (中文词云制作) | Feb 26, 2023 | |
Python | 2 | A Tensorflow implementation of QANet for machine reading comprehension on Chinese corpus. | Sep 14, 2023 | |
SRecode Template | 2 | dgk_lost_conv 中文对白语料 chinese conversation corpus | Aug 04, 2019 | |
Nim | 133 | Extract a plain text corpus from MediaWiki XML dumps, such as Wikipedia. | Jun 13, 2022 | |
None | 25 | This is a corpus of Chinese abbreviation, including negative full forms. | Oct 12, 2021 | |
Elixir | 88 | An elixir module to translate simplified Chinese to traditional Chinese, and vice versa, based on … | Aug 19, 2022 | |
Python | 1633 | Language Understanding Evaluation benchmark for Chinese: datasets, baselines, pre-trained models, corpus and leaderboard | Sep 12, 2022 | |
Python | 2 | Language Understanding Evaluation benchmark for Chinese: datasets, baselines, pre-trained models, corpus and leaderboard | Feb 26, 2023 | |
None | 32 | A Chinese lyric corpus which contains nearly 50,000 lyrics from 500 artists | Mar 22, 2023 | |
None | 2 | 中文语料小数据:Some useful Chinese corpus datasets | Aug 13, 2018 | |
None | 508 | Some useful Chinese corpus datasets 中文语料小数据 | Apr 25, 2023 | |
JavaScript | 3 | Annotator for Chinese Text Corpus (UNDER DEVELOPMENT) 中文文本标注工具 | May 12, 2022 | |
None | 673 | Large-scale Pre-training Corpus for Chinese 100G 中文预训练语料 | Apr 28, 2023 | |
JavaScript | 1391 | Annotator for Chinese Text Corpus (UNDER DEVELOPMENT) 中文文本标注工具 | May 22, 2023 | |
Python | 6 | Using Wikipedia enwiki dump (43 GB) to create a plain text corpus for NLP and … | Aug 02, 2022 | |
Python | 6 | An ultimate list of medical abbreviations with meaning, Chinese translation and Wikipedia link. | Feb 27, 2023 | |
Python | 251 | This repository is for the paper "A Hybrid Approach to Automatic Corpus Generation for Chinese … | Apr 28, 2023 | |
R | 2 | Corpus linguistic analysis for Corpus Workbench corpora | Feb 07, 2022 | |
None | 2 | A Chinese causal corpus containing 1,314 pairs of arguments based on the Chinese Discourse Treebank … | Aug 08, 2022 | |
Python | 20 | The first Chinese metaphor corpus serving for identification and generation. 中文比喻数据集。 | Apr 27, 2023 | |
JavaScript | 2 | A corpus of Chinese texts labelled with their phonological position of the Qieyun phonological system | Sep 24, 2023 | |
JavaScript | 4 | The wikipedia twinkle javascript helper, for Indonesian Wikipedia | Jan 05, 2022 | |
Python | 50 | The Corpus & Code for EMNLP 2022 paper "FCGEC: Fine-Grained Corpus for Chinese Grammatical Error … | Apr 18, 2023 | |
CSS | 4 | Playlists for Wikipedia | Nov 15, 2021 | |
Go | 7 | TUI for Wikipedia | Dec 18, 2022 | |
Python | 4 | A large labeled corpus for Application Privacy Policy in Chinese to train named entity recognition … | May 18, 2023 | |
None | 7202 | 大规模中文自然语言处理语料 Large Scale Chinese Corpus for NLP | Aug 12, 2022 | |
None | 18 | 大规模中文自然语言处理语料 Large Scale Chinese Corpus for NLP | Jul 28, 2022 | |
JavaScript | 3 | Wikipedia | Dec 03, 2013 |