|
None |
34 |
A large-scale cleaned Chinese chitchat corpus and Chinese dialogpt models |
Jun 19, 2022 |
|
Python |
715 |
Collections of Chinese NLP corpus |
Oct 10, 2022 |
|
None |
5 |
Collections of Chinese NLP corpus |
Sep 05, 2022 |
|
None |
7202 |
Large Scale Chinese Corpus for NLP |
Aug 12, 2022 |
|
Python |
2 |
[After ChatGPT NLP] Chinese Question Matching Corpus |
Mar 24, 2023 |
|
None |
673 |
Large-scale Pre-training Corpus for Chinese 100G |
Apr 28, 2023 |
|
Python |
2 |
Large scale web corpus of Austronesian text. |
Jan 17, 2022 |
|
None |
2 |
A Large-scale Vietnamese News Text Classification Corpus |
Dec 12, 2023 |
|
Python |
53 |
T2Ranking: A large-scale Chinese benchmark for passage ranking. |
May 20, 2023 |
|
None |
5 |
Chinese Mandarin Ngrams Counts from large-scale corpora |
Aug 29, 2022 |
|
None |
12 |
A Large-Scale Chinese Legal Case Retrieval Dataset |
May 10, 2023 |
|
Python |
24 |
Large scale unannotated Korean corpus for unsupervised tasks (e.g. language modeling). |
Jun 21, 2022 |
|
Python |
26 |
ReCO: A Large Scale Chinese Reading Comprehension Dataset on Opinion |
Jun 20, 2022 |
|
Python |
433 |
A Large-Scale Chinese Cross-Domain Task-Oriented Dialogue Dataset |
Oct 15, 2022 |
|
Python |
39 |
Corpus creator for Chinese Wikipedia |
Apr 25, 2022 |
|
Python |
1060 |
A Large-scale Chinese Short-Text Conversation Dataset and Chinese pre-training dialog models |
Oct 16, 2022 |
|
Python |
3 |
A Large-scale Chinese Short-Text Conversation Dataset and Chinese pre-training dialog models |
Apr 20, 2021 |
|
None |
2 |
A Large-scale Chinese Short-Text Conversation Dataset and Chinese pre-training dialog models |
Nov 05, 2021 |
|
None |
2 |
A Large-scale Chinese Short-Text Conversation Dataset and Chinese pre-training dialog models |
Nov 15, 2023 |
|
Python |
250 |
CoVoST: A Large-Scale Multilingual Speech-To-Text Translation Corpus (CC0 Licensed) |
Jul 28, 2022 |
|
Python |
364 |
A large-scale multilingual speech corpus for representation learning, semi-supervised learning and interpretation |
Aug 02, 2022 |
|
None |
8 |
Reichsanzeiger NLP corpus & guidelines |
Jan 16, 2023 |
|
C++ |
6 |
NLP-Fast: A Fast, Scalable, and Flexible System to Accelerate Large-Scale Heterogeneous NLP Models |
Apr 20, 2023 |
|
Python |
446 |
GitHub Typo Corpus: A Large-Scale Multilingual Dataset of Misspellings and Grammatical Errors |
Aug 31, 2022 |
|
Python |
41 |
Chinese NLP package |
Aug 04, 2021 |
|
Python |
13 |
Title2Event: Benchmarking Open Event Extraction with a Large-scale Chinese Title Dataset |
May 08, 2023 |
|
Python |
53 |
Yet Another Chinese Learner Corpus |
Apr 13, 2023 |
|
Jupyter Notebook |
166 |
All For NLP, especially Chinese. |
Oct 10, 2022 |
|
None |
3 |
Cherokee English Corpus material for NLP research |
Jan 12, 2022 |
|
Jupyter Notebook |
2 |
Mini-Luotuo: A Diverse Herd of Distilled Chinese Models from Large-Scale Instructions |
May 20, 2023 |
|
None |
5 |
Large corpus of PDF documents |
Mar 18, 2023 |
|
C |
6 |
Large scale optimization |
Apr 24, 2022 |
|
Python |
2 |
BERT fine-tuning with Chinese corpus |
Sep 21, 2021 |
|
Python |
5 |
Generator for Large Scale Structure |
Apr 12, 2022 |
|
Python |
1687 |
Large-scale pretraining for dialogue |
Aug 07, 2022 |
|
Jupyter Notebook |
4 |
Topic modeling for Chinese news articles using Chinese NLP techniques |
Jul 22, 2022 |
|
Python |
3 |
An NLP project for Chinese Spell Check & Chinese Text Correction |
Aug 10, 2022 |
|
Python |
2 |
flashtext-chinese-nlp-data-augmentation |
Oct 21, 2021 |
|
HTML |
11 |
Large-Scale Graph Inference |
May 28, 2022 |
|
Python |
28 |
Large Scale BERT Distillation |
Apr 22, 2022 |
|
Python |
116 |
Large-scale model inference. |
Aug 28, 2022 |
|
Python |
2 |
CURENT Large-Scale Testbed |
Apr 27, 2023 |
|
Python |
9 |
Large Scale Search Index |
Jan 06, 2023 |
|
Python |
33 |
BERT+ TF Keras For Chinese NLP Tasks |
Feb 14, 2022 |
|
Python |
4 |
A large labeled corpus for Application Privacy Policy in Chinese to train named entity recognition … |
May 18, 2023 |
|
Go |
12 |
Arktos for large-scale cloud platform |
Aug 23, 2022 |
|
Python |
3689 |
Repo for external large-scale work |
Aug 19, 2022 |
|
Python |
9 |
Repo for external large-scale work |
Jan 30, 2023 |
|
Python |
2 |
Scripts for automated large-scale curation |
Mar 23, 2023 |
|
Go |
212 |
Arktos for large-scale cloud platform |
Aug 29, 2022 |