Stars: 6
Forks: 3
Language: Python
Last Updated: May 08, 2022
Similar Repos
Language | Stars | Description | Updated At
---|---|---|---
Python | 2 | OCR, PDF to DOCX | Apr 18, 2023
Java | 104 | Produce doc/docx/pdf output from a doc/docx template | Mar 07, 2022
Python | 3 | Read texts from different formats (including .doc, .docx, .txt, .pdf) | Mar 25, 2022
JavaScript | 2 | Convert doc/docx to PDF. | Nov 22, 2020
JavaScript | 33 | Convert doc and docx to PDF files | Jul 19, 2022
None | 251 | Various documents related to Tesseract OCR | Aug 14, 2022
JavaScript | 175 | Parse office documents (doc, docx, xls, etc.) | Jul 25, 2022
Ruby | 478 | Read text and metadata from files and documents (.doc, .docx, .pages, .odt, .rtf, .pdf) | Oct 05, 2022
Java | 2 | OCR REST API using Tesseract OCR Engine (via Tess4J) | Jan 03, 2024
Protocol Buffer | 2 | A Go library, soon to become a microservice, to convert PDF, DOC, DOCX, XML, HTML, RTF, … | Nov 27, 2019
Python | 3 | Papermerge worker: extracts (OCR) text from documents using Tesseract. | Jan 12, 2023
Python | 349 | Python script to do PDF OCR conversion using Tesseract | Aug 19, 2022
Python | 3 | PDF-to-text conversion website with an OCR scanner (Tesseract) | Jun 06, 2021
Java | 2 | Documents reader (TXT, DOC, HTML, PDF, etc.) | Oct 17, 2021
Java | 14 | Java client for Plutext's doc/docx to PDF Converter product | Oct 28, 2022
Python | 3 | Convert ppt, doc, or docx to PDF using Aspose. | May 23, 2023
C++ | 2 | SfTesseract is a PDF OCR processor based on the Tesseract engine | Apr 19, 2022
Python | 18 | PDF to image, image to CSV table, OCR by Tesseract | Nov 24, 2022
Swift | 7 | Capture photos, convert to PDF, (OCR) text recognition with Tesseract, share, etc. (SwiftUI, Combine, Tesseract) | May 21, 2022
Go | 987 | Converts PDF, DOC, DOCX, XML, HTML, RTF, etc. to plain text | Aug 24, 2022
Go | 2 | Converts PDF, DOC, DOCX, XML, HTML, RTF, etc. to plain text | Sep 18, 2021
Python | 2 | Convert docx, xlsx, doc, and xls to PDF and combine them into one file. | Dec 14, 2018
Java | 522 | A standalone Java library/command line tool that converts DOC, DOCX, PPT, PPTX and ODT documents … | Oct 15, 2022
Java | 10 | Extract text content from doc, docx, pdf, rtf, txt, and html files | Mar 12, 2022
Python | 3 | Simple Flask API to convert Microsoft Word files (DOC/DOCX) to PDF | Dec 07, 2023
Go | 4 | Convert Word documents (DocX) to PDF with a standalone CLI | Mar 29, 2022
TypeScript | 16 | Chat with documents (pdf, docx, txt) using ChatGPT and Langchain | Apr 29, 2023
Java | 44 | Export docx to PDF via XSL FO, using FOP | Apr 23, 2023
PHP | 4 | A PHP class to read documents like doc, docx, pdf, txt, zip... | May 26, 2020
Python | 81 | Creates thumbnails of office documents (.docx, .odt, .ppt, .pdf) and images. | Oct 26, 2022
Shell | 24 | Some useful Nemo Actions and Shell Scripts with zenity GUIs: 1. Sandwich PDF Maker (OCR, … | Aug 19, 2022
Visual Basic | 16 | Batch convert image-only PDF files to text under Windows using Tesseract OCR | Jan 19, 2022
Python | 5 | Simple Python 3 / Qt5 demo of the Tesseract OCR C API via ctypes. | Jul 08, 2019
None | 2 | Necessary SOLR plugins for DOC and PDF extraction, to go with silverstripe/fulltextsearch | Apr 24, 2015
Kotlin | 78 | This library reads word documents (.doc and .docx), txt and PDF files, and gives the … | Apr 13, 2023
Shell | 2 | A simple Tesseract and pdftoppm wrapper shell script for PDF-to-text OCR | Feb 19, 2018
Java | 183 | Easy-to-use template engine for creating docx documents in Java. | Aug 07, 2022
Shell | 4 | Shell script using ScanTailor, Tesseract OCR, pdftk, and ImageMagick to convert image PDFs to text | Nov 01, 2019
Python | 3 | Python script that uses Tesseract OCR to add a text layer to PDF files that are not text-searchable. | Nov 08, 2021
C# | 21 | Tool for visualizing hOCR output from Tesseract (or other OCR engines that support hOCR). | Dec 15, 2020
Jupyter Notebook | 4 | Takes in a resume in PDF or docx/doc form and extracts key information from it. | Sep 04, 2022
Python | 10 | Metagoofil is an information gathering tool designed for extracting metadata of public documents (pdf,doc,xls,ppt,docx,pptx,xlsx) belonging … | Jun 09, 2022
Go | 4 | Watches a directory for PDF documents and automatically does OCR on them. | Jan 08, 2022
TypeScript | 5 | A simple wrapper around command-line utils to assist in PDF / Image OCR processing using … | Feb 14, 2022
Shell | 15 | OwncloudOCR uses Tesseract OCR and OCRmyPDF for reading text from images and images in PDF … | Jul 31, 2021
Python | 4 | A tool for converting older Word 97–2004 Documents (.doc) to modern Word 2007–365 Documents (.docx) … | Jan 15, 2024
Shell | 4 | Tesseract OCR training data for Danish written in Fraktur script and a few other languages | Dec 23, 2022
HTML | 2 | Tools to extract tables from PDF and other documents | Dec 07, 2018
HTML | 44 | A template to create PDF/ePub/HTML/docx books from Markdown via Pandoc | May 07, 2023
Python | 12 | Data extractor of PDF documents from the Official Gazette of the Federal District, Brazil. | Aug 17, 2022