PromptingWhisper

Prompting Whisper for Audio-Visual Speech Recognition, Code-Switched Speech Recognition, and Zero-Shot Speech Translation

Stars: 3

Forks: 1

Language: None

Last Updated: Dec 06, 2023

Similar Repos

Repo	Language	Stars	Description	Updated At
PromptingWhisper	Python	16	Prompting Whisper for Audio-Visual Speech Recognition, Code-Switched Speech Recognition, and Zero-Shot Speech Translation	May 20, 2023
zac	Python	39	Zero-shot Audio Classification using Whisper	Apr 25, 2023
avsr	Jupyter Notebook	4	Audio-visual speech recognition models	May 27, 2022
talk-to-gpt-3	Python	33	Whisper + OpenAI + Speech Recognition	Apr 08, 2023
Audio-transcriber	Python	23	Simple Python audio transcriber using OpenAI's Whisper speech recognition model	Mar 26, 2023
sp-aen.cvpr18	Python	44	Zero-Shot Visual Recognition using Semantic-Preserving Adversarial Embedding Networks	Apr 19, 2022
GILA	Python	6	Code for paper "Cross-Modal Global Interaction and Local Alignment for Audio-Visual Speech Recognition"	Jun 25, 2023
vlm_benchmark	HTML	2	Code for benchmarking VLMs on zero and few-shot activity recognition	May 19, 2023
whisper-punctuator	Python	53	Zero-shot multimodal punctuation insertion and truecasing using Whisper	Apr 25, 2023
avsr-tf1	Python	76	Audio-Visual Speech Recognition using Sequence to Sequence Models	Sep 24, 2022
audio.whisper	C	45	Transcribe audio files using the "Whisper" Automatic Speech Recognition model from R	Apr 24, 2023
pix2pix-zero	Python	767	Zero-shot Image-to-Image Translation	May 09, 2023
AVEC	Jupyter Notebook	47	[WACV 2023] Audio-Visual Efficient Conformer (AVEC) for Robust Speech Recognition	May 23, 2023
ml-code-switched-speech-translation	Python	16	This repository contains the code and instructions needed to reproduce the dataset splits for out …	Aug 03, 2022
openai-whisper	Python	3	Whisper is a general-purpose speech recognition model. It is trained on a large dataset of …	Apr 01, 2023
zeroshot-storytelling	Python	16	GitHub repository for Zero Shot Visual Storytelling	Nov 12, 2022
whisper_streaming	Python	38	Whisper realtime streaming for long speech-to-text transcription and translation	Apr 23, 2023
deep_avsr	None	2	A PyTorch implementation of the Deep Audio-Visual Speech Recognition paper.	Jan 24, 2021
DIPL	MATLAB	7	Domain-Invariant Projection Learning for Zero-Shot Recognition	Jul 07, 2020
Zero-Shot-TTS	Python	32	Unofficial Implementation of Zero-Shot Text-to-Speech for Text-Based Insertion in Audio Narration	Jun 27, 2022
soxan	Jupyter Notebook	168	Wav2Vec for speech recognition, classification, and audio classification	May 29, 2023
alt-hackathon-docs	JavaScript	3	QCRI Speech Recognition and Machine Translation APIs for Hackathons	Apr 19, 2021
pyVSR	Python	34	Python toolkit for Visual Speech Recognition	Dec 22, 2022
multilingual-question-answering	Jupyter Notebook	3	Zero-shot and Translation Experiments on XQuAD, MLQA and TyDiQA	Nov 08, 2022
TCAF-GZSL	Python	4	This repository contains the code for our ECCV 2022 paper "Temporal and cross-modal attention for …	Aug 09, 2022
MISP2021-AVSR	Shell	2	repository for paper "Audio-Visual Speech Recognition in MISP2021 Challenge: Dataset Release and Deep Analysis"	Jun 29, 2022
CROP	Python	3	[EMNLP 2022] CROP: Zero-shot Cross-lingual Named Entity Recognition with Multilingual Labeled Sequence Translation	Feb 20, 2023
whisper-cli	TypeScript	2	A CLI speech recognition tool, using OpenAI Whisper, supports audio file transcription and near-realtime microphone …	Oct 20, 2023
LookingAndListen	Python	7	Audio-Visual Model for Speech Separation.	Dec 26, 2019
AndroidVoiceTranslator	Kotlin	2	Text-to-speech translation Android application with speech recognition	Mar 17, 2023
naturalspeech2-pytorch	Python	207	Implementation of Natural Speech 2, Zero-shot Speech and Singing Synthesizer, in Pytorch	Apr 20, 2023
naturalspeech2-pytorch	Python	4	Implementation of Natural Speech 2, Zero-shot Speech and Singing Synthesizer, in Pytorch	Jul 12, 2023
deep-face-speechreading	Python	9	Visual speech recognition with face inputs: code and models for F&G 2020 paper "Can We …	Apr 02, 2022
openai-whisper-microservice	Python	2	This is an OpenAI Whisper automatic speech recognition microservice	Apr 06, 2023
Speech-recognition-for-Japanese	TypeScript	5	Speech recognition using Google Cloud Speech, specified for Japanese audio	Oct 20, 2022
InCo	Python	4	Code for Invariant and consistent: Unsupervised representation learning for few-shot visual recognition. Neurocomputing 2023	Mar 06, 2023
WhisperRealtime-Manual	CSS	3	Official documentation of Whisper-Based Real-time Speech Recognition, an Unreal Engine plugin for real-time speech-to-text transcription …	Apr 14, 2023
WhiTTsper-The-Lora	Jupyter Notebook	7	Demo combining Whisper for speech recognition and Google TTS for speech synthesis to interact with …	Apr 01, 2023
ZS-F-VQA	Python	53	Code and Data for paper: Zero-shot Visual Question Answering using Knowledge Graph [ ISWC 2021 …	May 08, 2023
CANAVER	Python	2	Code for Multimodal Cross Attention Network for Audio Visual Emotion Recognition	Nov 01, 2022
p5.js-speech	JavaScript	248	Web Audio Speech Synthesis / Recognition for p5.js	Aug 22, 2022
whisper-diarization	Jupyter Notebook	146	Automatic Speech Recognition with Speaker Diarization based on OpenAI Whisper	Apr 11, 2023
low-shot-shrink-hallucinate	Python	303	Presenting Low-shot Visual Recognition by Shrinking and Hallucinating Features	Jul 16, 2022
latticetm	C++	5	Code to train a translation model using speech recognition lattices and their written translations.	Nov 10, 2021
AVCA-GZSL	Python	16	This repository contains the code for our CVPR 2022 paper on "Audio-visual Generalised Zero-shot Learning …	Aug 09, 2022
ZS-F-VQA	None	2	Code and Data for the paper: Zero-shot Visual Question Answering using Knowledge Graph [ ISWC …	Apr 15, 2022
bimodal-speech-recognition	Python	5	bimodal speech recognition based on acoustic and visual data	May 02, 2022
openai_whisper_stt	Python	4	Simple demo of OpenAI's Whisper speech recognition model for HF 🤗 Spaces.	Mar 03, 2023
AGAM	Python	7	Code for the AAAI 2021 paper "Attributes-Guided and Pure-Visual Attention Alignment for Few-Shot Recognition".	Jun 07, 2022
Speech-Translate	Python	105	A realtime speech transcription and translation application using Whisper OpenAI and free translation API. Interface …	Apr 25, 2023