openai_whisper
The openai_whisper node is part of ComfyUI LLM Party. It integrates OpenAI's Whisper audio-to-text transcription service into ComfyUI workflows, converting audio input into text. This is particularly useful in applications where voice input needs to be processed and understood by AI systems.
The openai_whisper node takes an audio file as input and uses either OpenAI's Whisper model or an Azure OpenAI integration to produce a text transcription. The result can feed applications that need to parse and process spoken content.
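The transcription step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the node's actual source: it assumes the third-party `openai` Python package (v1 or later) and the hosted `whisper-1` model, and the helper names `make_client` and `transcribe` are hypothetical. An Azure OpenAI deployment would need a differently configured client, which is not shown here.

```python
from typing import Any, Optional


def make_client(api_key: Optional[str] = None, base_url: Optional[str] = None) -> Any:
    """Build an OpenAI client (hypothetical helper, not taken from the node's source)."""
    # Imported lazily so the rest of this sketch has no hard dependency on `openai`.
    from openai import OpenAI

    # When api_key or base_url is None, the client falls back to its defaults.
    return OpenAI(api_key=api_key, base_url=base_url)


def transcribe(client: Any, audio_path: str, model: str = "whisper-1") -> str:
    """Send the file at `audio_path` to the transcription endpoint and return its text."""
    with open(audio_path, "rb") as audio_file:
        result = client.audio.transcriptions.create(model=model, file=audio_file)
    return result.text
```

Passing the client in as an argument keeps the file handling separate from credential setup, so the same function works with any object that exposes `audio.transcriptions.create`.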
The node accepts the following inputs:
is_enable: A boolean option to enable or disable the node's functionality. Default is True.
audio_path: A string path to the audio file that needs to be transcribed.
base_url (optional): A string specifying the base URL of the API endpoint. This is useful when integrating with different OpenAI or Azure OpenAI instances.
api_key (optional): A string containing the API key for accessing the OpenAI or Azure OpenAI service.
audio (optional): An audio object input that can be directly transcribed if provided.
The node produces the following output:
text: A string that contains the transcribed text from the input audio.
The openai_whisper node can be integrated into ComfyUI workflows to convert voice inputs into text, which can then be used in various applications.
To use this node, add the openai_whisper node within the ComfyUI workflow.
This node provides speech-to-text capability for workflows that need it, broadening the range of applications and interactions possible within ComfyUI.
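As a rough illustration of the interface described in this document, the inputs and output could be declared in a ComfyUI custom node along these lines. This is a hypothetical sketch, not the actual ComfyUI LLM Party source: the class name, the category, the socket type names, the placeholder transcriber, and the behavior of a disabled node are all assumptions. It also ignores the optional audio object, since the document does not say how the node handles it.

```python
def _transcribe_placeholder(audio_path, base_url, api_key):
    """Stand-in for the real Whisper call, which this sketch deliberately omits."""
    raise NotImplementedError("plug in a real transcription call here")


class OpenAIWhisperSketch:
    """Hypothetical node declaration mirroring the documented inputs and output."""

    # Swappable so the sketch can be exercised without network access.
    transcriber = staticmethod(_transcribe_placeholder)

    @classmethod
    def INPUT_TYPES(cls):
        # Socket type names below are assumptions based on common ComfyUI usage.
        return {
            "required": {
                "is_enable": ("BOOLEAN", {"default": True}),
                "audio_path": ("STRING", {"default": ""}),
            },
            "optional": {
                "base_url": ("STRING", {"default": ""}),
                "api_key": ("STRING", {"default": ""}),
                "audio": ("AUDIO",),
            },
        }

    RETURN_TYPES = ("STRING",)
    RETURN_NAMES = ("text",)
    FUNCTION = "run"
    CATEGORY = "examples/audio"  # hypothetical category

    def run(self, is_enable, audio_path, base_url="", api_key="", audio=None):
        if not is_enable:
            # Assumption: a disabled node yields no text instead of raising.
            return (None,)
        # Empty strings are treated as "not provided".
        text = self.transcriber(audio_path, base_url or None, api_key or None)
        return (text,)
```

ComfyUI nodes return their outputs as a tuple, which is why `run` wraps the transcribed string in one.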