comfyui_LLM_party


ComfyUI Node Documentation: openai_whisper

Overview

The openai_whisper node is part of ComfyUI LLM Party and integrates OpenAI's Whisper audio-to-text transcription service into ComfyUI workflows. It converts audio inputs into text using OpenAI's speech recognition models, which is useful wherever voice input needs to be processed and understood by downstream AI systems.

Functionality

What This Node Does

The openai_whisper node takes an audio file as input and uses either OpenAI's Whisper model or an Azure OpenAI deployment to produce a text transcription of the audio. This supports applications that need to parse and process spoken content as text.

Inputs Accepted

The node accepts the following inputs:

  1. is_enable: A boolean option to enable or disable the node's functionality. Default is True.
  2. audio_path: A string path to the audio file that needs to be transcribed.
  3. base_url (optional): A string to specify the base URL for the API endpoint. This is useful when integrating with different OpenAI or Azure OpenAI instances.
  4. api_key (optional): A string containing the API key for accessing the OpenAI or Azure OpenAI service.
  5. audio (optional): An audio object input that can be directly transcribed if provided.
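Based on the list above, the node's ComfyUI input declaration likely resembles the following sketch. The field names come from this documentation, but the widget types, defaults, and class name are assumptions, not taken from the node's actual source:

```python
# Hypothetical sketch of the node's ComfyUI input declaration.
# Field names follow the list above; widget types and defaults are assumptions.
class OpenAIWhisperSketch:
    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "is_enable": ("BOOLEAN", {"default": True}),
                "audio_path": ("STRING", {"default": ""}),
            },
            "optional": {
                "base_url": ("STRING", {"default": ""}),
                "api_key": ("STRING", {"default": ""}),
                "audio": ("AUDIO",),
            },
        }

    RETURN_TYPES = ("STRING",)
    RETURN_NAMES = ("text",)
```

In ComfyUI, entries under "required" always appear on the node, while "optional" entries only bind when connected or filled in, which matches how base_url, api_key, and audio are described here.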

Outputs Produced

The node produces the following output:

  • text: A string that contains the transcribed text from the input audio.

Usage in ComfyUI Workflows

The openai_whisper node can be integrated into ComfyUI workflows to convert voice inputs into text, which can then be used in various applications like:

  • Automating note-taking services where spoken content is converted into text notes.
  • Building voice-controlled interaction systems where spoken commands are transcribed and further processed.
  • Enhancing multimedia projects by generating captions or subtitles for audio content.

To utilize this node, users should:

  1. Configure the openai_whisper node within the ComfyUI workflow.
  2. Set the necessary API details either within the node or through configuration files.
  3. Connect the node to other nodes in the workflow that require transcribed text as input, such as nodes used for text processing or analysis.


Special Features and Considerations

  • API Integration: The node supports integration with OpenAI's Whisper model and Azure OpenAI services, making it versatile for different deployment scenarios.
  • Audio Source Flexibility: Users can provide either a file path to an audio file or an audio object directly, offering flexibility in how audio data is supplied to the node.
  • Localization Options: The node's display language can be configured through the project's configuration files.
  • Dependency Management: Ensure that all required API keys and base URLs are properly set up for the node to function correctly.

This node is an essential component for workflows requiring speech-to-text capabilities, broadening the range of applications and interactions possible within ComfyUI.