Fish Whisper Node Documentation

Overview

The Fish Whisper node is part of the ComfyUI LLM Party project. It performs Automatic Speech Recognition (ASR): it transcribes speech in an audio input into text by calling the Fish Audio SDK, so that speech-to-text conversion can serve as one step in a larger LLM workflow.

Node Functionality

The Fish Whisper node accepts audio either as a file path or as waveform data and outputs the transcribed text as a string. Any downstream node in a ComfyUI workflow that takes text input can consume this output.

Inputs

The Fish Whisper node accepts the following inputs:

Required Inputs

is_enable: A boolean value (default: True) indicating whether the node should be active. If set to False, the node will not process any inputs and will not produce an output.
audio_path: A string representing the file path to the audio file that needs transcription.

Optional Inputs

api_key: A string for the API key required by the Fish Audio SDK. If not provided, the node will attempt to use an API key stored in the configuration file.
audio: Audio input in the form of waveform data that can be used as an alternative to an audio file path. This option allows for dynamic audio input during workflow execution.
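
The fallback described for api_key can be sketched as follows. This is a minimal illustration, not the node's actual code: the function name resolve_api_key, the file name config.ini, and the section and option names are all assumptions made for the example.

```python
import configparser


def resolve_api_key(api_key: str = "", config_path: str = "config.ini") -> str:
    """Return the explicit key if one was given, else fall back to the config file."""
    if api_key and api_key.strip():
        return api_key.strip()

    parser = configparser.ConfigParser()
    # read() silently skips files that do not exist, so a missing
    # config file simply results in no key being found.
    parser.read(config_path, encoding="utf-8")

    # "API_KEYS" and "fish_audio_api_key" are hypothetical names.
    key = parser.get("API_KEYS", "fish_audio_api_key", fallback="").strip()
    if not key:
        raise ValueError(
            "No Fish Audio API key found: set the api_key input or the config file."
        )
    return key
```

Failing early with a clear error when neither source yields a key tends to be easier to debug than letting an authentication failure surface later in the API call.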

Outputs

The Fish Whisper node produces the following output:

text: A string that contains the transcribed text from the given audio input.
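
The inputs and output listed above map naturally onto ComfyUI's custom-node conventions (an INPUT_TYPES classmethod plus RETURN_TYPES, RETURN_NAMES, and FUNCTION attributes). The class below is a hedged sketch of that shape only. The class name, method name, category, and defaults are assumptions, and the transcription step is deliberately left out.

```python
class FishWhisperSketch:
    """Illustrative stand-in for the node interface, not the project's actual class."""

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "is_enable": ("BOOLEAN", {"default": True}),
                "audio_path": ("STRING", {"default": ""}),
            },
            "optional": {
                "api_key": ("STRING", {"default": ""}),
                "audio": ("AUDIO",),
            },
        }

    RETURN_TYPES = ("STRING",)
    RETURN_NAMES = ("text",)
    FUNCTION = "transcribe"          # hypothetical method name
    CATEGORY = "examples/audio"      # hypothetical category

    def transcribe(self, is_enable, audio_path, api_key="", audio=None):
        if not is_enable:
            # Assumption: a disabled node yields None in place of text.
            return (None,)
        # The actual call to the Fish Audio SDK would go here.
        raise NotImplementedError("transcription is omitted from this sketch")
```

ComfyUI expects the node function to return a tuple with one element per entry in RETURN_TYPES, which is why the disabled branch returns a one-element tuple.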

Usage in ComfyUI Workflows

Within ComfyUI workflows, the Fish Whisper node can be incorporated to process speech inputs. It is particularly useful in workflows that involve interaction with Large Language Models (LLMs) where speech input from users needs to be converted to text for further processing and interaction with other nodes.

Example Use Cases

Voice Command Recognition: In a workflow designed to execute commands based on voice input, the Fish Whisper node can translate spoken commands into text, which can then be interpreted and executed by subsequent nodes.
Speech Interaction Workflows: In workflows where users interact with the system by speaking, this node converts the audio to text so that an LLM or other text-based nodes can respond to it.
Transcription Services: The node can be part of a workflow that needs to transcribe audio files into text, which can then be stored or further processed for insights or records.

Special Features and Considerations

API Key Management: Ensure that a valid API key is provided either through the api_key input or configured within the configuration file. This is crucial for authentication and access to the Fish Audio API services.
Audio Input Sources: Audio can be supplied either as a file path (audio_path) or as waveform data (audio), so the node works both with files on disk and with audio passed in during workflow execution.
Configuration: Users may want to configure language settings or API keys in the configuration file to automate and customize the node’s behavior within their workflows.
Enable/Disable Functionality: The is_enable input parameter allows users to control when the node processes data, providing the ability to dynamically enable or disable its function within a larger workflow context.
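
The enable switch and the two audio sources can be combined as in the sketch below. Several things here are assumptions: the precedence of waveform data over the file path, the simplified audio dictionary (a plain list of mono float samples under a hypothetical "samples" key, whereas ComfyUI typically passes a tensor), and the recognize callable, which stands in for the actual Fish Audio SDK request.

```python
import io
import struct
import wave
from typing import Callable, Optional, Sequence


def samples_to_wav_bytes(samples: Sequence[float], sample_rate: int) -> bytes:
    """Encode mono float samples in [-1.0, 1.0] as 16-bit PCM WAV bytes."""
    clipped = (max(-1.0, min(1.0, s)) for s in samples)
    frames = b"".join(struct.pack("<h", int(s * 32767)) for s in clipped)
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(frames)
    return buffer.getvalue()


def run_node(
    is_enable: bool,
    recognize: Callable[[bytes], str],
    audio_path: str = "",
    audio: Optional[dict] = None,
) -> Optional[str]:
    """Pick an audio source and pass its bytes to the recognizer."""
    if not is_enable:
        return None  # disabled: nothing is processed

    if audio is not None:
        # Assumed precedence: waveform data wins over the file path.
        data = samples_to_wav_bytes(audio["samples"], audio["sample_rate"])
    elif audio_path:
        with open(audio_path, "rb") as f:
            data = f.read()
    else:
        raise ValueError("Provide either audio_path or audio.")

    return recognize(data)
```

Passing the recognizer in as a callable keeps the source-selection logic testable without an API key or network access.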

With the Fish Whisper node, ComfyUI users can add speech-to-text to their workflows and feed spoken input into their LLM setups.
