ComfyUI Node Documentation: Fish TTS Node
Overview
The fish_tts node is a component of the ComfyUI LLM Party suite for ComfyUI workflows that need text-to-speech (TTS). It uses the Fish audio SDK to convert text into speech, so users can generate audio files from text inputs.
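The SDK call at the core of the node can be sketched as a small streaming helper. This is a minimal, hypothetical sketch, not the node's actual code. It assumes, based on the published examples for the fish-audio-sdk Python package, that a Session object exposes a tts(request) method yielding chunks of encoded audio bytes. The session is passed in as a parameter so the helper can be exercised without network access.

```python
from typing import Iterable, Protocol


class TTSSession(Protocol):
    """Anything with a tts(request) method that yields audio bytes.

    Assumption: fish_audio_sdk.Session has this shape, per that
    package's published examples; verify against your installed version.
    """

    def tts(self, request: object) -> Iterable[bytes]: ...


def synthesize_to_file(session: TTSSession, request: object, out_path: str) -> str:
    """Stream audio chunks from session.tts(request) into out_path and return the path."""
    with open(out_path, "wb") as f:
        for chunk in session.tts(request):
            f.write(chunk)  # each chunk is a slice of the encoded (e.g. MP3) stream
    return out_path


# Real usage (assumes `pip install fish-audio-sdk`; not executed here):
#   from fish_audio_sdk import Session, TTSRequest
#   session = Session("YOUR_API_KEY")
#   synthesize_to_file(session, TTSRequest(text="Hello, world!"), "out.mp3")
```

Injecting the session keeps the file handling separate from the network call, which makes the helper easy to test with a fake session.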
Functionality
The primary function of the fish_tts node is to take a text string as input and produce an audio file as output. It can also use reference audio for voice style transfer, so that the generated speech closely matches the tone and style of a particular voice.
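One way the reference inputs could be combined into a single synthesis request is sketched below. The helper build_tts_kwargs is hypothetical, and so is its precedence rule: a non-empty reference_id wins over a local reference file. Plain dicts stand in for the SDK's request objects; with fish-audio-sdk they would presumably map onto TTSRequest and ReferenceAudio.

```python
from typing import Any, Dict, Optional


def build_tts_kwargs(
    text: str,
    reference_id: Optional[str] = None,
    reference_audio_path: Optional[str] = None,
    reference_text: str = "",
) -> Dict[str, Any]:
    """Assemble keyword arguments for a TTS request (hypothetical helper).

    Assumption: a non-empty reference_id takes precedence over a local
    reference audio file; the real node may order these differently.
    """
    kwargs: Dict[str, Any] = {"text": text}
    if reference_id:
        # Reuse a voice that already exists on the Fish audio service.
        kwargs["reference_id"] = reference_id
    elif reference_audio_path:
        # Send the raw bytes of a local clip, paired with its transcript.
        with open(reference_audio_path, "rb") as f:
            kwargs["references"] = [{"audio": f.read(), "text": reference_text}]
    return kwargs
```

With no reference at all, the request carries only the text, and the service's default voice would be used.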
Inputs
The fish_tts node accepts the following inputs:
- is_enable (Required): A boolean indicating whether the node is active.
- input_string (Required): The text content that needs to be synthesized into speech.
- reference_audio_path (Optional): A path to a reference audio file intended for voice style transfer. This allows the TTS engine to mimic the style and tonality of the reference file.
- reference_text (Optional): The transcript of the reference audio, which helps the engine align the synthesized voice with the reference.
- reference_id (Optional): An identifier for a pre-existing reference in the Fish audio system. This can be used instead of uploading a new reference audio file.
- api_key (Optional): The API key for the Fish audio service. If it is not provided here, it must be set in the config.ini file of the LLM Party setup.
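The inputs listed above, together with the node's two outputs, map naturally onto ComfyUI's custom-node conventions (INPUT_TYPES, RETURN_TYPES, RETURN_NAMES, FUNCTION). The class below is a hypothetical declaration, not the LLM Party source: the class name, widget defaults, category, and the value returned when the node is disabled are all assumptions.

```python
class FishTTSSketch:
    """Hypothetical ComfyUI node declaration mirroring the documented inputs/outputs."""

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "is_enable": ("BOOLEAN", {"default": True}),
                "input_string": ("STRING", {"multiline": True, "default": ""}),
            },
            "optional": {
                "reference_audio_path": ("STRING", {"default": ""}),
                "reference_text": ("STRING", {"multiline": True, "default": ""}),
                "reference_id": ("STRING", {"default": ""}),
                "api_key": ("STRING", {"default": ""}),
            },
        }

    RETURN_TYPES = ("STRING", "AUDIO")
    RETURN_NAMES = ("audio_path", "audio")
    FUNCTION = "tts"  # name of the method ComfyUI calls
    CATEGORY = "audio"  # hypothetical category label

    def tts(self, is_enable=True, input_string="", reference_audio_path="",
            reference_text="", reference_id="", api_key=""):
        if not is_enable:
            # Assumption: a disabled node emits empty outputs.
            return (None, None)
        # Synthesis itself is deliberately left out of this declaration sketch.
        raise NotImplementedError("synthesis omitted in this sketch")
```

Optional inputs default to empty strings here, so the method can tell "not supplied" apart from a real value with a simple truthiness check.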
Outputs
The fish_tts node produces two outputs:
- audio_path: A string representing the file path to the generated audio file (.mp3 format).
- audio: A dictionary containing the waveform and sample rate information of the generated audio.
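The audio output presumably follows ComfyUI's usual AUDIO convention: a dict with a "waveform" entry (normally a torch tensor shaped [batch, channels, samples]) and an integer "sample_rate". The hypothetical helper below only shows how the two outputs fit together; any array-like stands in for the tensor so the sketch runs without torch.

```python
from typing import Any, Dict, Tuple


def package_outputs(audio_path: str, waveform: Any, sample_rate: int) -> Tuple[str, Dict[str, Any]]:
    """Return (audio_path, audio) in the shape described above (hypothetical helper).

    In ComfyUI, waveform is normally a torch tensor shaped
    [batch, channels, samples]; any array-like is accepted here.
    """
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    return audio_path, {"waveform": waveform, "sample_rate": sample_rate}
```

In a real node the waveform would typically come from decoding the saved MP3, for example with torchaudio.load(audio_path), which returns a [channels, samples] tensor and the sample rate; waveform.unsqueeze(0) then adds the batch dimension. Whether MP3 decoding works depends on the installed torchaudio backend.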
Usage in ComfyUI Workflows
The fish_tts node is useful in any ComfyUI workflow that needs to turn text into audio. Typical applications include:
- Creating Audio Narrations: Converting written content to spoken word for podcasts, video voiceovers, or audiobooks.
- Interactive Applications: Providing auditory feedback in interactive interfaces or applications.
- Speech Prototyping: Testing different voice styles and tones in applications involving speech synthesis.
Example Workflow
A typical workflow feeds user-supplied text, optionally together with a reference audio file, into the fish_tts node, which outputs an MP3 audio file. Downstream nodes can then use or process this audio further for various multimedia applications.
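As an illustration, such a workflow could be written in ComfyUI's API (JSON) format, where each node is keyed by an id and a link to another node's output is written as [source_node_id, output_index]. The class_type strings ("fish_tts", "SaveAudio") and the SaveAudio input names are assumptions; check the node names your installation actually registers.

```python
import json

# Hypothetical API-format workflow: node "1" synthesizes speech, node "2" saves it.
workflow = {
    "1": {
        "class_type": "fish_tts",  # assumed registered name
        "inputs": {
            "is_enable": True,
            "input_string": "Welcome to the show.",
            "reference_id": "",  # empty: no stored reference voice
        },
    },
    "2": {
        "class_type": "SaveAudio",  # assumed core audio-saving node
        "inputs": {
            # [source node id, output index]; index 1 is the `audio` output
            "audio": ["1", 1],
            "filename_prefix": "audio/fish_tts",
        },
    },
}

# Body for a POST to a running ComfyUI server's /prompt endpoint.
payload = json.dumps({"prompt": workflow})
```

The payload can be sent to the /prompt endpoint of a running ComfyUI server (by default at http://127.0.0.1:8188/prompt). Linking to output index 0 instead would pass along the audio_path string.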
Special Features and Considerations
- Voice Style Transfer: By supplying a reference audio file or a reference ID, you can make the fish_tts node generate speech that mimics the characteristics of the reference voice. This is particularly useful for keeping a consistent voice across a multimedia project.
- Configuration: Make sure your API key is configured either in the node itself or globally in the config.ini file of your LLM Party installation.
- Language Integration: The system can be configured to support multiple languages. Make sure your setup matches the language requirements of your project.
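The API-key fallback described under Configuration might look like the following sketch, built on Python's standard configparser module. The section and option names (API_KEYS, fish_api_key) are hypothetical placeholders, not the names LLM Party's config.ini actually uses.

```python
import configparser
from typing import Optional


def resolve_api_key(node_key: str, config_path: str,
                    section: str = "API_KEYS", option: str = "fish_api_key") -> Optional[str]:
    """Prefer the key typed into the node; otherwise read it from config.ini.

    section and option are hypothetical names; check the config.ini
    shipped with your LLM Party install for the real ones.
    """
    if node_key.strip():
        return node_key.strip()
    parser = configparser.ConfigParser()
    parser.read(config_path, encoding="utf-8")  # a missing file is silently skipped
    # fallback covers both a missing section and a missing option
    value = parser.get(section, option, fallback="").strip()
    return value or None
```

Returning None rather than an empty string lets the caller raise a clear "no API key configured" error before contacting the service.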
As part of the ComfyUI LLM Party suite, the fish_tts node provides flexible, customizable speech synthesis for a wide range of projects.