ComfyUI Node Documentation: Fish TTS Node

Overview

The fish_tts node is part of the ComfyUI LLM Party suite and adds text-to-speech (TTS) capabilities to ComfyUI workflows. It uses the Fish audio SDK to convert text into speech, so users can generate audio files from text inputs.

Functionality

The fish_tts node takes a string input (text) and produces an audio file as output. It can also use reference audio for voice style transfer, so that the generated speech matches the tone and style of the reference.

Inputs

The fish_tts node accepts the following inputs:

is_enable (Required): A boolean indicating whether the node is active.
input_string (Required): The text content that needs to be synthesized into speech.
reference_audio_path (Optional): A path to a reference audio file intended for voice style transfer. This allows the TTS engine to mimic the style and tonality of the reference file.
reference_text (Optional): The text spoken in the reference audio, which helps the TTS engine align the generated speech with the reference.
reference_id (Optional): An identifier for a pre-existing reference in the Fish audio system. This can be used instead of uploading a new reference audio file.
api_key (Optional): The API key to access the Fish audio service. If not provided here, it must be configured in the config.ini file of the LLM Party setup.
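
The api_key fallback described above can be sketched as follows. This is a minimal illustration, not the node's actual code: the function name resolve_api_key, the section name API_KEYS, and the option name fish_api_key are assumptions made for the example, so check your own config.ini for the real names.

```python
import configparser


def resolve_api_key(node_api_key: str, config_path: str,
                    section: str = "API_KEYS",
                    option: str = "fish_api_key") -> str:
    """Return the key given on the node, else the one stored in config.ini.

    The section and option names are hypothetical placeholders.
    """
    # A non-empty value typed into the node takes precedence.
    if node_api_key and node_api_key.strip():
        return node_api_key.strip()

    parser = configparser.ConfigParser()
    # read() skips files that do not exist, so a missing config.ini
    # simply falls through to the error below.
    parser.read(config_path, encoding="utf-8")
    key = parser.get(section, option, fallback="").strip()
    if not key:
        raise ValueError(
            "No Fish audio API key found in the node input or in config.ini"
        )
    return key
```

Failing early with a clear error when neither source provides a key is a design choice of this sketch; it avoids sending a request that the service would reject anyway.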

Outputs

The fish_tts node produces two outputs:

audio_path: A string representing the file path to the generated audio file (.mp3 format).
audio: A dictionary containing the waveform and sample rate information of the generated audio.
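
As an illustration of how the audio output might be consumed, the sketch below computes the clip duration from such a dictionary. It assumes the keys are named waveform and sample_rate and that the waveform is indexed as [batch][channel][sample], which is the usual ComfyUI audio convention but is not stated in this document. A nested list stands in for the tensor so the example runs without extra dependencies.

```python
def audio_duration_seconds(audio: dict) -> float:
    """Return the duration of the first clip in an audio dictionary.

    Assumed layout (not confirmed by this document):
    audio["waveform"][batch][channel][sample], audio["sample_rate"] in Hz.
    """
    waveform = audio["waveform"]
    sample_rate = audio["sample_rate"]
    # Number of samples in the first channel of the first batch item.
    num_samples = len(waveform[0][0])
    return num_samples / sample_rate


# One second of mono silence at 44.1 kHz as stand-in data.
example_audio = {"waveform": [[[0.0] * 44100]], "sample_rate": 44100}
print(audio_duration_seconds(example_audio))  # 1.0
```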

Usage in ComfyUI Workflows

The fish_tts node fits into any ComfyUI workflow that needs to turn text into audio. Typical applications include:

Creating Audio Narrations: Converting written content to spoken word for podcasts, video voiceovers, or audiobooks.
Interactive Applications: Providing auditory feedback in interactive interfaces or applications.
Speech Prototyping: Testing different voice styles and tones in applications involving speech synthesis.

Example Workflow

In a sample workflow, user-supplied text is fed into the fish_tts node, optionally together with a reference audio file, and the node outputs an MP3 audio file. Downstream nodes can then save or further process this output.
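
The wiring described above could be expressed in ComfyUI's API-style workflow format roughly as follows. Treat it as a sketch under assumptions: the class_type string "fish_tts" is taken from the node name used in this document, the output index 1 assumes the outputs are ordered as listed (audio_path, then audio), and the downstream SaveAudio node and its filename_prefix input are assumed to be available in your ComfyUI installation. The helper build_tts_workflow is hypothetical.

```python
import json


def build_tts_workflow(text: str,
                       reference_audio_path: str = "",
                       reference_text: str = "") -> dict:
    """Build an API-format workflow: fish_tts feeding a save node."""
    return {
        "1": {
            "class_type": "fish_tts",  # assumed registered node name
            "inputs": {
                "is_enable": True,
                "input_string": text,
                "reference_audio_path": reference_audio_path,
                "reference_text": reference_text,
                "reference_id": "",
                "api_key": "",  # empty: rely on config.ini instead
            },
        },
        "2": {
            "class_type": "SaveAudio",  # assumed downstream node
            "inputs": {
                # Link format is [source_node_id, output_index];
                # index 1 assumes "audio" is the second output.
                "audio": ["1", 1],
                "filename_prefix": "fish_tts",
            },
        },
    }


workflow = build_tts_workflow("Hello from ComfyUI.")
print(json.dumps(workflow, indent=2))
```

Leaving api_key empty in the workflow keeps the secret out of saved workflow files, at the cost of requiring a configured config.ini.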

Special Features and Considerations

Voice Style Transfer: By utilizing reference audio files or IDs, the fish_tts node supports style transfer, allowing the generated speech to mimic the characteristics of the reference audio. This can be particularly useful for maintaining voice consistency in multimedia projects.
Configuration: The API key must be set either on the node itself (api_key) or globally in the config.ini file of the LLM Party installation.
Language Integration: The system can be configured to support multiple languages. Check that your setup covers the languages your project requires.
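
The choice between a stored reference ID and an uploaded reference file can be sketched as below. The precedence (reference_id first, then reference_audio_path, then the default voice) is an assumption of this example, as are the function name select_reference and the shape of the returned dictionary; the document only states that an ID can be used instead of a new reference file.

```python
def select_reference(reference_id: str = "",
                     reference_audio_path: str = "",
                     reference_text: str = "") -> dict:
    """Decide which voice reference to use for a TTS request.

    Hypothetical helper: precedence and return shape are assumptions.
    """
    # A pre-existing reference in the Fish audio system needs no upload.
    if reference_id:
        return {"mode": "id", "reference_id": reference_id}
    # Otherwise fall back to a local reference file plus its text.
    if reference_audio_path:
        return {
            "mode": "audio",
            "audio_path": reference_audio_path,
            "text": reference_text,
        }
    # No reference given: use the service's default voice.
    return {"mode": "default"}
```

For example, select_reference(reference_audio_path="voice.wav", reference_text="Hello") selects the uploaded file, while passing a reference_id as well would make the ID win under this sketch's precedence.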

As part of the ComfyUI LLM Party suite, the fish_tts node brings configurable speech synthesis, including voice style transfer, to ComfyUI workflows.
