The listen_audio node is a component of ComfyUI LLM Party that records and processes audio within ComfyUI workflows. Users record audio directly by pressing a key, and the node can be integrated into larger workflows that require audio input, such as speech recognition or other audio-based applications.
The listen_audio node captures audio from a microphone while a chosen key is held down. It continuously monitors for the specified key, starts recording when the key is pressed, and stops recording once the key is released. The captured audio is then saved to a file for further processing or analysis.
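The press-and-hold behavior described above can be sketched as follows. This is a minimal illustration, not the node's actual implementation: record_while_pressed, save_wav, is_pressed, and read_chunk are hypothetical names, and the key state and microphone are passed in as plain callables so the sketch does not depend on any particular keyboard or audio library. The 16 kHz, 16-bit mono format is also an assumption.

```python
import time
import wave
from typing import Callable, List


def record_while_pressed(
    is_pressed: Callable[[], bool],
    read_chunk: Callable[[], bytes],
    poll_interval: float = 0.01,
) -> bytes:
    """Wait for the key, then collect audio chunks until it is released."""
    # Idle until the key goes down.
    while not is_pressed():
        time.sleep(poll_interval)

    # Collect raw PCM chunks for as long as the key stays down.
    chunks: List[bytes] = []
    while is_pressed():
        chunks.append(read_chunk())
    return b"".join(chunks)


def save_wav(path: str, pcm: bytes, sample_rate: int = 16000,
             channels: int = 1, sample_width: int = 2) -> None:
    """Write raw PCM bytes to a WAV file (16-bit mono by default)."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)
        wf.setframerate(sample_rate)
        wf.writeframes(pcm)
```

In a real setup, is_pressed would wrap a keyboard library's key-state query and read_chunk would read one buffer from the microphone stream.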
The node accepts the following input:
press_key: This parameter specifies the key that starts and stops the audio recording. Available options include:
shift
space
ctrl
alt
tab
Users can select the most convenient key based on their preferences and workflow needs.
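For readers unfamiliar with how a ComfyUI custom node exposes a dropdown like this, the sketch below follows the usual custom-node conventions (INPUT_TYPES, RETURN_TYPES, FUNCTION, CATEGORY). It is an illustration only and not the LLM Party source: the class name ListenAudioSketch, the method name listen, the category string, and the default key are all assumptions.

```python
class ListenAudioSketch:
    """Hypothetical stand-in for the node declaration, not the real source."""

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                # A list as the first tuple element is shown as a dropdown.
                "press_key": (
                    ["shift", "space", "ctrl", "alt", "tab"],
                    {"default": "shift"},  # default value is an assumption
                ),
            }
        }

    # One string output and one audio output, matching the documented
    # audio_path and audio outputs of the node.
    RETURN_TYPES = ("STRING", "AUDIO")
    RETURN_NAMES = ("audio_path", "audio")
    FUNCTION = "listen"            # hypothetical method name
    CATEGORY = "llm_party/audio"   # hypothetical category

    def listen(self, press_key):
        # The recording logic is intentionally left out of this sketch.
        raise NotImplementedError("recording logic omitted in this sketch")
```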
The listen_audio node provides the following outputs:
audio_path: This is a string representing the file path where the recorded audio is saved. This allows other nodes in the workflow to access and process the audio file.
audio: This output is an audio object containing the waveform and sample rate of the recorded audio. It is designed to facilitate further audio processing within the ComfyUI environment.
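To make the shape of the audio output concrete, the sketch below builds a comparable object from a saved WAV file. ComfyUI's built-in AUDIO type is commonly a dictionary with a "waveform" tensor shaped [batch, channels, samples] and an integer "sample_rate"; whether listen_audio uses exactly that layout is an assumption here. To stay dependency-free, nested Python lists stand in for the tensor, and load_audio_dict is a hypothetical helper name.

```python
import array
import sys
import wave


def load_audio_dict(path: str) -> dict:
    """Read a 16-bit PCM WAV file into a waveform/sample_rate dictionary."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = wf.getnchannels()
        sample_rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())

    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    # De-interleave the channels and scale each sample to [-1.0, 1.0).
    per_channel = [
        [s / 32768.0 for s in samples[c::channels]] for c in range(channels)
    ]
    # The outer list is the batch dimension (a single recording).
    return {"waveform": [per_channel], "sample_rate": sample_rate}
```

A downstream node would typically read audio["sample_rate"] before resampling or feature extraction, since the waveform alone does not carry timing information.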
The listen_audio node can be used in any workflow within ComfyUI that requires real-time audio input. Some common use cases include:
Speech Recognition: Pair this node with ASR (Automatic Speech Recognition) nodes to convert spoken words into text. This is useful for applications like voice commands or transcription tasks.
Audio Analysis: Use in combination with audio analysis nodes to extract features from recorded sounds or to perform audio classification.
Interactive Installations: Integrate into interactive art or educational installations where users' audio input is captured for engagement and interaction analysis.
To incorporate listen_audio into a workflow, drag the node into the ComfyUI interface and connect its outputs to nodes that can process audio data, such as the openai_whisper node for audio-to-text conversion.
Real-Time Interaction: The node supports real-time audio capturing based on user interaction, making it suitable for dynamic and interactive applications.
Configurable Key Press: The flexibility in choosing a key for starting and stopping recordings allows for a more personalized user experience based on comfort and accessibility.
Automatic File Saving: Recorded audio is automatically saved with a timestamp, ensuring unique filenames and easy organization of audio files.
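Timestamp-based naming of the kind mentioned above can be sketched as follows. The function name timestamped_audio_path, the audio_ prefix, and the timestamp format are assumptions for illustration; the node's actual naming scheme may differ.

```python
import os
from datetime import datetime
from typing import Optional


def timestamped_audio_path(directory: str,
                           now: Optional[datetime] = None) -> str:
    """Build a WAV file path whose name embeds the current date and time."""
    now = now or datetime.now()
    name = f"audio_{now.strftime('%Y%m%d_%H%M%S')}.wav"
    return os.path.join(directory, name)
```

With one-second resolution, two recordings finished within the same second would share a name, so a scheme like this one would need a finer timestamp or a counter if rapid back-to-back recordings are expected.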
By integrating the listen_audio node into workflows, users can achieve a seamless audio input experience, contributing to richer and more interactive ComfyUI applications.