EasyOCR_advance Node Documentation
Overview
The EasyOCR_advance node is a component of the ComfyUI extension for the "comfyui_LLM_party" project. This node is designed to perform Optical Character Recognition (OCR) on images using the EasyOCR library. It extracts text from images, supports multiple languages, and provides various customization options to enhance the performance and accuracy of text recognition.
Features
- Multi-language support for text recognition.
- Customizable parameters including decoder type, text detection thresholds, and image processing options.
- Extracted text returned both as structured data (JSON) and as annotated images.
- An is_enable switch to turn the node on or off within a workflow.
Inputs
The EasyOCR_advance node accepts the following inputs:
- image: The image on which OCR is to be performed.
- gpu: Boolean option to enable or disable GPU usage for faster processing.
- language_name: A comma-separated string of EasyOCR language codes to use for OCR (e.g., "ch_sim,en" for Simplified Chinese and English).
- decoder: Chooses the type of decoder used ("greedy", "beamsearch", or "wordbeamsearch").
- beamWidth: Integer number of beams to keep when the decoder is "beamsearch" or "wordbeamsearch".
- batch_size: Batch size used during recognition. In EasyOCR, values above 1 run faster but use more memory.
- workers: Number of data-loader workers used during recognition.
- allowlist: A string of characters to which recognition is restricted.
- blocklist: A string of characters to exclude from recognition. EasyOCR ignores this when an allowlist is given.
- paragraph: Boolean to enable or disable paragraph segmentation.
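- min_size: Minimum size, in pixels, of text boxes to keep; smaller boxes are filtered out.
- contrast_ths: Contrast threshold. In EasyOCR, text boxes with contrast below this value are processed a second time with their contrast adjusted to the adjust_contrast level.
- adjust_contrast: Target contrast level for low-contrast text boxes.
- text_threshold: Text confidence threshold used during detection.
- low_text: Lower bound on the text score.
- link_threshold: Link confidence threshold.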
- canvas_size: Maximum size of the image canvas.
- mag_ratio: Magnification ratio for image resizing.
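- slope_ths: Maximum slope (delta y / delta x) at which boxes are still considered for merging; with a low value, tilted boxes are not merged.
- ycenter_ths: Maximum vertical shift between boxes that may be merged; boxes on different lines should not be merged.
- height_ths: Maximum difference in box height between boxes that may be merged; boxes with very different text sizes should not be merged.
- width_ths: Maximum horizontal distance between boxes that may be merged.
- add_margin: Amount by which bounding boxes are extended in all directions.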
- is_enable: Boolean to enable or disable the node.
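As a rough illustration of how the inputs above might map onto the EasyOCR library, the sketch below splits language_name into a list of codes and forwards the remaining options to EasyOCR's Reader and readtext calls. The helper names (parse_languages, build_readtext_kwargs, run_ocr) are hypothetical, as is the treatment of an empty allowlist or blocklist as "no restriction"; this is not the node's actual implementation.

```python
from typing import Any, Dict, List


def parse_languages(language_name: str) -> List[str]:
    """Split a comma-separated string such as "ch_sim,en" into a list of codes."""
    return [code.strip() for code in language_name.split(",") if code.strip()]


def build_readtext_kwargs(allowlist: str = "", blocklist: str = "", **options: Any) -> Dict[str, Any]:
    """Collect keyword arguments for readtext (hypothetical helper)."""
    kwargs: Dict[str, Any] = dict(options)
    # Assumption: an empty string in the node means "no restriction".
    kwargs["allowlist"] = allowlist or None
    # EasyOCR ignores blocklist when an allowlist is given; dropping it here
    # makes that precedence explicit.
    kwargs["blocklist"] = None if allowlist else (blocklist or None)
    return kwargs


def run_ocr(image: Any, language_name: str = "ch_sim,en", gpu: bool = False, **options: Any):
    """Run EasyOCR on an image (file path, bytes, or numpy array)."""
    # Imported lazily so the helpers above can be used without EasyOCR installed.
    import easyocr

    reader = easyocr.Reader(parse_languages(language_name), gpu=gpu)
    return reader.readtext(image, **build_readtext_kwargs(**options))
```

A call such as run_ocr("receipt.png", language_name="en", decoder="beamsearch", beamWidth=5, allowlist="0123456789") would then restrict recognition to digits; the file name is only a placeholder.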
Outputs
The EasyOCR_advance node produces the following outputs:
- images: Annotated images with bounding boxes drawn around detected text.
- masks: Mask images highlighting the detected text areas.
- json_str: JSON string containing structured data with bounding boxes, detected text, and confidence scores.
- text: Raw extracted text from the images.
- language_list_help: A string providing a mapping of supported languages in human-readable format.
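To make the json_str and text outputs more concrete, here is a minimal sketch that converts EasyOCR-style results (a list of bounding box, text, confidence triples) into a JSON string and a plain-text string. The field names "bbox", "text", and "confidence" and the newline-joined text are assumptions for illustration; the node's actual JSON schema may differ.

```python
import json
from typing import Any, Iterable, Tuple


def results_to_outputs(results: Iterable[Any]) -> Tuple[str, str]:
    """Turn (bbox, text, confidence) triples into (json_str, text)."""
    records = []
    for bbox, text, confidence in results:
        records.append({
            # Cast to built-in types so numpy scalars serialize cleanly.
            "bbox": [[int(x), int(y)] for x, y in bbox],
            "text": text,
            "confidence": float(confidence),
        })
    json_str = json.dumps(records, ensure_ascii=False)
    plain_text = "\n".join(record["text"] for record in records)
    return json_str, plain_text


# Hand-written sample in the shape EasyOCR returns when paragraph is disabled.
sample = [
    ([[10, 10], [120, 10], [120, 40], [10, 40]], "Hello", 0.98),
    ([[10, 60], [90, 60], [90, 90], [10, 90]], "世界", 0.87),
]
json_str, text = results_to_outputs(sample)
```

This sketch assumes paragraph is disabled. With paragraph enabled, EasyOCR returns entries without a confidence score, so the unpacking above would need adjusting.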
Usage
The EasyOCR_advance node can be used in ComfyUI workflows for various purposes, such as:
- Extracting text information from images for analysis or recording.
- Pre-processing text within images for further AI/ML tasks.
- Integrating with multilingual systems to process images in different languages.
- Generating overlays or annotations on images to display detected text visually.
To use this node, add the EasyOCR_advance node to your ComfyUI workflow, connect an image, and set the parameters as needed. Depending on your hardware, you may enable GPU processing for faster recognition.
Special Features and Considerations
- Language Support: The node supports a wide range of languages using EasyOCR's language codes. Refer to the language_list_help output to understand supported languages.
- Customizable Parameters: The detection and recognition parameters listed under Inputs can be tuned for different scenarios and image types.
- Performance Optimization: Options like GPU usage, batch size, and worker threads enable users to optimize performance based on their computational resources.
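One way to choose the gpu setting programmatically is to check whether CUDA is actually available before requesting it. The sketch below is an assumption about how a workflow script might do this; resolve_gpu is a hypothetical helper and not part of the node.

```python
def resolve_gpu(requested: bool) -> bool:
    """Return True only if GPU use was requested and CUDA is available."""
    if not requested:
        return False
    try:
        # EasyOCR runs on PyTorch, so torch is normally installed alongside it.
        import torch
    except ImportError:
        return False
    return bool(torch.cuda.is_available())
```

The result can be passed as the gpu input, with batch_size and workers then raised gradually while watching memory use.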
Conclusion
The EasyOCR_advance node performs OCR within ComfyUI workflows. Its customization options and multi-language support suit it to a range of image-processing and text-recognition tasks.