The ComfyUI-Florence2 repository integrates the Florence-2 vision foundation model into ComfyUI. Florence-2 uses a prompt-based approach to handle diverse vision and vision-language tasks, including captioning, object detection, and segmentation. It is trained on the FLD-5B dataset, which contains 5.4 billion annotations across 126 million images, and this scale supports its multi-task learning. Its sequence-to-sequence architecture performs well in both zero-shot and fine-tuned settings.
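The prompt-based approach can be illustrated with a small sketch: each task is selected by a task token placed at the start of the prompt, optionally followed by free text. The token strings below are assumed from the public Florence-2 model card, and `TASK_PROMPTS`, `TASKS_WITH_TEXT`, and `build_prompt` are hypothetical names used only for illustration; they are not part of this repository.

```python
# Illustrative only: maps friendly task names to Florence-2 style task tokens.
# The token strings are an assumption based on the public Florence-2 model card.
TASK_PROMPTS = {
    "caption": "<CAPTION>",
    "detailed_caption": "<DETAILED_CAPTION>",
    "object_detection": "<OD>",
    "referring_expression_segmentation": "<REFERRING_EXPRESSION_SEGMENTATION>",
    "ocr": "<OCR>",
}

# Tasks that expect extra free text after the task token (assumption).
TASKS_WITH_TEXT = {"referring_expression_segmentation"}


def build_prompt(task: str, text: str = "") -> str:
    """Return the prompt string for a task, appending text where the task uses it."""
    if task not in TASK_PROMPTS:
        raise ValueError(f"unknown task: {task!r}")
    token = TASK_PROMPTS[task]
    if task in TASKS_WITH_TEXT:
        cleaned = text.strip()
        if not cleaned:
            raise ValueError(f"task {task!r} needs a text input")
        return token + cleaned
    # Tasks such as captioning ignore any extra text.
    return token


if __name__ == "__main__":
    print(build_prompt("caption"))
    print(build_prompt("referring_expression_segmentation", "the red car"))
```

Keeping the mapping in one table makes it clear that switching tasks changes only the prompt, not the model.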
This repository makes Florence-2's capabilities available as ComfyUI nodes so that they can be used directly within workflows. A significant addition in this fork is Document Visual Question Answering (DocVQA), which lets users ask questions about document images and extract information from them.
To install the ComfyUI-Florence2 custom nodes, follow these steps:
Clone the repository into the ComfyUI/custom_nodes folder:
git clone https://github.com/kijai/ComfyUI-Florence2 ComfyUI/custom_nodes/ComfyUI-Florence2
Install the dependencies listed in requirements.txt. Ensure transformers version 4.38.0 or higher is used:
pip install -r requirements.txt
For users of the portable version, run the following in the ComfyUI_windows_portable directory:
python_embeded\python.exe -m pip install -r ComfyUI\custom_nodes\ComfyUI-Florence2\requirements.txt
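After installing, the transformers version requirement can be checked from the same Python interpreter. The following is a minimal sketch using only the standard library; `parse_version`, `meets_minimum`, and `check_transformers` are hypothetical helper names, and the parser is deliberately simple (it compares only the leading numeric parts, so a pre-release such as 4.38.0rc1 is treated as 4.38.0).

```python
from importlib import metadata


def parse_version(version: str) -> tuple:
    """Turn '4.38.0' or '4.41.0.dev0' into a tuple of its leading integer parts."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        parts.append(int(digits))
    while len(parts) < 3:
        parts.append(0)  # pad so that '4.38' compares equal to '4.38.0'
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = "4.38.0") -> bool:
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def check_transformers(minimum: str = "4.38.0") -> str:
    """Describe whether the installed transformers package satisfies the minimum."""
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return "transformers is not installed"
    if meets_minimum(installed, minimum):
        return f"transformers {installed} is new enough"
    return f"transformers {installed} is older than {minimum}"


if __name__ == "__main__":
    print(check_transformers())
```

For the portable build, the script would need to be run with python_embeded\python.exe so that it inspects the same environment the nodes use.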
This repository provides several nodes, each designed to integrate Florence-2 functionalities into ComfyUI. Models are downloaded to ComfyUI/models/LLM.
The standout feature in this repository is its support for Document Visual Question Answering (DocVQA) using the Florence-2 model. This feature lets users ask questions about document images and receive answers based on both the visual and the textual content. It is particularly useful for interpreting scanned documents, forms, receipts, and other text-heavy images.
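As a rough illustration of what a DocVQA request involves, the sketch below builds a question prompt and cleans up a raw model answer. The `<DocVQA>` task token and the special tokens being stripped are assumptions about how the model is prompted, and `docvqa_prompt` and `clean_answer` are hypothetical helpers; the actual nodes handle these steps internally.

```python
# Assumed task token for document question answering; not confirmed by this text.
DOCVQA_TOKEN = "<DocVQA>"

# Special tokens that a sequence-to-sequence decoder may leave in raw output (assumption).
SPECIAL_TOKENS = ("<s>", "</s>", "<pad>")


def docvqa_prompt(question: str) -> str:
    """Prefix a question about a document image with the DocVQA task token."""
    cleaned = question.strip()
    if not cleaned:
        raise ValueError("DocVQA needs a non-empty question")
    return DOCVQA_TOKEN + cleaned


def clean_answer(raw: str) -> str:
    """Strip leftover special tokens and surrounding whitespace from a raw answer."""
    for token in SPECIAL_TOKENS:
        raw = raw.replace(token, "")
    return raw.strip()


if __name__ == "__main__":
    print(docvqa_prompt("What is the total amount on this receipt?"))
    print(clean_answer("</s><s>$12.40</s><pad>"))
```

In a workflow, the question would be entered as the text input of the node together with the document image, and the cleaned answer would be returned as a string.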
The repository supports a range of Florence-2 models, which can be automatically downloaded. Some of these models include:
Official:
Tested Finetunes:
ComfyUI-Florence2 can significantly enhance your workflows by:
By integrating these nodes, users can harness the cutting-edge capabilities of Florence-2 to build more intelligent and responsive applications within ComfyUI.