
EVE

An encoder-free vision-language model that is efficient and data-driven.

Introduction:

EVE is an encoder-free vision-language model jointly developed by researchers from Dalian University of Technology, the Beijing Academy of Artificial Intelligence (BAAI), and Peking University. It performs well across arbitrary image aspect ratios, outperforming Fuyu-8B and approaching modular, encoder-based LVLMs. EVE stands out for its data and training efficiency: it is pre-trained on 33M publicly available samples, then fine-tuned with 665K LLaVA SFT samples for the EVE-7B model and an additional 1.2M SFT samples for the EVE-7B (HD) model. Developed with an efficient, transparent, and practical strategy, EVE opens a new route toward pure decoder-only architectures across modalities.
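A minimal sketch of the encoder-free idea described above: instead of routing the image through a pretrained vision encoder, raw pixels are split into patches, linearly projected into the decoder's embedding space, and concatenated with the text token embeddings as one sequence. All sizes here (patch size 16, embedding dimension 64, a random projection) are illustrative assumptions, not EVE's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
patch, dim = 16, 64
# Learned in a real model; random here purely for illustration.
proj = rng.standard_normal((patch * patch * 3, dim)) * 0.02

def image_to_tokens(img: np.ndarray) -> np.ndarray:
    """Turn an (H, W, 3) image into (num_patches, dim) decoder tokens.
    H and W may differ, so arbitrary aspect ratios are handled naturally."""
    h, w, c = img.shape
    assert h % patch == 0 and w % patch == 0
    # Cut the image into non-overlapping patch×patch tiles, then flatten each.
    patches = (img.reshape(h // patch, patch, w // patch, patch, c)
                  .transpose(0, 2, 1, 3, 4)
                  .reshape(-1, patch * patch * c))
    return patches @ proj  # project pixels directly into token space

img = rng.standard_normal((224, 448, 3))   # a 1:2 aspect-ratio image
text = rng.standard_normal((10, dim))      # stand-in text token embeddings
# One interleaved sequence for a decoder-only model: image tokens, then text.
seq = np.concatenate([image_to_tokens(img), text], axis=0)
print(seq.shape)  # (14*28 + 10, 64) = (402, 64)
```

Because the patch grid is computed from the image's own height and width, no resizing to a fixed square resolution is required, which is what makes varying aspect ratios straightforward in this design.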
Image: EVE (https://pic.chinaz.com/ai/2024/06/24061803085387658927.jpg)
Target Audience:
EVE is aimed primarily at researchers and developers in artificial intelligence, especially those working on vision-language tasks and natural language processing. Its data-handling and training efficiency make it well suited to scenarios that involve large-scale visual data and language models, and it also serves as a useful reference for advancing the field.
Usage Scenario Examples:

  • Researchers use the EVE model for image caption generation tasks.
  • Developers build visual question answering systems on top of EVE.
  • Educational institutions use EVE to teach the construction and application of vision-language models.

Tool Features:

  • Supports arbitrary image aspect ratios by design.
  • Efficient pre-training on a modest amount of publicly available data.
  • Further optimization with large-scale SFT data.
  • Training-efficient: training completes in about 9 days on two 8×A100 (40 GB) nodes.
  • An encoder-free architecture that reduces model complexity and improves transparency.
  • Strong performance across multiple vision-language tasks.
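The training-efficiency figure above can be turned into a quick back-of-the-envelope GPU budget. This is a rough sketch; the 9-day duration is approximate, as stated on the project page.

```python
# Two nodes with eight A100 (40 GB) GPUs each, running for roughly 9 days.
nodes = 2
gpus_per_node = 8
days = 9

total_gpus = nodes * gpus_per_node   # 2 * 8 = 16 GPUs
gpu_hours = total_gpus * days * 24   # 16 * 9 * 24 = 3456 GPU-hours
print(total_gpus, gpu_hours)
```

About 3,500 GPU-hours is small by the standards of large vision-language models, which is the point of the efficiency claim.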

Steps for Use:

  • Visit EVE’s GitHub page for project information and code.
  • Read the README file for the installation and configuration requirements of the model.
  • Follow the instructions to download and install the necessary dependencies.
  • Clone or download the EVE model codebase to your local environment.
  • Follow the steps in the documentation to train or test the model.
  • Adjust the model parameters as needed to accommodate different vision-language tasks.
  • Participate in community discussions, get help, or contribute code.

Tool Tags: vision-language model, encoder-free
