Salesforce AI Research has introduced BLIP-2, a generic and efficient vision-language pre-training strategy designed to prepare AI models for tasks that combine computer vision and natural language processing.
BLIP-2 bootstraps from frozen image encoders and frozen large language models (LLMs): both pre-trained models are reused as they are rather than retrained, which keeps pre-training efficient. The method has been reported to improve performance on a variety of tasks, including image captioning, question answering, image classification, and more.
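To make the idea of bootstrapping from frozen models concrete, here is a minimal PyTorch sketch. It is not the BLIP-2 implementation: `image_encoder`, `language_model`, and `bridge` are hypothetical toy stand-ins (single linear layers), and the data is random. It only illustrates the general pattern of keeping two pre-trained models frozen while training a small module between them.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Toy stand-ins (hypothetical, NOT the real BLIP-2 components).
image_encoder = nn.Linear(32, 16)   # pretend pre-trained vision model
language_model = nn.Linear(8, 10)   # pretend pre-trained language model
bridge = nn.Linear(16, 8)           # small trainable module between them

# Freeze both pre-trained models so training never updates them.
for frozen in (image_encoder, language_model):
    frozen.requires_grad_(False)
    frozen.eval()

# Only the bridge's parameters are handed to the optimizer.
optimizer = torch.optim.SGD(bridge.parameters(), lr=0.1)

images = torch.randn(4, 32)            # fake batch of image inputs
targets = torch.randint(0, 10, (4,))   # fake target token ids

encoder_before = image_encoder.weight.detach().clone()
bridge_before = bridge.weight.detach().clone()

# One training step: gradients flow through the frozen language model
# back to the bridge, but the frozen weights themselves get no gradient.
logits = language_model(bridge(image_encoder(images)))
loss = nn.functional.cross_entropy(logits, targets)
loss.backward()
optimizer.step()

trainable_params = sum(p.numel() for p in bridge.parameters())
frozen_params = sum(
    p.numel()
    for module in (image_encoder, language_model)
    for p in module.parameters()
)
print(f"trainable: {trainable_params}, frozen: {frozen_params}")
```

After the step, only the bridge's weights have changed. In this sketch, the practical appeal of the pattern is that the large pre-trained models contribute their capabilities while only the small module in between needs to be trained.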
https://www.marktechpost.com/2023/02/05/salesforce-ai-research-introduces-blip-2-a-generic-and-efficient-vision-language-pre-training-strategy-that-bootstraps-from-frozen-image-encoders-and-frozen-large-language-models-llms/
https://arxiv.org/pdf/2301.12597.pdf