TIGER-Lab

TIGER-Lab (Text and Image GEnerative Research Lab) is a research laboratory at the University of Waterloo. It is led by Professor Wenhu Chen and is affiliated with the Vector Institute for Artificial Intelligence and the Waterloo NLP Group. Established around 2023, the laboratory focuses on generative artificial intelligence, including text generation, image and video generation, multimodal retrieval and grounding, improving the reasoning and planning capabilities of generative AI, controllability research, and the development of evaluation methods.

The lab's mission is to advance generative AI through innovative solutions that make it better suited to transforming digital content creation. The laboratory emphasizes improving the capabilities of foundation models in the post-training phase (for example through instruction tuning or preference optimization), building new benchmarks to measure model progress, and enhancing the fidelity and controllability of generative models to support a range of generative AI applications.

The main research directions of the laboratory include:

Foundation model improvement: Focusing on instruction tuning, preference optimization, and retrieval-augmented generation, with the aim of improving model capabilities in reasoning, planning, and structured knowledge grounding.

Benchmark development: Creating robust evaluation frameworks for testing multimodal understanding, long-context learning, and structured output generation.

Generative model enhancement: Improving the visual consistency, generation efficiency, and editing controllability of image and video diffusion models, and addressing hallucination and unfaithful generation.

Multimodal applications: Exploring text-image interaction, video editing, and multi-image instruction tuning to advance cross-modal AI.

TIGER-Lab's members include doctoral students, master's students, and interns. There are currently about 8 doctoral and master's students, along with a number of current and former interns. Members come from top universities around the world, such as Tsinghua University, Zhejiang University, Hong Kong University of Science and Technology, and the University of Toronto. Former members have entered doctoral programs at New York University, the University of California, Santa Barbara, and other institutions, or joined companies such as xAI and Modelbest.

The laboratory's key projects and publications cover several areas:

Instruction tuning projects: MAmmoTH2 scales instruction data to 10 million examples by mining educational web documents, improving the reasoning capabilities of models such as Mistral and Llama-3 and reaching leading results on mathematics and science benchmarks. MANTIS improves performance on multimodal tasks through instruction tuning on multi-image data, approaching the level of GPT-4V. StructLM builds datasets for structured knowledge grounding and achieves leading results on eight relevant datasets.

Benchmark projects: MMMU is a multi-disciplinary multimodal benchmark that contains diverse visual inputs and is used to test the perception and reasoning capabilities expected of expert-level AGI. MMLU-Pro strengthens the MMLU dataset by increasing the number of answer options to 10 and introducing university-level problems, and it serves as an official benchmark for the Hugging Face Open LLM Leaderboard. Other projects include VideoScore for evaluating video generation and a benchmark for evaluating long-context LLMs.
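One practical effect of moving to 10 answer options is that uniform random guessing yields a much lower score than it does with four options. The following is a minimal illustrative sketch, not code from MMLU-Pro: the function names, the prompt layout, and the toy question are all assumptions made for this example.

```python
import string


def format_question(question: str, options: list[str]) -> str:
    """Render a multiple-choice question with lettered options (A, B, C, ...)."""
    if not 2 <= len(options) <= len(string.ascii_uppercase):
        raise ValueError("expected between 2 and 26 options")
    lines = [question]
    for letter, option in zip(string.ascii_uppercase, options):
        lines.append(f"{letter}. {option}")
    lines.append("Answer:")
    return "\n".join(lines)


def chance_accuracy(num_options: int) -> float:
    """Expected accuracy of uniform random guessing over num_options choices."""
    return 1.0 / num_options


# Hypothetical toy item with 10 options; not taken from any real benchmark.
toy_options = [str(n) for n in range(10, 20)]
prompt = format_question("What is 7 + 6?", toy_options)
print(prompt)

# Guessing baseline with 4 options versus 10 options.
print(chance_accuracy(4), chance_accuracy(10))
```

Under these assumptions, the guessing baseline falls from 25% with four options to 10% with ten, which leaves more room to separate models that reason from models that guess.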

Generative model projects: ConsistI2V improves visual consistency in image-to-video generation by extending temporal attention layers. T2V-Turbo combines hybrid consistency training with reinforcement learning to balance the efficiency and quality of video generation. AnyV2V provides a training-free video-to-video editing framework that is compatible with existing image editing methods. Other projects include VLM2Vec, which converts vision-language models into multimodal embedding models, and General-Reasoner, which advances LLM reasoning across domains.

TIGER-Lab releases its work as open source on GitHub and Hugging Face, with 53 repositories including MMLU-Pro, VLM2Vec, ImagenHub, and verl-tool. These resources are used to build trustworthy AI models, standardize the evaluation of image generation, and support the use of diverse tools. The lab collaborates with industry, participates in the Waterloo.AI initiative, and regularly publishes evaluation results and gives talks, such as a lecture on diffusion-based video editing, to promote the reliability and applicability of AI models. Future plans continue to focus on instruction tuning, evaluation, retrieval augmentation, and visual content generation in order to extend the capabilities of foundation models.
