Train vision-language models (VLMs) with reinforcement learning using Group Relative Policy Optimization (GRPO) on multimodal image+text tasks via the verl framework. This workflow trains ...
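As a rough illustration of the "group relative" part of GRPO: several responses are sampled for the same prompt, each is scored, and each reward is normalized against the mean and spread of its own group. The sketch below is a minimal stand-alone version, not verl's implementation; the function name `group_relative_advantages`, the `eps` value, and the use of the population standard deviation are assumptions made for the example.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards of responses sampled for one prompt.

    Hypothetical helper for illustration only; eps and the population
    standard deviation are assumptions, not verl defaults.
    """
    mean = fmean(rewards)   # group baseline
    std = pstdev(rewards)   # group spread (population std)
    # eps keeps the division finite when every reward in the group is equal
    return [(r - mean) / (std + eps) for r in rewards]


# Four sampled responses to the same image+text prompt, scored 0 or 1.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Responses that score above their group's mean get a positive advantage and those below get a negative one; a group where all rewards are equal yields all-zero advantages and so contributes no learning signal.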
Tensor preprocessing on low-memory hardware(?) fails because the Qwen, VAE, and audio components are loaded separately onto CPU/GPU, causing a dtype mismatch in preprocess_to_tensors(). Occurs on a consumer RTX 3070 Ti. Temporary ...