Small. With bf16/fp16 (supported natively by PyTorch), our baseline can be trained with only 2 GB of GPU memory. Friendly. You may use the off-the-shelf options to apply many state-of-the-art tricks in ...