Latest from Google AI – Sparse video tubes for joint video and image vision transformers
Posted by AJ Piergiovanni and Anelia Angelova, Research Scientists, Google

Video understanding is a challenging problem that requires reasoning about both spatial information (e.g., for objects in a scene, including their locations and relations) and temporal information for activities or events shown in a video. There are many video understanding applications and tasks, such as…