Research Topic

Multimodal Machine Learning Applications

This cluster of papers focuses on visual question answering, image captioning, and neural networks that understand and generate natural-language descriptions of images and videos. The research spans semantic reasoning, multimodal fusion, scene graph generation, attention mechanisms, and other deep learning approaches that bridge vision and language.

Works

58,512

Citations

677,364

Domain

Physical Sciences

Field

Computer Science

Subfield

Computer Vision and Pattern Recognition

OpenAlex ID

T11714

Taxonomy Context

Physical Sciences / Computer Science / Computer Vision and Pattern Recognition / Multimodal Machine Learning Applications
