Research Topic
Multimodal Machine Learning Applications
This cluster of papers focuses on the development and improvement of visual question answering systems, image captioning techniques, and neural networks for understanding and generating descriptions of images and videos. The research involves semantic reasoning, multimodal fusion, scene graph generation, attention mechanisms, and deep learning approaches to bridge the gap between vision and language.
Works
58,512
Citations
677,364
Domain
Physical Sciences
Field
Computer Science
Subfield
Computer Vision and Pattern Recognition
OpenAlex ID
T11714
Taxonomy Context
Physical Sciences / Computer Science / Computer Vision and Pattern Recognition
Related Topics
Advanced Neural Network Applications, Medical Image Segmentation Techniques, Face and Expression Recognition, Video Surveillance and Tracking Methods, Advanced Steganography and Watermarking Techniques, Context-Aware Activity Recognition Systems, Advanced Vision and Imaging, Robotic Path Planning Algorithms, Handwritten Text Recognition Techniques, Advanced Image and Video Retrieval Techniques, Optical measurement and interference techniques, Image and Signal Denoising Methods