Just as human eyes tend to focus on pictures before reading accompanying text, multimodal artificial intelligence (AI)—which processes multiple types ...