Dota - Tianai Dong

Hi! I am a third-year PhD student, affiliated with the Multimodal Language Department at the Max Planck Institute for Psycholinguistics and the Predictive Brain Lab at the Donders Institute (Centre for Cognitive Neuroimaging). I am co-advised by Floris de Lange, Lea-Maria Schmitt, Stefan Frank, and Paula Rubio-Fernández. I also work closely with Mariya Toneva at the Max Planck Institute for Software Systems. I am funded by an IMPRS fellowship.

I study how humans acquire, mentally represent, and generate predictions about language through their rich multimodal experiences. My approach combines computational methods with insights from neuroscience, linguistics, and psychology, with the dual goals of understanding the human mind and advancing artificial intelligence.

If you want to discuss any academia-related topics, please feel free to reach out to me :)

Email // Google Scholar // Bluesky

News
May 2025

I gave a talk at CIMEC Computational Linguistics on "Grounding Language (and Language Models) by Seeing, Hearing, and Interacting".

April 2025

Our workshop on Representational Alignment is back at ICLR 2025!

2025

I'm helping to organize this year's CCN in Amsterdam as part of the DEI Committee.

Papers
You Prefer This One, I Prefer Yours: Using Reference Words is Harder Than Vocabulary Words for Humans and Multimodal Language Models
Dota Tianai Dong (co-first), Yifan Luo (co-first), Po-Ya Angela Wang, Aslı Özyürek, Paula Rubio-Fernández
Preprint, 2025

We evaluate seven multimodal language models (MLMs) against humans on three word classes: vocabulary words, possessive pronouns, and demonstrative pronouns. Humans and models share a consistent difficulty hierarchy, but a clear performance gap remains: while MLMs approach human-level performance on vocabulary words, they show substantial deficits on possessive and demonstrative pronouns.

Multimodal Video Transformers Partially Align with Multimodal Grounding and Compositionality in the Brain
Dota Tianai Dong, Mariya Toneva
ICLR-MRL, 2023; CCN, 2023; Preprint, 2024

We propose to probe a pre-trained multimodal video transformer, guided by neuroscientific evidence on multimodal information processing in the human brain.

DiscoGeM: A crowdsourced corpus of genre-mixed implicit discourse relations
Merel Scholman, Dota Tianai Dong, Frances Yung, Vera Demberg
LREC, 2022

We present DiscoGeM, a crowdsourced corpus of 6,505 implicit discourse relations from three genres: political speech, literature, and encyclopedic text.

Comparison of methods for explicit discourse connective identification across various domains
Merel Scholman, Dota Tianai Dong, Frances Yung, Vera Demberg
CODI, 2021

We assess the performance of four parsing methods on explicit connective identification (PDTB e2e, Lin et al., 2014; the winner of CoNLL 2015, Wang and Lan, 2015; DisSent, Nie et al., 2019; and Discopy, Knaebel and Stede, 2020), along with a simple heuristic.

Visually grounded follow-up questions: A dataset of spatial questions which require dialogue history
Dota Tianai Dong, Alberto Testoni, Luciana Benotti, Raffaella Bernardi
SpLU-RoboNLP, 2021

We define and evaluate a methodology for extracting history-dependent spatial questions from visual dialogues.

Others
Analyses of Multiple Discourse Relations within a Chinese Sentence
Dota Tianai Dong, Bonnie Webber, Jennifer Spenader
Bachelor Thesis


Template from here