My work focuses on understanding language and its interaction with the physical and social world. I take an interdisciplinary approach, combining computational methods (such as machine learning and Bayesian modeling) with insights from neuroscience, linguistics, and psychology to better understand the human mind and advance artificial intelligence.
Fun Fact: Dota is actually my real, preferred name. It comes from my Mandarin initials and has no connection to the game ;). My family started calling me that, my friends picked it up, and it’s stuck ever since (in the best way).
If you’d like to discuss academic topics, feel free to get in touch :).
Multimodal language models (MLMs) increasingly demonstrate human-like communication, yet their use of everyday perspectival words remains poorly understood. To address this gap, we compare humans and MLMs in their use of three word types, which we predict impose increasing cognitive demands: vocabulary (e.g., 'boat' or 'cup'), possessives (e.g., 'mine' vs. 'yours'), and demonstratives (e.g., 'this one' vs. 'that one'). Testing seven MLMs against human participants, we find that perspectival words are harder than vocabulary words for both groups. The gap is even larger for MLMs: while models approach human-level performance on vocabulary, they exhibit clear deficits with possessives and even greater difficulties with demonstratives. Ablation analyses point to limitations in perspective-taking and spatial reasoning as key sources of these gaps in MLMs. Instruction-based prompting helps close the gap for possessives but still leaves demonstratives far below human performance. These results show that perspectival words pose a greater challenge than vocabulary in human communication, and that this difficulty is further amplified in MLMs, revealing a crucial shortfall in their pragmatic and social-cognitive abilities.
The Statistics of Natural Experience
Dota Tianai Dong*, Jing Li*, Tobias Thomas*, Linda B. Smith
*equal contribution
Proceedings of the Analytical Connectionism Schools 2023–2024, PMLR 320:126–150, 2026
These lecture notes present Linda Smith's comprehensive analysis of the statistics of natural experience and its consequences for how we think about learning and intelligence. The material explores how statistics shape behavior and learning, details a developmental curriculum, examines properties of natural statistics, and investigates the dynamic coupling of parents and toddlers. Through multiple perspectives and examples, Smith offers insights into how statistical patterns in our environment influence cognitive development and learning processes. This collection is particularly valuable for machine learning readers seeking to understand the statistical foundations of human cognition and their applications to artificial intelligence systems.