Language is an essential part of human experience. We use computational, experimental, and corpus linguistic methods to understand the unique role of language in our lives and in our society. The Experimental and Computational Linguistics Ensemble Lab (ECOLE) at San Francisco State University conducts research on both cognitive and social aspects of language.
Research
Our research projects focus on both cognitive and social aspects of language. On the one hand, we study linguistic phenomena that shed light on grammar architecture and on the relation between language and cognition. On the other hand, we are interested in what language reveals about its users and about society in general.
Generative AI and Large Language Models
Our most recent line of research focuses on Generative AI and Large Language Models. We study how Large Language Models perform compared to humans on a range of tasks, from world knowledge and common sense reasoning to text simplification.
The Syntax and Semantics of Queries
The internet gave rise to a new form of language use – queries. Yet little is known about how we formulate queries. Queries are processed by search engines as ‘word salad’ with little attention to word order. In this project we investigate the hypothesis that queries are more than ‘word salad’ and that they have systematic structural properties.
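As a minimal illustration of what treating a query as ‘word salad’ means, the sketch below compares two queries under a bag-of-words representation, which keeps word counts and discards word order. The example queries and the helper name `bag_of_words` are assumptions made for illustration; they are not taken from the project, and the sketch does not describe how any particular search engine works.

```python
from collections import Counter


def bag_of_words(query: str) -> Counter:
    """Lowercase the query, split on whitespace, and count tokens.

    The resulting Counter records which words occur and how often,
    but not the order in which they occur.
    """
    return Counter(query.lower().split())


# Two hypothetical queries with the same words in a different order.
q1 = "flights boston to denver"
q2 = "flights denver to boston"

# Under a bag-of-words view the two queries are indistinguishable.
same_bag = bag_of_words(q1) == bag_of_words(q2)

# As ordered token sequences they differ.
same_sequence = q1.lower().split() == q2.lower().split()

print(same_bag, same_sequence)
```

The two queries plainly ask for different things, yet an order-insensitive representation cannot tell them apart. This is the kind of structural information the hypothesis above is concerned with.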
Subjectivity in Language
Do you say Democrats and Republicans or Republicans and Democrats? In a series of papers we show that word order in binomials (above) and in prenominal adjectives depends on subjective preferences of the speaker: the attribute that is psychologically closer to the speaker is mentioned first.
Recent Publications
- Smirnova, A. (2024). Productivity and Creative Use of Compounds in Reduced Registers: Implications for Grammar Architecture. In Proceedings of the 46th Annual Conference of the Cognitive Science Society. https://escholarship.org/content/qt1c39n81t/qt1c39n81t.pdf
- Reese, M. and Smirnova, A. (2024). Comparing ChatGPT and Humans on World Knowledge and Commonsense Reasoning Tasks. In Proceedings of Computer-Human Interaction (CHI).
- Smirnova, A. (2021). Variation in linguistic complexity and its cognitive underpinning. Proceedings of the 43rd Annual Conference of the Cognitive Science Society, 2162-2168. https://escholarship.org/uc/item/8sf5t58c
- Smirnova, A. (2021). Evidentiality in abductive reasoning: Experimental support for a modal analysis of evidentials. Journal of Semantics, 38(4), 531-570. https://doi.org/10.1093/jos/ffab013
Events and Announcements
- Anastasia Smirnova (PI) and Whitney Taylor (Political Science, SFSU) presented Students’ Perception on Research and Career-readiness Competencies. ConnectUR 2024 Annual Conference. June 20-21, 2024.
- The ECOLE lab received the Best Abstract award at the 10th Annual UC Davis Symposium on Language Research for the project on Advantages and Challenges in using ChatGPT in Text Simplification for Children. May 24, 2024.
- Ava Austin presented Faculty-student collaboration in the ECOLE. National Conference on Undergraduate Research (NCUR). April 8-10, 2024. Long Beach.
ECOLE Lab Members
Lab members are SF State students who come from diverse backgrounds and bring to the table their expertise in computer science, data analysis, linguistics and psychology.
Lab Director
Anastasia Smirnova, Associate Professor
Professor Smirnova’s primary research focus is on grammar architecture and the structure of the lexicon (e.g. the relation between verbs and nouns in language), as well as on how temporal and modal information is expressed in language. Her research employs a variety of methods, from fieldwork to experimental and corpus studies.
Lab Members
Kyu Beom “Kyle” Chun (UC Berkeley Data Discovery Project)
Kyle is an undergraduate student at the University of California, Berkeley, majoring in Data Science. Along with a strong passion for exploring various data science topics, he is interested in NLP, AI/ML models, deep learning, and the real-world applications of these techniques. Building on a solid foundation from coursework and external experiences, Kyle is eager to contribute to research and industry projects. He seeks opportunities to apply his knowledge to develop innovative approaches that enhance decision-making and push the boundaries of what AI/ML can achieve in various settings. Outside of the classroom, Kyle loves to spend time with his dog, go to the gym, and cook with his family and friends.
Jake Owen Connorton
Jake is an undergraduate student of linguistics at San Francisco State University. He is interested in computational linguistics, psycholinguistics, and animal communication systems. As a part of the ECOLE lab at SFSU, he is excited to explore the capabilities of LLMs with a focus on the ethical use of AI. Outside the classroom, he is an avid rock climber and ocean swimmer and enjoys hosting dinner parties.
Diya Garg (UC Berkeley Data Discovery Project)
Diya is an undergraduate at the University of California, Berkeley, majoring in Applied Mathematics and Data Science. Passionate about leveraging data analytics for social impact, particularly in the field of education, she seeks to apply her technical skills to drive meaningful change. Diya is also interested in computational linguistics, exploring how AI/ML models and Natural Language Processing can enhance language understanding and communication technologies. She is focused on expanding her expertise in these areas, with a special interest in Large Language Models (LLMs). Beyond academics, she is an avid digital artist and enjoys playing the guitar, swimming, and exploring new places through travel.
Malleeswari Jagabattuni (MJ)
MJ is a graduate student in the Linguistics Department at San Francisco State University, specializing in Computational Linguistics. She also has an MA in Teaching English to Speakers of Other Languages and a BA in Anthropology from San Francisco State University. For several years she worked as a tutor and teacher of English, serving immigrant and refugee populations. She is looking to apply her experience as an educator to the field of computational linguistics and natural language processing. Her research interests include evaluating large language models on natural language processing tasks as well as the syntax and semantics of under-resourced languages.
Mikey Pagán
Mikey is an M.A. student in Comparative & World Literature at SFSU where he also completed B.A.s in Linguistics and Comparative & World Literature. His research interests include semantics and sociolinguistics, especially critical discourse analyses and the ways we make and maintain communicative meaning across disciplines and registers.
Cassia Reddig
Cassia is currently pursuing a Master of Science in Interdisciplinary Studies (emphasis in Language, Cognition, & Computation) with a Graduate Certificate in Ethical AI at San Francisco State University, where she also obtained a BA in Psychology and Computational Linguistics. She has contributed to research in the domains of cognitive psychology and media psychology through the SFSU LACE Lab and the Stanford Social Media Lab. She is interested in applying multidisciplinary approaches to research on natural language processing and human-centered artificial intelligence.
May Reese
May is a graduate student in the linguistics department at San Francisco State University. Her current research is on Large Language Models and Natural Language Understanding benchmark tests with a focus on Japanese. Her academic interests include language typology, computational linguistics and psycholinguistics. In her free time, you’ll find her reading, playing board games or out on a cycling trip.
William "Wil" Louis Rothman (UC Berkeley Data Discovery Project)
Wil is an undergraduate student at the University of California, Berkeley, majoring in Computer Science and Data Science with a domain emphasis in Robotics. He has research experience at the Computer Vision Laboratory at Osaka University's Graduate School of Information Science and Technology, where he worked under the supervision of Dr. Yasuyuki Matsushita. His research focused on developing algorithms for deterministically generated Kanji shadow art, culminating in a research paper. Wil's academic interests lie in efficient algorithms and big data applications. Originally from Los Angeles, California, Wil enjoys traveling, trying new things, and discovering new restaurants in his free time.
Siyona Sarma (UC Berkeley Data Discovery Project)
Siyona Sarma is an undergraduate student at the University of California, Berkeley, majoring in Computer Science. Siyona is interested in exploring different applications of computer science and expanding her knowledge of AI/ML models, including LLMs, NLP libraries, and neural networks. Siyona aims to focus on building computer science tools for social good and believes that the ECOLE lab is using computational linguistics to do just that. Outside of university, Siyona loves to discover new bookstores, volunteer with dogs, cook for family and friends, and go to the beach.
Shruti Vora (UC Berkeley Data Discovery Project)
Shruti Vora is an undergraduate student at the University of California, Berkeley, majoring in Data Science and Statistics. With a passion for the intersection of data science and finance, Shruti is dedicated to broadening her expertise across various domains of data science to deepen her understanding of machine learning models, Natural Language Processing (NLP), and Large Language Models (LLMs). Experienced in multiple programming languages, she is eager to apply her skills to solve real-world challenges and contribute to industry projects, utilizing advanced technology to make meaningful impacts. Outside of academics, Shruti enjoys baking, cooking, traveling, playing golf, painting, and spending time with family and friends.
Peixi “Max” Xie
Max Xie is a recent graduate of the University of California, Santa Cruz, with a B.A. in Linguistics, and is currently pursuing an M.A. in Linguistics at San Francisco State University. He is experienced in linguistics research, computer programming, and startup management, and is adept in Natural Language Processing (NLP), Computational Linguistics, and AI applications. He is known for strong leadership, communication, and analytical skills, and is actively seeking opportunities in the fields of AI and NLP that leverage linguistic expertise. Outside of school, Max has many different hobbies including swimming, hiking, biking, ping pong, pool, computer robotics, computer programming, reading, and gaming.
Lab Alumni
Helena Almassy. Professor of Mathematics, Cañada College
Lauren Baker. Finance manager at DTR Consulting Services
Angie Garcia. Manager at SoundHound
Skyler Ilenstine. Computational Linguist at Microsoft via DISYS
Jonathan Kakama. Data Analyst at Vaco
Chohee Kim. Senior Software Engineer at LinkedIn
Rose Kitchel. Executive Assistant at the Reeds Center
Helena Laranetto. Machine Learning Data Linguist II, Alexa Devices at Amazon
Sujung Nam. PhD student at the University of Hawaii, Honolulu
Jasmine Rivero. Chatbot Operations Manager at Sense
Amanda Robinson. Computational Linguist at Samsung
Ricardo Romero Sanchez. Linguistic Project Manager at Google
Laurel Selvig. Data Analyst at Axos Bank
Erly Tang. PhD student in Linguistics and Anthropology at the University of Arizona
Olivia Vallejo. LV Quality Specialist / Linguist