Muhammad Umar, a Pakistani student at Mohamed bin Zayed University of Artificial Intelligence (MBZUAI), has made a significant contribution to detecting propaganda on social media platforms, especially in cases where low- and high-resource languages are mixed.
Umar, who speaks Urdu as his first language, is one of many people devoting research and time to languages other than English for preservation, education, and language modelling.
Umar, who holds a Master of Science in natural language processing (NLP), is aware of the influence that language has on public conversation and the way that opinions are formed.
“Propaganda is a pervasive tool used to manipulate public opinion, and it is a growing concern in the digital age, especially in bilingual communities where little to no work has been done to detect it. Most propaganda detection work has been done on high-resource languages, such as English, leaving low-resource languages largely unexplored,” said Umar, who is part of the university’s first cohort of NLP graduates.
Umar noted that code-switching, which involves mixing multiple languages in the same text, is common in low-resource language communities and can make propaganda detection more challenging.
“In linguistics, code-switching refers to the practice of alternating between two or more languages or language varieties in a single conversation or text. In the context of my thesis, code-switched social media text specifically refers to social media text that uses a mixture of different languages, including English and Roman Urdu.”
Although he has graduated, Umar is continuing his research and hopes to submit a paper on detecting propaganda techniques in code-switched text to the 2023 Empirical Methods in Natural Language Processing (EMNLP) conference, one of the leading high-impact conferences for NLP and artificial intelligence research.
His model can be extended to other underrepresented or low-resource languages.