@inproceedings{KelTyaImrLicGomEMNLP2025,
    title = "Parsing the Switch: {LLM}-Based {UD} Annotation for Complex Code-Switched and Low-Resource Languages",
    author = "Kellert, Olga  and
      Tyagi, Nemika  and
      Imran, Muhammad  and
      Licona-Guevara, Nelvin  and
      G{\'o}mez-Rodr{\'i}guez, Carlos",
    editor = "Christodoulopoulos, Christos  and
      Chakraborty, Tanmoy  and
      Rose, Carolyn  and
      Peng, Violet",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2025",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-emnlp.863/",
    doi = "10.18653/v1/2025.findings-emnlp.863",
    pages = "15934--15949",
    ISBN = "979-8-89176-335-7",
    abstract = "Code-switching presents a complex challenge for syntactic analysis, especially in low-resource language settings where annotated data is scarce. While recent work has explored the use of large language models (LLMs) for sequence-level tagging, few approaches systematically investigate how well these models capture syntactic structure in code-switched contexts. Moreover, existing parsers trained on monolingual treebanks often fail to generalize to multilingual and mixed-language input. To address this gap, we introduce the BiLingua Pipeline, an LLM-based annotation pipeline designed to produce Universal Dependencies (UD) annotations for code-switched text. First, we develop a prompt-based framework for Spanish-English and Spanish-Guaran{\'i} data, combining few-shot LLM prompting with expert review. Second, we release two annotated datasets, including the first Spanish-Guaran{\'i} UD-parsed corpus. Third, we conduct a detailed syntactic analysis of switch points across language pairs and communicative contexts. Experimental results show that BiLingua Pipeline achieves up to 95.29{\%} LAS after expert revision, significantly outperforming prior baselines and multilingual parsers. These results show that LLMs, when carefully guided, can serve as practical tools for bootstrapping syntactic resources in under-resourced, code-switched environments."
}