Scientific data and expert-level tasks differ fundamentally from general-domain natural language data and tasks. Expert tasks, such as those performed by physicians, engineers, or scientists, demand not only a deep understanding of domain-specific knowledge but also intuitive reasoning, multi-step planning, and execution honed through years of training. The distinct properties of scientific data and the complexity of expert tasks necessitate foundational innovations in architecture, optimization, and task composition to advance the next generation of language models. In this dissertation, I develop machine learning (ML) paradigms inspired by scientific data and expert tasks, equipping language models with the intuition and knowledge of domain experts. My research introduces ML innovations and insights that enable a comprehensive spectrum of expertise acquisition, from explicit knowledge representation to implicit intuition modeling, and from individual decision-making processes to the automation of complex expert workflows. Specifically, my work begins with extracting explicit knowledge instances from unstructured data in low-resource scenarios and capturing implicit expert intuition. It then expands to compositional, project-level reasoning and automation. Finally, I address critical issues of fairness and safety in generative large language models (LLMs).