Encoding Abstract Syntax Trees (AST) via distance based self-attention mechanism
Open Access Publications from the University of California


UCLA Electronic Theses and Dissertations



Code summarization and generation are valuable tasks to master, given their wide range of applications in code readability and code translation, to name a few. This research work extends previously conducted research on PLBART, a sequence-to-sequence transformer model used for a variety of program and language understanding and generation (PLUG) tasks. The ultimate goal is to improve the performance of PLBART by modifying the noise function of its denoising autoencoder. The current noise function corrupts code tokens at random; we aim to improve performance by instead masking nodes on the corresponding Abstract Syntax Tree (AST). To integrate the AST structure into the self-attention mechanism, we adopt the dependency-guided self-attention mechanism explored in the NLP literature, in particular [ZKC21]. However, we cannot compute distances between all tokens that appear in the code from the AST alone, since not every token necessarily appears in the parse tree. We therefore investigate how distances between tokens can be derived from the AST structure.
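As an illustrative sketch (not the thesis's actual implementation), one natural way to derive a distance between two nodes of a parse tree is the number of edges on the path connecting them through their lowest common ancestor; such distances can then be turned into an additive bias on the self-attention logits. The example below uses Python's built-in `ast` module as a stand-in parser and treats `Name` and `Constant` nodes as the tokens of interest; these choices are assumptions for the sketch only.

```python
import ast

def node_paths(tree):
    """Map each AST node's id to its root-to-node path of node ids."""
    paths = {}
    def walk(node, path):
        paths[id(node)] = path + [id(node)]
        for child in ast.iter_child_nodes(node):
            walk(child, paths[id(node)])
    walk(tree, [])
    return paths

def tree_distance(path_a, path_b):
    """Edges between two nodes via their lowest common ancestor."""
    common = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        common += 1
    return (len(path_a) - common) + (len(path_b) - common)

# Tiny example program: distances between its Name/Constant nodes.
tree = ast.parse("x = y + 1")
paths = node_paths(tree)
tokens = [n for n in ast.walk(tree)
          if isinstance(n, (ast.Name, ast.Constant))]  # x, y, 1
dist = [[tree_distance(paths[id(a)], paths[id(b)]) for b in tokens]
        for a in tokens]
```

A dependency-guided attention layer in the style of [ZKC21] could then add a term such as `-lambda * dist[i][j]` to the attention score between tokens `i` and `j` before the softmax, so that tokens close in the AST attend to each other more strongly.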
