Software vulnerabilities are now reported at an unprecedented rate thanks to
recent advances in automated vulnerability-hunting tools. However, fixing
vulnerabilities still depends mainly on programmers' manual effort.
Developers must understand each vulnerability in depth and repair it while
affecting the system's functionality as little as possible.
In this paper, building on recent advances in Neural Machine Translation (NMT),
we propose SeqTrans, a novel approach that exploits historical vulnerability
fixes to suggest fixes and automatically repair vulnerable source code. To
capture the contextual information around the vulnerable code, we leverage data
flow dependencies to construct code sequences and feed them into a
state-of-the-art Transformer model. We further adopt a fine-tuning strategy to
overcome the small-sample-size problem. We evaluate SeqTrans on a dataset
containing 1,282 commits that fix 624 vulnerabilities in 205 Java projects.
Results show that SeqTrans outperforms the latest techniques, achieving an
accuracy of 23.3% for statement-level fixes and 25.3% for CVE-level fixes. An
in-depth analysis of the results further shows that the NMT model performs
particularly well on certain kinds of vulnerabilities, such as CWE-287
(Improper Authentication) and CWE-863 (Incorrect Authorization).
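
As a rough illustration of how data flow dependencies can supply context for a
vulnerable statement, the Python sketch below collects the statements that the
vulnerable line transitively depends on through def-use chains and concatenates
them into a single source sequence. It is a simplified sketch under stated
assumptions, not SeqTrans's implementation: the `Stmt` model, the hand-annotated
def/use sets, the straight-line (branch-free) method body, and the `<sep>`
separator token are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Stmt:
    """One statement of a method body with hand-annotated variable defs/uses.

    Hypothetical model: a real tool would derive defs/uses from a parsed AST.
    """
    text: str
    defs: set[str] = field(default_factory=set)
    uses: set[str] = field(default_factory=set)


def data_flow_context(stmts: list[Stmt], target_idx: int) -> list[int]:
    """Indices of statements the target transitively depends on via def-use chains.

    Assumes straight-line code (no branches or loops), so the nearest preceding
    definition of a variable is the one that reaches its use.
    """
    needed = set(stmts[target_idx].uses)
    context: set[int] = set()
    for i in range(target_idx - 1, -1, -1):
        satisfied = stmts[i].defs & needed
        if satisfied:
            context.add(i)
            # These uses are now resolved; the defining statement's own uses
            # must in turn be traced further back.
            needed = (needed - satisfied) | stmts[i].uses
    return sorted(context)


def build_sequence(stmts: list[Stmt], target_idx: int, signature: str) -> str:
    """Join the method signature, data-flow context, and vulnerable statement.

    The "<sep>" token is a hypothetical separator, not SeqTrans's actual format.
    """
    parts = [signature]
    parts += [stmts[i].text for i in data_flow_context(stmts, target_idx)]
    parts.append(stmts[target_idx].text)
    return " <sep> ".join(parts)


# Usage: the statement at index 4 (the query execution) is the vulnerable line.
method = [
    Stmt('String user = req.getParameter("user");', {"user"}, {"req"}),
    Stmt("int n = 0;", {"n"}),
    Stmt('String q = "SELECT name FROM users WHERE id=" + user;', {"q"}, {"user"}),
    Stmt("n = n + 1;", {"n"}, {"n"}),
    Stmt("ResultSet rs = st.executeQuery(q);", {"rs"}, {"st", "q"}),
]

print(data_flow_context(method, 4))  # [0, 2]
print(build_sequence(method, 4, "void lookup(HttpServletRequest req)"))
```

In this example the statements `int n = 0;` and `n = n + 1;` are dropped from
the context because no value read by the vulnerable `executeQuery` call flows
from them, which keeps the input sequence short and focused.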