R is a dynamic, interpreted programming language designed for statistical computing. In contrast to languages more traditionally used in software development and engineering, code analysis and the tools that support it are uncommon in the R community, with some notable exceptions. A general framework that facilitates the development of novel code analyses for R is nevertheless valuable. This dissertation presents a collection of strategies and software for static analysis of R code. Two of the three parts focus on type inference, a specific kind of static analysis that attempts to determine the type of data produced by each expression in the code.
The first part describes a framework for creating static analyses and transformations of R code based on contemporary techniques and research. The framework provides tools to search code for specific syntactic patterns, extract information about the different ways in which code can be evaluated depending on run-time conditions, and examine how data propagate from definitions of variables to the expressions that use those variables.
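As a rough illustration of what searching code for a syntactic pattern can involve, the following sketch walks a parsed R expression and collects every call to a named function. It uses only base R and is not taken from the framework described here; the helper name find_calls and its arguments are hypothetical.

```r
# Hypothetical helper (an assumption for illustration, not the framework's API):
# collect every call to the function named `fn_name` inside `expr`.
# Limitation: calls with empty arguments, such as x[1, ], are not handled.
find_calls <- function(expr, fn_name) {
  found <- list()
  if (is.call(expr)) {
    head <- expr[[1]]
    # A call matches when its head is the symbol we are looking for.
    if (is.name(head) && identical(as.character(head), fn_name)) {
      found <- list(expr)
    }
    # Recurse into the head and every argument to find nested calls.
    for (part in as.list(expr)) {
      found <- c(found, find_calls(part, fn_name))
    }
  }
  found
}

# R code is itself data: quote() returns the unevaluated syntax tree.
code <- quote({
  x <- rnorm(10)
  y <- mean(x) + mean(rnorm(5))
})
hits <- find_calls(code, "rnorm")
print(hits)
```

The sketch relies on the fact that R exposes parsed code as ordinary language objects, so a syntactic search reduces to a recursive traversal of calls and symbols.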
The second part presents a static type inference strategy for R code. The strategy leverages the static analysis framework developed in the first part. In contrast to languages like C and Java, R code is generally not annotated with types, and there is no built-in syntax for adding type annotations. Type inference is therefore necessary to obtain type information, which is useful for transforming and translating code, checking code for errors, and reasoning about code.
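To make the idea of static type inference concrete, the sketch below assigns a type to an unevaluated R expression without running it. It is a deliberately minimal illustration in base R, not the strategy developed in this dissertation: the function infer_type, the "unknown" label, and the restriction to literals, variables with known types, and the binary operators +, - and * are all assumptions made for the example.

```r
# Hypothetical, minimal type inferer (an assumption for illustration only).
# `types` maps variable names to already-known type names.
infer_type <- function(expr, types = list()) {
  # Literal constants carry their own type.
  if (is.numeric(expr) || is.character(expr) || is.logical(expr)) {
    return(typeof(expr))
  }
  # Variables are looked up among the known types.
  if (is.name(expr)) {
    known <- types[[as.character(expr)]]
    return(if (is.null(known)) "unknown" else known)
  }
  # Binary +, - and *: the result follows R's numeric coercion rules.
  if (is.call(expr) && length(expr) == 3 && is.name(expr[[1]]) &&
      as.character(expr[[1]]) %in% c("+", "-", "*")) {
    operands <- c(infer_type(expr[[2]], types), infer_type(expr[[3]], types))
    if (all(operands %in% c("logical", "integer"))) return("integer")
    if (all(operands %in% c("logical", "integer", "double"))) return("double")
  }
  # Anything else is outside the scope of this sketch.
  "unknown"
}

print(infer_type(quote(1L + 2L)))
print(infer_type(quote(x * 2.5), list(x = "integer")))
print(infer_type(quote(y + 1)))
```

Returning "unknown" rather than guessing keeps the sketch conservative: an expression receives a type only when every operand's type is already established.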
The third part presents strategies for collecting type information from foreign routines written in C and called from R. The type inference strategy for R code can infer types more accurately if it has type signatures for these routines. Even in C code, the R type of an R object does not have to be stated explicitly, so inferring it is non-trivial.