The assessment of behavioural disturbance in cetacean species (e.g. resulting from exposure to anthropogenic noise sources such as military sonar, seismic surveys, or pile driving) is important for effective conservation and management. Disturbance effects can be informed by Behavioural Response Studies (BRSs), conducted either as controlled exposure experiments (CEEs), in which noise exposure conditions are presented deliberately to meet experimental objectives, or in opportunistic contexts, in which ongoing activities are monitored in a strategic manner. In either context, animal-borne sensors or in situ observations can provide information on individual exposure and disturbance responses. The past 15 years of research, comprising hundreds of experiments in nearly a dozen cetacean species, have greatly expanded our understanding of behavioural responses to noise. Many papers note limited sample sizes, the need for knowledge of baseline behaviour prior to exposure, and the importance of contextual factors modulating behavioural responses, all of which in combination can lead to sampling biases, even for well-designed research programs. Understanding these biases is critical for robustly identifying responses and for ensuring that the outcomes of BRSs help inform predictions of how anthropogenic disturbance impacts individuals and populations. Our approach leverages concepts from the animal behaviour literature that help avoid sampling bias by considering what shapes an animal's response. These include social, experiential and genetic factors, as well as natural changes in responsiveness. We developed and applied a modified version of this framework to synthesise current knowledge on cetacean response in the context of effects observed across marine and terrestrial taxa. This new Sampling, Exposure, Receptor framework (SERF) identifies 43 modulating factors, highlights potential biases, and assesses how these vary across selected focal species.
In contrast to studies that identified variation in Exposure factors as a key concern, our analysis indicated that factors relating to Sampling (e.g. deploying tags on less evasive individuals, which biases selection of subjects) and Receptor (e.g. health status or coping style) have the greatest potential to weaken the desired broad representativeness of BRSs. Our assessment also highlights how potential biases could be addressed with existing datasets or future developments.