The success of a software application is related to users' willingness to keep using it. In this sense, evaluating User eXperience (UX) became an important part of the software development process. Researchers have been carrying out studies by employing various methods to evaluate the UX of software products. Some studies reported varied and even contradictory results when applying different UX evaluation methods, making it difficult for practitioners to identify which results to rely upon. However, these works did not evaluate the developers' perspectives and their impacts on the decision process. Moreover, such studies focused on one-shot evaluations, which cannot assess whether the methods provide the same big picture of the experience (i.e., deteriorating, improving, or stable). This paper presents a longitudinal study in which 68 students evaluated the UX of an online judge system by employing AttrakDiff, UEQ, and Sentence Completion methods at three moments along a semester. This study reveals contrasting results between the methods, which affected developers' decisions and interpretations. With this work, we intend to draw the HCI community's attention to the contrast between different UX evaluation methods and the impact of their outcomes in the software development process.