The success of deep neural networks over the past decade was founded on supervised learning, but as model and dataset sizes have grown, so has the desire to break away from the costly human annotation that supervised learning requires. Beyond its cost, the annotation process can be ambiguous, prone to bias, and raise privacy concerns (e.g., in medical imaging). Here, self-supervised learning offers a path forward: models can be trained on cheap and abundantly available unlabeled data.
Improving Self-Supervised Learning. The idea behind self-supervised learning (SSL) is to keep the training pipeline of supervised learning but, in the absence of labels, replace the supervised task with a "pretext" task derived from the data itself. A well-designed pretext task induces the model to learn a feature representation of the data that is useful for downstream tasks. Since the performance of an SSL model relies heavily on its pretext task, one of my research objectives is to improve pretext task design. Specifically, my contributions are as follows. 1) Traditional SSL pretext tasks are less effective for smaller-capacity models than for larger ones; hence, I developed a pretext task better suited to smaller models. 2) Contrastive learning, a popular SSL pretext task, treats semantically similar images as dissimilar (the "false negative" problem); hence, I developed a pretext task that fixes this problem. 3) Clustering-based SSL pretext tasks also suffer from incorrect negatives, in addition to imposing unnecessary priors on the shape and size of clusters; hence, I developed a mean-shift clustering pretext task that addresses both problems (a sketch of the idea follows below). 4) While an improvement over previous clustering methods, the mean-shift pretext task does not cluster semantically diverse samples; hence, I developed a constrained mean-shift clustering pretext task that groups semantically relevant yet distant samples.
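To make the mean-shift idea concrete, here is a minimal sketch of one plausible form of such a pretext loss, assuming a momentum target encoder and a memory bank of past target embeddings (the function name, `memory_bank`, and `k` are illustrative assumptions, not the exact published implementation). Each query embedding is pulled toward the nearest neighbors of its target embedding, i.e., toward the local mean of its neighborhood, without fixing the number, size, or shape of clusters in advance.

```python
import torch
import torch.nn.functional as F

def mean_shift_loss(query, target, memory_bank, k=5):
    """Pull each query embedding toward the k nearest neighbors of its
    target embedding in a memory bank.

    query:       (B, D) embeddings from the online encoder (grad flows here)
    target:      (B, D) embeddings from a momentum/target encoder (no grad)
    memory_bank: (N, D) queue of past target embeddings, L2-normalized
    """
    query = F.normalize(query, dim=1)
    target = F.normalize(target, dim=1)

    # Cosine similarity of each target to the memory bank entries.
    sim = target @ memory_bank.t()              # (B, N)
    _, nn_idx = sim.topk(k, dim=1)              # indices of k-NN per sample
    neighbors = memory_bank[nn_idx]             # (B, k, D)

    # Squared Euclidean distance between unit vectors: 2 - 2 * cos_sim.
    # Minimizing it "shifts" the query toward the local mean of its
    # target's neighborhood.
    dist = 2 - 2 * torch.einsum('bd,bkd->bk', query, neighbors)
    return dist.mean()
```

Because the neighborhood is defined purely by nearest neighbors, the grouping can take on arbitrary shapes and sizes, avoiding the fixed, spherical cluster priors of k-means-style objectives.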
Understanding Self-Supervised Learning. When scaled to large datasets, SSL models have been shown to learn rich, generalizable features in both Natural Language Processing and Computer Vision. The idea is so powerful that, for most applications today, the default first step is to load a self-supervised model and then either use it in a few-shot setting or fine-tune it for the application at hand. Hence, in addition to improving SSL models, the other objective of my research is to understand their inner workings. I have made the following contributions toward this objective. 1) SSL models are vulnerable to a class of adversarial attacks called "backdoor attacks": an attacker who can hijack the data collection pipeline can alter the model's behavior so that it fails in the presence of an attacker-chosen trigger. I analyzed the mechanism through which backdoors affect SSL models and used the insights to develop a defense against the attack. 2) Backdoor attacks are possible because SSL models learn shortcut features present in the dataset. Given that the scope of SSL models extends beyond computer vision, I was interested in understanding the types of shortcuts exploited by the language component of vision-language contrastive models. I showed that such models ignore the grammatical structure of language and simply use it as a bag of words (BoW); a simple probe illustrating this finding is sketched below.
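As an illustration of the bag-of-words behavior, the following is a minimal sketch of a word-order sensitivity probe, assuming a CLIP-style model that exposes `encode_image` and `encode_text` methods (a hypothetical interface, not any specific library's API). If the model treats language as a bag of words, shuffling the words in a caption should barely change its image-text similarity score.

```python
import random
import torch
import torch.nn.functional as F

def order_sensitivity(model, image, caption):
    """Compare image-text similarity for the original caption and a
    word-shuffled version; near-identical scores suggest the model
    uses the caption as a bag of words."""
    words = caption.split()
    shuffled = " ".join(random.sample(words, len(words)))

    with torch.no_grad():
        img = F.normalize(model.encode_image(image), dim=-1)                # (1, D)
        txt = F.normalize(model.encode_text([caption, shuffled]), dim=-1)   # (2, D)

    scores = img @ txt.t()  # (1, 2): [original, shuffled]
    return scores[0, 0].item(), scores[0, 1].item()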