eScholarship
Open Access Publications from the University of California

UC Riverside

UC Riverside Electronic Theses and Dissertations

Extracting Actionable Information From Security Forums

Creative Commons 'BY' version 4.0 license
Abstract

The goal of this thesis is to systematically extract information from security forums, whose content is generally unstructured: the text of a post does not necessarily follow any writing rules. Many security initiatives and commercial entities harness readily available public information, but they seem to focus on structured sources. By contrast, we analyze the text content of security forums to extract actionable information. Specifically, we find IP addresses reported in the text, study keyword-based queries, and identify and classify threads that are of interest to security analysts.

The power of our study lies in the following key novelties. First, we use a matrix decomposition method to extract latent features of user behavioral information, which we combine with textual information from related posts. Second, we address the labeling difficulties by using a cross-forum learning method that transfers knowledge between models. Third, we develop a multi-step weighted embedding approach: we project words, threads, and classes into appropriate embedding spaces and establish relevance and similarity there. These approaches enable us to extract and refine information that could not be obtained from security forums through trivial analyses alone.

We collected a wealth of data from six different security forums. The contribution of our work is threefold: (a) we develop a method to automatically identify malicious IP addresses observed in the forums; (b) we propose a systematic method to identify and classify user-specified threads of interest into four different categories; and (c) we present an iterative approach to expand the initial keywords of interest, which are essential inputs for searching and retrieving information.

We see our approaches as essential building blocks in developing useful methods for harnessing the wealth of information available in online forums.
