University of Maryland, College Park
Doctoral Student in Computer Science
Email: stuckman (at symbol) umd (dot symbol) edu
My main research interest is in learning and evaluating methods to improve software security by gaining insights from past security vulnerabilities. Much related work has focused on analyzing the characteristics of vulnerabilities and modelling their relationships to features found in source code.
Below are several current and past research projects which support these interests:
In conjunction with our paper Predicting Vulnerable Components: Software Metrics vs Text Mining, presented at ISSRE 2014, we have released a dataset of security vulnerabilities found in three open-source PHP web applications: PHPMyAdmin, Drupal, and Moodle. The dataset contains information on the revisions where each vulnerability was introduced and fixed, along with the file which contained the vulnerable code at each revision of the software. This fine-grained information on the evolution of each vulnerability allows the time dimension to be considered when building predictive models for vulnerabilities, a dimension that has traditionally been difficult to work with using readily available vulnerability data.
The dataset is available here.
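As a rough sketch of how this time dimension might be used, the Python fragment below computes a per-file average vulnerability lifetime. It assumes the dataset has been exported to a CSV file named php_vulnerabilities.csv with hypothetical columns file, introduced_rev, and fixed_rev; the released dataset's actual file layout and column names may differ.

    import csv
    from collections import defaultdict

    # Hypothetical file name and column names; the released dataset may use a
    # different layout.
    lifetimes = defaultdict(list)
    with open("php_vulnerabilities.csv", newline="") as f:
        for row in csv.DictReader(f):
            # Lifetime measured as the number of revisions between the revision
            # that introduced the vulnerability and the revision that fixed it.
            lifetimes[row["file"]].append(int(row["fixed_rev"]) - int(row["introduced_rev"]))

    # Average vulnerability lifetime per file, e.g. as a candidate model feature.
    for path, values in sorted(lifetimes.items()):
        print(path, sum(values) / len(values))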
Related work on vulnerability prediction models has identified ways to compute the relative likelihood that regions of source code are associated with security vulnerabilities. We seek to recast these vulnerability prediction models into forms that can support decision-making in additional scenarios, such as choosing products that are more likely to be secure in the context of a user's particular environment.
To facilitate better empirical vulnerability research, we are developing two public software security vulnerability datasets. The first, BugBox, is a corpus of PHP web application vulnerabilities which allows the behavior of exploits to be measured in a simulated runtime environment. The second, a dataset linking PHP web application vulnerabilities to source code artifacts, is slated for release later in 2014.
To complement the datasets discussed above, we are currently developing tools supporting easier replication of prediction studies for defects and vulnerabilities.
Defect and vulnerability prediction often involves constructing models that estimate the likelihood that a defect exists in a particular source code artifact. Sometimes, these models can also be used in a generative capacity to produce synthetic data (such as counts of simulated defects). We are examining whether characteristics of this synthetic data can be compared with those of real defect data to assess how consistent the model is with the data that was actually observed. In addition, we are studying ways to improve cross-project prediction performance by increasing the generality of predictive models, in the same way that avoiding overfitting can improve performance in within-project prediction.
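As a toy illustration of this generative use (not the actual models or data from our studies), the sketch below draws synthetic defect labels from per-file probabilities produced by some previously fitted model and compares one simple characteristic of the synthetic data, the total defect count, against its observed value. The probabilities and labels shown are made-up placeholders.

    import numpy as np

    rng = np.random.default_rng(0)

    # Placeholder values: per-file defect probabilities from some fitted model,
    # and the defect labels actually observed for the same files.
    predicted_prob = np.array([0.05, 0.40, 0.10, 0.75, 0.20])
    observed = np.array([0, 1, 0, 1, 0])

    # Generative use of the model: each simulation draws one synthetic defect
    # label per file from the predicted probabilities.
    simulations = rng.binomial(n=1, p=predicted_prob, size=(10000, len(predicted_prob)))
    simulated_totals = simulations.sum(axis=1)

    # Compare a characteristic of the synthetic data (the total defect count)
    # against the value observed in the real data.
    observed_total = observed.sum()
    print("observed total defects:", observed_total)
    print("mean simulated total:", simulated_totals.mean())
    print("fraction of simulations >= observed:",
          (simulated_totals >= observed_total).mean())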
This section describes the companion datasets to our paper Measuring the Wikisphere and its related work.
Due to the inherent difficulty of obtaining experimental data from wikis, past quantitative wiki research has largely focused on Wikipedia, limiting the degree to which its findings generalize. We developed WikiCrawler, a tool that automatically downloads and analyzes wikis, and used it to study 151 popular wikis running MediaWiki (none of them Wikipedias).
Available for download is a dataset describing the articles of each analyzed wiki, the users of each wiki, and the wiki's revisions (which indicate that a certain user edited a certain wiki article on a certain date and time). Wikis and wiki articles are identified only by ID numbers, and due to copyright issues, data that would readily link a specific wiki in our dataset to a specific wiki in real life is not publicly released. This means that URLs, the text of articles, the titles of articles, and revision comments are not available (although the link graph is preserved).
The file containing information on each wiki's pages and users (wikidata) is a CSV file containing the following columns (different types of rows can be distinguished by the number of columns therein):
Row type   | Col 1   | Col 2           | Col 3              | Col 4      | Col 5
User count | Wiki ID | Number of users |                    |            |
Page info  | Wiki ID | URL ID          | Article ID         | Word count | Word count of clickable links
Link graph | Wiki ID | Origin URL ID   | Destination URL ID |            |
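A minimal sketch of one way to read this file in Python, assuming the column order shown above and that the row types appear only implicitly through their differing numbers of columns:

    import csv

    user_counts = {}   # wiki ID -> number of users
    pages = []         # (wiki ID, URL ID, article ID, word count, linked-word count)
    links = []         # (wiki ID, origin URL ID, destination URL ID)

    with open("wikidata", newline="") as f:
        for row in csv.reader(f):
            if len(row) == 2:       # user count row
                user_counts[row[0]] = int(row[1])
            elif len(row) == 5:     # page info row
                pages.append(tuple(row))
            elif len(row) == 3:     # link graph row
                links.append(tuple(row))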
The file containing information on each wiki page's edit history (wikihistorydata) is a CSV file containing the following columns:
Row type      | Col 1   | Col 2  | Col 3   | Col 4                   | Col 5       | Col 6        | Col 7      | Col 8
History entry | Wiki ID | URL ID | User ID | Minutes into day edited | Year edited | Month edited | Day edited | Reserved
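Assuming the column order shown above, the timestamp of each edit can be reconstructed from the year, month, day, and minutes-into-day columns; a small sketch:

    import csv
    from datetime import datetime, timedelta

    edits = []   # (wiki ID, URL ID, user ID, timestamp of edit)
    with open("wikihistorydata", newline="") as f:
        for row in csv.reader(f):
            wiki_id, url_id, user_id, minutes, year, month, day, _reserved = row
            when = datetime(int(year), int(month), int(day)) + timedelta(minutes=int(minutes))
            edits.append((wiki_id, url_id, user_id, when))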
If your research requires the use of data that has not been publicly released, please contact the authors. Possible resolutions include the release of a more detailed scrubbed dataset, or a private release of the full dataset.
Software packages which we developed to support the above projects are listed here. Due to the realities of writing research code, this software is largely undocumented and, hence, is not directly available on this site. However, this software will be made available to anyone who e-mails and requests it.
Much research on defect prediction with software metrics has studied languages such as Java and C; however, there has been comparatively little research on metrics for scripting languages such as PHP. Currently available tools that work on PHP source code can compute only a relatively small number of metrics. We have developed a metrics computation tool for PHP that computes a wider variety of size, complexity, and coupling metrics.
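As a deliberately simplified illustration of the kind of measurement involved (not the tool itself), the sketch below approximates two metrics for a PHP file by counting non-blank lines and decision points; the file name example.php is a placeholder.

    import re

    # Rough approximations only: the actual tool computes a much wider variety
    # of size, complexity, and coupling metrics.
    DECISION_POINTS = re.compile(r"\b(?:if|elseif|for|foreach|while|case|catch)\b|\?|&&|\|\|")

    def rough_php_metrics(path):
        with open(path, encoding="utf-8", errors="replace") as f:
            lines = [line for line in f if line.strip()]
        source = "".join(lines)
        return {
            "non_blank_lines": len(lines),                                  # crude size metric
            "approx_cyclomatic": 1 + len(DECISION_POINTS.findall(source)),  # crude complexity metric
        }

    print(rough_php_metrics("example.php"))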
WikiCrawler is the software package that was used to generate the wiki corpus available on this site. It is designed to quickly download wiki data from MediaWiki instances and extract the relevant features without downloading unnecessary data.
WikiCrawler is written in Java with an Apache Derby backend. R functions to import the data are also available.