VulnData - A Security Vulnerability Dataset Collector

VulnData is a project aimed at gathering an extensive dataset of security vulnerabilities across various programming languages. This project will scrape data from various sources, consolidate it, and present it in a unified format to train a language model that can help identify potential security flaws in code. The goal is to build a comprehensive resource for security professionals, researchers, and developers who want to understand the vulnerabilities in their codebase.

Project Overview

The primary objective of VulnData is to:

Scrape data: Collect security vulnerability data from various trusted sources.
Data consolidation: Merge, process, and format the collected data into a structured and usable form.
Vulnerability mapping: Categorize vulnerabilities by programming language, vulnerability type, and common exploitation methods.
Training a model: Use the data to train a model to identify patterns and generate feedback for developers to improve the security of their code.

Core Features

Web Scraping: Automates the process of scraping vulnerability data from multiple sources, including CVE databases, blog posts, and security advisories.
Data Collection: Collects security-related information and maps vulnerabilities to their corresponding programming languages and severity levels.
Vulnerability Classification: Classifies vulnerabilities based on their type (e.g., SQL injection, buffer overflow, XSS) and the severity of the issue.
Dataset Formatting: Consolidates the scraped data into a structured, machine-readable format (JSON, CSV, etc.) for easy model training.
Security Model Training: After collecting sufficient data, this project will leverage the dataset to train a machine learning model focused on identifying and categorizing code vulnerabilities.

Installation

Follow the steps below to install and run VulnData locally.

Prerequisites

Rust

Clone the repository

git clone https://github.com/yourusername/vulndata.git
cd vulndata

Build the Project

To install dependencies and build the project, follow the instructions for your chosen language.

For Rust:

cargo build --release

Running the Project

Once built, run the project with the following command:

./vulndata --scrape

This will begin scraping data, consolidating it into the specified format, and initiate training the model (once sufficient data has been collected).

Project Structure

/scrapers: Contains code for scraping data from various security sources (e.g., CVE databases, blog posts, security advisories).
/data: The collected raw data before processing.
/models: Scripts for training and saving machine learning models.
/scripts: Utility scripts for cleaning and processing the data.
/output: Processed datasets in structured formats (JSON, CSV).
/docs: Documentation on data sources, security classifications, and model training procedures.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Future Plans

Additional Language Support: Extend the dataset to include vulnerabilities in more programming languages.
More Advanced Models: Train a more advanced model to identify vulnerabilities directly in code and suggest fixes.
Web Interface: Build a user-friendly web interface to query the vulnerability dataset and get real-time analysis on code snippets.

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
docs		docs
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

VulnData - A Security Vulnerability Dataset Collector

Project Overview

Core Features

Installation

Prerequisites

Clone the repository

Build the Project

Running the Project

Project Structure

License

Future Plans

About

Releases

Packages

License

Rubix982/VulnData

Folders and files

Latest commit

History

Repository files navigation

VulnData - A Security Vulnerability Dataset Collector

Project Overview

Core Features

Installation

Prerequisites

Clone the repository

Build the Project

Running the Project

Project Structure

License

Future Plans

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Packages