Predicting Data Exfiltration via DNS

Overview

This repository consists of development of machine learning models capable of predicting data exfiltration attempts via DNS queries.

It includes the following model types:

Standard Static ML model
ML model that adapts to patterns in real-time

Set Up

Install Docker and Docker-compose on your system
Run docker script on your system, once docker is installed.
- docker_script.bat for windows
- docker_script.sh for Linux/MacOS

Static Model .ipynb: Implementation for static model including data preprocessing, feature engineering, and model training processes.
Dynamic Model .ipynb: Implementation for dynamic model which includes real-time data processing and adaptive learning mechanisms.
Kafka_dataset.csv: Dataset for the dynamic model training that simulates real-time DNS query data through Kafka streams.
Static_dataset.csv: Dataset for the static model training to analyze historical DNS query data.
docker-compose.yml: Docker Compose configuration file for setting up the Kafka streaming environment required for the dynamic model's data processing.
docker_script.bat, docker_script.sh: Scripts to help set up the Docker environment on Windows and Unix-based systems, respectively.
requirements.txt: Specifies the Python package dependencies necessary to run the project.

Prerequisites

Docker installed on your machine.
Python 3.x along with Jupyter or any compatible IDE for running the notebooks.

Setup

Environment Preparation: Begin by setting up the Kafka environment using Docker. Utilize the provided docker_script.bat for Windows or docker_script.sh for Unix/Linux systems to automate the setup.
Install Dependencies: Install the Python dependencies specified in requirements.txt by running the following command:
```
pip install -r requirements.txt
```
Run the Notebooks: Navigate to the .ipynb notebooks in Jupyter or your preferred IDE. Ensure the Kafka environment is up and running before executing the dynamic model notebook.

Highlights

Implemented classifier for predicting DNS-based data exfiltration, achieving an impressive accuracy rate of over 90% through meticulous data analysis and feature engineering.
Demonstrated expertise in dynamic model creation by implementing a cutting-edge solution capable of adapting in real-time, showcasing a remarkable improvement of 15% in detection accuracy compared to static models.
Leveraged ML techniques to uncover nuanced patterns in streaming data, resulting in a dynamic model with a 20% increase in sensitivity to emerging threats

For detailed explanation of the dataset analysis, learning methodologies and results, please refer to Detailed Analysis and Implementation

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
Detailed-Report.pdf		Detailed-Report.pdf
Dynamic_model.ipynb		Dynamic_model.ipynb
Kafka_dataset.csv		Kafka_dataset.csv
README.md		README.md
Static_dataset.csv		Static_dataset.csv
Static_model.ipynb		Static_model.ipynb
dns_kafka_producer.ipynb		dns_kafka_producer.ipynb
docker-compose.yml		docker-compose.yml
docker_script.bat		docker_script.bat
docker_script.sh		docker_script.sh
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Predicting Data Exfiltration via DNS

Overview

Set Up

Contents

Prerequisites

Setup

Highlights

License

About

Releases

Packages

Languages

deepankkartikey/Data-Exfiltration-Detection

Folders and files

Latest commit

History

Repository files navigation

Predicting Data Exfiltration via DNS

Overview

Set Up

Contents

Prerequisites

Setup

Highlights

License

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages