Glossary

Terms

Persistent state file enabling resumable dataset generation. Contains progress, timestamps, and metadata.

Minimum probability score required to assign a label. A score below the threshold triggers the "uncertain" fallback.

Setting controlling strictness of uncertainty handling:

GGUF: GPT-Generated Unified Format. File format for quantized LLMs, used by llama.cpp.

Safety mechanisms preventing LLM hallucinations:

Incident: Security event requiring investigation. Represented as a natural-language narrative in this system.

LLM: Large Language Model. Used for dataset enhancement and second-opinion triage.

MITRE ATT&CK: Knowledge base of adversary tactics and techniques. Used for incident enrichment and mapping.

Quantization: Model compression technique that reduces model size and memory use at a slight cost in accuracy (levels such as Q4, Q5, Q8).

LLM-powered component enhancing synthetic narratives during generation.

Second opinion: LLM-assisted classification for uncertain cases. Provides an alternative perspective with a rationale.

SOC: Security Operations Center. Team responsible for monitoring and responding to security incidents.

Synthetic data: Artificially generated training data. This project uses 100% synthetic incidents.

TF-IDF: Term Frequency-Inverse Document Frequency. Statistical measure used for text feature extraction.

Triage: Process of categorizing and prioritizing incidents based on severity and type.

Uncertain: Label assigned when classifier confidence is below the threshold. Indicates that manual review is needed.

Vectorization: Conversion of text to numerical representations for ML processing.
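
To make the entries on TF-IDF, vectorization, and the "uncertain" fallback more concrete, the following is a minimal standard-library sketch, not this project's implementation. The function names `tf_idf` and `pick_label`, the example label names, and the 0.5 threshold are all hypothetical. The weighting is the plain textbook TF-IDF formula; libraries such as scikit-learn apply smoothing and normalization, so their numbers will differ.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Turn each document into a {term: weight} mapping (textbook TF-IDF).

    tf  = term count / number of tokens in the document
    idf = log(number of documents / number of documents containing the term)
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def pick_label(probabilities, threshold=0.5):
    """Return the most probable label, or "uncertain" below the threshold.

    `probabilities` maps label names to scores; the default threshold
    is an illustrative assumption, not a value taken from this project.
    """
    label, score = max(probabilities.items(), key=lambda item: item[1])
    return label if score >= threshold else "uncertain"


if __name__ == "__main__":
    narratives = ["phishing email reported", "malware email detected"]
    for vector in tf_idf(narratives):
        print(vector)
    print(pick_label({"phishing": 0.42, "malware": 0.38, "benign": 0.20}))
```

In this sketch a term that appears in every document (here "email") gets weight zero, which is why TF-IDF favors distinctive words. The second function shows the fallback idea: when no label reaches the threshold, the case is marked "uncertain" and left for manual review or a second opinion.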

Acronym	Full Term
API	Application Programming Interface
ATT&CK	Adversarial Tactics, Techniques & Common Knowledge
CI/CD	Continuous Integration/Continuous Deployment
CLI	Command-Line Interface
CPU	Central Processing Unit
CSV	Comma-Separated Values
EDR	Endpoint Detection and Response
ETA	Estimated Time of Arrival
GGUF	GPT-Generated Unified Format
GPU	Graphics Processing Unit
IR	Incident Response
JSON	JavaScript Object Notation
JSONL	JSON Lines (one JSON object per line)
LLM	Large Language Model
MITRE	Not an acronym; the name of The MITRE Corporation, which maintains ATT&CK
ML	Machine Learning
NLP	Natural Language Processing
RAM	Random Access Memory
SIEM	Security Information and Event Management
SOAR	Security Orchestration, Automation and Response
SOC	Security Operations Center
TF-IDF	Term Frequency-Inverse Document Frequency
UI	User Interface
URL	Uniform Resource Locator

Extension	Description
`.csv`	Comma-separated values dataset file
`.joblib`	Serialized scikit-learn model or vectorizer
`.json`	JSON configuration or results file
`.jsonl`	JSON Lines (bulk results, one record per line)
`.log`	Text log file
`.md`	Markdown documentation file
`.gguf`	Quantized LLM model file
`.py`	Python source code file
`.sh`	Shell script file
`.yml`	YAML configuration file

For technical terms, see Architecture and Model Information.