Model Information
📄 Model Card: NLP Incident Triage (v0.2.0)
Model Name: NLP Incident Triage
Version: 0.2.0
Author: Chris Campbell (@texasbe2trill)
Intended Use: Educational and research-grade NLP classifier trained on synthetic cybersecurity incident narratives. Designed to demonstrate SOC triage automation concepts—not to replace production security tooling.
1. Model Description
A TF–IDF + Logistic Regression classifier that assigns cybersecurity incident narratives to high-level event types such as phishing, malware, and exfiltration.
The model uses:
- Synthetic SOC-style data with narrative templates
- MITRE ATT&CK–inspired phrasing
- Noise injection + label flipping for realism
- Uncertainty-aware predictions (uncertain fallback)
- Difficulty modes (soc-medium, soc-hard)
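The uncertain fallback listed above can be sketched as a simple confidence threshold over class probabilities. This is a minimal illustration, not the project's actual code: the function name `triage_label`, the default threshold of 0.5, and the example probabilities are all assumptions made for this example.

```python
from typing import Mapping

UNCERTAIN_LABEL = "uncertain"  # fallback name as used in this model card


def triage_label(probs: Mapping[str, float], threshold: float = 0.5) -> str:
    """Return the most probable event type, or the fallback when confidence is low.

    `threshold` is a hypothetical default, not the project's real setting.
    """
    if not probs:
        return UNCERTAIN_LABEL
    # Pick the class with the highest probability.
    best_label, best_prob = max(probs.items(), key=lambda item: item[1])
    return best_label if best_prob >= threshold else UNCERTAIN_LABEL


# Illustrative probabilities (made up for this example).
clear_case = {"phishing": 0.86, "malware": 0.09, "exfiltration": 0.05}
ambiguous_case = {"phishing": 0.40, "malware": 0.35, "exfiltration": 0.25}

print(triage_label(clear_case))      # phishing
print(triage_label(ambiguous_case))  # uncertain
```

Raising the threshold sends more narratives to the fallback, so fewer are labeled automatically and more are left for an analyst to review.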
2. Intended Use
The model is suitable for:
- Demonstrating NLP-based SOC triage workflows
- Academic or training environments
- Research on text classification methods
- Early-stage prototyping of incident summarization/triage tools
3. Not Intended For
Not for production SOC or IR operations.
Not designed for automated high-stakes decisions.
Not trained on real security logs.
4. Training Data
- 100% synthetic dataset
- Multiple event types with realistic SOC-style variation
- Includes ambiguous scenarios and misdirection
- MITRE-inspired narrative segments (Technique IDs included as text)
- No PII, customer data, or proprietary logs
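Label flipping, which the model description lists as one source of realism, can be sketched as follows. This is an illustration under stated assumptions rather than the project's generator: the function name `flip_labels`, the `flip_rate` parameter, and the sample labels are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def flip_labels(
    labels: Sequence[str],
    classes: Sequence[str],
    flip_rate: float,
    seed: int = 0,
) -> Tuple[List[str], List[int]]:
    """Return a noisy copy of `labels` plus the indices that were flipped."""
    if not 0.0 <= flip_rate <= 1.0:
        raise ValueError("flip_rate must be between 0 and 1")
    if len(set(classes)) < 2:
        raise ValueError("need at least two distinct classes")
    rng = random.Random(seed)  # seeded so the noise is reproducible
    noisy = list(labels)
    flipped = []
    for i, label in enumerate(labels):
        if rng.random() < flip_rate:
            # Replace the label with a different class chosen at random.
            alternatives = [c for c in classes if c != label]
            noisy[i] = rng.choice(alternatives)
            flipped.append(i)
    return noisy, flipped


# Illustrative labels (made up for this example).
classes = ["phishing", "malware", "exfiltration"]
labels = ["phishing", "malware", "exfiltration", "phishing", "malware"]
noisy, flipped = flip_labels(labels, classes, flip_rate=0.2, seed=42)
print(f"flipped {len(flipped)} of {len(labels)} labels")
```

Returning the flipped indices keeps the original labels recoverable, which makes it possible to measure how much of the test error comes from injected noise.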
5. Evaluation
Evaluated using:
- Synthetic hold‑out test set
- 18‑scenario SOC test suite
- Ambiguity stress tests
- Model comparison across LogReg, Linear SVM, RandomForest
- Probability calibration analysis
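One common way to summarize probability calibration is the expected calibration error (ECE), which compares average confidence with observed accuracy inside confidence bins. The card does not say which calibration metric was used, so the sketch below is only an assumption about how such an analysis could look. The function name and the bin count are hypothetical.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,
) -> float:
    """Weighted average gap between confidence and accuracy across bins."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if any(not 0.0 <= c <= 1.0 for c in confidences):
        raise ValueError("confidences must lie in [0, 1]")
    if not confidences:
        return 0.0
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 falls into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        ece += (len(bucket) / total) * abs(avg_conf - accuracy)
    return ece


# Overconfident toy example: full confidence, but only half are right.
print(expected_calibration_error([1.0, 1.0, 1.0, 1.0], [True, True, False, False]))  # 0.5
```

A value near zero means the predicted probabilities track observed accuracy closely. Larger values indicate over- or under-confidence.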
Observed behavior:
- ~92% accuracy on synthetic test set
- Strong performance on clear-cut phishing/malware/exfiltration
- Realistic degradation on ambiguous or noisy cases
- Uncertainty thresholding improves stability
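The effect of uncertainty thresholding can be examined by measuring coverage (the share of narratives that still receive a label) alongside accuracy on the labeled subset. The helper below is a hypothetical sketch with made-up data, not the project's evaluation code.

```python
from typing import Optional, Sequence, Tuple


def selective_accuracy(
    confidences: Sequence[float],
    correct: Sequence[bool],
    threshold: float,
) -> Tuple[float, Optional[float]]:
    """Return (coverage, accuracy) for predictions at or above `threshold`.

    Accuracy is None when every prediction falls back to "uncertain".
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    kept = [ok for conf, ok in zip(confidences, correct) if conf >= threshold]
    coverage = len(kept) / len(confidences) if confidences else 0.0
    accuracy = sum(kept) / len(kept) if kept else None
    return coverage, accuracy


# Illustrative values (made up for this example).
confidences = [0.95, 0.90, 0.80, 0.55, 0.45, 0.40]
correct = [True, True, True, False, True, False]

for threshold in (0.0, 0.6):
    coverage, accuracy = selective_accuracy(confidences, correct, threshold)
    print(f"threshold={threshold:.1f} coverage={coverage:.2f} accuracy={accuracy:.2f}")
```

In this toy data, a threshold of 0.6 halves coverage but removes both errors, which is the trade-off a threshold sweep is meant to expose.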
6. Ethical Considerations
- Only synthetic data used
- No real-world adversary emulation
- Users should keep a human in the loop to validate predictions
- MITRE ATT&CK® used with required attribution: “MITRE ATT&CK® is a registered trademark of The MITRE Corporation.”
7. Limitations
- Not trained on real logs or telemetry
- Cannot detect rare SOC events outside template patterns
- Limited ability to reason over long multi-sentence reports
- Vocabulary tied to generator styles
8. Recommendations for Future Work
- Expand training set with richer MITRE-derived semantics
- Add vendor-specific log styles (MDE, CrowdStrike, Okta, etc.)
- Introduce structured indicators (IPs, ports, geodata) as model features
- Explore transformer-based encoders for long-text handling
- Increase scenario test suite to 50+ cases
9. Version History
v0.2.0
- MITRE-inspired narrative enrichment
- Difficulty modes (soc-medium, soc-hard)
- Bulk analysis mode
- SOC summary output
- Improved CLI formatting and progress bar
- Updated documentation and model card