❌

Normal view

Received β€” 16 April 2026 ⏭ /r/netsec - Information Security News & Discussion

Open dataset: 100k+ multimodal prompt injection samples with per-category academic sourcing

I submitted an earlier version of this dataset and was declined on the basis of missing methodology and unverifiable provenance. The feedback was fair. The documentation has since been rewritten to address it directly, and I would very much appreciate a second look.

What the dataset contains

101,032 samples in total, balanced 1:1 attack to benign.

Attack samples (50,516) across 27 categories sourced from over 55 published papers and disclosed vulnerabilities. Coverage spans:

  • Classical injection - direct override, indirect via documents, tool-call injection, system prompt extraction
  • Adversarial suffixes - GCG, AutoDAN, Beast
  • Cross-modal delivery - text with image, document, audio, and combined payloads across three and four modalities
  • Multi-turn escalation - Crescendo, PAIR, TAP, Skeleton Key, Many-shot
  • Emerging agentic attacks - MCP tool descriptor poisoning, memory-write exploits, inter-agent contagion, RAG chunk-boundary injection, reasoning-token hijacking on thinking-trace models
  • Evasion techniques - homoglyph substitution, zero-width space insertion, Unicode tag-plane smuggling, cipher jailbreaks, detector perturbation
  • Media-surface attacks - audio ASR divergence, chart and diagram injection, PDF active content, instruction-hierarchy spoofing

Benign samples (50,516) are drawn from Stanford Alpaca, WildChat, MS-COCO 2017, Wikipedia (English), and LibriSpeech. The benign set is matched to the surface characteristics of the attack set so that classifiers must learn genuine injection structure rather than stylistic artefacts.

Methodology

The previous README lacked this section entirely. The current version documents the following:

  1. Scope definition. Prompt injection is defined per Greshake et al. and OWASP LLM01 as runtime text that overrides or redirects model behaviour. Pure harmful-content requests without override framing are explicitly excluded.
  2. Four-layer construction. Hand-crafted seeds, PyRIT template expansion, cross-modal delivery matrix, and matched benign collection. Each layer documents the tool used, the paper referenced, and the design decision behind it.
  3. Label assignment. Labels are assigned by construction at the category level rather than through per-sample human review. This is stated plainly rather than overclaimed.
  4. Benign edge-case design. The ten vocabulary clusters used to reduce false positives on security-adjacent language are documented individually.
  5. Quality control. Deduplication audit results are included: zero duplicate texts in the benign pool, zero benign texts appearing in attacks, one documented legacy duplicate cluster with cause noted.
  6. Known limitations. Six limitations are stated explicitly: text-based multimodal representation, hand-crafted seed counts, English-skewed benign pool, no inter-rater reliability score, ASR figures sourced from original papers rather than re-measured, and small v4 seed counts for emerging categories.

Reproducibility

Generators are deterministic (random.seed(42)). Running them reproduces the published dataset exactly. Every sample carries attack_source and attack_reference fields with arXiv or CVE links. A reviewer can select any sample, follow the citation, and verify that the attack class is documented in the literature.

Comparison to existing datasets

The README includes a comparison table against deepset (500 samples), jackhhao (2,600), Tensor Trust (126k from an adversarial game), HackAPrompt (600k from competition data), and InjectAgent (1,054). The gap this dataset aims to fill is multimodal cross-delivery combinations and emerging agentic attack categories, neither of which exists at scale in current public datasets.

What this is not

To be direct: this is not a peer-reviewed paper. The README is documentation at the level expected of a serious open dataset submission - methodology, sourcing, limitations, and reproducibility - but it does not replace academic publication. If that bar is a requirement for r/netsec specifically, that is reasonable and I will accept the feedback.

Links

I am happy to answer questions about any construction decision, provide verification scripts for specific categories, or discuss where the methodology falls short.

submitted by /u/BordairAPI
[link] [comments]
Received β€” 15 April 2026 ⏭ /r/netsec - Information Security News & Discussion
Received β€” 14 April 2026 ⏭ /r/netsec - Information Security News & Discussion

Unpatched RAGFlow Vulnerability Allows Post-Auth RCE

The current version of RAGFlow, a widely-deployed Retrieval Augmented Generation solution, contains a post-auth vulnerability that allows for arbitrary code execution.

This post includes a POC, walkthrough and patch.

The TL;DR is to make sure your RAGFlow instances aren't on the public internet, that you have the minimum number of necessary users, and that those user accounts are protected by complex passwords. (This is especially true if you're using Infinity for storage.)

submitted by /u/Prior-Penalty
[link] [comments]
Received β€” 13 April 2026 ⏭ /r/netsec - Information Security News & Discussion

CVE-2026-22666: Dolibarr 23.0.0 dol_eval() whitelist bypass -> RCE (full write-up + PoC)

Root cause: the $forbiddenphpstrings blocklist is only enforced in blacklist mode -> the default whitelist mode never touches it. The whitelist regex is also blind to PHP dynamic callable syntax (('exec')('cmd')). Either bug alone limits impact; together they reach OS command execution. Coordinated disclosure - patch available as of 4/4/2026.

submitted by /u/JivaSecurity
[link] [comments]
Received β€” 12 April 2026 ⏭ /r/netsec - Information Security News & Discussion
Received β€” 10 April 2026 ⏭ /r/netsec - Information Security News & Discussion
Received β€” 9 April 2026 ⏭ /r/netsec - Information Security News & Discussion

Threat Model Discrepancy: Google Password Manager leaks cleartext passwords via Task Switcher (Won't Fix) - Violates German BSI Standards

Hi everyone, I’m a Cybersecurity student at HFU in Germany and recently submitted a vulnerability to the Google VRP regarding the Google Password Manager on Android (tested on Pixel 8, Android 16).

The Issue: When you view a cleartext password in the app and minimize it, the app fails to apply FLAG_SECURE or blur the background. When opening the "Recent Apps" (Task Switcher), the cleartext password is fully visible in the preview, even though the app actively overlays a "Enter your screen lock" biometric prompt in the foreground. It basically renders its own secondary biometric lock completely useless.

Google's Response: Google closed the report as Won't Fix (Intended Behavior). Their threat model assumes that if an attacker has physical access to an unlocked device, it's game over.

The BSI Discrepancy: What makes this interesting is that the German Federal Office for Information Security (BSI) recently published a study on Password Managers. In their Threat Model A02 ("Attacker has temporary access to the unlocked device"), they explicitly mandate that sensitive content MUST be protected from background snapshots/screenshots. So while Google says this is intended, national security guidelines classify this as a vulnerability. (For comparison: The iOS built-in password manager instantly blurs the screen when losing focus).

Here is my PoC screenshot:
https://drive.google.com/file/d/1PTGKRpyFj_jY9S76Jlo62mSCDJ3c6uLO/view?usp=sharing
https://drive.google.com/file/d/1nIJMQbM4R17EMt9f1Ffb4UmCPYY7-GXb/view?usp=sharing

What are your thoughts on this? Should password managers protect against shoulder surfing via the Task Switcher, or is Google right to rely solely on the OS lockscreen?

submitted by /u/Onat120
[link] [comments]
Received β€” 8 April 2026 ⏭ /r/netsec - Information Security News & Discussion
❌