
SEC401 - Defense in Depth

Lab 2.2 - Data Loss Prevention

Solo, Lab

Focus: Data Security & DLP

Level: SEC401

Date: Apr 2026

Artifacts: Sanitized screenshots from Slingshot Linux lab environment

TL;DR

  • Used grep with regex to scan removable media and flag files containing 'secret', 'confidential', or 'sensitive'
  • Extracted Office document metadata with exiftool: author, classification keyword (SECRET), and modification history
  • Geolocated a photograph by extracting GPS coordinates from EXIF data and identifying the real-world location

Skills demonstrated

  • DLP keyword scanning with grep
  • Regex-based content classification
  • Document metadata extraction (exiftool)
  • EXIF GPS coordinate extraction
  • Geolocation from image metadata
  • Insider threat investigation
  • Removable media forensics

Note: Course-provided lab files and instructions are not shared. Only my own screenshots and sanitized notes are published.

Why this matters

Data exfiltration via removable media remains one of the most common insider threat vectors. Organizations that don't scan outbound media miss classified documents walking out the door. Metadata in Office files and photos can reveal authorship, classification markings, and even physical locations that the sender never intended to share. DLP tools automate what this lab does manually, but understanding the underlying techniques is essential for tuning DLP policies and investigating incidents.

Context

This lab simulates a data loss prevention investigation on a removable media device (CDROM). The goal was to identify sensitive files using keyword scanning with grep, extract hidden metadata from Office documents using exiftool, and geolocate the origin of a photograph by extracting GPS coordinates embedded in its EXIF data.

Tools used

grep, exiftool, EXIF/GPS analysis, CLI

Steps taken

1. Scan removable media for sensitive keywords

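Navigated to the mounted CDROM and used grep with Perl-compatible regex to scan all files for the words 'secret', 'confidential', or 'sensitive'. The -P flag enables PCRE, -a treats binary files as text, -i makes the search case-insensitive, and -l prints only filenames (not matching lines). One file matched: 'Merger Offer Letter to Beta Industries.doc', a document that any DLP system scanning for classification markers would flag. Bankruptcy.docx, examined in step 2, was not flagged even though its Keywords field reads SECRET. The most likely reason is that a .docx file is a ZIP archive: its XML content and metadata are compressed, so a raw grep cannot see the keyword.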

$ cd /media/sec401/CDROM/
$ grep -Pail '(secret|confidential|sensitive)' *
-P  Perl-compatible regex (supports alternation with |)
-a  treat binary files as text (lets grep search binary formats such as legacy .doc; it does not decompress ZIP-based .docx files)
-i  case-insensitive matching
-l  print only filenames, not matching content
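
The same check can be sketched in Python. This is a minimal sketch that assumes Python 3.9+ and uses only the standard library. content_matches and scan are hypothetical helper names, not part of any DLP product. The raw byte search mirrors grep -a -i -l. The ZIP branch goes beyond what grep does. It is there because .docx, .xlsx, and .pptx files compress their XML, so their contents would otherwise slip past a keyword scan.

```python
import io
import re
import zipfile
from pathlib import Path

# Same alternation as the grep one-liner; IGNORECASE on a bytes pattern mirrors -i for ASCII text.
KEYWORDS = re.compile(rb"secret|confidential|sensitive", re.IGNORECASE)


def content_matches(data: bytes) -> bool:
    """Return True if the raw bytes, or any member of a ZIP container, contain a keyword."""
    # Raw byte search: the equivalent of grep -a -i on a single file.
    if KEYWORDS.search(data):
        return True
    # Extension beyond grep (assumption: Office Open XML files are ZIP archives):
    # search each decompressed member, e.g. word/document.xml or docProps/core.xml.
    if zipfile.is_zipfile(io.BytesIO(data)):
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            return any(KEYWORDS.search(zf.read(name)) for name in zf.namelist())
    return False


def scan(directory: str) -> list[str]:
    """Non-recursive, like the * glob: return names of matching files only (grep -l)."""
    return sorted(
        p.name
        for p in Path(directory).iterdir()
        if p.is_file() and content_matches(p.read_bytes())
    )
```

Calling scan("/media/sec401/CDROM") should list matching file names, much like the grep one-liner. Like the * glob, it does not recurse into subdirectories. It also reads each file fully into memory. A production scanner would need recursion, size limits, and protection against oversized archives.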

2. Extract document metadata with exiftool

Ran exiftool against Bankruptcy.docx to extract all embedded metadata. Key findings: the document was created by Madison Jeffries, last modified by Jerry Jackson, and tagged with the keyword SECRET. Additional metadata shows it was created in Microsoft Office Word (App Version 16.0000) and contains 358 words across 2 pages. The reported total edit time of 2,982,555 days (more than 8,000 years) is clearly implausible. It is a reminder that metadata fields can be corrupted or deliberately altered and should be corroborated before anyone relies on them. The Keywords field is particularly significant for DLP: government and corporate environments often store classification markings such as SECRET, CONFIDENTIAL, or TOP SECRET there.

$ exiftool Bankruptcy.docx
exiftool  read/write metadata in files (EXIF, IPTC, XMP, Office XML)
Outputs all metadata fields, including Creator, Keywords, and Last Modified By
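
exiftool parses many formats. For a .docx, the Creator, Keywords, and Last Modified By values come from docProps/core.xml inside the ZIP container (the Office Open XML core properties part). Below is a standard-library Python sketch that reads just those three fields. docx_core_properties is a hypothetical name, and the dictionary keys are illustrative labels, not exiftool's exact output format.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by docProps/core.xml in Office Open XML packages.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def docx_core_properties(source):
    """Return author, keyword(s), and last editor from a .docx.

    `source` may be a path or a binary file-like object (anything zipfile accepts).
    """
    with zipfile.ZipFile(source) as zf:
        root = ET.fromstring(zf.read("docProps/core.xml"))

    def field(tag):
        element = root.find(tag, NS)
        return element.text if element is not None else None

    return {
        "Creator": field("dc:creator"),
        "Keywords": field("cp:keywords"),
        "LastModifiedBy": field("cp:lastModifiedBy"),
    }
```

Run against Bankruptcy.docx, this should surface the same creator, keyword, and last-modifier values that exiftool reported, assuming the file carries standard core properties. Missing elements come back as None.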

3. Geolocate photo from GPS coordinates

Opened an image file and examined its EXIF properties, which revealed embedded GPS coordinates (GPSLatitude and GPSLongitude). Smartphones and GPS-enabled cameras commonly embed location data in each image file when location tagging is enabled. Extracting these coordinates pins the place where the photo was taken on a map, limited only by the device's positioning accuracy. This is a major privacy and security concern: employees sharing photos from sensitive facilities, whistleblowers inadvertently revealing their location, or insiders documenting assets before exfiltration. DLP policies should strip EXIF data from outbound images or flag files containing GPS metadata.
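
EXIF stores each GPS coordinate as three rational numbers (degrees, minutes, seconds). The hemisphere is recorded in a separate reference tag (GPSLatitudeRef / GPSLongitudeRef). Mapping tools expect signed decimal degrees, so the conversion is small arithmetic: decimal = degrees + minutes/60 + seconds/3600, negated for S or W. Below is a Python sketch. The function name is hypothetical, and the sample values are made up, not the coordinates from the lab image.

```python
from fractions import Fraction


def dms_to_decimal(degrees, minutes, seconds, ref: str) -> float:
    """Convert EXIF-style degrees/minutes/seconds plus a hemisphere ref
    ("N", "S", "E", "W") into signed decimal degrees."""
    # Fraction keeps EXIF rationals exact until the final float conversion.
    value = Fraction(degrees) + Fraction(minutes) / 60 + Fraction(seconds) / 3600
    # EXIF ASCII refs may carry a trailing NUL; south and west are negative.
    if ref.strip("\x00 ").upper() in ("S", "W"):
        value = -value
    return float(value)
```

For example, a hypothetical 38° 53′ 23″ N converts to roughly 38.88972, and 77° 2′ 11″ W to roughly -77.03639. The resulting pair can be pasted into any mapping tool.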

Key findings

grep -Pail flagged 'Merger Offer Letter to Beta Industries.doc' as containing sensitive keywords
exiftool revealed Bankruptcy.docx: Creator (Madison Jeffries), Keywords (SECRET), Last Modified By (Jerry Jackson)
Image EXIF data contained embedded GPS coordinates revealing the photo's geographic origin
All three data leakage vectors (content, metadata, geolocation) found on a single removable device

Outcome / Lessons learned

This lab demonstrated three core DLP investigation techniques: keyword scanning to identify sensitive content in files, metadata extraction to reveal document authorship and classification markings, and GPS coordinate extraction to geolocate the origin of photographs. The combination of grep for content inspection and exiftool for metadata analysis represents the manual equivalent of what enterprise DLP solutions automate at scale.

If this were a real investigation, I'd take these steps:
  • Correlate the document author (Madison Jeffries) and modifier (Jerry Jackson) with HR and access control records.
  • Escalate the SECRET-classified document found on removable media as a potential data spillage incident.
  • Implement DLP policies that scan data before it can be written to removable media.
  • Configure endpoint agents to strip EXIF/GPS data from outbound images.
  • Review access logs to determine how classified documents reached the CDROM.

Security controls relevant

  • DLP policies scanning removable media for classification keywords
  • Endpoint controls restricting USB/CDROM write access
  • EXIF/GPS metadata stripping on outbound files
  • Document classification enforcement (mandatory marking)
  • Insider threat monitoring and behavioral analytics
  • Data-at-rest scanning for misplaced classified documents

What I took away from this

The grep scan is the simplest possible DLP check, and it's shocking how effective it is. A one-liner with three keywords caught a merger offer letter sitting on a CDROM. In a real organization, an insider copying M&A documents to removable media is a textbook exfiltration scenario, and leaked merger information is a recurring theme in SEC insider-trading enforcement actions. Enterprise DLP tools like Microsoft Purview, Symantec DLP, and Digital Guardian do the same thing at scale, scanning content against hundreds of classification patterns. They add file-format parsing, document fingerprinting, and exact data matching on top, but the core technique is the same: pattern matching against file content. If you understand what grep -Pail does, you understand the heart of the detection engine behind a $500K DLP deployment.

The exiftool output is where this gets interesting from an investigation standpoint. The Keywords field reads SECRET, which means someone explicitly classified this document and it still ended up on removable media. That's not an accidental leak; it's a control failure. The Creator and Last Modified By fields give you two names to investigate: Madison Jeffries authored it, Jerry Jackson edited it last. In a real incident response, you'd cross-reference these names against access control lists, check who had physical access to the CD burner, and pull endpoint logs to trace the chain of custody. Metadata is the forensic breadcrumb trail that most insiders forget to clean.

The GPS coordinates in the image EXIF data are the most underestimated risk on this disc. Smartphone camera apps typically embed latitude and longitude in every photo whenever they have location access, and many users grant that access without a second thought. Military organizations have issued guidance on stripping location data before sharing imagery, and there are documented cases where geotagged photos revealed the positions of military units and equipment. In a corporate context, a photo taken inside a data center, a lab, or a competitor's facility carries the same risk. The fix is straightforward: DLP policies should flag or strip GPS metadata from outbound files, and security awareness training should cover the risk of location-tagged photos. Many employees have no idea their phone is embedding a precise map coordinate in every picture they take.
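
Both halves of that fix can be sketched for JPEG files. In a JPEG, EXIF data (including the GPS IFD) lives in an APP1 segment whose payload starts with Exif\0\0. The Python sketch below walks the header segments. It drops EXIF APP1 segments to strip, or reports whether one exists to flag, and it copies the compressed image data unchanged. It rests on simplifying assumptions: a well-formed JPEG with no padding bytes between markers, and EXIF only, so XMP and other metadata are left untouched. A real pipeline would more likely rely on a maintained tool such as exiftool. The function names are hypothetical.

```python
import struct

EXIF_HEADER = b"Exif\x00\x00"  # APP1 payload prefix for EXIF (the GPS IFD sits inside it)


def _segments(jpeg: bytes):
    """Yield (marker, raw_bytes) for each header segment, then the remaining scan data."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    yield 0xD8, jpeg[:2]
    i = 2
    while i < len(jpeg):
        if jpeg[i] != 0xFF:
            raise ValueError(f"expected a marker at offset {i}")
        marker = jpeg[i + 1]
        if marker in (0xDA, 0xD9):
            # Start of scan or end of image: pass everything that remains through untouched.
            yield marker, jpeg[i:]
            return
        # Segment length is big-endian and includes its own two bytes.
        (length,) = struct.unpack(">H", jpeg[i + 2:i + 4])
        yield marker, jpeg[i:i + 2 + length]
        i += 2 + length


def _is_exif(marker: int, segment: bytes) -> bool:
    return marker == 0xE1 and segment[4:].startswith(EXIF_HEADER)


def has_exif(jpeg: bytes) -> bool:
    """Flag: True if any APP1 segment carries EXIF data (GPS or not)."""
    return any(_is_exif(m, s) for m, s in _segments(jpeg))


def strip_exif(jpeg: bytes) -> bytes:
    """Strip: return a copy of the JPEG with every EXIF APP1 segment removed."""
    return b"".join(s for m, s in _segments(jpeg) if not _is_exif(m, s))
```

has_exif flags any EXIF block, not GPS specifically. Checking for the GPS IFD pointer would require parsing the TIFF structure inside the segment.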
