
How to Find and Secure Sensitive Data Across Your Google Workspace Domain

Data Discovery in Google Workspace



You cannot secure what you cannot see

Ask any experienced Google Workspace admin what keeps them up at night, and the answer is rarely a sophisticated external attack. More often, it is the slow creep of ungoverned data: files shared too broadly, sensitive documents sitting in folders nobody owns anymore, and permissions granted months ago that nobody thought to revoke.

This is the data discovery problem, and it is one of the most underestimated security challenges facing enterprise Google Workspace deployments today.

Data discovery is the process of systematically identifying, locating, and cataloguing all the data within your organisation’s Google Workspace environment. It sounds straightforward. In practice, for any organisation with more than a few hundred users, it is anything but.


Why data discovery is harder than it looks in Google Workspace

Google Workspace generates data at extraordinary volume. A typical mid-size enterprise will accumulate millions of Drive files, years of email history, and countless Shared Drive contributions over time. The challenge is not that the data is hidden. It is that the default tools for managing it were not built for the scale or depth that security teams need.

Here is what makes Google Drive audit and management genuinely difficult at enterprise scale:

1. Organic data growth: Files are created, copied, moved, and shared continuously. Every collaboration spawns new versions, every project creates new folders, and every file share potentially creates a new exposure.

2. External sharing complexity: Google’s sharing model is flexible by design, but that flexibility means it is easy for “view access” to become “edit access,” for internal shares to become external ones, and for “Restricted” links to quietly become “Anyone with the link.” Our knowledge base covers how to find publicly shared Google Drive files across an entire domain, which is often the first thing admins want to do once they have the right tooling.

3. Ownership gaps: When a user leaves your organisation, their Drive files do not leave with them. Unless ownership is transferred proactively, data accumulates in orphaned accounts, technically accessible but effectively ungoverned. Transferring ownership of Google Drive files in bulk is one of the most common tasks that comes up immediately after an offboarding audit.

4. Third-party app access: OAuth-connected apps can hold broad access to Drive data without any admin actively managing or reviewing those permissions. Most organisations have far more connected apps than they realise, many of them long since forgotten, still sitting on active permissions.

5. The API ceiling: Google’s native admin reports are genuinely useful, but they have limits. The native console does not give you the filtering depth, the cross-domain visibility, or the exportable audit trails that security and compliance teams require.

What a proper data discovery process looks like

Effective data discovery in Google Workspace follows a clear structure.

Phase 1: Inventory

Start by getting a complete picture of what exists. This means every file in every Drive and Shared Drive across the domain, ownership and last modified date for each file, sharing status, all external collaborators with active access, and all third-party OAuth applications and the permissions they hold.

This is your baseline. Without it, everything else is guesswork. A useful starting point is exporting a tree view of all Shared Drives to a Google Sheet, which gives you a navigable, auditable record of your entire domain structure.
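As a rough illustration of what one row of that baseline might hold, here is a minimal Python sketch. It assumes file metadata has already been exported as dictionaries shaped like Google Drive API v3 file resources (id, name, owners, modifiedTime, permissions). The domain example.com, the InventoryRow class, and the helper names are hypothetical, and OAuth app inventory is left out.

```python
from dataclasses import dataclass

# Hypothetical company domain, used to decide what counts as "external".
INTERNAL_DOMAIN = "example.com"


@dataclass
class InventoryRow:
    file_id: str
    name: str
    owner: str
    modified_time: str
    sharing_status: str
    external_collaborators: list


def external_emails(permissions, internal_domain):
    """Emails of user/group permissions that sit outside the domain."""
    suffix = "@" + internal_domain.lower()
    emails = (
        p.get("emailAddress", "").lower()
        for p in permissions
        if p.get("type") in ("user", "group")
    )
    return sorted(e for e in emails if e and not e.endswith(suffix))


def sharing_status(permissions, internal_domain):
    """Return the broadest exposure found among a file's permissions."""
    types = {p.get("type") for p in permissions}
    if "anyone" in types:
        return "anyone"
    if external_emails(permissions, internal_domain):
        return "external"
    if "domain" in types:
        return "domain"
    return "restricted"


def build_inventory(files, internal_domain=INTERNAL_DOMAIN):
    rows = []
    for f in files:
        perms = f.get("permissions", [])
        owners = f.get("owners", [])  # empty for items in Shared Drives
        rows.append(InventoryRow(
            file_id=f["id"],
            name=f["name"],
            owner=owners[0]["emailAddress"] if owners else "(shared drive)",
            modified_time=f.get("modifiedTime", ""),
            sharing_status=sharing_status(perms, internal_domain),
            external_collaborators=external_emails(perms, internal_domain),
        ))
    return rows


# Made-up sample data for illustration only.
sample_files = [
    {"id": "f1", "name": "Payroll 2023.xlsx",
     "modifiedTime": "2023-02-01T09:00:00Z",
     "owners": [{"emailAddress": "alice@example.com"}],
     "permissions": [
         {"type": "user", "role": "owner", "emailAddress": "alice@example.com"},
         {"type": "anyone", "role": "reader"}]},
    {"id": "f2", "name": "Vendor contract.docx",
     "modifiedTime": "2024-05-10T14:30:00Z",
     "owners": [{"emailAddress": "bob@example.com"}],
     "permissions": [
         {"type": "user", "role": "owner", "emailAddress": "bob@example.com"},
         {"type": "user", "role": "writer", "emailAddress": "pat@vendor.test"}]},
]

inventory = build_inventory(sample_files)
for row in inventory:
    print(row.file_id, row.owner, row.sharing_status, row.external_collaborators)
```

Collapsing each file to a single sharing status keeps the baseline easy to filter and export, at the cost of hiding the individual permission entries, which a fuller inventory would keep alongside.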

Phase 2: Risk Scoring

Not all data carries the same risk, and not all exposure is equal. Once the inventory is built, assess each item against key risk signals: stale publicly shared files, orphaned ownership, sensitive data patterns, and Shared Drives that have external members but no governance.

It is also worth checking for files shared in from external users, which are often overlooked. These are files where the owner sits outside your domain and has added one of your users as an editor or viewer. They represent an exposure surface that native tools rarely surface clearly.
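One way to turn those signals into a sortable number is a simple weighted score, sketched below in Python. Everything specific here is an assumption made for illustration: the weights, the 180-day staleness threshold, the example.com domain, and the file-name regex, which is only a stand-in for real content-aware scanning.

```python
import re
from datetime import datetime, timezone

# Illustrative weights and thresholds; assumptions, not a standard.
WEIGHTS = {
    "public_link": 40,
    "stale": 15,
    "orphaned": 25,
    "sensitive_name": 30,
    "external_owner": 20,
}
STALE_AFTER_DAYS = 180
# Crude file-name check standing in for content-aware scanning.
SENSITIVE_NAME = re.compile(r"payroll|passport|salary|invoice", re.IGNORECASE)


def parse_time(value):
    """Parse an RFC 3339 timestamp such as 2024-05-10T14:30:00Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def risk_signals(item, now, active_users, internal_domain):
    signals = set()
    if item["sharing_status"] == "anyone":
        signals.add("public_link")
    if (now - parse_time(item["modified_time"])).days > STALE_AFTER_DAYS:
        signals.add("stale")
    owner = item["owner"]
    if owner.endswith("@" + internal_domain):
        if owner not in active_users:
            signals.add("orphaned")  # owner has left the organisation
    else:
        signals.add("external_owner")  # file shared in from outside
    if SENSITIVE_NAME.search(item["name"]):
        signals.add("sensitive_name")
    return signals


def risk_score(signals):
    return sum(WEIGHTS[s] for s in signals)


# Made-up sample data for illustration only.
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
active_users = {"alice@example.com", "bob@example.com"}
items = [
    {"name": "Payroll 2023.xlsx", "owner": "carol@example.com",
     "modified_time": "2023-02-01T09:00:00Z", "sharing_status": "anyone"},
    {"name": "Team lunch rota", "owner": "alice@example.com",
     "modified_time": "2024-05-20T10:00:00Z", "sharing_status": "restricted"},
    {"name": "Partner roadmap", "owner": "pat@vendor.test",
     "modified_time": "2024-05-01T08:00:00Z", "sharing_status": "external"},
]

scores = {
    i["name"]: risk_score(risk_signals(i, now, active_users, "example.com"))
    for i in items
}
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(score, name)
```

Keeping the signals as a set, separate from the score, means the reason for each score stays visible when someone reviews the ranking.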

Phase 3: Prioritised Remediation

Armed with a risk-scored inventory, you can act with confidence. Prioritise the highest-risk items first: remove public and “Anyone with the link” permissions from sensitive files, transfer ownership of orphaned data, and remove inactive external collaborators. Document every change for your audit trail.

The critical principle at this stage is bulk action. Manual, file-by-file remediation is not a realistic strategy at enterprise scale. If your tools cannot update thousands of files in a single operation, they are not fit for purpose.
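The shape of a bulk remediation run can be sketched as a plan that is ordered by risk and split into batches before anything is applied. The Python below only builds that plan. The item fields, action names, and batch size are hypothetical, and no Google or GAT API is called.

```python
from itertools import islice


def plan_actions(item):
    """Map one risk-scored inventory item to the changes it needs."""
    actions = []
    if item["sharing_status"] == "anyone" and item["sensitive"]:
        actions.append("remove_anyone_permission")
    if item["orphaned"]:
        actions.append("transfer_ownership")
    for email in item["inactive_external"]:
        actions.append(f"remove_collaborator:{email}")
    return actions


def batched(seq, size):
    """Yield consecutive chunks of at most `size` elements."""
    it = iter(seq)
    while chunk := list(islice(it, size)):
        yield chunk


def build_plan(items, batch_size):
    # Highest-risk items first, then one change record per action.
    ordered = sorted(items, key=lambda i: i["risk_score"], reverse=True)
    changes = [
        {"file_id": i["file_id"], "action": a}
        for i in ordered
        for a in plan_actions(i)
    ]
    return list(batched(changes, batch_size))


# Made-up sample data for illustration only.
items = [
    {"file_id": "f2", "risk_score": 20, "sharing_status": "external",
     "sensitive": False, "orphaned": False,
     "inactive_external": ["pat@vendor.test"]},
    {"file_id": "f1", "risk_score": 110, "sharing_status": "anyone",
     "sensitive": True, "orphaned": True, "inactive_external": []},
    {"file_id": "f3", "risk_score": 0, "sharing_status": "restricted",
     "sensitive": False, "orphaned": False, "inactive_external": []},
]

plan = build_plan(items, batch_size=2)
for number, batch in enumerate(plan, start=1):
    print("batch", number, batch)
```

Because every planned change is a plain record, the same list can double as the audit trail once each entry is marked as applied.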

Phase 4: Ongoing Monitoring

Data discovery is not a quarterly exercise. It is a continuous practice. Set up real-time alerts for new instances of high-risk sharing, schedule recurring audit reports, and maintain a view of your data posture that is always current. An exposure that goes undetected for three months is, in practice, no different from one you never found.
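At its simplest, continuous monitoring means comparing the current sharing state with the previous one and alerting on anything that became more exposed. The Python sketch below assumes two snapshots that map file IDs to a sharing status. The status names and their ranking are hypothetical, and a real deployment would more likely react to audit events than to full snapshots.

```python
# Hypothetical ordering of sharing states, from least to most exposed.
RANK = {"not_seen": 0, "restricted": 0, "domain": 1, "external": 2, "anyone": 3}
HIGH_RISK = {"external", "anyone"}


def new_exposures(previous, current):
    """Return (file_id, before, after) for files that became more exposed
    and are now in a high-risk sharing state."""
    alerts = []
    for file_id, after in sorted(current.items()):
        before = previous.get(file_id, "not_seen")
        if after in HIGH_RISK and RANK[after] > RANK[before]:
            alerts.append((file_id, before, after))
    return alerts


# Made-up snapshots for illustration only.
yesterday = {"f1": "restricted", "f2": "external", "f3": "anyone"}
today = {"f1": "anyone", "f2": "external", "f3": "restricted", "f4": "external"}

alerts = new_exposures(yesterday, today)
for file_id, before, after in alerts:
    print(f"ALERT {file_id}: {before} -> {after}")
```

Files whose exposure is unchanged or reduced produce no alert, so the output stays limited to changes that need a decision.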

Five questions every admin should be able to answer right now

If you have not done a thorough data discovery exercise recently, here are the five questions you should be able to answer, and probably cannot without the right tooling:

  1. How many files in your domain are currently shared with “Anyone with the link”? Not an estimate. The actual number, filterable by owner, department, or date.
  2. Which former employees still own files with active external shares? Offboarding processes often miss this, and the exposure can persist for months or years after someone has left.
  3. What data do your OAuth-connected third-party apps have access to? Many organisations have dozens of connected apps, many of them long-forgotten, with broad Drive permissions still active.
  4. Which files contain PII or sensitive financial data and are accessible beyond their intended audience? This is the core compliance question, and it requires content-aware scanning to answer properly.
  5. How long would it take you to respond to a DSAR (data subject access request)? If the answer is a few weeks, that is a risk signal in itself.

If any of these take more than a few minutes to answer, you have a data discovery gap.


How GAT addresses data discovery for Google Workspace

GAT+ was built to answer exactly these questions, quickly, accurately, and at any scale.

It gives admins a complete, filterable inventory of all Drive data across the domain, surfacing details that Google’s native tools do not expose. From that inventory, security and compliance teams can filter by any combination of parameters, including owner, sharing type, content type, external collaborator, and date range. They can flag high-risk files automatically based on configurable policy rules, execute bulk remediations across thousands of files in a single action, and generate audit reports on demand or on a schedule in formats ready for GDPR compliance review, DSAR response, or internal governance.

One of GAT Labs’ enterprise clients, a leading Ivy League institution, used GAT+ to audit and manage over 450 million Google Drive files. That is the scale at which effective data discovery tooling has to operate.

For smaller deployments, the value is the same: complete visibility, actionable insight, and the ability to fix problems before they become incidents.

Data discovery does not exist in isolation either. It feeds directly into your data security posture management (DSPM) practice, your GDPR and compliance obligations, your incident response capability, and your broader Google Drive security posture. When a breach does occur, knowing your data landscape means you can scope the incident accurately and contain it faster. And for zero-trust security frameworks to function correctly, data discovery provides the inventory layer they depend on.

GAT extends that coverage further through GAT Shield, which monitors Chrome activity in real time, including downloads, visited sites, and session behaviour. GAT Unlock gives admins approval-gated access to sensitive Gmail and Drive content, requiring a second approver and maintaining a full audit trail for every action.

Start with a single audit

If data discovery feels like a large project, start small. Run a single focused audit of all files in your domain that are currently shared externally. Then filter the results to the last 90 days and sort by sensitivity signals.
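As a sketch of that first audit, the Python below filters a small inventory to externally shared files, keeps those modified in the last 90 days, and sorts them by how many sensitivity signals each carries. The row fields and signal names are hypothetical, and reading "the last 90 days" as the last modified date is an assumption.

```python
from datetime import datetime, timedelta, timezone


def parse_time(value):
    """Parse an RFC 3339 timestamp such as 2024-05-10T14:30:00Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def external_share_audit(rows, now, window_days=90):
    cutoff = now - timedelta(days=window_days)
    recent = [
        r for r in rows
        if r["sharing_status"] in ("external", "anyone")
        and parse_time(r["modified_time"]) >= cutoff
    ]
    # Most signals first; ties are broken alphabetically by file name.
    return sorted(recent, key=lambda r: (-len(r["signals"]), r["name"]))


# Made-up sample data for illustration only.
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
rows = [
    {"name": "Vendor contract.docx", "sharing_status": "external",
     "modified_time": "2024-05-10T14:30:00Z",
     "signals": ["external_collaborator"]},
    {"name": "Payroll 2024.xlsx", "sharing_status": "anyone",
     "modified_time": "2024-04-15T09:00:00Z",
     "signals": ["public_link", "sensitive_name"]},
    {"name": "Old brochure.pdf", "sharing_status": "anyone",
     "modified_time": "2022-01-05T12:00:00Z",
     "signals": ["public_link", "stale"]},
    {"name": "Team lunch rota", "sharing_status": "restricted",
     "modified_time": "2024-05-20T10:00:00Z", "signals": []},
]

report = external_share_audit(rows, now)
for r in report:
    print(len(r["signals"]), r["name"], r["sharing_status"])
```

Note that the 90-day window drops older public files such as the brochure in the sample data, so a later pass without the date filter is still needed to catch stale exposure.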

That one report will tell you more about your current data risk posture than most organisations know about theirs.

Then build from there.

An admin who has never run a proper data discovery audit is worse off than one who has already found issues. The risks exist either way. The difference is whether you find them before someone else does.
