GDPR challenge: Finding personal data across your systems


Under the EU General Data Protection Regulation (GDPR), users own their personal data. At any time, they can ask to see their data, have it sent to them, or have it deleted from your systems. You, as a data collector, can process their data only for as long as the data owner lets you. And the GDPR doesn’t limit itself to any particular data or content type.

This opens up a whole new set of use cases for handling archived or unmanaged content. It’s no longer enough to merely archive data and consider your compliance box checked. Instead, you need to be able to quickly find and address any personal data in potentially huge volumes of unmanaged content, including file shares and backup storage. To do this, there’s a good chance you need to update your organization’s file analysis practices. Here’s how to approach it.

Do a content inventory

This is the first step, in which you look for anywhere that customer data could be stored, either intentionally or accidentally. This could be in your email, chat, content management systems, and applications like ERP and CRM. But you also need to pay special attention to unmanaged environments like archives, backups, and legacy file shares, as these can often be harder to sift through.

For each piece of content you find, classify it with what you know right now. You might not yet know if the content has any customer data, so you’d classify it as unknown. As an example, Gartner suggests you use labels like this:

  • Site Owner: Paris IT
  • Primary User: Paris, Lyon office
  • Content Type: (blank)
  • File Format: Microsoft Office, scanned
  • Primary Storage Medium: (blank)
  • Capacity Usage: (blank)
  • Nonemployee Data?: (blank)
  • Retention Requirements: Not defined

Source: Gartner (October 2017)
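
If you keep your inventory in a script or spreadsheet export rather than on paper, a record along these lines is one way to capture the labels above. This is only a sketch: the class name, field names, and the needs_review helper are our own illustration, not part of the Gartner template. The one design choice worth noting is that "unknown" is stored as an explicit empty value, so unreviewed content is easy to list later.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InventoryRecord:
    """One row of a content inventory. Field names are hypothetical;
    they simply mirror the example labels."""
    site_owner: str
    primary_user: str
    content_type: Optional[str] = None
    file_format: Optional[str] = None
    primary_storage_medium: Optional[str] = None
    capacity_usage: Optional[str] = None
    # None means "unknown": nobody has checked this content yet.
    nonemployee_data: Optional[bool] = None
    retention_requirements: str = "Not defined"

    def needs_review(self) -> bool:
        """True while personal data cannot be ruled out for this content."""
        return self.nonemployee_data is None


records = [
    InventoryRecord(
        site_owner="Paris IT",
        primary_user="Paris, Lyon office",
        file_format="Microsoft Office, scanned",
    ),
]

# Everything still classified as unknown goes on the review list.
to_review = [r for r in records if r.needs_review()]
```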

Choose a file analysis solution

Next up, you need a solution to help you quickly find and label the content in your inventory. Depending on where the content lives, you might be able to take advantage of built-in discovery features on the Microsoft platform. Or you might need to supplement this with a third-party solution.

You’re looking for a solution that you can deploy quickly and that will help you:

  • Sift through many different content types and storage environments
  • Report on the data it finds
  • Set up content policies and make sure they keep working over time
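
To make the "sift and report" idea concrete, here is a toy sketch of what a file analysis pass does at its simplest: walk a file share, look for something that resembles personal data, and report where it turned up. It assumes plain-text files and checks only for email-like strings; the function names and the pattern are our own illustration, and a real solution would cover far more formats and data types.

```python
import re
from pathlib import Path
from typing import Dict

# Deliberately simple pattern; real personal data takes many more forms.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def count_emails(text: str) -> int:
    """Count email-like strings in a piece of text."""
    return len(EMAIL_RE.findall(text))


def scan_tree(root: str) -> Dict[str, int]:
    """Map each readable file under root to its number of email-like matches.

    Files with no matches are left out of the report.
    """
    hits: Dict[str, int] = {}
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip rather than stop the scan
        matches = count_emails(text)
        if matches:
            hits[str(path)] = matches
    return hits
```

Calling scan_tree on a folder returns a small report you could feed back into your inventory, for example to flip a record from unknown to "contains personal data".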

Update your e-discovery practices

It’s likely that your organization already has a process in place to search for and find content. Perhaps you have people who are trained to help with litigation, audit, and other investigations. You can use their experience and hone it to help you support the GDPR, too. Steps to take:

  • Talk to your e-discovery gurus about the process they use
  • See if their current tools could be repurposed for the GDPR

Build better long-term backup practices

Complying with the GDPR isn’t a one-and-done process. So the last step is to come up with a new plan for how you’ll maintain your backups going forward. You need an efficient way to find and label personal data in archived content. Some ways to do this:

  • Make sure you can search and retrieve your backup data when you need it
  • Consider managing your backups differently, such as within an ECM, rather than standalone
  • Explore data masking and anonymization of personal data when you archive it
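
As a rough illustration of the masking idea in the last bullet, the sketch below swaps each email address for a keyed-hash token before the text is archived. The function name, token format, and key handling are assumptions made for the example. Note that this is pseudonymization rather than full anonymization: anyone who holds the key can recompute a token from a known address, so the key needs the same protection as the data itself.

```python
import hashlib
import hmac
import re

# Same deliberately simple email pattern; an assumption, not a full detector.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def mask_emails(text: str, key: bytes) -> str:
    """Replace each email address with a stable keyed-hash token.

    The same address (ignoring case) always yields the same token, so
    records can still be correlated after masking.
    """
    def _token(match):
        address = match.group(0).lower().encode("utf-8")
        digest = hmac.new(key, address, hashlib.sha256).hexdigest()
        return "<email:" + digest[:12] + ">"

    return EMAIL_RE.sub(_token, text)
```

Because the tokens are stable, you can still count how many archived documents relate to one person without keeping that person’s address in the archive.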

Need help with the GDPR?

We at Binary Tree are doing our part to help our clients protect the privacy of their own users. Specifically, we can move you to the Microsoft cloud, which is designed to help you:

  • Discover personal data across your systems
  • Manage how you store and use personal data
  • Protect personal data from vulnerabilities and breaches
  • Report on your processes, user requests, and security issues


For more about how we can help, get in touch.


Source: Gartner. How to Implement File Analysis for GDPR Challenges. October 2017.