Prepare for Your Data Discovery Journey and Gain Sensitive Data Intelligence!

Published On: January 8, 2023
Categories: Blog
By Raj Soni

I speak with many CISOs and cybersecurity leaders on a regular basis, and the topic of data discovery and classification comes up a lot. I find that many cybersecurity teams are not prepared to discover their data; they are busy fighting fires, dealing with phishing attacks and ransomware. Many teams have a data discovery and classification (DDC) project on their “To Do List” but have had trouble getting it started, or completed if it gets kicked off at all. It’s not a herculean effort, but there is a process, and it takes lots of communication and collaboration with business leaders to get it right. With data privacy regulations on the rise, knowing where all the Personally Identifiable Information (PII) is in your organization has huge benefits: you can harden your security policies and reduce false positives, and the firefighting and event triaging I mentioned above can be reduced drastically once you know where all your sensitive data is (also referred to as sensitive data intelligence).

In this blog we will discuss how to prepare for a data discovery and classification journey. Knowing where your data resides and deriving that “sensitive data intelligence” is the outcome of this first step; putting controls on your data once you know where it is will be covered in a future blog.

One of the first pillars in the Seven Pillars of Data Protection is to know where your data is! Knowing where your data is located is important, but equally important is knowing what kind of data it is. How sensitive is this data to the business? What data elements are we trying to protect? What are the corporate policies that align with the data privacy regulations? Assessing the business risk of data privacy should be based on the controls already in place and an understanding of the gap(s); this will help build out a 3–5-year data protection roadmap.

Data Discovery and Classification (DDC) is a broad topic; the focus of this blog is how to get started, how to prepare for this initiative, and what you need in place. The end goal of any DDC project is to learn what constitutes critical, sensitive, non-public data in your organization, and the only way to do this is by working in partnership with business leaders and risk liaisons. Initially we want to drive the data protection requirements by understanding what business leaders consider their “crown jewels.” Security professionals need to work with the various line-of-business (LOB) leaders to make sure both sides are aligned on protecting the critical assets. In most large organizations, there is no central “place” or team one can turn to in order to learn where all the organization’s critical data could reside. A team responsible for that needs to be identified or created. Who should lead this team? Our recommendation is that the data protection team under the CISO organization, which is responsible for applying controls on the data, should also be responsible for the data discovery and classification initiative. Once the data is discovered, the next step is to determine whether that data is critical to the business and, if so, what controls need to be applied to protect it.

As you lead this initiative, keep in mind that business leaders are the data stewards. They know where their critical data is, but they can be very protective of these repositories and reluctant to share them. As a security professional you will have to work in partnership with a business information security officer (BISO) or a risk liaison to socialize the benefits of protecting their data.

After a team has been identified to take on the data discovery and classification initiative, what’s next? We need to identify and prioritize the Crown Jewels to determine which to tackle first, and build a roadmap for each Crown Jewel application. Next, we need to identify the BISOs (or risk liaisons) responsible for data protection in each LOB so we can work in partnership with them.

Communication and collaboration are extremely important. To discover data, you will need to convince each LOB leader to give access to their repositories for scanning. Therefore, it’s crucial that you work with BISOs (or risk liaisons) to educate stakeholders and socialize the benefits of data protection, which is the main reason for conducting a data discovery project. In communicating with business leaders about data discovery and classification, we want to emphasize that the protection of critical assets should align with corporate policies and regulations. Determine which regulation(s) the company needs to comply with (PCI, SOX, HIPAA, CCPA, GDPR, etc.) and build the controls to align with those regulations.

There is one more thing to consider as you start this project: as you identify critical data repositories, it’s important to note the groups of users that create this data as well as the users that consume it. This user information will be critical when creating and hardening security policies.

Once you have a team in place that is responsible for this project, here is a summary of what you will need to prepare:

  • Identify all the crown jewel apps in scope and prioritize them based on risk.
  • Identify a BISO or risk liaison for each LOB and/or application.
  • Communicate the associated risk and explain the benefits of data protection and data privacy to all stakeholders.
  • Once the above is in place, develop a roadmap and timeline for the project.
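The checklist above can be tracked in something as simple as a small inventory. The Python sketch below is only an illustration of that idea, not a tool recommendation: the `CrownJewelApp` record, the 1-to-5 `risk_score` scale, and the sample applications and names are all hypothetical assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CrownJewelApp:
    name: str
    lob: str                      # line of business that owns the app
    risk_score: int               # hypothetical scale: 1 (low) to 5 (high)
    biso: Optional[str] = None    # BISO or risk liaison, if one is assigned


def prioritize(apps):
    """Return the apps ordered from highest to lowest risk."""
    return sorted(apps, key=lambda app: app.risk_score, reverse=True)


def missing_liaison(apps):
    """Return the names of apps that still need a BISO or risk liaison."""
    return [app.name for app in apps if app.biso is None]


# Made-up sample inventory, for illustration only.
inventory = [
    CrownJewelApp("payroll", "HR", risk_score=4, biso="A. Example"),
    CrownJewelApp("customer-portal", "Sales", risk_score=5),
    CrownJewelApp("wiki", "IT", risk_score=2, biso="B. Example"),
]

roadmap_order = [app.name for app in prioritize(inventory)]
needs_liaison = missing_liaison(inventory)

print("Roadmap order:", roadmap_order)
print("Still needs a BISO or risk liaison:", needs_liaison)
```

Sorting by risk gives the order in which applications enter the roadmap, and the second list shows where a liaison still has to be identified before the LOB conversation can start.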

Part of developing a roadmap is to build the controls and determine what data elements you need to protect. Once you identify the sensitive data, what are you going to do with it? Security controls are applied based on the type of data: structured or unstructured. Structured data is data that resides in databases; unstructured data consists of documents, emails, and other data that does not reside in a database.
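As a rough illustration of that split, the Python sketch below groups a list of repositories into structured and unstructured buckets so each bucket can later be matched to its own controls. The repository names and the `kind` labels ("database", "file_share", "email") are hypothetical; the only rule taken from the text is that data in a database is structured and everything else is unstructured.

```python
# Assumption for this sketch: only repositories labeled "database" are structured.
STRUCTURED_KINDS = {"database"}


def data_category(repo_kind):
    """Map a repository kind to 'structured' or 'unstructured'."""
    return "structured" if repo_kind.lower() in STRUCTURED_KINDS else "unstructured"


def group_by_category(repos):
    """Group (name, kind) pairs by data category."""
    groups = {"structured": [], "unstructured": []}
    for name, kind in repos:
        groups[data_category(kind)].append(name)
    return groups


# Made-up repositories, for illustration only.
repositories = [
    ("orders-db", "database"),
    ("hr-share", "file_share"),
    ("exec-mail", "email"),
]

groups = group_by_category(repositories)
print(groups)
```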

One thing we have not mentioned so far is tools or technology, because we want to focus on the process before we talk about technology. In the past, DDC initiatives have been highly manual or fraught with false positives, and with false negatives, which are dangerous too. Luckily, today we have better tools that use AI/ML, are more automated, and require less human intervention, leading to the sensitive data intelligence we are striving for. In our next blog we will do a deep dive into what controls should be applied based on the type of data. Stay tuned, and let us know if you have any questions about what we discussed above.

Other resources to consider as you start your data discovery journey:

Data Discovery and Classification Are Complicated, But Critical to Your Data Protection Program

You Don’t Know What You Don’t Know: 5 Best Practices for Data Discovery and Classification