Unstructured data (data in documents, spreadsheets, images, and other files) accounts for the vast majority of the sensitive information created and stored by a typical organization. These files represent some of an organization’s most valuable assets, as well as some of its greatest security risks.
To keep pace with the exponential growth of unstructured data and the proliferation of threats to it, organizations need to adopt strategies and technologies that enable enterprise-wide visibility and control over sensitive data in files.
Understanding the challenge
Studies by technology companies and industry analysts have estimated that unstructured data makes up roughly 80% of a typical company’s data volume, with year-over-year growth rates of 50% or more. In a large organization, unstructured data is continually being created by end users, extracted from structured databases, collected from external sources, and generated by automated processes.
Compared to information in a database, unstructured data is harder to control, harder to audit, and harder to protect. For those reasons, it’s also easier for organizations to overlook when they assess their cybersecurity risks. That can be a grave mistake, because unstructured data often contains the same information as an organization’s tightly managed structured data, yet it can travel anywhere and be duplicated any number of times without the organization’s knowledge.
Data discovery: the essential tool
The problem of unstructured data can only be addressed using technology that (a) automatically detects sensitive data wherever it exists within the organization, and (b) takes policy-based protective action as soon as sensitive information is detected. Without these capabilities, an organization will leave gaps in its security, and gaps are where breaches happen.
Data discovery (not to be confused with e-discovery, a related concept that’s confined to the world of legal proceedings) is the process of scanning files and detecting sensitive information within them. Enterprise data discovery tools have advanced rapidly in recent years, allowing organizations to scan files on laptops, desktops, file servers, and other IT assets, and identify files that contain sensitive data.
As a first step in using discovery technology effectively, an organization must define sensitive data according to its own business goals and security needs. Discovery tools can be configured to search for payment card information, personally identifiable information, specific words, or certain forms of data (such as source code). Software is then deployed to monitor file activity on servers and user devices and detect files that match the organization’s unique criteria.
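To make the idea of configurable detection criteria concrete, here is a minimal Python sketch of a content scanner. It is an illustration only, not how any particular discovery product works: the rule names, the keyword list, and the scan_text function are all hypothetical, and the patterns cover only US-style Social Security numbers and card-like digit runs validated with the Luhn checksum.

```python
import re

# Hypothetical detection rules; a real deployment would tune these to the
# organization's own definition of sensitive data.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){13,19}\b")  # 13-19 digits, optional separators
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")      # US Social Security number layout
KEYWORDS = ("confidential", "internal only")            # example policy keywords


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def scan_text(text: str) -> set[str]:
    """Return the set of rule names that match the given text."""
    findings = set()
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        # The checksum filters out most random digit runs (order numbers, IDs).
        if 13 <= len(digits) <= 19 and luhn_valid(digits):
            findings.add("payment_card")
    if SSN_PATTERN.search(text):
        findings.add("us_ssn")
    lowered = text.lower()
    if any(word in lowered for word in KEYWORDS):
        findings.add("keyword")
    return findings
```

Pairing a loose pattern with a checksum is one common way to cut down false positives. Production tools typically layer on more context (nearby labels, file type, data volume) than this sketch attempts.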
The next steps
Identifying sensitive information is only the first step in securing unstructured data. Once sensitive files have been detected through automated discovery, they need to be classified and remediated according to organizational policy. Classification helps maintain user awareness of a file’s appropriate use, while remediation (encryption, masking, redaction, etc.) ensures that sensitive information is protected against misuse when a file is sent via email, copied to the cloud, or stored on a file server.
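As a rough illustration of the difference between masking and redaction, the Python sketch below applies both to US-style Social Security numbers in a string. The function names and the choice of pattern are assumptions made for this example. Encryption is omitted because it would depend on a specific cryptographic library and key management scheme.

```python
import re

# Assumed pattern for illustration: US Social Security numbers (NNN-NN-NNNN).
# The last four digits are captured so masking can preserve them.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-(\d{4})\b")


def mask_ssns(text: str) -> str:
    """Masking: hide most of the value but keep the last four digits."""
    return SSN_PATTERN.sub(r"***-**-\1", text)


def redact_ssns(text: str) -> str:
    """Redaction: remove the value entirely and leave a placeholder."""
    return SSN_PATTERN.sub("[REDACTED]", text)
```

Masking keeps enough of the value for a person to recognize the record, while redaction removes it altogether. Which one applies would be a matter of organizational policy.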
Ideally, classification and remediation should be automated as well, and integrated in the same workflow with discovery. This approach allows users to work without interruption, while eliminating the possibility that files will be left unprotected or secured via out-of-policy tools.
Discovery in action
If sensitive data is detected at the point of creation and immediately protected, entire categories of threats cease to be an issue.
Imagine, for example, that an employee extracts a few thousand records from a secured database and saves them in a spreadsheet. If the organization has the right technology in place, the spreadsheet will be scanned by a discovery agent as soon as the employee saves the file. The scan will detect the presence of sensitive data in the file, which will trigger the next step in the data protection workflow. That can mean classification, encryption, quarantine, or a combination of actions. The end result is that the file never has a chance to exist outside of the organization’s security policies, and the employee cannot, either through error or intentional action, expose the data to theft or misuse.
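The workflow in this scenario can be sketched as a simple policy lookup: each kind of finding maps to one or more protective actions, and the actions run as soon as a saved file is scanned. Everything in the Python sketch below is hypothetical, including the POLICY table, the finding names, and the on_file_saved hook. A real agent would perform the actions rather than return descriptions of them.

```python
from enum import Enum


class Action(Enum):
    CLASSIFY = "classify"
    ENCRYPT = "encrypt"
    QUARANTINE = "quarantine"


# Hypothetical policy table: which actions each kind of finding triggers.
POLICY = {
    "payment_card": [Action.CLASSIFY, Action.ENCRYPT],
    "us_ssn": [Action.CLASSIFY, Action.QUARANTINE],
}


def actions_for(findings: set[str]) -> list[Action]:
    """Combine the actions for all findings, without duplicates."""
    ordered: list[Action] = []
    for finding in sorted(findings):  # sorted for a deterministic order
        for action in POLICY.get(finding, []):
            if action not in ordered:
                ordered.append(action)
    return ordered


def on_file_saved(path: str, findings: set[str]) -> list[str]:
    """Stand-in for the save hook: report the actions that would be applied."""
    return [f"{action.value}:{path}" for action in actions_for(findings)]
```

For the spreadsheet in the example, a scan that reported both card numbers and Social Security numbers would yield classification, encryption, and quarantine, in that order. A file with no findings would pass through untouched.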
With the right strategy and the right tools, organizations can manage their unstructured data with the same degree of visibility and control that they maintain over structured data today.
PKWARE’s Smartcrypt is an enterprise data security platform that automatically discovers, classifies, and protects sensitive unstructured data on endpoint devices and servers. Many of the world’s largest financial institutions, government agencies, and other organizations use Smartcrypt to keep their sensitive information secure.
To learn more about data discovery and Smartcrypt’s capabilities, read PKWARE’s whitepaper, Making Sense of Sensitive Data.
Latest posts by Marty Meehan
- Data-Centric Security and Zero Trust Architecture: - February 3, 2019
- Unstructured Data: Vulnerable, Uncontrolled, and Getting Bigger Every Day - October 1, 2018
- The Entropy Problem: Random Data and Secure Cryptography - July 30, 2018