In another post, I mentioned that organizations without a data classification standard and an associated policy will have a difficult time implementing many information security controls, such as DLP and rights management. Data classification can also help with DR optimization and with justifying spend on technology, and it may be a cyber-insurance requirement. Since many security projects rely on proper classification of data, we have seen an uptick in requests from clients for help in this area.
What does a Data Classification Project look like?
A Data Classification Project assesses an organization’s digital assets to determine their criticality, sensitivity, and privacy requirements, and to establish a naming taxonomy for categorizing the data.
A typical project consists of a series of interviews, research, and tools-based discovery to establish the following regarding the data:
· Location
· Criticality to the organization (using the typical CIA Triad)
· Regulations relevant to the data
While a full-blown Data Classification Exercise could fill a book, I'll attempt to condense it in this post.
Location of Data and its Risk
Interviews with business unit managers and technical staff are coupled with a tools-based discovery effort to identify all of the repositories containing data. Discovery starts with physical data containers such as hard drives, SANs, NAS devices, and cloud providers, and ultimately drills down to the individual files, databases, and applications they hold.
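Commercial discovery tools handle most of the heavy lifting here, but as a rough illustration, the sketch below shows what a homegrown inventory pass over a mounted file share might look like in Python. The share path and output file name are hypothetical, and databases, SaaS, and cloud repositories would each need their own discovery methods.

```python
import os
import csv
from collections import defaultdict

# Hypothetical starting point: a mounted file share to inventory.
ROOT = "/mnt/finance_share"

# Aggregate file counts and sizes per (directory, file type).
inventory = defaultdict(lambda: {"count": 0, "bytes": 0})

for dirpath, dirnames, filenames in os.walk(ROOT):
    for name in filenames:
        path = os.path.join(dirpath, name)
        ext = os.path.splitext(name)[1].lower() or "<none>"
        try:
            size = os.path.getsize(path)
        except OSError:
            continue  # skip files we cannot stat (permissions, broken links)
        inventory[(dirpath, ext)]["count"] += 1
        inventory[(dirpath, ext)]["bytes"] += size

# Write a simple repository map that interview findings can be checked against.
with open("data_inventory.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["location", "file_type", "file_count", "total_bytes"])
    for (location, ext), stats in sorted(inventory.items()):
        writer.writerow([location, ext, stats["count"], stats["bytes"]])
```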
Once the data locations are mapped, we begin grouping the data by the risk it poses to the organization. The following table shows an example of the risk each discovered data element poses to the organization and how that risk might be calculated:
| | Data Element 1 | Data Element 2 | Data Element 3 |
| --- | --- | --- | --- |
| Confidentiality (Data Leakage, Theft, Disclosure) | 5 | 2 | 9 |
| Integrity (Data Compromise, Manipulation) | 5 | 8 | 8 |
| Availability (Failure of System, Comms, Deletion) | 2 | 6 | 5 |
| Risk Score | 4 | 5 | 7 |
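The risk scores in the example are consistent with a straight average of the three CIA ratings. Below is a minimal sketch of that calculation, assuming an unweighted, rounded average on a 1-10 scale; an organization may well choose to weight confidentiality, integrity, and availability differently.

```python
# Minimal sketch: combine per-element CIA ratings (1-10 scale) into a single
# risk score. An unweighted, rounded average is assumed here; it reproduces
# the example table, but the weighting can be adjusted to fit the organization.
data_elements = {
    "Data Element 1": {"confidentiality": 5, "integrity": 5, "availability": 2},
    "Data Element 2": {"confidentiality": 2, "integrity": 8, "availability": 6},
    "Data Element 3": {"confidentiality": 9, "integrity": 8, "availability": 5},
}

def risk_score(ratings):
    """Round the average of the CIA ratings to a whole-number risk score."""
    return round(sum(ratings.values()) / len(ratings))

for name, ratings in data_elements.items():
    print(f"{name}: risk score {risk_score(ratings)}")
# Data Element 1: risk score 4
# Data Element 2: risk score 5
# Data Element 3: risk score 7
```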
Data Labels and Classification
Once the data is identified and ranked, a naming taxonomy will need to be decided upon. Data labels are then used to mark the data so the appropriate controls can be applied, whether automatic or manual, technical or otherwise. Most people have at least heard of one of the federal government’s data labels, “Top Secret”: a very high level of sensitivity, where the information may only be viewed by individuals holding a “Top Secret” or higher clearance. The “Top Secret” designation is the data label, which is applied to documents, emails, etc., and the classification is the understanding of what information falls into this category. While regulated entities such as healthcare and banking already have data protections defined, an organization will still need to come up with a labeling scheme. Carnegie Mellon University has a great example of how its data is labeled and classified [1]:
Restricted Data – Data
should be classified as Restricted when the unauthorized disclosure, alteration
or destruction of that data could cause a significant level of risk to the
University or its affiliates. Examples of Restricted data include data
protected by state or federal privacy regulations and data protected by
confidentiality agreements. The highest level of security controls should
be applied to Restricted data.
Private Data - Data should be classified as Private when the unauthorized disclosure, alteration or destruction of that data could result in a moderate level of risk to the University or its affiliates. By default, all Institutional Data that is not explicitly classified as Restricted or Public data should be treated as Private data. A reasonable level of security controls should be applied to Private data.
Public Data - Data should be classified as Public when the unauthorized disclosure, alteration or destruction of that data would result in little or no risk to the University and its affiliates. Examples of Public data include press releases, course information and research publications. While little or no controls are required to protect the confidentiality of Public data, some level of control is required to prevent unauthorized modification or destruction of Public data.
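To tie the two pieces together, the risk scores from the earlier table can be mapped onto labels like CMU's. The sketch below uses purely illustrative cut-off values that are my own assumption, not part of CMU's scheme:

```python
# Hypothetical mapping from the earlier risk scores to CMU-style labels.
# The threshold values are illustrative assumptions, not prescribed cut-offs.
def classify(risk_score: int) -> str:
    if risk_score >= 7:
        return "Restricted"
    if risk_score >= 4:
        return "Private"
    return "Public"

risk_scores = {"Data Element 1": 4, "Data Element 2": 5, "Data Element 3": 7}
for element, score in risk_scores.items():
    print(f"{element}: {classify(score)}")
# Data Element 1: Private
# Data Element 2: Private
# Data Element 3: Restricted
```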
Once a project like this is completed, it becomes much easier to implement effective protective controls on the appropriate data. I usually find the exercise also sheds light on the sheer expanse of data an organization maintains. When management is given visibility into the amount of data, backed by numbers showing the risk that data poses to the organization, informed decisions can be made and budgets created to ensure the data is appropriately protected.
[1] http://www.cmu.edu/iso/governance/guidelines/data-classification.html