Cyber-Threat Risk Assessment using R

An event tree is an analytical diagram in the form of a directed tree. Each node of the tree represents an event, and the branches leaving a node represent the possible outcomes of that event, combined using Boolean logic. Such a tree can be used for risk assessment, fault tree analysis (FTA), mission failure analysis, and so on.

Every directed path in the tree starts at the root node and ends at a leaf node. Each directed path from the root node to a leaf node represents a possible sequence of alternating events and outcomes (i.e., a scenario). At each node, the successor event is selected at random according to the probabilities of occurrence of the events.

The problem of identifying a possible cyber-threat attack profile can be solved by FTA using event trees. The steps to perform an event tree analysis are:

  1. Define the system and boundaries
  2. Identify the hazard scenarios
  3. Identify the initiating events
  4. Identify intermediate events, including countermeasures associated with the specific scenario.
  5. Build the event tree diagram
  6. Obtain event failure probabilities
  7. Identify the outcome risk
  8. Evaluate the outcome risk for acceptability.
  9. Recommend corrective action
  10. Document the entire process on the event tree diagram

We will work from the event tree in Figure 1. For simplicity, we assume a single initiating event. For concreteness, we assign uncertainty distributions to each of the arc probabilities:

  • P_A1~Beta(2,2);
  • P_T1~Beta(4,1); and
  • P_T2~Beta(3,2).
FIGURE 1. A simple event tree for two successive stages (events), each with two outcomes. For this example, each path through the tree represents a unique scenario with its own consequence distribution.

Figure 2. Distribution of the arc probabilities
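
As a quick check on the arc-probability distributions listed above, the following sketch tabulates the mean and the 5th and 95th percentiles of each Beta distribution. The names arcs and arc_summary are illustrative and do not come from the original code; the Beta parameters are read as (shape1, shape2), matching the rbeta() calls used later in the article.

```r
# Arc-probability distributions from the article: Beta(shape1, shape2)
arcs <- list(P_A1 = c(2, 2), P_T1 = c(4, 1), P_T2 = c(3, 2))

# Mean of Beta(a, b) is a / (a + b); percentiles come from qbeta()
arc_summary <- t(sapply(arcs, function(ab) {
  c(mean = ab[1] / sum(ab),
    q05  = qbeta(0.05, ab[1], ab[2]),
    q95  = qbeta(0.95, ab[1], ab[2]))
}))
print(round(arc_summary, 3))
```

The means (0.5, 0.8 and 0.6) are the expected arc probabilities that reappear in the simplified approaches later on.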

In addition, we know the distributional form of each consequence distribution. Using the notation c(x | s_1) to denote the consequence distribution associated with the first scenario (the first path through the tree), we assign the following distributions to consequences:

  • c(x | s_1)~Gamma(8000,2);
  • c(x | s_2)~Gamma(4500,1);
  • c(x | s_3) ~Gamma(10000,2); and
  • c(x | s_4)~Gamma(5500,1).
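
To get a feel for where these distributions sit, the sketch below computes the mean and standard deviation of each one. It assumes the two parameters are shape and rate, which is how the rgamma() calls in the article's own code pass them; the data frame conseq is an illustrative name.

```r
# Consequence distributions, read as Gamma(shape, rate) [assumed parameterization]
conseq <- data.frame(
  scenario = c("s_1", "s_2", "s_3", "s_4"),
  shape    = c(8000, 4500, 10000, 5500),
  rate     = c(2, 1, 2, 1)
)

# For Gamma(shape, rate): mean = shape / rate, sd = sqrt(shape) / rate
conseq$mean <- conseq$shape / conseq$rate
conseq$sd   <- sqrt(conseq$shape) / conseq$rate
print(conseq)
```

Under that reading the four means are 4000, 4500, 5000 and 5500, which is why the plots later use an x-axis range of 3500 to 6000.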

Start. We would like to know the form of the risk distribution. Summary statistics from this distribution (5th percentile, mean, 95th percentile) are used to summarize risk and present analyses in the cyber-threat risk analysis.

A simple way to simulate from the risk distribution is as follows:

  • Repeat n times:
      • Sample from each arc probability;
      • Calculate the probabilities for each scenario;
      • Choose a scenario using the calculated probabilities;
      • Sample from the consequence distribution for that scenario;
  • The n samples constitute a sample from the risk distribution; and
  • Summarize these samples using a histogram, empirical quantiles, and sample mean.

R code implementing this algorithm follows.

n <- 1000000
consq <- rep(0,n)

for (i in 1:n) {
  # sample the three arc probabilities
  pa1 <- rbeta(1,2,2)
  pt1 <- rbeta(1,4,1)
  pt2 <- rbeta(1,3,2)

  # probabilities of the four scenarios (paths through the tree)
  s1p <- pa1*pt1
  s2p <- pa1*(1-pt1)
  s3p <- (1-pa1)*pt2
  s4p <- (1-pa1)*(1-pt2)

  # choose one scenario, then sample its consequence distribution
  scen <- rmultinom(1,1,c(s1p,s2p,s3p,s4p))
  if (scen[1] == 1) consq[i] <- rgamma(1,8000,2)
  if (scen[2] == 1) consq[i] <- rgamma(1,4500,1)
  if (scen[3] == 1) consq[i] <- rgamma(1,10000,2)
  if (scen[4] == 1) consq[i] <- rgamma(1,5500,1)
}

# summarize the n samples (values vary from run to run)
hist(consq,freq=FALSE,main="",xlim=c(3500,6000),xlab="Consequence Distribution",ylim=c(0,0.0035))
lines(density(consq))
quantile(consq,c(0.05,0.95))
mean(consq)
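
A loop over a million iterations can be slow in R. The following is a vectorized sketch of the same brute-force idea, not the author's code: it walks the two stages of the tree with uniform draws instead of calling rmultinom(), which selects the four scenarios with the same probabilities. The sample size and seed are arbitrary choices, and the Gamma parameters are again read as shape and rate.

```r
set.seed(1)     # fixed seed so the sketch is repeatable
n <- 100000     # smaller than the article's 1,000,000 to keep it quick

# One draw of each arc probability per replication
pa1 <- rbeta(n, 2, 2)
pt1 <- rbeta(n, 4, 1)
pt2 <- rbeta(n, 3, 2)

# Walk the tree: first-stage branch with probability pa1,
# then the second-stage branch that belongs to the chosen first branch
first  <- runif(n) < pa1
second <- runif(n) < ifelse(first, pt1, pt2)
scen   <- ifelse(first, ifelse(second, 1L, 2L), ifelse(second, 3L, 4L))

# Gamma(shape, rate) parameters for scenarios 1 to 4
shape <- c(8000, 4500, 10000, 5500)
rate  <- c(2, 1, 2, 1)
consq <- rgamma(n, shape = shape[scen], rate = rate[scen])

print(quantile(consq, c(0.05, 0.95)))
print(mean(consq))
```

Because rgamma() is vectorized over its shape and rate arguments, each replication draws from the consequence distribution of its own scenario in a single call.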

Approach 1. For an event tree as complex as one that might be encountered in real operations, the brute-force approach above is infeasible. A more realistic approach follows:

  • Draw 500 samples from each arc probability;
  • Calculate 500 sets of scenario probabilities;
  • Draw 1000 samples from each consequence distribution;
  • Represent each consequence distribution as a histogram;
  • For each of the 500 sets of scenario probabilities, calculate a weighted average of the mass in each bin of the histogram, and call this one “sampled risk curve”;
  • Calculate the average over all 500 risk curves. Use this as an approximation to the risk distribution and calculate the mean, 5th percentile, and 95th percentile; and
  • Also calculate the 5th and 95th percentiles for the entire set of risk curves.

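R code implementing the first two steps of this algorithm follows. The remaining steps are carried out by the later listing that begins with nsampc <- 1000, which reuses nsampbr and the scenario probabilities computed here.

nsampbr <- 500

# 500 draws of each arc probability
pa1 <- rbeta(nsampbr,2,2)
pt1 <- rbeta(nsampbr,4,1)
pt2 <- rbeta(nsampbr,3,2)

# 500 sets of scenario probabilities (each set sums to 1)
s1p <- pa1*pt1
s2p <- pa1*(1-pt1)
s3p <- (1-pa1)*pt2
s4p <- (1-pa1)*(1-pt2)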

The estimated risk distribution from this approach is given as the line with circles in Figure 3.

Figure 3. Consequence distribution for random scenarios using Approach 1
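
The last steps of Approach 1 call for a mean and percentiles taken from a binned risk curve rather than from raw samples. The helper below is a hypothetical sketch of one way to do that (binned_summary is not part of the original code). It treats each bin's mass as concentrated at the bin midpoint, so the percentiles are only accurate to within one bin width.

```r
# Mean and quantiles of a distribution given as histogram densities on equal-width bins
binned_summary <- function(mids, dens, width, probs = c(0.05, 0.95)) {
  mass <- dens * width        # probability mass per bin
  mass <- mass / sum(mass)    # renormalize so the masses sum to exactly 1
  cdf  <- cumsum(mass)
  # quantile = midpoint of the first bin whose cumulative mass reaches the target
  q <- sapply(probs, function(p) mids[which(cdf >= p)[1]])
  names(q) <- paste0(100 * probs, "%")
  list(mean = sum(mids * mass), quantiles = q)
}

# Illustration on one simulated consequence distribution, Gamma(8000, 2)
set.seed(3)
breaks <- seq(3500, 6000, length.out = 101)   # 100 bins of width 25
x      <- seq(3512.5, 5987.5, by = 25)        # bin midpoints
dens   <- hist(rgamma(1000, 8000, 2), breaks = breaks, plot = FALSE)$density
res    <- binned_summary(x, dens, width = 25)
print(res)
```

In the article's workflow the same call would be applied to the averaged risk curve, for example binned_summary(x, qdmean, 25).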

Approach 2. The risk distribution can be calculated without sampling from the arc probability distributions. For an event tree of realistic size, this is a significant computational simplification. What is lost in the simplification is the family of risk curves, i.e., one curve for each sampled set of arc probabilities.
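
The reason sampling can be skipped is that each risk curve is a weighted sum of the consequence distributions, so the average curve depends on the scenario probabilities only through their expected values. The sketch below checks this numerically, under the assumption that the arc probabilities are independent, so that the expected scenario probability is the product of the Beta means. The variable names mc and exact are illustrative.

```r
set.seed(2)
m <- 200000
pa1 <- rbeta(m, 2, 2)
pt1 <- rbeta(m, 4, 1)
pt2 <- rbeta(m, 3, 2)

# Monte Carlo averages of the four scenario probabilities
mc <- c(mean(pa1 * pt1), mean(pa1 * (1 - pt1)),
        mean((1 - pa1) * pt2), mean((1 - pa1) * (1 - pt2)))

# Products of the Beta means: E[P_A1] = 0.5, E[P_T1] = 0.8, E[P_T2] = 0.6
exact <- c(0.5 * 0.8, 0.5 * 0.2, 0.5 * 0.6, 0.5 * 0.4)

print(rbind(monte_carlo = mc, product_of_means = exact))
```

The two rows should agree to within Monte Carlo error, and the products of means are the weights 0.4, 0.1, 0.3 and 0.2 used in the simplified calculations.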

Consider the following simplified algorithm:

  • Draw 1000 samples from each consequence distribution;
  • Represent each consequence distribution as a histogram;
  • Calculate a weighted average of the mass in each bin of the histogram using the expected arc probabilities; and
  • Use this as the estimated risk distribution.

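R code follows. This listing builds the consequence histograms and then combines them with the 500 sampled sets of scenario probabilities (nsampbr and s1p through s4p from the previous listing), so it completes Approach 1 and draws the line with circles. The expected-probability calculation described in the list above is carried out by the listing that begins with ms1p.

nsampc <- 1000

# sample each consequence distribution
cs1 <- rgamma(nsampc,8000,2)
cs2 <- rgamma(nsampc,4500,1)
cs3 <- rgamma(nsampc,10000,2)
cs4 <- rgamma(nsampc,5500,1)

# histogram densities on 100 common bins of width 25
brks <- seq(3500,6000,length=101)
bh1 <- hist(cs1,breaks=brks,plot=FALSE)$density
bh2 <- hist(cs2,breaks=brks,plot=FALSE)$density
bh3 <- hist(cs3,breaks=brks,plot=FALSE)$density
bh4 <- hist(cs4,breaks=brks,plot=FALSE)$density

# one sampled risk curve per set of scenario probabilities
qdm <- matrix(0,nsampbr,100)
for (i in 1:nsampbr) {
  qdm[i,] <- s1p[i]*bh1 + s2p[i]*bh2 + s3p[i]*bh3 + s4p[i]*bh4
}

# average curve and pointwise 5th and 95th percentiles across the curves
qdmean <- apply(qdm,2,mean)
qd5 <- apply(qdm,2,quantile,c(0.05))
qd95 <- apply(qdm,2,quantile,c(0.95))

# bin midpoints; add the average curve to the existing histogram plot
x <- seq(3512.5,5987.5,by=25)
points(x,qdmean,type="b",pch=1)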

The estimated risk distribution from this approach is given as the line with triangles in Figure 4.

Figure 4. Consequence distribution for Approach 1 (circles) and Approach 2 (triangles)

Approach 3. If the conditional consequence distributions are given in parametric form, or in numerical look-up tables, the risk distribution can be calculated exactly, without resorting to estimating these distributions from the outputs of Monte Carlo simulations. This method is simply:

  • Calculate the expected arc probabilities; and
  • Calculate the weighted average of the consequence distributions.

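The listing below uses the expected arc probabilities together with simulated consequence histograms, which is the Approach 2 calculation, and draws the line with triangles:

# expected scenario probabilities: E[P_A1] = 0.5, E[P_T1] = 0.8, E[P_T2] = 0.6
ms1p <- (0.5)*(0.8)
ms2p <- (0.5)*(0.2)
ms3p <- (0.5)*(0.6)
ms4p <- (0.5)*(0.4)

# sample each consequence distribution and bin it
nsampc <- 1000
cs1 <- rgamma(nsampc,8000,2)
cs2 <- rgamma(nsampc,4500,1)
cs3 <- rgamma(nsampc,10000,2)
cs4 <- rgamma(nsampc,5500,1)
bh1 <- hist(cs1,breaks=seq(3500,6000,length=101),plot=FALSE)$density
bh2 <- hist(cs2,breaks=seq(3500,6000,length=101),plot=FALSE)$density
bh3 <- hist(cs3,breaks=seq(3500,6000,length=101),plot=FALSE)$density
bh4 <- hist(cs4,breaks=seq(3500,6000,length=101),plot=FALSE)$density

# weighted average of the bin densities; add it to the existing plot
erd <- ms1p*bh1 + ms2p*bh2 + ms3p*bh3 + ms4p*bh4
x <- seq(3512.5,5987.5,by=25)
points(x,erd,type="b",pch=2)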

The risk distribution (exact, and not an estimate) obtained using this approach is given as the line with crosses in Figure 5.
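
A minimal sketch of the exact calculation is given below. It is not the author's code. It assumes the Gamma parameters are shape and rate, as in the rgamma() calls used throughout, and that the arc probabilities are independent, so the expected scenario probabilities are products of the Beta means. The names w, risk_pdf, risk_cdf, risk_mean, risk_q and erd_exact are illustrative.

```r
# Expected scenario probabilities (products of the Beta means)
w     <- c(0.5 * 0.8, 0.5 * 0.2, 0.5 * 0.6, 0.5 * 0.4)
shape <- c(8000, 4500, 10000, 5500)   # Gamma shapes for s_1 to s_4
rate  <- c(2, 1, 2, 1)                # Gamma rates [assumed parameterization]

# Exact mixture density and CDF of the risk distribution
risk_pdf <- function(x) sapply(x, function(xi) sum(w * dgamma(xi, shape, rate)))
risk_cdf <- function(x) sapply(x, function(xi) sum(w * pgamma(xi, shape, rate)))

# Exact mean; percentiles by solving risk_cdf(q) = p numerically
risk_mean <- sum(w * shape / rate)
risk_q <- sapply(c(0.05, 0.95), function(p)
  uniroot(function(q) risk_cdf(q) - p, interval = c(3000, 7000))$root)

# Density evaluated at the same bin midpoints used by the other approaches
x <- seq(3512.5, 5987.5, by = 25)
erd_exact <- risk_pdf(x)

print(risk_mean)
print(risk_q)
```

With these weights the mean is 0.4(4000) + 0.1(4500) + 0.3(5000) + 0.2(5500) = 4650. To overlay the curve on an existing plot, a call such as points(x, erd_exact, type="b", pch=4) would draw it with crosses.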

Summary

The histogram and solid black line result from brute-force sampling from the arc probability distributions and the consequence distributions. The line with circles is the Approach 1 estimate, which can also produce a family of risk curves. The line with triangles is the estimate from a greatly simplified algorithm (Approach 2) that uses only the marginal expected values of individual arc probabilities and simulations from the consequence distributions. The line with crosses (Approach 3) is calculated assuming a parametric (or tabular) form is known for the consequence distributions and requires no simulation. Notice the good agreement between the four estimates.

Figure 5. The risk distribution (exact, and not an estimate) using Approach 3 is given as the line with crosses, overlaid on the Approach 1 and Approach 2 distributions

The exact computation is both trivial and fast.

Authored by: Jeffrey Strickland, Ph.D.

Jeffrey Strickland, Ph.D., is the author of “Predictive Analytics Using R” and a Senior Analytics Scientist with Clarity Solution Group. He has performed predictive modeling, simulation and analysis for the Department of Defense, NASA, the Missile Defense Agency, and the financial and insurance industries for over 20 years. Jeff is a Certified Modeling and Simulation Professional (CMSP) and an Associate Systems Engineering Professional. He has published nearly 200 blog posts on LinkedIn, is a frequently invited guest speaker, and is the author of 20 books, including:

  • Discrete Event simulation using ExtendSim
  • Crime Analysis and Mapping
  • Missile Flight Simulation
  • Mathematical modeling of Warfare and Combat Phenomenon
  • Predictive Modeling and Analytics
  • Using Math to Defeat the Enemy
  • Verification and Validation for Modeling and Simulation
  • Simulation Conceptual Modeling
  • System Engineering Process and Practices
  • Weird Scientist: the Creators of Quantum Physics
  • Albert Einstein: No one expected me to lay a golden eggs
  • The Men of Manhattan: the Creators of the Nuclear Era
  • Fundamentals of Combat Modeling
