Monday, 24 March 2014

Did 'Sauron' in 'Lord of The Rings' have a Single Point of Failure?

Disclaimer: The views expressed in this blog are personal and are not meant to hurt anyone's sentiments. Any connection, direct or indirect, to anyone, any place or anything is purely coincidental. It is only meant to be fun to read and possibly portray me as a great thinker ;)

Most of us have been regaled by the works of J.R.R. Tolkien, 'Lord of The Rings' and 'The Hobbit', be it the movies or the books. I am a big fan: I claim to have gobbled through the whole LOTR book, and I never get tired of watching the movies over and over again.
The latest installment of 'The Hobbit' movies drove me to do some research on the characters. One character who never seems to give up, and keeps coming back again and again from the very beginning of Tolkien's mythical world, is Sauron, also known as the Necromancer; other names for him can be found in Tolkien's other books. Kudos to his tenacity! If this has piqued your interest and you don't want to trawl through the books, you can read a short biography at this link (thank you, IMDB).
What if Sauron had learnt from his previous failures and hired a consultant to look at his plan for conquering the world? What would the process flow look like? I put myself in the poor auditor's shoes, set aside my hatred for all his wrongdoings and looked at his strategy objectively. Here it is in brief:

Process map of Sauron's conquest of the World.

Sauron accumulated the resources and invested the time required to put his plan into action, and the plan ran through the first two ages (approx. 2000 years). We could analyze each individual step and discuss its pros and cons, but since the focus of this post is on identifying the failures, let us stick to that.

A close look at the stories set in the various ages reveals a common pattern: Sauron launches a mass attack on other kingdoms. They revolt against him, individually or as an alliance. A war ensues and ultimately Sauron loses. An even closer look at the end of these battles shows that Sauron is about to win when he somehow manages to lose the ring, leading to the loss of his power and hence of the war. The following is a brief flow of events across the ages:

Event flow leading to Sauron's defeat

What does this have to do with the practical world anyway? We can say that the ring in this case is a 'Single Point of Failure', a 'Bottleneck' in business terms, or an 'Achilles' heel' in literary terms. A single point of failure (SPOF) is a part of a system that, if it fails, will stop the entire system from working (Wikipedia).
Industry best practices warn organizations against such SPOFs. There are three basic principles in security: Confidentiality, Integrity and Availability. An SPOF has a direct impact on system or process availability, and that effect is easier to comprehend than its effect on the other two. For example, say all of an organization's data is stored on a single server. The server is then an SPOF: if it fails, the data becomes unavailable to everyone. To take an example from a different industry, say a hospital is situated in a place with only one approach route. If a natural calamity such as a snowstorm or a landslide blocks the route, the hospital's services would be cut off when they are needed the most. Confidentiality and integrity can also be affected by such failure points. Any audit has a clear directive to identify and report such single points of failure in the organization.
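
To make the audit directive a little more concrete, here is a minimal R sketch. It assumes a process can be described as a list of steps, each naming the resources able to perform it; the step and resource names are hypothetical, chosen to mirror the server and hospital examples. A step with fewer than two distinct resources is flagged as an SPOF.

```r
# Hypothetical process map: each step lists the resources able to perform it
process <- list(
  store_data     = c("server_A"),
  serve_website  = c("web_1", "web_2"),
  reach_hospital = c("route_north")
)

# A step is a single point of failure (SPOF) when fewer than two
# distinct resources can perform it, i.e. it has no redundancy
find_spofs <- function(steps) {
  is_spof <- vapply(steps, function(r) length(unique(r)) < 2, logical(1))
  names(steps)[is_spof]
}

find_spofs(process)
```

Here `store_data` and `reach_hospital` are flagged. Counting alternatives is only a first pass: two servers sharing one power supply would still hide an SPOF that this check cannot see.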

What can be done? There is no silver bullet. Systems and processes need careful monitoring. A common way to circumvent the issue is to maintain redundancy. If a bottleneck is unavoidable, the onus is on the organization to have special continuity and recovery strategies in place, and special care should be taken to prevent such points in the process flow from failing. In manufacturing, spare critical machine components are kept on standby and continuous proactive/preventive maintenance is carried out. Imagine if Sauron had backup rings that he could whip out when in dire straits, and could remotely deactivate the lost ring instead of frantically searching for it! Well, then the story wouldn't be as much fun, and I like happy endings in my stories.
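
Why redundancy helps can be shown with the standard availability arithmetic. The R sketch below assumes that units fail independently and uses a hypothetical 99% availability per server: a redundant pair is down only when both units are down at once, while steps in series are only as available as their product.

```r
# Availability of n identical, independent units in parallel:
# the system is down only if all n units are down at the same time
parallel_availability <- function(a, n) {
  1 - (1 - a)^n
}

# Availability of steps in series: every step must be up
series_availability <- function(a) {
  prod(a)
}

single    <- parallel_availability(0.99, 1)  # one server, assumed 99% available
redundant <- parallel_availability(0.99, 2)  # a redundant pair
# A redundant step chained with a non-redundant one (an SPOF)
chain     <- series_availability(c(redundant, single))

c(single = single, redundant = redundant, chain = chain)
```

The pair reaches about 0.9999, but once it is chained with a single non-redundant step the whole flow drops back below 0.99: the SPOF caps the availability of everything that depends on it.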

I would go even further and say that Lord Voldemort (Harry Potter series) was more sensible than Sauron, as he created Horcruxes and stowed them away here and there as backups. I am happy that both villains meet the same fate in the end, but the same outcome would be nothing to celebrate in a business process.

Ciao!

Monday, 27 January 2014

Experiment to study 'High Impact Low Probability Risks' using R

Here we look at an experiment to understand how quantitative risk analysis can be used for simulation. The open-source tool ‘R’ was used to perform the experiment.
This experiment deals with high-impact, low-probability risks.

Model

This model uses the basic definition of risk as the product of the likelihood of occurrence and the impact. The impact parameter is binary, i.e. either there is an impact or there is no impact at all. To satisfy our basic premise of high impact and low probability, we assign a high probability to the no-impact case and the maximum impact value (10 on a scale of 1 to 10) to the case where the event occurs.
In a single experiment, 100 samples are drawn (with replacement) from the values 0 and 10, and the impact values are summed. This experiment is replicated 1000 times and an impact vs. frequency mapping is created.
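
Because every sample is an independent draw, the model can also be written down in closed form. The derivation below is an addition to the original analysis and assumes independent samples, as `sample(..., replace = TRUE)` produces. With n samples per experiment, p1 the probability of non-occurrence and p2 = 1 - p1, the number of occurrences K is binomial and the total impact is S = 10K:

```latex
K \sim \mathrm{Binomial}(n, p_2), \qquad S = 10K, \qquad E[S] = 10\,n\,p_2

P(S = 10k) = \binom{n}{k}\, p_2^{\,k}\, p_1^{\,n-k}, \qquad k = 0, 1, \dots, n

n = 100,\ p_2 = 0.01:\quad E[S] = 10, \quad P(S = 0) = 0.99^{100} \approx 0.366, \quad P(S = 10) = 100 \cdot 0.01 \cdot 0.99^{99} \approx 0.370
```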
The following R code implements the model:


# This function simulates a single experiment of event occurrence
# n:  number of samples per experiment
# p1: probability of non-occurrence (0.90, 0.95 and 0.99 were used below)
high.impact <- function(n = 100, p1 = 0.99) {
  p2 <- 1 - p1  # probability of occurrence
  impact <- sample(c(0, 10), size = n, replace = TRUE, prob = c(p1, p2))
  sum(impact)   # total impact in a single experiment
}
# single function call to simulate a single experiment
high.impact()
# perform the experiment 1000 times (change p1 to reproduce the other runs)
results <- replicate(1000, high.impact(p1 = 0.99))

# tabulate results
table(results)

# plot results
par(mfrow = c(1, 1))
plot(table(results), type = "l")

Output
Because these events are rare, the probability of occurrence is kept low, i.e. the probability of non-occurrence is high. For this simulation, the probability of non-occurrence was varied from 0.90 to 0.99. The outputs of the above simulation for different probabilities are as follows:
Probability of Non-Occurrence P1=0.90 (Use the probability p1 in R code above)

Probability of Non-Occurrence P1=0.95


Probability of Non-Occurrence P1=0.99


Observations

Based on the above data, the following observations were made:
  1. As the probability of occurrence increases, the net impact on the organization increases.
  2. Even at a very low probability of occurrence, i.e. P2 = 0.01 or 1%, the full impact of the event (a total impact of 10) is seen approximately 40% of the time (~400 out of 1000 experiments).
  3. Since such events are infrequent, the tail risk also needs to be examined.
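
The tail can be examined without relying on a single simulation run. The R sketch below is an addition to the original experiment: it uses the binomial form of the model (total impact = 10 × number of occurrences) to compute the exact probabilities for p1 = 0.99, then checks the tail against a fresh simulation. The threshold of 30 (three or more events in one experiment) is an arbitrary illustrative choice.

```r
n  <- 100     # samples per experiment, as in the model
p1 <- 0.99    # probability of non-occurrence
p2 <- 1 - p1  # probability of occurrence

# Exact distribution of total impact S = 10 * K, with K ~ Binomial(n, p2)
k <- 0:6
exact <- data.frame(total_impact = 10 * k, probability = dbinom(k, n, p2))
print(exact)

# Tail risk: probability that one experiment's total impact is at least 30,
# i.e. that three or more events occur
tail_30 <- pbinom(2, n, p2, lower.tail = FALSE)

# Compare with a simulation of 1000 experiments
set.seed(1)
sim <- replicate(1000, sum(sample(c(0, 10), n, replace = TRUE, prob = c(p1, p2))))
sim_tail <- mean(sim >= 30)

c(exact = tail_30, simulated = sim_tail)
```

The exact probability of a total impact of 10 is about 0.37, i.e. roughly 370 of 1000 experiments, in line with the ~400 observed, and a total impact of 30 or more still occurs in roughly 8% of experiments even though each individual event has only a 1% chance.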

Tuesday, 21 January 2014

Risk Assessment Methodologies - A Comparison

There are a number of risk assessment methodologies. This post briefly describes some widely accepted ones and compares them. The methodologies are as follows:
 
CCTA Risk Analysis and Management Method (CRAMM):

The Central Computer and Telecommunications Agency (CCTA), now part of the Office of Government Commerce (OGC), developed this methodology for the British government. It combines the securing of IT hardware and software with physical and human-resource controls. The three stages of CRAMM risk analysis are:
1. Identifying and valuing assets
2. Assessing threats and vulnerabilities
3. Selecting and recommending countermeasures

Failure Mode and Effects Analysis (FMEA):

It was originally developed for hardware but can be used effectively to analyze systems and software. The manufacturing industry has also found FMEA useful for risk analysis. In this methodology, the potential failures of each part, process or module are identified. Failure modes are the ways in which a part can fail, and their causes can include man, machine, process, etc. The effects these failures would have at the immediate level, the intermediate level and across the system are then examined, and the total impact of failure in specific modules is calculated. A severity is assigned to it and the personnel responsible for the module are identified. The analysis has to be revised at regular intervals.
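
The text mentions assigning a severity. Classical FMEA commonly combines severity with occurrence and detection ratings, each on a 1 to 10 scale, into a Risk Priority Number (RPN = severity × occurrence × detection) and ranks failure modes by it. The R sketch below shows that calculation; the modules, failure modes and ratings are invented for illustration.

```r
# Hypothetical failure modes for a small IT service, rated 1-10
fmea <- data.frame(
  module     = c("database", "web server", "backup job"),
  failure    = c("disk failure", "process crash", "job silently skipped"),
  severity   = c(9, 6, 8),
  occurrence = c(3, 5, 4),
  detection  = c(2, 3, 7),  # higher = harder to detect
  stringsAsFactors = FALSE
)

# Risk Priority Number: product of the three ordinal ratings
fmea$rpn <- with(fmea, severity * occurrence * detection)

# Rank failure modes so the highest RPN is addressed first
ranked <- fmea[order(-fmea$rpn), c("module", "failure", "rpn")]
print(ranked)
```

The silently skipped backup job ranks first mainly because it is hard to detect. Since the ratings are ordinal, an RPN of 224 is not literally "twice as risky" as 112, which is the limitation listed for FMEA in the comparison table.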

Facilitated Risk Analysis Process (FRAP):

It enables organizations to pre-screen security-related systems and processes to determine whether a risk analysis is needed, helping them focus on critical security issues. It consists of a range of tested approaches for conducting a qualitative risk assessment. It is simple and inexpensive to use, so it works well for an initial analysis.

SP 800-30 and SP 800-66 by the National Institute of Standards and Technology (NIST):

NIST developed two qualitative risk assessment guides, SP 800-30 and SP 800-66, for regulated industries such as healthcare. SP 800-66 was written for organizations that need to comply with the Health Insurance Portability and Accountability Act (HIPAA) in the US. The steps involved in this risk assessment are:
1. Characterize systems
2. Identify threats
3. Identify countermeasures
4. Determine likelihood
5. Determine impact
6. Determine risk
7. Recommend additional countermeasures
8. Document results
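
Steps 4 to 6 (likelihood, impact, risk) can be illustrated with a simple likelihood × impact matrix. The R sketch below is modeled on our reading of the 3×3 risk-level matrix in the original (2002) edition of SP 800-30: likelihood weights 1.0/0.5/0.1, impact values 100/50/10, and risk rated High above 50, Medium above 10, Low otherwise. Treat these values as assumptions and check them against the standard before relying on them.

```r
# Assumed weights, modeled on the SP 800-30 (2002) risk-level matrix
likelihood_weight <- c(High = 1.0, Medium = 0.5, Low = 0.1)
impact_value      <- c(High = 100, Medium = 50, Low = 10)

# Step 6: determine risk from the likelihood (step 4) and impact (step 5)
risk_level <- function(likelihood, impact) {
  score <- unname(likelihood_weight[likelihood] * impact_value[impact])
  level <- ifelse(score > 50, "High", ifelse(score > 10, "Medium", "Low"))
  data.frame(likelihood, impact, score, level, stringsAsFactors = FALSE)
}

res <- risk_level(c("High", "Medium", "Low"), c("High", "Medium", "Medium"))
print(res)
```

The resulting levels (High, Medium, Low) then feed step 7, where additional countermeasures are recommended for the highest-rated risks first.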

Operationally Critical Threat, Asset, and Vulnerability Evaluation (OCTAVE):

The OCTAVE approach was developed by the Software Engineering Institute (SEI) at Carnegie Mellon University in 2001 to address the information security compliance challenges faced by the US Department of Defense (DoD). This methodology uses a self-directed, interdisciplinary team to analyze and evaluate security risks by reviewing operational risk and security practices; technology is examined only in relation to security practices. It outlines a set of principles (e.g. using a self-directed team to evaluate risk) and, based on these principles, defines the required attributes of the evaluation process. It also defines the outputs, i.e. the required outcomes, of each phase of the analysis.

PUSH:

PUSH is an acronym for a service-based risk assessment solution whose four phases give it its name:
1. Preparation - Defining the audience and purpose of risk assessment
2. Universe definition - Identifying and characterizing the most critical assets, risk and controls
3. Scoring - Choosing a consistent scale to rate the importance of assets, the impact of risks, and effectiveness of controls
4. Hitting the mark - Ensuring that the risk assessment fulfills the purpose set out in the preparation phase, using a documented methodology

Spanning Tree Analysis Methodology:

In this methodology, a map or tree of all possible threats to an information system is created. Branches denote general categories of threats, e.g. physical or network threats, and more detail is added as leaves on each branch. When assessing risk, organizations prune the branches that don't apply to their situation.
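
The tree-and-prune idea maps naturally onto a nested R list. In this minimal sketch, branches are threat categories and leaves are individual threats; the categories, threats and the "no wireless network" scenario are hypothetical examples.

```r
# Hypothetical threat tree: branches are categories, leaves are threats
threat_tree <- list(
  physical = c("fire", "flood", "equipment theft"),
  network  = c("denial of service", "intrusion", "eavesdropping"),
  wireless = c("rogue access point")
)

# Prune the branches that do not apply to the organization
prune_tree <- function(tree, not_applicable) {
  tree[setdiff(names(tree), not_applicable)]
}

# Flatten the remaining tree into "branch / leaf" labels for prioritizing
list_threats <- function(tree) {
  unlist(lapply(names(tree), function(branch) {
    paste(branch, tree[[branch]], sep = " / ")
  }))
}

# e.g. an organization with no wireless network
pruned  <- prune_tree(threat_tree, not_applicable = "wireless")
threats <- list_threats(pruned)
print(threats)
```

The flattened list is only a starting point for identifying and prioritizing threats; as the comparison table notes, this method needs to be paired with other techniques for detailed analysis.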

Security Officers Management and Analysis Project (SOMAP):

It was developed by a Swiss non-profit organization, which created a guide and a risk assessment tool to support risk analysis for open-source systems and enterprises. It discusses both quantitative and qualitative risk assessment methods and the importance of aligning security goals with the business goals of the organization.
It follows a five-stage cyclic workflow for risk assessment, as depicted below:
 
 
Value at Risk (VAR) methodology:

VAR is a theoretical, quantitative measure of information security risk. This methodology helps summarize the worst expected loss due to a security breach (at a chosen confidence level) and strike a workable balance between the cost of implementing controls and the reduction in risk. Both tangible and intangible assets are considered; examples of intangible assets include copyrights, collaboration activities, IP, public perception and structural activities. It requires a four-stage cycle, as depicted below.
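
One common way to estimate a VAR figure is Monte Carlo simulation of annual losses, reading off a high percentile. The R sketch below assumes, purely for illustration, that the number of breaches per year is Poisson with mean 2 and that the loss per breach is lognormal with a median of 50,000; these distributions and numbers are hypothetical and are not part of the methodology described above.

```r
set.seed(42)
n_years <- 10000

# Hypothetical assumptions: breaches per year ~ Poisson(2),
# loss per breach ~ lognormal with median 50,000
simulate_annual_loss <- function() {
  breaches <- rpois(1, lambda = 2)
  sum(rlnorm(breaches, meanlog = log(50000), sdlog = 1))  # 0 if no breaches
}

losses <- replicate(n_years, simulate_annual_loss())

# 95% VAR: the annual loss that is not exceeded in 95% of simulated years
var_95        <- quantile(losses, 0.95, names = FALSE)
expected_loss <- mean(losses)

c(expected_loss = expected_loss, var_95 = var_95)
```

Re-running the simulation with the frequency or severity reduced by a proposed control, and comparing the drop in VAR with the control's cost, is one way to approach the cost/risk balance the methodology aims for. The result is only as good as the assumed distributions, which is the model-dependence weakness listed in the table.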
 


Comparison of methodologies

The table below compares various methodologies mentioned above:

| Methodology | Type | Industry | Characteristics |
|---|---|---|---|
| CRAMM | Qualitative | IT hardware | Contains a very large countermeasure library of over 3000 detailed countermeasures organized into over 70 logical groupings. Old methodology with limited language support (available in English and Dutch). |
| FMEA | Qualitative (rating-based formulae used) | Any | Used extensively by quality professionals. Aims at finding root causes and is used in conjunction with fishbone diagrams. Helps in early identification of single points of failure. Does not deal with multiple failures in subsystems. Does not give an exact idea of how bad a risk is, as it uses ordinal scales. |
| FRAP | Qualitative (uses expert panel opinion to identify critical risks) | Any | Used to pre-screen risks quickly and at low cost; increases the organization's focus on critical issues. Can limit the organization's view of the risks it faces. It is difficult to find an efficient panel of diverse experts who can come up with reliable estimates, and the process is prone to personal opinions and prejudices. Multiple teams are formed to perform FRAP to counter these limitations. |
| NIST (SP 800-30, 800-66) | Qualitative | Regulated industries | Developed for industries regulated by HIPAA and FISMA. Extensive process that stresses proper documentation. The complete process might not be cost-effective for less regulated industries with a higher risk tolerance. |
| OCTAVE | Qualitative | Any | A suite of tools, techniques and methods for risk-based information security strategic assessment and planning. There are three OCTAVE methods suited to different industry needs. |
| PUSH | Qualitative | Any enterprise system | |
| Spanning Tree | Qualitative | Any | Simple to use; an easy, graphical way to map risks and subsets of risks. Cannot be used for detailed analysis beyond identifying and prioritizing threats, so it has to be used together with other techniques. |
| SOMAP | Qualitative & Quantitative | IT | An open-source IT risk assessment and management methodology, freely available to customers. Stresses collaborative development. Comes with a suite of tools. |
| VAR | Quantitative | Primarily financial; can be used in any industry | Capable of producing accurate valuations of risk. Can be used to create simulations and predictive models of risk. Extensive research has already been done on methods for calculating VAR. Depends on a mathematical model, so estimation errors can have catastrophic effects, especially in the financial and banking sectors. Difficult to incorporate complex and intangible risks, e.g. human behavior and political effects. |




 Bibliography