You got @RISK, now what? Building technically sound simulation

June 26, 2018 | Author: Anonymous | Category: Math, Statistics And Probability

Short Description

Download You got @RISK, now what? Building technically sound simulation...

Description

You got @RISK, now what? Building technically sound simulation models Dr. Huybert Groenendaal Dr. Francisco Zagmutt Managing Partner EpiX Analytics [email protected]

www.epixanalytics.com

Outline • Background – @RISK, simulation, this talk • ModelAssist • A motivational example • Technical considerations for sound models

• A handy checklist to keep in mind

© EpiX Analytics LLC

@RISK and simulation modeling • Stochastic modeling – random variables used to simulate all possible outcomes • @RISK: Monte Carlo simulation, forecasting, and optimization tool • Wide range of fields and applications: • • • • •

Pharmaceuticals Mining Manufacturing Transportation Insurance

© EpiX Analytics LLC

• Financial industry • Health / Food safety

• Energy, oil & gas • Government • Many others

The emphasis of this talk Often deterministic (fixed values) models are “upgraded” to stochastic ones During conversion or design of new models, several mistakes can happen. Mistakes affect spread and shape of outputs -> statistics e.g. percentiles and variance.

Impact on decision supported by model from irrelevant to ‘fatal’ Therefore, key to get them right…

© EpiX Analytics LLC

Correct

FATAL!!

ModelAssist for @RISK Free simulation and @RISK training and reference tool: http://www.epixanalytics.com/ModelAssist.html

 Based on EpiX Analytics’ decades of experience in risk analysis consulting, training, and research  Option to install locally or visit online

 Page numbers are Mxxxx. For example, M0407 is “Selecting the appropriate distributions for your model”  Keep an eye on these during this talk!

© EpiX Analytics LLC

A motivational example You are in charge of predicting next year’s total borrowed money for the consumer lending division in a bank Making lots of assumptions, a very simplistic calculation could be: Expected totalBorrowed= # of clients x loan size +… …however, the number of individual loans and the loan size is highly variable (and uncertain for the future) How do we incorporate this variability in our calculation above?

© EpiX Analytics LLC

We hire a marketing consultancy, and after $4.2M in fees, they conclude that: • # loans/year is Pert(450,650,800) • Loan size is $Lognormal(1000,4000)

Therefore, we could use @RISK to “stochastize” prior calculation using: Expected totalBorrowed= Pert(450,650,800)*Lognormal(1000,4000)+...

Lets see how this model works…

© EpiX Analytics LLC

Let’s visualize what our model does

550

28,126

© EpiX Analytics LLC

Iteration

# loans

1

550

X

$ per loan

Total borrowed

28,126

15,469,300

Let’s visualize what our model does

770

6,307

© EpiX Analytics LLC

Iteration

# loans

1

550

2

770

$ per loan

Total borrowed

X

28,126

15,469,300

X

6,307

4,856,390

Let’s visualize what our model does

498

864

Iteration

# loans

$ per loan

Total borrowed

1

550

X

28,126

15,469,300

2

770

X

6,307

4,856,390

3

498

X

5,499

1,561,716

4

864

X

26,795

23,150,880

. . . n

5,499

26,795

Correct

FATAL!!

Are these calculations correct? Model says that for each scenario, every client borrows the same amount!! How would this affect your model? © EpiX Analytics LLC

Technical considerations for sound models 1. When a multiplication is a sum 2. Be certain to model uncertainties 1. Confusing variability and uncertainty 2. Reporting of variability and uncertainty 3. Dependencies matter 4. Separate inputs from calculations

© EpiX Analytics LLC

1. When a multiplication is a sum (M0089) Multiplications can be a shortcut to sum identical numbers e.g. 10*3=10+10+10 However, random numbers are not identical. E.g. the sum of n Lognormal(10,4) ≠ Logn(10,4) * n •

Impact: gross overestimation of output variance.

•

Correct modeling (M0435): • Simulate n distributions individually, then sum them • Use identities (e.g. Binomial(1,p)+Binomial(1,p)=Binomial(2,p)) • Use CLT approximation (e.g. N(m,s)+N(m,s)=N(2m,s√2)) • Use actuarial methods (FFT, Panjer, DePril) • The RiskCompound function

•

Confused yet? Lets see how to solve our banking problem, now also calculating revenue assuming 7.5% Interest Rate (IR)

© EpiX Analytics LLC

Correct approaches Approach 1 (M0075): Use the CLT (N(n*m,s√n)) to calculate total money borrowed, given n random # of clients. Then multiply by IR Approach 2 (M0435): Simulate every client individually and sum total money Borrowed. Then multiply by IR.

Model © EpiX Analytics LLC

Does this matter?

•Both correct approaches give the same answer •The incorrect approach greatly overestimates the total risk (sincorrect= 4x correct method!) © EpiX Analytics LLC

2. Be certain to model uncertainties The difference is subtle, but they can greatly impact a model Variability: heterogeneity (H) among individuals, or randomness (R) (due to chance, or random samples) A function of the system being modeled. Can’t be reduced. • • • •

N(m,s)

Distribution of household income (H) Initial production of oil/gas well (H) Catastrophic events (R) Quality control sampling (R)

Uncertainty: lack of knowledge about the value of a parameter (e.g. imperfect data) or about the right model to use. A function of the analyst. Thus, can be reduced with more data • • • •

N(

Probability of loan defaults Mean, P10 and P90 of household income Post-launch sales Poverty rates in a country

© EpiX Analytics LLC

,

)

2.1. Confusing variability and uncertainty • Impact: under or over estimation of output variance. • Correct modeling: • Variability: should be repeated in model to represent heterogeneity or randomness (M0247). • Uncertainty: • It’s one value, we just don’t know it: thus, show only once in model (M0088) • For this reason, can usually be treated the same way as a point estimate (multiply, divide, etc)

© EpiX Analytics LLC

What happens if I replicate uncertainties? Uncertainty: what the observer doesn’t know. • •

Impact: underestimation of output spread (e.g. variance). Correct modeling: represent uncertainty distribution only once in the model.

Example: We randomly sample 100 individuals in a region and 8 have a disease. How many total infected individuals will there be next week, given the below populations for areas A-E within the region? Area ID A B C D E

© EpiX Analytics LLC

Population in the region 10,000 5,000 6,000 1,000 9,000

Model

What happens if I replicate uncertainties? Correct modeling: uncertainty on proportion of disease p = Beta(8+1, 100-8+1) once. Then simulate infected/area using Binomial(population, p) and sum areas.

∑

Incorrect modeling of uncertainty: sample p for each area, then simulate and sum areas as above.

© EpiX Analytics LLC

∑

Does this matter? Certainly! Replicating uncertainties grossly underestimated the risk, even in this basic example.

Likewise (and to add to the confusion!), if I don’t replicate variability, risk will be overestimated © EpiX Analytics LLC

2.2. Reporting uncertainty and variability Cumulative Frequency of Data 1.0 0.9

Cumulative Frequency

1. Output = probability. Report distribution of uncertainty in probability:

0.8 0.7

0.6 0.5 0.4 0.3 0.2 0.1 0.0 0

5

10

15

20

25

30

35

40

45

Prevalence

1 0.9

0.8

Cumulative probability

2. If output = distribution. Report distributions representing uncertainty and variability (2 dimensions) of output:

0.7 0.6 0.5

0.4 0.3 0.2

0.1 0 0

20

40

60

80

100

120

Number of people affected

© EpiX Analytics LLC

140

160

180

200

Second order cumulative plot (M0406) Evaluating impact of variability and uncertainty in results

Variability dominates results

Variability and uncertainty equally affect results

© EpiX Analytics LLC

3. Dependencies matter! Important rule: every iteration (sample) in a simulation model has to be a possible scenario Correlated (dependent) random variables can’t be sampled independently from each other as this would create impossible scenarios e.g. Sampling a high S&P500 and a very low Dow Jones. Cost estimation of a project, independent of schedule risks

• •

•

Impact: typically underestimation of output variance (but can also result in overestimate)

© EpiX Analytics LLC

Correlation modeling options

• •

© EpiX Analytics LLC

Beta posterior

8

7

Be 6

5

4

5.5

5

4.5

4

3.5

3

2.5

3

2

•

9

1.5

•

Beta posterior vs Alpha posterior 10

Alpha posterior

38

36

34

NPV ($M)

•

Linear correlation: Rank order correlation. Most commonly used in @RISK, but limited. Non-linear: Bootstrap: flexible and easy to implement (M0264) Bayesian MC or MCMC: flexible, but harder to implement (M0052) Conditional model logic (e.g. IF statements): to model causal flow/conditionality (M0097) Copulas: more complex but restricted shapes

32

30

28

26 2000

2200

2400

a)

2600

2800

3000

3200

2800

3000

3200

BOPD

38

36

34

NPV ($M)

•

32 30

28 26 2000

c)

2200

2400

2600 BOPD

4. Separate inputs from calculations (M0088) NEVER hardcode an input variable within a calculation NEVER REALLY, DON’T DO IT! Instead, keep a separate sheet with inputs and refer to them within your calculations. Named ranges even better. Let’s see if I can convince you…

Client hired a consultancy to develop a MC forecasting model. Previously, model had predicted spot-on an unexpected growth However, lately the model has been getting it pretty wrong Hence, client called us to bring the model to its past glory © EpiX Analytics LLC

4. Separate inputs from calculations As standard practice, before working on somebody else’s work, we audit it … And in this case we found a little “surprise”. Below is one of the key calculations for the forecast =IF(OR(ISBLANK($E4),ISBLANK($G4),$T4="",ISBLANK($W4),$AR4="")," ,(IF(Summary!$H$6="X",VLOOKUP($E4,Actuals,24,FALSE),((Mar!AZ4*(1+(IF($T4=12,I F(Mar!AZ4

You got @RISK, now what? Building technically sound simulation

Short Description

Description

Comments

We need your help!