Application of Forecast Verification Science to Operational River Forecasting in the National Weather Service
Julie Demargne, James Brown, Yuqiong Liu and D-J Seo
UCAR
NROW, November 4-5, 2009
Approach to river forecasting
[Diagram elements: Observations, Input forecasts, Models, Forecasters, Forecast products, Users. The model panel is a soil moisture accounting schematic with upper and lower zones (tension and free water storages) and labels for evapotranspiration, infiltration, percolation, interflow, surface runoff, direct runoff, baseflow and subsurface outflow.]
Where is the …? In the past
Verification ???
• Limited verification of hydrologic forecasts
• How good are the forecasts for application X?
Where is the …? Now
Verification !!!
Papers
Verification Experts
Verification Products
Verification Systems
Hydrologic forecasting: a multi-scale problem
National
Forecast group
Major river system
River basin with river forecast points
Headwater basin with radar rainfall grid
High resolution flash flood basins
Hydrologic forecasts must be verified consistently across all spatial scales and resolutions.
Hydrologic forecasting: a multi-scale problem
[Figure: forecast uncertainty against forecast lead time (minutes, hours, days, weeks, months, seasons, years), with the benefits served: protection of life & property, hydropower, flood mitigation & navigation, recreation, agriculture, ecosystem, reservoir control, state/local planning, health, environment, commerce.]
Seamless probabilistic water forecasts are required for all lead times and all users; so is verification information.
Need for hydrologic forecast verification
• In 2006, the NRC recommended that the NWS expand verification of its uncertainty products and make it easily available to all users in near real time
  – Users decide whether to take action using risk-based decision making
  – Must educate users on how to interpret forecast and verification info
River forecast verification service
http://www.nws.noaa.gov/oh/rfcdev/docs/Final_Verification_Report.pdf
http://www.nws.noaa.gov/oh/rfcdev/docs/NWS-Hydrologic-Forecast-VerificationTeam_Final-report_Sep09.pdf.pdf
River forecast verification service
• To help us answer:
  – How good are the forecasts for application X?
  – What are the strengths and weaknesses of the forecasts?
  – What are the sources of error and uncertainty in the forecasts?
  – How are new science and technology improving the forecasts and the verifying observations?
  – What should be done to improve the forecasts?
  – Do forecasts help users in their decision making?
River forecast verification service
[Diagram elements: River forecasting system (Observations, Input forecasts, Models, Forecasters, Forecast products, Users), Verification systems, Verification products. The model panel repeats the soil moisture accounting schematic.]
River forecast verification service
• Verification Service within Community Hydrologic Prediction System (CHPS) to:
  – Compute metrics
  – Display data & metrics
  – Disseminate data & metrics
  – Provide real-time access to metrics
  – Analyze uncertainty and error in forecasts
  – Track performance
Verification challenges
• Verification is useful if the information generated leads to decisions about the forecast/system being verified
  – Verification needs to be user oriented
• No single verification measure provides complete information about the quality of a forecast product
  – Several verification metrics and products are needed
• To facilitate communication of forecast quality, common verification practices and products are needed from weather and climate forecasts to water forecasts
  – Collaborations between meteorology and hydrology communities are needed (e.g., Thorpex-Hydro, HEPEX)
Verification challenges: two classes of verification
• Diagnostic verification: to diagnose and improve model performance
  – done off-line with archived forecasts or hindcasts to analyze forecast quality relative to different conditions/processes
• Real-time verification: to help forecasters and users make decisions in real-time
  – done in real-time (before the verifying observation occurs) using information from historical analogs and/or past forecasts and verifying observations under similar conditions
Diagnostic verification products
• Key verification metrics for 4 levels of information for single-valued and probabilistic forecasts
1. Observations-forecasts comparisons (scatter plots, box plots, time series plots)
2. Summary verification (e.g. MAE/Mean CRPS, skill score)
3. More detailed verification (e.g. measures of reliability, resolution, discrimination, correlation, results for specific conditions)
4. Sophisticated verification (e.g. for specific events with ROC)
To be evaluated by forecasters and forecast users
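The level 2 summary metrics can be illustrated with a minimal sketch. It assumes the standard definitions of MAE, the empirical ensemble form of the CRPS, and a skill score of the form 1 - score/reference; the function names, the sample values and the choice of a constant climatology as reference are hypothetical and not taken from IVP or EVS.

```python
from statistics import mean


def mae(forecasts, observations):
    """Mean absolute error of single-valued forecasts."""
    return mean(abs(f - o) for f, o in zip(forecasts, observations))


def crps_ensemble(members, observation):
    """CRPS of one ensemble forecast, using the empirical form
    E|X - y| - 0.5 * E|X - X'| over the ensemble members."""
    m = len(members)
    term1 = sum(abs(x - observation) for x in members) / m
    term2 = sum(abs(x - y) for x in members for y in members) / (2 * m * m)
    return term1 - term2


def skill_score(score, reference_score):
    """Skill for negatively oriented scores (0 is perfect):
    1 is a perfect forecast, 0 matches the reference."""
    return 1.0 - score / reference_score


# Hypothetical verification pairs (arbitrary units).
observations = [10.0, 12.0, 8.0]
single_valued = [10.5, 11.0, 8.5]
ensembles = [[9.0, 10.0, 11.0], [10.0, 12.0, 14.0], [8.0, 8.0, 8.0]]
climatology = [10.0, 10.0, 10.0]  # assumed reference forecast

mae_forecast = mae(single_valued, observations)
mae_reference = mae(climatology, observations)
mae_skill = skill_score(mae_forecast, mae_reference)
mean_crps = mean(crps_ensemble(e, o) for e, o in zip(ensembles, observations))
```

For a single-member ensemble the CRPS reduces to the absolute error, which is why MAE and mean CRPS are paired as the summary metrics for single-valued and probabilistic forecasts.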
Diagnostic verification products
• Examples for level 1: scatter plot, box-and-whiskers plot
[Figure: scatter plot of forecast value against observed value, with a user-defined threshold marked.]
Diagnostic verification products
• Examples for level 1: box-and-whiskers plot
[Figure: box-and-whiskers plot for American River in California, 24-hr precipitation ensembles (lead day 1). x-axis: observed daily total precipitation [mm]; y-axis: forecast error (forecast - observed) [mm]. Each box shows the 'errors' for one forecast (min., 10%, 20%, median, 80%, 90%, max.) relative to the zero error line. Annotations: high bias, low bias, "blown" forecasts.]
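The box for a single forecast in such a plot can be computed as sketched below. This is an illustration only: the member values are invented, and the "inclusive" quantile method of Python's `statistics.quantiles` is an assumed choice, since the slides do not say how the percentiles are estimated.

```python
from statistics import quantiles


def error_summary(members, observation):
    """Box-and-whiskers summary of ensemble errors (forecast - observed)
    for one forecast: min, 10%, 20%, median, 80%, 90%, max."""
    errors = sorted(m - observation for m in members)
    # Nine decile cut points; index 0 is the 10% point, index 8 the 90% point.
    deciles = quantiles(errors, n=10, method="inclusive")
    return {
        "min": errors[0],
        "q10": deciles[0],
        "q20": deciles[1],
        "median": deciles[4],
        "q80": deciles[7],
        "q90": deciles[8],
        "max": errors[-1],
    }


# Hypothetical 11-member precipitation ensemble [mm] and its observation.
members = [float(i) for i in range(11)]
observed = 4.0
summary = error_summary(members, observed)
```

A box sitting mostly above the zero error line indicates that the ensemble was too wet for that observation, and one sitting mostly below it indicates that it was too dry.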
Diagnostic verification products
• Examples for level 2: skill score maps by months
[Figure: maps for January, April and October. Smaller score, better.]
Diagnostic verification products
• Examples for level 3: more detailed plots
[Figure: two score plots, one showing performance under different conditions and one showing performance for different months.]
Diagnostic verification products
• Examples for level 4: event specific plots
Event: > 85th percentile from observed distribution
[Figure: reliability diagram (observed frequency against predicted probability, with the perfect line) and discrimination plot (Probability of Detection, POD, against Probability of False Detection, POFD, with the perfect line).]
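One point on the discrimination (ROC) plot can be computed as in the sketch below. It assumes that each forecast has already been reduced to a probability that the event occurs (for example, the fraction of ensemble members above the 85th percentile threshold) and that a warning is counted when that probability reaches a chosen probability threshold. The function name and the data are hypothetical.

```python
def roc_point(probabilities, occurred, p_threshold):
    """Return (POD, POFD) when the event is predicted whenever the
    forecast probability is at least p_threshold."""
    hits = misses = false_alarms = correct_negatives = 0
    for p, event in zip(probabilities, occurred):
        warned = p >= p_threshold
        if event and warned:
            hits += 1
        elif event:
            misses += 1
        elif warned:
            false_alarms += 1
        else:
            correct_negatives += 1
    if hits + misses == 0 or false_alarms + correct_negatives == 0:
        raise ValueError("need at least one event and one non-event")
    pod = hits / (hits + misses)
    pofd = false_alarms / (false_alarms + correct_negatives)
    return pod, pofd


# Hypothetical forecast probabilities of exceeding the event threshold,
# and whether the observation actually exceeded it.
probabilities = [0.9, 0.4, 0.7, 0.2, 0.1, 0.3]
occurred = [True, True, False, False, False, False]

# Sweeping the probability threshold traces out the ROC curve.
roc_curve = [roc_point(probabilities, occurred, p) for p in (0.8, 0.5, 0.3)]
```

The perfect forecast sits at POD = 1 and POFD = 0; points on the diagonal POD = POFD show no discrimination.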
Diagnostic verification products
• Examples for level 4: user-friendly spread-bias plot
[Figure: spread-bias plot. 60% of the time, the observation should fall in the window covering the middle 60% (i.e. median ±30%). Annotations: "Hit rate" = 90%; 60%; "Underspread".]
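The hit rate behind a spread-bias plot can be sketched as follows: for each forecast, take the window covering the middle 60% of the ensemble and count how often the observation falls inside it. The quantile method, the function name and the sample data are assumptions made for illustration.

```python
from statistics import quantiles


def window_hit_rate(ensembles, observations, window_pct=60):
    """Fraction of observations falling inside the central window
    (window_pct percent, centred on the median) of their ensemble."""
    lower_pct = (100 - window_pct) // 2   # e.g. 20 for a 60% window
    upper_pct = 100 - lower_pct           # e.g. 80 for a 60% window
    hits = 0
    for members, obs in zip(ensembles, observations):
        # 99 percentile cut points; index k - 1 holds the k-th percentile.
        pct = quantiles(members, n=100, method="inclusive")
        if pct[lower_pct - 1] <= obs <= pct[upper_pct - 1]:
            hits += 1
    return hits / len(observations)


# Hypothetical data: four forecasts with identical 11-member ensembles.
ensembles = [[float(i) for i in range(11)] for _ in range(4)]
observations = [5.0, 2.0, 9.0, 8.5]

hit_rate = window_hit_rate(ensembles, observations)
```

A hit rate well below the nominal 60% suggests that the ensembles are too narrow, and a hit rate well above it suggests that they are too wide. With a sample this small the result only illustrates the computation.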
Diagnostic verification analyses
• Analyze any new forecast process with verification
• Use different temporal aggregations
  – Analyze verification statistic as a function of lead time
  – If similar performance across lead times, data can be pooled
• Perform spatial aggregation carefully
  – Analyze results for each basin and plot results on spatial maps
  – Use normalized metrics (e.g. skill scores)
  – Aggregate verification results across basins with similar hydrologic processes (e.g. by response time)
• Report verification scores with sample size
  – In the future, confidence intervals
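Reporting a statistic as a function of lead time together with its sample size can be sketched as below. The tuple layout `(lead_hours, forecast, observed)` and the values are hypothetical.

```python
from collections import defaultdict


def mae_by_lead_time(pairs):
    """Group verification pairs by lead time and return
    {lead_hours: (MAE, sample_size)}, ordered by lead time."""
    abs_errors = defaultdict(list)
    for lead_hours, forecast, observed in pairs:
        abs_errors[lead_hours].append(abs(forecast - observed))
    return {
        lead: (sum(errs) / len(errs), len(errs))
        for lead, errs in sorted(abs_errors.items())
    }


# Hypothetical verification pairs: (lead time in hours, forecast, observed).
pairs = [
    (6, 10.0, 9.0),
    (6, 12.0, 12.5),
    (24, 10.0, 7.0),
    (24, 12.0, 15.0),
    (24, 9.0, 9.0),
]

scores = mae_by_lead_time(pairs)
```

Keeping the sample size next to each score makes it easier to judge whether lead times with similar scores have enough data to be compared or pooled.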
Diagnostic verification analyses
• Evaluate forecast performance under different conditions
  – with time conditioning: by month, by season
  – with atmospheric/hydrologic conditioning:
      – low/high probability threshold
      – absolute thresholds (e.g., PoP, Flood Stage)
  – Check that sample size is not too small
• Analyze sources of uncertainty and error
  – Verify forcing input forecasts and output forecasts
  – For extreme events, verify both stage and flow
  – Sensitivity analysis to be set up at all RFCs: 1) what is the optimized QPF horizon for hydrologic forecasts? 2) do run-time modifications made on the fly improve forecasts?
Diagnostic verification software
• Interactive Verification Program (IVP) developed at OHD: verifies single-valued forecasts at given locations/areas
Diagnostic verification software
• Ensemble Verification System (EVS) developed at OHD: verifies ensemble forecasts at given locations/areas
Dissemination of diagnostic verification
• Example: WR water supply website http://www.nwrfc.noaa.gov/westernwater/
Data Visualization
Error
• MAE, RMSE
• Conditional on lead time, year
Skill
• Skill relative to Climatology
• Conditional
Categorical
• FAR, POD, contingency table (based on climatology or user definable)
Dissemination of diagnostic verification
• Example: OHRFC bubble plot online http://www.erh.noaa.gov/ohrfc/bubbles.php
Real-time verification
• How good could the ‘live’ forecast be?
[Figure: live forecast plotted with observations.]
Real-time verification
• Select analogs from a pre-defined set of historical events and compare with ‘live’ forecast
[Figure: observed values and live forecast, with Analog 1, Analog 2 and Analog 3, each showing the analog observed and analog forecast. Annotation: “Live forecast for Flood is likely to be too high”.]
Real-time verification
• Adjust ‘live’ forecast based on info from the historical analogs
[Figure: live forecast compared with what happened. Annotation: live forecast was too high.]
Real-time verification
• Example for ensemble forecasts
[Figure: temperature (°F) against forecast lead day, showing the live forecast (L), analog forecasts (H) selected with μH = μL ± 1.0 °C, and analog observations. Annotation: “Day 1 forecast is probably too high”.]
Real-time verification
• Build analog query prototype using multiple criteria
  – Seeking analogs for precipitation: “Give me past forecasts for the 10 largest events relative to hurricanes for this basin.”
  – Seeking analogs for temperature: “Give me all past forecasts with lead time 12 hours whose ensemble mean was within 5% of the live ensemble mean.”
  – Seeking analogs for flow: “Give me all past forecasts with lead times of 12-48 hours whose probability of flooding was >=0.95, where the basin-averaged soil-moisture was > x and the immediately prior observed flow exceeded y at the forecast issue time.”
Requires forecasters’ input!
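The temperature query quoted above can be sketched as a filter over an archive of past forecasts. This is an illustrative sketch, not the CHPS prototype: the `ArchivedForecast` record, its fields, the dates and the values are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ArchivedForecast:
    """Hypothetical archive record: one past ensemble forecast and the
    observation that later verified it."""
    issue_date: str
    lead_hours: int
    members: list
    observed: float


def find_analogs(archive, live_members, lead_hours, tolerance=0.05):
    """Past forecasts at the given lead time whose ensemble mean is
    within `tolerance` (a fraction) of the live ensemble mean."""
    live_mean = mean(live_members)
    return [
        f for f in archive
        if f.lead_hours == lead_hours
        and abs(mean(f.members) - live_mean) <= tolerance * abs(live_mean)
    ]


archive = [
    ArchivedForecast("2005-01-10", 12, [20.0, 21.0, 23.0], 19.5),
    ArchivedForecast("2006-01-15", 12, [25.0, 26.0, 27.0], 26.5),
    ArchivedForecast("2006-12-20", 24, [21.0, 21.0, 21.0], 21.0),
    ArchivedForecast("2007-02-03", 12, [20.0, 20.0, 21.0], 19.0),
]
live_members = [20.0, 21.0, 22.0]

analogs = find_analogs(archive, live_members, lead_hours=12)
# Mean error (ensemble mean - observed) of the analog forecasts.
analog_bias = mean(mean(f.members) - f.observed for f in analogs)
```

A positive `analog_bias` means that similar past forecasts ran too high, which is the kind of statement shown on the earlier real-time verification slides. The criteria themselves would still come from forecasters, as the slide notes.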
Outstanding science issues
• Define meaningful reference forecasts for skill scores
• Account for observational error (measurement and representativeness errors) and rating curve error
• Account for non-stationarity (e.g., climate change)
• Separate timing error and amplitude error in forecasts
• Verify rare events and specify sampling uncertainty in metrics
• Analyze sources of uncertainty and error in forecasts
• Consistently verify forecasts on multiple space and time scales
• Verify multivariate forecasts (issued at multiple locations and for multiple time steps) by accounting for statistical dependencies
Verification service development
[Diagram elements: OHD, OCWWS, NCEP, Forecasters, Users, Academia, Forecast agencies, Private. Collaborations: COMET-OHD-OCWWS collaboration on training; OHD-NCEP Thorpex-Hydro project; OHD-Deltares collaboration for CHPS enhancements; HEPEX Verification Test Bed (CMC, Hydro-Quebec, ECMWF).]
Looking ahead
• 2012:
  – Info on quality of forecast service available online
  – Real-time and diagnostic verification implemented in CHPS
  – RFC verification standard products available online along with forecasts
• 2015:
  – Leveraging grid-based verification tools
Thank you
Questions?
Extra slide
Diagnostic verification products
• Key verification metrics from NWS Verification Team report