Forecast Verification and Validation via Backtesting


A critical component of any forecast program, in any field, is verification and validation.  Many of the standard methods are listed at http://www.cawcr.gov.au/projects/verification/.  Although this site is concerned primarily with testing of weather and climate forecasts, the methods are general and have been adopted by, and applied to data from, many fields, including finance, earthquakes, and social systems.

Here we illustrate the use of these methods by "backtesting" the forecasts for Japan that have been the subject of previous posts.  Backtesting is the process of applying our forecast methods to past data, here the ANSS (Advanced National Seismic System) earthquake catalog for Japan, available for example at http://www.ncedc.org/cnss/catalog-search.html, and testing over previous periods of time to determine the reliability and skill of our probability calculations.  The portion of the catalog used to make the previous forecasts covers the region of Japan from latitudes 28 to 42 degrees north and longitudes 127 to 146 degrees east, at depths less than 40 km (25 miles).
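As a rough illustration of the catalog selection described above, the following Python sketch filters a list of events by the quoted latitude, longitude, and depth limits.  The `Event` record, the function names, and the choice of inclusive latitude/longitude bounds are assumptions made for this example; the post does not describe how the catalog was actually processed.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical catalog record; field names are illustrative only."""
    lat: float       # degrees north
    lon: float       # degrees east
    depth_km: float  # depth in kilometers
    mag: float       # magnitude


# Region limits quoted in the post.
LAT_MIN, LAT_MAX = 28.0, 42.0
LON_MIN, LON_MAX = 127.0, 146.0
MAX_DEPTH_KM = 40.0


def in_study_region(ev: Event) -> bool:
    """True if the event lies inside the box and is shallower than 40 km.

    Treating the latitude/longitude limits as inclusive is an assumption.
    """
    return (LAT_MIN <= ev.lat <= LAT_MAX
            and LON_MIN <= ev.lon <= LON_MAX
            and ev.depth_km < MAX_DEPTH_KM)


def select_events(events, min_mag=None):
    """Keep events in the study region, optionally with mag > min_mag."""
    return [ev for ev in events
            if in_study_region(ev) and (min_mag is None or ev.mag > min_mag)]
```

Calling `select_events(events, min_mag=7.2)` would then return only the large events of the kind the forecasts target.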

We consider large earthquakes, M>7.2, in the defined region of Japan.  The forecast of interest is for 24 months into the future; the average recurrence interval for such events is 31.5 months.  A forecast of such earthquakes involves several variables or parameters, so our first step is to determine which values of these parameters lead to forecasts with the maximum forecast skill, or minimum reliability error.  We therefore vary the parameters over a wide range and compute the reliability and skill for each set of parameters.  We then use the optimal parameters in computing our forecasts.
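The parameter scan can be sketched in Python as a simple grid search.  This is only an illustration under stated assumptions: the post does not name its skill measure or its model parameters, so the sketch uses the Brier skill score relative to climatology as a stand-in, and `forecast_fn` and `param_grid` are hypothetical placeholders for the real forecast model and its parameter ranges.

```python
from itertools import product


def brier_score(probs, outcomes):
    """Mean squared difference between forecast probability and 0/1 outcome."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


def brier_skill_score(probs, outcomes):
    """Skill relative to always forecasting the observed base rate.

    Assumption: the post does not say which skill score it uses.
    """
    climatology = sum(outcomes) / len(outcomes)
    reference = brier_score([climatology] * len(outcomes), outcomes)
    if reference == 0:
        raise ValueError("outcomes are all identical; skill is undefined")
    return 1.0 - brier_score(probs, outcomes) / reference


def grid_search(forecast_fn, param_grid, outcomes):
    """Try every parameter combination and return (best_score, best_params).

    forecast_fn(**params) is a hypothetical callable that returns one
    forecast probability per entry of `outcomes`.
    """
    names = list(param_grid)
    best = None
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = brier_skill_score(forecast_fn(**params), outcomes)
        if best is None or score > best[0]:
            best = (score, params)
    return best
```

An exhaustive scan like this is only practical for a handful of parameters; it is shown because it mirrors the "vary over a wide range, then keep the best" procedure described in the text.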

A systematic method involves the use of Reliability ("Attribute") diagrams, which are scatter plots of the observed frequency of such earthquakes, along the vertical axis, against forecast probability, along the horizontal axis.  A perfectly reliable forecast will have data that lie along the diagonal line from lower left to upper right.  Reliability tests are conditional, in that they are conditioned on the forecast ("given the forecast, what was the outcome?").  Our example of a reliability diagram is shown below, computed for the time period from 1990 to April 20, 2011 for earthquakes in Japan having M>7.2:

In the figure above, the dashed black diagonal line denotes perfect reliability.  The dashed diagonal red line is the zero-skill line.  The red dot represents the average, called the "climatology point".  The inset histogram represents the fraction of forecasts, made once per 3.65-day period from 1990 to April 20, 2011, having the indicated probability.  These methods are described in much more detail at the web site above and in references therein: http://www.cawcr.gov.au/projects/verification/.  The plot shown here was made with the parameters in the forecast model having the maximum skill.  Note that the blue dots and blue line of the reliability curve fluctuate about the black dashed diagonal line, indicating the existence of positive skill and relatively small reliability error.
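The quantities plotted in a reliability diagram can be computed with a short Python sketch such as the one below.  It bins the forecast probabilities, then reports the mean forecast probability, the observed frequency, and the forecast count for each bin.  The function names and the equal-width binning are assumptions for illustration, and `no_skill_line` uses the conventional attributes-diagram definition (halfway between the diagonal and the climatology level); the post does not say how its own red line was drawn.

```python
def reliability_curve(probs, outcomes, n_bins=10):
    """Return (climatology, rows) for a reliability diagram.

    probs    : forecast probabilities in [0, 1]
    outcomes : 1 if an event followed the forecast, else 0
    rows     : (mean forecast prob, observed frequency, count) per non-empty bin
    """
    if len(probs) != len(outcomes) or not probs:
        raise ValueError("probs and outcomes must be non-empty and equal length")
    sums = [0.0] * n_bins
    hits = [0] * n_bins
    counts = [0] * n_bins
    for p, o in zip(probs, outcomes):
        if not 0.0 <= p <= 1.0:
            raise ValueError("forecast probabilities must lie in [0, 1]")
        b = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        sums[b] += p
        hits[b] += o
        counts[b] += 1
    climatology = sum(outcomes) / len(outcomes)  # overall observed frequency
    rows = [(sums[b] / counts[b], hits[b] / counts[b], counts[b])
            for b in range(n_bins) if counts[b]]
    return climatology, rows


def no_skill_line(p, climatology):
    """Conventional zero-skill line: midway between y = p and y = climatology."""
    return 0.5 * (p + climatology)
```

Dividing each bin count by the total number of forecasts gives the fractions shown in an inset histogram of the kind described above.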

We can also plot a temporal Receiver Operating Characteristic ("ROC") diagram.  This type of diagram plots the fraction of successful forecasts ("hits") along the vertical axis against the fraction of "false alarms" along the horizontal axis.

In this diagram, the black dashed diagonal line represents the "no skill" line, which would be attained by a completely random forecast.  Here we would like the fluctuating blue data line to be in the upper left-hand corner of the plot, in the region of a high fraction of hits at a small false alarm rate.  For example, the plot shows that a hit fraction of about 0.5 (50%) is reached at a false alarm rate of only 0.1 (10%), where the roughly 2130 forecasts, one for each 3.65-day period, are each scored by whether an M>7.2 earthquake followed within the 24-month time window.
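A minimal Python sketch of how the points of a ROC curve can be computed is given below.  It assumes the usual definitions: an alarm is declared whenever the forecast probability reaches a threshold, the hit rate is the fraction of event periods that had an alarm, and the false alarm rate is the fraction of non-event periods that had an alarm.  The function name and the threshold rule are assumptions for illustration; the post does not spell out how its "temporal" ROC diagram was constructed.

```python
def roc_points(probs, outcomes, thresholds):
    """Return a list of (false_alarm_rate, hit_rate), one per threshold.

    probs      : forecast probability for each time period
    outcomes   : 1 if an event followed within the forecast window, else 0
    thresholds : probability levels at or above which an alarm is declared
    """
    n_events = sum(outcomes)
    n_quiet = len(outcomes) - n_events
    if n_events == 0 or n_quiet == 0:
        raise ValueError("need at least one event and one non-event period")
    points = []
    for t in thresholds:
        # Alarms that were followed by an event.
        hits = sum(1 for p, o in zip(probs, outcomes) if p >= t and o == 1)
        # Alarms that were not followed by an event.
        false_alarms = sum(1 for p, o in zip(probs, outcomes) if p >= t and o == 0)
        points.append((false_alarms / n_quiet, hits / n_events))
    return points
```

Sweeping the threshold from 1 down to 0 traces the curve from the lower left to the upper right; a forecast with no skill would stay near the diagonal.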

Using the optimal forecast parameters, we can now plot the probability timeseries for M>7.2 earthquakes within the region of interest:
