To give the user community more information about the quality of the
Comprehensive Pacific Rainfall Database, we have undertaken a series of
tests to examine the data in more detail.
The first test analyzed the percentage of zero/trace rainfall and the average
daily rainfall for each day of the week (Mon., Tues., etc.). The purpose
of this test was to uncover the human element in the recording
of data; for example, an employee might report erroneous values for
the weekend on returning on Monday. This examination showed that
one of the most dominant error patterns was failing to flag
rainfall as an accumulated value while incorrectly recording zero values
for the days before the accumulation. A clear example of this error pattern
is illustrated below:
| Station | # Sun. w/data | # Mon. w/data | # Tue. w/data | # Wed. w/data | # Thu. w/data | # Fri. w/data | # Sat. w/data |
| 14000 | 759 | 831 | 828 | 826 | 811 | 820 | 754 |
| Station | % Sun. zeros | % Mon. zeros | % Tue. zeros | % Wed. zeros | % Thu. zeros | % Fri. zeros | % Sat. zeros |
| 14000 | 0.72 | 0.26 | 0.41 | 0.38 | 0.45 | 0.43 | 0.72 |
| Station | Sun. mean rain | Mon. mean rain | Tue. mean rain | Wed. mean rain | Thu. mean rain | Fri. mean rain | Sat. mean rain |
| 14000 | 78.4 | 308.9 | 170.6 | 159.6 | 133.5 | 138.2 | 80.2 |
It is possible for the pattern above to occur over a relatively short
period of time, making it harder to detect. This was the main motivation
behind the next QC test, which examined daily rainfall values having a cumulative
gamma probability greater than or equal to 0.98 and having at least 5 preceding
zeros/trace/missing (gamma distributions were fitted separately for each
station).
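The test just described can be sketched in a few lines of Python. This is only an illustration, not the code used to produce the QC files. It assumes that daily amounts are in 10ths of mm, that a missing day is stored as None, that trace is stored as 0, and that the gamma distribution is fitted by the method of moments to the non-zero amounts (the fitting method actually used is not stated here). The function names are hypothetical.

```python
import math

def gamma_cdf(x, shape, scale):
    """Cumulative gamma probability, from the power series of the
    regularized lower incomplete gamma function."""
    if x <= 0:
        return 0.0
    z = x / scale
    term = 1.0 / shape
    total = term
    n = 1
    # Sum z^n / (shape * (shape+1) * ... * (shape+n)) until terms are negligible.
    while term > 1e-15 * total and n < 10000:
        term *= z / (shape + n)
        total += term
        n += 1
    return min(1.0, total * math.exp(-z + shape * math.log(z) - math.lgamma(shape)))

def fit_gamma_moments(wet_amounts):
    """Method-of-moments estimates (an assumption): returns (shape, scale)."""
    n = len(wet_amounts)
    mean = sum(wet_amounts) / n
    var = sum((v - mean) ** 2 for v in wet_amounts) / (n - 1)
    return mean * mean / var, var / mean

def flag_suspect(values, shape, scale, threshold=0.98, min_dry_run=5):
    """Return indices of amounts whose cumulative gamma probability is at
    least `threshold` and which follow at least `min_dry_run` consecutive
    zero/trace/missing days."""
    flagged = []
    dry_run = 0
    for i, v in enumerate(values):
        if v is None or v == 0:
            dry_run += 1
            continue
        if dry_run >= min_dry_run and gamma_cdf(v, shape, scale) >= threshold:
            flagged.append(i)
        dry_run = 0
    return flagged
```

For one station, the parameters would come from fit_gamma_moments applied to that station's non-zero amounts, and flag_suspect would then be run over its full daily series.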
Comparing rainfall records among neighboring stations was another QC
test that was performed. The comparisons were done over 5-year periods
for those atolls, raised atolls, and low island stations that recorded
a sufficient amount of data for the period. Initial results, while
perhaps showing some significant differences in daily rainfall statistics
among some of the neighboring stations, were inconclusive.
The last QC test used a tropical storm database in conjunction
with a tropical storm model (see descriptions below) to get estimates of
how much rain one might reasonably expect on tropical storm days.
Since the exact accuracy of the tropical storm model is unknown, one of
the most useful results of this analysis was to list instances when a station
recorded zero rainfall on a significant tropical storm day.
As a summary of how well each station performed on the QC tests
as a whole, a final score was calculated for each station. Only stations
which had a sufficient amount of data were scored. Since the scores
for this test are somewhat subjective, one should view them with caution
and not assume a station’s degree of reliability based on its score.
Also, stations with high-quality data for only a fraction of their overall
record would receive a low to medium score. In summary, these techniques
have been applied to examine the accuracy of the data, and to determine
potential problems. The use of the data is, of course, ultimately
left up to the individual user. We look forward to hearing comments
from the user community about the efficacy of these results. The
files produced by the different tests, and a description of the methods
and results, are provided below.
Description of QC files and methodologies
QC files (in Microsoft Excel, total space 4.76Mb):
Download ALL files in one master zip file.
1) histog.xls
2) station_stats.xls
3) large_amts98.xls
4) large_amts98_5.xls
5) large_amts10.xls
6) large_amts_accum.xls
7) stats71to75.xls
8) stats76to80.xls
9) stats81to85.xls
10) stats86to90.xls
11) stats91to97.xls
12) dow_table.xls
13) hurr_NH_all.xls
14) hurr_SH_all.xls
15) hurr_susp_zeros.xls
16) final_scores.xls
1) histog.xls download
Histogram of daily rainfall from ALL the stations in the Pacrain database from the period 1971 to 1997. Amounts are in 10ths of mm.
2) station_stats.xls download
(Note: missing values in station_stats.xls are given as –99999)
Lat and Lon : latitudes and longitudes given in decimal format
with S and W as negative.
Elev : elevation given in meters above sea level
Start YrMo, End YrMo : starting and ending years/months (YYYYMM) of
station record.
Tot. Poss. Meas. Days : indicates the total number of days between
the start and end dates of the station record.
NUM Missing : number of days when a ‘missing’ was recorded –
NOT the same as the number of days in data gaps.
Most Consec. Years No Meas. : indicates the largest data gap
in years
AVG, MED, MAX, STD of Daily Rain : these statistics given in
10ths of mm (MED=median, STD=standard deviation).
% Zero and Trace : percentage of zeros and trace in the set of
daily rainfall records (does not include the number of days from data gaps).
Gamma PDF – Alpha : gamma distribution shape parameter (dimensionless;
fitted to amounts in 10ths of mm).
Gamma PDF – Beta : gamma distribution scale (size) parameter (in 10ths of mm).
3) large_amts98.xls download
For each station, daily rainfall amounts with a cumulative gamma probability greater than or equal to 0.98 were selected (gamma distributions were fitted separately for each station). Each rainfall amount is given in 10ths of mm, along with the year, month, and day. In addition, the following fields are given :
Num. Days Accum. : some stations report rainfall as accumulated
and this figure gives the number of prior days which make up the accumulated
amount (0 indicates that the amount is not an accumulated value).
Num. of Prec. Zeros/Trace/Missing : the number of zeros/trace/missing
which precede the large rainfall. This statistic might be useful
since it was found that some stations fail to report values as accumulated
and instead fill in zero rainfall amounts between the accumulated values.
4) large_amts98_5.xls download
Same as #3 above except that only values from #3 which have at least 5 preceding zeros/trace/missing (and are therefore not reported as accumulated values) are selected. The threshold of at least 5 preceding zeros/trace/missing was chosen to increase the likelihood that the rainfall is in fact an accumulated value. These values are candidates for the error pattern described at the bottom of #3. NOTE : the zero values which precede these large amounts are potentially suspicious as well. Other fields that have been added are :
Trop. Storm Day? (1=y, 0=n) : indicates whether there was a tropical
storm in the vicinity on that day (within 480km, as defined by the tropical
storm model; note that the model rainfall rates at distances of 400km to
480km are on the order of 0.1mm/hr for depressions and 0.8mm/hr for
typhoons).
If Trop. Storm Day, amt. rain exp. (mm) : a hurricane model (described
in hurr_NH_all.xls below) along with a database of Pacific tropical storms
was used to determine estimates of expected rainfall during tropical storm
events given the storm’s intensity and proximity.
5) large_amts10.xls download
Daily rainfalls greater than 10 inches (for description of the tropical storm fields, see #4 above).
6) large_amts_accum.xls download
A small table showing accumulated rainfalls greater than or equal to 10 inches with a maximum of 3 days prior making up the accumulation.
7-11) statsYYtoYY.xls download
The purpose of creating these files was to map and compare daily rainfall
statistics over 5-year periods. Only atolls, raised atolls,
and low islands were considered. Since many stations do not have
continuous data, only stations with a minimum of 70% data during the period
in question were used (to ensure a good degree of time overlap).
The mapping analysis did show some (perhaps) significant differences, but
more work would be needed to reach conclusive results.
The rainfall amounts were given in 10ths of mm. The mean
and standard deviation without zero/trace amounts as well as the standard
deviation of all data were computed for each station. Also included
were the lag-1, 2, and 3 autocorrelations.
Maps of these statistics are available upon request.
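As an illustration of the statistics listed above, the following Python sketch computes them for one station and one period. It is not the code used to produce the files. It assumes missing days are stored as None, trace is stored as 0, sample (n-1) standard deviations are wanted, and the lagged products are normalized by the full-series sum of squares; none of these details are stated above. The function names are hypothetical.

```python
import statistics

def has_enough_data(values, min_fraction=0.70):
    """True if at least `min_fraction` of the days in the period have data."""
    observed = sum(1 for v in values if v is not None)
    return observed / len(values) >= min_fraction

def lag_autocorr(values, lag):
    """Lag-k autocorrelation, skipping pairs in which either day is missing."""
    obs = [v for v in values if v is not None]
    mean = statistics.fmean(obs)
    denom = sum((v - mean) ** 2 for v in obs)
    pairs = [(a, b) for a, b in zip(values, values[lag:])
             if a is not None and b is not None]
    if denom == 0 or not pairs:
        return float("nan")
    return sum((a - mean) * (b - mean) for a, b in pairs) / denom

def period_stats(values):
    """Mean and std of non-zero amounts, std of all data, lag-1..3 autocorrelations."""
    obs = [v for v in values if v is not None]
    wet = [v for v in obs if v > 0]
    return {
        "mean_wet": statistics.fmean(wet),
        "std_wet": statistics.stdev(wet),
        "std_all": statistics.stdev(obs),
        "autocorr": [lag_autocorr(values, k) for k in (1, 2, 3)],
    }
```

A station would only be passed to period_stats when has_enough_data returns True for the 5-year period in question.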
12) dow_table.xls download
A useful table which takes into account the human element of recording data on a daily basis. For each station and for each year, average rainfall and percentage of zero/trace were computed for each day of the week (Mon, Tues, etc. – hence the name “dow” which stands for “day of the week”). For each station and year, the table shows the sample size for each day of the week. Error bars for each average and zero percentage were computed as follows :
for the average : +/- 1.5σ/sqrt(N)
where σ is the sample standard deviation, and N is the sample length (where the sample is the set of values for a particular day of the week).
for zero percentage : +/- 1.5 sqrt[ p(1-p)/N ]
where p is the zero percentage, and N is the sample length.
For both the average and zero percentage, values were compared (among
the days of the week for each year) and whenever the difference was large
enough such that there was no overlap of error bars, a significant difference
was defined. In the table, a “1” indicates there is a significant
difference (“0” indicates no significant difference).
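The error bars and the no-overlap rule described above can be sketched as follows. This is a minimal Python illustration for one station and one year, not the code behind dow_table.xls. It assumes records are (date, amount) pairs with amounts in 10ths of mm, that trace is stored as 0, and that p is expressed as a fraction between 0 and 1. The function names are hypothetical.

```python
import math
from datetime import date

def dow_summary(records):
    """Per day of week (0=Mon ... 6=Sun): sample size, mean rainfall,
    fraction of zeros, and the 1.5-sigma error bars for both."""
    groups = {d: [] for d in range(7)}
    for day, amount in records:
        groups[day.weekday()].append(amount)
    summary = {}
    for dow, vals in groups.items():
        n = len(vals)
        if n < 2:
            continue  # sample standard deviation needs at least 2 values
        mean = sum(vals) / n
        s = math.sqrt(sum((v - mean) ** 2 for v in vals) / (n - 1))
        p = sum(1 for v in vals if v == 0) / n
        summary[dow] = {
            "n": n,
            "mean": mean,
            "mean_err": 1.5 * s / math.sqrt(n),
            "p_zero": p,
            "p_err": 1.5 * math.sqrt(p * (1 - p) / n),
        }
    return summary

def significant(a, err_a, b, err_b):
    """True when the intervals a +/- err_a and b +/- err_b do not overlap."""
    return abs(a - b) > err_a + err_b
```

Comparing every pair of days of the week with significant, once for the means and once for the zero fractions, gives the 1/0 indicators for that station and year.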
13 –14) hurr_(“NH” or “SH”)_all.xls download
A database of all Pacific tropical storms (ranging in intensity from
depression to typhoon) for both hemispheres for the years 1971-1997 was
used (the tropical storm database was obtained from data available to the
public at: 1) Unisys weather page, http://weather.unisys.com, and
2) JTWC’s Tropical Cyclone Best Track Data Site, http://www.npmoc.navy.mil/products/jtwc/best_tracks/index.html).
The tropical storm locations and intensity (highest sustained winds) in
the database are given every six hours (“best track”).
The goal of using this tropical storm data was to get an approximation
of how much rain to expect at a station given a storm’s intensity and proximity.
A rainfall rate model for northern hemisphere western Pacific tropical
storms (ranging from depression to typhoon) was used from Adler et al.,
Monthly Weather Review, 1981, #109, p. 506-521. This model used satellite
estimates of rainfall.
The intensity of many of the SH tropical storms from the 1970s was
not known. Since a value was needed in order to compute an expected
rainfall, 35kts was arbitrarily chosen. In addition, some station
measuring times are not known, in which case a measuring time of noon
local time was arbitrarily chosen.
A Normal distribution was fitted to the set of expected minus reported
rains and cumulative Normal probabilities were given for each such difference
(last field in the table).
15) hurr_susp_zeros.xls download
From the tropical storm files described above (#13-14), all zero rainfalls with cumulative Normal probabilities of expected minus reported rainfall greater than or equal to 0.75 were selected. These are zero rainfalls on significant tropical storm days. If the severity of the storm prevented rainfall measurement, then a “missing” instead of a “zero” should have been reported. Zero rainfalls were specifically selected since they fit the error pattern described above in #3. Therefore, any zeros that follow, as well as the first non-zero value after a zero on a tropical storm day, may be viewed with suspicion. Zero rainfalls were also selected because we do not know the accuracy of the tropical storm model. With the error pattern described in #3, it is also possible to get large negative differences in expected minus reported rainfall; this is taken into account in the final_scores.xls file.
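The Normal fit to the expected minus reported differences, and the selection of suspicious zeros, might look like the Python sketch below. It is an illustration only, not the code used to produce the files. It assumes expected and reported amounts are in the same units, that one Normal distribution is fitted to all differences passed in, and that the fit uses the sample mean and sample standard deviation. The function names are hypothetical.

```python
from statistics import NormalDist

def difference_probabilities(expected, reported):
    """Cumulative Normal probability of each (expected - reported) difference,
    using a Normal distribution fitted to the differences themselves."""
    diffs = [e - r for e, r in zip(expected, reported)]
    dist = NormalDist.from_samples(diffs)  # sample mean and sample std
    return [dist.cdf(d) for d in diffs]

def suspicious_zeros(expected, reported, threshold=0.75):
    """Indices of tropical storm days with a reported zero whose
    cumulative Normal probability is at least `threshold`."""
    probs = difference_probabilities(expected, reported)
    return [i for i, (r, p) in enumerate(zip(reported, probs))
            if r == 0 and p >= threshold]
```

A zero reported on a day with a small expected amount falls below the threshold and is not selected; only zeros on days where the model expects substantially more rain than usual are listed.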
16) final_scores.xls download
Using a few quality control tests, “reliability” scores for stations
with enough data can be computed. This is a subjective process which
highlights the potential reliability of a station’s rainfall record.
The three QCs used were: QC1-rainfall per day of the week statistics (see
dow_table.xls above), QC2-gamma distribution statistics (see large_amts98_5.xls
above), and QC3-rainfall statistics during tropical storm events (see #s
13-15 above). For QC1, only stations having at least 5 years with
each year having a minimum of 140 records were scored. For QC2,
only stations with at least 30 rainfalls (over the whole station record)
having cumulative gamma probability greater than or equal to 0.98 were
scored. For QC3, only stations with at least 30 total tropical storm
days (over the whole station record) were scored.
For each test, certain ratios were calculated (“raw” score) for each
station and a standardized anomaly was calculated for the ratio (the z-score
standardization method was used). For example, one of the QC1 tests
is the ratio of the number of years with no significant difference in zero
percentage to the total number of years (only years with a minimum of 140
records were considered). This ratio was found for each station and
then standardized (“S.A.” columns in the final_scores.xls table stand for
“standardized anomaly”). (NOTE: the QC3 score is the result of two
tests with the first QC3 test’s S.A. receiving double weight since the
accuracy of the tropical storm model is not known. The second QC3
test defines significant differences between expected and reported rains
as those whose cumulative Normal probabilities fall within the outer 10%
of either tail of the Normal distribution.)
The final score for each station is the average of each QC with the
following weights: QC1=4, QC2=1, QC3=1. The weight for QC1 was arbitrarily
chosen to be 4 since it is the most conclusive test. When fewer than
3 of the QC scores were known, either the average (keeping the same weights
above) or just the single QC score became the final score. For a
station to receive a final score, it had to have at least 5 years with
a minimum of 140 records each.
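The standardization and weighting described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the code behind final_scores.xls. It assumes the z-score uses the sample standard deviation across scored stations, and it reads the data requirement above as meaning that a station without a QC1 score receives NaN. The names are hypothetical.

```python
import math
from statistics import fmean, stdev

WEIGHTS = {"QC1": 4, "QC2": 1, "QC3": 1}

def standardize(raw):
    """Z-score each station's raw ratio; stations without a ratio (None) get NaN."""
    known = [v for v in raw.values() if v is not None]
    mu, sd = fmean(known), stdev(known)
    return {station: float("nan") if v is None else (v - mu) / sd
            for station, v in raw.items()}

def final_score(scores):
    """Weighted average of the available QC scores (weights 4, 1, 1).
    Returns NaN when QC1 is unavailable (our reading of the data requirement)."""
    if math.isnan(scores["QC1"]):
        return float("nan")
    available = {k: v for k, v in scores.items() if not math.isnan(v)}
    total_weight = sum(WEIGHTS[k] for k in available)
    return sum(WEIGHTS[k] * v for k, v in available.items()) / total_weight
```

With all three scores known, the divisor is 6; with QC1 and one other score it is 5; with QC1 alone the final score is simply the QC1 score.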
For stations which did not receive a final score due to there being too
little data, a “NaN” (“not a number” – kept this way for convenient reading
into a numerical matrix) was indicated. NOTE: This is a subjective
scoring system and it must be emphasized that stations with a relatively
low score are not necessarily unreliable. Also, if you are planning
to use data from a particular station which has a relatively high score,
it is still a good idea to examine the QC files listed above. For
example, while a station may have a decent score, it may be found in the
file dow_table.xls that there is clearly a suspicious rainfall pattern
over a 3-year period out of its 20-year record.