The Photogrammetric Record 23(122): 148–169 (June 2008)
ACCURACY ASSESSMENT OF LIDAR-DERIVED DIGITAL
ELEVATION MODELS
Fernando J. Aguilar (faguilar@ual.es)
University of Almeria, Spain
Jon P. Mills (j.p.mills@newcastle.ac.uk)
Newcastle University
(Based on a contribution to the Annual Conference of the Remote Sensing and
Photogrammetry Society at Newcastle upon Tyne, 14th September 2007)
Abstract
Despite the relatively high cost of airborne lidar-derived digital elevation models
(DEMs), such products are usually presented without a satisfactory associated
estimate of accuracy. In most cases, DEM accuracy estimates are
provided by comparing lidar heights against a finite sample of check point coordinates
from an independent source of higher accuracy, supposing a normal distribution of the
derived height differences or errors. This paper proposes a new methodology to assess
the vertical accuracy of lidar DEMs using confidence intervals constructed from a
finite sample of errors computed at check points. A non-parametric approach has been
tested where no particular error distribution is assumed, making the proposed
methodology especially applicable to non-normal error distributions of the type
usually found in DEMs derived from lidar. The performance of the proposed model
was experimentally validated using Monte Carlo simulation on 18 vertical error data-sets. Fifteen of these data-sets were computed from original lidar data provided by the
International Society for Photogrammetry and Remote Sensing Working Group III/3,
using their respective filtered reference data as ground truth. The three remaining
data-sets were provided by the Natural Environment Research Council’s Airborne
Research and Survey Facility lidar system, together with check points acquired using
high precision kinematic GPS. The results proved promising, the proposed models
reproducing the statistical behaviour of vertical errors of lidar using a favourable
number of check points, even in the cases of data-sets with non-normally distributed
residuals. This research can therefore be considered as a potentially important step
towards improving the quality control of lidar-derived DEMs.
Keywords: accuracy assessment, confidence intervals, DEM, lidar
Introduction
Accurate digital elevation models (DEMs) of high spatial resolution from airborne lidar
data are in increasing demand for a growing number of mapping and GIS tasks related to a
wide variety of applications including forest management, urban planning, bird population
© 2008 The Authors. Journal Compilation © 2008 The Remote Sensing and Photogrammetry Society and Blackwell Publishing Ltd.
Blackwell Publishing Ltd. 9600 Garsington Road, Oxford OX4 2DQ, UK and 350 Main Street Malden, MA 02148, USA.
modelling, ice sheet mapping, flood control and road design (Lim et al., 2003). Lidar DEMs
are also being increasingly used for new applications relating to change detection and
geopositioning (James et al., 2006; Rodarmel et al., 2006; Miller et al., 2007). However,
despite the growing range of applications, and in spite of the relatively high cost of this new
type of digital product, such DEMs are usually presented without an appropriate associated
estimate of their accuracy. Indeed, with a popular estimate being that 80% of all data used by
managers and decision-makers is spatially referenced (van Oort and Bregt, 2005), many
researchers have stressed the need to deal with issues of spatial quality. A recent,
comprehensive review regarding the importance of assessing error in DEMs can be found in
Fisher and Tate (2006).
In most cases, users of lidar-derived DEMs are content with some nominal specifications
about raw data accuracy, as supplied by the instrument manufacturer or the data provider. Such
statistics are often inappropriate given that the major component of the total error could well be
the error introduced during the processing of the raw data (data filtering, gridding,
segmentation and/or object reconstruction), rather than the error induced during the data
capture process (which is related to vertical and horizontal error in the positioning of the laser
platform, laser scan angle, surface reflectivity and terrain slope). In fact, gridding error can
comprise a very important, and often neglected, source of error which should be taken into
account (Smith et al., 2005; Aguilar et al., 2006). For example, it may be especially relevant for
canopy height model estimation in forestry applications where there are fewer ground-return
samples for effective DEM surface interpolation (Lim et al., 2003; Clark et al., 2004;
Hopkinson et al., 2004; Su and Bork, 2006).
Although there have been a number of studies dealing with lidar-derived DEM error,
most can be classified as empirical work in which the influence of different variables on
DEM error was analysed (Hopkinson et al., 2004; Hyyppä et al., 2005; Goodwin et al.,
2006; Göpfert and Heipke, 2006; Su and Bork, 2006). In practice, statistical tests are the
only way to ensure that requirements are met at a moderate cost by using an inference
process based on sampling theory. At present, the great majority of DEM accuracy standards
are based on computing the vertical accuracy of a finite sample data-set (check points) from
the differences (residuals) between data-set heights and height values from an independent
source of higher accuracy, usually obtained by differential GPS techniques and supposing
those residuals follow a normal distribution. This is the case, for example, for the “National
Standard for Spatial Data Accuracy” (NSSDA) document published by the US Federal
Geographic Data Committee (FGDC, 1998). However, it should be recognised that vertical
errors in lidar DEMs often follow a non-normal distribution when they are captured over
non-open terrain (for instance, over vegetated or built-up areas) and, therefore, both filtering
and gridding can introduce a large quantity of non-random noise (Flood, 2004; Hopkinson
et al., 2004; Göpfert and Heipke, 2006), presenting high kurtosis and skewness due to the
presence of systematic errors and outliers. The presence of systematic errors is relatively
frequent in lidar-derived DEMs of forest areas where non-ground samples (such as low
vegetation and logs) are included in the DEM interpolation process and the resulting mean
error tends to be positive. In such cases a lidar-derived DEM tends to overestimate the
reference elevation (Clark et al., 2004).
Concerned about the likely non-normal nature of lidar-derived DEM error under non-open
terrain, the American Society for Photogrammetry and Remote Sensing (ASPRS), through the
work of the ASPRS Lidar Committee, recently approved a new set of guidelines (Flood, 2004).
The guidelines recommend that for open terrain, where error distribution is thought to be close
to a normal distribution, the methodology proposed by the NSSDA should be adopted. That is,
for assessing vertical accuracy at 95% confidence level:
$$\text{Vertical accuracy} = 1.96\,\mathrm{rmse}_z \qquad (1)$$
where $\mathrm{rmse}_z$ is the vertical root mean square error. The resulting figure is known as the
“fundamental” vertical accuracy. Away from open terrain, the fundamental vertical
accuracy is replaced by the so-called “supplemental” vertical accuracy, tested using the
95th percentile method. The 95th percentile method may be used regardless of whether
or not the errors follow a normal distribution and whether or not errors qualify as
outliers, indicating that 95% of the errors in the data-set will have absolute values of
equal or lesser value and 5% of the errors will be of larger value. Regarding the number
of necessary check points, the ASPRS recommends a minimum of 20 check points (30
preferred) in each of the major land cover categories represented in the surveyed area.
This gives rise to the question of exactly how many check points should be used in
order to guarantee that a certain confidence level is achieved. Conversely, it may be
necessary to derive the reliability of fundamental and supplemental vertical
accuracy estimates when a particular number of check points is used. Deriving
definitive values is not easy, and the ASPRS guidelines do not offer advice with respect
to these issues.
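The two ASPRS measures described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the percentile is obtained by linear interpolation between order statistics, which is one common convention rather than something prescribed in the text above.

```python
import math


def fundamental_accuracy(errors):
    """Equation (1): 1.96 times the vertical rmse (relies on normal errors)."""
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return 1.96 * rmse


def supplemental_accuracy(errors, q=0.95):
    """q-th percentile of the absolute errors (no distribution assumed)."""
    ordered = sorted(abs(e) for e in errors)
    pos = q * (len(ordered) - 1)          # fractional rank (assumed convention)
    low = math.floor(pos)
    high = min(low + 1, len(ordered) - 1)
    return ordered[low] + (pos - low) * (ordered[high] - ordered[low])


# Hypothetical height differences (m) at ten check points, one of them large.
errors = [0.05, -0.02, 0.10, 0.01, -0.07, 0.45, 0.03, -0.01, 0.02, 0.08]
print(round(fundamental_accuracy(errors), 3))
print(round(supplemental_accuracy(errors), 3))
```

With so few check points a single large residual can dominate either figure, which is the reliability question raised in the paragraph above.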
Addressing the aforementioned issues, the main objective of this paper is the development
of a theoretical methodology to assess lidar DEM vertical accuracy. Vertical accuracy is
estimated by means of the construction of confidence intervals from a finite sample of residuals
computed at check points evenly distributed across the whole survey area. One of the goals is
to derive the minimum number of check points needed to guarantee a certain confidence level.
In this way, a non-parametric approach based on the theory of estimating functions has been
adopted and tested. This approach means that no particular distribution of residuals at check
points is assumed and so it is applicable to non-normal error distributions which are usually
found in lidar-derived DEMs.
Development of the Proposed Models
The models tested in the work reported here concern two different scenarios:
(1) Construction of confidence intervals relating to the distribution of lidar vertical errors
from a finite sample of residuals coming from a population not necessarily expected to
be normally distributed.
(2) Construction of confidence intervals for the assessment of an average error statistic
for which the meaning will easily be understood, both on the part of the user and the
producer of lidar-derived DEMs. In this case rmse, as a widely used average error
statistic, was chosen.
Both of the contemplated scenarios avoid the very common assumption of normally
distributed errors, as adopted in the majority of current vertical accuracy standards.
Model for Estimating Lidar Vertical Accuracies
Suppose that the difference between a lidar-derived DEM and ground truth is a random
variable X. This being the case, a sample of N height differences ($X = \{x_1, x_2, \ldots, x_N\}$) may be used to estimate
the mathematical expectation (mean value, $\mu$) and dispersion (standard deviation, $\sigma$) of the said
height differences for the whole surface. This is the procedure adopted by
the NSSDA standards and the ASPRS Lidar Committee, as was previously discussed.
The first step is to obtain a confidence interval for $\mu$ by means of statistical inference,
taking into account the likely non-normal nature of the errors. It is therefore necessary to
determine the confidence interval using a non-parametric approach where no particular
distribution is assumed. This issue can be resolved using the theory of estimating functions, as
demonstrated by Aguilar et al. (2007a). An outline of the main steps in deriving equations (2)
and (3) can also be found in the Appendix located at the end of this paper.
$$\mu_{upper} = \bar{x} + \frac{-\dfrac{\gamma_{2m}+2}{\gamma_{1m}} + \sqrt{\left(\dfrac{\gamma_{2m}+2}{\gamma_{1m}}\right)^{2} + 4\left(\dfrac{t_\alpha\sqrt{(\gamma_{2m}+2)(\gamma_{2m}+2-\gamma_{1m}^{2})}}{|\gamma_{1m}|} + 1\right)}}{2}\,\frac{\sigma}{\sqrt{N}} \qquad (2)$$

$$\mu_{lower} = \bar{x} + \frac{-\dfrac{\gamma_{2m}+2}{\gamma_{1m}} - \sqrt{\left(\dfrac{\gamma_{2m}+2}{\gamma_{1m}}\right)^{2} + 4\left(\dfrac{t_\alpha\sqrt{(\gamma_{2m}+2)(\gamma_{2m}+2-\gamma_{1m}^{2})}}{|\gamma_{1m}|} + 1\right)}}{2}\,\frac{\sigma}{\sqrt{N}} \qquad (3)$$
where $\bar{x}$ is the sample mean, $N$ is the sample size, $\sigma$ is the value of the population standard
deviation (which will be estimated by the sample standard deviation $Sd$), and finally $\gamma_{1m}$ and $\gamma_{2m}$
are the skewness and standardised kurtosis of the sampling distribution of the sample mean,
respectively. Skewness and standardised kurtosis can be estimated from a finite sample by
means of the following expressions:
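$$\gamma_1 = \frac{N\sum_{i=1}^{N}(x_i-\bar{x})^{3}}{(N-1)(N-2)\,Sd^{3}}; \qquad \gamma_{1m} = \frac{\gamma_1}{\sqrt{N}} \qquad (4)$$

$$\gamma_2 = \frac{N(N+1)\sum_{i=1}^{N}(x_i-\bar{x})^{4}}{(N-1)(N-2)(N-3)\,Sd^{4}} - \frac{3(N-1)^{2}}{(N-2)(N-3)}; \qquad \gamma_{2m} = \frac{\gamma_2}{N}. \qquad (5)$$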
The use of $t_\alpha$, a one-tailed critical value corresponding to a confidence level of $1-\alpha$,
instead of $z_\alpha$ should be noted in equations (2) and (3). This change is recommended when
estimating confidence intervals using the sample standard deviation instead of $\sigma$, especially
when working with small samples (fewer than 100 observations). In this case it is more
appropriate to employ Student’s t-distribution, calculating $t_\alpha$ values for $N-1$ degrees of
freedom, $N$ being the sample size (Steel and Torrie, 1980).
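As an illustration of equations (4) and (5), the following Python sketch estimates the skewness and standardised kurtosis of a sample of residuals and rescales them to the sampling distribution of the mean. The function names are hypothetical, and $Sd$ is taken here to be the sample standard deviation with divisor $N-1$, which is an assumption.

```python
import math


def sample_sd(x):
    """Sample standard deviation Sd with divisor N - 1 (assumed convention)."""
    n = len(x)
    mean = sum(x) / n
    return math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))


def skewness(x):
    """gamma_1 of equation (4); needs at least 3 observations."""
    n = len(x)
    mean = sum(x) / n
    sd = sample_sd(x)
    return n * sum((v - mean) ** 3 for v in x) / ((n - 1) * (n - 2) * sd ** 3)


def standardised_kurtosis(x):
    """gamma_2 of equation (5); needs at least 4 observations."""
    n = len(x)
    mean = sum(x) / n
    sd = sample_sd(x)
    first = (n * (n + 1) * sum((v - mean) ** 4 for v in x)
             / ((n - 1) * (n - 2) * (n - 3) * sd ** 4))
    return first - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))


def shape_of_mean(x):
    """gamma_1m and gamma_2m: shape of the sampling distribution of the mean."""
    n = len(x)
    return skewness(x) / math.sqrt(n), standardised_kurtosis(x) / n


# Hypothetical residuals (m) with one large positive value.
residuals = [0.02, -0.01, 0.05, 0.00, 0.03, 0.60, -0.02, 0.04]
print(shape_of_mean(residuals))
```

Dividing by $\sqrt{N}$ and $N$ shows why the sampling distribution of the mean approaches symmetry as the number of check points grows.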
Once the confidence interval for the mean is determined, which therefore includes the
disturbance effects of error bias, skewness and kurtosis, the second step implies treating data as
if it came from a normal population. Under this hypothesis the following expressions can be
written to estimate the maximum and minimum expected vertical error ($x_{upper}$ and $x_{lower}$,
respectively) for a $1-\alpha$ confidence level:
$$x_{upper} = \mu_{upper} + t_\alpha\,Sd \qquad (6)$$
$$x_{lower} = \mu_{lower} - t_\alpha\,Sd. \qquad (7)$$
Systematic errors are not assumed to have been removed and so the mean is not
necessarily expected to be zero. This phenomenon is frequently encountered when working
with lidar data in afforested areas where it is usual to find a positive bias due to dense low lying
vegetation beneath the tree canopy (Kraus and Pfeifer, 1998; Goodwin et al., 2006; Göpfert
and Heipke, 2006; Su and Bork, 2006).
Model for Estimating Lidar-Derived DEM Accuracy Using Rmse
In this case it is necessary to construct a confidence interval to estimate
uncertainty when rmse is employed to evaluate lidar-derived DEM vertical accuracy,
again supposing a non-normal error distribution. The approach proposed by Aguilar et al.
(2007a), based on estimating functions theory, was again adopted to obtain the following
equations:
$$rmse_{upper} = \sqrt{\,mse + \frac{-\dfrac{\gamma_{2mse}+2}{\gamma_{1mse}} + \sqrt{\left(\dfrac{\gamma_{2mse}+2}{\gamma_{1mse}}\right)^{2} + 4\left(\dfrac{t_\alpha\sqrt{(\gamma_{2mse}+2)(\gamma_{2mse}+2-\gamma_{1mse}^{2})}}{|\gamma_{1mse}|} + 1\right)}}{2}\,\sigma_{mse}} \qquad (8)$$

$$rmse_{lower} = \sqrt{\,mse + \frac{-\dfrac{\gamma_{2mse}+2}{\gamma_{1mse}} - \sqrt{\left(\dfrac{\gamma_{2mse}+2}{\gamma_{1mse}}\right)^{2} + 4\left(\dfrac{t_\alpha\sqrt{(\gamma_{2mse}+2)(\gamma_{2mse}+2-\gamma_{1mse}^{2})}}{|\gamma_{1mse}|} + 1\right)}}{2}\,\sigma_{mse}} \qquad (9)$$
where mse is the mean square error, $\sigma_{mse}$ is the standard deviation of the mse sampling
distribution, and finally $\gamma_{1mse}$ and $\gamma_{2mse}$ are the skewness and standardised kurtosis of the mse,
respectively. All of the described moments of the squared errors may be estimated from a
sample of N check points by employing the following expressions:
$$mse = \frac{\sum_{i=1}^{N} x_i^{2}}{N} \qquad (10)$$

$$Sd_{x^2} = \sqrt{\frac{\sum_{i=1}^{N}(x_i^{2}-mse)^{2}}{N}}; \qquad \sigma_{mse} = \frac{\sigma_{x^2}}{\sqrt{N}} \approx \frac{Sd_{x^2}}{\sqrt{N}} \qquad (11)$$

$$\gamma_{1x^2} = \frac{N\sum_{i=1}^{N}(x_i^{2}-mse)^{3}}{(N-1)(N-2)\,Sd_{x^2}^{3}}; \qquad \gamma_{1mse} = \frac{\gamma_{1x^2}}{\sqrt{N}} \qquad (12)$$

$$\gamma_{2x^2} = \frac{N(N+1)\sum_{i=1}^{N}(x_i^{2}-mse)^{4}}{(N-1)(N-2)(N-3)\,Sd_{x^2}^{4}} - \frac{3(N-1)^{2}}{(N-2)(N-3)}; \qquad \gamma_{2mse} = \frac{\gamma_{2x^2}}{N}. \qquad (13)$$
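The first two of these quantities can be sketched in Python as follows. This is a minimal illustration and not the authors' code: the function name is hypothetical, and the divisor $N$ used for the dispersion of the squared errors follows equation (11) as read here, which should be treated as an assumption.

```python
import math


def squared_error_moments(errors):
    """Return (mse, Sd of squared errors, sigma_mse) per equations (10)-(11)."""
    n = len(errors)
    squares = [e * e for e in errors]
    mse = sum(squares) / n                                    # equation (10)
    sd_sq = math.sqrt(sum((s - mse) ** 2 for s in squares) / n)
    sigma_mse = sd_sq / math.sqrt(n)                          # equation (11)
    return mse, sd_sq, sigma_mse


# Hypothetical vertical errors (m) at four check points.
mse, sd_sq, sigma_mse = squared_error_moments([1.0, -1.0, 2.0, -2.0])
print(mse, math.sqrt(mse), sd_sq, sigma_mse)
```

The skewness and kurtosis of the squared errors in equations (12) and (13) have the same form as equations (4) and (5), applied to $x_i^2$ in place of $x_i$.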
Methodology
Test Data
The study sites concern lidar surveys of two different areas. The first area comprises the
data-sets from the second phase of the European Organisation for Experimental Photogrammetric
Research (OEEPE, now European Spatial Data Research, EuroSDR) project on laser
scanning. Data of the second area was acquired in a series of lidar surveys carried out by the
Natural Environment Research Council’s Airborne Research and Survey Facility (NERC
ARSF).
Data-sets from the EuroSDR Project on Laser Scanning
As part of the second phase of the EuroSDR project on laser scanning, different lidar data-sets, first and second pulse return, were acquired over the Vaihingen/Enz and Stuttgart city
centre (Germany) test sites using an Optech ALTM laser scanner. International Society for
Photogrammetry and Remote Sensing (ISPRS) Working Group III/3 has previously used these
data-sets for testing numerous filtering algorithms (Sithole and Vosselman, 2004) and data has
been made available through the working group’s website (http://www.commission3.isprs.org/wg3).
A total of seven study sites (four urban and three rural) were selected for testing the
aforementioned models, with sites chosen for their varied characteristics and diverse feature
content (open fields, dense buildings, large buildings, quarries, bridges, dense vegetation and
steep slopes, among others). The selected sites were appropriate to enable validation of the
proposed models against a very wide range of operational conditions. Fifteen sub-sites were
extracted from the original data (Table I).
For a better understanding of the nature of the data, a planimetric view of Site 2 and
corresponding reference sub-sites are depicted in Fig. 1. Reference data (ground truth) was
generated by manual filtering of the height data-set to segment bare earth data, the process
assisted by knowledge of the landscape and aerial photographs (see Sithole and Vosselman
(2003) for further details).
Table I. Characteristics of the selected EuroSDR data-sets.

| Site | Point spacing | Reference data (sub-site) | Number of points | Characteristics |
|---|---|---|---|---|
| Site 1 | 1 m | Sample 11 | 17 626 | Steep slopes, mixture of vegetation and buildings on hillside, buildings on hillside |
| | 1 m | Sample 12 | 25 344 | |
| Site 2 | 1 m | Sample 21 | 10 055 | Large and irregularly shaped buildings, small tunnel, road and bridge |
| | 1 m | Sample 22 | 21 570 | |
| | 1 m | Sample 23 | 11 025 | |
| | 1 m | Sample 24 | 3755 | |
| Site 3 | 1 m | Sample 31 | 15 544 | Vegetation between a high density of buildings, building with eccentric roof, open terrain with mixture of low and high features |
| Site 4 | 1 m | Sample 41 | 1653 | Railway station with trains (low density of ground points) |
| | 1 m | Sample 42 | 11 989 | |
| Site 5 | 2 m | Sample 51 | 13 860 | Steep vegetated slopes, quarry (sharp break lines), vegetated river bank |
| | 2 m | Sample 52 | 17 678 | |
| | 2 m | Sample 53 | 25 113 | |
| | 2 m | Sample 54 | 3948 | |
| Site 6 | 2 m | Sample 61 | 31 531 | Large buildings, road including embankments |
| Site 7 | 2 m | Sample 71 | 12 786 | Road, underpass, road including embankments |

Fig. 1. Site 2 with corresponding reference sub-sites (Samples 21, 22, 23 and 24) shown as perspective views.

Data-sets from the NERC ARSF
Filey Bay is located approximately 11 kilometres south of the coastal town of
Scarborough in North Yorkshire, UK. Three lidar data-sets of Filey Bay were acquired using
an Optech ALTM 3033 scanner (operating at 1000 m flight altitude) in April 2005 (data-set
0405), August 2005 (0805) and May 2006 (0506); Miller et al. (2007) contains further details
of the surveys. All issues related to georeferencing were carried out by the NERC ARSF. As
independent quality control, delivered lidar data was compared against a testfield based on
check points acquired during a previous survey in 2001 using kinematic GPS (anticipated
vertical accuracy of around 1 to 2 cm). The test area consisted of a hard surface, an asphalt road
circuit, located at Filey Brigg Country Park (Fig. 2) that enabled an open terrain vertical
accuracy evaluation of the lidar data to be undertaken. In this case no error from filtering or
gridding is expected, and so all error can be assumed to arise from the data capture process.
Table II gives statistical parameters associated with the vertical errors computed for the Filey
Bay data-sets. It should be highlighted that the error distribution generally fitted a normal
distribution except in the case of the 0506 lidar data, where the “3-sigma” rule (Daniel and
Tennant, 2001) had to be applied to remove three likely outliers. Notice that the Kolmogorov–
Smirnov (K-S) test (Royston, 1982), with a confidence level of 95%, indicated that the
error distribution for each data-set retained (0405, 0805 and the 3-sigma corrected 0506) was compatible with a normal probability distribution because
the K-S statistic was always lower than the 95% critical value. Despite the 0805 lidar data-set
presenting an absolute value of skewness higher than the critical value of 0.5, proposed by
some authors as a reasonable limit to consider data to come from a normal distribution (Daniel
and Tennant, 2001), no outlier correction was applied because of the results yielded by the K-S
test. Therefore, these data-sets were subsequently used for testing the proposed model to
construct rmse confidence intervals from normally distributed lidar error data.
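The normality check described above can be sketched as follows. This is a hedged illustration, not the procedure of Royston (1982): the function names are hypothetical, the normal distribution is fitted with the sample mean and standard deviation (an assumption), and the 95% critical value uses the large-sample approximation 1.36/√n, with which the critical values quoted in Table II appear consistent.

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def ks_statistic(sample):
    """Largest gap between the empirical CDF and the fitted normal CDF."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in sample) / (n - 1))
    d = 0.0
    for i, v in enumerate(sorted(sample)):
        f = normal_cdf(v, mean, sd)
        # The empirical CDF jumps from i/n to (i + 1)/n at the i-th sorted value.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def ks_critical_95(n):
    """Large-sample 95% critical value (assumed approximation)."""
    return 1.36 / math.sqrt(n)


print(round(ks_critical_95(840), 3))
```

A data-set is treated as compatible with normality when `ks_statistic(sample)` falls below `ks_critical_95(len(sample))`.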
Fig. 2. GPS check point testfield located at Filey Brigg Country Park.
Table II. Statistical characteristics of Filey Bay data-sets.

| Data-sets | 0405 lidar | 0805 lidar | 0506 lidar (raw) | 0506 lidar (3-sigma rule) |
|---|---|---|---|---|
| Number of check points | 840 | 833 | 834 | 831 |
| Minimum error (m) | −0.16 | −0.29 | −0.03 | −0.03 |
| Maximum error (m) | 0.03 | 0.23 | 1.31 | 0.20 |
| Mean error (m) | −0.06 | −0.16 | 0.09 | 0.08 |
| Standard deviation (m) | 0.03 | 0.05 | 0.06 | 0.04 |
| Skewness | −0.04 | 1.24 | 10.75 | 0.19 |
| Standardised kurtosis | 0.12 | 9.07 | 179.45 | 0.00 |
| K-S statistic | 0.026 | 0.038 | 0.151 | 0.029 |
| K-S critical value (95%) | 0.047 | 0.047 | 0.047 | 0.047 |
Raw Data Processing of EuroSDR Data
Last-return raw EuroSDR lidar data was filtered to segment the ground surface from
vegetation, buildings and any gross errors embedded in the general point cloud (Axelsson,
1999). The filtering algorithm developed by Axelsson (2000) was used to segment ground
points by means of a progressive triangular irregular network (TIN) densification method
where the surface was allowed to fluctuate within certain values. Briefly, a sparse TIN is
derived from neighbourhood minima, and then progressively densified through the laser
point cloud. The algorithm has been implemented in TerraScan (integrated in Terrasolid,
version 007.004), the software used to carry out the filtering process. As the first step in the
filtering process, points regarded as too low or very high were removed as likely outliers.
Subsequently, the ground point class was filtered and saved as points belonging to the DEM.
Fig. 3 depicts an example of the results from the filtering process for Site 5. The
interpolation method used to refill any gaps created in the filtered ground model (gaps
resulting from the removal of points on buildings, vegetation and the like) was the
Multiquadric Radial Basis Function working with the local support of the eight neighbours
closest to the interpolated point, since this has previously been found generally to yield
better results (Aguilar et al., 2005). The final grid spacing of the interpolated DEM was the
same as presented by the raw lidar data.

Fig. 3. Filtering process over Site 5. Top: last-return raw data. Bottom: last-return ground-filtered data.
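To make the gap-filling step concrete, the following Python sketch interpolates a height with a multiquadric radial basis function supported on the eight nearest ground points. It is an illustration only and not the implementation used in the study: the function names, the shape parameter `shape` and the absence of any polynomial term are all assumptions.

```python
import math


def solve_linear(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x


def multiquadric_height(points, x, y, k=8, shape=1.0):
    """Interpolate z at (x, y) from the k nearest (px, py, pz) ground points."""
    def phi(r2):
        # Multiquadric basis evaluated from a squared distance.
        return math.sqrt(r2 + shape * shape)

    nearest = sorted(points, key=lambda p: (p[0] - x) ** 2 + (p[1] - y) ** 2)[:k]
    a = [[phi((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) for q in nearest]
         for p in nearest]
    weights = solve_linear(a, [p[2] for p in nearest])
    return sum(w * phi((p[0] - x) ** 2 + (p[1] - y) ** 2)
               for w, p in zip(weights, nearest))


# Hypothetical 3 x 3 patch of ground points on a tilted plane.
ground = [(float(i), float(j), 2.0 * i + 3.0 * j + 1.0)
          for i in range(3) for j in range(3)]
print(multiquadric_height(ground, 0.5, 0.5))
```

Because the weights are solved so that the surface passes through the supporting points, the interpolator reproduces the known heights exactly at those points and only estimates values in between.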
Lidar error was computed as the difference between the derived ground DEM heights
(obtained after applying Axelsson’s filtering) and the heights from the manually filtered
reference data for each of the 15 EuroSDR samples (as listed in Table I). It is important to note
that large DEM data gaps caused by the filtering process (for example, the removal of points on
large buildings) were not included in the error calculation in order to avoid the uncertainty in
areas where no reference data (ground truth) was available. Hence the error data-sets obtained
(Table III) can be considered very close to the errors found in real-world lidar surveys.
Furthermore, the 3-sigma rule was applied to remove outliers which can corrupt the true
statistical distribution of the errors. In fact, this technique is generally applied under operational
conditions for removing outliers in the DEM quality assessment of lidar data (Daniel and
Tennant, 2001). The percentage of data removed ranged between 0.5 and 3.5%, with a mean
value of 1.8%. These results are fairly consistent with those published by Torlegård et al.
(1986) referring to errors from photogrammetric DEMs. Nevertheless, despite applying the
3-sigma rule, corrected error data-sets still followed a non-normal distribution according to the
K-S 95% confidence test.
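A simple version of the 3-sigma rule can be sketched in Python as below. The function name is hypothetical, and the sketch assumes a single pass centred on the sample mean; the text does not state whether the rule was applied once or iteratively.

```python
import math


def three_sigma_filter(errors):
    """Drop errors further than 3 standard deviations from the sample mean.

    Returns the retained errors and the percentage of data removed.
    """
    n = len(errors)
    mean = sum(errors) / n
    sd = math.sqrt(sum((e - mean) ** 2 for e in errors) / (n - 1))
    kept = [e for e in errors if abs(e - mean) <= 3.0 * sd]
    removed_pct = 100.0 * (n - len(kept)) / n
    return kept, removed_pct


# Hypothetical error set: 99 zero residuals and one gross error.
kept, removed_pct = three_sigma_filter([0.0] * 99 + [100.0])
print(len(kept), removed_pct)
```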
Table III. Statistics for 3-sigma rule corrected error from EuroSDR data-sets.

| Sample | Points | Mean (m) | Sd (m) | γ1 | γ2 | K-S | Critical K-S | % data outliers |
|---|---|---|---|---|---|---|---|---|
| 11 | 16 995 | 0.16 | 0.44 | 3.30 | 11.82 | 0.322 | 0.010 | 3.5 |
| 12 | 25 203 | 0.04 | 0.16 | 6.26 | 54.41 | 0.291 | 0.010 | 0.5 |
| 21 | 9742 | 0.02 | 0.05 | 1.83 | 4.77 | 0.153 | 0.014 | 3.1 |
| 22 | 21 193 | 0.04 | 0.13 | 5.12 | 32.39 | 0.272 | 0.009 | 1.7 |
| 23 | 10 871 | 0.05 | 0.20 | 5.06 | 31.26 | 0.299 | 0.013 | 1.4 |
| 24 | 3695 | 0.04 | 0.17 | 5.31 | 45.81 | 0.273 | 0.022 | 1.6 |
| 31 | 15 315 | 0.01 | 0.04 | 1.29 | 4.16 | 0.108 | 0.011 | 1.4 |
| 41 | 1626 | 0.25 | 1.11 | 5.05 | 24.41 | 0.424 | 0.034 | 1.6 |
| 42 | 11 743 | 0.02 | 0.07 | 2.98 | 13.22 | 0.201 | 0.013 | 2.0 |
| 51 | 13 701 | 0.00 | 0.06 | 0.26 | 5.32 | 0.107 | 0.012 | 1.1 |
| 52 | 17 368 | 0.08 | 0.30 | 2.52 | 8.41 | 0.270 | 0.010 | 1.7 |
| 53 | 24 702 | 0.16 | 0.71 | 6.64 | 57.08 | 0.360 | 0.009 | 1.6 |
| 54 | 3863 | 0.00 | 0.08 | 2.79 | 19.27 | 0.179 | 0.022 | 2.1 |
| 61 | 31 057 | 0.01 | 0.10 | 3.84 | 24.07 | 0.252 | 0.008 | 1.5 |
| 71 | 12 517 | 0.02 | 0.11 | 2.33 | 10.80 | 0.208 | 0.012 | 2.1 |

γ1: skewness; γ2: standardised kurtosis.
Experimental Validation of the Proposed Models
The Monte Carlo method was employed for model validation. Monte Carlo numerical
simulation is based on stochastic techniques, meaning it is based on the use of random numbers
and probability statistics, to investigate complex problems without relying on analytical
equations (Weir, 2002). For the first model evaluated, described through equations (2) to (7),
the simulation started by extracting a sample of residuals (sample X of size N) from each of the
EuroSDR data-sets by means of random sampling. In this case N took the values of 20, 40, 60,
80 and 100 check points. The model parameters were then computed and the bounds of the
interval, $x_{upper}$ and $x_{lower}$, were obtained for a $1-\alpha$ confidence level. The number of errors
from the whole data-set (population) which were within the bounds estimated by the model
was then calculated. It must be noted that the percentage of errors included within the bounds
should be close to 95% if, for instance, a significance level of $\alpha = 0.05$ was chosen. This
procedure was repeated 1000 times to estimate the mean value of the percentage of errors from
the population falling within the bounds and its associated reliability measure (coefficient of
variation). Over the same simulated samples, and purely for comparison purposes, fundamental
and supplemental vertical accuracies were also computed according to the ASPRS guidelines
for lidar data (Flood, 2004), their corresponding mean values and associated reliabilities also
being obtained for 1000 runs.
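The simulation loop described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, and `normal_theory_bounds` is only a stand-in interval (mean ± 1.96 Sd) used to exercise the loop; it is not the model of equations (2) to (7).

```python
import math
import random


def normal_theory_bounds(sample, z=1.96):
    """Stand-in bounds (mean +/- z * Sd); NOT the model of equations (2)-(7)."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in sample) / (n - 1))
    return mean - z * sd, mean + z * sd


def coverage_simulation(population, n, bounds_fn, runs=1000, seed=0):
    """Mean percentage of population errors inside the bounds, and its CV (%)."""
    rng = random.Random(seed)
    percentages = []
    for _ in range(runs):
        sample = rng.sample(population, n)   # random check points, no repeats
        lower, upper = bounds_fn(sample)
        inside = sum(1 for e in population if lower <= e <= upper)
        percentages.append(100.0 * inside / len(population))
    mean_pct = sum(percentages) / runs
    sd_pct = math.sqrt(sum((p - mean_pct) ** 2 for p in percentages) / (runs - 1))
    return mean_pct, 100.0 * sd_pct / mean_pct


# Hypothetical error population; real use would load a full error data-set.
gen = random.Random(42)
population = [gen.gauss(0.0, 0.1) for _ in range(2000)]
for n in (20, 40, 60, 80, 100):
    print(n, coverage_simulation(population, n, normal_theory_bounds, runs=200))
```

Any function that maps a sample of residuals to a lower and an upper bound can be passed as `bounds_fn`, so the same loop serves for comparing alternative interval models.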
For the evaluation of the second model, described through equations (8) to (13), the
simulation procedure was quite similar to the first, with the exception that the estimated bounds
were rmse values and the sample size increased in increments of 20, from 20 to 200 check
points. The population rmse for every data-set was calculated from all residual values available
in each. It was then necessary to determine whether the corresponding population rmse fell
within the limits of the calculated confidence intervals and whether the model had correctly
predicted the expected uncertainty for data-set estimation. In this case the procedure was run
2000 times because it was computationally faster than for the first model.
Note that because samples were extracted from finite populations, a correcting coefficient
must be applied to calculate sample variance in order to correct the reduced variability
expected on consecutive random samplings extracted from the same finite residual population
(Steel and Torrie, 1980). Therefore, the following equation was applied:
$$Sd_m = \frac{Sd}{\sqrt{N}}\sqrt{\frac{M-N}{M}} \qquad (14)$$
where $M$ is the number of residuals in the population, $N$ is the sample size, and $Sd$ and $Sd_m$ are the standard
deviations of the sample of residuals and of the sample mean, respectively.
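Equation (14) translates directly into code. The short Python sketch below uses a hypothetical function name and simply illustrates how the correction shrinks the standard deviation of the sample mean as the sample approaches the size of the finite population.

```python
import math


def sd_of_sample_mean(sd, n, m):
    """Equation (14): Sd of the mean for a sample of n drawn from m residuals."""
    if not 0 < n <= m:
        raise ValueError("sample size must satisfy 0 < n <= m")
    return sd / math.sqrt(n) * math.sqrt((m - n) / m)


# Hypothetical values: Sd = 0.44 m, 100 check points, 16 995 residuals.
print(sd_of_sample_mean(0.44, 100, 16995))
```

For populations of the size listed in Table III and samples of at most 200 check points, the correcting factor stays very close to 1.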
Results and Discussion
Confidence Intervals for Lidar Vertical Accuracy Assessment
Fig. 4 shows the performance of the proposed model compared against results achieved
using the ASPRS guidelines (Flood, 2004). Only results for samples 11, 52 and 61 are given,
although results are similar for all 15 samples.

Fig. 4. Evaluation results for samples 11 (top graphs), 52 (middle) and 61 (bottom) from the EuroSDR data-set.
Left column: accuracy following ASPRS guidelines, plotting fundamental vertical accuracy, supplemental vertical accuracy and the 95% population percentile (accuracy, m), together with the reliability (%) of the fundamental and supplemental accuracy calculations, against the number of check points.
Right column: proposed model, plotting the mean percentage of errors within the computed interval (95% confidence level) and the reliability (%) of the computed intervals calculation against the number of check points.
Results calculated using the fundamental accuracy assessment, as defined in equation (1)
supposing a normal distribution of the error, were usually far from the true vertical accuracy
computed for the error population. This is to be expected, taking into account the non-normally
distributed nature of lidar error data-sets in non-open terrain, as previously confirmed from
results yielded by the K-S test given in Table III. Therefore, following the recommendations of
the ASPRS guidelines, the supplemental vertical accuracy test figures should be considered
and computed using the 95th percentile method, as explained in the introduction. In this case
the supplemental accuracy always overestimated the true value, although it converged towards
the true value as the sample size increased.
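The two ASPRS figures compared above can be sketched as follows. The sketch assumes that equation (1) is the usual ASPRS expression (1.96 times the rmse) and that the supplemental figure is the 95th percentile of the absolute errors; the interpolation rule between order statistics and the function names are assumptions of this example.

```python
import math


def fundamental_accuracy(errors):
    """1.96 x rmse: accuracy at the 95% level under a normal error assumption."""
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return 1.96 * rmse


def supplemental_accuracy(errors, pct=95.0):
    """Percentile of the absolute errors; no error distribution is assumed.
    Linear interpolation between order statistics is an assumption here."""
    a = sorted(abs(e) for e in errors)
    pos = (len(a) - 1) * pct / 100.0
    lo = math.floor(pos)
    hi = min(lo + 1, len(a) - 1)
    return a[lo] + (a[hi] - a[lo]) * (pos - lo)


# Hypothetical vertical errors (m) at a handful of check points.
sample = [0.05, -0.12, 0.30, -0.02, 0.08, 1.10, -0.04, 0.15]
print(fundamental_accuracy(sample), supplemental_accuracy(sample))
```

With a heavy-tailed sample such as the hypothetical one above, the two figures can differ markedly, which is the situation discussed in the text for non-open terrain.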
The problem with both fundamental and supplemental vertical accuracy is that the
variability of the estimates was very noticeable, especially in the case of the latter (always
above ±20% even when using up to 100 check points). Hence many more than 100 check
points would have to be employed to ensure a confidence level near to the target value of 95%.
However, the 95% confidence intervals computed by means of the proposed model, developed
through equations (2) to (7), achieved a percentage of population errors within the bounds very
close to the target value, even when working with only 20 check points. Better still, the
variability of the estimates was very low, as can be seen in Fig. 4. Summing up, only around 60
check points would be needed to reach a confidence level of 95% with an estimate error below
±2·5%.
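The kind of Monte Carlo check described above can be sketched as a resampling loop. The interval function used here is only a placeholder (empirical percentiles of the sample) and is not the proposed model of equations (2) to (7); the function names, the synthetic population and the use of the standard deviation across runs as a variability measure are all assumptions of this sketch.

```python
import math
import random
import statistics


def percentile_interval(sample, level=0.95):
    """Placeholder interval: empirical percentiles of the sample.
    This is NOT the paper's model (equations (2) to (7))."""
    s = sorted(sample)
    tail = (1.0 - level) / 2.0
    lo = s[math.floor(tail * (len(s) - 1))]
    hi = s[math.ceil((1.0 - tail) * (len(s) - 1))]
    return lo, hi


def monte_carlo_coverage(population, n, runs=1000,
                         interval_fn=percentile_interval, seed=0):
    """Mean and standard deviation, over all runs, of the percentage of
    population errors falling inside the interval built from n check points."""
    rng = random.Random(seed)
    covered = []
    for _ in range(runs):
        sample = rng.sample(population, n)  # sampling without replacement
        lo, hi = interval_fn(sample)
        inside = sum(lo <= e <= hi for e in population)
        covered.append(100.0 * inside / len(population))
    return statistics.mean(covered), statistics.pstdev(covered)


# Synthetic, skewed error population (hypothetical, not the EuroSDR data).
gen = random.Random(42)
population = [gen.expovariate(2.0) - 0.5 for _ in range(2000)]
for n in (20, 60, 100):
    print(n, monte_carlo_coverage(population, n, runs=200))
```

Any other interval estimator can be passed through `interval_fn`, so the same loop could be reused to compare models on a common error population.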
Fig. 5 shows confidence interval limits (the mean of 1000 runs) computed from the
proposed model for the three samples referred to in Fig. 4, varying randomly the initial finite
sample of check points. The stability of the confidence interval length along the horizontal axis
(sample size) should be highlighted. The confidence interval length basically depends on the
sample standard deviation, skewness and standardised kurtosis, rather than the number of
check points used. Therefore, precise and narrow confidence intervals can be calculated when
working with as few as 60 check points. Notice that unlike the case for a normal distribution,
the confidence intervals are not symmetrical around the mean value, but they are biased
towards the direction of skewness.
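The three sample statistics that drive the interval length can be computed as in the sketch below. The moment estimators shown are the simple (biased) ones and are an assumption; the text does not state which estimators were used. Standardised kurtosis is taken here as excess kurtosis, so that it is zero for a normal distribution.

```python
import math


def sample_moments(errors):
    """Mean, standard deviation, skewness and standardised (excess) kurtosis.
    Simple biased moment estimators are an assumption of this sketch."""
    n = len(errors)
    mean = sum(errors) / n
    m2 = sum((e - mean) ** 2 for e in errors) / n
    m3 = sum((e - mean) ** 3 for e in errors) / n
    m4 = sum((e - mean) ** 4 for e in errors) / n
    sd = math.sqrt(m2)
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2 - 3.0  # zero for a normal distribution
    return mean, sd, skewness, kurtosis


# Hypothetical errors (m) with one large positive outlier.
print(sample_moments([0.02, -0.05, 0.01, 0.04, -0.03, 0.90]))
```

A positive skewness, as in the hypothetical sample above, corresponds to the case in which the interval is shifted towards positive errors.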
The importance of the development of this simple and understandable model for accuracy
estimation is supported by the fact that, traditionally, users are known to be failing to quantify
risks due to a lack of tools, theory and poor documentation on spatial data quality (van Oort and
Bregt, 2005). The presentation of clear upper and lower limits rather than a single accuracy
value may substantially improve the understanding of accuracy test results by users of lidar
data and enhance communication with data providers. Furthermore, it is worth noting that the
proposed model, unlike the ASPRS methodology, does not assume any previous lidar error
distribution since all required information is extracted from the sample by means of a non-parametric
approach. Thus, the so-called “strong assumption” of distribution normality
accepted in many previous works, and in the majority of the current standards for positional
accuracy, is avoided.
Confidence Intervals for Rmse Estimate
Figs. 6 and 7 show the performance of the model tested to compute rmse 95% confidence
intervals as well as another model based on the Student’s t asymptotic approach. The Student’s
t model has been formulated under the hypothesis of an almost-normal distribution of lidar
vertical error and is given by the following expressions:
$$rmse_{upper} = \sqrt{mse + t_{\alpha/2}\,\sigma_{mse}} \qquad (15)$$
Fig. 5. Confidence intervals (95% confidence level) as a function of sample size computed, using the proposed
model, for samples 11 (top graph), 52 (middle) and 61 (bottom) from the EuroSDR data-set. [Plots not
reproduced; axes: error value (m) against number of check points; series: upper limit, lower limit and
population mean.]
$$rmse_{lower} = \sqrt{mse - t_{\alpha/2}\,\sigma_{mse}} \qquad (16)$$
where mse is the mean square error given by equation (10), $\sigma_{mse}$ is the standard deviation of
the mse sampling distribution (equation (11)) and $t_{\alpha/2}$ is the two-tailed critical value
corresponding to significance level $\alpha$ (confidence level $1 - \alpha$) for $N - 1$ degrees of
freedom, N being the sample size (see Aguilar et al. (2007a) for further details). In the case of error
data coming from non-open terrain, such as those from the EuroSDR reference data-sets (Fig. 6), it
seems clear that the Student’s t approach failed to reach the expected target value of 95% confidence,
even when using sample sizes of up to 200 check points (Fig. 6, right). Conversely, the model tested
here can be considered successful when using anything upwards from around 60 to 160 check
points, 100 being the average for all 15 EuroSDR data-sets (see Fig. 9).
Fig. 6. Rmse evaluation results for samples 11 (top graphs), 52 (middle) and 61 (bottom) from the EuroSDR
data-set. Left column: rmse confidence intervals (95% level) computed from the proposed model and the Student’s
t approach. Right column: percentage of times population rmse falls within intervals computed for the proposed
model and the Student’s t approach. [Plots not reproduced; left-column axes: rmse (m) against number of check
points, with series for the upper and lower limits of the proposed model, the upper and lower limits of the
Student’s t model and the population rmse; right-column axes: P (%) against number of check points, with
series for the proposed model and the Student’s t model.]
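Equations (15) and (16) can be sketched as below. Equations (10) and (11) are not reproduced in this section, so the sketch assumes that mse is the mean of the squared errors and that the standard deviation of its sampling distribution is the standard error of that mean. The critical value is passed in by the caller because the Python standard library has no Student's t quantile function; the function name and sample values are hypothetical.

```python
import math


def rmse_t_interval(errors, t_crit):
    """Student's t bounds on rmse following equations (15) and (16).
    Returns (lower, rmse, upper)."""
    n = len(errors)
    squared = [e * e for e in errors]
    mse = sum(squared) / n
    # Assumed form of sigma_mse: standard error of the mean squared error.
    var_sq = sum((s - mse) ** 2 for s in squared) / (n - 1)
    sigma_mse = math.sqrt(var_sq / n)
    upper = math.sqrt(mse + t_crit * sigma_mse)
    # Guard against a negative radicand for small or very noisy samples.
    lower = math.sqrt(max(mse - t_crit * sigma_mse, 0.0))
    return lower, math.sqrt(mse), upper


# Hypothetical errors (m); 2.365 is the two-tailed 95% value for 7 degrees of freedom.
errors = [0.10, -0.08, 0.12, -0.15, 0.05, 0.09, -0.11, 0.07]
print(rmse_t_interval(errors, t_crit=2.365))
```

The clamp on the lower bound is a choice made for this sketch so that the square root stays defined; the source does not say how such cases were handled.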
Notice that the confidence intervals (Fig. 6, left) are narrower in the case of
the Student’s t model because the residual population has been assumed to be normal, and so
standardised kurtosis and skewness have been set to zero (no outliers or systematic errors
present). This is an unrealistic scenario, which the assumption-free non-parametric model
overcomes. It is recognised that whilst non-normal, fairly symmetrical population
shapes converge to normal-looking sampling distributions with relatively small samples,
strongly skewed distributions require much larger samples for the approximation to be good
when a normal distribution is supposed. Thus the Student’s t approach is expected to require a
large number of check points when working with a population that exhibits high skewness.
Fig. 7. Rmse evaluation results for 0405 lidar (top graphs), 0805 lidar (middle) and 3-sigma rule corrected 0506
lidar (bottom) from the NERC ARSF data-set. Left column: rmse confidence intervals (95% level) computed from
the proposed model and Student’s t approach. Right column: percentage of times population rmse falls within
intervals computed for the proposed model and Student’s t approach. [Plots not reproduced; left-column axes:
rmse (m) against number of check points, with series for the upper and lower limits of the proposed model, the
upper and lower limits of the Student’s t model and the population rmse; right-column axes: P (%) against
number of check points, with series for the proposed model and the Student’s t model.]
Results from the NERC ARSF test data (Fig. 7), however, demonstrated that the Student’s
t approach may be deemed appropriate when dealing with normal lidar vertical error
distributions which are usually found over open terrain (no filtering error). It should be noted
that the three NERC ARSF error data-sets clearly followed a normal distribution after being
filtered using the 3-sigma rule, as can be observed from the K-S test results in Table II. In this
case, results indicate that working with only a few check points, around 20 to 40, would be
sufficient. In this instance the proposed model computed confidence intervals that were too
wide and exhibited somewhat erratic behaviour when working with a small number of check
points. Indeed, despite the wide mean confidence interval that was calculated, the number of
times the population rmse fell within the computed interval limits was found to be very poor
until a sample size of between 60 and 80 check points was reached. This was mainly due to the
high level of uncertainty in estimating the kurtosis value from a small sample size (Aguilar et
al., 2007b). The Student’s t model estimates only mean square error and the standard deviation
of the mean square error sampling distribution, whose values are more reliable than kurtosis
and skewness when estimated from small sample sizes.
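The 3-sigma rule mentioned above can be sketched as a single filtering pass. Whether the authors applied the rule once or iteratively is not stated in this section, so the single pass, like the function name and the sample values, is an assumption.

```python
import statistics


def three_sigma_filter(errors):
    """Keep only errors within mean +/- 3 sample standard deviations
    (single pass; an iterative variant would repeat until nothing is removed)."""
    mean = statistics.mean(errors)
    sd = statistics.stdev(errors)
    return [e for e in errors if abs(e - mean) <= 3.0 * sd]


# Hypothetical errors (m): twenty small values and one gross error.
errors = [0.01 * ((i % 5) - 2) for i in range(20)] + [2.5]
filtered = three_sigma_filter(errors)
print(len(errors), len(filtered))
```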
Fig. 8. Relationship between check points needed to reach a determined confidence level for the computed rmse
bounds (circles 95%; triangles 90%) and the residual population kurtosis. Data from the 15 EuroSDR samples.
[Plot not reproduced; axes: check points needed to obtain the required confidence level against standardised
kurtosis; the two fitted curves are annotated R² = 0·9515 and R² = 0·9526.]
Determining the Minimum Required Number of Check Points
The nature of the relationship between the number of check points needed to reach a
determined confidence level for the computed rmse bounds and the residual population
kurtosis is depicted in Fig. 8. A second-order polynomial has been fitted to the experimental
data to provide a better understanding of the non-linear increase in the number of required
check points when errors come from a more leptokurtic data-set. The finding can be considered
as logical since high kurtosis implies more errors are likely to be outliers, and so more check
points are required for the construction of a reliable confidence interval. It should be stressed
that when less stringent confidence levels are required (for instance, reducing confidence from
95 to 90%, as illustrated in Fig. 8), a smaller number of check points will be needed to
construct a reliable interval according to the chosen confidence level. Kurtosis therefore seems
to be the key to determining the required number of check points in order to achieve a certain
confidence level in the uncertainty of the accuracy estimate. From these results, further work
should be undertaken to determine some morphological terrain and landscape indicators
(terrain roughness, landscape spatial distribution and arrangement of features such as buildings
and vegetation) to help predict the expected degree of kurtosis in the error budget after
applying lidar data filtering and gridding processes.
Fig. 9. Relationship between the number of check points needed to reach a determined confidence level and the
desired confidence level. Maximum, minimum and mean values extracted from the 15 EuroSDR samples and for
every confidence level tested. [Plot not reproduced; axes: check points needed to obtain the required
confidence level against confidence level (75% to 100%); series: maximum, minimum and mean value.]
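A second-order polynomial fit of the kind shown in Fig. 8 can be sketched with ordinary least squares. The data points below are synthetic, generated from an arbitrary quadratic purely for illustration; they are not the values plotted in Fig. 8, and the function name is hypothetical.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = c0 + c1*x + c2*x**2 via the normal equations,
    solved by Gauss-Jordan elimination with partial pivoting."""
    power_sums = [sum(x ** k for x in xs) for k in range(5)]
    rhs = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented 3x4 matrix of the normal equations.
    a = [[power_sums[i + j] for j in range(3)] + [rhs[i]] for i in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(3):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [u - factor * v for u, v in zip(a[r], a[col])]
    return [a[i][3] / a[i][i] for i in range(3)]


# Synthetic (kurtosis, check points) pairs from an arbitrary quadratic.
kurtosis = [2.0, 10.0, 20.0, 35.0, 50.0]
check_points = [60.0 + 1.0 * k + 0.01 * k * k for k in kurtosis]
c0, c1, c2 = fit_quadratic(kurtosis, check_points)
print(c0, c1, c2)
```

Once fitted to real data, the polynomial could be evaluated at an estimated kurtosis to give a first guess at the number of check points needed, subject to the usual caution about extrapolating beyond the fitted range.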
Currently, the ASPRS guidelines specify that accuracies should be reported at the 95%
confidence level. However, some researchers report applications related to geopositioning tasks
which typically demand a 90% or even lower confidence level (Rodarmel et al., 2006).
Therefore, rather than require 95%, the proposed method allows the user to determine the
confidence level depending on the application of the data. Changing the confidence level is as
simple as changing the significance level in the developed equations. Fig. 9 clearly shows that
fewer check points are required when the degree of confidence is relaxed, although the
relationship is not linear. In fact, in the worst case (maximum values in Fig. 9), the number of
check points required for computing rmse bounds at the 95% confidence level would be 160,
whilst it would be only 110 for an 80% confidence level.
Conclusions
From the results obtained through this research the following conclusions can be drawn:
(1) The model proposed to compute confidence intervals for lidar vertical error has
proved to be advantageous compared to the methodology recommended by the
ASPRS Lidar Committee for non-open terrain (non-normal error distribution). The
proposed methodology works quite well under open and non-open terrain conditions
because input parameters such as skewness and kurtosis allow the model to seek the
statistical nature of the error data-set. Furthermore, reliable and narrow confidence
intervals can be calculated when working from as few as 60 check points. This is
because interval length depends basically on the sample standard deviation, skewness
and standardised kurtosis, rather than the number of check points used.
(2) The proposed methodology not only estimates confidence intervals for lidar vertical
error, but also computes uncertainty in rmse estimates, which is usually better
understood by DEM users and data providers. In this case the model developed by
Aguilar et al. (2007a) is recommended when lidar error in non-open terrain is
evaluated. Depending on the kurtosis of error population, between 60 and 160
check points (around 100 on average) could be required. However, lidar accuracy
assessment via rmse in open terrain (normal error distribution) would be more successfully achieved by means of the Student’s t approach because it needs fewer
check points than the proposed model. In fact, only 20 to 40 check points would be
required in this case.
(3) Bearing in mind the variability of the data-sets tested, the results have proved to be
very promising, the proposed models reproducing the statistical behaviour of lidar
vertical errors using a reasonably low number of check points, even in the case of the
more non-normally distributed residual data-sets. This research can therefore be
considered as an important step towards improving and understanding the quality
control of lidar-derived DEMs.
Acknowledgments
The authors are grateful to the Spanish Government for financing the first author’s
fellowship at Newcastle University through the Spanish Research Mobility Programme
‘‘Estancias de profesores e investigadores españoles en centros de ensenanza superior e
investigación extranjeros’’. Thanks are due to the NERC ARSF and Pauline Miller, Newcastle
University, for assisting with the data from Filey Bay and also to ISPRS WG III/3 ‘‘3D
Reconstruction from Airborne Laser Scanner and InSAR Data’’ for facilitating access to the
EuroSDR lidar data.
References
Aguilar, F. J., Agüera, F., Aguilar, M. A. and Carvajal, F., 2005. Effects of terrain morphology, sampling
density, and interpolation methods on grid DEM accuracy. Photogrammetric Engineering & Remote Sensing,
71(7): 805–816.
Aguilar, F. J., Aguilar, M. A., Agüera, F. and Sánchez, J., 2006. The accuracy of grid digital elevation
models linearly constructed from scattered sample data. International Journal of Geographical Information
Science, 20(2): 169–192.
Aguilar, F. J., Aguilar, M. A. and Agüera, F., 2007a. Accuracy assessment of digital elevation models using
a non-parametric approach. International Journal of Geographical Information Science, 21(6): 667–686.
Aguilar, F. J., Agüera, F. and Aguilar, M. A., 2007b. A theoretical approach to modeling the accuracy
assessment of digital elevation models. Photogrammetric Engineering & Remote Sensing, 73(12): 1367–1380.
Axelsson, P., 1999. Processing of laser scanner data—algorithms and applications. ISPRS Journal of Photogrammetry and Remote Sensing, 54(2/3): 138–147.
Axelsson, P., 2000. DEM generation from laser scanner data using adaptive TIN models. International Archives of
Photogrammetry and Remote Sensing, 33(B4): 110–117.
Clark, M. L., Clark, D. B. and Roberts, D. A., 2004. Small-footprint lidar estimation of sub-canopy elevation
and tree height in a tropical rain forest landscape. Remote Sensing of Environment, 91(1): 68–89.
Daniel, C. and Tennant, K., 2001. DEM quality assessment. Digital Elevation Model Technologies and
Applications: The DEM Users Manual (Ed. D. F. Maune). American Society for Photogrammetry and
Remote Sensing, Bethesda, Maryland. 539 pages: 395–440.
FGDC, 1998. Geospatial positioning accuracy standards. National standards for spatial data accuracy.
http://www.fgdc.gov/standards/projects/FGDC-standards-projects/accuracy/part3/chapter3 [Accessed: 23rd August 2007].
Fisher, P. F. and Tate, N. J., 2006. Causes and consequences of error in digital elevation models. Progress in
Physical Geography, 30(4): 467–489.
Flood, M., 2004. ASPRS guidelines. Vertical accuracy reporting for lidar data.
http://www.asprs.org/society/divisions/ppd/standards/Lidar%20guidelines.pdf [Accessed: 23rd August 2007].
Godambe, V. P. (Ed.), 1991. Estimating Functions. Oxford University Press, Oxford. 356 pages.
Godambe, V. P. and Thompson, M. E., 1989. An extension of quasi-likelihood estimation. Journal of Statistical
Planning and Inference, 22(2): 137–152.
Goodwin, N. R., Coops, N. C. and Culvenor, D. S., 2006. Assessment of forest structure with airborne LiDAR
and the effects of platform altitude. Remote Sensing of Environment, 103(2): 140–152.
Göpfert, J. and Heipke, C., 2006. Assessment of LiDAR DTM accuracy in coastal vegetated areas. International
Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences, 36(3): 79–85.
Hopkinson, C., Chasmer, L. E., Zsigovics, G., Creed, I. F., Sitar, M., Treitz, P. and Maher, R. V., 2004.
Errors in LiDAR ground elevation and wetland vegetation height estimates. International Archives of
Photogrammetry, Remote Sensing and Spatial Information Sciences, 36(8/W2): 108–113.
Hyyppä, H., Yu, X., Hyyppä, J., Kaartinen, H., Kaasalainen, S., Honkavaara, E. and Rönnholm, P.,
2005. Factors affecting the quality of DTM generation in forested areas. International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences, 36(3/W19): 85–90.
James, T. D., Murray, T., Barrand, N. E. and Barr, S. L., 2006. Extracting photogrammetric ground control
from lidar DEMs for change detection. Photogrammetric Record, 21(116): 312–328.
Kraus, K. and Pfeifer, N., 1998. Determination of terrain models in wooded areas with airborne laser scanner
data. ISPRS Journal of Photogrammetry and Remote Sensing, 53(4): 193–203.
Lim, K., Treitz, P., Wulder, M., St-Onge, B. and Flood, M., 2003. LiDAR remote sensing of forest structure.
Progress in Physical Geography, 27(1): 88–106.
Miller, P., Mills, J., Edwards, S., Bryan, P., Marsh, S., Hobbs, P. and Mitchell, H., 2007. A robust
surface matching technique for integrated monitoring of coastal geohazards. Marine Geodesy, 30(1/2): 109–123.
Oort, P. A. J. van and Bregt, A. K., 2005. Do users ignore spatial data quality? A decision-theoretic perspective.
Risk Analysis, 25(6): 1599–1610.
Rodarmel, C., Theiss, H., Johanesen, T. and Samberg, A., 2006. A review of the ASPRS guidelines for the
reporting of horizontal and vertical accuracy in lidar data. International Archives of Photogrammetry, Remote
Sensing and Spatial Information Sciences, 36(1): 6 pages (on CD-ROM).
Royston, J. P., 1982. Expected normal order statistics (exact and approximate). Applied Statistics, 31(2): 161–165.
Sithole, G. and Vosselman, G., 2003. Report: ISPRS comparison of filters.
http://www.itc.nl/isprswgIII-3/filtertest/Report05082003.pdf [Accessed: 23rd August 2007].
Sithole, G. and Vosselman, G., 2004. Experimental comparison of filter algorithms for bare-earth extraction
from airborne laser scanning points clouds. ISPRS Journal of Photogrammetry and Remote Sensing, 59(1–2):
85–101.
Smith, S. L., Holland, D. A. and Longley, P. A., 2005. Quantifying interpolation errors in urban airborne laser
scanning models. Geographical Analysis, 37(2): 200–224.
Steel, R. G. D. and Torrie, J. H., 1980. Principles and Procedures of Statistics: A Biometrical Approach.
Second edition. McGraw-Hill, New York. 631 pages.
Su, J. and Bork, E., 2006. Influence of vegetation, slope, and lidar sampling angle on DEM accuracy. Photogrammetric Engineering & Remote Sensing, 72(11): 1265–1274.
Torlegård, K., Östman, A. and Lindgren, R., 1986. A comparative test of photogrammetrically sampled
digital elevation models. Photogrammetria, 41(1): 1–16.
Weir, M. J. C., 2002. Monte Carlo simulation of long-term spatial error propagation in forestry databases. Spatial
Data Quality (Eds. W. Shi, P. F. Fisher and M. F. Goodchild). Taylor & Francis, London. 336 pages: 294–303.
Appendix
An estimating equation for parameter $\theta$ is an equation $g(X, \theta) = 0$ which has a unique
solution $\theta = \theta(X)$ for every value of X, X being a random variable. In this case, X is a set of N
independent residuals computed at corresponding check points for evaluating lidar-derived
DEM accuracy. The function g is named the estimating function. An optimal estimation of $\theta$
can be achieved considering the class of estimating functions as having zero expectation given
by the following linear combination:
$$g = a h_1 + b h_2 \qquad (A1)$$
where $h_1$ and $h_2$ are real functions which would be computed depending on the underlying
nature of the statistical problem, and a and b are real functions that can be estimated by the
locally optimal solution found by Godambe and Thompson (1989) and shown in equation
(A2). For the correct application of this optimal solution, $E(h_1 h_2)$ should be equal to zero
(E expressing the mathematical expectation of the operation in parentheses). That is to say,
$h_1$ and $h_2$ must be orthogonal. A more detailed discussion on the theory of estimating
functions, together with numerous applications, can be found in Godambe (1991).
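$$a = \frac{E\left(\partial h_1 / \partial \theta\right)}{E(h_1^2)}; \qquad b = \frac{E\left(\partial h_2 / \partial \theta\right)}{E(h_2^2)}. \qquad (A2)$$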
In this case, the parameter $\theta$ would be the expected or mean value, $\mu$, of the sample
mean $\bar{x}$. The basic estimating functions, supposing the third and fourth central moments are
known (skewness and kurtosis, respectively), would be (Godambe and Thompson, 1989)
$$h_1 = \bar{x} - \mu; \qquad h_2 = (\bar{x} - \mu)^2 - \sigma_m^2 - \gamma_{1m}\sigma_m(\bar{x} - \mu) \qquad (A3)$$
where $\sigma_m$ and $\gamma_{1m}$ are the standard deviation and the skewness of the sampling distribution
of the sample mean, respectively. It can be easily proved that $h_1$ and $h_2$ are orthogonal
functions, as stated before. Therefore, the estimates of real functions a and b can be obtained
by means of equation (A2):
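$$a = -\frac{1}{\sigma_m^2}; \qquad b = \frac{\gamma_{1m}\sigma_m}{\sigma_m^4\left(\gamma_{2m} + 2 - \gamma_{1m}^2\right)}; \qquad \sigma_m = \frac{\sigma}{\sqrt{N}}. \qquad (A4)$$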
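The step from (A2) and (A3) to (A4) can be written out as follows. This is a worked sketch rather than the authors' own derivation; it assumes that $\gamma_{2m}$ denotes the standardised (excess) kurtosis of the sampling distribution of the sample mean, and the shorthand y is introduced here only for brevity.

```latex
% Shorthand (an assumption of this sketch): y = \bar{x} - \mu, so that
% E(y) = 0, E(y^2) = \sigma_m^2, E(y^3) = \gamma_{1m}\sigma_m^3 and
% E(y^4) = (\gamma_{2m} + 3)\sigma_m^4, with \gamma_{2m} the excess kurtosis.
% Terms proportional to E(y) vanish and are omitted in E(h_2^2).
\begin{align*}
E(h_1 h_2) &= E(y^3) - \sigma_m^2 E(y) - \gamma_{1m}\sigma_m E(y^2)
            = \gamma_{1m}\sigma_m^3 - 0 - \gamma_{1m}\sigma_m^3 = 0, \\
E\!\left(\frac{\partial h_1}{\partial \mu}\right) &= -1, \qquad
E(h_1^2) = \sigma_m^2
  \;\Rightarrow\; a = -\frac{1}{\sigma_m^2}, \\
E\!\left(\frac{\partial h_2}{\partial \mu}\right)
  &= E\left(-2y + \gamma_{1m}\sigma_m\right) = \gamma_{1m}\sigma_m, \\
E(h_2^2) &= E(y^4) + \sigma_m^4 + \gamma_{1m}^2\sigma_m^2 E(y^2)
            - 2\sigma_m^2 E(y^2) - 2\gamma_{1m}\sigma_m E(y^3) \\
         &= \sigma_m^4\left(\gamma_{2m} + 3 + 1 + \gamma_{1m}^2 - 2 - 2\gamma_{1m}^2\right)
          = \sigma_m^4\left(\gamma_{2m} + 2 - \gamma_{1m}^2\right), \\
b &= \frac{\gamma_{1m}\sigma_m}{\sigma_m^4\left(\gamma_{2m} + 2 - \gamma_{1m}^2\right)}.
\end{align*}
```

The first line confirms the orthogonality of the two estimating functions, and the remaining lines give the two coefficients by direct substitution into (A2).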
Hence the estimating function g for $\mu$ would be estimated by $g^* = a^* h_1 + b^* h_2$. Given
that the distribution of the statistic $g^*/\sigma_{g^*}$ can be approximated by a standardised normal
distribution, the following expression can be written:
$$P\left(\frac{g^*}{\sigma_{g^*}} \le Z_\alpha\right) = 1 - \alpha \qquad (A5)$$
where $Z_\alpha$ is the one-tailed critical value corresponding to a confidence level of $1 - \alpha$. Hence
$Z_\alpha$ = 1·64 for $\alpha$ = 0·05. Equations (2) and (3) in the main text can be obtained from equation
(A5) and solving a quadratic equation for the variable $\bar{x} - \mu$. The reader is referred to
Godambe and Thompson (1989) for further details.
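The one-tailed critical value used in (A5) can be obtained for any significance level from the standard normal quantile function. The short sketch below uses Python's standard library; the function name is hypothetical.

```python
from statistics import NormalDist


def one_tailed_z(alpha: float) -> float:
    """One-tailed standard normal critical value for confidence level 1 - alpha."""
    return NormalDist().inv_cdf(1.0 - alpha)


# Critical values for 95%, 90% and 80% confidence levels.
for alpha in (0.05, 0.10, 0.20):
    print(alpha, round(one_tailed_z(alpha), 3))
```

For a significance level of 0.05 this gives approximately 1.645, which the text quotes as 1·64; relaxing the confidence level only requires passing a different significance level.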