The Photogrammetric Record 23(122): 148–169 (June 2008)

ACCURACY ASSESSMENT OF LIDAR-DERIVED DIGITAL ELEVATION MODELS

Fernando J. Aguilar (faguilar@ual.es)
University of Almeria, Spain

Jon P. Mills (j.p.mills@newcastle.ac.uk)
Newcastle University

(Based on a contribution to the Annual Conference of the Remote Sensing and Photogrammetry Society at Newcastle upon Tyne, 14th September 2007)

Abstract

Despite the relatively high cost of airborne lidar-derived digital elevation models (DEMs), such products are usually presented without a satisfactory associated estimate of accuracy. For the most part, DEM accuracy estimates are typically provided by comparing lidar heights against a finite sample of check point coordinates from an independent source of higher accuracy, supposing a normal distribution of the derived height differences or errors. This paper proposes a new methodology to assess the vertical accuracy of lidar DEMs using confidence intervals constructed from a finite sample of errors computed at check points. A non-parametric approach has been tested where no particular error distribution is assumed, making the proposed methodology especially applicable to non-normal error distributions of the type usually found in DEMs derived from lidar. The performance of the proposed model was experimentally validated using Monte Carlo simulation on 18 vertical error data-sets. Fifteen of these data-sets were computed from original lidar data provided by the International Society for Photogrammetry and Remote Sensing Working Group III/3, using their respective filtered reference data as ground truth. The three remaining data-sets were provided by the Natural Environment Research Council’s Airborne Research and Survey Facility lidar system, together with check points acquired using high precision kinematic GPS.
The results proved promising, the proposed models reproducing the statistical behaviour of vertical errors of lidar using a favourable number of check points, even in the cases of data-sets with non-normally distributed residuals. This research can therefore be considered as a potentially important step towards improving the quality control of lidar-derived DEMs.

Keywords: accuracy assessment, confidence intervals, DEM, lidar

© 2008 The Authors. Journal Compilation © 2008 The Remote Sensing and Photogrammetry Society and Blackwell Publishing Ltd. Blackwell Publishing Ltd. 9600 Garsington Road, Oxford OX4 2DQ, UK and 350 Main Street Malden, MA 02148, USA.

Introduction

Accurate digital elevation models (DEMs) of high spatial resolution from airborne lidar data are in increasing demand for a growing number of mapping and GIS tasks related to a wide variety of applications including forest management, urban planning, bird population modelling, ice sheet mapping, flood control and road design (Lim et al., 2003). Lidar DEMs are also being increasingly used for new applications relating to change detection and geopositioning (James et al., 2006; Rodarmel et al., 2006; Miller et al., 2007). However, despite the growing range of applications, and in spite of the relatively high cost of this new type of digital product, such DEMs are usually presented without an appropriate associated estimate of their accuracy. Indeed, with a popular estimate being that 80% of all data used by managers and decision-makers is spatially referenced (van Oort and Bregt, 2005), many researchers have stressed the need to deal with issues of spatial quality. A recent, comprehensive review regarding the importance of assessing error in DEMs can be found in Fisher and Tate (2006). In most cases, users of lidar-derived DEMs are content with some nominal specifications about raw data accuracy, as supplied by the instrument manufacturer or the data provider.
Such statistics are often inappropriate given that the major component of the total error could well be the error introduced during the processing of the raw data (data filtering, gridding, segmentation and/or object reconstruction), rather than the error induced during the data capture process (which is related to vertical and horizontal error in the positioning of the laser platform, laser scan angle, surface reflectivity and terrain slope). In fact, gridding error can comprise a very important, and often neglected, source of error which should be taken into account (Smith et al., 2005; Aguilar et al., 2006). For example, it may be especially relevant for canopy height model estimation in forestry applications where there are fewer ground-return samples for effective DEM surface interpolation (Lim et al., 2003; Clark et al., 2004; Hopkinson et al., 2004; Su and Bork, 2006). Although there have been a number of studies dealing with lidar-derived DEM error, most can be classified as empirical work in which the influence of different variables on DEM error was analysed (Hopkinson et al., 2004; Hyyppä et al., 2005; Goodwin et al., 2006; Göpfert and Heipke, 2006; Su and Bork, 2006). In practice, statistical tests are the only way to ensure that requirements are met at a moderate cost by using an inference process based on sampling theory. At present, the great majority of DEM accuracy standards are based on computing the vertical accuracy of a finite sample data-set (check points) from the differences (residuals) between data-set heights and height values from an independent source of higher accuracy, usually obtained by differential GPS techniques and supposing those residuals follow a normal distribution. This is the case, for example, for the ‘‘National Standards for Spatial Data Accuracy’’ (NSSDA) document published by the US Federal Geographic Data Committee (FGDC, 1998). 
However, it should be recognised that vertical errors in lidar DEMs often follow a non-normal distribution when they are captured over non-open terrain (for instance, over vegetated or built-up areas) and, therefore, both filtering and gridding can introduce a large quantity of non-random noise (Flood, 2004; Hopkinson et al., 2004; Göpfert and Heipke, 2006), presenting high kurtosis and skewness due to the presence of systematic errors and outliers. The presence of systematic errors is relatively frequent in lidar-derived DEMs of forest areas where non-ground samples (such as low vegetation and logs) are included in the DEM interpolation process and the resulting mean error tends to be positive. In such cases a lidar-derived DEM tends to overestimate the reference elevation (Clark et al., 2004). Concerned about the likely non-normal nature of lidar-derived DEM error under non-open terrain, the American Society for Photogrammetry and Remote Sensing (ASPRS), through the work of the ASPRS Lidar Committee, recently approved a new set of guidelines (Flood, 2004). The guidelines recommend that for open terrain, where error distribution is thought to be close to a normal distribution, the methodology proposed by the NSSDA should be adopted. That is, for assessing vertical accuracy at 95% confidence level:

$$\text{Vertical accuracy} = 1.96\,\mathrm{rmse}_z \tag{1}$$

where $\mathrm{rmse}_z$ is the vertical root mean square error. The resulting figure is known as the “fundamental” vertical accuracy. Away from open terrain, the fundamental vertical accuracy is replaced by the so-called “supplemental” vertical accuracy, tested using the 95th percentile method.
The 95th percentile method may be used regardless of whether or not the errors follow a normal distribution and whether or not errors qualify as outliers, indicating that 95% of the errors in the data-set will have absolute values of equal or lesser value and 5% of the errors will be of larger value. Regarding the number of necessary check points, the ASPRS recommends a minimum of 20 check points (30 preferred) in each of the major land cover categories represented in the surveyed area. This gives rise to the question of exactly how many check points should be used in order to guarantee that a certain confidence level is achieved. Conversely, it may be necessary to derive the reliability of fundamental and supplemental vertical accuracy estimates when a particular number of check points is used. Deriving definitive values is not easy, and the ASPRS guidelines do not offer advice with respect to these issues. Addressing the aforementioned issues, the main objective of this paper is the development of a theoretical methodology to assess lidar DEM vertical accuracy. Vertical accuracy is estimated by means of the construction of confidence intervals from a finite sample of residuals computed at check points evenly distributed across the whole survey area. One of the goals is to derive the minimum number of check points needed to guarantee a certain confidence level. In this way, a non-parametric approach based on the theory of estimating functions has been adopted and tested. This approach means that no particular distribution of residuals at check points is assumed and so it is applicable to non-normal error distributions which are usually found in lidar-derived DEMs. 
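The fundamental and supplemental accuracy figures described in this introduction can be sketched in a few lines of Python. This is a minimal illustration, not the ASPRS reference procedure: the function names, the example errors and the linear interpolation between ranks used for the percentile are all assumptions made here.

```python
import math

def fundamental_accuracy(errors):
    """1.96 * rmse of the vertical errors, as in equation (1); assumes normal errors."""
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return 1.96 * rmse

def supplemental_accuracy(errors, q=0.95):
    """q-th percentile of the absolute errors.

    Linear interpolation between neighbouring ranks is an assumption of this
    sketch; other percentile conventions give slightly different values.
    """
    a = sorted(abs(e) for e in errors)
    pos = q * (len(a) - 1)           # fractional rank of the percentile
    lo = math.floor(pos)
    hi = min(lo + 1, len(a) - 1)
    return a[lo] + (pos - lo) * (a[hi] - a[lo])

# Hypothetical check point errors in metres, including one large outlier.
errors = [0.02, -0.05, 0.10, 0.03, -0.01, 0.45, 0.07, -0.04, 0.00, 0.06]
print("fundamental :", round(fundamental_accuracy(errors), 3))
print("supplemental:", round(supplemental_accuracy(errors), 3))
```

With only ten check points, a single outlier dominates both figures, which illustrates why the reliability of such estimates for small samples is a concern.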
Development of the Proposed Models

The models tested in the work reported here concern two different scenarios:

(1) Construction of confidence intervals relating to the distribution of lidar vertical errors from a finite sample of residuals coming from a population not necessarily expected to be normally distributed.
(2) Construction of confidence intervals for the assessment of an average error statistic for which the meaning will easily be understood, both on the part of the user and the producer of lidar-derived DEMs. In this case rmse, as a widely used average error statistic, was chosen.

Both of the contemplated scenarios avoid the very common assumption of normally distributed errors, as adopted in the majority of current vertical accuracy standards.

Model for Estimating Lidar Vertical Accuracies

Suppose that the difference between a lidar-derived DEM and ground truth is a random variable $X$. This being the case, a sample of $N$ height differences may be used to estimate the mathematical expectation (mean value, $\mu$) and dispersion (standard deviation, $\sigma$) of the said height differences for the whole surface ($X = \{x_1, x_2, \ldots, x_N\}$). This is the procedure adopted by the NSSDA standards and the ASPRS Lidar Committee, as was previously discussed.

The first step is to obtain a confidence interval for $\mu$ by means of statistical inference, taking into account the likely non-normal nature of the errors. It is therefore necessary to determine the confidence interval using a non-parametric approach where no particular distribution is assumed. This issue can be resolved using the theory of estimating functions, as demonstrated by Aguilar et al. (2007a). An outline of the main steps in deriving equations (2) and (3) can also be found in the Appendix located at the end of this paper.
$$\mu_{upper} = \bar{x} + \frac{\dfrac{\gamma_{2m}+2}{\gamma_{1m}} + \sqrt{\left(\dfrac{\gamma_{2m}+2}{\gamma_{1m}}\right)^{2} + 4\left(\dfrac{t_\alpha\sqrt{(\gamma_{2m}+2)(\gamma_{2m}+2-\gamma_{1m}^{2})}}{|\gamma_{1m}|} + 1\right)}}{2}\,\frac{\sigma}{\sqrt{N}} \tag{2}$$

$$\mu_{lower} = \bar{x} + \frac{\dfrac{\gamma_{2m}+2}{\gamma_{1m}} - \sqrt{\left(\dfrac{\gamma_{2m}+2}{\gamma_{1m}}\right)^{2} + 4\left(\dfrac{t_\alpha\sqrt{(\gamma_{2m}+2)(\gamma_{2m}+2-\gamma_{1m}^{2})}}{|\gamma_{1m}|} + 1\right)}}{2}\,\frac{\sigma}{\sqrt{N}} \tag{3}$$

where $\bar{x}$ is the sample mean, $N$ is the sample size, $\sigma$ is the value of the population standard deviation (which will be estimated by sample standard deviation $Sd$), and finally $\gamma_{1m}$ and $\gamma_{2m}$ are the skewness and standardised kurtosis of the sampling distribution of the sample mean, respectively. Skewness and standardised kurtosis can be estimated from a finite sample by means of the following expressions:

$$\gamma_1 = \frac{N\sum_{i=1}^{N}(x_i-\bar{x})^{3}}{(N-1)(N-2)\,Sd^{3}}; \qquad \gamma_{1m} = \frac{\gamma_1}{\sqrt{N}} \tag{4}$$

$$\gamma_2 = \frac{N(N+1)\sum_{i=1}^{N}(x_i-\bar{x})^{4}}{(N-1)(N-2)(N-3)\,Sd^{4}} - \frac{3(N-1)^{2}}{(N-2)(N-3)}; \qquad \gamma_{2m} = \frac{\gamma_2}{N} \tag{5}$$

The use of $t_\alpha$, a one-tailed critical value corresponding to a confidence level of $1 - \alpha$, instead of $z_\alpha$ should be noted in equations (2) and (3). This change is recommended when estimating confidence intervals using the sample standard deviation instead of $\sigma$, especially when working with small samples (lower than 100 observations). In this case it is more appropriate to employ Student’s t-distribution calculating $t_\alpha$ values for $N - 1$ degrees of freedom, $N$ being the sample size (Steel and Torrie, 1980).
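As an illustration only, one reading of equations (2) to (5) might be evaluated as in the following Python sketch. The function names `sample_moments` and `mean_interval`, the example residuals and the fallback to a symmetric interval when the sample skewness is zero (where the expressions are singular) are assumptions made here, not part of the published method. The critical value is supplied by the caller, for instance read from a Student's t table for N − 1 degrees of freedom.

```python
import math

def sample_moments(x):
    """Sample mean, standard deviation (N - 1 divisor), skewness (eq. 4)
    and standardised kurtosis (eq. 5) of a list of residuals."""
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))
    s3 = sum((v - mean) ** 3 for v in x)
    s4 = sum((v - mean) ** 4 for v in x)
    g1 = n * s3 / ((n - 1) * (n - 2) * sd ** 3)
    g2 = (n * (n + 1) * s4 / ((n - 1) * (n - 2) * (n - 3) * sd ** 4)
          - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return mean, sd, g1, g2

def mean_interval(x, t_alpha):
    """Lower and upper bounds for the mean following equations (2) and (3).

    t_alpha is the critical value chosen by the caller.
    """
    n = len(x)
    mean, sd, g1, g2 = sample_moments(x)
    se = sd / math.sqrt(n)             # Sd used as the estimate of sigma
    g1m = g1 / math.sqrt(n)            # skewness of the sample mean
    g2m = g2 / n                       # kurtosis of the sample mean
    if abs(g1m) < 1e-12:
        # Assumption of this sketch: with zero skewness the expressions are
        # singular, so fall back to the symmetric Student's t interval.
        return mean - t_alpha * se, mean + t_alpha * se
    b = (g2m + 2) / g1m
    c = t_alpha * math.sqrt((g2m + 2) * (g2m + 2 - g1m ** 2)) / abs(g1m)
    root = math.sqrt(b * b + 4 * (c + 1))
    return mean + (b - root) / 2 * se, mean + (b + root) / 2 * se

# Hypothetical positively skewed residuals (metres); 1.895 is the tabulated
# one-tailed 5% critical value of Student's t for 7 degrees of freedom.
residuals = [0.0, 0.01, 0.02, 0.02, 0.03, 0.05, 0.08, 0.40]
lower, upper = mean_interval(residuals, 1.895)
print("mean interval:", round(lower, 3), round(upper, 3))
```

For a positively skewed sample the upper bound lies further from the sample mean than the lower bound, so the interval is not symmetric about the mean.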
Once the confidence interval for the mean is determined, which therefore includes the disturbance effects of error bias, skewness and kurtosis, the second step implies treating data as if it came from a normal population. Under this hypothesis the following expressions can be written to estimate the maximum and minimum expected vertical error ($x_{upper}$ and $x_{lower}$, respectively) for a $1 - \alpha$ confidence level:

$$x_{upper} = \mu_{upper} + t_\alpha\,Sd \tag{6}$$

$$x_{lower} = \mu_{lower} - t_\alpha\,Sd \tag{7}$$

Systematic errors are not assumed to have been removed and so the mean is not necessarily expected to be zero. This phenomenon is frequently encountered when working with lidar data in afforested areas where it is usual to find a positive bias due to dense low lying vegetation beneath the tree canopy (Kraus and Pfeifer, 1998; Goodwin et al., 2006; Göpfert and Heipke, 2006; Su and Bork, 2006).

Model for Estimating Lidar-Derived DEM Accuracy Using Rmse

In this case it is necessary to construct a confidence interval to estimate uncertainty when rmse is employed to evaluate lidar-derived DEM vertical accuracy, again supposing a non-normal error distribution. The approach proposed by Aguilar et al.
(2007a), based on estimating functions theory, was again adopted to obtain the following equations:

$$\mathrm{rmse}_{upper} = \sqrt{\mathrm{mse} + \frac{\dfrac{\gamma_{2mse}+2}{\gamma_{1mse}} + \sqrt{\left(\dfrac{\gamma_{2mse}+2}{\gamma_{1mse}}\right)^{2} + 4\left(\dfrac{t_\alpha\sqrt{(\gamma_{2mse}+2)(\gamma_{2mse}+2-\gamma_{1mse}^{2})}}{|\gamma_{1mse}|} + 1\right)}}{2}\,\sigma_{mse}} \tag{8}$$

$$\mathrm{rmse}_{lower} = \sqrt{\mathrm{mse} + \frac{\dfrac{\gamma_{2mse}+2}{\gamma_{1mse}} - \sqrt{\left(\dfrac{\gamma_{2mse}+2}{\gamma_{1mse}}\right)^{2} + 4\left(\dfrac{t_\alpha\sqrt{(\gamma_{2mse}+2)(\gamma_{2mse}+2-\gamma_{1mse}^{2})}}{|\gamma_{1mse}|} + 1\right)}}{2}\,\sigma_{mse}} \tag{9}$$

where mse is the mean square error, $\sigma_{mse}$ is the standard deviation of the mse sampling distribution, and finally $\gamma_{1mse}$ and $\gamma_{2mse}$ are mse skewness and standardised kurtosis, respectively. All described central moments for square errors may be estimated from a sample of size $N$ check points by employing the following expressions:

$$\mathrm{mse} = \frac{\sum_{i=1}^{N} x_i^{2}}{N} \tag{10}$$

$$Sd_{x^2} = \sqrt{\frac{\sum_{i=1}^{N}(x_i^{2}-\mathrm{mse})^{2}}{N}}; \qquad \sigma_{mse} = \frac{\sigma_{x^2}}{\sqrt{N}} \approx \frac{Sd_{x^2}}{\sqrt{N}} \tag{11}$$

$$\gamma_{1x^2} = \frac{N\sum_{i=1}^{N}(x_i^{2}-\mathrm{mse})^{3}}{(N-1)(N-2)\,Sd_{x^2}^{3}}; \qquad \gamma_{1mse} = \frac{\gamma_{1x^2}}{\sqrt{N}} \tag{12}$$

$$\gamma_{2x^2} = \frac{N(N+1)\sum_{i=1}^{N}(x_i^{2}-\mathrm{mse})^{4}}{(N-1)(N-2)(N-3)\,Sd_{x^2}^{4}} - \frac{3(N-1)^{2}}{(N-2)(N-3)}; \qquad \gamma_{2mse} = \frac{\gamma_{2x^2}}{N} \tag{13}$$

Methodology

Test Data

The study sites concern lidar surveys of two different areas. The first area comprises the data-sets from the second phase of the European Organisation for Experimental Photogrammetric Research (OEEPE, now European Spatial Data Research, EuroSDR) project on laser scanning. Data of the second area was acquired in a series of lidar surveys carried out by the Natural Environment Research Council’s Airborne Research and Survey Facility (NERC ARSF).

Data-sets from the EuroSDR Project on Laser Scanning

As part of the second phase of the EuroSDR project on laser scanning, different lidar data-sets, first and second pulse return, were acquired over the Vaihingen/Enz and Stuttgart city centre (Germany) test sites using an Optech ALTM laser scanner. International Society for Photogrammetry and Remote Sensing (ISPRS) Working Group III/3 has previously used these data-sets for testing numerous filtering algorithms (Sithole and Vosselman, 2004) and data has been made available through the working group’s website (http://www.commission3.isprs.org/wg3).
A total of seven study sites (four urban and three rural) were selected for testing the aforementioned models, with sites chosen for their varied characteristics and diverse feature content (open fields, dense buildings, large buildings, quarries, bridges, dense vegetation and steep slopes, among others). The selected sites were appropriate to enable validation of the proposed models against a very wide range of operational conditions. Fifteen sub-sites were extracted from the original data (Table I). For a better understanding of the nature of the data, a planimetric view of Site 2 and corresponding reference sub-sites are depicted in Fig. 1. Reference data (ground truth) was generated by manual filtering of the height data-set to segment bare earth data, the process assisted by knowledge of the landscape and aerial photographs (see Sithole and Vosselman (2003) for further details).

Table I. Characteristics of the selected EuroSDR data-sets.

| Site | Point spacing | Reference data (sub-site) | Number of points | Characteristics |
|---|---|---|---|---|
| Site 1 | 1 m | Sample 11 | 17 626 | Steep slopes, mixture of vegetation and buildings on hillside, buildings on hillside |
| | 1 m | Sample 12 | 25 344 | |
| Site 2 | 1 m | Sample 21 | 10 055 | Large and irregularly shaped buildings, small tunnel, road and bridge |
| | 1 m | Sample 22 | 21 570 | |
| | 1 m | Sample 23 | 11 025 | |
| | 1 m | Sample 24 | 3755 | |
| Site 3 | 1 m | Sample 31 | 15 544 | Vegetation between a high density of buildings, building with eccentric roof, open terrain with mixture of low and high features |
| Site 4 | 1 m | Sample 41 | 1653 | Railway station with trains (low density of ground points) |
| | 1 m | Sample 42 | 11 989 | |
| Site 5 | 2 m | Sample 51 | 13 860 | Steep vegetated slopes, quarry (sharp break lines), vegetated river bank |
| | 2 m | Sample 52 | 17 678 | |
| | 2 m | Sample 53 | 25 113 | |
| | 2 m | Sample 54 | 3948 | |
| Site 6 | 2 m | Sample 61 | 31 531 | Large buildings, road including embankments |
| Site 7 | 2 m | Sample 71 | 12 786 | Road, underpass, road including embankments |

Fig. 1. Site 2 with corresponding reference sub-sites (Samples 21, 22, 23 and 24) shown as perspective views.

Data-sets from the NERC ARSF

Filey Bay is located approximately 11 kilometres south of the coastal town of Scarborough in North Yorkshire, UK. Three lidar data-sets of Filey Bay were acquired using an Optech ALTM 3033 scanner (operating at 1000 m flight altitude) in April 2005 (data-set 0405), August 2005 (0805) and May 2006 (0506); Miller et al. (2007) contains further details of the surveys. All issues related to georeferencing were carried out by the NERC ARSF. As independent quality control, delivered lidar data was compared against a testfield based on check points acquired during a previous survey in 2001 using kinematic GPS (anticipated vertical accuracy of around 1 to 2 cm). The test area consisted of a hard surface, an asphalt road circuit, located at Filey Brigg Country Park (Fig.
2) that enabled an open terrain vertical accuracy evaluation of the lidar data to be undertaken. In this case no error from filtering or gridding is expected, and so all error can be assumed to arise from the data capture process. Table II gives statistical parameters associated with the vertical errors computed for the Filey Bay data-sets. It should be highlighted that error distribution generally fitted a normal distribution except in the case of the 0506 lidar data, where the “3-sigma” rule (Daniel and Tennant, 2001) had to be applied to remove three likely outliers. Notice that the Kolmogorov–Smirnov (K-S) test (Royston, 1982), with a confidence level of 95%, demonstrated that the error distribution for each data-set matched quite well a normal probability distribution because the K-S statistic was always lower than the 95% critical value. Despite the 0805 lidar data-set presenting an absolute value of skewness higher than the critical value of 0.5, proposed by some authors as a reasonable limit to consider data to come from a normal distribution (Daniel and Tennant, 2001), no outlier correction was applied because of the results yielded by the K-S test. Therefore, these data-sets were subsequently used for testing the proposed model to construct rmse confidence intervals from normally distributed lidar error data.

Fig. 2. GPS check point testfield located at Filey Brigg Country Park.

Table II. Statistical characteristics of Filey Bay data-sets.

| Data-sets | 0405 lidar | 0805 lidar | 0506 lidar (raw) | 0506 lidar (3-sigma rule) |
|---|---|---|---|---|
| Number of check points | 840 | 833 | 834 | 831 |
| Minimum error (m) | −0.16 | −0.29 | −0.03 | −0.03 |
| Maximum error (m) | 0.03 | 0.23 | 1.31 | 0.20 |
| Mean error (m) | −0.06 | −0.16 | 0.09 | 0.08 |
| Standard deviation (m) | 0.03 | 0.05 | 0.06 | 0.04 |
| Skewness | −0.04 | 1.24 | 10.75 | 0.19 |
| Standardised kurtosis | 0.12 | 9.07 | 179.45 | 0.00 |
| K-S statistic | 0.026 | 0.038 | 0.151 | 0.029 |
| K-S critical value (95%) | 0.047 | 0.047 | 0.047 | 0.047 |

Raw Data Processing of EuroSDR Data

Last-return raw EuroSDR lidar data was filtered to segment the ground surface from vegetation, buildings and any gross errors embedded in the general point cloud (Axelsson, 1999). The filtering algorithm developed by Axelsson (2000) was used to segment ground points by means of a progressive triangular irregular network (TIN) densification method where the surface was allowed to fluctuate within certain values. Briefly, a sparse TIN is derived from neighbourhood minima, and then progressively densified through the laser point cloud. The algorithm has been implemented in TerraScan (integrated in Terrasolid, version 007.004), the software used to carry out the filtering process. As the first step in the filtering process, points regarded as too low or very high were removed as likely outliers. Subsequently, the ground point class was filtered and saved as points belonging to the DEM. Fig. 3 depicts an example of the results from the filtering process for Site 5.

Fig. 3. Filtering process over Site 5. Top: last-return raw data. Bottom: last-return ground-filtered data.

The interpolation method used to refill any gaps created in the filtered ground model (gaps resulting from the removal of points on buildings, vegetation and the like) was the
Multiquadric Radial Basis Function working with the local support of the eight neighbours closest to the interpolated point, since this has previously been found generally to yield better results (Aguilar et al., 2005). The final grid spacing of the interpolated DEM was the same as presented by the raw lidar data.

Lidar error was computed as the difference between the derived ground DEM heights (obtained after applying Axelsson’s filtering) and the heights from the manually filtered reference data for each of the 15 EuroSDR samples (as listed in Table I). It is important to note that large DEM data gaps caused by the filtering process (for example, the removal of points on large buildings) were not included in the error calculation in order to avoid the uncertainty in areas where no reference data (ground truth) was available. Hence the error data-sets obtained (Table III) can be considered very close to the errors found in real-world lidar surveys. Furthermore, the 3-sigma rule was applied to remove outliers which can corrupt the true statistical distribution of the errors. In fact, this technique is generally applied under operational conditions for removing outliers in the DEM quality assessment of lidar data (Daniel and Tennant, 2001). The percentage of data removed ranged between 0.5 and 3.5%, with a mean value of 1.8%. These results are fairly consistent with those published by Torlegård et al. (1986) referring to errors from photogrammetric DEMs. Nevertheless, despite applying the 3-sigma rule, corrected error data-sets still followed a non-normal distribution according to the K-S 95% confidence test.

Table III. Statistics for 3-sigma rule corrected error from EuroSDR data-sets.

| Sample | Points | Mean (m) | Sd (m) | $\gamma_1$ | $\gamma_2$ | K-S | Critical K-S | % data outliers |
|---|---|---|---|---|---|---|---|---|
| 11 | 16 995 | 0.16 | 0.44 | 3.30 | 11.82 | 0.322 | 0.010 | 3.5 |
| 12 | 25 203 | 0.04 | 0.16 | 6.26 | 54.41 | 0.291 | 0.010 | 0.5 |
| 21 | 9742 | 0.02 | 0.05 | 1.83 | 4.77 | 0.153 | 0.014 | 3.1 |
| 22 | 21 193 | 0.04 | 0.13 | 5.12 | 32.39 | 0.272 | 0.009 | 1.7 |
| 23 | 10 871 | 0.05 | 0.20 | 5.06 | 31.26 | 0.299 | 0.013 | 1.4 |
| 24 | 3695 | 0.04 | 0.17 | 5.31 | 45.81 | 0.273 | 0.022 | 1.6 |
| 31 | 15 315 | 0.01 | 0.04 | 1.29 | 4.16 | 0.108 | 0.011 | 1.4 |
| 41 | 1626 | 0.25 | 1.11 | 5.05 | 24.41 | 0.424 | 0.034 | 1.6 |
| 42 | 11 743 | 0.02 | 0.07 | 2.98 | 13.22 | 0.201 | 0.013 | 2.0 |
| 51 | 13 701 | 0.00 | 0.06 | 0.26 | 5.32 | 0.107 | 0.012 | 1.1 |
| 52 | 17 368 | 0.08 | 0.30 | 2.52 | 8.41 | 0.270 | 0.010 | 1.7 |
| 53 | 24 702 | 0.16 | 0.71 | 6.64 | 57.08 | 0.360 | 0.009 | 1.6 |
| 54 | 3863 | 0.00 | 0.08 | 2.79 | 19.27 | 0.179 | 0.022 | 2.1 |
| 61 | 31 057 | 0.01 | 0.10 | 3.84 | 24.07 | 0.252 | 0.008 | 1.5 |
| 71 | 12 517 | 0.02 | 0.11 | 2.33 | 10.80 | 0.208 | 0.012 | 2.1 |

Experimental Validation of the Proposed Models

The Monte Carlo method was employed for model validation. Monte Carlo numerical simulation is based on stochastic techniques, meaning it is based on the use of random numbers and probability statistics, to investigate complex problems without relying on analytical equations (Weir, 2002). For the first model evaluated, described through equations (2) to (7), the simulation started by extracting a sample of residuals (sample $X$ of size $N$) from each of the EuroSDR data-sets by means of random sampling. In this case $N$ took the values of 20, 40, 60, 80 and 100 check points. The model parameters were then computed and the bounds of the interval, $x_{upper}$ and $x_{lower}$, were obtained for a $1 - \alpha$ confidence level. The number of errors from the whole data-set (population) which were within the bounds estimated by the model was then calculated. It must be noted that the percentage of errors included within the bounds should be close to 95% if, for instance, a significance level of $\alpha = 0.05$ was chosen.
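The sampling loop described above can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the population is synthetic, the names `monte_carlo_coverage` and `normal_bounds` are hypothetical, and the interval function is a plain normal-theory stand-in (mean ± 1.96 Sd) used only so that the example is self-contained. Any function returning a lower and an upper bound could be passed in its place. Repeating the draw many times gives the mean percentage of population errors inside the bounds and a coefficient of variation as a reliability measure.

```python
import random
import statistics

def normal_bounds(sample):
    """Stand-in interval (mean +/- 1.96 * Sd); NOT the model proposed in the paper."""
    m = statistics.mean(sample)
    s = statistics.stdev(sample)
    return m - 1.96 * s, m + 1.96 * s

def monte_carlo_coverage(population, n, bounds_fn, runs=1000, seed=0):
    """Mean percentage of population errors inside the sample-based bounds,
    and the coefficient of variation (%) of that percentage over all runs."""
    rng = random.Random(seed)
    coverages = []
    for _ in range(runs):
        sample = rng.sample(population, n)          # draw n check points without replacement
        lower, upper = bounds_fn(sample)
        inside = sum(1 for e in population if lower <= e <= upper)
        coverages.append(100.0 * inside / len(population))
    mean_cov = statistics.mean(coverages)
    cv = 100.0 * statistics.stdev(coverages) / mean_cov
    return mean_cov, cv

# Synthetic, positively skewed error population (metres); purely illustrative.
gen = random.Random(42)
population = [gen.gauss(0.0, 0.05) + gen.expovariate(10.0) for _ in range(2000)]

for n in (20, 40, 60, 80, 100):
    mean_cov, cv = monte_carlo_coverage(population, n, normal_bounds, runs=200)
    print(n, round(mean_cov, 1), round(cv, 1))
```

Passing the interval function as an argument keeps the sampling loop independent of the model under test, so the same loop can compare several accuracy estimators on identical samples when the same seed is used.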
This procedure was repeated 1000 times to estimate the mean value of the percentage of errors from the population falling within the bounds and its associated reliability measure (coefficient of variation). Over the same simulated samples, and purely for comparison purposes, fundamental and supplemental vertical accuracies were also computed according to the ASPRS guidelines for lidar data (Flood, 2004), their corresponding mean values and associated reliabilities also being obtained for 1000 runs.

For the evaluation of the second model, described through equations (8) to (13), the simulation procedure was quite similar to the first, with the exception that the estimated bounds were rmse values and the sample size increased in increments of 20, from 20 to 200 check points. The population rmse for every data-set was calculated from all residual values available in each. It was then necessary to determine whether the corresponding population rmse fell within the limits of the calculated confidence intervals and whether the model had correctly predicted the expected uncertainty for data-set estimation. In this case the procedure was run 2000 times because it was computationally faster than for the first model. Note that because samples were extracted from finite populations, a correcting coefficient must be applied to calculate sample variance in order to correct the reduced variability expected on consecutive random samplings extracted from the same finite residual population (Steel and Torrie, 1980). Therefore, the following equation was applied:

$$Sd_m = \frac{Sd}{\sqrt{N}}\sqrt{\frac{M-N}{M}} \tag{14}$$

where $M$ is the number of residuals in the data-set (the finite population), $N$ is the sample size, and $Sd$ and $Sd_m$ are the standard deviations of the sample of residuals and of the sample mean, respectively.

Results and Discussion

Confidence Intervals for Lidar Vertical Accuracy Assessment

Fig. 4 shows the performance of the proposed model compared against results achieved using the ASPRS guidelines (Flood, 2004). Only results for samples 11, 52 and 61 are given, although results are similar for all 15 samples.

[Fig. 4 graphs. Horizontal axis: number of check points. Left column: accuracy (m) and reliability (%), plotting fundamental vertical accuracy, supplemental vertical accuracy, the 95% population percentile, and the reliability of the fundamental and supplemental accuracy calculations. Right column: percentage of errors within computed intervals and reliability (%), plotting the mean percentage of errors within the computed interval (95% confidence level) for the proposed model and the reliability of the computed intervals calculation.]

Fig. 4. Evaluation results for samples 11 (top graphs), 52 (middle) and 61 (bottom) from the EuroSDR data-set.
Left column: accuracy following ASPRS guidelines. Right column: proposed model.

Results calculated using the fundamental accuracy assessment, as defined in equation (1) supposing a normal distribution of the error, were usually far from the true vertical accuracy computed for the error population. This is to be expected, taking into account the non-normally distributed nature of lidar error data-sets in non-open terrain, as previously confirmed by the results of the K-S test given in Table III. Therefore, following the recommendations of the ASPRS guidelines, the supplemental vertical accuracy test figures should be considered and computed using the 95th percentile method, as explained in the introduction. In this case supplemental accuracy always overestimated the true value, but approached it as the sample size increased. The problem with both fundamental and supplemental vertical accuracy is that the variability of the estimates was very noticeable, especially in the case of the latter (always above ±20% even when using up to 100 check points). Hence many more than 100 check points would have to be employed to ensure a confidence level near to the target value of 95%. However, the 95% confidence intervals computed by means of the proposed model, developed through equations (2) to (7), achieved a percentage of population errors within the bounds very close to the target value, even when working with only 20 check points. Better still, the variability of the estimates was very low, as can be seen in Fig. 4. In summary, only around 60 check points would be needed to reach a confidence level of 95% with an estimation error below ±2·5%.

Fig. 5 shows confidence interval limits (the mean of 1000 runs) computed from the proposed model for the three samples referred to in Fig.
4, varying randomly the initial finite sample of check points. The stability of the confidence interval length along the horizontal axis (sample size) should be highlighted. The confidence interval length depends basically on the sample standard deviation, skewness and standardised kurtosis, rather than on the number of check points used. Therefore, precise and narrow confidence intervals can be calculated when working with as few as 60 check points. Notice that, unlike the case for a normal distribution, the confidence intervals are not symmetrical around the mean value, but are biased in the direction of skewness.

The importance of developing this simple and understandable model for accuracy estimation is supported by the fact that, traditionally, users are known to fail to quantify risks due to a lack of tools and theory and to poor documentation on spatial data quality (van Oort and Bregt, 2005). The presentation of clear upper and lower limits rather than a single accuracy value may substantially improve the understanding of accuracy test results by users of lidar data and enhance communication with data providers. Furthermore, it is worth noting that the proposed model, unlike the ASPRS methodology, does not assume any prior lidar error distribution, since all required information is extracted from the sample by means of a non-parametric approach. Thus, the so-called “strong assumption” of distribution normality accepted in many previous works, and in the majority of the current standards for positional accuracy, is avoided.

Confidence Intervals for Rmse Estimate

Figs. 6 and 7 show the performance of the model tested to compute rmse 95% confidence intervals as well as another model based on the Student’s t asymptotic approach.
The Student’s t model has been formulated under the hypothesis of an almost-normal distribution of lidar vertical error and is given by the following expressions:

$$\mathrm{rmse}_{upper} = \sqrt{\mathrm{mse} + t_{\alpha/2}\,\sigma_{mse}} \qquad (15)$$

$$\mathrm{rmse}_{lower} = \sqrt{\mathrm{mse} - t_{\alpha/2}\,\sigma_{mse}} \qquad (16)$$

where mse is the mean square error given by equation (10), $\sigma_{mse}$ is the standard deviation of the mse sampling distribution (equation (11)) and $t_{\alpha/2}$ is the two-tailed critical value at significance level $\alpha$ for N − 1 degrees of freedom, N being the sample size (see Aguilar et al. (2007a) for further details).

[Fig. 5 plots: error value (m) against number of check points, showing the upper limit, the lower limit and the population mean.]

Fig. 5. Confidence intervals (95% confidence level) as a function of sample size computed, using the proposed model, for samples 11 (top graph), 52 (middle) and 61 (bottom) from the EuroSDR data-set.

In the case of error data coming from non-open terrain, such as those from the EuroSDR reference data-sets (Fig. 6), it seems to be clear
that the Student’s t approach failed to reach the expected target value of 95% confidence, even when using sample sizes of up to 200 check points (Fig. 6, right). Conversely, the model tested here can be considered successful once the sample reaches between around 60 and 160 check points, depending on the data-set, 100 being the average for all 15 EuroSDR data-sets (see Fig. 9). Notice that the confidence intervals (Fig. 6, left) are narrower in the case of the Student’s t model because the residual population has been assumed to be normal, and so standardised kurtosis and skewness have been set to zero (no outliers or systematic errors present). This is an unrealistic scenario which is overcome by means of the assumption-free non-parametric model.

[Fig. 6 plots. Left panels: rmse (m) against number of check points, showing the upper and lower limits from the proposed model, the upper and lower limits from the Student’s t approach, and the population rmse. Right panels: P (%) against number of check points for the proposed model and the Student’s t approach.]

Fig. 6. Rmse evaluation results for samples 11 (top graphs), 52 (middle) and 61 (bottom) from the EuroSDR data-set. Left column: rmse confidence intervals (95% level) computed from the proposed model and the Student’s t approach. Right column: percentage of times population rmse falls within intervals computed for the proposed model and the Student’s t approach.
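As an illustration of equations (15) and (16), the following minimal Python sketch computes Student's t bounds for rmse from a list of residuals. It is not the authors' implementation. The function name, the decision to pass the critical value `t_crit` in by hand, and the clamp at zero are assumptions made for the example. In particular, the standard deviation of the mse sampling distribution is estimated here as the standard error of the mean squared error, standing in for equation (11), which may differ.

```python
import math
import statistics


def student_t_rmse_interval(errors, t_crit):
    """Return (lower, rmse, upper) following the form of equations (15) and (16).

    errors : vertical residuals at the check points (metres)
    t_crit : two-tailed Student's t critical value for N - 1 degrees of
             freedom, supplied by the caller
    """
    n = len(errors)
    squared = [e * e for e in errors]
    mse = sum(squared) / n

    # ASSUMPTION: sigma_mse is taken as the standard error of the mean of the
    # squared residuals; the paper's equation (11) is not reproduced here.
    sigma_mse = statistics.stdev(squared) / math.sqrt(n)

    upper = math.sqrt(mse + t_crit * sigma_mse)
    # ASSUMPTION: clamp at zero so the square root stays real for tiny samples.
    lower = math.sqrt(max(mse - t_crit * sigma_mse, 0.0))
    return lower, math.sqrt(mse), upper
```

For N = 20 check points at the 95% level the two-tailed critical value for 19 degrees of freedom is approximately 2.093, so a call would look like `student_t_rmse_interval(residuals, 2.093)`, where `residuals` is a hypothetical list of 20 height differences in metres.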
It is recognised that whilst non-normal, fairly symmetrical population shapes converge to normal-looking sampling distributions with relatively small samples, strongly skewed distributions require much larger samples for the approximation to be good when a normal distribution is supposed. Thus the Student’s t approach is expected to require a large number of check points when working with a population that exhibits high skewness.

[Fig. 7 plots. Left panels: rmse (m) against number of check points, showing the upper and lower limits from the proposed model, the upper and lower limits from the Student’s t approach, and the population rmse. Right panels: P (%) against number of check points for the proposed model and the Student’s t approach.]

Fig. 7. Rmse evaluation results for 0405 lidar (top graphs), 0805 lidar (middle) and 3-sigma rule corrected 0506 lidar (bottom) from the NERC ARSF data-set. Left column: rmse confidence intervals (95% level) computed from the proposed model and Student’s t approach. Right column: percentage of times population rmse falls within intervals computed for the proposed model and Student’s t approach.

Results from the NERC ARSF test data (Fig.
7), however, demonstrated that the Student’s t approach may be deemed appropriate when dealing with normal lidar vertical error distributions, which are usually found over open terrain (no filtering error). It should be noted that the three NERC ARSF error data-sets clearly followed a normal distribution after being filtered using the 3-sigma rule, as can be observed from the K-S test results in Table II. In this case, results indicate that working with only a few check points, around 20 to 40, would be sufficient. In this instance the proposed model computed confidence intervals that were too wide and exhibited somewhat erratic behaviour when working with a small number of check points. Indeed, despite the wide mean confidence interval that was calculated, the percentage of times the population rmse fell within the computed interval limits was found to be very poor until a sample size of between 60 and 80 check points was reached. This was mainly due to the high level of uncertainty in estimating the kurtosis value from a small sample size (Aguilar et al., 2007b). The Student’s t model estimates only the mean square error and the standard deviation of the mean square error sampling distribution, whose values are more reliable than kurtosis and skewness when estimated from small sample sizes.

[Fig. 8 plot: check points needed to obtain the required confidence level against standardised kurtosis, with two fitted curves (R² = 0·9515 and R² = 0·9526).]

Fig. 8. Relationship between check points needed to reach a determined confidence level for the computed rmse bounds (circles 95%; triangles 90%) and the residual population kurtosis. Data from the 15 EuroSDR samples.
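The "percentage of times population rmse falls within intervals" reported in the right-hand columns of Figs. 6 and 7 comes from a Monte Carlo loop of the kind sketched below. This is a hedged outline rather than the authors' code. The function names are hypothetical, a synthetic normal population stands in for real lidar residuals, and only the Student's t bounds are evaluated. The standard error of the mse is an assumed stand-in for equation (11), and applying the finite population correction of equation (14) to that standard error is likewise an assumption.

```python
import math
import random
import statistics


def rmse(values):
    """Root mean square of a sequence of residuals."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def t_interval(sample, population_size, t_crit):
    """Student's t rmse bounds in the form of equations (15) and (16)."""
    n = len(sample)
    squared = [v * v for v in sample]
    mse = sum(squared) / n
    # ASSUMPTION: the finite population correction of equation (14) is applied
    # to the standard error of the mean squared error.
    fpc = math.sqrt((population_size - n) / population_size)
    sigma_mse = statistics.stdev(squared) / math.sqrt(n) * fpc
    lower = math.sqrt(max(mse - t_crit * sigma_mse, 0.0))
    upper = math.sqrt(mse + t_crit * sigma_mse)
    return lower, upper


def coverage(population, sample_size, t_crit, runs=2000, seed=0):
    """Percentage of runs in which the population rmse lies inside the bounds."""
    rng = random.Random(seed)
    target = rmse(population)
    hits = 0
    for _ in range(runs):
        # Draw check points without replacement from the finite population.
        sample = rng.sample(population, sample_size)
        lower, upper = t_interval(sample, len(population), t_crit)
        if lower <= target <= upper:
            hits += 1
    return 100.0 * hits / runs


if __name__ == "__main__":
    gen = random.Random(42)
    # Hypothetical open-terrain residuals: zero mean, 0.15 m standard deviation.
    population = [gen.gauss(0.0, 0.15) for _ in range(5000)]
    # 2.023 is approximately the two-tailed 95% critical value for 39 d.f.
    print(coverage(population, sample_size=40, t_crit=2.023))
```

Repeating the call over sample sizes from 20 to 200 in steps of 20 would give one curve of the kind plotted in the right-hand panels. Swapping `t_interval` for the non-parametric bounds of equations (8) to (13) would give the other.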
Determining the Minimum Required Number of Check Points

The nature of the relationship between the number of check points needed to reach a determined confidence level for the computed rmse bounds and the residual population kurtosis is depicted in Fig. 8. A second-order polynomial has been fitted to the experimental data to provide a better understanding of the non-linear increase in the number of required check points when errors come from a more leptokurtic data-set. The finding can be considered as logical since high kurtosis implies more errors are likely to be outliers, and so more check points are required for the construction of a reliable confidence interval. It should be stressed that when less stringent confidence levels are required (for instance, reducing confidence from 95 to 90%, as illustrated in Fig. 8), a smaller number of check points will be needed to construct a reliable interval according to the chosen confidence level. Kurtosis therefore seems to be the key to determining the required number of check points in order to achieve a certain confidence level in the uncertainty of the accuracy estimate.

[Fig. 9 plot: check points needed to obtain the required confidence level against confidence level (75% to 100%), showing maximum, minimum and mean values.]

Fig. 9. Relationship between the number of check points needed to reach a determined confidence level and the desired confidence level. Maximum, minimum and mean values extracted from the 15 EuroSDR samples and for every confidence level tested.

From these results, further work should be undertaken to determine some morphological terrain and landscape indicators (terrain roughness, landscape spatial distribution and arrangement of features such as buildings
and vegetation) to help predict the expected degree of kurtosis in the error budget after applying lidar data filtering and gridding processes.

Currently, the ASPRS guidelines specify that accuracies should be reported at the 95% confidence level. However, some researchers report applications related to geopositioning tasks which typically demand a 90% or even lower confidence level (Rodarmel et al., 2006). Therefore, rather than requiring 95%, the proposed method allows the user to determine the confidence level depending on the application of the data. Changing the confidence level is as simple as changing the significance level in the developed equations. Fig. 9 clearly shows that fewer check points are required when the degree of confidence is relaxed, although the relationship is not linear. In fact, in the worst case (maximum values in Fig. 9), the number of check points required for computing rmse bounds at the 95% confidence level would be 160, whilst it would be only 110 for an 80% confidence level.

Conclusions

From the results obtained through this research the following conclusions can be drawn:

(1) The model proposed to compute confidence intervals for lidar vertical error has proved to be advantageous compared to the methodology recommended by the ASPRS Lidar Committee for non-open terrain (non-normal error distribution). The proposed methodology works quite well under open and non-open terrain conditions because input parameters such as skewness and kurtosis allow the model to seek the statistical nature of the error data-set. Furthermore, reliable and narrow confidence intervals can be calculated when working from as few as 60 check points. This is because interval length depends basically on the sample standard deviation, skewness and standardised kurtosis, rather than on the number of check points used.
(2) The proposed methodology not only estimates confidence intervals for lidar vertical error, but also computes uncertainty in rmse estimates, which is usually better understood by DEM users and data providers. In this case the model developed by Aguilar et al. (2007a) is recommended when lidar error in non-open terrain is evaluated. Depending on the kurtosis of the error population, between 60 and 160 check points (around 100 on average) could be required. However, lidar accuracy assessment via rmse in open terrain (normal error distribution) would be more successfully achieved by means of the Student’s t approach because it needs fewer check points than the proposed model. In fact, only 20 to 40 check points would be required in this case.

(3) Bearing in mind the variability of the data-sets tested, the results have proved to be very promising, the proposed models reproducing the statistical behaviour of lidar vertical errors using a reasonably low number of check points, even in the case of the more non-normally distributed residual data-sets. This research can therefore be considered as an important step towards improving and understanding the quality control of lidar-derived DEMs.

Acknowledgments

The authors are grateful to the Spanish Government for financing the first author’s fellowship at Newcastle University through the Spanish Research Mobility Programme “Estancias de profesores e investigadores españoles en centros de enseñanza superior e investigación extranjeros”. Thanks are due to the NERC ARSF and Pauline Miller, Newcastle University, for assisting with the data from Filey Bay and also to ISPRS WG III/3 “3D Reconstruction from Airborne Laser Scanner and InSAR Data” for facilitating access to the EuroSDR lidar data.

References

Aguilar, F. J., Agüera, F., Aguilar, M. A. and Carvajal, F., 2005.
Effects of terrain morphology, sampling density, and interpolation methods on grid DEM accuracy. Photogrammetric Engineering & Remote Sensing, 71(7): 805–816.
Aguilar, F. J., Aguilar, M. A., Agüera, F. and Sánchez, J., 2006. The accuracy of grid digital elevation models linearly constructed from scattered sample data. International Journal of Geographical Information Science, 20(2): 169–192.
Aguilar, F. J., Aguilar, M. A. and Agüera, F., 2007a. Accuracy assessment of digital elevation models using a non-parametric approach. International Journal of Geographical Information Science, 21(6): 667–686.
Aguilar, F. J., Agüera, F. and Aguilar, M. A., 2007b. A theoretical approach to modeling the accuracy assessment of digital elevation models. Photogrammetric Engineering & Remote Sensing, 73(12): 1367–1380.
Axelsson, P., 1999. Processing of laser scanner data – algorithms and applications. ISPRS Journal of Photogrammetry and Remote Sensing, 54(2/3): 138–147.
Axelsson, P., 2000. DEM generation from laser scanner data using adaptive TIN models. International Archives of Photogrammetry and Remote Sensing, 33(B4): 110–117.
Clark, M. L., Clark, D. B. and Roberts, D. A., 2004. Small-footprint lidar estimation of sub-canopy elevation and tree height in a tropical rain forest landscape. Remote Sensing of Environment, 91(1): 68–89.
Daniel, C. and Tennant, K., 2001. DEM quality assessment. Digital Elevation Model Technologies and Applications: The DEM Users Manual (Ed. D. F. Maune). American Society for Photogrammetry and Remote Sensing, Bethesda, Maryland. 539 pages: 395–440.
FGDC, 1998. Geospatial positioning accuracy standards. National standards for spatial data accuracy. http://www.fgdc.gov/standards/projects/FGDC-standards-projects/accuracy/part3/chapter3 [Accessed: 23rd August 2007].
Fisher, P. F. and Tate, N. J., 2006. Causes and consequences of error in digital elevation models. Progress in Physical Geography, 30(4): 467–489.
Flood, M., 2004. ASPRS guidelines.
Vertical accuracy reporting for lidar data. http://www.asprs.org/society/divisions/ppd/standards/Lidar%20guidelines.pdf [Accessed: 23rd August 2007].
Godambe, V. P. (Ed.), 1991. Estimating Functions. Oxford University Press, Oxford. 356 pages.
Godambe, V. P. and Thompson, M. E., 1989. An extension of quasi-likelihood estimation. Journal of Statistical Planning and Inference, 22(2): 137–152.
Goodwin, N. R., Coops, N. C. and Culvenor, D. S., 2006. Assessment of forest structure with airborne LiDAR and the effects of platform altitude. Remote Sensing of Environment, 103(2): 140–152.
Göpfert, J. and Heipke, C., 2006. Assessment of LiDAR DTM accuracy in coastal vegetated areas. International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences, 36(3): 79–85.
Hopkinson, C., Chasmer, L. E., Zsigovics, G., Creed, I. F., Sitar, M., Treitz, P. and Maher, R. V., 2004. Errors in LiDAR ground elevation and wetland vegetation height estimates. International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences, 36(8/W2): 108–113.
Hyyppä, H., Yu, X., Hyyppä, J., Kaartinen, H., Kaasalainen, S., Honkavaara, E. and Rönnholm, P., 2005. Factors affecting the quality of DTM generation in forested areas. International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences, 36(3/W19): 85–90.
James, T. D., Murray, T., Barrand, N. E. and Barr, S. L., 2006. Extracting photogrammetric ground control from lidar DEMs for change detection. Photogrammetric Record, 21(116): 312–328.
Kraus, K. and Pfeifer, N., 1998. Determination of terrain models in wooded areas with airborne laser scanner data. ISPRS Journal of Photogrammetry and Remote Sensing, 53(4): 193–203.
Lim, K., Treitz, P., Wulder, M., St-Onge, B. and Flood, M., 2003. LiDAR remote sensing of forest structure. Progress in Physical Geography, 27(1): 88–106.
Miller, P., Mills, J., Edwards, S., Bryan, P., Marsh, S., Hobbs, P. and Mitchell, H., 2007.
A robust surface matching technique for integrated monitoring of coastal geohazards. Marine Geodesy, 30(1/2): 109–123.
Oort, P. A. J. van and Bregt, A. K., 2005. Do users ignore spatial data quality? A decision-theoretic perspective. Risk Analysis, 25(6): 1599–1610.
Rodarmel, C., Theiss, H., Johanesen, T. and Samberg, A., 2006. A review of the ASPRS guidelines for the reporting of horizontal and vertical accuracy in lidar data. International Archives of Photogrammetry, Remote Sensing and Spatial Information Sciences, 36(1): 6 pages (on CD-ROM).
Royston, J. P., 1982. Expected normal order statistics (exact and approximate). Applied Statistics, 31(2): 161–165.
Sithole, G. and Vosselman, G., 2003. Report: ISPRS comparison of filters. http://www.itc.nl/isprswgIII-3/filtertest/Report05082003.pdf [Accessed: 23rd August 2007].
Sithole, G. and Vosselman, G., 2004. Experimental comparison of filter algorithms for bare-earth extraction from airborne laser scanning points clouds. ISPRS Journal of Photogrammetry and Remote Sensing, 59(1–2): 85–101.
Smith, S. L., Holland, D. A. and Longley, P. A., 2005. Quantifying interpolation errors in urban airborne laser scanning models. Geographical Analysis, 37(2): 200–224.
Steel, R. G. D. and Torrie, J. H., 1980. Principles and Procedures of Statistics: A Biometrical Approach. Second edition. McGraw-Hill, New York. 631 pages.
Su, J. and Bork, E., 2006. Influence of vegetation, slope, and lidar sampling angle on DEM accuracy. Photogrammetric Engineering & Remote Sensing, 72(11): 1265–1274.
Torlegård, K., Östman, A. and Lindgren, R., 1986. A comparative test of photogrammetrically sampled digital elevation models. Photogrammetria, 41(1): 1–16.
Weir, M. J. C., 2002.
Monte Carlo simulation of long-term spatial error propagation in forestry databases. Spatial Data Quality (Eds. W. Shi, P. F. Fisher and M. F. Goodchild). Taylor & Francis, London. 336 pages: 294–303.

Appendix

An estimating equation for parameter $\theta$ is an equation $g(X, \theta) = 0$ which has a unique solution $\theta = \theta(X)$ for every value of X, X being a random variable. In this case, X is a set of N independent residuals computed at corresponding check points for evaluating lidar-derived DEM accuracy. The function g is named the estimating function. An optimal estimation of $\theta$ can be achieved by considering the class of estimating functions with zero expectation given by the following linear combination:

$$g = a h_1 + b h_2 \qquad (A1)$$

where $h_1$ and $h_2$ are real functions which would be computed depending on the underlying nature of the statistical problem, and a and b are real functions that can be estimated by the locally optimal solution found by Godambe and Thompson (1989) and shown in equation (A2). For the correct application of this optimal solution, $E(h_1 h_2)$ should be equal to zero (E expressing the mathematical expectation of the operation in parentheses). That is to say, $h_1$ and $h_2$ must be orthogonal. A more detailed discussion on the theory of estimating functions, together with numerous applications, can be found in Godambe (1991).

$$a^* = \frac{E\left(\partial h_1 / \partial \theta\right)}{E(h_1^2)}; \qquad b^* = \frac{E\left(\partial h_2 / \partial \theta\right)}{E(h_2^2)} \qquad (A2)$$

In this case, the parameter $\theta$ would be the expected or mean value $\mu$ of the sample mean $\bar{x}$. The basic estimating functions, supposing the third and fourth central moments are known (skewness and kurtosis, respectively), would be (Godambe and Thompson, 1989)

$$h_1 = \bar{x} - \mu; \qquad h_2 = (\bar{x} - \mu)^2 - \sigma_m^2 - \gamma_{1m}\,\sigma_m\,(\bar{x} - \mu) \qquad (A3)$$

where $\sigma_m$ and $\gamma_{1m}$ are the standard deviation and the skewness of the sampling distribution of the sample mean, respectively. It can be easily proved that $h_1$ and $h_2$ are orthogonal functions, as stated before.
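The orthogonality of the two functions can be checked in a few lines. The following is a sketch of that step rather than a reproduction of the proof in Godambe and Thompson (1989). It writes $\mu$ for the mean, $\bar{x}$ for the sample mean, and $\sigma_m$ and $\gamma_{1m}$ for the standard deviation and skewness of the sampling distribution of $\bar{x}$. It uses only $E(\bar{x}-\mu) = 0$, $E[(\bar{x}-\mu)^2] = \sigma_m^2$ and the definition of skewness, $E[(\bar{x}-\mu)^3] = \gamma_{1m}\sigma_m^3$.

```latex
\begin{aligned}
E(h_1 h_2)
  &= E\!\left[(\bar{x}-\mu)^3\right]
     - \sigma_m^2\,E(\bar{x}-\mu)
     - \gamma_{1m}\,\sigma_m\,E\!\left[(\bar{x}-\mu)^2\right] \\
  &= \gamma_{1m}\,\sigma_m^3 - 0 - \gamma_{1m}\,\sigma_m \cdot \sigma_m^2 \\
  &= 0 .
\end{aligned}
```

The same moments give $E(h_2) = \sigma_m^2 - \sigma_m^2 - 0 = 0$, so both functions have zero expectation, as the class of estimating functions in equation (A1) requires.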
Therefore, the estimates of real functions a and b can be obtained by means of equation (A2):

$$a^* = -\frac{1}{\sigma_m^2}; \qquad b^* = \frac{\gamma_{1m}\,\sigma_m}{\sigma_m^4\left(\gamma_{2m} + 2 - \gamma_{1m}^2\right)}; \qquad \sigma_m = \frac{\sigma}{\sqrt{N}} \qquad (A4)$$

where $\gamma_{2m}$ is the standardised kurtosis of the sampling distribution of the sample mean. Hence the estimating function g for $\mu$ would be estimated by $g^* = a^* h_1 + b^* h_2$. Given that the distribution of the statistic $g^*/\sigma_{g^*}$ can be approximated by a standardised normal distribution, the following expression can be written:

$$P\left(\frac{g^*}{\sigma_{g^*}} \le Z_\alpha\right) = 1 - \alpha \qquad (A5)$$

where $Z_\alpha$ is the one-tailed critical value corresponding to a confidence level of $1 - \alpha$. Hence $Z_\alpha$ = 1·64 for $\alpha$ = 0·05. Equations (2) and (3) in the main text can be obtained from equation (A5) by solving a quadratic equation for the variable $\bar{x} - \mu$. The reader is referred to Godambe and Thompson (1989) for further details.

Résumé

Malgré le coût relativement élevé des modèles numériques des altitudes (MNA) obtenus à partir de lidars aéroportés, ces produits sont généralement fournis sans qu’on les accompagne d’une estimation convenable de leur exactitude. Dans la plupart des cas, l’estimation de l’exactitude des MNA est fournie d’après une comparaison entre les altitudes obtenues au lidar et celles en un nombre limité de points de vérification dont les coordonnées proviennent d’une source indépendante et de précision supérieure; on suppose également que la distribution des erreurs et des écarts d’altitudes suit une loi normale. On propose dans cet article une nouvelle méthodologie pour évaluer l’exactitude altimétrique des MNA issus de lidars en recourant à des intervalles de confiance élaborés à partir d’un échantillonnage limite d’erreurs calculées aux points de vérification.
On a essayé une solution nonparamétrique dans laquelle on ne suppose aucune distribution particulière des erreurs, ce qui permet d’appliquer la méthodologie proposée lorsque les distributions d’erreurs ne sont pas normales, ce qui est généralement le cas des MNA issus de lidars. On a testé la qualité de la modélisation proposée en utilisant une simulation expérimentale de type Monte Carlo sur 18 jeux de données d’erreurs altimétriques. On a établi 15 de ces jeux à partir des données-lidar originales fournies par le Groupe de Travail III/3 de la Société Internationale de Photogrammétrie et de Télédétection, en utilisant comme réalité de terrain leurs données de référence filtrées manuellement. Les trois derniers jeux de données ont été fournis par le système lidar du Natural Environment Research Council et son Airborne Research and Survey Facility, en déterminant les points de vérification par un système GPS cinématique de haute précision. Les résultats se sont montrés prometteurs, les modèles proposés reproduisant bien le comportement statistique des erreurs altimétriques du lidar pour un nombre convenable de points de vérification, y compris dans le cas où les jeux de données n’ont pas une distribution normale des résidus. On peut donc considérer que cette étude constitue une avancée importante pour l’amélioration du contrôle de la qualité des MNA issus de lidars.

Zusammenfassung

Trotz der relativ hohen Kosten für Digitale Höhenmodelle (DEMs) aus flugzeuggestütztem Lidar werden diese Produkte ohne eine zufrieden stellende Genauigkeitsabschätzung geliefert. Meist wird die Genauigkeit des Höhenmodells durch Vergleich einer Auswahl von Kontrollpunkthöhen aus unabhängigen Messungen höherer Genauigkeit mit den Höhen aus den Lidar Daten abgeleitet.
Dazu wird eine Normalverteilung der abgeleiteten Höhendifferenzen, bzw. Höhenfehler angenommen. Dieser Beitrag schlägt eine neue Methode zur Schätzung der Höhengenauigkeit eines Lidar Höhenmodells vor, das Konfidenzintervalle nutzt, die aus einer Stichprobe von Fehlern an Kontrollpunkten abgeleitet werden. Ein nicht-parametrischer Ansatz wurde getestet, bei dem keine besondere Fehlerverteilung angenommen wurde. Damit wird die vorgeschlagene Methode besonders auf die nichtnormalverteilten Fehler anwendbar, wie sie typischerweise in den Höhenmodellen aus Lidar auftreten. Das Verhalten des vorgeschlagenen Modells wurde empirisch mit einer Monte-Carlo Simulation an 18 Höhenfehlerdatensätzen untersucht. 15 dieser Datensätze wurden aus den originalen Lidar Daten der Arbeitsgruppe III/3 der Internationalen Gesellschaft für Photogrammetrie und Fernerkundung gerechnet, wobei die zugehörigen gefilterten Referenzdaten als Sollwerte angenommen wurden. Die restlichen drei Datensätze wurden durch das Lidar System des “Natural Environment Research Council’s Airborne Research and Survey Facility” bereitgestellt zusammen mit Kontrollpunkten, die mit hochgenauem kinematischen GPS bestimmt worden sind. Die Ergebnisse waren viel versprechend. Die vorgeschlagene Methode reproduzierte das statistische Verhalten der Höhenfehler der Lidar Daten mit einer günstigen Anzahl von Kontrollpunkten, sogar in den Datensätzen mit nichtnormalverteilten Verbesserungen. Diese Forschungsergebnisse können deshalb als wichtiger Schritt zur Verbesserung der Qualitätskontrolle von Höhenmodellen aus Lidar Daten betrachtet werden.

Resumen

Pese al relativamente elevado coste de los Modelos Digitales de Elevación (MDE) obtenidos mediante lídar aerotransportado, estos productos se proporcionan habitualmente sin una adecuada estimación de su precisión.
En la mayoría de los casos la estimación de la exactitud de un MDE se hace comparando las elevaciones lídar con una muestra finita de puntos de comprobación procedentes de una fuente independiente de referencia de mayor exactitud. En estos casos se suele asumir que la distribución estadística de las diferencias de altura o error vertical sigue una distribución normal. Este artículo propone una nueva metodología para estimar la exactitud vertical de un MDE lídar mediante intervalos de confianza construidos a partir de una muestra finita de errores calculados en los puntos de comprobación. En este sentido, se ha adoptado una perspectiva no paramétrica, sin asumir ninguna distribución específica del error, de modo que la metodología propuesta es particularmente aplicable en el caso de distribuciones no normales como las que suelen encontrarse en MDE lídar. La bondad del modelo propuesto fue validada experimentalmente en 18 conjuntos de datos lídar mediante una simulación numérica basada en el método de Monte Carlo. 15 de las 18 poblaciones de errores se obtuvieron a partir de datos lídar brutos facilitados por el Grupo de Trabajo III/3 de International Society for Photogrammetry and Remote Sensing, usando sus respectivos datos de referencia, filtrados manualmente, como forma de validación. Las tres poblaciones de errores restantes fueron proporcionadas por el sistema lídar del Airborne Research and Survey Facility del Natural Environment Research Council, junto con puntos de comprobación adquiridos mediante GPS cinemático de alta precisión.
Los resultados obtenidos resultaron ser muy prometedores, observándose cómo los modelos propuestos reproducían acertadamente el comportamiento estadístico de los errores lídar verticales a partir de una muestra relativamente pequeña de puntos de comprobación, incluso en casos en que los datos tenían residuos con una distribución no normal. Por tanto, esta investigación puede considerarse como un avance muy importante en la mejora del proceso de control de calidad de los MDE lídar.