Habitat Modelling for Conservation Analysis Using GIS and Remote Sensing in Lantau Island, Hong Kong

Knowledge of spatial distribution of faunal species is crucial to prioritize conservation resources. While vulnerable or endangered species are usually well-protected, common species were always overlooked. Traditional survey represented species distribution by discrete points, which is precise but often limited in applicability for conservation analysis. This study combined remote sensing; geographical information analytical tools and well-established multivariate statistical modelling to predict habitats of fifty common and representative animal species. Reliable ground data records from terrestrial biodiversity survey were collected and ecological niche factor analysis was used to identify pseudo-absence sites essential for habitat statistical models. Binary logistic regression models and generalized additive models were used to predict the faunal habitats. Species-richness map was then produced to identify biodiversity hotspots and possible conservation gaps of the current conservation system. Results from gap analysis showed that only 1% of faunal species richness hotspots coincided with existing protected sites of special scientific interest (SSSI). 45% of species-rich sites were under-protected. The results not only identified deficiency of existing protection system, where extra conservation planning efforts should be considered, but also highlighted the direction for future development to minimize the land use conflicts. Integration of geoinformatics and statistical analysis with ecological knowledge assist timely conservation policy making.


Introduction
The spatial location of ecologically sensitive areas is fundamental for wildlife conservation. Traditionally, the distribution of species is represented by either point or ranged maps. However, these maps are criticized for not being representative enough [1] and provide inadequate information on the distribution of species [2]. The deficiency of both information and knowledge in species distribution and abundance is likely to hinder the reliable location of ecological sensitive sites [3], which in turn can limit the effectiveness of park and reserve planning process.
Remote sensing, GIS and habitat model Species and habitat types are non-randomly distributed. The basic needs of different species have implications on their distribution as they reflect essential environmental characteristics that related to their life support system including food, water, nesting sites, shelter, evasion of potential enemies, etc. [3]. Austin [4] and Guisan and Zimmermann [5] classified the ecological factors into resource, direct, and indirect gradients. Among various types of factors, indirect gradients are usually used as environmental predictors for species distribution modelling primarily because they can be effectively derived from remote sensing imagery and extracted geographical information database. Remote sensing identifies and retrieves land cover characteristics, particularly vegetation types at broad spatial extents [6]. In early days, vegetation map derived from aerial photos or satellite images were used as the only predictor to map species habitats. While Geographic Information System stores precise locations of land attributes related to topography, hydrology, man-made structures etc., these spatial data can be further manipulated and analyzed to generate other relevant environmental variables through various analytical tools. The observed species distribution always shows significant correlation with these environmental variables. Suitable habitat of different species can then be modeled and localized by establishing relationship between field records and environmental gradients through modelling process [3].
Over past few decades, multivariate statistical algorithms were used [7][8][9] to model the distribution of species by identifying suitable habitat of concerned species [10]. Regression models have been extensively applied to species distribution prediction [7,9,11,12]. Traditionally, multiple regression and multiple discriminant analysis in which data are assumed to have normal errors were used to predict species abundance. However, assumption of Gaussian distribution error sometimes limits their applications [13,14]. New modelling paradigms, particularly, generalized linear model (GLM) and generalized additive model (GAM) were developed and intensively applied to habitats modelling of fauna [7,8,10,11,15,16]. The majority of statistical models require both presence and absence of dataset. Hirzel [17] argued even though the absence data was available, their accuracy was in doubt. The Ecological Niche Factor Analysis (ENFA) which requires presence-only data for modelling species distribution was developed. Not only would the models provide fundamental understanding of the habitat requirements of species; it can also extend to sites where no field surveys were conducted due to accessibility or manpower constraints. More crucially, if the potential habitats of a number of species overlapped, conservative measures should be considered in these sites.

Biodiversity and conservation
Parks and reserves are commonly used to conserve ecologically valuable and sensitive areas [18] and vulnerable or endangered species. While habitats of common species are usually overlooked, their habitats may coincide and perform high ecological functions. Due to the lack of reliable data to identify these habitats, these sites tend to be under-protected and may become vulnerable for development and eventually lead to the decline in both plant and animal diversity [18] Closely linked with habitat modelling is the identification of conservation sites of high level of biodiversity. The aggregation of species distribution helps identify ecologically important sites. A biodiversity hotspot is "a site that supports a significant proportion of the species in a particular taxon or group" [19]. As one of the measurements of ecological importance, it is the composition of species richness, endemism and rarity. Species richness is commonly used as indicator of biodiversity because of its simplicity, feasibility to measure and data availability [20]. Although the discrepancy between the definition of biodiversity and its actual representation is still under many debates [21][22][23], biodiversity hotspots is always used as a guide to identify areas of conservation priority [24]. Gap analysis is one of the systematic conservation planning methods. It identifies geographical gaps and prioritized areas for conservation by comparing potential locations/ coverage of plant and animal habitats (species-rich areas) to those that are currently protected [1,25,26]. The method provided useful and objective information to balance conservation and development and to minimize land use conflict.
This study attempted to identify biodiversity hotspots and possible conservation gaps based on predicted distribution of common species in Lantau Island, Hong Kong Special Administrative Region (HKSAR). With the aid of remote sensing and GIS analytical techniques, the following objectives are to going to be achieved: to identify and map the habitats of fifty common species based on field survey data; to evaluate and compare the accuracy of two regression models, i.e. binary logistic regression model (BLRM) and generalized additive model (GAM) used for habitat prediction; and to identify and locate biodiversity hotspots and possible conservation gaps; and to evaluate the proposed conservation and development plan in Lantau Island.

Materials and Methods
Study site Lantau Island is located at the south-western part of Hong Kong (22°16′14″N,113°57′10″E) as shown in Figure 1. It is the biggest island in Hong Kong with an area of approximately 142km 2 and over half of the land areas (78.4km 2 ) are designated as Country Parks. Lantau Island nurtures over 120 butterfly species and 63 dragonfly species which represent 50% and 60% of total butterfly and dragonfly species respectively in Hong Kong. Rare plants, birds, mammals, reptiles and amphibians are abundant with some are native and protected by laws [27]. Existing statutory country park system and SSSI of Lantau Island are captured in Figure 1. Ecological significance of Lantau is affirmed by the establishment of eight Sites of Special Scientific Interests (SSSIs), where fauna and flora with special scientific value are found. Due to surging land demand, Lantau was under the struggle between development and conservation in 2004 [28]. On one hand, the extensions of boundary of the Lantau North Country Park (LNCP) and of waters around Southwest as Marine Park were proposed [28]. On the other hand, the Island was further considered to be developed into a tourism spot supported by recreational facilities. The influx of Disneyland in 2005 was an example of mass tourism development. Although comments from public consultation were incorporated in concept plan revision in 2007, no objective and scientific evaluation was made and rendered the plan too general for further exploration and discussion until now.

Species Data
A spatial database was set up encompassing ecological surveys, remote sensing images, and GIS data from digital topographic maps.
Species data was collected from GIS database of the Biodiversity Survey of Hong Kong (version 3) published in 2002. The survey records were compiled from systematic in situ field survey of terrestrial and freshwater biodiversity, published sources and personal records [29] by the Department of Ecology and Biodiversity, the University of Hong Kong. It contains more than 95,000 point and binary locational records of more than 5000 species (from 20 taxonomic groups) and was geo-referenced to local Hong Kong 1980 Grid coordinate system. The recording resolution varies from 10 meters to 1 kilometer, but with the majority sampled at 100 meters. Details about the survey can be found in http://www.biosch.hku.hk/ecology/bs/.
Fifty species from five taxonomic groups namely, amphibians, birds, butterflies, dragonflies and mammals were extracted from the survey shown in Table S1 in Supplementary Material. Except all the birds and H. brachyura (mammal), no species has statutory protection in Hong Kong, which means they are common to Hong Kong.
M. migrans (aves) has been listed in the protection of endangered species ordinance (Cap. 187) and H. brachyura is listed as 'vulnerable' under the IUCN Red List Status. The number of survey points acquired from the survey is shown in Table 1. Effective number of sample points after excluding data points that fell outside the terrestrial boundary is bracketed.

Environmental gradients
Environmental gradients were divided into four categories including: (1) types of vegetation; (2) proximity to water resource; (3) influence of topographic feature and (4) adaptability to human disturbance. Figure 2 summarizes the environmental gradients and the tools/ software used to compute them.  The image was ortho-rectified to same local coordinates with the overall root-mean-square-error of 0.02 pixels (0.2 meter). Supervised classification with maximum likelihood classifier was applied to classify the image into eleven land cover classes including sea water, fresh water, fish pond, mangrove, woodland, mixed shrubland, shrubby grassland, grassland, hill-fire site, bare soil and urban area. The classified map is shown in Figure 3. Classification accuracy was computed from 500 stratified random sample points (413 points fell within the land boundary). The overall classification accuracy is 72% (297/413) which is considered to be reasonable for further analysis.
Resource, landscape and human-related factors were computed based on the digital topographic map layers from the Lands Department, The Government of Hong Kong Special Administrative Region. Apart from vegetation, rivers and streams are vital resources and the data was extracted from hydrographic polygons. Landscape factors including slope, aspect, land curvature affects the potential solar radiation received of a region. Spatial layer features including spot height, contour, and coastline were used to compute these factors. Man-made structures affect integrity and cause habitat fragmentation which were represented by road and railway networks and building structures in natural areas. Various spatial operations were conducted to compute the environmental gradients using the SPOT-5 derived vegetation maps and topographic data in IDRISI v.14.02, Biomapper v.3.1 and GRASS v.6.0. Definition, analysis and statistical description of each predictor are documented in Table S2 in Support Information.

Statistical Modelling
Habitats of individual species were predicted and mapped by forming relationship between point locations where species were recorded and environmental gradients using statistical model. Species-based modelling consists of three processes -data collection and exploration, model selection and model assessment. The dataset comprised the species occurrence points and the environmental predictors as described above. Prior to modelling, ranges of occurrence and correlation among the predictors were examined. Predictors with correlation coefficient less than or equal to 0.5 were retained.
Three multivariate statistical modelling methods including (1) Ecological Niche Factor Analysis (ENFA), (2) Binary Logistic Regression Model (BLRM) under the family of Generalized Linear Model (GLM) and (3) Generalized Additive Model (GAM) were used to predict the probability of occurrence of each of the fifty species. The modelling methods are summarized in Figure 4 and described below.

152
Habitat Modelling for Conservation Analysis using GIS and Remote Sensing in Lantau Island, Hong Kong ENFA performed two roles. First, it provided basic description and understanding of the general habitats of the species based on global statistics. Second, it was used to generate pseudo-absences data which are essential for regression analyses. The basic idea of ENFA is to compare environmental variable or ecogeographical variable (EGV) space of those areas where species are present with the characteristic of the whole study area. Suitability functions were used to describe habitat suitability of individual species [17]. ENFA was conducted in BIOMAPPER v.3.1. The predictors were firstly normalized by the Box-Cox function. Factor analysis similar to principal component analysis (PCA) was then used to transform the predictors into uncorrelated factors explaining the same amount of variance (information) based on the variance-covariance matrix. However, unlike PCA, the transformed factors convey ecological meanings. The first transformed factor which explains the maximum amount of variance is termed marginality the subsequent factors are named as specializations [17]. The marginality and selected specialization factors were retained with the amount of variances not less than 70%. Habitat suitability index was then computed with the distance geometric mean algorithm based on the retained factors. Three global statistics, namely, Marginality (M), Specialization (S) and Tolerance (T), describing the general habitat requirements when compared with the whole study area were computed.
Quality pseudo-absences were generated based on the habitat suitability results from ENFA. Random points were generated in areas where suitability level is less than 10. The number of pseudo-absences was the same as the recorded presences and the prevalence equals 0.5. Mann-Whitney U independent-samples test was used to check the significance of difference between the presence and pseudo-absence dataset using each predictor. The pseudo-absences were then incorporated in BLRM and GAM modelling. BLRM BLRM, a category of GLM, is one of the most popular habitat modelling methods extremely suitable when the dependent variable is dichotomous or binary characterized in many field surveys [7]. The link function of logistic regression is logit transformation and the error structure is assumed to be binomial [30]. With the logit link function, the results are more closely related to ecological and biological sense [7].
The likelihood of occurrence of a species is expressed in the equation below: is the likelihood of occurrence of species as a function of environmental predictors x. The likelihood of occurrence ranges between 0 and 1. Model building was conducted in SPSS with backward stepwise method to select relevant predictors. GAM GAM is non-parametric extension of GLM and distinguished from the GLM family by allowing the predictors to be non-linear in nature [31,32]. When linking with the logistic term, the general form of the logit(p) is given as Where s i (x i ), i = 1, …, k are smooth functions. The transformation of logit(p) into the probability of occurrence ranged between 0 and 1 was computed using (2).
Model building of GAM was implemented in Generalized Regression Analysis and Spatial Prediction (GRASP) v.3.0 in S-Plus v.6.2. As the prediction dataset was more than 250,000 observations, the prediction was required to be performed in ArcView [33].
Species models were evaluated for their reliability (goodness-of-fit) and their discriminatory ability [34] using the same dataset due to the deficiency of sample points for independent evaluation. Reliability was tested by the percentage of deviance reduction (R) while discrimination performance was described by leave-one-out cross validation (LOOCV) and Area Under ROC Curve (AUC). The logit denoted g(x) of the best model was selected to compute habitat map using the image calculator in IDRISI.

Species-richness and gap Analysis
Species richness was used as a surrogate to identify biodiversity spots. It was computed by taking the average probability of occurrence of the fifty species from selected models using the equation below: where π i is the likelihood of occurrence of a species defined in (2). The values ranged between zero (absences of species) to one (maximum species abundance). Biodiversity hotspots were defined using three cut-off thresholds, i.e. 0.5, 0.7 and 0.9. Areas with 0.9 or above are sites with extreme high species richness while 0.5 comprised sites of both extreme high and moderate richness. Hotspots defined using different thresholds were compared with protection system in 2004 including the Country Parks and eight SSSI through simple overlay analysis to identify and quantify possible conservation gaps in Lantau. The proposed extension of Country Park was evaluated.

Geographical distribution of fifty common species
Results from statistical habitat models provided fundamental understanding of the general living conditions of various common species. Detail statistical results of individual species were presented in Table S3 -S5 in Support Information. Table S3 shows the summary statistics extracted from ENFA. Marginality (M) and Tolerance (T) computed from ENFA revealed the range of habitats. On the whole, the modelled species tend to inhabit in environment which is moderately different from the mean conditions in Hong Kong (mean M = 0.709±0.226) and adapt to wide range of environment (mean T = 0.630±0.172).
Environmental factors selected by regression models and their corresponding relationship and magnitude are summarized in Table S4 and S5 for BLRM and GAM respectively in Support Information. On average, the number of variables retained by AIC of GAM is higher than that selected by the stepwise BLRM for most species. Nonetheless, similar predictors were retained by the two regression models. The habitat characteristics were interpreted based on the summary statistics of three models in taxonomic group below.
Amphibian -ENFA showed that P. megacephalus is the most adaptive species reflected by the low M = 0.35. Most species tend to live in habitats which are moderately different from the mean conditions in the territory of Hong Kong (mean M = 0.771±0.451). Documentation of amphibian distribution in Hong Kong also suggested that P. megacephalus, B. melanostictus, R. guentheri and R. limnocharis are all widespread in various habitats including forest, shrublands, low-lying grasslands, freshwater marshes, wet agricultural fields, upland mountain streams, catchwaters and fish ponds at all altitudes [35]. R. livida (M = 1.634) and R. exilispinosa (M = 1.331) require an extremely special living environment. For instance, R. exilispinosa is found in and near mountain streams [35]. The tolerance of most species is quite high with the maximum 0.87 (P. megacephalus) and mean equal to 0.713±0.177. R. livida is the least tolerant (T = 0.302). Together with the high M value, R. livida is likely to be restricted to a specific range of living conditions which are quite different from the background condition in Hong Kong. The regression models suggested that the amphibians inhabit in densely-vegetated areas such as woodland and shrubland and stay away from grassland. Moreover, major rivers are more favorable than small streams which reveal that most amphibian species inhabit closely to constant water sources.
Bird -Thirteen bird species tend to tolerate and adapt to a wide variety of habitats from rural to urban land uses. The summary statistics from ENFA showed that S. chinensis is the most adaptive species reflected by the lowest M = 0.473 in the group. Most species tend to inhabit in environment moderately different from the mean conditions of the territory (mean M = 0.610±0.095). P. sinensis has the highest tolerance level (0.809), which suggests that the species tends to live in a general range of living conditions. E. garzetta has the lowest tolerance (T = 0.16), which reveals that the species is restricted to a very specific range of living condition, in this case, the wetland habitats. The retained variables by the regression models suggested the bird species generally require densely-vegetated woodland and away from grassland. GAM further implies that solar radiation (AUTUMN), and water resources (RIVER) are positively related to their distribution. Eight out of thirteen species are common urban breeding birds; they adapt well to urban settings and human disturbance (BLDG).
Butterfly -ENFA revealed most butterfly species inhabit in environment moderately different from the mean environmental conditions in Hong Kong (mean M = 0.652±0.060). Indicated by mean global tolerance (T) of 0.677±0.180, the range of living condition is quite wide. A. echerius echerius has the highest tolerance level (T = 0.811), suggesting the species tends to live in general range of conditions. Comparatively, I. pyrene pyrene has the lowest tolerance ability (T = 0.24) and is restricted to a limited range of habitat condition. Nonetheless, local documentations suggested that I. pyrene pyrene is a common species which adapts to urban disturbance [36,37]. BLRM reveals that most species inhabit in densely-vegetated lands including woodland and mixed shrubland, and seldom found in open grassland and bare land. M. leda leda, H. glaucippe glaucippe, I. pyrene pyrene and P. memnon agenor are representative examples. Moreover, Species including C. nerissa nerissa, P. canidia canidia, P. memnon agenor, Z. maha serica and J. almana almanac adapt to proximity to human habitation such as urban parks and agricultural fields. Variables including water resources (RIVER), altitude (ELEV), distance from main roads (MAIN ROAD) and buildings (BLDG) all showed positive impact on the distribution of butterfly.
Dragonfly -Among thirteen species, R. autumnalis, O. sabina sabina and C. marginipes are widespread in Hong Kong. ENFA results also show that most species tends to inhabit in environmental condition relatively different from the mean conditions in Hong Kong (mean M = 0.709±0.103). The living condition of R. perforata perforata is quite unique with M = 0.918. Local documentations suggested that R. perforate perforata are found in fast flowing streams and rivers with strong current. The mean global tolerance (T) of 0.656±0.079 indicated the range of living condition is quite wide. P. flavescens has the highest tolerance level (T = 0.786), which suggests that the species tends to live in general range of living conditions while A. immaculifrons has the lowest tolerance (T = 0.491). Both BLRM and GAM revealed that proximity to water supply (RIVER) affects the distribution of dragonfly as water plays an important role in the life-cycle of dragonfly. Apart from water, they tend to inhabit in medium to highly vegetated lands (WOOD, MSHRUB, SHGRASS).
Mammal -With diversified food sources, the range of habitats of mammals is much wider that a distinctive or specific living habitat cannot be easily identified. All four species have marginality over one and tend to live in environment which is different from the mean conditions in Hong Kong (mean M = 1.067±0.108). The range of living condition is quite restrictive as explained by low mean global tolerance (T), 0.440±0.095. R. sikkimensis has the highest tolerance level (T = 0.545) while the tolerance of S. scrofa is the lowest (T = 0.358). Local documents also suggest that the two rats, N. Fulvescens and R. sikkimensis tend to inhabit everywhere from grassland to woodland [19]. BLRM suggests that they tend to inhabit in densely-wooded areas while GAM suggests all kinds of vegetative covers are suitable habitats. The regression models also revealed that water resources (RIVER), altitude (ELEV) and proximity to coastal locations (COAST) affects the distribution of the mammal species.
Tables 1(a -e) shows the model assessment statistics of individual species while Table 2 summarizes the accuracy statistics in taxonomic groups. The deviance reduction (R) of GAM is generally higher than that of BLRM. The mean deviance reductions are 87%±11% and 61%±14% for GAM and BLRM respectively. The maximum R for GAM and BLRM are 99% and 80% respectively. Discriminatory ability was evaluated by the LOOCV and AUC statistics. All species, except S. scrofa, show high classification accuracy in GAM in LOOCV. The mean accuracies from LOOCV are 95.9±7.4 and 89.0±5.0 for GAM and BLRM respectively. The AUC shows similar results. With the exception of two species -I. pyrene pyrene and S. scrofa, all species modeled with GAM show higher AUC than those modeled with BLRM. The mean AUC are 0.988±0.022 and 0.946±0.035 for GAM and BLRM respectively, which suggested excellent discrimination. As all three statistics suggested GAM performs better in model reliability and discriminatory ability, GAM models were used for computation of taxonomic-grouped richness maps and overall species richness map.  Species richness map derived from the average probability of occurrence of the fifty common species is shown in Figure 5. Areas of extremely high species richness are shown in red. The boundaries of village settlements, SSSI and Country Parks were overlaid for comparison. Species-rich hotspots were found in the southern Lantau, areas around Tung Chung, Tai Ho and Nim Shue Wan in northern Lantau. These sites have the following characteristics. First, they are in close proximity to the hydrological networks. Second, they are all below 300 meters (above mean sea level) The uplands in Lantau Peak, Sunset Peak, Yi Tung Shan, Lin Fa Shan, Nei Lak Shan are in blue implying species richness is low. Third, they are mostly located at foothills and along the valleys in-between the mountainous and undulating landscapes. Finally, some of them are located in vicinity of village settlements.

156
Habitat Modelling for Conservation Analysis using GIS and Remote Sensing in Lantau Island, Hong Kong Spatial analytical results showing the concordance between species richness and existing protection system is shown in Figure 6 and Figure 7. Species-rich sites protected by Country Parks (prior to 2004) and SSSI are represented in green and yellow respectively. Conservation gaps where high species richness sites fell outside SSSI and country park area are highlighted in red.
Species-rich areas of 0.9 and above indicated the most significant biodiversity hotspots in Lantau. As shown in Figure 6a, they located primarily near the lower bound of the Lantau South Country Park (LSCP). Corresponding, Figure  7a shows that 48% of total species-richness sites fell within the LSCP; 6% of the total was found to be coincided with the LNCP and less than one percent of the total was within the SSSI boundaries. The rest (45%) of extremely species-rich sites are unprotected by current conservation system, i.e. conservation gaps.
The total species-rich areas increased after the relaxation of threshold to 0.7 ( Figure 6b) and 0.5 ( Figure 6c) and they identified sites with high to moderate species richness. Spatial concordance between the species-rich areas and LNCP increases from 6% (0.9) to 8% (0.7) and 11% (0.5). However, species-rich areas spatially coincided with LSCP shrank to 47% (0.7) and 43% (0.5). The concordance with SSSI increased proportionally, but the percentage to total areas stayed around 1%. In fact, species-rich areas tended to fall outside Country Park boundaries and SSSI; and the conservation gap remained steady was around 44 to 45% as shown in Figure 7.

Discussion
Species-based habitat modelling GAM is superior to BLRM in prediction accuracy and reliability, however, relationships with most environmental variables are nonlinear in GAM (Table S5). Results from ENFA and BLRM are complementarity to understand the general and specific requirement of the species.
Generally, the chosen species were more responsive to factor related to vegetation type and water resource than to other factors. Although BLRM and GAM selected different sets of environmental variables, they identified common predictors for individual species and taxonomic groups. Almost all species favors densely vegetative habitats and absent in grassland and bare soil. As suggested by local literature, bird species like P. major, C. macrorhynchus, C. sinensis and S. chinensis are commonly found in fung shui woodlands while P. jocosus, C. macrorhynchus and L. schach are usually absent from closed woodlands and shrublands [38,39]. Butterfly species such as C. nerissa nerissa, C. clytia clytia, P. polytes polytes, P. paris paris prefer woodland and shrubland edges. The models reflected habitats requirement of various species.
Nevertheless, a few species such as E. garzetta, A. phoenicurus and M. caeruleus have long been observed in and known to well-adapt to urban environment in Hong Kong was not totally reflected by the models because field survey was mainly conducted in rural or non-urban areas. Besides, proximity to village houses was the only factor reflecting urban influence in the model.

Conservation analysis
Locations of vulnerable or endangered species are well-recognized with extensive protective measures exclude them from development consideration. For instance, SSSI is designated based on its uniqueness in terms of faunal, floral, ecological or geographical features. The emphasis of SSSI lies mainly on rare plant and animal species while the species richness map was computed based on common species. Hence, it is not unusual that species richness and SSSI did not exhibit a strong spatial relationship (shown in Figure 5) as special requirements of the rare species were not taken into account. The result was similar to other studies which also showed low correlation between species diversity and rare species [40]. In other words, the only dependency on existing protective measures cannot satisfy conservation need.
Results from gap analysis revealed conservation deficiency of current protection system and provides insights for further conservation efforts. By comparing different scenarios in Figs. 6, sites of extreme species richness (0.9) were located mainly in southern Lantau while sites of high (0.7) and moderate (0.5) were found in both LNCP and LSCP. However, the concordance of species richness with the LSCP was more significant than that with LNCP as shown in Figure 7. Although the coverage of the LSCP is extensive, extreme species-rich areas are still under-protected in southern island (shaded in red) in Figure  6a. Conservation gaps expanded to the entire southern Lantau and to an extensive part of the northern island when high to moderate level of species richness were considered in Figs. 6a and b.
The results revealed that 45% of species-rich hotspots tended to fall outside these protective borders. Such non-concordance suggested species richness provided another dimension for conservation planning and development control. In fact, no comprehensive guidelines are currently developed for the designation of SSSI or conservation areas because it is technically difficult if not infeasible in acquiring reliable data to formulate protection plans.
During the study period, an extension of the LNCP boundary ( Figure 1) with the primary objective to protect unspoiled stream courses and natural woodland of high conservation and landscape value has been proposed. The results also indicated that extra conservation effort was necessary in Lantau as a high percentage of areas were unprotected. In Figure 6c, the proposed extension was overlaid on top of the results. Red areas suggest extra conservation efforts are necessary due to relatively high species richness. However, less than half of these areas overlapped with the proposed extension and they were mainly found in the north near Sha Lo Wan and Tai Ho while the extension in north-eastern Lantau showed little coherence with species richness. This may be primarily due to the hill fire occurred at time when the study was conducted. Nevertheless, it is believed that priority of conservation should be given to southern Lantau where species richness is the highest and conservation gap is most significant.

Conclusions
Traditional species survey represented species distribution by discrete points. By linking indirect environmental gradients derived from timely remote sensing and GIS database and representative ground-truth survey data using multivariate statistical models, habitats of fifty common species were represented in spatially-continuous map. Three multivariate statistical models, ENFA, BLRM, and GAM were used to understand and model the living habitats of individual species. ENFA transformed the survey data into meaningful information to generally explore the habitats of species based on marginality and tolerance. Besides, results from ENFA provided reliable pseudo-absence data that were generated for BLRM and GAM regression analyses. Environmental gradients selected by stepwise BLRM and GAM further enhanced the understanding of species habitats. By comparing the two models, GAM performed better than BLRM in both reliability and discriminability. Local documentation suggested that habitat prediction showed both concordances and discrepancies.
Biodiversity hotspots were identified based on computation of species richness using the prediction results of GAM. Using various thresholds for species richness definition, the extremely species-rich sites were found in southern Lantau. High to moderate species-rich sites extended towards northern Lantau. Comparison of species richness and current protection boundaries revealed that about 45% of land areas were under-protected. When this study was in progress, the government has proposed an extension of conservation areas and also tourism development in Lantau. The results proved to be an important justification for evaluating the validity of the proposed extension. The species richness map acts as crucial information for conservation zoning.
All in all, Hong Kong is always facing the scarcity of land resources. Continuous demand for housing and economic expansion exerted tremendous pressures on unspoiled land for development. Balancing the objectives of development and conservation is always the prime concern of government and local citizens. Species richness can be one of the ecological indicators for partitioning the areas into different levels of ecological importance, based on which the conflicts between development and conservation can be effectively minimized. The approach enables timely, efficient and effective measurement of spatial distributions of different species, which is a valuable tool for conservation management and policy making. The results acted as a scientific, quantitative foundation and provided objective guidelines for planners, government, developers and other conservation parties to identify sensitive areas and evaluate existing protection system.

Supplementary Materials
Additional Supplementary Material may be found in the online version of this article:   Ф The number of factors chosen according to the broken-stick advice. * The amount of variance explained by both the marginality and specialization factors. # The amount of variance explained by the specialization factors only. Ω Global Marginality -A summary statistic describing the difference between the species optimal living condition and the environmental conditions within the study area. Value range from 0 to 1. Low value (approaching 0) suggests that a species tends to live in average conditions throughout the study area while high value (approaching 1) shows that a species is likely to live in extreme environment. ∂ Global Tolerance -A summary statistic describing the range of living condition of a species within the study area. Value range from 0 to 1. Low value (approaching 0) depicts a species can only survive in specific range of condition while high value (approaching 1) indicates a general living condition is acceptable for a species. ^ Global Specialization -A summary statistic which is the reciprocal of tolerance. Value range from 0 to infinity.  Table S5a. Selected predictors by AIC for generalized additive models of amphibian species Table S5d. Selected predictors by AIC for generalized additive models of dragonfly species