(LUR) Models

January 26, 2018 | Author: Anonymous | Category: Math, Statistics And Probability, Statistics

Evaluating the Uncertainty of Land-Use Regression Models

Halûk Özkaynak
US EPA, Office of Research and Development, National Exposure Research Laboratory, RTP, NC

Presented at the CMAS Special Symposium on Air Quality, October 13, 2010

Land-Use Regression (LUR) Models
• Widely used methodology for estimating individual exposure to ambient air pollution in epidemiologic studies

[Figure: point sources, area sources, and line sources]

LUR Strengths
• Able to capture smaller-scale variability in community health studies
• Less resource intensive
– Easier to develop and apply compared with other methods for measuring or estimating subject-specific values (e.g., household measurements, physical modelling)
• Land-use data widely available

LUR Limitations
• Inputs
– Require accurate monitoring data at a large number of sites, e.g., in highly industrialized urban areas with many types of emission sources
• Application in health studies
– Not transferable from one urban area to another
– Do not address multi-pollutant aspects of air pollution
– Lack the fine-scale temporal resolution needed for estimating short-term exposure to air pollution
– Often estimate only ambient air pollution, not indoor or personal exposure
• Lack the ability to connect specific sources of emissions to concentrations for developing pollution mitigation strategies

Analysis Goals: New Haven Case Study*
• Use air pollution concentrations predicted by coupled regional (CMAQ) and local (AERMOD) scale air quality models
• Develop and evaluate land-use regression models for:
– Benzene
– Nitrogen oxides (NOx)
– Particulate matter (PM2.5)
• Examine (in future work) the implications of alternative LUR development strategies for model efficacy across multiple pollutants

*Source: Johnson, M., Isakov, V., Touma, J.S., Mukerjee, S., and Özkaynak, H. (2010). Evaluation of Land Use Regression Models Used to Predict Air Quality Concentrations in an Urban Area. Atmospheric Environment, 44, 3660-3668.

Air Pollution Data
• Air pollution concentrations were predicted at 318 census block group sites in New Haven, Connecticut, using a coupled air quality model (Isakov et al., 2009)
• Predicted daily concentrations for 2-month periods in winter and summer (2001) were used to calculate seasonal average concentrations of benzene, NOx, and PM2.5 at each site
– July-August for summer
– January-February for winter
• Annual averages were based on 365 daily means for 2001

Isakov et al. 2009. Journal of the Air and Waste Management Association; 59(4):461-472.
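The seasonal and annual averages described above are plain arithmetic over daily values. Here is a minimal Python sketch. It assumes one site's predictions are stored in a dictionary keyed by date, which is a hypothetical input format, and it uses synthetic values rather than study data. The name `site_averages` is also illustrative.

```python
from datetime import date, timedelta
from statistics import mean

# Season definitions from the slide: summer = July-August, winter = January-February.
SEASON_MONTHS = {"summer": (7, 8), "winter": (1, 2)}

def site_averages(daily):
    """Seasonal and annual mean concentrations for one site.

    daily: dict mapping datetime.date -> predicted daily-mean concentration
    (hypothetical input format). The annual mean uses every day supplied,
    i.e. all 365 days when a full year of 2001 predictions is given.
    """
    averages = {
        season: mean(v for d, v in daily.items() if d.month in months)
        for season, months in SEASON_MONTHS.items()
    }
    averages["annual"] = mean(daily.values())
    return averages

# Synthetic year: 2.0 on every day in January-February, 1.0 on all other days of 2001.
start = date(2001, 1, 1)
days = [start + timedelta(days=i) for i in range(365)]
daily = {d: (2.0 if d.month in (1, 2) else 1.0) for d in days}

averages = site_averages(daily)
print(averages)
```

With these synthetic values, the winter mean is 2.0 and the summer mean is 1.0. The annual mean is (59 × 2.0 + 306 × 1.0) / 365 ≈ 1.16, because 2001 has 59 days in January-February.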

LUR Model Structure and Inputs
Dependent variable: pollutant concentrations (benzene, NOx, and PM2.5) predicted by coupled regional and local scale air quality models
Independent (predictor) variables:

Pollutant concentration = Traffic intensity and proximity to roadways + Proximity to ports and harbors + Population and housing density + Proximity to industrial sources

• Traffic intensity and proximity to roadways
– Traffic intensity near the home (vpd/km2)
– Proximity (1/distance) to major roadways
• Proximity to ports and harbors
– Proximity (1/distance) to seaports
– Proximity (1/distance) to harbors
• Population and housing density
– Population density in census block group
– Housing density in census block group
• Proximity to industrial sources
– Proximity to industrial emitters of benzene, NOx, and PM2.5
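The additive structure above is an ordinary multivariate linear regression. The following is a minimal NumPy sketch under three assumptions. First, sources are represented as points; real roadways would need point-to-line distances. Second, coordinates and responses are synthetic, and the response is noise-free. Third, the function names `inverse_distance` and `fit_lur` and the `min_dist` floor are hypothetical, not taken from the study.

```python
import numpy as np

def inverse_distance(site_xy, source_xy, min_dist=1.0):
    """Proximity predictor: 1/distance from each site to its nearest source point.

    min_dist (hypothetical) caps the predictor for sites lying on a source.
    """
    diffs = site_xy[:, None, :] - source_xy[None, :, :]   # (n_sites, n_sources, 2)
    nearest = np.linalg.norm(diffs, axis=2).min(axis=1)
    return 1.0 / np.maximum(nearest, min_dist)

def fit_lur(X, y):
    """Ordinary least squares for y = b0 + X @ b; returns [b0, b1, ..., bk]."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

# Synthetic example: 50 sites, traffic intensity, and proximity to two ports.
rng = np.random.default_rng(0)
sites = rng.uniform(0.0, 10_000.0, size=(50, 2))         # coordinates in metres
ports = np.array([[2_000.0, 3_000.0], [8_000.0, 1_000.0]])
traffic = rng.uniform(0.0, 50.0, size=50)                # thousand vpd/km2
prox_port = inverse_distance(sites, ports)
X = np.column_stack([traffic, prox_port])
y = 0.4 + 0.01 * traffic + 200.0 * prox_port             # noise-free synthetic response

coef = fit_lur(X, y)
print(coef)  # recovers approximately [0.4, 0.01, 200.0]
```

The response is an exact linear function of the predictors, so least squares recovers the generating coefficients up to floating-point error. With real data, the residual reflects everything the land-use predictors cannot explain.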

• Multivariate linear regression models
• Initial pool of 60 potential predictors
• Variables eliminated based on
– High correlation (R-squared ~1.0) with other selected predictors, and/or
– Lack of interpretability
• 19 land-use variables included in model selection
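One way to eliminate near-duplicate predictors is a greedy screen. Candidates are kept in order, and any candidate whose R-squared against the predictors already kept is close to 1 is dropped. This is a sketch under assumptions: the 0.99 threshold, the greedy ordering, and the variable names are hypothetical. The slide states only the R-squared ~1.0 criterion.

```python
import numpy as np

def r_squared_on(others, x):
    """R-squared from regressing column x on the columns of `others` (with intercept)."""
    A = np.column_stack([np.ones(len(x)), others])
    coef, *_ = np.linalg.lstsq(A, x, rcond=None)
    resid = x - A @ coef
    return 1.0 - np.sum(resid ** 2) / np.sum((x - x.mean()) ** 2)

def screen_collinear(X, names, threshold=0.99):
    """Greedily keep predictors whose R-squared against those already kept is below threshold."""
    kept = []
    for j in range(X.shape[1]):
        if kept and r_squared_on(X[:, kept], X[:, j]) >= threshold:
            continue  # nearly a linear combination of predictors already kept
        kept.append(j)
    return [names[j] for j in kept]

# Synthetic example: housing density is exactly proportional to population density.
rng = np.random.default_rng(1)
population = rng.uniform(100.0, 5_000.0, size=40)
housing = 0.4 * population                     # R-squared with population = 1
traffic = rng.uniform(0.0, 50.0, size=40)
X = np.column_stack([population, housing, traffic])
kept = screen_collinear(X, ["population_density", "housing_density", "traffic_intensity"])
print(kept)  # ['population_density', 'traffic_intensity']
```

A greedy screen depends on the order in which candidates are offered. An analyst who wants to prefer the more interpretable of two collinear variables (the slide's second criterion) can list that variable first.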

Site Selection

• Sites
– Census block group centroids
• Training sites
– Sites used to fit LUR models
– Selected from 318 census block groups in the study area
– Stratified random selection among 4 census regions
• Test sites
– Remaining sites withheld from the training set; a minimum of 10% used for independent model evaluation
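Here is a minimal sketch of the stratified train/test split described above. The slide does not say how training sites are allocated across the 4 census regions. This sketch assumes proportional allocation with largest-remainder rounding, which is a stated assumption rather than the study's documented rule. The region sizes and site IDs below are synthetic, and the function name is hypothetical.

```python
import random
from collections import defaultdict

def stratified_split(site_regions, n_train, seed=None, min_test_frac=0.10):
    """Stratified random split of sites into training and test sets.

    site_regions: dict mapping site id -> census region label (hypothetical format).
    Training sites are allocated to regions in proportion to region size
    (largest-remainder rounding, an assumption); all other sites form the test set.
    """
    n_total = len(site_regions)
    if n_total - n_train < min_test_frac * n_total:
        raise ValueError("too few sites left for the test set")
    by_region = defaultdict(list)
    for site, region in site_regions.items():
        by_region[region].append(site)
    # Integer proportional quotas, then give leftover slots to the largest remainders.
    alloc = {r: n_train * len(s) // n_total for r, s in by_region.items()}
    remainder = {r: n_train * len(s) % n_total for r, s in by_region.items()}
    leftover = n_train - sum(alloc.values())
    for r in sorted(remainder, key=remainder.get, reverse=True)[:leftover]:
        alloc[r] += 1
    rng = random.Random(seed)
    train = set()
    for r, sites in by_region.items():
        train.update(rng.sample(sorted(sites), alloc[r]))
    test = set(site_regions) - train
    return sorted(train), sorted(test)

# Synthetic example: 318 block groups spread over 4 regions of unequal size.
sizes = {"A": 100, "B": 90, "C": 70, "D": 58}
site_regions = {f"{region}{k:03d}": region for region, count in sizes.items() for k in range(count)}
train, test = stratified_split(site_regions, n_train=100, seed=42)
print(len(train), len(test))  # 100 218
```

With 318 sites and a 10% minimum test share, this check allows at most 286 training sites. That limit is consistent with the largest training set (285) used in the study.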

Model Development and Evaluation

• Variable selection
– Examined correlation structure of predictor variables
• Model development
– All subsets with 3-7 independent predictors
– Model selection based on AIC, Mallows' Cp, adjusted R-squared, and variance inflation factor
• Model evaluation
– Cross-validation within training dataset
– Hold-out evaluation within test dataset
• Models for multiple pollutants and numbers of training sites
– Benzene, NOx, PM2.5
– 25, 50, 75, 100, 125, 150, 200, and 285 training sites
• Automated, iterative process
– Site selection -> model development
– Repeated 100 times for each pollutant and number of training sites
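The all-subsets step above can be sketched as follows. The sketch fits every subset of predictors in a size range, skips subsets with high collinearity, and reports AIC, Mallows' Cp, and adjusted R-squared for each. It relies on the standard OLS formulas, where p counts parameters including the intercept:

- AIC = n ln(RSS/n) + 2p (up to a constant)
- Cp = RSS/σ̂² − n + 2p, with σ̂² taken from the full model
- adjusted R² = 1 − (RSS/(n−p)) / (TSS/(n−1))
- VIF_j = 1/(1 − R_j²)

The VIF cut-off of 10 and the ranking by AIC are hypothetical choices. The slide lists the criteria but not how they were combined. The data are synthetic.

```python
import itertools
import numpy as np

def _design(X):
    return np.column_stack([np.ones(X.shape[0]), X])

def _rss(A, y):
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)

def max_vif(X):
    """Largest variance inflation factor, VIF_j = 1 / (1 - R_j^2), among columns of X."""
    worst = 1.0
    for j in range(X.shape[1]):
        xj = X[:, j]
        tss_j = float(np.sum((xj - xj.mean()) ** 2))
        r2 = 1.0 - _rss(_design(np.delete(X, j, axis=1)), xj) / tss_j
        worst = max(worst, np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2))
    return worst

def all_subsets(X, y, min_k=3, max_k=7, vif_limit=10.0):
    """Fit every subset of min_k..max_k predictors; report AIC, Mallows' Cp, adjusted R^2.

    Subsets whose largest VIF exceeds vif_limit (hypothetical cut-off) are skipped.
    Results are sorted by AIC, lowest first.
    """
    n, m = X.shape
    tss = float(np.sum((y - y.mean()) ** 2))
    sigma2_full = _rss(_design(X), y) / (n - m - 1)   # error variance of the full model
    results = []
    for k in range(min_k, min(max_k, m) + 1):
        for cols in itertools.combinations(range(m), k):
            Xs = X[:, list(cols)]
            if max_vif(Xs) > vif_limit:
                continue
            p = k + 1                                   # parameters incl. intercept
            rss = _rss(_design(Xs), y)
            results.append({
                "cols": cols,
                "aic": n * np.log(rss / n) + 2 * p,     # Gaussian AIC up to a constant
                "cp": rss / sigma2_full - n + 2 * p,
                "adj_r2": 1.0 - (rss / (n - p)) / (tss / (n - 1)),
            })
    return sorted(results, key=lambda r: r["aic"])

# Synthetic example: 6 candidate predictors, of which columns 0, 2 and 4 matter.
rng = np.random.default_rng(2)
X = rng.normal(size=(60, 6))
y = 2.0 + 1.5 * X[:, 0] - 1.0 * X[:, 2] + 0.8 * X[:, 4] + rng.normal(scale=0.5, size=60)
ranked = all_subsets(X, y, min_k=3, max_k=4)
best = ranked[0]
print(best["cols"], round(best["cp"], 2), round(best["adj_r2"], 3))
```

In the study, this whole step sat inside the automated loop and was rerun for each of the 100 random site selections. The number of subsets grows quickly with the candidate pool (19 variables, subsets of 3-7), so a real run is far more expensive than this toy example.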

Model Performance in Test versus Training Sites: Benzene

[Figure: proportion of variance explained (R2) versus number of sites in the training dataset (0-300); series: R2 of LUR models for benzene in the training dataset, and R2 of predicted vs. observed benzene in the test dataset]

Model Performance in Test versus Training Sites: NOx

[Figure: proportion of variance explained (R2) versus number of sites in the training dataset (0-300); series: R2 of LUR models for NOx in the training dataset, and R2 of predicted vs. observed NOx in the test dataset]

Model Performance in Test versus Training Sites: PM2.5

[Figure: proportion of variance explained (R2) versus number of sites in the training dataset (0-300); series: R2 of LUR models for PM2.5 in the training dataset, and R2 of predicted vs. observed PM2.5 in the test dataset]
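The comparison plotted above is between training-set R² and the squared correlation of predicted versus observed values at held-out sites, averaged over repeated random splits for each training-set size. The sketch below reproduces that comparison on synthetic data under two simplifications. First, it uses a simple random split rather than the stratified one. Second, it keeps a fixed set of 7 predictors instead of rerunning model selection each time. It therefore shows only the overfitting of a fixed model, not the additional optimism that comes from variable selection. Function names are hypothetical.

```python
import numpy as np

def train_test_r2(X, y, train_idx, test_idx):
    """Training R^2 of an OLS fit, and squared correlation of predicted vs observed on test sites."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A[train_idx], y[train_idx], rcond=None)
    pred = A @ coef
    y_tr = y[train_idx]
    r2_train = 1.0 - np.sum((y_tr - pred[train_idx]) ** 2) / np.sum((y_tr - y_tr.mean()) ** 2)
    r2_test = np.corrcoef(pred[test_idx], y[test_idx])[0, 1] ** 2
    return r2_train, r2_test

def mean_r2_by_size(X, y, sizes, reps=100, seed=0):
    """Average (training R^2, test R^2) over `reps` random splits for each training-set size."""
    rng = np.random.default_rng(seed)
    summary = {}
    for size in sizes:
        pairs = []
        for _ in range(reps):
            order = rng.permutation(len(y))
            pairs.append(train_test_r2(X, y, order[:size], order[size:]))
        summary[size] = tuple(np.mean(pairs, axis=0))
    return summary

# Synthetic data: 318 sites, 7 candidate predictors, only 3 with real effects.
rng = np.random.default_rng(3)
X = rng.normal(size=(318, 7))
y = X[:, 0] + 0.7 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(size=318)
summary = mean_r2_by_size(X, y, sizes=[25, 100, 285])
for size, (r2_tr, r2_te) in summary.items():
    print(size, round(r2_tr, 3), round(r2_te, 3))
```

Even without variable selection, 7 predictors fitted to 25 sites inflate training R² relative to test R². The two values move closer together as the training set grows.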

LUR Prediction Errors: NOx
• Prediction error
– Average (± SD) of mean predicted minus observed input values
– Over 100 iterations, i.e., 100 LUR models
• Analyzed by low, medium, and high NOx concentration, based on the total NOx distribution
– Low = 0-25th percentile
– Medium = 25th-75th percentile
– High = 75th percentile-max
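The prediction-error summary above can be sketched as follows. The slide's wording admits more than one reading. This sketch assumes the following: for each of the repeated LUR models, errors (predicted minus observed) are averaged over the sites in a concentration class, and the mean and SD are then taken across models. It also assumes that sites exactly at a percentile fall into the lower class. The function name and the toy data are hypothetical.

```python
import numpy as np

def error_by_class(observed, predicted):
    """Mean and SD of prediction error by concentration class.

    observed:  shape (n_sites,) input concentrations.
    predicted: shape (n_models, n_sites), one row per LUR iteration.
    Classes: low <= 25th percentile < medium <= 75th percentile < high.
    """
    observed = np.asarray(observed, dtype=float)
    err = np.asarray(predicted, dtype=float) - observed      # broadcasts over models
    q25, q75 = np.percentile(observed, [25, 75])
    classes = {
        "low": observed <= q25,
        "medium": (observed > q25) & (observed <= q75),
        "high": observed > q75,
    }
    summary = {}
    for name, mask in classes.items():
        per_model = err[:, mask].mean(axis=1)                # one mean error per model
        summary[name] = (per_model.mean(), per_model.std(ddof=1))
    return summary

observed = np.arange(1.0, 9.0)                               # 8 synthetic sites
predicted = np.vstack([observed + 1.0, observed + 3.0])      # 2 synthetic models
print(error_by_class(observed, predicted))  # every class: mean 2.0, SD about 1.414
```

In this toy case, every model over-predicts each site by a constant amount, so all three classes show the same bias. In the study, the interest is in whether the bias differs between low- and high-concentration sites.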

Rotterdam Area: LUR versus Dispersion Model (Hoek et al., 2010)

[Figure: panels labeled "Dispersion Model" and "LUR Model"]

LUR Model Evaluation in Oslo (Hoek et al., 2010; courtesy: Christian Madsen, Oslo)

All values are adjusted R2.

Year  Pollutant  Full model  LOOCV      Validation, 20-location training sets  Validation, 40-location training sets
2005  NOx        0.63        0.61-0.66  0.52-0.68                              0.58-0.70
2005  NO2        0.69        0.67-0.72  0.60-0.78                              0.64-0.76
2005  NO         0.57        0.53-0.59  0.45-0.61                              0.51-0.65
2008  NOx        0.62        0.59-0.65  0.55-0.70                              0.65-0.67
2008  NO2        0.70        0.68-0.72  0.56-0.76                              0.70-0.74
2008  NO         0.56        0.51-0.59  0.53-0.65                              0.57-0.61
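The LOOCV column refers to leave-one-out cross-validation. For an OLS model, this does not require n separate refits. The standard hat-matrix identity gives each left-out residual as e_i / (1 − h_ii), and summing their squares gives the PRESS statistic. The sketch below computes the unadjusted LOOCV R² = 1 − PRESS / TSS on synthetic data. The adjusted values reported in the table may have been computed differently, so treat this as an illustration of the technique rather than the method used by Hoek et al.

```python
import numpy as np

def loocv_r2(X, y):
    """Leave-one-out cross-validated R^2 for an OLS model with intercept.

    Uses the hat-matrix identity e_(i) = e_i / (1 - h_ii), so the model is
    fitted once instead of n times.
    """
    A = np.column_stack([np.ones(len(y)), X])
    H = A @ np.linalg.pinv(A)                    # hat (projection) matrix
    resid = y - H @ y
    press = np.sum((resid / (1.0 - np.diag(H))) ** 2)
    return 1.0 - press / np.sum((y - y.mean()) ** 2)

# Synthetic example: 40 sites, 3 predictors.
rng = np.random.default_rng(4)
X = rng.normal(size=(40, 3))
y = 1.0 + X @ np.array([0.8, -0.5, 0.3]) + rng.normal(scale=0.7, size=40)
print(round(loocv_r2(X, y), 3))
```

The shortcut gives the same answer as refitting the model 40 times with one site left out each time, but it needs only one fit.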

Comparison of Two LUR Models for Amsterdam (Hoek et al., 2010)

Comparison of Two LUR Models for Amsterdam Denoting Sites Impacted by Traffic or Urban Sources (Hoek et al., 2010)

Summary and Conclusions
• We used air pollution concentrations predicted by coupled regional and local scale AQ models to develop and evaluate LUR models in New Haven, CT, for benzene, PM2.5, and NOx
• Model performance and robustness improved as the number of sites used to build the models increased
– R-squared values were inflated for models based on pollutant concentrations from 25 training sites compared with models based on 100-285 training sites
– R-squared for the LUR model (training dataset) and R-squared of predicted versus observed (test dataset) converged as the number of training sites increased
• It is critical to evaluate LUR performance using site-specific independent measurement data sets
• Analysis suggests that coupled air quality models could provide a useful tool for improving LUR estimates of exposure to ambient air pollution in epidemiologic studies
• LUR model performance may be considerably poorer than emissions-based modeling results for urban environments with complex sources and landscape characteristics

Acknowledgements*
• Markey Johnson
• Vlad Isakov
• Joe Touma
• Shaibal Mukerjee
• Luther Smith (Alion Incorporated)
• Ellen Kinnee (Computer Science Corporation)

*Although this work was reviewed by EPA and approved for publication, it may not necessarily reflect official Agency policy

Additional Slides


Mean Contribution of Land-Use Factors in Benzene Models

[Figure: pie charts of the proportion of variability explained by each land-use factor (intercept, traffic intensity (vpd/km2), proximity to roadways, proximity to ports and harbors, proximity to industrial sources, population and housing density) for benzene models based on 25, 100, and 285 training sites]

Mean Contribution of Land-Use Factors in NOx Models

[Figure: pie charts of the proportion of variability explained by each land-use factor (intercept, traffic intensity (vpd/km2), proximity to roadways, proximity to ports and harbors, proximity to industrial sources, population and housing density) for NOx models based on 25, 100, and 285 training sites]

Mean Contribution of Land-Use Factors in PM2.5 Models

[Figure: pie charts of the proportion of variability explained by each land-use factor (intercept, traffic intensity (vpd/km2), proximity to roadways, proximity to ports and harbors, proximity to industrial sources, population and housing density) for PM2.5 models based on 25, 100, and 285 training sites]
