Please note that rgdal will be retired by the end of 2023,
plan transition to sf/stars/terra functions using GDAL and PROJ
at your earliest convenience.
rgdal: version: 1.5-30, (SVN revision 1171)
Geospatial Data Abstraction Library extensions to R successfully loaded
Loaded GDAL runtime: GDAL 3.4.2, released 2022/03/08
Path to GDAL shared files: /Library/Frameworks/R.framework/Versions/4.2/Resources/library/rgdal/gdal
GDAL binary built with GEOS: FALSE
Loaded PROJ runtime: Rel. 8.2.1, January 1st, 2022, [PJ_VERSION: 821]
Path to PROJ shared files: /Library/Frameworks/R.framework/Versions/4.2/Resources/library/rgdal/proj
PROJ CDN enabled: FALSE
Linking to sp version:1.4-6
To mute warnings of possible GDAL/OSR exportToProj4() degradation,
use options("rgdal_show_exportToProj4_warnings"="none") before loading sp or rgdal.
To access larger datasets in this package, install the spDataLarge
package with: `install.packages('spDataLarge',
repos='https://nowosad.github.io/drat/', type='source')`
Loading required package: sf
Linking to GEOS 3.10.2, GDAL 3.4.2, PROJ 8.2.1; sf_use_s2() is TRUE
Before we start any analysis, let us set the path to the directory where we are working. We can easily do that with setwd(). In the following line, replace the path with that of the folder where you have placed this file and where the house_transactions folder with the data lives.
setwd('.')
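To explore ideas in spatial regression, we will use the set of Airbnb properties for San Diego (US), borrowed from the “Geographic Data Science with Python” book (see https://geographicdata.science/book/data/airbnb/regression_cleaning.html for more info on the dataset source). This covers the point location of properties advertised on the Airbnb website in the San Diego region.
Let us load the data:
db <- st_read('data/abb_sd/regression_db.geojson')
The table contains the following variables:
names(db)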
For most of this chapter, we will be exploring determinants and strategies for modelling the price of a property advertised on Airbnb. To get a first taste of what this means, we can create a plot of prices within the area of San Diego:
db %>%
  ggplot(aes(color = price)) +
  geom_sf() +
  scale_color_viridis_c() +
  theme_void()
Spatial heterogeneity (SH) arises when we cannot safely assume the process we are studying operates under the same “rules” throughout the geography of interest. In other words, we can observe SH when there are effects on the outcome variable that are intrinsically linked to specific locations. A good example of this is the case of seafront houses above: we are trying to model the price of a house, and the fact that some houses are located under certain conditions (e.g. by the sea) makes their price behave differently. This somewhat abstract concept of SH can be made operational in a model in several ways. We will explore the following two: spatial fixed-effects (FE), and spatial regimes, which are a generalization of FE.
Spatial FE
Let us consider the house price example from the previous section to introduce a more general illustration that relates to the second motivation for spatial effects (“space as a proxy”). Given we are only including a few explanatory variables in the model, it is likely we are missing some important factors that play a role in determining the price at which a house is sold. Some of them, however, are likely to vary systematically over space (e.g. different neighbourhood characteristics). If that is the case, we can control for those unobserved factors by using traditional dummy variables but basing their creation on a spatial rule. For example, let us include a binary variable for every neighbourhood, as provided by Airbnb, indicating whether a given house is located within that area (1) or not (0). Neighbourhood membership is expressed in the neighborhood column:
db %>%
  ggplot(aes(color = neighborhood)) +
  geom_sf() +
  theme_void()
Mathematically, we are now fitting the following equation: \[
\log(P_i) = \alpha_r + \beta_1 Acc_i + \beta_2 Bath_i + \beta_3 Bedr_i + \beta_4 Beds_i + \epsilon_i
\]
where the main difference is that we are now allowing the constant term, \(\alpha\), to vary by neighbourhood \(r\), \(\alpha_r\).
Programmatically, we can fit this model with lm:
# Include `-1` to eliminate the constant term and include a dummy for every area
m2 <- lm('log_price ~ neighborhood + accommodates + bathrooms + bedrooms + beds - 1', db)
summary(m2)
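To see what the `- 1` does, here is a minimal sketch on simulated data. The data frame `toy`, its columns and all the numbers in it are made up for illustration and are not part of the Airbnb dataset. Dropping the constant makes `lm` report one estimate per area, while keeping it reports differences relative to a baseline area; the slope on the remaining variable should be the same either way.

```r
set.seed(123)

# Hypothetical toy data: three areas with different price levels
toy <- data.frame(
  area = factor(rep(c("A", "B", "C"), each = 50)),
  size = runif(150, 1, 6)
)
area_level <- c(A = 4, B = 5, C = 6)
toy$log_price <- unname(area_level[as.character(toy$area)]) +
  0.3 * toy$size + rnorm(150, sd = 0.2)

# Without the constant: one dummy (and one estimate) per area
fe_no_const <- lm(log_price ~ area + size - 1, data = toy)

# With the constant: area A is absorbed into the intercept
fe_const <- lm(log_price ~ area + size, data = toy)

coef(fe_no_const)
coef(fe_const)
```

In this sketch the `areaA` estimate of the first fit plays the role of the intercept in the second, which is why the two parameterisations describe the same model.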
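Econometrically speaking, what the neighbourhood FE we have introduced imply is that, instead of comparing all house prices across San Diego as equal, we only derive variation from within each neighbourhood. In our particular example, estimating spatial FE also gives you an indirect measure of area desirability: since they are simple dummies in a regression explaining the price of a house, their estimates tell us how much people are willing to pay to live in a given area. We can visualise this “geography of desirability” by plotting the estimates of each fixed effect on a map:
# Extract neighborhood names from coefficients
nei.names <- m2$coefficients %>%
  as.data.frame() %>%
  row.names() %>%
  str_replace("neighborhood", "")
# Set up as Data Frame
nei.fes <- data.frame(
  coef = m2$coefficients,
  nei = nei.names,
  row.names = nei.names
) %>%
  right_join(db, by = c("nei" = "neighborhood"))
# Plot
nei.fes %>%
  st_as_sf() %>%
  ggplot(aes(color = coef)) +
  geom_sf() +
  scale_color_viridis_c() +
  theme_void()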
We can see how neighbourhoods on the left (west) tend to have higher prices. What the map does not show, but readers familiar with the geography of San Diego will recognise, is that the city is bounded by the Pacific Ocean on the west, suggesting neighbourhoods by the beach tend to be more expensive.
Remember that the interpretation of a \(\beta_k\) coefficient is the effect of variable \(k\), given all the other explanatory variables included remain constant. By including a single variable for each area, we are effectively forcing the model to compare as equal only house prices that share the same value for each variable; in other words, only houses located within the same area. Introducing FE affords you a higher degree of isolation of the effects of the variables you introduce in your model because you can control for unobserved effects that align spatially with the distribution of the FE you introduce (by neighbourhood, in our case).
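The idea that FE only use variation within each area can be checked numerically. The following is a minimal sketch on simulated data (the data frame `toy` and every value in it are assumptions for illustration): it compares the slope from a regression with one dummy per area against the slope obtained after subtracting area means from both the outcome and the explanatory variable.

```r
set.seed(42)

# Hypothetical toy data: four areas, each with its own price level
toy <- data.frame(
  area = factor(rep(c("A", "B", "C", "D"), each = 40)),
  size = rnorm(160, mean = 3)
)
area_effect <- c(A = 0.0, B = 0.5, C = 1.0, D = 1.5)
toy$log_price <- unname(area_effect[as.character(toy$area)]) +
  0.25 * toy$size + rnorm(160, sd = 0.1)

# Fixed-effects fit with one dummy per area
fe_fit <- lm(log_price ~ area + size - 1, data = toy)

# "Within" fit: subtract area means from outcome and regressor
toy$log_price_dm <- toy$log_price - ave(toy$log_price, toy$area)
toy$size_dm <- toy$size - ave(toy$size, toy$area)
within_fit <- lm(log_price_dm ~ size_dm - 1, data = toy)

c(
  fe = unname(coef(fe_fit)["size"]),
  within = unname(coef(within_fit)["size_dm"])
)
```

The two slopes should coincide up to numerical precision, which is another way of saying that the dummies remove everything that is constant within an area.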
Spatial regimes
At the core of estimating spatial FEs is the idea that, instead of assuming the dependent variable behaves uniformly over space, there are systematic effects following a geographical pattern that affect its behaviour. In other words, spatial FEs introduce the notion of spatial heterogeneity into the econometric model. They do this in the simplest possible form: by allowing the constant term to vary geographically. The other elements of the regression are left untouched and hence apply uniformly across space. The idea of spatial regimes (SRs) is to generalize the spatial FE approach to allow not only the constant term to vary but also the coefficient of any other explanatory variable. This implies that the equation we will be estimating is: \[
\log(P_i) = \alpha_r + \beta_{1r} Acc_i + \beta_{2r} Bath_i + \beta_{3r} Bedr_i + \beta_{4r} Beds_i + \epsilon_i
\]
where we are not only allowing the constant term to vary by region (\(\alpha_r\)), but also every other parameter (\(\beta_{kr}\)).
Also, given we are going to allow every coefficient to vary by regime, we will need to explicitly set a constant term that we can allow to vary:
db$one<-1
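As a minimal sketch of how such an explicit constant can enter a regime formula, consider the following simulated example. The data frame `toy`, the two areas and all coefficients are assumptions made for illustration. Interacting both `one` and the explanatory variable with the regime indicator gives every regime its own constant and its own slope, which should match what we get from fitting each regime separately.

```r
set.seed(7)

# Hypothetical toy data: two regimes with different constants and slopes
toy <- data.frame(
  area = factor(rep(c("A", "B"), each = 60)),
  size = runif(120, 1, 6)
)
toy$one <- 1
level <- ifelse(toy$area == "A", 4, 5)
slope <- ifelse(toy$area == "A", 0.2, 0.6)
toy$log_price <- level + slope * toy$size + rnorm(120, sd = 0.1)

# Interact the explicit constant `one` and `size` with the regime
sr_fit <- lm(log_price ~ 0 + (one + size):area, data = toy)

# Separate regression for area A only, for comparison
fit_a <- lm(log_price ~ size, data = subset(toy, area == "A"))

coef(sr_fit)
coef(fit_a)
```

Here `one:areaA` and `size:areaA` play the role of \(\alpha_r\) and \(\beta_{kr}\) for regime A. Standard errors can differ between the pooled and the separate fits because the pooled model estimates a single residual variance.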
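Then, the estimation leverages the model-description capabilities of R formulas:
# `:` notation implies interaction variables
m3 <- lm('log_price ~ 0 + (accommodates + bathrooms + bedrooms + beds):(neighborhood)', db)
summary(m3)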
Call:
lm(formula = "log_price ~ 0 + (accommodates + bathrooms + bedrooms + beds):(neighborhood)",
data = db)
Residuals:
Min 1Q Median 3Q Max
-10.4790 -0.0096 1.0931 1.7599 6.1073
Coefficients:
Estimate Std. Error t value
accommodates:neighborhoodBalboa Park 0.063528 0.093237 0.681
accommodates:neighborhoodBay Ho -0.259615 0.335007 -0.775
accommodates:neighborhoodBay Park -0.355401 0.232720 -1.527
accommodates:neighborhoodCarmel Valley 0.129786 0.187193 0.693
accommodates:neighborhoodCity Heights West 0.447371 0.231998 1.928
accommodates:neighborhoodClairemont Mesa 0.711353 0.177821 4.000
accommodates:neighborhoodCollege Area -0.346152 0.188071 -1.841
accommodates:neighborhoodCore 0.125864 0.148417 0.848
accommodates:neighborhoodCortez Hill 0.715958 0.126562 5.657
accommodates:neighborhoodDel Mar Heights 0.829195 0.214067 3.874
accommodates:neighborhoodEast Village 0.214642 0.077394 2.773
accommodates:neighborhoodGaslamp Quarter 0.451443 0.197637 2.284
accommodates:neighborhoodGrant Hill 1.135176 0.167771 6.766
accommodates:neighborhoodGrantville 0.300907 0.280369 1.073
accommodates:neighborhoodKensington 0.668742 0.450243 1.485
accommodates:neighborhoodLa Jolla 0.520882 0.055887 9.320
accommodates:neighborhoodLa Jolla Village 0.566452 0.413185 1.371
accommodates:neighborhoodLinda Vista 0.523975 0.282219 1.857
accommodates:neighborhoodLittle Italy 0.603908 0.121899 4.954
accommodates:neighborhoodLoma Portal 0.487743 0.127870 3.814
accommodates:neighborhoodMarina 0.431384 0.172628 2.499
accommodates:neighborhoodMidtown 0.618058 0.087992 7.024
accommodates:neighborhoodMidtown District 0.430398 0.191682 2.245
accommodates:neighborhoodMira Mesa -0.018199 0.310167 -0.059
accommodates:neighborhoodMission Bay 0.440951 0.049454 8.916
accommodates:neighborhoodMission Valley 0.144530 0.507925 0.285
accommodates:neighborhoodMoreno Mission 0.100471 0.229460 0.438
accommodates:neighborhoodNormal Heights 0.413682 0.198584 2.083
accommodates:neighborhoodNorth Clairemont -0.242723 0.307090 -0.790
accommodates:neighborhoodNorth Hills 0.262840 0.083258 3.157
accommodates:neighborhoodNorthwest -0.229157 0.255656 -0.896
accommodates:neighborhoodOcean Beach 0.754771 0.079097 9.542
accommodates:neighborhoodOld Town 0.177176 0.159714 1.109
accommodates:neighborhoodOtay Ranch -0.333536 0.309545 -1.078
accommodates:neighborhoodPacific Beach 0.345475 0.057599 5.998
accommodates:neighborhoodPark West 0.909020 0.156013 5.827
accommodates:neighborhoodRancho Bernadino -0.118939 0.256750 -0.463
accommodates:neighborhoodRancho Penasquitos 0.121845 0.228456 0.533
accommodates:neighborhoodRoseville 0.316929 0.226110 1.402
accommodates:neighborhoodSan Carlos 0.191248 0.318706 0.600
accommodates:neighborhoodScripps Ranch 0.347638 0.127239 2.732
accommodates:neighborhoodSerra Mesa 0.495491 0.282281 1.755
accommodates:neighborhoodSouth Park 0.334378 0.256708 1.303
accommodates:neighborhoodUniversity City 0.107605 0.113883 0.945
accommodates:neighborhoodWest University Heights 0.190215 0.212040 0.897
bathrooms:neighborhoodBalboa Park 2.275321 0.225032 10.111
bathrooms:neighborhoodBay Ho 3.312231 0.530568 6.243
bathrooms:neighborhoodBay Park 2.231649 0.365655 6.103
bathrooms:neighborhoodCarmel Valley 1.191058 0.224138 5.314
bathrooms:neighborhoodCity Heights West 2.517235 0.550272 4.575
bathrooms:neighborhoodClairemont Mesa 3.737297 0.427366 8.745
bathrooms:neighborhoodCollege Area 3.370263 0.413479 8.151
bathrooms:neighborhoodCore 3.635188 0.490640 7.409
bathrooms:neighborhoodCortez Hill 1.631032 0.299654 5.443
bathrooms:neighborhoodDel Mar Heights 1.346206 0.342828 3.927
bathrooms:neighborhoodEast Village 2.600489 0.190932 13.620
bathrooms:neighborhoodGaslamp Quarter 3.183092 0.527615 6.033
bathrooms:neighborhoodGrant Hill 2.770976 0.416838 6.648
bathrooms:neighborhoodGrantville 2.177175 0.693599 3.139
bathrooms:neighborhoodKensington 1.284044 0.671482 1.912
bathrooms:neighborhoodLa Jolla 0.852667 0.099413 8.577
bathrooms:neighborhoodLa Jolla Village 0.984426 1.193870 0.825
bathrooms:neighborhoodLinda Vista 2.359895 0.393392 5.999
bathrooms:neighborhoodLittle Italy 2.600567 0.275834 9.428
bathrooms:neighborhoodLoma Portal 2.575164 0.249679 10.314
bathrooms:neighborhoodMarina 3.317139 0.656533 5.053
bathrooms:neighborhoodMidtown 0.899736 0.112205 8.019
bathrooms:neighborhoodMidtown District 3.143440 0.594875 5.284
bathrooms:neighborhoodMira Mesa 2.858280 0.512511 5.577
bathrooms:neighborhoodMission Bay 1.764929 0.122421 14.417
bathrooms:neighborhoodMission Valley 2.666000 1.365483 1.952
bathrooms:neighborhoodMoreno Mission 3.234512 0.557898 5.798
bathrooms:neighborhoodNormal Heights 3.505139 0.467965 7.490
bathrooms:neighborhoodNorth Clairemont 2.574847 0.613471 4.197
bathrooms:neighborhoodNorth Hills 2.584724 0.191541 13.494
bathrooms:neighborhoodNorthwest 2.877519 0.569924 5.049
bathrooms:neighborhoodOcean Beach 1.702208 0.207508 8.203
bathrooms:neighborhoodOld Town 2.249120 0.302755 7.429
bathrooms:neighborhoodOtay Ranch 2.818736 1.132794 2.488
bathrooms:neighborhoodPacific Beach 2.272803 0.130607 17.402
bathrooms:neighborhoodPark West 2.676739 0.308257 8.683
bathrooms:neighborhoodRancho Bernadino 0.856723 0.555198 1.543
bathrooms:neighborhoodRancho Penasquitos 0.677767 0.414569 1.635
bathrooms:neighborhoodRoseville 1.109625 0.360103 3.081
bathrooms:neighborhoodSan Carlos 2.489815 0.511232 4.870
bathrooms:neighborhoodScripps Ranch 2.459862 0.469601 5.238
bathrooms:neighborhoodSerra Mesa 2.968934 0.602807 4.925
bathrooms:neighborhoodSouth Park 2.895471 0.521793 5.549
bathrooms:neighborhoodUniversity City 3.125387 0.347825 8.986
bathrooms:neighborhoodWest University Heights 2.188257 0.390408 5.605
bedrooms:neighborhoodBalboa Park 0.605655 0.245384 2.468
bedrooms:neighborhoodBay Ho 0.836163 0.631871 1.323
bedrooms:neighborhoodBay Park 1.060944 0.430737 2.463
bedrooms:neighborhoodCarmel Valley 0.521954 0.480497 1.086
bedrooms:neighborhoodCity Heights West -0.272600 0.663983 -0.411
bedrooms:neighborhoodClairemont Mesa -0.742539 0.450344 -1.649
bedrooms:neighborhoodCollege Area -0.306621 0.410476 -0.747
bedrooms:neighborhoodCore -0.786470 0.395991 -1.986
bedrooms:neighborhoodCortez Hill 0.793039 0.380195 2.086
bedrooms:neighborhoodDel Mar Heights -0.071069 0.369070 -0.193
bedrooms:neighborhoodEast Village -0.186076 0.213572 -0.871
bedrooms:neighborhoodGaslamp Quarter -0.294024 0.342057 -0.860
bedrooms:neighborhoodGrant Hill -0.456825 0.425374 -1.074
bedrooms:neighborhoodGrantville 0.907259 0.770945 1.177
bedrooms:neighborhoodKensington -0.257195 1.009326 -0.255
bedrooms:neighborhoodLa Jolla -0.152098 0.133726 -1.137
bedrooms:neighborhoodLa Jolla Village 4.291700 1.882046 2.280
bedrooms:neighborhoodLinda Vista -0.485372 0.642684 -0.755
bedrooms:neighborhoodLittle Italy 0.057475 0.306357 0.188
bedrooms:neighborhoodLoma Portal -0.406484 0.250607 -1.622
bedrooms:neighborhoodMarina -0.831114 0.511626 -1.624
bedrooms:neighborhoodMidtown 0.696852 0.167900 4.150
bedrooms:neighborhoodMidtown District 0.010614 0.509151 0.021
bedrooms:neighborhoodMira Mesa -0.197692 0.780959 -0.253
bedrooms:neighborhoodMission Bay -0.330540 0.121602 -2.718
bedrooms:neighborhoodMission Valley 0.514998 1.295767 0.397
bedrooms:neighborhoodMoreno Mission -0.584689 0.596044 -0.981
bedrooms:neighborhoodNormal Heights -0.127744 0.391691 -0.326
bedrooms:neighborhoodNorth Clairemont 0.281306 0.695297 0.405
bedrooms:neighborhoodNorth Hills 0.380444 0.178477 2.132
bedrooms:neighborhoodNorthwest 0.288603 0.607295 0.475
bedrooms:neighborhoodOcean Beach -0.038069 0.207927 -0.183
bedrooms:neighborhoodOld Town -0.319724 0.375203 -0.852
bedrooms:neighborhoodOtay Ranch 0.015564 1.332279 0.012
bedrooms:neighborhoodPacific Beach -0.037912 0.139026 -0.273
bedrooms:neighborhoodPark West -0.696514 0.413881 -1.683
bedrooms:neighborhoodRancho Bernadino 1.034776 0.579798 1.785
bedrooms:neighborhoodRancho Penasquitos 0.674520 0.519260 1.299
bedrooms:neighborhoodRoseville 0.881011 0.592962 1.486
bedrooms:neighborhoodSan Carlos -0.394191 0.540343 -0.730
bedrooms:neighborhoodScripps Ranch 1.107455 0.336101 3.295
bedrooms:neighborhoodSerra Mesa 0.253001 0.620774 0.408
bedrooms:neighborhoodSouth Park -0.595844 0.407811 -1.461
bedrooms:neighborhoodUniversity City 0.203783 0.455767 0.447
bedrooms:neighborhoodWest University Heights 0.242873 0.359245 0.676
beds:neighborhoodBalboa Park 0.041556 0.173183 0.240
beds:neighborhoodBay Ho -0.402544 0.495241 -0.813
beds:neighborhoodBay Park 0.283958 0.410776 0.691
beds:neighborhoodCarmel Valley 0.150416 0.288268 0.522
beds:neighborhoodCity Heights West -0.217526 0.497878 -0.437
beds:neighborhoodClairemont Mesa -1.109581 0.308998 -3.591
beds:neighborhoodCollege Area 0.594892 0.312780 1.902
beds:neighborhoodCore 0.602559 0.277027 2.175
beds:neighborhoodCortez Hill -0.609996 0.143559 -4.249
beds:neighborhoodDel Mar Heights -0.708476 0.257299 -2.754
beds:neighborhoodEast Village 0.399909 0.148641 2.690
beds:neighborhoodGaslamp Quarter 0.240245 0.319910 0.751
beds:neighborhoodGrant Hill -1.315807 0.186724 -7.047
beds:neighborhoodGrantville -0.382590 0.469011 -0.816
beds:neighborhoodKensington 0.133474 0.664698 0.201
beds:neighborhoodLa Jolla 0.001347 0.085013 0.016
beds:neighborhoodLa Jolla Village -2.878676 1.020652 -2.820
beds:neighborhoodLinda Vista -0.142372 0.278211 -0.512
beds:neighborhoodLittle Italy -0.569868 0.099961 -5.701
beds:neighborhoodLoma Portal -0.255510 0.222956 -1.146
beds:neighborhoodMarina 0.024175 0.429466 0.056
beds:neighborhoodMidtown -0.346866 0.137915 -2.515
beds:neighborhoodMidtown District -0.464781 0.337775 -1.376
beds:neighborhoodMira Mesa 0.319934 0.426799 0.750
beds:neighborhoodMission Bay -0.108936 0.067105 -1.623
beds:neighborhoodMission Valley -0.502441 0.879795 -0.571
beds:neighborhoodMoreno Mission 0.492514 0.439355 1.121
beds:neighborhoodNormal Heights -0.532907 0.227211 -2.345
beds:neighborhoodNorth Clairemont 0.562363 0.704213 0.799
beds:neighborhoodNorth Hills -0.279430 0.123678 -2.259
beds:neighborhoodNorthwest 0.742017 0.474903 1.562
beds:neighborhoodOcean Beach -0.667651 0.137647 -4.850
beds:neighborhoodOld Town 0.459210 0.287008 1.600
beds:neighborhoodOtay Ranch 0.235723 0.983870 0.240
beds:neighborhoodPacific Beach -0.179242 0.087511 -2.048
beds:neighborhoodPark West -0.873297 0.225334 -3.876
beds:neighborhoodRancho Bernadino 0.378088 0.348640 1.084
beds:neighborhoodRancho Penasquitos 0.147457 0.344820 0.428
beds:neighborhoodRoseville -0.391529 0.328609 -1.191
beds:neighborhoodSan Carlos 0.115338 0.621666 0.186
beds:neighborhoodScripps Ranch -1.654484 0.338331 -4.890
beds:neighborhoodSerra Mesa -1.018812 0.705888 -1.443
beds:neighborhoodSouth Park 0.452815 0.406052 1.115
beds:neighborhoodUniversity City -0.345822 0.232779 -1.486
beds:neighborhoodWest University Heights 0.146128 0.364075 0.401
Pr(>|t|)
accommodates:neighborhoodBalboa Park 0.495668
accommodates:neighborhoodBay Ho 0.438397
accommodates:neighborhoodBay Park 0.126774
accommodates:neighborhoodCarmel Valley 0.488131
accommodates:neighborhoodCity Heights West 0.053861 .
accommodates:neighborhoodClairemont Mesa 6.40e-05 ***
accommodates:neighborhoodCollege Area 0.065740 .
accommodates:neighborhoodCore 0.396446
accommodates:neighborhoodCortez Hill 1.61e-08 ***
accommodates:neighborhoodDel Mar Heights 0.000108 ***
accommodates:neighborhoodEast Village 0.005565 **
accommodates:neighborhoodGaslamp Quarter 0.022395 *
accommodates:neighborhoodGrant Hill 1.45e-11 ***
accommodates:neighborhoodGrantville 0.283202
accommodates:neighborhoodKensington 0.137520
accommodates:neighborhoodLa Jolla < 2e-16 ***
accommodates:neighborhoodLa Jolla Village 0.170446
accommodates:neighborhoodLinda Vista 0.063414 .
accommodates:neighborhoodLittle Italy 7.47e-07 ***
accommodates:neighborhoodLoma Portal 0.000138 ***
accommodates:neighborhoodMarina 0.012484 *
accommodates:neighborhoodMidtown 2.40e-12 ***
accommodates:neighborhoodMidtown District 0.024780 *
accommodates:neighborhoodMira Mesa 0.953213
accommodates:neighborhoodMission Bay < 2e-16 ***
accommodates:neighborhoodMission Valley 0.776000
accommodates:neighborhoodMoreno Mission 0.661505
accommodates:neighborhoodNormal Heights 0.037279 *
accommodates:neighborhoodNorth Clairemont 0.429327
accommodates:neighborhoodNorth Hills 0.001602 **
accommodates:neighborhoodNorthwest 0.370103
accommodates:neighborhoodOcean Beach < 2e-16 ***
accommodates:neighborhoodOld Town 0.267333
accommodates:neighborhoodOtay Ranch 0.281298
accommodates:neighborhoodPacific Beach 2.12e-09 ***
accommodates:neighborhoodPark West 5.96e-09 ***
accommodates:neighborhoodRancho Bernadino 0.643202
accommodates:neighborhoodRancho Penasquitos 0.593817
accommodates:neighborhoodRoseville 0.161071
accommodates:neighborhoodSan Carlos 0.548479
accommodates:neighborhoodScripps Ranch 0.006311 **
accommodates:neighborhoodSerra Mesa 0.079258 .
accommodates:neighborhoodSouth Park 0.192775
accommodates:neighborhoodUniversity City 0.344762
accommodates:neighborhoodWest University Heights 0.369719
bathrooms:neighborhoodBalboa Park < 2e-16 ***
bathrooms:neighborhoodBay Ho 4.60e-10 ***
bathrooms:neighborhoodBay Park 1.11e-09 ***
bathrooms:neighborhoodCarmel Valley 1.11e-07 ***
bathrooms:neighborhoodCity Heights West 4.87e-06 ***
bathrooms:neighborhoodClairemont Mesa < 2e-16 ***
bathrooms:neighborhoodCollege Area 4.37e-16 ***
bathrooms:neighborhoodCore 1.45e-13 ***
bathrooms:neighborhoodCortez Hill 5.45e-08 ***
bathrooms:neighborhoodDel Mar Heights 8.71e-05 ***
bathrooms:neighborhoodEast Village < 2e-16 ***
bathrooms:neighborhoodGaslamp Quarter 1.71e-09 ***
bathrooms:neighborhoodGrant Hill 3.25e-11 ***
bathrooms:neighborhoodGrantville 0.001704 **
bathrooms:neighborhoodKensington 0.055892 .
bathrooms:neighborhoodLa Jolla < 2e-16 ***
bathrooms:neighborhoodLa Jolla Village 0.409651
bathrooms:neighborhoodLinda Vista 2.10e-09 ***
bathrooms:neighborhoodLittle Italy < 2e-16 ***
bathrooms:neighborhoodLoma Portal < 2e-16 ***
bathrooms:neighborhoodMarina 4.49e-07 ***
bathrooms:neighborhoodMidtown 1.28e-15 ***
bathrooms:neighborhoodMidtown District 1.31e-07 ***
bathrooms:neighborhoodMira Mesa 2.55e-08 ***
bathrooms:neighborhoodMission Bay < 2e-16 ***
bathrooms:neighborhoodMission Valley 0.050935 .
bathrooms:neighborhoodMoreno Mission 7.07e-09 ***
bathrooms:neighborhoodNormal Heights 7.88e-14 ***
bathrooms:neighborhoodNorth Clairemont 2.74e-05 ***
bathrooms:neighborhoodNorth Hills < 2e-16 ***
bathrooms:neighborhoodNorthwest 4.58e-07 ***
bathrooms:neighborhoodOcean Beach 2.85e-16 ***
bathrooms:neighborhoodOld Town 1.25e-13 ***
bathrooms:neighborhoodOtay Ranch 0.012863 *
bathrooms:neighborhoodPacific Beach < 2e-16 ***
bathrooms:neighborhoodPark West < 2e-16 ***
bathrooms:neighborhoodRancho Bernadino 0.122861
bathrooms:neighborhoodRancho Penasquitos 0.102129
bathrooms:neighborhoodRoseville 0.002070 **
bathrooms:neighborhoodSan Carlos 1.14e-06 ***
bathrooms:neighborhoodScripps Ranch 1.68e-07 ***
bathrooms:neighborhoodSerra Mesa 8.66e-07 ***
bathrooms:neighborhoodSouth Park 3.00e-08 ***
bathrooms:neighborhoodUniversity City < 2e-16 ***
bathrooms:neighborhoodWest University Heights 2.18e-08 ***
bedrooms:neighborhoodBalboa Park 0.013608 *
bedrooms:neighborhoodBay Ho 0.185782
bedrooms:neighborhoodBay Park 0.013803 *
bedrooms:neighborhoodCarmel Valley 0.277400
bedrooms:neighborhoodCity Heights West 0.681416
bedrooms:neighborhoodClairemont Mesa 0.099236 .
bedrooms:neighborhoodCollege Area 0.455100
bedrooms:neighborhoodCore 0.047070 *
bedrooms:neighborhoodCortez Hill 0.037033 *
bedrooms:neighborhoodDel Mar Heights 0.847309
bedrooms:neighborhoodEast Village 0.383650
bedrooms:neighborhoodGaslamp Quarter 0.390058
bedrooms:neighborhoodGrant Hill 0.282895
bedrooms:neighborhoodGrantville 0.239317
bedrooms:neighborhoodKensington 0.798872
bedrooms:neighborhoodLa Jolla 0.255422
bedrooms:neighborhoodLa Jolla Village 0.022623 *
bedrooms:neighborhoodLinda Vista 0.450143
bedrooms:neighborhoodLittle Italy 0.851191
bedrooms:neighborhoodLoma Portal 0.104857
bedrooms:neighborhoodMarina 0.104332
bedrooms:neighborhoodMidtown 3.37e-05 ***
bedrooms:neighborhoodMidtown District 0.983369
bedrooms:neighborhoodMira Mesa 0.800169
bedrooms:neighborhoodMission Bay 0.006583 **
bedrooms:neighborhoodMission Valley 0.691053
bedrooms:neighborhoodMoreno Mission 0.326658
bedrooms:neighborhoodNormal Heights 0.744334
bedrooms:neighborhoodNorth Clairemont 0.685798
bedrooms:neighborhoodNorth Hills 0.033080 *
bedrooms:neighborhoodNorthwest 0.634643
bedrooms:neighborhoodOcean Beach 0.854736
bedrooms:neighborhoodOld Town 0.394173
bedrooms:neighborhoodOtay Ranch 0.990680
bedrooms:neighborhoodPacific Beach 0.785097
bedrooms:neighborhoodPark West 0.092450 .
bedrooms:neighborhoodRancho Bernadino 0.074358 .
bedrooms:neighborhoodRancho Penasquitos 0.193994
bedrooms:neighborhoodRoseville 0.137390
bedrooms:neighborhoodSan Carlos 0.465713
bedrooms:neighborhoodScripps Ranch 0.000990 ***
bedrooms:neighborhoodSerra Mesa 0.683614
bedrooms:neighborhoodSouth Park 0.144046
bedrooms:neighborhoodUniversity City 0.654804
bedrooms:neighborhoodWest University Heights 0.499025
beds:neighborhoodBalboa Park 0.810374
beds:neighborhoodBay Ho 0.416352
beds:neighborhoodBay Park 0.489423
beds:neighborhoodCarmel Valley 0.601836
beds:neighborhoodCity Heights West 0.662195
beds:neighborhoodClairemont Mesa 0.000332 ***
beds:neighborhoodCollege Area 0.057226 .
beds:neighborhoodCore 0.029663 *
beds:neighborhoodCortez Hill 2.18e-05 ***
beds:neighborhoodDel Mar Heights 0.005914 **
beds:neighborhoodEast Village 0.007156 **
beds:neighborhoodGaslamp Quarter 0.452696
beds:neighborhoodGrant Hill 2.04e-12 ***
beds:neighborhoodGrantville 0.414683
beds:neighborhoodKensington 0.840859
beds:neighborhoodLa Jolla 0.987357
beds:neighborhoodLa Jolla Village 0.004812 **
beds:neighborhoodLinda Vista 0.608851
beds:neighborhoodLittle Italy 1.25e-08 ***
beds:neighborhoodLoma Portal 0.251837
beds:neighborhoodMarina 0.955112
beds:neighborhoodMidtown 0.011927 *
beds:neighborhoodMidtown District 0.168872
beds:neighborhoodMira Mesa 0.453518
beds:neighborhoodMission Bay 0.104565
beds:neighborhoodMission Valley 0.567962
beds:neighborhoodMoreno Mission 0.262337
beds:neighborhoodNormal Heights 0.019038 *
beds:neighborhoodNorth Clairemont 0.424572
beds:neighborhoodNorth Hills 0.023899 *
beds:neighborhoodNorthwest 0.118233
beds:neighborhoodOcean Beach 1.26e-06 ***
beds:neighborhoodOld Town 0.109654
beds:neighborhoodOtay Ranch 0.810658
beds:neighborhoodPacific Beach 0.040583 *
beds:neighborhoodPark West 0.000108 ***
beds:neighborhoodRancho Bernadino 0.278202
beds:neighborhoodRancho Penasquitos 0.668932
beds:neighborhoodRoseville 0.233515
beds:neighborhoodSan Carlos 0.852819
beds:neighborhoodScripps Ranch 1.03e-06 ***
beds:neighborhoodSerra Mesa 0.148987
beds:neighborhoodSouth Park 0.264826
beds:neighborhoodUniversity City 0.137433
beds:neighborhoodWest University Heights 0.688164
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.81 on 5930 degrees of freedom
Multiple R-squared: 0.8759, Adjusted R-squared: 0.8721
F-statistic: 232.4 on 180 and 5930 DF, p-value: < 2.2e-16
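This allows us to get a separate estimate of the impact of each variable for every neighborhood. Note that, as written, the formula interacts only the four explanatory variables with neighborhood, so the output above contains no regime-specific constant term; adding one inside the parentheses would also let the constant vary by neighborhood.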
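With this many coefficients, it can help to rearrange them into one row per neighbourhood. The sketch below does so on simulated data: the neighbourhood names are taken from the output above, but the data frame `toy`, the model `m_sr` and every value in them are made up for illustration. It relies on the coefficient naming pattern `variable:neighborhoodName` seen in the summary.

```r
set.seed(1)

# Hypothetical toy data reusing three neighbourhood names
toy <- data.frame(
  neighborhood = factor(rep(c("Ocean Beach", "La Jolla", "Core"), each = 40)),
  accommodates = sample(1:8, 120, replace = TRUE),
  beds = sample(1:5, 120, replace = TRUE)
)
toy$log_price <- 4 + 0.1 * toy$accommodates + 0.05 * toy$beds +
  rnorm(120, sd = 0.2)

m_sr <- lm(log_price ~ 0 + (accommodates + beds):neighborhood, data = toy)

# Coefficient names look like "accommodates:neighborhoodLa Jolla"
parts <- strsplit(names(coef(m_sr)), ":neighborhood", fixed = TRUE)
long <- data.frame(
  variable = sapply(parts, `[`, 1),
  neighborhood = sapply(parts, `[`, 2),
  estimate = unname(coef(m_sr))
)

# One row per neighbourhood, one column per variable
wide <- reshape(long, idvar = "neighborhood", timevar = "variable",
                direction = "wide")
wide
```

A table in this shape could then be joined back to the point data by neighbourhood, in the same spirit as the fixed-effects map earlier in the section.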