## In Defence of the Linear Demand Function

### Presentations of microeconomic analysis often assume linear demand functions, but rarely justify them in terms of utility theory. However, it can be done, and without making outrageous assumptions.

Originally posted 11/2/2016.  Re-posted following site reorganisation 21/6/2016.

Run a Google search on “assume a linear demand function”, and you get thousands of hits. The straight-line demand function is much used in elementary economic analysis and student exercises, presumably because the complexities of curves can distract from a topic’s economic substance, perhaps even just because it’s easy to draw. But is there more to be said for it than that? Could a real demand function be linear?

The answer to the question might seem to lie in empirical investigation, and certainly that could be helpful. For the most part, we would have to rely on natural experiments in which the price and other variables have changed through the normal working of an economy, and use statistical methods to try to isolate the effect of the price. Given some unexplained variation in the data points, the form of the demand function may be far from clear.

Furthermore, the range of prices observed may be quite narrow. The fitted demand function may be of doubtful value if we want to predict the effect of a price outside that range, or to use the consumer surplus (the area under the whole demand function above the actual price) as a welfare measure. The latter point is especially important in the economic valuation – from revealed preferences in surrogate markets – of environmental goods for which the actual price is often zero. The travel cost and hedonic methods often require measurement of the area under the whole demand function for an environmental good, right down to zero price, so that it becomes crucial to consider the shape of the demand function even at very low prices.

There remains therefore a considerable role for theory, in particular utility theory. While theory cannot tell us the form of the demand function for a particular good, it can provide some insight into what forms are possible, and in what sort of circumstances they might occur. A general principle of econometrics is that model-building should be supported by theory (1), and the identification of demand functions is no exception.

The derivation of a demand function from a utility function together with a budget constraint is a straightforward application of constrained optimisation. The converse – to find the utility function that will result in a given form of demand function – needs more mathematical sophistication. A detailed treatment, including the case of linear demand, may be found in Border (2). The utility function developed below, leading to a linear demand function, is an adaptation of that derived by Border. The aim here is to show that such a utility function is not just a mathematical curiosity, but in certain circumstances has economic plausibility.

Let’s consider some utility functions, for the case of two goods, X and Y. Take the functions to relate to a normal person, neither rich nor with an unusually modest desire for goods. To relate the two-good case to the reality of a world of many goods, take X to be a particular good in whose demand function we are interested, and Y to be a composite good representing all other goods. Suppose further that X is an inessential, a leisure good for example. Let $P_X$ and $P_Y$ be the prices of the goods, with $P_Y$ assumed fixed, and let M be the person’s income.

The assumption that more of a good is normally preferred to less suggests two basic types of utility function, multiplicative and additive. To allow some degree of flexibility, we may include coefficients in either type, say a and b, leading naturally to the two forms below:

$U(X,Y) = X^aY^b.......(E1)$

$U(X,Y) = aX + bY.......(E2)$

The multiplicative approach results in the familiar Cobb-Douglas utility function (E1). A feature of this function is that it exhibits a diminishing marginal rate of substitution, represented by indifference curves that are convex to the origin. In other words, as we increase the quantity of one good, say X, the ‘value’ of given increments gradually decreases, where value is measured in terms of the quantity of Y that must be forgone to keep utility constant. A standard piece of analysis (3) shows that it leads to curvilinear demand functions with demand inversely proportional to price, implying that there is no limit on quantity demanded as price falls to zero. These properties suggest that the Cobb-Douglas function may be plausible in some circumstances. For our case, however, it is inappropriate because it implies that utility will be nil when consumption of X is nil, even if consumption of Y is high, which contradicts the assumption that X is an inessential.

What is needed for our case, therefore, is an additive utility function. However, the basic additive form (E2) exhibits perfect substitutability between the two goods, represented by straight-line indifference curves. If we increase the quantity of X in this case, the value of given increments, in terms of the quantity of Y that must be forgone to keep utility constant, does not decrease. Since this is not very plausible, let us refine (E2) by replacing aX and bY by functions f(X) and g(Y):

$U(X,Y) = f(X) + g(Y)......(E3)$

To ensure a diminishing marginal rate of substitution, we require that, as X and Y respectively increase from zero, these functions initially increase, but at a decreasing rate (in terms of calculus their first derivatives are positive and their second derivatives negative).

Since X is an inessential good, we expect that the rate of increase of f(X) will decrease quite rapidly, so that f(X) will reach a maximum at quite a low threshold value of X, and will remain at that maximum when X exceeds the threshold. For the composite good Y, however, we expect that the rate of increase of g(Y) will decrease much more slowly, and that g(Y) will still be increasing at the maximum Y the individual can afford from their income, that is, $M/P_Y$.

The indifference map will look something like that below, although the relative sizes of the horizontal and vertical dimensions will depend on the units in which the goods are measured and the scales of the axes.

Any expenditure on X above that needed to achieve the threshold will not add to the utility derived from X, and by reducing expenditure on Y will reduce the utility derived from Y. A rational individual will therefore spend most (or all) of their income on Y and at most a very small proportion q on X.

The possible range of expenditure on Y will therefore be small, implying a narrow range for consumption of Y:

$M(1-q)/P_Y \leq Y \leq M/P_Y.......(E4)$

Since the rate of increase of the Y component g(Y) of U decreases only very slowly, a straight line can provide a very good approximation of g(Y) within the narrow range defined by (E4), narrow because q is very small. Hence we can find parameters, say s and t, such that within the defined range:

$s + tY \approx g(Y)$

We can therefore rewrite (E3), to a very good degree of approximation within the relevant range, as:

$U(X,Y) = f(X) + s + tY.......(E5)$

It remains to consider the function f(X). A simple way to achieve the necessary properties – increasing in X, but at a decreasing rate – is:

$f(X) = aX - cX^2.......(E6)$

It can be seen (e.g. using differentiation) that f is increasing when X is less than $a/2c$, and comes to a maximum of $a^2/4c$ at that threshold (when X exceeds the threshold we assume that it remains at that maximum, rather than declining as (E6) suggests).

There are other functions that share the same properties. For example, the square in (E6) could be replaced by a cube, or some other power, and if desired the parameters a and c could be calibrated to yield the same maximum as (E6). However, such functions differ in their curvature, a simple indicator of which is the ratio R of their value at half the threshold to their value at the threshold (their maximum). For (E6) we have:

$R = \displaystyle \frac{a(a/4c) - c(a/4c)^2}{a(a/2c) - c(a/2c)^2} = \frac{3a^2/16c}{a^2/4c} = 0.75$

It can be shown that the higher the power replacing the square in (E6), the lower R becomes. For a cube, for example, it is about 0.69. While there cannot be said to be a correct value of R, which may vary between individuals and between goods, a value of three-quarters for a leisure good – implying that a person gets three-quarters of the possible enjoyment from half the quantity at which they are sated – seems entirely plausible.
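The ratio R can be checked numerically. The sketch below is my own illustration rather than part of the original argument: it assumes the generalised form $f(X) = aX - cX^n$, finds the threshold from the first-order condition, and evaluates R for several powers n. The function name and the default values of a and c are arbitrary choices (R turns out not to depend on them).

```python
def curvature_ratio(n, a=1.0, c=1.0):
    """Ratio R = f(T/2) / f(T) for f(X) = a*X - c*X**n, where T is the threshold."""
    # First-order condition f'(X) = a - n*c*X**(n-1) = 0 gives the threshold T.
    threshold = (a / (n * c)) ** (1.0 / (n - 1))

    def f(x):
        return a * x - c * x ** n

    return f(threshold / 2) / f(threshold)


# Illustrative powers: 2 is the square used in (E6), 3 is the cube mentioned above.
for power in (2, 3, 4, 6):
    print(power, round(curvature_ratio(power), 4))
```

For the square this reproduces 0.75 and for the cube 11/16, or about 0.69, with R falling further as the power rises.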

For the circumstances we have described, therefore, a plausible form of utility function within the relevant range is:

$U(X,Y) = aX - cX^2 + s + tY.......(E7)$

We now use the standard constrained maximisation technique to find the demand function for X. The budget constraint is:

$XP_X + YP_Y = M$

Hence the Lagrangian expression is:

$L(X,Y,\lambda) = aX - cX^2 + s + tY + \lambda(XP_X + YP_Y - M)$

Taking partial differentials and equating to zero:

$\partial L / \partial X = a - 2cX + \lambda P_X = 0$

$\partial L / \partial Y = t + \lambda P_Y = 0$

Hence:

$\lambda = (2cX - a)/P_X = -t/P_Y$

$2cX = a - tP_X / P_Y$

$X = (a/2c) - (t /2cP_Y)P_X.......(E8)$

Since a, c, t and by assumption $P_Y$ are all constant, the demand function (E8) is linear in $P_X$.
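As a cross-check on the algebra, the following sketch maximises the utility function (E7) numerically along the budget line and compares the result with (E8). All parameter values are hypothetical, chosen only so that the optimum is interior; the golden-section search is simply a convenient way to maximise a concave function of one variable and is not part of the original derivation.

```python
# Hypothetical parameter values, for illustration only.
A, C, S, T = 10.0, 1.0, 0.0, 2.0   # utility parameters a, c, s, t in (E7)
P_Y, M = 1.0, 100.0                # price of the composite good and income


def utility_on_budget(x, p_x):
    """Utility (E7) with Y = (M - X*P_X) / P_Y substituted from the budget constraint."""
    y = (M - x * p_x) / P_Y
    return A * x - C * x ** 2 + S + T * y


def demand_numeric(p_x, tol=1e-8):
    """Maximise utility over 0 <= X <= a/2c by golden-section search."""
    ratio = (5 ** 0.5 - 1) / 2
    lo, hi = 0.0, A / (2 * C)
    while hi - lo > tol:
        x1 = hi - ratio * (hi - lo)
        x2 = lo + ratio * (hi - lo)
        if utility_on_budget(x1, p_x) < utility_on_budget(x2, p_x):
            lo = x1
        else:
            hi = x2
    return (lo + hi) / 2


def demand_e8(p_x):
    """Closed-form demand function (E8)."""
    return A / (2 * C) - (T / (2 * C * P_Y)) * p_x


for price in (1.0, 2.0, 3.0, 4.0):
    print(price, round(demand_numeric(price), 4), demand_e8(price))
```

With these values (E8) reduces to X = 5 - P_X, so quantity falls by one unit for each unit rise in price, and the numerical maximiser agrees to within the search tolerance.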

A caveat. The above is an individual demand function, and like all demand functions is subject to the overriding condition that demand cannot be negative. Because of that condition, a market demand function built up by summing linear individual demand functions is not necessarily linear along its whole length. It is linear throughout only where every individual demand function has the same choke price (the lowest price at which demand is zero). Otherwise, the market demand function is linear over the price range from zero up to the lowest choke price of any of the individual functions, but will be kinked at that choke price (and at all higher individual choke prices).
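The kink can be seen in a small example. The sketch below sums two individual demand functions of the form (E8), each truncated at zero; the two consumers and their parameter values are hypothetical, chosen to give choke prices of 5 and 3.

```python
def individual_demand(p_x, a, c, t, p_y=1.0):
    """Linear demand (E8), truncated at zero because demand cannot be negative."""
    return max(0.0, a / (2 * c) - (t / (2 * c * p_y)) * p_x)


# Hypothetical consumers as (a, c, t); the choke price of each is a * P_Y / t.
consumers = [(10.0, 1.0, 2.0),   # demand 5 - P_X, choke price 5
             (12.0, 1.0, 4.0)]   # demand 6 - 2*P_X, choke price 3


def market_demand(p_x):
    """Sum of the individual demands at price p_x."""
    return sum(individual_demand(p_x, a, c, t) for a, c, t in consumers)


def slope(p_lo, p_hi):
    """Average slope of market demand between two prices."""
    return (market_demand(p_hi) - market_demand(p_lo)) / (p_hi - p_lo)


print(slope(1.0, 2.0))    # both consumers are buying
print(slope(3.5, 4.5))    # only the first consumer is buying
print(market_demand(6.0))
```

Below the lower choke price of 3 the market slope is -3; between 3 and 5 it is -1; above 5 demand is zero. The market function is therefore piecewise linear, with kinks at each choke price.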

### Notes and References

1. Gujarati D N (International Edition 2006) Essentials of Econometrics McGraw Hill International p 336

2. Border K C (2003) The “Integrability” Problem http://people.hss.caltech.edu/~kcb/Notes/Demand4-Integrability.pdf   For linear demand see pp 7-8.

3. See for example Nicholson W (9th edn 2005) Microeconomic Theory: Basic Principles and Extensions Thomson South-Western pp 102-3

## Fishing and Economic Welfare

### Within a static model of a fishery one can identify levels of fishing effort for maximum yield, maximum profits and maximum welfare. Where demand is downward-sloping, effort for maximum welfare will normally be above that for maximum profits but below that for maximum yield.

Originally posted 14/6/2014.  Re-posted following site reorganisation 21/6/2016.

In a previous post discussing the reform of the EU’s Common Fisheries Policy, I outlined a model of a fishery in steady state with price flexibility. Here I present that model in mathematical form.

Bioeconomic models of fisheries often take the price of fish as given. This could be for the good reason that a model is intended to represent a local fishery whose output is too small to affect the market price of fish. In the context of textbooks on natural resource economics, there may also be a pedagogical motive. The combination of a biological growth function, a harvest function and a cost function is sufficient to demonstrate some important results, such as the distinction between open access and private property equilibria, and may be judged complex enough for an introductory treatment, without the additional complication of downward-sloping demand for fish.

A consequence of a fixed price assumption is that there can be no consumer surplus. Hence the private property equilibrium, maximising profits (producer surplus), is also the social optimum, maximising economic welfare defined as the sum of the consumer and producer surpluses (1).  Once the fixed price assumption is relaxed, however, it no longer follows that the same fishing effort will maximise both profits and welfare.  Although this has been recognised in the literature at least since it was shown by Anderson (1973) (2), the point bears reiteration since it is commonly omitted in introductory textbooks.

As in other economic sectors, demand at industry level should be expected to be downward-sloping, raising the possibility that restriction of output to maximise profit could reduce overall welfare. Whether this will actually occur will depend on the structure and regulation of the industry. A monopoly is perhaps unlikely in a fishing context. A more plausible scenario is that regulation initially intended to address a situation of open access and over-fishing might evolve into a policy of maximising industry profits at the expense of the consumer.

The complexity of many bioeconomic models of fisheries has as much to do with the proliferation of letters standing for variables or parameters as with any complexity in the mathematics itself.  Judicious choice of units can limit the number of letters needed. Let us measure fish stock, X, in units such that the carrying capacity (sometimes represented by k) is one. The biological rate of growth of fish stock in the absence of harvesting, F, must be measured in units of fish stock per unit of time. For fish stock we must use the units just defined, but let us measure time in units such that we can write the standard logistic growth function without a growth parameter (sometimes represented by r) as simply:

$F=X(1-X).......(1)$

Fish harvest, H, is sometimes treated as the product of fish stock, fishing effort, E, and a coefficient representing fishing technology, but let us measure fishing effort in units such that the technology coefficient is one. The harvest function then is simply:

$H=EX.......(2)$

No doubt these units would be inconvenient for practical use, but in exploring the properties of a model all that matters is consistency (3). Thus, for example, every variable we use that represents a rate per unit of time – harvest H, cost of fishing effort C, revenue from fish sales R – must use the time unit defined above.

The condition for a steady state is that the rate of harvest should exactly offset the rate of biological growth:

$H = F.......(3)$

From (1), (2) and (3) we may infer the following relation between fish stock and effort in steady state:

$EX=H=F=X(1-X)$

Hence, unless X = 0:

$E=1-X$

$X=1-E.......(4)$

This relation provides some insight into the units we have defined for effort: since X must lie in the range from 0 to 1 (1 being the carrying capacity), E must also lie in the range from 0 to 1.
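Relation (4) describes where the stock settles, not how it gets there. As a rough illustration, the sketch below integrates the net growth rate F - H = X(1 - X) - EX forward in time by simple Euler steps and confirms that the stock converges to 1 - E. The starting stock, step size and number of steps are arbitrary assumptions of mine, and the dynamic path itself lies outside the static model of this post.

```python
def simulate_stock(effort, x0=0.5, dt=0.05, steps=20_000):
    """Euler integration of dX/dt = X(1 - X) - E*X, combining (1) and (2)."""
    x = x0
    for _ in range(steps):
        x += dt * (x * (1 - x) - effort * x)
    return x


for effort in (0.2, 0.5, 0.8):
    print(effort, round(simulate_stock(effort), 6), 1 - effort)
```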

We make the common assumption that fishing costs, C, are a linear function of fishing effort:

$C=cE.......(5)$

For demand, we also assume linearity, but it is convenient to focus on the inverse demand function representing the unit price of fish P in terms of harvest:

$P=a-bH.......(6)$

It is assumed here that all fish harvested is sold at once, so that quantity demanded can be equated with harvest.

Using (2), (4) and (6) we may infer the steady-state revenue-effort relation:

$R=PH=(a-bH)H$

$= (a-bEX)EX$

$=(a-bE(1-E))E(1-E)$

$=aE - (a+b)E^2 + 2bE^3 - bE^4.......(7)$

This is the relation which in my previous post was referred to as a “flexible-price steady state revenue-effort curve” and shown in blue on Diagram 2.
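For anyone wishing to check the expansion in (7), a minimal sketch is to evaluate the product form and the polynomial form at a few effort levels and confirm that they agree. The values of a and b below are arbitrary illustrative assumptions.

```python
def revenue_product(e, a, b):
    """Revenue as price times quantity, before expansion."""
    h = e * (1 - e)           # steady-state harvest, from (2) and (4)
    return (a - b * h) * h    # inverse demand (6) times harvest


def revenue_polynomial(e, a, b):
    """Revenue as the quartic in effort given in (7)."""
    return a * e - (a + b) * e ** 2 + 2 * b * e ** 3 - b * e ** 4


for effort in (0.1, 0.25, 0.5, 0.9):
    print(effort, revenue_product(effort, 2.0, 3.0), revenue_polynomial(effort, 2.0, 3.0))
```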

We can now consider the respective levels of fishing effort needed to maximise harvest, profits (producer surplus) and welfare, in each case sustainably. The method is in principle the same in each case: we first express the quantity to be maximised as a function of effort, then use elementary calculus to find the maximum. However, the cases of profit and welfare lead to cubic equations that are difficult to solve. Instead, we will show that:

1. the level of effort for maximum profit is less than that for maximum harvest;
2. welfare increases with effort at the point of maximum profit;
3. welfare decreases with effort at the point of maximum harvest.

From the above it follows that effort for maximum welfare will be above that for maximum profits but below that for maximum harvest.

From (2) and (4) the steady state harvest-effort relation is:

$H = EX = E(1-E) = E - E^2.......(8)$

Setting the derivative equal to zero to find the maximum:

$dH/dE = 1 - 2E = 0.......(9)$

Hence for maximum harvest (maximum sustainable yield) E = 0.5. Note that relation (8) is symmetrical about the axis defined by E = 0.5. Thus any harvest obtainable at E* > 0.5 can also be obtained with less effort at (1 – E*) < 0.5. We expect therefore that both maximum profits and maximum welfare will require 0 < E < 0.5.

The steady-state relation between producer surplus, PS, and effort, from (5) and (7), is:

$PS = R-C = aE - (a+b)E^2 +2bE^3 - bE^4 - cE$

$= (a-c)E - (a+b)E^2 + 2bE^3 - bE^4.......(10)$

For a maximum we require:

$dPS/dE = (a-c) - 2(a+b)E + 6bE^2 - 4bE^3 = 0.......(11)$

Without attempting to solve (11) for E, we now consider the case of welfare.

To express welfare, W, as a function of effort, we must first express consumer surplus, CS, as a function of harvest. In terms of a price-quantity diagram, it is the triangular area below the demand curve and above the price corresponding to the harvest. Using (6), this is:

$CS = (1/2)H(a-P) = (1/2)H[a - (a - bH)] = (b/2)H^2.......(12)$

From (8) and (12), the steady-state relation between consumer surplus and effort is:

$CS = (b/2)E^2(1-E)^2 = (b/2)E^2 - bE^3 + (b/2)E^4.......(13)$

Hence, from (10) and (13), the steady-state relation between welfare and effort is:

$W = PS + CS = [(a-c)E - (a+b)E^2 + 2bE^3 - bE^4] + [(b/2)E^2 - bE^3 + (b/2)E^4].......(14)$

Although (14) could be simplified by collecting like powers of E, it is convenient for our purposes to keep separate its elements deriving from the producer and consumer surplus. Hence:

$dW/dE = [(a-c) - 2(a+b)E + 6bE^2 - 4bE^3] + [bE - 3bE^2 + 2bE^3].......(15)$

Substituting (9) into (15) we can infer the value of (15) at harvest-maximising effort:

$dW/dE = [(a-c) - 2(a+b)(0.5) + 6b(0.5)^2 - 4b(0.5)^3] + [b(0.5) - 3b(0.5)^2 + 2b(0.5)^3]$

$= (a-c) - a - b + 1.5b - 0.5b + 0.5b - 0.75b + 0.25b$

$= -c.......(16)$

Since c, the cost coefficient, will be positive, this shows that welfare decreases with effort at harvest-maximising effort.

To infer the value of (15) at profit-maximising effort, we cannot substitute a specific value of E, but can use (11) to substitute zero for the first expression in square brackets within (15). Thus:

$dW/dE = [(a-c) - 2(a+b)E + 6bE^2 - 4bE^3] + [bE - 3bE^2 + 2bE^3]$

$= [0] + [bE - 3bE^2 + 2bE^3]$

$= bE(1-E)(1-2E).......(17)$

Since, at profit-maximising effort, 0 < E < 0.5, (17) will be positive, implying that welfare increases with effort at profit-maximising effort, provided that demand is downward sloping (b > 0). This completes the demonstration that, on the assumptions made and given sustainability, effort for maximum welfare lies above that for maximum profit but below that for maximum harvest. In the special case b = 0 (implying a given price of fish), (17) will equal zero, so that as noted above the point of maximum welfare will coincide with that of maximum profit.
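Although the argument above avoids solving the cubic equations, it is easy to solve them numerically for particular parameter values and see the ordering directly. The sketch below uses bisection on the derivatives (11) and (15) over the interval from 0 to 0.5. The values a = 1, b = 1 and c = 0.2 are hypothetical; they are chosen so that price stays positive and both derivatives fall steadily over the interval.

```python
a, b, c = 1.0, 1.0, 0.2   # hypothetical demand intercept, demand slope, cost coefficient


def d_profit(e):
    """Derivative of producer surplus with respect to effort, equation (11)."""
    return (a - c) - 2 * (a + b) * e + 6 * b * e ** 2 - 4 * b * e ** 3


def d_welfare(e):
    """Derivative of welfare, equation (15), with the consumer surplus part factorised as in (17)."""
    return d_profit(e) + b * e * (1 - e) * (1 - 2 * e)


def root(f, lo=0.0, hi=0.5, tol=1e-12):
    """Bisection for a root of f, given f(lo) > 0 > f(hi)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


e_profit = root(d_profit)
e_welfare = root(d_welfare)
print(e_profit, e_welfare, 0.5)
print(d_welfare(0.5))   # equals -c, as in (16)
```

With these values the profit-maximising effort comes out below the welfare-maximising effort, which in turn is below 0.5, and the welfare derivative at E = 0.5 is -c.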

Finally, some limitations should be noted. The above is a static analysis. It does not consider the path to an optimum from an initial position. Its steady-state assumption does not fully allow for the effect of current harvest, via future fish stocks, on future profits or welfare. The assumption of downward-sloping demand suggests that we are considering a fishing industry as a whole, with a harvest probably consisting of many species, so there is an implicit assumption that the simple growth and harvest functions can work reasonably well with X and H representing multi-species aggregates.

### Notes and references

1. See for example Hartwick J M & Olewiler N D (2nd edn 1998) The Economics of Natural Resource Use Addison Wesley pp 110-113, where profit-maximisation is presented as socially optimal, the price of fish being taken (p 107) as given.

2. Anderson L G (1973) Optimum Economic Yield of a Fishery Given a Variable Price of Output  Journal of the Fisheries Research Board of Canada 30(4) pp 509-518  http://www.nrcresearchpress.com/doi/abs/10.1139/f73-089#.U5xHWvk2zwU

3. Anyone suspecting that there is some trick in my treatment of units is invited to look at the following post I made to a mathematical question and answer website, deriving the profit-maximisation equation (equivalent of (11) above) with the conventional parameters and no special treatment of units: http://math.stackexchange.com/questions/825699/what-is-an-example-of-real-application-of-cubic-equations/830224#830224

## Sensitivity to Zone Definition in the Zonal Travel Cost Method

### How sensitive are regression estimates based on aggregated data, zonal travel cost datasets for example, to the particular form of aggregation?  Here I present an answer for one simple case.

Originally posted 21/3/2016.  Re-posted following site reorganisation 21/6/2016.

This is a brief introduction to my paper entitled:

The Maximum Difference Between Regression Coefficients Estimated from Different Levels of Aggregation of the Same Underlying Data: A Theorem and Discussion

In a previous post, I used a case study to show that the results of a zonal travel cost study can be sensitive to zone definition.  In other words, aggregations of the same underlying data within different zonal configurations can yield different results.  The case also showed that such differences can be quite large, as is illustrated in Charts 2 and 3 of that post.

This finding has a bearing on the application to zonal travel cost studies of the requirement that scientific research be replicable.  Suppose two researchers undertake independent studies including separate surveys to collect data, analyse their respective data within different zonal configurations, and obtain different results.  How large a difference would indicate a failure of replication and cast doubt on the results of one or other of the studies?  Sampling error is an issue to be considered but not the only one. Also relevant is the difference due to the different zonal configurations.  For sampling error, we can use well-established methods, such as standard errors of regression parameter estimates and hypothesis testing, to determine how large a difference in result can reasonably be attributable to that source.  But if we ask how much difference can reasonably be attributed  to different zonal configurations, there are – so far as I am aware – no established methods available.

This line of thought led me to consider whether there is any theoretical maximum to the differences in regression parameter estimates that can arise from different levels of aggregation of the same underlying data.  Note that this is an abstract formulation of the problem.  Hence a solution would be of relevance to attempted replications not just of zonal travel cost studies but of findings based on aggregated data in any field of research.

In its full generality, the problem appears intractable. Complications to be addressed would include multiple regression, alternative functional forms, and alternative estimation techniques.  However, I obtained a solution for a simple case involving higher and lower level datasets meeting the following conditions:

1. A bivariate regression model with linear functional form.
2. Estimation by ordinary least squares.
3. Each value of the independent and dependent variables in the high-level dataset is the unweighted aggregation of a pair of such values in the lower-level dataset.

Given these assumptions, the maximum difference between the estimated slope parameters based on the two datasets can be shown to be a function of:

1. The variance $Var[X_L]$ of the independent variable in the lower-level dataset.
2. The maximum t of the absolute differences between each aggregation pair of values of the independent variable in the lower-level dataset.
3. The mean absolute value r of the residuals in the regression based on the lower-level dataset.

Specifically:

Maximum Difference  $\boldsymbol{= \frac{2rt}{4Var[X_L]-t^2}}$

The proof, given in full in the paper, uses only basic regression theory and elementary algebra.

The paper also presents a simple example of a zonal travel cost dataset, showing that the slope parameters estimated from datasets obtained by pairings of the original dataset are within the limit defined by the theorem.  It concludes with a consideration of the application of the theorem in testing whether the results of one research study replicate the results of another, and with suggestions for further research.
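To make the formula concrete, here is a minimal sketch that computes the two slope estimates and the bound for a small made-up dataset. The data are hypothetical and are not the example in the paper. I also assume here that aggregation means summing each pair, and that $Var[X_L]$ is the population variance (dividing by the number of observations rather than by one less); the bound is only meaningful when $4Var[X_L]$ exceeds $t^2$.

```python
def ols_slope_and_residuals(xs, ys):
    """Bivariate OLS: return the slope and the list of residuals."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - intercept - slope * x for x, y in zip(xs, ys)]
    return slope, residuals


# Hypothetical lower-level data (say travel cost and visits), paired consecutively.
x_low = [1, 2, 3, 4, 5, 6, 7, 8]
y_low = [16.2, 13.8, 12.3, 10.1, 7.8, 6.2, 3.9, 2.1]
pairs = [(0, 1), (2, 3), (4, 5), (6, 7)]

# Higher-level dataset: unweighted aggregation (here, the sum) of each pair.
x_high = [x_low[i] + x_low[j] for i, j in pairs]
y_high = [y_low[i] + y_low[j] for i, j in pairs]

slope_low, resid_low = ols_slope_and_residuals(x_low, y_low)
slope_high, _ = ols_slope_and_residuals(x_high, y_high)

n = len(x_low)
mean_x = sum(x_low) / n
var_x = sum((x - mean_x) ** 2 for x in x_low) / n      # population variance (assumed)
t = max(abs(x_low[i] - x_low[j]) for i, j in pairs)    # largest within-pair difference
r = sum(abs(e) for e in resid_low) / n                 # mean absolute residual
bound = 2 * r * t / (4 * var_x - t ** 2)

print(abs(slope_high - slope_low), bound)
```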

## Precision in the Zonal Travel Cost Method – A Case Study

### Zonal travel cost studies often rely on very small datasets. Conventional estimates of precision are unreliable in those circumstances.

Originally posted 19/6/2015.  Re-posted following site reorganisation 21/6/2016.

The zonal travel cost method (ZTCM) for estimating the use value of recreational sites should never be expected to yield highly accurate results. One limitation is that it is subject to aggregation bias arising from averaging of data within zones (1a-b). Another is that, as shown in a previous post, results can be sensitive to zone definition. Nevertheless, it continues to be used because alternative methods have their own limitations.

This post considers the precision of one of the results of a ZTCM study relating to Lake Mokoan in Victoria, Australia, reported in Herath (1999) (2). It is not a full review of Herath’s paper, in which ZTCM is only one of several valuation methods used.

The number of visitors interviewed in Herath’s on-site survey was 90. That may seem a reasonably sized sample from which to draw inferences. In ZTCM, however, we use regression analysis to fit a trip-generating function not to individual data but to zonal aggregates. The study’s effective sample size was not 90, but the number of zones over which the individual data was aggregated, which was 5. Such a small sample should set warning bells ringing, since any conclusions drawn will be liable to considerable sampling error.

The duration of the survey is not reported, but it is stated that visit numbers averaged 10 per day on weekdays and 40 per day at weekends (3): given 90 interviews this suggests that the survey covered only a few days (not necessarily a single continuous period).

The normal aims of a ZTCM study are to estimate the demand curve for visits to a site and then to take the implied consumer surplus as an estimate of the site’s use value. Here I will focus on the essential preliminary stage of estimating the trip-generating function relating annual visit rate to travel cost. Herath compared several functional forms and found that a double log form gave the best fit ($R^2 = 0.96$). Table 1 shows the ordinary least squares (OLS) regression results with conventional standard errors. Although the standard errors are not quoted by Herath, they are readily obtained from his coefficient estimates and t-values, or as regression output from his data.

Herath follows the common practice of scaling up the survey data to reflect total annual visits, and then calculating the regression with annual visit rate per 1,000 population as the dependent variable. Personally, I prefer to scale up only after calculating a regression with an unscaled dependent variable (surveyed visits per capita). Either approach should yield the same results, but the latter is conceptually simpler when considering (as we will below) the variance of the dependent variable.

The analysis that follows will consider the precision of these coefficient estimates as measured by their standard errors. To focus on the key points at issue, I shall assume that all data were accurately measured and the trip-generating function was correctly specified as stated in Table 1, and ignore possible challenges to those assumptions.

In outline, there are two problems with the way in which these standard errors have been obtained. Firstly, standard errors estimated in the conventional way from the sum of squared residuals are subject to considerable sampling error when the sample size and therefore the number of residuals used in the calculation is small. The data points may just happen to be closer to or further from the fitted line than is representative of the population of possible zones around the site. Secondly, standard errors estimated by OLS are unreliable in the presence of heteroscedasticity, to which ZTCM is prone. The variance of a zonal visit rate will depend on the zonal population (larger population implies smaller variance) and on the visit rate itself (higher visit rate implies larger variance) (5).

What’s more, these two problems are difficult to separate since the small sample size also undermines some of the methods most commonly used in addressing heteroscedasticity. Inferences about the presence or form of heteroscedasticity from inspection or analysis of residuals are unreliable when the number of residuals is small. Use of robust (heteroscedasticity-consistent) standard errors is unreliable with small samples (6). Two-stage estimation, in which OLS estimates are used in obtaining weights designed to correct for heteroscedasticity in a second stage weighted least squares (WLS) estimation, is also unreliable because weights obtained in this way will be subject to considerable sampling error.

Let’s look more closely at how sampling error can affect the standard errors. The true variances (squares of standard errors) of estimates of regression coefficients are given by (7):

$Var[\hat{B}]= \sigma_0^2.(X'X)^{-1}.......E1$

Here B is the vector of coefficients and X the matrix of values of the independent variables. $\sigma_0^2$ is the regression variance (the conditional variance of the error term). Since this is usually unknown it is standard practice to substitute for it the variance estimator $s^2$, defined as below where SSR is the sum of squared residuals, n the number of observations, and k the number of coefficients estimated.

$s^2 \equiv \frac{SSR}{n-k}.......E2$

The problem with this is that the variance estimator, as its name suggests, is merely an estimator. It is an unbiased estimator of $\sigma_0^2$, but unbiasedness is a repeated sample property, so the value of $s^2$ calculated from a particular sample may be either less or more than $\sigma_0^2$. Considered over repeated samples, $s^2$ is a random variable with an approximately chi-square distribution given by (8):

$s^2 \sim \frac{\sigma_0^2}{n-k} \chi^2_{n-k}.......E3$

In the present case n = 5 and k = 2, so this reduces to:

$s^2 \sim \frac{\sigma_0^2}{3} \chi^2_3.......E4$

Although $\sigma_0^2$ is unknown, we can infer that the distribution of $s^2$ is proportional to that of a chi-square variable with 3 degrees of freedom. The distribution of such a variable is highly right-skewed. Although its mean is 3, its median is about 2.37, and the probability that its value is less than 3 is about 0.61, well over half. In other words, it is not only possible but likely that $s^2$ will underestimate $\sigma_0^2$ and therefore that conventional standard errors will be underestimated. The distribution is also highly dispersed: its 25th and 75th percentiles are 1.21 and 4.11 respectively, and the central 95% range is enormous, from 0.22 to 9.35.
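These figures can be reproduced without statistical tables. The sketch below relies on the closed-form distribution function of a chi-square variable with 3 degrees of freedom, $F(x) = \mathrm{erf}(\sqrt{x/2}) - \sqrt{2x/\pi}\,e^{-x/2}$, and finds percentiles by bisection. The function names are my own, and the closed form applies only to the 3 degrees of freedom case considered here.

```python
import math


def chi2_3_cdf(x):
    """Closed-form CDF of a chi-square variable with 3 degrees of freedom."""
    if x <= 0:
        return 0.0
    return math.erf(math.sqrt(x / 2)) - math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


def chi2_3_quantile(prob, lo=0.0, hi=100.0, tol=1e-10):
    """Invert the CDF by bisection (the CDF is increasing in x)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if chi2_3_cdf(mid) < prob:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print("P(value < 3):", round(chi2_3_cdf(3.0), 3))
for prob in (0.025, 0.25, 0.5, 0.75, 0.975):
    print(prob, round(chi2_3_quantile(prob), 2))
```

Dividing any of these percentiles by 3 gives the corresponding percentile of $s^2/\sigma_0^2$ under E4.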

So what can we do? It’s all very well to make the general observation that conventional standard errors calculated from very small samples and in the presence of heteroscedasticity are extremely unreliable. But can we determine whether the standard errors calculated from a particular dataset are too large or too small? There are two features of the present case that can help. One is the logged dependent variable which, as it happens, deals neatly with heteroscedasticity due to differing zonal populations. For a given zone, let V be the number of visits identified by the survey from that zone, N the zonal population, and VR the unscaled visit rate. Since N is a constant for any one zone so that Ln N is also a constant and Var[Ln N] is zero, we have:

$Var[Ln VR] = Var[Ln(V/N)] = Var[Ln V - Ln N] = Var[Ln V]+Var[Ln N]$

and so:

$Var[Ln VR] = Var[Ln V] .......E5$

Thus we have shown that Var[Ln VR] does not depend on N.

The other helpful feature is that the aggregate nature of the dependent variable enables us, on reasonable assumptions, to draw a conclusion about its variance. We start from the observation that the value of V, the number of visits from a particular zone identified by a survey, is a consequence of many separate decisions by the many individuals in that zone. Since it is unlikely that any individual will visit more than once within the scope of the survey, we can treat the number of visits v by an individual within the scope of the survey as a Bernoulli variable (equalling either 0 or 1). Writing p for the probability that v = 1, and using standard properties of the Bernoulli distribution, we have:

$E[v] = p.......E6$

$Var[v] = p-p^2.......E7$

We must expect that p varies between individuals within the zone: some will like visiting lakes more, or have more leisure time, than others. However, it is reasonable to assume that the p are all small, since people like variety in their leisure activity, so that even if someone likes visiting lakes and likes visiting that lake in particular, the probability that they will visit it within a particular period of a few days will be small. Hence the squared term in E7 can be ignored. It is also reasonable to assume that the individual decisions are largely independent, because decisions by one individual are unlikely to influence more than a tiny proportion (family members and a few friends, perhaps) of the thousands within the same zone. It can reasonably be assumed therefore that the distribution of V, the sum of the v for all the individuals in the zone, approximates to a Poisson distribution (9) with the properties:

$E[V] = Var[V] = \sum p.......E8$

where the sum is over all the individuals in the zone.
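To see how little is lost by ignoring the squared term, the exact mean and variance of a sum of independent Bernoulli variables can be computed directly. The sketch below uses an invented zone of 20,000 individuals with visit probabilities spread evenly between 0.0002 and 0.002; these numbers are purely illustrative and are not taken from the survey.

```python
# Hypothetical zone: population size and visit probabilities are invented.
n_people = 20_000
probs = [0.0002 + 0.0018 * i / (n_people - 1) for i in range(n_people)]

# V is a sum of independent Bernoulli variables, so its exact moments are
# the sums of the individual moments given in E6 and E7.
mean_v = sum(probs)                      # E[V]   = sum of p
var_v = sum(p - p * p for p in probs)    # Var[V] = sum of (p - p^2)

# For an exact Poisson variable this ratio would be 1.
ratio = var_v / mean_v
print(round(mean_v, 2), round(var_v, 2), round(ratio, 4))
```

With probabilities this small the ratio falls short of 1 only by a fraction of one percent, which is the sense in which E8 holds approximately.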

To use E8 to draw a conclusion about Var[Ln VR] we need to express the latter as a function of Var[V]. If Z is a Poisson variable with mean and variance $\lambda$, a good approximation (obtained using a Taylor series expansion of the log function) is:

$Var[Ln Z] \approx \frac{12\lambda^2 + 18\lambda + 11}{12\lambda^3}.......E9$

Its error is between 3% and 6% for $\lambda$ between 5 and 10, falling to less than 1% for $\lambda$ greater than 20. Putting Z = V in E9 and substituting into E5, we have:

$Var[Ln VR] \approx \frac{12(E[V])^2+18E[V]+11}{12(E[V])^3}.......E10$
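The right-hand side of E10 can be written as $1/\lambda + 3/(2\lambda^2) + 11/(12\lambda^3)$ with $\lambda = E[V]$, so its leading term is the familiar first-order approximation $1/\lambda$. A short Python sketch (the function name is mine) shows how the approximate variance changes with the expected number of visits:

```python
def var_log_poisson(lam: float) -> float:
    """Right-hand side of E9/E10: approximate Var[Ln Z] for a Poisson mean lam."""
    return (12.0 * lam**2 + 18.0 * lam + 11.0) / (12.0 * lam**3)

for lam in (5.0, 10.0, 20.0, 50.0):
    approx = var_log_poisson(lam)
    first_order = 1.0 / lam  # leading term of the expansion on its own
    print(lam, round(approx, 4), round(first_order, 4), round(approx / first_order, 3))
```

The variance falls steadily as the expected number of visits rises, so zones with different expected visit numbers have different error variances even after taking logs.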

For convenience I will abbreviate the right-hand expression in E10 as g(E[V]). From E10 we can infer that heteroscedasticity due to differences in visit rates is still present after the log transformation. This can in principle be addressed by WLS estimation, the necessary weighting factor being the reciprocal of g(E[V]). This is equivalent to OLS estimation of the model (with u as the error term):

$(g(E[V]))^{-0.5}(Ln VR)=(g(E[V]))^{-0.5}B1+(g(E[V]))^{-0.5}B2(Ln TC)+u.......E11$

To confirm that this is homoscedastic, we first show that the dependent variable is homoscedastic. Noting that g(E[V]) is constant for any particular zone and using E10 we have:

$Var[(g(E[V]))^{-0.5}Ln VR] = g(E[V])^{-1}Var[Ln VR] \approx g(E[V])^{-1}g(E[V])$

and so

$Var[(g(E[V]))^{-0.5}Ln VR] \approx 1 .......E12$

Given our assumption that the regression model is correctly specified, we can infer that it is homoscedastic since for any zone the regression variance $\sigma_0^2$ is given by:

$\sigma_0^2 = Var[u] = Var[(g(E[V]))^{-0.5}Ln VR] \approx 1 .......E13$
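The equivalence between WLS with weights 1/g(E[V]) and OLS on the transformed model E11 can be illustrated numerically. In the sketch below the five data points and the values standing in for g(E[V]) are invented for illustration and are not the survey data; the helper functions are hypothetical names of my own.

```python
def solve_2x2(a11, a12, a22, b1, b2):
    """Solve the symmetric system [[a11, a12], [a12, a22]] x = [b1, b2]."""
    det = a11 * a22 - a12 * a12
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)

def wls(x, y, w):
    """Weighted least squares of y on a constant and x, with weights w."""
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    return solve_2x2(sw, swx, swxx, swy, swxy)

def ols_two_regressors(z1, z2, y):
    """OLS of y on regressors z1 and z2, with no separate constant term."""
    a11 = sum(a * a for a in z1)
    a12 = sum(a * b for a, b in zip(z1, z2))
    a22 = sum(b * b for b in z2)
    b1 = sum(a * c for a, c in zip(z1, y))
    b2 = sum(b * c for b, c in zip(z2, y))
    return solve_2x2(a11, a12, a22, b1, b2)

# Invented illustrative data for five zones (not the survey data).
ln_tc = [2.0, 2.6, 3.1, 3.5, 3.9]        # Ln TC
ln_vr = [-4.1, -4.9, -5.8, -6.1, -7.0]   # Ln VR
g_vals = [0.03, 0.06, 0.11, 0.16, 0.27]  # assumed values of g(E[V])

weights = [1.0 / g for g in g_vals]      # WLS weights
scale = [g ** -0.5 for g in g_vals]      # multiplier applied to each term in E11

wls_coeffs = wls(ln_tc, ln_vr, weights)
ols_coeffs = ols_two_regressors(
    scale,                                   # transformed constant
    [s * x for s, x in zip(scale, ln_tc)],   # transformed Ln TC
    [s * y for s, y in zip(scale, ln_vr)],   # transformed Ln VR
)
print(wls_coeffs, ols_coeffs)
```

The two routes give the same estimates of B1 and B2 up to rounding error, because the transformed model has the same normal equations as the weighted one.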

However, the problem remains that we do not have reliable estimates of E[V] for each zone to slot into E11. We cannot therefore undertake a single definitive WLS estimation leading to coefficient estimates with reasonably reliable standard errors.

To make further progress, we can consider two cases: the pair of true values of the regression coefficients is either within or outside the 95% confidence ellipse defined by the OLS results. This ellipse (see Figure 1 below) is defined by the results in Table 1 together with the estimated covariance between the constant and travel cost coefficients. Its meaning is that, on repeated sampling and if the standard errors (and covariances) are correct, 95% of such ellipses will contain the true pair of coefficients. Note that Figure 1 is based on the unscaled visit rate (hence the range of the constant coefficient is much lower than its estimate in Table 1).

If the true coefficient values are outside this ellipse, it could be that the standard errors are correct and the sample data happens to be such that the ellipse is one of the unlucky 5%. This is an unlikely but not impossible scenario. Since we know that the standard errors are very unreliable, however, a more plausible interpretation is that the standard errors are too small, and that a confidence ellipse based on the true standard errors would have included the true coefficient values.

The case of true coefficient values within the ellipse requires a different type of reasoning. We select a representative sample of pairs of coefficient values from different regions of the ellipse. Since the ellipse is long and narrow, from top left to bottom right, I shall illustrate the method using three pairs of coefficient values identified as ‘top left’, ‘centre’ and ‘bottom right’. For each such pair, we use the regression model to calculate the implied values of E[V] for each zone, and substitute these into E11 to obtain weights for WLS estimation of the model.
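As a sketch of this step, the Python below takes one candidate pair of coefficients and derives a weight for each zone. It rests on assumptions of mine: that the implied E[V] for a zone can be taken as N multiplied by the exponential of the fitted value of Ln VR (which ignores the difference between the mean of a log and the log of a mean), and that the weight is the reciprocal of the expression in E10. The populations, travel costs and coefficient values are invented for illustration.

```python
import math

def var_log_poisson(lam: float) -> float:
    """Right-hand side of E10, abbreviated in the text as g(E[V])."""
    return (12.0 * lam**2 + 18.0 * lam + 11.0) / (12.0 * lam**3)

# Invented illustrative inputs (not the survey data).
populations = [12_000, 30_000, 55_000, 80_000, 150_000]  # zonal populations N
ln_tc = [2.0, 2.6, 3.1, 3.5, 3.9]                        # Ln TC for each zone
b1, b2 = -3.0, -1.2                                      # one candidate coefficient pair

# Assumed reading of "implied E[V]": N times the fitted unscaled visit rate.
implied_ev = [n * math.exp(b1 + b2 * x) for n, x in zip(populations, ln_tc)]

# WLS weights are the reciprocals of g(E[V]).
weights = [1.0 / var_log_poisson(ev) for ev in implied_ev]

for ev, w in zip(implied_ev, weights):
    print(round(ev, 1), round(w, 1))
```

Repeating the calculation for each selected pair of coefficients gives a separate set of weights, and hence a separate WLS estimation, for each pair.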

Having obtained the estimated coefficients and standard errors for each pair, there is a further important step to be taken. Given our conclusion (E13) that $\sigma_0^2$ approximates to 1, we can use that fact in calculating the standard errors, and need not rely on $s^2$. If we have already obtained standard errors using $s^2$, we can infer a convenient method of adjustment from the relation:

$Var[\hat{B}] = \sigma_0^2.(X'X)^{-1} = \frac{\sigma_0^2}{s^2}.(s^2.(X'X)^{-1}) \approx\frac{1}{s^2}.(Var[\hat {B}]_{Conv}).......E14$

Here the subscript ‘Conv’ identifies conventional estimates calculated using $s^2$. From E14 we can infer this simple formula relating true standard errors (in this context) to conventional ones:

$se[\widehat{B_j}] \approx \frac{se[\widehat {B_j}]_{Conv}}{s}.......E15$
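In code, the adjustment in E15 is a single division. The sketch below uses invented conventional standard errors and an invented value of $s^2$ simply to show the direction of the adjustment; the function name is my own.

```python
import math

def adjust_standard_errors(conventional_se, s_squared):
    """Apply E15: divide each conventional standard error by s."""
    s = math.sqrt(s_squared)
    return [se / s for se in conventional_se]

# Invented illustrative values (not the results in the tables).
conventional = [0.60, 0.20]   # standard errors for the constant and Ln TC
s_squared = 0.36              # variance estimate from the WLS regression

adjusted = adjust_standard_errors(conventional, s_squared)
print([round(se, 4) for se in adjusted])
```

Whenever $s^2$ is below 1 the adjusted standard errors exceed the conventional ones, and whenever it is above 1 they are smaller.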

Table 2 below shows the results of the WLS estimations. All three estimations use the same underlying data; they differ only in the coefficient values used in obtaining the weights.

Table 2 shows that the standard errors obtained by WLS estimation and adjusted as described above are considerably higher than those in Table 1. The standard errors based on the top left point are 35% and 41% higher respectively for the constant and travel cost coefficients; for the other two points, the percentages are much higher. While it has not been proved that standard errors based on other points within the ellipse will also be considerably higher than those in Table 1, it appears a reasonable inference.

Whether the true values of the coefficients lie outside or inside the ellipse, therefore, we can conclude, to at least a reasonable degree of likelihood, that the standard errors in Table 1 are underestimated.

If we can dispense with $s^2$ in this way for this case, it might be asked, could the method be more widely used in econometrics? There are two reasons why its applicability is limited. Firstly, the reasoning that $\sigma_0^2$ approximates to 1 depends on the fact that we are dealing with data resulting from the aggregation of many independent binary decisions. Secondly, the fact that the sample size is so small means that any elements of approximation in that reasoning are relatively small in comparison with the imprecision of $s^2$. That there are elements of approximation is not denied: an individual could visit more than once; individual decisions will not all be independent; the $p^2$ terms will not be exactly zero. A judgment has to be made, and mine is that, in the circumstances of this case, the reasoning that $\sigma_0^2$ approximates to 1 is much more reliable than the variance estimator. But if the sample were only a few times larger, say n = 20, the sampling error in $s^2$ would be much less and the judgment less clear.

### Supporting Analysis

I can provide the supporting analysis on request (in MS Office 2010 format). My email address is in About.

### Notes and References

1a. Rosenthal D H & Anderson J C (1984) Travel Cost Models, Heteroskedasticity, and Sampling Western Journal of Agricultural Economics 9(1) p 58-60; http://ageconsearch.umn.edu/bitstream/32368/1/09010058.pdf

1b. Hellerstein D (1995) Welfare Estimation Using Aggregate and Individual-Observation Models: A Comparison Using Monte Carlo Techniques American Journal of Agricultural Economics 77 (August 1995) p 623

2. Herath G (1999) Estimation of Community Values of Lakes: A Study of Lake Mokoan in Victoria, Australia Economic Analysis & Policy Vol 29 No 1 pp 31-44

3. Herath, as 2 above, p 35.

4. Herath, as 2 above, p 37.

5. Christensen J & Price C (1982) A Note on the Use of Travel Cost Models with Unequal Zonal Populations: Comment Land Economics 58(3) pp 396 & 399

6. Imbens G W & Kolesar M (Draft 2015) Robust Standard Errors in Small Samples: Some Practical Advice pp 1-2 https://www.princeton.edu/~mkolesar/papers/small-robust.pdf

7. Ruud P A (2000) An Introduction to Classical Econometric Theory Oxford University Press p 157. Note that the argument of E1 to E4 assumes both homoscedasticity and normality of error terms (but it would be optimistic to expect that conventional standard errors would be more reliable when these assumptions do not hold).

8. Ruud, as 7 above, p 199

9. The distribution of V approximates to a Poisson binomial distribution, and this in turn approximates to a Poisson distribution. See Hodges J L Jr & Le Cam L (1960) The Poisson Approximation to the Poisson Binomial Distribution The Annals of Mathematical Statistics 31(3) pp 737-740 http://projecteuclid.org/euclid.aoms/1177705799