Population Synthesis and models
The need for more detailed population analysis
Recent years have seen an increased interest in the distributional impacts of transport interventions; reducing inequalities of access to opportunities for work and study has become an important policy objective. These impacts can be estimated using classic models with an appropriate traveller segmentation in the demand models. Alternatively, one may adopt Agent or Activity Based Modelling approaches. Both classic and new models benefit from a more granular treatment of the population of interest, their preferences and interactions. A good way to achieve this is to create a synthetic population with a full set of characteristics of interest for a particular model.
A synthetic population is also a good starting point for more conventional models, as it allows a more granular treatment of the distributional impacts of any policy intervention. It is, of course, an artificial construct: it contains as many households and individuals as the most recent census but, to protect privacy, none of the simulated entities corresponds to a real one.
There are two main sources of data to build this synthetic population: a travel survey (perhaps up to a 2% sample of the population), and the most recent census data as a control to expand and distribute that sample. The survey will probably be an augmented version of a Household Travel Survey (HTS), collecting additional data on issues considered important in the model; for example, preferences for car ownership or Mobility as a Service, or how the use of the car is decided within the household.
The main objective is, therefore, to create a synthetic population representing everybody resident in the study area while retaining the attributes of interest for use in the model. If the model is used in forecasting mode (rather than as a policy testing tool for the present), then this population needs to be synthesised for future years based on the few properties that are actually forecast by planners, such as the number of people and households per zone, and perhaps income. Other attributes, like the distribution of household sizes, age distribution, school and university attendance, multiple vehicle ownership, and propensity for remote work and procurement of services, will need to be estimated or assumed.
The synthesis procedure involves two main steps. First, a demographic distribution of households is estimated for each transport zone, and then a matching sample of households is drawn from a set of household records for which nearly complete census information is available.
The number of households in each cell of that distribution (that is, each combination of household categories) is estimated through an iterative multi-proportional fitting procedure. The procedure starts with an initial joint distribution available for (aggregate) census geographical units. It then cycles iteratively through a set of control totals, one for each category of each control variable, rescaling the cells so that they add up to each total in turn.
A simple example
As an example, consider sample household data as shown above. There are three household sizes and two income levels. We know from, say, census data that there are 55 households (HH) with low income and 35 HH with high income in that zone, and that there are 20, 40 and 30 HH of each size, giving 90 HH in total on both counts. The sample data is shown in the 3x2 box labelled Sample and Targets. In this case, applying a bi-proportional adjustment will solve the population synthesis problem. First, an adjustment factor is calculated to scale the sample up to the total number of households in the zone. Then an iterative process adjusts for income level and then for household size, repeating until convergence, which is reached at approximately iteration 7.
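The bi-proportional adjustment described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used to produce the original figure: the sample counts in `sample` are hypothetical (the figure's values are not reproduced in the text), and the function name, tolerance and iteration cap are assumptions. Only the targets (55 and 35 HH by income; 20, 40 and 30 HH by size) come from the example.

```python
def fit_biproportional(sample, size_targets, income_targets, tol=1e-6, max_iter=100):
    """Fit a size-by-income table to both sets of marginal totals.

    Assumes every row and column of the sample has a positive sum.
    """
    # Step 1: scale the sample up to the total number of households in the zone.
    total = sum(size_targets)
    sample_total = sum(sum(row) for row in sample)
    table = [[v * total / sample_total for v in row] for row in sample]

    for iteration in range(1, max_iter + 1):
        # Step 2a: adjust each income column to its target.
        for j, target in enumerate(income_targets):
            col_sum = sum(row[j] for row in table)
            for row in table:
                row[j] *= target / col_sum
        # Step 2b: adjust each household-size row to its target.
        for i, target in enumerate(size_targets):
            row_sum = sum(table[i])
            table[i] = [v * target / row_sum for v in table[i]]
        # Rows are now exact, so convergence is judged on the income columns.
        worst = max(
            abs(sum(row[j] for row in table) - target)
            for j, target in enumerate(income_targets)
        )
        if worst < tol:
            return table, iteration
    return table, max_iter


# Hypothetical sample: rows are the three HH sizes, columns are (low, high) income.
sample = [[3, 1], [4, 4], [2, 4]]
size_targets = [20, 40, 30]
income_targets = [55, 35]

fitted, n_iter = fit_biproportional(sample, size_targets, income_targets)
for row in fitted:
    print([round(v, 2) for v in row])
```

The number of iterations needed depends on the sample and on the tolerance chosen, so it need not coincide with the figure quoted in the example.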
Modelling extensions
This approach is often extended to cover other dimensions such as car ownership, number of students, etc. Additional household characteristics that may be used as controls in special cases include age and gender of the head of household, presence of children, and family vs. non-family household members. The adjustments will then be multi-proportional, and a requirement for the procedure to work is that the control marginal totals are consistent with each other; for instance, the totals for every control variable must add up to the same number of households. When they are, the iterative procedure will converge so that all control totals are satisfied and the correlation structure of the initial joint distribution is preserved. Control totals are taken from census tables for the base year. For the forecast years, they will come from demographic and land use forecasts, which may be less detailed.
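A multi-proportional version might look like the sketch below, which keeps the joint distribution as a dictionary keyed by category tuples so that any number of control variables can be handled. The function names, the third control (cars owned) and its totals, and the uniform initial distribution are all assumptions made for illustration; a real application would start from the sample's joint distribution.

```python
from itertools import product


def check_consistent(controls, rel_tol=1e-9):
    """Raise ValueError unless every control variable adds up to the same total."""
    totals = [sum(targets.values()) for targets in controls]
    if max(totals) - min(totals) > rel_tol * max(totals):
        raise ValueError(f"inconsistent control totals: {totals}")


def fit_multiproportional(seed, controls, tol=1e-6, max_iter=200):
    """seed: {category tuple: weight}; controls: one {category: target} per dimension."""
    check_consistent(controls)
    cells = dict(seed)
    for _ in range(max_iter):
        worst = 0.0
        for dim, targets in enumerate(controls):
            # Current marginal totals along this dimension.
            totals = {cat: 0.0 for cat in targets}
            for key, value in cells.items():
                totals[key[dim]] += value
            worst = max(worst, max(abs(totals[c] - t) for c, t in targets.items()))
            # Rescale every cell so this dimension matches its targets.
            for key in cells:
                if totals[key[dim]] > 0:
                    cells[key] *= targets[key[dim]] / totals[key[dim]]
        if worst < tol:  # all marginals matched before this cycle's adjustments
            break
    return cells


controls = [
    {"size1": 20, "size2": 40, "size3": 30},   # household size (from the example)
    {"low": 55, "high": 35},                   # income (from the example)
    {"0 cars": 30, "1 car": 45, "2+ cars": 15},  # hypothetical third control
]
# Hypothetical uniform seed over every combination of categories.
seed = {key: 1.0 for key in product(*(c.keys() for c in controls))}
fitted = fit_multiproportional(seed, controls)
```

Checking consistency up front is a design choice: with inconsistent totals the loop would keep cycling without ever satisfying all controls at once.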
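It is also useful to note that the problem of zero cells or zero marginals, which affects trip matrix expansion and matrix estimation, also applies to population synthesisers. Because the fitting procedure only multiplies cells by adjustment factors, a cell that starts at zero stays at zero. Similar corrections would need to be applied.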
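The zero-marginal problem can be detected before fitting, as in the sketch below. The text does not say which correction to apply; replacing zero cells with a small positive value, as `pad_zero_cells` does, is one commonly used workaround and is shown here only as an assumption, with a hypothetical `epsilon`. Both function names are illustrative.

```python
def find_zero_marginals(seed, row_targets, col_targets):
    """List rows/columns whose seed total is zero although the target is positive.

    Proportional fitting cannot reach such targets, since scaling zero gives zero.
    """
    problems = []
    for i, target in enumerate(row_targets):
        if target > 0 and sum(seed[i]) == 0:
            problems.append(("row", i))
    for j, target in enumerate(col_targets):
        if target > 0 and sum(row[j] for row in seed) == 0:
            problems.append(("col", j))
    return problems


def pad_zero_cells(seed, epsilon=0.01):
    """Assumed workaround: give empty cells a small positive weight."""
    return [[v if v > 0 else epsilon for v in row] for row in seed]


# Hypothetical sample in which no household of the second size class was surveyed.
seed = [[3, 0], [0, 0], [2, 4]]
problems = find_zero_marginals(seed, [20, 40, 30], [55, 35])
padded = pad_zero_cells(seed)
remaining = find_zero_marginals(padded, [20, 40, 30], [55, 35])
```

Padding lets the procedure converge, but the resulting cells reflect the padding value rather than observed data, so the affected categories deserve a manual check.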
From a modelling perspective, the process of population synthesis needs a second phase. In this phase, we need to identify the attributes of the persons within each household; again, we will be interested in retaining the person attribute marginal totals for each zone. This second phase typically includes three steps.
The first is to convert into integers the non-integer values for households in zones resulting from the first phase; fractions of households cannot be handled in agent or activity-based models. Second, a Monte Carlo procedure is typically employed to draw the correct number of households of each type from the HTS. Note that as some of the desired data may not be available in the census, or it may not be accessible to the modeller, it is often inevitable to sample from the HTS and any activity diary dataset available. Third, the useful household and person variables are extracted from the drawn households and retained for use by the model system.
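The three steps can be sketched as follows. The rounding rule (largest remainders, which preserves the zone total), the record layout of the survey, and all names and values are assumptions for illustration; the text does not prescribe a particular integerisation method.

```python
import math
import random


def integerise(weights):
    """Step 1: round fractional household counts while preserving their total."""
    total = round(sum(weights.values()))
    counts = {k: math.floor(w) for k, w in weights.items()}
    shortfall = total - sum(counts.values())
    # Give the remaining households to the types with the largest fractions.
    by_fraction = sorted(weights, key=lambda k: weights[k] - counts[k], reverse=True)
    for k in by_fraction[:shortfall]:
        counts[k] += 1
    return counts


def draw_households(counts, survey, rng):
    """Step 2: Monte Carlo draw, with replacement, from survey households by type."""
    by_type = {}
    for hh in survey:
        by_type.setdefault(hh["type"], []).append(hh)
    drawn = []
    for hh_type, n in counts.items():
        if n == 0:
            continue
        if hh_type not in by_type:
            raise ValueError(f"no survey households of type {hh_type!r}")
        drawn.extend(rng.choices(by_type[hh_type], k=n))
    return drawn


# Hypothetical fitted weights for one zone, keyed by (HH size, income).
weights = {(1, "low"): 12.4, (2, "low"): 7.3, (2, "high"): 10.3}
# Hypothetical HTS records; "persons" holds each member's age.
survey = [
    {"type": (1, "low"), "cars": 0, "persons": [34]},
    {"type": (1, "low"), "cars": 1, "persons": [71]},
    {"type": (2, "low"), "cars": 1, "persons": [45, 12]},
    {"type": (2, "high"), "cars": 2, "persons": [52, 50]},
]

counts = integerise(weights)
drawn = draw_households(counts, survey, random.Random(42))
# Step 3: keep only the variables the model system needs.
synthetic = [
    {"hh_id": i, "type": hh["type"], "cars": hh["cars"], "ages": list(hh["persons"])}
    for i, hh in enumerate(drawn)
]
```

Drawing with replacement is needed because the zone usually contains far more households of a given type than the survey does; a fixed seed is used here only to make the sketch reproducible.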
Optional steps
There is an optional fourth step used in some models to assign each household to a more precise location within its geographic unit. For example, for the detailed modelling of Demand Responsive Transit it is desirable to identify the coordinates of each household, each individual and each available unit (say e-scooter) to serve specific demands at particular times.
The final output from these processes is a synthetic population where each synthesised household and its members have many clearly defined characteristics of interest for use in the model system and, together, they match the estimated demographic distribution within each zone.
This synthetic population would be the basis for Agent and Activity Based Models; it will also facilitate a more detailed analysis of the distributional impacts of transport interventions using classic aggregate models.