Multipopulation mortality modelling and forecasting: the weighted multivariate functional principal component approaches

ABSTRACT Human mortality patterns and trajectories in closely related populations are likely linked together and share similarities. It is always desirable to model them simultaneously while taking their heterogeneity into account. This article introduces two new models for joint mortality modelling and forecasting multiple subpopulations using the multivariate functional principal component analysis techniques. The first model extends the independent functional data model to a multipopulation modelling setting. In the second one, we propose a novel multivariate functional principal component method for coherent modelling. Its design primarily fulfils the idea that when several subpopulation groups have similar socio-economic conditions or common biological characteristics such close connections are expected to evolve in a non-diverging fashion. We demonstrate the proposed methods by using sex-specific mortality data. Their forecast performances are further compared with several existing models, including the independent functional data model and the Product-Ratio model, through comparisons with mortality data of ten developed countries. The numerical examples show that the first proposed model maintains a comparable forecast ability with the existing methods. In contrast, the second proposed model outperforms the first model as well as the existing models in terms of forecast accuracy.


Introduction
There have been tremendous developments in the area of mortality modelling and forecasting over the last three decades.These include the pioneering mortality model proposed by Lee & Carter (1992), well-known as the Lee-Carter model.It rapidly gained credit and popularity, given its simplicity and ability to capture most variations in mortality pattern evolved over time.
Several modifications and extensions of the Lee-Carter model have been put forward, see, for instance, Lee & Miller (2001), Booth et al. (2002), Renshaw & Haberman (2006) and Cairns et al. (2008).It is worth noting that Hyndman & Ullah (2007) further extends the Lee-Carter model to a functional data framework, which includes non-parametric smoothing techniques, functional principal component decomposition and times series analysis to achieve the task of mortality modelling and forecasting.Although the models as mentioned earlier posed a great success in history, the single factor designs limit their capacity of mortality modelling and forecasting on solely one population.It seems improper to prepare a mortality forecast for an individual population in isolation from one another if they are closely linked together.For example, due to biological and behavioural reasons, male mortality rates have considerably been higher than female mortality rates, see Kalben (2000).However, if male mortality improvements are faster than female ones, but two genders are projected independently, the model may forecast male mortality rates lower than and eventually diverged further from female mortality rates.As such, it is always a significant challenge in human mortality modelling that the model can take multiple populations as well as their heterogeneity simultaneously into account.Several mortality models for multiple populations have been proposed in the literature over the last decade, see, for instance, Delwarde et al. (2006) and Dowd et al. (2011).In more desirable cases, the model can further ensure that the forecasts for multiple related populations maintain certain structural relationships based on the extensive theoretical considerations and historical observations.A 'coherent' or 'non-divergent' model is one of the most well-suited cases in mortality modelling given the fact that the mortality of populations that are geographically close or otherwise related is driven by a common set of factors such as socio-economic, environmental and biological conditions and differences are unlikely to increase in the long run.Such coherent forecast models are also documented in the literature, see, for example, the earliest augmented common factor (ACF) model proposed by Li & Lee (2005), which is an extension of the Lee-Carter model with an additional common factor to capture both short-term divergence and long-term coherence among related populations.Variants and extensions of the ACF model have been subsequently developed, such as Li (2013), Li et al. (2016) and Chen & Millossovich (2018).Some others like the Age-Period-Cohort (APC) model proposed by Cairns et al. (2011), incorporate a mean-reverting stochastic process for two related populations and allow for different trends in mortality improvement rates in the short-run but parallel improvements in the long-run.The Product-Ratio model developed by Hyndman et al. (2013), which models the product and ratio functions of the age-specific mortality rates of different populations individually through a functional principal component decomposition, achieves coherent mortality forecasts by constraining the forecast ratio function via a stationary time series model to appropriate constants.Shang et al. (2016) and Wu & Wang (2019) use multilevel functional principal component analysis of aggregated and population-specific data to extract the common trend and population-specific residual trend among populations.The forecast of population-specific residual trend is restricted to be a stationary time process to achieve convergence in the long run.Some other developments in this field include Jarner & Kryger (2011), Hatzopoulos & Haberman (2013) and Wan & Bertschi (2015).Also, see Danesi et al. (2015) and Enchev et al. (2017) for reviews and comparisons.
In this paper, we propose two new models for mortality modelling and forecasting with the theoretical framework of multivariate functional principal component analysis techniques introduced by Chiou et al. (2016) and Happ & Greven (2018).The main objective of the multivariate functional principal component analysis is to model multiple sets of functional curves that may be correlated among others, which allows us to construct two new models on top of it.The first one is to model groups of population mortality rates with similarities across periods and ages together.The second model is a novel method for coherent mortality modelling and forecasting that captures the common trend and the population-specific trend of groups of mortality patterns and produces forecasts of different populations that do not diverge in the long run.The models are estimated using the weighted functional principal component algorithm (Hyndman & Shang, 2009), which places more weights on recent mortality movements and so produces more realistic forecasts.
More will be discussed in detail in the paper, and the rest of this article is organised as follows.In Section 2, we give a review of the theoretical background about univariate and multivariate functional principal component analyses.In Section 3, we describe the general frameworks of two proposed multivariate functional principal component analysis models for mortality modelling and forecasting.We then illustrate the models by applying them to the sexspecific mortality rates for Japan with comparisons to two analogous functional data paradigms − the independent FPCA model and the Product-Ratio model proposed by Hyndman & Ullah (2007) and Hyndman et al. (2013), in terms of the systematic differences and forecasting per-formances using sex-specific mortality data of ten developed countries in Section 4. We lastly conclude this article with discussions and remarks in Section 5.

Theoretical background of FPCA
Functional principal component analysis (FPCA) is the core technique applied primarily in this paper.It is a statistical method for analysing the variation of a bunch of functional curves in a dataset then reducing them from infinite dimensions to finite dimensions in principal component representations of variation (Ramsay & Silverman, 2007).It can also be regarded as a functional extension of the multivariate PCA method, allowing the data dimension to increase from finite space to infinite space (Ramsay & Dalzell, 1991).After the Karhunen-Loève theorem in expansions of a stochastic process proposed by Karhunen (1946) and Loève (1946), the theoretical developments of FPCA can be divided into two main streams: the linear operator and the covariance operator perspectives, see, for example, Besse (1992), Ramsay (2004), Yao et al. (2005), Hall et al. (2006) and Bosq (2012).To get the readers well equipped with necessary concepts in this paper, we firstly give a brief review of univariate FPCA then move to discuss the theoretical framework of multivariate FPCA from the covariance operator point of view.

Univariate FPCA (UFPCA)
Let Y (x) be a mean square continuous (L 2 -continuous) stochastic process on a domain X with a mean function µ With the defined covariance operator Λ, we can perform a spectral analysis of the covariance function K(x, x ), such that to obtain a set of orthonormal basis eigenfunctions {φ n (x)} ∞ n=1 and a corresponding set of explained by the {φ n (x)} ∞ n=1 .Y (x) can now be represented as an infinite linear combination of the orthonormal functions by the Karhunen-Loève theorem, that is where β n is the principal component score with and thus reduce the infinite dimension of functional data into finite dimensions in principal direction of variation which is often used in practice for data analysis, e.g. for regression or for clustering (Ramsay & Silverman, 2007).

Multivariate FPCA (MFPCA)
We now concern with multivariate functional data and the way to establish a direct link from the Karhunen-Loève representation of univariate functional data discussed previously to the Karhunen-Loève representation of multivariate functional data.Consider a random sample which consists of p ≥ 2 sets of functions Combining all different sets of functions in a vector Y (x), we have with a mean function and a covariance function Assuming that there exists a covariance operator Γ : L 2 (X ) → L 2 (X ) for all f ∈ L 2 (X ) with the i-th element of (Γf ), With the defined covariance operator Γ and the similar structure in the univariate FPCA, we let Γψ (x) = νψ(x) by the spectral theorem on the covariance function is the orthonormal basis of eigenfunctions and ν is the corresponding eigenvalue.Then for all i, j = 1, • • • , p and x ∈ X , we have Without any loss of generality, we assume that each set of the functions has its finite univariate Karhunen-Loève representation up to the optimal first N -dimensional approximations as shown in Equation (2.1), i.e.
n (x), and For simplicity of notation, we denote Cov(β ml and X φ Equation (2.2) can be rewritten as Following a similar technique of Zemyan (2012) in solving Fredholm integral equations of the second kind with a separable covariance function, we can multiply and integrate an orthonormal basis eigenfunction φ Due to the orthonormality, X φ n , it holds with the simplified notations (2.4) For a given value of ν, the solvability of this linear system correlates with the solvability of the integral equation.Since i, j and n, m, l are arbitrarily notated, Equation (2.4) can be represented in matrix form when we consider it as a whole, which is equivalent to an eigenequation, i.e. 11) . . .Z (1p) . . . . . . . . . 1)  . . . 1)  . . .
with a positive semidefinite block matrix Z (ij) ∈ R N ×N and an eigenvector c (i) ∈ R N entries.
With a matrix eigenanalysis performed on Equation (2.5), we can obtain a set of orthonormal Substituting the orthonormal eigenvector c n and the corresponding eigenvalue ν n into Equation (2.3), the eigenfunction ψ n (x) of Γ is given by their elements: where [c n ] (i) denotes the i-th block of the orthonormal eigenvector c n of Z corresponding to its eigenvalue ν n .The truncated multivariate Karhunen-Loève expansions with the first optimal N -dimensional approximations to Y (i) (x) can be written as where ρ n is the multivariate principal component score with where m is the m-th univariate principal component score of the i-th element.
The mean and the covariance of ρ n can be derived for all i, j = 1, • • • , p and x ∈ X as (2.7) Because of the orthonormality, X ψ

Algorithm of estimating the MFPCA
In practice, the natural path from the univariate to the multivariate functional principal component analysis discussed above can be summarised in a simple algorithm introduced by Happ & Greven (2018).Given a random sample consists of p ≥ 2 sets of functions on a domain X for all x ∈ X , the algorithm comprises the following four steps: 1. Perform a univariate functional principal component analysis for each element 2. Combine all the estimated principal component scores into a single large matrix Ξ where and estimate the joint covariance matrix 3. Perform a matrix eigenanalysis for Ẑ to obtain a set of estimated orthonormal eigenvectors {ĉ n } N n=1 and a set of corresponding eigenvalues {ν n } N n=1 of Ẑ.
4. Calculate the estimated multivariate eigenfunctions and the estimated multivariate principal component scores according to their i-th elements: The empirical truncated multivariate Karhunen-Loève representation with the first optimal Ndimensional approximations to , and the estimated multivariate principal component score ρt,n gives the individual weight of each observation unit t for its corresponding estimated multivariate eigenfunction ψ(i) n (x).

Methodology
3.1 Weighted MFPCA model for multi-population mortality rates forecasting In this section we firstly introduce our new model, namely weighted MFPCA (wMFPCA) model for forecasting mortality rates of several subpopulations within a large population simultaneously. Let t (x) denote the log of the observed mortality rates of the i-th subpopulation for age x in year t.We assume there is an underlying L 2 -continuous smooth function f t (x) that we are observing with error and at discrete points of x.Our discrete observations are {x j , Y t (x j )}, t,j } T,J t,j=1 are i.i.d.standard normal random variables, and σ t (x j ) allows the amount of noise varying with age x.
In demographic modelling, it is often the case that more recent experience has greater relevance on the future behaviour than those data from the distant past.In view of this, we comprise a weighted functional component algorithm for the MFPCA model, allowing the forecasting results of the model to be based more on the recent data.
Let f (i) t (x) be a smoothed function estimated from the observation Y (i) t (x j ), and w t = κ(1 − κ) T −t be a geometrically decaying weight with 0 < κ < 1.The overall mean function The mean-centred functional data is denoted as We then follow the algorithm of estimating MFPCA introduced in the previous section to calculate the weighted principal component scores and the weighted multivariate eigenfunctions using the functional form of F (i) to obtain up to the first optimal N -dimensional approximations.We lastly combine the estimated weighted average with the estimated weighted multivariate eigenfunctions and the estimated multivariate weighted principal component scores to obtain the weighted MFPCA model for mortality modelling and forecasting of the i-th subpopulation with first optimal N -dimensional approximations, i.e.

Out-of-sample forecasts and prediction intervals of the wMFPCA model
Forecasts can be obtained by forecasting the weighted principal component scores {ρ t,n } N n=1 using time series models independently.There is no need to consider the vector autoregression (VAR) model for forecasting the weighted principal component scores as they are not correlated, see Equation (2.7).{ρ t,n } N n=1 can be extrapolated using possibly non-stationary autoregressive integrated moving average (ARIMA) model, and we can select the order of the ARIMA model based on the Akaike information criterion (AIC) or the Bayesian information criterion (BIC).
Let ρt+h,n denote the h-step ahead forecast of ρt,n , then the h-step ahead out-of-sample forecast of We can also obtain the forecasting variance of the model by adding up the variances of all terms together given the fact that the components of the wMFPCA model are uncorrelated, such that Var( where τ 2 μ(i) (x) is the variance of the smoothed estimates of the mean function derived from the smoothing method applied, νt+h,n is the estimated variance of ρt+h,n that can be obtained from the time series method used, and the estimated variance of forecast error (σ (i) t+h (x)) 2 is calculated by taking the average of observational variance from the historical data.
With the normality assumption on the model error and the known Var( , where z α is the (1 − α/2) quantile of the standard normal distribution.

Coherent wMFPCA model for multi-population mortality rates forecasting
We now introduce the idea of the coherent wMFPCA model, in the sense that the long term forecasts of several subpopulations within a large population will be non-divergent. Let where is the smoothed mortality function of the i-th subpopulation for age x in year t, µ(x) is the average of total mortality function, η (i) (x) is the mean of the i-th subpopulation specific deviation function from the averaged total mortality function, G t (x) is the common trend across all populations, and Z (i) t (x) is the i-th subpopulation specific deviation trend.
In such a model, µ(x) and η (i) (x) are unknown fixed functions, while G t (x) and Z (i) t (x) are assumed to be independent zero mean stochastic processes to ensure identifiability (Shang et al., 2016), such that G t (x) and Z (i) t (x) can then be decomposed by the (multivariate) Karhunen- Loève representation as

Estimation of the coherent wMFPCA model
We carry on the same weighted functional component algorithm applied in the wMFPCA model for the coherent wMFPCA model.The components of the coherent wMFPCA model can be obtained using the estimation procedures below in practice: 1. Obtain the total mortality function among all subpopulations smoothed mortality func- f (i) t (x), then calculate the weighted mean function of the total mortality function, μ(x) = T t=1 w t ĝt (x), where w t = κ(1 − κ) T −t is a geometrically decaying weight with 0 < κ < 1.

Calculate the centered functional data
3. Perform univariate FPCA on the functional form of Ĝ to get Ĝt (x) = K k=1 βt,k φk (x) up to the first optimal K-dimensional approximations.Let gt (x) = μ(x) + K k=1 βt,k φk (x) be the estimated weighted total mortality function.
4. Calculate the deviation of the i-th subpopulation specific mortality function from the estimated weighted total mortality function, , then calculate the weighted mean of the subpopulation specific deviation function, 5. Obtain the demeaned functional data as a T by J matrix Ẑ(i) * , then multiply Ẑ(i) * by W , where 6. Perform multivariate FPCA on the functional form of Ẑ(i) to obtain up to the first optimal L-dimensional approximations.
With all the estimated components obtained above, we can represent the coherent wMFPCA model for mortality modelling and forecasting of the i-th subpopulation, i.e.
or the full representation of the coherent wMFPCA model with the first optimal K-dimensional approximations to the common trend and the first optimal L-dimensional approximations to the i-th subpopulation deviation trend, such that where μ(i) (x) = μ(x) + η(i) (x) is the mean function of the i-th subpopulation.

Out-of-sample forecasts and prediction intervals of the coherent wMFPCA model
The h-step ahead out-of-sample forecast of Y (i) t (x) can be represented as where βt+h,k and γt+h,l are the h-step ahead forecasts of the weighted principal component scores of the common trend and the i-th subpopulation specific deviation trend.βt+h,k can be obtained using a univariate time series forecasting method, such as ARIMA model.To ensure the predictions of the subpopulations are coherent in the long term, the forecasts of all subpopulation deviation trends need to be restricted to be convergent and a stationary process, such that Given the way that the coherent wMFPCA model has been constructed, each component is independent to the other components.Therefore, the forecast variance can be expressed by the sum of component variances, i.e. Var( where τ 2 μ(i) (x) is the variance of the smoothed estimates of the mean function derived from the smoothing method used, ût+h,k and ωt+h,l are the variances of βt+h,k and γt+h,l that can be obtained from the time series methods applied, and the forecast error (σ (i) t+h (x)) 2 is the average of the observational variance estimated from the historical data.
With the normality assumption on the model error and the known Var( , where z α is the (1 − α/2) quantile of the standard normal distribution.
Note that the weights {w t } T t=1 are controlled by the tuning parameter κ in the geometrically decaying weighting approach embedded in the two proposed models.The larger κ is, the faster the weights for the historical observations are decaying over time geometrically.In practice, the tuning parameter κ can be determined by minimising the average root mean square error (RMSE) of all populations defined as The value of the parameter κ can alternatively be specified as a prior , if there is a strong prior knowledge of how past data should be weighted (Wu & Wang, 2019).
For selecting the optimal number of principal components in the two proposed models, we use a cumulative percentage of total variation method.We denote N as a generic notation of the optimal number of principal components, and N is determined by where λn is the corresponding estimated eigenvalue of the principal components analysis, and P ≥ 0.9 is set to be the minimum acceptance level as suggested by Chiou et al. (2012).

Applications
In this section, we illustrate the two proposed models − the wMFPCA model and the coherent wMFPCA model using sex-specific mortality data.We first present and plot the observed mortality dataset, then demonstrate the usefulness of these two models by forecasting of the sex-specific mortality rates of Japan.We show the forecasting results for males and females compared with the observed data.We further exhibit the ability of non-diverging long term forecasts of the proposed coherent wMFPCA model and finally assess the forecasting accuracy of the two proposed models in comparison to the Product-Ratio model and the independent FPCA model using the sex-specific mortality data of ten different developed countries.

Sex-specific mortality data of Japan
The sex-specific mortality rates data of Japan are available for the year 1947 to the year 2016 from the Human Mortality Database (2020).The database consists of central death rates by gender and single calendar year of age up to 110 years old.We restrict the data at the maximum age of 100 to avoid problems associated with erratic rates above age 100.The observed mortality rates curves are smoothed using penalised regression splines with a partial monotonic constraint so that each mortality curve is increasing above age 65 monotonically (Hyndman & Ullah, 2007).Figure 4.1 shows the log mortality rates for males and females for selected age groups as univariate time series.that there are steady declines in male and female mortality rates at most ages over our examined period.The mortality curve patterns for male and female are reasonably similar, while for male, the mortality rates are generally higher than the mortality rates of female, particularly at around age 20.Despite the higher male mortality rates in comparison with female's, the mortality gap between male's and female's gets narrower over time at older ages.

Sex-specific mortality forecasting by the wMFPCA model
In the demonstration of the first proposed weighted MFPCA model, we aim to make 20-yearsahead out-of-sample forecasts for male and female mortality rates of Japan.We first split the dataset with the observed mortality rates from the year 1947 to the year 1996 and a test dataset with remaining observed mortality data from the year 1997 to the year 2016.We choose the weight parameter κ in the wMFPCA model by minimising the average root mean square error (RMSE) stated in Equation (3.1) based on rolling-window analyses.For example, in our demonstration, we aim to make a 20-years horizon prediction for the male and female mortality in Japan for the year 2016.We firstly use the data from the year 1947 to the year 1967 to make a 20-years horizon prediction for the mortality rates in the year 1987 then compute the RMSE with the observed data for the first rolling-window analysis.We subsequently use the data from the year 1947 to the year 1968 to make the prediction for the mortality rates in the year 1988 and compute RMSE with the observed data again.We repeat this procedure until it reaches a point where it uses the data from the year 1947 to the year 1976 to make a forecast for the endpoint of the observed data in the year 1996.We decide the value of the weight parameter over the interval 0 < κ < 1 that can minimise the average RMSE of male and female mortality data among these ten rolling-window tests.
The mean functions for male and female and their functional principal components are estimated as discussed in the previous section.The analysis shows that the first three functional principal components for male and female explain 97.2%, 2.3% and 0.2% respectively, and account for more than 99% in total of the variations in the sample data.We, therefore, select the first three estimated principal components for approximations and demonstrations.For each score of the corresponding functional principal component shared by male and female, we forecast it independently by a univariate ARIMA time series using the R package 'forecast' (Hyndman et al., 2007).The order of ARIMA models is chosen based on the Akaike information criterion (AIC).
The top panels of Figure 4.4 demonstrate the estimated mean functions μ(i) (x) and the first three estimated functional principal components { ψ(i) n (x)} for male (i = M) and female (i = F).Their corresponding scores of the PCs {ρ t,n } 3 n=1 with a 20-years-ahead out-of-sample forecast horizon and 95% prediction intervals are displayed in the bottom panel of Figure 4.4.
The functional principal components model different movements in mortality rates.The first functional principal components for male and female show a similar pattern, which both count for very high variation for young teenagers, then become level-off in middle-age and elderly.
Although the corresponding scores for the first component show an increasing trend, we can interpret that there is a decreasing trend in mortality rates across our observed period given its corresponding first components for male and female are both negative.The second and the third principal components for male have relatively high negative variation among younger ages, but they get less oscillated in the ranges of middle-age and elderly.While the second and the third principal components for female show kinds of the opposite direction of each other.
Figure 4.5 presents the residual plots by year and by age of the wMFPCA model fitted by gender in Japan.As we have applied the weighting approach for placing more recent data into considerations, both the residual plots by year for male and female, have a funnel shape.These indicate that the fitted wMFPCA model captures more information from the recent data and leave the distant past data with less impact.The residual plots by age for male and female show that the fitted values of the models are underestimated for the young age groups and slightly overestimated for the elderly groups.These may be due to the higher mortality improvement rates for the young group and the lower mortality improvement rates for the elderly group over our fitting period.

Sex-specific mortality forecasting by the coherent wMFPCA model
The presentation of the coherent wMFPCA model forecasting follows the same strategies of how we split the dataset for in-and out-of-sample data and choosing the weight parameter for the coherent wMFPCA model as we have done for the wMFPCA model in the previous section.
In the top panels of in mortality among different age groups as we expect that the variance of mortality rates in young age groups is relatively larger than elderly groups.In contrast, φ2 (x) and φ3 (x) model the young adult age below 40 and differences between late teen and those over 60.From the first forecast common principal component scores, the declines of mortality rates seem to continue, and we do not spot any apparent pattern from the second and third estimated common principal component scores.
In the top panels of Figure 4.8, we plot the estimated male and female deviation functions from the overall mean.It is obvious that the estimated male deviation function ηM (x) is a positive function, while the female estimated deviation function ηF (x) covers whole negative range over all ages.These indicate that the male mortality rates are in general higher than female's, and the difference reaches its peak at around age 20.All estimated male and female deviation functions from the overall mean behave like an upside-down reflection of each other and demonstrate completely reverse patterns.The bottom panels of Figure 4.8 show the first three estimated deviation trend functional principal components for male and female.From the forecasts of the three shared principal component scores among two genders, they seem to have a flat zero-convergent trend.It is therefore likely to achieve non-divergent forecasts between male and female subpopulations in the long term, in which we will further examine this in detail in the next section.
Figure 4.9 shows the residual plots by year and by age of the fitted coherent wMFPCA model for male and female mortality of Japan.Compared to the residual patterns of the fitted wMFPCA model used in the previous section, the fitted coherent wMFPCA model seems to capture mortality information evenly over the whole observed period, given that both residual plots demonstrate i.i.d.random variables patterns with zero mean and equal variance across all the age-groups over our examined period.Figure 4.10 shows the 20-years-ahead forecasting results of mortality curves of male (with RMSE = 0.1821) and female (with RMSE = 0.1460) from age 0 to age 100 for the year 2016 using the coherent wMFPCA model based on the in-sample data from the year 1947 to the year 1996 in Japan.

Forecast pattern with mortality sex ratio and life expectancy by coherent and non-coherent forecasting methods
In this section, we move to examine and compare forecast patterns with mortality sex ratios and life expectancy by the forecasts of the two proposed models − the wMFPCA model and the coherent wMFPCA model with two existing models − the independent FPCA model (Hyndman & Shang, 2009) and the Product-Ratio model (Hyndman et al., 2013).
The independent FPCA model is a univariate FPCA method for forecasting two subpopulation groups independently without considerations of any potential correlation of two subpopulation groups.Similar to the wMFPCA model, the forecast results of the independent FPCA model are based on a non-stationary time series model on its estimated principal component scores, leading to forecast results of two or more subpopulations divergent to different directions in the long run.It is thus regarded as a non-coherent forecasting approach.Meanwhile, the Product-Ratio model begins with an idea of obtaining the product and ratio function of all subpopulations by assuming all subpopulations have equal variance.In the log scale, the product function can be treated as the sum of all sub populations, whereas the ratio function can be treated as the differences among subpopulations.The predictions can be obtained by firstly applying the independent FPCA model to forecast the future realisations of the product and ratio functions separately, then transforming the forecasts of the product and ratio functions back to the original subpopulations functions.The convergent forecasts are achieved by using stationary time series methods, namely the ARMA model or the ARFIMA model, on the ratio function, which implicitly implies that the differences among subpopulations will be convergent to zero in the long term.It is therefore viewed as an example of coherent forecasting approach.
In a similar vein, the proposed coherent wMFPCA model restricts the stationary properties on the deviation functions of each subpopulation from the overall mean to accomplish the coherent forecasting with no need to assume all the subpopulations have the same variances.
To deliver the concept of coherent forecasting more concretely, we plot the historical mortality sex ratios (Male/Female) and the life expectancy curves obtained from the observed male and female mortality rates from the year 1997 to the year 2016 alongside the 20-years-ahead forecasts of the mortality sex ratios and the life expectancy curves from the year 1997 to the year 2016 by the non-coherent forecast methods − the independent FPCA model and the wMF-PCA model and the coherent forecast methods − the Product-Ratio model and the coherent wMFPCA model using the observed mortality rates from the year 1947 to the year 1996 in Fig- ure 4.11 and Figure 4.12.We can see that all the coherent forecasting methods exhibit a fairly one-to-one smooth pattern with the actual mortality sex ratio, and with much less fluctuation than the non-coherent forecast methods in Figure 4.11.The convergent forecastings by the coherent models also fit well with the actual biological characteristics trends in Figure 4.12, where the differences in males and females life expectancy converge to a certain level gradually and slowly, instead of diverging into different directions like the forecast results of the non-coherent forecast methods.Our demonstrations prove the importance of coherent modelling when there exist common biological characteristics among several subpopulations.

Forecast accuracy evaluation with comparisons to other existing methods
We now evaluate and compare the forecast accuracy of our two proposed models − the wMPFCA model and the coherent wMFPCA model with the two existing models − the independent FPCA model and the Product-Ratio model demonstrated in the previous section.In order to have a comprehensive investigation of the forecast accuracy of our two proposed models, we consider ten other developed countries for which data are also available in the Human Mortality Database.We restrict data periods of all selected countries commencing in the year 1947 up to the year 2016 (70 years in total) for a unified purpose.We examine and quantify the forecasting performance of our models by a rolling window analysis, which is frequently used for assessing the consistency of a model's forecasting ability by rolling a fixed size prediction interval (window) throughout the observed sample (Zivot & Wang, 2007).More generally speaking, we hold the sample data from the initial year up to the year t as holdout samples.We produce the forecast for the t + h year where h is the forecast horizon, then determine the forecasts errors by comparing the forecast result with the actual out-of-sample data.We increase one rolling window (1 year ahead) in year t + 1 to make the same procedure again for the year t + h + 1 until the rolling window analysis covers all available data.
We include four different forecast horizons (h = 5, 10, 15 and 20) with ten sets of rollingwindow to exam the short-term, the mid-term and the long-term forecast abilities of the proposed two models.We use the root mean square error (RMSE) to measure the standard deviation of the average square prediction error regardless of sign.In our experiments, we define the the smallest overall forecast errors and bias for both genders in the long-term prediction in our study.
The main finding in this section is that in the two-sex case, the accuracy of the male forecast is significantly improved by the coherent wMFPCA model at the small expense of accuracy in female mortality forecasts.By adopting the coherent wMFPCA model, the forecast accuracy among all subpopulations is homogeneous as it incorporates additional information into the forecast for a single subpopulation.The additional information acts as a frame of reference limiting to the probability of a subpopulation forecast which may continue a diverging trend from other related subpopulations directions.This feature of the coherent wMFPCA model is also useful in some other specific practical applications, such as financial planning with several related stock prices, in a situation that we aim to maintain a balanced error margin amongst all subpopulations.This speciality is unique and has not been achieved by other non-coherent or single population models.) Forecast accuracy of mortality for male and female using the independent FPCA model, the wMFPCA model, the Product-Ratio model and the coherent wMPFCA model is measured by the average RMSEs of ten sets of rollingwindows analysis.The minimal forecast errors are highlighted, and the lowest averaged forecast errors among models in different forecast horizons are highlighted in bold.

Discussion and conclusion remarks
With the theoretical framework of multivariate functional principal component analysis motivated by Chiou et al. (2016) and Happ & Greven (2018), in this paper, we have proposed two new models that aim to model and forecast for a group of mortality rates, taking advantages of commonalities in their historical experience and age patterns.The first one, namely as wMFPCA model, is introduced to acknowledge differences in groups, age patterns and trends of several subpopulations to model together when subpopulations have somewhat sufficiently similar historical patterns.The coherent wMFPCA model is a novel extension of the wMFPCA model in a coherent direction.We design the coherent structure of the model to primarily fulfil the idea that when several subpopulation groups have similar socio-economic conditions or common biological characteristics and such these close connections are expected to continue and evolve in a non-diverging fashion in the distant future.The time weightings approaches on these two models lead us to expect the future patterns of mortality follow more likely recent past observations and obliterate some parts of irrelevant distant past mortality movements in favour of forecast performances of the two proposed models.
We have demonstrated the two models through forecasting for sex-specific mortality with the observed data from Japan.The wMFPCA model consists of the mean functions and the functional principal components of each subpopulation with corresponding scores shared by all subpopulations.We can obtain the forecasts of the wMFPCA model by extrapolating the shared principal component scores ahead with any non-stationary time series model, such as ARIMA model in our example.Whereas there are two primary components in the coherent wMFPCA model, including the average components among all subpopulations and the deviation components of each subpopulation deviated from the average components.We also use a non-stationary time series model for the predicted average component corresponding scores.
However, we apply a stationary time series model for the forecast of deviation components corresponding scores so that the differences among all subpopulations will gradually decline and be bound to achieve coherence in the long term forecast.The forecasts of mortality sex ratios In contrast, the coherent wMFPCA model produces more sensible forecast results with less forecast error and bias in the two-sex mortality case in the long term.
The main limitations of the two proposed models also attribute to the characteristics in which they belong to the classes of 'non-parametric' or 'pure extrapolative' methods.They can capture trends in the historical data well.At the same time, they lack the ability to incorporate more other related information, such as the change in medical technology, environment and social-economy for predictions.Another issue is the compatibility of the wMFPCA models.
It requires the sufficient level of homogeneity among subpopulations for forecasting, and the forecast ability of the wMFPCA models may be affected if several completely irrelevant subpopulations are placed together in the wMFPCA models.Although we used two sex-specific mortality rates for our demonstration in this article, it is obviously straightforward to apply these models to other scenarios with multiple related populations.
The principal component scores {β n } ∞ n=1 are uncorrelated random variables with mean zero and variance {λ n } ∞ n=1 .β n serves as the weight and the projection of the centred stochastic process (Y (x) − µ(x)) in the direction of the n-th eigenfunction φ n (x) in the Karhunen-Loève representation of Y (x).As the eigenvalues decrease quickly towards zero, only the first few eigenfunctions are needed to represent the most important features of Y (x).The truncated Karhunen-Loève expansion with the optimal first N -dimensional approximations to Y (x) can be written as x) = 0. γt+h,k can hence be achieved using possibly any stationary autoregressive moving average (ARMA) process or autoregressive fractionally integrated moving average (ARFIMA) process.The order of the aforementioned time series models can be decided based on the Akaike information criterion (AIC) or the Bayesian information criterion (BIC).
Figure 4.2 and Figure 4.3 present the same set of data as a batch of observed and smoothed curves (functional observation) in rainbow plots with time-ordering indicated by the colours of the rainbow, from red to violet.
Figure 4.2 and Figure 4.3 both show Figure 4.6 shows the 20-years-ahead forecasting results of mortality curves of male (with RMSE = 0.2006) and female (with RMSE = 0.2406) from age 0 to age 100 for the year 2016 by the wMFPCA model based on the in-sample data from the year 1947 to the year 1996 in Japan.
Figure 4.7, we display the estimated overall mean function μ(x) and the first three estimated functional principal components { φk (x)} 3 k=1 .The corresponding scores of the first three estimated functional principal components { βt,k } 3 k=1 are presented along with 20years-ahead out-of-sample forecast means and 95% confidence intervals by ARIMA models in the bottom panel of Figure 4.7.The first three common trend functional principal components capture 98%, 1.7% and 0.2% variations respectively and more than 99% of the variations in the age-specific total mortality overall among this dataset.Each functional principal component models different movements in mortality rates.φ1 (x) primarily models the degree of variations

Figure 4 . 1 :
Figure 4.1: Log mortality rates for (a) male and (b) female from age 0 to age 100 with 20-year age intervals from the year 1947 to the year 2016 in Japan.

Figure 4
Figure 4.2: (a) Observed and (b) smoothed log mortality rates for male from the year 1947 to the year 2016 in Japan, viewed as functional data curves with time-ordering indicated by the colours of the rainbow from red to violet.

Figure 4
Figure 4.3: (a) Observed and (b) smoothed log mortality rates for female from the year 1947 to the year 2016 in Japan, viewed as functional data curves with time-ordering indicated by the colours of the rainbow from red to violet.

Figure 4 . 4 :
Figure 4.4: Estimated mean functions and the first three functional principal components for (a) male and (b) female mortality of Japan.
Figure 4.5: Residual plots by year and by age of the fitted sex-specific mortality rates of Japan for (a) male and (b) female via the wMFPCA model.

Figure 4 . 6 :
Figure 4.6: Fitted mortality curves for (a) male (with RMSE = 0.2006) and for (b) female (with RMSE = 0.2406) from age 0 to age 100 with 95% confidence intervals using the wMFPCA model for the year 2016 based on the observations from the year 1947 to the year 1996 in Japan.Circles are the true log mortality rates, solid lines are the predictive means, and dashed lines are the 95% confidence intervals.

Figure 4 . 7 :
Figure 4.7: Estimated overall mean function and the first three estimated common trend functional principal components for the total mortality rates of Japan.

Figure 4
Figure 4.7: (cont.)Corresponding PC scores of the estimated common trend with 20-yearsahead forecast means with 95% confidence intervals (in grey).
(a) Male deviation from the overall mean function (b) Female deviation from the overall mean function

Figure 4 . 8 :
Figure 4.8: Estimated deviation functions from the overall mean function and the first three estimated deviation trend functional principal components for (a) male and (b) female mortality rates of Japan.
Figure 4.9: Residual plots by year and by age of the fitted sex-specific mortality rates of Japan for (a) male and (b) female via the coherent wMFPCA model.

Figure 4 .
Figure 4.10: Predicted mortality curves of male (with RMSE = 0.1821) and female (with RMSE = 0.1460) from age 0 to age 100 with 95% confidence intervals using the coherent wMFPCA model for the year 2016 based on the observations from the year 1947 to the year 1996 in Japan.Circles are the true log mortality rates, solid lines are the predictive means and dashed lines are the 95% confidence intervals.

Figure 4 .
Figure 4.11: Historical Japanese mortality sex ratio and 20-years-ahead forecasts of mortality sex ratios in Japan from the year 1997 to the year 2016 using the independent FPCA model, the wMFPCA model, the Product-Ratio model and the coherent wMFPCA model.
and life expectancy cure patterns confirmed the non-divergent forecastability of the proposed coherent wMFPCA model.The usefulness of the two proposed models is illustrated through a series of forecast accuracy evaluations and comparisons with other existing methods.The wMFPCA model provides a very flexible framework for multi-population mortality forecasting with comparable forecast accuracy as the independent FPCA model and the Product-Ratio model.The coherent wMFPCA model outperforms the Product-Ratio model in terms of forecast accuracy with no assumptions needed to place on the equal variance of all subpopulations.The coherent wMFPCA model maintains the same level in the short term forecast ability as the independent FPCA model.

Table 4 .
1: Forecast accuracy of mortality for male and female using the independent FPCA model, the wMFPCA model, the Product-Ratio model and the coherent wMPFCA model is measured by the average RMSEs of ten sets of rolling-windows analysis.The minimal forecast errors are highlighted, and the lowest averaged forecast errors among models in different forecast horizons are highlighted in bold.