Wind Forecasts for Rocket and Balloon Launches at the Esrange Space Center Using the WRF Model

High-altitude balloons and rockets are regularly launched at the Esrange Space Center (ESC) in Kiruna, Sweden, with the aim of retrieving atmospheric data for meteorological and space studies in the Arctic region. Meteorological conditions, particularly wind direction and speed, play a critical role in the decision of whether to go ahead with or postpone a planned launch. Given the lack of high-resolution wind forecasts for this remote region, the Weather Research and Forecasting (WRF) Model is used to downscale short-term forecasts given by the Global Forecast System (GFS) for the ESC for six 5-day periods in the warm, cold, and transition seasons. Three planetary boundary layer (PBL) schemes are considered: the local Mellor–Yamada–Janjić (MYJ), the nonlocal Yonsei University (YSU), and the hybrid local–nonlocal Asymmetric Convective Model 2 (ACM2). The ACM2 scheme is found to provide the most skillful forecasts. An analysis of the WRF Model output against the launch criteria for two of the most commonly launched vehicles, the sounding rockets Veículo de Sondagem Booster-30 (VSB-30) and Improved Orion, reveals probability of detection (POD) values that always exceed 60%, with the false alarm rate (FAR) generally below 50%. It is concluded that the WRF Model, in its present configuration, can be used to generate useful 5-day wind forecasts for the launches of these two rockets. The conclusions reached here are applicable to similar sites in the Arctic and Antarctic regions.


Introduction
The Esrange Space Center (hereafter ESC) is located at ~67.88°N and 21.05°E in Swedish Lapland, around 200 km north of the Arctic Circle. ESC is just outside the city of Kiruna and has been extensively used to launch high-altitude balloons and rockets to study the dynamics of the upper levels of Earth's atmosphere. As stated in the Esrange Safety Manual (www.sscspace.com/file/esrange-safety-manual.pdf), weather conditions play an important role in decision-making related to whether a planned launch will actually take place. One of the most important factors considered is the wind, with strict requirements for the maximum allowed wind variation and speed for each vehicle based upon this and other atmospheric conditions. The two most commonly launched vehicles at the ESC are the sounding rockets Veículo de Sondagem Booster-30 (VSB-30) and Improved Orion. For the former, the variations in horizontal wind speed and in wind direction must not exceed 1.8 m s⁻¹ and 25°, respectively, in the time window from the moment when the final launch settings are configured (typically 6 min before launch) to the actual launch time. For the latter, the limits are 2.7 m s⁻¹ for the wind speed and 65° for the wind direction. These figures are obtained from simulations performed at the ESC (M. Bysell 2017, personal communication). Given these strict requirements, an accurate simulation of the atmospheric conditions in the planetary boundary layer (PBL), particularly of the wind, is crucial, as erroneous forecasts may lead to costly postponements or cancellations of planned launch events.
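The launch criteria quoted above can be illustrated with a minimal sketch. It assumes that "variation" means the range (maximum minus minimum) of wind speed within the time window and the smallest arc containing all wind directions; this interpretation, and the names `LIMITS`, `direction_spread`, and `is_go`, are illustrative assumptions and not the procedure used at the ESC.

```python
# Limits quoted in the text: (max speed variation in m/s, max direction variation in degrees).
LIMITS = {"VSB-30": (1.8, 25.0), "Improved Orion": (2.7, 65.0)}


def direction_spread(directions_deg):
    """Smallest arc (degrees) containing all directions, handling the 0/360 wrap."""
    d = sorted(x % 360.0 for x in directions_deg)
    if len(d) < 2:
        return 0.0
    # Gaps between consecutive directions, plus the gap across north.
    gaps = [b - a for a, b in zip(d, d[1:])]
    gaps.append(d[0] + 360.0 - d[-1])
    # The occupied arc is the full circle minus the largest empty gap.
    return 360.0 - max(gaps)


def is_go(vehicle, speeds, directions_deg):
    """True if both assumed variation criteria are met within the window."""
    max_dspeed, max_ddir = LIMITS[vehicle]
    speed_ok = (max(speeds) - min(speeds)) <= max_dspeed
    dir_ok = direction_spread(directions_deg) <= max_ddir
    return speed_ok and dir_ok
```

For example, a window with speeds of 4.0 and 6.5 m s⁻¹ would fail the assumed VSB-30 speed criterion but pass that of the Improved Orion.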
The PBL is the lowest part of Earth's atmosphere, within which interactions between the atmosphere and the surface take place. As opposed to lower latitudes, the boundary layer in the Arctic region is usually shallow and stably stratified, particularly during the cold season. During the wintertime, the limited amount of incoming solar radiation, together with the strong longwave cooling of the surface, leads to the formation of strong surface inversions (Tjernström et al. 2005; Pepin et al. 2009). Models generally underperform in this region because the parameterization schemes employed do not work well in stable boundary layers, leading to biases in key variables such as the 2-m temperature and horizontal wind speed (e.g., Mahrt 1998; Steeneveld 2014). The limited availability of observational data also makes it harder to fully evaluate the performance of numerical models.
General circulation models (GCMs) and regional climate models (RCMs) run by the different meteorological operational centers, such as the Swedish Meteorological and Hydrological Institute, are at too coarse a resolution to fully capture the many processes that drive local weather variability. A successful forecast of local-scale atmospheric conditions for the purpose of this work, particularly wind fields, requires very high horizontal and vertical resolutions that are currently too computationally expensive for runs over large regions and/or long periods of time. Short-time wind forecasts have been performed for wind-energy-related applications (e.g., Lazić et al. 2010; Cassola and Burlando 2012) and to find the best model configuration by comparing model data with observed measurements taken during a field campaign (e.g., Banks et al. 2016) or an extreme weather event (e.g., Powers 2007). Work has also been done on forecasts for rocket launches, in particular at the Kennedy Space Center (e.g., Manobianco et al. 1996; Short et al. 2004) and White Sands Missile Range (e.g., Duncan and Rachele 1967) in the United States and at the Tanegashima Space Center (Kingwell et al. 1991) in Japan. As discussed by Kingwell et al. (1991), four meteorological factors are of particular importance: lightning, wind, turbulence, and temperature. Electrical surges can lead to a loss of control and even to the destruction of the rocket, which can be hazardous for ground personnel and equipment during launch and routine site operations. Rockets can themselves trigger lightning strikes as they travel vertically at very high speed through layers with rapidly changing atmospheric electrical fields and leave behind sharp and narrow plumes of conductive and ionized gases. Wind is a major issue while the rocket is taken to the launchpad, when its weight is lower because it has not yet been fueled, and after the shelter tower is removed.
In addition, and as also discussed by Rachele and Armendariz (1967), the impact of the wind is significant during the burning phase of the rocket when it is near the surface and its relative velocity is low, stressing the need for high quality forecasts. Turbulence, in particular that arising from vertical wind shear, can lead to unacceptable stresses on key structural elements of the rocket. Also, very high or low temperatures can cause damage to components of the rocket and affect the performance of ground crews and equipment. Out of those factors the one that is found to be the most relevant to the launches at the ESC is the wind, which is the focus of this work. The discussion above is also relevant to balloon launches that, as stated by Wetzel et al. (1995), are mostly sensitive to the vertical wind shear near the surface and to the vertical temperature lapse rate (Boatman 1974).
The Weather Research and Forecasting (WRF; Skamarock et al. 2008) Model is used in this study. WRF is a fully compressible, nonhydrostatic model that uses a terrain-following hydrostatic pressure-based coordinate in the vertical and Arakawa C-grid staggering for horizontal discretization. It is a community model that has been used in a wide variety of applications, including coupled-model applications (Hogrefe et al. 2015), idealized simulations (Steele et al. 2013), and boundary layer research (Banks et al. 2016). Here, WRF is used to downscale 5-day forecasts by the Global Forecast System (GFS; Sun et al. 2010) for the ESC. This study has two goals: 1) test different model configurations and determine the one that gives the most skillful wind forecasts for use in subsequent simulations and 2) check whether the WRF wind forecasts can be used for go/no-go decisions for the two most commonly launched vehicles at the ESC.
This manuscript is divided into six sections. In section 2, details about the model setup and methods used are given. A summary of the observational platforms and sensors available at the ESC is presented in section 3. The results of the model experiments are discussed in section 4, while in section 5 the possible use of WRF data to make go/no-go decisions for the launch of the VSB-30 and Improved Orion rockets is investigated. The main conclusions are outlined in section 6.

Experimental setup
In this study, version 3.7.1 of the WRF Model is forced with 3-hourly forecast data from the GFS. This dataset is available online in near-real time (http://www.nco.ncep.noaa.gov/pmb/products/gfs/) in a format that can be readily ingested into WRF without any need for postprocessing. The forecast dataset used for the experiments presented here is taken from the archive (https://www.ncdc.noaa.gov/data-access/model-data/model-datasets/global-forcast-system-gfs). The model is run in a one-way nesting configuration for two periods during the summer (0000 UTC 8 July-0000 UTC 13 July and 0000 UTC 24 August-0000 UTC 29 August 2016), winter (0000 UTC 30 November-0000 UTC 5 December 2016 and 0000 UTC 19 December-0000 UTC 24 December 2016), and transition (0000 UTC 27 September-0000 UTC 2 October 2016 and 0000 UTC 16 April-0000 UTC 21 April 2017) seasons. The GFS forecast data used to force the model are initialized at the beginning of each 5-day simulation and have a spatial resolution of 0.5° × 0.5°. Figure 1 shows the model domains used in this work. The outermost grid covers most of northern Europe and the adjacent Atlantic and Arctic Oceans and is at a resolution of 27 km. Grids 2-4 are over northern Scandinavia, with the innermost grid centered over the ESC at a horizontal resolution of 1 km. WRF has been found to perform well in very-high-resolution (microscale) runs of up to a few meters (e.g., Aitken et al. 2014; Chu et al. 2014) and so is suitable for this work. For some of the experiments WRF was run with a fifth grid (spatial resolution of ~333 m), which yielded similar results to those of the 1-km grid (not shown). This is in line with Deb et al. (2016), who ran the model over Antarctica and found little sensitivity to the horizontal resolution at inland sites beyond 15 km. In the vertical, 60 levels concentrated in the PBL are used with the model top at 30 hPa. About 30 of those levels are located in the lowest 1 km with the first model level at ~11 m.
In the outermost grid, analysis nudging toward the GFS data is employed with the potential temperature perturbation and horizontal wind components relaxed in the upper troposphere and stratosphere whereas the water vapor mixing ratio is nudged from the lower troposphere above the boundary layer to the upper troposphere. All fields are nudged on a time scale of 1 h. Even though interior nudging is applied to the outermost grid to prevent the large-scale fields from diverging strongly from those of the GFS forecast data, experimentation has revealed that similar results are obtained if no interior nudging is employed (not shown), which is not surprising as the simulations presented here are for a very short (5 days) period of time. The model output is stored every 3 h for the first two grids, 1 h for the third grid, and 10 min for the innermost nest. With this configuration, a 5-day run with 96 central processing units (CPUs) at the High Performance Computing Center North Abisko cluster takes less than 1.5 days to finish, therefore allowing the model forecasts to be available well in advance of a scheduled event. As the preflight meeting at the ESC takes place 2 days before launch, a 5-day run makes sense because the forecasts can be made available for that discussion, which will allow for better planning of the event. The final decision on whether to go ahead or postpone a launch is generally taken the day before a planned launch date.
The WRF version used here contains most of the improvements made in the polar-optimized version of the WRF Model (Polar WRF; Hines and Bromwich 2008). The physical parameterizations used include the Goddard six-class microphysics scheme (Tao et al. 1989), the four-layer Noah land surface model (Chen and Dudhia 2001), and the Rapid Radiative Transfer Model for GCMs (RRTMG) for both short- and longwave radiation (Iacono et al. 2008). In the latter, a climatological aerosol distribution based on Tegen et al. (1997) is applied. Cumulus convection is parameterized in the model with the Betts-Miller-Janjić (BMJ) scheme (Janjić 1994). To account for the cumulus cloud-radiation feedbacks, a precipitating convective cloud scheme developed for the BMJ scheme (Koh and Fonseca 2016) is employed, with the radiation scheme called every 5 min. The cumulus scheme is switched off in the two innermost grids whereas slope and shading effects on the surface solar radiation flux are added in the innermost nest. Three PBL schemes are considered: Yonsei University (YSU; Hong et al. 2006), Asymmetric Convective Model 2 (ACM2; Pleim 2007a,b), and Mellor-Yamada-Janjić (MYJ; Janjić 1990, 1994). These schemes are tied to the Monin-Obukhov surface layer parameterization (Monin and Obukhov 1954). A simple interactive prognostic scheme for the sea surface skin temperature (SSKT) based on Zeng and Beljaars (2005), which takes into account the effects of the sensible, latent, and radiative fluxes, as well as molecular diffusion and turbulent mixing, is added to the model to capture the diurnal variation of the SSKT and allow its feedback to the atmosphere. The lower boundary condition to the SSKT scheme comes from the 3-hourly SST data from the GFS, linearly interpolated in time in order to have a continuously varying forcing on the skin layer.
The fractional sea ice coverage is also read in every 3 h from the GFS forecast data but, as these data do not include sea ice thickness, a default thickness has to be defined for the cold season experiments. The ice thickness is typically about 50 cm in the Gulf of Bothnia (Leppäranta and Seinä 1985) and up to 4 m in the Arctic Ocean region included in the outermost model grid (Bourke and Garrett 1987). As the thickest ice is located near the northern and western sides of the 27-km grid, far away from the region of interest, a default sea ice thickness of 1 m is used. The sea ice albedo is a function of air temperature, skin temperature, and snow (Mills 2011). Gravitational settling of cloud drops in the atmosphere is parameterized as described by Duynkerke (1991) and Nakanishi (2000), whereas cloud water (fog) deposition onto the surface due to turbulent exchange and gravitational settling is treated using the simple Fog Deposition Estimation (FogDES) scheme (Katata et al. 2008, 2011). In addition, in all WRF simulations nudging is applied at the lateral boundaries over a nine-gridpoint transition zone. Rayleigh damping is also applied in the top 5 km to the wind components and potential temperature on a time scale of 5 s (Skamarock et al. 2008).
The PBL schemes used in this study comprise one nonlocal (YSU), one local (MYJ), and one hybrid local-nonlocal (ACM2) scheme. Local schemes assume that the size of the turbulent eddies is smaller than the vertical grid spacing of the model. In these schemes, only vertical levels adjacent to a given grid point directly affect the variables at that location. Conversely, in nonlocal schemes, multiple vertical levels are considered. The idea behind them is that larger-scale eddies can transport fluid over some distance before it is mixed by smaller-scale eddies.
While local schemes are known to have problems with localized stability maxima, nonlocal PBL schemes have a tendency to overmix, which can result in a convective boundary layer being too deep, warm, and dry. The ACM2 features local and nonlocal upward mixing and local downward mixing, with the nonlocal transport shut off for stable or neutral flows. For some of the experiments, the quasi-normal scale elimination (QNSE; Sukoriansky et al. 2005) scheme, a local scheme like the MYJ but that uses a new theory for stably stratified environments, is tested. It is concluded that while it works better than the MYJ during the winter season, it generally gives the lowest skill scores when compared to the YSU, MYJ, and ACM2 schemes in the summer season and hence is not considered here. A full description of these PBL schemes, together with their main advantages and disadvantages, is given by Cohen et al. (2015) and Banks et al. (2016).
The model performance is assessed with the verification diagnostics proposed by Koh et al. (2012). They include the model bias, normalized bias μ, correlation ρ, variance similarity η, and normalized error variance α, as defined in the appendix. The bias is defined as the mean discrepancy between the model and observations while the normalized bias is given by the bias divided by the standard deviation of the discrepancy between the model and observations. The correlation is a measure of the phase agreement between the model and observations. The variance similarity is an indication of how the signal amplitude given by the model agrees with that observed and is defined as the ratio of the geometric mean to the arithmetic mean of the modeled and observed variances. The normalized error variance is the variance of the error arising from the disagreements in phase and amplitude, normalized by the combined modeled and observed signal variances. For vector variables, two additional diagnostics are considered that give information about the error ellipse: the symmetrized eccentricity ε_s and the preferred direction of the vector pattern errors θ. The former gives information about the anisotropy of the vector pattern errors, ranging from 0 for isotropy to 1 for maximum possible anisotropy (i.e., the vector errors are aligned in a straight line). The orientation θ represents a tendency for the random error to align in that direction. The best performance corresponds to zero bias and normalized bias, and zero α, which requires both ρ and η to be equal to 1 as the three diagnostics are related by the identity below:

$$\alpha = 1 - \rho\,\eta. \qquad (1)$$

The root-mean-square error (RMSE) is determined by the normalized error variance and normalized bias as follows:

$$\mathrm{RMSE}^2 = \alpha\,(1 + \mu^2)\,(\sigma_O^2 + \sigma_F^2), \qquad (2)$$

where $\sigma_O^2$ and $\sigma_F^2$ refer to the variance of the observation and model forecast, respectively. Further details about these diagnostics can be found in the appendix.
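The diagnostics described above can be sketched for a scalar variable as follows. This is a minimal illustration that assumes population (1/N) variances and takes the definitions literally from the text: normalized bias as bias over the standard deviation of the discrepancy, variance similarity as the geometric over the arithmetic mean of the two variances, and normalized error variance as the discrepancy variance over the sum of the two variances. The function name and return layout are hypothetical, and the vector-wind version used in this work would need the vector generalization given by Koh et al. (2012).

```python
import math


def _mean(x):
    return sum(x) / len(x)


def _variance(x):
    m = _mean(x)
    return sum((v - m) ** 2 for v in x) / len(x)  # population variance


def diagnostics(forecast, observed):
    """Bias, mu, rho, eta, alpha and RMSE for paired scalar series."""
    d = [f - o for f, o in zip(forecast, observed)]  # discrepancy
    bias = _mean(d)
    var_f, var_o, var_d = _variance(forecast), _variance(observed), _variance(d)
    mf, mo = _mean(forecast), _mean(observed)
    cov = _mean([(f - mf) * (o - mo) for f, o in zip(forecast, observed)])
    rho = cov / math.sqrt(var_f * var_o)                       # phase agreement
    eta = math.sqrt(var_f * var_o) / (0.5 * (var_f + var_o))   # amplitude agreement
    alpha = var_d / (var_f + var_o)                            # normalized error variance
    mu = bias / math.sqrt(var_d)                               # normalized bias
    rmse = math.sqrt(_mean([x * x for x in d]))
    return {"bias": bias, "mu": mu, "rho": rho, "eta": eta,
            "alpha": alpha, "rmse": rmse, "var_f": var_f, "var_o": var_o}
```

Under these definitions the relation between α, ρ, and η, and the decomposition of the RMSE into α and μ, hold to rounding error, so they can serve as a consistency check on any implementation. The factor √(1 + μ²) also gives the relative inflation of the RMSE by the bias: about 12% for |μ| = 0.5 and about 80% for |μ| = 1.5.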
The main goal of the sensitivity experiments is to find the best PBL scheme, out of those considered, for use in future forecast runs. The main verification diagnostic used for this purpose is the normalized error variance a with the best PBL scheme being the one that gives the lowest values of a.
The potential use of the WRF forecasts for planned launches will be quantitatively assessed using the probability of detection (POD), false alarm rate (FAR), and critical success index (CSI) scores defined in Schaefer (1990). The POD is defined as the ratio of the number of hits (i.e., events that are correctly forecasted by the model) to the total number of events (which includes hits and misses, with the latter defined as the number of actual events that are not forecasted) and gives the fraction of actual events that are successfully predicted by the model. The FAR is the ratio of the number of false alarms (i.e., unsuccessful positive forecasts) to the total number of positive forecasts (sum of hits and false alarms) expressing the fraction of the model forecasts that turn out not to be correct. The CSI, also denoted as the ratio of verification, is the ratio of the number of hits to the total number of hits, misses, and false alarms, giving the ratio of the number of correct forecasts to the total number of forecasts that were either made or needed. These scores are defined in the appendix. They are normally expressed in percentages with perfect scores of 100% for POD and CSI and 0% for FAR.
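The three scores can be computed directly from paired yes/no series, as in the short sketch below. The function name and the use of booleans (True meaning that the event is forecast or observed) are illustrative assumptions; the formulas follow the definitions in the paragraph above.

```python
def contingency_scores(forecast_events, observed_events):
    """Return (POD, FAR, CSI) in percent from paired boolean series."""
    pairs = list(zip(forecast_events, observed_events))
    hits = sum(1 for f, o in pairs if f and o)
    misses = sum(1 for f, o in pairs if (not f) and o)
    false_alarms = sum(1 for f, o in pairs if f and (not o))
    nan = float("nan")
    # Each score is undefined when its denominator is zero.
    pod = 100.0 * hits / (hits + misses) if (hits + misses) else nan
    far = 100.0 * false_alarms / (hits + false_alarms) if (hits + false_alarms) else nan
    total = hits + misses + false_alarms
    csi = 100.0 * hits / total if total else nan
    return pod, far, csi
```

Correct negatives (no event forecast and none observed) do not enter any of the three scores, which is why they are not counted here.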

ESC observational network
At the ESC, the weather sensors are located on four platforms (BPN, BPW, RH, and WT) shown in Fig. 2. Table 1 shows the coordinates and ellipsoid heights of the platforms as well as a list of the weather sensors available on each of them, whereas in Table 2 the specifications of the sensors are given. A view of the BPW platform is presented in Fig. 2b. The distance between the RH and the balloon pads and between the balloon pads and the WT is ~0.9-1.1 km (corresponding to about one grid point in the innermost grid), whereas the two balloon pads are located in the same grid point of the 1-km domain as they are ~234 m apart.
As stated in Table 2, the wind sensor at the location of the BPW does not work well in cold weather conditions. As a result, whenever the sensor is not operating properly its measurements are discarded and not used for assessment. Because of missing data during some time periods, at each site and forecast day, a minimum of 50 (out of the possible 144 given the 10-min output frequency) data points are required for the diagnostics to be computed; otherwise, they will not be shown. Using a different threshold does not change the conclusions reached in this work.
To directly compare the WRF output with the observed measurements, the model's surface-layer scheme is modified to output the temperature and water vapor mixing ratio at 3 m and the horizontal wind components at 3.5 m above the surface. These values are extrapolated using the fields at the surface and the first model level, located at ~11 m above the surface, in the manner described by Jiménez et al. (2012). For the comparison with the WT measurements, the 3D winds given on model levels are interpolated to the required height levels. These fields are also interpolated onto a set of 29 pressure levels with increased vertical resolution just above the surface. The WRF grid point used for comparison is not chosen as the closest one to the location of the station. Instead, the low-level winds, defined as the winds at the pressure level just above the surface pressure, are bilinearly interpolated to the location of the station, with the reference grid point chosen to be the neighboring grid point that is upstream. This is particularly important for coastal stations, as onshore and offshore flows normally lead to very different weather conditions, but is also applied here.
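One possible reading of the upstream grid point selection is sketched below. It assumes that the station lies inside a unit grid cell with fractional coordinates (x, y), that (u, v) are the low-level wind components bilinearly interpolated to the station, and that "upstream" means the cell corner whose offset from the station is best aligned with the direction the wind is coming from. The function names and the alignment rule are illustrative assumptions, not the authors' implementation.

```python
import math


def bilinear(x, y, q00, q10, q01, q11):
    """Bilinear interpolation in a unit cell; x, y in [0, 1] from corner (0, 0)."""
    return (q00 * (1 - x) * (1 - y) + q10 * x * (1 - y)
            + q01 * (1 - x) * y + q11 * x * y)


def upstream_corner(x, y, u, v):
    """Corner (i, j) of the unit cell lying most nearly upwind of the station."""
    corners = [(0, 0), (1, 0), (0, 1), (1, 1)]
    speed = math.hypot(u, v)
    if speed == 0.0:
        # Calm wind: fall back to the nearest corner.
        return min(corners, key=lambda c: math.hypot(c[0] - x, c[1] - y))

    def alignment(c):
        dx, dy = c[0] - x, c[1] - y
        dist = math.hypot(dx, dy)
        if dist == 0.0:
            return 0.0  # station sits on this corner: neither up- nor downwind
        # Cosine of the angle between the corner offset and the upwind direction.
        return -(dx * u + dy * v) / (dist * speed)

    return max(corners, key=alignment)
```

In use, u and v at the station would first be obtained with `bilinear` from the four surrounding grid points, and the time series of the corner returned by `upstream_corner` would then be compared with the observations.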
It is important to note that while observed data are measured at a given point in space every 1-10 s and are consistent with the physical forcing, the 10-min WRF fields represent a spatial average over the area of a grid box and are based on the forcing resolved in the model. Hence, the model is not expected to simulate the high-frequency variability seen in observations, particularly for fields such as the wind. A common practice in the literature is to time average the observed wind data with a typical averaging time of 3 min (e.g., Koskela et al. 2001). For consistency, all observed fields used in this work are averaged over 3-min periods before being compared with the model data.
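The 3-min averaging can be sketched as a simple block average over consecutive windows. The alignment of the windows (here starting at multiples of 180 s) and the function name are assumptions, since the text does not specify how the averaging windows are positioned relative to the 10-min model output.

```python
def block_average(times_s, values, window_s=180.0):
    """Average samples into consecutive windows of length window_s.

    Returns a list of (window_start_in_seconds, mean_value), skipping
    windows that contain no samples.
    """
    sums = {}
    for t, v in zip(times_s, values):
        k = int(t // window_s)          # index of the window containing t
        s, n = sums.get(k, (0.0, 0))
        sums[k] = (s + v, n + 1)
    return [(k * window_s, s / n) for k, (s, n) in sorted(sums.items())]
```

For the wind, it is generally safer to average the u and v components separately with this function and then recompute speed and direction, because averaging directions directly fails across the 0°/360° boundary.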

Model results
In this section, the sensitivity experiments conducted to determine the best model configuration for subsequent forecast runs are discussed. In section 4a, the large-scale circulation for each of the six cases is presented. The evaluation of the model performance, using the verification diagnostics proposed by Koh et al. (2012), is given in section 4b.

a. Synoptic analysis of case studies
In Fig. 3, the sea level pressure and 10-m horizontal wind vectors for the two summer cases (at 0000 UTC 9-12 July and 25-28 August 2016) from the ERA-Interim reanalysis (Dee et al. 2011), GFS forecast data, and WRF outermost (27 km) grid are shown. In the July 2016 case, the weather conditions are dominated by two main areas of low pressure: one located over Finland on 9 and 10 July and another that moves in from the Atlantic into southern Scandinavia on 11 and 12 July. The former splits in two on 11 July, with one piece moving northwestward just off the northwestern coast of Norway and the other southeastward into northwestern parts of Russia. The main WRF biases are a slightly stronger area of low pressure to the north of Tromsø, Norway, on 11 July, giving a southwesterly flow at the ESC not seen in the reanalysis data, and a weaker area of low pressure coming in from the Atlantic on 12 July. These discrepancies with ERA-Interim are also seen in the GFS data but are more significant in the WRF fields, in particular during the latter part of the period. It is important to note that in the troposphere only the water vapor mixing ratio is nudged, while the low-level circulation in the interior of the domain is allowed to evolve freely. In the August 2016 case, the weak disturbance moving over the northern and central parts of Scandinavia on 25 and 26 August is captured by WRF and the GFS, but the second deeper storm that affects the region during the latter part of the period is significantly underpredicted by both, in particular on 27 August. In any case the WRF Model is able to capture the near-surface wind except on that day, when the  WRF winds are more southwesterly as opposed to southeasterly in ERA-Interim. Figure 4 is as in Fig. 3, but for the winter cases. The November 2016 case starts with an area of low pressure over Arctic Scandinavia that is stronger in WRF and GFS compared to the reanalysis data. 
This system moves eastward, slower in WRF and GFS, and is eventually replaced by an area of high pressure before another storm approaches the region on 4 December. The low-level flow at the ESC simulated by WRF generally agrees with that observed, although it has a tendency to be stronger. This period is characterized by predominantly northerly winds and cold-air advection at the ESC. By contrast, the mid-December 2016 case is dominated by a persistent southwesterly flow with a deep area of low pressure over the adjacent Atlantic waters and an area of high pressure to the south. The latter moved eastward on 22 and 23 December and the former moved northeastward with a strong low-level flow over northern Scandinavia, in particular on 23 December. The main WRF bias, also seen in the GFS, is a displacement of the area of low pressure closer to the coast of northern Norway, resulting in stronger near-surface winds at the ESC.
The large-scale circulation in the two transition season cases is given in Fig. 5. The first period, mid-April 2017, is mostly quiescent with an area of high pressure in control. The near-surface winds are rather weak and blow predominantly from the west and southwest, which the model does not capture in particular on 18 April when WRF predicts northwesterly winds at the ESC. In the last forecast day, however, a deep area of low pressure approaches from the northwest and the southwesterly flow intensifies, which WRF simulates. The other case considered takes place in late September and early October 2016 and, in terms of the large-scale pattern, is the opposite of the first: areas of low pressure, one particularly deep, affect the weather conditions in northern Scandinavia with southerly winds at the beginning of the period gradually shifting to westerly and then to northwesterly. The strength of the system on 30 September is simulated by WRF but the low is displaced to the southwest, along the western coast of Norway, while in the GFS there are two centers: one where the storm is located in ERA-Interim and another where it is centered in WRF. As a result of these discrepancies, there are some disagreements between the modeled and observed near-surface winds in particular in the latter part of the period.
In conclusion, and as expected, the WRF Model captures the large-scale circulation in all six cases with the main discrepancy being an overestimation of the near-surface winds. WRF and GFS data are also more similar among themselves than with ERA-Interim, which is not surprising as WRF gets its initial and boundary conditions from the GFS.

b. Model evaluation
In this section the results of the model evaluation are presented. Figure 6 shows the α diagnostic for each site and forecast day for the six cases. To facilitate the comparison, the results for each scheme are plotted next to each other with the 95% confidence intervals shown as error bars and estimated using bootstrapping based on 4000 bootstrap samples. An inspection of Fig. 6 reveals that the ACM2 scheme generally gives the best scores. As will be shown in the next section, this scheme clearly outperforms the other two for the purpose of the launch of the VSB-30 and Improved Orion sounding rockets. As a result, the ACM2 scheme will be used in subsequent forecast runs. For a given forecast day and PBL scheme, the range of α values can be very large, at times exceeding 1. This indicates a significant spatial variability of the winds that the model, at its spatial resolution, is not capable of simulating. In any case, and for most sites and forecast days, α < 1, indicating that the WRF wind forecasts are practically useful.
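A percentile bootstrap for the confidence interval of α can be sketched as follows. The text does not state which bootstrap variant is used, so this sketch assumes simple resampling of forecast-observation pairs with replacement, which ignores the serial correlation of 10-min data; a block bootstrap may be more appropriate in practice. The function names are hypothetical, and α is computed here for a scalar series rather than the wind vector.

```python
import random


def alpha_score(forecast, observed):
    """Normalized error variance for paired scalar series (population variances)."""
    def var(x):
        m = sum(x) / len(x)
        return sum((v - m) ** 2 for v in x) / len(x)
    d = [f - o for f, o in zip(forecast, observed)]
    denom = var(forecast) + var(observed)
    return var(d) / denom if denom > 0.0 else None  # None flags a degenerate sample


def bootstrap_ci(forecast, observed, n_boot=4000, level=0.95, seed=0):
    """Percentile confidence interval for alpha from n_boot paired resamples."""
    rng = random.Random(seed)
    n = len(forecast)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample pairs with replacement
        a = alpha_score([forecast[i] for i in idx], [observed[i] for i in idx])
        if a is not None:
            stats.append(a)
    stats.sort()
    lo = stats[int((1.0 - level) / 2.0 * len(stats))]
    hi = stats[max(int((1.0 + level) / 2.0 * len(stats)) - 1, 0)]
    return lo, hi
```

The fixed seed only makes the sketch reproducible; the interval width depends on the number of valid data points, which is one reason a minimum sample size per site and forecast day is useful.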
In the subsequent discussion, only the ACM2 experiments are considered. Figure 7 shows the correlation-similarity diagram (Koh et al. 2012) for the horizontal wind vector. Two features stand out: for most sites and seasons α < 1, indicating good model performance, and most data points lie within the |ρ/η| < 1 circles, meaning that phase errors dominate over amplitude errors. As the wind variability at the ESC is mostly controlled by the passage of transient baroclinic systems, the lower ρ values when compared to η values indicate that the errors in the timing and location of these systems prevail over the intensity errors. The rather low values of η for the November 2016 case occur in the first and last forecast days when the near-surface wind is particularly strong and indicate that the observed wind variability is not well captured by the model. Figure 8 shows the error decomposition diagram for the horizontal wind vector. As stated in Koh et al. (2012), and deduced from Eq. (2), when the absolute value of μ does not exceed 0.5, the contribution of the bias to the RMSE is less than ~10% and the biases can be considered not significant when compared to the error variance. In the case of the wind vector, and for most seasons and sites, μ is large, mostly in the range from 0.5 to 1.5, indicating that the contribution of the bias to the RMSE varies from ~10% to 80%. The largest values of μ occur during the two winter periods, in particular at the balloon pads on 21 December when the WRF-predicted wind speed exceeds that observed by up to 10 m s⁻¹ (not shown). The WRF Model has been found to underperform during the cold season in the Arctic (e.g., Kilpeläinen et al. 2011, 2012) and Antarctic (e.g., Tastula et al. 2012) regions. Figure 9 shows the error anisotropy diagram. The vector pattern errors are generally anisotropic (ε_s >
0.2) with the wind errors tending to align along the east-west axis with a spread up toward the northeast-southwest and southeast-northwest directions. Regarding the interpretation, if θ is east-west, it means that, after correcting for the model bias, easterly-westerly winds tend to be modeled with the wrong magnitude more than with the wrong direction, with the opposite being true for the southerly-northerly direction. Figure 9 suggests that the inaccurate day-to-day positioning of the midlatitude zonal average jet stream is a possible reason for the observed axial preference, as a too strong (weak) westerly jet would lead to westerly (easterly) wind vector pattern errors.

FIG. 6. Normalized error variance α for the BPN, BPW, RH, and WT (six vertical levels) horizontal wind vectors. The scores for the YSU, MYJ, and ACM2 PBL schemes are shown in the blue, green, and red circles, respectively. Shown are results for the (top) two summer seasons considered, (middle) two winter seasons, and (bottom) two transition seasons for which WRF is run. The 95% confidence intervals for each scheme, shown as error bars, are estimated using bootstrapping based on 4000 bootstrap samples.

Figure 10 shows the correlation-similarity diagram for the remaining variables for which observations are available: temperature at the BPW and RH sites (circles), as well as relative humidity (triangles) and surface pressure (squares) at the BPW site. As is the case for the winds, phase errors largely dominate over amplitude errors as most of the data points lie within the |ρ/η| < 1 circles. The larger η values indicate that subgrid-scale variations for these fields are not as important. Overall, the WRF performance for these fields is superior to that of the winds: a majority of the data points are found near the bottom of the plot, with ρ values in excess of 0.8 and η values in excess of 0.9, which results in α values less than ~0.3. In the error decomposition diagram (Fig. 11) most data points lie within |μ| < 1, and hence the contribution of the bias to the RMSE for these fields is much less than that of the wind, with most of the biases considered not significant. There are, however, a few rather large normalized biases, in particular one in the November 2016 case for which μ is close to 8. An analysis of the WRF output showed that these scores occur on 3 December at the BPW site, with the WRF-predicted temperature not dropping below −10°C while that observed is as low as −31°C (not shown). The surface skin temperature in the model dropped to −19°C and so was closer to, but still higher than, that observed. Experimentation has shown that adding one further nest (~333 m) does not alleviate the problem (not shown). As these discrepancies are not seen at the RH site, they are likely related to local topography. As seen in Fig. 2 and Table 1, the balloon pads are located at a lower elevation compared to the hill where the radar sensors are found, meaning that theoretically a cold-air pool can form in the area. This is confirmed to be the case and is seen at other times of the year, with the difference between the temperatures at the RH and BPW sites being as high as 30°C (not shown). Taking a temperature difference of 30°C and using the elevation of the sensors given in Table 1, the maximum lapse rate is ~160°C km⁻¹.
Although this magnitude is rather large, steeper lapse rates have been observed elsewhere, such as in Kevo Valley (Finnish Lapland), where the largest magnitude lapse rate observed during the period February 2006-07 was 500°C km⁻¹ (Pepin et al. 2009). The static fields used in the experiments are carefully interpolated from a 30″ (~930 m) dataset, the highest resolution available online on WRF's website. There is a need to use even higher-resolution datasets for the model to properly represent the observed atmospheric flow at very small spatial scales. Figures 12 and 13 show the POD, CSI, and FAR scores for each PBL scheme and forecast day for two of the most commonly launched vehicles at the ESC: the sounding rockets VSB-30 and Improved Orion. The scores are obtained by applying the wind speed and direction criteria stated in section 1 (for VSB-30 the maximum wind speed and direction variations are 1.8 m s⁻¹ and 25°, respectively, and for Improved Orion they are 2.7 m s⁻¹ and 65°) to the 10-min WRF and observed data. As these thresholds are for the 6-min time window from when the launch settings are configured to the actual launch event, the 10-min window considered here is more restrictive and gives more conservative values. For each 10-min interval the criteria are applied to the four sites (RH, BPW, BPN, and the six vertical levels of the WT) separately to generate the corresponding POD, FAR, and CSI scores. The higher scores for the Improved Orion rocket are consistent with the less restrictive criteria for the maximum wind speed and direction shifts for this vehicle.
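The application of the launch criteria to 10-min data can be illustrated with a minimal Python sketch. It assumes that the "variation" is the absolute change between consecutive 10-min samples, which the text does not state explicitly; the function name `favorable_intervals` and the sample values are hypothetical, and only the thresholds are taken from the text.

```python
import numpy as np

def favorable_intervals(speed, direction_deg, max_speed_change, max_dir_change):
    """Flag each 10-min interval whose wind changes stay within the limits.

    Assumption: "variation" is the absolute change between consecutive samples.
    """
    speed = np.asarray(speed, dtype=float)
    direction = np.asarray(direction_deg, dtype=float)
    d_speed = np.abs(np.diff(speed))
    # Wrap direction differences into [-180, 180) so that 350 -> 10 counts as 20 degrees
    d_dir = np.abs((np.diff(direction) + 180.0) % 360.0 - 180.0)
    return (d_speed <= max_speed_change) & (d_dir <= max_dir_change)

# Thresholds quoted in the text: (wind speed in m/s, wind direction in degrees)
CRITERIA = {"VSB-30": (1.8, 25.0), "Improved Orion": (2.7, 65.0)}

# Hypothetical 10-min series at a single site (illustrative values only)
speed = [4.0, 5.0, 7.5, 7.0]
direction = [350.0, 10.0, 40.0, 100.0]

flags = {name: favorable_intervals(speed, direction, *limits)
         for name, limits in CRITERIA.items()}
for name, ok in flags.items():
    print(name, ok.tolist())
```

Running the same function on the forecast and on the observed series gives two Boolean sequences per site, which are the inputs needed for the POD, FAR, and CSI scores.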

Launch criteria for sounding rockets
The ACM2 scheme consistently gives the most accurate forecasts while the MYJ is generally the worst-performing scheme. For all seasons and forecast days, the PODs are in excess of 60% for the VSB-30 and 85% for the Improved Orion rocket. This means that WRF generates a successful forecast in about two-thirds or more of the cases in which conditions are favorable for a launch. For the VSB-30, the FARs are generally below ~60% for the summer and transition seasons but reach ~75% for the winter periods. This indicates that, for this vehicle, up to three-quarters of the time when the model predicts good launch conditions they turn out not to be favorable. As seen in Fig. 13, these values are much lower for the Improved Orion rocket, not exceeding ~45%. The lowest CSIs obtained for the VSB-30 are ~20% for the last forecast day in the summer season and the first forecast day in the winter season, while for the Improved Orion rocket the CSIs are above 50% for all seasons.
Overall, the scores for the winter periods are found to be lower, showing a larger spread. As seen in Fig. 4, the winter periods are characterized by strong near-surface winds. As the launch criteria for these vehicles are tied to the temporal variability of the horizontal wind vector, lower scores are expected, consistent with the smaller η values shown in Fig. 7. As far as the variability of the scores during the 5-day forecast is concerned, for the summer periods there is a general deterioration with forecast time. This is expected as the GFS forecast data, used to generate the initial and boundary conditions for the WRF runs, start to deviate more strongly from the reanalysis dataset, as seen in Fig. 3. However, for the winter season there is a general improvement in the scores from days 1 to 3, followed by the expected deterioration in the later forecast days. This increase in skill is unlikely to be due to a more favorable large-scale pattern and probably arises from an improved model performance. An analysis of the reasons behind such an improvement is beyond the scope of this study. For the transition seasons, the scores do not show much variability during the forecast period.

FIG. 10. As in Fig. 7, but for the temperature (circles), relative humidity (triangles), and surface pressure (squares). The diagnostics for temperature are shown for the BPW and RH sites, and those for relative humidity and pressure are for the BPW site only.

FIG. 11. As in Fig. 8, but for the temperature (circles), relative humidity (triangles), and surface pressure (squares). The diagnostics for temperature are shown for the BPW and RH sites, and those for relative humidity and pressure are for the BPW site only.

FIG. 12. POD, FAR, and CSI for the VSB-30 sounding rocket for the two (top) summer, (middle) winter, and (bottom) transition seasons considered. The launch criteria are applied separately at the BPN, BPW, RH, and WT (six vertical levels) sites. The scores for the YSU, MYJ, and ACM2 PBL schemes are shown in the blue, green, and red circles, respectively. The 95% confidence intervals for each scheme, shown as error bars, are estimated using a bootstrapping approach based on 4000 bootstrap samples. The perfect score corresponds to 100% for POD and CSI and 0% for FAR.
In conclusion, the WRF Model can be used for go/no-go decisions for the launches of these two sounding rockets for up to 5 days, with the ACM2 scheme giving the best scores. Because of their lower temporal frequency, neither the reanalysis data nor the 3-hourly GFS forecast data used to force the model can be used to compute these three diagnostics for comparison with the WRF values.

Conclusions
In this paper, the WRF Model is used to generate wind forecasts for the ESC (~67.88°N, 21.05°E), where rockets and balloons are regularly launched with the aim of retrieving atmospheric data for meteorological and space studies. Out of the different factors that play an important role in rocket and balloon launches, which include lightning, temperature, wind, and turbulence, as discussed by Kingwell et al. (1991) and Wetzel et al. (1995), the one that is found to be most relevant to the ESC is the wind, and that is the focus of this work. The initial and boundary conditions for the model runs are taken from the 3-hourly GFS forecasts available online in near-real time. The model is run for six 5-day periods during the summer, winter, and transition seasons. At the ESC the preflight meeting takes place 2 days before a planned launch. As a 5-day simulation can be completed in less than 1.5 days at the Abisko HPC2N cluster with just 96 CPUs, the WRF forecasts can be made available for the preflight meeting and therefore can be helpful in the planning of the event. Such a short forecast latency time also allows for successive runs initialized at different times before a planned launch, which will help to gauge trends in the forecasts and provide further guidance for the go/no-go decision.
The model performance is evaluated using the suite of diagnostics proposed by Koh et al. (2012). These include the model bias, normalized bias μ, correlation ρ, variance similarity η, and normalized error variance α. The latter varies from 0 (optimal forecast) to 2 and is equal to 1 for a random forecast. A WRF forecast is deemed practically useful if α < 1. The ρ, η, and α diagnostics are nondimensional, symmetric with respect to the observations and forecasts, and can be applied to both scalar and vector variables, making them ideal for use in this study. For vector fields, two additional diagnostics are used that give information about the vector pattern errors: the symmetrized eccentricity ε_s and the preferred direction of the vector pattern errors θ.
Three PBL schemes are considered in this work: one local scheme (MYJ), one nonlocal scheme (YSU), and one hybrid local-nonlocal scheme (ACM2). A comparison of the α values for the different experiments and forecast days reveals that the ACM2 scheme generally gives the best scores. The range of values obtained for the normalized error variance can be rather large, with this spread indicating a pronounced spatial variability of the winds that the model, at its spatial resolution, is not able to capture. For the ACM2 simulations, further analysis is conducted using the three diagrams proposed by Koh et al. (2012): the correlation-similarity diagram, the error decomposition diagram, and the error anisotropy diagram. It is concluded that phase errors dominate over amplitude errors, meaning that more effort has to be put into improving the timing and location of the baroclinic systems that affect the region year-round than into improving their intensity. In general, the model biases contribute significantly to the RMSE when compared to the error variance, and the anisotropy of the wind error variance is generally large, with the preferred direction of the vector errors lying along the east-west axis with a spread toward the northeast-southwest and southeast-northwest axes. A possible explanation for the observed axial preference is the inaccurate representation of the day-to-day position of the midlatitude zonal average jet stream.
Even though the focus of this work is on the horizontal wind vector, a similar analysis is conducted for the other fields for which observational data are available, namely the air temperature, relative humidity, and surface pressure. For these fields the model performance is much better than for the wind, with phase errors also dominating over amplitude errors. The contribution of the biases to the RMSE is generally small but for some sites, and mostly in the cold season, it can be large. These discrepancies arise from an incorrect representation of the local topography and associated cold-air pooling that the model, at its spatial resolution, is not able to simulate.
The utility of the WRF forecasts for actual launches is tested by applying the launch criteria to two of the most common vehicles launched at the ESC: the sounding rockets VSB-30 and Improved Orion. For all seasons and forecast days, and with the ACM2 scheme that is found to give the best performance, the PODs are in excess of 60% for the VSB-30 and 85% for the Improved Orion, with FARs generally below ~60% for the former and ~45% for the latter. It is concluded that WRF, in its present configuration, can be used for go/no-go decisions for the launches of these vehicles. Even though the focus of this work is on the ESC, the findings reached here are applicable to similar sites in the Arctic/Antarctic region where rockets and balloons are regularly launched such as in Barrow, Alaska, and the Svalbard archipelago.
Acknowledgments. This work is partly funded by the Swedish National Space Board (SNSB) through the NRFP-3 program and Luleå University of Technology (LTU). We are grateful to the High Performance Computing Center North (HPC2N) for providing the computer resources needed to perform the numerical experiments presented in this paper. We would like to acknowledge Martin Bysell, Klas Nehrman, Mikael Viertotak, and Per Baldemar from the Swedish Space Corporation (SSC), for their assistance and valuable discussions that helped to shape this paper. We would also like to thank three anonymous reviewers for their detailed and insightful comments and suggestions that helped to improve the quality of the paper.

Verification Diagnostics
The verification diagnostics used in this work are defined below:

$$\mathrm{BIAS} = \langle D \rangle = \langle F \rangle - \langle O \rangle , \qquad \text{(A2)}$$

In the equations above, D is the discrepancy between the model forecast F and the observations O, σ_X is the standard deviation of X, μ is the normalized bias, ρ is the correlation, η is the variance similarity, α is the normalized error variance, ε_s is a symmetrized measure of the eccentricity of the error ellipse, and θ is the preferred direction of the vector pattern errors. More information about these diagnostics can be found in Koh et al. (2012) and Koh and Ng (2009). The advantages of this set of diagnostics are highlighted below:

1) There is a systematic and complete breakdown of the RMSE into normalized bias and normalized error variance and the normalized error variance further into correlation and variance similarity.
2) Statistics are normalized on "absolute" scales, where universal reference values are located and comparison with which yields meaningful guidance for model improvement:
  2a) μ ≪ 1, where the bias contributes much less to RMSE than does the error variance and hence more effort should be placed on reducing the error variance;
  2b) α = 1, which is a random forecast based on the climatological mean and variance; note that α < 1 makes a model practically useful, and α > 1 means that the model is more likely wrong than right and hence gross modeling problems exist;
  2c) ρ/η < 1, where phase errors contribute more than amplitude errors in varying signals, implying a need to preferentially improve the phase agreement in the model.
3) Important conceptual characteristics of the diagnostics are observed:
  3a) There is invariance when the observations and model datasets are swapped. For example, η is superior to the fractional discrepancy (F − O)/O implied in Taylor diagrams where the standard deviation of observations is taken as a reference.
  3b) The vector nature of the wind error is respected and is not decomposed into its (Cartesian or polar) components, which are then incorrectly treated as scalars. Unlike scalars, the components of a vector are not invariant to coordinate transforms.
    3b.1) The invariant trace of the tensor variance is preferred to separate noninvariant variances of the u and v wind errors or of the magnitude of the wind error only.
    3b.2) The error information associated with the wind direction is fully and correctly captured by the error ellipse (two invariant parameters: ε_s and θ), and not incompletely and incorrectly by treating the direction of the wind error (one noninvariant parameter, angle) only.
    3b.3) The three error diagnostics of the wind vector provide a rigorous, consistent description of the tensor variance [Eq. (A7)] in all coordinates. In contrast, the treatment as two separate (Cartesian or polar) components cannot be mathematically related despite the fact that these wind components are not independently varying and depend on the orientation of the Cartesian axes or the origin of the polar coordinate system.
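The breakdown in item 1 and the reference values in item 2 can be illustrated for a scalar variable with a minimal Python sketch. The formulas below are an assumed reading of the scalar definitions in Koh et al. (2012), namely μ = ⟨D⟩/σ_D, η = 2σ_Fσ_O/(σ_F² + σ_O²), and α = σ_D²/(σ_F² + σ_O²) = 1 − ρη; they are not reproduced from the equations of this appendix, and the function names and sample data are hypothetical. The helper `bias_inflation` returns √(1 + μ²) − 1, the fractional increase of the RMSE over σ_D, which is one interpretation consistent with the ~10% (μ = 0.5) and ~80% (μ = 1.5) figures quoted in the text.

```python
import numpy as np

def scalar_diagnostics(forecast, observed):
    """Bias, mu, rho, eta, and alpha for a scalar series (assumed definitions)."""
    f = np.asarray(forecast, dtype=float)
    o = np.asarray(observed, dtype=float)
    d = f - o                                   # discrepancy D = F - O
    var_f, var_o, var_d = f.var(), o.var(), d.var()
    bias = d.mean()
    mu = bias / np.sqrt(var_d)                  # normalized bias
    cov = np.mean((f - f.mean()) * (o - o.mean()))
    rho = cov / np.sqrt(var_f * var_o)          # correlation
    eta = 2.0 * np.sqrt(var_f * var_o) / (var_f + var_o)  # variance similarity
    alpha = var_d / (var_f + var_o)             # normalized error variance
    return {"bias": bias, "mu": mu, "rho": rho, "eta": eta, "alpha": alpha}

def bias_inflation(mu):
    """Fractional increase of RMSE over sigma_D caused by the bias."""
    return float(np.sqrt(1.0 + mu ** 2) - 1.0)

# Hypothetical forecast and observed values (illustrative only)
forecast = [1.0, 2.0, 3.0, 4.0, 6.0]
observed = [0.0, 2.0, 2.0, 5.0, 4.0]
diag = scalar_diagnostics(forecast, observed)

# alpha equals 1 - rho * eta under these definitions
print(diag["alpha"], 1.0 - diag["rho"] * diag["eta"])
print(bias_inflation(0.5), bias_inflation(1.5))
```

Under these definitions α lies between 0 and 2 and equals 1 for uncorrelated forecasts and observations, in line with the reference values listed in item 2b.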
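In order to assess the usefulness of the WRF wind forecasts for the launch of the VSB-30 and Improved Orion sounding rockets, the following three diagnostics are considered:

$$\mathrm{POD} = \frac{\text{hits}}{\text{hits} + \text{misses}}, \quad \mathrm{FAR} = \frac{\text{false alarms}}{\text{hits} + \text{false alarms}}, \quad \text{and} \quad \mathrm{CSI} = \frac{\text{hits}}{\text{hits} + \text{misses} + \text{false alarms}} .$$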
In the equations above, POD is the probability of detection, FAR is the false alarm ratio, and CSI is the critical success index; hits are the numbers of correctly forecasted events (true positives), misses are the numbers of actual events that were not predicted (false negatives), and false alarms are the numbers of predicted events that did not occur (false positives). More information about these scores can be found in Schaefer (1990).
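The three scores can be computed from paired forecast and observed event flags, as in the following minimal Python sketch. Here an "event" is taken to be an interval in which the launch criteria are met; the function name `contingency_scores` and the sample flags are hypothetical.

```python
import numpy as np

def contingency_scores(forecast_event, observed_event):
    """Return POD, FAR, and CSI from Boolean forecast/observed event flags."""
    f = np.asarray(forecast_event, dtype=bool)
    o = np.asarray(observed_event, dtype=bool)
    hits = int(np.sum(f & o))            # forecast yes, observed yes
    misses = int(np.sum(~f & o))         # forecast no, observed yes
    false_alarms = int(np.sum(f & ~o))   # forecast yes, observed no

    def ratio(num, den):
        # Undefined scores (empty denominator) are returned as NaN
        return num / den if den > 0 else float("nan")

    return {
        "POD": ratio(hits, hits + misses),
        "FAR": ratio(false_alarms, hits + false_alarms),
        "CSI": ratio(hits, hits + misses + false_alarms),
    }

# Hypothetical flags for six 10-min intervals (illustrative only)
forecast_ok = [True, True, False, True, False, True]
observed_ok = [True, False, True, True, False, True]
scores = contingency_scores(forecast_ok, observed_ok)
print(scores)
```

Correct negatives (forecast no, observed no) do not enter any of the three scores, so the results are unaffected by how often unfavorable conditions are correctly forecast.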