Increasing the quality of seismic interpretation

Geologic models are based on the interpretation of spatially sparse and limited resolution data sets. Nonunique interpretations often exist, resulting in commercial, safety, and environmental risks. We surveyed 444 experienced geoscientists to assess the validity of their interpretations of a seismic section for which multiple concepts honor the data. The most statistically influential factor in improving interpretation was writing about geologic time. A randomized controlled trial identified for the first time a significant causal link between being explicitly requested to describe the temporal geologic evolution of an interpretation and increased interpretation quality. These results have important implications for interpreting geologic data and communicating uncertainty in models.

University of Strathclyde, Department of Civil and Environmental Engineering, Glasgow, UK. E-mail: euan.j.macrae@gmail.com; zoe.shipton@strath.ac.uk; rebecca.lunn@strath.ac.uk.
University of Aberdeen, Geology and Petroleum Geology, Aberdeen, UK. E-mail: clare.bond@abdn.ac.uk.
Manuscript received by the Editor 21 December 2015; revised manuscript received 5 April 2016; published online 14 July 2016. This paper appears in Interpretation, Vol. 4, No. 3 (August 2016); p. T395–T402, 2 FIGS., 5 TABLES. http://dx.doi.org/10.1190/INT-2015-0218.1. © The Authors. Published by the Society of Exploration Geophysicists and the American Association of Petroleum Geologists. All article content, except where otherwise noted (including republished material), is licensed under a Creative Commons Attribution 4.0 Unported License (CC BY). See http://creativecommons.org/licenses/by/4.0/. Distribution or reproduction of this work in whole or in part commercially or noncommercially requires full attribution of the original publication, including its digital object identifier (DOI).
Special section: Building complex and realistic geologic models from sparse data


Introduction
Geology is an interpretation-based science (Frodeman, 1995). Waste disposal, carbon capture and storage, and hydrocarbon exploitation require an increasingly nuanced understanding of the subsurface. Subsurface understanding is often based on remotely sensed geophysical imagery, combined with spatially limited borehole data. Interpretation is required for the value of such data to be realized through the creation of geologic models.
Traditional quantification of uncertainty in geologic models considers a single "base case" conceptual interpretation, within which the locations of interpreted horizons and/or model parameters are represented probabilistically (Harbaugh and Bonham-Carter, 1970; Mann, 1993). An inherent weakness of this type of approach is that it only explores part of the uncertainty space (Chamberlin, 1890; Bentley and Smith, 2008): Only one geologic concept is considered, whereas many could honor the data (Bond et al., 2008). Bond et al. (2008) propose that multiple models should be created in an interpretation work flow and then assessed for validity against accepted geologic rules. These rules are determined by the geometric relationships and physical properties of geologic units and their evolution through time (Bond, 2015). Expert elicitation may be used to assess the likelihood of each interpretation (Polson and Curtis, 2010). However, there remains a need to improve work flows to increase the number of structurally valid seismic interpretations (Bond et al., 2012).

Factors affecting interpretation
We conducted a survey to quantify the factors affecting geoscience interpretation, targeting experts from academia and the oil and gas industry. Bond et al. (2007, 2012) use a synthetic seismic image constructed from a single forward-modeled geologic cross section. We built on this by using real seismic reflection data (Stewart, 2007) for which more than one interpretation was valid, and we increased the sample size and elicited more detailed information on individuals' training and experience. Critically, we then tested our findings to demonstrate that a change in work flow increased interpretation quality.
Four hundred and forty-four experienced geoscientists interpreted a real 2D seismic image by hand (Figure 1a). An accompanying questionnaire included 21 questions that captured respondents' backgrounds (i.e., education, work environments, and experience). The questionnaire and the resulting data sets used in the analyses are openly available (see Macrae et al., 2016). An experienced geoscientist was defined as one who was older than 21 years, had a university degree, had more than two years of experience after the completion of their highest degree, and had experience of seismic interpretation and structural geology. Structural geology experience was included as part of our definition of an experienced geoscientist because our previous research (Bond et al., 2007, 2012) indicated that it affected 2D seismic interpretation quality. To avoid influencing their interpretation, respondents were not told where the seismic data were from, and the only instruction was "please interpret the whole seismic image." The age and gender of respondents were analyzed for demographic representation by combining the 2009 membership lists of AAPG, AGU, EAGE, and Geological Society of London to serve as a proxy for the underlying geoscientist population. Compared with the pooled membership lists, our sample had less than a 13% absolute difference in each of the questionnaire's age categories and less than a 3% absolute difference in the gender categories.
Because the survey used real seismic data, there was no "correct" interpretation. To assess respondents' interpretations, they were compared with the interpretations of five reference experts (REs). The REs were contacted separately and asked to spend at least 30 min interpreting the seismic image without collaboration. The REs had a median of 24.5 years of experience since completion of their highest degree, came from different technical backgrounds, and were acknowledged leading experts in seismic interpretation, structural geology, sedimentology, and tectonics.
The response variable in the analysis was the similarity of respondents' interpretations to at least one of the REs' interpretations. The five REs were asked to provide key geologic features ("those geologic features that helped to define the tectonic setting and/or stratigraphic setting of the interpretation") that were integral to their interpretation (Figure 1b). The REs provided between six and nine key features each. All five RE interpretations differed from each other, but honored the data and were geologically valid, i.e., met accepted geologic rules. Interpretations ranged from a detached listric normal fault interpretation to a transtensional fault interpretation; all but one expert interpreted an extensional tectonic regime. All experts interpreted salt with a detachment horizon, but only four experts chose salt as a key feature, indicating disagreement on its importance relative to other parts of their interpretations.
Respondents' interpretations were scored against the key features, identified via visual inspection by author E. J. Macrae, with a sample independently verified by coauthors C. E. Bond and Z. K. Shipton. We define a high-quality interpretation as one with a high RE score; i.e., the respondent's interpretation captured most of the key features identified by one (or more) of the experts. During the analysis of respondents' interpretations, we did not detect any other type of interpretation that was geologically valid and not at least partially represented by the five RE interpretations. We do not consider that any one RE interpretation is best; we allow that each could be correct.
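The scoring rule described above can be illustrated with a minimal sketch. It assumes, as one plausible reading of the text, that the RE score is the largest number of key features matched against any single reference expert. The `re_score` helper and all feature labels are hypothetical placeholders and are not taken from the study data.

```python
from typing import Dict, Set


def re_score(found: Set[str], re_features: Dict[str, Set[str]]) -> int:
    """Best match count against any single reference expert (assumed rule)."""
    return max(len(found & features) for features in re_features.values())


# Hypothetical key-feature sets for two reference experts (placeholder labels).
reference_experts = {
    "RE_1": {"f1", "f2", "f3", "f4", "f5", "f6"},
    "RE_2": {"f2", "f5", "f7", "f8", "f9", "f10", "f11"},
}

# Features that a hypothetical respondent captured in their interpretation.
respondent = {"f2", "f5", "f7", "f8"}

score = re_score(respondent, reference_experts)
print(score)
```

Taking the maximum over experts, rather than pooling all features, reflects the statement that no single RE interpretation is treated as the best one.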
Respondents' questionnaire responses yielded 28 background factors that characterized their training and experience. A further 17 interpretational factors were derived from the techniques used to interpret the seismic image (Table 1). These factors captured whether or not respondents had each type of experience (or the amount of experience) and whether each technique had been used. Use of the techniques was determined via visual inspection. We statistically analyzed the data to determine which type of respondents (in terms of their backgrounds) performed best and what interpretational techniques were most effective. Multivariate ordinal logistic regression with the proportional odds model (McCullagh, 1980), a form of generalized linear modeling, was used to determine which factors (if any) were associated with high scores, indicating better interpretations. Factors were assessed against the response variable individually, and nonsignificant factors were not progressed to the multivariate analysis. The multivariate analysis allowed the influence of each factor to be assessed relative to the simultaneous influence of all other factors in the statistical model. The analysis started with the inclusion of all individually significant factors. Factors were then iteratively removed (one at each step) until only the significant factors were left, i.e., using a manually applied backward stepwise regression procedure (Draper et al., 1966).
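The backward stepwise procedure described above can be sketched generically as follows. This is an illustration under stated assumptions, not the authors' implementation: `fit_p_values` stands in for refitting the ordinal logistic regression on the current factor set, and the factor names and p-values in the toy example are hypothetical. In a real analysis the p-values would change at every refit.

```python
from typing import Callable, Dict, List


def backward_eliminate(
    factors: List[str],
    fit_p_values: Callable[[List[str]], Dict[str, float]],
    alpha: float = 0.05,
) -> List[str]:
    """Remove the least significant factor, one per step, until all p < alpha."""
    kept = list(factors)
    while kept:
        p_values = fit_p_values(kept)  # refit the model on the current factors
        worst = max(kept, key=lambda name: p_values[name])
        if p_values[worst] < alpha:
            break  # every remaining factor is significant
        kept.remove(worst)
    return kept


# Toy stand-in for a model fit: fixed, hypothetical p-values per factor.
TOY_P = {"factor_a": 0.001, "factor_b": 0.20, "factor_c": 0.04, "factor_d": 0.60}


def toy_fit(current: List[str]) -> Dict[str, float]:
    return {name: TOY_P[name] for name in current}


selected = backward_eliminate(list(TOY_P), toy_fit)
print(selected)
```

Passing the fitting step in as a function keeps the elimination loop independent of any particular statistics library.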
During the factor selection process, the statistical significance of each factor was determined by the p-value: a measure of the strength of the evidence provided in the data for a relationship between the response variable and the factor. In the multivariate analysis, we considered factors to be statistically significant when p-values were less than 0.05; smaller values were interpreted as being highly significant. Odds ratios, a measure of effect size, were then used to rank the factors in the final model. In our application, the odds ratio quantified, for each factor, how likely respondents in one category of the factor were to have higher scores than those respondents who were in another category. Factors with higher odds ratios had a greater positive effect on respondents' scores than factors with lower odds ratios. Confidence intervals were calculated for the odds ratios; these quantify the precision with which our results generalize to the underlying population, had other geoscientists been sampled.
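As a brief illustration of how an odds ratio and its confidence interval relate to a fitted regression coefficient, the sketch below assumes a Wald-type 95% interval on the log-odds scale, which is a common choice but is not stated in the text. The coefficient is back-calculated from the reported odds ratio of 4.46 for "geologic time"; the standard error of 0.4 is a hypothetical value chosen only for illustration.

```python
import math
from typing import Tuple


def odds_ratio_with_ci(beta: float, se: float, z: float = 1.96) -> Tuple[float, float, float]:
    """Convert a log-odds coefficient and its standard error to (OR, lower, upper)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


beta = math.log(4.46)       # log-odds implied by the reported odds ratio
hypothetical_se = 0.4       # assumed standard error, not from the paper

odds_ratio, lower, upper = odds_ratio_with_ci(beta, hypothetical_se)
print(f"OR = {odds_ratio:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

Because the interval is symmetric on the log scale, it is asymmetric around the odds ratio itself, and a smaller standard error gives a tighter interval.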

What affects interpretation quality?
The median number of key features identified by the 444 geoscientists was three (mean 2.83). The standard deviation of the distribution was 1.5 key features. One respondent achieved a score of eight.

What was the typical training and experience of respondents?
Respondents had a median of 10 years of experience after completion of their highest degree. The median number of geographical locations where respondents had worked was six, and the four most common specialist technical areas were seismic interpretation (47.7%), structural geology (46.6%), geophysics (32.2%), and stratigraphy (27.9%). Most respondents had experience in multiple work environments (e.g., academia, consultancy, oil and gas industry, or service companies) and specialist technical areas. Table 2 gives further information on respondents' education, experience, technical ability, and tectonic experience.

What were the typical interpretational techniques deployed?
Respondents used simpler techniques (e.g., marking stratigraphic horizons) more frequently than techniques that required substantial geologic reasoning (Table 1). Only five out of 444 respondents (1.1%) considered the geologic evolution of the section.

What leads to a high-quality seismic interpretation?
Four background factors and five interpretational techniques proved important in producing a high-quality interpretation. Of the background factors, the "level of experience in structural geology," "how often seismic images are interpreted or used," "background in a super-major or major oil company," and the "number of geographical locations" in which respondents had worked were all significant. The "length of time spent interpreting the seismic image" was not. Bond et al. (2012) also find experience in structural geology to be significant but did not collect data on the other significant factors.
The significant factors (Table 3) are ranked by their odds ratios. The most influential background factor was experience in structural geology: "specialists" were 3.25 times more likely to produce better interpretations than respondents with a "basic working knowledge," regardless of their backgrounds and the techniques used. To our knowledge, this is the first study to demonstrate that structural geologic experience is significant over and above experience in seismic interpretation.
In general, the interpretational techniques used (Table 3) were more influential in producing high-quality interpretations than respondents' background experience. The most influential technique was "geologic time." Regardless of the other significant background and technique factors, those respondents who wrote about geologic time were 4.46 times more likely to gain higher scores than those who did not. The next most influential technique was "drawing cartoons" that explained part of the interpretation, followed by writing about "geologic processes." The "justified interpretation" technique was not significant, indicating that generalized explanations were less beneficial; and the technique of "geologic evolution" was not significant, probably because so few respondents (five) used it. We defined "geologic time" to include local-scale features, such as the timing of a sedimentary package with respect to a fault, whereas "geologic evolution," a subset of geologic time, involved attempting (but not necessarily succeeding) to explain the evolution of the geology in the whole seismic image.
Although "geologic time" was identified as the most effective technique, it was not clear from this data set whether this technique caused geoscientists to produce better interpretations or whether it was just a natural consequence of being a good interpreter. This is important because if this technique produces better interpretations regardless of the individual, then current practice can be improved.
To investigate whether "geologic time" causes individuals to make better interpretations, four identical workshops were conducted. The experimental design was based on a randomized controlled clinical trial (Amberson, 1931). In total, 49 experienced geoscientists, who had not taken part in the survey, were recruited from four oil and gas companies. In each workshop, managers were asked to randomly allocate participants into two groups (a control group and a test group) and to keep the distributions of experience approximately equal while taking no other factors into account. The managers did not know the hypothesis being tested, and the geoscientists were told that they had been allocated randomly. All participants were given the same seismic image to interpret as the survey respondents, but unknown to the control group, the test group was given different written instructions. Because all other experimental factors were equal, the experiment tested for a causal link between these instructions and interpretational quality. A two-sample Poisson's rates statistical test (Przyborowski and Wilenski, 1940) was then used to determine whether the mean scores of the groups were significantly different.
The workshop control groups, which had a total of 24 participants and a median of 20.5 years of experience, were given the same instructions as the survey respondents, whereas the test group of 25 participants, which had a lower median of 14 years of experience, was instructed to: "interpret the whole seismic image. Please focus your interpretation on the geologic evolution of the section" and was asked to "summarize the geologic evolution below." All groups were given 35 min to complete the exercise; the median time taken for the control group was 30 min, whereas it was 22.5 min for the test group.
The workshops' results proved extremely strong; the quality of the two groups' interpretations was different (Figure 2). Despite having an average of 6.5 years less experience, the test group attained scores that were, on average, 62% higher than the control group (4.12/2.54 ≈ 1.62). The control group scores did not differ significantly from the survey respondents. The statistical test demonstrated that the 62% increase in mean score was highly significant (p = 0.002), thus establishing a causal link between "focusing on and stating" the geologic evolution and producing high-quality interpretations when compared with geoscientists who are not given any direction.
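The comparison of the two groups can be reproduced approximately from the summary figures quoted above. The sketch below uses the conditional (binomial) form of a two-sample Poisson rate test, which is one common formulation and may differ in detail from the authors' calculation. The total key-feature counts per group are assumptions, reconstructed by multiplying each reported mean by its group size and rounding; the function name is illustrative.

```python
from math import comb


def poisson_two_sample_p(count_a: int, n_a: int, count_b: int, n_b: int) -> float:
    """Two-sided conditional test that two Poisson rates are equal.

    Under the null hypothesis, count_a given the combined total follows a
    binomial distribution with success probability n_a / (n_a + n_b).
    """
    total = count_a + count_b
    p0 = n_a / (n_a + n_b)
    pmf = [comb(total, k) * p0**k * (1 - p0) ** (total - k) for k in range(total + 1)]
    observed = pmf[count_a]
    # Sum the probabilities of all outcomes no more likely than the observed one.
    return min(1.0, sum(p for p in pmf if p <= observed * (1 + 1e-9)))


# Group sizes and mean scores as reported; totals are reconstructed (assumed).
n_control, mean_control = 24, 2.54
n_test, mean_test = 25, 4.12
total_control = round(n_control * mean_control)
total_test = round(n_test * mean_test)

ratio = mean_test / mean_control
p_value = poisson_two_sample_p(total_test, n_test, total_control, n_control)
print(f"ratio of means = {ratio:.2f}, p = {p_value:.4f}")
```

Under these assumptions the p-value should fall well below 0.05, in line with the highly significant result reported, although the exact figure depends on the rounding of the means and on the two-sided convention used.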
To gain a deeper understanding of the consideration of geologic evolution during interpretation, qualitative data were collected from workshop participants via a postinterpretation questionnaire and by structured group discussions. Before the groups were told what hypothesis was being tested, all participants were individually asked in a questionnaire whether considering the geologic evolution was beneficial to producing a valid interpretation; all agreed, apart from five who did not provide a response, but 36 out of 43 indicated that they found considering the geologic evolution to be "challenging" (28) or "moderately challenging" (8). Even in the test group, who were explicitly instructed to write about the geologic evolution, only 14 of the 25 participants did, and in doing so, achieved a mean score of 4.71; the remaining 11 participants explained or described their interpretations instead and achieved a lower mean score of 3.36. This difference was not statistically significant, possibly due to the smaller sample size.
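The two subgroup means quoted above can be checked for consistency with the test group's overall mean of 4.12. The short sketch below assumes only that the reported means are rounded to two decimal places; the variable names are illustrative.

```python
# Test-group subgroups as reported: participants who wrote about the
# geologic evolution, and those who explained or described instead.
wrote_n, wrote_mean = 14, 4.71
other_n, other_mean = 11, 3.36

# Size-weighted mean of the two subgroups.
pooled_mean = (wrote_n * wrote_mean + other_n * other_mean) / (wrote_n + other_n)

# Difference between the subgroup means.
gap = wrote_mean - other_mean

print(f"pooled mean = {pooled_mean:.2f}, gap = {gap:.2f}")
```

The weighted mean agrees with the overall test-group mean to within rounding, which confirms that the two subgroups together account for all 25 participants.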
The managers we surveyed said that they prompted staff to consider the geologic evolution, and 90.9% of participants said it was part of their normal work flow. In the postinterpretation questionnaire, 83.3% of the control group stated that they considered the geologic evolution of their interpretation, but according to our definition only one had. This demonstrates that even though individuals might think they are considering the geologic evolution, it is the explicit process of having to write their concept down that leads to a better interpretation. Some workshop geoscientists believed they had insufficient time to consider the geologic evolution; however, the test group proved this was untrue given that they produced better interpretations than the control group in less than 23 min, despite (coincidentally) being less experienced.

Improving interpreter performance
The results show that even though experience is important, particularly in structural geology, interpretational techniques have the greatest impact on quality. We present statistical evidence that being instructed to "focus on and state" the geologic evolution causes geoscientists to produce better interpretations of seismic data. It is possible that some participants considered geologic evolution in their interpretation but left no evidence of doing so. The statistics, however, are clear: It is writing about geologic evolution that significantly increases quality when measured against experts. Our research implies that geoscientists should be required to draw sketches with written explanations to justify the geologic evolution of their interpretations. Not only does this lead to better interpretations, it also enables improved knowledge transfer between colleagues and allows future interpreters to understand the rationale for decisions. Consideration of geologic evolution may also mitigate overconfidence because the technique can be challenging to apply. Furthermore, multiple interpretations of the same data set can be tested, and as more data become available, some can be rejected.
Note: For each factor (comparing respondents in the left-most category against those in the right-most category), the p-value indicates the strength of the evidence provided in the data for a relationship between the response variable and this factor. The odds ratio quantifies the size of the effect, showing how much more likely respondents were to gain a higher score if they were in the left-most category. The 95% confidence intervals quantify the precision with which the odds ratios would generalize if different geoscientists had been sampled. Tighter intervals indicate a more precise estimate of the true unknown odds ratio in the underlying geoscientist population, which this sample estimates. Technique factor rows are shaded gray.
Our results support the theoretical proposition that effective seismic interpretation must be an investigation of structural evolutionary concepts involving geologic reasoning (Bond et al., 2015) rather than a simple stratigraphic correlation of faults and horizons. We recommend that companies conduct controlled trials, using more complex in-house 2D and 3D data sets, with mixed teams of geologists and seismic interpreters: This approach would allow for optimization of industry work flows and would enable identification of best practice within a specific commercial environment.

Conclusions
We surveyed 444 experienced academic and industrial geoscientists and statistically analyzed their interpretations of a 2D seismic image with respect to their experience, qualifications, and the interpretational techniques used. Building on previous research, but using real seismic data for which there is no correct interpretation, we show that explicit consideration of temporal structural evolution of a section is rare among geoscientists, but it is the most influential factor in improving interpretation quality. We go on to show, through the use of controlled trials, that if interpreters are explicitly asked to describe the geologic evolution of a section, they produce significantly better interpretations. Furthermore, not only will the incorporation of written descriptions of geologic evolution within industry work flows lead to better interpretations, it will also improve knowledge transfer between colleagues and allow future interpreters to understand the rationale for historical decisions.
Our findings have implications for the interpretation of all remotely sensed data sets in which the data are sparsely distributed (e.g., gravity, magnetics, resistivity, LiDAR, and photogrammetry) and for the creation of any interpretation-based models (e.g., geologic maps). New work flows for geologic interpretation and model building, focused on evolutionary thinking, should be introduced as standard procedure to increase interpretation quality and hence reduce commercial, safety, and environmental risks.

Figure 1. (a) The 2D seismic reflection image used in the survey, from the UK Central North Sea peripheral graben system (Blocks 20/20, 21/16 area), from Stewart (2007). The vertical axis is in two-way time and has an approximate 3× vertical exaggeration. (b) The key features chosen by two or more REs are annotated, whereas those chosen by only one are not included.

Figure 2. Scores attained in the workshop experiment by the control (black) and test (gray) groups of 24 and 25 participants, respectively. The 62% increase from the control group's mean (2.54) to the test group's (4.12) was statistically significant (p = 0.002).

Table 1. Definitions and usage rates of the interpretational techniques used by respondents.
Geologic evolution (changes in large-scale structure over time, e.g., sketches): 1.1%
Level of detail used (number of faults, horizons, and other features)

Table 2. Information about respondents.
Note: Number of respondents answering is in parentheses. Ph.D., doctor of philosophy.