Stable individual differences in strategies within, but not between, visual search tasks

A striking range of individual differences has recently been reported in three different visual search tasks. These differences in performance can be attributed to strategy, that is, the efficiency with which participants control their search to complete the task quickly and accurately. Here, we ask whether an individual's strategy and performance in one search task are correlated with how they perform in the other two. We tested 64 observers and found that even though the test-retest reliability of the tasks was high, an observer's performance and strategy in one task were not predictive of their behaviour in the other two. These results suggest search strategies are stable over time, but context-specific. To understand visual search, we therefore need to account not only for differences between individuals but also for how individuals interact with the search task and context.


Introduction
As is common in cognitive psychology, most visual search literature has focused on how the average participant performs in the task, despite it being well known that there is a great deal of variability between one subject and the next. From Treisman's work on Feature Integration Theory (Treisman & Gelade, 1980) to the latest incarnation of the Guided Search Model (Wolfe, Cain, Ehinger & Drew, 2015), we have a good understanding of what makes particular objects easier or harder to find. However, these theories and models have neglected the question of why some observers find visual search so much harder than others. These differences can emerge from several different sources of variation: tiredness (Mackworth, 1948), information-processing ability, speed-accuracy tradeoff, motivation, visual impairments (Nowakowska, Clarke, Sahraie & Hunt, 2016), and search strategies (Boot, Kramer, Becic, Wiegmann & Kubose, 2006). Although their existence has previously been noted (Mackworth, 1948; Clarke, Nowakowska & Hunt, 2019), a rigorous examination of individual differences in visual search is a challenge that has not been taken up by many researchers, and questions about their impact and stability remain relatively underexplored.
Here we focus on one source of individual differences in visual search: strategy. By strategy we mean a collection of search behaviours from which all observers can freely choose. Examples include adopting a systematic left-to-right and top-to-bottom strategy (Gilchrist & Harvey, 2006), or prioritizing locations that, based on knowledge or context, are more likely to contain the target (Wolfe, Cain, Ehinger & Drew, 2015). A striking example of the effect of strategy is given by Boot, Kramer, Becic, Wiegmann and Kubose (2006). They asked participants to monitor a cluttered display for an object changing colour or suddenly appearing. Large individual differences were found with respect to the number of saccades participants made while monitoring the stimulus, which was negatively correlated with detection performance.
Eye movement strategies have also been shown to be an important source of individual differences in visual search efficiency. Nowakowska, Clarke & Hunt (2017) designed a simple search paradigm to discriminate between optimal (Najemnik & Geisler, 2008) and stochastic (Clarke, Green, Chantler & Hunt, 2016) search strategies. Participants searched through arrays of line segments (Figure 1) arranged such that those on one side of the display all had a very similar orientation (homogeneous), while those on the other side had higher variance (heterogeneous). This meant that targets appearing on the homogeneous side were highly salient, while targets on the heterogeneous side were harder to find. The optimal strategy here is to search the heterogeneous half, as targets on the homogeneous side can be detected with peripheral vision. We will refer to this paradigm as the Split-Half Line Segment task (SHLS). Some participants searched the displays near optimally, but others carried out strategies counter to this, failing to even match the performance of the stochastic searcher. The degree to which participants made saccades in line with the optimal search strategy was strongly correlated with the speed of their search. A related version of this paradigm has been used in research investigating eye movement strategies in response to (simulated) hemianopia (Nowakowska, Clarke, Sahraie & Hunt, 2016), with similar conclusions: the full spectrum of individual differences in strategy was observed. It is therefore not possible to conclude whether optimal or stochastic models better describe search without first explaining individual variability. A similar range of strategies, from random to near-optimal, has been found by Irons and Leber (2016) with the Adaptive Choice Visual Search (ACVS) paradigm.
This paradigm involves stimuli made up of small coloured boxes (red, blue, green and a fourth colour that varies from red, through purple, to blue and back again) with numerals written inside them (Figure 1). The target is defined as a red or blue box containing one of four numerals (e.g., 2-5), and on each trial one target of each colour is present. The participant's task is to find either target as quickly as possible and report the numeral. On trials in which the fourth colour is red (or close to red), participants should search through the blue boxes and report the blue target, as there will be fewer distractors. As the fourth colour changes through purple to blue, participants should update their strategy and search for the red target. The results showed that participants varied substantially along two key dimensions: how frequently they used the more effective target colour to search (varying from chance performance to near optimal), and how often they changed between colours. Further work (Irons & Leber, 2018) has shown that these differences are stable over time (between one and ten days), with test-retest correlations of around r = 0.83 (95% CI = [0.72, 0.90]) for optimal choices.
Another example of differences in search strategy comes from the foraging literature (Kristjánsson, Jóhannesson & Thornton, 2014; Jóhannesson, Thornton, Smith, Chetverikov & Kristjánsson, 2016). In this context, foraging tasks involve searching for multiple targets on each trial. Participants were asked to search through a set of items from four categories, with two categories classed as targets. In the conjunction condition (searching for red-horizontal and green-vertical line segments among red-vertical and green-horizontal distractors), most observers searched in runs, finding all the targets of one target category, and then switching and finding the targets in the other category. This strategy has previously been observed in animal foraging (Dawkins, 1971), and suggests holding one complex target template in mind at a time is a better strategy than switching templates. However, a subset of participants, termed 'super-foragers', were able to change between search target categories with very little cost to performance. While test-retest reliability has not been measured explicitly for the foraging paradigm, the task was used as a measure to assess the effect of a six-day mindfulness retreat on cognitive performance (Hartkamp & Thornton, 2017). From a re-analysis of these data, we can estimate that the test-retest reliability for the mean run length is r ≈ 0.7 for the feature condition and r ≈ 0.88 for the conjunction search.
Previous research has investigated the relationship between these behaviours and psychometric measures, but to date, these differences have not shown strong correlations with other attributes. Irons and Leber (2016, 2018) found no evidence of a correlation between the proportion of optimal choices made by observers in the ACVS paradigm and measures of visual working memory, trait impulsivity, novelty seeking, need for cognition, or intolerance of uncertainty. Similarly, the differences in foraging behaviour are not accounted for by working memory or inhibitory control (Jóhannesson, Kristjánsson & Thornton, 2017). However, there is evidence of a link between Attention Deficit/Hyperactivity Disorder and various search behaviours (Van den Driessche, Chevrier, Cleeremans & Sackur, 2019). Furthermore, the degree to which children exhibit organised scanpaths appears to develop in tandem with executive function (Woods, Göksun, Chatterjee, Zelonis, Mehta & Smith, 2013).
A common theme emerging from these studies is the observation that individual strategies vary in their degree of effectiveness or optimality. However, "visual search" encompasses a wide range of tasks, each tapping into a different aspect of behaviour (e.g. feature-based attention, information sampling). The aim of the present study is to investigate the extent to which individual differences are stable across different visual search paradigms. Does it make sense to talk about 'super-searchers' who show above-average performance in a range of search tasks, analogous to the 'super-recognizers' of the face-recognition literature (Russell, Duchaine & Nakayama, 2009)? As a secondary question, we will measure the test-retest reliability of the differences found in the SHLS paradigm, and compare it with existing estimates of reliability for the ACVS and the Mouse Click Foraging Task (MCFT).

Methods
The methods and planned analysis for this study were registered on the Open Science Framework* before data collection started.
Participants
Sixty-four students with normal or corrected-to-normal vision from the University of Aberdeen took part in this study†. Participants were compensated for their time with either course credit or £15. All participants gave informed consent. The study was approved by the University of Aberdeen Psychology Ethics Committee.
Sample size was determined in part by a power analysis, and in part by counter-balancing. With n = 64 participants, correlations of r > 0.34 between the different visual search paradigms can be detected with α = 0.05 and power (1 − β) = 0.80. The sample is therefore of sufficient size to detect relatively small correlations.
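The sensitivity figure above can be checked with a short calculation. The sketch below is not the authors' registered analysis: it assumes a two-sided test, reads the stated 0.80 as statistical power, and uses the Fisher z approximation. The function name `correlation_power` is hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist


def correlation_power(r, n, alpha=0.05):
    """Approximate two-sided power to detect a population correlation r
    with n observations, using the Fisher z transformation (an assumption;
    the source does not state which method was used)."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)
    # Under the alternative, atanh(r_hat) is approximately normal with
    # mean atanh(r) and standard error 1 / sqrt(n - 3).
    shift = atanh(r) * sqrt(n - 3)
    # Probability of exceeding the critical value in either tail.
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)


power = correlation_power(0.34, 64)
print(f"Approximate power for r = 0.34, n = 64: {power:.2f}")
```

Under these assumptions the result is close to 0.8, which agrees with the stated design.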

Materials and Procedures
The study consists of three paradigms from the visual search literature in which large individual differences have been found (Nowakowska, Clarke & Hunt, 2017; Irons & Leber, 2016; Kristjánsson, Jóhannesson & Thornton, 2014). Example stimuli can be seen in Figure 1. A brief overview of each paradigm is given below, with full details in supplementary materials. The three tasks were completed over two sessions, approximately one week apart. The SHLS was run in both sessions. The order in which participants completed the tasks was counter-balanced. There are 16 different possible orders of tasks/conditions; four participants completed each order for a total of 64.
* https://osf.io/y6qbv/
† Data from an additional 11 participants were discarded due to being recorded with an inappropriate screen resolution. Another participant was excluded due to colour blindness.
Split-half Line Segments
Stimuli consisted of arrays of black oriented line segments against a grey background. The target was oriented 45° clockwise, while the distractor items had a random orientation with a mean of 45° anti-clockwise. The variance was low (18°) on one half of the display to create a homogeneous texture, and high (95°) on the other side to create a heterogeneous texture. When the target is present on the homogeneous half, it can easily be detected with peripheral vision, but when it is in the heterogeneous half, it is much harder to detect. This was verified in Nowakowska et al. (2017): for brief presentations viewed from the center, detection performance was close to chance for targets presented on the heterogeneous texture, and close to ceiling for targets presented on the homogeneous texture. There were 160 trials in total, and the homogeneous and heterogeneous sides of the display were randomly varied from trial to trial. The dominant eye position was recorded using a desktop-mounted EyeLink 1000 eye tracker (SR Research, Canada).
This paradigm was carried out twice, once in each testing session, to give us an estimate of how consistent participants are in their search strategy over time.
Adaptive Choice Visual Search
Each search display was composed of 54 red, blue, green and variable-coloured small squares (14 of each colour) arranged in three concentric rings around fixation (see Figure 1). Variable distractors changed colour from trial to trial according to a 24-trial cyclical pattern: these distractors would be red for five trials, then, across a period of seven trials, they would gradually change colour from red to blue. The variable distractors would then be blue for five trials, and then gradually transition back to red.
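The 24-trial cycle described above can be sketched as a small function. This is an illustration only: the source says the colour changes "gradually", so the evenly spaced linear steps used here are an assumption, as are the function names `variable_colour` and `optimal_target` and the convention of coding colour as a blue fraction between 0 and 1.

```python
def variable_colour(trial, plateau=5, transition=7):
    """Blue fraction of the variable distractors on a 0-indexed trial
    (0.0 = red, 1.0 = blue). Linear steps are an assumption."""
    cycle = 2 * (plateau + transition)  # 24 trials with the defaults
    pos = trial % cycle
    if pos < plateau:  # red plateau
        return 0.0
    pos -= plateau
    if pos < transition:  # gradual change from red to blue
        return (pos + 1) / (transition + 1)
    pos -= transition
    if pos < plateau:  # blue plateau
        return 1.0
    pos -= plateau
    return 1.0 - (pos + 1) / (transition + 1)  # gradual change back to red


def optimal_target(trial):
    """Optimal target colour on plateau trials, None during transitions.
    When the variable distractors are red, fewer blue items are present,
    so the blue target is the optimal choice, and vice versa."""
    colour = variable_colour(trial)
    if colour == 0.0:
        return "blue"
    if colour == 1.0:
        return "red"
    return None


for t in range(24):
    print(t, variable_colour(t), optimal_target(t))
```

Only the plateau trials have an unambiguous optimal choice, which is why the strategy measure for this task is computed on plateau trials.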
A white digit appeared inside each square. Participants were informed that two targets, a red square and a blue square each with a digit between 2 and 5, were embedded in every search display. The two target digits were always different, to enable us to distinguish the colour of the target that had been found on each trial. The remaining red, blue and variable squares all contained digits between 6 and 9. Green squares could contain any digit between 2 and 9. The locations of the targets and distractors within the search display were randomized on each trial. Participants were only required to find one target on each trial, and they were free to search for either one.
Mouse Click Foraging Task
In the feature foraging condition, search displays contained small red, green, yellow and blue circles. For half of the participants, targets were red and green circles, and for the other half of participants, targets were blue and yellow circles. Participants were asked to collect all of the targets within a trial by using the mouse to click on each target. Clicking on a target caused it to disappear from the display. If the participant clicked erroneously on a non-target, the trial immediately ended and a replacement trial began. The conjunction foraging task was the same, except search displays were composed of both circles and squares. For half of the participants, the shapes were red and green, and for the remaining participants the shapes were blue and yellow. Targets were defined by conjunctions of colour and shape (e.g., red squares and green circles, with red circles and green squares as distractors). Targets and distractors were assigned at random for each participant. The procedure was otherwise identical to the feature foraging task.

Replication of each task
A brief summary of participants' behaviour is given below. More time is spent on the SHLS, as its test-retest reliability has not previously been assessed. Further analysis and details can be found in the supplementary materials.

Split-half Line Segments
Our results are consistent with the original SHLS study (Nowakowska, Clarke & Hunt, 2017): we find a large range of individual differences in search reaction time and accuracy (see Figure 2). These differences are stable across the two sessions, with Pearson's r ∈ [0.71, 0.89] (95% CI) for accuracy in finding hard targets. We get similar scores for the correlation in reaction times between sessions a and b for hard targets (r ∈ [0.54, 0.81]), easy targets (r ∈ [0.52, 0.80]) and target absent trials (r ∈ [0.66, 0.86]).
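Intervals of the form r ∈ [lower, upper] can be obtained in several ways. The sketch below shows one common approach, a Fisher z interval around Pearson's r; whether the authors used this method or another (for example a bootstrap or a Bayesian interval) is not stated here, so it should be read as an assumption. The function names and the example scores are hypothetical.

```python
from math import atanh, tanh, sqrt
from statistics import NormalDist, fmean


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def pearson_ci(x, y, level=0.95):
    """Return (r, lower, upper) using the Fisher z transformation.
    Requires more than three observations and |r| < 1."""
    r = pearson_r(x, y)
    se = 1 / sqrt(len(x) - 3)
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    z = atanh(r)
    return r, tanh(z - z_crit * se), tanh(z + z_crit * se)


# Hypothetical accuracy scores for eight observers in two sessions.
session_a = [0.52, 0.61, 0.70, 0.74, 0.80, 0.88, 0.93, 0.97]
session_b = [0.55, 0.58, 0.75, 0.70, 0.85, 0.84, 0.95, 0.94]
r, lower, upper = pearson_ci(session_a, session_b)
print(f"r = {r:.2f}, 95% CI = [{lower:.2f}, {upper:.2f}]")
```

With only eight made-up observers the interval is wide; with 64 observers, as in the study, the same procedure gives considerably tighter bounds.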
We can also look at the initial search strategies adopted by our participants (Figure 2(c, d)). Again, we see large and stable individual differences across the two sessions (test-retest r ∈ [0.63, 0.86] for the proportion of the first five saccades to the heterogeneous half of the display for target absent trials). More importantly, as with Nowakowska et al. (2017), we see that the search strategies correlate well with reaction times in both session a, r ∈ [0.52, 0.82], and session b, r ∈ [0.50, 0.80].
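The strategy measure used here, the proportion of the first five saccades directed to the heterogeneous half, can be sketched as follows. The data layout is an assumption: horizontal landing positions relative to a display centre, with the heterogeneous side coded as "left" or "right". The function name `prop_heterogeneous` is hypothetical, and how saccades landing exactly on the midline were treated in the original analysis is not known (they count as misses here).

```python
def prop_heterogeneous(saccade_x, hetero_side, centre_x=0.0, n_first=5):
    """Proportion of the first n_first saccades that land on the
    heterogeneous half of the display.

    saccade_x   -- horizontal landing positions, in order of occurrence
    hetero_side -- "left" or "right", the heterogeneous half on this trial
    """
    first = saccade_x[:n_first]
    if not first:
        return None  # no saccades recorded on this trial
    if hetero_side == "left":
        hits = sum(x < centre_x for x in first)
    else:
        hits = sum(x > centre_x for x in first)
    return hits / len(first)


# Hypothetical trial: heterogeneous half on the right, six saccades made.
print(prop_heterogeneous([-3.0, 2.0, 4.0, 5.0, -1.0, 6.0], "right"))
```

Averaging this value over an observer's target absent trials would give one strategy score per observer per session, which is the quantity correlated across sessions above.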
Adaptive Choice Visual Search
We measured an individual's strategy as the percentage of plateau trials in which the individual chose the optimal target (i.e., the target with the fewest distractors: when the variable distractor was red, the optimal choice was blue, and vice versa). The results for the ACVS were consistent with previous findings (Irons & Leber, 2016). We can clearly see from Figure 3(a) that there are individual differences in the proportion of optimal targets reported (range 33.62% to 100.00%, x̄ = 59.15, s = 16.54) and the mean (log2) reaction times (range 1.90 to 4.80 seconds). As with the SHLS task, the degree to which participants follow the optimal strategy is correlated with reaction times (r ∈ [−0.65, −0.25]).

Mouse Click Foraging Task
The main measure of interest was average run length per trial in the conjunction condition, with a run defined as a succession of one or more selections of the same target type, which was followed and preceded by the other target or no target. The average run length was the mean number of target selections in a run. The multiple-target foraging results were in line with previous findings (Kristjánsson, Jóhannesson & Thornton, 2014; Jóhannesson, Thornton, Smith, Chetverikov & Kristjánsson, 2016), with shorter run lengths for feature foraging (x̄ = 3.16, s = 3.14) than conjunction foraging (x̄ = 11.73, s = 7.09). This suggests more frequent foraging for multiple targets concurrently when those targets were defined by features than by conjunctions. Figure 3(b) depicts the individual differences in the conjunction condition in terms of run length and the correlation with reaction time (r ∈ [−0.55, −0.10]).
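The run-length definition above translates directly into code. The sketch below assumes each trial is recorded as an ordered sequence of target-type labels (the labels "A" and "B" and the function names are hypothetical) and computes the mean number of selections per run.

```python
from itertools import groupby


def run_lengths(selections):
    """Lengths of consecutive runs of the same target type."""
    return [len(list(group)) for _, group in groupby(selections)]


def mean_run_length(selections):
    """Mean number of target selections per run within one trial."""
    runs = run_lengths(selections)
    return sum(runs) / len(runs) if runs else None


# An observer who exhausts one category before switching.
print(run_lengths("AAAAABBBBB"), mean_run_length("AAAAABBBBB"))
# An observer who switches freely between categories.
print(run_lengths("ABABBAABAB"), mean_run_length("ABABBAABAB"))
```

Long mean run lengths indicate searching one target category at a time, while values near 1 indicate frequent switching of the kind attributed to 'super-foragers'.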

Correlations Between Tasks
We have successfully replicated the previous findings around individual differences in visual search strategy in each of the three tasks. Furthermore, the SHLS task has been shown to have good test-retest reliability, similar to that of the ACVS and MCFT tasks. Given this, we can report the extent to which an individual's performance in one of the tasks predicts performance in the other two.
The results show that the correlations between the strategy metrics in the three tasks (Figure 4) are weak. Perhaps even more surprisingly, there is also little evidence for meaningful correlations between reaction times in the different tasks. Even if we optimistically take all data together as suggesting a robust correlation in reaction times from one task to another, the mean correlation over the three tasks is only r = 0.2, implying that this correlation accounts for R² = 0.04, or 4%, of the variance in an individual's performance. (Correlations in accuracy are also weak, and are reported in the supplementary materials.)
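The step from r = 0.2 to 4% is simply the square of the correlation. The sketch below shows that arithmetic and, as an additional point of comparison that is not part of the reported analysis, the largest correlation two measures could be expected to show given their reliabilities (Spearman's attenuation bound, a standard psychometric formula). The reliability values passed in are illustrative figures chosen from within the test-retest ranges quoted in the text, and the function names are hypothetical.

```python
from math import sqrt


def shared_variance(r):
    """Proportion of variance in one measure accounted for by the other."""
    return r ** 2


def attenuation_ceiling(rel_x, rel_y):
    """Upper bound on the expected observed correlation between two
    measures with test-retest reliabilities rel_x and rel_y
    (Spearman's attenuation formula with a true correlation of 1)."""
    return sqrt(rel_x * rel_y)


print(f"R^2 for r = 0.2: {shared_variance(0.2):.2f}")
# Illustrative reliabilities, not values taken from a specific analysis.
print(f"Ceiling for reliabilities 0.76 and 0.83: {attenuation_ceiling(0.76, 0.83):.2f}")
```

If the ceiling implied by the reliabilities is far above the observed between-task correlations, measurement noise alone is an unlikely explanation for their weakness.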

Discussion
We successfully replicated the wide range of individual differences in strategy and performance that had previously been observed in each of these three visual search paradigms, with a larger sample size than the original experiments. Surprisingly, however, the between-paradigm correlations give R² ≈ 0.04; even a generous interpretation of the correlation between tasks would fail to pass the usual criteria for null hypothesis significance testing. Knowing how one person will behave in one of these paradigms tells us very little about how they will perform in the others. This lack of any consistent relationship between the search tasks occurred despite the relatively high test-retest correlations of all three of the tasks individually. Indeed, the test-retest reliability of each of the three measures of visual search strategy we used in this study compares favourably to other cognitive psychology paradigms, such as the Eriksen Flanker and Posner Cueing tasks, making them well suited for detecting relationships with other variables (Hedge, Powell & Sumner, 2018). We also observe strong correlations between measures of strategy and reaction time within each task. These correlations demonstrate that our strategy metrics determine a large proportion of search performance, and that our measurements are sufficiently reliable to produce clear correlations where they exist.
There are many reasons why two measurements might be uncorrelated, such as range restriction or measurement noise, but the test-retest correlations and within-task correlations on each of the individual visual search task metrics rule out many of these alternatives, leaving a true absence of shared variance between these tasks as a likely explanation for the lack of correlation. One might have expected reaction time to be at least modestly correlated from one search task to the next, as a general factor like an individual's speed-accuracy trade-off or motivation might lead to better or worse overall performance, but there was no relationship. Although the tasks in this experiment all have visual search in common, they also have unique aspects that appear to have resonated with particular individuals' strengths, and not others. Our definition of a successful strategy in the SHLS task was fixating the locations that provide new information. In the ACVS task, a successful strategy meant appropriately altering search goals to match changes in the environment. In the MCFT task, success involved minimizing cognitive load by minimizing target switching. Each of these tasks taps into unique aspects of visual search strategies, and performance on one has little bearing on the others. For example, recent work on the ACVS task suggests that enumerating the colour subsets plays an important role in achieving the optimal strategy (Hansen, Irons & Leber, 2019). Clearly this step isn't required in the SHLS search, as the stimulus consists of black lines on a grey background. Instead, participants have to judge the variability in orientation across the scene.
[Figure caption fragment] Each point represents a participant.
Figure 4. The between- and within-task correlations for the three different search tasks. The bars indicate the 95% confidence intervals for Pearson's correlation coefficient. Blue bars represent test-retest scores for each task for reaction times (rt), optimality (opt) or run length (rl). Yellow bars indicate how well the strategy measures predict reaction times, while the red bars show that performance in one task is not a good indication of performance in another, either for reaction times or strategy. The opt measures reflect the extent to which participants adhered to an optimal strategy.
Individual differences pose a challenge for efforts to devise a comprehensive model of visual search. Our understanding of the mechanisms of visual search is based predominantly on experiments that systematically vary details of the search task and measure effects on averaged performance. This approach has led to important insights, for example, about the kinds of visual features that can guide attention (e.g. Treisman & Gelade, 1980); how attentional control settings filter distractors (e.g. Folk, Remington & Johnston, 1992; Yantis & Egeth, 1999); and biases in attention, such as a bias towards unexplored locations (e.g. Klein, 2000). For all three of the experiments included in the current study, however, the average performance would be highly misleading, as it would describe the performance of very few individuals. In the original SHLS study, for example, the aim of the experiment was to assess whether search behaviour could be better described by an optimal (Najemnik & Geisler, 2008) or a stochastic (Clarke, Green, Chantler & Hunt, 2016) model. Considering only the average performance, the stochastic model was a good explanation. Underlying that average performance, however, was a spectrum of search behaviour, replicated here, some of which would be clearly categorized as optimal, some as stochastic, and some as neither. The original question needed to be refined: for whom is search optimal and for whom is it stochastic? Our approach in this experiment puts into practice several of the recommendations of Clarke et al. (2019), who suggest that a focus on accounting for variance, in addition to interpreting average patterns, will lead to important new insights. Another recommendation from that paper is to examine the generalizability of conclusions across paradigms, which we have also done here.
Taking this further, it would be interesting to examine the extent to which the results of each of these search paradigms would scale to similar, but more familiar and realistic, contexts.  have summarised a range of methods and measures that can be used to study attention strategy (e.g. saccadic choice, speed-accuracy tradeoff, meta-cognitive report). The current findings add a further challenge for researchers, by suggesting we need to account not only for individual differences, but also for the interaction of a given individual with a particular search context.
We view these findings not as a discouraging result, but as thought-provoking and exciting. Vogel and Awh (2008) argued that studying individual differences in cognitive psychology (in their case, working memory) provides valuable insight for constraining potential theories of the underlying cognitive mechanisms. Our results suggest that the context and structure of the task also need to be taken into account. Understanding how an individual's behaviour varies across different search tasks can lead to the development of a comprehensive theory of search.