Article Type : Research Article
Authors : Ershova R and Tarnow E
Keywords : Working Memory Capacity
If working memory capacity (WMC) is quantized into “slots”, then one might expect that the number of slots would vary by individual and, if so, would show up as separate peaks in the population distributions. We tested this assumption. The working memory capacity of 500 Russian college students was measured with the Tarnow Unchunkable Test using sets of 3 and 4 double-digit items. The 3-item test distribution is unimodal (mode at 3 items) and shows a large (18%) ceiling effect. The 4-item distribution is broadened, signaling a loss of items for many participants, and is also unimodal (mode at 2 items) apart from a smaller 6% ceiling effect. The difference distribution between the 4-item and 3-item results also appears unimodal with no trace of integer peaks. If the number of slots were well defined, one would expect the Pearson correlation and Cronbach’s alpha to be high. The Pearson correlation between the 3-item and 4-item test results is a relatively low 0.34 (R² = 0.14), and Cronbach’s alpha is less than 0.5. High scorers on one test score only somewhat high on the other test, and no discontinuities are apparent in cross-test results. We conclude that WMC is not quantized into well-defined slots.
“Theorizing at this stage is like skating
on thin ice - keep moving, or drown” [1].
The brain contains a hundred billion neurons, yet working memory capacity (WMC) is limited to 3-7 items [2]. Indeed, if the reader attempts to remember four unrelated double-digit integers (the Tarnow Unchunkable Test, TUT [3]), it is very probable that one of the integers vanishes, no matter how hard the reader tries. Exactly how WMC is limited (and how the integers vanish) is not known.
For example, in the case of visual-spatial WMC there is a controversy over whether the storage system consists of discrete slots or of a small but flexible “resource” [4]. The authors of [4] showed that the standard deviation of the total recall increases monotonically from 1 to 8 items on a concave curve (smaller successive increases) and that the response latency for correct responses also increases monotonically on a concave curve from 1 to 8 items, and they argued that these two findings are incompatible with a discrete slot model. Reference [5] found that in the free recall experiment of [6] the response latency increased linearly with time, seemingly precluding a discrete increase. There is a second controversy over whether storage or attentional control determines WMC [7]. The authors of [7] found that low-WMC subjects had slower and less successful saccades if a target letter appeared in an unexpected position. This correlation between WMC and “attentional control” (quickly moving the eyes to the unexpected position) suggests that WMC is not pure storage.
We have earlier suggested that WM consists of pointers and pointer collections rather than slots. The difference between the two is not often elaborated on (and one of the authors, Tarnow, has used the “slot” terminology before). We here take the position that a “slot” is a well-defined place of storage with a location and some kind of label, while a pointer or a pointer collection is defined only by the corresponding memory items. Mnemonic experts with very large storage capacities do indeed have “slots”: they create a “memory castle” with different rooms and “store” each item in a different room [8]. This is likely not the case for most subjects, in particular since most subjects cannot properly manage their storage capacity [9]. Indeed, it is hard to believe that a person who went through life with such slots would not know about them, would not know how many there are, and would not strive to name them (since humans in general name everything they can). If there were slots, they should also have a position, but human beings do not have a spatial feeling for where the slots are.
Here we examine the WMC distribution of 500 Russian college students. We use the TUT, a test designed to consist of unchunkable items, to probe the WMC limit of the students. In particular, we look for different integral capacity limits, for example 3 or 4, in the population distributions of WMC. A hypothetical result is shown in Figure 1: each group with a different working memory capacity would be displayed as a single bar at that particular capacity.
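The slot-model prediction behind Figure 1 can be sketched in a few lines. The sketch assumes, purely for illustration, that a subject with c slots recalls exactly min(c, n) items on every n-item trial with no trial-to-trial noise; the function name is ours, and the 75%/25% split is the one used for the hypothetical figure.

```python
from collections import Counter

def slot_model_distribution(capacity_shares, n_items):
    """Predicted share of subjects at each recall score under an idealized slot model.

    capacity_shares maps an integer slot count to the fraction of subjects
    with that capacity. Assumption (not taken from the data): a subject always
    recalls min(capacity, n_items) items.
    """
    dist = Counter()
    for capacity, share in capacity_shares.items():
        dist[min(capacity, n_items)] += share
    return dict(dist)

# Split used for the hypothetical figure: 75% three-slot, 25% four-slot subjects.
shares = {3: 0.75, 4: 0.25}
four_item = slot_model_distribution(shares, n_items=4)
three_item = slot_model_distribution(shares, n_items=3)
print(four_item)   # two separate bars, at 3 and at 4 items
print(three_item)  # everything collapses onto the 3-item ceiling
```

Under these assumptions the 4-item test separates the two groups into distinct bars, while the 3-item test cannot distinguish them because both groups sit at the ceiling.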
We present data from a study of university students aged 17 to 24. The Tarnow Unchunkable Test (TUT) used in this study separates out the working memory (WM) component of free recall by using particular double-digit combinations which lack intra-item relationships [2]. It does not contain any explicit WM operations. The TUT was given via the internet using client-side JavaScript to eliminate any network delays. The instructions and the memory items were displayed in the middle of the screen. Items were displayed for two seconds without pause. Each trial consisted of 3 or 4 items, after which the subject was asked to enter each remembered number separately, pressing the keyboard Enter key after each entry, until all the remembered numbers had been entered. Pressing the Enter key without any number was considered a "no entry". The next trial started immediately after the last entry or after a "no entry". There was no time limit for number entry. Each subject was given six three-item trials and three four-item trials, in which the items were particular double-digit integers.
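The session layout described above can be written out as a short sketch. The scoring rule shown is an assumption on our part (free-recall scoring in which order is ignored and each presented item is credited at most once), and the example integers are made up rather than taken from the TUT item set.

```python
def score_trial(presented, entered):
    """Count presented items that the subject entered (order ignored).

    Assumption: each presented item is credited at most once, and wrong or
    repeated entries are simply not credited.
    """
    return len(set(presented) & set(entered))

# Session layout from the procedure: six 3-item trials and three 4-item trials.
TRIAL_SIZES = [3] * 6 + [4] * 3
max_responses = sum(TRIAL_SIZES)

# Made-up items for illustration only; the real TUT uses specific integers.
example_score = score_trial(presented=[82, 47, 19, 63], entered=[47, 63, 91])
print(max_responses, example_score)
```

With this layout a subject can give at most thirty item responses over the whole session.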
Five hundred Russian undergraduate students of the State University of Humanities and Social Studies (67% female and 33% male; mean age 18.8 years) participated in the study for extra credit. Each participant was tested individually in a quiet room. An experimenter was present throughout each session. One record was discarded: the student had responded only once out of a possible thirty times.
The average recall population distributions for the 3-item and 4-item tests are shown in Figure 3. There is a ceiling effect of 18% in the 3-item distribution (open circles), and the mode is 3 items. In the 4-item test distribution (filled circles) the ceiling effect decreases to a smaller 6%. The mode is lowered to 2 items and the 4-item test distribution broadens significantly. Other than the non-intuitive difference in modes, there is no trace of integral WMC populations.
Figure 1: Hypothetical distribution of total recall of 4 items if 75% of subjects have a 3-item capacity and 25% have a 4-item capacity.
Figures 2 and 3: Histograms of the working memory capacity in the 3-item and 4-item tests.
Figure 4: Histogram of the working memory capacity in the 4-item test with all results above 3 folded into the 3 result. The top probability is almost the same for both tests: 20.8% for the 4-item test and 21.2% for the 3-item test corresponding to 105 and 107 subjects, respectively.
Figure 5: Histogram of the working memory capacity on the same scale assuming that all higher probabilities would be folded into the same or lower average in the 3-item test and all results in the 4-item test above 3 folded into the 3 result.
Figure 6: Top (bottom) row displays the mean and standard deviations of the 4-item (3-item) test as a function of the subject score on the 3-item (4-item) test. On average, subjects with a perfect 3-item score recall fewer than three items when presented with four items, and subjects with a perfect 4-item score recall fewer than three items when presented with three items. Only points with ten or more subjects have been included to minimize statistical noise.
Figure 7: Cronbach’s alpha as a function of the number of subtests for the 3-item and 4-item TUT tests of this investigation, the 3-item TUT test on an older population (Tarnow, 2017) and the classic Murdock 10-2 free recall test (1962).
Figure 8: Histogram of the difference between the measured WMC in the 4-item and 3-item experiments. Note the absence of narrow peaks at integer values such as 1 or 0.
Figure 9: Standard deviation as a function of the number of test items, revealing a convex curve.
If we assume that all the subjects who scored higher than 3 on the 4-item test are limited by the test ceiling on the 3-item test, we can fold them into the 3-item peak to see what the 4-item distribution would look like on a 3-item test. The result is displayed in Figure 4. The top probability is almost the same for both tests: 20.8% for the 4-item test and 21.2% for the 3-item test, corresponding to 105 and 107 subjects, respectively. If we carry this further, scaling the tests the same and folding all recalls on the 3-item test into the same or immediately lower recall, we obtain the result in Figure 5. This shows that the narrower 3-item distribution is not simply due to the larger ceiling effect of the 3-item test.
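The first folding step (all 4-item results above 3 moved into the 3 bin) amounts to clipping each subject's mean recall at the 3-item ceiling. A minimal sketch follows; the scores are made up for illustration and the function name is ours, not part of the original analysis.

```python
from collections import Counter

def fold_above_ceiling(mean_recalls, ceiling=3.0):
    """Clip every per-subject mean recall above `ceiling` down to `ceiling`."""
    return [min(score, ceiling) for score in mean_recalls]

# Made-up per-subject mean recalls on the 4-item test (not the study data).
scores_4item = [2.0, 2.33, 3.0, 3.33, 3.67, 4.0, 1.67, 2.0]
folded = fold_above_ceiling(scores_4item)
histogram = Counter(folded)

# Share of subjects that now sit in the top (ceiling) bin.
top_share = histogram[3.0] / len(folded)
print(top_share)
```

Comparing the top-bin share of the folded 4-item histogram with the 3-item ceiling share is the comparison described in the text.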
In Figure 6 we find that perfect 3-item scorers on average score less than three items when presented with four items, and perfect 4-item scorers on average score less than three items when presented with three items. There is no discontinuity (or even a rounded discontinuity) separating top scorers from bottom scorers.
Another way to measure how well slots describe WMC is by calculating Cronbach’s alpha. If WM did consist of well-defined slots, then for each of the TUT subtests (six sets of three double-digit integers and three sets of four double-digit integers) similar items should be filed into the slots the same way. We would expect the same total recall (for 3-item subtests or for 4-item subtests) and Cronbach’s alpha would be 1.
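To make this expectation concrete, the standard formula for Cronbach's alpha, k/(k-1) * (1 - sum of subtest variances / variance of the total score), can be applied to a toy dataset. The sketch below uses made-up scores, not the study data: idealized slot subjects who recall the same number of items on every subtest, and subjects whose recall fluctuates between subtests.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of per-subject rows of subtest scores."""
    k = len(scores[0])                      # number of subtests
    columns = list(zip(*scores))            # one tuple of scores per subtest
    item_variances = sum(variance(col) for col in columns)
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_variances / total_variance)

# Idealized slot subjects: every subtest returns the subject's fixed capacity.
slot_like = [[3, 3, 3], [3, 3, 3], [2, 2, 2], [1, 1, 1]]
# Made-up noisy subjects: recall fluctuates from one subtest to the next.
noisy = [[3, 2, 3], [2, 3, 1], [1, 2, 2], [3, 3, 2], [2, 1, 1]]

alpha_slot = cronbach_alpha(slot_like)
alpha_noisy = cronbach_alpha(noisy)
print(alpha_slot, alpha_noisy)
```

Perfectly repeatable subjects give an alpha of 1 (up to floating-point rounding), while subtest-to-subtest fluctuation pulls alpha well below 1.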
This is not what we observe. Cronbach’s alpha for the six-times-repeated 3-item test is only 0.465, and for the three subtests of the 4-item test it is only 0.499. For all nine measurements it is 0.583. A principal component analysis of the 3-item test gives a largest eigenvalue of 1.65, corresponding to 27% of the variation; for the 4-item test the largest eigenvalue is 1.5 and corresponds to 50% of the variation. For both tests combined the largest eigenvalue is 2.1 and corresponds to 23% of the variation.
To anchor our findings we calculated Cronbach’s alpha for the 10-2 free recall experiment of [10] as a function of the number of subtests and compared it with our results, also including a separate 3-item dataset of older subjects visiting a memory clinic [11]; see Figure 7. The Murdock experiment gives a much higher alpha for similar subjects (US college students) than the TUT-3 experiment. In other words, free recall of ten words is a more reliable measure of short-term memory than recall of three double-digit integers is of working memory. This also suggests that the population does not consist of groups with integral numbers of WMC slots.
Figure 8 displays the histogram of the individual difference between WMC in the 4-item and 3-item tests. The graph has some positive skewness (0.156) but does not look obviously asymmetric and seems to be unimodal.
In contrast to Schneegans and Bays (2016), we find that the standard deviation of the total recall increases as a convex curve (it blows up), see Figure 9; in fact, as subjects go from 3 to 4 items it increases by a large 130%, suggesting that 3 is an average upper limit of WMC. Similarly to Schneegans and Bays (2016), we find that the average response time to correct items was only 8% higher in the 4-item test than in the 3-item test (5950 msec and 5500 msec, respectively). Thus, if the items are accessible, they just take a little extra time to recall. Note that overall these response times are very large compared to search rates for recognition of 50-100 msec [12]. In contrast with Kane et al. (2005), we find almost no correlation between the average response time to correct items and the average recall (-0.017 for the 3-item test and 0.002 for the 4-item test).
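The principal component figures quoted above can be cross-checked with simple arithmetic. The sketch assumes that the analysis was run on standardized subtest scores (a correlation matrix), so that the eigenvalues sum to the number of subtests; the text does not state this, but the reported percentages are consistent with it to within rounding.

```python
def explained_share(eigenvalue, n_variables):
    """Share of total variance carried by one component of a correlation-matrix PCA.

    Assumption: standardized variables, so the eigenvalues sum to n_variables.
    """
    return eigenvalue / n_variables

# Largest reported values, paired with the matching number of subtests.
share_3item = explained_share(1.65, 6)   # six 3-item subtests
share_4item = explained_share(1.5, 3)    # three 4-item subtests
share_all = explained_share(2.1, 9)      # all nine subtests together
print(round(share_3item, 3), round(share_4item, 3), round(share_all, 3))
```

For comparison, if every subtest measured the same fixed slot count, the first component would carry nearly all of the variance rather than a quarter to a half of it.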
We found that neither the 3-item nor the 4-item distribution showed any sign of separate discrete peaks (as in Figure 1) other than those resulting from a ceiling effect. The 4-item distribution was significantly broader than the 3-item distribution and had a lower mode (mode=2) than the 3-item distribution (mode=3, which was also the test ceiling). Top scorers on one test were not necessarily the top scorers on the other test, and there were no discontinuities in the cross-probabilities. The Pearson correlation of the 3-item and 4-item tests was a low 0.34, and Cronbach’s alpha was nowhere near the 1.0 that one might expect if WM consisted of well-defined slots. The within-subject difference between the average 4-item and 3-item recalls was unimodal and did not have any strong peaks at integer values, which would be expected if some subjects had a three-slot WMC and others had a four-slot WMC. So far the only data we can show in support of a fixed number of slots are that the 4-item distribution was much broader than the 3-item distribution, that the average recall from the 4-item distribution is less than that from the 3-item distribution, and that the mode difference is 1 (however, the mode for the 3-item test was the test ceiling).
General Discussion
Thus we have shown that a slot model of WMC is likely incorrect. We have previously shown that a large class of mistakes in the TUT 3-item test involves either the tens or the ones digit but not both, consistent with a pointer collection description of working memory. We will discuss how these pointer collections make up a WMC of about three items in a subsequent contribution.