Paper for the Summer School in Analytic Philosophy
on Knowledge and Cognition
July 1-7, 2002.
Seeing, Perceiving and Knowing
The present paper has two major goals, one of which is to argue that seeing is not always perceiving and the other of which is to argue that visual perception alone leads to knowledge of the world. Let me immediately try to make these two cryptic claims more transparent. Not all human vision has been designed to allow visual perception. Seeing can and often does make us visually aware of objects, properties and facts in the world. But it need not. Often enough, seeing allows us to act efficiently on objects of which we are dimly aware, if at all. While moving at high speed, for example, experienced drivers are sometimes capable of avoiding an interfering obstacle of whose visual attributes they become fully aware afterwards. One may efficiently either catch or avoid being hit by a flying tennis ball without being aware of either its color or texture. This is the sense in which seeing is not always perceiving. If so, then the question arises as to the nature, function and cognitive role of non-perceptual vision. Here, I will make two joint claims. First of all, I will try to argue that the main job of human visual perception is to provide visual information for what functionalist philosophers have called the “belief box”. In other words, visual percepts are inputs to further conceptual processing whose output can be stored in the belief box. Secondly, I will try to argue that the function of that part of the visual system that produces what I shall call “non-perceptual” or more often “visuomotor” representations is to provide visual guidance to the “intention box”. More specifically, I will argue that, unlike visual percepts, visuomotor representations — which, I shall claim, are genuine representations — present visual information to motor intentions and serve as inputs to “causally indexical” concepts. 
On the joint assumptions (which I accept) that, in the relevant propositional sense, only facts can be known, and that one cannot know a fact unless one believes that this very fact (or state of affairs) holds, it follows from my distinction between perceptual and visuomotor processing that only visual perception can give rise to “detached” knowledge of the mind-independent world.
In their (1982) paper “Two Cortical Visual Systems”, the cognitive neuroscientists Leslie Ungerleider and Mortimer Mishkin posited an anatomical distinction between the ventral pathway and the dorsal pathway in the primate visual system (see Figure 1). The former projects from the primary visual cortex to inferotemporal areas. The latter projects from the primary visual cortex to parietal areas, which serve as a relay between the primary visual cortex and the premotor and motor cortex. Ungerleider and Mishkin based their anatomical distinction on neurophysiological and behavioral evidence gathered from the study of macaque monkeys. They performed lesions in, respectively, the ventral and the dorsal pathway of the visual system of macaque monkeys and found the following double dissociation. Animals with a lesion in the ventral pathway were impaired in the identification and recognition of the colors, textures and shapes of objects. But they were relatively unimpaired in tasks of spatial orientation. In tasks of spatial orientation, they were presented with two wells, one of which contained food and the other of which was empty: the former was closer to a landmark than the latter (see Figure 2). Animals with a ventral lesion could accurately use the presence of the landmark in order to discriminate the well with food from the well without. By contrast, animals with a dorsal lesion were severely disoriented, but their capacity to identify and recognize the shapes, colors and textures of objects was well preserved. On this basis, Ungerleider and Mishkin (1982) concluded that the ventral pathway of the primate visual system is the What system and the dorsal pathway is the Where system.
In their (1995) book, The Visual Brain in Action, the cognitive neuroscientists David Milner and Mel Goodale presented a number of arguments in favor of a new interpretation of the dualistic model of the human visual system. On their view, the ventral stream of the human visual system serves what they call “vision-for-perception” and the dorsal stream serves what they call “vision-for-action”. The important idea underlying Milner and Goodale’s dualistic model of human vision is that one and the same visual stimulus can be processed in two fundamentally different ways. Now, two caveats are important here. First of all, it is quite clear, I think, that, as Austin (1962) emphasized, humans can see a great variety of things: they can see e.g., tables, trees, rivers, substances, gases, vapors, mountains, flames, clouds, smoke, shadows, holes, pictures, movies, events and actions. Here, I will not examine the ontological status of all the various things that human beings can see and I shall restrict myself to seeing ordinary middle-sized objects that can also happen to be targets of human actions. Secondly, it is no objection to the dualistic model of the human visual system to acknowledge that, in the real life of normal human subjects, the two distinct modes of visual processing are constantly collaborating. Indeed, the very idea that they collaborate — if and when they do — presupposes that they are distinct. The trick of course is to find experimental conditions in which the two modes of visual processing can be dissociated. In the following, I will provide some examples drawn first from the psychophysical study of normal human subjects and then from the neuropsychological study of brain-lesioned human patients.
Bridgeman et al. (1975) and Goodale et al. (1986) found that normal subjects can point accurately to a target on a computer screen whose motion they could not consciously notice because it coincided with one of their saccadic eye movements (see Jeannerod, 1997: 82). Castiello et al. (1991) found that subjects were able to correct the trajectory of their hand movement directed towards a moving target some 300 milliseconds before they became conscious of the target’s change of location. Pisella et al. (2000) and Rossetti & Pisella (2000) performed experiments involving a pointing task in which subjects were presented with a green target towards which they were requested to point their index finger. Some of them were instructed to stop their pointing movement towards the target when and only when it changed location by jumping either to the left or to the right. Pisella et al. (2000) and Rossetti & Pisella (2000) found a significant percentage of very fast unwilled correction movements generated by what they called the “automatic pilot” for hand movement. In a second experiment, Pisella et al. (2000) presented subjects simultaneously with pairs of a green and a red target. Subjects were instructed to point to the green target, but the colors of the two targets could be interchanged unexpectedly at movement onset. Unlike a change of target location, a change of color did not elicit fast unwilled corrective movements by the “automatic pilot”. On this basis, Pisella et al. (2000) drew a contrast between the fast visuomotor processing of the location of a target in egocentric coordinates and the slower visual processing of the color of an object.
One psychophysical area of particular interest is the study of visual size-contrast illusions. One particularly well-known such illusion is the Titchener or Ebbinghaus illusion. The standard version of the illusion consists of the display of two circles of equal diameter, one surrounded by an annulus of circles greater than it, and the other surrounded by an annulus of circles smaller than it. Although they are equal, the former looks smaller than the latter (see Figure 3). One plausible account of the Titchener illusion is that the array of smaller circles is judged to be more distant than the array of larger circles. Visually based perceptual judgments of distance and size are typically relative judgments: in a perceptual task, one cannot but see some things as smaller (or larger) and closer (or further away) than other neighboring things that are parts of a single visual array. In perceptual tasks, the output of obligatory comparisons of sizes, distances and positions of constituents of a visual array serves as input to perceptual constancy mechanisms. As a result, of two physically equal objects, if one is perceived as more distant from the observer than the other, the former will be perceived as larger than the latter. A non-standard version of the illusion consists in the display of two circles of unequal diameter: the larger of the two is surrounded by an annulus of circles larger than it, while the smaller of the two is surrounded by an annulus of circles smaller than it, so that the two unequal circles look equal.
Aglioti et al. (1995) designed an experiment in which they replaced the two central circles by two graspable three-dimensional plastic disks, which they displayed within a horizontal plane. In a first series of experiments with pairs of unequal disks whose diameters ranged from 27 mm to 33 mm, they found that on average the disk in the annulus of larger circles had to be 2.5 mm wider than the disk in the annulus of smaller circles in order for both to look equal. These numbers provide a measure of the sensitivity of the human visual system. Finally, Aglioti et al. (1995) alternated presentations of physically unequal disks, which looked equal, and presentations of physically equal disks, which looked unequal. Both kinds of trials were presented randomly and so were the left vs. right positions of either kind of stimuli. Subjects were instructed to pick up the disk on the left between the thumb and index finger of their right hand if they thought the two disks to be equal or to pick up the disk on the right if they judged them to be unequal.
The sequence of subjects’ choices of the disk on the right or the disk on the left provided a measure of the magnitude of the illusion prompted by the perceptual comparison between two disks surrounded by two distinct annuli. In the visuomotor task, the measure of grip size was based on the unfolding of the natural grasping movement performed by subjects while their hand approached the object. During a prehension movement, fingers progressively stretch to a maximal aperture before they close down until contact with the object. It has been found that the maximum grip aperture (MGA) takes place at a relatively fixed point, i.e., at about 60% of the duration of the movement (cf. Jeannerod, 1984). In non-illusory contexts, MGA has been found to be reliably correlated with the object’s physical size. Although much larger than the object itself, MGA is directly proportional to the object’s actual physical size. MGA cannot depend on a conscious visual comparison between the size of the object and the subject’s hand during the prehension movement, since the correlation between MGA and the object’s size is reliable even when subjects have no visual access to their own hand. Rather, MGA is assumed to result from an early anticipatory automatic visual process of calibration. Thus, Aglioti et al. (1995) measured MGA in flight using optoelectronic recording.
What Aglioti et al. (1995) found was that, unlike comparative perceptual judgment expressed by the sequence of choices of either the disk on the left or the disk on the right, the grip was not significantly affected by the illusion. The influence of the illusion was significantly stronger on perceptual judgment than on the grasping task. This experiment, however, raises a number of methodological problems. The main issue, raised by Pavani et al. (1999) and Franz et al. (2000), is the asymmetry between the two tasks. In the perceptual task, subjects are asked to compare two distinct disks surrounded by two different annuli. But in the grasping task, subjects focus on a single disk surrounded by an annulus. So the question arises whether, from the observation that the comparative perceptual judgment is more affected by the illusion than the grasping task, one may conclude that perception and action are based on two distinct representational systems.
Aware of this problem, Haffenden & Goodale (1998) performed the same experiment, but they designed one more task: in addition to instructing subjects to pick up the disk on the left if they judged the two disks to be equal in size or to pick up the disk on the right if they judged them to be unequal, they required subjects to manually estimate between the thumb and index finger of their right hand the size of the disk on the left if they judged the disks to be equal in size and to manually estimate the size of the disk on the right if they judged them to be unequal (see Figure 4). Haffenden & Goodale (1998) found that the effect of the illusion on the manual estimation of the size of a disk (after comparison) was intermediary between comparative judgment and grasping.
Furthermore, Haffenden & Goodale (1998) found that the presence of an annulus had a selective effect on grasping. They contrasted the presentation of pairs of disks either against a blank background or surrounded by an annulus of circles of intermediate size, i.e., of size intermediary between the size of the smaller circles and the size of the larger circles involved in the contrasting pair of illusory annuli. The circles of intermediate size in the annulus were slightly larger than the disks of equal size. When a pair of physically different disks was presented against either a blank background or a pair of annuli made of intermediate-size circles, both grip scaling and manual estimates reflected the physical difference in size between the disks. When physically equal disks were displayed against either a blank background or a pair of annuli made of circles of intermediate size, no significant difference was found between grasping and manual estimate. The following dissociation, however, turned up: when physically equal disks were presented with a middle-sized annulus, overall MGA was smaller than when physically equal disks were presented against a blank background. Thus, the presence of an annulus of middle-sized circles prompted a smaller MGA than a blank background. Conversely, overall manual estimate was larger when physically equal disks were presented against a background with a middle-sized annulus than when they were presented against a blank background. The illusory effect of the middle-sized annulus presumably arises from the fact that the circles in the annulus were slightly larger than the equal disks. Thus, whereas the presence of a middle-sized annulus contributes to increasing manual estimation, it contributes to decreasing grip scaling. This dissociation shows that the presence of an annulus may have conflicting effects on perceptual estimate and on grip aperture.
Finally, Haffenden, Schiff & Goodale (2001) went one step further. They presented subjects with three distinct Titchener circle displays, one at a time, two of which were traditional Titchener displays: a central disk surrounded by an annulus of circles either smaller than it or larger than it. In the former case, the gap between the edge of the disk and the annulus was 3 mm. In the latter case, the gap between the edge of the disk and the annulus was 11 mm. In the third display, the annulus was made of small circles (of the same size as in the first display), but the gap between the edge of the disk and the annulus was 11 mm (like the gap in the second display with an annulus of larger circles) (see Figure 5). What Haffenden, Schiff and Goodale (2001) found was the following dissociation: in the perceptual task, subjects estimated the third display very much like the first display and unlike the second display. In the visuomotor task, subjects’ grasping in the third condition was much more similar to grasping in the second than in the first condition (see Figure 6). Thus, perceptual estimate was far more sensitive to the size of the circles in the annulus than to the distance between target and annulus. Conversely, grasping was far more sensitive to the distance between target and annulus than to the size of the circles in the annulus. The idea here is that the annulus is processed by the visuomotor system as a potential obstacle for the position of the fingers on the target disk.
From this selective review of evidence on size-contrast illusions, I would like to draw two temporary conclusions. First of all, visual perception and visually guided hand actions directed towards objects impose different computational requirements on the human visual system. As I said above, visually based perceptual judgments of distance and size are typically relative comparative judgments. By contrast, visually guided actions directed towards objects are typically based on the computation of the absolute size and the egocentric representation of the location of objects on which to act. In order to successfully grab a branch or a rung, one must presumably compute the distance and the metrical properties of the object to be grabbed quite independently of pictorial contextual features in the visual array.
Secondly, what the above experiments suggest is not that, unlike perceptual judgments, the visuomotor control of grasping is immune to illusions. Rather, both perceptual judgment and the visuomotor control of action can be fooled by the environment, but they are fooled by different features of the visual display. The effect of the Titchener size-contrast illusion on perceptual judgment arises mostly from the comparison between the diameter of the disk and the diameter of the circles in the surrounding annulus. The visuomotor processing, which delivers a visual representation of the absolute size of a target of prehension, is so sensitive to the distance between the edge of the target and its immediate environment that it can be led to process two-dimensional cues as if they were three-dimensional obstacles. I take this last point quite seriously because I claim that it is evidence that the output of the visuomotor processing of the target of an action can misrepresent features of the distal stimulus and is thus a genuine mental representation.
In the 1970’s, Weiskrantz and others discovered a neuropsychological condition called “blindsight” (see Weiskrantz, 1986, 1997). Since then, the phenomenon has been extensively studied and discussed by philosophers. Blindsight results from a lesion in the primary visual cortex, anatomically located prior to the bifurcation between the ventral and the dorsal streams. The significance of the discovery of this phenomenon lies in the fact that although blindsight patients have no phenomenal subjective visual experience of the world in their blind field, nonetheless it was found that they display striking residual visuomotor capacities. In situations of forced choice, they can do such remarkable things as grasp quadrangular blocks and insert a hand-held card into an oriented slot. According to most neuropsychologists who have studied such cases, in blindsight patients, the visual information is processed by subcortical pathways that bypass the visual cortex and relay visual information to the motor cortex.
In the early 1990’s, DF, a British woman, suffered an extensive lesion in the ventral stream of her visual system as a result of carbon monoxide poisoning. She thus became an apperceptive agnosic, i.e., a visual form agnosic patient (see Farah, 1990 for the distinction between apperceptive and associative agnosia). Following the discovery of blindsight, the main novelty of the neuropsychological description of patient DF’s condition, first examined by Goodale and his colleagues (Goodale et al., 1991), lies in the fact that DF’s examination did not focus exclusively on what she could not do as a result of her lesion. Rather, she was investigated in depth for what she was still able to do.
Careful sensory testing of DF revealed subnormal performance for color perception and for visual acuity with high spatial frequencies, though detection of low spatial frequencies was impaired. Her motion perception was poor. DF’s perception of shape and patterns was very poor. She was unable to report the size of an object by matching it with the appropriate distance between the index finger and the thumb of her right hand. Her line orientation detection (revealed either by verbal report or by turning a hand-held card until it matched the orientation presented) was highly variable: although she was above chance for large angular differences in orientation between two objects, she fell to chance level for smaller angles. DF was unable to recognize the shape of objects. Interestingly, however, her visual imagery was preserved. For example, although she could hardly draw copies of seen objects, she could draw copies of objects from memory — which she could then hardly recognize.
By contrast with her impairment in object recognition, DF was normally accurate when object orientation or size had to be processed, not in view of a perceptual judgment, but in the context of a goal-directed hand movement. When reaching for and grasping between her index finger and thumb the very same objects that she could not recognize, she performed accurate prehension movements. Similarly, while transporting a hand-held card towards a slit as part of the process of inserting the former into the latter, she could normally orient her hand to match the slit at different orientations (Goodale et al., 1991; Carey et al., 1996). When presented with a pair of rectangular blocks of either the same or different dimensions and asked whether they were the same or different, she failed. When she was asked to reach out and pick up a block, the measure of her (maximal) grip aperture between thumb and index finger revealed that her grip was calibrated to the physical size of the objects, like that of normal subjects. When shown a pair of objects selected from twelve objects of different shapes for same/different judgment, she failed. When asked to grasp them using a “precision grip” between thumb and index finger, she succeeded.
Conversely, optic ataxia is a syndrome produced by lesions in the dorsal stream. An optic ataxic patient, AT, examined by Jeannerod et al. (1994) shows the reversed dissociation. While she can recognize and identify the shape of visually presented objects, she has serious visuomotor deficits: her reach is misdirected and her finger grip is improperly adjusted to the size and shape of the target of her movements.
At bottom, DF turns out to be able to visually process size, orientation and shape required for grasping objects, i.e., in the context of a reaching and grasping action, but not in the context of a perceptual judgment. Other experimental results with DF, however, indicate that her visuomotor abilities are restricted in at least two respects. First, in the context of an action, she turns out to be able to visually process simple sizes, shapes and orientations. But she fails to visually process more complex shapes. For example, she can insert a hand-held card into a slot at different orientations. But when asked to insert a T-shaped object (as opposed to a rectangular card) into a T-shaped aperture (as opposed to a simple oriented slit), her performance deteriorated sharply. Inserting a T-shaped object into a T-shaped aperture requires the ability to combine the computations of the orientation of the stem with the orientation of the top of the object together with the computation of the corresponding parts of the aperture. There are good reasons to think that, unlike the quick visuomotor processing of simple shapes, sizes and orientations, the computations of complex contours, sizes and orientations require the contribution of visual perceptual processes performed by the ventral stream — which, we know, has been severely damaged in DF.
Secondly, the contours of an object can be, and often are, computed by a process of extraction from differences in color and luminance cues. But normal humans can also extract the contours or boundaries of an object from other cues, such as differences in brightness, texture, shading and complex Gestalt principles of grouping and organization by similarity and good form. Now, when asked to insert a hand-held card into a slot defined by Gestalt principles of good form or by textural information, DF failed (see e.g., Goodale, 1995).
Apperceptive agnosic patients like DF raise the question: What is it like to see with an intact dorsal system alone? I now want to emphasize what I take to be a crucial characteristic of the content of visuomotor representations, drawing jointly on the examination of DF’s condition and on the visuomotor representations of normal subjects engaged in tasks of grasping illusory displays such as Titchener circles. As I said above, a visual percept yields a representation of the relative size and distance of various neighboring elements within a visual array. I take it that it is of the essence of a percept that the processing of such visual attributes of an object as its size, shape and position or distance must be available for comparative judgment. By contrast, a visuomotor representation of a target in a task of reaching and grasping provides information about the absolute size of the object to be grasped. Crucially, the spatial position of any object can be coded in at least two major coordinate systems or frames of reference: it may be coded in an egocentric frame of reference centered on the agent’s body or it may be coded in an allocentric frame of reference centered on some object present in the visual array. The former is required to allow an agent to reach and grasp an object. The latter is required in order to locate an object relative to some other object in the visual display.
Consider e.g., a visual percept of a glass to the left of a telephone. In the visual percept, the location of the glass relative to the location of the telephone is coded in allocentric coordinates. The visual percept has a pictorial content that, I shall argue momentarily, is both informationally richer and more fine-grained than the verbally expressible conceptual content of a different representation of the same fact or state of affairs. For example, unlike the sentence ‘The glass is to the left of the telephone’, the visual percept cannot depict the location of the glass relative to the telephone without depicting ipso facto the orientation, shape, texture, size and color of both the glass and the telephone. Conceptual processing of the pictorial content of the visual percept may yield a representation whose conceptual content can be expressed by the English sentence ‘The glass is to the left of the telephone’. Now the visuomotor representation of the glass as a target of a prehension action requires that information about the size and shape of the glass be contained within a representation of the position of the glass in egocentric coordinates. Unless the telephone interferes with the trajectory of the reaching part of the action of grasping the glass, when one intends to grasp the glass, one does not need to represent the spatial position of the glass relative to the telephone.
We know that patient DF cannot match the orientation of her wrist to the orientation of a slot in the context of a perceptual task, i.e., when she is not involved in the action of inserting a hand-held card into the slot. She can, however, successfully insert a card into an oriented slot. She cannot perceptually represent the size, shape and orientation of an object. However, she can successfully grasp an object between her thumb and index finger. So the main relevant contrast revealed by the examination of DF is that while she can use an effector (e.g., the distance between her thumb and index finger or the rotation of her wrist) in order to grasp an object or to insert a card into a slot, i.e., in the context of an action, she cannot use the same effector to express a perceptual judgment. What is the main difference between the perceptual and the visuomotor tasks? Both tasks require that visual information about the size and shape of objects be provided. But in the visuomotor task, this information is contained in a representation of the spatial position of the target coded in an egocentric frame of reference. In the perceptual task, information about the size and shape of objects is contained in a representation of the spatial position of the object coded in an allocentric frame of reference. Normal subjects can easily switch from one spatial frame of reference to the other. Such fast transformations may be required when e.g., one switches from counting items lying on a table or from drawing a copy of items lying on a table to grasping one of them. However, DF’s visual system cannot make the very same visual information about the size, shape and orientation of an object available for perceptual comparisons. In DF, information about the size and the shape of an object is trapped within a visuomotor representation of its location coded in egocentric coordinates. It is not available for recoding in an allocentric frame of reference. 
Coding spatial relationships among different constituents of a visual scene is crucial to forming a visual percept. By contrast, locating a target in egocentric coordinates is crucial to forming a visuomotor representation on the basis of which to act on the target.
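The contrast between allocentric and egocentric coding lends itself to a toy numerical illustration. The following sketch is mine, not drawn from the experimental literature: the coordinates, the object names (taken from the glass-and-telephone example above) and the simple vector-subtraction model of a frame change are all invented for illustration.

```python
import numpy as np

# Hypothetical scene coordinates in metres, in an arbitrary "world" frame.
glass = np.array([0.40, 0.10, 0.75])
telephone = np.array([0.55, 0.10, 0.75])
hand = np.array([0.30, -0.20, 0.90])

# Allocentric coding: the glass located relative to another object in the array.
allocentric = glass - telephone   # e.g., negative x: the glass lies to the left

# Egocentric coding: the same glass located relative to the agent's effector.
egocentric = glass - hand

# A grasping movement needs the egocentric vector (direction and distance
# from the hand to the target), not the glass's position relative to the phone.
reach_distance = np.linalg.norm(egocentric)

print(allocentric)
print(reach_distance)
```

The point of the sketch is only that the two codings carry different information computed from the same stimulus: changing the telephone's position alters the allocentric vector while leaving the egocentric vector, and hence the reach, untouched.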
II. Visual knowledge of the world
Although, if the above is on the right track, not all human vision has been designed to allow visual perception, nonetheless one crucial function of human vision is visual perception. Like many psychological words, ‘perception’ can be used at once to refer both to a process and to its product. There are two complementary sides to visual perception: there is an objective side and a subjective side. On the objective side, visual perception is a fundamental source of knowledge about the world. Visual perception is indeed a — if not “the” — paradigmatic process by means of which human beings gather knowledge about objects, events and facts in their environment. On the subjective side, visual perception yields a peculiar kind of awareness of the world, namely: sight. Sight has a special kind of phenomenal character (which is lacking in blindsight patients). The phenomenology of human visual experience is unlike the phenomenology of human experience in sensory modalities other than vision, e.g., touch, olfaction or audition.
On my representationalist view (close to Dretske, 1995 and Tye, 1995), much of the distinctive phenomenology of visual experience derives from the fact that the human visual system has been selected in the course of evolution to respond to a specific set of properties. Visual perception makes us aware of such fundamental properties of objects as their size, orientation, shape, color, texture, spatial position, distance and motion, all at once. One of the puzzles that arises from neuroscientific research into the visual system (and which I will not discuss here) is the question of how these various visual attributes are perceived as bound together, given the fact that neuroscience has discovered that they are processed in different areas of the human visual system (see Zeki, 1993). Unlike vision, audition makes us aware of sounds. Olfaction makes us aware of smells and odors. Touch makes us aware of pressure and temperature. Although shape can be both seen and felt, what it is like to see a shape is clearly different from what it is like to touch it. Part of the reason for the difference lies in the fact that a normally sighted person cannot see e.g., the shape of a cube without seeing its color. But by feeling the shape of a cube, one does not thereby feel its color.
I will presently argue that visual perception is a fundamental source of knowledge about the world: visual knowledge. I assume that propositional knowledge is knowledge of facts and that one cannot know a fact unless one believes that this fact obtains. I accept something like Dretske’s (1969) distinction between two levels of visual perception: nonepistemic perception (of objects) and epistemic perception (of facts). Importantly, on my view, the nonepistemic perception of objects gives rise to visual percepts and visual percepts are different from what I earlier called visuomotor representations of the targets of one’s action. What Dretske (1969) calls nonepistemic seeing is part of the perceptual processing of visual information. In the previous section, I gave empirical reasons why visual percepts differ from visuomotor representations. Unlike the visuomotor representation of a target, a visual percept makes visual information about colors, shapes, sizes, orientations of constituents of a visual display available for contrastive identification and recognition. This is why visual percepts can serve as input to a conceptual process that can lead to a peculiar kind of knowledge of the world — visual knowledge. Visual percepts serve as inputs to conceptual processes, but percepts are not concepts: perceptual contrasts are not conceptual contrasts. My present task then will be to show that the claim that visual perception can give rise to visual knowledge of the world is consistent with the claim that visual percepts are different from thoughts and beliefs. Visual percepts lead to thoughts and beliefs, but it would be a mistake to confuse the nonconceptual contents of visual percepts with the conceptual contents of beliefs and thoughts.
II. 1. Percepts and thoughts
As many philosophers of mind and language have argued, what is characteristic of conceptual representations is that they are both productive and systematic. Like sentences of natural languages, thoughts are productive in the sense that they form an open-ended, infinite set. Although the lexicon of a natural language is made up of finitely many words, thanks to its syntactic rules, a language contains indefinitely many well-formed sentences. Similarly, an individual may entertain indefinitely many conceptual thoughts. In particular, both sentences of public languages and conceptual thoughts contain such devices as negation, conjunction and disjunction. So one can form indefinitely many new thoughts by prefixing a thought with a negation operator, by forming a disjunctive or a conjunctive thought out of two simpler thoughts, or by generalizing a singular thought by means of quantifiers. Sentences of natural languages are systematic in the sense that if a language contains a sentence S with a syntactic structure, e.g., Rab, then it must also contain a syntactically related sentence, e.g., Rba. An individual’s conceptual thoughts are supposed to be systematic too: if a person has the ability to entertain the thought that, e.g., John loves Mary, then she must have the ability to entertain the thought that Mary loves John. If a person can form the thought that Fa, then she can form both the thought that Fb and the thought that Ga (where “a” and “b” stand for individuals and “F” and “G” stand for properties). Both Fodor’s (1975, 1987) Language of Thought hypothesis and Evans’ (1982) Generality constraint are designed to account for the productivity and the systematicity of thoughts, i.e., conceptual representations. It is constitutive of thoughts that they are structured and that they involve conceptual constituents that can be combined and recombined to generate indefinitely many new structured thoughts.
Thus, concepts are building blocks with inferential roles.
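The productivity and systematicity just described can be given a toy illustration. The following Python sketch is not a cognitive model but a bare combinatorial picture (all names in it are invented for the example): a finite stock of conceptual constituents, recombined freely and closed under logical operators, generates indefinitely many structured thoughts.

```python
from itertools import product

# Toy illustration of the Generality constraint: a thinker who grasps
# some two-place predicates and some singular terms can recombine them freely.
predicates = ["loves", "feeds"]   # invented stand-ins for two-place concepts
terms = ["John", "Mary"]

# Systematicity: grasp of the thought Rab brings grasp of Rba with it,
# since both are generated by the same recombination of constituents.
atomic_thoughts = [(r, a, b) for r, a, b in product(predicates, terms, terms)]

# Productivity: logical devices generate ever more complex structured thoughts.
def negate(t):
    return ("not", t)

def conjoin(t1, t2):
    return ("and", t1, t2)

t1 = ("loves", "John", "Mary")
t2 = ("loves", "Mary", "John")
# "John loves Mary and Mary does not love John"
complex_thought = conjoin(t1, negate(t2))
```

The point of the sketch is only structural: once Fa and the constituents b and G are available, Fb and Ga come for free, and the operators can be iterated without limit.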
Because they are productive and systematic, conceptual thoughts can rise above the limitations imposed on perceptual representations by the constraints inherent in perception. Unlike thought, visual perception requires some causal interaction between a source of information and some sensory organs. For example, by combining the concepts horse and horn, one may form the complex concept unicorn, even though no unicorn has been or ever will be visually perceived (except in visual works of art). Although no unicorn has ever been perceived, within a fictional context, on the basis of the inferential role of its constituents, one can draw the inference that if something is a unicorn, then it has four legs, it eats grass and it is a mammal.
Hence, to possess concepts is to master inferential relations: only a creature with conceptual abilities can draw consequences from her perceptual processing of a visual stimulus. Thought and visual perception are clearly different cognitive processes. One can think about numbers and one can form negative, disjunctive, conjunctive and general thoughts involving multiple quantifiers. Although one can visually perceive numerals, one cannot visually perceive numbers. Nor can one visually perceive negative, disjunctive, conjunctive or general facts (corresponding to e.g., universally quantified thoughts).
As Crane (1992: 152) puts it, “there is no such thing as deductive inference between perceptions”. Upon seeing a brown dog, one can see at once that the animal one faces is a dog and that it is brown. If one perceives a brown animal and one is told that it is a dog, then one can certainly come to believe that the brown animal is a dog or that the dog is brown. But on this hybrid epistemic basis, one can think or believe, but one cannot see that the dog is brown. One came to know that the dog is brown by seeing it. But one did not come to know that what is brown is a dog by seeing it. Unlike the content of concepts, the content of visual percepts is not a matter of inferential role. As emphasized by Crane (ibid.), this is not to say that the content of visual percepts is amorphous or unstructured. One proposal for capturing the nonconceptual structure of visual percepts is Peacocke’s (1992) notion of a scenario content, i.e., a visual way of filling in space. As we shall see momentarily, one can think or believe of an animal that it is a dog without thinking or believing that it has a particular color. But one cannot see a dog in good daylight conditions without seeing its particular color (or colors). I shall momentarily discuss this feature of the content of visual percepts, which is part of their distinctive informational richness, as an analog encoding of information.
In section I.3, I considered the contrast between the pictorial content of a visual percept of a glass to the left of a telephone and the conceptual content expressible by means of the English sentence: ‘The glass is to the left of the telephone’. I noticed that, unlike the English sentence, the visual percept cannot represent the glass to the left of the telephone unless it depicts the shape, size, texture, color and orientation of both the glass and the telephone. I concluded that an utterance of this sentence conveys only part of the pictorial content of the visual percept since the utterance is mute about any visual attribute of the pair of objects other than their relative locations. But, further conceptual processing of the conceptual content conveyed by the utterance of the sentence may yield a more complex representation involving, not just a two-place relation, but a three-place relation also expressible by the English predicate ‘left of’. Thus, one may think that the glass is to the left of the telephone for someone standing in front of the window, not for someone sitting at the opposite side of the table. In other words, one can think that the glass is to the left of the telephone from one’s own egocentric perspective and that the same glass is to the right of the telephone from a different perspective. Although one can form the thought involving the ternary relation ‘left of’, one cannot see the glass as being to the left of the telephone from one’s own egocentric perspective because one cannot see one’s own egocentric perspective. Perspectives are not things that one can see. This is an example of a conceptual contrast that could not be drawn by visual perception. Thus, unlike a thought, a visual percept is, in one sense of the word, “informationally encapsulated”. Thought, not perception, can, as Perry (1993) puts it, increase the arity of a predicate. Notice that percepts can cause thoughts. This is one way thoughts arise.
Thoughts can also cause other thoughts. But presumably, thoughts do not cause percepts.
II. 2. The finegrainedness and informational richness of visual percepts
Unlike conceptual thought, visual perception has a spatial, perspectival, iconic and/or pictorial structure. In order to apply the concept of a dog, one does not have to occupy a particular spatial perspective relative to any dog. But one cannot see a dog unless one occupies some spatial standpoint or other relative to it: one cannot e.g., see a dog simultaneously from the top and from below, from the front and from the back. The concept of a dog applies indiscriminately to poodles, alsatians, dalmatians or bulldogs. One can think that all dogs bark. But one cannot see all dogs bark. Nor can one see a generic dog bark. One must see some particular dog: a poodle, an alsatian, a dalmatian or a bulldog, as it might be. Although one and the same concept — the concept of a dog — may apply to a poodle, an alsatian, a dalmatian or a bulldog, seeing one of them is a very different visual experience from seeing another. One can think that a dog barks without thinking of any other properties of the dog. One cannot, however, see a dog unless one sees its shape and the colors and texture of its hairs.
Thus, the content of visual perceptual representations turns out to be both more finegrained and informationally richer than the conceptual contents of thoughts. There are three paradigmatic cases in which the need to distinguish between conceptual content and the nonconceptual content of visual perceptions may arise. First, a creature may be perceptually sensitive to objective differences for which she has no concepts. Secondly, two creatures may enjoy one and the same visual experience, which they may be inclined to conceptualize differently. Finally, two different persons may enjoy two distinct visual experiences in the presence of one and the same distal stimulus to which they may be inclined to apply one and the same concept.
Peacocke (1992: 67-8) considers, for example, a person’s visual experience of a range of mountains. As he notices, one might want to conceptualize one’s visual experience with the help of concepts of shapes expressible in English with such predicates as ‘round’ and ‘jagged’. But these concepts of shapes could apply to the nonconceptual contents of several different visual experiences prompted by the distinct shapes of several distinct mountains. Arguably, although a human being might not possess any concept of shape whose finegrainedness could match that of her visual experience of the shape of the mountain, her visual experience of the shape is nonetheless distinctive and it may differ from the visual experience of the distinct shape of a different mountain to which she would apply the very same concept. Similarly, human beings are perceptually sensitive to far more colors than they have color concepts and color names to apply. Although a human being might lack two distinct concepts for two distinct shades of color, she might well enjoy a visual experience of one shade that is distinct from her visual experience of the other shade. As Raffman (1995: 295) puts it, “discriminations along perceptual dimensions surpasses identification […] our ability to judge whether two or more stimuli are the same or different surpasses our ability to type-identify them”.
Against this kind of argument in favor of the nonconceptual content of visual experiences, McDowell (1994, 1998) has argued that demonstrative concepts expressible by e.g., ‘that shade of color’ are perfectly suited to capture the finegrainedness of the visual percept of color. I am willing to concede to McDowell that such demonstrative concepts do exist. But I agree with Bermudez (1998: 55-7) and Dokic & Pacherie (2000) that such demonstrative concepts would seem to be too weak to perform one of the fundamental jobs that color concepts and shape concepts must be able to perform — namely recognition. Color concepts and shape concepts stored in a creature’s memory must allow recognition and reidentification of colors and shapes over long periods of time. Although pure demonstrative color concepts may allow comparison of simultaneously presented samples of color, it is unlikely that they can be used to reliably reidentify one and the same sample over time. Nor presumably could pairs of demonstrative color concepts be used to reliably discriminate pairs of color samples over time. Just as one can track the spatio-temporal evolution of a perceived object, one can store in a temporary object file information about its visual properties in a purely indexical or demonstrative format. If, however, information about an object’s visual properties is to be stored in episodic memory, for future reidentification, then it cannot be stored in a purely demonstrative or indexical format, which is linked to a particular perceptual context. Presumably, the demonstrative must be fleshed out with some descriptive content. One can refer to a perceptible object as ‘that sofa’ or even as ‘that’ (followed by no sortal). But presumably when one does not stand in a perceptual relation to the object, information about it cannot be stored in episodic memory in such a pure demonstrative format.
Rather, it must be stored using a more descriptive symbol such as ‘the (or that) red sofa that used to face the fire-place’. This is presumably part of what Raffman (1995: 297) calls “the memory constraint”. As Raffman (1995: 296) puts it:
the coarse grained character of perceptual memory explains why we can recognize ‘determinable’ colors like red and blue and even scarlet and indigo as such, but not ‘determinate’ shades of those determinables […] Because we cannot recognize determinate shades as such, ostension is our only means of communicating our knowledge of them. If I want to convey to you the precise shade of an object I see, I must point to it, or perhaps paint you a picture of it […] I must present you with an instance of that shade. You must have the experience yourself.
Two persons might enjoy one and the same kind of visual experience prompted by one and the same shape or one and the same color, to which they would be inclined to apply pairs of distinct concepts, such as ‘red’ vs ‘crimson’ or ‘polygon’ vs ‘square’. If so, it would be justified to distinguish the nonconceptual content of their common visual experience from the different concepts that each would be willing to apply. Conversely, as argued by Peacocke (1998), presented with one and the same geometrical object, two persons might be inclined to apply one and the same generic shape concept e.g., ‘that polygon’ and still enjoy different perceptual experiences or see the same object as having different shapes. For example, as Peacocke (1998: 381) points out, “one and the same shape may be perceived as square, or as diamond-shaped […] the difference between these ways is a matter of which symmetries of the shape are perceived; though of course the subject himself does not need to know that this is the nature of the difference”. If one mentally partitions a square by bisecting its right angles, one sees it as a diamond. If one mentally partitions it by bisecting its sides, one sees it as a square. Presumably, one does not need to master the concept of an axis of symmetry to perform mentally these two bisections and enjoy two distinct visual experiences.
The distinctive informational richness of the content of visual percepts has been discussed by Dretske (1981) in terms of what he calls the analogical coding of information. One and the same piece of information — one and the same fact — may be coded analogically or digitally. In Dretske’s sense, a signal carries the information that e.g., a is F in a digital form iff the signal carries no additional information about a that is not already nested in the fact that a is F. If the signal does carry additional information about a that is not nested in the fact that a is F, then the information that a is F is carried by the signal in an analogical (or analog) form. For example, the information that a designated cup contains coffee may be carried in a digital form by the utterance of the English sentence ‘There is some coffee in the cup’. The same information can also be carried in an analog form by a picture or by a photograph. Unlike the utterance of the sentence, the picture cannot carry the information that the cup contains coffee without carrying additional information about the shape, size, orientation of the cup and the color and the amount of coffee in it. As I pointed out above, unlike the concept of a dog, the visual percept of a dog carries information about which dog one sees, its spatial position, the color and texture of its hairs, etc. The contents of visual percepts are informationally rich in the sense of being analog. A thought involving several concepts in a hierarchically structured order might carry the same informational richness as a visual percept. But it does not have to. As the slogan goes, a picture is worth a thousand words. Unlike a thought, a visual percept of a cup cannot convey the information that the cup contains coffee without conveying additional information about several visual attributes of the cup.
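Dretske’s digital/analog distinction can be rendered schematically. In the following Python sketch — a deliberately crude model, in which a signal is represented simply as the set of facts it carries and the property names are my own invention — a signal carries a fact digitally just in case it carries no further information about the subject of that fact:

```python
# Toy rendering of Dretske's (1981) digital vs. analog coding of information.
# A "signal" is modeled as the set of (subject, property) facts it carries.

# The sentence 'There is some coffee in the cup' carries just one fact:
sentence = {("cup", "contains_coffee")}

# A picture cannot be so selective: it inevitably carries additional
# information about the cup's shape, color, and the amount of coffee.
picture = {
    ("cup", "contains_coffee"),
    ("cup", "shape_cylindrical"),
    ("cup", "color_white"),
    ("coffee", "amount_half_full"),
}

def carries_digitally(signal, subject, fact):
    """A signal carries the fact digitally iff it carries no additional
    information about the subject beyond that very fact; otherwise the
    fact is carried only analogically."""
    about_subject = {f for f in signal if f[0] == subject}
    return fact in signal and about_subject == {fact}
```

On this crude model, the sentence encodes the cup’s containing coffee digitally, while the picture encodes the same fact only analogically — which is the sense in which the content of a visual percept is informationally rich.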
The arguments by philosophers of mind and by perceptual psychologists in favor of the distinction between the conceptual content of thought and the nonconceptual content of visual percepts are based on the finegrainedness and the informational richness of visual percepts. Thus, they turn on the phenomenology of visual experience. In section I, I provided some evidence from psychophysical experiments performed on normal human subjects and from the neuropsychological examination of brain lesioned human patients that points to a different kind of nonconceptual content, which I labelled “visuomotor” content. Unlike the arguments in favor of the nonconceptual content of visual percepts, the arguments for the distinction between the nonconceptual content of visual percepts and the nonconceptual content of visuomotor representations do not rely on phenomenology at all. Rather, they rely on the need to postulate mental representations with visuomotor content in order to provide a causal explanation of visually guided actions towards objects. Thus, on the assumption that such behaviors as grasping objects can be actions (based on mental representations), I submit that the nonconceptual content of visual representation ought to be bifurcated into perceptual and visuomotor content as in Figure 7:
content of visual representations
    conceptual content
    nonconceptual content
        perceptual content
        visuomotor content
II. 3. The interaction between visual and non-visual knowledge
Traditional epistemology has focused on the problem of sorting out genuine instances of propositional knowledge from cases of mere opinion or guessing. Propositional factual knowledge is to be distinguished from both nonpropositional knowledge of individual objects (or what Russell called “knowledge by acquaintance”) and from tacit knowledge of the kind illustrated by a native speaker’s implicit knowledge of the grammatical rules of her language. According to epistemologists, in the relevant propositional sense, what one knows are facts. In the propositional sense, one cannot know a fact unless one believes that the corresponding proposition is true, one’s belief is indeed true, and the belief was not formed by mere fantasy. On the one hand, one cannot know that the cup contains coffee unless one believes it. One cannot have this belief unless one knows what a cup is and what coffee is. On the other hand, one cannot know what is not the case: one can falsely believe that e.g., the cup contains coffee. But one cannot know it, unless a designated cup does indeed contain some coffee. True belief, however, is not sufficient for knowledge. If a true belief happens to be a mere guess or whim, then it will not qualify as knowledge. What else must be added to true belief to turn it into knowledge?
Broadly speaking, epistemologists divide into two groups. According to externalists, a true belief counts as knowledge if it results from a reliable process, i.e., a process that generates counterfactual-supporting connexions between states of a believer and facts in her environment. According to internalists, for a true belief to count as knowledge, it must be justified and the believer must in addition justifiably believe that her first-order belief is justified. Since I am willing to claim that, in appropriate conditions, the way a red triangle visually looks to a person having the relevant concepts and located at a suitable distance from it provides grounds for the person to know that the object in front of her is a red triangle, I am attracted to an externalist reliabilist view of perceptual knowledge.
Although the issue is controversial and is by no means settled in the philosophical literature, externalist intuitions suit my purposes better than internalist intuitions. Arguably, it is one thing to be justified or to have a reason for believing something; it is another to use a reason in order to offer a justification for one’s beliefs. Arguably, if a perceptual (e.g., visual) process is reliable, then the visual appearances of things may constitute a reason for forming a belief. However, one cannot use a reason unless one can explicitly engage in a reasoning process of justification, i.e., unless one can distinguish one’s premisses from one’s conclusion. Presumably, a creature with perceptual abilities and relevant conceptual resources can have reasons and form justified beliefs even if she lacks the concept of reason or justification. However, she could not use her reasons and provide justifications unless she had language and metarepresentational resources. Internalism derives most of its appeal from reflection on instances of mathematical and scientific knowledge that result from the conscious application of explicit principles of inquiry by teams of individuals in the context of special institutions. In such special settings, it can be safely assumed that the justification of a believer’s higher-order beliefs does indeed contribute to the formation and reliability of his or her first-order beliefs. Externalism fits perceptual knowledge better than internalism and, unlike internalism, it does not rule out the possibility of crediting non-human animals and human infants with knowledge of the world — a possibility made more and more vivid by the development of cognitive science.
On my view, human visual perceptual abilities are at the service of thought and conceptualisation. At the most elementary level, by seeing an object (or a sequence of objects) one can see a fact involving that object (or sequence of objects). By seeing my neighbor’s car in her driveway, I can see the fact that my neighbor’s car is parked in her driveway. I thereby come to believe that my neighbor’s car is parked in her driveway and this belief, which is a conceptually loaded mental state, is arrived at by visual perception. Hence, my term “visual knowledge”. If one’s visual system is — as I claimed it is — reliable, then by seeing my neighbor’s car — an object — in her driveway, I thereby come to know that my neighbor’s car is parked in her driveway — a fact. Hence, I come to know a fact involving an object that I actually see. This is a fundamental epistemic situation, which Dretske (1969) labels “primary epistemic seeing”: one’s visual ability allows one to know a fact about an object one perceives.
However, if my neighbor’s car happens to be parked in her driveway if and only if she is at home (and I know this), then I can come to know a different fact: I can come to know that my neighbor is at home. “Seeing” that my neighbor is at home by seeing that her car is parked in her driveway is something different from seeing my neighbor at home (e.g., seeing her in her living-room). Certainly, I can come to know that my neighbor is at home by seeing her car parked in her driveway, i.e., without seeing her. “Seeing” that my neighbor is at home by seeing that her car is parked in her driveway is precisely what Dretske (1969) calls “secondary epistemic seeing”. Secondary epistemic seeing lies at the interface between pure visual knowledge of facts involving a perceived object and non-visual knowledge that can be derived from it.
This transition from seeing one fact to seeing another displays the hierarchical structure of visual knowledge. In primary epistemic seeing, one sees a fact involving a perceived object. But in moving from primary epistemic seeing to secondary epistemic seeing, one moves from a fact involving a perceived car to a fact involving one’s unperceived neighbor (who happens to own the perceived car). This epistemological hierarchical structure is expressed by the “by” relation: one sees that y is G by seeing that x is F where x ≠ y. Although it may be more or less natural to say that one “sees” a fact involving an unperceived object by seeing a different fact involving a perceived object, the hierarchical structure that gives rise to this possibility is ubiquitous in human knowledge.
One can see that a horse has walked on the snow by seeing hoof prints in the snow. One sees the hoof prints, not the horse. But if hoof prints would not be visible in the snow at time t unless a horse had walked on that very snow at time t - 1, then one can see that a horse has walked on the snow just by seeing hoof prints in the snow. One can see that a tennis player has just hit an ace at Flushing Meadows by seeing images on a television screen located in Paris. Now, does one really see the tennis player hit an ace at Flushing Meadows while sitting in Paris and watching television? Does one see a person on a television screen? Or does one see an electronic image of a person relayed by a television? Whether one sees a tennis player or her image on a television screen, it is quite natural to say that one “sees that” a tennis player hit an ace by seeing her (or her image) do it on a television screen. Even though, strictly speaking, one perhaps did not see her do it — one merely saw pictures of her doing it —, nonetheless seeing the pictures comes quite close to seeing the real thing. By contrast, one can “see” that the gas-tank in one’s car is half-full by seeing, not the tank itself, but the dial of the gas-gauge on the dashboard of the car. If one is sitting by the steering wheel inside one’s car so that one can comfortably see the gas-gauge, then one cannot see the gas-tank. Nonetheless, if the gauge is reliable and properly connected to the gas-tank, then one can (perhaps in some loose sense) “see” what the condition of the gas-tank is by seeing the dial of the gauge.
One could wonder whether secondary epistemic seeing is really seeing at all. Suppose that one learns that the New York Twin Towers collapsed by reading about it in a French newspaper in Paris. One could not see the New York Twin Towers — let alone their collapse — from Paris. What one sees when one reads a newspaper are letters printed in black ink on a white sheet of paper. But if the French newspaper would not report the collapse of the New York Twin Towers unless the New York Twin Towers had indeed collapsed, then one can come to know that the New York Twin Towers have collapsed by reading about it in a French newspaper. There is a significant difference between seeing that the New York Twin Towers have collapsed by seeing it happen on a television screen and by reading about it in a newspaper. Even if seeing an electronic picture of the New York Twin Towers is not seeing the Twin Towers themselves, still the visual experience of seeing an electronic picture of them and the visual experience of seeing them have a lot in common. The pictorial content of the experience of seeing an electronically produced color-picture of the Towers is very similar to the pictorial content of the experience of seeing them. Unlike a picture, however, a verbal description of an event has conceptual content, not pictorial content. The visual experience of reading an article reporting the collapse of the New York Twin Towers in a French newspaper is very different from the experience of seeing them collapse. This is the reason why it may be a little awkward to say that one “saw” that the New York Twin Towers collapsed if one read about it in a French newspaper in Paris as opposed to seeing it happen on a television screen.
Certainly, ordinary usage of the English word ‘see’ is not sacrosanct. We say that we “see” a number of things in circumstances in which what we do owes little — if anything — to our visual abilities. “I see what you mean”, “I see what the problem is” or “I finally saw the solution” report achievements quite independent of visual perception. Such uses of the verb ‘to see’ are loose uses. Such loose uses do not report epistemic accomplishments that depend significantly on one’s visual endowments. By contrast, cases of what Dretske (1969) calls secondary epistemic seeing are epistemic achievements that do depend on one’s visual endowments. True, in cases of secondary epistemic seeing, one comes to know a fact without seeing some of its constituent elements. True, one could not come to learn that one’s neighbor is at home by seeing her car parked in her driveway unless one knew that her car is indeed parked in her driveway when and only when she is at home. Nor could one see that the gas-tank in one’s car is half-full by seeing the dial of the gas-gauge unless one knew that the latter is reliably correlated with the former. So secondary epistemic seeing could not possibly arise in a creature that lacked knowledge of reliable correlations or that lacked the cognitive resources required to come to know them altogether.
Nonetheless, secondary epistemic seeing does have a crucial visual component in the sense that visual perception plays a critical role in the context of justifying such an epistemic claim. When one claims to be able to see that one’s neighbor is at home by seeing her car parked in her driveway or when one claims to be able to see that the gas-tank in one’s car is almost empty by seeing the gas-gauge, one relies on one’s visual powers in order to ground one’s state of knowledge. The fact that one claims to know is not seen. But the grounds upon which the knowledge is claimed to rest are visual grounds: the justification for knowing an unseen fact is seeing another fact correlated with the former. Of course, in explaining how one can come to know a fact about one thing by knowing a different fact about a different thing, one cannot hope to meet the philosophical challenge of scepticism. From the standpoint of scepticism, as Stroud (1989) points out, the explanation may seem to beg the question since it takes for granted one’s knowledge of one fact in order to explain one’s knowledge of another fact. But the important thing for present purposes is that — scepticism notwithstanding — one offers a perfectly good explanation of how one comes to know a fact about an object one does not perceive by knowing a different fact about an object one does perceive. The point is that much — if not all — of the burden of the explanation lies in visual perception: seeing one’s neighbor’s car is the crucial step in justifying one’s belief that one’s neighbor is at home. Seeing the gas-gauge is the crucial step in justifying one’s belief that one’s tank is almost empty. The reliability of visual perception is thus critically involved in the justification of one’s knowledge claim.
In cases of primary epistemic seeing, the reliability of one’s visual system provides justifications for one’s visual knowledge in the sense that it provides one with reasons for believing that the fact involving an object one perceives obtains. In secondary epistemic seeing, one claims to know a fact that does not involve a perceived object. Still, the reliability of one’s visual system plays an indirect role in cases of secondary epistemic seeing in the sense that it provides grounds for one’s visual knowledge about a fact involving a perceived object, upon which one’s knowledge of a fact not involving a perceived object rests.
Thus, secondary epistemic seeing lies at the interface between an individual’s visual knowledge (i.e., knowledge formed by visual means) and the rest of her knowledge. In moving from primary epistemic seeing to secondary epistemic seeing, an individual exploits her knowledge of regular connections. Although it is true that, unless one knows the relevant correlation, one could not come to know that one’s neighbor is at home by seeing her car parked in her driveway, nonetheless one does not consciously or explicitly reason from the perceptually accessible premiss that one’s neighbor’s car is parked in her driveway, together with the premiss that one’s neighbor’s car is parked in her driveway when and only when one’s neighbor is at home, to the conclusion that one’s neighbor is at home. Arguably, the process from primary to secondary epistemic seeing is inferential. But if it is, then the inference is unconscious and it takes place at the “sub-personal” level.
What the above discussion of secondary epistemic seeing so far reveals is that the very description and understanding of the hierarchical structure of visual knowledge and its integration with non-visual knowledge requires an epistemological and/or psychological distinction between seeing objects and seeing facts — a point much emphasized in Dretske’s writings on the subject — or between nonepistemic and epistemic seeing. The neurophysiology of human vision is such that some objects are simply not accessible to human vision. They may be too small or too remote in space and time for a normally sighted person to see them. For more mundane reasons, a human being may be temporarily so positioned as not to be able to see one object — be it her neighbor or the gas-tank in her car. Given the correlations between facts, by seeing a perceptible object, one can get crucial information about a different unseen object. Given the epistemic importance of visual perception in the hierarchical structure of human knowledge, it is important to understand how, by seeing one object, one can provide decisive reasons for knowing facts about objects one does not see.
II. 4. The scope and limits of visual knowledge
I now turn my attention again from what Dretske calls secondary epistemic seeing (i.e., visually based knowledge of facts about objects one does not perceive) back to what he calls primary epistemic seeing, i.e., visual knowledge of facts about objects one does perceive. When one purports to ground one’s claim to know that one’s neighbor is at home by mentioning the fact that one can see that her car is parked in her driveway, clearly one is claiming to be able to see a car, not one’s neighbor herself. Now, let us concentrate on the scope of knowledge claims in primary epistemic seeing, i.e., knowledge about facts involving a perceived object. Let us suppose that someone claims to be able to see that the apple on the table is green. Let us suppose that the person’s visual system is working properly, the table and what is lying on it are visible from where the person stands, and the lighting is suitable for the person to see them from where she stands. In other words, there is a distinctive way the green apple on the table looks to the person who sees it. Under those circumstances, when the person claims that she can see that the apple on the table is green, what are the scope and limits of her epistemic claims?
Presumably, in so doing, she is claiming that she knows that there is an apple on the table in front of her and that she knows that this apple is green. If she knows both of these things, then presumably she also knows that there is a table under the apple in front of her and that there is a fruit on the table. Hence, she knows what the fruit on the table is (or what is on the table), she knows where the apple is, she knows the color of the apple, and so on. Arguably, the person would then be in a position to make all such claims in response to the following various queries: is there anything on the table? What is on the table? What kind of fruit is on the table? Where is the green apple? What color is the apple on the table? If the person can see that the apple on the table is green, then presumably she is in a position to know all these facts.
However, when she claims that she can see that the apple on the table is green, she is not thereby claiming that she can see that all of these facts obtain. What she is claiming is more restricted and specific than that: She is indeed claiming that she knows that there is an apple on the table and that the apple in question is green. Furthermore, she is claiming that she learnt the latter fact — the fact about the apple’s color — through visual perception: if someone claims that she can see that the apple on the table is green, then she is claiming that she has achieved her knowledge of the apple’s color by visual means, and not otherwise. But she is not thereby claiming that her knowledge of the location of the apple or her knowledge of what is on the table has been acquired by the very perceptual act (or the very perceptual process) that gave rise to her knowledge of the apple’s color. Of course, the person’s alleged epistemic achievement does not rule out the possibility that she came to know that what is on the table is an apple by seeing it earlier. But if she did, this is not part of the claim that she can see that the apple on the table is green. It is consistent with this claim that the person came to know that what is on the table is an apple by being told, by tasting it or by smelling it. All she is claiming and all we are entitled to conclude from her claim is that the way she learnt about the apple’s color is by visual perception.
The investigation into the scope and limits of primary visual knowledge is important because it is relevant to the challenge of scepticism. As I already said, my discussion of visual knowledge does not purport to meet the full challenge of scepticism. In discussing secondary epistemic seeing, I noticed that in explaining how one comes to know a fact about an unperceived object by seeing a different fact involving a perceived object, one takes for granted the possibility of knowing the latter fact by perceiving one of its constituent objects. Presumably, in so doing, one cannot hope to meet the full challenge of scepticism that would question the very possibility of coming to know anything by perception. I now briefly turn to the sceptical challenge to which claims of primary epistemic seeing are exposed. By scrutinizing the scope and limits of claims of primary visual knowledge, I want to examine briefly the extent to which such claims are indeed vulnerable to the sceptical challenge. Claims of primary visual knowledge are vulnerable to sceptical queries that can be directed backwards and forwards. They are directed backwards when they apply to background knowledge, i.e., knowledge presupposed by a claim of primary visual knowledge. They are directed forwards when they apply to consequences of a claim of primary visual knowledge. I turn to the former first.
Suppose a sceptic were to challenge a person’s commonsensical claim that she can see (and hence know by perception) that the apple on the table in front of her is green by questioning her grounds for knowing that what is on the table is an apple. The sceptic might point out that, given the limits of human visual acuity and given the distance of the apple, the person could not distinguish by visual means alone a genuine green apple — a green fruit — from a fake green apple (e.g., a wax copy of a green apple or a green toy). Perhaps, the person is hallucinating an apple when there is in fact nothing at all on the table. If one cannot visually discriminate a genuine apple from a fake apple, then, it seems, one is not entitled to claim that one can see that the apple on the table is green. Nor is one entitled to claim that one can see that the apple on the table is green if one cannot make sure by visual perception that one is not undergoing a hallucination. Thus, the sceptical challenge is the following: if visual perception itself cannot rule out a number of alternative possibilities to one’s epistemic claim, then the epistemic claim cannot be sustained.
The proper response to the sceptical challenge here is precisely to appeal to the distinction between claims of visual knowledge and other knowledge claims. When the person claims that she can see that the apple on the table is green, she is claiming that she learnt something new by visual perception: she is claiming that she just gained new knowledge by visual means. This new perceptually-based knowledge is about the apple’s color. The perceiver’s new knowledge — her epistemic “increment”, as Dretske (1969) calls it — must be pitched against what he calls her “proto-knowledge”, i.e., what the person knew about the perceived object prior to her perceptual experience. The reason it is important to distinguish between a person’s prior knowledge and her knowledge gained by visual perception is that primary epistemic seeing (or primary visual knowledge) is a dynamic process. In order to determine the scope and limits of what has been achieved in a perceptual process, we ought to determine a person’s initial epistemic stage (the person’s prior knowledge about an object) and her final epistemic stage (what the person learnt by perception about the object). Thus, the question raised by the sceptical challenge (directed backwards) is a question in cognitive dynamics: How much new knowledge could a person’s visual resources yield, given her prior knowledge? How much has been learnt by visual perception, i.e., in an act of visual perception? What new information has been gained by visual perception?
So when the person claims that she can see that the apple on the table is green, she no doubt reports that she knows both that there is an apple on the table and that it is green. She commits herself to a number of epistemic claims: she knows what is on the table, she knows that there is a fruit on the table, she knows where the apple is, and so on. But she merely reports one increment of knowledge: she merely claims that she just learnt by visual perception that the apple is green. She is not thereby reporting how she acquired the rest of her knowledge about the object, e.g., that it is an apple and that it is on the table. She claims that she can see of the apple that it is green, not that what is green is an apple, nor that what is on the table is an apple. The claim of primary visual knowledge bears on the object’s color, not on some of its other properties (its being, e.g., an apple or a fruit, or its location). All her epistemic claim entails is that, prior to her perceptual experience, she assumed (as part of her “proto-knowledge” in Dretske’s sense) that there was an apple on the table and then she discovered by visual perception that the apple was green.
I now turn my attention to the sceptical challenge directed forward — towards the consequences of one’s claims of visual knowledge. The sceptic is right to point out that the person who claims to be able to see the color of an apple is not thereby in a position to see that the object whose color she is seeing is a genuine apple — a fruit — and not a wax apple. Nor is the person able to see that she is not hallucinating. However, since she is neither claiming that she is able to see of the green object that it is a genuine apple nor that she is not hallucinating an apple, it follows that the sceptical challenge cannot hope to defeat the person’s perceptual claim that she can see what she claims that she can see, namely that the apple is green. On the externalist picture of perceptual knowledge which I accept, a person knows a fact when and only when she is appropriately connected to the fact. Visual perception provides a paradigmatic case of such a connexion. Hence, visual knowledge arises from regular correlations between states of the visual system and environmental facts. Given the intricate relationship between a person’s visual knowledge and her higher cognitive functions, she will be able to draw many inferences from her visual knowledge. If a person knows that the apple in front of her is green, then she may infer that there is a colored fruit on the table in front of her. Given that fruits are plants and that plants are physical objects, she may further infer that there are at least some physical objects. Again, the sceptic may direct his challenge forward: the person claims to know by visual means that the apple in front of her is green. But what she claims she knows entails that there are physical objects. Now, the sceptic argues, a person cannot know that there are physical objects — at least, she cannot see that there are. According to the sceptic, failure to see that there are physical objects entails failure to see that the apple on the table is green.
A person claims that she can know proposition p by visual perception. Logically, proposition p entails proposition q. There could not be a green apple on the table unless there exists at least one physical object. Hence, the proposition that the apple on the table is green could not be true unless there were physical objects. According to the sceptic, a person could not know the former without knowing the latter. Now the sceptic offers grounds for questioning the claim that the person knows proposition q at all — let alone by visual perception. Since it is dubious that she does know the latter, then, according to scepticism, she fails to know the former. Along with Dretske (1969) and Nozick (1981), I think that the sceptic relies on the questionable assumption that visual knowledge is deductively closed. From the fact that a person has perceptual grounds for knowing that p, it does not follow that she has the same grounds for knowing that q, even if q logically follows from p. If visual perception allows one to get connected in the right way to the fact corresponding to proposition p, it does not follow that visual perception ipso facto allows one to get connected in the same way to the fact corresponding to proposition q even if q follows logically from p.
A person comes to know a fact by visual perception. What she learns by visual perception implies a number of propositions (such as there are physical objects). Although such propositions are logically implied by what the person learnt by visual perception, she does not come to know by visual perception all the consequences of what she learnt by visual perception. She does not know by visual perception that there are physical objects — if she knows it at all. Seeing a green apple in front of one has a distinctive visual phenomenology. Seeing that the apple in front of one is green too has a distinctive visual phenomenology. There is something distinctively visual about what it is like for one to see that the apple in front of one is green. If an apple is green, then it is colored. However, it is dubious whether there is a visual phenomenology to thinking of the apple in front of one that it is colored. A fortiori, it is dubious whether there is a visual phenomenology to thinking that there are physical objects. Hence, contrary to what the sceptic assumes, I want to claim, as Dretske (1969) and Nozick (1981) have, that visual knowledge is not deductively closed.
III. The role of visuomotor representations in the human cognitive architecture
In the present section, I shall sketch my reasons for thinking that visuomotor representations do not lead to detached knowledge of the world. Rather, they serve as input to intentions in at least two respects: on the one hand, they provide visual guidance to what I shall call “motor intentions”. On the other hand, they provide visual information for “causally indexical” concepts. I will start by laying out the basic distinction between two different kinds of “direction of fit” that can be exemplified by mental representations.
III.1. Direction of fit
Whereas visual percepts serve as inputs to the “belief box”, visuomotor representations, I now want to argue, serve as inputs to a different kind of mental representation, i.e., intentions. As emphasized by Anscombe (1957) and Searle (1983, 2001), perceptions, beliefs, desires and intentions each have a distinctive kind of intentionality. Beliefs and desires have what Searle calls opposite “directions of fit”. Beliefs have a mind-to-world direction of fit: they can be true or false. A belief is true if and only if the world is as the belief represents it to be. It is the function of beliefs to match facts or actual states of affairs. In forming a belief, it is up to the mind to meet the demands of the world. Unlike beliefs, desires have a world-to-mind direction of fit. Desires are neither true nor false: they are fulfilled or frustrated. The job of a desire is not to represent the world as it is, but rather as the agent would like it to be. Desires are representations of goals, i.e., possible nonactual states of affairs. In entertaining a desire, it is, so to speak, up to the world to meet the demands of the mind. The agent’s action is supposed to bridge the gap between the mind’s goal and the world.
As Searle (1983, 2001) has noticed, perceptual experiences and intentions have opposite directions of fit. Perceptual experiences have the same mind-to-world direction of fit as beliefs. Intentions have the same world-to-mind direction of fit as desires. In addition, perceptual experiences and intentions have opposite directions of causation: whereas a perceptual experience represents the state of affairs that causes it, an intention causes the state of affairs that it represents.
Although intentions and desires share the same world-to-mind direction of fit, intentions differ from desires in a number of important respects, which all flow from intentions’ peculiar commitment to action. Broadly speaking, desires are relevant to the process of deliberation that precedes one’s engagement in a course of action. Once an intention is formed, however, the process of deliberation comes to an end: to intend is to have made up one’s mind about whether to act. I shall mention four main differences between desires and intentions.
First, although desires may be about anything or anybody, intentions are always about the self. One can only intend oneself to do something. Second, unlike desires, intentions are tied to the present or the future: one cannot intend to do something in the past. Third, unlike the contents of desires, the contents of intentions must be about possible nonactual states of affairs. An agent cannot intend to achieve a state of affairs that she knows to be impossible at the time when she forms her intention. Finally, although one may entertain desires whose contents are inconsistent, one cannot have two intentions whose contents are inconsistent.
Reaching and grasping objects are visually guided actions directed towards objects. I assume that all actions are caused by intentions. Intentions are psychological states with a distinctive intentionality. As I said earlier, intentions derive their peculiar commitment to action from the combination of their distinctive world-to-mind direction of fit and their distinctive mind-to-world direction of causation. I shall now argue that visuomotor representations have a dual function in the human cognitive architecture: they serve as inputs to “motor intentions” and they serve as input to a special class of indexical concepts, the “causally indexical” concepts.
III.2. Visuomotor representations serve as inputs to motor intentions
Not all actions, I assume, are caused by what Searle (1983, 2001) calls prior intentions, but all actions are caused by what he calls intentions in action, which, following Jeannerod (1994), I will call motor intentions. Unlike prior intentions, motor intentions are directed towards immediately accessible goals. Hence, they play a crucial role, not so much in the planning of action as in the execution, the monitoring and the control of the ongoing action. Arguably, prior intentions may have conceptual content. Motor intentions do not. For example, one intends to climb a visually perceptible mountain. The content of this prior intention involves, e.g., the action concept of climbing and a visual percept of the distance, shape and color of the mountain. In order to climb the mountain, however, one must intentionally perform an enormous variety of postural and limb movements in response to the slant, orientation and shape of the surface of the slope. Human beings automatically assume the right postures and perform the required flexions and extensions of their feet and legs. Since they do not possess concepts matching each and every such movement, their non-deliberate intentional behavioral responses to the slant, orientation and shape of the surface of the slope are monitored by the nonconceptual nonperceptual content of motor intentions.
Not just any sensory representation can match the peculiar commitment to action of motor intentions. Visuomotor representations can. Percepts are informationally richer and more fine-grained than either concepts or visuomotor representations. As I claimed above, visual percepts have the same mind-to-world direction of fit as beliefs. This is why visual percepts are suitable inputs to a process of selective elimination of information, whose ultimate conceptual output can be stored in the belief box.
I shall presently argue that visuomotor representations have a different function: they provide the relevant visual information about the properties of a target to an agent’s motor intentions. Indeed, I want to think of the role of the visuomotor representation of a target for action as Gibson (1979) thought of an affordance. However, unlike Gibson (1979), who did not make a distinction between perceptual and visuomotor processing, I do not think of the visuomotor processing of a target as a “direct pick up of information”. I think that visuomotor representations are genuine representations. My main reason for thinking of the output of the visuomotor processing of a target as a genuine mental representation — and for thinking of grasping as a genuine action, not a behavioral reflex — is that Haffenden, Schiff & Goodale’s (2001) experiment suggests that the visuomotor processing of a target can be fooled by features of the visual display: it can be led to process two-dimensional cues as if they were three-dimensional obstacles. If the output of the visuomotor processing of a display can misrepresent it, then it represents it.
Unlike visual percepts, whose single role is to present visual information for further processing, the output of which will be stored in the belief box, visuomotor representations are hybrid: as Millikan (1996), who calls them “pushmi-pullyu representations”, has perceptively recognized, they have a dual role. I slightly depart from Millikan (1996), however, in that, unlike her, I assume that visuomotor representations, not motor intentions, have a double direction of fit. Visuomotor representations present states of affairs as both facts and goals for immediate action. On the one hand, they provide visual information for the benefit of motor intentions. On the other hand, their content can be conceptualized with the help of a special class of indexical concepts: causal indexicals. Whereas visual percepts must be stripped of much of their informational richness to be conceptualized, visuomotor representations can directly provide relevant visual information about the target of an action to motor intentions. To put it crudely, it follows from the work summarized in Jeannerod (1994, 1997) that the content of a motor intention has two sides: a subjective side and an objective side. On the subjective side, a motor intention represents the agent’s body in action. On the objective side, it represents the target of the action. Visuomotor representations contribute to the latter. Their ‘motoric’ informational encapsulation makes them suitable for this role. The nonconceptual nonperceptual content of a visuomotor representation matches that of a motor intention.
Borrowing from the study of language processing, Jeannerod (1994, 1997) has drawn a distinction between the semantic and the pragmatic processing of visual stimuli. The view I want to put forward has been well expressed by Jeannerod (1997: 77): “at variance with the […] semantic processing, the representation involved in sensorimotor transformation has a predominantly ‘pragmatic’ function, in that it relates to the object as a goal for action, not as a member of a perceptual category. The object attributes are represented therein to the extent that they trigger specific motor patterns for the hand to achieve the proper grasp”. Thus, the crucial feature of the pragmatic processing of visual information is that its output is a suitable input to the nonconceptual content of motor intentions.
III.3. Visuomotor representations serve as inputs to causal indexicals
I have just argued that what underlies the contrast between the pragmatic and the semantic processing of visual information is that, whereas the output of the latter is designed to serve as input to further conceptual processing with a mind-to-world direction of fit, the output of the former is designed to match the nonconceptual content of motor intentions with a world-to-mind direction of fit and a mind-to-world direction of causation. The special features of the nonconceptual contents of visuomotor representations can be inferred from the behavioral responses which they underlie, as in patient DF. They can also be deduced from the structure and content of elementary action concepts with the help of which they can be categorized.
I shall presently consider a subset of elementary action concepts, which, following Campbell (1994), I shall call “causally indexical” concepts. Indexical concepts are shallow but indispensable concepts, whose references change as the perceptual context changes and whose function is to encode temporary information. The indexical concepts expressed by ‘I’, ‘today’ and ‘here’ are, respectively, personal, temporal and spatial indexicals. Arguably, their highly contextual content cannot be replaced by pure definite descriptions without loss. Campbell (1994: 41-51) recognizes the existence of causally indexical concepts, whose references may vary according to the causal powers of the agent who uses them. Such concepts are involved in judgments having, as Campbell (1994: 43) puts it, “immediate implications for [the agent’s] action”. Concepts such as “too heavy”, “out of reach”, “within my reach”, “too large”, “fit for grasping between index and thumb” are causally indexical concepts in Campbell’s sense.
Campbell’s idea of causal indexicality does capture a kind of judgment that is characteristically based upon the output of the pragmatic (or motor) processing of visual stimuli in Jeannerod’s (1994, 1997) sense. Unlike the content of the direct output of the pragmatic processing of visual stimuli or that of motor intentions, the contents of judgments involving causal indexicals are conceptual. Judgments involving causally indexical concepts have low conceptual content, but they have conceptual content nonetheless. For example, if something is categorized as “too heavy”, then it follows that it is not light enough. The nonconceptual contents of either visuomotor representations or motor intentions are better compared with that of an affordance in Gibson’s sense.
Causally indexical concepts differ in one crucial respect from other indexical concepts, i.e., personal, temporal and spatial indexical concepts. Thoughts involving personal, temporal and spatial indexical concepts are “egocentric” thoughts in the sense that they are perception-based thoughts. This is obvious enough for thoughts expressible with either the first-person pronoun ‘I’ or the second-person pronoun ‘you’. To refer to a location as ‘here’ or ‘there’ and to refer to a day as ‘today’, ‘yesterday’ or ‘tomorrow’ is to refer respectively to a spatial and a temporal region from within some egocentric perspective: a location can only be referred to as ‘here’ or ‘there’ from some particular spatial egocentric perspective. A temporal region can only be referred to by ‘today’, ‘yesterday’ or ‘tomorrow’ from some particular temporal egocentric perspective. In this sense, personal, temporal and spatial indexical concepts are egocentric concepts. Arguably, egocentric indexicals lie at the interface between visual percepts and an individual’s conceptual repertoire about objects, times and locations.
Many philosophers (see e.g., Kaplan, 1989 and Perry, 1993) have argued that personal, temporal and spatial indexical and/or demonstrative concepts play a special “essential” and ineliminable role in the explanation of action. And so they do. As Perry (1993: 33) insightfully writes: “I once followed a trail of sugar on a supermarket floor, pushing my cart down the aisle on one side of a tall counter and back the aisle on the other, seeking the shopper with the torn sack to tell him he was making a mess. With each trip around the counter, the trail became thicker. But I seemed unable to catch up. Finally it dawned on me. I was the shopper I was trying to catch”. To believe that the shopper with a torn sack is making a mess is one thing. To believe that oneself is making a mess is something else. Only upon forming the thought expressible by ‘I am making a mess’ is it at all likely that one may take appropriate measures to change one’s course of action. It is one thing to believe that the meeting starts at 10:00 AM. It is another thing to believe that the meeting starts now, even if now is 10:00 AM. Not until one thinks that the meeting starts now will one get up and run. Consider someone standing still at an intersection, lost in a foreign city. It is one thing for that person to intend to go to her hotel; it is another to intend to go this way, not that way. Only after she has formed the latter intention, with a demonstrative locational content, will she start walking.
Thus, such egocentric concepts as personal, temporal and spatial indexicals and/or demonstratives derive their ineliminable role in the explanation of action from the fact that their recognitional role cannot be played by any purely descriptive concept. Recognition involves a contrast but it can be achieved without recourse to a uniquely specifying definite description. Indexicals and demonstratives are mental pointers that can be used to refer to objects, places and times. Personal indexicals are involved in the recognition of persons. Temporal indexicals are involved in the recognition of temporal regions or instants. Spatial indexicals are involved in the recognition of locations. To recognize oneself as the reference of ‘I’ is to make a contrast with the recognition of the person one addresses in verbal communication as ‘you’. To identify a day as ‘today’ is to contrast it with other days that might be identified as ‘yesterday’, ‘the day before yesterday’, ‘tomorrow’, etc. To identify a place as ‘here’ is to contrast it with other places referred to as ‘there’.
Although indexicals and demonstratives are concepts, they have non-descriptive conceptual content. The conceptual system needs such indexical concepts because it lacks the resources to supply a purely descriptive symbol, i.e., a symbol that could uniquely identify a person, a time or a place. A purely descriptive concept would be a concept that a unique person, a unique time or a unique place would satisfy by uniquely exemplifying each and every one of its constituent features. We cannot specify the references of our concepts all the way down by using uniquely identifying descriptions on pain of circularity. If, as Pylyshyn (2000: 129) points out, concepts need to be “grounded”, then on pain of circularity, “the grounding [must] begin at the point where something is picked out directly by a mechanism that works like a demonstrative” (or an indexical). If concepts are to be hooked to or locked onto objects, times and places, then on pain of circularity, definite descriptions will not supply the locking mechanism.
Personal, temporal and spatial indexicals owe their special explanatory role to the fact that they cannot be replaced by purely descriptive concepts. Although they allow recognition by nondescriptive means, their direction of application is mind-to-world. Causally indexical concepts, however, play a different role altogether. Unlike personal, temporal and spatial indexical concepts, causally indexical concepts have a distinctive quasi-deontic or quasi-evaluative content. I want to say that, unlike that of other indexicals, the direction of fit of causal indexicals is hybrid: it is partly mind-to-world, partly world-to-mind. To categorize a target as “too heavy”, “within reach” or “fit for grasping between index and thumb” is to judge or evaluate the parameters of the target as conducive to a successful action upon the target. Unlike the contents of other indexicals, the content of a causally indexical concept results from the combination of an action predicate and an evaluative operator. What makes it indexical is that the result of the application of the latter onto the former is relative to the agent who makes the application. Thus, the job of causally indexical concepts is not just to match the world but to play an action-guiding role. If so, then presumably causal indexicals have at best a hybrid direction of fit, not a pure mind-to-world direction of fit.
In the previous section, I argued that, unlike visual percepts, visuomotor representations provide visual information to motor intentions, which have nonconceptual content, a world-to-mind direction of fit and a mind-to-world direction of causation. I am presently arguing that the visual information of visuomotor representations can also serve as input to causally indexical concepts, which are elementary contextually dependent action concepts. Judgments involving causally indexical concepts have at best a hybrid direction of fit. When an agent makes such a judgment, he is not merely stating a fact: he is not thereby coming to know a fact that holds independently of his causal powers. Rather, he is settling on, accepting, or making up his mind about an action plan. The function of causally indexical concepts is precisely to allow an agent to make action plans. Whereas personal, temporal and spatial indexicals lie at the interface between visual percepts and an individual’s conceptual repertoire about objects, times and places, causally indexical concepts lie at the interface between visuomotor representations, motor intentions and what Searle calls prior intentions. Prior intentions have conceptual content: they involve action concepts. Thus, after conceptual processing via the channel of causally indexical concepts, the visual information contained in visuomotor representations can be stored in a conceptual format adapted to the content and the direction of fit of one’s intentions — if not one’s motor intentions, then perhaps one’s prior intentions. Hence, the output of the motor processing of visual inputs can serve as input to further conceptual processing whose output will be stored in the ‘intention box’.
Aglioti, S., De Souza, J.F.X. and Goodale, M.A. (1995) “Size-contrast illusions deceive the eye but not the hand”, Current Biology, 5, 6, 679-85.
Anscombe, G.E.M. (1957) Intention, Ithaca: Cornell University Press.
Austin, J. L. (1962) Sense and Sensibilia, Oxford: Clarendon Press.
Bermudez, J. (1998) The Paradox of Self-Consciousness, Cambridge, Mass.: MIT Press.
Bridgeman, B., Hendry, D. & Stark, L. (1975) “Failure to detect displacement of the visual world during saccadic eye movement”, Vision Research, 15, 719-22.
Campbell, J. (1994) Past, Space and Self, Cambridge, Mass.: MIT Press.
Carey, D.P., Harvey, M. & Milner, A.D. (1996) “Visuomotor sensitivity for shape and orientation in a patient with visual form agnosia”, Neuropsychologia, 34, 329-37.
Castiello, U., Paulignan, Y. & Jeannerod, M. (1991) “Temporal dissociation of motor responses and subjective awareness. A study in normal subjects”, Brain, 114, 2639-2655.
Crane, T. (1992) “The nonconceptual content of experience” in Crane, T. (ed.)(1992) The Contents of Experience, Cambridge: Cambridge University Press.
Dokic, J. & Pacherie, E. (2001) “Shades and concepts”, Analysis, 61, 3, 193-202.
Dretske, F. (1981) Knowledge and the Flow of Information, Cambridge, Mass.: MIT Press.
Dretske, F. (1995) Naturalizing the Mind, Cambridge, Mass.: MIT Press.
Evans, G. (1982) The Varieties of Reference, Oxford: Oxford University Press.
Farah, M. (1990) Visual Agnosia: Disorders of Object Recognition and What They Tell Us About Normal Vision, Cambridge, Mass.: MIT Press.
Fodor, J.A. (1987) Psychosemantics, Cambridge, Mass.: MIT Press.
Franz, V.H., Gegenfurtner, K.R., Bülthoff, H.H. and Fahle, M. (2000) “Grasping visual illusions: no evidence for a dissociation between perception and action”, Psychological Science, 11, 1, 20-25.
Gibson, J.J. (1979) The Ecological Approach to Visual Perception, Boston: Houghton Mifflin.
Goodale, M. A. (1995) “The cortical organization of visual perception and visuomotor control”, in Osherson, D. (1995)(ed.) An Invitation to Cognitive Science, Visual Cognition, vol. 2, Cambridge, Mass.: MIT Press.
Goodale, M.A., Pélisson, D., Prablanc, C. (1986) “Large adjustments in visually guided reaching do not depend on vision of the hand or perception of target displacement”, Nature, 320, 748-50.
Goodale, M.A., Milner, A.D., Jakobson, L.S. and Carey, D.P. (1991) “A neurological dissociation between perceiving objects and grasping them”, Nature, 349, 154-56.
Haffenden, A. M. & Goodale, M. (1998) “The effect of pictorial illusion on prehension and perception”, Journal of Cognitive Neuroscience, 10, 1, 122-36.
Haffenden, A.M., Schiff, K.C. & Goodale, M.A. (2001) “The dissociation between perception and action in the Ebbinghaus illusion: non-illusory effects of pictorial cues on grasp”, Current Biology, 11, 177-181.
Jacob, P. (1997) What Minds Can Do, Cambridge: Cambridge University Press.
Jeannerod, M. (1984) “The timing of natural prehension movements”, Journal of Motor Behavior, 16, 235-54.
Jeannerod, M. (1994) “The representing brain: neural correlates of motor intentions”, Behavioral and Brain Sciences,
Jeannerod, M. (1997) The Cognitive Neuroscience of Action, Oxford: Blackwell.
Jeannerod, M., Decety, J. and Michel, F. (1994) “Impairment of grasping movements following bilateral posterior parietal lesions”, Neuropsychologia, 32, 369-80.
Kaplan, D. (1989) “Demonstratives”, in Almog, J., Perry, J. & Wettstein, H. (eds.)(1989) Themes from Kaplan, New York: Oxford University Press.
McDowell, J. (1994) Mind and World, Cambridge, Mass.: Harvard University Press.
McDowell, J. (1998) “Précis of Mind and World” and “Reply to commentators”, Philosophy and Phenomenological Research, LVIII, 2, 365-68 and 403-31.
Millikan, R.G. (1995) “Pushmi-pullyu representations”, in J. Tomberlin (ed.) Philosophical Perspectives, vol. IX, Atascadero, CA: Ridgeview.
Milner, D. & Goodale, M.A. (1995) The Visual Brain in Action, Oxford: Oxford University Press.
Milner, D., Paulignan, Y., Dijkerman, H.C., Michel, F. and Jeannerod, M. (1999) “A paradoxical improvement of misreaching in optic ataxia: new evidence for two separate neural systems for visual localization”, Proceedings of the Royal Society of London B, 266, 2225-9.
Nozick, R. (1981) “Knowledge and scepticism”, in Bernecker, S. & Dretske, F. (eds.)(2000) Knowledge, Readings in Contemporary Epistemology, Oxford: Oxford University Press.
Pavani, F., Boscagli, I., Benvenuti, F., Rabuffetti & Farnè, A. (1999) “Are perception and action affected differently by the Titchener circles illusion?”, Experimental Brain Research, 127, 95-101.
Peacocke, C. (1992) A Study of Concepts, Cambridge, Mass.: MIT Press.
Peacocke, C. (1998) “Nonconceptual content defended”, Philosophy and Phenomenological Research, LVIII, 2, 381-88.
Perry, J. (1979) “The essential indexical”, in Perry, J. (1993).
Perry, J. (1986a) “Perception, action and the structure of believing”, in Perry, J. (1993).
Perry, J. (1986b) “Thought without representation”, in Perry, J. (1993).
Perry, J. (1993) The Problem of the Essential Indexical and Other Essays, Oxford: Oxford University Press.
Pisella, L. et al. (2000) “An ‘automatic pilot’ for the hand in human posterior parietal cortex: toward reinterpreting optic ataxia”, Nature Neuroscience, 3, 7, 729-36.
Pylyshyn, Z. (2000) “Visual indexes, preconceptual objects and situated vision”, Cognition, 80, 127-58.
Rossetti, Y. & Pisella, L. (2000) “Common mechanisms in perception and action”, in Prinz, W. & Hommel, B. (eds.)(2000) Attention and Performance, XIX, Oxford: Oxford University Press.
Searle, J. (1983) Intentionality, Cambridge: Cambridge University Press.
Searle, J. (2001) Rationality in Action, Cambridge, Mass.: MIT Press.
Stroud, B. (1989) “Understanding human knowledge in general”, in Bernecker, S. & Dretske, F. (eds.)(2000) Knowledge, Readings in Contemporary Epistemology, Oxford: Oxford University Press.
Tye, M. (1995) Ten Problems about Consciousness, Cambridge, Mass.: MIT Press.
Ungerleider, L.G. & Mishkin, M. (1982) “Two cortical visual systems”, in Ingle, D.J., Goodale, M.A. & Mansfield, R.J.W. (eds.) Analysis of Visual Behavior, Cambridge, Mass.: MIT Press.
Weiskrantz, L. (1986) Blindsight. A Case Study and Implications, Oxford: Oxford University Press.
Weiskrantz, L. (1997) Consciousness Lost and Found, Oxford: Oxford University Press.
Zeki, S. (1993) A Vision of the Brain, Oxford: Blackwell.
 For discussion, see Jacob (1997, ch. 2).
 The egocentricity of indexical concepts should not be confused with the egocentricity of an egocentric frame of reference in which the visual system codes e.g., the location of a target. The former is a property of concepts. The latter is a property of visual representations. One crucial difference between the egocentricity of indexical concepts and the egocentricity of an egocentric frame of reference for coding the spatial location of a target is that, unlike the latter, the former involves a contrast: if e.g., something is here, it is not there.