I want to build a robot that can dance to rock & roll with a partner. You know, the type of loose dancing you do at parties, where there aren’t any predetermined moves but (if you are any good at it) you respond to your partner’s motions in various ways.
Why build a dancing robot?
This project calls the bluff on a lot of the rhetoric I and other situated types have been spouting for the last few years. Dancing is an activity that has all the characteristics that we have argued are typical and which the blocks world lacks.
Dancing is an on-going activity. There’s no goal to be achieved. You can’t win or lose. It’s not a problem to be solved. There’s no way to do it “right” or criterion of success. You get incrementally better at it with experience. Most real activities are like this.
Partner dance is a social activity. Good dancing is a collaborative accomplishment of the two people involved. Most of the social activities we’ve looked at are predominantly cognitive and require language use, and are therefore difficult. Dancing is noncognitive and does not require language.
Dancing involves nonverbal communication and synchronization. I believe that understanding these is a prerequisite to understanding “more advanced” sorts of representation and intentionality.
Dancing requires a real body with real perception. However, the motor and visual tasks are very different from those that are taken as prototypical by current robotics and vision research, and are, I think, more typical. Dancing is paradigmatically a process of interaction – interaction with your own body, with the surrounding space, and most importantly with your partner.
Dancing is a good domain for looking at learning by apprenticeship and imitation. You don’t learn to dance by reading a book or by proving theorems or by being told or by doing experiments in a laboratory, you learn it by doing it with someone who is better at it than you are. I claim that this is the primordial and most important form of learning.
The experience of embodiment
Dancing is an experience of one’s own body. AI, when it talks about bodies at all, thinks of them as tools that are somewhere down the hall and which you use to execute plans. But we typically experience ourselves as being our bodies – never more so than when dancing.
In dancing, you coordinate four sources of ongoing experience: your experience of your body, your visual experience of your partner, the music, and what you are feeling.
- Your experience of your body consists both of what you are doing and what’s happening as a result. Often what happens isn’t quite what you expect; it puts you off-balance or feels too tight or loose or jerky or smooth, and this feeling is part of the experience that contributes to what you will do next.
- As you dance, you watch what your partner is doing, and do something that complements that. You may adopt a similar style or rhythm, follow his changes, suggest new patterns, act out mock fear in response to mock aggression, and so on. All this follows from an ability to experience your partner’s activity (typically visually in this sort of dancing).
- As you dance, you listen to the music, and you do things that feel like what the music feels like. If the music is nervous you might dance jerkily; if it is menacing you might dance violently; if it is sexy you might dance suggestively. And, of course, your motions will typically be timed to coincide with the beat.
- Dancing is an emotional business; it brings up feelings, and your dance orients to those feelings.
Taking dance as a prototype of activity will force us to develop ideas about representations that are kinesthetic. Introspectively, it seems to me that much of even abstract reasoning, when I’m proving a theorem for example, involves imagining performing bodily operations on imaginary spatial objects. (The vocabulary of mathematics supports this; we talk of “retractions” and “surgery” and “pumping” for example.) I believe that such representations are very important and that they underlie the sorts of representations we are used to thinking about. There’s a lot of developmental psychological evidence for this, and evidence also from linguistics. For example, an awful lot of the vocabulary for describing the shapes and motions of inanimate objects is derived from the vocabulary for describing human body parts and actions.
Taking dance as a prototype of activity will also force us to develop ideas about representations as temporal. Not just representations of time, like interval algebras, but representations in time. I believe that mental activity has rhythms that reflect the rhythms of the physical activity we are engaged in, and that a structural coupling between these two rhythms, somewhat after the manner of a phase-locked loop, is both an integral part of our ability to engage in the concrete activity and in itself constitutes a representation of the temporal structure of that activity. This is a very different sort of representation from those taught in computer science courses; it is a representation by a process, not by a token.
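To make the phase-locked-loop idea concrete, here is a minimal sketch of a representation-by-process: an internal oscillator whose phase and period entrain to heard beat onsets. Everything about it (the class name, the gains, the update rules) is my illustrative assumption, not part of the original proposal; the point is only that the “representation” of the beat is the oscillator’s ongoing running state, not a stored token.

```python
class BeatOscillator:
    """An internal oscillator that entrains to observed beat onsets,
    somewhat after the manner of a phase-locked loop. The gains and
    update rules here are illustrative assumptions."""

    def __init__(self, period=0.5, phase_gain=0.5, period_gain=0.1):
        self.period = period  # current beat-period estimate (seconds)
        self.phase = 0.0      # fraction of the period elapsed, in [0, 1)
        self.phase_gain = phase_gain
        self.period_gain = period_gain

    def tick(self, dt):
        """Advance the oscillator's phase by dt seconds of elapsed time."""
        self.phase = (self.phase + dt / self.period) % 1.0

    def onset(self):
        """A beat was heard. The phase error is how far we are from
        phase 0 in the nearest direction; nudge phase and period so the
        oscillator drifts into alignment with the external rhythm."""
        error = self.phase if self.phase < 0.5 else self.phase - 1.0
        self.phase = (self.phase - self.phase_gain * error) % 1.0
        # If onsets keep arriving "late" (positive error), our period
        # is too short, so stretch it; and vice versa.
        self.period *= (1.0 + self.period_gain * error)

# Demo: entrain to a steady beat of true period 0.6 s, starting from 0.5 s.
osc = BeatOscillator(period=0.5)
for _ in range(200):
    osc.tick(0.6)   # 0.6 seconds pass...
    osc.onset()     # ...and a beat is heard
```

Note that nothing here ever denotes “the tempo is 100 bpm”; the coupling itself, while it runs, is the representation of the music’s temporal structure.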
Pursuing this project would require a much deeper understanding of the phenomenology of dance than I have. For a start, I would want to look hard at a lot of videotape of people dancing and to talk to dance teachers about the learning process.
Kinesthetic and temporal representation require kinesthetic and temporal perception. Phenomenologically, we perceive gestures kinesthetically. We feel what another person is doing. In imitating a partner’s dance style, the task is to make your body do something that the actions you see feel like. Mechanistically, this probably means that the visual scene is “represented” in the same terms we use to represent our experience of our own bodies.
Dance orients to rhythm, and so rhythm perception, leading to rhythmic representation, is key. You must perceive both the rhythm of the music and the rhythm of your partner’s dance. I’ve argued elsewhere (in the abstract-emergent paper) that the ability to perceive rhythm and simultaneity is very important for learning more generally.
The vision “problem” in dancing is very different from the vision problems usually studied. In particular, no object recognition is involved. At most you have to pick out your partner from the background, and this might well be finessed in implementation by having the partner wear special clothes and dance against a contrasting backdrop. Similarly, it is not necessary to build an accurate CAD-type model of the object of interest. Agre, Horswill, and I have argued elsewhere that an emphasis on object recognition and solid modeling as the goal of vision is misguided. The task here is rather to characterize your partner’s dancing in terms of the sorts of representation I’ve been discussing. What is needed is to perceive the temporal structure of your partner’s actions and their “manner”: are they sinuous or frenetic or emphatic or blithe? It may be useful, particularly when trying to imitate particular gestures, to compute a model of the partner’s body’s position in phase space (joint angles and velocities), but this model probably need not be very accurate, and probably is not a precursor to perceiving “manner”.
The music perception task is parallel to the vision one in these ways; what is needed is to extract the basic beat and feel, but no detailed representation is necessary or useful.
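As a sketch of how little machinery basic beat extraction needs, here is one standard trick: take an onset-strength envelope (a coarse measure of how much is happening in the audio at each frame) and pick the autocorrelation peak in a plausible tempo range. This is my illustrative assumption about one way to do it, not the proposal’s method, and real beat trackers are considerably more involved.

```python
def beat_period(envelope, min_lag, max_lag):
    """Estimate the beat period (in frames) of an onset-strength
    envelope by picking the autocorrelation peak among candidate
    lags. A minimal sketch: no tempo prior, no phase tracking."""
    n = len(envelope)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # How well does the envelope line up with itself shifted by lag?
        score = sum(envelope[i] * envelope[i - lag] for i in range(lag, n))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

# A synthetic envelope with a pulse every 8 frames:
env = [1.0 if i % 8 == 0 else 0.0 for i in range(128)]
period = beat_period(env, 4, 16)  # recovers 8
```

Nothing like a score or a note-level transcription appears anywhere; a single lag (plus, in a fuller version, a phase) is the whole output.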
To dance you must also perceive your own body. I’ll take this up in the next section.
Robotics is traditionally obsessed with assembly tasks; these demand accurate positioning of manipulators. The trend toward mobile robotics and navigation as an alternative is I believe salutary. Developmentally, tasks involving accurate manipulation appear well after the child is routinely and skillfully using its body in other ways. Dancing does not require accurate limb control. This makes most of robotics inapplicable, and fortunately eliminates most of the problems of traditional robotics.
The approach robotics has taken to accurate manipulation control has been to make manipulators which can be commanded accurately and to develop mathematical theories of manipulator dynamics. This has meant in particular that robot manipulators have been rigid, because flexible manipulators cannot be commanded accurately. Rigid manipulators are heavy, and consequently power-hungry; we couldn’t build a dancing robot based on current manipulator technology because it would be extremely dangerous and because it would be impossible to get enough power to it. But a dancing robot could be built with plastic limbs, with some flex in the members and backlash in the joints. It could be light enough to be safe and might even have low enough power requirements not to need a tether.
There’s not much feedback you can get from a rigid manipulator: the joint angles and their time derivatives tell you everything you could know. But in a flexible body, you can get a lot of complex feedback, in the form of measurements of the deformations of the flexible members and of the backlash in the joints. This provides rich lived experience: the sensory signals on which kinesthetic representation can be based.
Because dancing is not a manipulation task, all the standard problems of robotics like inverse dynamics go away. You don’t need to accurately control the end-effector, so you don’t need an accurate dynamics. This is just as well, because we probably can’t get an accurate dynamics for a flexible kinematic chain. However, the rich feedback such a chain can provide can be the basis for guiding motions by feedback, by interaction with ongoing unpredictable events, rather than by rigid command (of position or of forces). Understanding this in the case where accurate positioning is not needed might eventually lead to understanding how to do accurate control with flexible arms as well.
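The shape of model-free feedback guidance can be sketched in a few lines: each cycle, nudge the motor command in whatever direction reduces the discrepancy between what the limb is sensed to be doing and what we want it to feel like. No dynamic model of the limb appears anywhere. The interface names and the toy limb below are my illustrative assumptions.

```python
def feedback_step(command, sensed, target, gain=0.2):
    """One cycle of feedback-guided motion: correct each motor
    command by a fraction of the sensed error. No inverse dynamics,
    no model of the limb; just interaction with what actually happens.

    command, sensed, target: lists of numbers, one per joint/sensor."""
    return [c + gain * (t - s) for c, s, t in zip(command, sensed, target)]

# Crude stand-in for a flexible limb: the sensed deformation lags the
# command (an assumed first-order response, not a real limb model).
command, sensed, target = [0.0], [0.0], [1.0]
for _ in range(200):
    sensed = [0.7 * s + 0.3 * c for s, c in zip(sensed, command)]
    command = feedback_step(command, sensed, target)
# sensed has converged near the target without our ever knowing
# how the limb maps commands to motion.
```

The same loop works whatever the limb’s actual (unknown, flexible, backlashy) response happens to be, so long as more command produces more motion; accuracy comes from the ongoing interaction, not from prediction.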
The first hard robotics problem in a dancing robot would be balance: can you keep it from falling over? Raibert has had trouble enough getting robots to run; why should we expect to be able to make one that can dance? Perhaps we can learn enough from his work to do more. But we could also finesse it by attaching the body to a rigid stand. This wouldn’t compromise the motivations of the project I explained in the first section (“why build a dancing robot?”).
I believe (following in particular the Soviet activity theory school, and much other psychological and sociological research) that learning by apprenticeship is crucial to cognition generally. The project I’m outlining here is a good opportunity to make computational studies of such learning.
I am thinking in particular of the possibility of learning by imitating particular gestures. A dancer has a vocabulary of moves with which she communicates. Dancing with someone, you may see her do something you haven’t done before. You may imitate it, if you can, to see how it feels. Probably the first few times you make that move it doesn’t really come out right. This is probably because your kinesthetic perception doesn’t build accurate models of the other’s motions (because it doesn’t need to) and because you couldn’t accurately imitate it anyway (because you can’t command your limbs to accurately conform to even an accurately represented motion pattern, because you don’t need to be able to). However, each time you try to imitate a gesture, you do it a bit better. You get feedback by seeing what happens and how it feels (and perhaps looks, to the extent that you can watch yourself dance). (Chris Atkeson’s work is relevant here.)1
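The trial-by-trial structure of this (“each time you try to imitate a gesture, you do it a bit better”) can be sketched as a simple iterative update over a whole gesture: execute the current command trajectory, compare how it came out with the remembered feel of the partner’s gesture, and correct each point by a fraction of its error. This is my illustrative reading of the process, not a claim about Atkeson’s method; `execute` and all other names here are hypothetical stand-ins for actually moving and sensing the body.

```python
def imitate_once(command_traj, execute, target_traj, gain=0.5):
    """One attempt at a gesture: run the current commands, feel how
    the result differs from the remembered target, and correct the
    commands for next time. Neither the target nor the execution
    needs to be represented accurately for this to converge."""
    result = execute(command_traj)
    return [c + gain * (t - r)
            for c, r, t in zip(command_traj, result, target_traj)]

# Toy body: a sluggish limb that only achieves 80% of what is commanded.
execute = lambda traj: [0.8 * c for c in traj]
target = [1.0, 0.5, -0.5]   # the remembered feel of the partner's gesture
cmd = [0.0, 0.0, 0.0]
for _ in range(20):         # the first few attempts don't come out right...
    cmd = imitate_once(cmd, execute, target)
# ...but after repeated tries the executed gesture matches the target.
```

Notably, the loop never needs to know that the limb’s gain is 0.8; it compensates for whatever the body actually does, which is the sense in which inaccurate perception and inaccurate command can still add up to successful imitation.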
I believe that the transmission of culture depends crucially on the apprentice being similar to her mentor. Computers can’t be intelligent not only because they don’t have bodies, but also because they don’t have human bodies. Before a computer can learn anything useful, it has to be able to act human enough that people will accept it as human for the relevant purposes. (The “framing” processes by which this works are explained in Kenneth Kaye’s brilliant and relevant book The Mental and Social Life of Babies: How Parents Can Create Persons.) Dancing is a task circumscribed enough that it may be possible to build a robot good enough that people will enjoy dancing with it and so treat it as human for dancing purposes and so enable it to learn.
It’s hard for people to learn to dance. Why should it be easy for the robot? Perhaps there’s a “superhuman human” fantasy here. I’d like to learn more about the process of learning to dance: what makes it difficult for people.
Scope and limitations
Is this project feasible? I don’t know. It’s certainly very ambitious. It may be feasible, but not for me; my attention span is about a year, and this is a five-to-ten-year project. I’m passionate enough about it at the moment that I intend to learn more about what’s needed. It may also be feasible, but only at one of a very few places; plausibly only MIT has the resources to make it possible, if indeed anywhere does.
Taking dancing out of context necessarily does violence to it. Anything less than building a complete person is always going to involve terrible compromises. For example, much of what you do on the dance floor is to make motions reminiscent of specific motions that have significances in other activities, so that you draw on your experience in the rest of life. A robot won’t have that experience to draw on.
The fact that there’s no “right” way to dance makes evaluation difficult.
If dancing is too hard, an easier task would be to build a robot with one or two arms and optionally a head which can imitate gestures.2 This factors out the balance issue and a lot of the mechanical problems, because you can use external motors with tendon drive. It’s a much less interesting task, but perhaps one that illustrates enough of the relevant issues while being tractable.
- 1. Rereading this in 2016, I have zero memory of what Chris had done by 1990, and haven’t been able to locate it easily. I’m not sure how much my proposal was influenced by his work. However, it turns out that he has recently started working on soft humanoid robots, along lines generally similar to what I wrote here! Check out “What's Next For Humanoid Robotics?”, a seriously cool PowerPoint presentation.
- 2. The Cog Project did precisely this. I suspect that it was influenced by “Robots That Dance”; no one else had previously suggested anything similar as far as I know. I never got around to asking Rod Brooks (my PhD advisor and subsequently PI of the Cog Project), so I’m not sure.