Tuesday, February 12, 2008

Open Question: Chomsky versus Turing, Round 1

What it is

I find writing blog entries from time to time to be a very pleasant source of output. However, one aspect of blogging that I hope not to overlook is the incredible ability to receive feedback in the form of comments. Therefore, I have decided to start making posts that explicitly grasp at feedback. To formalize these posts, I will put the important questions in bold with some framing statements interspersed between them; see below.

Basically I am highlighting issues that I am not quite certain about in hopes that other people's insight can help fill in the holes in what would otherwise be a declarative piece of Swiss cheese.

The Question(s)

Why do we consider words to be part of a natural language, but we often consider methods and classes as “external” to a programming language? Why do programming languages seem to have such an immunity, save for keywords and arbitrary functions that find their way into the “standard library”? I find the spelling of the word “weird” to be a flaw with the English language, yet if I come across a method written for a C++ project that I feel uses Hungarian notation incorrectly (using Hungarian notation at all is a flaw in my book, but I digress) then I will blame that programmer and not the language itself.

Part of the issue is that we tend to view words as atoms in natural languages whereas with programming languages we label such things as primitives and operators (including keywords and perhaps some wrapper methods in standard libraries) as atoms. In this way, natural languages have a much broader palette of atoms than programming languages, though programming languages maintain the strength of being simpler to learn and understand. So my original question, comparing user-created objects and user-created methods to words, is a bit flawed and might be more accurate if I were comparing them to sentences; we can certainly blame an individual for sentences that don't fit the grammatical rules of the language. But wait, aren't syntax errors the equivalent of exactly that?

So if keywords and operators are analogous to words, statements are analogous to sentences, syntax rules are analogous to grammatical rules, and type errors as well as certain run time errors (null dereferences, incorrect array indexing, invalid casting) are analogous to illogical utterances (“Losing is the reddest flatulence.”), then what is the natural language equivalent of the user-created class or method? Is it simply more words (neologisms, portmanteaus, etc.)? Maybe the atomic nature of class names and method names is illusory; are they more like phrases or sentences?
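To make the "grammatical but illogical" category concrete, here is a minimal C++ sketch. The function name is made up for this example. The statement inside the try block obeys every syntax rule, so the compiler accepts it, yet it asks for an element that does not exist and only fails when run.

```cpp
#include <stdexcept>
#include <vector>

// Hypothetical helper, named for this example only.
// Returns true if the out-of-range read was caught at run time.
inline bool indexPastEndThrows() {
    std::vector<int> scores{10, 20, 30};
    try {
        int fourth = scores.at(3);   // grammatical, but there is no fourth element
        (void)fourth;                // silence the unused-variable warning
        return false;
    } catch (const std::out_of_range&) {
        return true;                 // the "illogical utterance" was rejected when spoken
    }
}
```

Note that the bounds-checked `at` is used here on purpose; plain `scores[3]` would be undefined behavior rather than a tidy error, which is roughly the difference between a listener saying "that makes no sense" and a listener silently mishearing you.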

If this seems like typical internet philosophical tallywhacking so far—then I'm doing my job as a game design blogger ;). But seriously, I do have a concern that legitimizes this post, if you will but take a moment to walk down a side road (it will loop back to the main street if I don't fall asleep first).

Wreaking Havok

I dabble in the Havok physics engine at work, and one thing that I have had to deal with recently illustrates a problem that I see with saddening frequency in the programming world. I wouldn't call it “reinventing the wheel”; perhaps just “reinventing the tread”. Anyway, the problem occurs when working with two libraries that each have their own internal representation of the same concept. The concept in this particular example happens to be vectors; Havok has its own internal struct for vectors, and we have ours. This means that when I am doing vector math that crosses library boundaries, I have to bloat my code with conversion function calls. For the uninitiated, vectors are used a lot in 3D games, in my experience more so than any other non-primitive data structure (save for perhaps containers). The concept that Havok's vectors try to encapsulate is the same as the concept that we try to encapsulate with our vectors, so it is unfortunate that compilers can't make this connection unaided. Worse, we can't expect Havok to get rid of their vector structs, nor can we get rid of ours and solely use Havok's (what happens if we decide to let our Havok license end and pursue another physics solution?).
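For anyone who has not hit this, a rough sketch of the bloat looks something like the following. Every name here is hypothetical: `physics::Vec4` is a stand-in for a third-party vector and is not Havok's actual type or API, and `game::Vec3` is a stand-in for an in-house one. The layouts are assumptions chosen only to make the mismatch visible.

```cpp
// Hypothetical stand-ins: neither struct is a real library's type.
namespace physics {            // plays the role of the third-party library
struct Vec4 { float v[4]; };   // assumed layout: four packed floats
inline Vec4 scale(const Vec4& a, float s) {
    return Vec4{{a.v[0] * s, a.v[1] * s, a.v[2] * s, a.v[3] * s}};
}
}  // namespace physics

namespace game {               // plays the role of our in-house math code
struct Vec3 { float x, y, z; };
inline Vec3 add(const Vec3& a, const Vec3& b) {
    return Vec3{a.x + b.x, a.y + b.y, a.z + b.z};
}
}  // namespace game

// The conversion shims that creep into boundary-crossing code.
inline physics::Vec4 toPhysics(const game::Vec3& g) {
    return physics::Vec4{{g.x, g.y, g.z, 0.0f}};
}
inline game::Vec3 toGame(const physics::Vec4& p) {
    return game::Vec3{p.v[0], p.v[1], p.v[2]};
}

// One line of math (position + velocity * dt) now needs a conversion
// because the two operands live in different libraries.
inline game::Vec3 integrate(const game::Vec3& position,
                            const physics::Vec4& velocity, float dt) {
    return game::add(position, toGame(physics::scale(velocity, dt)));
}
```

Both structs mean "a direction and a magnitude in 3D space", but as far as the compiler is concerned they are as unrelated as an int and a file handle, so a human has to write `toGame` and `toPhysics` and remember to call them.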

So something as common as vectors seems like a nice concept to promote to some level above us and Havok. I'm not sure where: the language or the standard libraries? Ignoring this particular blemish, I would like to summarize this problem and what I believe to be its cause. Basically, with such a limited, domain-agnostic approach to programming languages, reinventing the wheel at the class and method level emerges from the design. I have been thinking a lot about improving programming languages lately, and one of the key issues that I tend to steer away from is redundancy. So is there a real problem here, and more importantly is there a solution?
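The closest thing C++ offers today is probably a template, so here is a minimal sketch of how far that gets us. The two `Vec` structs are hypothetical again, and I have deliberately given them identically named members, which is an assumption real libraries rarely satisfy.

```cpp
// Hypothetical types; only the shared x/y/z "shape" matters here.
namespace physics { struct Vec { float x, y, z, w; }; }
namespace game    { struct Vec { float x, y, z; }; }

// Compiles for any pair of types that expose numeric x, y, z members.
template <typename A, typename B>
float dot(const A& a, const B& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}
```

A call such as `dot(physics::Vec{1.0f, 2.0f, 3.0f, 0.0f}, game::Vec{4.0f, 5.0f, 6.0f})` works with no conversion at all, but only because both structs happen to spell their members the same way. The compiler is matching names, not the concept of a vector, so the moment one library stores its components in an array instead, we are back to writing shims.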

I say “Yes” and “Yes”

I have two solutions, one of which needs more thought before I put it online. I will go into detail on the more reasonable, less risky one.

I believe that this problem arises from languages constraining their atom palette to be as generic as possible. The fix I have in mind is a language whose palette is tailored to a domain, and I believe that the game industry would do well to get together and generate such a language. I also believe that this will never happen, so we as hobbyists and enthusiasts will have to continue to experiment in isolation until something takes off. I'm working on it, but trust me: we can benefit from someone far more educated in the field of programming language design than I am solving this problem. If no one else steps up to the plate, I'll eventually finish my amateurish, patchwork solution and release it to the world. Then you'll all be sorry!

Even a domain-specific language for video game design runs the risk of becoming stale with how rapidly game development changes (though I see this slowing down as Nintendo consoles, cellphones, and more focus on casual/indie games draw us away from inventing new shader technology). We could make vectors, matrices, and quaternions primitive, but what do we do when some MIT prodigy discovers a better way to simulate bouncing breasts using a crazy new data structure called a xerbaton*? For such a domain-specific language to truly evolve with an evolving domain, we would have to work to standardize such things into the language itself in real time instead of us all creating our own xerbaton structs and management functions. I believe this implies being less strict about the size of our atom palette (the easy part) and more open to viewing the game development community as a team (think a WoW guild) as opposed to a bunch of competing teams (think football). The latter is the hard part.

So what do you think? Can programming languages learn from natural languages and improve by exposing their atom palettes (keywords, operators, standard classes and methods) to the public and encouraging open extension? Is this dangerous? Could this work for game development? Will it provide benefits by minimizing wheel/tread reinvention? I can think of examples of phenomena like what I am proposing, but not exactly. For example, Java's libraries are modified and added to with nearly each major release, but this is more of a committee decision with a somewhat democratic input system and is by no means geared toward game design. Lisp takes a different route seen in the form of different dialects, but this is community extension at a different layer. The natural language equivalent would be if we decided the English language should start using periods instead of commas to separate digits in large numbers, or if we decided that adjectives should come after the nouns that they modify. Both of these examples seem to happen at a slow pace, in large steps by individuals or small groups of people. Could we as a community moderate such extension by an anonymous mass and come out ahead? Could we convince compiler and IDE vendors to play along to add syntax highlighting to new keywords (a dangerous concept for backwards compatibility in the first place, but I'll let that can of worms be opened in the comments to keep this from getting longer) and include new standard library files by default? To do this right seems to require something somewhat revolutionary à la Wikipedia. Or do you know of any project that already does what I have been describing in this post?

And the all-important meta-question: Do you like this open question format?

* picture a damped spring with a “sensuality” modulator, and please keep your hands out of your pants while doing so.


Eric said...

I'll be coming back for more commenting, but for now I'll toss out some thoughts on just one piece.

>> Why do we consider words to be part of a natural language, but we often consider methods and classes as “external” to a programming language?

Personally I am of the mentality that methods and classes are analogous to words; I simply see them as different kinds. But my view is the minority.

If I were to guess what causes many people to see them externally, it is likely the mutability. Programming languages provide an open framework for adding whatever vocabulary you wish to the language. This is not true (in the same way) for English. Granted, that has never stopped rebels like us from coining new words, but no one else uses those words either.

Although no one else uses most of the functions I write either, so maybe the difference isn't huge lol.

My gut tells me it is worthwhile to apply general linguistic concepts and categorizations to programming languages, but not to take it very far--especially into grammar. The forced fit of English into Latin grammatical terms sent the grammatical analysis of English askew for centuries. Forcing programming languages into the mold of English (or whatever) grammar could push us into a long chase down a road to nowhere.

More thoughts to come, and nice article!

eiyukabe said...

Yes, following natural language too closely can lead to the frightening issues of ambiguity, though I do theorize that natural languages more closely mimic how our minds think. It might just be that this seems true because I learned English at a much younger age than I learned C++, but if there is some advantage to be taken from ages and ages of natural language evolution we should definitely try to do so.

One of the more interesting benefits of talking to a person versus "talking" to a compiler is that people are generally more able to get around your mistakes and assume what you mean. Whether this is an attribute of natural language or the human mind (or both) I do not know.