2010-03-26 - Designing DSLs

========================================================================
>>> Designing DSLs

Programming languages differ in numerous ways:

1. Each uses different notations for writing down programs.  As we've
   observed, however, syntax is only partially interesting.  (This is
   less true of languages that are trying to mirror the notation of a
   particular domain.)

2. Control constructs: for instance, early languages didn't even support
   recursion, while most modern languages still don't have
   continuations.

3. The kinds of data they support.  Indeed, sophisticated languages like
   Scheme blur the distinction between control and data by making
   fragments of control into data values (such as first-class functions
   and continuations).

4. The means of organizing programs: do they have functions, modules,
   classes, ...?

5. Automation such as memory management, run-time safety checks, and so
   on.

Each of these items suggests natural questions to ask when you design
your own languages in particular domains.

Now, you know (hopefully) what programming languages are used for.  One
example that should not be ignored is using a programming language to
write a programming language -- for example, what we have done so far
(or any other interpreter or compiler).  In the same way that some
series of statements in a PL can be used to represent things to do in
the "real world", there are other statements that can be used to
represent things to do in your language.

For example, the meaning of `one-brick' might abstract over laying a
brick when making a wall -- it abstracts all the little details into a
function:

  (define (one-brick wall brick-pile)
    ;; look at the pile and pick up a brick
    (move-eye (location brick-pile))
    (let ([pos (find-available-brick-position brick-pile)])
      (move-hand pos)
      (grab-object))
    ;; look at the wall and put the brick in the next free position
    (move-eye wall)
    (let ([pos (find-next-brick-position wall)])
      (move-hand pos)
      (drop-object)))

This allows us to write

  (one-brick my-wall my-brick-pile)

instead of all of the above.  In the same way, we can think of a loop as
an abstraction over the details of executing some sequence of operations
over and over again, for example this:

  (define (build-wall wall pile)
    (loop-for i from 1 to 500
      (one-brick wall pile)))

instead of:

  (one-brick my-wall my-pile)
  (one-brick my-wall my-pile)
  (one-brick my-wall my-pile)
  (one-brick my-wall my-pile)
  ...

Now going back to the different ways in which you can use a language:
there are lots of domain specific languages.  For example, four of the
oldest languages are:

  Fortran -- Formula Translator
  Algol   -- Algorithmic Language
  Cobol   -- Common Business Oriented Language
  Lisp    -- List Processing

Only in the late 60s / early 70s did languages begin to break free of
their special-purpose domains and become "general purpose" languages
(GPLs).  These days, we usually use some GPL for our programs and often
come up with small "domain specific" languages (DSLs) for specific jobs.

The problem is designing such a specific language.  There are lots of
decisions to make, and as should be clear now, many ways of shooting
yourself in the foot.  You need to know:

* What is your domain?

* What are the common notations in this domain (need to be convenient
  both for the machine and for humans)?

* What do you expect to get from your DSL?  (eg, performance gains when
  you know that you're dealing with a certain limited kind of
  functionality like arithmetic.)

* Do you have any semantic reason for a new language?
  (For example, using special scoping rules, or a mixture of lazy and
  eager evaluation, maybe a completely different way of evaluation (eg,
  makefiles).)

* Is your language expected to envelop other functionality (eg, shell
  scripts, TCL), perhaps handing some functionality off to a different
  language (makefiles and shell scripts), or is it going to be embedded
  in a bigger application (eg, PHP), or embedded in a way that exposes
  parts of an application to user automation (Emacs Lisp, Word Basic,
  Visual Basic for Applications or Some Other Long List of Buzzwords)?

* If you have one language embedded in another enveloping language --
  how do you handle syntax?  How can they communicate (eg, share
  variables)?

And very important:

* Is there a benefit to implementing a DSL over using a GPL -- how much
  will your DSL grow (usually more than you think)?  Will it get to a
  point where it will need the power of a full GPL?  Do you want to risk
  doing this just to end up admitting that you need a "Real Language"
  and dump your solution for "Visual Basic for Applications"?

=> It might be useful to think ahead about things that you know you
   don't need, rather than things you need.

========================================================================
>> Sidenote: WSJ on the proliferation of PLs

"Computer Languages Multiply, Pleasing Many--But Not All"
Wall Street Journal (12/14/05) P. B1; Gomes, Lee

While the proliferation of languages has been a boon to software
programmers, the extensive variety often frustrates their bosses and
confounds the larger software companies.  C and the subsequent C++ may
be the most popular languages in use today, but any programmer working
on the Web must also include languages such as Perl, Python, PHP, and
TCL in his resume.
The explosion has been partially fueled by the ability of an individual
programmer or a small group to create and market a language, as was the
case with Ruby on Rails, which became an overnight sensation thanks to a
15-minute demonstration video the Danish programmer David Hansson
circulated over the Web.  Once a language has gained a core following,
blogs and Web sites appear to track its developments.  Many languages
owe their origins to small design firms trying to make a commercial
success of themselves, while others are labors of love, as is the case
with many open source projects.  As new languages continue to emerge,
however, more programmers are defecting from mainstream systems such as
.NET and Java in favor of niche offerings that are more tailored to a
specific project.  CIOs are often assailed by complaints from their
programmers when they try to impose restrictions on the number of
languages that are permissible.  While it has been demonstrated
theoretically that each language is the rough equivalent of any other,
it is no more likely for a consensus to appear within the programming
community than it is for a single car to be met with a universal embrace
from the entire fleet of motorists.

========================================================================
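>> Sidenote: a sketch of `loop-for'

The `loop-for' form used in `build-wall' earlier is not a built-in
Racket form, and these notes do not define it.  As a rough illustration
of how small such a "loop as abstraction" construct can be, here is one
possible definition as a macro.  Several things are assumptions made
only for this sketch: plain `#lang racket', both bounds being inclusive,
and the hypothetical stand-ins `bricks-laid', `lay-one-brick!' and
`build-toy-wall', which replace the robot operations so that the
example can actually run.

```racket
#lang racket

;; ASSUMPTION: `loop-for` is not defined in the notes; this is one
;; possible definition, with both bounds inclusive.  `from` and `to`
;; are matched as literal keywords.
(define-syntax loop-for
  (syntax-rules (from to)
    [(_ var from lo to hi body ...)
     (for ([var (in-range lo (add1 hi))])
       body ...)]))

;; Hypothetical stand-in for `one-brick`: it only counts bricks.
(define bricks-laid 0)
(define (lay-one-brick!)
  (set! bricks-laid (add1 bricks-laid)))

;; Hypothetical variant of `build-wall` that takes the brick count.
(define (build-toy-wall n)
  (loop-for i from 1 to n
    (lay-one-brick!)))

(build-toy-wall 500)
bricks-laid ; => 500
```

Because `loop-for' is a macro rather than a function, its body is
evaluated again on every iteration instead of once at the call site,
which is what lets it stand in for the 500 repeated calls.

========================================================================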