A Look Into Domain Specific Languages
Not to long ago I had the opportunity to hear Charles Simonyi present on the topic of Intentional Programming (IP). In his presentation he discussed many of the underlying objectives and concepts that I found similar to those that we have in Business Process Management (BPM), the area where most of my experience lies. Reflecting the intentions of the developer (i.e. the business process) is the holly grail in terms of what we want to achieve in BPM, but where the “developer” is a domain expert rather than your traditional programmer. Unfortunately, the IP platform from Intentional Software, the company that Charles is a founder of, was not yet publicly available. As a result, I recently took a look at several other projects out their with similar objectives and concepts including Meta Programming System (MPS), Whole Platform, and XMF. Looking at these products naturally lead me to think much more about DSL(s), their architectures, and role in enterprise application development. At this point I’ll describe some of my high level findings to set the stage for some later postings and list some references for those interested in the topic.
1. Domain Specific Languages
A Domain Specific Language (DSL) is a specialized language engineered with the goal of implementing solutions for a particular problem domain. This is in contrast to General Purpose Languages (GPL) that are aimed at generally solving any type of problem. The key value proposition for a DSL is that is abstracts away the underlying complexities that are unnecessary for the targeted developer(s) in implementing a solution.
2. Horizontal vs. Vertical Domains
A domain is a problem space with a set of interrelated concepts. Domains can be further classified as being horizontal or vertical. In the context of DSL(s) Horizontal domains are those domains that are both technical and broad in nature with concepts that apply across a large group of applications. Examples of horizontal DSL(s) include the following (Note: I am biased towards Adobe technologies ;-).
- Flex: An open sourced framework and DSL that targets the user interface domain by enabling Rich Internet Applications (RIA) that are portable across multiple browsers.
- Cold Fusion: A language for creating data-driven web sites.
- Document Description XML(DDX): A declarative language used to describe the intended structure of a one or more PDF(s) based on the assembling of one or more input documents.
- LiveCycle Workflow: Similar to workflow languages such as BPEL, LC Workflow is An imperative language used for specifying the interaction of a process across one or more services.
- SQL & DDL: Structured Query Language and Data Definition Language(s) are standard languages used for querying and defining data structures respectively with respect to relational databases.
- Pixel Bender: An toolkit made up of a kernel language, a high-performance graphics programming language intended for image processing, and a graph language, an XML-based language for combining individual pixel-processing operations (kernels) into more complex filters.
Vertical domains on the other hand are more narrow by nature, pertain to an industry, and contain concepts that can only be re-used across a relatively small number of similarly typed applications. Good examples of vertical DSL(s) can be found in  and are listed below:
- IP Telephony & Call Processing: A modeling language created to enable the easy specification of call processing services using telephony service concepts.
- Insurance Products: A modeling language that enables insurance experts to specify the static structure of various insurance products from which a a J2EE web portal can be be generated.
- Home Automation: A modeling language for controlling low-level sensors and actuators within a home environment.
- Digital Watches: A wristwatch language for a fictitious digital watch manufacturer that enable the building of a product family of watches with varying software requirements.
Two important points with respect to horizontal vs. vertical DSL(s) are:
- Its much easier to find examples of horizontal DSL(s) rather than vertical. This is in part due to the fact that developers are the primary contributors to new languages and creating DSL(s), which is a technical activity of itself. Its often a natural next step to look towards DSLs after establishing a successful API and/or framework that itself provides a level of abstraction
- There is overlap between horizontal and vertical DSL(s) that must be managed. In his thesis Anders Hessellund describes the general problem of managing overlapping DSL(s) as the coordination problem .
In this section we classify DSL(s) into three broad classifications based on architecture. The three classifications; Internal, External, and Language Workbench were first termed by Martin Fowler .
In our spoken languages experts typically do not invent entirely new languages to support their domain. Similarly, many argue that DSLs should be built on-top of an existing language such that its usage can be embedded in the host environment. Rather than remain true to the original syntax of the host programming language, DSL designers attempt to formalize a syntax or API style within the host language that best expresses the domain of the language.
Figure 2: Internal DSL Architecture
The syntax support of a host language constrains how expressive an internal DSL can be to a great extent. Languages with a rigid syntax structure such as Java or C# have difficulty in supporting expressive DSLs, whereas flexible (& dynamic) languages such Ruby, Python, Groovy, and Boo are designed for such support. Projects such as JMock have shown that even syntactically rigid languages such as Java can still provide fairly expressive DSL(s) by implementing method chaining and builder patterns to give the illusion of a more natural language . Fowler refers to DSLs that employ such patterns to work within the confines of a rigid syntax as fluent interfaces .
Unlike Internal DSL(s), external DSL(s) exist outside the confines of an existing language. Examples of such languages are SQL, XPath, BPEL, and Regular Expressions. Building an external DSL can be a complex and time consuming endeavour. An external DSL developer is essentially starting from a blank slate and while that may empower them to an extent, it also means that they must handle everything themselves.
So what exactly does “everything” entail? Similar to implementing a GPL, an external DSL developer must implement a multistage pipeline that analyzes or manipulates a textual input stream, i.e. the language’s concrete syntax. The pipeline gradually converts the concrete syntax as a set of one or more input tokens into an internal data structure, the Intermediate Representation (IR).
The architecture of a external DSL can still vary significantly based on its runtime requirements.
The four broad subsets of the multi-stage pipeline above as described in  are:
- Reader: readers build internal data structure from one or more input streams. Examples include configuration file readers, program analysis tools and class file loaders.
- Generator: generators walk an internal data structure (e.g. Abstract Syntax Tree) and emit output. Examples include object-to-relational database mapping tools, object serializers, source code generators, and web page generators.
- Translator: A translator reads text or binary input and emits output conforming to the same or a different language. It is essentially a combined reader and generator. Examples include translators from extinct programming languages to modern languages, wiki to HTML translators, refactorers, pretty printers, and macro preprocessors. Some translators, such as assemblers and compilers, are so common they warrant their own sub-categories.
Language Workbench & Platform
In  Fowler describes the benefits of DSL(s) versus the cost of building the necessary tools to support them effectively. While recognizing that implementing internal, rather than external DSL(s) can reduce the overall tool cost, the constraints on the resulting DSL can greatly reduce the benefits, particularly if you are limited to static typed languages that traditionally have a more rigid syntax. An external DSL gives you the most potential to realize benefits, but comes at a greater cost of designing and implementing the language and its respective tooling. Language Workbench’s are a natural evolution that provide both the power and flexibility of external DSL(s) as well as the infrastructure (IDE, frameworks, languages, etc…) to greatly facilitate the implementation of the necessary tooling for the DSL. In the picture below we expand upon the concept of a Workbench to a Language Platform, which includes not only the Workbench but also the Runtime for the implemented languages as well.
The concept of a Language Workbench (or Platform) is relatively new. Much of initial theoretical work in this area was started by Charles Simonyi with the concept of Intentional Programming, which he started at Microsoft  and the work being done by his later company, Intentional Software. For purposes of this posting we will discuss two important aspects of Language Workbenches shown in the illustration above.
Multi-level Projection Editors
A projection editor is an editor for the concrete syntax of a language, whether that syntax is textual or graphical. These editors are tightly bound to their respective languages and offer the assistance capabilities which we now expect from modern editors such as code completion. Unlike, traditional free format text editors, projection editors can:
- work directly with an underlying abstract representation (i.e. an Abstract Syntax Tree)
- can bypass the scanning and parsing stages of the compiler pipeline
- direct users to enter only valid grammatical structures
- can be textual or graphical
- can offer multi-level representations that abstract out various views of the program for various users and enabling multi-level customizations of the end solution 
Its long been realized that compilers & interpreters are typically unable to take advantage of domain specific context encoded within the source code of a GPL. Moreover, as discussed previously DSL(s), both internal and external, often lack tooling support or acquire it through significant costs. Active Libraries enable DSL developers to overcome the overall lack of domain-specific tooling by taking an active role throughout the language tool chain, from the IDE, through the compiler, to the runtime. The are three types/levels of Active Libraries :
- Compiler Extensions: libraries that extend a compiler by providing domain-specific abstractions with automatic means of producing optimized target code. They may compose and specialize algorithms, automatically tune code for a target machine and instrument code. Czarnecki et al provided Blitz++ and the Generative Matrix Computation Library as two examples of libraries that extended a compiler with domain specific optimizations.
- Domain-Specific Tool Support: libraries that extend the programming environment providing domain-specific debugging support, domain-specific profiling, and code analysis capabilities, etc… The interaction between Tuning and Analysis Utilities (TAU), a set of tools for analyzing the performance of C, C++, Fortran and Java programs and Blitz++ are good examples.
- Extended Meta-programming: libraries that contain meta-code which can be executed to compile, optimize, adapt, debug, analyze, visualize, and edit the domain-specific abstractions. Czarnecki et al  discuss active libraries generating different code depending on the deployment context providing the example that they may query the hardware and operating system about their architecture.
For my purposes I was primarily interested in the domain-specific tool support afforded by active libraries (i.e. #2 above). In the case of Language Workbench’s we are looking for domain specific support for rendering, editing, re-factoring, reduction(i.e. compiling), debugging, and versioning capabilities.
4. Role of the Language Engineer
While the concept of DSL(s) and Unix “small languages” have been around for some time, the focus has been on slowly evolving General Purpose Languages (GPL). As a result, the industry currently lacks developers, patterns and best practices, as well as tooling that facilitate the design, implementation, and evolution of languages on a larger scale. Language Engineering is a more complex activity than typical software and system engineering. In-order for the benefits of DSL(s) to be fully realized there needs to be a new discipline evolved that focuses on engineering of domain specific languages. Similar to others  this paper asserts that there will be a paradigm shift that demands the creation of a new role, the Language Engineer, in the software development process. Furthermore, the creation of this role will significantly reduce the workload of application developers in building a given solution in a similar fashion to the introduction of 3rd generation languages and subsequently platforms (e.g. Java/J2EE and .Net).
While we would not expect to see the dramatic increase in productivity that we saw with the introduction of 3rd generation languages, we would still expect the productivity increases to be substantial once effective tooling is in place for both the Language Engineer and the Application Developer as the consumer of any given DSL.
5. Benefits & Costs
In this section we review & summarize many of the costs and benefits associated with creating and using domain specific languages.
Benefits of DSL(s):
- Domain Knowledge: Domain specific functionality is captured in a concrete form that is more easily and readily available to both application developers, who are direct language users, and domain experts. The benefit of capturing domain knowledge can be realized throughout an applications life-cycle.
- Domain Expert Involvement: DSLs should be designed to focus on the relevant detail of the domain while abstracting away the irrelevant details making them much more accessible to domain level experts with varying levels of programming expertise. In cases where there is already an existing notation understood by the domain expert the DSL should be designed with a concrete form to match that existing notation. For example, for users familiar with the Business Process Modeling Notation (BPMN), a related Workflow DSL should enable users to implement workflow solutions using familiar BPMN constructs.
- Expressiveness: DSL’s that are tailored to a specific domain can more concisely & precisely represent the formalisms for the specified domain. Furthermore, DSLs tend to be declarative in nature, enabling the developer to focus on the “what” while eliminating the need for them to understand or over specify their program with the “how.”
- Compilation & Runtime Optimizations: As discussed with Active Libraries the additional context afforded DSL(s) can provide for domain-specific optimizations at either compile and/or runtime. Adobe’s Pixel Bender language is an example of a language designed to provide optimizations by leveraging parallelization on multi core machines. Similarly, there are many use cases in the scientific community where DSL’s have been leveraged for abstraction and optimization to handle sparse arrays, automatic differentiation, interval arithmetic, adaptive mesh refinement, etc… – The expressiveness of DSLs not only eliminate the need for developers to over specify their code, but consequently provides more opportunity for the infrastructure to intelligently optimize where and when possible.
- Re-Use: Due to the additional costs associated with designing and supporting DSLs mentioned previously DSLs should not be created for isolated solutions. However, in cases where a problem in a particular domain (horizontal or vertical) is repeating there is ample benefit to be gained by re-using the expressiveness of DSLs and the other benefits that follow.
- Reliability: Like libraries once tested DSL(s) provide higher level of reliability when re-used across projects.
- Toolability: By associating a concrete and abstract syntax to a language, such as is the case with External DSL(s) or DSL(s) defined in a Language Workshop, tools are better enabled to analyze and provide guidance through assistance or verification tooling.
Costs of DSLs:
- Tool Support: One of the primary benefits of creating DSL(s) vs. application libraries is the ability to add tooling based on the concrete syntax of the DSL, however today such tooling does not come for free, it must be built as part of the cost of creating the DSL. Tooling features include features such as code guidance/assistance, re-factoring, and debugging.
- Integration: The successful development of most applications today requires the convergence of multiple views. Business analysts, domain experts, interaction designers, database experts, and developers with different types of expertise all take part in the process of building such applications. Their respective work products must be managed, aligned and integrated to produce a running system .
- Training Cost: As opposed to mainstream languages, DSL users will likely not have an established specification to look to for guidance. However, this cost can be offset by the degree to which the DSL narrowly reflects and abstracts the domain concepts.
- Design Experience: DSL platforms are not yet widely adopted in the software industry. As a result, there is an evident lack of experience in the field around language engineering, language design patterns, prescriptive guidelines, mentors, and/or research.
Anyway, hope to have some more postings related to DSLs in the not to distant future. At this point I am attempting to pull together a lot of good ideas from various areas. Which reminds me here is the list of docs referenced throughout this posting.
 T. Veldhuizen and D. Gannon, “Active Libraries: Rethinking the role of compilers and libraries,” page 2
 A. Hessellund. “Domain-Specific Multimodeling,” Thesis. IT University of Copenhagen, Denmark. page 15.
 S. Kelly & J. Tolvanen, “Domain-Specific Modeling,” John Wiley & Sons, Inc. 2008, Hoboken, New Jersey.
 S. Freeman, N. Pryce. “Evolving an Embedded Domain-Specific Language in Java,” OOPSLA, Oct 22-26, 2006, Portland, Oregon, USA.
 T. Parr, “Language Design Patterns, Techniques for implementing Domain Specific Languages, ” Pragmatic Bookshelf, 2009. pages 14-16.
 K. Czarnecki, M. Antokiewicz, and C. Kim. “Multi-Level Customization in Application Enginneering, ” Communications of the ACM, Vol I, December 2006. pages 61-65
 C. Simonyi. “Intentional Programming – Innovation in the Legacy Age, ” Presented at IFIP WG 2.1 meeting. Jun 4, 1996
 K. Czarnecki, U. Eisenecker, R. Gluck, D. Vandevoorde, and T. Veldhuizen. “Generative programming and active libraries (extended abstract). ” In M. Jazayeri, D. Musser, and R. Loos, editors, Generic Programming ’98. Proceedings, volume 1766 of Lecture Notes in Computer Science, pages 25–39. Springer-Verlag, 2000.
 Meta Programming System, http://www.jetbrains.com/mps/index.html
 T. Clark, P. Sammut, J. Willans. “Applied Metamodeling A Foundation For Language Development, ” Ceteva 2008.
 R. Solmi. “Whole Platform, ” Thesis for Department of Computer Science at University of Bologna. Italy. March 2005
 B. Langlois, C. Jitia, E. Jouenne. “DSL Classification,” In 7th OOPSLA Workshop on Domain-Specific Modeling, 2007.
 OMG, Meta Object Facility (MOF) 2.0 Core Proposal, ad/2003-04-07, 7 Apr 2003.
 A. Kleppe. “Software Language Engineering.” Addison-Wesley Professional. 12/09/2008. Page 19.