Archive for the ‘Programming Language’ Tag
Well, finally getting back to some (or a) posting after some time. Fortunately or unfortunately, depending on how you look at it, I was heads down in project work during my last semester at GA Tech. Now I have a few brief moments between ramping up on a new gig and the moving thing (back to SJ, once again). Figured I would take some time to convert over an overview of a fun project I did along with Shauvik Roy Choudhary and Jared Klumpp, two talented CS students here at Tech and authors of the content below. It was part of Nate Clark’s CS8803 Dynamic Compilation course, which I highly recommend for anyone reading this from Tech. The class ROCKED, with quite the cast of characters in there with strong backgrounds. Unfortunately I missed the Tab tasting day, but wtf drinks Tab anyway 😉 OK, that’s a few inside remarks, but for anyone interested in VMs and tracing, read on. The idea is to improve tracing results by fusing loops together prior to any recording. The project was in large part driven by the “Trace-based just-in-time type specialization for dynamic languages” PLDI ’09 paper, and thanks go out to the guys back at Adobe and the Mozilla IRC channel for the initial ideas.
Trace compilation represents a different paradigm than traditional just-in-time (JIT) compilation techniques. Rather than adhering to principles derived from traditional static compilers, such as whole-method compilation, trace compilation relies on runtime profiling information, making it unique to the domain of dynamic compilation. With trace compilation, program execution starts in the interpreter. While the interpreter executes bytecodes, the VM maintains profiling information to identify frequently executed (hot) loops within the program. In a trace-based approach it is assumed that backward branches represent a control-flow transfer from the end of a loop to its beginning. When such branches become hot, i.e., when the execution count crosses a certain threshold, traces are recorded and compiled. The benefits of tracing include the ability to easily inline procedure calls, simplified on-stack replacement, and the compilation of only relevant code areas (cold code is ignored).
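To make the backward-branch counting concrete, here is a minimal JavaScript sketch of hot-loop detection. The threshold value and the counter table are assumptions for illustration only, not TraceMonkey’s actual data structures or thresholds:

```javascript
// Hypothetical hotness threshold; TraceMonkey's real value differs.
var HOT_THRESHOLD = 2;
// Per-target execution counts, keyed by the backward branch's target PC.
var counters = {};

// Called by the (imagined) interpreter each time a backward branch is taken.
// Returns true once the loop head is considered hot and recording should start.
function onBackwardBranch(targetPC) {
  counters[targetPC] = (counters[targetPC] || 0) + 1;
  return counters[targetPC] >= HOT_THRESHOLD;
}
```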
Trace compilation was a major component of the JIT compiler provided in the Firefox 3.1 browser. The trace-based JIT, TraceMonkey, used trace compilation as a means of asserting type stability within a loop. Unlike method-based dynamic compilers, TraceMonkey operates at the granularity of individual loops. This design choice was based on the expectation that programs spend most of their time in hot loops. Furthermore, we can expect hot loops to be mostly type-stable, meaning that the types of values are invariant. TraceMonkey relies on type stability to generate type-specialized code during tracing, obtaining an overall 10x speedup in performance.
The objective of this project, JSS, is to facilitate TraceMonkey’s use of trace compilation by increasing the scope of what gets included in a trace. More specifically, we look at applying loop-fusion as a means of effectively fusing what would otherwise be separate traces, creating larger traces overall. The details of our implementation are discussed in section 3, followed by the results in section 4. Fusing loops before the tracer adds difficulties because parsing and compilation into a bytecode representation within TraceMonkey occur in a single pass. These constraints and their potential workarounds are discussed in section 5.
2. Related Work
There has been a significant amount of recent work in the area of tracing. In particular, the benefits of trace over whole-method compilation are still being debated and evaluated. Accordingly, the two areas of related work discussed in this section are investigations into whole-method compilation and expansions of prior work in trace compilation for dynamic languages. We discuss both below and present several other more loosely related topics.
At the same time, there is ongoing work to improve the use of trace compilation in the context of dynamic languages. Andreas Gal has discussed extending TraceMonkey’s current implementation with parallel dynamic compilation and recompilation of traces, the folding of trees with minimal differences, more precise load propagation within a tree, and improved register allocation.
In this paper, we improve the trace based type specialization in TraceMonkey by performing loop-fusion on the program’s AST. Hence, this augmented technique leverages both static as well as dynamic information of the program, thereby harnessing the benefits of both.
JSS seeks to broaden the granularity at which tracing occurs. Widening the tracer’s view through loop-fusion reduces the number of loops, and hence branches, and increases the opportunity for optimizations within the loops. JSS therefore performs loop-fusion at parse time, so that it occurs before the tracer runs and avoids adding additional passes, which would decrease overall performance.
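As a source-level illustration of the transformation (JSS actually operates on the AST, not on source text, and these function names are hypothetical), loop-fusion turns two adjacent loops, and thus two separate traces, into one:

```javascript
// Two adjacent loops: a tracing JIT would record two separate traces.
function separate(n) {
  var a = [], b = [];
  for (var i = 0; i < n; i++) a[i] = i * 2;  // trace 1
  for (var j = 0; j < n; j++) b[j] = j + 1;  // trace 2
  return [a, b];
}

// After fusion there is a single loop body, and hence a single trace.
function fused(n) {
  var a = [], b = [];
  for (var i = 0; i < n; i++) {
    a[i] = i * 2;
    b[i] = i + 1;
  }
  return [a, b];
}
```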
The TraceMonkey parser scans and parses one statement at a time. After a statement has been parsed, the generated AST is emitted to a stack-based intermediate representation and parsing begins for the next statement. Whenever loops are emitted to IR, additional instrumentation is inserted to control the tracer and count the number of trips. After the file has been parsed and emitted, the interpreter begins executing at the top of the IR. When the trace instructions have been executed a specified number of times, execution pauses while the tracer consumes the IR and emits native code for that loop. Parsing also performs rudimentary data-flow analysis for constant propagation and folding. JSS uses this information to characterize loops at parse time and identify data-flow dependencies.

Only loops within a higher-level statement construct, such as a function, may be fused. This follows from the self-imposed restriction that loop-fusion occurs at parse time and from the parser’s practice of emitting the IR on a per-statement basis (as opposed to emitting after the entire program has been parsed). First, the AST nodes were augmented with an additional data structure, LoopInfo, which characterizes the loop. The characteristics required for fusion are the loop’s bounds, direction, and stride. Identifying these features requires the use-def chains created during parsing. For this proof-of-concept, very basic induction variable identification assumes that the induction variable must occur in the condition of the loop. The induction variable is then traced back to its definition through the use-def chains to identify one of the constant bounds. The other side of the conditional expression is then traced through the use-def information to identify the other constant bound. If either of these bounds resolves to an expression or global variable, then the array bounds cannot be safely determined and loop-fusion is not possible.
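A hedged sketch of the kind of record LoopInfo might hold (the field names here are my assumptions, not TraceMonkey’s actual structure), for a loop such as `for (var i = 0; i < 10; i += 2)`:

```javascript
// Illustrative loop characterization; field names are assumed.
var loopInfo = {
  inductionVar: "i",  // variable appearing in the loop condition
  lowerBound: 0,      // found via the use-def chain of i's definition
  upperBound: 10,     // found from the other side of the condition
  stride: 2,          // monotonic constant change per iteration
  direction: "up"
};

// The trip count follows directly from these characteristics.
function tripCount(info) {
  return Math.ceil((info.upperBound - info.lowerBound) / Math.abs(info.stride));
}
```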
Both the loop body and header (consisting of the conditional) are examined to identify the stride. The stride must be a monotonic, constant change, and only basic induction variables are supported (such as d = d +/- C).

Dependency identification at parse time proved more challenging, because the ASTs lack features that would greatly ease their identification. Were these features present, they would still be undesirable because they would essentially add several passes of analysis for each loop. Instead, JSS identifies dependencies after the loop has been completely parsed, using the aforementioned use-def chains and the list of generated variables. Because the variables used by a loop cannot be propagated to the top-level AST node without augmenting every aspect of the parser, the list of all variables in the program serves as the mechanism for identifying dependencies. After loops are parsed, the list of variables is scanned and each variable’s use-def set is examined for a use or definition occurring within the loop. If one occurs, the loop is conservatively marked as carrying a dependency on that name, although this could be improved to more aggressively remove dependencies. The code lying between two loops a and b can also create potential dependencies. If the definition of a variable x1 used in loop b depends upon a variable x2 used or defined in loop a, then a dependency is formed from b to a. This is the only dependency-creating condition, but additional source transformations must occur for code lying between the two loops:
- The intermediate code contains the use of a variable defined or used in a. In this situation the code can be sunk below b.
- The definition of a variable that is used by b occurs in the intermediate code. In this situation the definition can be hoisted above a.
The appropriate action for each of the above two situations is taken for all intermediate code between a and b. Any remaining code carries no dependence on the loops and can arbitrarily be sunk or hoisted depending upon other data-flow dependencies.

The final dependency-causing condition arises from global variables and function calls. Loops are dependent upon any global variable defined or accessed within the loop, and if the use or definition of a global variable occurs between the two loops then they cannot be merged, because the effects of globals are difficult to determine. This is discussed in further detail in section 5, Problems. Function calls cause dependencies from the loop to all global variables, because the effects of a call cannot be determined at parse time.

Two loops a and b are then considered for fusing after the identification of loop characteristics and dependencies. First, the dependencies are checked to ensure that there is no dependency from b on a and that the intermediate code between a and b does not carry a use of a dependency from a in the same statement as the definition or use of a dependency from b. If the two loops pass the dependency analysis, they can be transformed and merged. The merging process first creates a pre-header for both loops and moves the initialization variables (for for-loops) into this pre-header. The update component is then appended to the end of the loop body for for-loops. Then, the trip count for both loops is calculated. If both loops execute the same number of times, the bodies are fused and the condition for the fused loop comes from the first loop. Otherwise, the loop with the smaller trip count has a guard with its current condition inserted around its body, the two bodies are fused, and the condition for the fused loop is taken from the larger loop. This is clearly an opportunity for loop-peeling and is discussed in section 6, Future Work.
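A source-level sketch of this merging process for two for-loops with different trip counts. The actual transformation happens on the AST, and the function names and loop bodies here are my own illustrations:

```javascript
// Before fusion: trip counts 10 and 6.
function original() {
  var s = 0, t = 0;
  for (var i = 0; i < 10; i++) s += i;
  for (var j = 0; j < 6; j++) t += j * j;
  return s + t;
}

// After fusion: initializations hoisted into a pre-header, updates appended
// to the body, condition taken from the larger loop, and the smaller loop's
// body wrapped in a guard carrying its original condition.
function fusedWithGuard() {
  var s = 0, t = 0;
  var i = 0, j = 0;      // pre-header: initialization variables
  for (; i < 10; ) {     // condition from the larger loop
    s += i;
    if (j < 6) {         // guard from the smaller loop
      t += j * j;
      j++;               // smaller loop's update, appended to its body
    }
    i++;                 // larger loop's update, appended to the body
  }
  return s + t;
}
```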
TraceVis was used to profile TraceMonkey and determine how much time the tracer was spending in each phase of execution. TraceMonkey emits timing events to a log whenever the compiler phase changes, which the TraceVis tool can later analyze. The post-execution analysis reduces the overhead and provides more accurate timing results. The two TraceVis tools used were Vis, a visualization tool, and binlog, a text-based analysis tool used to quantify the time spent in each phase. These tools were used to analyze the performance of several benchmarks before and after application of JSS.
The results of a specific test case we designed for loop-fusion and two benchmark programs from SunSpider are shown in table 1.
The first program, which we designed, tests a best-case situation where two loops are immediately next to each other and increment in the same direction, but have different bounds. The second program, modified from SunSpider to allow for loop-fusion, measures binary tree access times. The third program, access-nbody.js, was the only SunSpider test that allowed for loop-fusion. The bound for access-nbody’s fusible loops, although the same for both loops, was a variable rather than a constant; JSS was modified to account for variable bounds. Each benchmark was repeated 100 times and the average speed found. Figure 2 shows the amount of time spent in each phase of TraceMonkey, with different colors representing the different phases. The colors are shown in the key in Figure 2(g).
Applying our optimization should result in a reduction of the overhead, meaning less total time spent monitoring, recording, compiling, and setting up and tearing down. The first and third images show the timing without the JSS optimization, while the second and fourth show the timing after applying JSS, respectively. The second graph (Fusion1-Post) shows a definite reduction in the amount of time spent monitoring and recording traces, as well as fewer traces, indicating the application of JSS.
The results of applying JSS to the first benchmark are clearly visible, with a greatly reduced amount of time spent in the overhead. The results from table 1 help to quantify the change more clearly, and show that the time spent monitoring and recording traces was significantly lower after JSS than before. This reduction made up for the additional time spent in compilation because of the larger amount of code being compiled. The estimated speedup increased by 0.01, indicating improved performance. However, the additional guard added to prevent the loop with the smaller trip count from executing too often caused more time to be spent in native execution and caused the benchmark to execute more slowly than pre-JSS. Future optimizations, such as loop-peeling, could remove the need for a guard and increase the performance of JSS.
The second benchmark ran over a larger period of time, increasing the benefit of JSS. The time spent monitoring traces was reduced, and the overall native execution time was much lower. Discrepancies between this benchmark and the previous could be due to better loop-bounds alignment that removed the need for a guard to prevent the fused loop from executing too often.
The third benchmark shows the greatest improvement due to JSS. The time spent in the interpretation, monitoring, recording and compilation phases was greatly reduced and the code spent more time running natively. This third benchmark highlights the benefits of JSS, benefits that could be realized in other less-optimal situations with additional optimizations.
Side exits were ignored in JSS to show that loop-fusion was a viable optimization for increasing performance. Detecting side exits would not be difficult, but would require traversing an AST after parsing completion or carrying a flag up the tree to show that a break or continue statement occurs. Traversing the AST after creation is undesirable because it would add extra processing time by adding another analysis pass. Adding flags to the tree would require augmenting the parser in many locations and adding additional information to the AST node objects.
Global variable dependencies are not particular to JavaScript, but their difficulties are certainly more pronounced than in other languages. Because variables default to global, they abound in real-world scripts. Global variables then cause unnecessary dependencies because the dependencies cannot be easily determined statically. They also cause problems because any loop calling a function carries a dependency from that loop to every global variable. Methods for reducing the pressure from global variables could include copying the global variable in the pre-header and using the copied name inside the loop. Analyzing called functions would reduce the number of dependencies, but would add analysis time. As shown in the example in Figure 3, both variables a and b are globals, as their declarations are not preceded by the keyword var. A naive analysis would wrongly try to fuse the two for loops declared on lines 3 and 7. However, the function bar, which is called at line 4 of the first loop, changes the global variable b, making the loops infusible. Hence, a more thorough inter-procedural analysis is required to handle global variables.
6. Future Work
Future work would improve the safety of JSS and add other loop-related optimizations that could increase the performance of the resulting code. Currently the induction variable is identified based on the variable used in the loop condition, but true induction variable analysis would lead to a more accurate optimization. Dependency analysis, loop-peeling, and induction variable normalization would all aid in improving the performance and number of opportunities available to JSS.
Current dependency analysis marks any use or definition of a used variable as a dependency. The number of dependencies could be reduced by analyzing variable uses and reordering fused code to remove unnecessary dependencies. For example, if the same array, A, is used in both loops, the access variables can be analyzed to determine if it is a true dependence. This analysis could be performed using a standard combination of tests, such as GCD, Banerjee Inequalities, or I-Test. These tests have the undesirable quality of slowness, but a balance between analysis time and conservatism could possibly be found.
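As a sketch of how the GCD test mentioned above works: for a write to A[a*i + b] in one loop and a read of A[c*j + d] in the other, a dependence is possible only if gcd(a, c) divides (d - b). A minimal implementation, assuming single-dimensional affine subscripts (the function names are mine):

```javascript
// Euclid's algorithm on absolute values.
function gcd(x, y) {
  x = Math.abs(x); y = Math.abs(y);
  while (y !== 0) { var t = y; y = x % y; x = t; }
  return x;
}

// GCD test for subscript pairs a*i + b (write) and c*j + d (read).
// If gcd(a, c) does not divide (d - b), the accesses can never alias,
// so there is no dependence and fusion is safe.
function mayDepend(a, b, c, d) {
  var g = gcd(a, c);
  if (g === 0) return b === d;  // both subscripts constant
  return (d - b) % g === 0;     // divisible => dependence is possible
}
```

For example, `mayDepend(2, 0, 2, 1)` is false: one loop touches only even indices and the other only odd indices, so the loops could be fused despite using the same array.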
Loop-peeling could also increase the performance of JSS by removing the need for guard checks when the trip counts of the two loops are different. The guard checks add several unnecessary branch instructions to ensure the side effects after fusion remain the same as before. Although loop-peeling would reduce the amount of native code, because it would not occur within the trace, it would reduce the number of checks and branches occurring within the native code.
Finally, induction variable normalization would greatly improve the number of loops that could be fused and increase opportunity for additional optimizations. Normalization would allow loops that run in opposite directions to be fused, increasing the number of loops that could be fused by JSS. It would also increase opportunities for additional optimizations, such as induction variable elimination and strength reduction.
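A hedged sketch of what induction variable normalization enables: a downward loop rewritten into an equivalent canonical upward form, which could then be fused with an adjacent upward loop (function names are illustrative):

```javascript
// A downward loop...
function downward(arr) {
  var s = 0;
  for (var i = arr.length - 1; i >= 0; i--) s += arr[i];
  return s;
}

// ...normalized to a canonical upward form (k runs 0..n-1, with the
// original index recovered as n - 1 - k), fusible with upward loops.
function normalized(arr) {
  var s = 0, n = arr.length;
  for (var k = 0; k < n; k++) s += arr[n - 1 - k];
  return s;
}
```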
A. Gal. Tracing the web. URL http://andreasgal.wordpress.com/2008/08/22/tracing-the-web/. Accessed 06 May 2010.
A. Gal, B. Eich, M. Shaver, D. Anderson, D. Mandelin, M. R. Haghighat, B. Kaplan, G. Hoare, B. Zbarsky, J. Orendorff, J. Ruderman, E. W. Smith, R. Reitmaier, M. Bebenita, M. Chang, and M. Franz. Trace-based just-in-time type specialization for dynamic languages. In PLDI ’09: Proceedings of the 2009 ACM SIGPLAN Conference on Programming Language Design and Implementation, pages 465–478, New York, NY, USA, 2009. ACM. ISBN 978-1-60558-392-1. doi: 10.1145/1542476.1542528.
K. Kennedy and K. S. McKinley. Maximizing loop parallelism and improving data locality via loop fusion and distribution. In Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing, pages 301–320, London, UK, 1994. Springer-Verlag. ISBN 3-540-57659-2.
Mozilla Wiki. JägerMonkey – method based JIT engine. URL https://wiki.mozilla.org/JaegerMonkey. Accessed 06 May 2010.
For the past few weeks I have been working on helping my kids better understand what I do at work every day, i.e. programming. Admittedly I do not program every day, but they will have years to learn about other topics such as organizational politics and communication overload, so no reason to start on those yet. Besides, my eldest son has always shown a high degree of interest in computers and the workings behind them.
Keep in mind that my kids are fairly young to be understanding programming concepts; my two eldest are 4 and 6. You cannot have high expectations for programming at that age, but you can provide the knowledge and the tools so that their creative minds can go to work and have fun, and that is what I attempted to do.
Before diving into the details of this programming experiment below, for those just interested in the end product of our holiday weekend, the “Thor’s Christmas Feast” project, click on the following image.
Where to Start?
The first question you are confronted with when attempting to show your kids how to program is “what tools to use?” I knew C++ would be out of the question and considered what I could do with Java or ActionScript. I then did the next logical thing: I Googled the topic (kids & programming) and eventually came across some very interesting tools which I hadn’t heard about before: Scratch and Alice. After a brief look at both I decided to start with Scratch and later try out Alice.
Note: For parents with older kids I also came across another excellent resource, “Java Programming for Kids, Parents, and Grandparents,” written by Yakov Fain.
What is Scratch?
Scratch is a programming language developed by the Lifelong Kindergarten Group at the MIT Media Lab to help young people learn how to develop computer programs. The development of Scratch (and its name) was supposedly inspired by the scratching process that DJs use to create new sounds and music by rubbing old-style vinyl records back and forth on record turntables, creating new and distinctively different sound out of something that exists. Similarly scratch projects allow young developers to mix together graphics and sounds, using them in new and creative ways.
Technically Scratch is an interpreted dynamic visual programming language based on and implemented in a derivative of Smalltalk, called Squeak (hmmm.. need to play around with that as well). Being dynamic, Scratch allows code to be changed even as programs are running. As a visual language, Scratch does not involve writing code in the traditional sense; instead, you write “code” by dragging and dropping component blocks onto a graphical editor (Hey, Sounds like LiveCycle;-).
Developing a Lesson Plan
Admittedly, I am not a teacher and know little about the process of education. However, like most parents, I am driven to learn more about education, both to help me communicate with my kids’ teachers at school and to help my kids learn at home as well. If anyone has any good recommendations on resources, please forward them to me!
My first step in teaching my kids programming concepts in Scratch was to write a list of goals to attain:
1) Learn the Scratch terminology and Environment
2) Co-develop a program to make our dogs bark
3) Co-develop a program to make our dogs spin
4) Have them write a program by themselves
5) Co-develop a game and publish it to the Scratch WebSite
Goal 1 – The Scratch Terminology & Environment
First we talked about this computer program called Scratch which allows you, a Computer Programmer, to write programs such as video games. Next, we discussed the primary concepts within Scratch of a Stage and Sprites. Technically the stage is the area in the Scratch IDE, located in the upper right hand side, where your Scratch applications execute. For my kids, however, the stage is the area where all the Sprite pictures play with one another. Then we opened up Scratch and identified the purpose of the various panels in it as shown below.
- The Stage: Place for Sprites (i.e. pictures of things) to play with one another
- Sprite List: Panel for selecting a Sprite to work with
- Code Blocks: The top panel is a set of Category buttons. You can see different types of blocks by clicking on different categories. Code Blocks are what computer programmers use to tell Sprites what to do. The color coding for the various categories seemed to help my kids immensely when referring back to categories of code blocks.
- Script Area: Technically the place where you compose programs by assembling code blocks, but for my kids it’s the place where you move blocks to tell Sprites what to do.
- The Flag and Stop buttons: The two primary buttons for starting and stopping Scratch applications respectively.
This part went fairly smoothly, and after a few sessions of messing around my kids quickly picked up on the concepts of Stage, Sprite, Block, Script, etc. By pulling up some of the sample apps, such as Pong, they learned how the Flag and Stop buttons worked while having a bit of fun as well.
Goals 2 & 3 – Co-Developing Applications
After learning the environment and trying some samples, we set off to create our own very basic Scratch applications. I have to admit that I initially didn’t get why a kids’ programming language should be so media-centric (yeah, I am a little slow like that), but after seeing my kids interact with Scratch it became much clearer to me. One of the nicest things I saw with Scratch was that it personalized the development experience in new ways by making it easy for my kids to add personalized content and actively participate in the development process. Not only could they develop abstract programs to do mindless things with a cat or a box, but they could add THEIR own pictures and THEIR own voices to the Scratch environment, which has given them hours of fun and driven them to learn. Awesome job to the Scratch dev team!
Anyway, over a few sessions I sat in the driver’s seat while the kids and I worked on making Thor, our English Bulldog, bark and spin.
Goal 4 – Have the Kids Write a Program Themselves
For this one both of the kids chose to make Thor bark. Now the interesting thing for me here was the difference between my two kids. For my older son it took about half an hour for the two of us to walk through creating a Scratch application with Thor barking. He clearly knew what he was doing and what needed to be done (we had done it several times with me in the driver’s seat), but he was more interested in the process of creating the app than the actual end result, so he ended up asking me TONS of questions along the way (e.g. Why do Sprites need commands? What if I use the move Block instead? etc.). My younger son, on the other hand, cared little about the process but definitely wanted to see the end result. He managed to complete the program in a couple of minutes with little direction from me, anticipating hearing himself say “WUFF” repeatedly 😉
Another interesting point: I noticed this was the first time my younger son had learned how to drag and drop an object on a computer (in this case, blocks onto the script area); that was the bulk of my help to him here. I was amazed how quickly he learned to do it and how quickly it seemed to become second nature (now compare that to his grandparents, that’s a different story 😉).
Here is the source code for the Bark application.
This script simply tells a Sprite (named Thor in this case) to make a “Wuff” sound when the Flag is clicked. Simple, eh?
Goal 5 – Co-develop a game and publish it to the Scratch WebSite
Okkk…. Now it was MY time to Shine!! I quickly grabbed back the drivers seat and went on to create an actual game in Scratch (The inner geek acting in me). The kids helped me to pick the theme (once again dog related). We decided to make a game where our dog Thor was eating all the goodies (Candy, Cookies, and Cake). Realize that my kids have a lot of experience with getting tackled by this 40 lb brute if they ever dare to venture around the floor with food in their hands. The boys learned this harsh lesson early on, but my daughter (age 1) is still in the learning process unfortunately.
My 6-year-old did the Wuff sound for the game, while my younger son came up with the idea of making Thor get “fatter” as he ate more food and “smaller” as he missed food. Unfortunately, they couldn’t sit and watch me develop the program because, well, it took me too long (a couple of hours at most). I think it only took that long due to the learning curve I had with Scratch myself. In the end I was pleased with the result (my wife was not so impressed). While I found Scratch lacking in some ways (lack of functions, re-use, objects, encapsulation, polymorphism, etc.), the composition capabilities were impressive, as were the ways it dealt with expressions and variables visually. Moreover, building parallel threads in Scratch came so naturally that you weren’t even aware you were doing it, which is the first time I can say I have seen that in an IDE.
Finally, another great aspect of the Scratch experience was the ease with which it enables you to publish and share code through a Scratch-hosted service. You can find tons of great applications and examples at –> http://scratch.mit.edu/. This was yet another great lesson for the kids: they can investigate games out there on Scratch and easily download the source to see how it works. Now, I don’t think my kids quite grasp all of that yet, nor do they know the process of downloading the code themselves and opening it within the Scratch IDE, but I see it as a great foundation for working in a more open and collaborative development environment.