FOSDEM 2011 Highlights: LLVM & Clang

12 February, 2011

It’s great to be back. I’ve been so busy these last couple of weeks that it might have been a good idea to put up a notice of absence.

Well, after passing some of the nastiest exams (or so I’ve been told), I’ve been in Brussels with the ROSEdu crew. Specifically, for FOSDEM, but everyone knows that was just a pretext to visit Belgium (for me at least).

Well, in trying to keep a professional tone, let me talk a bit about my favorite talks from this year’s FOSDEM. The keynotes are not available online as of today (February 12th), but I’ll try adding a link as soon as possible.

LLVM and Clang

LLVM, although the name initially stood for Low Level Virtual Machine, is nowadays an umbrella project for an improved compiler infrastructure. Essentially, tools like gcc and gdb share significant portions of functionality (parsing C/C++ code), yet this work is done twice, using two different engines.

The talk, given by Chris Lattner (the mastermind of LLVM) himself, was introductory and, although it’s not online yet, here’s a link to an interview he gave for FOSDEM. The amusing part is that you won’t find an actual interview, only a compilation of information he provided himself (because of the policy of his employer, Apple, Chris couldn’t be interviewed…).

The core libraries are built around the LLVM intermediate representation (LLVM IR), which can be targeted by various compilers (Clang being one of them). So, what happens is that code is compiled to LLVM IR, which is then optimized, and finally CPU-specific code is generated.

What a compiler does (very high level overview) is:

  1. parse the source language into an intermediate representation (like an abstract syntax tree)
  2. transform the intermediate representation (possibly into some other intermediate representation, as performance is improved and the code is lowered to make generating the target language simpler)
  3. generate machine code from the intermediate representation

The first step is performed by the front-end and the last step by the back-end; the second step sits in between and, in LLVM’s case, is handled by optimizers that work only on the intermediate representation. The intermediate representation is the crux of the matter. Which one to choose so that parsing and machine code generation can be split? LLVM offers an intermediate representation that is well tested and quite mature by now (in development for over 10 years) and from which high quality machine code can be generated. So, with a back-end in place, all that is needed is a front-end for each supported language.
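
To make the front-end/back-end split a bit more concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the tuple-based stack IR and the functions front_end, optimize, emit_asm and run are hypothetical and have nothing to do with the real LLVM IR or LLVM’s API. The front-end borrows Python’s own ast module to parse an arithmetic expression, the optimizer folds constants on the IR, and two back-ends (a pseudo-assembly printer and a small interpreter) consume the same IR without knowing anything about the source language.

```python
import ast

# Toy IR (an assumption of this sketch, NOT LLVM IR): a list of tuples
#   ("push", number) | ("load", variable_name) | ("add",) | ("mul",)

def front_end(source):
    """Front-end: parse an arithmetic expression and lower it to the toy IR."""
    ops = {ast.Add: "add", ast.Mult: "mul"}

    def lower(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return [("push", node.value)]
        if isinstance(node, ast.Name):
            return [("load", node.id)]
        if isinstance(node, ast.BinOp) and type(node.op) in ops:
            # Postfix order: left operand, right operand, then the operator.
            return lower(node.left) + lower(node.right) + [(ops[type(node.op)],)]
        raise ValueError("unsupported syntax")

    return lower(ast.parse(source, mode="eval").body)


def optimize(ir):
    """Optimizer: constant folding, working purely on the IR."""
    out = []
    for ins in ir:
        foldable = (
            ins[0] in ("add", "mul")
            and len(out) >= 2
            and out[-1][0] == "push"
            and out[-2][0] == "push"
        )
        if foldable:
            b = out.pop()[1]
            a = out.pop()[1]
            out.append(("push", a + b if ins[0] == "add" else a * b))
        else:
            out.append(ins)
    return out


def emit_asm(ir):
    """Back-end 1: print the IR as assembly for an imaginary stack machine."""
    return [
        f"{ins[0].upper()} {ins[1]}" if len(ins) == 2 else ins[0].upper()
        for ins in ir
    ]


def run(ir, env):
    """Back-end 2: interpret the IR directly, given values for the variables."""
    stack = []
    for ins in ir:
        if ins[0] == "push":
            stack.append(ins[1])
        elif ins[0] == "load":
            stack.append(env[ins[1]])
        else:
            b = stack.pop()
            a = stack.pop()
            stack.append(a + b if ins[0] == "add" else a * b)
    return stack.pop()


if __name__ == "__main__":
    ir = optimize(front_end("2 * 3 + x"))
    print("\n".join(emit_asm(ir)))  # PUSH 6 / LOAD x / ADD, one per line
    print(run(ir, {"x": 4}))        # 10
```

Adding a second source language to this toy would only mean writing another front_end that produces the same tuples; the optimizer and both back-ends would stay untouched, which is the whole appeal of agreeing on one intermediate representation.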

This is where Clang comes in. Clang is a C/C++/Objective-C compiler that compiles faster than GCC, delivers faster code, and provides meaningful error messages. As wonderful as all that sounds (and I will try using it from now on; it’s not like my C/C++ code is that complex/exotic), there are cases where GCC and Clang behave differently.

For example, the inline keyword is treated differently in Clang than in GCC, but that’s because GCC doesn’t completely adhere to the C99 standard by default: it defaults to its own gnu89 dialect, whose inline semantics predate those of C99, while Clang follows the C99 rules. Dealing with inline is apparently surprisingly tricky, although I never noticed before; I just thought it was a better way of doing macros. Have fun reading the rules here.

At any rate, there are lots of projects that want to compile to LLVM IR, like LDC (a compiler for the D language with an LLVM back-end) or a GHC back-end. The GHC back-end is the one I find most interesting. You can read the thesis I linked to, by David Terei, who did all the work of translating the Cmm language that GHC uses into LLVM IR. At least have a look at the details of the GHC pipeline: starting with Haskell code, it goes to HS (an in-memory representation of the Haskell syntax, on which type checking is performed), Core (a version of lambda calculus with some extensions, just large enough to express Haskell), STG (the Spineless Tagless G-Machine, an abstract machine representation of Haskell, which I don’t understand but which has an awesome name :), and finally Cmm (a variation of the C-- language that represents a Haskell program in a procedural form). This last representation is the one that is converted into LLVM IR.

In addition, LLDB uses libraries provided by LLVM and Clang to implement a more powerful debugger (faster and more memory efficient than GDB at loading symbols, according to the hype). Also, a new C++ standard library, libc++, is coming down the pipeline; a faster, less memory-hungry one, apparently.

Now, if this project is so dramatically better, everything is so much faster, why doesn’t everyone get to work on using it? Why don’t you hear about Ubuntu preparing to use it?

Well, as far as I can tell, although the LLVM project is under a less restrictive license than the GPL (i.e. it allows proprietary, binary-only extensions, which should be good, right?), it is still being worked on. For one, C++0x support is unfinished, and only recently (on October 26th, 2010, to be more specific) has Clang built a working Linux kernel. Other projects like llvm-gcc (a GCC 4.2 front-end) apparently work, whereas dragonegg (GCC 4.5, GPL3) is still buggy. And projects like libc++ exist (only?) because “Mainline libstdc++ has switched to GPL3, a license which the developers of libc++ cannot use”.

So, since Apple is the main sponsor of this project, it’s pretty clear that they don’t really appreciate the GPL3 (although I’m not sure whether or not it actually affects them; possibly it forces them to open-source parts of Xcode?). As a consequence, all LLVM projects obviously work on Mac OS X, and LLVM has already been used successfully to convert some of the more advanced OpenGL functions, which Macs with Intel GMA chipsets don’t support, into simpler subroutines to ensure correct operation. (Of course, one can also ask what in the world a shitty Intel GPU is doing inside a Mac; maybe next time they won’t use piece-of-junk hardware?)

LLVM is featured most prominently in Xcode, Apple’s IDE for Mac OS X and iOS, and it’s quite clear that future versions of both OSes will no longer use GCC. In Xcode 4, there is support for LLVM 2.0, Fix-It (which basically detects possible errors at edit time thanks to some LLVM magic) and the LLDB debugger.

BSDs will probably follow, pouncing at the opportunity to no longer rely on a GPL3 compiler.

Whether or not this is good in the long term is still debatable. Some may view Apple’s involvement in the LLVM project with suspicion, and there might be some hesitation to switch entire Linux distributions to Clang. Whatever happens, LLVM is clearly here to stay. Hopefully cross-platform support gets better, although right now it’s pretty clear that Mac OS X is the priority. Here’s to better compilers for everyone!

Oh, and Brussels was a lot of fun!

Team ROSEdu @ De Brouckère, Bruxelles

10 Comments
  1. 12 February, 2011 9:42 PM

    I read some STG articles this week and parts of it are still a mystery to me 🙂

    Yes, LLVM is a great tool and you’ll use it if you choose C3 😀

  2. Călin Iorgulescu
    13 February, 2011 2:00 PM

    @Mithrandir:

    Not to be a stick-in-the-mud, but I’ll believe the last part when I see it. 🙂
    Don’t get me wrong, it’s really something I’d love to see, but when the team proved such exquisite skill in using even the most basic tools (Vmchecker, hint hint), not to mention some of the great woes involved in correcting the homework, I really see it as just wishful thinking for now.

    Here’s to hoping you’ll be right. 🙂

  3. 13 February, 2011 4:27 PM

    @Călin

    If I don’t leave, this will be my task for the summer: porting one homework to use LLVM (at least that was the plan when I last spoke to him).

  4. 14 February, 2011 11:53 AM

    Less than a year ago, few people in our community had even heard of LLVM, and fewer still had used it. So it is a great thing that they used LLVM at the compilers lab because, as you can see, this might be the compiler to replace gcc soon. It was a nice start that leaves room for better men to do better things in the future (maybe starting next year).

    There was a proposal, and a homework was ported to use LLVM, but it was considered too hard for many reasons (which turned out to be a good decision, given that some started complaining anyway when faced with the fact that not all of their homework would score an A+).

    One big problem with using LLVM in our faculty is the language used for development, C++, which is totally missing from our curriculum (a bad thing, I know). Even so, if it becomes a tradition, younger students will know to study C++ on their own in order to be prepared if they choose C3, and having homework related to LLVM will no longer be a problem, only a great thing. So, let’s start a rumor and spread some advice: learn C++.

  5. 14 February, 2011 4:20 PM

    LLVM is not a compiler!! [1] defines it as a compiler infrastructure, a compiler is something else.

    And some of the complaints were caused by some of the TAs who weren’t able to teach the subject!!

    C++ can be learned. EGC does teach some C++. If you know Java and C you can go to C++ if you really want to do this.

    [1]: http://en.wikipedia.org/wiki/Llvm

    • danf
      15 February, 2011 8:22 AM

      I would also like to mention that now we’re specifically encouraged to use C++ for PA.

    • 15 February, 2011 9:32 AM

      You got my point anyway!! 😛
