GHC supports running Haskell programs in parallel on an SMP (symmetric multiprocessor).
There's a fine distinction between concurrency and parallelism: parallelism is all about making your program run faster by making use of multiple processors simultaneously. Concurrency, on the other hand, is a means of abstraction: it is a convenient way to structure a program that must respond to multiple asynchronous events.
However, the two terms are certainly related. By making use of multiple CPUs it is possible to run concurrent threads in parallel, and this is exactly what GHC's SMP parallelism support does. But it is also possible to obtain performance improvements with parallelism on programs that do not use concurrency. This section describes how to use GHC to compile and run parallel programs; Section 7.24, “Concurrent and Parallel Haskell” describes the language features that affect parallelism.
In order to make use of multiple CPUs, your program must be
        linked with the -threaded option (see Section 4.12.6, “Options affecting linking”).  Additionally, the following
        compiler options affect parallelism:
-feager-blackholing
            Blackholing is the act of marking a thunk (lazy
            computation) as being under evaluation.  It is useful for
            three reasons: firstly it lets us detect certain kinds of
            infinite loop (the NonTermination
            exception), secondly it avoids certain kinds of space
            leak, and thirdly it avoids repeating a computation in a
            parallel program, because we can tell when a computation
            is already in progress.
            The option -feager-blackholing causes
            each thunk to be blackholed as soon as evaluation begins.
            The default is "lazy blackholing", whereby thunks are only
            marked as being under evaluation when a thread is paused
            for some reason.  Lazy blackholing is typically more
            efficient (by 1-2% or so), because most thunks don't
            need to be blackholed.  However, eager blackholing can
            avoid more repeated computation in a parallel program, and
            this often turns out to be important for parallelism.
          
            We recommend compiling any code that is intended to be run
            in parallel with the -feager-blackholing
            flag.
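
            As a rough illustration of the repeated work that blackholing is
            meant to prevent, the sketch below has two threads demand the same
            thunk. The names expensive and forceInTwoThreads are hypothetical
            and the workload is arbitrary; whether the two threads really
            overlap depends on scheduling, so this is a sketch and not a
            benchmark. The OPTIONS_GHC pragma enables eager blackholing for
            this module only; -threaded must still be given when linking.

```haskell
{-# OPTIONS_GHC -feager-blackholing #-}

import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Data.List (foldl')

-- Hypothetical CPU-bound workload: a strict sum of squares.
expensive :: Int -> Int
expensive n = foldl' (\acc k -> acc + k * k) 0 [1 .. n]

-- Force one shared thunk from two threads and return both results.
forceInTwoThreads :: Int -> IO (Int, Int)
forceInTwoThreads n = do
  let shared = expensive n          -- a single thunk, demanded by both threads
  done1 <- newEmptyMVar
  done2 <- newEmptyMVar
  _ <- forkIO (putMVar done1 $! shared)   -- forces the thunk in thread 1
  _ <- forkIO (putMVar done2 $! shared)   -- forces the same thunk in thread 2
  r1 <- takeMVar done1
  r2 <- takeMVar done2
  return (r1, r2)

main :: IO ()
main = do
  (r1, r2) <- forceInTwoThreads 1000000
  print (r1 == r2)
```

            Both threads always see the same value. What eager blackholing
            changes is how likely it is that the second thread waits for the
            first instead of redoing the work.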
          
There are two ways to run a program on multiple
        processors:
        call Control.Concurrent.setNumCapabilities from your
        program, or use the RTS -N option.
-N[x]
              Use x simultaneous threads when
              running the program.  Normally x
              should be chosen to match the number of CPU cores on the
              machine[9].  For example,
              on a dual-core machine we would probably use
              +RTS -N2 -RTS.
Omitting x,
              i.e. +RTS -N -RTS, lets the runtime
              choose the value of x itself
              based on how many processors are in your machine.
Be careful when using all the processors in your machine: if some of your processors are in use by other programs, this can actually harm performance rather than improve it.
Setting -N also has the effect of
              enabling the parallel garbage collector (see
              Section 4.17.3, “RTS options to control the garbage collector”).
The current value of the -N option
              is available to the Haskell program
              via Control.Concurrent.getNumCapabilities, and
              it may be changed while the program is running by
              calling Control.Concurrent.setNumCapabilities.
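
              A minimal sketch of the programmatic route, assuming the program
              is linked with -threaded. The helper chooseCapabilities and its
              "leave one processor free" policy are hypothetical, chosen only
              to illustrate the caution above about using every processor;
              getNumProcessors comes from GHC.Conc.

```haskell
import Control.Concurrent (getNumCapabilities, setNumCapabilities)
import GHC.Conc (getNumProcessors)

-- Hypothetical policy: leave one processor free for other programs,
-- but never go below one capability.
chooseCapabilities :: Int -> Int
chooseCapabilities procs = max 1 (procs - 1)

main :: IO ()
main = do
  before <- getNumCapabilities          -- current value of -N
  procs  <- getNumProcessors            -- processors the runtime can see
  setNumCapabilities (chooseCapabilities procs)
  after  <- getNumCapabilities
  putStrLn ("processors:          " ++ show procs)
  putStrLn ("capabilities before: " ++ show before)
  putStrLn ("capabilities after:  " ++ show after)
```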
The following options affect the way the runtime schedules threads on CPUs:
-qa
            Use the OS's affinity facilities to try to pin OS threads to CPU cores. This is an experimental feature, and may or may not be useful. Please let us know whether it helps for you!
-qm
            Disable automatic migration for load balancing.
            Normally the runtime will automatically try to schedule
            threads across the available CPUs to make use of idle
            CPUs; this option disables that behaviour.  Note that
              migration only applies to threads; sparks created
              by par are load-balanced separately
              by work-stealing.
              This option is probably only of use for concurrent
              programs that explicitly schedule threads onto CPUs
              with Control.Concurrent.forkOn.
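
              The kind of explicit placement meant here can be sketched as
              follows. The function runPinned is a hypothetical name; it forks
              one thread per capability number with forkOn and uses
              threadCapability to report where each thread ran and whether it
              is locked to that capability.

```haskell
import Control.Concurrent
  (forkOn, getNumCapabilities, myThreadId, threadCapability)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM, forM_)

-- Fork one thread per requested capability number and report, for each,
-- (requested capability, capability it ran on, whether it is pinned).
runPinned :: Int -> IO [(Int, Int, Bool)]
runPinned n = do
  dones <- forM [0 .. n - 1] $ \cap -> do
    done <- newEmptyMVar
    _ <- forkOn cap $ do
      (actual, pinned) <- threadCapability =<< myThreadId
      putMVar done (cap, actual, pinned)
    return done
  mapM takeMVar dones

main :: IO ()
main = do
  n <- getNumCapabilities
  results <- runPinned n
  forM_ results $ \(cap, actual, pinned) ->
    putStrLn ("requested " ++ show cap ++ ", ran on " ++ show actual
              ++ ", pinned: " ++ show pinned)
```

              Run it with, for example, +RTS -N2 -RTS to see more than one
              capability in use.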
            
Add the -s RTS option when
        running the program to see timing stats, which will help to tell you
        whether your program got faster by using more CPUs or not.  If the user
        time is greater than
        the elapsed time, then the program used more than one CPU.  You should
        also run the program without -N for
        comparison.
The output of +RTS -s tells you how
        many “sparks” were created and executed during the
        run of the program (see Section 4.17.3, “RTS options to control the garbage collector”), which
        will give you an idea how well your par
        annotations are working.
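
A small program to try this on, as a sketch: it uses par and pseq as exported by GHC.Conc, so it needs no packages beyond base. The names nfib and parFib and the cut-off of 20 are illustrative assumptions, not tuned values.

```haskell
import GHC.Conc (par, pseq)

-- Naive Fibonacci-style function, used only as a CPU-bound workload.
nfib :: Int -> Integer
nfib n
  | n < 2     = 1
  | otherwise = nfib (n - 1) + nfib (n - 2)

-- Same function, but above the cut-off one branch is sparked with par
-- while the current thread evaluates the other branch first (pseq).
parFib :: Int -> Integer
parFib n
  | n < 20    = nfib n                       -- arbitrary cut-off: stay sequential
  | otherwise = x `par` (y `pseq` (x + y))
  where
    x = parFib (n - 1)
    y = parFib (n - 2)

main :: IO ()
main = print (parFib 30)
```

Link it with -threaded and run it once with +RTS -N2 -s -RTS and once with +RTS -s -RTS, then compare the spark counts and the user and elapsed times in the two reports.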
GHC's parallelism support has improved in 6.12.1 as a result of much experimentation and tuning in the runtime system. We'd still be interested to hear how well it works for you, and we're also interested in collecting parallel programs to add to our benchmarking suite.
[9] Whether hyperthreading cores should be counted or not is an open question; please feel free to experiment and let us know what results you find.