To Dream of Magick

Dreamer Shaper Seeker Maker

Configuring your Haskell application

Posted on Mon Jun 26 19:30:00 UTC 2017

One way or another, you are going to need to configure your Haskell application, and there are three major ways of doing it. I recommend choosing one and sticking to it. You can combine them, but keep all but one to a minimum so that you stay out of the mind-numbing tedium of consistently combining multiple sets of input parameters and their overrides.

Your options tend to be...

  • CLI Option parsing

    I recommend this for small utilities, especially those which you are going to run frequently and with a variety of configurations.

  • Configuration files

    This is generally my preferred way of running an application. You'll still need to do a little bit of option parsing, but only enough to locate a configuration. However, it can be a total pain to need to edit a file to change the configuration for a utility, so use this for your longer-running applications.

  • Environment variables

    This is not generally how I want to configure an application, but some environments, such as Heroku, make it the easiest way.

CLI Option Parsing

The most important rule of parsing options from the CLI is...

*Don't write your own CLI parsing library.*

I have made this mistake. It is no longer on the internet. Do not do what I have done. Do this instead.

For particularly simple parameter parsing, you don't need any libraries. For example, I have a tool that I use on occasion to reformat an m3u playlist for my phone. Rhythmbox exports the playlist in m3u format, but with paths that don't work on my Android phone. A tool like this is so simple that its only parameters are the input file and the output file.

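In fact, the tool is so simple that it may have been better for me to accept the input data on standard input and emit the output data on standard output. Please forgive me for that, too.

import System.Environment (getArgs)

main :: IO ()
main = do
    -- Fails with a pattern-match error when fewer than two arguments are given.
    (source:dest:_) <- getArgs
    putStrLn ("reformatting " ++ source ++ " into " ++ dest)  -- stand-in for the real work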

That is the simplest way. However, you may wish to be kind to your users...

main :: IO ()
main = do
    args <- getArgs
    case args of
        (source:dest:_) -> putStrLn ("reformatting " ++ source ++ " into " ++ dest)  -- do your thing!
        _ -> putStrLn "Run the application with the source and destination files."

This is your standby for applications with very simple parameters, and these applications are quite common. However, more complex configuration is often needed. For that, reach for optparse-applicative. It will give you command line options very similar in power to those available in Go.

The tutorial covers basically everything, but here's a starter example:

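{-# LANGUAGE RecordWildCards #-}
import Options.Applicative

data Config = Config { interval :: Int, logFile :: FilePath }  -- illustrative field names

cliParser :: Parser Config
cliParser = Config <$> option auto (long "interval" <> help "number of seconds between samples" <> value 5)
                   <*> strOption (long "log" <> help "log output file")

main :: IO ()
main = do
    Config{..} <- execParser (info (helper <*> cliParser)
                             (fullDesc <> progDesc "description of the program"))
    putStrLn ("sampling every " ++ show interval ++ "s, logging to " ++ logFile)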

Look here for a summary of the functions and typeclasses involved above. The entire block around execParser is basically boilerplate code, and all of the interesting bits happen inside cliParser.

This technique is as common as mud. As an administrator, I do like to pass parameters to my applications, but I dislike services that require excessively long command lines to run. If your application requires more than four or five parameters, or if the parameters rarely change from one run to the next, look to the next section for configuration files, instead.

Configuration Files

For almost all of my configuration needs, I like to go with a file on disk. I usually write it in YAML, because that allows some complex nested configurations and saves me from needing to write a configuration parser myself.

For my example, I will demonstrate with a program that I use for my HDR processing toolchain. The program has to go through several steps, and basically it needs these parameters:

  • Do I need to align the photographs?
  • What are my input files?
  • What white balance parameters should I use for developing the files?

and so forth. These are the most important parameters. A typical file looks like this:

wb: camera
project: lake-travis-dam
sources:
- _DSC3656.dng
- _DSC3657.dng
- _DSC3658.dng
- _DSC3659.dng
- _DSC3660.dng
align: false
fanout: false

So, first I want a data structure to store this:

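-- The WhiteBalance constructor carries an explicit temperature and green value;
-- the Double field types are an assumption.
data WhiteBalance = Camera
                  | Auto
                  | WhiteBalance Double Double
                  deriving (Show)

data Project = Project {
      sources :: [String]
    , project :: String
    , wb :: WhiteBalance
    , align :: Bool
    , fanout :: Bool
    } deriving (Show)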

instance Default Project where
    def = Project [] "" Camera False False

(Incidentally, I like having defaults for my structures, if I can conceive of a reasonable default.)

Whether the file is YAML or JSON, in Haskell I need a FromJSON instance to parse it:

instance FromJSON Project where
    parseJSON (Object obj) =
        Project <$> obj .: "sources"
                <*> obj .: "project"
                <*> obj .: "wb"
                <*> obj .: "align"
                <*> obj .: "fanout"
    parseJSON obj = fail $ show obj

instance FromJSON WhiteBalance where
    parseJSON (String str) =
        case str of
            "camera" -> pure Camera
            "auto" -> pure Auto
            _ -> fail $ "invalid wb string: " ++ T.unpack str
    parseJSON (Object obj) =
        WhiteBalance <$> obj .: "temp"
                     <*> obj .: "green"
    parseJSON obj = fail $ show obj

Aside: I use fail instead of mzero or mempty because propagating out any error message at all helps immensely with debugging. I wish I could use throwError, but MonadError is not implemented for Parser.
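Since the record has reasonable defaults, one option is to let the parser fall back to them when a key is missing. The following is only a sketch, assuming aeson and its (.:?) and (.!=) operators; Flags is a hypothetical cut-down record, not the Project type above.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Aeson (FromJSON (..), eitherDecode, withObject, (.!=), (.:), (.:?))
import qualified Data.ByteString.Lazy as BL

-- Hypothetical cut-down record, used only for this illustration.
data Flags = Flags
    { flagProject :: String
    , flagAlign   :: Bool
    , flagFanout  :: Bool
    } deriving (Show, Eq)

instance FromJSON Flags where
    parseJSON = withObject "Flags" $ \obj ->
        Flags <$> obj .:  "project"            -- required key
              <*> obj .:? "align"  .!= False   -- optional, defaults to False
              <*> obj .:? "fanout" .!= False   -- optional, defaults to False

-- Decode from JSON text; a missing required key yields a Left with a message.
decodeFlags :: BL.ByteString -> Either String Flags
decodeFlags = eitherDecode
```

With this instance, a document containing only the project key still parses, and the two booleans come back as False.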

-- now include code for reading JSON format and Yaml format
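One possible way to fill in that step, as a sketch rather than the code I actually run: it assumes the aeson, yaml, and filepath packages, and it picks the decoder from the file extension, which is my own convention here. Settings and loadSettings are hypothetical names standing in for the Project record and its loader.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.Aeson (FromJSON (..), eitherDecodeFileStrict, withObject, (.:))
import Data.Bifunctor (first)
import qualified Data.Yaml as Yaml
import System.FilePath (takeExtension)

-- Hypothetical cut-down stand-in for the Project record.
data Settings = Settings
    { settingsName  :: String
    , settingsFiles :: [String]
    } deriving (Show, Eq)

instance FromJSON Settings where
    parseJSON = withObject "Settings" $ \obj ->
        Settings <$> obj .: "project"
                 <*> obj .: "sources"

-- Files ending in .json go through aeson; everything else is treated as YAML.
-- Either way, a failure comes back as a human-readable message.
loadSettings :: FilePath -> IO (Either String Settings)
loadSettings path
    | takeExtension path == ".json" = eitherDecodeFileStrict path
    | otherwise = first Yaml.prettyPrintParseException <$> Yaml.decodeFileEither path
```

Because both decoders are driven by the same FromJSON instance, the record only has to be described once, whichever format the file is in.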

Environment Variables

While I do not particularly like using environment variables for configuring an application, Heroku and presumably some other services require their use. On the other hand, most languages treat environment variables as a simple dictionary, making them simple to retrieve. Haskell is no exception to this. The only catch is that nested structures require a little more effort to build.

Your workhorse function is System.Environment.getEnv :: String -> IO String. It returns the value if the variable is present and throws an IO exception if it is not. Since you may sometimes want to make a variable optional, here is a function that catches isDoesNotExistError and translates it into a Maybe:

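import Control.Exception (catch, throwIO)
import System.Environment (getEnv)
import System.IO.Error (isDoesNotExistError)

maybeGetEnv :: String -> IO (Maybe String)
maybeGetEnv k = (Just <$> getEnv k) `catch` handleIOExc
  where
    handleIOExc exc
        | isDoesNotExistError exc = pure Nothing
        | otherwise = throwIO exc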

Then write your configuration function like so:

import Data.List.Split (splitOn)

loadConfiguration :: IO Project
loadConfiguration = do
    p <- getEnv "PROJECT_NAME"
    s <- splitOn "," <$> getEnv "SOURCES"
    -- read expects Haskell's Bool syntax, so these must be set to True or False
    align <- maybe False read <$> maybeGetEnv "ALIGN_IMAGES"
    fanout <- maybe False read <$> maybeGetEnv "FANOUT_EXPOSURES"
    pure $ Project s p Camera align fanout
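A possible variation, offered as a sketch: base also provides System.Environment.lookupEnv, which returns a Maybe directly, and parsing the boolean by hand avoids the crash that read produces on an unexpected value. The accepted spellings below and the names parseBool and boolFromEnv are my own assumptions for illustration.

```haskell
import Data.Char (toLower)
import System.Environment (lookupEnv)

-- Accepted spellings are an assumption of this sketch; matching is case-insensitive.
parseBool :: String -> Maybe Bool
parseBool raw = case map toLower raw of
    "true"  -> Just True
    "1"     -> Just True
    "false" -> Just False
    "0"     -> Just False
    _       -> Nothing

-- An unset variable yields the fallback; a set but unparseable one is an IO error
-- that names the offending variable.
boolFromEnv :: String -> Bool -> IO Bool
boolFromEnv name fallback = do
    raw <- lookupEnv name
    case raw of
        Nothing  -> pure fallback
        Just txt -> case parseBool txt of
            Just b  -> pure b
            Nothing -> ioError (userError ("invalid boolean in " ++ name ++ ": " ++ txt))
```

In a loader like the one above, boolFromEnv "ALIGN_IMAGES" False could take the place of the maybe False read line.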

These are your three major methods for configuring an application. Many applications will permit a certain degree of hybridization between them, but I think it is best to minimize that as much as possible: for instance, limit it to a command line parameter that specifies the path to a configuration file. Handling the general case, with command line parameters, defaults, configuration options, and environment variables all at once, has typically led to a very difficult-to-use mess, and I have regretted such attempts.
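As a sketch of that one small hybrid, assuming optparse-applicative: the command line carries nothing but the location of the configuration file. The option name, its default value, and the names CliOptions and parseArgs are hypothetical.

```haskell
import Options.Applicative

-- Hypothetical: the only thing the command line carries is where the file lives.
newtype CliOptions = CliOptions { configPath :: FilePath }

cliOptions :: Parser CliOptions
cliOptions = CliOptions
    <$> strOption (long "config" <> short 'c' <> metavar "FILE"
                   <> value "config.yaml" <> showDefault
                   <> help "path to the configuration file")

-- Pure entry point, handy for trying the parser without touching the real arguments.
parseArgs :: [String] -> Maybe CliOptions
parseArgs = getParseResult . execParserPure defaultPrefs (info (helper <*> cliOptions) fullDesc)
```

In main, execParser (info (helper <*> cliOptions) fullDesc) yields the same record from the real arguments, and everything else comes from the file it names.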

Whichever method you use for passing configuration in, you'll then want to wrap that configuration up into a context for your application. I will hint more on that in my next article, on the application monad, and give it significantly more detailed treatment later on.

Questions? Comments? Feedback? Email me. I am particularly interested in places that you feel are unclear or which could use better explanation, or experiments you have run that turned out better.