Configuring your Haskell application
January 01, 0001
One way or another, you are going to need to configure your Haskell application, and you have three major ways of doing it. I recommend choosing one and sticking to it. You can combine them, but keep all but one to a minimum so that you stay out of the mind-numbing tedium of consistently combining multiple sets of input parameters and their overrides.
Your options tend to be…
- CLI Option parsing
  I recommend this for small utilities, especially those which you are going to run frequently and with a variety of configurations.
- Configuration files
  This is generally my preferred way of running an application. You’ll still need to do a little bit with option parsing, but only enough to get a configuration. However, it can be a total pain to need to edit a file to change the configuration for a utility, so use this for your longer-running applications.
- Environment variables
  This is not generally how I want to configure an application, but some environments, such as Heroku, make it the easiest way.
CLI Option Parsing
The most important rule of parsing options from the CLI is…
*Don't write your own CLI parsing library.*
I have made this mistake. It is no longer on the internet. Do not do what I have done. Do this instead.
For particularly simple parameter parsing, you don’t need any libraries. For example, I have a tool that I use on occasion to reformat an m3u playlist for my phone. Rhythmbox exports the playlist in an m3u format, but with paths that don’t work on my Android phone. A tool like this is so simple that the only parameters to it are the input file and the output file.
In fact, the tool is so simple that it may have been better for me to accept the input data on standard in and emit the output data on standard out. Please forgive me for that, too.
import System.Environment (getArgs)

main :: IO ()
main = do
    (source:dest:_) <- getArgs
    putStrLn ("converting " ++ source ++ " to " ++ dest)  -- placeholder for the real work
That is the simplest way. However, you may wish to be kind to your users…
main :: IO ()
main = do
    args <- getArgs
    case args of
        (source:dest:_) -> pure ()  -- do your thing with source and dest!
        _ -> putStrLn "Run the application with the source and destination files."
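A slightly fuller version of the same idea might separate the argument check from the IO, send the usage message to standard error, and exit with a non-zero status. This is only a sketch: `parseArgs` and the message texts are hypothetical names chosen for illustration, and the actual playlist rewriting is left as a stub.

```haskell
import System.Environment (getArgs, getProgName)
import System.Exit (exitFailure)
import System.IO (hPutStrLn, stderr)

-- | Pure argument check (hypothetical helper). Only the first two
-- arguments are used; anything after them is ignored.
parseArgs :: [String] -> Either String (FilePath, FilePath)
parseArgs (source:dest:_) = Right (source, dest)
parseArgs _ = Left "expected a source file and a destination file"

main :: IO ()
main = do
  args <- getArgs
  case parseArgs args of
    Right (source, dest) ->
      -- Stub: the real tool would rewrite the playlist here.
      putStrLn ("would convert " ++ source ++ " into " ++ dest)
    Left err -> do
      name <- getProgName
      hPutStrLn stderr (name ++ ": " ++ err)
      hPutStrLn stderr ("usage: " ++ name ++ " SOURCE DEST")
      exitFailure
```

Keeping the check pure means it can be exercised without touching the real command line.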
This is your standby for applications with very simple parameters, and these applications are quite common. However, more complex configuration is often needed. For that, reach for optparse-applicative. It will give you command line options that are very similar in power to the ones available in Go.
The tutorial covers basically everything, but here’s a starter example:
cliParser :: Parser Config
cliParser = Config <$> option auto (long "interval" <> help "number of seconds between samples" <> value 5)
                   <*> strOption (long "log" <> help "log output file")
...
-- Config{..} needs the RecordWildCards extension
main = do
    Config{..} <- execParser (info (helper <*> cliParser)
                                   (fullDesc <> progDesc "description of the program"))
    pure ()  -- continue with the Config fields in scope
Look here for a summary of the functions and typeclasses involved above. The entire block around execParser is basically boilerplate code, and all of the interesting bits happen inside cliParser.
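For something you can compile as-is, here is a self-contained sketch of the same parser. The `Config` record, its field names, and the `parserInfo` binding are assumptions made for this example; `metavar` and `showDefault` are optional extras from optparse-applicative that make the generated help text a little friendlier.

```haskell
{-# LANGUAGE RecordWildCards #-}

import Options.Applicative

-- Hypothetical configuration record matching the two options below.
data Config = Config
  { interval :: Int       -- seconds between samples
  , logFile  :: FilePath  -- where to write the log
  } deriving (Show, Eq)

cliParser :: Parser Config
cliParser = Config
  <$> option auto
        ( long "interval"
       <> metavar "SECONDS"
       <> help "number of seconds between samples"
       <> value 5        -- used when --interval is not given
       <> showDefault )
  <*> strOption
        ( long "log"
       <> metavar "FILE"
       <> help "log output file" )

-- The boilerplate wrapper: adds --help and a program description.
parserInfo :: ParserInfo Config
parserInfo = info (helper <*> cliParser)
                  (fullDesc <> progDesc "description of the program")

main :: IO ()
main = do
  Config{..} <- execParser parserInfo
  putStrLn ("sampling every " ++ show interval ++ "s, logging to " ++ logFile)
```

Because `--log` has no default value, the parser treats it as required, while `--interval` falls back to 5.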
This technique is as common as mud. As an administrator, I do like to pass parameters to my applications, but I dislike services that require excessively long command lines to run. If your application requires more than four or five parameters, or if the parameters rarely change from one run to the next, look to the next section for configuration files, instead.
Configuration Files
For almost all of my configuration needs, I like to go with a file on the disk. I usually put it into a Yaml format, because that allows some complex nested configurations and saves me from needing to write a configuration parser myself.
For my example, I will demonstrate with a program that I use for my HDR processing toolchain. The program has to go through several steps, and basically it needs these parameters:
- Do I need to align the photographs?
- What are my input files?
- What white balance parameters should I use for developing the files?
and so forth. These are the most important parameters. A typical file looks like this:
wb: camera
project: lake-travis-dam
sources:
- _DSC3656.dng
- _DSC3657.dng
- _DSC3658.dng
- _DSC3659.dng
- _DSC3660.dng
align: false
fanout: false
So, first I want a data structure to store this:
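data WhiteBalance = Camera
                  | Auto
                  | WhiteBalance Int Double  -- custom temp and green values; the numeric types are assumed
                  deriving (Show)

data Project = Project {
      sources :: [String]
    , project :: String
    , wb :: WhiteBalance
    , align :: Bool
    , fanout :: Bool
    }
    deriving (Show)

-- Default comes from Data.Default (the data-default package)
instance Default Project where
    def = Project [] "" Camera False False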
(Incidentally, I like having defaults for my structures, if I can conceive of a reasonable default.)
Whether Yaml or JSON, in Haskell I need a FromJSON instance for parsing this file:
-- Needs OverloadedStrings, Data.Aeson (FromJSON (..), Value (..), (.:)),
-- and Data.Text imported qualified as T.
instance FromJSON Project where
    parseJSON (Object obj) =
        Project <$> obj .: "sources"
                <*> obj .: "project"
                <*> obj .: "wb"
                <*> obj .: "align"
                <*> obj .: "fanout"
    parseJSON obj = fail $ show obj

instance FromJSON WhiteBalance where
    parseJSON (String str) =
        case str of
            "camera" -> pure Camera
            "auto" -> pure Auto
            _ -> fail $ "invalid wb string: " ++ T.unpack str
    parseJSON (Object obj) =
        WhiteBalance <$> obj .: "temp"
                     <*> obj .: "green"
    parseJSON obj = fail $ show obj
Aside: I use fail instead of mzero or mempty because propagating any error message at all helps immensely with debugging. I wish I could use throwError, but MonadError is not implemented for Parser.
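To see the difference for yourself, here is a small sketch that runs two versions of a white balance parser through aeson's `parseEither`. The function names `withFail` and `withMzero` are made up for this comparison, and the type is cut down to the two string cases to keep the example short.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Monad (mzero)
import Data.Aeson (Value (..))
import Data.Aeson.Types (Parser, parseEither)
import qualified Data.Text as T

data WhiteBalance = Camera | Auto
  deriving (Show, Eq)

-- Reports what was wrong with the input.
withFail :: Value -> Parser WhiteBalance
withFail (String "camera") = pure Camera
withFail (String "auto")   = pure Auto
withFail (String other)    = fail ("invalid wb string: " ++ T.unpack other)
withFail other             = fail ("expected a string, got: " ++ show other)

-- Same parser, but every failure collapses into mzero,
-- which carries no information about the offending value.
withMzero :: Value -> Parser WhiteBalance
withMzero (String "camera") = pure Camera
withMzero (String "auto")   = pure Auto
withMzero _                 = mzero

main :: IO ()
main = do
  -- Both lines print a Left; only the first mentions the bad value.
  print (parseEither withFail  (String "sunny"))
  print (parseEither withMzero (String "sunny"))
```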
-- now include code for reading JSON format and Yaml format
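A minimal sketch of that loading step, assuming the aeson and yaml packages. `decodeConfig` and `loadConfig` are hypothetical helper names, and choosing the format from the file extension (with everything other than `.json` treated as YAML) is my assumption for this example rather than a rule.

```haskell
import Data.Aeson (FromJSON, Value, eitherDecodeStrict)
import Data.Bifunctor (first)
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as BC
import Data.Yaml (decodeEither', prettyPrintParseException)
import System.FilePath (takeExtension)

-- | Decode configuration bytes, picking the format from a file extension.
-- Anything that is not ".json" is treated as YAML (an assumption of this sketch).
decodeConfig :: FromJSON a => String -> BS.ByteString -> Either String a
decodeConfig ".json" bytes = eitherDecodeStrict bytes
decodeConfig _       bytes = first prettyPrintParseException (decodeEither' bytes)

-- | Read a file from disk and decode it according to its extension.
loadConfig :: FromJSON a => FilePath -> IO (Either String a)
loadConfig path = decodeConfig (takeExtension path) <$> BS.readFile path

main :: IO ()
main = do
  -- In-memory demonstration, so the example runs without a file on disk.
  let yamlBytes = BC.pack "project: lake-travis-dam\nalign: false\n"
  print (decodeConfig ".yaml" yamlBytes :: Either String Value)
```

With the FromJSON instances above in scope, asking for `loadConfig path :: IO (Either String Project)` should hand back the full record, or a readable error message in the Left.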
Environment Variables
While I do not particularly like using environment variables for configuring an application, Heroku and presumably some other services require their use. On the other hand, most languages treat environment variables as a simple dictionary, making them simple to retrieve. Haskell is no exception to this. The only catch is that nested structures require a little more effort to build.
Your workhorse function is System.Environment.getEnv :: String -> IO String. The function returns the value if the variable is present, or throws an IO exception if it is not. Since you may sometimes want a variable to be optional, here is a function that catches isDoesNotExistError and translates it into a Maybe:
maybeGetEnv :: String -> IO (Maybe String)
maybeGetEnv k = (Just <$> getEnv k) `catch` handleIOExc
    where
    handleIOExc exc
        | isDoesNotExistError exc = pure Nothing
        | otherwise = throw exc
Then write your configuration function like so:
import Data.List.Split (splitOn)

loadConfiguration :: IO Project
loadConfiguration = do
    p <- getEnv "PROJECT_NAME"
    s <- splitOn "," <$> getEnv "SOURCES"
    -- read only accepts the exact strings "True" and "False" here
    align <- maybe False read <$> maybeGetEnv "ALIGN_IMAGES"
    fanout <- maybe False read <$> maybeGetEnv "FANOUT_EXPOSURES"
    pure $ Project s p Camera align fanout
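Two details are worth a sketch here. First, `read` for `Bool` only understands `True` and `False` spelled exactly that way, so a value such as `false` or `1` makes it throw. Second, since base 4.6, `System.Environment` also ships `lookupEnv :: String -> IO (Maybe String)`, which returns a `Maybe` directly. The helpers `parseBool` and `envFlag` below are hypothetical names, and the list of accepted spellings is just one reasonable choice.

```haskell
import Data.Char (toLower)
import System.Environment (lookupEnv)

-- | Accept a few common spellings of a boolean, case-insensitively.
-- Anything else is rejected with Nothing.
parseBool :: String -> Maybe Bool
parseBool raw = case map toLower raw of
  "true"  -> Just True
  "yes"   -> Just True
  "1"     -> Just True
  "false" -> Just False
  "no"    -> Just False
  "0"     -> Just False
  _       -> Nothing

-- | Read an optional boolean variable. An unset variable yields the
-- default; a set but unrecognised value raises an IO error naming it.
envFlag :: Bool -> String -> IO Bool
envFlag dflt name = do
  raw <- lookupEnv name
  case raw of
    Nothing -> pure dflt
    Just s  -> maybe (ioError (userError (name ++ ": not a boolean: " ++ s)))
                     pure
                     (parseBool s)

main :: IO ()
main = do
  align  <- envFlag False "ALIGN_IMAGES"
  fanout <- envFlag False "FANOUT_EXPOSURES"
  print (align, fanout)
```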
These are your three major methods for configuring an application. Many applications will permit a certain degree of hybridization between them, but I think it is best to minimize that as much as possible. For instance, allow only a command line parameter that specifies the path to a configuration file. Doing it in the general case, handling command line parameters, defaults, configuration options, and environment variables, has typically led to a very difficult-to-use mess, and I have regretted such attempts.
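The one hybrid I am comfortable with can be sketched in a few lines: take the configuration file path from the command line and fall back to a default when none is given. The helper name `configPath` and the default `config.yaml` are assumptions for this example.

```haskell
import System.Environment (getArgs)

-- | Pick the configuration file path: the first argument if there is one,
-- otherwise a default. "config.yaml" is a hypothetical default.
configPath :: [String] -> FilePath
configPath (path:_) = path
configPath []       = "config.yaml"

main :: IO ()
main = do
  path <- configPath <$> getArgs
  -- The real application would load and parse the file at this point.
  putStrLn ("loading configuration from " ++ path)
```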
Whichever method you use for passing configuration in, you’ll then want to wrap that configuration up into a context for your application. I will touch on that in my next article, on the application monad, and give it significantly more detailed treatment later on.
Questions? Comments? Feedback? Email me. I am particularly interested in places that you feel are unclear or which could use better explanation, or experiments you have run that turned out better.
Configuring your Haskell application by Savanni D’Gerinel is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.