Processing SqlValues in Haskell

Savanni D'Gerinel 1 Jun, 2012

Let us assume that, whether you are just learning Haskell for the first time or have long since mastered Haskell the language, you suddenly need to hook up to a database for the first time.

Don’t laugh. Many applications work extremely well without a database.

My applications, however, almost always have a relational database somewhere in their guts. I work with a lot of relational data. I have to learn Database.HDBC early, and many of you will, too.

You should know that I have actually practiced Haskell several times over the last six years. I probably have about one year of active development scattered across that time. This time I was finally able to comprehend some concepts that really kicked my ass in previous attempts. In that time, libraries have changed and Database.HDBC has emerged as the primary interface to SQL.

Putting data into the database is really easy. Even getting data back out is easy, up to a point. You can find basic instructions on connecting for the first time, putting data into the database, and running queries, over in Chapter 21 of Real World Haskell. My article does not cover that. Instead, it covers what to do with the data that HDBC hands back to you.

I have not written much about it, but I am building up a general Fitness Application that covers workouts of the form “x repetions, y sets”. This application is currently distinct from my Cycling application, but I may merge them some time in the future.

The application

I am slowly working my way through the 100 Pushup Challenge. I have not purchased the Android app or anything and have been keeping logs of my progress in a normal text-based journal. But, I want to do some data analysis so that I can see trend lines and whether and how quickly I am making progress.

Several different workout types all follow this data structure. Whether I am doing the 100 Pushup Challenge or the 200 Situp Challenge or any of the other challenges, all of the workouts share identical data structures. I call this a SetRepWorkout (a workout consisting of multiple sets of repetitions with rest time in between) and modelled it as such:

data WorkoutType = Pushups | Situps deriving (Show, Read, Eq)
data SetRep = CSet { reps :: Integer, rest :: Integer, next :: SetRep }
            | FSet { reps Integer }
data SetRepWorkout = SetRepWorkout { uuid :: Data.UUID.UUID,
                                     date :: Data.Time.Calendar.Day,
                                     workout_type :: WorkoutType,
                                     description :: String,
                                     sets :: SetRep }

In theory, I could have represented things like so:

data SetRep = CSet { reps :: Integer, rest :: Maybe Integer }
            | FSet { reps :: Integer }
data SetRepWorkout = SetRepWorkout { uuid :: Data.UUID.UUID,
                                     date :: Data.Time.Calendar.Day,
                                     workout_type :: WorkoutType,
                                     description :: String,
                                     sets :: [SetRep] }

In fact, you will see later that I use this representation when storing the data. Arguably, this could have worked out better, but I used the original recursive definition. Even more, I probably could have left out the rest time as not informative.

Now, given the representation, we have to work on the ever-annoying functions to move the data into and out of the database.

When I started, I investigated the Persistent database architecture. I will likely investigate it again later, but I discovered that Persistent does not handle recursive data structures. When I told Persistent to save a SetRep structure, it saved the first CSet just fine, but everything in the next field ended up serialized into the next column in the database. I find this representation unacceptable as it forces all data queries to load the data into the objects, making ad-hoc queries impossible.

Changing the representation to non-recusive would have fixed the problem, but I would not have created nearly so much fodder to explain. And, ultimately, when doing complex data modelling we will all find a case in which the ORM breaks down and we must use SQL instead.

For the final application, though, I do not rule out the possibility of changing the representation. I may even drop the rest time by dropping the SetRep structure completely and making sets a list of Integers. If I make this change, Persistent becomes more viable, but still has some problems. I will discuss those down in my Conclusions.

The really easy way

Most documentation online covers running queries and retrieving data, but it rarely focuses on retrieval beyond the basic fetchRow or fetchAllRows functions. Once you have called these functions (or fetchAllRowsAL or fetchAllRowsMap), you have SqlValues, not values relevant to your domain model. Consider this:

conn <- connectSqlite3 "example.sqlite"
stmt <- prepare conn "SELECT * FROM Workout WHERE uuid=?"
execute stmt [toSql "9d4f98cd-63eb-4fad-8d9d-070f191b72db"]
rows <- fetchAllRowsAL' stmt

This looks distressingly procedural, but I could have rearranged it into a »= pipe to make it feel less so. When we examine rows, in this case we see:

[[("uuid",SqlByteString "9d4f98cd-63eb-4fad-8d9d-070f191b72db"),
  ("day",SqlByteString "2012-04-18"),
  ("workout_type",SqlByteString "Pushups"),
  ("description",SqlByteString "Week 3, Day 3, Column 2")]]

Now, remember that I called fetchAllRowsAL'. This will fetch all of the rows (hence the outermost list that has only a single element), and return each row as an association list of (column name, value). Further, this particular function fetches all data strictly, so I don’t have to worry about laziness issues when I finalize the stmt.

So, extract just the interesting workout with

let row = head rows

Then we need to convert from [(String, SqlValue)] to Workout:

let uuid = fromJust $ fromString $ fromSql $ fromJust $ lookup "uuid" row
    day = parseTime defaultTimeLocale "%Y-%m-%d" $ fromSql $ fromJust $ lookup "day" row
    workout_type = workouttypeFromSql $ fromJust $ lookup "workout_type" row
    description = fromSql $ fromJust $ lookup "description" row
SetRepWorkout uuid day workout_type description (FSet 0)

This works, but it catches no errors. fromJust will throw an exception if it gets Nothing as an input. fromString :: String -> Maybe UUID return Nothing if it receives a string in the wrong format. So, two classes of errors:

  • column not present
  • invalid data

I feel comfortable with allowing a column not present error to crash the application. The application expects a particular structure for the database and does not get it, so I consider it a programming error. But data in the database can get corrupted, and I do not want the program to crash because of an invalid data format. I would flag the error and do something with it in a higher level of the application.

A more robust (and almost as easy) way

It turns out that a more robust implementation is almost as easy to implement, but figuring that out took me seven hours of experimenting.

Notice that HDBC contains safeFromSql :: Convertible SqlValue a => SqlValue -> ConvertResult a. This leads naturally to the definition of Convertible and to ConvertResult.

The code does not include an instance for Convertible SqlValue UUID, but such an instance would help. If you had such an instance, you could say this:

let uuid = safefromSql $ fromJust $ lookup "uuid" row

(Note: I left a fromJust. lookup will return Nothing if the “uuid” column does not exist, and I consider this a programming error. So, I actually want fromJust there to blow up the program in that event.)

In fact, with proper instances, you could replace much of the above code like this:

let uuid = safeFromSql $ fromJust $ lookup "uuid" row
    day = safeFromSql $ fromJust $ lookup "day" row
    workout_type = safeFromSql $ fromJust $ lookup "workout_type" row
    description = safeFromSql $ fromJust $ lookup "description" row

Now, each of the fields will contain Left (ConvertError ...) instead of exploding when the code encounters invalid data for that field. You cannot feed that directly into a SetRepWorkout and unravelling would be tedious, but ConvertResult b is just a type alias for Either ConvertError b`, and so it functions in the Either Monad. So the entire function above can look like this:

rowToWorkout :: [(String, SqlValue)] -> ConvertResult SetRepWorkout
rowToWorkout row = do
    uuid <- safeFromSql $ fromJust $ lookup "uuid" row
    day <- safeFromSql $ fromJust $ lookup "day" row
    workout_type <- safeFromSql $ fromJust $ lookup "workout_type" row
    description <- safeFromSql $ fromJust $ lookup "description" row
    return $ SetRepWorkout uuid day workout_type desciption (FSet 0)

To make this work, you must ensure that each of the above conversions has a Convertible a b instance.

As it turns out, HDBC already provides Convertible SqlValue Day and Convertible SqlValue String. I simply had to provide Convertible SqlValue UUID and Convertible SqlValue WorkoutType. Here I provide those implementations, complete with the compiler flags necessary to even make them possible. I put the compile flags at the top of the file, but you could put them on the command line or in your Cabal file.

{-# LANGUAGE TypeSynonymInstances, FlexibleInstances, MultiParamTypeClasses #-}

import Data.ByteString.UTF8 a BUTF8 (toString)
import Data.UUID
import Data.Convertible.Base


instance Convertible SqlValue UUID where
    safeConvert (Sqlstring a) = case fromString a of
        Just b -> Right b
        Nothing -> Left $ ConvertError (show a) "SqlValue" "UUID" "Could not parse UUID"
    safeConvert (SqlByteString a) = safeConvert $ SqlString $ BUTF8.toString a
    safeConvert a = Left $ ConvertError (show a) "SqlValue" "UUID" "No conversion available"

instance Convertible SqlValue WorkoutType where
    safeConvert (SqlString a) = case a of
        "Pushups" -> Right Pushups
        "Situps" -> Right Situps
        _ -> Left $ ConvertError (show a) "SqlValue" "WorkoutType" "Unrecognized value"
    safeConvert (SqlByteString a) = safeConvert $ SqlString $ BUTF8.toString a
    safeConvert a = Left $ ConvertError (show a) "SqlValue" "WorkoutType" "No conversion available"

Given these implementations, and the rowToWorkout function, the selectWorkoutByID function is not bad, though it could probably use some more refinement.

selectWorkoutByID :: IConnection a => UUID -> a -> IO (Either String SetRepWorkout)
selectWorkoutByID w_id conn = do
    stmt <- prepare conn "SELECT * FROM Workout WHERE uuid=?"
    execute stmt [toSql $ show w_id]
    workout <- fetchAllRowsAL' stmt >>= mapM (return . rowToWorkout)
    return $ case length workout of
        0 -> Left "No workout found"
        1 -> case (head workout) of
            Right w -> Right w
            Left err -> Left $ show err
        _ -> Left "Multiple hits for a single workout Id"

Conclusions

You will need this information. For one reason or another, you will one day encounter a case in which an ORM cannot function, and then you will need to go to SQL. The needs of the database absolutely should not dictate your domain model. Persistent annoys me because, in an effort to stick with “Don’t Repeat Yourself”, it actually forces database knowledge into the domain model, which I feel is a violation of the Onion Architecture. Ultimately, though, this is a simple application and such a violation may not be so bad.

This case, however, I used for practice and for teaching. The program is so simple that I really can explore the implications of the Onion Architecture and force in a level of architectural purity. I would like you to closely examine the code and consider for yourself whether such a setup makes sense in your application. Perhaps, when I learn TemplateHaskell, I will even develop a set of templates to describe this code and allow the compiler to figure it out for me.

More concerning to me, though, is that I am not sure how Persistent can be made to follow even First Normal Form when dealing with a composite data structure without modelling the data in some very odd ways. Now, I would accept this if I were going to a key-value store or a document database, but I’m going to a relational database and I want the option of running queries based on proper relations. In almost every relational database I build, I demand at least Second Normal Form, frequently also Third Normal Form, and possibly also some insights from even higher normal forms.

Model <-> Database code is annoying, tedious, and error-prone. I recently read an article by Patrick Loi, Dependency Injection Ceremony, and really hooked on the image of the Devil on one shoulder, the Angel on the other shoulder, and the Angel saying “You should be listening to that other guy”. Read the article, as it is hilarous and informative. It also is relevant here.

Martin Fowler has some objections to ORM Hate, and he is right in that rolling your own ORM almost never makes sense. On the other hand, the ORM always (no, really, ALWAYS) violates a good Onion Architecture because it forces you to define database representation in the middle of your domain model. Ew. Gross.

Your solution is going to vary. For now, due to the extreme pain of a project at work in which I definitely did not properly isolate the domain model from the database, I will practice quite a lot with the purest architecture I can get. After I have experienced the benefits and drawbacks of this purity, I will revisit the compromises I want to make on my projects.

I have the full set of code for this fitness project (which will slowly grow into an application) on Gitlab. If you have comments or feedback, please email them directly to me at savanni@alyra.org.

http://i.creativecommons.org/l/by-nc-sa/3.0/88x31.png

Processing SqlValues in Haskell by Savanni D'Gerinel is licensed under a Creative Commons Attribution-NonCommercial-SharAlike 3.0 Unported License. You can link to it, copy it, redistribute it, and modify it, but don't sell it or the modifications and don't take my name from it.


Dreamer, Shaper, Seeker, Maker