Custom SQL queries in Groundhog

Recently, I’ve been in a mood to play with ORMs in Haskell. The most recognizable one is Persistent, developed as part of a Yesod project. It is quite pleasant to work with, I must say, but some assumptions it makes are… not suitable for me. So I’ve looked for something different and I found out about Groundhog, yet another ORM in Haskell. Unfortunately, it is almost identical to Persistent, but maybe instead of searching for the perfect library, I will be able to adapt it to my needs?

The point is - I don’t want fancy ORMs with custom DSLs in pure Haskell that will allow me to express every query (well, sort of). Of course, I don’t want to write simple SELECTs with some conditions by hand as that may be expressed in EDSL well, but advanced queries are just PITA to write in something different than SQL (and even in SQL it’s not that easy). But being able to write raw SQL and then map the result to a custom type is tempting.

My goal is simple - force Groundhog to decode results of raw SQL queries into Haskell objects. This means using queryRaw with correct decoder. Sounds simple. But it isn’t. Converting a list of values into a real object, using different code for every type is painful and I’ve really wanted to avoid it. To my luck, I was able to leverage PersistEntity, something that every type usable by Groundhog must instantiate. So, let’s start to code!

The code

Assume we have this model:

data Author = Author {
    authorName :: Text
  , authorEmail :: Text
}

data Category = Category {
    categoryName :: Text
}

data Post = Post {
    postTitle :: Text
  , postContent :: Text
  , postCategory :: DefaultKey Category
  , postAuthor :: DefaultKey Author
}

-- This is a triple (Post, Category, Author), but using PCA c-tor is simpler
data PCA = PCA {
    pcaPost :: Post
  , pcaCategory :: Category
  , pcaAuthor :: Author
}

mkPersist defaultCodegenConfig [groundhog|
- entity: Author
- entity: Category
- entity: Post
|]

And we want to get a Post with its Category and Author (PCA). It’s just a SELECT with two JOINs, so I’ll skip it. How to do so? Simple - queryRaw False queryString [PersistInt64 i] decode. The problem is - how to write the decode function? ;)

Let’s face a less complex problem first. How can we decode whole Post? It instantiates PersistEntity, so we can use fromEntityPersistValues. And it has a signature that is almost identical to desired - [PersistValue] -> m (v, [PersistValue]) vs m (Maybe [PersistValue]) -> m v (and the constraints on m and v match, more or less)! We just need to extract a list from a Maybe, pass it to this function and select the first element of the result:

decode v = do
  Just v' <- v
  fst <$> fromEntityPersistValues v'

or even

decode = fmap fromJust >=> fromEntityPersistValues >=> return . fst

This was easy. There is one gotcha here - default implementation of PersistEntity generated by groundhog-th uses a magic value, encoded as a first column, to determine which constructor should be used (in sum types). Sadly, it does so for EVERY type, even for simple product types like Post. We need to take care of it ourselves either by manually SELECTing 0 as first column, or prepending PersistInt64 0 to the v.

Let’s take a closer look at fromEntityPersistValues. What does it do? It tries to decode an object from a list of PersistValue and returns the remaining values or fails with an error. If there are more values than are needed, it will consume just the required ones and let us decide what to do with the rest. This allows us to construct the inner objects first and then build the outer, like this:

-- To simplify, I assume that `v` is `[PersistValue]` not `Maybe [PersistValue]`.
decode v = do
  (post, v1) <- fromEntityPersistValues v
  (cat, v2)  <- fromEntityPersistValues v1
  (auth, _)  <- fromEntityPersistValues v2
  return $ PCA post cat auth

Can you spot the pattern? Decode an object, save it and, if we need more, (sort of) recursively decode, otherwise construct the final object. But do we really need to construct the object at the end? Can’t we just do it in the meantime? Of course we can! ;) PCA (or any other constructor) is just a function and we may leverage partial application to keep track of the partially constructed object (or, to be exact, a partially applied constructor).

Additionally, PCA has a type Post -> Category -> Author -> PCA and we want to decode an object that may be constructed with a function of type (PersistEntity a, PersistEntity b, ...) => a -> b -> ... -> Object or, more generally, a function of type PersistEntity a => a -> b where b may be of the same form or is a fully constructed object. Having this, we can develop a function that tries to decode a single object (thus consumes some of the PersistValues), applies it to the partially applied function and returns it along with the rest of values.

decodePart :: (PersistBackend m, PersistEntity a) => (a -> b, [PersistValue]) -> m (b, [PersistValue])
decodePart (p, v) = do
  (o, v') <- fromEntityPersistValues v
  return (p o, v')

This way, we can just bind some calls to decodePart (for PCA - 3) and we have fully constructed object:

decode v = decodePart (PCA, l) >>= decodePart >>= decodePart >>= return . fst
-- or with TupleSections extension we can make it poit-free
decode = (decodePart >=> decodePart >=> decodePart >=> return . fst) . (PCA,)

The problem is - we must put as many calls to decodePart as parameters to the constructor. This makes it quite unpleasant to work with. Fortunately, we can overcome this with a little bit of TemplateHaskell.

As I said earlier, the constructor’s type is always of the form PersistEntity => a -> b (where b may be of the same form). This means that we could call decodePart recursively, but unfortunately I don’t know what to do to make it stop calling with only a function (as we can’t examine type in code that easily). But! We can make a typeclass that will be able to do exactly this.

class PartialDecode m a where
  type PartialResult a
  decodePart :: (a, [PersistValue]) -> m (PartialResult a, [PersistValue])

We need the associated type PartialResult instead of an additional parameter to the typeclass and functional dependencies as they would make the following definition a nightmare to write (or maybe even not possible).

And the key - an instance of PartialDecode for PersistEntity => a -> b (assumes that the 0, which denotes the constructor, is prepended in SQL):

instance (PersistBackend m, PersistEntity a, PartialDecode m b) => PartialDecode m (a -> b) where
  type PartialResult (a -> b) = PartialResult b
  decodePart (p, v) = do
    (a, v') <- fromEntityPersistValues v
    decodePart (p a, v')

Call to this decodePart for functions like PCA, will construct a real object, but will not compile as this sole instance requires the type to be infinite (as we always require a function to return another function). We have to have a trivial instance of PartialDecode for every type we want to decode (because Haskell does not allow to write a general form) that will just return for decodePart.

-- For PCA
instance (Monad m) => PartialDecode m PCA where
  type PartialResult PCA = PCA
  decodePart = return

Writing this boilerplate code is tiresome and that’s where Template Haskell helps. With a little helper function we can collapse this definition to a single line (or, with a little bit of tweaking, we can force Groundhog to do this for us):

mkPartialDecode :: Name -> Q [Dec]
mkPartialDecode n =
  [d|
    instance (PersistBackend m) => PartialDecode m $(conT n) where
      type PartialResult $(conT n) = $(conT n)
      decodePart = return|]

mkPartialDecode ''PCA and the instance is generated. A little bit of fmaping, mapMing and concating and we can make it accept a list of types instead of only one. We also need to have a way of selecting appropriate constructor to pass to the first call to decodePart. Sadly, this is another obstacle that requires Template Haskell and a bunch of assumptions (type has only one constructor). Introduce another type class for this purpose:

class DecodeObject m a where
    decodeObject :: [PersistValue] -> m a

instance (PersistBackend m) => DecodeObject m PCA where
  decodeObject v = fst <$> decodePart PCA

With a bit of TH, writing it may be automated:

mkPartialDecode :: Name -> Q [Dec]
mkPartialDecode n = do
  TyConI (DataD _ _ _ (c1:_) _) <- reify n
  [d|
    instance (PersistBackend m) => DecodeObject m $(conT n) where
      decodeObject v = fst <$> decodePart ($(conE (getName c1)), v)
    instance (PersistBackend m) => PartialDecode m $(conT n) where
      type PartialResult $(conT n) = $(conT n)
      decodePart = return|]
    where
      getName (NormalC n _) = n
      getName (RecC n _) = n
      getName (InfixC _ n _) = n
      getName (ForallC _ _ n) = getName n

One final bit - real decode function:

decode :: (PersistBackend m, PartialDecodeConstructor m a) => RowPopper m -> m a
decode m = do
    Just l <- m
    decode l

Summary

OK, I must admit - this post is way too long for the solution. It was fun to write (although I changed it too many times), forced me to learn a little bit of Template Haskell and opened my eyes for what I want from ORMs in Haskell. I think that the solution will not be of much use as it is not that elegant, but maybe the overall idea will be useful?

Groundhog is great, but having to be in a monad just to convert some PersistValues is not. I know that this has its application (sum types), but for me it’s mostly overengineering and not something valuable, at least in a data layer. Persistent has a little different way of executing raw SQL queries (and has its own drawbacks), but everything presented here may be replicated there as well. Maybe in my next post I will show this? Who knows… ;)

I know now that I had not really knew what I wanted from ORMs. I think I still don’t know, but at least I’m a bit wiser. ;)