Recently, I’ve been in a mood to play with ORMs in Haskell. The most recognizable one is Persistent, developed as part of a Yesod project. It is quite pleasant to work with, I must say, but some assumptions it makes are… not suitable for me. So I’ve looked for something different and I found out about Groundhog, yet another ORM in Haskell. Unfortunately, it is almost identical to Persistent, but maybe instead of searching for the perfect library, I will be able to adapt it to my needs?
The point is - I don’t want fancy ORMs with custom DSLs in pure Haskell that will allow me to express every query (well, sort of). Of course, I don’t want to write simple SELECT
s with some conditions by hand as that may be expressed in EDSL well, but advanced queries are just PITA to write in something different than SQL (and even in SQL it’s not that easy). But being able to write raw SQL and then map the result to a custom type is tempting.
My goal is simple - force Groundhog to decode results of raw SQL queries into Haskell objects. This means using queryRaw
with correct decoder. Sounds simple. But it isn’t. Converting a list of values into a real object, using different code for every type is painful and I’ve really wanted to avoid it. To my luck, I was able to leverage PersistEntity
, something that every type usable by Groundhog must instantiate. So, let’s start to code!
The code
Assume we have this model:
data Author = Author {
authorName :: Text
, authorEmail :: Text
}
data Category = Category {
categoryName :: Text
}
data Post = Post {
postTitle :: Text
, postContent :: Text
, postCategory :: DefaultKey Category
, postAuthor :: DefaultKey Author
}
-- This is a triple (Post, Category, Author), but using PCA c-tor is simpler
data PCA = PCA {
pcaPost :: Post
, pcaCategory :: Category
, pcaAuthor :: Author
}
mkPersist defaultCodegenConfig [groundhog|
- entity: Author
- entity: Category
- entity: Post
|]
And we want to get a Post
with its Category
and Author
(PCA
). It’s just a SELECT
with two JOIN
s, so I’ll skip it. How to do so? Simple - queryRaw False queryString [PersistInt64 i] decode
. The problem is - how to write the decode
function? ;)
Let’s face a less complex problem first. How can we decode whole Post
? It instantiates PersistEntity
, so we can use fromEntityPersistValues
. And it has a signature that is almost identical to desired - [PersistValue] -> m (v, [PersistValue])
vs m (Maybe [PersistValue]) -> m v
(and the constraints on m
and v
match, more or less)! We just need to extract a list from a Maybe
, pass it to this function and select the first element of the result:
decode v = do
Just v' <- v
fst <$> fromEntityPersistValues v'
or even
decode = fmap fromJust >=> fromEntityPersistValues >=> return . fst
This was easy. There is one gotcha here - default implementation of PersistEntity
generated by groundhog-th
uses a magic value, encoded as a first column, to determine which constructor should be used (in sum types). Sadly, it does so for EVERY type, even for simple product types like Post
. We need to take care of it ourselves either by manually SELECT
ing 0 as first column, or prepending PersistInt64 0
to the v
.
Let’s take a closer look at fromEntityPersistValues
. What does it do? It tries to decode an object from a list of PersistValue
and returns the remaining values or fails with an error. If there are more values than are needed, it will consume just the required ones and let us decide what to do with the rest. This allows us to construct the inner objects first and then build the outer, like this:
-- To simplify, I assume that `v` is `[PersistValue]` not `Maybe [PersistValue]`.
decode v = do
(post, v1) <- fromEntityPersistValues v
(cat, v2) <- fromEntityPersistValues v1
(auth, _) <- fromEntityPersistValues v2
return $ PCA post cat auth
Can you spot the pattern? Decode an object, save it and, if we need more, (sort of) recursively decode, otherwise construct the final object. But do we really need to construct the object at the end? Can’t we just do it in the meantime? Of course we can! ;) PCA
(or any other constructor) is just a function and we may leverage partial application to keep track of the partially constructed object (or, to be exact, a partially applied constructor).
Additionally, PCA
has a type Post -> Category -> Author -> PCA
and we want to decode an object that may be constructed with a function of type (PersistEntity a, PersistEntity b, ...) => a -> b -> ... -> Object
or, more generally, a function of type PersistEntity a => a -> b
where b
may be of the same form or is a fully constructed object. Having this, we can develop a function that tries to decode a single object (thus consumes some of the PersistValue
s), applies it to the partially applied function and returns it along with the rest of values.
decodePart :: (PersistBackend m, PersistEntity a) => (a -> b, [PersistValue]) -> m (b, [PersistValue])
decodePart (p, v) = do
(o, v') <- fromEntityPersistValues v
return (p o, v')
This way, we can just bind some calls to decodePart
(for PCA
- 3) and we have fully constructed object:
decode v = decodePart (PCA, l) >>= decodePart >>= decodePart >>= return . fst
-- or with TupleSections extension we can make it poit-free
decode = (decodePart >=> decodePart >=> decodePart >=> return . fst) . (PCA,)
The problem is - we must put as many calls to decodePart
as parameters to the constructor. This makes it quite unpleasant to work with. Fortunately, we can overcome this with a little bit of TemplateHaskell
.
As I said earlier, the constructor’s type is always of the form PersistEntity => a -> b
(where b may be of the same form). This means that we could call decodePart
recursively, but unfortunately I don’t know what to do to make it stop calling with only a function (as we can’t examine type in code that easily). But! We can make a typeclass that will be able to do exactly this.
class PartialDecode m a where
type PartialResult a
decodePart :: (a, [PersistValue]) -> m (PartialResult a, [PersistValue])
We need the associated type PartialResult
instead of an additional parameter to the typeclass and functional dependencies as they would make the following definition a nightmare to write (or maybe even not possible).
And the key - an instance of PartialDecode
for PersistEntity => a -> b
(assumes that the 0
, which denotes the constructor, is prepended in SQL):
instance (PersistBackend m, PersistEntity a, PartialDecode m b) => PartialDecode m (a -> b) where
type PartialResult (a -> b) = PartialResult b
decodePart (p, v) = do
(a, v') <- fromEntityPersistValues v
decodePart (p a, v')
Call to this decodePart
for functions like PCA
, will construct a real object, but will not compile as this sole instance requires the type to be infinite (as we always require a function to return another function). We have to have a trivial instance of PartialDecode
for every type we want to decode (because Haskell does not allow to write a general form) that will just return
for decodePart
.
-- For PCA
instance (Monad m) => PartialDecode m PCA where
type PartialResult PCA = PCA
decodePart = return
Writing this boilerplate code is tiresome and that’s where Template Haskell helps. With a little helper function we can collapse this definition to a single line (or, with a little bit of tweaking, we can force Groundhog to do this for us):
mkPartialDecode :: Name -> Q [Dec]
mkPartialDecode n =
[d|
instance (PersistBackend m) => PartialDecode m $(conT n) where
type PartialResult $(conT n) = $(conT n)
decodePart = return|]
mkPartialDecode ''PCA
and the instance is generated. A little bit of fmap
ing, mapM
ing and concat
ing and we can make it accept a list of types instead of only one. We also need to have a way of selecting appropriate constructor to pass to the first call to decodePart
. Sadly, this is another obstacle that requires Template Haskell and a bunch of assumptions (type has only one constructor). Introduce another type class for this purpose:
class DecodeObject m a where
decodeObject :: [PersistValue] -> m a
instance (PersistBackend m) => DecodeObject m PCA where
decodeObject v = fst <$> decodePart PCA
With a bit of TH, writing it may be automated:
mkPartialDecode :: Name -> Q [Dec]
mkPartialDecode n = do
TyConI (DataD _ _ _ (c1:_) _) <- reify n
[d|
instance (PersistBackend m) => DecodeObject m $(conT n) where
decodeObject v = fst <$> decodePart ($(conE (getName c1)), v)
instance (PersistBackend m) => PartialDecode m $(conT n) where
type PartialResult $(conT n) = $(conT n)
decodePart = return|]
where
getName (NormalC n _) = n
getName (RecC n _) = n
getName (InfixC _ n _) = n
getName (ForallC _ _ n) = getName n
One final bit - real decode
function:
decode :: (PersistBackend m, PartialDecodeConstructor m a) => RowPopper m -> m a
decode m = do
Just l <- m
decode l
Summary
OK, I must admit - this post is way too long for the solution. It was fun to write (although I changed it too many times), forced me to learn a little bit of Template Haskell and opened my eyes for what I want from ORMs in Haskell. I think that the solution will not be of much use as it is not that elegant, but maybe the overall idea will be useful?
Groundhog is great, but having to be in a monad just to convert some PersistValue
s is not. I know that this has its application (sum types), but for me it’s mostly overengineering and not something valuable, at least in a data layer. Persistent has a little different way of executing raw SQL queries (and has its own drawbacks), but everything presented here may be replicated there as well. Maybe in my next post I will show this? Who knows… ;)
I know now that I had not really knew what I wanted from ORMs. I think I still don’t know, but at least I’m a bit wiser. ;)