Many programmers fall into the trap of creating too many unnecessary abstractions in code, which may introduce even more chaos and maintenance issues instead of simplifying the overall project structure and providing some real benefit. One such abstraction, which has been discussed countless times, is the (one and only) repository pattern. I’m going to mostly whine about this abstraction (I have to point out some common mistakes), which of course can be useful in some cases. In order to keep things clear (and stop making this wall of text an even bigger wall), in the next post I’ll provide code examples of the extension methods that you can use to have your data access logic aggregated in a single place and separated from the other infrastructural code. Bear in mind that extension methods are not the only solution (query handlers may also come in handy, which I’d like to discuss in the future as well).
As we all know (I hope), the repository pattern provides access to an object storage, where that storage could be literally anything (memory, database, file system etc.). It exposes a set of CRUD operations and usually some additional queries for more advanced filtering (the so-called criteria pattern). In the world of DDD (Domain-Driven Design), each operation in the repository should be performed as an atomic operation, and we can extend that behaviour with another pattern called Unit of Work, which is able to “tie” multiple calls to different repositories’ methods into a single, atomic transaction. However, one of the most popular arguments is that the repository pattern allows us to easily (at least in theory) swap the underlying implementation of the object storage and, for example, seamlessly switch from one database provider to another. And here’s where everything starts to fall apart.
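To make the discussion concrete, here’s a minimal sketch of what such a repository contract typically looks like (shown in TypeScript for brevity; the names `User`, `UserRepository` and `Criteria` are illustrative, not from any particular codebase):

```typescript
// Illustrative sketch of a classic repository contract (hypothetical names).
interface User {
  id: number;
  email: string;
  active: boolean;
}

// Criteria pattern: a predicate describing which entities to fetch.
type Criteria<T> = (entity: T) => boolean;

interface UserRepository {
  getById(id: number): User | undefined;
  add(user: User): void;
  update(user: User): void;
  remove(id: number): void;
  findBy(criteria: Criteria<User>): User[];
}

// An in-memory implementation — the "storage could be literally anything" part.
class InMemoryUserRepository implements UserRepository {
  private users = new Map<number, User>();

  getById(id: number): User | undefined {
    return this.users.get(id);
  }
  add(user: User): void {
    this.users.set(user.id, user);
  }
  update(user: User): void {
    this.users.set(user.id, user);
  }
  remove(id: number): void {
    this.users.delete(id);
  }
  findBy(criteria: Criteria<User>): User[] {
    return [...this.users.values()].filter(criteria);
  }
}
```

The promise is that the caller only ever sees `UserRepository`, so the in-memory class could in theory be swapped for an SQL- or Mongo-backed one without touching the domain code.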
I’ll try to explain why, using a comparison between an SQL database (e.g. MSSQL) and a NoSQL database (e.g. MongoDB), and as we know from mathematics, a single counterexample makes the whole theorem false ;).
At first, let me ask you a question – when was the last time you woke up in the morning and all of a sudden had the following thought: “let’s replace the current database with another one, I’ll just switch the interface and everything will work just fine”? Replacing one data storage system with another is not a “plug & play” type of thing. It turns out that pretty often it’s a rather huge architectural decision that may have (and usually does have) a big impact on the whole application that you’re building. And it’s not just about the way queries are processed by the database or whether you can use transactions or not. It’s about the whole domain of the software being built.
We may fool ourselves that the domain should ideally be pure and unaware of the underlying data storage, but unless there are some middleware objects (e.g. specialized factories) that would map our database schema into the domain entities, there will always be some leakage (at least a conceptual one) into our domain, influencing how we define our entities. Some may say that the repository should handle that mapping (internally fetch database objects and then map them into our entities, with or without the help of factories), and while that statement is true, such a solution usually increases complexity and has an impact on overall performance once we have to deal with complex queries and map multiple objects into entities (but that’s not the topic of this post). The thing is that defining the associations within our entities (which should not just be “anaemic property bags”) may differ greatly depending on whether SQL, NoSQL or anything else is being used. For example, NoSQL, due to its “schemaless” nature, allows us to define entities with much more flexibility than the traditional SQL approach. We can literally model aggregate roots (DDD strikes again) 1:1 (entity:db object). And why wouldn’t you take advantage of such an opportunity? Because you can replace the ISqlRepository with the INoSqlRepository interface in your IoC container?
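The difference in modeling is easier to see side by side. Below is a hypothetical `Order` aggregate sketched in TypeScript: the document-style shape maps 1:1 to what a NoSQL store would persist, while the relational shape is normalized into rows that a repository (or factory) must reassemble:

```typescript
// Hypothetical Order aggregate, contrasting the two storage styles.

// Document-style (NoSQL): the aggregate root maps 1:1 to the stored object —
// line items live inside the order and are loaded/saved as one unit.
interface OrderDocument {
  id: string;
  customerId: string;
  items: { productId: string; quantity: number; price: number }[];
}

// Relational-style (SQL): the same aggregate is normalized into rows that
// reference each other by foreign keys.
interface OrderRow {
  id: string;
  customerId: string;
}
interface OrderItemRow {
  orderId: string; // foreign key back to OrderRow
  productId: string;
  quantity: number;
  price: number;
}

// Reassembling the aggregate from relational rows — exactly the mapping work
// (and the conceptual leakage) that a document store lets you skip.
function toOrderDocument(order: OrderRow, items: OrderItemRow[]): OrderDocument {
  return {
    id: order.id,
    customerId: order.customerId,
    items: items
      .filter((i) => i.orderId === order.id)
      .map(({ productId, quantity, price }) => ({ productId, quantity, price })),
  };
}
```

A repository interface shared between both worlds has to pretend this mapping step either doesn’t exist or costs nothing, which is rarely true for non-trivial aggregates.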
However, this is not the only issue. Let’s say that you use the Unit of Work pattern to commit a single, transactional call that involves multiple repositories. Well, where does that UoW go now, once you replace your SQL database with another one that does not support transactions?
And what about those primary keys, usually defined as an auto-incremented integer? Sure, you can use GUIDs instead, but maybe (just maybe) you don’t really want to deal with those nasty sets of 32 hexadecimal digits with hyphens between them?
Last but not least, I’m not even going to talk about the GenericRepository&lt;T&gt;, which is a terrible implementation (as repositories should only contain operations specific to entities, or rather to the aggregate roots in the DDD world), or about creating abstractions on top of the DbContext when the Entity Framework ORM is being used. Besides, please think about the special cases in which you need to make a call to the database to fetch a few specific fields (it has to be a really fast and performant call) and then map these values into something else (which is not a domain entity). Would you introduce that operation into your repository interface (even though it is called only in a single place) and put that junk object alongside the domain models, which obviously violates not only the purpose of this pattern but also the domain itself? There are even more reasons against using the repository pattern without giving it a second thought, but I hope that you get my point by now.
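One way out of that dilemma — in the spirit of the extension methods promised for the next post — is to keep such a one-off projection outside the repository interface entirely. A C# extension method corresponds roughly to the free-standing helper sketched below (all names are hypothetical):

```typescript
// Hypothetical one-off projection kept OUT of the repository interface.
// In C# this would be an extension method; here it is a free-standing helper.
interface Invoice {
  id: number;
  customerEmail: string;
  total: number;
  paid: boolean;
}

// Lightweight read model for a single screen — deliberately not a domain entity.
interface UnpaidInvoiceSummary {
  id: number;
  total: number;
}

// The helper owns the narrow query and the mapping, so neither the repository
// contract nor the domain model has to know this "junk object" exists.
function unpaidSummaries(invoices: Iterable<Invoice>): UnpaidInvoiceSummary[] {
  const result: UnpaidInvoiceSummary[] = [];
  for (const inv of invoices) {
    if (!inv.paid) result.push({ id: inv.id, total: inv.total });
  }
  return result;
}
```

The single call site imports the helper; everyone else keeps seeing a clean domain model.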
I do believe that this pattern can be really useful and might fit well, e.g. if we’re building our software based on DDD principles and carefully specify the interface contract, but it is very often misunderstood and poorly implemented (some generic crap), which is the result of the false claim that the more abstractions and layers we have in our code, the better (and smarter) developers we are (been there, done that).