About my experience of using Django Framework¶
Someone once put it well: security is a balance between the cost of protection and the potential gain from breaking it. There is no point in spending beyond that balance.
When we choose a technology, we look for a similar balance between the cost of maintaining it (including finding and training new staff) and the functionality it gives us.
The Django framework certainly brings some trouble, but it also lets you solve a huge range of tasks quickly and makes developers easy to find. In other words, Django makes software development in Python cheaper. With a competent approach, you can use all the advantages of Django without becoming a hostage to its shortcomings.
Contents
- About my experience of using Django Framework
- Django Models problems and their solutions
- Semantic coupling of model validation
- Active Record
- Identity Map
- Transactional consistency of data
- Service Layer and django.db.models.Manager
- Composite foreign keys and Django Models
- Building of complicated SQL-queries for Django ORM
- Implementation of complicated Models for Django Framework
- Third-party tools
- Cache invalidation
- Model Versioning and Audit Log
- Django REST framework
- GraphQL
- Advantages and disadvantages of Django Framework
- Conclusion
Django ORM brings the most trouble, so we’ll start with it.
Django Models problems and their solutions¶
Semantic coupling of model validation¶
The principle of “Defensive Programming” [2] requires making it impossible to create an invalid object, which means validation belongs in the setters of the object. In a Django Model, we instead have to call Model.full_clean() explicitly before saving. In practice this call is often forgotten, and that leads to all kinds of trouble. This problem is known as “Semantic Coupling”, as well as “G22: Make Logical Dependencies Physical” [1] and “G31: Hidden Temporal Couplings” [1]. You can solve it technically, but usually it is enough simply to follow development discipline.
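One possible technical solution is to make the dependency physical by calling full_clean() inside save(). The sketch below is only an illustration: ValidateOnSaveMixin, FakeModel and Customer are hypothetical names, and FakeModel stands in for django.db.models.Model so that the example runs without Django.

```python
class ValidateOnSaveMixin:
    """Run full_clean() before every save().

    Assumes the next class in the MRO provides full_clean() and save(),
    as django.db.models.Model does.
    """

    def save(self, *args, **kwargs):
        self.full_clean()  # raises on invalid state (ValidationError in Django)
        super().save(*args, **kwargs)


class FakeModel:
    """Stand-in for django.db.models.Model, used only to make the sketch runnable."""

    saved = False

    def full_clean(self):
        if not getattr(self, "name", ""):
            raise ValueError("name must not be empty")

    def save(self, *args, **kwargs):
        self.saved = True


class Customer(ValidateOnSaveMixin, FakeModel):
    def __init__(self, name):
        self.name = name
```

Keep in mind that bulk operations such as QuerySet.update() and bulk_create() do not call save(), so a mixin like this does not cover them.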
Active Record¶
Django Model implements the ActiveRecord pattern, which buys ease of use at the price of violating the Single Responsibility Principle (SRP); for this reason it is often called an antipattern. The pattern mixes business logic and data access logic in one class. Unfortunately, this simplicity is appropriate only in simple cases. In a more serious application it causes more problems than it solves.
Since Django does not implement the Repository pattern, it is desirable to hide the data access logic at least behind a “Service Layer”. This is necessary because the capabilities of Django ORM are not always enough to build complicated queries or to create complicated models, and you then have to replace Django ORM with third-party tools or with a bare implementation of the DataMapper pattern (we will return to this issue a little later). In any case, the implementation of data access must be hidden from the application, and this is one of the responsibilities of the Service Layer.
In the article “Clean Architecture in Django” you can find an example of using the Repository pattern to hide the data source in a Django application.
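To make the idea concrete, here is a minimal sketch of a repository interface with an in-memory implementation. All names (User, UserRepository, InMemoryUserRepository, count_users_in_domain) are hypothetical and are not taken from the article mentioned above; an implementation backed by Django ORM or raw SQL would expose the same methods.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Protocol


@dataclass
class User:
    id: int
    email: str


class UserRepository(Protocol):
    """Everything the application is allowed to know about data access."""

    def get(self, user_id: int) -> Optional[User]: ...

    def find_by_email_domain(self, domain: str) -> List[User]: ...


class InMemoryUserRepository:
    """One interchangeable implementation; handy for tests."""

    def __init__(self, users: List[User]):
        self._users: Dict[int, User] = {u.id: u for u in users}

    def get(self, user_id: int) -> Optional[User]:
        return self._users.get(user_id)

    def find_by_email_domain(self, domain: str) -> List[User]:
        suffix = "@" + domain
        return [u for u in self._users.values() if u.email.endswith(suffix)]


def count_users_in_domain(repo: UserRepository, domain: str) -> int:
    """Service-layer function: it depends only on the repository interface."""
    return len(repo.find_by_email_domain(domain))
```

Because the service function only sees the interface, the storage technology behind it can be swapped without touching the application code.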
Identity Map¶
Django does not implement the Identity Map pattern and, as a result, issues many duplicate queries. This weakness is partly mitigated by prefetch_related(). Third-party libraries such as django-idmapper and django-idmap implement the pattern, but they do nothing beyond caching and do not provide transactional data consistency. However, you will hardly notice this problem, since a Django application usually processes an HTTP request inside one transaction.
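For readers unfamiliar with the pattern, the following is a minimal sketch of an Identity Map in plain Python. It is not how django-idmapper or django-idmap are implemented; IdentityMap, Article and load_article are hypothetical names, and load_article only simulates a database query.

```python
import weakref


class IdentityMap:
    """Keep at most one live instance per (class, primary key)."""

    def __init__(self):
        # Weak references let unused instances be garbage collected.
        self._instances = weakref.WeakValueDictionary()

    def get_or_load(self, cls, pk, loader):
        key = (cls, pk)
        instance = self._instances.get(key)
        if instance is None:
            instance = loader(pk)  # e.g. a database query
            self._instances[key] = instance
        return instance


class Article:
    def __init__(self, pk, title):
        self.pk = pk
        self.title = title


queries = []


def load_article(pk):
    queries.append(pk)  # record each simulated database hit
    return Article(pk, "title %d" % pk)
```

Asking the map twice for the same primary key returns the same object and triggers the loader only once, which is exactly what removes the duplicate queries.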
Transactional consistency of data¶
Django allows you to create multiple instances of the same domain object in the memory of a thread, and this can lead to data loss when the state of these instances gets out of sync. Worse still, these instances do not synchronize their state with their database records when the transaction is committed or rolled back.
Django supports transactions but, unlike Storm ORM and SQLAlchemy, it does not support transactional consistency of data. You have to take care of the state of in-memory model instances yourself when the transaction is committed or rolled back.
For example, if you use the “Repeatable read” transaction isolation level, the state of your in-memory model instances may be outdated once the transaction is committed. Likewise, when you roll back a transaction, you must restore their initial state.
As previously mentioned, this is not critical for HTTP request processing, since Django framework usually serves it with one transaction. But when you develop command-line scripts or scheduled tasks, you need to take this into account.
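For scripts and scheduled tasks, restoring in-memory state on rollback can be sketched as a small context manager. This is a plain-Python illustration under stated assumptions: restore_on_rollback and Account are hypothetical names, the objects are ordinary Python objects, and the real transaction (for example django.db.transaction.atomic) is assumed to be managed around this block.

```python
import copy
from contextlib import contextmanager


@contextmanager
def restore_on_rollback(*instances):
    """Snapshot the attributes of each instance; put them back if the block raises."""
    snapshots = [copy.deepcopy(vars(obj)) for obj in instances]
    try:
        yield
    except BaseException:
        for obj, state in zip(instances, snapshots):
            vars(obj).clear()
            vars(obj).update(state)
        raise


class Account:
    def __init__(self, balance):
        self.balance = balance
```

After a successful commit the opposite problem remains: the instance may be older than the database row. In Django, Model.refresh_from_db() reloads an instance from the database.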
You also have to prevent deadlocks yourself, since Django ORM does not implement the Unit of Work pattern and does not use topological sorting to order its writes.
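One way to do this by hand is to make every transaction touch rows in the same global order. The sketch below derives that order from a topological sort of table dependencies; the schema in DEPENDENCIES and the function names are hypothetical, and graphlib requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Hypothetical schema: each table maps to the tables its foreign keys point to.
DEPENDENCIES = {
    "order_item": {"order", "product"},
    "order": {"customer"},
    "product": set(),
    "customer": set(),
}


def table_order(dependencies):
    """Topologically sort tables, breaking ties alphabetically.

    The explicit tie-break matters: every process must compute
    exactly the same order, independent of set iteration order.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    ordered = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        ordered.extend(ready)
        sorter.done(*ready)
    return ordered


def lock_order(pending, dependencies=DEPENDENCIES):
    """Sort (table, pk) pairs by table rank first, then by primary key."""
    rank = {table: i for i, table in enumerate(table_order(dependencies))}
    return sorted(pending, key=lambda item: (rank[item[0]], item[1]))
```

If all writers lock or update their rows in the order returned by lock_order(), two transactions cannot wait on each other in a cycle over those rows.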
It is also worth mentioning a frequent mistake of novice developers: processing a large collection of objects without select_for_update(). Processing the collection takes a considerable amount of time, enough for the database record behind an already loaded object that is still waiting its turn to be changed by someone else. Unskilful use of transactions can lead to the loss of concurrent changes, and even skilful use can lead to an unresolvable conflict.
In addition, you should carefully read all the caveats of the iterator() method: using it does not guarantee that there will be no memory leak unless you use SSCursor with MySQL.
Service Layer and django.db.models.Manager¶
A common mistake is using the django.db.models.Manager class as a Service Layer. This question was considered in detail in the article “Design of Service Layer and Application Logic”.
Composite foreign keys and Django Models¶
As ticket #373 and the discussion of “Multi-Column Primary Key support” show, Django Model does not yet support composite relations.
This means that you have to create surrogate keys, which can cause certain difficulties in the integration of an existing database, or you have to use one of these libraries:
Frankly, I have not used these libraries. In that case, I just do not use Django ORM. But you have a choice.
Building of complicated SQL-queries for Django ORM¶
The capabilities of the Django ORM interface are not enough to build complicated SQL queries. Fortunately, Django ORM works well with raw SQL, so the responsibility for building the SQL query does not have to rest with the ORM. You can either use the third-party tools discussed later or write raw SQL. In any case, the implementation details should be encapsulated within a query factory class.
In my practice there was a case where the Django admin panel needed a user search by pattern matching (LIKE '%keyword%') over the user table joined to the profile table with a LEFT JOIN.
Moreover, the search criteria had to be combined with OR, which led to a full scan of the joined table for each row of the user table. The MySQL database held several million records, and the query was very slow. That version of MySQL did not yet support the ngram FULLTEXT index. To optimize the query, we moved the selection criterion into a subquery, so that the already filtered result from the profile table was joined instead of the entire profile table. A similar example can be found in the book «High Performance MySQL» [4]. To solve the problem, my colleague had to write an adapter for the Storm ORM SQL builder, similar to sqlalchemy-django-query. As a result, it became possible to express an SQL query of any complexity through the interface of django.db.models.query.QuerySet.
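The shape of that optimization can be sketched with a query factory class and raw SQL. This is not the original code: the table and column names are hypothetical, a one-to-one relation between user and profile is assumed, and sqlite3 is used here instead of MySQL only so that the example is self-contained.

```python
import sqlite3


class UserSearchQuery:
    """Query factory: callers never see how the SQL is built."""

    # The profile table is filtered inside the subquery, so only the
    # matching profile rows take part in the join.
    SQL = """
        SELECT u.id, u.username
        FROM auth_user AS u
        LEFT JOIN (
            SELECT user_id FROM profile WHERE city LIKE :pattern
        ) AS p ON p.user_id = u.id
        WHERE u.username LIKE :pattern OR p.user_id IS NOT NULL
        ORDER BY u.id
    """

    def __init__(self, keyword):
        self.params = {"pattern": "%" + keyword + "%"}

    def execute(self, connection):
        return connection.execute(self.SQL, self.params).fetchall()


def make_demo_db():
    """Create a tiny in-memory database for trying the query out."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE auth_user (id INTEGER PRIMARY KEY, username TEXT);
        CREATE TABLE profile (user_id INTEGER UNIQUE, city TEXT);
        INSERT INTO auth_user VALUES (1, 'berlin_fan'), (2, 'alice'), (3, 'bob');
        INSERT INTO profile VALUES (2, 'East Berlin'), (3, 'Paris');
    """)
    return conn
```

In a Django project, a statement like this could be passed to Manager.raw() or executed through a database cursor, with the factory class remaining the only place that knows the SQL.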
Implementation of complicated Models for Django Framework¶
Very often you have to deal with objects that contain aggregated data, annotations, or combine the data of several tables.
SQLAlchemy certainly provides more flexible features. But even these features are not always enough.
The annotations of Storm ORM / SQLAlchemy are implemented more successfully than those of Django ORM (which are better avoided altogether in favor of a bare implementation of the Data Mapper pattern). The model schema is constantly evolving, and new fields are constantly added to it. It often happens that the name of a new field is already used by an annotation, which causes a namespace conflict. One solution is to separate the namespaces, using either a separate model or a wrapper that holds the annotations alongside the model instance.
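A wrapper of that kind can be very small. In the sketch below, Annotated and Author are hypothetical names, and Author is a plain class standing in for a model whose schema later gained a rating field that collides with an existing annotation name.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Annotated:
    """Pair a model instance with query-specific values in a separate namespace."""

    obj: Any
    annotations: Dict[str, Any] = field(default_factory=dict)


class Author:
    """Stand-in for a model instance."""

    def __init__(self, name, rating):
        self.name = name
        self.rating = rating  # a real column, added to the schema later


# The query computed its own "rating" (say, an average over reviews).
# Kept in a separate namespace, it cannot overwrite the model field.
row = Annotated(Author("Ann", rating=5), {"rating": 4.2})
```

Here row.obj.rating is the stored field and row.annotations["rating"] is the computed value, so adding a field to the model never changes the meaning of an existing query result.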
Identity Map is another reason not to use Django ORM annotations (and to be careful with prefetch_related()). After all, if there is only one instance of an object in the thread, its state cannot differ from query to query.
That is why it is important to hide the implementation details of data access behind the Repository pattern or a Service Layer. In such cases I simply write a bare implementation of the DataMapper pattern with a plain Domain Model.
Practice shows that such cases usually make up no more than 10% (rarely 30%) of a project, which is not enough to justify abandoning Django ORM, because the ease of hiring developers still outweighs it.
Third-party tools¶
SQLAlchemy¶
Django has several applications for SQLAlchemy integration:
SQLBuilder¶
To build complicated queries for Django Model, I usually use the library sqlbuilder.
Good practice is to create a separate factory class for each query to hide implementation details from the application. Behind the interface of this class, you can easily replace one implementation with another.
Storm ORM¶
The issue of integration of Storm ORM has already been considered, so I’ll just give the links:
Testing¶
If you use several data access technologies, then it’s worth mentioning the fake data generator mixer, which supports several ORMs. Other generators can be found, as usual, on djangopackages.org.
Cache invalidation¶
Django Model implements the ActiveRecord pattern, which forces us to call the Model.save() method explicitly. The problem is that developers often use the post_save and pre_delete signals to invalidate the cache. This is not quite right: since Django ORM does not use the Unit of Work pattern, the time between saving and committing the transaction is long enough for a parallel thread to recreate the cache with outdated data.
On the Internet, you can find libraries that allow you to send a signal when the transaction is committed (use search query “django commit signal” on pypi.python.org). Django 1.9 and above allows you to use transaction.on_commit(), which partially solves the problem if you do not use replication.
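The race can be reproduced without Django. The sketch below uses two sqlite3 connections to play the roles of a writer and a concurrent reader, and a plain dict as the cache; all names are hypothetical. Invalidating before the commit mimics a post_save handler, while invalidating after the commit mimics what a callback registered with transaction.on_commit() would do.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.sqlite3")
writer = sqlite3.connect(path)
writer.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, price INTEGER)")
writer.execute("INSERT INTO product VALUES (1, 100)")
writer.commit()
reader = sqlite3.connect(path)  # a second, independent connection

cache = {}


def cached_price(conn, product_id):
    """Read-through cache: fill from the database on a miss."""
    if product_id not in cache:
        row = conn.execute(
            "SELECT price FROM product WHERE id = ?", (product_id,)
        ).fetchone()
        cache[product_id] = row[0]
    return cache[product_id]


def update_price(conn, product_id, price, invalidate_before_commit, between=None):
    conn.execute("UPDATE product SET price = ? WHERE id = ?", (price, product_id))
    if invalidate_before_commit:
        cache.pop(product_id, None)  # like a post_save signal handler
    if between:
        between()  # a concurrent request arrives before the commit
    conn.commit()
    if not invalidate_before_commit:
        cache.pop(product_id, None)  # like an on-commit callback


def concurrent_read():
    cached_price(reader, 1)


cached_price(reader, 1)  # warm the cache
update_price(writer, 1, 120, True, between=concurrent_read)
stale = cached_price(reader, 1)  # refilled with the old price before the commit

update_price(writer, 1, 150, False, between=concurrent_read)
fresh = cached_price(reader, 1)  # re-read after the commit
```

In the first update the reader repopulates the cache with the old price while the writer is still uncommitted, so the cache stays stale afterwards. In the second update the invalidation happens after the commit, so the next read picks up the new price.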
I use the library cache-dependencies, as I wrote in the article “About problems of cache invalidation. Cache tagging.”.
Model Versioning and Audit Log¶
Django has a lot of libraries for model versioning; see, for example, “Model Auditing and History” and “Versioning”. However, among Django libraries I could not find a solution as mature and complete as sqlalchemy-continuum.
As a result, I had to write a versioning library myself, implementing Slowly Changing Dimensions (SCD) Type 4 (it is not public, since all rights belong to the company). It lets you restore the state of an aggregate (i.e. a structure of interrelated objects) for a specified version, even if some objects of the aggregate have been removed. Since the boundaries of the aggregate are also the boundaries of the transaction, versioned relations were easy to implement with the django-composite-foreignkey library, which lets you organize composite relations between model instances (including the version stamp of the object).
The following libraries and articles helped me with information:
- Automating an audit trail
- django-audit-log
- cleanerversion
- sqlalchemy-continuum
- Audit Log
- Slowly changing dimension
- Change data capture
- Anchor modeling
- Shadow table
- Audit trigger
- Audit trigger 91plus
- How to Implement Audit Functionality In PostgreSQL
- PostgreSQL Audit Extension
As an alternative to versioning an aggregate, you can use JSON Patch at the level of the Django REST Framework serializer. However, serializers can have different versions, so you will need a separate serializer dedicated to versioning. This raises a further question: how to create a diff for a list whose objects have changed order, so that the contents of the moved objects are not included in the diff.
Django REST framework¶
Having considered the shortcomings of Django ORM, it may be surprising that the Django REST framework turns those disadvantages into advantages: the interface for building Django ORM queries is a great fit for REST.
If you were lucky enough to use Dstore on the client side, then you can use django-rql-filter or pyrql on the server side.
Frankly, getting to know the Django REST framework takes a lot of time in the debugger, and this, of course, does not speak well of its design decisions. A good program should be read, not deciphered, and certainly not with the help of a debugger. Readability reflects how well a design observes the main imperative of software development:
Software’s Primary Technical Imperative is managing complexity. This is greatly aided by a design focus on simplicity. Simplicity is achieved in two general ways: minimizing the amount of essential complexity that anyone’s brain has to deal with at any one time, and keeping accidental complexity from proliferating needlessly. («Code Complete» [2])
However, the overall balance of advantages and disadvantages makes the Django REST framework very attractive for development, especially if you need to involve new (or temporary) developers or allocate some of the work for outsourcing.
Just take into account that there is a certain entry barrier that costs effort to overcome, and you need to understand what benefit you get in return, because the benefit is not always worth that effort.
I will not dwell on criticism of its design decisions. The Django REST framework does not restrict me structurally in any way, and that is what matters most.
SQLAlchemy¶
The huge advantage of Django REST framework is that it is ORM agnostic. It has perfect interfacing with Django Models, but it can easily work with a bare implementation of the Data Mapper pattern which returns a namedtuple collection for some Data Transfer Object. It also has good integration with SQLAlchemy in the form of a third-party application djangorest-alchemy (docs). See also discussion of the integration.
MongoDB and MongoEngine¶
The Django REST framework also has an integration application for MongoDB and MongoEngine: django-rest-framework-mongoengine. An example of use can be found in django-rest-framework-mongoengine-example, described in the article “Django + MongoDB = Django REST Framework Mongoengine”.
GIS¶
Third party application django-rest-framework-gis supports GeoJSON.
OpenAPI and Swagger¶
Django REST framework can generate an OpenAPI schema and integrates with Swagger through the django-rest-swagger library.
This opens up broad possibilities for generating a Service Stub for clients and lets you use any of the existing stub generators for Swagger. It allows you to test the client side without any server-side implementation, divide responsibility between client-side and server-side developers, quickly find the cause of problems, freeze the communication protocol and, most importantly, develop the client side in parallel even if the server-side implementation is not finished yet.
The OpenAPI schema can also be used to generate tests automatically, for example with pyresttest.
A friend of mine works on the python-easytest library, which eliminates the need to write integration tests by testing the application against its OpenAPI schema.
JOIN-s problem¶
The Django REST framework is often used together with django-filter. And here is a problem, which is reflected in the documentation as:
“To handle both of these situations, Django has a consistent way of processing filter() calls. Everything inside a single filter() call is applied simultaneously to filter out items matching all those requirements. Successive filter() calls further restrict the set of objects, but for multi-valued relations, they apply to any object linked to the primary model, not necessarily those objects that were selected by an earlier filter() call.”
See more info on: https://docs.djangoproject.com/en/1.8/topics/db/queries/#lookups-that-span-relationships
To solve this problem, you should use a wrapper with lazy evaluation in the FilterSet() class instead of the real django.db.models.query.QuerySet, which will fully match its interface, but will call the original filter() method once, passing all accumulated selection criteria to it.
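A minimal sketch of such a wrapper is shown below. LazyFilterQuerySet and its resolve() method are hypothetical names; only filter() with keyword arguments is intercepted, and RecordingQuerySet is a stand-in for django.db.models.query.QuerySet so that the example runs on its own. How the wrapper is plugged into a FilterSet is left out.

```python
class LazyFilterQuerySet:
    """Collect filter() criteria and apply them in a single call at the end.

    A complete wrapper would have to mirror the rest of the QuerySet
    interface (and positional Q objects) as well.
    """

    def __init__(self, queryset, criteria=None):
        self._queryset = queryset
        self._criteria = dict(criteria or {})

    def filter(self, **kwargs):
        merged = dict(self._criteria)
        merged.update(kwargs)
        return LazyFilterQuerySet(self._queryset, merged)

    def resolve(self):
        """Return the real queryset with all criteria applied at once."""
        if not self._criteria:
            return self._queryset
        return self._queryset.filter(**self._criteria)


class RecordingQuerySet:
    """Stand-in for a real QuerySet that logs every filter() call."""

    def __init__(self, calls=None):
        self.calls = [] if calls is None else calls

    def filter(self, **kwargs):
        self.calls.append(kwargs)
        return RecordingQuerySet(self.calls)
```

Chaining two filter() calls on the wrapper produces exactly one filter() call on the underlying queryset, so both conditions on a multi-valued relation apply to the same related object.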
Generating *.csv, *.xlsx¶
Django and Django REST framework have a lot of extensions. This is a major advantage, and one for which it makes sense to tolerate their shortcomings. You can even generate *.csv and *.xlsx files.
However, there is a problem with translating nested data structures into a flat list and, vice versa, parsing a flat list back into a nested data structure. The jsonmapping library solves this problem partially, but that solution did not suit me, and I wrote a complete declarative data mapper.
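The core of the problem can be illustrated with a pair of small functions. This is only a naive sketch, not the declarative mapper mentioned above and not the jsonmapping API: it handles nested dicts only, assumes keys never contain the separator, and drops empty dicts.

```python
def flatten(data, prefix="", sep="."):
    """Turn nested dicts into a single-level dict with dotted keys."""
    flat = {}
    for key, value in data.items():
        path = prefix + sep + key if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path, sep))
        else:
            flat[path] = value
    return flat


def unflatten(flat, sep="."):
    """Inverse of flatten(): rebuild nested dicts from dotted keys."""
    nested = {}
    for path, value in flat.items():
        *parents, leaf = path.split(sep)
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested
```

The dotted keys of the flat form can serve directly as column headers of a *.csv or *.xlsx file. Lists of nested objects are the hard part and are exactly where a declarative mapping becomes necessary.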
GraphQL¶
- graphene-django - a Django integration for graphene.
Advantages and disadvantages of Django Framework¶
Advantages¶
Django framework is written in the great programming language Python.
Django has a successful View, which is a kind of Page Controller pattern, and fairly successful forms and templates (if you use django.template.loaders.cached.Loader).
Despite all the shortcomings of Django Models, its query building interface is well suited for the REST API.
I can also note that, despite the limited capabilities of its interface for creating SQL queries, Django ORM is ideally designed for imitating aggregates (in DDD and NoSQL this means a composite of nested documents), which greatly facilitates the use of NoSQL databases such as MongoDB.
There are several ORMs to deal with MongoDB, which largely reproduce the Django ORM interface, for example MongoEngine. There are also backends for native Django ORM with MongoDB support, for example, djongo (source code). For other solutions, see the MongoDB documentation.
For business, this means that it can relatively painlessly substitute RDBMS with MongoDB, affecting only a small part of the client code (if certain conditions were met beforehand), and any Django developer can instantly start working with MongoDB at the level of the abstract interface.
However, NoSQL databases are usually used in conjunction with graph databases or an external indexing engine, and such a bundle of different technologies should be hidden behind a Repository (or Service Layer), which is usually absent in Django applications.
Django has a huge community with a huge number of ready-made applications. It is very easy to find developers (and outsourcing companies) for Django and Django REST framework.
Django promotes a style of development that does not demand a high skill level from developers.
Django can save a lot of time and financial resources if used properly.
Disadvantages¶
The complexity of Django grows with each release, often faster than the capabilities it adds, and so its attractiveness is steadily decreasing.
If you need to adapt Django ORM to your needs, doing so with the latest release is probably harder than adapting SQLAlchemy, and Django ORM needs adapting more often than SQLAlchemy does. Simplicity is no longer the main privilege of Django, as it was in earlier versions. In almost all the projects I have dealt with, Django ORM was supplemented (or replaced) with third-party tools or a bare implementation of the Data Mapper pattern.
Among my friends, Django framework is used mainly out of habit and inertia.
Although Django framework has a huge number of ready-made applications, their quality often leaves much to be desired, and some contain bugs. Worse, these can be very insidious bugs that appear only in a multi-threaded environment under high load and are very difficult to debug.
The skill level of developers specializing in Django is also often low. The highly skilled developers among my friends try to avoid working with Django.
Conclusion¶
Whether to use Django framework depends on the goals you set for yourself and on how qualified your teams are.
If your team is highly qualified in architecture and design, uses collaborative development techniques to spread experience, and has sufficient resources and finances to make the project better without Django, then it makes sense to use another technology stack.
Otherwise the Django framework can serve you well. A lot of overconfident teams have failed to improve their projects by dropping Django Framework.
Nobody obliges you to use Django anytime and anywhere. Django REST framework allows you to abstract from Django ORM and even from its serializer.
If you are engaged in outsourcing, your average project lasts no more than a year, the budget is low and the deadlines are short, then Django has a lot to offer you.
If you are working on a large ongoing project, the benefits are not so obvious. It all comes down to a balance that you have to determine for yourself.
But if you use Bounded Contexts or Microservice Architecture, then each team can decide on their own technology stack. You can use Django only for part of the project, or use only some of the Django Framework components.
Or you can choose not to use it at all. Among the alternatives, I advise you to look at wheezy.web, a web framework that impresses me.
This article is also available in Russian: “О моем опыте использования Django Framework”.
Footnotes
[1] «Clean Code: A Handbook of Agile Software Craftsmanship» by Robert C. Martin
[2] «Code Complete» by Steve McConnell
[3] «Refactoring: Improving the Design of Existing Code» by Martin Fowler, Kent Beck, John Brant, William Opdyke, Don Roberts
[4] «High Performance MySQL» by Baron Schwartz, Peter Zaitsev, and Vadim Tkachenko
Updated on May 16, 2018