Dealing with composite keys in Django
In this note I explain how I used triggers in MS SQL Server to solve the problem of using a legacy database with composite keys from a system developed in Django, which only supports simple primary keys. The solution actually applies to any development environment, since triggers live entirely in the database and exist in other engines such as PostgreSQL, MySQL, DB2 or Oracle.
To summarize my problem: in my database I have a sales order model with a simple primary key (id) that is related to an activity with a composite key (id_company + id_activity). The system is multi-company, i.e. a sales order can belong to any of several companies. The activity, whose sole purpose is to group sales for statistics, is linked to a company, and its key is composed of the company code plus the activity code. So company A has activities 1 and 2, while company B has its own activities 1, 2 and 3 (which have nothing to do with those of company A).
Disclaimer: the names of tables and columns have been changed so as not to violate confidentiality rules.
Context
The project involves replacing a system developed by a third party with one built in Django. We do not have access to the source code, but we do have access to the database.
In Django, the models representing the existing tables are unmanaged (managed = False). In addition, by project definition, we wanted to keep the legacy system running and replace it gradually.
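For readers unfamiliar with this setup, a minimal sketch of what an unmanaged model looks like (the table and column names here are illustrative, not the real ones from the project):

from django.db import models

class Company(models.Model):
    # Map pythonic field names onto the legacy column names
    code = models.CharField(primary_key=True, max_length=10, db_column='EmpresaCodigo')
    name = models.CharField(max_length=100, db_column='EmpresaNombre')

    class Meta:
        managed = False        # Django will never create, alter or drop this table
        db_table = 'Empresa'   # point the model at the pre-existing table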
MS SQL Server is not officially supported by Django, but the mssql-django library lets us establish this connection. With the very first table I had to mirror in models.py, I found that it was related to several tables whose primary keys are composite, something that Django does not support:
Do Django models support multiple-column primary keys?
No. Only single-column primary keys are supported.
Is it a good idea to use composite keys?
For me, definitely not, although not everyone agrees. If you are not interested in my opinion on composite keys you can skip this section.
I believe that composite keys are a legacy from the early days of computing, when the difference in bytes between a smallint and an int could mean a dramatic increase in memory and storage. Let's look at a simple case: an invoice with its invoice lines, one per item sold. The invoice has its own identifier, and the lines hardly ever exceed 256 (1 byte); moreover, the line number is necessary data in itself, especially if we think of an invoice where the same product appears on different lines with different conditions. So my table of invoice lines would have a header ID and a line ID; why add yet another line ID column that is unique? Just to consume more memory, we would have been told 30 years ago. A similar case to the professor who used to cross himself when he saw us using a long in Pascal (are you really that old?) when an int would do, or a double instead of a float. Twenty years ago, when I was a student, we were taught that even if you had a natural (external) key, it was always a good idea to add your own autonumeric ID. But I don't remember reading or being taught anything against composite primary keys. They were there, they could be used.
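To put the invoice example in Django terms, here is a sketch (with illustrative names of my own) of how the lines would be modeled today: a surrogate id of their own plus a uniqueness constraint on the natural pair (invoice, line number):

from django.db import models

class Invoice(models.Model):
    # Django adds an auto-incrementing id primary key implicitly
    number = models.CharField(max_length=20, unique=True)

class InvoiceLine(models.Model):
    # Surrogate primary key (the implicit id) instead of a composite (invoice, line_number) key
    invoice = models.ForeignKey(Invoice, on_delete=models.CASCADE, related_name='lines')
    line_number = models.PositiveSmallIntegerField()
    product = models.CharField(max_length=50)
    quantity = models.DecimalField(max_digits=10, decimal_places=2)

    class Meta:
        constraints = [
            models.UniqueConstraint(fields=['invoice', 'line_number'], name='uniq_invoice_line'),
        ]

The extra id costs a few bytes per row, but every foreign key pointing to a line stays a single column.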
When I came across modern frameworks, one of the things I loved was that certain decisions no longer required any thought: they were simply made for you. Tables always carry their own primary key and it is called id. Period. Let's spend our intelligence on other things.
Trying to make composite and primary keys coexist
If you are only interested in the actual solution you can skip this section.
A "simple" alternative would have been to add an autonumeric column and replace foreign keys pointing to composite key tables with their corresponding simple key. It would involve creating columns, doing a few massive UPDATEs and finally deleting the unnecessary columns. But this was not feasible since we intended to keep the original system working.
I thought: if my new id column is autonumeric, then even if the legacy system knows nothing about it, the database will fill it in on every insert. I did the test and indeed it worked: the table had its id column in addition to the composite key, and although the legacy system had no idea of its existence, when the legacy system did an insert the new id was there. Updates were no problem either: if someone changed the value of a column from the legacy system, the id remained untouched (in fact, being autonumeric it is immutable, which is, on the other hand, recommended for a primary key).
However, what did not have an easy solution was editing a record in which the link to a table with a composite key was altered. To illustrate with my case: a sale has a company and an activity; changing the company was not possible, but changing the activity of course was, since it is purely statistical. Maybe running a process at the end of the day to reconcile both keys would have solved it, but that implied my new application would have incorrect data in the meantime. Something similar happened when inserting a new record into the sales table: my new column would be NULL until some process reconciled them. I didn't like it.
Teaching Django to use composite keys
If you are only interested in the actual solution you can skip this section. If you are hesitating about the approach to use, the following may help you understand the implications.
My first attempt was to look for additional Django packages that would fill that gap. Since Django is open source, many people must already have had the same problem and published a solution. And I found them, of course. In fact, I developed for several months under that paradigm, during which I ran into several problems that made me think of a different approach.
I had to install two packages: one so that models could accept a foreign key pointing to a composite key, and another so that a model itself could have a composite key.
Composite foreign keys: django-composite-foreignkey
My sales table did not have a composite key, but it did have a composite foreign key. So I installed the package without much hassle and got to work. Ideally, the modeling would have looked like this:
activity = CompositeForeignKey(Activity, on_delete=CASCADE, related_name='sales', to_fields={
    "company": "company_id",
    "activity": "activity_id"
})
However, what I finally got was this:
activity = CompositeForeignKey(Activity, on_delete=models.DO_NOTHING, to_fields={
    "EmpresaCodigo": "VentaEmpresaCod",
    "ActividadesEmpresaId": "VentaActiEmpresaId"
})
Not really neat. Basically because I could not keep the names in English and with underscores, as good practice would dictate; and why not, if Django lets me give those columns pythonic names? Moreover, one of the things I liked most when I started the project was being able to replace the mess of names in the database with names chosen by me.
The problem was that when I tried to use my chosen column names in the foreign keys, something in the library was not prepared for it and I was forced to keep the original names. The worst part is that it did not only affect the composite key: I also had to rename the part of the relationship that was a simple foreign key on its own. So instead of using company as I wanted, I had to use VentaEmpresaCod (SalesCompanyCode; it would be ugly in English too). And so on throughout my queries, the original names standing out among attributes in English with underscores.
But well, now when I wrote sale.activity in my code I reached the related model, and when I needed the company I had to write sale.VentaEmpresaCod. I convinced myself to live with those horrible names. I even thought about touching the code of the external library; it seemed like a detail that would not have required much effort to fix.
Compound primary keys: django-viewflow
Finding this one was not so easy. In fact, it involves installing a not-yet-officially-released version of the package. And the initial problem was the same: goodbye to pythonic names. To continue with the previous example, my "activity" model now had an id column defined as follows:
id = CompositeKey(columns=['EmpresaCodigo', 'ActividadesEmpresaId'])
Again, the link from activity to company went through EmpresaCodigo (CompanyCode) rather than company as I would have liked. Worse things were lost in the war, I told myself, and moved on.
Other problems
While implementing, I ran into a strange error when trying to use select_related to speed up a query. I was about to start looking for other libraries when I realized that if I replaced it with prefetch_related my query worked again. The two are not equivalent, but for all practical purposes it worked.
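Roughly what that swap looked like (the Sale model and queryset are illustrative, not the exact query from the project). select_related follows the relationship with a SQL JOIN in a single query, while prefetch_related issues a separate query and matches the objects up in Python, which is presumably why it sidestepped whatever JOIN the library could not build:

# Raised a strange error with the composite-foreign-key library:
sales = Sale.objects.select_related('activity').all()

# Worked: related activities are fetched in a second query and stitched together in Python
sales = Sale.objects.prefetch_related('activity').all()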
The kicker came when I decided to incorporate testing into the project. Both libraries share the same philosophy: the legacy database must be usable as-is, so it is not created with Django migrations. However, for testing, Django has to create the database itself! Since the libraries are not widely used, and therefore their rough edges are not well ironed out, they struggled to generate the composite keys correctly in the test databases. To complete the picture, I was connecting to MS SQL Server, a database that is not officially supported, which could make the process even worse. NOTE: connecting to the database was not trivial; I had to make a strange change to the TLS configuration of the Debian servers where the application was running.
Triggers
I went back to my initial idea of adding an autonumeric id column to the tables with composite keys (and adding this new column as a redundant foreign key in each table that relates to them). To return to my example: the activity would have a new id column apart from its composite key; a sale would have the company code and the activity code (both linked to my activity table) and also an activity_id linked to that new autonumeric column.
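On the Django side this brings us back to plain models with simple keys. A sketch using the column names from this article (the English model and field names, and the field types, are my own choices, not something dictated by the database):

from django.db import models

class Activity(models.Model):
    # The new autonumeric column added alongside the legacy composite key
    id = models.AutoField(primary_key=True, db_column='id')
    company_code = models.CharField(max_length=10, db_column='EmpresaCodigo')
    activity_code = models.IntegerField(db_column='ActividadesEmpresaId')

    class Meta:
        managed = False
        db_table = 'Actividad'

class Sale(models.Model):
    # Simple foreign key through the new column; Django never needs the composite key
    activity = models.ForeignKey(Activity, on_delete=models.DO_NOTHING,
                                 db_column='activity_id', related_name='sales')
    # Legacy composite columns kept so the old system (and the trigger) keep working
    company_code = models.CharField(max_length=10, db_column='VentaEmpresaCod')
    legacy_activity_code = models.IntegerField(db_column='VentaActiEmpresaId')

    class Meta:
        managed = False
        db_table = 'Ventas'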
The problem was editing from the legacy system: if the composite foreign key of a record was edited there, the relationship became inconsistent, because VentaActiEmpresaId (the activity part of the composite code) was updated but activity_id was not. The same happened when a record was inserted into the sales table: my activity_id would start out as NULL.
Googling, I learned about triggers: a table-level feature that runs an action each time a record is inserted, updated or deleted (which operations fire the trigger is defined in the trigger itself). They are often used for auditing: every time someone performs an operation we can record in a log table, for example, who deleted what and when. In my case, a trigger could make sure that every time a record was inserted or edited, it looked up the value in the related table and, if the redundant foreign keys did not match, brought them back into line.
It worked like a charm, but what about performance? Obviously there would be some impact, especially on a massive UPDATE, since the trigger has to process every affected record to reconcile the fields. I did a test on a copy of the database and it seemed pretty fast.
But of course it could not go that well. The next day, after several hours of running smoothly, the application threw a strange database-related error. The tempdb system database (which I did not know about) had grown out of control and was affecting the whole engine. A web search confirmed my fears: tempdb is used by the engine itself when executing a trigger.
What happened is that, at the time the problem started, an automatic process kicked in that updated a value in every sales record, and evidently the way it did so caused an excess of work in my new trigger.
The solution was simple: in each trigger I had to check whether the column I was interested in (VentaActiEmpresaId in this case) was being affected, and only then continue executing.
A disadvantage of this approach is that in the code I developed I always had to handle both keys, the autonumeric one I generated and the pre-existing composite one. Otherwise the legacy system would not have the information to join the data properly, and the trigger would also try to reconcile them, undoing the changes made by my application.
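One way to keep that discipline in a single place is a small helper (or a save() override) that always sets both keys together; a minimal sketch, assuming the Sale and Activity models from the earlier sketch:

def set_sale_activity(sale, activity):
    # Update the new simple key and the legacy composite columns in one step,
    # so neither the legacy system nor the trigger finds them out of sync.
    sale.activity = activity                            # new activity_id column
    sale.company_code = activity.company_code           # legacy VentaEmpresaCod
    sale.legacy_activity_code = activity.activity_code  # legacy VentaActiEmpresaId
    sale.save()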
The complete solution
The following are MS SQL Server statements; they are likely to change a bit in other engines.
First step: add a new id column to the table with the composite key and put a UNIQUE constraint on it.
ALTER TABLE dbo.Actividad ADD id INT IDENTITY(1,1);
ALTER TABLE dbo.Actividad ADD UNIQUE (id);
Second step: create a new column in the table linked to the composite key.
ALTER TABLE dbo.Ventas ADD activity_id INT;
Third step: associate this new column with a foreign key constraint.
ALTER TABLE dbo.Ventas ADD CONSTRAINT FK_Activity FOREIGN KEY (activity_id) REFERENCES dbo.Actividad (id);
Fourth step: bulk-update all the records in the table linked to the composite key. With an inner join I can get the unique id representing each composite key and update the corresponding column.
-- Ventas = Sales in Spanish
UPDATE v
SET activity_id = a.id
FROM dbo.Ventas v INNER JOIN dbo.Actividad a
ON v.VentaEmpresaCod = a.EmpresaCodigo AND v.VentaActiEmpresaId = a.ActividadesEmpresaId
WHERE v.activity_id IS NULL OR v.activity_id <> a.id;
Fifth and last step: add the trigger to the table linked to the composite key. It fires on INSERT and UPDATE, but only does its work if the composite-key column is affected.
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TRIGGER [dbo].[after_sale_upsert]
ON [dbo].[Ventas]
AFTER INSERT,UPDATE
AS
BEGIN
    IF UPDATE(VentaActiEmpresaId)
    BEGIN
        -- Ventas = Sales in Spanish
        UPDATE v
        SET activity_id = a.id
        FROM Ventas v INNER JOIN Actividad a
        ON v.VentaEmpresaCod = a.EmpresaCodigo AND v.VentaActiEmpresaId = a.ActividadesEmpresaId
        WHERE v.activity_id IS NULL OR v.activity_id <> a.id
    END
END
Conclusion
One way to avoid the trouble that arises when we try to model a pre-existing table with a composite key in Django is to add an autonumeric column that works as the primary key for Django, though not for the database.
If the legacy system only knows about the composite key, this becomes a problem whenever it inserts or updates records in tables that are linked to others through composite keys.
By using triggers we can make the database take care of keeping the composite key and the autonumeric key consistent when the insert/update comes from the legacy system.
When developing the new system, we must keep in mind that both the composite and the simple keys always have to be updated together.