For those of you wondering what the background story on the previous post is about: we had a meeting today to discuss 2 alternatives on how we should implement the database audit trail for a new project we're doing. These are the functional requirements:
- We have to be able to easily retrieve what user X changed in the database on day Y or during a certain time frame.
- We need to know what record X looked like at point Y in time, and be able to show a historic list of changes for a specific record.
- We need to configure the duration for which auditing data must be kept in the database on a per-table (or per-schema) level
We've done this in the past, but never in a manner that is just perfect (or close enough to being perfect anyway) for both requirements. So we discussed two alternative approaches to tackling this problem.
The first approach is to have 3 tables. The first two tables would contain metadata about the database structure (tables and columns basically). The third table would have a reference to each column with a previous value, the new value, who changed it and a timestamp of when the change occurred. We would use database triggers that would store the appropriate auditing data in the third table for each action (insert/update/delete). The benefits of this approach would be that it would be pretty easy to set up and also easy to query which changes were made by a certain user. It would be a bit more complex to reconstruct the state of a record at a certain point in time, but it would still be doable. The biggest downside (in my opinion), is that for each insert/update you would have X amount of inserts into the auditing table, and you'd also have to either perform some queries to get the identifier values of the relevant tables/columns or hardcode those identifier values in the generated triggers to avoid the cost of the extra select statements. So if you have an entity with 12 columns, an insert into the entity's table would mean 12 inserts into the auditing table. Updating 4 columns of that entity would lead to 4 insert statements in the auditing table, etc...
The other alternative that we considered was to use a specific auditing table for each real table, and every time a user does an insert/update/delete, insert one record into the table's specific auditing table with the current state of the record along with a timestamp, the userid, and an indication of which action was performed (insert/update/delete). This insert would also be done with database triggers. The benefits of this approach would be that it would be very easy to show what the state of a certain entity was at a certain point in time, and that you'd only pay the cost of the single extra insert. There are however two big downsides to this approach. The first is that it'll lead to an explosion of tables. We'd probably store the auditing tables in a separate database but you still basically have twice the amount of tables (and scripts!) that you'd have to worry about compared to what you'd usually have. We can very realistically expect a couple hundred of real tables in a few years, so you'd have to multiply that by two if you'd go for this auditing approach. Another big downside is that it would be very expensive to perform a query which would tell you what user X changed at point Y in time (or during a time frame).
Neither solution appears to be ideal. Also, this database will definitely grow a lot (more tables and obviously data) as we add more functionality and it's pretty safe to consider this entire project as something that will be expanding in functionality for a couple of years to come.
So my question to you is: Which approach do you think is best, and why? Or better yet, is there a better approach that we simply haven't thought of yet? I would definitely appreciate some feedback on this (either on the validity of both approaches or alternative approaches) so please do feel free to leave your thoughts in the comments ![]()
Pingback: Implementing Audit Trail for your application « Niraj Bhatt – Architect's Blog