Partition your Rolling Fact by time or not

We are building a data warehouse to store event logs of actions performed by users. The requirements are to keep six months of historical data and to let users run audit reports. The question is whether or not to partition the fact table by the record's creation time.

The advantage of partitioning by creation time is that purging old data takes seconds, because we simply drop the oldest partition. All new data also lands in the current month's partition, so ETL loads are fast. The disadvantage is that every query fired at the warehouse has to include a time range; otherwise it does a full scan across all partitions.

The alternative is to not partition by time, or to create global indexes on the time-partitioned table. But when you then drop data, these indexes and tables become fragmented.

This is an important decision. If you can pin down the user requirements and every query will contain a time range, go ahead and partition your fact by time. Otherwise it's better to pay the performance penalty during ETL, since that is a background job, rather than make the user suffer on every query.
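To make the trade-off concrete, here is a minimal sketch in Python. It is only a toy in-memory model, not a real database: the class `MonthlyPartitionedFact`, its methods, and the sample rows are all made up for illustration. It shows that a query with a time range touches only the matching monthly partition, that a query without one scans everything, and that purging a month is a single drop.

```python
from collections import defaultdict
from datetime import datetime


class MonthlyPartitionedFact:
    """Hypothetical toy model of a fact table partitioned by creation month."""

    def __init__(self):
        # (year, month) -> list of (creation_time, user, action) rows
        self.partitions = defaultdict(list)

    def insert(self, creation_time, user, action):
        # New rows always land in the partition for their creation month.
        key = (creation_time.year, creation_time.month)
        self.partitions[key].append((creation_time, user, action))

    def drop_partition(self, year, month):
        # Purging a month removes the whole partition at once,
        # with no row-by-row delete.
        self.partitions.pop((year, month), None)

    def query(self, user, start=None, end=None):
        """Return (matching rows, number of partitions scanned).

        The time range is half-open: start <= creation_time < end.
        """
        scanned = 0
        rows = []
        for (year, month), part in sorted(self.partitions.items()):
            if start is not None and end is not None:
                first = datetime(year, month, 1)
                following = datetime(year + (month == 12), month % 12 + 1, 1)
                if following <= start or first >= end:
                    continue  # partition pruned: it cannot hold matching rows
            scanned += 1
            for row in part:
                t, u, _ = row
                in_range = (start is None or t >= start) and (end is None or t < end)
                if u == user and in_range:
                    rows.append(row)
        return rows, scanned


# Six months of sample events, one per month (illustrative data only).
fact = MonthlyPartitionedFact()
for m in range(1, 7):
    fact.insert(datetime(2024, m, 15), "alice", "login")

# No time range: every partition is scanned.
_, scanned_all = fact.query("alice")

# With a time range: only the March partition is scanned.
march_rows, scanned_pruned = fact.query(
    "alice", datetime(2024, 3, 1), datetime(2024, 4, 1)
)

# Rolling purge: drop the oldest month in one step.
fact.drop_partition(2024, 1)
remaining = len(fact.partitions)
```

In a real warehouse the database engine does the pruning for you, but the rule is the same: without a time predicate in the query, there is nothing to prune on.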

Programming fun at startup
