When you create a table in Netezza database, selecting column/s being a member of distribution key is one of the critical decision for good performance of the queries that will run against the table.
Below are some of the key points to remember when we select distribution key:
- Maximum of four columns can be a member of distribution key
- We can have only one distribution key per table (Which is quite obvious)
- We cannot modify column (by issuing alter table command) which is a member of distribution key
- We create a table with random distribution then Netezza will store data based on round robin algorithm
- When we create a table, if no column is selected as distribution key then Netezza will randomly distribute on any column of the table and not always the first column (Most of the cases it first column but in theory it can be any column)
- When we create a table by issuing “select * from table …….” then the newly created table will inherit the distribution key of source table.
- Always try to avoid using different distribution key when creating a table using “select * from table …….” as this can result in transfer of data across SPU’s because of change in distribution