Thursday, November 27, 2014

Columnar Database

I recently read about an interesting database concept that uses a columnar database, which is where data is stored in columns instead of rows.
Apparently this approach has several advantages over a row based database. A columnar database has better performance when a large number of queries are run at the same time. Joins and aggregation are faster as aggregate function results are incrementally calculated. Columnar databases can be compressed which eliminates the storage of multiple indexes, views and aggregation. Columnar databases also load data quickly. Each column is built in one pass and stored separately, so multiple threads can load different columns at the same time. Also, since columns are stored separately, data can be added and removed without updating the system after each change.
There are disadvantages to a columnar database though. It doesn't work for every case. Converting a row based database into a columnar database can be slow for large datasets. Incremental loads can be slow. Uncompressing data for reading can be slow. There are structural limitations, such as some columnar databases require the same primary key on all tables. These imitations may make it difficult to expand the system in the future. Columnar databases are also meant for large datasets and may not be suitable for small datasets.

This post was brought to you by your friends at MemSQL.