Database

From UVOO Tech Wiki
Revision as of 18:17, 22 July 2019 by Busk

Dumping Ground for Database/Datastore information

Partitioning

https://stackoverflow.com/questions/20771435/database-sharding-vs-partitioning

Looks like this answers both your questions:

Horizontal partitioning splits one or more tables by row, usually within a single instance of a schema and a database server. It may offer an advantage by reducing index size (and thus search effort) provided that there is some obvious, robust, implicit way to identify in which table a particular row will be found, without first needing to search the index, e.g., the classic example of the 'CustomersEast' and 'CustomersWest' tables, where their zip code already indicates where they will be found.

Sharding goes beyond this: it partitions the problematic table(s) in the same way, but it does this across potentially multiple instances of the schema. The obvious advantage would be that search load for the large partitioned table can now be split across multiple servers (logical or physical), not just multiple indexes on the same logical server.

Source: Wiki-Shard.

Sharding is the process of storing data records across multiple machines and is MongoDB’s approach to meeting the demands of data growth. As the size of the data increases, a single machine may not be sufficient to store the data nor provide an acceptable read and write throughput. Sharding solves the problem with horizontal scaling. With sharding, you add more machines to support data growth and the demands of read and write operations.

Source: MongoDB.
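
A minimal sketch of the idea of spreading records across machines by a shard key. This is not how MongoDB itself routes data; it is a generic hash-and-modulo scheme chosen for brevity, and `ShardedStore` is a hypothetical name. Each in-memory dict stands in for a separate database server.

```python
import hashlib

class ShardedStore:
    """Toy key-value store that spreads records over several 'servers'."""

    def __init__(self, shard_count: int) -> None:
        # Each dict stands in for an independent database server.
        self.shards = [{} for _ in range(shard_count)]

    def shard_index(self, key: str) -> int:
        # Use a stable hash so the same key always maps to the same shard.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key: str, record: dict) -> None:
        self.shards[self.shard_index(key)][key] = record

    def get(self, key: str):
        # Only the one shard that owns the key is consulted.
        return self.shards[self.shard_index(key)].get(key)

store = ShardedStore(shard_count=4)
store.put("user:42", {"name": "Ada"})
```

One limitation of plain modulo routing is that changing the number of shards remaps most keys, so real systems tend to use range-based or more elaborate hashing schemes.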

Answered Dec 25 '13 at 12:34 by NoChance (edited Dec 27 '13 at 5:55)

I've been diving into this as well, and although I'm far from the reference on the matter, there are a few key facts that I've gathered and points that I'd like to share:

A partition is a division of a logical database or its constituent elements into distinct independent parts. Database partitioning is normally done for manageability, performance or availability reasons, or for load balancing.

https://en.wikipedia.org/wiki/Partition_(database)

Sharding is a type of partitioning, specifically a form of Horizontal Partitioning (HP).

There is also Vertical Partitioning (VP) whereby you split a table into smaller distinct parts. Normalization also involves this splitting of columns across tables, but vertical partitioning goes beyond that and partitions columns even when already normalized.

https://en.wikipedia.org/wiki/Shard_(database_architecture)

I really like Tony Baco's answer on Quora where he makes you think in terms of schema (rather than columns and rows). He states that...

"Horizontal partitioning", or sharding, is replicating [copying] the schema, and then dividing the data based on a shard key.

"Vertical partitioning" involves dividing up the schema (and the data goes along for the ride).

https://www.quora.com/Whats-the-difference-between-sharding-DB-tables-and-partitioning-them
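
The schema-oriented view above can be illustrated with plain Python dicts standing in for table rows. This is a sketch under assumptions: the sample columns (`id`, `name`, `bio`) and the function names are invented for illustration, and the modulo rule is just one possible way to divide rows by a shard key.

```python
rows = [
    {"id": 1, "name": "Ada", "bio": "long text about Ada"},
    {"id": 2, "name": "Grace", "bio": "long text about Grace"},
    {"id": 3, "name": "Edsger", "bio": "long text about Edsger"},
]

def horizontal_partition(rows, shard_count, key="id"):
    """Every part keeps the full schema; rows are divided by the shard key."""
    parts = [[] for _ in range(shard_count)]
    for row in rows:
        parts[row[key] % shard_count].append(row)
    return parts

def vertical_partition(rows, column_groups, key="id"):
    """The schema is divided into column groups; every row appears in each part.

    The key column is repeated in every part so rows can be joined back.
    """
    return [
        [{col: row[col] for col in (key, *cols)} for row in rows]
        for cols in column_groups
    ]

by_row = horizontal_partition(rows, shard_count=2)
by_column = vertical_partition(rows, column_groups=[("name",), ("bio",)])
```

In `by_row` each part has all three columns but only some rows; in `by_column` each part has all three rows but only some columns.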

Oracle's Database Partitioning Guide has some nice figures. I have copied a few excerpts from the article.

https://docs.oracle.com/cd/B28359_01/server.111/b32024/partition.htm

When to Partition a Table

Here are some suggestions for when to partition a table:

Tables greater than 2 GB should always be considered as candidates for partitioning.
Tables containing historical data, in which new data is added into the newest partition. A typical example is a historical table where only the current month's data is updatable and the other 11 months are read only.
When the contents of a table need to be distributed across different types of storage devices.

Partition Pruning

Partition pruning is the simplest and also the most substantial means to improve performance using partitioning. Partition pruning can often improve query performance by several orders of magnitude. For example, suppose an application contains an Orders table containing a historical record of orders, and that this table has been partitioned by week. A query requesting orders for a single week would only access a single partition of the Orders table. If the Orders table had 2 years of historical data, then this query would access one partition instead of 104 partitions. This query could potentially execute 100 times faster simply because of partition pruning.
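
The Orders example can be mimicked in a few lines. This is a toy model, not Oracle's implementation: partitions are Python lists keyed by ISO (year, week), one made-up order is generated per day, and the function names are hypothetical. It only shows that a query for a single week needs to visit 1 partition rather than all 104.

```python
from datetime import date, timedelta

def week_key(d: date):
    """Partition key: the ISO (year, week number) of the order date."""
    iso = d.isocalendar()
    return (iso[0], iso[1])

def build_partitions(start: date, weeks: int):
    """Create one synthetic order per day and file it under its week."""
    partitions = {}
    for offset in range(weeks * 7):
        day = start + timedelta(days=offset)
        partitions.setdefault(week_key(day), []).append(
            {"order_id": offset, "day": day}
        )
    return partitions

def orders_for_week(partitions, any_day_in_week: date):
    """Pruned lookup: go straight to the single matching partition."""
    rows = partitions.get(week_key(any_day_in_week), [])
    return rows, 1  # one partition visited

def orders_for_week_unpruned(partitions, any_day_in_week: date):
    """No pruning: every partition is scanned and filtered."""
    target = week_key(any_day_in_week)
    rows = []
    for part in partitions.values():
        rows.extend(r for r in part if week_key(r["day"]) == target)
    return rows, len(partitions)

# 2018-01-01 is a Monday, so 104 weeks of data give 104 weekly partitions.
partitions = build_partitions(date(2018, 1, 1), weeks=104)
pruned_rows, pruned_visits = orders_for_week(partitions, date(2019, 6, 5))
full_rows, full_visits = orders_for_week_unpruned(partitions, date(2019, 6, 5))
```

Both calls return the same seven orders, but the pruned version visits one partition while the unpruned version visits all 104.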

Partitioning Strategies

Range
Hash
List

You can read their text and view their images, which explain everything pretty well.
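
As a rough sketch of how the three strategies differ, each can be reduced to a function that maps a value to a partition. These helpers are hypothetical and are not Oracle syntax; the boundaries, region names, and the use of CRC32 as the hash are assumptions made only for illustration.

```python
import bisect
import zlib

def range_partition(value, upper_bounds):
    """Range: upper_bounds is a sorted list of exclusive upper limits.

    Returns the index of the first partition whose limit exceeds the value;
    values at or above the last limit land in a final overflow partition.
    """
    return bisect.bisect_right(upper_bounds, value)

def hash_partition(key, partition_count):
    """Hash: spread keys evenly using a stable hash of the key."""
    return zlib.crc32(str(key).encode("utf-8")) % partition_count

def list_partition(value, value_lists):
    """List: each partition is defined by an explicit set of values."""
    for name, values in value_lists.items():
        if value in values:
            return name
    raise KeyError(f"no partition lists the value {value!r}")

regions = {"west": {"CA", "OR", "WA"}, "east": {"NY", "MA"}}
```

For example, `range_partition(150, [100, 200])` selects the second partition (index 1), and `list_partition("OR", regions)` selects "west".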

And lastly, it is important to understand that databases are extremely resource intensive:

CPU
Disk
I/O
Memory

Many DBAs will partition on the same machine, where the partitions share all the resources but provide an improvement in disk and I/O by splitting up the data and/or index.

Other strategies employ a "shared nothing" architecture, where the shards reside on separate and distinct computing units (nodes), each having 100% of the CPU, disk, I/O and memory to itself. This provides its own set of advantages and complexities.

https://en.wikipedia.org/wiki/Shared_nothing_architecture