Among developers, it is still common practice, and natural, to browse resources through sequential URLs such as /shows/2, and so on. This is why using an Integer or a Serial as the column data type for a primary key is the default choice. For most applications, this is fine. I won't deny it, this is what I do most of the time. But there is another approach that deserves to be explored: Universally Unique Identifiers, or UUIDs.
Universally Unique IDentifiers are 128-bit identifiers that are unique across space and time, and they require no central registration process. They can be used for anything from tagging objects with an extremely short lifetime to reliably identifying very persistent objects across a network.
- Unique across every table, every database, every server (more or less guaranteed to be).
- You're not reliant on a single system for generating your keys.
- Database replication and merging are easier.
- You never reveal to the outside world how many items are in your table.
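To make the "no single system" and "easier merging" points concrete, here is a minimal sketch using Python's standard uuid module. The two "servers" are just hypothetical lists of titles, and the helper names are made up for the illustration; nothing here talks to a real database.

```python
import uuid


def make_rows_serial(titles):
    # Each server numbers its own rows 1, 2, 3, ... like a Serial column would.
    return {i: title for i, title in enumerate(titles, start=1)}


def make_rows_uuid(titles):
    # Each server generates its keys locally, with no coordination.
    return {uuid.uuid4(): title for title in titles}


# Two hypothetical servers, each with its own rows.
server_a = ["foo", "bar"]
server_b = ["baz", "qux"]

# With Serial-like keys, both servers hand out the same ids.
serial_a = make_rows_serial(server_a)
serial_b = make_rows_serial(server_b)
colliding_ids = serial_a.keys() & serial_b.keys()

# With UUID keys, the two sets of rows can simply be merged.
uuid_a = make_rows_uuid(server_a)
uuid_b = make_rows_uuid(server_b)
merged = {**uuid_a, **uuid_b}

print(sorted(colliding_ids))  # [1, 2]
print(len(merged))            # 4
```

With integer keys, merging the two servers means renumbering rows (and every foreign key pointing at them). With random UUIDs, a collision is possible in theory but so unlikely that the merge is a plain union.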
Let's see how to play with UUIDs:
Using the Python uuid module:
    >>> from uuid import uuid4
    >>> uid = uuid4()
    >>> print(uid)
    d095c69c-255f-40b0-ac89-14b6e56211e2
    >>> uid
    UUID('d095c69c-255f-40b0-ac89-14b6e56211e2')
    >>> print(uid.bytes)
    b'\xd0\x95\xc6\x9c%_@\xb0\xac\x89\x14\xb6\xe5b\x11\xe2'
    >>> print(uid.int)
    277257103643445164943391027606663074274
    >>>
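The string, bytes, and integer forms shown in the session are three views of the same 128-bit value. As a small sketch, the snippet below rebuilds the UUID printed above from each representation and checks a few of its properties; the variable names are only illustrative.

```python
import uuid

# The UUID printed in the session above.
uid = uuid.UUID("d095c69c-255f-40b0-ac89-14b6e56211e2")

# Rebuild the same UUID from its other representations.
same_from_hex = uuid.UUID(hex="d095c69c255f40b0ac8914b6e56211e2")
same_from_bytes = uuid.UUID(bytes=uid.bytes)
same_from_int = uuid.UUID(int=uid.int)

print(same_from_hex == same_from_bytes == same_from_int == uid)  # True
print(uid.version)                   # 4 (randomly generated)
print(uid.variant == uuid.RFC_4122)  # True
print(len(uid.bytes))                # 16 bytes, i.e. 128 bits
```

The version number lives in the first digit of the third group (the 4 in 40b0), which is how you can tell at a glance that a UUID came from uuid4().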
Since Django 1.8, there is a UUIDField. I guess this should now be the default way of using a UUID as a field for your model:
    #!/usr/bin/env python
    # -*- coding:utf-8 -*-
    import uuid

    from django.db import models


    class Show(models.Model):
        id = models.UUIDField(primary_key=True, default=uuid.uuid4, editable=False)
        # other fields
Don't be surprised: before Django 1.8, developers were already using UUIDs as primary keys. Here is an example using the Django CharField:
    #!/usr/bin/env python
    # -*- coding:utf-8 -*-
    from uuid import uuid4

    from django.db import models


    def myuuid():
        # Store the UUID in its canonical 36-character string form.
        return str(uuid4())


    class Show(models.Model):
        id = models.CharField(max_length=36, primary_key=True, default=myuuid, editable=False)
        # other fields
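The max_length=36 in the CharField version comes from the canonical string form: 32 hexadecimal digits plus 4 hyphens. A quick sketch to check this, using only the standard library:

```python
import uuid

uid = uuid.uuid4()

canonical = str(uid)  # 8-4-4-4-12 hex digits separated by hyphens
compact = uid.hex     # the same digits without hyphens

print(len(canonical))  # 36
print(len(compact))    # 32

# Both strings parse back to the same UUID.
print(uuid.UUID(canonical) == uuid.UUID(compact) == uid)  # True
```

Storing uid.hex in a 32-character column would presumably work just as well; that is an alternative I am assuming here, not something the CharField example above does.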
Completely outside of Python and Django, Postgres supports UUID as a column data type. Postgres also has the uuid-ossp module, which provides some functions for generating UUIDs. We're only interested in the uuid_generate_v4() function, which bases the UUID entirely on random numbers.
First, create the extension, then generate one UUID:
    postgres=# create extension if not exists "uuid-ossp";
    CREATE EXTENSION
    postgres=# select uuid_generate_v4();
               uuid_generate_v4
    --------------------------------------
     7c9e06fd-1954-4b87-aa07-86872ffda436
    (1 row)

    postgres=#
Now, let's suppose you have a table with a column of UUID type:
    postgres=# create table show(
    postgres(#     id uuid not null default uuid_generate_v4(),
    postgres(#     title character varying not null,
    postgres(#     constraint show_id primary key (id)
    postgres(# );
    CREATE TABLE
    postgres=#
This is how you would insert some values:
    postgres=# insert into show values (uuid_generate_v4(), 'foo');
    INSERT 0 1
    postgres=# insert into show (title) values ('bar');
    INSERT 0 1
    postgres=# select * from show;
                      id                  | title
    --------------------------------------+-------
     d540d7c4-1bc7-4945-b6fd-890edf0e15dd | foo
     acaa592b-db22-4163-bbcb-63269d86b7d4 | bar
    (2 rows)

    postgres=#
Do you really need UUIDs?
UUIDs are larger than traditional integers: a UUID is 128 bits, while a Serial is 32 bits and a BigSerial is 64 bits.
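To put numbers on the size difference, here is a rough back-of-the-envelope sketch. It counts only the raw bytes of the key itself; row overhead, index overhead, and the hypothetical row count are assumptions for illustration, not measurements of a real Postgres table.

```python
import struct
import uuid

# Raw size of one key, in bytes.
key_sizes = {
    "Serial (32-bit integer)": struct.calcsize("<i"),
    "BigSerial (64-bit integer)": struct.calcsize("<q"),
    "UUID (binary)": len(uuid.uuid4().bytes),
    "UUID (36-char string)": len(str(uuid.uuid4())),
}

rows = 10_000_000  # hypothetical table size

for name, size in key_sizes.items():
    total_mb = size * rows / 1_000_000
    print(f"{name}: {size} bytes per key, about {total_mb:.0f} MB for {rows:,} keys")
```

Note that a UUID stored as text, as in the CharField example, costs more than twice as much as one stored in a native 16-byte column, and the same overhead is paid again in every index and foreign key that references it.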
UUIDs are hard to read and debug:
Does your brain really want to deal with d540d7c4-1bc7-4945-b6fd-890edf0e15dd? Face it, reading, remembering, and typing UUIDs kinda sucks.
One of the points of a UUID is to have a universally unique identifier. You can also achieve uniqueness of an item using natural or composite keys. A phone number or a social security number could serve as a unique identifier. In a distributed system, a combination of server IP, application, and timestamp could also serve as a unique identifier.
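As a sketch of the composite-key idea, the snippet below models a key made of server IP, application, and timestamp. The EventKey type, the field values, and the example.com namespace are all hypothetical. It also shows, as an optional extra, how uuid5 from the standard library can derive a repeatable UUID from such a key if a single-column identifier is wanted.

```python
import uuid
from typing import NamedTuple


class EventKey(NamedTuple):
    """Hypothetical composite key for a row in a distributed system."""
    server_ip: str
    application: str
    timestamp: str


# Hypothetical namespace for this application's derived UUIDs.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")


def derived_uuid(key: EventKey) -> uuid.UUID:
    # uuid5 is deterministic: the same key always yields the same UUID.
    return uuid.uuid5(NAMESPACE, "|".join(key))


key_1 = EventKey("10.0.0.1", "billing", "2015-06-01T12:00:00Z")
key_2 = EventKey("10.0.0.2", "billing", "2015-06-01T12:00:00Z")

# The tuple itself is already usable as a unique key.
rows = {key_1: "foo", key_2: "bar"}
print(len(rows))  # 2

# The derived UUID is stable for equal keys and differs for different keys.
print(derived_uuid(key_1) == derived_uuid(EventKey(*key_1)))  # True
print(derived_uuid(key_1) == derived_uuid(key_2))             # False
```

The composite key only stays unique as long as one server and application never produce two rows with the same timestamp, so the timestamp resolution matters.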
If you really want to achieve uniqueness of your items using UUIDs, it's also possible to just add another field in your table with the UUID data type and the UNIQUE constraint, and keep the primary key as a Serial.
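Random UUIDs index poorly. They can be indexed, but because they are uniformly distributed across a very large range of possibilities, each new value lands at a random position in the index instead of at the end, which hurts locality and slows down insertions.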
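The indexing concern can be illustrated with a small simulation. This is only a sketch: a sorted Python list stands in for a real index structure, and the count_appends helper is made up for the illustration. It counts how many inserts land at the very end of the sorted sequence, which is the cheap, cache-friendly case.

```python
import bisect
import uuid


def count_appends(keys):
    """Insert keys into a sorted list and count how many land at the end."""
    index = []
    appends = 0
    for key in keys:
        pos = bisect.bisect_right(index, key)
        if pos == len(index):
            appends += 1
        index.insert(pos, key)
    return appends


n = 1000

# Sequential integers, like a Serial column: every insert goes at the end.
sequential_appends = count_appends(range(1, n + 1))

# Random UUIDs: inserts are scattered all over the sorted sequence.
random_appends = count_appends(uuid.uuid4().int for _ in range(n))

print(sequential_appends)  # 1000
print(random_appends)      # typically a handful
```

With sequential keys, the "hot" part of the index is always its right edge. With random UUIDs, almost every insert touches a different part of the index.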
While the approach seems interesting, the truth is you don't need UUIDs if:
- You don't do replication or merging.
- You're not in a concurrent and distributed environment.
- You need insertions to be fast.
- You have natural keys to achieve uniqueness.
The end of the story is, nobody can really tell you whether you should use an Integer or a UUID. They can give advice, but nobody knows your data, your environment, and your intent better than you.
Voilà. I'd really love to hear from you guys who ran into problems using UUIDs!
More on the topic: