Use MD5 function to create unique IDs
- I first encountered this function when trying to join two tables together using about eight separate fields. Not ideal.
- The natural inclination is to create your own ID by simply concatenating a bunch of fields together. Such columns are a problem because they look like data but operate as an ID. It’s important to have a column whose sole function is to be a unique identifier for that row.
- Instead, use the MD5 function to create unique IDs on AWS (the example in this note uses Amazon Redshift). IDs that are obviously IDs reduce confusion among junior analysts and end users by removing semi-comprehensible data strings from your database.
- Be aware, though, that MD5 is no longer considered a strong hash function, so do not rely on it to protect sensitive information in the fields you hash.
- More info on how this works is in Learn about cryptography
Entropy
Entropy is a measure of randomness.
Hashing functions
A cryptographic hash function maps data of arbitrary size to a fixed-size output.
An example of a hash function is SHA-1, which Git uses for its object references.
At a high level, a hash function can be thought of as a hard-to-invert random-looking (but deterministic) function.
A hash function has the following properties:
Deterministic: the same input always generates the same output.
Non-invertible: it is hard to f...
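The properties listed above can be checked with a small experiment. This is a minimal sketch in Python using the standard library's hashlib; the helper name md5_hex is made up for this note and is not part of any library.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 digest of text as 32 lowercase hex characters."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


short_digest = md5_hex("a")
long_digest = md5_hex("a" * 10_000)

# Fixed size: a 1-character input and a 10,000-character input both
# produce a 128-bit digest, which is 32 hex characters.
print(len(short_digest), len(long_digest))

# Deterministic: hashing the same input twice gives the same digest.
print(md5_hex("Amazon Redshift") == md5_hex("Amazon Redshift"))

# Random-looking: changing one character gives an unrelated-looking digest.
print(md5_hex("Amazon Redshift"))
print(md5_hex("Amazon Redshifu"))
```

Non-invertibility cannot be demonstrated in a few lines; the sketch only illustrates the fixed output size, determinism, and the random-looking output.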
select md5('Amazon Redshift');
-- Result (1 row):
-- f7415e33f972c03abd4f3fed36748f7a
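The query above hashes a single literal. For the multi-field join case that motivated this note, the same idea might look like the sketch below, written in Python so it can be run locally. The helper row_id, the "|" separator, and the sample values are all assumptions for illustration, not something from the original note. In SQL the equivalent would be md5 applied to the delimiter-joined columns.

```python
import hashlib


def row_id(*fields, sep="|"):
    """Build an MD5-based ID from several column values (hypothetical helper)."""
    # Render missing values (None) as empty strings and join with a delimiter,
    # so that ("ab", "c") and ("a", "bc") do not collapse into the same string.
    joined = sep.join("" if f is None else str(f) for f in fields)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


# Made-up sample values standing in for the fields used in the join.
print(row_id("2021-03-01", "store_17", "sku_42"))

# The delimiter keeps these two field combinations distinct.
print(row_id("ab", "c") != row_id("a", "bc"))  # True
```

Two caveats with this sketch: a None and an empty string produce the same ID, and a field value that itself contains the separator can still create ambiguity, so pick a separator that does not occur in the data.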
Metadata