Hash

A hash is a unique string of characters generated from input data. Imagine you input the phrase “Hello there” into a hash function; it processes the input and generates a fixed-length, scrambled output.

This output represents your input, but if you change even one letter of “Hello there”, the hash result will be completely different. Hashing is like a blender for data: it transforms the input into a fixed-size output, no matter how big or small the original data is.

Hash Tables and finding “Hello there”

A hash table is a special data structure where you can store key-value pairs, and it uses a hash function to map your data to specific locations in an array.

If “Hello there” is hashed to the number 85,513, that number would be the index in the array where the data is stored.

This allows for super-fast lookups because instead of scanning the whole table, you jump straight to the hashed index 85,513 to retrieve the data.

Linear Search VS Hash Table

Linear search is the simplest form of searching. You start at the beginning of the dataset (like an array or a list) and check each element, one by one, until you find the element you’re looking for or reach the end of the list.

A linear search literally traverses the entire structure (an array, list, or tree) one element at a time.

Why is hashing usually better?

Direct Access: Hashing allows you to jump straight to the location of the data rather than searching sequentially through the dataset.

Category Linear Search Hash Table
Time Complexity O(n) O(1) on average
Efficiency Slower for large datasets Fast, especially with large datasets
Best Use Case Small datasets, unsorted data Large datasets, frequent lookups
Collision Handling No collisions Requires handling (e.g., chaining, open addressing)
Data Structure Array, List, Tree Array with hash indices

Secure Hash Algorithm

SHA is a cryptographic hash function. The function spits out a fixed-size smoothie (hash).

A hash is used everywhere in real life for cryptographic apps; digital signatures, certificates, and password storage (hash/salts).

For instance, payment verification to check the transaction data integrity during online payments. Your bank to secure communication and document verification. Cloud storage for file transfer.

Bitcoin and Ethereum, the blocks, nodes, whatever, in the chain are hashed, and hashes are core to the whole thing to verify transaction data in the blockchain networks. No hash, no blockchain, you see now ape, hash very importante.

What else… Version control like in GitHub, we use SHA-1 hashes to track changes and maintain code with every commit.

Padding the Input

Input data is padded to ensure its length is a multiple of 512 bits (64 bytes). Padding involves appending a single ‘1’ bit followed by ‘0’ bits until the length is just before 64 bits from being a multiple of 512. Appending Length:

The original length of the data in bits is appended to the end of the padded data. This step ensures that the hash calculation considers both the content and the length of the input, crucial for uniqueness.

Initializing Variables

Five 32-bit variables (a, b, c, d, e) are initialized with specific constants.

Processing Chunks

The padded input is divided into 512-bit chunks, each further divided into sixteen 32-bit words. These words undergo bitwise operations involving the variables a, b, c, d, e.

Main Processing Loop

Each chunk undergoes a series of operations (based on the chunk’s position in the overall process) that include bitwise operations (AND, OR, XOR), addition, and rotation. Concatenating Results:

After processing all chunks, the final values of variables a, b, c, d, and e are concatenated to produce a 160-bit (20-byte) hash value represented as a hexadecimal string.

function sha1(msg) {
  // Step 1: preprocessing
  msg += String.fromCharCode(0x80); // add padding
  while (msg.length % 64 !== 56) {
    msg += String.fromCharCode(0);
  }

  // add length (ignoring for simplicity)
  msg += String.fromCharCode(0); 

  // Step 2: initialize variables
  let a = 0x67452301;
  let b = 0xEFCDAB89;
  let c = 0x98BADCFE;
  let d = 0x10325476;
  let e = 0xC3D2E1F0;

  // Step 3: for loop
  for (let i = 0; i < msg.length; i += 64) {
    let words = [];
    for (let j = 0; j < 64; j += 4) {
      words.push(
        (msg.charCodeAt(i + j) << 24) |
          (msg.charCodeAt(i + j + 1) << 16) |
          (msg.charCodeAt(i + j + 2) << 8) |
          msg.charCodeAt(i + j + 3)
      );
    }

    let [aa, bb, cc, dd, ee] = [a, b, c, d, e];

    for (let j = 0; j < 80; j++) {
      let f, k;
      if (j < 20) {
        f = (b & c) | (~b & d);
        k = 0x5A827999;
      } else if (j < 40) {
        f = b ^ c ^ d;
        k = 0x6ED9EBA1;
      } else if (j < 60) {
        f = (b & c) | (b & d) | (c & d);
        k = 0x8F1BBCDC;
      } else {
        f = b ^ c ^ d;
        k = 0xCA62C1D6;
      }

      const temp = ((a << 5) | (a >>> 27)) + f + e + k + words[j];
      e = d;
      d = c;
      c = (b << 30) | (b >>> 2);
      b = a;
      a = temp;
    }

    [a, b, c, d, e] = [a + aa, b + bb, c + cc, d + dd, e + ee];
  }

  // Step 4: the output
  return [a, b, c, d, e].map((x) => x.toString(16)).join("");
}

// test it
const hash = sha1("hello");
console.log(`SHA-1 Hash: ${hash}`);

SHA-1 is still used but mostly deprecated due to vulnerabilities. SHA-256 or higher should be used for modern applications.

SHA-3 (Keccak) Known for its resistance against recent attacks and chosen as a standard by NIST (National Institute of Standards and Technology).

MD5 Despite its vulnerabilities, it’s still used for checksums and data integrity verification in non-cryptographic contexts.

bcrypt Primarily used for password hashing due to its adaptive hashing function, making it resilient against brute-force attacks.

PBKDF2 (Password-Based Key Derivation Function 2) Often used for key stretching and generating cryptographic keys from passwords.

Argon2 Designed for hashing passwords securely, offering resistance against GPU and ASIC attacks.

SipHash Optimized for hash-table hashing and protecting against hash-flooding DoS attacks.