%title: Password cracking: past, present, future
%author @solardiz @Openwall OffensiveCon May 10-11, 2024

-> A talk by <-
-> Solar Designer <-
-> @solardiz <-

-> https://www.openwall.com <-
-> @Openwall <-

-> Credits: CIQ/Rocky Linux <-

-> May 10-11, 2024 <-
-> Berlin, Germany <-

---

-> INTRODUCTION <-

---

-> Why passwords? <-

* From a defensive perspective, "password" should usually mean "passphrase"
  - We proceed with "password" as a general term here
* Passwords remain a distinct and ubiquitous authentication factor
  - "Something you know" in 2FA/MFA
* Passwords are widely used to derive encryption keys for data or other keys
* Passphrases may also be used to derive private keys e.g. for wallets
  - Generated seed phrases are commonly used, but extra words may be added
* Similar concerns and techniques apply to other hashed low-entropy data

---

-> Why crack passwords? <-

* Authentication - prevent or gain access to systems
  - Security audits, penetration tests
  - Access recovery (alternative to password reset)
  - (Un)authorized
* Key derivation - recover or gain access to data, other keys, or funds
  - By owner, heir, service provider, law enforcement, anyone (un)authorized
* General
  - Enhancement of breached password lists, training and test sets for tools
  - Research projects, contests, hobby, historical preservation
  - Analysis of malware and backdoors, OSINT, recovery of other hashed data

---

-> Scenarios <-

* Focus of this talk
  - Offline against a local copy of hashes, traffic dump, encrypted data
  - Optimization
* Also closely related
  - Online against a remote system - e.g. Hydra by van Hauser / THC (2001+)
* Not so closely related
  - Online against a remote system with unprivileged local access
  - Side-channel inference (network or local keystroke timings, etc.)
  - Physical against a (tamper-resistant) device
* Mostly unrelated, although these may be best when they work
  - Password reset, plaintext password leak/extraction, password check bypass
  - Cracking the derived key directly

---

-> Targets <-

* Focus of this talk is tools usable for authentication passwords
* There are also tools specialized for certain other targets such as
  encrypted files, filesystems, etc., but these are mostly beyond scope
* Current most flexible password crackers - John the Ripper and hashcat -
  are not limited to authentication passwords, so most of the information
  here also pertains to password cracking/recovery for other supported
  targets
* John the Ripper started to gain support for what we call "non-hashes" with
  Dhiru Kholia's Google Summer of Code project under Openwall (2011)
* We process non-hash files via \*2john tools, which extract pseudo-hashes
* Some reveal the encrypted data, better ones don't; btcrecover's also vary
* hashcat followed suit and accepts pseudo-hashes output by \*2john

---

-> Optimization <-

* Speed (parallel throughput vs. hardware, maintenance, energy cost)
  - of candidate password generation (vs. focus)
  - of duplicate candidate password avoidance or suppression (vs. uniqueness)
  - of hashing or key derivation (vs. memory, storage, and reliability)
  - of matching against loaded hashes or checking for correct decryption
  - instantaneous (async) vs. average until completion (have to sync)
* Memory and storage requirements (vs. speed, focus, uniqueness)
* Focus (optimal search order)
  - vs. candidate password generation speed and parallel processing
  - vs. ease of reasoning (for checkpointing, reporting, exclusion)
* Targeting (custom candidate password streams per target)
* Uniqueness (no or few duplicate candidate passwords for the same target)
* Feedback loops and workflow (refocus based on passwords cracked so far)

---

-> PASSWORD AUTHENTICATION TIMELINE <-

---

-> Plaintext password storage (1960s to early 1970s CTSS, TENEX, Unix) <-

->  password                           <-
->     |                               <-
->     v                               <-
->  password store                     <-

->  password       password store      <-
->      \            /                 <-
->       v                             <-
->      compare                        <-

-> is it timing-safe? <-
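The "timing-safe" question matters because a naive comparison returns as soon
as the first byte differs, so response time leaks how long a correct prefix
has been guessed. A minimal illustration in Python (not from the talk; the
helper names are mine):

```python
import hmac

def naive_compare(a: bytes, b: bytes) -> bool:
    # Returns early on the first mismatching byte, so the time taken
    # leaks how long a correct prefix the attacker has guessed
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:
            return False
    return True

def timing_safe_compare(a: bytes, b: bytes) -> bool:
    # Accumulates the difference over all bytes, taking (nearly)
    # constant time regardless of where the first mismatch occurs
    return hmac.compare_digest(a, b)
```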
---

-> Plaintext password storage (1960s to early 1970s CTSS, TENEX, Unix) <-

* On CTSS, "one afternoon [...] any user who logged in found that instead of
  the usual message-of-the-day typing out on his terminal, he had the entire
  file of user passwords"
  Fernando J. Corbato, "On Building Systems That Will Fail", 1991
  The problem was a text editor temporary file collision,
  "early 60's" to "1965" by different sources
* TENEX had a character-by-character timing leak exacerbated by paging
* "The UNIX system was first implemented with a password file that contained
  the actual passwords of all the users"
  Robert Morris and Ken Thompson, "Password Security: A Case History", 1978

---

-> Password hashing (early 1970s Multics) <-

* "Multics User Control subsystem stored passwords one-way encrypted [...]
  I knew people could take square roots, so I squared each password and
  ANDed with a mask to discard some bits."
* After a successful break by the Air Force tiger team doing a security
  evaluation of Multics in 1972-1974, "we quickly changed the encryption to
  a new stronger method"
  Tom Van Vleck, "How the Air Force cracked Multics Security", 1993

---

-> Password hashing (mid 1970s Unix) <-

* crypt(3) of Unix up to 6th Edition inclusive reused code from an
  "encryption program [that] simulated the M-209 cipher machine used by the
  U.S. Army during World War II. [...] the password was used not as the text
  to be encrypted but as the key, and a constant was encrypted using this
  key."
* "The running time to encrypt one trial password and check the result
  turned out to be approximately 1.25 milliseconds on a PDP-11/70 when the
  encryption algorithm was recoded for maximum speed."
* "It takes essentially no more time to test the encrypted trial password
  against all the passwords in an entire password file, or for that matter,
  against any collection of encrypted passwords, perhaps collected from many
  installations."
  Robert Morris and Ken Thompson, "Password Security: A Case History", 1978

---

-> Password hashing (mid 1970s Multics & Unix) <-
-> 2000s web apps & Windows <-

->  password                               <-
->     \                                   <-
->      v                                  <-
->  hash function                          <-
->      v                                  <-
->  password hash store                    <-

->  password        password hash store    <-
->     \              /                    <-
->      v            /                     <-
->  hash function   /                      <-
->     \           /                       <-
->      v                                  <-
->     compare                             <-

---

-> Password hashing (late 1970s Unix) <-

->  password            new salt           <-
->  policy check?\       /   |             <-
->                v          |             <-
->      slow hash function   |             <-
->                v          v             <-
->      password hash store                <-

->  password        password hash store    <-
->     \             /(salt)   /(hash)     <-
->      v           /         /            <-
->  slow hash function       /             <-
->     \                    /              <-
->      v                                  <-
->     compare                             <-

---

-> Password hashing (1990s BSDI, bcrypt, PBKDF2) <-

->  password        new salt & cost        <-
->     \             /   (aka setting)     <-
->      v                |                 <-
->  tunably slow hash function  |          <-
->      v                       v          <-
->      password hash store                <-

->  password        password hash store    <-
->     \             /(setting)  /(hash)   <-
->      v           /           /          <-
->  tunably slow hash function /           <-
->     \                      /            <-
->      v                                  <-
->     compare                             <-

---

-> Password hashing (2010s+ scrypt, Argon2, ...) <-

->  password        new salt, costs        <-
->     \             /   (aka setting)     <-
->      v                |                 <-
->  memory-hard hash function   |          <-
->      v                       v          <-
->      password hash store                <-

->  password        password hash store    <-
->     \             /(setting)  /(hash)   <-
->      v           /           /          <-
->  memory-hard hash function  /           <-
->     \                      /            <-
->      v                                  <-
->     compare                             <-

---

-> Other authentication methods (1990s+) <-

The examples so far involve the plaintext password being sent to the server
(hopefully over a transport security layer), but authentication methods not
requiring that exist.  Usually they are also susceptible to offline password
cracking against certain authentication material:

* Challenge/response (many kinds): network-sniffed challenge/response pairs
  - Even worse, POP3 APOP and CRAM-MD5 require plaintext-equivalent storage
* Kerberos: TGTs, AFS user database
* S/Key, OPIE: skeykeys file
* SSH: passphrase-encrypted private key, hashed known_hosts
* SRP (and other PAKEs): verifiers
<- -> password new salt, costs <- -> \\ / (aka setting) <- -> v | <- -> memory-hard hash function | <- -> v v <- -> password hash store <- -> password password hash store <- -> \\ /(setting) /(hash) <- -> v / <- -> memory-hard hash function / <- -> \\ / <- -> v <- -> compare <- --- -> Other authentication methods (1990s+) <- The examples so far involve the plaintext password being sent to the server (hopefully over a transport security layer), but authentication methods not requiring that exist. Usually they are also susceptible to offline password cracking against certain authentication material: * Challenge/response (many kinds): network sniffed challenge/response pairs - Even worse, POP3 APOP and CRAM-MD5 require plaintext-equivalent storage * Kerberos: TGTs, AFS user database * S/Key, OPIE: skeykeys file * SSH: passphrase-encrypted private key, hashed known_hosts * SRP (and other PAKEs): verifiers --- -> PASSWORD CRACKING SPEED --- ``` Newsgroups: net.general Date: Thu Jan 6 08:02:37 1983 We proudly announce The Second Official UNIX PASSWORD CRACKER CONTEST Submit your ingenious /etc/passwd password cracker program (source code) to the undersigned by January 31, 1983. We will test all programs for speed, portability, and elegance, on verious Unix versions and on different machines. A manual page and a short writeup explaining the algorithm is a plus. The writers of the best three programs will win the Grand Prize, The Super Grand Prize, and The Ultra Grand Prize (and world-wide, ever lasting fame). Ran Ginosar, Computer Technology Research Center, Bell Labs, Murray Hill. ``` --- -> Password cracking (1980s, unoptimized) <- * For each user's hash - For each candidate password -> candidate password password hash <- -> \\ /(salt) / <- -> v / <- -> hash function / <- -> \\ / <- -> v <- -> compare <- --- -> Password cracking (unsalted, semi-optimized) <- * For each candidate password -> candidate password password hash(es) <- -> \\ / <- -> v / <- -> hash function / <- -> \\ / <- -> v <- -> one-to-many compare <- * We've amortized the cost of hashing, reusing the result of each computation --- -> Password cracking (early 1990s, salted, semi-optimized) <- * For each candidate password (groups of more likely passwords first) - For each salt -> candidate password password hash(es) that use the current salt <- -> \\ /(salt) /(hashes) <- -> v / <- -> hash function / <- -> optimized \\ / <- -> v <- -> one-to-many compare (initially just a loop) <- -> becomes one-to-one when each salt is unique, as they should be <- * We can no longer amortize the cost of hashing when salts are globally unique --- -> Password cracking (late 1990s, salted, mostly optimized) <- * For each candidate password group (those of more likely passwords first) - For each salt -> candidate password(s) password hash(es) that use the current salt <- -> \\ /(salt) /(hashes maybe partially reversed) <- -> v / <- -> many-to-many hash function / <- -> cost-amortizing, optimized\\ / <- -> parallelized v <- -> many-to-many compare <- -> becomes many-to-one when each salt is unique, as they should be <- * Amortizing means some parts are computed less often and their results reused - "Key setup" and/or "salt setup" may be moved to an outer loop - "Finalization" may be reversed at startup and then not performed at all --- -> Password cracking many-to-many hash comparison <- * If few hashes loaded for the current salt, compare or eliminate directly * If many hashes loaded for the current salt (1997+ hash tables in JtR 1.2+) -> partial 
---

-> Password cracking many-to-many hash comparison <-

* If few hashes loaded for the current salt, compare or eliminate directly
* If many hashes loaded for the current salt (1997+ hash tables in JtR 1.2+)

-> partial hashes for comparison against those being cracked (for current salt) <-
->  v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v          <-
-> +-----------------------------------+-----------------------------------+ <-
-> 1. Prefetch needed elements of ---> | per-salt                          | <-
-> 2. Reject hashes not in ----------> | sparse bitmap(s) or Bloom filter  | <-
-> 3. Complete computation of the rest | or/and (perfect) hash table, etc. | <-
-> +-----------------------------------+-----------------------------------+ <-
->                 remaining full hashes                                      <-
->                     v         v                                            <-
-> 4. Look up in the final data structure(s), which can start similar to the <-
->    above or be as simple as a linked list right away                      <-
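A toy sketch of the early-reject idea in steps 1 to 3 above, assuming
partial hashes are plain integers (real crackers use tuned bitmap sizes,
prefetching, and perfect hash tables):

```python
class EarlyReject:
    # Sparse bitmap keyed by the low bits of a partial hash: candidates
    # whose bit is unset cannot match any loaded hash and are rejected
    # before their full hashes are even computed
    def __init__(self, loaded_partial_hashes, bits=20):
        self.mask = (1 << bits) - 1
        self.bitmap = bytearray(1 << (bits - 3))
        self.full = set(loaded_partial_hashes)   # final lookup structure
        for h in loaded_partial_hashes:
            i = h & self.mask
            self.bitmap[i >> 3] |= 1 << (i & 7)

    def maybe_loaded(self, partial_hash):
        # Step 2: cheap probabilistic reject; false positives fall
        # through to the full lookup below
        i = partial_hash & self.mask
        return bool(self.bitmap[i >> 3] & (1 << (i & 7)))

    def is_loaded(self, full_hash):
        # Step 4: exact membership test for the survivors
        return full_hash in self.full
```

A larger, sparser bitmap raises the early-reject rate at the cost of memory,
which ties into the time-memory trade-offs discussed later.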
---

-> Password cracking cost reduction <-

* When we amortize cost, we reduce the total cost to achieve the same results
  - We do it e.g. through reducing the total amount of computation
* For well-suited hashing schemes, very little computation can be amortized
  - However, that's not the only way to reduce cost
* Besides computational complexity, the other major metric is space complexity
* Real-world costs may be incurred for hardware, maintenance, energy, etc.
  - These costs are related to both computational and space complexities,
    as well as to real-world constraints, which may vary by attacker
  - For example, "how many CPUs and how much RAM is occupied for how long,
    and do we readily have those or do we have to acquire them?" \
    ... and then it might not be CPUs anymore

---

-> Password cracking on GPUs (2007+) <-

* Pioneered by Andrey Belenko of Elcomsoft
  - Initially for NTLM, LM, and raw MD5 hashes, achieving 100M+/s
  - Beyond reach of existing CPU software (except for Cell such as in PS3)
* Andrey and others improved the speeds and implemented further (non-)hashes
  - Whitepixel by Marc Bevand: 33.1 billion/s against one MD5 hash on a
    sub-$3000 4x dual-GPU HD 5970 computer (2010)
  - oclHashcat-lite by atom: 10.9 billion on one dual-GPU HD 6990 (2012)
  - oclHashcat-plus made GPUs usable almost as completely as CPUs (2012)
  - Closed source at first, tweeted MD5 of "hashcat open source" in 2015
* John the Ripper patches for some hashes in 2010 and 2011, integrated into
  1.7.9-jumbo-6, which was first to implement bcrypt, sha512crypt (2012)
* hashcat 6.2.6: 164.1 billion/s against one MD5 hash on one RTX 4090 (2022)

---

-> Password cracking cost reduction through parallel processing <-

* Parallel processing during defensive hashing or key derivation is limited
* Parallel processing potential during password cracking is "unlimited"
  - Yet efficient designs that also satisfy other goals are far from trivial
* Thus, attack duration can be "arbitrarily" reduced through addition of
  parallel processing elements (CPUs/SIMD, more/larger GPUs/FPGAs/ASICs)
  - along with accordingly more memory \
    ... unless the hashing scheme allows for memory cost amortization
  - Most older schemes don't use much memory anyway
  - Most modern schemes should avoid this, which they do to varying extent
* Parallel processing doesn't reduce the amount of computation, but
  - it reduces the amount of time for which other resources are held and/or
  - it amortizes their cost (e.g. a CPU alone vs. the CPU+GPUs per chassis)

---

-> Password cracking cost reduction through time-memory trade-off (TMTO) <-

* With no salts or few commonly used values, it may help to precompute and
  partially store the hashes to bypass most computation in future attacks
  (sketched below)
  - QCrack (1995-1997) for DES-based crypt(3) could save ~1 day per disk
  - Rainbow tables (Philippe Oechslin, 2003, building on Martin Hellman, 1980)
* It may be possible to compute a function in less time by using more memory
  - Matt Bishop, "A Fast Version of the DES and a Password Encryption
    Algorithm", 1987 uses larger/combined lookup tables (up to 200 KB total)
* Conversely, it may also be possible to compute a function in less memory by
  throwing away and recomputing intermediate results when needed
  - scrypt is deliberately friendly to this trade-off, which crackers use
* With many same-salt hashes, the early-rejection rate may be increased (and
  thus further computation and matching reduced) with larger, sparser bitmaps
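A minimal sketch of the precompute-and-store idea for unsalted hashes: one
pass of computation up front, then each future attack is a lookup. A full
table like this is the extreme point of the trade-off; rainbow tables store
only chain endpoints to shrink storage at the cost of some recomputation.

```python
import hashlib

def build_table(candidates):
    # Precompute once: map each unsalted hash back to its plaintext
    return {hashlib.md5(p.encode()).hexdigest(): p for p in candidates}

table = build_table(["password", "letmein", "123456"])
print(table.get("0d107d09f5bbe40cade3de5c71e9e9b7"))  # -> "letmein"
```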
---

-> Password cracking cost metrics <-

* For a given performance ({password, hash} tests per time, maybe amortized)
  - Hardware: ASIC die area, mm^2
    - for a certain design at a certain clock rate in a certain ASIC process
  - Power, W
    - not a cost per se, but for lengthy attacks translates into energy cost
    - correlates with die area
* For a given attack ("test these candidate passwords against these hashes")
  - Hardware: ASIC die area and time product (area-time, AT), mm^2 \* s
  - Energy: power and time product, J = W \* s
    - correlates with AT, letting us estimate relative costs in AT terms alone
* This is the state-of-the-art theoretical model for informing defensive
  designs
* Hardware and energy may have monetary costs, but not always to the attacker
* Real-world attackers' costs may vary greatly e.g. due to existing hardware

---

-> Parallelized hash function (originally memoryless) <-

->                           candidate passwords                              <-
->  v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v          <-
-> +--------+--------+--------+--------+--------+--------+--------+--------+ <-
-> |  core  |  core  |  core  |  core  |  core  |  core  |  core  |  core  | <-
-> +--------+--------+--------+--------+--------+--------+--------+--------+ <-
-> |  core  |  core  |  core  |  core  |  core  |  core  |  core  |  core  | <-
-> +--------+--------+--------+--------+--------+--------+--------+--------+ <-
-> |  core  |  core  |  core  |  core  |  core  |  core  |  core  |  core  | <-
-> +--------+--------+--------+--------+--------+--------+--------+--------+ <-
-> |  core  |  core  |  core  |  core  |  core  |  core  |  core  |  core  | <-
-> +--------+--------+--------+--------+--------+--------+--------+--------+ <-
->  v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v v          <-
->     hashes for comparison against those being cracked (for current salt)   <-

32 hashes in parallel in the same amount of time that a defender needs for one

---

-> Parallelized hash function (amortizable memory-hard) <-

->         candidate passwords                                                <-
->  v v v v v v v v v v v v v v v v                                          <-
-> +--------+--------+--------+--------+--------+--------+--------+--------+ <-
-> |  core  |  core  |  core  |  core  |                                   | <-
-> +--------+--------+--------+--------+                                   + <-
-> |  core  |  core  |  core  |  core  |                                   | <-
-> +--------+--------+--------+--------+              memory               + <-
-> |  core  |  core  |  core  |  core  |                                   | <-
-> +--------+--------+--------+--------+                                   + <-
-> |  core  |  core  |  core  |  core  |                                   | <-
-> +--------+--------+--------+--------+--------+--------+--------+--------+ <-
->  v v v v v v v v v v v v v v v v                                          <-
->     hashes for comparison against those being cracked (for current salt)   <-

16 hashes in parallel in the same amount of time that a defender needs for one

---

-> Parallelized hash function (parallelizable memory-hard) <-

->                           candidate password                               <-
->                                   v                                        <-
-> +--------+--------+--------+--------+--------+--------+--------+--------+ <-
-> |  core  |  core  |  core  |  core  |                                   | <-
-> +--------+--------+--------+--------+                                   + <-
-> |  core  |  core  |  core  |  core  |                                   | <-
-> +--------+--------+--------+--------+              memory               + <-
-> |  core  |  core  |  core  |  core  |                                   | <-
-> +--------+--------+--------+--------+                                   + <-
-> |  core  |  core  |  core  |  core  |                                   | <-
-> +--------+--------+--------+--------+--------+--------+--------+--------+ <-
->                                   v                                        <-
->     hash for comparison against those being cracked (for current salt)     <-

1 hash in 1/16 of the amount of time that a defender using one core would need
---

-> Parallelized hash function (sequential memory-hard, e.g. scrypt) <-

->      candidate passwords                                                   <-
->      v                                     v                               <-
-> +--------+--------+--------+--------+--------+--------+--------+--------+ <-
-> |  core  |                          |  core  |                          | <-
-> +--------+                          +--------+                          + <-
-> |                                   |                                   | <-
-> +              memory               +              memory               + <-
-> |                                   |                                   | <-
-> +                                   +                                   + <-
-> |                                   |                                   | <-
-> +--------+--------+--------+--------+--------+--------+--------+--------+ <-
->      v                                     v                               <-
->     hashes for comparison against those being cracked (for current salt)   <-

2 hashes in parallel in the same amount of time, but we need 2x more memory

---

-> Parallelized hash function (sequential memory-hard + ROM-port-hard) <-

->      candidate passwords                                                   <-
->      v                   v                                                 <-
-> +--------+--------+--------+--------+--------+--------+--------+--------+ <-
-> |        |  core  |                                                     | <-
-> +  RAM   +H------H+                                                     + <-
-> |        | port=#    #=port                                             | <-
-> +--------+        |          yescrypt ROM 100+ GB                       + <-
-> |        | port=#    #=port                                             | <-
-> +  RAM   +H------H+                                                     + <-
-> |        |  core  |                                                     | <-
-> +--------+--------+--------+--------+--------+--------+--------+--------+ <-
->      v                   v                                                 <-
->     hashes for comparison against those being cracked (for current salt)   <-

2 hashes in parallel in the same amount of time, but we need lots of memory

---

-> Segmentation fault (core dumped) <-

In these illustrations | core | refers to any processing element capable of
(mostly) computing the target hash without having a lot of memory of its own.
We give memory to cores, sometimes sharing it.

A core may be today's usual CPU core, or it may be a SIMD (single instruction,
multiple data) unit (e.g. within a GPU CU or SM), or it may be a SIMD lane
(within a CPU core or a GPU SIMD unit), or it may be a pipeline stage (e.g.
with different hash computations' instructions interleaved on a superscalar
CPU or a GPU), or it may even be a single bit number across N-bit registers
in a register file (when we've rotated our problem and are now "bitslicing"
it)

Of course, and most importantly, a core may also be a logic circuit in an
FPGA or ASIC, but even there by a core we might also be referring e.g. to
each pipeline stage, whichever option is relevant or optimal in a given
context

---

-> Bitslicing <-

* Eli Biham, "A Fast New DES Implementation in Software", 1997
  ~100 logic gates per S-box
  "about three times faster than our new optimized DES implementation on
  64-bit computers. [...] view the processor as a SIMD computer, i.e., as
  64 parallel one-bit processors computing the same instruction."
  (It's commonly 512 now.)
* Matthew Kwan, "Reducing the Gate Count of Bitslice DES", 1997+
  87.75 (1997), 51 to 56 (1998) on average depending on available gate types
* Marc Bevand (45.5 with bit select), Dango-Chu (39.875, ditto), Roman
  Rusakov (32.875 with bit select, 44.125 without), DeepLearningJohnDoe
  (23.5 with LUT3), Sovyn Y. (22.125 with LUT3 - later in 2024, not in
  crackers yet)
* On AVX-512, bitslicing lets us eliminate 512 computed hashes in ~9 steps
  (see the sketch below)
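A minimal Python sketch of that elimination step, assuming 64-bit words (so
64 candidates per word rather than AVX-512's 512): each word packs one bit
position of all computed hashes, and each comparison step halves the expected
number of surviving candidates, so nearly all are rejected within a handful
of steps.

```python
def bitslice_eliminate(hash_bits, target, width=64):
    # hash_bits[i] packs bit i of `width` computed hashes, one per lane;
    # target is the hash being cracked. Each step keeps only the lanes
    # whose bit i matches the target's bit i, so survivors halve on
    # average per step and most die within ~log2(width) steps
    lanes = (1 << width) - 1
    survivors = lanes                     # all candidates still alive
    for i, word in enumerate(hash_bits):
        bit = (target >> i) & 1
        survivors &= word if bit else ~word & lanes
        if not survivors:                 # everyone eliminated: early out
            break
    return survivors                      # set bits = candidates to verify
```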
---

-> Wrapping up on hashing speed optimization <-

* Current most flexible password crackers - John the Ripper and hashcat -
  support hundreds of different (non-)hash types on many hardware platforms
* It takes a lot of effort to optimize each combination, so the extent of
  optimization at a given time varies across tools, (non-)hashes, platforms
* Where practical, we find ways to share optimizations across (non-)hash
  types and/or platforms - e.g. John the Ripper's shared SIMD "pseudo
  intrinsics"
* Related higher-level efforts include Aleksey Cherepanov's john-devkit and
  Alain Espinosa's fast-small-crypto, which are specialized code generators
* OpenCL abstracts much of this as well, so recent hashcat uses it even on CPU
* Tuning, including runtime on the target machine (e.g. OpenCL work sizes)

---

-> Speeds for historical Unix hashes <-

* Unix up to 6th Edition, based on M-209

```
| c/s | year      | software | hardware  | power |
| 800 | mid 1970s |          | PDP-11/70 |       |
```

* Unix 7th Edition, 25 iterations of salt-modified DES, amortized multi-salt

```
|   3.6 | 1977  |                     | VAX-11/780              |       |
|    45 | 1988  | Morris worm         | VAX 8600                |       |
|  <= 1k | ~1993 | Crack, Cracker Jack | 386DX 40 MHz with cache |       |
| 12.5k | 1998  | John the Ripper 1.5 | Pentium 133 MHz         |       |
|  214k | 1998  | John the Ripper 1.5 | Alpha 21164A 600 MHz    |       |
|  973M | 2019  | John the Ripper 1.9 | ZTEX 1.15y FPGA (2012)  | 34W   |
| 6277M | 2022  | hashcat 6.2.6       | RTX 4090 GPU            | ~450W |
| 1825M | 2024  | John the Ripper     | 2x Xeon Platinum 8488C  | ~770W |
```

---

-> Speeds for contemporary decent hashes <-

bcrypt at 32 iterations (cost 5), even though modern uses are at 256 to 4096
(cost 8 to 12), so would be ~8 to ~128 times slower

```
|   c/s | year | software            | hardware                | power |
|   6.5 | 1998 | OpenBSD 2.3 library | Pentium 133 MHz         |       |
|  22.5 | 1998 | John the Ripper 1.5 | Pentium 133 MHz         |       |
|  62.5 | 1998 | John the Ripper 1.5 | Alpha 21164A 600 MHz    |       |
|  6595 | 2013 | John the Ripper 1.8 | i7-4770K ~3.7 GHz turbo | ~84W  |
|  185k | 2017 | hashcat 3.5.0-22-.. | 8x GTX 1080 Ti GPU      |       |
|  106k | 2017 | John the Ripper     | ZTEX 1.15y FPGA 141 MHz |       |
|  119k | 2019 | John the Ripper 1.9 | ZTEX 1.15y FPGA 150 MHz | 27W   |
|  2.1M | 2019 | John the Ripper 1.9 | 18x ZTEX 1.15y + host   | 585W  |
|  ~25k | 2019 | hashcat, JtR        | Vega 64 GPU             |       |
| <250k | 2022 | hashcat 6.2.6       | RTX 4090 GPU            | <450W |
|  169k | 2024 | John the Ripper     | 2x Xeon Platinum 8488C  | ~770W |
```

---

-> Speeds for historical Windows hashes <-

```
L0phtCrack 1.5 "on a Pentium Pro 200 checked a password file with 10
passwords using the alpha character set (A-Z) in 26 hours. [...]
[note from mudge: try building the CLI version on an ultrasparc using the
compile flags in the Makefile provided - this will make these figures look
sloooooowwww ;-)]" (1997)
```

* This means 89k p/s against 10 LM hashes, which JtR improved upon shortly
* Hash Suite 3.5 by Alain Espinosa: 8360M on HD 7970, 7270M on GTX 970 (2018)
* hashcat 4.1: 4134M on HD 7970, 5250M on GTX 970 (2018) (all at 10 hashes)
* hashcat 6.2.6 vs. one LM hash (10 would be slightly slower), RTX 4090 (2022)

```
Speed.#1.........: 151.1 GH/s (6.95ms) @ Accel:256 Loops:1024 Thr:32 Vec:1
```

---

-> Speeds for contemporary Windows hashes <-

* hashcat 6.2.6 vs. NTLM, RTX 4090

```
Speed.#1.........: 288.5 GH/s (7.24ms) @ Accel:512 Loops:1024 Thr:32 Vec:8
```
* hashcat 6.2.6 vs. NTLM, 8x RTX 4090 (different rig, driver, CUDA version)

```
Speed.#1.........: 258.6 GH/s (16.22ms) @ Accel:128 Loops:1024 Thr:256 Vec:1
Speed.#2.........: 255.1 GH/s (16.40ms) @ Accel:128 Loops:1024 Thr:256 Vec:1
Speed.#3.........: 255.3 GH/s (16.43ms) @ Accel:128 Loops:1024 Thr:256 Vec:1
Speed.#4.........: 257.4 GH/s (16.26ms) @ Accel:128 Loops:1024 Thr:256 Vec:1
Speed.#5.........: 251.7 GH/s (16.64ms) @ Accel:128 Loops:1024 Thr:256 Vec:1
Speed.#6.........: 230.3 GH/s (18.27ms) @ Accel:128 Loops:1024 Thr:256 Vec:1
Speed.#7.........: 228.7 GH/s (18.37ms) @ Accel:128 Loops:1024 Thr:256 Vec:1
Speed.#8.........: 256.7 GH/s (16.35ms) @ Accel:128 Loops:1024 Thr:256 Vec:1
Speed.#*.........: 1993.9 GH/s
```

---

-> Speeds for contemporary strong hashes <-

* John the Ripper vs. Argon2, GTX 1080 (at max turbo, 180W, 2016 gaming GPU)

```
Cost 1 (t) is 3 for all loaded hashes
Cost 2 (m) is 16384 for all loaded hashes
Cost 3 (p) is 1 for all loaded hashes
Cost 4 (type [0:Argon2d 1:Argon2i 2:Argon2id]) is 0 for all loaded hashes
Trying to compute 480 hashes at a time using 7680 of 8119 MiB device memory
LWS=[32-256] GWS=[15360-15360] ([60-480] blocks) => Mode: WARP_SHUFFLE
Press 'q' or Ctrl-C to abort, 'h' for help, almost any other key for status
03091983         (?)
1g 0:00:00:28 0.03457g/s 1742p/s 1742c/s 1742C/s Dev#4:53C thriller1..d1amond
```

* 2x Xeon E5-2670 + 8x DDR3-1600 (32 threads, 230W total, 2012 server CPUs)

```
1g 0:00:01:54 0.008726g/s 437.8p/s 437.8c/s 437.8C/s 050489..010591
```

---

-> Wrapping up on speeds <-

* So far we discussed how to optimize the speed
  - of hashing
  - of matching against loaded hashes
* We mostly didn't yet discuss optimizing the speed
  - of candidate password generation
  - of duplicate candidate password avoidance or suppression
* For contemporary strong targets, per-device speeds haven't obviously
  changed much since the 1970s and are mostly within one or two orders of
  magnitude of 1000 checks per second
* At such low speeds, other optimization goals matter most (focus, targeting,
  uniqueness vs. memory requirements, ease of reasoning, feedback, workflow)
* Amdahl's law takes effect when other processing is sequential and synchronous

---

-> PASSWORD CRACKING FOCUS <-

---

```
/*
 * Warning: this program burns a lot of cpu.
 */
/*
 * Insecure - find accounts with poor passwords

Date: Tue, 29 Nov 83 18:19:32 pst
From: leres%ucbarpa@Berkeley (Craig Leres)

Insecure is something that Jef Poskanzer and I wrote to rid a local system
of an overly persistent ankle-biting adolescent.  It was a quick hack we
whipped up in just a few minutes and was never intended to be publically
distributed.

Unfortunately, I made the mistake of giving a copy to an associate at UC
Berkeley.  Apparently, he incorporated it in a security package he later
developed for use at Berkeley.  Someone else distributed it outside
Berkeley which explains why it's been publically distributed.
```

---

-> Candidate password generators (1980s) <-

```
static char *rcsid = "$Header: pwchkr.c,v 1.1 85/09/10 16:00:56 root Exp $";

 * By default, this program only checks for accounts with passwords the same
 * as the login name.  The following options add more extensive checking.
 *
 *      -w file: use the list of words contained in "file" as likely
 *               passwords.  Words in the file are one to a line.
 *      -b:      check all guesses backwards too
 *      -g:      use the Full Name portion of the gecos field to
 *               generate more guesses; also check .plan, .signature
 *               and .project files.
 *      -s:      check the single letters a-z, A-Z, 0-9 as passwords
 *      -c:      with each guess, check for all-lowercase and
 *               all-uppercase versions too.
 *      -d:      check the doubling of the username
```
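A minimal sketch of this style of per-user guess generation, roughly
mirroring the pwchkr options above plus the "gcperm"-style name combinations
quoted later in this talk (the helper name and classic /etc/passwd GECOS
field layout are assumptions):

```python
def user_guesses(login, gecos):
    # Derive per-user candidates the way 1980s checkers did: the login
    # itself, doubled (-d), reversed (-b), case variants (-c), and words
    # from the GECOS "Full Name" field (-g)
    words = [login, login * 2, login[::-1]]
    full_name = gecos.split(",")[0]
    parts = full_name.split()
    words += parts
    if len(parts) >= 2:
        words.append(parts[0][0] + parts[-1])  # "Alec Muffett" -> "AMuffett"
        words.append(parts[0] + parts[-1][0])  # "Alec Muffett" -> "AlecM"
    out = []
    for w in words:
        out += [w, w.lower(), w.upper()]
    return list(dict.fromkeys(out))            # de-dup, keep order

print(user_guesses("amuffett", "Alec Muffett,,,"))
```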
---

-> Candidate password generators (early 1990s) <-

```
"The first pass that Crack makes is over the [data] gleaned from the users'
password field.  In my test file, this gets about 4% of the passwords (out
of a total of 15% guessed).  This pass also completes very quickly, working
as it does from a very small amount of data, but one which is very
frequently used for passwords.

The first sweep of the second pass, consisting of lowercase dictionary
words, gets about another 5% of the passwords.  The length of the first
sweep depends on how much CPU and how many dictionaries I supply, but using
the Ultrix /usr/dict/words and my bad_pws.dat, over 4 CPUs, it doesn't take
much more that a few hours.

For the further sweeps, the percentages cracked per sweep tail off, 2%, 1%,
0.5%...  But they are the people with fairly exotic passwords, and it's only
common sense that that they will take some time to crack." - Alec Muffett
```

---

-> Candidate password generators (mid 1990s) <-

Excerpts from the Crack user manual by Alec Muffett:

```
"Crack 5.0 supports the notion of dictionary groups - collations of words
taken from a selection of raw text dictionaries (with words given, one per
line) permitting the user to group her dictionaries into "most-likely",
"less-likely" and "least-likely" to generate a successful password guess."

"When Crack starts up [...] two other dictionary groups are created:
"gecos" and "gcperm".  The "gecos" group contains only words directly
derived from the information held in the password file; the "gcperm" group
holds words which are mechanically created by permuting and combining parts
of words held in the password file (eg: "Alec Muffett" becomes "AMuffett",
"AlecM", etc)."

"When the cracker is running, it [is] taking successive mangling rules [...]
and applying them to the cited dictionary group"
```

---

-> Mangling rules <-

```
"These rules are macro commands, one per line, which specify patterns and
actions that are applied to words from a dictionary in order to generate a
series of guesses.  For instance, one such rule:

/ese3u

...will select words which contain the letter "e", replace it with the
digit "3", and force the rest of the word to uppercase."
```

(Can also be "/e se3 u".)

* Introduced in Crack 4.0 (Nov 1991), maintained until Crack 5.0 (Dec 1996)
* Adopted and extended in John the Ripper (1996+), InsidePro's tools, hashcat
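A toy interpreter for just the three commands used in the /ese3u rule quoted
above (/x reject-unless-contains, sxy substitute, u uppercase); real rule
engines in Crack, John the Ripper, and hashcat support dozens of commands
and run far faster:

```python
def apply_rule(rule, word):
    # Supports only: /x (reject word unless it contains x),
    # sxy (replace all x with y), u (uppercase); returns None on reject
    i = 0
    while i < len(rule):
        c = rule[i]
        if c == "/":
            if rule[i + 1] not in word:
                return None
            i += 2
        elif c == "s":
            word = word.replace(rule[i + 1], rule[i + 2])
            i += 3
        elif c == "u":
            word = word.upper()
            i += 1
        elif c == " ":
            i += 1
        else:
            raise ValueError("unsupported command: " + c)
    return word

print(apply_rule("/ese3u", "secret"))  # -> "S3CR3T"
print(apply_rule("/ese3u", "password"))  # -> None (no "e" to substitute)
```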
---

-> Mangling rules evolution <-

* Specific to John the Ripper (not adopted by InsidePro and hashcat)
  - Preprocessor
  - Rule reject flags (e.g. skip a rule if the hash is case-insensitive)
  - Word pair commands (used on concatenated first and last names only)

```
# johnsmith -> John Smith, John_Smith, John-Smith
-p-c 1 <- (?a c $[ _\-] 2 (?a c
```

* Other extra commands in John the Ripper
* Same and different extra commands in InsidePro's PasswordsPro & Hash Manager
* Same and different extra commands in hashcat (started with PasswordsPro's)
* hashcat "World's first and only in-kernel rule engine" (OpenCL and CUDA)
* hashcat compatibility mode in John the Ripper

---

-> Mangling rulesets <-

* Old John the Ripper default rules were hand-written, some tuned, some not
* KoreLogic rules targeting users' coping with password policies (2010)
  - Rewritten to use the preprocessor (much shorter, expands to ~7M rules)
* InsidePro's PasswordsPro rules imported by the hashcat community, many new
  rulesets created by the community over the years (hand-written, generated)
  - hashcat best64 contest - come up with the most efficient 64 rules
* OneRuleToRuleThemAll - best 25% of all hashcat rules (~52k rules)
  - OneRuleToRuleThemStill - optimized further (~49k rules)
* Passphrase rules for hashcat and John the Ripper (some expect special input)
* John the Ripper "All" ruleset (using nested include directives) is ~11M

---

-> Mangling ruleset tuning <-

* Simon Marechal, "Automatic mangling rules generation", 2012 specifically
  for John the Ripper, but didn't result in anything integrated in the project
  - Original implementation of rulesfinder abandoned, project re-born in 2020
* John the Ripper PerRuleStats feature added and used to re-order ~7600 line
  preprocessor output from certain hand-written rules, now the new defaults
  (2022)

```
"[...] decreasing weighted score, which considers number of guesses both per
rule and per password candidates tested.  This ordering is good to use on
fast to medium speed hashes.  Generated [...] on pwned-passwords-sampler
output for HIBP v8 (100M non-unique, ~54M unique hashes), counting unique
guesses.  Manually split into best (40%), worse (next 40%), and worst (final
20%), which achieve, respectively, most of the cracks (97%+), some more
(2%+), and hardly any."
```

---

-> Targeting <-

* When deriving candidate passwords from users' information, check those
  only against the same users' hashes
  - ... and against other hashes with the same salt since it's almost free
* Cracker Jack's inconvenient "single crack" mode (1993)
  - Run a separate program to create a custom format wordlist, then use it
* John the Ripper's convenient "single crack" mode (1996)
  - Everything built-in and automatic
  - Also checks successfully cracked passwords against all hashes
* Markus Duermuth et al. OMEN+ Markov model with personal information (2013)
* hashcat's Association mode (2020)
  - changes.txt: [...] attack hashes from a hashlist with associated "hints"

---

-> Duplicate candidate passwords <-

Consider the following entries on a common passwords list used as a wordlist:

```
password
password1
Password
```

Then the rule "l $1" (lowercase and append the digit 1) will produce:

```
password1
password11
password1
```

This has a duplicate, and besides, "password1" is a duplicate with the
original wordlist (the output of a likely preceding no-op rule)
---

-> Duplicate candidate password avoidance <-

* Crack's rules support many "reject the word if/unless ..." commands, which
  can be used to avoid producing most effectively-duplicates
* John the Ripper added more word reject commands and also rule reject flags,
  and its default ruleset makes extensive use of all of these (not trivial)
* With input wordlists constrained to lowercase-only, this is very effective;
  with common passwords as input, not too bad (e.g. 89.6% unique for default
  wordlist and ~3k rules, which the duplicate candidate password suppressor
  at 256 MiB RAM usage improves to 94.3%, producing a ~50 GB output stream)
* hashcat rulesets typically do not use rejects much or at all, resulting in
  many more duplicates (e.g. 59.7% unique for the same wordlist as above with
  top 3k rules from OneRuleToRuleThemStill in John the Ripper's compatibility
  mode, which the 256 MiB RAM duplicate suppressor improves to 80.6%)

---

-> Duplicate candidate password suppression <-

* John the Ripper "single crack" uses small per-salt ring buffers (along with
  hash tables for fast lookup) to detect and suppress recently seen candidates
* hashcat "Supports password candidate brain functionality" (2018)
  - Network client/server architecture - can suppress duplicates across jobs
  - Requires (easy) manual setup even for local single-job use
  - Per documentation, the server has a performance bottleneck at 50k p/s
  - Writes fast hashes of password candidates to disk, no limit, no eviction
* John the Ripper duplicate candidate password suppressor (2022)
  - Currently per-process - won't suppress duplicates across processes
  - Enabled by default when mangling rules are in use, auto-disables when hit
    rate is low and other processing speed is high, memory use can be tweaked
  - Speed of a few million p/s, but sequential and synchronous with the rest
  - Opportunistic - probabilistic filter of pre-specified size, has eviction

---

-> Duplicate candidate password suppressor <-

John the Ripper's duplicate candidate password suppressor uses a hash table
to store fingerprints (other fast hashes) of items.  This is similar to a
cuckoo filter, except that it's degraded from having 2 to only 1 potential
bucket for each item (so not cuckoo).  The buckets are currently 8 items
wide.  When the bucket is full, we simply evict/replace a fingerprint (from
the second half).

-> (index, fp) = two\_fast\_hashes(candidate)                      <-
->  |                +--------------------+                        <-
->  |  LOCKED ADD-ONLY                    v         EVICTABLE      <-
->  | 0------1------2------3------4------5------6------7------+    <-
->  | | fp   | fp   | fp   | fp   | fp   | fp   |      |      |    <-
->  | 1------+------+------+------+------+------+------+------+    <-
->  +>| fp   | fp   | fp   |      |      |      |      |      |    <-
->    2------+------+------+------+------+------+------+------+    <-
->    | fp   | fp   | fp   | fp   | fp   | fp   | fp   | fp   |    <-
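A toy sketch of this data structure: one fast hash picks the bucket, another
is the fingerprint stored there, and a full bucket evicts only from its
second half. The hash choices (CRC-32 and Adler-32) are stand-ins, not what
the real optimized implementation uses:

```python
import zlib

class Suppressor:
    # Buckets of 8 fingerprints; first half add-only, second half evictable
    def __init__(self, n_buckets=1 << 16):
        self.n = n_buckets
        self.buckets = [[] for _ in range(n_buckets)]

    def seen(self, candidate: bytes) -> bool:
        index = zlib.crc32(candidate) % self.n    # selects the bucket
        fp = zlib.adler32(candidate)              # fingerprint stored there
        bucket = self.buckets[index]
        if fp in bucket:
            return True             # probably a duplicate: suppress it
        if len(bucket) < 8:
            bucket.append(fp)       # add-only while there is room
        else:
            # evict a pseudo-randomly chosen slot in the second half
            bucket[4 + zlib.crc32(candidate, 1) % 4] = fp
        return False
```

Fingerprint collisions suppress a few non-duplicates, and eviction lets old
candidates reappear, which is why the suppressor is "opportunistic".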
---

-> Wordlist de-duplication tools <-

The task is to remove duplicate lines without changing the order (which may
have been optimized in some way), so without sorting, and without requiring
memory for the whole output nor disk storage beyond that for the output

* John the Ripper 1.6+ bundled "unique" (1998) - sketched below
  - When output exceeds memory, re-reads the output file written so far to
    identify duplicates between what's in memory and what's already output
* duplicut (2014)
  - Multiple threads "are only used when the file is huge enough to chunk"
* hashcat-utils rli (2015)
* Rling, "a faster multi-threaded, feature rich alternative to rli" (2020)
  - by Waffle (author of the MDXfind password cracker)
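A minimal sketch of the approach described for "unique": buffer what fits in
memory, and when the buffer fills up, drop from it whatever the already
written output contains before appending the rest (the real tool uses a hash
table and block-wise rescans rather than a linear re-read):

```python
def unique(infile, outfile, max_buffer=1_000_000):
    # Order-preserving de-duplication with bounded memory: lines are
    # buffered up to max_buffer, then checked against our own output
    # so far (re-read from disk) before being flushed
    with open(infile) as fin, open(outfile, "w+") as fout:
        buffer = dict()                    # insertion-ordered set of lines

        def flush():
            fout.seek(0)
            for line in fout:              # re-read output written so far
                buffer.pop(line, None)
            fout.seek(0, 2)                # back to end of file, append
            fout.writelines(buffer)
            buffer.clear()

        for line in fin:
            buffer[line] = None            # also de-dups within the buffer
            if len(buffer) >= max_buffer:
                flush()
        flush()
```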
---

-> Wordlists <-

* Historically, password crackers literally used lists of dictionary words
* Tiny public common password lists appeared in the 1980s, e.g. from the
  Morris worm
* A moderately longer ordered list was maintained in John the Ripper (1997+)
* RockYou leak (32.6M plaintext passwords, 14.3M unique) changed a lot (2009)
  - A large sample of passwords that are not biased to what was crackable
  - The standard for password security tools' training, testing, and usage
* Lists of dictionary words were and still are used as well (e.g. Openwall
  wordlists collection, 2003, which merges and credits lots of sources)
* Additional published wordlists were scraped e.g. from Wikipedia (Sebastien
  Raveau, 2009, 2012) and Project Gutenberg books (CrackStation, 2010)

---

-> Password cracking communities <-

* Hobbyist community forums such as InsidePro's, hashkiller.co.uk, hashes.org
  (all of which are now defunct) collected uploads of hashes to crack and the
  plaintexts members cracked so far, typically without identifying the source
  and without the usernames
* Files from the above were also retrieved by the defensive security community
* Have I Been Pwned (HIBP) or "Pwned Passwords are hundreds of millions of
  real world passwords previously exposed in data breaches", published by
  Troy Hunt in the form of SHA-1 and NTLM hashes (re-hashed from plaintexts)
  along with numbers of occurrences in breaches
  + passwdqc is able to proactively check new passwords against HIBP offline
* It is still possible to obtain the hashes.org plaintext lists elsewhere,
  re-crack almost all of HIBP and generate an ordered breached passwords list

---

-> Wordlist optimization <-

HIBP v8 being at 847M unique passwords (from a few billion accounts) is large
(although perhaps not the largest collection in existence).  RockYou is
arguably cleaner.  Both are fair play for security researchers.

John the Ripper's current password.lst (2022, generated in time for the
mangling ruleset tuning) is "based on Pwned Passwords v8 (HIBP) 100+ hits
overlap with RockYou, further filtered to require 97+ hits on top of
RockYou's.  These criteria are such that a password used by just one person
many times is very unlikely to be included."

* A focused common passwords list of ~1.8M lines and ~15 MB
* Obviously ethical to redistribute and highly effective

---

-> Probabilistic candidate password generators (mid 1990s) <-

* A probabilistic password generator is a "technique for generating candidate
  passwords from a statistical model" (Simon Marechal, 2012)
* Novel algorithm to search the keyspace exhaustively and without duplicates
  while walking the 2D surface of Charset \*\* Length uphill (1995)
* John the Ripper 1.0 introduced "incremental mode" (1996)

```
[Incremental:Alpha]
CharCount = 26
MinLen = 1
MaxLen = 8
CharsetB = smcbtdpajrhflgkwneiovyzuqx
CharsetM = eaiornltsuchmdgpkbyvwfzxjq
CharsetE = erynsatldoghikmcwpfubzjxvq
```
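A simplified sketch of what such per-position character sets imply, assuming
(beyond what the config above specifies) a plain nested enumeration:
CharsetB/M/E order beginning/middle/end characters by frequency, so earlier
combinations are more plausible. The real incremental mode instead walks all
lengths and character counts in expected-success order.

```python
from itertools import islice, product

CHARSET_B = "smcbtdpajrhflgkwneiovyzuqx"  # first character, most likely first
CHARSET_M = "eaiornltsuchmdgpkbyvwfzxjq"  # middle characters
CHARSET_E = "erynsatldoghikmcwpfubzjxvq"  # last character

def candidates(length):
    # Naive enumeration in charset order: not John the Ripper's actual
    # search order, but it shows how per-position statistics make
    # earlier guesses more plausible than later ones
    if length == 1:
        yield from CHARSET_B
        return
    for b in CHARSET_B:
        for mid in product(CHARSET_M, repeat=length - 2):
            for e in CHARSET_E:
                yield b + "".join(mid) + e

print(list(islice(candidates(4), 5)))
# ['seee', 'seer', 'seey', 'seen', 'sees']
```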
---

-> Probabilistic candidate password generators (later 1990s) <-

* John the Ripper 1.0's may retroactively be called a 0th-order Markov chain
* A further 0th-order variation added per-length and per-position statistics

```
Charset11 = ajyfioqxdehmnrstlcupbgkwvz
Charset21 = mdjpagetbrnsckyfilwhuoqvzx
Charset22 = olstabegrkjdhnvwcmpfiquxyz
Charset31 = dacjmbrtpslknfeghowqvzxiuy
Charset32 = aoeisumctgdblrfjpnvhwkxyzq
Charset33 = msnctdxepghlywabrjikuzofvq
```

* Star Cracker by The SOrCErEr went beyond 0th-order (late 1996)
* John the Ripper 2nd-order along with per-length and per-position (late 1996)

---

-> Probabilistic candidate password generators (late 1990s) <-

* Training on previously cracked passwords (reading john.pot) as a feature in
  John the Ripper (obscure releases in late 1996, popular 1.4 in early 1997)

```
-makechars:<file>  make a charset, <file> will be overwritten
```

* John the Ripper 1.5 upgraded the surface to walk uphill on from 2D to 3D,
  and the height from keyspace portion to expected cracks per candidate (1998)
* The 3 components are: current virtual character count, current length, and
  current position of the fixed virtual character
  - The 3rd component was internal to the earlier algorithm, now exposed
  - "Virtual character" means its index into a sorted string, which varies
    depending on up to 2 preceding characters (2nd-order Markov chain)
* The finer granularity is later helpful for parallel/distributed processing

---

-> Vowel/consonant patterns <-

* John the Ripper 1.0 to 1.4 had

```
Wordlike        Set to 'Y' to enable a simple built in word filter
                (words with more than one vowel in a row, or more
                than two non-vowels in a row, will get filtered out).
```

This was needed along with 1.0's 0th-order Markov (non-)chain, which was
still available as an option in 1.4 for memory saving (2nd-order required
4 MB RAM) and for possible manual specification of character sets per
position (similar to future "mask mode").

* John the Ripper 1.5 no longer had 0th-order at all, so the above was dropped
* Specifically exploiting vowel/consonant patterns is a recurring idea

---

-> Custom candidate password generators (late 1990s) <-

* John the Ripper 1.3+ "external mode" (late 1996 or early 1997)
  - Write your own candidate password generator or filter in a C-like
    language in the configuration file
  - Compiles to a stack-based VM implemented via threaded code
  - Optimizations: top of stack cache, multi-push, GCC "Labels as Values"
  - Optional JIT to 32-bit x86 (1997), abandoned in the rewrite for 1.5 (1998)
* Incremental mode training filters (initial use case)
* Password policy matching filters
* Recovery of partially lost passwords (mostly obsoleted by mask mode)
* Exploits for Strip, DokuWiki, KDEPaste, Awesome Password Generator
* Date/time, keyboard walks and other sequences, any same-character repeats,
  passwords with any few different characters (now built-in Subsets mode)
* Searches of short valid Unicode strings (auto-generated from Unicode spec)

---

-> Custom candidate password patterns (2000s+) <-

Crackers that didn't have probabilistic candidate password generators instead
added features to focus dumber exhaustive searches on reasonable sub-spaces

* InsidePro introduced the mask syntax e.g. ?u?l?l?l20?d?d (sketched below)
  - Reuses the character class notation from Crack's word reject rules
  - Enumerates all strings matching the mask e.g. Aaaa2000 to Zzzz2099
* hashcat adopted the syntax, later extended the implementation to use Markov
* John the Ripper later also adopted this syntax as mask mode, extending it
  with constructs similar to the rule preprocessor's e.g. ?u?l?l?l20[0-2]?d
* Hybrid modes add a mask on top of a smarter and slower generator (2010s)
  - hashcat modes append or prepend a mask (on device) to wordlist (from host)
  - John the Ripper (on device) mask ?w refers to another (host) mode's "word"
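A minimal mask expander for the placeholders used in the example above (?u,
?l, ?d plus literal characters); real implementations run on device and order
characters by Markov statistics rather than alphabetically:

```python
from itertools import product
import string

CLASSES = {"u": string.ascii_uppercase,
           "l": string.ascii_lowercase,
           "d": string.digits}

def expand(mask):
    # Parse "?u?l?l?l20?d?d" into per-position character sets, then
    # enumerate the whole sub-space (26*26*26*26*10*10 = ~45.7M here)
    sets, i = [], 0
    while i < len(mask):
        if mask[i] == "?":
            sets.append(CLASSES[mask[i + 1]])
            i += 2
        else:
            sets.append(mask[i])          # literal character
            i += 1
    for combo in product(*sets):
        yield "".join(combo)

gen = expand("?u?l?l?l20?d?d")
print(next(gen))  # "Aaaa2000"
```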
---

-> Probabilistic candidate password generators (early 2000s) <-

* Dawn Xiaodong Song, David Wagner, and Xuqing Tian, "Timing Analysis of
  Keystrokes and Timing Attacks on SSH", 2001
  "for passwords that are chosen uniformly at random with length of 7 to 8
  characters, [...] can reduce the cost of password cracking by a factor
  of 50"
  "we model the relationship of latencies and character sequences as a
  Hidden Markov Model.  We extend the standard Viterbi algorithm to an
  n-Viterbi algorithm that outputs the n most likely candidate character
  sequences."
* Code never released, third-party n-Viterbi implementations appeared later
* Independently, our "SSH Traffic Analysis" (with Dug Song, 2001) allowed one
  to infer sudo, etc. password lengths from packet traces, and
  countermeasures for that were added, but it took until OpenSSH 9.5 (2023)
  for the timings

---

-> Probabilistic candidate password generators (mid 2000s) <-

* Arvind Narayanan and Vitaly Shmatikov, "Fast Dictionary Attacks on
  Passwords Using Time-Space Tradeoff", 2005
  "Our first insight is that the distribution of letters in easy-to-remember
  passwords is likely to be similar to the distribution of letters in the
  users' native language.  Using standard Markov modeling techniques from
  natural language processing, this can be used to dramatically reduce the
  size of the password space to be searched.  Our second contribution is an
  algorithm for efficient enumeration of the remaining password space.  This
  allows application of time-space tradeoff techniques, limiting memory
  accesses to a relatively small table of "partial dictionary" sizes and
  enabling a very fast dictionary attack."
  "We note the similarity of the ideas used in this algorithm to the
  well-known Viterbi algorithm from speech processing"

---

-> Probabilistic candidate password generators (circa 2010) <-

* Simon Marechal, "Etat de l'art sur le cassage de mots de passe" ("State of
  the art in password cracking"), 2007
  - Narayanan and Shmatikov's work re-applied to classical password cracking
* John the Ripper jumbo "Markov mode" contributed by Simon Marechal, extended
  by Frank Dittrich and magnum (2007-2012+)
* 1st-order Markov chain, no per-length and per-position separation
* Outperforms "incremental mode" on certain tests, but requires advance
  choice of attack duration (via minimum and maximum strength of passwords)
* Supports parallel and distributed processing (limiting a node's sub-range)
* Simon Marechal, "Probabilistic password generators", 2012
  - Comparison of many probabilistic models (including 2nd-order variations)

---

-> Probabilistic candidate password generators (2010s) <-

* Matt Weir et al., "Password Cracking Using Probabilistic Context-Free
  Grammars", 2009, or Pretty Cool Fuzzy Guesser (PCFG)
  "new method that generates password structures in highest probability
  order.  We first automatically create a probabilistic context-free grammar
  based upon a training set of previously disclosed passwords.  This grammar
  then allows us to generate word-mangling rules, and from them, password
  guesses"
* Markus Duermuth et al., "OMEN: Faster Password Guessing Using an Ordered
  Markov Enumerator", 2013
  "Narayanan et al.'s indexing [not] in order of decreasing probability.
  [We] enumerate passwords with (approximately) decreasing probabilities.
  On a high level, our algorithm discretizes all probabilities into a number
  of bins, and iterates over all those bins in order of decreasing
  likelihood."  (3rd-order)

---

-> Candidate passphrase generators (mostly 2010s) <-

* Wordlist rules appending/prepending specific embedded words
* Trivial word-combining Perl scripts posted to john-users (2006)
* hashcat Combinator mode (2 words from 2 lists, not probabilistic)
* PRINCE (PRobability INfinite Chained Elements) by atom (2014) - sketched
  below
  - Sorts for increasing combined length, otherwise not probabilistic
  - hashcat project's princeprocessor
  - Kindly also contributed to John the Ripper, became a built-in mode
* Passphrase mangling rulesets like for wordlists, but expect phrases (2019)
* Passphrase lists e.g. extract all 2- to 6-word sequences from Project
  Gutenberg books, sort from most to least common (2021, unreleased)
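A toy PRINCE-style generator: chain words from a single wordlist and emit
all chains ordered by increasing combined length. The real princeprocessor
streams its (potentially unbounded) output instead of sorting everything up
front, and supports element-count and length limits:

```python
from itertools import product

def prince(words, max_words=3):
    # Build chains of 1..max_words words, then order the whole stream so
    # that shorter combined lengths come first
    chains = []
    for n in range(1, max_words + 1):
        chains += ["".join(c) for c in product(words, repeat=n)]
    chains.sort(key=len)       # stable: preserves chain order per length
    yield from chains

for guess in prince(["correct", "horse", "battery"], max_words=2):
    print(guess)
```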
---

-> Probabilistic candidate passphrase generators (2010s+) <-

* Probabilistic candidate password generators also happen to generate phrases
  if trained on such input (or just on a real-world mix of passwords/phrases)
  - PCFG fares better than per-character Markov chains
* "Phraser is a phrase generator using n-grams and Markov chains to generate
  phrases for passphrase cracking" in C# for Windows (2015)
* RePhraser "Python-based reimagining of Phraser using Markov-chains for
  linguistically-correct password cracking" (2020)
  - Also includes related hand-written and generated rulesets
* What about middle ground (e.g. syllables, including some followed by
  space)?  See the sketch below.
  - e.g. extract all substrings of 2+ characters, sort from most to least
    common, take top ~100, map them onto indices along with single
    characters, train/use existing probabilistic candidate password
    generators, map back
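A sketch of that tokenization idea under hypothetical choices (top 100
substrings of length 2 to 4, greedy longest-match mapping): frequent
multi-character fragments get single token indices, so an existing
per-character generator trained on the index sequences can learn
syllable-like structure.

```python
from collections import Counter

def build_tokens(training, top=100):
    # Count all substrings of 2 to 4 characters and keep the most common;
    # single characters always remain available as fallback tokens
    counts = Counter()
    for pw in training:
        for n in (2, 3, 4):
            for i in range(len(pw) - n + 1):
                counts[pw[i:i + n]] += 1
    fragments = [s for s, _ in counts.most_common(top)]
    chars = sorted({c for pw in training for c in pw})
    return fragments + chars             # token index = position in list

def tokenize(pw, tokens):
    # Greedy longest-match mapping of a password onto token indices; an
    # existing probabilistic generator is then trained on these sequences
    # (assumes pw's characters all appeared in the training set)
    out, i = [], 0
    while i < len(pw):
        for n in (4, 3, 2, 1):
            frag = pw[i:i + n]
            if frag in tokens:
                out.append(tokens.index(frag))
                i += n
                break
    return out
```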
---

-> Probabilistic candidate password generation with neural networks (2010s+) <-

* William Melicher et al., "Fast, Lean, and Accurate: Modeling Password
  Guessability Using Neural Networks", 2016
  - Recurrent neural network (RNN) predicts the next character, no duplicates
  - 60 MB model outperforms other generators, but apparently was too slow to
    actually go beyond 10 million candidates, so that is only simulated
  - 3 MB performs almost as well, takes ~100 ms per password in JavaScript
* Generative Adversarial Networks (GAN) produce duplicates (~50% at 1 billion)
  - "PassGAN: A Deep Learning Approach for Password Guessing" (2017)
  - "Improving Password Guessing via Representation Learning" (2019)
  - "Generative Deep Learning Techniques for Password Generation" (2020)
    David Biesner et al., VAE, WAE, fine-tuned GPT2 - maybe currently best?
  - "GNPassGAN: Improved Generative Adversarial Networks For Trawling Offline
    Password Guessing" "guessing 88.03% more passwords and generating 31.69%
    fewer duplicates" than PassGAN, which had already been outperformed (2022)

---

-> Candidate password generator combinations <-

* Different generators produce some unique and some overlapping candidates
  - It is desirable to use multiple generators and suppress cross-duplicates
  - In practice so far, it is most common to use multiple, but not suppress
  - John the Ripper's default invocation does suppress duplicates between its
    wordlist and incremental mode passes; hashcat's brain may do similar, too
  - Another way to suppress is by exhausting easy patterns (or planning to),
    then excluding them from complex runs (similar to policy matching)
    - Maybe even at the expense of the most optimal order for some runs
* Generators may be optimized for usage along with some other ones
  - e.g. probabilistic ones are generally to be used after some wordlist runs
  - training a probabilistic generator to perform best on its own means it
    may be overfit (mimic a wordlist) and crack fewer passwords beyond the
    wordlist
* Comparisons often fail to take this into account, but they should

---

-> Workflow <-

* Unfortunately, this mostly did not fit in this talk, but it is very
  important
* Best results are achieved by using multiple approaches in multiple steps
  - Beyond usage of multiple candidate password generators
  - Cracking progress so far should inform further actions
  - Cracked passwords so far should be used as a wordlist and for re-training
* Workflow automation / job management tools exist
  - Functionality overlaps with distributed processing and team coordination
    - e.g. in contests and red teams' work
  - Crack had some of this bundled - its scripts split the workload and could
    run jobs over rsh
  - Tools like Hashtopolis (nee Hashtopussy) by s3inlc take this much farther
* Writeups from contests starting with KoreLogic's Crack Me If You Can (2010+)

---

-> Future <-

* Speed
  - Obvious: larger and higher-clocked CPUs, GPUs, FPGAs
  - Major: publicly available ASICs, open hardware FPGAs (or both on one die)
  - Minor: further optimizations (e.g. bitsliced Lotus/Domino coming soon)
* Focus
  - Better passphrase support (tools, datasets), arbitrary tokenization
  - Further neural networks, tackling the duplicates problem of generative NNs
    - Meanwhile, publicly release pre-generated and pre-filtered output
  - Application of NNs for targeting (scraping and training on user data)
* Features
  - More (non-)hashes; yescrypt ROM support; usage of NVIDIA unified memory
  - Easy distributed processing (easy setup, dynamic re-assignment, security)
  - Ease of use, including by one-time end-users - UI or LLM guiding the user

---

-> Takeaways <-

Password cracking is

* simple on the surface, with a low barrier to entry and a gradual learning
  curve

but it also is

* serious computer science and engineering, with non-trivial social aspects
* still an evolving and highly competitive field welcoming new contributors

* Efficiency can mean win or lose, rich or poor
* Like with other offensive security fields, new techniques and results
  inform the design and parameters of new defenses
* Public availability helps level the playing field

---

-> Contact information and credits <-

-> e-mail <-
-> Solar Designer <-

-> website <-
-> https://www.openwall.com <-

-> Twitter <-
-> @solardiz @Openwall <-

Thanks to CIQ, the primary corporate sponsor of Rocky Linux,
for encouraging me to give this talk

https://ciq.com