From owner-freebsd-hackers Sat Sep 27 11:25:25 1997 Return-Path: Received: (from root@localhost) by hub.freebsd.org (8.8.7/8.8.7) id LAA05853 for hackers-outgoing; Sat, 27 Sep 1997 11:25:25 -0700 (PDT) Received: from skynet.ctr.columbia.edu (skynet.ctr.columbia.edu [128.59.64.70]) by hub.freebsd.org (8.8.7/8.8.7) with SMTP id LAA05830 for ; Sat, 27 Sep 1997 11:25:15 -0700 (PDT) Received: (from wpaul@localhost) by skynet.ctr.columbia.edu (8.6.12/8.6.9) id OAA13926 for hackers@freebsd.org; Sat, 27 Sep 1997 14:25:28 -0400 From: Bill Paul Message-Id: <199709271825.OAA13926@skynet.ctr.columbia.edu> Subject: Idea for a software licensing scheme To: hackers@freebsd.org Date: Sat, 27 Sep 1997 14:25:26 -0400 (EDT) X-Mailer: ELM [version 2.4 PL24] Content-Type: text Sender: owner-freebsd-hackers@freebsd.org X-Loop: FreeBSD.org Precedence: bulk I'm hoping some of the more mathematically astute out there can answer a question for me. A while ago, there was a discussion on -hackers about possibly creating a software licensing system for FreeBSD as an incentive to commercial software vendors to port their code. Well, the Diffie-Hellman patent expired earlier this month, and that started me thinking about a way to possibly use D-H key pairs as part of a licensing scheme. I _think_ I've come up with a viable way to do this and have even written some code to show that it works. My main goal is to come up with a scheme that makes it next to impossible for an 'attacker' to generate his own license codes even knowing the algorithms and some of the values involved. But I'm rotten at math and I'm uncertain that my scheme is as foolproof as I think it is. Here's how it works. First of all, consider two parties that want to participate in D-H key-exchange. The first thing the two parties have to do is agree on two special values. The first value, p, is some large prime number. The second value, g, is some number larger than 2 and less than p. The value p is called the modulus and g is the root. Now each party independently chooses a number, X, to be their secret key. Each party keeps his secret key to himself and never reveals it to the anyone. We'll call the first party's secret key X1 and the second party's secret key X2. Finally, party one and party two derive public keys (Y1 and Y2) from their secret keys (X1 and X2) using the following computations: Y1 = g ^ X1 % p Y2 = g ^ X2 % p Now party one and party two can exchange their public keys (Y1 and Y2). The trick is that Y1 and Y2 are related to their corresponding secret keys X1 and X2 in such a way that the following holds true: s = Y1 ^ X2 % p s = Y2 ^ X1 % p In other words, each party is now able to compute a session key, s, which is essentially a shared secret. The magic is that both parties can arrive at the same shared secret value without having to exchange it over a possibly unsecured communications channel where it could be intercepted. For a third party to learn this shared secret, they would have to compute the other two parties' secret keys, which is very hard to do even if you know, p, g, and a set of public keys. Okay, that's how D-H works. Now here's my idea. Let's say I'm a software vendor. I choose two values, p and g. I keep 'g' a secret and never reveal it to anyone. Now let's say I have an application, 'a', that I want to protect. I choose a secret key for the application called Xa. This key can be anything really; in my scheme I use an MD5 hash of the program name, vendor name, program version, and some random string. From the applications' secret key, Xa, I derive the application's secret key, Ya using the following computation: Ya = g ^ Xa % p I then embed the values for Xa, Ya and the modulus, p, into the application. Now I want to make a license authorization file for this application so that some customer of mine can run it. The file contains certain pertinent information: name of the feature I want to activate, number of concurrent users, expiration date, hostid, and so forth. All of this information is hashed together to generate a value which becomes what I call the license secret key, Xl. (Again, I used an MD5 hash for this.) The authorization passcode is actually the license public key Yl, which is generated like this: Yl = g ^ Xl % p I then include Yl in the license file as the activation code and ship the whole thing off to the customer. Now for the verification part. When the application program runs, the license checking code reads the license file and regenerates the license secret key, Xl, by computing the MD5 hash of all the license information (feature name, number of users, etc...). It also reads the authorization passcode and considers that to be Yl. So now the code has 5 values: Xa, Ya, Xl, Yl and p. It performs the following computations: s1 = Ya ^ Xl % p s2 = Yl ^ Xa % p This yields two 'session keys,' s1 and s2. If s1 == s2, then the code considers the license to be valid and the program runs, otherwise it signals an error and bombs. The logic is this: the only way s1 and s2 can come out being the same is if the license public key, Yl, and the application public key, Ya, were both generated using the same Diffie-Hellman parameters, g and p. But I, the vendor, keep g a secret. (Unfortunately, you can't keep p a secret since the application needs it to compute the session keys.) This means that only I, the vendor, can derive a license public key (Yl) that will satisfy the equality. Now comes the part I'm not sure about. In theory, it is possible for an 'attacker' to obtain values for p, Xl and Yl. My question is: can the attacker use these values to compute g? In other words, is it very hard for someone to calculate g even knowing p, Xl and Yl, or is there some mathematical shortcut that I'm just too stupid to see? If an attacker can calculate g without too much trouble, then the whole scheme falls apart because the attacker will be able to generate his own authorization passcode for any license parameters he chooses. You might be wondering what there is to gain by using a system like this. The answer is that it avoids (I hope!) a problem present in certain other schemes that use what I call 'checksumming' to verify license passcodes. With certain kinds of license management software, the passcode is essentially a hash or checksum of the other information in the file which has been encryped using keys that only the vendor knows. The problem is that in order for the application to verify that the license is valid, it must contain the same keys. To verify the passcode, the license software embedded in the application will read the information from the license file, generate its own hash value and encrypt it using its own copies of the encryption keys. The result is its own version of the authorization passcode which it can check against the passcode in the license file. If the code in the file does not match the code computed by the application, the program bombs. The problem is that the entire security of the scheme relies on the assumption that nobody will be able to trick the application into revealing the code that it has computed internally. This is a big mistake. Say for example than an attacker changes the expiration date in his license file. The application will read the information from the file, compute what it believes should be the correct authorization code and then compare that the code supplied in the license file. The values won't match of course, but supposing the attacker is able to extract the computed (and correct!) authorization code from the program's address space. Now all he has to do is put this code in the license file in place of the old one, and presto! he has a perfectly valid license that expires whenever he wants. The irony here is that licensing software based on this kind of scheme may include all sorts of special hash algorithms, strong encryption, and any number of keys, yet the whole mess can be easily defeated without knowing anything about cryptography. In effect, you use the vendor's own software against him. With my scheme, you eliminate this vulnerability by giving the application enough information to verify that the authorization passcode is genuine, but not enough for it to generate passcodes of its own. Of course, any licensing software can be defeated by patching the application executable, and my scheme does nothing to defend against that. But that kind of attack requires a fair amount of OS-dependent and CPU-dependent knowledge which the ovwewhelming number of computer users in the world today are unlikely to have. Also, consider what happens when a vendor makes their software available for multuple platforms and uses the same licensing software for each one. This makes it convenient for the vendor since he only has to generate one license file, and the customer can use it on whatever platform he chooses. It also makes it very convenient for a software pirate: rather than have to generate patched binaries for every platform, all he has to do is coerce the application into generating a magic unlimited-use license on one platform, and he has automatically defeated the licensing scheme for all platforms in one stroke. Anyway, the question stands: does my idea have merit, or should I make a reservation for the conical hat. -Bill -- ============================================================================= -Bill Paul (212) 854-6020 | System Manager, Master of Unix-Fu Work: wpaul@ctr.columbia.edu | Center for Telecommunications Research Home: wpaul@skynet.ctr.columbia.edu | Columbia University, New York City ============================================================================= "It is not I who am crazy; it is I who am mad!" - Ren Hoek, "Space Madness" =============================================================================