From owner-freebsd-hackers  Sat Sep 27 11:25:25 1997
Return-Path: <owner-freebsd-hackers>
Received: (from root@localhost)
          by hub.freebsd.org (8.8.7/8.8.7) id LAA05853
          for hackers-outgoing; Sat, 27 Sep 1997 11:25:25 -0700 (PDT)
Received: from skynet.ctr.columbia.edu (skynet.ctr.columbia.edu [128.59.64.70])
          by hub.freebsd.org (8.8.7/8.8.7) with SMTP id LAA05830
          for <hackers@freebsd.org>; Sat, 27 Sep 1997 11:25:15 -0700 (PDT)
Received: (from wpaul@localhost) by skynet.ctr.columbia.edu (8.6.12/8.6.9) id OAA13926 for hackers@freebsd.org; Sat, 27 Sep 1997 14:25:28 -0400
From: Bill Paul <wpaul@skynet.ctr.columbia.edu>
Message-Id: <199709271825.OAA13926@skynet.ctr.columbia.edu>
Subject: Idea for a software licensing scheme
To: hackers@freebsd.org
Date: Sat, 27 Sep 1997 14:25:26 -0400 (EDT)
X-Mailer: ELM [version 2.4 PL24]
Content-Type: text
Sender: owner-freebsd-hackers@freebsd.org
X-Loop: FreeBSD.org
Precedence: bulk


I'm hoping some of the more mathematically astute out there can answer
a question for me. A while ago, there was a discussion on -hackers about
possibly creating a software licensing system for FreeBSD as an incentive
to commercial software vendors to port their code. Well, the
Diffie-Hellman patent expired earlier this month, and that started me
thinking about a way to possibly use D-H key pairs as part of a licensing
scheme. I _think_ I've come up with a viable way to do this and have even
written some code to show that it works. My main goal is to come up with
a scheme that makes it next to impossible for an 'attacker' to generate
his own license codes even knowing the algorithms and some of the values
involved. But I'm rotten at math and I'm uncertain that my scheme is as
foolproof as I think it is.

Here's how it works. First of all, consider two parties that want to
participate in D-H key-exchange. The first thing the two parties have
to do is agree on two special values. The first value, p, is some large
prime number. The second value, g, is some number larger than 2 and less
than p. The value p is called the modulus and g is the root.

Now each party independently chooses a number, X, to be their secret
key. Each party keeps his secret key to himself and never reveals it to
the anyone. We'll call the first party's secret key X1 and the second
party's secret key X2.

Finally, party one and party two derive public keys (Y1 and Y2) from
their secret keys (X1 and X2) using the following computations:

		Y1 = g ^ X1 % p

		Y2 = g ^ X2 % p

Now party one and party two can exchange their public keys (Y1 and Y2).
The trick is that Y1 and Y2 are related to their corresponding secret keys
X1 and X2 in such a way that the following holds true:

		s = Y1 ^ X2 % p

		s = Y2 ^ X1 % p

In other words, each party is now able to compute a session key, s,
which is essentially a shared secret. The magic is that both parties
can arrive at the same shared secret value without having to exchange
it over a possibly unsecured communications channel where it could be
intercepted. For a third party to learn this shared secret, they would
have to compute the other two parties' secret keys, which is very hard
to do even if you know, p, g, and a set of public keys.

Okay, that's how D-H works. Now here's my idea. Let's say I'm a software
vendor. I choose two values, p and g. I keep 'g' a secret and never reveal
it to anyone. Now let's say I have an application, 'a', that I want to
protect. I choose a secret key for the application called Xa. This key
can be anything really; in my scheme I use an MD5 hash of the program
name, vendor name, program version, and some random string. From the
applications' secret key, Xa, I derive the application's secret key,
Ya using the following computation:

		Ya = g ^ Xa % p

I then embed the values for Xa, Ya and the modulus, p, into the 
application.

Now I want to make a license authorization file for this application so
that some customer of mine can run it. The file contains certain pertinent
information: name of the feature I want to activate, number of concurrent
users, expiration date, hostid, and so forth. All of this information
is hashed together to generate a value which becomes what I call the
license secret key, Xl. (Again, I used an MD5 hash for this.) The
authorization passcode is actually the license public key Yl, which is
generated like this:

		Yl = g ^ Xl % p

I then include Yl in the license file as the activation code and ship
the whole thing off to the customer.

Now for the verification part. When the application program runs, the
license checking code reads the license file and regenerates the license
secret key, Xl, by computing the MD5 hash of all the license information
(feature name, number of users, etc...). It also reads the authorization
passcode and considers that to be Yl. So now the code has 5 values:
Xa, Ya, Xl, Yl and p. It performs the following computations:

		s1 = Ya ^ Xl % p

		s2 = Yl ^ Xa % p

This yields two 'session keys,' s1 and s2. If s1 == s2, then the code
considers the license to be valid and the program runs, otherwise it
signals an error and bombs.

The logic is this: the only way s1 and s2 can come out being the same
is if the license public key, Yl, and the application public key, Ya,
were both generated using the same Diffie-Hellman parameters, g and p.
But I, the vendor, keep g a secret. (Unfortunately, you can't keep p
a secret since the application needs it to compute the session keys.)
This means that only I, the vendor, can derive a license public key (Yl)
that will satisfy the equality.

Now comes the part I'm not sure about. In theory, it is possible for
an 'attacker' to obtain values for p, Xl and Yl. My question is: can
the attacker use these values to compute g? In other words, is it
very hard for someone to calculate g even knowing p, Xl and Yl, or is
there some mathematical shortcut that I'm just too stupid to see?
If an attacker can calculate g without too much trouble, then the whole
scheme falls apart because the attacker will be able to generate his
own authorization passcode for any license parameters he chooses.

You might be wondering what there is to gain by using a system like
this. The answer is that it avoids (I hope!) a problem present in certain
other schemes that use what I call 'checksumming' to verify license
passcodes. With certain kinds of license management software, the
passcode is essentially a hash or checksum of the other information
in the file which has been encryped using keys that only the vendor
knows. The problem is that in order for the application to verify
that the license is valid, it must contain the same keys. To verify
the passcode, the license software embedded in the application will
read the information from the license file, generate its own hash
value and encrypt it using its own copies of the encryption keys. 
The result is its own version of the authorization passcode which it
can check against the passcode in the license file. If the code in the 
file does not match the code computed by the application, the program 
bombs.

The problem is that the entire security of the scheme relies on the 
assumption that nobody will be able to trick the application into 
revealing the code that it has computed internally. This is a big 
mistake. Say for example than an attacker changes the expiration date
in his license file. The application will read the information from the 
file, compute what it believes should be the correct authorization code 
and then compare that the code supplied in the license file. The values 
won't match of course, but supposing the attacker is able to extract the 
computed (and correct!) authorization code from the program's address 
space. Now all he has to do is put this code in the license file in
place of the old one, and presto! he has a perfectly valid license that 
expires whenever he wants.

The irony here is that licensing software based on this kind of scheme
may include all sorts of special hash algorithms, strong encryption,
and any number of keys, yet the whole mess can be easily defeated without
knowing anything about cryptography. In effect, you use the vendor's
own software against him. With my scheme, you eliminate this vulnerability
by giving the application enough information to verify that the
authorization passcode is genuine, but not enough for it to generate 
passcodes of its own.

Of course, any licensing software can be defeated by patching the
application executable, and my scheme does nothing to defend against
that. But that kind of attack requires a fair amount of OS-dependent
and CPU-dependent knowledge which the ovwewhelming number of computer
users in the world today are unlikely to have. Also, consider what
happens when a vendor makes their software available for multuple
platforms and uses the same licensing software for each one. This makes 
it convenient for the vendor since he only has to generate one license 
file, and the customer can use it on whatever platform he chooses. It 
also makes it very convenient for a software pirate: rather than have
to generate patched binaries for every platform, all he has to do is 
coerce the application into generating a magic unlimited-use license on 
one platform, and he has automatically defeated the licensing scheme for 
all platforms in one stroke.

Anyway, the question stands: does my idea have merit, or should I
make a reservation for the conical hat.

-Bill

-- 
=============================================================================
-Bill Paul            (212) 854-6020 | System Manager, Master of Unix-Fu
Work:         wpaul@ctr.columbia.edu | Center for Telecommunications Research
Home:  wpaul@skynet.ctr.columbia.edu | Columbia University, New York City
=============================================================================
 "It is not I who am crazy; it is I who am mad!" - Ren Hoek, "Space Madness"
=============================================================================