Ask Analytics: Difference between Nodupkey and Nodup in Proc Sort ... Continued

Now let's explore the Nodupkey option...

Consider the same data again:

*--------------------------------------------------------------------;

Data Sample;
    input name $ X Y Z;
    cards;
A 1 2 3
A 4 5 6
B 1 2 3
A 1 2 3
;
Run;

*--------------------------------------------------------------------;

Let's use the Nodupkey option ...

*--------------------------------------------------------------------;

proc sort data = Sample nodupkey;
    by name;
run;

*--------------------------------------------------------------------;

With the Nodupkey option, the data is sorted by name, and every observation that repeats an earlier value of "name" is removed, whatever its values of X, Y and Z. As stated in the previous post:

Nodupkey removes duplicate observations purely on the basis of the variable(s) listed in the by statement of Proc Sort. The variable(s) in the by statement are treated as the "key", so Nodupkey keeps one observation per key value and removes the rest.
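To illustrate how the key changes with the by statement, here is a small sketch that uses two by variables instead of one. It assumes the original four-row Sample data set from above; the output name Sample_by_name_x is made up for this example, and OUT= is used so that Sample itself is not overwritten.

*--------------------------------------------------------------------;

/* The key is now the combination of name and X */
proc sort data = Sample out = Sample_by_name_x nodupkey;
    by name X;
run;

/* Keys in the input: (A,1) (A,4) (B,1) (A,1).             */
/* Only (A,1) repeats, so one observation is removed and   */
/* three remain: A 1 2 3, A 4 5 6 and B 1 2 3.             */
proc print data = Sample_by_name_x;
run;

*--------------------------------------------------------------------;

With "by name" alone, A 4 5 6 counts as a duplicate; once X is added to the key it no longer does.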

Below is a demonstration of what happens. Initially there were three "A"s in the "name" variable, and because only "name" is given in the by statement, every "A" except the first is removed, even though one of them (A 4 5 6) differs in X, Y and Z.
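To check for yourself which observations are dropped, one option is to send them to a separate data set with DUPOUT=. This is only a sketch: it assumes the original four-row Sample data set (re-create it first if you have already sorted it in place), and the names Sample_unique and Sample_removed are made up for illustration.

*--------------------------------------------------------------------;

/* OUT= keeps Sample untouched; DUPOUT= collects the removed rows */
proc sort data = Sample
          out = Sample_unique
          dupout = Sample_removed
          nodupkey;
    by name;
run;

/* Expected with the default EQUALS behaviour, which keeps the   */
/* first observation of each key in input order:                 */
/*   Sample_unique  : A 1 2 3 and B 1 2 3                        */
/*   Sample_removed : the other two "A" observations             */
proc print data = Sample_unique;
run;

proc print data = Sample_removed;
run;

*--------------------------------------------------------------------;

Writing the removed rows to their own data set makes it easy to see that A 4 5 6 was discarded purely because its key matched, not because the whole row was a duplicate.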