Difference between Nodupkey and Nodup in Proc Sort ... Continued


Now let's explore the Nodupkey option.

Consider the same data again:

*--------------------------------------------------------------------;
Data Sample;
input name $ X Y Z;
cards;
A 1 2 3
A 4 5 6
B 1 2 3
A 1 2 3
;
Run;
*--------------------------------------------------------------------;

Let's use the Nodupkey option:

*--------------------------------------------------------------------;
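/* OUT= writes the result to a new dataset, so Sample itself stays  */
/* unchanged and can be reused in the next example.                 */
proc sort data = Sample out = Sample_nodupkey nodupkey;
   by name;
run;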
*--------------------------------------------------------------------;

With the Nodupkey option, the data is sorted by name, and every observation whose value of name repeats an earlier observation's value is removed. As stated in the previous part of this article:

Nodupkey removes observations that are duplicates only on the basis of the variable(s) listed in the BY statement of Proc Sort. The BY variable(s) are treated as the "key", so Nodupkey removes duplicate keys and ignores the values of all other variables.
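To see exactly which observations Nodupkey throws away, Proc Sort's DUPOUT= option can write them to a separate dataset. This is a minimal sketch, assuming a SAS 9 or later session in which the Sample dataset above already exists; the dataset names Sample_first and Sample_dups are arbitrary choices for illustration, not names used elsewhere in this article:

*--------------------------------------------------------------------;
/* Sample_first and Sample_dups are illustrative names only.         */
/* NODUPKEY keeps the first observation for each value of the BY     */
/* variable; DUPOUT= collects the observations it removes.           */
proc sort data = Sample out = Sample_first dupout = Sample_dups nodupkey;
   by name;
run;

title "Observations kept by Nodupkey (by name)";
proc print data = Sample_first; run;

title "Observations removed by Nodupkey (by name)";
proc print data = Sample_dups; run;

title; /* clear the title for later output */
*--------------------------------------------------------------------;

With the sample data, Sample_dups should hold the two later "A" records, while Sample_first keeps one "A" and the "B" record.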


Here is what happens: initially there are three "A"s in the name variable, and because only name is listed in the BY statement, every "A" except the first is removed. The result keeps two observations, A 1 2 3 and B 1 2 3, even though A 4 5 6 differs from A 1 2 3 in X, Y and Z.
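Which "A" counts as the first one depends on the order of the observations within the BY group. By default Proc Sort keeps observations with equal keys in their original relative order (the EQUALS behaviour), so the first "A" in the input survives. If a different record should survive, a common approach is to sort on an extra variable first and only then apply Nodupkey. A sketch, assuming we want to keep the observation with the largest X for each name; the dataset names Sorted_by_X and Max_X are hypothetical:

*--------------------------------------------------------------------;
/* Step 1: order each name group so that the largest X comes first.  */
proc sort data = Sample out = Sorted_by_X;
   by name descending X;
run;

/* Step 2: keep only the first observation of each name group,       */
/* which after step 1 is the one with the largest X.                 */
proc sort data = Sorted_by_X out = Max_X nodupkey;
   by name;
run;
*--------------------------------------------------------------------;

For the sample data, Max_X should keep A 4 5 6 rather than A 1 2 3, along with B 1 2 3.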


Now try the following code to understand it better.

*--------------------------------------------------------------------;
/* The key is now the combination of name and X. */
proc sort data = Sample out = Sample_nodupkey2 nodupkey;
   by name X;
run;
*--------------------------------------------------------------------;

This time only one record is deleted, because for the combination of name and X only the 1st and 4th records (both "A" with X = 1) are duplicates. The result is A 1 2 3, A 4 5 6 and B 1 2 3.
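Since this article is about the difference between Nodupkey and Nodup, it may help to run Nodup on the same data. Nodup compares all variables of each observation with the previous observation written to the output, so it only catches duplicates that end up next to each other after sorting. A sketch, assuming the original Sample dataset is still intact and the default EQUALS ordering applies; the output dataset names are hypothetical:

*--------------------------------------------------------------------;
/* Nodup by name: after sorting on name alone, the two "A 1 2 3"     */
/* records are separated by "A 4 5 6", so Nodup removes nothing.     */
proc sort data = Sample out = Nodup_by_name nodup;
   by name;
run;

/* Nodup by name X: the two "A 1 2 3" records are now adjacent,      */
/* so one of them is removed.                                        */
proc sort data = Sample out = Nodup_by_name_X nodup;
   by name X;
run;

/* Nodupkey by _all_: every variable is part of the key, so exact    */
/* duplicates are removed wherever they appear in the input.         */
proc sort data = Sample out = Nodupkey_all nodupkey;
   by _all_;
run;
*--------------------------------------------------------------------;

Listing every variable in the BY statement is therefore a more predictable way to drop exact duplicates than relying on Nodup with a partial key.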


