Mask the given string

Hi all,

In my company we are not allowed to keep data in Dev environment in its original form. So we have developed a masking algorithm.

The problem is I am Netezza( a non relational database) resource and it has its own limitations. In netezza there is no concept of functions in SQl. what we have to do is we have to write User Defined function in C++ and then use it in a Update query.

Since I don't know a thing about C++, I am severely stuck. Please someone help me in this.


The algorithm for masking is :

******************************************************************

Suppose my input string is : aBcVyZ123a



Now treat A-Z as 1-26 and a-z as 27 to 52. 1 to 10 will be counted as 1-10 only.



Suppose the key value which I need to add is 5. In starting key = 5



Now my algorithm is :

current value is =current value + key + previous value



a = 27 + 5 + 0= 32 =>f

B = 2 + 5 + 32 =39 =>m

c = 29 + 5 + 39 =73 =>73mod52 =>21 =>U (taking mod if value greater than 52 in case of character)

V = 22 + 5 + 21 =58 =>58 mod 52 =>6 =>F

y = 51 + 5 + 6 =62 =>62 mod 52 =10=>J

Z = 26 +5 +10=41 => p

1 = 1+5+41 =47=>47 mod 10 =7 (taking mod of 10 if value greater than 10 in case of numeric character)

2 = 2+5+7 =14=>14mod10 = 4

3 = 3+5+4=12 =>12mod10 = 2

a = 27 +5 + 2 = 34 =h



So my mask string should be



aBcVyZ123a ->fmUFJp742h
*********************************************************************

One need to take care about the structure of the program which should be
like this :

char *MASK_DATA(<string to be masked>)
{
char *maskPntr = <string to be masked>
//logic for masking

return maskPntr;
}

int main ()
{

//variable declaration
MASK_DATA("aBcVyZ123a ");
return 0;
}

I need only one function in which the whole logic of masking is written because in my UDF format of netezza there is a function evaluate() where I have to use this logic

please help me on this.

Thanks in advance.
closed account (D80DSL3A)
Hi. I have assumed that a char* to the unmasked array will be passed to the MASK_DATA function, and that it is OK to alter the string being passed.

You made an arithmetic error here:
V = 22 + 5 + 21 =58 =>58 mod 52 =>6 =>F
22 + 5 + 21 = 48, so the masked string will differ from what you figured.

Also:
1 to 10 will be counted as 1-10 only.
'10' is not a single character. Did you mean:
"0 to 9 will be counted as 0-9 only."?
I ran with this.

I have the following code, which seems to work.
I have added a line which produces test information so you can see the math being done. You would remove this line after testing is complete.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61

#include <iostream>// needed only to support test output
using namespace std;

int key = 5;// must be global?

char *MASK_DATA( char* pStr )
{
    int i = 0;
    int prevVal = 0;
    //logic for masking
    while( pStr[i] != '\0' )
    {
        int currVal = 0;
        char ch = pStr[i];

        // map pStr[i] to an int value
        if( (ch>='A') && (ch<= 'Z') )
            currVal = (int)(ch - 'A') + 1;// 1-26
        else if( (ch>='a') && (ch<= 'z') )
            currVal = (int)(ch - 'a') + 27;// 27-52
        else if( (ch>='0') && (ch<= '9') )
            currVal = (int)(ch - '0') + 0;// 0-9

        int newVal = currVal + key + prevVal;
        // TEMP for testing.
        cout << "ch= " << pStr[i] << " currVal= " << currVal << " key= " << key << " prevVal= " << prevVal << " newVal = " << newVal << endl;

        if( (ch>='0') && (ch<= '9') )
            newVal %= 10;
        else
            newVal %= 52;

        // invert map
        if( (ch>='0') && (ch<= '9') )
            pStr[i] = '0' + newVal;
        else
        {
            if( (newVal>= 1) && (newVal<= 26) )// A-Z
                pStr[i] = 'A' + newVal - 1;
            else if( (newVal>= 27) && (newVal<= 52) )// a-z
                pStr[i] = 'a' + newVal - 27;
        }

        prevVal = newVal;
        ++i;
    }

    return pStr;
}

int main()
{
    char str[] = "aBcVyZ123a";// test case
    cout << "Unmasked: " << str << endl << endl;

    MASK_DATA( str );

    cout << endl << "Masked: "  << str << endl;
    return 0;
}

Output:

Unmasked: aBcVyZ123a

ch= a currVal= 27 key= 5 prevVal= 0 newVal = 32
ch= B currVal= 2 key= 5 prevVal= 32 newVal = 39
ch= c currVal= 29 key= 5 prevVal= 39 newVal = 73
ch= V currVal= 22 key= 5 prevVal= 21 newVal = 48
ch= y currVal= 51 key= 5 prevVal= 48 newVal = 104
ch= Z currVal= 26 key= 5 prevVal= 0 newVal = 31
ch= 1 currVal= 1 key= 5 prevVal= 31 newVal = 37
ch= 2 currVal= 2 key= 5 prevVal= 7 newVal = 14
ch= 3 currVal= 3 key= 5 prevVal= 4 newVal = 12
ch= a currVal= 27 key= 5 prevVal= 2 newVal = 34

Masked: fmUvye742h

Is this about what you're looking for?
Last edited on
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
char *MASK_DATA(char *strmasked)
 {
	 int capsA=1,smalla=27,number=1,modc=52,modn=10,chcnt=26;
	 int stlen=strlen(strmasked);
	 char *maskPntr=new char[stlen];
	 int prev=0;
	 for(int i=0;i<stlen-1;i++)
	 {
		int val=0;
		if(strmasked[i]>='A' && strmasked[i]<='Z')
			val=(strmasked[i]+1)-'A'+5+prev;
		if(strmasked[i]>='a' && strmasked[i]<='z')
			val=(strmasked[i]+1)-'a'+chcnt+5+prev;
		if(strmasked[i]>='0' && strmasked[i]<='9')
			val=strmasked[i]+5+prev;

		if(isdigit(strmasked[i]))
		{
			if(val>10)
				val=val%modn;
			prev=val;
			val=val+'0';
		}
		else if(val>52)
		{
			val=val%modc;
		}
		if(!isdigit(strmasked[i]))
		{
			prev=val;
			if(val>=1 && val <=26)
				val=val+('A'-1);
			else if(val>=27 && val <=52)
				val=(val-27)+'a';
		}
		maskPntr[i]=(char)val;			
	 }
	 maskPntr[i]='\0';
		 
	 return maskPntr;
 }
Last edited on
I don't understand what will happen if 104 mod 52 becomes 0.
In this case which character we should replace 'A' or 'z'
Last edited on
closed account (D80DSL3A)
Good point. That's why 'y' mapped to 'y'. Neither condition on lines 39 or 41 were met and pStr[i] remained unchanged.

Bumping the value by one in this case would change remaining results.

Unmasked: aBcVyZ123a

ch= a currVal= 27 key= 5 prevVal= 0 newVal = 32
ch= B currVal= 2 key= 5 prevVal= 32 newVal = 39
ch= c currVal= 29 key= 5 prevVal= 39 newVal = 73
ch= V currVal= 22 key= 5 prevVal= 21 newVal = 48
ch= y currVal= 51 key= 5 prevVal= 48 newVal = 104
ch= Z currVal= 26 key= 5 prevVal= 1 newVal = 32
ch= 1 currVal= 1 key= 5 prevVal= 32 newVal = 38
ch= 2 currVal= 2 key= 5 prevVal= 8 newVal = 15
ch= 3 currVal= 3 key= 5 prevVal= 5 newVal = 13
ch= a currVal= 27 key= 5 prevVal= 3 newVal = 35

Masked: fmUvAf853i

@Vivekanand Joshi: It looks like there are 2 ambiguous aspects to the coding scheme, as expressed.

EDIT: I see that newVal must be bumped by one after mod 52 in all cases, or else ch='z', 'Z' would get missed.
Replacing lines 29-32 with:
1
2
3
4
if( (ch>='0') && (ch<= '9') )
     newVal %= 10;
else if( newVal > 52 )
     newVal = 1 + newVal%52;

now produces:

Unmasked: aBcVyZ123a

ch= a currVal= 27 key= 5 prevVal= 0 newVal = 32
ch= B currVal= 2 key= 5 prevVal= 32 newVal = 39
ch= c currVal= 29 key= 5 prevVal= 39 newVal = 73
ch= V currVal= 22 key= 5 prevVal= 22 newVal = 49
ch= y currVal= 51 key= 5 prevVal= 49 newVal = 105
ch= Z currVal= 26 key= 5 prevVal= 2 newVal = 33
ch= 1 currVal= 1 key= 5 prevVal= 33 newVal = 39
ch= 2 currVal= 2 key= 5 prevVal= 9 newVal = 16
ch= 3 currVal= 3 key= 5 prevVal= 6 newVal = 14
ch= a currVal= 27 key= 5 prevVal= 4 newVal = 36

Masked: fmVwBg964j
Last edited on
Thanks for the detailed solution guys and pointing out the mistakes. However, I forget one thing in my first post. We have to mask data for more than one table and in some of the table we have to mask the column with data type Date.

The logic for masking date is very simple. Just add +10 on the date.

But how to handle the date variable in same function without impacting the original functionality of the code?

One more point, our downstream will decrypt the data and will use it then only.

I need one more program which decrypt the masked string and give me the original result.

Really thanks for helping me out guys.
closed account (D80DSL3A)
What is the character representation of a Date type?

Please give an example of a datum for this case, and what the expected masked result would be.

I'm quite hoping that a Date type is represented by a single character from a set which is unique from the others (a-z, A-Z, 0-9), and with contiguous ascii values.

Please answer about the discrepancies in the coding scheme noted.
Thanks to tvrameshmc for spotting the %52 = 0 problem. We got lucky that this flaw was exposed by the single test case provided.

It sounds like you're happy with the adaptations I made on the fly there, but have you examined that last output to be sure it is doing what you need it to?

Problem specification must be precise or you wind up with a solution to the wrong problem (which is useless).

I think I see a way to decrypt a datum, but I will await your answers before attempting it.

EDIT: Of course I had to find out if my decryption idea would work.
It is working for the 4 test cases tried so far:

Unmasked: aBcVyZ123a  Masked: fmVwBg964j
Unmasked: aBcVyZ123a

Unmasked: agcVZx123a  Masked: fSBcIM964j
Unmasked: agcVZx123a

Unmasked: 56cVZMpqrS  Masked: 01iKpIEByX
Unmasked: 56cVZMpqrS

Unmasked: 56c3ZMpQrS  Masked: 01i3hzvSQo
Unmasked: 56c3ZMpQrS

I hope the new Date detail doesn't make this a problem I can no longer solve. An amateur is working on this, so that's a line which can be crossed quite suddenly.
Last edited on
Let me explain you more clearly. Sorry for not being clear.

Let us suppose I have two tables A and B.

I need to make a function which can take both date and string as an input and return the corresponding masked date and string. Is it possible?

In table A, I need to mask COLA which is of date type, like 15/1/2013
and In table B I need to mask COLB which is of NVARCHAR type(string for the case of c++),

So when I pass the column value for table A, I should get 25/1/2013 as a masked string. for input 25/3/2013 I should get 4/4/2013 and so on.

where as for COLB for string 'aBcVyZ123a' , I should get 'fmVwBg964j' as an answer.

It doesn't matter at all what is masked data as long as two things are achieved, But that doesn't mean I add 2 in all the characters. That would be too simple to crack.

1.) I can decrypt the data to its original form.
2.) Alphabet should be represented by alphabet and numbers by numbers.

The main reason we have to encrypt the data is because it is sensitive information and legally we are not allowed to use unmasked data in DEV and TEST environment.

Since I am decrypting the data to its original form it can be managed.
Last edited on
closed account (D80DSL3A)
I need to make a function which can take both date and string as an input and return the corresponding masked string and string. Is it possible?

Yes. I could even create an overload for MASK_DATA which takes 2 arguments, so the one function could be used in both cases.

I am relieved that the Date is expressed in a separate string, and not as some new part of the original one.
The Date conversion seems easy enough, and it is completely decoupled from the treatment of the COLB string.

I'm afraid I have found a test case for which my decryption scheme fails:
Unmasked: A6yv1MzXx9  Masked: F7LN0RXBF0
Unmasked: A6yv1MAXx9

There is an off-by-one error that I thought I'd pinned down, but I see not quite yet. If I don't get it soon I'll ask for help here.

where as for COLB for string 'aBcVyZ123a' , I should get 'fmVwBg964j' as an answer.
And this is what I get (see 1st test case) so my encryption is hopefully correct.

EDIT: No change. Decided to bump instead.
Last edited on
Can you please share the overloaded and decryption functions?
Last edited on
closed account (D80DSL3A)
I'm sorry.
I couldn't get the decryption function worked out completely.
Hopefully someone else can do it for you.
Please share the decryption code even if it is not behaving properly. I will try to modify it since it is a weekend.
closed account (D80DSL3A)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
char* UNMASK_DATA( char* pStr )
{
    int i = 0;
    int count = 0;

    while( pStr[i] != '\0' )
    {
        int maskVal = 0;
        char ch = pStr[i];

        // map ch to an int value
        if( (ch>='A') && (ch<= 'Z') )
            maskVal = (int)(ch - 'A') + 1;// 1-26
        else if( (ch>='a') && (ch<= 'z') )
            maskVal = (int)(ch - 'a') + 27;// 27-52
        else if( (ch>='0') && (ch<= '9') )
            maskVal = (int)(ch - '0') + 0;// 0-9

        int origVal = maskVal - key - count;
        
        if( (ch>='0') && (ch<= '9') )
            while(origVal<0) origVal += 10;
        else if(origVal < 1)
        {
            while(origVal<0) origVal += 52;
            if(origVal != 0)
                origVal -= 1;// new
            else
                origVal = 51;
        }  

        // invert map
        if( (ch>='0') && (ch<= '9') )
            pStr[i] = '0' + origVal;
        else
        {
            if( (origVal>= 1) && (origVal<= 26) )// A-Z
                pStr[i] = 'A' + origVal - 1;
            else if( (origVal>= 27) && (origVal<= 52) )// a-z
                pStr[i] = 'a' + origVal - 27;
        }
        count = maskVal;
        ++i;
    }

    return pStr;
}

Please post back if you get it working properly.
Try encode+decode on "SAya1MzUx9"
Last edited on
Topic archived. No new replies allowed.