Never Worry about Duplicate Data Again with the Unique ID Creator Macro.
A unique ID field is hugely important for all datasets in order to be able to distinguish between records and to better manage your data. The Record ID tool in Alteryx is a very quick and simple way of creating a unique ID by essentially adding 1 for every row of data. However, this is not sufficient enough when working with more dynamic data. What if we have data coming from various sources and we want a unique ID field that is unique for all data in datasets of all workflows? What if the order of the data keeps changing meaning that using the Record ID tool will always change the ID of each record? The solution in overcoming these issues I had working with a London craft brewer was the Unique ID Creator macro.
First things first… what is a macro?
A macro is a group of Alteryx tools that are built into a single tool so that the macro can deploy a consistent process across multiple workflows. My Unique ID Creator macro is designed to be added at the end of a workflow to create a unique ID value for each record of data. It is designed to always create an ID that is unique for not only all records within the particular dataset but all datasets from other workflows that also use the macro.
Using a consistent Unique ID across multiple datasets
The macro requires you to select the fields in your dataset which ensure that the combination of these fields are always unique for each record. It is this unique set of field values that builds the Composite Key which is the concatenation of the all the values. In the example below, the Composite Key is built from Customer, Date and FileName.
In this dataset, every combination of these three values should be unique. This Composite Key is then used to build the Transaction ID which is the right-sided number of the Unique ID. The list of Composite Keys that pass through the macro are stored in an Alteryx database (.yxdb) file inside the macro. Whenever data passes through the macro, it checks for this Composite Key in the existing .yxdb file and either keeps the same Transaction ID and therefore Unique ID if the Composite Key already exists or creates a new Transaction ID and Unique ID if the Composite Key is a new combination. Using the Composite Key ensures that the Unique ID will always be consistent and unique.
It is also necessary to select the field that contains the name of the source of data. In the example, this is the FileName field. This value is necessary in order to distinguish data between different workflows. This FileName value is used to create a Source ID value which is the left-sided number of the Unique ID. The Unique ID is created using this combination of the Source ID from the FileName and the Transaction ID from the Composite Key. This Source ID is vital in order to avoid duplicates of IDs between different datasets and it also helps you identify the source of your data quickly.
Keeping an audit trail
The macro outputs additional useful fields to provide information about your data. These include a Deleted field for existing records that no longer pass through the macro, a Created Timestamp to show when the record first passed through the macro, and an Updated Timestamp for existing records that have changed values since the last time it passed through the macro.
The latter field is created based on the fields you selected for the Record Hash. You may have noticed this in the previous image where Amount and Discount are selected. The Record Hash is the combination of values selected in the macro which has then been applied a hash function. The hash function creates a type of code of a certain length that can easily be used for data comparison and speed up data processes. The fields used for the record hash will usually contain numeric values where the values can change. The macro will compare the new record hash with the original record hash for each Composite Key and if the record hash changes then it will update the Updated Timestamp with the current date and time. This therefore allows you to have transparency over when a record changes without Alteryx treating the record as a completely new row of data.
The Created Timestamp and Updated Timestamp fields were very useful in providing incremental data to an external client. We only sent data when the dates of these two values were the same as today to ensure that the client would only be receiving newly created or newly updated data each day rather than the whole dataset.
You can find more information on the hash function here.
Increase Robustness with a Mapping File
Finally, we created a mapping file to make the macro more robust. The mapping file was created as an Excel file and implemented within the macro to ensure that the correct fields are being selected for the Composite Key and Record Hash for each workflow. For example, the Sales workflow should be using the fields Customer, Date and FileName for the Composite Key.
Selecting any wrong combination of fields will create an error and stop the macro from creating any invalid unique IDs or changing the existing IDs. Although this may seem like a redundant step, this avoids any human error and it provides clear transparency on how the keys were created. More importantly, you can even get the macro to tell you which fields to select if you’re ever feeling lazy. Finally, the Source ID value comes from this mapping file.
The Unique ID Creator macro is a great way to start organising and consolidating all relevant data from multiple workflows. Having an accurate Unique ID field is vital for data management and using this macro can be a way to ensure this happens. The great thing about this macro as well is that it can be further customised to suit different needs. For example, we can tailor this macro to output additional fields about your data beyond the created and updated dates. Alternatively, you can do some auditing on your data by identifying not just when but also why the record hash value changed. The Unique ID Creator macro is definitely a great way to ensure that you have a robust data infrastructure to facilitate further analysis.