Duplicate data always seems to creep it’s way into your system no matter how hard you try. Unfortunately, in FileMaker there is no easy way to deal with junk. Or is there?
This process depends on the fact that you have a unique id in your table. Ideally, this is an auto-enter serial number. If you have one of these then we are ready to go… if not, shame on you! You should pretty much always have a unique id of some type in your table and quite frankly you might as well make it a serial number. Go ahead and create one now… I’ll Wait.
Alright, at this point you are ready to go. Now you just need to decide what fields determine if the data is duplicate. If you want to weed out all products with the same sku or if you want to axe all customers with the same first name, last name and zip this solution is going to work for you.
Now that you have your fields ready, we are going to create a self-relationship. Create a new table occurrence of the table you want to remove duplicates from and connect it to the original one. Using my two examples from above, here is what your relationships should look like:

or

This relationship says: “Find me all the records that don’t have the same unique id as this one but have matches in all the fields I want.” You can add as many predicates to this list as you want and it will still work.
Note: Predi-whats? Predicates. That’s the nerd word for the conditions you add in the Edit Relationship dialog box when you tell FileMaker how two tables relate.
Now that you have your relationship setup you have to make yourself a looping script. For this example, the layout containing all the records is called record_list and the relationship we are using to determine if there are duplicates is named duplicate_records:
Go To Layout[ `record_list` ]
Show All Records
Go to Record/Request/Page [ First ]
Loop
If[ Not IsEmpty( duplicate_records::ID ) ]
Delete Record/Request [ No Dialog ]
Else
Go to Record/Request/Page [ Next, Exit After Last]
End If
End Loop
So you are basically just looping through all the records and checking to see if there is a related duplicate…. If there is you don’t need it, so we just remove the record.
Note: This method of deleting records will remove the oldest record in the table in terms of creation order. So if you were to create a record today and one tomorrow, the record you created today would be deleted. If you would like to keep the oldest version of the duplicate record, just sort your
IDfield in descending order before executing the above loop
Instead of showing all the records, you first to narrow your found set by simply finding the records that have duplicates. To do this just execute a find on the id field of the duplicate_record table occurrence while you are on the record_list layout. Whatever you do… your table is now free of pesky, garbage data.


Robert Jackson
7-16-2007
H there - I tried this and thought it was a pretty good idea. However I had a problem because the deleted duplicate was the original and this meant that any related tables using that id key would now be broken.
e.g
id: 10001, firstname: Robert, lastname:Jackson, employer: FileMaker
id: 10002, firstname: Robert lastname:Jackson, employer: FileMaker
The deletion of the duplicate would be id:10001. This key may be used in tables such as addresses or tasks or communications etc. Deleting id 10001 would orphan all records that had used that ID and wouldn’t pick up on the id now being 10002!
Better to be able to delete the most recent record created by altering the script to go the last record and then loop to prev before deleting most recently created record/s.
Whatdya think?
Jesse Antunes
7-16-2007
@Robert - Thank you for this excellent comment. You could definitely achieve the desired results by going to the last record and looping backwards or by just doing a descending sort on the ID field first. The only reason why I hesitate to do the loop starting from the bottom is because I have never done a reverse loop before. Maybe it’s time to expand my horizons! I will edit the post so as to warn people about which record they are deleting
Geoff Coffey
7-16-2007
@jesse, @robert: Also, it could conceivable be argued in some cases that you should keep the last record since it was the one most recently added. But different situations may call for different priorities. And Robert, you’re absolutely right to point out that (as is always the case when deleting) you have to be careful about related data.
Jesse, it would be fun some day to do another version of this that keeps a list of IDs as it goes, and fixes related records once it gets to the final (keeper) record.
David Head
7-18-2007
Just as a matter of interest, what is the purpose of the first script step:
Go To Layout[ original ]
Curious
Jesse Antunes
7-18-2007
@David - it’s to align the stars appropriate so the nexus… of er… Saturn is peaked…. Um…. Yea it’s a little confusing. I actually named my layout original… as in it displayed original records. It’s definitely misleading so I changed it. Thanks David!
Steve Wright
7-18-2007
I have been using this technique for quite some time, Im just planning to create a “syncronise feature” regarding which one to delete is a bit of a pain.
I have a unique id throughout my tables, I also have a created and modified field along with a document id.
In theory I should be able to check if doc id’s match, unique id’s dont and sort by the last modified date, however…
On importing filemaker changes the modified date to the date of import and as perform auto enter options is required for creating a new unique id this creates a problem.
Do you have any thoughts on this o a workaround to save my sore head?
Bob Walker
7-23-2007
I work for an elementary school in Orlando. Is it possible to use the same scrip to evaluate duplicate addresses and then omit it? Is this even usable in FMP6? After I created the script and just substituted the Delete for Omit it omitted all records from the file.
Thank you for your help.
Geoff Coffey
7-23-2007
@bob: Unfortunately, you said “FileMaker 6″ which makes smoke start coming out of Jesse’s ears. He is, like, 12 or something. So I’ll answer this one:
This script is made simple because the records are deleted as you go. This means that once you’ve removed all the duplicates but one, there will be no more duplicates, so the last won’t be deleted. If you
Omitinstead, it won’t work right (as you discovered). If you want toOmityou’ll have to get a little trickier. First, remove theID ≠IDpredicate from your duplicates relationship. Then change this line:to this instead:
(Only, instead of Customers, you’ll use the main table occurrence for the table you’re removing duplicates from.) This code will cause the stuff in the
Ifto run on all but the first of the duplicate records. Then you can switch theDeletestep to aDuplicateand it should work.As for FileMaker 6, this technique won’t translate to 6 directly because 6 can’t do multi-predicate relationships. If you must do it, you can create a concatenated key — in other words, create a stored calculation field where you string together all the values you want to match on. Then create a new relationship matching this key to itself. Bear in mind, though, that this will only work with a relatively short set of data (approximately 80 characters, although FileMaker 6’s maximum index entry length rules were esoteric, so that’s not exactly right).
Your best bet, of course, is to use 9
Douglas Alder
7-23-2007
One approach to preserving related records is to add some logic to the loop that checks to see if there are related records in various linked records and then use script variables to reassign these records.
I did this for a solution where new records were being imported from a web form. I have a check to see if the ‘first name’ and ‘last name’ and ‘birthdate’ match. If so, then I offer to update the info.
Still some records slip through and I have duplicates occasionally.
So I made a very wide layout and put portals to important related records (invoices, payment info and correspondence, etc.)
I made a script that searches for duplicates (I didn’t go the self-join portal route)
Then I made two buttons, one that are labeled ‘Keep this Record, Delete Below’ and the other says ‘Keep this Record, Delete Above’ Depending on which button you push, the related records are assigned either to the one you are on, or the one below it and the record that has been stripped of it’s related records is deleted.
There will likely be some fields contained on the record to be deleted that I want to keep the values, so I set the values of those fields to a batch of script variables, then switch records and set those fields to the newly created script variables.
Then I run through the portals and do something similar.
Here is an example of a loop I use:
It is a little complicated, but basically, I am selecting the record I want to keep, storing it’s ID in a variable, then going to the other record and reassigning all the ’soon-to-be-deleted’ record’s related records to the script variable’s ID.
Clear as mud?
Geoff Coffey
7-24-2007
@douglas: Great comment. Jesse and I talked about the related records question and figured we would use a similar approach. The only thing I would add is that I tend to avoid scripting portals. You may have to add an extra layout or two, and you need a few extra script steps, but in the end, I think it is safer to
Go to Related Recordsand loop through the records instead.The trouble with portal looping is twofold: First, the portal has to remain on the layout. I try hard to avoid scripts that will break if someone “cleans up” a layout. Second, it is too easy to lose your place on a portal. Many things can cause FileMaker to switch back to the first row (like committing the record, switching modes, switching layouts, etc…). Sometimes a script grows beyond our original expectations, and it becomes very hard to remember that you need to be careful about portal context. (For example, imagine you later call a subscript from inside your loop to do something shared by other scripts. Now you just have to know not to commit in that new subscript).
So if it were me, I’d revise your script to switch layouts for each relationship. But otherwise, I love it
Keriann K
8-13-2007
Thanks so much for this. It has been a great solution for me, even as someone who is new to using scripts.
One question- is there anyone to bypass that nagging dialog box that asks if I’m sure I want to permanently delete each record? I’m not looking forward to spending this afternoon clicking delete several thousand times…
Thanks again!
Geoff Coffey
8-13-2007
@keriann: If you look closely, you’ll see the
No Dialogoption on theDelete Recordscript step. That means that in FileMaker, when you select that script step, you should turn on its “Perform without dialog” option. This tells FileMaker not to show a dialog box before deleting the record. Hope this helps!Erik Cayré
8-23-2007
I just thought of a way to solve the orphaned related records problem:
Add a text field to the table where the dupes will be deleted, and populate it with the unique ids already present in the table’s primary key field.
Then change all relationships from using the unique key field to using the new field. Everything will still be related exactly as before.
Now, in the find/delete dupes loop, just before deleting: add the key of the record being deleted to the new key field of all self related records, separated by a newline.
The end result will be that after all dupes are gone, all records formerly related to any now deleted record, will now be related to the ’surviving’ record effectively consolidating all related redords to the kept records…
Remember to add an auto-enter calc to the new key field which adds the real unique id, in case it is missing, so that the mechanism stays functional).
Hope this helps!
Paul
10-6-2007
A duplicate record in my book is a record where ALL fields are identical to the original record (with the exception of the record number or record ID). If you want to remove duplicate records ALL fields have to be checked - save the ID field. You can do that by integrating the values of all fields (except record ID) of one record into one variable x - like a checksum over the record would do, and then insert that value into a new “checksum” field created for the purpose:
x = field_1 & field_2 …field_n
Then sort all records based on the “checksum” field, so that identical records are next to each other.
Then simply compare the value of x of record 1 with the value of
x of record 2 - and if they are identical you can delete one of the two records.
If you have more than two duplicate records you simply run the script again - until the number of records stays constant.
As my Filemaker sadly does not allow me to export scripts I cannot show you the actual text of the script.
Geoff Coffey
10-6-2007
@Paul: That will work in many cases, but it has a pretty significant drawback. If you make your calculation field stored, you are effectively doubling the amount of data FileMaker has to store and manage. That can have a significant impact on backup systems, increase the cost of disk systems, and slow things down if you’re dealing with a large database.
If you make the calculation unstored, then the search performance will be very poor in a medium-to-large table.
The method in this article will perform well on large tables, and although it might introduce some indexing overhead, the impact will be much less than a full duplication of all data.
Dave Bertram
12-2-2007
I just want to compare last names and selectively delete the ones I don’t want. Do I need a script or is there a feature, like in itunes that shows duplicate records ?
Geoff Coffey
12-2-2007
@dave: Switch to Find mode, put
!in the Last Name field, then sort by Last Name. The!find operator finds all records with duplicates in that field.Javier Miranda V.
4-8-2008
The solution seems elegant since it doesn´t require the definition of new fields (except for the ID that almost every DB has already) but…. I don´t get one thing:
If[ Not IsEmpty( duplicate_records::ID ) ]
What is duplicate_records? a table? relations can have a name?
The tables in your example are named products and Products_dup, so the notation duplicate_records::ID don´t make sense to me).
Jesse Antunes
4-8-2008
@Javier: I can see how it’s a little bit confusing. Just before that code block I switched my example to a more generic look. So `duplicate_records` is the table occurrence that is used to determine if there is a duplicate record.
So if you were looking at the first two examples, the fields you would check for are
Products_dup::IDorCustomers_dup::ID. If you find a value in those fields, then you know you have a duplicate record and you can just delete it.Søren Dyhr
5-5-2008
I’ve just learned that the approach showed here, not is most potent algoritm one could come up with, using the scriptmaker steps availiable.
Following script using a tecnique introdced to me by Agnès Barouh of Tic Tac:
[code]Sort Records [ Speci?ed Sort Order: Contacts::FistName; ascending
Contacts::LastName; ascending
Contacts::ZipCode; ascending ]
[ Restore; No dialog ]
Replace Field Contents [ Contacts::gStrainedList; Replace with calculation: Case(
Get( RecordNumber ) ? 1 and Contacts::FistName = $t and
Contacts::LastName = $u and Contacts::ZipCode = $w;
Let( $theIDs = $theIDs & ¶ & Contacts::RecordID; “” );
Let(
[
$t = Contacts::FistName;
$u = Contacts::LastName;
$w = Contacts::ZipCode
];
“”
)
) ]
[ No dialog ]
Set Field [ Contacts::gStrainedList; $theIDs ]
Go to Related Record [ From table: “Contacts 2”; Using layout: “Contacts” (Contacts) ] [ Show only related records ]
Delete All Records [ No dialog ]
[/code]
–sd