This tutorial is a supplement to our tutorial on indexing candidate pairs with the recordlinkage Python package for data integration. It takes a closer look at the sorted neighbourhood indexing method, which has more complex behaviour than the other indexing methods. To get the most out of this tutorial, you should already be familiar with indexing candidate pairs with recordlinkage.

This tutorial contains a series of examples intended to demonstrate both the power and the quirks of the sorted neighbourhood method.

The interleaved neighbourhood

Consider the following code, which creates a set of candidate links using the sorted neighbourhood method:

    import pandas as pd
    import recordlinkage as rl

    # Lists of names to link (values not shown)
    names_1 = [...]
    names_2 = [...]

    df_a = pd.DataFrame(pd.Series(names_1, name='names'))
    df_b = pd.DataFrame(pd.Series(names_2, name='names'))

    indexer = rl.SortedNeighbourhoodIndex(on='names', window=3)

    # Create pandas MultiIndex containing candidate links
    candidate_links = indexer.index(df_a, df_b)

As can be seen in the network diagram of the result, we have candidate links between all names that would be adjacent in a sorted list. This shows how the sorted neighbourhood method can identify candidate links even when potential matches are not exact. However, knowing how the index is built internally is important for understanding when this method might return unexpected results.

Internally, the SortedNeighbourhoodIndex does the following to determine candidate links:

It places the data from both data frames into one series.
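The idea of linking names that would be adjacent in one pooled, sorted list can be illustrated with a small pure-Python sketch. This is a simplified re-implementation written for illustration, not the recordlinkage source, and `sorted_neighbourhood_pairs` is a hypothetical helper. It assumes that the values from both lists are pooled, de-duplicated and sorted, that each distinct value gets a rank, and that a record from the first list is paired with a record from the second when their ranks differ by at most `window // 2` (so the window is assumed to be a positive odd integer). The example names are made up.

```python
from collections import defaultdict


def sorted_neighbourhood_pairs(names_a, names_b, window=3):
    """Hypothetical helper: sketch of sorted neighbourhood candidate pairs.

    Returns sorted (i, j) tuples meaning names_a[i] is a candidate match
    for names_b[j]. Assumes `window` is a positive odd integer.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2

    # Pool the keys of both lists into one sorted list of distinct values
    sorted_keys = sorted(set(names_a) | set(names_b))
    rank = {key: r for r, key in enumerate(sorted_keys)}

    # Group the positions of names_b by the rank of their key
    b_by_rank = defaultdict(list)
    for j, name in enumerate(names_b):
        b_by_rank[rank[name]].append(j)

    # Pair each record of names_a with the names_b records whose rank
    # lies within `half` positions of its own rank
    pairs = []
    for i, name in enumerate(names_a):
        r = rank[name]
        for neighbour in range(r - half, r + half + 1):
            pairs.extend((i, j) for j in b_by_rank.get(neighbour, []))
    return sorted(pairs)


# Made-up example data
left_names = ["anna", "anne", "bob"]
right_names = ["ana", "bobby", "zoe"]
print(sorted_neighbourhood_pairs(left_names, right_names, window=3))
# prints [(0, 0), (2, 1)]
```

In this toy data, "anne" gets no candidate at all with `window=3`. The pooled sorted order is ana, anna, anne, bob, bobby, zoe, so the only neighbours of "anne" are "anna" and "bob", both from the same list, and "ana" falls outside the window even though it is only two positions away. Under these assumptions, whether two values are linked depends on which other records sit between them in the sorted order, not only on how similar they are.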