Altrata Matching SFTP Guide
Matching involves the systematic identification of overlaps between a customer's dataset and the Altrata dataset. The input is the customer's dataset, containing the specific data points used for matching. The output is a collection of matched results.

The figure below outlines the SFTP matching user flow.

Connecting to the SFTP Account

For clients who have subscribed to the Matching SFTP service, service account credentials will be created and shared directly with them. The login details contain a temporary password, which is emailed to you once the account has been created. You must change the password within 24 hours of receiving the email. This service user account is used to log in to the Altrata SFTP site and access your matching files.

Clients who already use our Datafeed offering via SFTP will see a separate directory for matching once they subscribe to the Matching service. The SFTP site can be accessed with any SFTP client.

Note that Altrata only has an SFTP site; there is no FTP site. You will need to use a client that supports SFTP connections.

Connection Details

The following details should be used to connect to the Altrata SFTP site:

| Domain Name | Protocol | Port |
| --- | --- | --- |
| sftp.altrata.com | SFTP | 22 |

Example: Connecting Using WinSCP

1. Open WinSCP and start a new session. Enter the hostname as above, make sure the port is set to 22, and enter your SFTP credentials.
2. Once logged in, you should see the folders available to you. There will be two folders with your username, one for download and one for upload.

Folder Structure

The SFTP server has an easily navigable directory layout. Users with the Matching service privilege will find a "matching" directory in their root directory. This directory contains two subdirectories, "{username}downloads" and "{username}uploads". For instance, if the user's name is "pts serviceuser", the corresponding subdirectories would be "pts serviceuserdownloads" and "pts serviceuseruploads".

The figure below highlights the SFTP folder structure.

Person Matching File Specifications

The inputs required for the person file are detailed below.

File format: UTF-8, first row must contain all column names, pipe (|) delimiter
File limit: 1,000,000 records

File Header Format Specification

| Field | Description | Format | Required |
| --- | --- | --- | --- |
| Client Supplied ID | A unique string identifier for each record that does not change | String | Yes |
| First Name | Person's first name | String | Yes |
| Middle Name | Person's middle name | String | No |
| Last Name | Person's last name | String | Yes |
| Organization Name | The organization the person is associated with | String | Yes |
| Role Title | Person's role in the organization | String | Yes |
| Work Email | Person's work email | String | No |
| Age | Age of person | Integer | No |
| Date of Birth | Person's date of birth | Format in (yyyy mm dd) | No |
| BoardEx Director ID | Unique ID assigned to the BoardEx person | Integer | No |
| RelSci ID | Unique ID assigned to the RelSci person | Integer | No |
| Boardroom Insiders ID | Unique ID assigned to the Boardroom Insiders person | Integer | No |
| WealthX Profile ID | Unique ID assigned to the Wealth-X person | Integer | No |
| WealthEngine Entity ID | Unique ID assigned to the WealthEngine person | Integer | No |
| LinkedIn Profile ID | LinkedIn profile ID of the person | Integer | No |

For any file, Client Supplied ID is a mandatory field, and at least one of the following combinations is required to pass validation:

▪ Client Supplied ID + any of the foundation IDs
▪ Client Supplied ID + email address
▪ Client Supplied ID + LinkedIn profile
▪ Client Supplied ID + first name, last name, and organization name

Organization Matching File

File format: UTF-8, first row must contain all column names, pipe (|) delimiter
File limit: 1,000,000 records

File Header Format Specification

| Field | Description | Required |
| --- | --- | --- |
| Client Supplied ID | A unique string that does not change | Yes |
| Organization Name | Full name of the organization | Yes |
| Organization Type | Denotes the nature or type of business | Yes |
| Ticker | Primary ticker symbol associated with the main stock exchange listing of the organization. A ticker symbol is a unique one- to five-letter code used by a stock exchange to identify a company. | No |
| Email Domain | The email domain associated with the organization | No |
| Head Office Country | The country where the headquarters of the organization is located | No |
| Website | The website associated with the organization | No |
| Industry | Primary industry vertical associated with the RelSci organization | No |
| RelSci ID | Unique ID assigned to the RelSci organization | No |
| BoardEx Org ID | Unique ID assigned to the BoardEx organization | No |
| Boardroom Insiders ID | Unique ID assigned to the Boardroom Insiders organization | No |
| WealthX Entity ID | Unique ID assigned to the Wealth-X organization | No |

For any file, Client Supplied ID is a mandatory field, and at least one of the following combinations is required to pass validation:

▪ Client Supplied ID + any of the foundation IDs
▪ Client Supplied ID + ticker
▪ Client Supplied ID + organization name

Upload Files for Matching

After preparing the matching input file, follow the steps in the "Connecting to the SFTP Account" section to access the Altrata SFTP site. Once connected, you can either upload the file or simply drop it into the corresponding folder for matching. Place person files in the "person" folder within the "uploads" directory, and organization files in the "organization" folder within the same "uploads" directory.

Every matching request is assigned a unique request ID, which is emailed to the user after validation is completed. This request ID is the reference for downloading the results once matching has finished.

Validation Checks

File Level Validation

After a user uploads a file via the SFTP site, a series of validation checks is carried out to ensure that the file is suitable for the matching process. The validation steps are as follows.

1. File inspection

The first step checks the file's format and size, and that it has all the required headers, before the file advances to further processing. If the uploaded file has one of the following errors, it is marked as failed and a notification about the failure is sent to the user.

Errors & Warnings

• The file is not a CSV.
• The file path contains invalid characters.
• Unable to determine file size. Please review the file.
• The file size is too large. File size must be 500 MB or lower.
• The file is binary. Only CSV files are supported.
• The request limit failure, listed
• Invalid delimiter '[delimiter]'. The file must be delimited by '|'.
• The file has too many rows. The maximum permitted is 1,000,000, but [number of rows] rows were found.
• The file is missing all the required headers. Is the header row missing?
• The file is missing the following headers: [missing headers]
• The file has too many headers. Expected no more than [number of allowed headers] headers but found [number of headers].
• The file has the following invalid headers: [invalid headers]

2. Limit on requests per day

The matching service enforces a daily upload limit of 10 valid files for each user. If a user attempts to upload an 11th file for matching on the same day, all the necessary validation checks are still conducted. An email is then sent to the user, detailing the reasons for failure and asking them to resubmit the file on the following day, within the allowed limit. The day is measured in UTC time.

3. Row level validation

After passing the initial file validation, files undergo row validation. During this step, the data in each file is read row by row and checked for valid data in the required fields necessary for matching. Only the rows that pass validation proceed to the matching process; all rows that fail validation are returned to the user as part of a "no match" file.

Validation checks on the row:

• String validation: check for script tags
• All foundation IDs and Altrata IDs must be numeric
• Client Supplied ID is a string
• Email validation: the email address must include the @ sign. It must be a full email address, not just the domain
• LinkedIn validation: the LinkedIn ID should be a complete URL
• Person and organization name validation
  o Minimum 1 character for first name
  o Minimum 2 characters for last name
  o Minimum 2 characters for organization name
• Duplicate Client Supplied IDs are treated as invalid

Matching Process

Rows that clear the validation phase are matched against the Altrata dataset. The matching process uses various criteria, including but not limited to foundation IDs, individuals' professional email addresses, LinkedIn profiles, organization tickers, and names, both for individuals and organizations. Name matching involves checks against partial names, potential misspellings, and synonyms.

All ID matches yield exact matches. Name matching can lead to exact matches, unique matches, or possible matches.

• Unique match: the match is exact or unique (only one match found) and the match score is more than 90 and less than or equal to 100.
• Possible match: more than one match is found for a record, or a unique match is found with a match score less than 90.
• No match: a match is not identified.

Output Results

When the matching process completes, the system generates pipe (|) delimited CSV files and places them in the download folder. For each matching request, a folder named with the date of file submission and the request ID (yyyymmdd req {request id}) is created to hold the results.

A pair of matched files is provided, presenting unique matches and possible matches separately. The unique match file consolidates the outcomes from both exact and unique matches and can be ingested directly. The possible match file contains up to 5 potential matches identified for each record, which need a manual review before ingestion. All matched results include a match score assigned to each row.

The matched results are presented in a predefined template for both person and organization files. The file containing results without matches uses the same format as the input file.

Below are sample screenshots of the matching result files: Person Results and Organization Results.

Compliance with GDPR requires organizations to establish a well-defined data retention policy. Personal data must be retained only for the necessary duration and securely deleted when no longer needed. Following this requirement, we have applied retention policies to the input and output folders on the SFTP client server and to the tables created in the matching engine. This ensures proper management of personal data in accordance with GDPR guidelines.

Input Folder

Files uploaded to the input folder are retained for a maximum of 7 days. An automated process regularly removes any input files that exceed this period.

Note: Some of the data is stored in our database for 30 days, and routine jobs are initiated to delete these copies permanently.

Output Folder

The results posted in the output folder remain accessible for 30 days, which gives users ample time to retrieve their data. An automated task removes any results from the output folder that pass the 30-day threshold.

Matching Status

All users who submit matching requests can monitor the progress of their submitted files. The ongoing status can be observed in the log file within the output folder, and users also receive notifications through email updates.

This is an example text file.

The email shown below is in the format that users will receive at different stages of the matching process.

Figure 4 1: Email design

UTF-8 Files

A UTF-8 file is a text file encoded using the UTF-8 character encoding standard, which allows it to represent a wide range of characters from different languages and scripts efficiently. This is the optimal file format to use in the Altrata Matching service.

How to Create a UTF-8 File

To create a UTF-8 encoded CSV file from Excel, follow these steps:

1. Open Excel: Open the Excel spreadsheet containing the data you want to export.
2. Select data: Select the range of cells that you want to export to a CSV file. Make sure the data you're exporting doesn't contain any special formatting that you want to preserve, as exporting to a CSV file only retains the text content.
3. Save as: Go to the "File" menu and choose "Save As" (or equivalent).
4. Choose file type: In the save dialog box, choose "CSV (Comma delimited) (*.csv)" as the file format.
5. Specify filename: Specify the filename and the location where you want to save the CSV file.
6. Click on "Tools": Depending on your Excel version, you might need to click on a "Tools" or "Options" button in the save dialog box.
7. Select encoding: In the options/settings window, look for an option to specify the encoding. Choose "UTF-8" as the encoding.
8. Save file: Click "OK" or "Save" to save the CSV file with UTF-8 encoding.

The Excel data should now be saved in a UTF-8 encoded CSV file, ready to be used in the matching service.
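The Excel export above writes a comma-delimited file, while the file specifications call for a pipe (|) delimiter. The following is a minimal Python sketch of one way to re-save such an export as a pipe-delimited UTF-8 file. The function name `comma_csv_to_pipe` and the file names in the usage note are hypothetical, and the sketch assumes the export is already UTF-8 encoded. Whether the matching service accepts quoted fields is not stated in this guide, so values that contain a pipe character should be checked by hand.

```python
import csv


def comma_csv_to_pipe(src_path, dst_path):
    """Re-save a comma-delimited CSV as a pipe-delimited UTF-8 file.

    Returns the number of data rows written (the header row is not counted).
    """
    # "utf-8-sig" reads plain UTF-8 and also drops a leading byte-order mark,
    # which some spreadsheet exports add at the start of the file.
    with open(src_path, "r", encoding="utf-8-sig", newline="") as src, \
            open(dst_path, "w", encoding="utf-8", newline="") as dst:
        reader = csv.reader(src)
        # Default quoting is kept: a value containing "|" would be written in
        # double quotes. That the service parses quoted fields is an assumption.
        writer = csv.writer(dst, delimiter="|", lineterminator="\n")

        header = next(reader, None)
        if header is None:
            raise ValueError("source file is empty; a header row is required")
        writer.writerow(header)

        count = 0
        for row in reader:
            writer.writerow(row)
            count += 1
        return count
```

For example, `comma_csv_to_pipe("export.csv", "person_matching.csv")` would read the Excel export and write the converted file next to it, ready to be placed in the uploads folder.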