Data-driven digital transformation only works if the data itself can be trusted, because decisions are only as reliable as the data behind them. When different data sources are combined, there are many opportunities for records to be duplicated or mislabeled, among other problems.
Email marketing data cleansing (or data cleaning) helps us correct or eliminate duplicate data, detect spamtraps and inactive or hyperactive users, and fix incorrect or incomplete records within a data set. The result is more efficient marketing campaigns and, for email campaigns, better deliverability.
In this post we will discuss the following topics:
- The importance of data cleansing
- Data cleansing process
- Glossary of database cleansing terms: SpamTraps
Importance of database hygiene and verification:
- Keeps data clean and available
- Ensures IP and domain reputation are healthy
- Prolongs the life of mail resources
- Increases deliverability metrics
- Improves sender score and sender quality
- Improves inbox placement and consequently increases ROI
Data cleaning process:
Data analysis:
The first phase is a data audit: the data is analyzed to identify spamtraps, bounces, and invalid or otherwise unwanted email addresses.
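As a rough illustration of what the simplest part of such an audit might look like, the sketch below normalizes addresses, separates malformed entries, and flags duplicates. The function name `audit_emails`, the report keys, and the regular expression are assumptions made for this example; the pattern is a deliberately loose syntax check, not a full address validator, and lowercasing the whole address assumes that mailbox names are treated as case-insensitive, which is common in practice but not guaranteed.

```python
import re

# Illustrative, deliberately loose pattern: one "@", no whitespace,
# and at least one dot in the domain. Not a full RFC 5322 validator.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def audit_emails(raw_addresses):
    """Split raw addresses into unique, duplicate, and malformed entries."""
    seen = set()
    report = {"unique": [], "duplicates": [], "malformed": []}
    for raw in raw_addresses:
        address = raw.strip().lower()  # normalize before comparing
        if not EMAIL_RE.match(address):
            report["malformed"].append(raw)  # keep the raw value for review
        elif address in seen:
            report["duplicates"].append(address)
        else:
            seen.add(address)
            report["unique"].append(address)
    return report

report = audit_emails(
    ["Ana@Example.com", "ana@example.com ", "no-at-sign", "bob@example.org"]
)
```

Detecting spamtraps and bounces requires historical and third-party data, so it is outside the scope of this sketch.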
Workflow and mapping rules:
Next, a workflow of detection and mapping rules is defined to handle the anomalies found in the previous phase. It can only be specified after the data analysis, once information about the existing anomalies is available.
Verification:
In this phase, the correctness and effectiveness of the transformation workflow are evaluated. This phase consists of multiple iterations to verify that all errors are actually being corrected.
Transformation (may or may not apply):
Once the data is verified and validated, the transformation steps will be executed to update the data in the data warehouse.
Backflow of cleaned data:
Finally, after all errors have been removed, the erroneous data must be replaced with the cleaned data.
Glossary of database cleaning terms:
Once database cleaning has been performed, the results fall into the following groups:
Greylist: Emails that may cause problems when used in the CRM program.
Invalid: Emails that we do not recommend using in the CRM program because they are problematic.
Valid: List of valid emails to send according to our checks.
Within the invalid group, we distinguish the following types:
Blocked Header Keyword: Emails whose left part (header) includes combinations of symbols that we associate with potentially unwanted addresses: addresses created only to fill in a form, random characters, swear words, etc.
Complainer: Emails that usually generate a high number of complaints, according to a historical database that is kept up to date. They are usually valid emails.
Escalator: Emails whose owners, apart from clicking the spam button, usually escalate the complaint to providers, hosting companies, registrars, etc.
HardBounce: Emails that returned hard bounce messages in the past. They may be valid emails, but they are one step away from becoming spamtraps.
Invalid DataMaid – Content exclusion: According to our algorithm criteria, these are emails that do not usually accept advertising.
Invalid DataMaid – IP Blocklist: Emails whose actions (opens or clicks) usually come from IPs that appear on a blacklist.
Invalid DataMaid – Offline/NoResponse: Emails whose mail server has a configuration problem in its DNS records.
Invalid DataMaid – PageSize: Emails that usually have an invalid agent, or that open from pages with no content.
Invalid EHLO: The EHLO verification statuses are detailed in the following slide.
Spamtraps: Described in more detail below.
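To make the Blocked Header Keyword type more concrete, here is a minimal sketch of the kind of heuristic that could flag a suspicious left part. Everything in it is an assumption for illustration: the keyword list, the function name `is_suspicious_local_part`, and the "no vowels" rule for random characters are hypothetical and do not describe the actual algorithm behind the classification above.

```python
# Hypothetical keyword list; a real service would maintain its own.
BLOCKED_KEYWORDS = ("asdf", "qwerty", "test", "fake")

def is_suspicious_local_part(address):
    """Heuristically flag the part before the @ as a throwaway entry."""
    local = address.split("@", 1)[0].lower()
    # Known filler words typed just to get past a form.
    if any(keyword in local for keyword in BLOCKED_KEYWORDS):
        return True
    letters = [ch for ch in local if ch.isalpha()]
    # No letters at all (only digits or symbols).
    if not letters:
        return True
    # Six or more letters without a single vowel look like random typing.
    if len(letters) >= 6 and not any(ch in "aeiou" for ch in letters):
        return True
    return False

flags = {
    address: is_suspicious_local_part(address)
    for address in ["asdf123@example.com", "12345@example.com",
                    "xkcdtrvb@example.com", "maria.lopez@example.com"]
}
```

Substring matching is intentionally crude and will produce false positives (for example, "test" also matches "contest"), so a rule like this is better suited to sending addresses to a greylist for review than to rejecting them outright.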
Within the group of invalid data we highlight the spamtraps:
“Spamtraps are email addresses created and maintained by ISPs and third-party blacklisting companies for the sole purpose of detecting spammers. Using these types of addresses in an email can damage your reputation and get you blacklisted.”
Below we describe the different types of spamtraps we may encounter:
Recycled Traps:
These are addresses that fell into disuse, or that the user asked the provider to close. The address has been inactive for some time and has even returned hard bounces, but the provider has reactivated it and reuses it to expose and block senders who do not manage their email data responsibly.
Pristine Traps:
These email addresses are generated by vendors or affiliated organizations and are typically posted on public websites, blogs, forums, and forms. The only way to obtain them is by harvesting, for example by scraping web pages. Consequently, they are very useful for detecting paid lists and unaudited acquisition sources.
Honeypot Traps:
These are deliberately hidden in websites, code and forms for harvesters, bots and malicious actors to pick up. They are another form of bait, intended to detect private and commercial unsolicited bulk mailing offenses, and they generally work to reduce the amount of spam being sent and received on the Internet.
Message ID Traps:
Message ID traps are intended to identify scrapers that capture any data containing an @, including message IDs. If you send an email to this trap, it tells the trap owner that the sender is scraping addresses or buying lists from someone who is.
Investigative Traps:
These email addresses are created to send emails directly to senders in order to monitor their activity. This type of trap is useful for monitoring the ongoing behavior of a sender. Typically, it is used to ensure that the sender is using confirmation and proper email hygiene on their lists.
Pure Traps:
These are email addresses that have never been used by anyone and have never been signed up for any mailing list. The only way to get them is through bad practices. These addresses are left on the Internet to attract bots or people who collect addresses illegitimately.
Typo Traps:
These are addresses with typos, for example @gnail instead of @gmail. Typos in the domain are the most common, but misspelled usernames before the @ can also be typo traps. This usually happens when user data is entered into your database manually or taken down incorrectly over the phone, or when customers mistype it intentionally to avoid being emailed.
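Domain typos like this can often be caught at collection time with a fuzzy comparison against well-known domains. The sketch below uses `difflib.get_close_matches` from the Python standard library; the domain list, the function name `suggest_domain`, and the 0.8 similarity cutoff are assumptions chosen for this example and would need tuning against real data.

```python
import difflib

# Illustrative list of well-known domains; extend it for your own audience.
KNOWN_DOMAINS = ["gmail.com", "yahoo.com", "hotmail.com", "outlook.com"]

def suggest_domain(address, cutoff=0.8):
    """Return the known domain the address probably meant, or None."""
    domain = address.rsplit("@", 1)[-1].lower()
    if domain in KNOWN_DOMAINS:
        return None  # exact match, nothing to correct
    # Best single candidate whose similarity ratio is at least `cutoff`.
    matches = difflib.get_close_matches(domain, KNOWN_DOMAINS, n=1, cutoff=cutoff)
    return matches[0] if matches else None

typo_fix = suggest_domain("user@gnail.com")
no_fix = suggest_domain("user@gmail.com")
```

A suggestion is only a hint: it is safer to ask the subscriber to confirm the corrected address than to rewrite it silently, since the typed domain may be legitimate.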
Dead Address Traps:
These were previously valid emails that were deactivated, usually 12 months or more ago, and then re-activated. Most major ISPs use these traps because they are useful for identifying senders with poor list hygiene.
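Because recycled and dead address traps start out as addresses that stopped responding, one common precaution is to review subscribers with no recent activity before mailing them again. The following is a minimal sketch of that idea, not a description of how any ISP or cleaning service works: the function name `find_inactive`, the shape of the activity data, and the 365-day threshold (chosen to mirror the 12 months mentioned above) are all assumptions.

```python
from datetime import date, timedelta

def find_inactive(last_activity, today, max_idle_days=365):
    """Return addresses whose last open or click is older than max_idle_days."""
    threshold = today - timedelta(days=max_idle_days)
    return sorted(
        address
        for address, last_seen in last_activity.items()
        if last_seen < threshold
    )

# Hypothetical engagement data: address -> date of last open or click.
activity = {
    "ana@example.com": date(2024, 5, 1),
    "bob@example.org": date(2022, 11, 15),
}
stale = find_inactive(activity, today=date(2024, 6, 1))
```

Addresses returned by a check like this are candidates for a re-confirmation campaign or for suppression, depending on the sender's policy.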
As you have seen, database cleaning is essential to maintaining a good IP reputation and, consequently, good deliverability in email marketing. If you would like more information about our database cleaning service, contact us and we will give you all the details.