
Top Tools For Flat File Data Manipulation & Analysis

Happy New Year, folks! Before the holiday break, my last guest on The Corporate Data Show was the fabulous Scott Miller, Founder and CEO (and, as he calls it, “chief bottle washer and wire straightener”) of RampedUp Data Solutions.

Although we are definitely industry “somebodies,” when you go looking for data, Every Market Media and RampedUp probably won’t be the first companies you find. That’s because our competitors have an awful lot of SEO-driven website content. Most people meet me through a referral or very specific outreach. We come from, not the underbelly (that makes it sound shady), but certainly the inner workings of the industry. Scott likes to call the finished product the “chocolate chip cookies,” and we are the flour and chocolate chips: the wholesalers providing the ingredients. So this was a chat between two guys with a lot of industry cred, not to toot my own horn (but “beep beep”).

ETL…D?

Some of you already know it, but let’s talk about ETL. It stands for Extract, Transform, and Load. In layman’s terms, it’s the process we use to manipulate raw data files before they go anywhere else. I don’t know many people doing this the way Scott’s business does. My process isn’t automated; it is all manual, file by file. Scott added a new letter to his ETL process: D, for Delivery. That matters because his customers have unique requirements for how, and how often, they take these large files.
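To make the four letters concrete, here is a minimal sketch of an ETL pipeline with a separate Delivery step, written in plain Python. It is not RampedUp’s or EMM’s actual pipeline; the function names (`extract`, `transform`, `load`, `deliver`) and the single normalization rule are hypothetical placeholders. The point is that “D” is its own stage: the same loaded data can be rendered in a different column layout for each customer.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, List

Record = Dict[str, str]


def extract(raw_csv: str) -> Iterator[Record]:
    """E: read raw rows from a flat file (an in-memory CSV string here)."""
    yield from csv.DictReader(io.StringIO(raw_csv))


def transform(rows: Iterable[Record]) -> Iterator[Record]:
    """T: normalize header names and trim whitespace (placeholder logic)."""
    for row in rows:
        yield {
            key.strip().lower(): (value or "").strip()
            for key, value in row.items()
            if key is not None  # skip overflow fields that have no header
        }


def load(rows: Iterable[Record]) -> List[Record]:
    """L: stand-in for writing to a database; here we just collect a list."""
    return list(rows)


def deliver(rows: List[Record], columns: List[str]) -> str:
    """D: render the loaded rows in one customer's requested column layout."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=columns, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Usage: one load, many deliveries with different layouts.
# rows = load(transform(extract(raw_text)))
# print(deliver(rows, ["email", "country"]))
```

Keeping delivery separate from loading is what lets one customer get `email,country` and another get a completely different layout without re-running the earlier steps.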

YOU CAN HAVE IT YOUR WAY

Going back to food analogies (because it’s the holidays and everyone loves food), RampedUp is flexible, and makes money on that flexibility, because they aren’t putting so much time into production that they have to say, “Sorry, we don’t have time to make your custom cheeseburger.”

When it comes to fast food comparisons, I think I am more McDonald’s: I will make it this way, and that way, and maybe put ketchup on it if you’re really nice. Scott, on the other hand, says RampedUp would be Burger King: “You can have it your way.”

They have three large data assets: a contact data asset, a company data asset, and a technographic data asset. People want data by country, or different aspects blended together. That's the reason for the automation. Before they embarked on this project, making a custom file could take weeks, to the point where RampedUp wouldn't even do custom files, and when they did, they charged heavily for it. I pretty much don't do them, but it is nice to have the flexibility to change the layout sometimes. Here at Every Market Media (EMM) we are in Google Cloud, so we don’t have quite the same setup, and our data is delivered a little differently, straight off our ETL. Our tool, Alteryx, lets me manipulate, engineer, and analyze data.
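What “blended together” and “by country” might look like in code: the sketch below joins a contact asset, a company asset, and a technographic asset on a shared company domain, then optionally filters by country. This is an illustration, not RampedUp’s schema; the field names (`domain`, `country`, `technology`, `email`) and the function `blend_assets` are assumptions made for the example.

```python
from typing import Dict, List, Optional

Record = Dict[str, str]


def blend_assets(
    contacts: List[Record],
    companies: List[Record],
    technographics: List[Record],
    country: Optional[str] = None,
) -> List[Record]:
    """Join contacts to their company and technologies (hypothetical schema)."""
    # Index the company asset by domain for constant-time lookups.
    company_by_domain = {c["domain"]: c for c in companies}

    # One company can have many technographic rows; group them by domain.
    tech_by_domain: Dict[str, List[str]] = {}
    for t in technographics:
        tech_by_domain.setdefault(t["domain"], []).append(t["technology"])

    blended = []
    for contact in contacts:
        company = company_by_domain.get(contact["domain"])
        if company is None:
            continue  # inner join: drop contacts with no matching company
        if country is not None and company.get("country") != country:
            continue  # honor a "by country" request
        row = {**company, **contact}  # contact fields win on name clashes
        row["technologies"] = ";".join(sorted(tech_by_domain.get(contact["domain"], [])))
        blended.append(row)
    return blended
```

Once the join is code rather than a manual process, a custom file becomes a matter of changing arguments, which is the kind of turnaround the automation described above is meant to enable.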

SO HOW DOES IT WORK?

Scott explains that RampedUp uses a lot of tools. They think of the information they provide as coming from the web: it is publicly available information. But how do they get that information? Through different IP spooling companies and different hosting companies, like Amazon or DigitalOcean. RampedUp considers that level of data to be its input information. After it has been normalized and shaped, it enters the ETL environment, much as it does at Every Market Media. Scott is picking up contacts, while my work is mostly company-level data from crawling sites directly rather than sites owned by somebody else, so I don’t have to deal with proxy bots like he does.

Once you pick the data up, you have lots of front-end tools for parsing, handling IPs, and otherwise procuring information; that’s step one. Once you have that big pile of stuff, you’re ready for the “E” of the process: Extract. Next is the Transform step, the ingest logic, which matters most when you have more than one source of information, and we all try to have multiple sources.
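One common way to handle ingest logic across multiple sources is a priority merge: records are matched on a normalized key, and for each field the value from the most trusted source wins, while empty values never overwrite known ones. The sketch below shows that technique; the source names, the `domain` key, and `merge_sources` itself are hypothetical, and neither guest described their exact merge rules.

```python
from typing import Dict, List

Record = Dict[str, str]


def merge_sources(
    sources: Dict[str, List[Record]], priority: List[str], key: str
) -> Dict[str, Record]:
    """Merge records from several sources into one record per key.

    `priority` lists source names from most to least trusted. For each field,
    the most trusted source with a non-empty value wins. Sources missing from
    `priority` are ignored.
    """
    merged: Dict[str, Record] = {}
    # Apply the least trusted source first so more trusted sources overwrite it.
    for source_name in reversed(priority):
        for record in sources.get(source_name, []):
            record_key = record.get(key, "").strip().lower()
            if not record_key:
                continue  # a record without a match key cannot be merged
            target = merged.setdefault(record_key, {})
            for field, value in record.items():
                if value:  # empty strings never erase a known value
                    target[field] = value
    return merged
```

For example, if a crawl source is trusted over a partner feed but is missing an employee count, the merged record keeps the crawl’s company name and fills the gap from the partner.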

Scott says, “What's cool about that is we're using Apache Airflow to help us with that ingestion logic, but there are different services we’ve written that weren’t necessarily part of the ETL process. In the process of validating email addresses, we host what is called an Email Exhaust Exchange. We validate hundreds of millions of email addresses every quarter, but we also work with partners that are delivering emails, so we get that email delivery data back. That is a service we would plug into the transformation process to validate records.”
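Here is a rough idea of how delivery results coming back from sending partners (“exhaust”) could feed a validation step. This is a hypothetical sketch, not how the Email Exhaust Exchange works: the outcome labels, the 90-day freshness window, and `apply_exhaust` are all assumptions. It applies two rules that follow from the discussion: stale exhaust is ignored, and soft bounces (temporary failures) don’t change a record’s status.

```python
from datetime import date, timedelta
from typing import Dict, List, NamedTuple


class DeliveryEvent(NamedTuple):
    email: str
    outcome: str  # hypothetical labels: "delivered", "hard_bounce", "soft_bounce"
    seen_on: date


def apply_exhaust(
    statuses: Dict[str, str],
    events: List[DeliveryEvent],
    today: date,
    max_age_days: int = 90,  # assumed freshness window, not a known standard
) -> Dict[str, str]:
    """Return a copy of `statuses` updated from recent delivery events."""
    cutoff = today - timedelta(days=max_age_days)
    updated = dict(statuses)
    # Process oldest first so the most recent definitive event wins.
    for event in sorted(events, key=lambda e: e.seen_on):
        if event.seen_on < cutoff:
            continue  # stale exhaust says little about the address today
        email = event.email.strip().lower()
        if event.outcome == "delivered":
            updated[email] = "valid"
        elif event.outcome == "hard_bounce":
            updated[email] = "invalid"
        # Soft bounces (full mailbox, throttling) are temporary, so they
        # leave the existing status untouched.
    return updated
```

The age cutoff matters because, as noted below, exhaust is only as good as how recent it really is.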

I don’t want to spill all the ingredients of the sausage as far as which transformations we do. It is interesting to discuss how RampedUp performs those ingestion changes, though, because they happen in a different order than our transformations do. When it comes to verification as part of transformation, I have tried the email exhaust tool and it has worked okay. It depends on whether the person selling it to me is bending the truth about how old the exhaust is. The other challenge I’ve had is that bulk mailers send so much, and care so little about deliverability, that you end up with false negatives and soft bounces. I validate about 200 million brand-new emails a year, and that covers my published data. Once something is a “delete,” it usually does not come back: when I roast an email, it stays roasted. And if I roasted an email because of a rule rather than a validation (such as characters that make it an invalid email, or a swear word suppression), then that rule carries through the data.
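The “roasted stays roasted” idea can be sketched as a persistent suppression list checked before anything else, plus rule-based checks whose hits are added to that list. This is an illustrative sketch, not EMM’s rule set: the syntax pattern is deliberately simple (real address syntax under RFC 5322 is far more permissive), the blocked-word list is a placeholder, and `filter_emails` and `rule_failures` are hypothetical names.

```python
import re
from typing import Iterable, List, Set, Tuple

# Deliberately simple check: one "@", no whitespace, a dot in the domain.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def rule_failures(email: str, blocked_words: Set[str]) -> List[str]:
    """Return the names of any rules this address breaks."""
    reasons = []
    if not EMAIL_PATTERN.match(email):
        reasons.append("invalid_syntax")
    local_part = email.split("@", 1)[0]
    if any(word in local_part for word in blocked_words):
        reasons.append("word_suppression")
    return reasons


def filter_emails(
    emails: Iterable[str], existing_suppressions: Set[str], blocked_words: Set[str]
) -> Tuple[List[str], Set[str]]:
    """Split emails into keepers and an updated suppression set."""
    suppressed = set(existing_suppressions)  # copy; don't mutate the caller's set
    keep = []
    for raw in emails:
        email = raw.strip().lower()
        if email in suppressed:
            continue  # once roasted, stays roasted
        if rule_failures(email, blocked_words):
            suppressed.add(email)  # rule hits are remembered for future runs
            continue
        keep.append(email)
    return keep, suppressed
```

Returning the updated suppression set, instead of just the keepers, is what makes a rule hit carry through every later file.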

FINAL QUESTION

What is the worst tool for data manipulation and analysis on the market?

Scott said, “I don't know if it's the worst tool. I mean, it's obviously for people that aren't very technical but do have an acumen for manipulating data… there's CSVs and Excel.” I am not a huge fan of Google Data Studio either; I don’t think it is great for data visualization, and I am definitely not a fan of Excel or Google Sheets. EmEditor is one tool Scott likes to use. Like Notepad++, it opens huge text files.

At the end of the day, though, as Scott says, “If that's all you have, then that's all you have.”
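Excel worksheets top out at 1,048,576 rows, which is one reason big flat files end up in text editors like EmEditor instead. When you need a quick answer from a file that size, streaming it one row at a time is another option. This sketch is not something either guest described; it assumes a CSV with a header row, and `count_by_column` and the file name in the comment are hypothetical.

```python
import csv
from collections import Counter
from typing import Iterable


def count_by_column(lines: Iterable[str], column: str) -> Counter:
    """Tally how often each value of `column` appears, one row at a time.

    `lines` can be an open file handle, so memory use stays flat no matter
    how many rows the file has. Missing values are counted under "".
    """
    counts: Counter = Counter()
    for row in csv.DictReader(lines):
        counts[row.get(column) or ""] += 1
    return counts


# Usage with a large file on disk (hypothetical file name):
# with open("huge_export.csv", newline="", encoding="utf-8") as handle:
#     print(count_by_column(handle, "country").most_common(10))
```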

I appreciate Scott for coming on the podcast and having some great data nerd talk with me! Folks, if you want to get in touch with Scott or ask any questions, check out www.rampedup.io.

Happy New Year and have a great 2023 everyone!