A big part of my work, in addition to sales presentations and consulting, is data analysis and extraction. Data extraction often involves deciphering older, flat file databases. To accomplish this, I rely on several software tools to make my job easier. In today’s issue, I’ll share some of those tools with you.
If you’re familiar with Notepad, the default Windows text editor, it can be frustrating to work with if you’re trying to manipulate or better understand text files. Looking at text files with long records, such as a meter reading interface file or outsource billing export file, can be frustrating with Notepad because it uses word wrap. With word wrap, trying to determine where one line ends and the next line starts can be difficult.
My personal favorite text editor is TextPad. TextPad offers several features not available in Notepad. In addition to not word wrapping, it also displays the line and column positions of where the cursor is located, which comes in very handy when trying to determine if data is in the correct position. The screenshot below shows the cursor is on line 65, column 648.
TextPad can also open files in binary mode, which shows each byte of data as the two-digit hexadecimal value. This is most valuable when trying to find binary or packed numeric data in a flat file. In the illustration below, 129906 is the current meter reading, but because it is stored as binary data, it’s not visible in the margin like the meter number (02779619) is.
Another TextPad feature I use frequently is Find in Files. This allows me to search all files, or a wildcard representation, for a particular value. I often use this when first trying to determine which files are important for a data conversion. If I know an account number, I can search all the data files to see which of those file contain the account number.
Let me start by saying Excel is a terrible tool to use to exchange data. Excel notoriously tries to convert everything it can to numeric data. This means leading zeros get dropped from fields where they are important (like the meter number in the illustration above). Also, long numeric fields often get converted to scientific notation.
However, that doesn’t mean Excel isn’t a valuable tool for data analysis, just don’t use it to exchange files!
One of the most obvious uses for Excel is to open delimited (tab, CSV, pipe) files. Excel arranges the data in columns, making it much easier to review than opening a delimited file in a text editor. Once data is in columns, it can easily be sorted and filtered. Filtering data is another feature I use frequently. If I’m working with a file that has a column of codes and I want to know how many unique codes are represented, Excel’s filter function is what I use. In the screenshot below, I have a column called Account Class. To determine what codes are represented in the Account Class column, all I had to do is turn on filtering and click the dropdown arrow for the Account Class column and it shows me the three unique values of C, M, and R:
One last tool I use less frequently than the others, but still rely on, is CSVed. As the name implies, this is a CSV, or any delimited file, editor. Excel has a limit to how large a file it can open which can be frustrating with very large files. CSVed doesn’t have that limitation and it will also allow me to filter records to save a subset of a large file.
For example, If I’m trying to analyze a history file that has billing, payment, and adjustment records, CSVed provides the functionality to export just one of the record types, for example, adjustments. This provides a much more manageable file to analyze and open in Excel.
Need assistance extracting old data?
If you’re trying to extract data from an old database and need assistance, please give me a call at 919-232-2320 or e-mail me at firstname.lastname@example.org to learn how I could assist you.
© 2019 Gary Sanders