Handling Large CSV Files for Digital Forensics and Incident Response
I would love to hear about your techniques for large CSVs. Here’s a roundup of tools. I’ve only tested the first seven.
Free Tools
Timesketch/Kibana à la Skadi (Linux)
I tend to use Timesketch for its collaboration and multi-timeline capabilities. Skadi offers four installation methods: Docker only, OVA, Vagrant, or an installer script. See the Skadi documentation for the Docker-only instructions.
Timeline Explorer (Windows)
Usually works on large CSVs despite the warning dialog box that pops up. If it doesn’t, email Eric Zimmerman, the author, and he will work with you to troubleshoot. Thank him profusely.
Visual Studio Code + Excel Viewer (Windows, Mac, Linux)
Although Visual Studio Code has a hard-coded 50 MB limit on the files its extensions can read, I mention it for those who already use VSC.
To put the data into columns, “…use the explorer context menu or editor title menu to invoke the Open Preview command”.
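For files over that limit, one workaround is to preview only the top of the file. A minimal sketch, assuming a POSIX shell; the function name `csv_preview`, its default of 100,000 lines, and treating 50 MB as 50 × 1024 × 1024 bytes are all assumptions for illustration, not settings from VS Code.

```shell
# Hypothetical helper: copy a CSV for viewing, or, if it is larger than
# the size limit, copy only its first N lines (header included).
# Defaults (assumptions): 100000 lines, limit of 50 * 1024 * 1024 bytes.
csv_preview() {
  local src="$1" dest="$2" lines="${3:-100000}"
  local limit="${4:-$((50 * 1024 * 1024))}"
  local size
  size=$(wc -c < "$src" | tr -d ' ')   # file size in bytes
  if [ "$size" -le "$limit" ]; then
    cp "$src" "$dest"                  # small enough: copy unchanged
  else
    head -n "$lines" "$src" > "$dest"  # too big: keep the first N lines
  fi
}
```

If lines are long, 100,000 of them can still exceed the limit, so lower the third argument until the preview opens.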
Gnumeric (Linux)
Maximum number of rows = 16,777,216
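Before opening a big export in Gnumeric, it can help to check whether it fits at all. A minimal sketch, assuming a POSIX shell with `wc` and `tr`; `csv_fits_rows` is a hypothetical name, and it counts physical lines, which equals the row count only when no quoted field contains a line break.

```shell
# Hypothetical helper: report whether a CSV's line count fits under a
# spreadsheet row limit (default: Gnumeric's 16,777,216 rows).
# Caveat: wc -l counts newline characters, so quoted fields containing
# line breaks inflate the count, and a missing final newline lowers it by one.
csv_fits_rows() {
  local file="$1" limit="${2:-16777216}"
  local rows
  rows=$(wc -l < "$file" | tr -d ' ')
  if [ "$rows" -le "$limit" ]; then
    echo "OK: $rows lines (limit $limit)"
    return 0
  else
    echo "TOO BIG: $rows lines (limit $limit)"
    return 1
  fi
}
```

For example, `csv_fits_rows timeline.csv` (a hypothetical file) prints the line count and returns non-zero if the file is over Gnumeric’s limit; passing 1048576 as the second argument checks against Excel’s worksheet limit instead.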
WSL Instructions
Steps on Windows
Install VcXsrv Windows X Server. You may need to allow it through Windows Defender Firewall.
Run XLaunch | Display number: 0
Steps on Ubuntu
sudo apt-get install gnumeric
echo 'export DISPLAY=:0.0' >> ~/.bashrc
sudo sed -i 's+<listen>unix:tmpdir=/tmp</listen>+<listen>tcp:host=localhost,port=0</listen>+g' /usr/share/dbus-1/session.conf
Close and reopen Bash so the changes take effect
Source: Sous-système Windows pour Linux : Ubuntu sur Windows (Windows Subsystem for Linux: Ubuntu on Windows)
Woanware’s LogViewer2 (Windows)
Linux CLI
For log manipulation, I mostly use the following: grep, awk, cut, rev, sed, sort, uniq, tail, head, cat, wc, tr
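A few of these tools chained together cover a lot of quick triage. A minimal sketch, assuming a plain comma-delimited file with one header row and no quoted fields containing commas or line breaks, since `cut` and `grep` do not understand CSV quoting; the function names are hypothetical.

```shell
# Hypothetical helpers built from the tools listed above. Both assume a
# simple comma-delimited file with a single header row and no quoted
# fields containing commas or newlines.

# Count how often each value appears in column N, most frequent first.
csv_value_counts() {
  local file="$1" col="$2"
  tail -n +2 "$file" | cut -d, -f"$col" | sort | uniq -c | sort -rn
}

# Print the header row plus every data row matching a pattern
# (extended regex, case-insensitive).
csv_grep() {
  local file="$1" pattern="$2"
  head -n 1 "$file"
  tail -n +2 "$file" | grep -Ei -- "$pattern"
}
```

For example, `csv_value_counts timeline.csv 3 | head` would show the ten most common values in the third column of a hypothetical timeline.csv, and `csv_grep timeline.csv 'logon|logoff'` would keep the header so the filtered output still opens cleanly in the tools above.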
Import into Excel Data Model (Windows, Mac)
Data tab | New Query | From File | From CSV | Load To… | Only Create Connection, and check Add this data to the Data Model
Source: Loading CSV/text files with more than a million rows into Excel
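If the Data Model route is not an option, another workaround is to split the file into pieces that each fit on one worksheet. A minimal sketch using `awk`, assuming no quoted field contains an embedded line break; `csv_split` and the `_001.csv` naming scheme are hypothetical. Excel’s worksheet limit is 1,048,576 rows, so each part holds the header plus at most 1,048,575 data rows.

```shell
# Hypothetical helper: split a large CSV into numbered parts that each fit
# on one Excel worksheet (1,048,576 rows = 1 header + 1,048,575 data rows),
# repeating the header at the top of every part.
# Assumes no quoted fields contain embedded newlines.
csv_split() {
  local file="$1" prefix="$2" max_rows="${3:-1048575}"
  awk -v max="$max_rows" -v prefix="$prefix" '
    NR == 1 { header = $0; next }      # remember the header row
    (NR - 2) % max == 0 {              # start a new part every max data rows
      if (out) close(out)
      part++
      out = sprintf("%s_%03d.csv", prefix, part)
      print header > out
    }
    { print > out }                    # copy the data row into the current part
  ' "$file"
}
```

For example, `csv_split timeline.csv timeline_part` would write timeline_part_001.csv, timeline_part_002.csv, and so on. Doing the split in `awk` rather than with `split` alone is what lets every part keep the header row.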
Import into Access
External Data | New Data Source | From File | Text File
sift (Linux)
csvkit (Linux)
liquid Large File Editor (Windows)
VisiData (Mac or Linux)
CSView (Windows or Mac)
Handles files > 4 GB
reCsvEdit (Windows, Mac, Linux)
Handles files > 1 GB
OpenRefine (Windows, Mac, Linux)
Commercial Tools
010 Editor using Column Mode (Windows, Mac, Linux) | $129.95 (commercial) | $49.95 (home/academic)
Handles files > 50 GB
Delimit Pro (Windows) | $49 annually
Up to 2 billion rows and 2 million columns
Tablecruncher Pro (Mac) | $29
Handles files > 2 GB and 15 million rows