A common question we receive at Plexon is about processing large data files in Offline Sorter. The details of these discussions are important. “Large” can mean a file a few GB in size or hundreds of GB in size (no joke). “Processing” can mean anything from opening and exporting a new file, to filtering, extracting waveforms, and spike sorting.
A commonality in the questions about large data files is the time it takes to process them. Several factors affect how long certain actions take in Offline Sorter: file type, number of channels, type of data, etc. I have had several recent data discussions with users, and when they send me their data files, they are large. “Large” in this case is tens of GB. The purpose of this blog is to address why those files are so “large,” and offer specific examples of how to create smaller files without losing data.
Users recording from a Plexon system typically generate either plx or pl2 files; thus, these are the file types discussed in this blog. The plx file format is a legacy format, whereas the pl2 format was introduced in 2013 specifically to address large data file handling.
Keep Only What You Need
A common misconception I often encounter with users is the need to “record everything.” The best way to manage a large recording file is to make sure you record only what you need. An OmniPlex system can generate multiple sources of data online, all of which can be saved to a pl2 file: waveforms or spikes, continuous unthresholded spikes, wide band, local field potential, auxiliary signals, etc. I often encounter users who think they must save each of the available sources of data for all channels, mistakenly believing this will ensure “everything” is saved. Saving all these sources creates large data files filled with redundant data. With a Plexon OmniPlex system, “everything” is derived from the wide band data, which is the default signal type acquired on each channel. In actuality, the wide band data is not truly “everything,” as the signal from each electrode is high-pass and low-pass filtered, but generally speaking the wide band data is the most complete data type available to OmniPlex users.
Using a Digital Headstage Processor (DHP) OmniPlex system, I created several pl2 files to illustrate my points. Each of these files was created using a 32-channel headstage, Plexon’s Headstage Tester Unit, and wav file sample data. I configured the OmniPlex’s PlexControl software to record exactly ten minutes of data. In total, I recorded three different files. Each file contains exactly ten minutes of data, but different data sources were saved for each example, resulting in different file sizes.
Here is a screenshot from Windows Explorer showing each file. Below I discuss which data sources were saved to create the dramatically different file sizes.
Spike Data, Local Field Potential Data, and Continuous Data from 3 Auxiliary Channels: File Size 128MB
The OmniPlex system was configured for recording from one 32-channel digital headstage and had an auxiliary analog input card installed in the chassis. The file named 32 Spike 32 LFP 3 Aux.pl2 contains: 32 channels of spike data, 32 channels of local field potential data, and continuous data from three auxiliary channels. Note the size of the file: 128MB.
Here is a screenshot of the Properties Spreadsheet in PlexControl showing the options I selected to save these different sources to the pl2 file.
This file will contain:
- the spike signal (i.e., the waveforms extracted from online thresholding for each spike channel).
- the continuous local field potential derived from the wide band signal and downsampled to whatever value is set by the user (1kHz by default).
- three continuously acquired “auxiliary” channels, which are independent from the headstage.
One benefit of saving these sources is that the file size remains small and manageable. The user can spike sort the waveforms offline in Offline Sorter and analyze the field potential in NeuroExplorer or some other visualization package. The main downside is that changing the extracted waveforms or changing the threshold position is impossible. That would only have been possible if the wide band (WB) or continuous spike (SPKC) data had been saved. But there is a tradeoff: saving either of those sources will increase the file size.
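To see why the threshold cannot be moved after the fact, here is a minimal sketch of threshold-crossing waveform extraction. It is not the algorithm used by OmniPlex or Offline Sorter; the function name `extract_waveforms`, the snippet lengths `pre` and `post`, and the toy signal are all assumptions made for illustration.

```python
def extract_waveforms(samples, threshold, pre=4, post=8):
    """Return (crossing_index, snippet) pairs for downward threshold crossings.

    Hypothetical helper for illustration only. A crossing is a sample at or
    below `threshold` whose predecessor was above it. `pre` and `post` are
    the number of samples kept before and after the crossing.
    """
    snippets = []
    i = max(pre, 1)
    while i < len(samples) - post:
        if samples[i - 1] > threshold >= samples[i]:
            snippets.append((i, samples[i - pre:i + post]))
            i += post  # skip the rest of this snippet before searching again
        else:
            i += 1
    return snippets


# Toy continuous signal: one large spike and one smaller spike.
continuous = [0] * 20 + [-5, -10, -5] + [0] * 20 + [-3, -6, -3] + [0] * 20

# "Online" extraction with a strict threshold keeps only the large spike.
online = extract_waveforms(continuous, threshold=-8)
print(len(online))   # 1

# With the continuous signal still available, a looser threshold can be tried.
offline = extract_waveforms(continuous, threshold=-4)
print(len(offline))  # 2
```

If only the `online` snippets had been saved, the samples around the smaller spike would no longer exist anywhere in the file, so no offline threshold change could recover it.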
Spike Data, Wide Band Data, and Continuous Data from 3 Auxiliary Channels: File Size 1.6GB
This brings us to the second file: 32 Spike 32 Wide Band 3 Aux.pl2. This file contains 32 channels of spikes and three auxiliary channels, same as the first file. However, instead of 32 channels of field potential, this second file contains 32 channels of wide band data. Saving 32 channels of wide band data instead of 32 channels of field potential data increased the file size from approximately 128MB to approximately 1.6GB.
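A rough back-of-the-envelope calculation shows where that increase comes from. The sketch below assumes 16-bit (2-byte) samples, which this post does not state, and it ignores file headers, spike waveforms, and auxiliary channels; `continuous_bytes` is a hypothetical helper, not part of any Plexon software.

```python
BYTES_PER_SAMPLE = 2  # assumption: 16-bit samples (not stated in this post)


def continuous_bytes(channels, rate_hz, seconds, bytes_per_sample=BYTES_PER_SAMPLE):
    """Raw size of continuously sampled data, ignoring all file overhead."""
    return channels * rate_hz * seconds * bytes_per_sample


SECONDS = 10 * 60  # the ten-minute recordings used in this post

wide_band = continuous_bytes(32, 40_000, SECONDS)  # 40kHz wide band
lfp = continuous_bytes(32, 1_000, SECONDS)         # 1kHz field potential

print(f"wide band: {wide_band / 1e9:.2f} GB")  # wide band: 1.54 GB
print(f"LFP: {lfp / 1e6:.1f} MB")              # LFP: 38.4 MB
print(f"ratio: {wide_band // lfp}x")           # ratio: 40x
```

Under these assumptions, the wide band channels alone account for roughly 1.5 GB, which is in line with the observed 1.6GB file once spikes and auxiliary channels are added. The 40x ratio is simply the ratio of the two sampling rates.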
Here is a screenshot of the Properties Spreadsheet in PlexControl showing the options I selected to save these different sources to the pl2 file.
This file will contain:
- The spike signal (same as above).
- Three auxiliary channels (same as above).
- Wide band data from all 32 channels, which is acquired at 40kHz.
The benefit of saving the wide band is that it gives the user the opportunity to modify the waveform extraction done online.
- The waveforms extracted online and sorted are still saved (as the SPK channels), but the wide band (WB) channels can be refiltered and re-thresholded offline in Offline Sorter.
- Likewise, the local field potential signal can be extracted from the wide band signal through filtering and downsampling. Saving the wide band data is as close as a user can get to saving “everything.”
For many recording setups, the recommended configuration is to save the spike (SPK) channels and the wide band (WB) channels.
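The idea of recovering a field potential from wide band data can be sketched in a few lines. The example below uses plain block averaging as a crude low-pass filter and downsampler; it is a stand-in technique, not the filter that OmniPlex or Offline Sorter applies, and the function name `block_average_downsample` and the synthetic test signal are assumptions.

```python
import math


def block_average_downsample(samples, factor=40):
    """Average each run of `factor` samples into one output sample.

    Averaging acts as a crude low-pass filter. Any trailing partial block
    is dropped. With factor=40, a 40kHz signal becomes a 1kHz signal.
    """
    n_blocks = len(samples) // factor
    return [
        sum(samples[b * factor:(b + 1) * factor]) / factor
        for b in range(n_blocks)
    ]


RATE_HZ = 40_000

# One second of synthetic "wide band": a slow 10 Hz wave plus fast 5 kHz activity.
slow = [math.sin(2 * math.pi * 10 * n / RATE_HZ) for n in range(RATE_HZ)]
fast = [0.5 * math.sin(2 * math.pi * 5_000 * n / RATE_HZ) for n in range(RATE_HZ)]
wide_band = [s + f for s, f in zip(slow, fast)]

lfp = block_average_downsample(wide_band, factor=40)
print(len(lfp))  # 1000 samples, i.e. 1kHz for one second
```

The output keeps the slow component and averages the fast one away. A real workflow would use a properly designed low-pass filter before downsampling, but the direction of the operation is the point: the field potential can be computed from the wide band, not the other way around.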
Saving “Everything”: Spike Data, Wide Band Data, Continuous Spikes, Local Field Potential Data and Continuous Data from All Auxiliary Channels: File Size 3.1GB
The third file is an example of a common misconception. Users want to save “everything,” and believe enabling every recording option is best. The problem with this misconception is the resulting file has several sources of redundant data. The third file, 32 Spike 32 Wide Band 32 Cont Spike 32 LFP 32 Aux.pl2, contains 32 channels of spikes, 32 channels of wide band, 32 channels of continuous spikes (SPKC), 32 channels of field potential, and 32 auxiliary channels (all channels available). The size of this file is more than 3GB.
Here is a screenshot of the Properties Spreadsheet in PlexControl showing the options I selected to save these different sources to the pl2 file.
This file will contain:
- The spike signal (same as above).
- The wide band signal (same as above).
- Continuous spike data from all 32 channels, which is acquired at 40kHz.
- Field potential data from all 32 channels, which is acquired at 40kHz and downsampled to 1kHz by default.
- Data from all 32 channels of the auxiliary card.
There are two main reasons why this file is larger than the other two previously discussed: 1) File 3 includes the wide band data and the continuous spike data, which are both sources collected at the maximum 40kHz sampling rate, and 2) File 3 includes all available auxiliary channels, even though only three are being used.
Remarkably, File 3 is nearly twice as large as File 2 yet contains no more valuable data than File 2.
In conclusion, to avoid the “recording everything” trap, users should refrain from recording all sources. In the examples discussed here, each file was only ten minutes long. Many recording situations require much longer recording sessions, which would mean much larger data files. Being careful to save only the data sources and channels needed will ensure manageable data files, which should mean more efficient offline data management and processing.
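To get a feel for how the three example files would grow with session length, the sketch below scales the ten-minute file sizes reported above. It assumes file size grows linearly with recording time, which should hold for continuous sources but is only an approximation for spike data, since that depends on firing rates. The helper `projected_size_gb` and the two-hour session length are hypothetical.

```python
# File sizes reported in this post for ten-minute recordings, in GB.
TEN_MINUTE_SIZES_GB = {
    "Spike + LFP + 3 Aux": 0.128,
    "Spike + Wide Band + 3 Aux": 1.6,
    "Everything": 3.1,
}


def projected_size_gb(size_gb_per_10_min, session_minutes):
    """Scale a ten-minute file size to a longer session (assumes linear growth)."""
    return size_gb_per_10_min * session_minutes / 10


SESSION_MINUTES = 120  # hypothetical two-hour session

for label, size_gb in TEN_MINUTE_SIZES_GB.items():
    projected = projected_size_gb(size_gb, SESSION_MINUTES)
    print(f"{label}: about {projected:.1f} GB")
# Spike + LFP + 3 Aux: about 1.5 GB
# Spike + Wide Band + 3 Aux: about 19.2 GB
# Everything: about 37.2 GB
```

Under this assumption, the gap between the recommended configuration and “everything” grows from about 1.5 GB at ten minutes to about 18 GB over two hours.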
Written by Andrew Klein