Reading and writing files with Nipype

The first thing you will likely want to do when building a pipeline is select the files you want to work with. Nipype has several interfaces that you could use for this, such as SelectFiles, DataGrabber, and S3DataGrabber. The first two get data directly from disk based on a template; the third collects data from an online (Amazon) repository, a so-called S3 bucket.

We highly recommend using the SelectFiles interface to select files for your pipeline, because it is easier (and more intuitive) to use than the DataGrabber interface. In this section, we’ll showcase how to use SelectFiles nodes in your Porcupine pipeline to select files from disk and, at the end of your pipeline, how to use Nipype’s DataSink interface to write results to disk in an organized way.

Scenario sketch

Before we explain how to use SelectFiles and DataSink, let’s sketch out a scenario. Suppose a researcher has a (BIDS-formatted) dataset, consisting of structural MRI scans (a single T1-weighted scan) and functional MRI scans (two runs of fMRI acquisition during an nback task) of 10 subjects. This data is stored in the home directory of the researcher’s laptop, in the folder ~/data, which is organized as follows:

~/data
├── sub-01
│   ├── anat
│   │   └── sub-01_T1w.nii.gz
│   └── func
│       ├── sub-01_task-nback_run-1_bold.nii.gz
│       └── sub-01_task-nback_run-2_bold.nii.gz
├── sub-02
│   ├── anat
│   │   └── sub-02_T1w.nii.gz
│   └── func
│       ├── sub-02_task-nback_run-1_bold.nii.gz
│       └── sub-02_task-nback_run-2_bold.nii.gz

...

└── sub-10
    ├── anat
    │   └── sub-10_T1w.nii.gz
    └── func
        ├── sub-10_task-nback_run-1_bold.nii.gz
        └── sub-10_task-nback_run-2_bold.nii.gz

Using Porcupine, this researcher wants to implement a simple pipeline that calculates a registration for each functional file from functional space to T1-space. Accordingly, the workflow needs to do the following (for each subject):

Skullstrip the T1-scan
Extract the first volume of each functional scan
Register each volume to the skullstripped T1-scan

We’ll use this hypothetical pipeline to walk you through file I/O in Porcupine using SelectFiles (for selecting files from disk) and DataSink (for writing files to disk) in the rest of this section of the documentation.

Using the `SelectFiles` interface in Porcupine

In this section, we’ll walk you through increasingly complex uses of the SelectFiles interface to select files for Porcupine pipelines.

A simple example: selecting a single file

Now, let’s start simple: suppose we want to select just one file - the T1-scan from subject 1 (sub-01) for the skullstripping part of our pipeline (we’ll extend the pipeline with the registration-part in the following examples). We’ll create a SelectFiles node in Porcupine, but you’ll notice that it doesn’t list any output-ports in the workflow and node editor! This is because SelectFiles expects you to create the output-ports yourself (this refers to Nipype’s “dynamic ports” construct; you can read more on this in the corresponding section of the documentation).

So, what you have to do is (1) create an output-port manually and (2) link some file(s) to that output-port. To do this, you need to do the following in Porcupine (see also the gif below):

Click the “Add port” button at the bottom of the node in the node editor
Enter the name of the new output-port (let’s call it “anat”)
Select “Both” in the new popup

After doing this, you should now see a “hybrid” input/output-port in the workflow editor and a new field named “anat” in the node editor! Now you need to tell the SelectFiles node where it can find the T1-file on disk. To do so, we simply fill in the path in the “anat” field in the node editor with:

'data/sub-01/anat/sub-01_T1w.nii.gz'

How to add a port to the SelectFiles node

(Note: don’t forget to set the base_directory input-port of SelectFiles if you don’t specify the absolute path in the value of your “anat” output-port!)

Then, create an FSL BET node for the skullstripping operation and connect the anat output-port of SelectFiles with the in_file input-port from the BET node, and done! We have built a fully functioning skullstripping “pipeline”!
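For readers who want to see what this graph corresponds to in a plain Nipype script, the following is a minimal sketch, not the code Porcupine itself generates. The function name, node names, workflow name, and the choice of the home directory as base_directory are illustrative assumptions. The Nipype imports sit inside the function so that the template can be inspected even on a machine without Nipype.

```python
import os

# Template for the "anat" output-port, relative to base_directory.
templates = {'anat': 'data/sub-01/anat/sub-01_T1w.nii.gz'}


def build_skullstrip_workflow(base_directory=os.path.expanduser('~')):
    """Build (but do not run) a SelectFiles -> BET workflow.

    base_directory defaults to the home directory, which is an assumption
    based on the ~/data example in the text.
    """
    # Imported lazily so the template above is usable without Nipype.
    from nipype import Node, Workflow
    from nipype.interfaces.fsl import BET
    from nipype.interfaces.io import SelectFiles

    # The keys of `templates` become the output-ports of SelectFiles.
    select = Node(SelectFiles(templates, base_directory=base_directory),
                  name='selectfiles')
    skullstrip = Node(BET(), name='bet')

    # Node and workflow names are illustrative.
    workflow = Workflow(name='skullstrip')
    workflow.connect(select, 'anat', skullstrip, 'in_file')
    return workflow
```

Actually running the returned workflow (with its run() method) would require FSL to be installed and the T1 file to exist at the templated location.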

Selecting multiple files and iteration

For our registration-pipeline, however, we also need to select the two functional (BOLD) files, from which we need to extract the first volume and subsequently calculate the registration from this volume to the skullstripped T1-file. To do so, we simply add another input/output-port - let’s name this one “func”. This time, we fill in the following in its field:

'data/sub-01/func/sub-01_task-nback_run*.nii.gz'

Now, the “func” output-port of SelectFiles will, at runtime, contain not one file but a list of two files, namely: sub-01_task-nback_run-1_bold.nii.gz and sub-01_task-nback_run-2_bold.nii.gz! We can now connect the “func” output-port to the in_file input-port of an ExtractRoi node (from fsl.utils), which will extract the first volume of each func file.

However, the in_file input-port expects not a list, but a single string referring to the path of a file! To enable ExtractRoi to iterate over this list of inputs (which turns ExtractRoi into a “MapNode”; see the Nipype documentation on MapNode for more details), we only need to click the icon with the circular arrow (see gif below) next to the in_file input-port!

Selecting multiple files with SelectFiles

Importantly, we need to enable iteration (by clicking the circular arrow icon) for every node ‘downstream’ of this “func” output-port! So, suppose we add an FSL FLIRT node (to calculate the registration from the functional volume to the skullstripped T1), and connect the roi_file output-port from ExtractRoi to the in_file input-port from FLIRT. We then also need to make the in_file input-port iterable! (Otherwise, it’ll crash when receiving a list of two outputs from the roi_file output-port from the preceding node!)
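To make the list-versus-single-file issue concrete, here is a small standard-library sketch that emulates the behaviour described above. It is not Nipype's implementation: the select helper is hypothetical, the files are empty placeholders created in a temporary folder, and the "_roi" suffix is taken from the output names shown later in this section.

```python
import glob
import os
import tempfile


def select(template, base_directory):
    """Hypothetical helper: expand a wildcard template relative to base_directory."""
    matches = sorted(glob.glob(os.path.join(base_directory, template)))
    if not matches:
        raise IOError('No files matched template: ' + template)
    return matches


with tempfile.TemporaryDirectory() as base:
    # Create empty placeholder files that mimic sub-01's func folder.
    func_dir = os.path.join(base, 'data', 'sub-01', 'func')
    os.makedirs(func_dir)
    for run in (1, 2):
        name = f'sub-01_task-nback_run-{run}_bold.nii.gz'
        open(os.path.join(func_dir, name), 'w').close()

    # One template with a wildcard yields a list of two paths.
    func_files = select('data/sub-01/func/sub-01_task-nback_run*.nii.gz', base)
    func_names = [os.path.basename(path) for path in func_files]

# Enabling iteration on in_file amounts to applying the node once per list
# element; here the "node" is just a rename to the expected ROI filename.
roi_names = [name.replace('.nii.gz', '_roi.nii.gz') for name in func_names]
```

Because every downstream node now receives a list as well, each of them needs the same per-element treatment, which is why iteration has to be enabled on FLIRT's in_file too.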

Alright! Right now, we have a fully functioning registration-pipeline for a single subject that iterates over two functional files and registers them separately to a single T1-weighted file. Oftentimes though, we want to further iterate our pipelines across subjects. For that, we’ll use a little trick in the SelectFiles interface, as is explained next.

Iterating across subjects

To iterate across subjects, we only need to apply a couple of modifications to the SelectFiles node. First, we change the values of our “anat” and “func” ports to the following, respectively:

'data/sub-{sub_id}/anat/sub-{sub_id}_T1w.nii.gz'  # anat port
'data/sub-{sub_id}/func/sub-{sub_id}_task-nback_run*.nii.gz'  # func port

Second, we manually add a new input-port (not a ‘hybrid’ input/output-port!) with the name “sub_id”. Last, for the value of this sub_id input-port, we assign a list with subject-identifiers (e.g. ['01', '02', '03']) that SelectFiles will insert into the template of the “func” and “anat” ports (see the gif below for a summary).

Iterating over subjects with SelectFiles

Importantly, SelectFiles will try to insert the entire list (['01', '02', '03']) into the {sub_id} placeholders at once, and this will crash the SelectFiles node! Just like we discussed for the ExtractRoi node, we need to make sure SelectFiles will iterate over the values of the sub_id input-port by clicking the circular arrow icon next to the sub_id port in the node editor. By doing so, we ensure SelectFiles will, one by one, insert the values of sub_id into the template in the func and anat ports. (This process refers to making sub_id an iterable, in Nipype lingo. For more info on this, check out the corresponding section in the documentation.)
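The sketch below shows, under stated assumptions, how the subject iteration could look in a plain Nipype script: the templates and subject list come from the text, while the function name, node names, workflow name, and the connection of BET's out_file to FLIRT's reference input are illustrative choices rather than the code Porcupine generates. The Nipype class for the ExtractRoi node is fsl.ExtractROI. As before, the Nipype imports are kept inside the function so the template filling can be tried without Nipype installed.

```python
# Templates with a {sub_id} placeholder, as entered in the "anat" and "func" ports.
templates = {
    'anat': 'data/sub-{sub_id}/anat/sub-{sub_id}_T1w.nii.gz',
    'func': 'data/sub-{sub_id}/func/sub-{sub_id}_task-nback_run*.nii.gz',
}
subject_ids = ['01', '02', '03']

# Iterating means the placeholder is filled with one value at a time.
anat_paths = [templates['anat'].format(sub_id=sub_id) for sub_id in subject_ids]


def build_registration_workflow(base_directory):
    """Build (but do not run) the per-subject registration workflow."""
    # Imported lazily so the templates above are usable without Nipype.
    from nipype import MapNode, Node, Workflow
    from nipype.interfaces import fsl
    from nipype.interfaces.io import SelectFiles

    select = Node(SelectFiles(templates, base_directory=base_directory),
                  name='selectfiles')
    # Counterpart of clicking the circular arrow next to the sub_id port.
    select.iterables = ('sub_id', subject_ids)

    skullstrip = Node(fsl.BET(), name='bet')
    # MapNodes run once per element of the list arriving on in_file.
    first_volume = MapNode(fsl.ExtractROI(t_min=0, t_size=1),
                           iterfield=['in_file'], name='extract_roi')
    register = MapNode(fsl.FLIRT(), iterfield=['in_file'], name='flirt')

    workflow = Workflow(name='registration')  # illustrative name
    workflow.connect(select, 'anat', skullstrip, 'in_file')
    workflow.connect(select, 'func', first_volume, 'in_file')
    workflow.connect(first_volume, 'roi_file', register, 'in_file')
    # Assumption: the skullstripped T1 serves as FLIRT's reference image.
    workflow.connect(skullstrip, 'out_file', register, 'reference')
    return workflow
```

Note the two different mechanisms in this sketch: iterables on SelectFiles repeat the whole downstream graph once per subject, whereas the MapNodes only loop over the list of functional files within a single subject.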

By now, we have a fully functioning registration-pipeline that, for each subject, calculates the registration from functional to T1-space for two functional files! This way of selecting files probably covers most scenarios. Keep in mind, though, that you can add many more input/output-ports (in addition to func and anat; e.g., fieldmap!) and placeholders (in addition to {sub_id}; e.g., {session_id}!).

The only thing left to explain is how to write the results of your pipeline - in our case the registration matrices for each file from each subject - to disk again.

Writing outputs to disk using `DataSink`

Well, actually, all results from your pipeline are (almost always) already on your hard-drive, but they’re stored in an inconvenient place (somewhere buried deep in your ‘temporary files’ folder). You probably want to copy these files to a more convenient location. You can do this with Nipype’s DataSink interface.

If you initialize a DataSink node in Porcupine, however, you’ll see that it lacks any input-ports! To sink our results (i.e. the registration matrices) to a prespecified folder, let’s say ~/results/registration, we need to do two things:

Set the base_directory input-port to {your_home_dir}/results
Manually add an input-port named “registration”

We can then connect the out_matrix_file output-port from FLIRT to the new registration input-port from DataSink (see also gif below)!

Writing results to disk using DataSink

Tip: If you set the parameterization input-port of DataSink to False, it will not create any sub-directories for the fields that are iterated across (i.e., the two functional files and the different subjects).

Of course, you can add as many input-ports to DataSink as you deem useful! For example, you can create another input-port named skullstrip and connect it to the out_file output-port from BET, such that the skullstripped file will get ‘sinked’ to your result-directory! If you then run this pipeline with these DataSink configurations (assuming parameterization is set to False), it will yield the following outputs:

~/results
├── registration
│   ├── sub-01_task-nback_run-1_bold_roi_flirt.mat
│   ├── sub-01_task-nback_run-2_bold_roi_flirt.mat
│   ├── sub-02_task-nback_run-1_bold_roi_flirt.mat
│   ├── sub-02_task-nback_run-2_bold_roi_flirt.mat
│  ...  
│   ├── sub-10_task-nback_run-1_bold_roi_flirt.mat
│   └── sub-10_task-nback_run-2_bold_roi_flirt.mat
└── skullstrip
    ├── sub-01_T1w_brain.nii.gz
    ├── sub-02_T1w_brain.nii.gz
   ...
    └── sub-10_T1w_brain.nii.gz
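As a rough script-level counterpart of the DataSink setup described above, the following sketch attaches a DataSink node to an existing workflow. The add_datasink function, the sink_ports mapping, and the node names 'flirt' and 'bet' are hypothetical; only the port names (out_matrix_file, out_file, registration, skullstrip) and the ~/results folder come from the text.

```python
import os

# Results folder from the example in the text.
results_dir = os.path.join(os.path.expanduser('~'), 'results')

# DataSink input-port name -> (upstream node name, upstream output-port).
# Each DataSink port name becomes a sub-folder of results_dir.
sink_ports = {
    'registration': ('flirt', 'out_matrix_file'),
    'skullstrip': ('bet', 'out_file'),
}


def add_datasink(workflow, nodes):
    """Attach a DataSink to `workflow`.

    `nodes` is a dict mapping the (hypothetical) node names used in
    sink_ports to the corresponding Nipype node objects.
    """
    # Imported lazily so the mapping above is usable without Nipype.
    from nipype import Node
    from nipype.interfaces.io import DataSink

    sink = Node(DataSink(base_directory=results_dir, parameterization=False),
                name='datasink')
    for port, (node_name, output) in sink_ports.items():
        workflow.connect(nodes[node_name], output, sink, port)
    return sink
```

Keeping the port-to-output mapping in one dictionary makes it easy to add further results to sink later, in the same way you would add extra input-ports in the Porcupine node editor.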