Diffusion-Based Parcellation of Piriform Cortex with Snakemake Workflow
Snakemake workflow overview for diffusion-based parcellation of the piriform cortex:

1. rule hcp7T_Diffusion_bedpostx_gpu: Generate white-matter fibre orientations within each brain voxel for each subject's HCP 7T diffusion data via FSL's bedpostx_gpu.
2. rule hcp_mmk: Generate target segmentation 3D NIfTI volumes (bilateral) from the HCP 32k surf GIFTI files (180 regions).
3. rule diffparc_smk: Resample the piriform seed and hcp-mmk180 targets to the HCP 7T diffusion resolution, then perform probabilistic tractography from the piriform seed in each subject's space to the hcp-mmk180 targets via FSL's probtrackx2_gpu.
   - 3b. Bring the connectivity data for each subject's seed voxels into template space and perform spectral clustering on the concatenated feature vectors to parcellate into k regions (a minimal sketch of this step follows the list).
   - 3c. Create node and edge tables to use as input to the Gephi software for each of the clustering solutions.
4. rule tractmap: Perform probabilistic tractography on each individual cluster to generate tractography maps for each cluster following spectral clustering (useful for visualizing each cluster's individual connectivity).
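The spectral-clustering step (3b) operates on each subject's seed-voxel-by-target connectivity matrix, concatenated across subjects along the target axis. The snippet below is a minimal illustrative sketch of that idea with made-up array sizes, not the pipeline itself; the actual implementation is the spectral clustering script listed under Code Snippets below.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

# Made-up sizes for illustration: 5 subjects, 200 piriform seed voxels, 179 targets
n_subjects, n_voxels, n_targets, k = 5, 200, 179, 3
conn_group = np.random.rand(n_subjects, n_voxels, n_targets)

# Concatenate subjects along the feature axis -> (voxels, targets * subjects)
conn_concat = np.moveaxis(conn_group, 0, 2).reshape(n_voxels, n_targets * n_subjects)

# Cluster seed voxels by the similarity of their connectivity fingerprints
labels = SpectralClustering(n_clusters=k, affinity='cosine',
                            assign_labels='discretize',
                            random_state=0).fit(conn_concat).labels_
print(labels.shape)  # one cluster label per seed voxel
```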
Inputs in config/config.yml (an illustrative config sketch follows this list):

- participants.tsv: Target subject IDs that have HCP 7T diffusion data and were used in the creation of a subject-specific template from ants_build_template_smk (see input 5).
- template_prob_seg: Probabilistic segmentation, as a 3D NIfTI, of the individual piriform hemisphere to process.
- prob_seg_threshold: Value at which to threshold the probabilistic segmentation (default is 0.5; e.g. 0.2 would include 80% of the probseg).
- template: Name of the template to use for file naming within the pipeline.
- ANTS transformations between the template T1w space and each subject's T1w, i.e. from ants_build_template_smk:
  - ants_affine_mat: Forward affine from subject to template.
  - ants_invwarp_nii: Warp from template to subject.
  - ants_warp_nii: Warp from subject to template.
  - ants_ref_nii: Study-specific T1w template created from ants_build_template_smk.
- lhrh_targets_independent_condition: When set to "True" (358 regions), analyze the connectivity of the piriform segmentation within each hemisphere independently (each lh and rh region is treated as a separate region); when set to "False" (179 regions), analyze the connectivity of the piriform segmentation irrespective of hemisphere (each lh and rh region is combined into a single region across hemispheres).
- targets_txt_lh-rh_separate_358regions: List of ROIs in the atlas segmentations to use as targets for piriform connectivity (358; lh and rh separate).
- targets_txt_lh-rh_combined_179regions: List of ROIs in the atlas segmentations to use as targets for piriform connectivity (179; lh and rh combined).
- max_k: Maximum number of clusters to create during spectral clustering (all cluster solutions <= max_k will be created).
- probtrack: Tractography parameters; seed_resolution defines the resolution to which all other inputs are resampled.
- probtrack_tractmap: Tractography parameters used to generate individual tractmaps for each cluster following spectral clustering (useful for visualization).
- in_freesurfer: Path to the subject's FreeSurfer directory from the HCP_1200 release.
- dwi_preproc_dir: Path to the base directory containing the HCP 7T diffusion data for all subjects:
  - in_dwi_nii
  - in_dwi_bval
  - in_dwi_bvec
  - in_dwi_grad_dev
  - in_dwi_mask_nii
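For orientation, a hypothetical config/config.yml fragment using the keys described above might look like the sketch below. All paths and values are illustrative placeholders, and key spellings should be checked against the config.yml shipped with the repository:

```yaml
# Illustrative sketch only -- keys follow the list above, values are placeholders
participants_tsv: config/participants.tsv
template_prob_seg: /path/to/template_desc-piriform_probseg.nii.gz
prob_seg_threshold: 0.5
template: CustomTemplate

# ANTS transforms from ants_build_template_smk (per-subject wildcards assumed)
ants_affine_mat: /path/to/sub-{subject}_affine.mat
ants_warp_nii: /path/to/sub-{subject}_warp.nii.gz
ants_invwarp_nii: /path/to/sub-{subject}_invwarp.nii.gz
ants_ref_nii: /path/to/template_T1w.nii.gz

lhrh_targets_independent_condition: "True"
targets_txt_lh-rh_separate_358regions: resources/targets_358.txt
targets_txt_lh-rh_combined_179regions: resources/targets_179.txt
max_k: 4

probtrack:
  seed_resolution: 1.25   # placeholder value

in_freesurfer: /path/to/HCP_1200/{subject}
dwi_preproc_dir: /path/to/HCP_7T_diffusion
```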
Required Singularity Containers (an illustrative config fragment follows this list):

- singularity_neuroglia: FSL, ANTS, NiftyReg, GNU parallel
- singularity_freesurfer: FreeSurfer
- singularity_connectome_workbench: Connectome Workbench
- singularity_cuda: FSL6 with CUDA (for running the GPU versions of bedpostx and probtrackx2)
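These container keys would likewise be set in config/config.yml; the fragment below is a placeholder sketch (local .sif paths are assumptions, not repository defaults):

```yaml
# Illustrative only -- point each key at a local Singularity image or a docker:// URI
singularity_neuroglia: /path/to/neuroglia-core.sif
singularity_freesurfer: /path/to/freesurfer.sif
singularity_connectome_workbench: /path/to/connectome-workbench.sif
singularity_cuda: /path/to/fsl6-cuda.sif
```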
Usage:
If you use this workflow in a paper, please give credit to the authors by citing the URL of this (original) repository and, if available, its DOI (see above).
Step 1: Obtain a copy of this workflow
- Create a new GitHub repository using this workflow as a template.
- Clone the newly created repository to your local system, into the place where you want to perform the data analysis.
Step 2: Configure workflow
Configure the workflow according to your needs by editing the files in the config/ folder: adjust config.yml to configure the workflow execution, and participants.tsv to specify your subjects.
Step 3: Install and Activate Snakemake
Install Snakemake using conda:
conda create -c bioconda -c conda-forge -n snakemake snakemake
Activate the conda environment:
conda activate snakemake
or add that line to your bash startup file (see the sketch below).
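A minimal sketch, assuming conda has already been initialized for bash; appending to ~/.bashrc is just one convenient option:

```bash
# Optional convenience: activate the snakemake environment in every new shell
echo 'conda activate snakemake' >> ~/.bashrc
```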
For installation details, see the instructions in the Snakemake documentation.
Step 4: Execute workflow
- Option A) Execute the workflow locally via:

  snakemake --use-conda --use-singularity --cores $N

  using $N cores, or run it in a cluster environment via

  snakemake --use-conda --use-singularity --cluster qsub --jobs 100

  or

  snakemake --use-conda --use-singularity --drmaa --jobs 100

- Option B) If you are using Compute Canada, you can use the cc-slurm profile, which submits jobs and takes care of requesting the correct resources per job (including GPUs). Once it is set up with cookiecutter, run:

  snakemake --profile cc-slurm

- Option C) Or, with neuroglia-helpers you can get a 1-GPU, 8-core, 32gb node and run the workflow locally there. First, get an interactive node with a GPU (default: 8 cores, 32gb, 3-hour time limit):

  regularInteractive -g

  Then run:

  snakemake --use-conda --use-singularity --cores 8 --resources gpu=1 mem=32000
Step 4b: Executing workflow on Compute Canada via the cc-slurm profile in iterations
If you have a large number of subjects, we recommend you run the workflow in the following stages:
- bedpostx_gpu, via:

  snakemake --profile cc-slurm --until perform_bedpostx_gpu

- generation of the target segmentations and probtrackx2_gpu, via:

  snakemake --profile cc-slurm --until run_probtrack

- piriform diffusion parcellation with spectral clustering in template space on the concatenated subjects, and creation of the Gephi input nodes/edges tables, via:

  snakemake --profile cc-slurm --until create_gephi_input_nodes_and_edge_tables

- probtrackx2_gpu run again on each cluster, to generate visualizations of each cluster's connectivity, via:

  snakemake --profile cc-slurm --until vote_tractmap_template

After running each iteration, run

  snakemake --profile cc-slurm --until <iteration name previously run> -np

to ensure that all jobs completed and all outputs were successfully created before moving on to the next iteration.
Code Snippets
```
shell: 'fslmaths {input.lh} -max {input.rh} {output.lh_rh} &> {log}'

shell: 'fslmaths {input.lh} -max {input.rh} {output.lh_rh} &> {log}'

shell: 'cp -v {input} {output} &> {log}'

shell: 'fslmaths {input} -thr {params.threshold} -bin {output} &> {log}'

shell: 'fslmaths {input.dwi} -bin {output.mask} &> {log}'

shell: 'fslmaths {input} -thr {params.threshold} -bin {output} &> {log}'

shell: 'fslmaths {input.seed_res_bin} -mul 1000 -max {input.targets_res} -uthr 999 {output.targets_pir_overlap_removed} &> {log}'

shell: 'mkdir -p {output} && parallel --jobs {threads} fslmaths {input.targets_res} -thr {{1}} -uthr {{1}} -bin {{2}} &> {log} ::: {params.target_nums} :::+ {params.target_seg}'

run:
    f = open(output.target_txt,'w')
    for s in params.target_seg:
        f.write(f'{s}\n')
    f.close()

shell: 'mkdir -p {output.probtrack_dir} && singularity exec -e --nv {params.container} probtrackx2 --samples={params.bedpost_merged} --mask={input.mask} --seed={input.seed_res} --targetmasks={input.target_txt} --seedref={input.seed_res} --nsamples={params.nsamples} --dir={output.probtrack_dir} {params.probtrack_opts} -V 1 &> {log}'

shell: 'mkdir -p {output.probtrack_dir} && singularity exec -e --nv {params.container} probtrackx2_gpu --samples={params.bedpost_merged} --mask={input.mask} --seed={input.seed_res} --targetmasks={input.target_txt} --seedref={input.seed_res} --nsamples={params.nsamples} --dir={output.probtrack_dir} {params.probtrack_opts} -V 2 &> {log}'

shell: 'mkdir -p {output} && ITK_GLOBAL_DEFAULT_NUMBER_OF_THREADS=1 parallel --jobs {threads} antsApplyTransforms -d 3 --interpolation Linear -i {{1}} -o {{2}} -r {input.ref} -t {input.warp} -t {input.affine} &> {log} ::: {params.in_connmap_3d} :::+ {params.out_connmap_3d}'

script: '../scripts/save_connmap_template_npz.py'

script: '../scripts/gather_connmap_group.py'

script: '../scripts/spectral_clustering.py'

script: '../scripts/save_connmapClusters_template_npz.py'

script: '../scripts/gather_connmapClusters_group.py'

script: '../scripts/gather_connmapClusters_group.py'

script: '../scripts/create_connmapClusters_group_GephiNodes_and_Edges_Tables.py'

script: '../scripts/create_connmapClusters_group_GephiNodes_and_Edges_Tables.py'

shell:
    'cp {input.dwi7T_nii} {output.dwi7T_nii_copy} && '
    'cp {input.dwi7T_bval} {output.dwi7T_bval_copy} && '
    'cp {input.dwi7T_bvec} {output.dwi7T_bvec_copy} && '
    'cp {input.dwi7T_grad_dev} {output.dwi7T_grad_dev_copy} && '
    'cp {input.dwi7T_mask_nii} {output.dwi7T_mask_nii_copy}'

shell: 'mkdir -p {output.outdir} && singularity exec -e --nv {params.container} bedpostx_gpu {input.indir} -g &> {log}'

shell: 'FS_LICENSE={params.license} mris_convert {input} {output} &> {log}'

shell: 'FS_LICENSE={params.license} mri_convert {input} {output} &> {log}'

shell: 'FS_LICENSE={params.license} mri_info {input.t1} --tkr2scanner > {output.tkr2scanner} 2> {log}'

shell: 'wb_command -surface-apply-affine {input.surf} {input.tkr2scanner} {output.surf} &> {log}'

shell: 'wb_command -surface-average {output.midthickness} -surf {input.white} -surf {input.pial} &> {log}'

shell: 'wb_command -surface-resample {input.surf} {input.current_sphere} {input.new_sphere} {params.method} {output.surf} &> {log}'

shell:
    'wb_command -label-resample {input.label} {input.current_sphere} {input.new_sphere}'
    ' {params.method} {output.label}'
    ' -area-surfs {input.current_surf} {input.new_surf} &> {log}'

shell:
    'wb_command -label-to-volume-mapping {input.label} {input.surf} {input.vol_ref} {output.label_vol}'
    ' -ribbon-constrained {input.white_surf} {input.pial_surf}'
    ' -greedy &> {log}'

shell: 'c3d {input.label_vol} -replace 110 0 -o {output.label_vol_removepir110} &> {log}'

shell:
    'fslmaths {input.label_vol_removepir110} -thr 111 -sub 1 {output.tempA_thrBelow111_sub1} && '
    'fslmaths {input.label_vol_removepir110} -uthr 109 {output.tempB_thrAbove109} && '
    'fslmaths {output.tempA_thrBelow111_sub1} -max {output.tempB_thrAbove109} {output.relabelled_vol_removepir110} &> {log}'

shell: 'fslmaths {input.relabelled_rh_vol_removepir110} -add 179 -thr 180 {output.relabelled_rh_vol_removepir110_add179} &> {log}'
```
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import nibabel as nib
from functools import reduce
import os

#Set the name of the seed you want to create the Gephi input for
seeds = snakemake.params.seeds_py

#Set the k parcellation you want to create the Gephi input for
#this_k = 3
cluster_range = range(2,snakemake.params.max_k+1)

#Set the path to the directory containing the connmapGroup directories for each seed and each k
#(i.e. connmapClusters dir from diffparc located in /scratch/nchris5/HCP_template_piriform_segmentations/diffparc/results)
path = snakemake.params.connmapClusters_dir

#Read in the Glasser_2016_Table with the 'SectionsBOLD' column manually included, which indicates which of the 22 overarching regions each of the 180 regions belongs to,
#and the 'ColourRGB' column manually included, which indicates the RGB for that specific SectionsBOLD
GlasserTable = pd.read_csv(snakemake.params.GlasserTable_22regions_colours, header=0, usecols=range(1,8))
GlasserTable.index = range(1,snakemake.params.num_targets+1)
print('Original GlasserTable : \n', GlasserTable)

#Sort by the section values from 1-->22
GlasserTable_sorted_PrimarySection = GlasserTable.sort_values(by='SectionsBOLD', ascending=True)
#Change the actual index to 1-->num_targets
GlasserTable_sorted_PrimarySection.index = range(1,snakemake.params.num_targets+1)
#Insert a column that describes the sorted dataframe's new index from 1-->num_targets
GlasserTable_sorted_PrimarySection.insert(loc=1, column='Sorted_PrimarySection_Index', value=range(1,snakemake.params.num_targets+1))
#Create the list of target IDs with leading zeros so all IDs have 3 digits (i.e. 001, 010, 100), so Gephi reads them in the proper order
GlasserTable_sorted_PrimarySection_Index_ThreeDigits = [str(indID).zfill(3) for indID in GlasserTable_sorted_PrimarySection.Sorted_PrimarySection_Index]
print('GlasserTable_sorted_PrimarySection : \n', GlasserTable_sorted_PrimarySection)

print('\n ------ Creating Nodes df ------ \n ')
#Includes the Id, Label, color, and SectionsBOLD for each of the 179 or 358 Glasser targets (001-179 or 001-358)
Targets = pd.DataFrame({'Id': GlasserTable_sorted_PrimarySection_Index_ThreeDigits, 'Label': GlasserTable_sorted_PrimarySection.AreaName, 'color': GlasserTable_sorted_PrimarySection.ColourRGB, 'Modularity Class': GlasserTable_sorted_PrimarySection.SectionsBOLD}, index=range(1,snakemake.params.num_targets+1))
print('Targetsdf : \n', Targets)

for seed in seeds:
    for i,this_k in enumerate(cluster_range): #For 0,2 1,3 2,4
        #Set the RGB colours for each of the clusters within this_k to the same colour that is present in the clustering output in itksnap (i.e. diffparc spectral_clustering output)
        if this_k==2:
            this_k_ColourRGB = ['255,0,0','0,255,0'] #Cluster01=Red, Cluster02=Green
        elif this_k==3:
            this_k_ColourRGB = ['255,0,0','0,255,0','0,0,255'] #Cluster01=Red, Cluster02=Green, Cluster03=Blue
        elif this_k==4:
            this_k_ColourRGB = ['255,0,0','0,255,0','0,0,255', '255,255,0'] #Cluster01=Red, Cluster02=Green, Cluster03=Blue, Cluster04=Yellow
        #Create the Ids and labels for the individual clusters (e.g. Cluster01, Cluster02, Cluster03), assigning Id values above 179 or 358 (since there are 179 or 358 targets), e.g. 180=Cluster01, 181=Cluster02, 182=Cluster03
        IdlistClusters = list()
        labellist = list()
        for i,this_label in enumerate(range(1,this_k+1)):
            IdlistClusters.append(snakemake.params.num_targets+1+i)
            labellist.append('Cluster{this_label:02d}'.format(this_label=this_label))
        #Set the Modularity Class for each individual cluster to be distinct (i.e. Cluster01=23, Cluster02=24, Cluster03=25)
        ModularityClassClusters = list(range(23,this_k+23))
        #Create the dataframe for the clusters to add to the list of targets
        individualclusterstoadd = pd.DataFrame({'Id': IdlistClusters, 'Label': labellist, 'color': this_k_ColourRGB, 'Modularity Class': ModularityClassClusters}, index=range(snakemake.params.num_targets,snakemake.params.num_targets+this_k))
        print('\n individualclusterstoadd : \n', individualclusterstoadd)
        #Append the cluster labels dataframe to the bottom of the targets list dataframe to yield the final Nodesdf
        Nodesdf = Targets.append(individualclusterstoadd)
        print('\n Nodesdf : \n ', Nodesdf)

        print('\n ------ Creating Edges df ------ \n ')
        print('------ Creating Edges df SourceId and SourceName columns ------ \n ')
        #Multiply the targets df by the number of clusters (i.e. first num_targets rows represent cluster 1, next num_targets rows represent cluster 2, etc.)
        EdgesSourceId = np.zeros((snakemake.params.num_targets,this_k), dtype=int)
        EdgesSourceNames = list()
        for i,this_label in enumerate(range(1,this_k+1)):
            EdgesSourceId[:,i] = this_label+snakemake.params.num_targets
            tempSourceNames = list()
            for j in range(snakemake.params.num_targets):
                tempSourceNames.append('Cluster{this_label:02d}'.format(this_label=this_label))
            EdgesSourceNames.append(tempSourceNames)
        #Use functools reduce to flatten the nested lists into a single list, i.e. [179xCluster01], [179xCluster02], [179xCluster03] to [179xCluster01, 179xCluster02, 179xCluster03]
        EdgesSourceNames = reduce(lambda x,y: x+y, EdgesSourceNames)
        #Reshape the edges source column from shape (179,numClusters) to 179*numClusters with order 'F' (i.e. 179x'181' --> 179x'182' --> 179x'183')
        EdgesSourceId = np.reshape(EdgesSourceId, (snakemake.params.num_targets*this_label), order='F')
        TargetsMultiplied = pd.concat([Targets]*this_k, ignore_index=True)
        print('TargetsMultiplied Shape = ', TargetsMultiplied.shape)
        print('TargetsMultiplied = \n', TargetsMultiplied)

        print('\n ------ Creating Edges df Columns: Source, SLabel, Target, TLabel, ConnectivityScore, ConnectivityScoreNormalized, Rank_in_Cluster, Weight ------ \n ')
        #Create list of paths to individual cluster npz files for this_k
        connmapClustersDirList = list()
        thiskString = str(this_k)
        #Only select the directories that contain the specified seed and also end with k=this_k
        for connmapClustersDir in os.listdir(path):
            if ((connmapClustersDir.find(seed) != -1) and (connmapClustersDir.endswith(thiskString))):
                connmapClustersDirList.append(connmapClustersDir)
        newpath = path + '/' + connmapClustersDirList[0]
        individualClusters_thisk_npzList = list()
        for individualClusters_thisk_npz in os.listdir(newpath):
            if individualClusters_thisk_npz.endswith('.npz'):
                fullpath = newpath + '/' + individualClusters_thisk_npz
                individualClusters_thisk_npzList.append(fullpath)
        print(individualClusters_thisk_npzList)
        #Initiate an empty dataframe that will be appended to include the values from each cluster
        df_ALLCLUSTERS_InsertRank = pd.DataFrame(columns = ['targetnames', 'ConnectivityScore', 'ConnectivityScoreNormalized', 'SortedRank'])
        #Create a list of ranks from 1 (lowest connectivity) to num_targets (highest connectivity)
        for i,thisk_thisCluster_npz in enumerate(individualClusters_thisk_npzList):
            data_thisCluster = np.load(thisk_thisCluster_npz)
            conngroup_thisCluster = data_thisCluster['conn_group']
            connAvg_across_subs_thisCluster = conngroup_thisCluster.mean(axis=0) #Shape of input is (#sub,#voxels,#targets)
            targetAvg_across_voxels_thisCluster = connAvg_across_subs_thisCluster.mean(axis=0) #Shape of input is (#voxels,#targets)
            sum_targetAvg_across_voxels_thisCluster = np.sum(targetAvg_across_voxels_thisCluster) #Sum all the targets to get the sum of conn scores
            quotients = [number / sum_targetAvg_across_voxels_thisCluster for number in targetAvg_across_voxels_thisCluster]
            df_thisCluster = pd.DataFrame({'targetnames': GlasserTable.AreaName, 'ConnectivityScore': targetAvg_across_voxels_thisCluster, 'ConnectivityScoreNormalized': quotients}, index=range(1,snakemake.params.num_targets+1))
            #Insert the SectionsBOLD column from the GlasserTable
            df_thisCluster.insert(loc=1, column='SectionsBOLD', value=GlasserTable.SectionsBOLD)
            print('\n df_thisCluster_InsertedSectionsBold k-%s and cluster-%s : \n %s' % (this_k, i+1, df_thisCluster))
            df_thisCluster_sorted_PrimarySection = df_thisCluster.sort_values(by='SectionsBOLD', ascending=True)
            df_thisCluster_sorted_PrimarySection.index = range(1,snakemake.params.num_targets+1)
            #Insert a column that describes the sorted dataframe's new index from 1-->num_targets
            df_thisCluster_sorted_PrimarySection.insert(loc=1, column='Sorted_PrimarySection_Index', value=range(1,snakemake.params.num_targets+1))
            print('\n df_thisCluster_sorted_PrimarySection k-%s and cluster-%s : \n %s' % (this_k, i+1, df_thisCluster_sorted_PrimarySection))
            #Assign weight based on ConnectivityScore
            df_sorted_thisCluster = df_thisCluster_sorted_PrimarySection.sort_values(by='ConnectivityScoreNormalized', ascending=False)
            print('\n df_sorted_thisCluster k-%s and cluster-%s : \n %s' % (this_k, i+1, df_sorted_thisCluster))
            df_sorted_thisCluster_InsertRank = df_sorted_thisCluster
            df_sorted_thisCluster_InsertRank['SortedRank'] = pd.Series(reversed(range(1,snakemake.params.num_targets+1)),index=(df_sorted_thisCluster.index))
            df_thisCluster_sorted_PrimarySection_InsertRank = df_thisCluster_sorted_PrimarySection
            df_thisCluster_sorted_PrimarySection_InsertRank['SortedRank'] = pd.Series(reversed(range(1,snakemake.params.num_targets+1)),index=(df_sorted_thisCluster.index))
            print('\n df_sorted_thisCluster k-%s and cluster-%s with Rank : \n %s' % (this_k, i+1, df_sorted_thisCluster_InsertRank))
            print('\n df_thisCluster k-%s and cluster-%s with Rank : \n %s' % (this_k, i+1, df_thisCluster_sorted_PrimarySection_InsertRank))
            df_ALLCLUSTERS_InsertRank = df_ALLCLUSTERS_InsertRank.append(df_thisCluster_sorted_PrimarySection_InsertRank, ignore_index=True)
            print('CURRENT df_ALLCLUSTERS_InsertRank Shape = ', df_ALLCLUSTERS_InsertRank.shape)
        print('FINAL df_ALLCLUSTERS_InsertRank Shape = ', df_ALLCLUSTERS_InsertRank.shape)
        print('FINAL df_ALLCLUSTERS_InsertRank : \n', df_ALLCLUSTERS_InsertRank)
        #Add in the edge weight column
        #Multiply SortedRank*ConnectivityScoreNormalized to get the Weight of each edge, round to integer, then add 1 to ensure that the lowest edge weight value is > 0
        df_ALLCLUSTERS_InsertRank['Weight'] = df_ALLCLUSTERS_InsertRank['SortedRank'] * df_ALLCLUSTERS_InsertRank['ConnectivityScoreNormalized']
        df_ALLCLUSTERS_InsertRank['Weight'] = df_ALLCLUSTERS_InsertRank['Weight'].apply(lambda x: round(x, 0) + 1)
        Edgesdf = pd.DataFrame({'Source': EdgesSourceId, 'SLabel': EdgesSourceNames, 'Target': TargetsMultiplied.Id, 'TLabel': TargetsMultiplied.Label, 'ConnectivityScore': df_ALLCLUSTERS_InsertRank.ConnectivityScore, 'ConnectivityScoreNormalized': df_ALLCLUSTERS_InsertRank.ConnectivityScoreNormalized, 'Rank_in_Cluster': df_ALLCLUSTERS_InsertRank.SortedRank, 'Weight': df_ALLCLUSTERS_InsertRank.Weight}, index=range(0,this_k*snakemake.params.num_targets))
        print('\n Shape of this Edges is %s and Edgesdf = \n %s ' % (Edgesdf.shape, Edgesdf))

        print('\n ------ Saving Nodesdf and Edgesdf to csv ------ \n ')
        if not os.path.exists('results/gephi_input_NodesandEdges_tables_' + seed):
            os.mkdir('results/gephi_input_NodesandEdges_tables_' + seed)
        #Create savepaths
        savepathNodesdf = 'results/gephi_input_NodesandEdges_tables_' + seed + '/GephiNodes_' + seed + '_k-' + str(this_k) + '_Sorted_Overarching22Categories_WithRGBColours.csv'
        savepathEdgesdf = 'results/gephi_input_NodesandEdges_tables_' + seed + '/GephiEdges_' + seed + '_k-' + str(this_k) + '_Sorted_Overarching22Categories_NormalizedWeight.csv'
        print('savepathNodesdf = ', savepathNodesdf)
        print('savepathEdgesdf = ', savepathEdgesdf)
        #Write Nodesdf and Edgesdf to CSV files
        Nodesdf.to_csv(savepathNodesdf, index = False, header=True)
        Edgesdf.to_csv(savepathEdgesdf, index = False, header=True)
```
```python
import numpy as np
import pandas as pd
import os
import matplotlib.pyplot as plt
from scipy import stats

targetnames = pd.read_table('results/diffparc/sub-100610/target_images_from-hcp7Tsubj_rpiriform_overlap_removed.txt', names=['TargetNames'], header=None)
targetnamesList = targetnames.TargetNames.to_list()
targetnamesList = [ line.replace('results/diffparc/sub-100610/lh_rh_targets_from-hcp7Tsubj_rpiriform_space-native_resampled_dwi_resolution_pir-overlap-removed/','') for line in targetnamesList ] #Remove this string from each line
targetnamesList = [ line.replace('_ROI.nii.gz','') for line in targetnamesList ] #Remove this string from each line

cluster_range = range(2,snakemake.params.max_k+1)

for i,this_k in enumerate(cluster_range): #For 0,2 1,3 2,4
    label = 1
    thiskdirslist = list()
    thiskString = str(this_k)
    for j,thiskdir in enumerate(snakemake.input.connmap_Clusters_npz_dir):
        if thiskdir.endswith(thiskString):
            thiskdirslist.append(thiskdir)
    newClusterGroupDir = '{snakeout}_k-{thisk}'.format(snakeout=snakemake.params.connmap_Clusters_group_npz_dir_base,thisk=this_k)
    if not os.path.exists(newClusterGroupDir):
        os.mkdir(newClusterGroupDir)
    fig = plt.figure(figsize=[6.4*this_k,4.8])
    while label <= this_k:
        thislabel_namesList = list()
        labelString = str(label) + '_connmap.npz'
        print('labelString = ',labelString)
        for l,eachthiskdir in enumerate(thiskdirslist):
            thislabel_names = [f for f in os.listdir(path=eachthiskdir) if f.endswith(labelString)]
            thislabel_namesString = ''.join(thislabel_names)
            thislabel_fullpath = eachthiskdir + '/' + thislabel_namesString
            thislabel_namesList.append(thislabel_fullpath)
        thisk_thislabelonesubj = thislabel_namesList[0]
        print('thisk_thislabelonesubj = ',thisk_thislabelonesubj)
        data_thisk_thislabelonesubj = np.load(thisk_thislabelonesubj)
        affine = data_thisk_thislabelonesubj['affine']
        mask = data_thisk_thislabelonesubj['mask']
        conn_shape = data_thisk_thislabelonesubj['conn'].shape
        nsubjects = len(thislabel_namesList)
        print('nsubjects = ',nsubjects)
        conn_group = np.zeros([nsubjects,conn_shape[0],conn_shape[1]])
        savestring = '{newClusterGroupD}/cluster{k_i:02d}_connmap_group.npz'.format(newClusterGroupD=newClusterGroupDir,k_i=label)
        for sub,npz in enumerate(thislabel_namesList):
            data_thisk_thislabel_eachsub = np.load(npz)
            conn_group[sub,:,:] = data_thisk_thislabel_eachsub['conn']
        np.savez(savestring, conn_group=conn_group,mask=mask,affine=affine)
        connmatrix_for_each_target_avggroup = conn_group.mean(axis=0) #Average across rows (subjects) to get the connmap matrix averaged across all subjects
        print('connmatrix_for_each_target_avggroup shape = ',connmatrix_for_each_target_avggroup.shape)
        avg_conn_for_each_target_avggroup = connmatrix_for_each_target_avggroup.mean(axis=0) #Average across rows (voxels in piriform, indexed) to get the group-average connectivity per target
        sum_avg_conn_for_each_target_avggroup = np.sum(avg_conn_for_each_target_avggroup)
        quotients = [number / sum_avg_conn_for_each_target_avggroup for number in avg_conn_for_each_target_avggroup]
        #Create dataframe that includes the target names and the values of the targets
        df = pd.DataFrame({'targetnames': targetnamesList, 'ConnectivityScore': avg_conn_for_each_target_avggroup, 'ConnectivityScoreNormalized': quotients}, index=range(0,snakemake.params.num_targets))
        print("Targets prior to sorting based on avg conn for k-%s and label %s = %s" % (this_k, label, df))
        dfsorted = df.sort_values(by='ConnectivityScoreNormalized', ascending=False) #Sort the target regions descending
        print("Targets sorted based on avg conn for k-%s and label %s = %s" % (this_k, label, dfsorted))
        #Plot the 10 targets with the highest connectivity to the piriform
        x = np.linspace(1,10,10) #start, stop, number of ticks
        y = np.linspace(0,1,11) #Makes it increment by 0.1
        ax = fig.add_subplot(1,this_k,label)
        ax.set_title("k-%s Cluster-0%s" % (this_k, label))
        ax.set_xticks(x)
        ax.set_yticks(y)
        ax.set_ylim([0,0.5])
        ax.set_xticklabels(dfsorted.targetnames[0:10], rotation='vertical')
        ax.set_xlabel('Target Name')
        ax.set_ylabel('Average Connectivity to Target')
        ax.bar(x,dfsorted.ConnectivityScoreNormalized[0:10]) #Plots the target regions with the top 10 highest average connectivities from the group-averaged piriform seed
        label +=1
    figuresavestring = '{newClusterGroupDir}/k-{thisk}_Individual_Clusters_Top10connectivity_to_targets.png'.format(newClusterGroupDir=newClusterGroupDir,thisk=this_k)
    print('figuresavestring is : ',figuresavestring)
    plt.savefig(figuresavestring, bbox_inches = "tight")
```
```python
import numpy as np

#load first file to get shape
data = np.load(snakemake.input.connmap_npz[0])
affine = data['affine']
mask = data['mask']
conn_shape = data['conn'].shape
nsubjects = len(snakemake.input.connmap_npz)
conn_group = np.zeros([nsubjects,conn_shape[0],conn_shape[1]])

for i,npz in enumerate(snakemake.input.connmap_npz):
    data = np.load(npz)
    conn_group[i,:,:] = data['conn']

#save conn_group, mask and affine
np.savez(snakemake.output.connmap_group_npz, conn_group=conn_group,mask=mask,affine=affine)
```
```python
import nibabel as nib
import numpy as np
import os

#kclusters_nii_list_names = os.listdir(path=snakemake.input.clusters_template_space_dir)
#kclusters_nii_list = list()
#for filename in kclusters_nii_list_names:
#    fullpath = os.path.join(snakemake.input.clusters_template_space_dir, filename)
#    kclusters_nii_list.append(fullpath)

kclusters_nii_list = list()
for a,thiskclustervol in enumerate(snakemake.input.clusters_template_space):
    kclusters_nii_list.append(thiskclustervol)
print('kclusters_nii_list = ',kclusters_nii_list)

cluster_range = range(2,snakemake.params.max_k+1) #Range of the clusters [2, 3, 4]
ntargets = len(snakemake.params.connmap_3d) #Number of targets = 179

for i,this_k in enumerate(cluster_range): #For 0,2 1,3 2,4
    label = 1 #Lowest label value is 1
    newClusterDir = '{snakeout}_k-{thisk}'.format(snakeout=snakemake.params.connmap_Clusters_npz,thisk=this_k)
    if not os.path.exists(newClusterDir):
        os.mkdir(newClusterDir)
    while label <= this_k: #For each individual label in that cluster file; stop after label hits the current k value
        print("Label val for k-%s prior to creating indices = %s" % (this_k, label))
        clustmask_nib = nib.load(kclusters_nii_list[i]) #Load in cluster 2, 3, 4
        print("This k-%scluster = %s" % (this_k, kclusters_nii_list[i]))
        clustmask_vol = clustmask_nib.get_fdata()
        clustmask_indices = clustmask_vol==label #Indices within the clustmask_vol for label=label (e.g. indices in the k-2 volume where label = 1)
        maskedCount = clustmask_vol[clustmask_indices] #List of the voxels that contain this label
        nvoxels = len(maskedCount) #Count of the number of voxels
        print("nvoxels for k-%s and label %s = %s" % (this_k, label, nvoxels))
        maskedVoxelsIndividualCluster = clustmask_vol
        maskedVoxelsIndividualCluster[maskedVoxelsIndividualCluster!=label] = 0 #Any of the voxels that do not contain the label value become 0 (e.g. if this_k = 2 and label = 1, then label = 2 becomes 0)
        conn = np.zeros((nvoxels,ntargets)) #Create conn matrix of size (nvoxels in this cluster, ntargets)
        savestring = '{newClustDir}/cluster{k_i:02d}_connmap.npz'.format(newClustDir=newClusterDir,k_i=label)
        print("savestring for k-%s and label %s = %s" % (this_k, label, savestring))
        for j,conn_file in enumerate(snakemake.params.connmap_3d):
            targvol = nib.load(conn_file).get_fdata()
            maskedtargvol = targvol[clustmask_indices].T
            conn[:,j] = maskedtargvol
        print("Conn shape for k-%s and label %s = %s \n" % (this_k, label, conn.shape))
        np.savez(savestring, conn=conn,mask=maskedVoxelsIndividualCluster,affine=clustmask_nib.affine)
        label +=1 #Label becomes 2 and the while loop continues, then label becomes 3 and the loop exits
```
```python
import nibabel as nib
import numpy as np

mask_nib = nib.load(snakemake.input.mask)
mask_vol = mask_nib.get_fdata()
mask_indices = mask_vol > 0
masked = mask_vol[mask_indices]
nvoxels = len(masked)
ntargets = len(snakemake.params.connmap_3d)
conn = np.zeros((nvoxels,ntargets))
for i,conn_file in enumerate(snakemake.params.connmap_3d):
    vol = nib.load(conn_file).get_fdata()
    masked = vol[mask_indices].T
    conn[:,i] = masked
np.savez(snakemake.output.connmap_npz, conn=conn,mask=mask_vol,affine=mask_nib.affine)
```
```python
import sklearn
import numpy as np
import nibabel as nib

# Define a function for saving niftis
def save_label_nii (labels,mask,affine,out_nifti):
    labels_vol = np.zeros(mask.shape)
    labels_vol[mask > 0] = labels+1 #add a 1 so label 0 is diff from bgnd
    labels_nib = nib.Nifti1Image(labels_vol,affine)
    nib.save(labels_nib,out_nifti)

data = np.load(snakemake.input.connmap_group_npz)
cluster_range = range(2,snakemake.params.max_k+1)
out_nii_list = snakemake.output

conn_group = data['conn_group']
mask = data['mask']
affine = data['affine']

# Concat subjects
conn_group_m = np.moveaxis(conn_group,0,2)
conn_concat = conn_group_m.reshape([conn_group_m.shape[0],conn_group_m.shape[1]*conn_group_m.shape[2]])

# Run spectral clustering and save output nifti
for i,k in enumerate(cluster_range):
    from sklearn.cluster import SpectralClustering
    clustering = SpectralClustering(n_clusters=k, assign_labels="discretize",random_state=0,affinity='cosine').fit(conn_concat)
    print(f'i={i}, k={k},saving {out_nii_list[i]}')
    save_label_nii(clustering.labels_,mask,affine,out_nii_list[i])
```