bmtk.builder.edges_sorter package


bmtk.builder.edges_sorter.memory_sorter module

bmtk.builder.edges_sorter.memory_sorter.copy_attributes(in_grp, out_grp)

Recursively copy HDF5 Group/Dataset attributes from in_grp to out_grp.

  • in_grp – HDF5 Group object whose attributes will be copied.

  • out_grp – HDF5 Group object whose attributes will be updated with the copied values.
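The recursive copy described above can be sketched with duck typing so that it works on h5py Group and Dataset objects, which expose .attrs (and, for groups, .items() and name lookup). This is an illustrative re-implementation under that assumption, not bmtk's source. The name copy_attributes_sketch is hypothetical, and the sketch only descends into children that exist under the same name in both trees.

```python
def copy_attributes_sketch(in_grp, out_grp):
    """Hypothetical sketch, not bmtk's code.

    Copies every attribute of in_grp onto out_grp, then recurses into
    children that exist under the same name in both trees. Duck-typed on
    .attrs / .items() / `in` / [] so it works with h5py Group and Dataset.
    """
    # Copy this object's own attributes.
    for key, value in in_grp.attrs.items():
        out_grp.attrs[key] = value

    # Datasets carry attributes but have no children, so stop here for them.
    if not hasattr(in_grp, "items"):
        return

    # Recurse only into children that also exist in the output tree.
    for name, child in in_grp.items():
        if name in out_grp:
            copy_attributes_sketch(child, out_grp[name])
```

With two open h5py files src and dst that share the same layout, copy_attributes_sketch(src, dst) would copy every attribute from the source tree onto the matching objects in the destination.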

bmtk.builder.edges_sorter.memory_sorter.quicksort_edges(input_edges_path, output_edges_path, edges_population, sort_by, sort_model_properties=True, **kwargs)
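quicksort_edges has no description here. Judging only by its module name (memory_sorter), it presumably loads the edges population into memory and sorts it there. The core of such an in-memory sort is to compute one permutation from the sort_by column and apply it to every edge column, so each edge's fields stay aligned. The sketch below assumes the columns are held as plain Python lists in a dict (a stand-in for the HDF5 datasets); the function name sort_columns_in_memory is hypothetical.

```python
from typing import Dict, List, Sequence


def sort_columns_in_memory(columns: Dict[str, Sequence], sort_by: str) -> Dict[str, List]:
    """Hypothetical sketch: reorder every column by the values in columns[sort_by].

    All columns must have the same length (one entry per edge).
    """
    n = len(columns[sort_by])
    if any(len(col) != n for col in columns.values()):
        raise ValueError("all edge columns must have the same length")

    key_col = columns[sort_by]
    # One shared permutation keeps every edge's fields aligned across columns.
    order = sorted(range(n), key=lambda i: key_col[i])
    return {name: [col[i] for i in order] for name, col in columns.items()}
```

Python's sorted() is Timsort rather than quicksort; it is used here because it is stable, so edges with equal sort_by values keep their original relative order.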

bmtk.builder.edges_sorter.merge_sorter module

class bmtk.builder.edges_sorter.merge_sorter.ProgressFile(cache_dir, n_edges, n_chunks, sort_key, root_name=None)

Bases: object

Keeps track of the progress of sorting an HDF5 edges file. Progress is written to disk so that, if sorting fails, it can resume from where it left off.

update(initialized=None, n_edges=None, n_chunks=None, iteration=None, chunk_files=None, chunk_indices=None, write_index=None, completed=None)

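The resume-on-failure idea described for ProgressFile can be sketched as a small JSON checkpoint in cache_dir. The class name ProgressCheckpoint, the JSON file format, and the rule "reuse a saved checkpoint only if sort_key and n_edges match" are assumptions for illustration; bmtk's actual on-disk format is not documented here. update() accepts the same kind of keyword fields as ProgressFile.update and skips any left as None.

```python
import json
import os


class ProgressCheckpoint:
    """Hypothetical stand-in for ProgressFile: persists sort progress as JSON."""

    def __init__(self, cache_dir, n_edges, n_chunks, sort_key, root_name="progress"):
        os.makedirs(cache_dir, exist_ok=True)
        self.path = os.path.join(cache_dir, f"{root_name}.json")
        self.state = {"initialized": False, "n_edges": n_edges, "n_chunks": n_chunks,
                      "sort_key": sort_key, "iteration": 0, "completed": False}
        # Resume from an earlier run if a compatible checkpoint exists (assumed rule).
        if os.path.exists(self.path):
            with open(self.path) as fh:
                saved = json.load(fh)
            if saved.get("sort_key") == sort_key and saved.get("n_edges") == n_edges:
                self.state = saved

    def update(self, **fields):
        # Ignore fields left as None, mirroring update()'s all-None defaults.
        self.state.update({k: v for k, v in fields.items() if v is not None})
        tmp_path = self.path + ".tmp"
        with open(tmp_path, "w") as fh:
            json.dump(self.state, fh)
        # os.replace swaps the file in one step (atomic on POSIX), so a crash
        # mid-write leaves the previous checkpoint intact.
        os.replace(tmp_path, self.path)
```

Writing to a temporary file and then renaming it is a common way to make sure a crash never leaves a half-written progress file behind.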
bmtk.builder.edges_sorter.merge_sorter.external_merge_sort(input_edges_path, output_edges_path, edges_population, sort_by, sort_model_properties=True, n_chunks=12, max_itrs=13, cache_dir='.sort_cache', **kwargs)

Performs an external merge sort on an input HDF5 edges file and saves the result to a new file. Useful for large network files that cannot be loaded into memory all at once.

Splits the original HDF5 file into <n_chunks> chunks on disk in cache_dir, sorts each chunk in memory, then merges all the chunks. For speed, the chunking and merging may be carried out over multiple iterations.
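The split, sort-in-memory, then merge scheme above is the classic external merge sort. A minimal sketch on a stream of plain values follows; pickled run files stand in for the HDF5 chunk files, and chunk_size plays the role of roughly n_edges / n_chunks. The function names are hypothetical, and unlike bmtk this sketch merges all runs in a single pass and does not resume from a cache.

```python
import heapq
import os
import pickle
from typing import Iterable, Iterator, List


def _write_run(values: List, cache_dir: str, idx: int) -> str:
    """Write one sorted chunk ("run") to disk and return its path."""
    path = os.path.join(cache_dir, f"run_{idx}.pkl")
    with open(path, "wb") as fh:
        for v in values:
            pickle.dump(v, fh)
    return path


def _read_run(path: str) -> Iterator:
    """Stream values back from a run file one at a time."""
    with open(path, "rb") as fh:
        while True:
            try:
                yield pickle.load(fh)
            except EOFError:
                return


def external_sort_sketch(values: Iterable, chunk_size: int, cache_dir: str) -> Iterator:
    """Hypothetical sketch: hold at most chunk_size values in memory while chunking."""
    run_paths: List[str] = []
    buf: List = []
    for v in values:
        buf.append(v)
        if len(buf) == chunk_size:
            # Phase 1: sort a full chunk in memory and spill it to disk.
            run_paths.append(_write_run(sorted(buf), cache_dir, len(run_paths)))
            buf = []
    if buf:
        run_paths.append(_write_run(sorted(buf), cache_dir, len(run_paths)))
    # Phase 2: k-way merge; heapq.merge keeps one pending value per run in memory.
    return heapq.merge(*(_read_run(p) for p in run_paths))
```

Because heapq.merge consumes its inputs lazily, peak memory during the merge grows with the number of runs rather than with the total number of values.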

Intermediate sorting results are saved in cache_dir (e.g. .sort_cache). If sorting fails or does not finish within max_itrs iterations, running this function again will continue from where it last left off.

  • input_edges_path – path to the original edges file

  • output_edges_path – path of the new file that will be created

  • edges_population – name of the edges population (the group under /edges/ in the SONATA edges file) to sort

  • sort_by – name of the edge attribute to sort on, e.g. ‘edge_type_id’ or ‘source_node_id’

  • sort_model_properties – re-sort the model group(s) so that edge_group_id + edge_group_index are in order

  • n_chunks – Number of chunks to split the edges file into; roughly 1/n_chunks of the file is loaded into memory at a time. (default: 12)

  • cache_dir – A temporary directory where intermediate results will be stored. (default: ‘.sort_cache’)

  • max_itrs – The maximum number of iterations to run the merge sort. (default: 13)
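In SONATA edges files, each edge's edge_group_id and edge_group_index point to a row in one model group's property datasets. Once edges are reordered by sort_by, those indices are no longer increasing. One way to restore order, assumed here and not taken from bmtk's source, is to reorder each group's property rows to follow the new edge order and renumber edge_group_index. Python lists stand in for the HDF5 datasets, and the function name resort_model_groups is hypothetical.

```python
from typing import Dict, List, Tuple


def resort_model_groups(
    edge_group_id: List[int],
    edge_group_index: List[int],
    groups: Dict[int, Dict[str, List]],
) -> Tuple[List[int], Dict[int, Dict[str, List]]]:
    """Hypothetical sketch: reorder each group's property rows to match edge order.

    Returns the renumbered edge_group_index and the reordered groups. Afterwards,
    walking the edges in order visits each group's rows as 0, 1, 2, ...
    """
    new_index: List[int] = []
    row_order: Dict[int, List[int]] = {gid: [] for gid in groups}
    for gid, old_row in zip(edge_group_id, edge_group_index):
        new_index.append(len(row_order[gid]))  # next free row in this group
        row_order[gid].append(old_row)         # which old row moves there

    # Gather each property's values in the new row order.
    new_groups = {
        gid: {prop: [values[r] for r in row_order[gid]] for prop, values in props.items()}
        for gid, props in groups.items()
    }
    return new_index, new_groups
```

Each edge still resolves to the same property values after the rewrite; only the storage order of the group rows changes.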

Module contents

bmtk.builder.edges_sorter.sort_edges(input_edges_path, output_edges_path, edges_population, sort_by, sort_model_properties=True, sort_on_disk=False, **sorter_args)