Commit

added changes to support mmapped file.
hariharan-devarajan committed Nov 28, 2023
1 parent 52011b4 commit 6221b33
Showing 2 changed files with 5 additions and 5 deletions.
6 changes: 3 additions & 3 deletions dlio_benchmark/configs/workload/megatron_deepspeed.yaml
@@ -8,11 +8,11 @@ workflow:
   checkpoint: True

 dataset:
-  data_folder: /p/lustre1/haridev/dataset/megatron-deepspeed/
+  data_folder: dataset/megatron-deepspeed/
   format: mmap_indexed_binary
   num_files_train: 1
   num_samples_per_file: 277203535
-  record_length: 4096
+  record_length: 2048

 reader:
   data_loader: pytorch
@@ -26,7 +26,7 @@ train:
   computation_time: 8.99

 checkpoint:
-  checkpoint_folder: /p/lustre1/haridev/checkpoints/megatron-deepspeed
+  checkpoint_folder: checkpoints/megatron-deepspeed
   checkpoint_after_epoch: 1000
   model_size: 30102
   type: independent
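
Reviewer note: halving record_length halves the size of the generated dataset. The sketch below works through the arithmetic, assuming record_length is the fixed per-sample size in bytes and that every sample has exactly that length; the helper name dataset_bytes is hypothetical and not part of dlio_benchmark.

```python
def dataset_bytes(num_files: int, samples_per_file: int, record_length: int) -> int:
    """Total payload size in bytes, assuming fixed-length records (an assumption)."""
    return num_files * samples_per_file * record_length


# Values taken from the dataset section of megatron_deepspeed.yaml in this commit.
old_size = dataset_bytes(1, 277_203_535, 4096)  # before the change
new_size = dataset_bytes(1, 277_203_535, 2048)  # after the change

print(f"old: {old_size} bytes, about {old_size / 2**30:.1f} GiB")
print(f"new: {new_size} bytes, about {new_size / 2**30:.1f} GiB")
```

Under these assumptions the single training file shrinks from roughly 1057 GiB to roughly 529 GiB, not counting the offset and size index files.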
4 changes: 2 additions & 2 deletions dlio_benchmark/data_generator/indexed_binary_generator.py
@@ -67,14 +67,14 @@ def generate(self):
 data_file = open(out_path_spec, "wb")
 off_file = open(out_path_spec_off_idx, "wb")
 sz_file = open(out_path_spec_sz_idx, "wb")
+records = np.random.randint(255, size=write_size, dtype=np.uint8)
 while written_bytes < total_size:
     data_to_write = write_size if written_bytes + write_size <= total_size else total_size - written_bytes
     samples_to_write = data_to_write // sample_size

     # Write data
-    records = np.random.randint(255, size=data_to_write, dtype=np.uint8)
     myfmt = 'B' * data_to_write
-    binary_data = struct.pack(myfmt, *records)
+    binary_data = struct.pack(myfmt, *records[:data_to_write])
     data_file.write(binary_data)

     # Write offsets
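
Reviewer note: the change above hoists the random buffer out of the loop and slices it for the final, shorter chunk. A minimal standalone sketch of that pattern follows. It is not the generator's actual code: the function name write_random_bytes and the seed parameter are hypothetical, it uses NumPy's Generator API instead of np.random.randint, and it swaps struct.pack for ndarray.tobytes(), which avoids unpacking every byte into a Python argument.

```python
import io

import numpy as np


def write_random_bytes(out, total_size: int, write_size: int, seed: int = 0) -> int:
    """Write total_size pseudo-random bytes to out in chunks of at most write_size."""
    rng = np.random.default_rng(seed)
    # Generate the buffer once, outside the loop, as the commit does.
    # The upper bound is exclusive, so 256 covers the full uint8 range.
    records = rng.integers(0, 256, size=write_size, dtype=np.uint8)
    written_bytes = 0
    while written_bytes < total_size:
        data_to_write = min(write_size, total_size - written_bytes)
        # Slicing handles the last chunk, which may be shorter than write_size.
        out.write(records[:data_to_write].tobytes())
        written_bytes += data_to_write
    return written_bytes


# Usage: an in-memory buffer stands in for the data file.
buf = io.BytesIO()
written = write_random_bytes(buf, total_size=10_000, write_size=4096)
```

One consequence of reusing the buffer is that the file contents repeat every write_size bytes, which may matter on storage that compresses or deduplicates data. Note also that np.random.randint(255, ...) in the diff excludes its upper bound, so it yields values 0 through 254.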