Fix model loading time through prefetching the file on another thread #734
Conversation
I created a separate thread to read the file, which improves loading time by prefetching the file from disk.
It would be better if we use
Very interesting that Windows is not issuing correctly sized reads by default. Which tool are you using to measure this? That said, this solution is really hacky. You probably want to use
On Windows I empty the standby list with the RAMMap utility, then run llama.cpp, open taskmgr, and look at the disk tab under Performance.
ggml.h
#ifndef _POSIX_THREADS
#if defined(_WIN32)
#include <windows.h>
#endif
typedef HANDLE pthread_t;
typedef DWORD thread_ret_t;
static int pthread_create(pthread_t* out, void* unused, thread_ret_t(*func)(void*), void* arg);
static int pthread_join(pthread_t thread, void* unused);
#endif
Why reinvent std::thread?
std::thread does not exist in MSVC
It seems to exist: https://learn.microsoft.com/en-us/cpp/standard-library/thread-class?view=msvc-170 . But I didn't test it, so feel free to resolve this thread if std::thread doesn't work on your machine.
std::thread does exist in MSVC
Gonna try PrefetchVirtualMemory before moving the PR from draft to non-draft.
PrefetchVirtualMemory is perfect and works properly on Windows. Gonna do some POSIX tests and update https://github.com/CoderRC/libmingw32_extended
I don't like the fact that this approach complicates things even for platforms that are not broken (i.e. not Windows). Can we get rid of everything else and keep just the
In other words, does this fix the issue with no further changes required?
diff --git a/llama.cpp b/llama.cpp
index 854bb89..78fbace 100644
--- a/llama.cpp
+++ b/llama.cpp
@@ -327,6 +327,11 @@ static void *mmap_file(const char *fname, uint64_t *mm_length) {
void *addr = MapViewOfFile(hMapping, FILE_MAP_READ, 0, 0, 0);
CloseHandle(hMapping);
if (!addr) return 0;
+ // Prefetch the virtual memory range
+ WIN32_MEMORY_RANGE_ENTRY range;
+ range.VirtualAddress = addr;
+ range.NumberOfBytes = (SIZE_T)length;
+ PrefetchVirtualMemory(GetCurrentProcess(), 1, &range, 0);
#else
int fd = open(fname, O_RDONLY);
    if (fd == -1) return 0;
madvise should speed things up for Linux too. It might not be as dramatic, but there should be a tiny improvement.
Came up with #740 - it speeds up the loading time of the 7B model 3.5 times on my MacBook.
Try it on Linux (it works) and compare speeds to #740.
You changed the end-of-line format (from Unix to Windows), so GitHub shows that you replaced all the lines. Please fix this. Also, don't merge master into your branch. What you should do is rebase: https://www.atlassian.com/git/tutorials/rewriting-history/git-rebase
…reate and pthread_join
Try to run tests, post the results below, and compare to #740.
With this PR my HDD's model loading speed went from 5 MB/s to 50 MB/s. It's not as fast as before mmap, but we're getting there.
I found that with the current llama.cpp, the first model load is slow because the OS doesn't know which parts of the file will be read, causing 4 KB requests to disk. Reading the file on a separate thread turns these into 1 MB requests. For example, 1,000,000 4 KB requests to disk become roughly 4,000 1 MB requests, which is a big difference.
Specifically solves #705
Right now it reads at 500 MB/s for me; after this patch it reads at 2.3 GB/s.
Try: git clone https://github.com/CoderRC/llama.cpp
to see the difference in first loading time (measure only after a restart to produce accurate results).
From:
llama_print_timings: load time = 31928.12 ms
To:
llama_print_timings: load time = 11478.47 ms