Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Citation preview incrementally increases memory usage with each preview #2533

Closed
kingwarrick opened this issue Feb 9, 2017 · 32 comments · Fixed by #3533
Closed

Citation preview incrementally increases memory usage with each preview #2533

kingwarrick opened this issue Feb 9, 2017 · 32 comments · Fixed by #3533

Comments

@kingwarrick
Copy link

kingwarrick commented Feb 9, 2017

JabRef 3.8.2
windows 10 10.0 amd64
Java 1.8.0_121

RAM is used during preview, but not fully released. With more previews, less is released until run-away usage occurs. After ~4-5 previews (or ~500MB RAM, not sure which is more significant), no appreciable amount of RAM is released, with subsequent previews adding 100-500MB per preview to total RAM usage. Behaviour changes after ~1400MB RAM use, with slower incremental use and more frequent release of small amounts of ram.

Tested with different preview styles (no change); interesting to note how CPU use in IEEE preview generation spikes to ~40%, whereas APA spikes to ~20%.

NOT tested with any release versions.

Steps to reproduce [RAM use in square brackets]:

  1. Open JabRef and bibtex database (~900 entries)
  2. Single click on entry to show preview box, wait for preview to show
  3. Single click on another entry, wait for preview to show

This shows in the exceptions tab of the error console:

ANTLR Tool version 4.5.3 used for code generation does not match the current runtime version 4.6ANTLR Runtime version 4.5.3 used for parser compilation does not match the current runtime version 4.6ANTLR Tool version 4.5.3 used for code generation does not match the current runtime version 4.6ANTLR Runtime version 4.5.3 used for parser compilation does not match the current runtime version 4.6ANTLR Tool version 4.5.3 used for code generation does not match the current runtime version 4.6ANTLR Runtime version 4.5.3 used for parser compilation does not match the current runtime version 4.6ANTLR Tool version 4.5.3 used for code generation does not match the current runtime version 4.6ANTLR Runtime version 4.5.3 used for parser compilation does not match the current runtime version 4.6
@kingwarrick kingwarrick changed the title Citation preview incrementally increases memory usage for each preview Citation preview incrementally increases memory usage with each preview Feb 9, 2017
@kingwarrick
Copy link
Author

Similar to this:
#2247
but for preview rather than entry editing

tobiasdiez commented on Nov 8 2016
I can confirm at least the large memory footprint. After opening a normal db (mine has 500 entries and a few groups) JabRef eats only 200 MB of RAM. Now open the entry editor and run through a few entries (using the arrow keys so that a new entry editor is generated for the entries). Result: over 1 GB of RAM usage. Btw: the same db with 3.6 only needed < 100 MB RAM.

@koppor
Copy link
Member

koppor commented Feb 10, 2017 via email

@lenhard lenhard added the status: waiting-for-feedback The submitter or other users need to provide more information about the issue label Feb 28, 2017
@lenhard
Copy link
Member

lenhard commented Feb 28, 2017

The mention did not work, so here again @bartsch-dev What is our policy on cache clearance?

@chriba
Copy link
Contributor

chriba commented Feb 28, 2017

The cached citation gets deleted when a field of the BibEntry changes or the BibEntry itself is removed (or the selected citation style changes). https://github.com/JabRef/jabref/blob/master/src/main/java/org/jabref/logic/citationstyle/CitationStyleCache.java#L59

It gets generated again the next time it's selected by the user and the preview is active.

@koppor
Copy link
Member

koppor commented Mar 27, 2017

This issue will be solved if we remove the cache and finally integrate #2250 in the master branch.

@LinusDietz LinusDietz self-assigned this Mar 27, 2017
@LinusDietz
Copy link
Member

Sorry, can't confirm with current master 1f893cd under linux with

openjdk version "1.8.0_121"
OpenJDK Runtime Environment (build 1.8.0_121-b13)
OpenJDK 64-Bit Server VM (build 25.121-b13, mixed mode)

I opened a 6k entry bibtex database and performed the above described steps. RAM never went beyond 425MB before the Garbage Collection kicked in.

@lenhard
Copy link
Member

lenhard commented Apr 5, 2017

Thanks for testing @lynyus!

@kingwarrick Please give a it a try with the latest version of JabRef from http://builds.jabref.org/master/ Your problem seems to be solved in the most recent dev builds. Please do this with a backup of your bib file since there are a number of breaking changes regarding groups. Let us know if your experiences!

@kingwarrick
Copy link
Author

kingwarrick commented Apr 5, 2017 via email

@matthiasgeiger
Copy link
Member

Can reproduce the high RAM and CPU usage if one of the CSL Styles is used for preview. Performing a manual garbage collection using jvisualvm reduces the amount of used heapspace - but this is not freeed for the OS.

I assume we cannot really do anything about this - potentially #2250 will solve this but I'm not Sure...

@koppor
Copy link
Member

koppor commented Apr 5, 2017

Yes, it will be solved. However, currently no one has time to work on the build process. Hopefully after the alpha release. Let's see.

@koppor koppor removed the status: waiting-for-feedback The submitter or other users need to provide more information about the issue label Apr 6, 2017
@koppor
Copy link
Member

koppor commented Apr 6, 2017

@lynyus has worked on the caching of citation styles and it should get better.

@kingwarrick Is there any upper bound of memory usage? - There should be...

@LinusDietz
Copy link
Member

LinusDietz commented Apr 6, 2017 via email

@kingwarrick
Copy link
Author

kingwarrick commented Apr 6, 2017 via email

@matthiasgeiger
Copy link
Member

That's jvisualvm (just search in the start menu) which is shipped with the Oracle JDK.

@kingwarrick
Copy link
Author

Hopefully this will help:
jabref-memory

@koppor
Copy link
Member

koppor commented Sep 1, 2017

The PR #3089 should have introduced a limit of the memory consumption. Could you please check again? I hope, that not each preview adds 100 MB of RAM. If it does, we have to really limit the number of cached entry previews. Currently, there are 1024 entries cached.

@kingwarrick
Copy link
Author

JabRef 4.0-dev--snapshot--2017-09-01--master--e46d5ee88
Windows 10 10.0 amd64
Java 1.8.0_144

First test (preview one entry at a time, advancing after display; no screenshot):
Memory usage increased significantly for the first 15 previews up to ~1500MB (same as prior), peaked around 1700MB briefly, with memory usage hovering around ~1550MB after preview 100.

Second test (quickly advance through 3-10 entries, triggers preview for more than one entry; screenshot attached):
Memory use increases rapidly, peaking at ~2200MB and hovering around 2000MB.
The time-to-display of the preview is much quicker after about 20 previews.

jabref-mem-use

@sbitzer
Copy link

sbitzer commented Oct 16, 2017

I can confirm that the 4.0 release has severe memory issues under Linux. Just opening my database with Jabref 4.0 makes Java use 984 MB of RAM. The first search makes this go up to 1.2 GB. Stepping through the first 20 shown entries brings this to 2.1 GB. After adding 5 articles this morning and searching some others Jabref was at 3.2 GB. That's not really practicable. How can caching something like a 300 character-string (citation preview) cost several 10s of MB of RAM?

JabRef 4.0
Linux 3.16.0-38-generic amd64
Java 1.8.0_144

@lenhard
Copy link
Member

lenhard commented Oct 17, 2017

To be fair on the 4.0 release, this problem was present well before that :) And of course caching a 300 character string does not take 10s of MB of RAM, obviously JabRef stores more than that (and I am in no way saying that's good).

Given that the CSL cache seems to be totally non-functionally, can we simply disable/remove it? Or limit the cache to, say, a 1 digit number of entries? I know that this will mean that displaying the previews will take longer, but the memory consumption described here is far beyond anything justifiable. I have no deeper knowledge of how the previews work internally, but maybe one of the developers who contributed to it can clarify?

@sbitzer the quickfix (mentioned above) seems to be to not use citation styles for the preview.

EDIT: I think we should do something here soon and added it to the 4.1 milestone.

@tobiasdiez
Copy link
Member

@halirutan do you have time to look at this issue? Your analysis in other performance related issues was superb. Of course, I can help fixing the code.

@halirutan
Copy link
Collaborator

@tobiasdiez I already profiled this. I was working the last weeks on my Mathematica plugin which explains my absence. I have two issues on my list that I want to look at. It's this one and the disastrous behavior of the group-view for moderately nested group-trees.

When my memory serves me right, I couldn't come to a definite conclusion with this issue. It is a combination of the several string-builders. I will try to repeat the profiling to find out if there is a point where we can make a worthy impact on the memory consumption

@halirutan
Copy link
Collaborator

halirutan commented Dec 9, 2017

@tobiasdiez @sbitzer @lenhard @kingwarrick I cannot confirm the high RAM usage with the latest master c51ecbe. I used a large database (several thousand entries) and scrolled through 500 of them, showing the preview each time in IEEE style. The preview is the biggest contributor, but it used less than 90MB of RAM. This comes down to 180kb per preview. See the highlighted line

img

However, one thing that I find highly suspicious is that the simple call to setting the text in the JPane with javax.swing.JEditorPane#setText in these lines uses 52 MB of the overall 86MB.

Can someone on a machine that shows the weird behavior test to comment out the actual setting of the preview string?

    public void setPreviewLabel(String text) {                  
        if (SwingUtilities.isEventDispatchThread()) {           
            final Document document = previewPane.getDocument();
//            previewPane.setText(text);                        
//            previewPane.revalidate();                         
        } else {                                                
            SwingUtilities.invokeLater(() -> {                  
//                previewPane.setText(text);                    
//                previewPane.revalidate();                     
            });                                                 
        }                                                       
        this.scrollToTop();                                     
    }

This leaves the whole machinery of JabRef creating the preview string alive but shoves the JPanel out of the memory picture. Scroll slowly bit by bit through the entries with preview open and see what your memory does.

I'm on Ubuntu 16.04 with Oracle JRE.

@tobiasdiez
Copy link
Member

@halirutan Thanks for the analysis. 180kb per preview sounds fine. So with the current cache size of 1024, there shouldn't be any problems since the total memory consumption is then around 200 mb. So how does these astronomical reports of > 2 gb come about?

It is understandable that the the JEditorPane has quite some overhead. In the end it is a full webbrowser and the simple string has to be parsed to a html tree and then displayed. Maybe the migration to JavaFX #3504 helps in this regard a bit.

@lenhard
Copy link
Member

lenhard commented Dec 9, 2017

@halirutan @tobiasdiez If I am not mistaken, the memory issue happens only when using citationstyles, not the default preview or the IEEE preview.

@koppor
Copy link
Member

koppor commented Dec 9, 2017 via email

@halirutan
Copy link
Collaborator

Here comes the revised performance analysis. I am not certain, but when I understand this correctly, then the cache is not the problem. The main drain of resources seems to be the CSL library. Let us first look at the memory. Note that the profiler does not track all objects which is why there is a large difference between the size in line 1 and the size that the CSL lib uses. What I tested is the creation of exactly one preview that uses one of the provided csl styles and not the default one.

This is the memory footprint:

img

As you can see, 98% of the memory is eaten up by creating the styled bib-entry through the CSL library. Notably, 63% of it only by constructing the CSL instance inside makeAdhocBibliography. When we look at the comment of this function, it states:

Creates an ad hoc bibliography from the given citation items. Calling this method is rather expensive as it initializes the CSL processor. If you need to create bibliographies multiple times in your application you should create the processor yourself and cache it if necessary.

This is by far no understatement. I understand that using makeAdhocBibliography is a nice way of feeding only one bib-item and creating an html-view of it, but it comes at a steep price. I have quickly looked through the API and it seems this is rather optimized for handling and rendering whole bibliographies instead of a fast preview for single items.

If possible at all, I would advise to hold our own CSL instance and reuse it. If we look at the performance, we get the same picture. Please don't be confused by the overall 25%. I left the application run idle for a moment so a lot of time was spent only waiting for me. The 25% is the time where the hard work was done.

img

What you should take from this is that from the 25% of the time that was needed to preview the item, 18% were spent only constructing the CSL instance. That is tremendous (50 out of 70 seconds under profiling!). The code for makeAdhocBibliography is short enough to share it:

public static Bibliography makeAdhocBibliography(String style, String outputFormat,
		CSLItemData... items) throws IOException {                                 
	ItemDataProvider provider = new ListItemDataProvider(items);                   
	CSL csl = new CSL(provider, style);                                            
	csl.setOutputFormat(outputFormat);                                             
	                                                                               
	String[] ids = new String[items.length];                                       
	for (int i = 0; i < items.length; ++i) {                                       
		ids[i] = items[i].getId();                                                 
	}                                                                              
	csl.registerCitationItems(ids);                                                
	                                                                               
	return csl.makeBibliography();                                                 
}                                                                                  

There are 3 parts in this method. First, the CSL instance is created. As you can see, the ListItemDataProvider is used to construct the CSL instance and it already contains our 1 item that we want to process. So if this is possible, we should use our own data provider that is updated with the current item, but I looked for the first time at this library and I'm not sure this works.

There are 2 other parts. The for loop just copies the ids which is hopefully fast. The last part consists of registerCitationItems and makeBibliography. Both methods come with a very notable performance and memory footprint as well as you can see in the images, but not as large as the constructor and I'm not sure how much this can be improved.

My very naive view is: They need to initialize the whole machinery and they need to parse the csl style. I understand this is hard work. But then they have everything set up and we provide CSLItemData so they only need to render it according to the style. Why are both final methods so expensive?

Anyway, my suggestion is to leave the cache alone or turn it of. More important is that we initialize the CSL engine and reuse it as much as possible.

@koppor
Copy link
Member

koppor commented Dec 11, 2017

Another quick way to add a workaround is to switch to the V8 engine. See #3180. Especially the screen videos with V8 and without V8.

@tobiasdiez
Copy link
Member

This should be fixed in the latest development version. Could you please check the build from http://builds.jabref.org/master/. Thanks!

@sbitzer
Copy link

sbitzer commented Dec 21, 2017

I just tried it out. Stepping through the first 50 entries of my database producing APA-style previews for them, increased the memory usage of Jabref from 915 MB to 1.4 GB. That's certainly less than before, but it's still a lot. For comparison: Doing the same with the standard entry preview raises memory usage only from 915 MB to 920 MB.

JabRef 4.1-dev--snapshot--2017-12-21--master--7819cb4aa
Linux 3.16.0-38-generic amd64
Java 1.8.0_151

@koppor
Copy link
Member

koppor commented Dec 21, 2017 via email

@sbitzer
Copy link

sbitzer commented Dec 21, 2017

The first entry preview takes a while to be generated. The others are almost instantaneous.

@halirutan
Copy link
Collaborator

@sbitzer The first entry takes exactly as long as every entry needed before the patch. It initializes the CSL engine before it can render a preview. After that, we reuse it and until you change the style, every preview is fast. On a style change, the first item will trigger a re-initialization again.

@koppor I am certain the cache is not the problem and in fact, it saves us from using even more memory. Look here:

img

Simply running through 20 items and showing the preview uses 87% of the memory. 43% is spent in CSL.makeBibliography, 39% is spent in registering the item in the CSL engine and only 4% is spent by JabRef creating the format for CSL. I'm sorry to say this, but this is as far as I can optimize it.

The real pain under the hood is the NashornScriptEngine that is afaik used to run the JS code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants