Network latency correction between Vitro and the outsourced triple store #378

Open
wants to merge 12 commits into
base: main

michel-heon
Member

Testing performance of external graph usage:

What does this pull request do?

This PR partially addresses the network latency problem in communication between a VIVO instance running in server-less mode and its remote triple store.

What's new?

The following classes are affected by this PR:
- JsonServlet - Reduces the number of individuals fetched per page
- IndividualListController - Uses the same number of individuals per page as defined in JsonServlet
- GetRenderedSearchIndividualsByVClass - Runs the addShortViewRenderings method in parallel
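
The parallel processing described for addShortViewRenderings could look roughly like the sketch below. This is not the code from the PR: the class `ParallelShortViewSketch`, the method `renderAll`, and the simulated renderer are hypothetical names chosen for illustration, and the sketch only assumes that each individual's short view can be rendered independently of the others.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical sketch: render one short view per individual on a thread pool,
// so that remote SPARQL round trips overlap instead of running back to back.
class ParallelShortViewSketch {

    // Applies the renderer to every individual in parallel and returns the
    // results in the same order as the input list.
    static <T, R> List<R> renderAll(List<T> individuals, Function<T, R> renderer, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<R>> pending = new ArrayList<>();
            for (T individual : individuals) {
                Callable<R> task = () -> renderer.apply(individual);
                pending.add(pool.submit(task));
            }
            List<R> rendered = new ArrayList<>();
            for (Future<R> future : pending) {
                rendered.add(future.get()); // waits for this task, preserves order
            }
            return rendered;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while rendering", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("Rendering failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Simulated renderer: the sleep stands in for a remote SPARQL call.
        Function<String, String> slowRenderer = uri -> {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "<li>" + uri + "</li>";
        };
        List<String> uris = List.of("ind1", "ind2", "ind3", "ind4");
        long start = System.nanoTime();
        List<String> html = renderAll(uris, slowRenderer, 4);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(html);
        System.out.println("Elapsed ms: " + elapsedMs);
    }
}
```

Collecting the futures in submission order keeps the rendered list aligned with the page order, which matters if the caller writes each rendering back next to its individual.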

How should this be tested?

Prerequisites

  • A computational instance (e.g. AWS EC2) hosting a triple store (e.g. Jena Fuseki or AWS Neptune)
  • A computational instance hosting VIVO in server-less mode
    Note: VIVO and the triple store must each run in their own computational instance (VM) so that network latency can be observed during SPARQL calls
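
To see why separating the two servers matters, a rough back-of-the-envelope model can help. The sketch below assumes one SPARQL round trip per individual on a page, which is a simplification and not something measured in this PR; the class name `LatencyModelSketch` and the figures in `main` (30 individuals, 50 ms round trip, 10 threads) are purely illustrative.

```java
// Hypothetical model: page cost when each individual needs one remote round trip.
class LatencyModelSketch {

    // Sequential processing pays the round trip once per individual.
    static long sequentialMillis(int individuals, long roundTripMillis) {
        return individuals * roundTripMillis;
    }

    // With a pool of threads, round trips overlap in batches of `threads`.
    static long parallelMillis(int individuals, long roundTripMillis, int threads) {
        long batches = (individuals + threads - 1) / threads; // ceiling division
        return batches * roundTripMillis;
    }

    public static void main(String[] args) {
        // Illustrative numbers only, not measurements.
        System.out.println(sequentialMillis(30, 50));   // prints 1500
        System.out.println(parallelMillis(30, 50, 10)); // prints 150
    }
}
```

When both servers share a machine the round trip is close to zero, so both estimates collapse to almost nothing and the difference is not observable.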

Configuration

  • Ensure that the triple store's communication ports are open and that the network route allows the VIVO instance to reach the SPARQL endpoint
  • In VIVO, properly configure the applicationSetup.ttl file, including the triple `:application :hasContentTripleSource :sparqlContentTripleSource ;` and the contents of `:sparqlContentTripleSource`

Here is a sample configuration for AWS Neptune, as used in our development environment:

# ------------------------------------------------------------------------------
#
# This file specifies the structure of the Vitro application: which modules
# are used, and what parameters they require.
#
# Most Vitro installations will not need to modify this file.
#
# For most installations, only the settings in the runtime.properties file will
# be changed.
#
# ------------------------------------------------------------------------------

@prefix : <http://vitro.mannlib.cornell.edu/ns/vitro/ApplicationSetup#> .
@prefix vitroWebapp: <java:edu.cornell.mannlib.vitro.webapp#> .

# ----------------------------
#
# Describe the application by its implementing class and by references to the
# modules it uses.
#

:application
    a   vitroWebapp:application.ApplicationImpl ,
        vitroWebapp:modules.Application ;
    :hasSearchEngine              :instrumentedSearchEngineWrapper ;
    :hasSearchIndexer             :basicSearchIndexer ;
    :hasImageProcessor            :iioImageProcessor ;
    :hasFileStorage               :ptiFileStorage ;
    :hasContentTripleSource       :sparqlContentTripleSource ;
    :hasTBoxReasonerModule        :jfactTBoxReasonerModule ;
    :hasConfigurationTripleSource :tdbConfigurationTripleSource .
    
# ----------------------------
#
# Image processor module:
#

:iioImageProcessor
    a   vitroWebapp:imageprocessor.imageio.IIOImageProcessor ,
        vitroWebapp:modules.imageProcessor.ImageProcessor .

# ----------------------------
#
# File storage module:
#    The PairTree-inspired implementation is the only standard option.
#    It requires no parameters.
#

:ptiFileStorage
    a   vitroWebapp:filestorage.impl.FileStorageImplWrapper ,
        vitroWebapp:modules.fileStorage.FileStorage .

# ----------------------------
#
# Search engine module:
#    The Solr-based implementation is the only standard option, but it can be
#    wrapped in an "instrumented" wrapper, which provides additional logging
#    and more rigorous life-cycle checking.
#

:instrumentedSearchEngineWrapper
    a   vitroWebapp:searchengine.InstrumentedSearchEngineWrapper ,
        vitroWebapp:modules.searchEngine.SearchEngine ;
    :wraps :solrSearchEngine .

:solrSearchEngine
    a   vitroWebapp:searchengine.solr.SolrSearchEngine ,
        vitroWebapp:modules.searchEngine.SearchEngine .

# ----------------------------
#
# Search indexer module:
#    There is only one standard implementation. You must specify the number of
#    worker threads in the thread pool.
#

:basicSearchIndexer
    a   vitroWebapp:searchindex.SearchIndexerImpl ,
        vitroWebapp:modules.searchIndexer.SearchIndexer ;
    :threadPoolSize "10" .

# ----------------------------
#
# Content triples source module: holds data contents
#    The SDB-based implementation is the default option. It reads its parameters
#    from the runtime.properties file, for backward compatibility.
#
#    Other implementations are based on a local TDB instance, a "standard" SPARQL
#    endpoint, or a Virtuoso endpoint, with parameters as shown.
#

#:sdbContentTripleSource
#    a   vitroWebapp:triplesource.impl.sdb.ContentTripleSourceSDB ,
#        vitroWebapp:modules.tripleSource.ContentTripleSource .

#:tdbContentTripleSource
#    a   vitroWebapp:triplesource.impl.tdb.ContentTripleSourceTDB ,
#        vitroWebapp:modules.tripleSource.ContentTripleSource ;
    # May be an absolute path, or relative to the Vitro home directory.
#    :hasTdbDirectory "tdbContentModels" .

:sparqlContentTripleSource
    a   vitroWebapp:triplesource.impl.sparql.ContentTripleSourceSPARQL ,
        vitroWebapp:modules.tripleSource.ContentTripleSource ;
    # The URI of the SPARQL endpoint for your triple-store.
    :hasEndpointURI "https://vivo-studio-neptune-cluster.cluster-ro-c2o1sdzzfasi.ca-central-1.neptune.amazonaws.com:8182/sparql" ;
    # The URI to use for SPARQL UPDATE calls against your triple-store.
    :hasUpdateEndpointURI "https://vivo-studio-neptune-cluster.cluster-c2o1sdzzfasi.ca-central-1.neptune.amazonaws.com:8182/sparql" .

#:virtuosoContentTripleSource
#    a   vitroWebapp:triplesource.impl.virtuoso.ContentTripleSourceVirtuoso ,
#        vitroWebapp:modules.tripleSource.ContentTripleSource ;
#    # The URI where Virtuoso can be accessed: don't include the /sparql path.
#    :hasBaseURI "http://localhost:8890" ;
#    # The name and password of a Virtuoso account that has the SPARQL_UPDATE role.
#    :hasUsername "USERNAME" ;
#    :hasPassword "PASSWORD" .


# ----------------------------
#
# Configuration triples source module: holds configuration data and user accounts
#    The TDB-based implementation is the only standard option.
#    It requires no parameters.
#

:tdbConfigurationTripleSource
    a   vitroWebapp:triplesource.impl.tdb.ConfigurationTripleSourceTDB ,
        vitroWebapp:modules.tripleSource.ConfigurationTripleSource .

# ----------------------------
#
# TBox reasoner module:
#    The JFact-based implementation is the only standard option.
#    It requires no parameters.
#

:jfactTBoxReasonerModule
    a   vitroWebapp:tboxreasoner.impl.jfact.JFactTBoxReasonerModule ,
        vitroWebapp:modules.tboxreasoner.TBoxReasonerModule .

Compilation and execution

  1. Start the triple store and clear its contents
  2. Load a large set of triples
  3. Build VIVO without this PR and start it
  4. Observe the slow refresh of the person page
  5. Apply the PR, rebuild VIVO and start it
  6. Observe the improvement in the refresh of the person page
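
Steps 4 and 6 can be made less subjective by timing the page request before and after applying the PR. The following is a minimal sketch using the JDK's `java.net.http` client (Java 11 or later); the class name `PageTimingSketch` and the default URL are assumptions for illustration, so pass the real address of the page under test as the first argument.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Arrays;

// Hypothetical helper: report the median wall-clock time of repeated page loads.
class PageTimingSketch {

    // Runs the action `runs` times and returns the median duration in milliseconds.
    static long medianMillis(Runnable action, int runs) {
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            action.run();
            samples[i] = (System.nanoTime() - start) / 1_000_000;
        }
        Arrays.sort(samples);
        return samples[runs / 2];
    }

    public static void main(String[] args) {
        // Assumed placeholder URL; replace with the page being tested.
        String url = args.length > 0 ? args[0] : "http://localhost:8080/vivo/people";
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        long median = medianMillis(() -> {
            try {
                client.send(request, HttpResponse.BodyHandlers.ofString());
            } catch (IOException e) {
                throw new IllegalStateException("Request failed", e);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted", e);
            }
        }, 5);
        System.out.println("Median page load (ms): " + median);
    }
}
```

Running it once against the build from step 3 and once against the build from step 5 gives two medians to compare. Note that this only times the initial HTML response; any follow-up requests the page makes from the browser are not included.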

Additional Notes:

For more details on how to perform more formal testing, please refer to the outcome definition

Interested parties

@chenejac


Successfully merging this pull request may close these issues.

Improvement of communication with external graphs