From 80061da52382286fe9567fda7f79f087e17e6fa2 Mon Sep 17 00:00:00 2001
From: Jerry Gao <109158931+Sanhaoji2@users.noreply.github.com>
Date: Mon, 6 May 2024 12:03:43 +0800
Subject: [PATCH] Jegao/label host fix with main3 (#549)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

* add codebook passing and pq/opq dim overwrite.
* Support per query filter (#279)
* Transferring Varun's changes from external fork with squash merge
* generating multiple GTs for each filter label + search with multiple filter labels (code cleanup)
* supporting no-filter + one filter label + filter label file (multiple filters) while computing GT
* generating multiple GTs + refactoring code for readability and cleanliness
* adding more tests for filtered search
* updating pr-test to test filtered cases
* lowering recall requirement for disk index
* transferred functions to filter_utils
* adding more tests for build and search without universal label
* adding one_per_point distribution to generate_synthetic_labels + cleaning up artifacts after computing GT + removing minor errors
* refactoring search_disk_index to use a query filter vector
---------
Co-authored-by: patelyash
Co-authored-by: Varun Sivashankar
* Rebasing main's latest commits onto ravi/filter_support_rebased (#225)
  - add code for two variants of filtered index, readme and CI tests
  - add utils for synthetic label generation and CI tests.
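The filter-support commits above mention adding a `one_per_point` distribution to `generate_synthetic_labels`. The actual utility is a C++ tool in the repo; the sketch below is a rough, hypothetical Python illustration of what that distribution means (every point receives exactly one uniformly drawn label), with the function name and signature invented for this example:

```python
import random

def generate_synthetic_labels(num_points: int, num_labels: int,
                              distribution: str = "one_per_point",
                              seed: int = 0) -> list[list[str]]:
    """Assign synthetic filter labels to points.

    With the "one_per_point" distribution, every point gets exactly one
    label, drawn uniformly from the label universe {"1", ..., num_labels}.
    """
    rng = random.Random(seed)
    labels = [str(i) for i in range(1, num_labels + 1)]
    if distribution == "one_per_point":
        return [[rng.choice(labels)] for _ in range(num_points)]
    raise ValueError(f"unknown distribution: {distribution}")
```

Distributions like this make filtered ground-truth generation deterministic per seed, which is what the CI tests above rely on.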
* Add co-authors
Co-authored-by: ravishankar
Co-authored-by: Varun Sivashankar
---------
Co-authored-by: ravishankar
Co-authored-by: David Kaczynski
Co-authored-by: Siddharth Gollapudi
Co-authored-by: Neelam Mahapatro
Co-authored-by: Harsha Vardhan Simhadri
Co-authored-by: Harsha Vardhan Simhadri
Co-authored-by: REDMOND\patelyash
Co-authored-by: Varun Sivashankar
* Clang-format now errors on push and PR if formatting is incorrect (#236)
* Rather than sift through all the *.cpp and *.h files in the root directory, we look only for the sources in our main repository for formatting. Git submodules are excluded.
* Removing the --Werror flag only until we actually format all of the code in a future commit
* We're choosing to base our style on the Microsoft style guide and not make any changes
* Running format action on source code. Settling on Google styling. Settled on '.clang-format' instead of '_clang-format'. Fixed instructions such that only clang-format 12 is installed (13 changes SortIncludes options from true/false to a trinary set of options, none of which include the word 'false')
* Enabling error on malformatted file
* Revert "Enabling error on malformatted file"
This reverts commit fa33e8284cb9ee815d882e516aaeb7be6800a982.
* Revert "Running format action on source code. Settling on Google styling. Settled on '.clang-format' instead of '_clang-format'. Fixed instructions such that only clang-format 12 is installed (13 changes SortIncludes options from true/false to a trinary set of options, none of which include the word 'false')"
This reverts commit e0281bec8c265ecd3b56d65f61e768238ed8b1c1.
* Trying again; formatting rules based on Google rules, disabling include sorting as that breaks us, and enabling the check on build.
* Somehow this was missed in the mass format. Formatting include/distance.h.
* Manually fixing the formatting because clang-format wouldn't, but WOULD flag it as invalid
* Update SSD_index.md (#258)
Fix typo in SSD index readme
* Add filter-diskann paper link to readme (#275)
* Update README.md (#277)
* update citation (#281)
* Some fixes to pass the internal build pipeline (#282)
Remove warnings affecting internal build pipelines
---------
Co-authored-by: Yiyong Lin
* Add support for multiple frozen points (#283)
* Add support for multiple frozen points
* Add the missing parameters to the constructor.
* Added filtered disk index readme (#276)
* Added filtered disk index readme
* Support per query filter (#279)
* Transferring Varun's changes from external fork with squash merge
* generating multiple GTs for each filter label + search with multiple filter labels (code cleanup)
* supporting no-filter + one filter label + filter label file (multiple filters) while computing GT
* generating multiple GTs + refactoring code for readability and cleanliness
* adding more tests for filtered search
* updating pr-test to test filtered cases
* lowering recall requirement for disk index
* transferred functions to filter_utils
* adding more tests for build and search without universal label
* adding one_per_point distribution to generate_synthetic_labels + cleaning up artifacts after computing GT + removing minor errors
* refactoring search_disk_index to use a query filter vector
---------
Co-authored-by: patelyash
Co-authored-by: Varun Sivashankar
* update merging code
* Using boost program options; under Visual Studio MSVC 14.0 an assertion failed
* some comments and rewriting
* add back LF, which might conflict with MSVC 14.0
* clang formatting change
* clang formatting
* revert back to LF
* unexpected failure on UT; retry
* adding default string to the path
* fix reference issue
* Fixing build errors in remove_extra_typedef (#290)
remove _u, _s typedefs
* converting uint64's to size_t where they represent array offsets
---------
Co-authored-by: harsha vardhan simhadri
* clang format
* bump MAX_PQ_CHUNKS up to 512
* default codebook prefix value passed in for generate_quantized_data
* add check for disabling both -B and -QD being passed in
* remove rule forcing only one of -B and -QD
* clang change
* change clang format
* bring back -B params
* generate_quantized_data: pass in a reference instead of a const string
* update clang and param reference
* updated dockerfile (#299)
* updated dockerfile
* add parallel build flag to dockerfile
* Adds CI jobs to build our docker container (#302)
* Adding a step that at least builds the docker container. I'm not yet sure how I want to actually integrate tests within the container, but at the least we should verify it builds
* docker build needs a path. I honestly thought it defaulted to the CWD
---------
Co-authored-by: Dax Pryce
* Python API and Test Suite (#300)
* The first step in the python-api-enhancements branch. We need to fix a problem with the Parameters class with a double free or segfault on deletion.
* Removing the Parameters class in favor of the IndexRead and IndexWrite parameters classes.
* API changes and python packaging changes for linux. It's almost ready for PR, but definitely ready for push.
* Suppressing the CIBuildWheel step on Windows
* added in-mem static and dynamic index class to python bindings (#301)
* Advancing our version number to 0.5.0
* Some more updates as per Harsha's comments on PR #300.
The diskann_bindings.cpp still needs some more TLC, and the wrapper needs to make use of it; we also want to include some examples, but this is a good place to bring into main and then do further enhancements
---------
Co-authored-by: Harsha Vardhan Simhadri
* reducing number of L values for stitched search (#307)
* reducing number of L values for stitched search in CI
* add a warning in prune_neighbor if a zero-distance neighbor is detected (#320)
* Fix condition on ubuntu version in README (#246)
* Fix SSD index build performance issue (#321)
Fix the performance gap between in-memory and SSD-based graph builds by passing an appropriate number of threads.
---------
Co-authored-by: Yiyong Lin
Co-authored-by: Harsha Vardhan Simhadri
* remove the distance-0 warning in the prune-candidate list, since diskann::cerr does not seem thread safe (#330)
* Set compile warnings as errors for core projects (#331)
* set(CMAKE_COMPILE_WARNING_AS_ERROR ON)
---------
Co-authored-by: Yiyong Lin
* Create a data store abstraction (#305)
Create a virtual data store base class and a derived in-memory store class. The in-memory index now uses the data store class.
---------
Co-authored-by: Gopal Srinivasa
Co-authored-by: ravishankar
Co-authored-by: yashpatel007
* Disabling Python builds (#338)
* Disabling Python builds
Debian stretch no longer seems to have valid apt repos - or at least not ones that we can access - which means our cibuildwheel is failing.
* New python interface, build setup, apps and unit tests (#308)
---------
Co-authored-by: Dax Pryce
* Adding some diagnostics to a PR build in an attempt to see what is going on with our systems prior to running our streaming/incremental tests
* fix cast error and add some status prints to the in-mem-dynamic app
* Adding unit tests for both memory and disk index builder methods
* After the refactor and polish of the API was left half done, I also left half a jillion bugs in the library.
At least I'm confident that build_memory_index and StaticMemoryIndex work in some cases, whereas before they barely were getting off the ground
* Sanity checks of static index (not comprehensive coverage), and tombstone file for test_dynamic_memory_index
* Argument range checks for some of the static memory index values.
* fixes for dynamic index in python interface (#334)
* create separate default number of frozen points for dynamic indices
* consolidate works
* remove superfluous param from dynamic index
* remove superfluous param from dynamic index
* batch insert and args modification to apps
* batch insert and args modification to apps
* typo
* Committing the updated unit tests. At least the initial sanity checks of StaticMemory are done
* Fixing an error in the static memory index ctor
* Formatting python with black
* Have to disable initial load with DynamicMemoryIndex, as there is no way to build a memory index with an associated tags file yet, making it impossible to load an index without tags
* Working on unit tests and need to pull Harsha's changes
* I think I aligned this such that we can execute it via the command line with the right behaviors
* Providing the rest of the parameters build_memory_index requires
* For some reason argparse is allowing a bunch of blank space to come in on arguments, and they need to be stripped. It also needs to be using the right types.
* Recall test now works
* More unit tests for dynamic memory index
* Adding a different range check for alpha, as the values are only really realistic between 1 and 2. Below 1 is an error, and above 2 we'll probably make it a warning going forward
* Storing this while I cut a new branch and walk back some work for a future branch
* Undoing the auto-load of the dynamic index until I can debug why my tag vector files cause an error in diskann
* Updating the documentation for the python bindings. It's a lot closer than it was.
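The alpha range-check commit above encodes a rule worth spelling out: a pruning parameter alpha below 1 is an error, while values above 2 only merit a warning. A minimal sketch of such a check (the function name is mine, not diskannpy's):

```python
import warnings

def check_alpha(alpha: float) -> float:
    """Validate the graph-pruning parameter alpha.

    Values below 1.0 are rejected outright; values above 2.0 are legal
    but rarely useful, so they only trigger a warning.
    """
    if alpha < 1.0:
        raise ValueError(f"alpha must be >= 1.0, got {alpha}")
    if alpha > 2.0:
        warnings.warn(f"alpha={alpha} is unusually large; "
                      "typical values lie in [1.0, 2.0]")
    return alpha
```

Splitting the range into hard error vs. soft warning matches the commit's intent: reject the impossible, nudge the improbable.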
* Fixing a unit test
* add timers to dynamic apps (#337)
* add timers to dynamic apps
* clang format
* np.uintc vs. int for dtype of tags
* fixes to types in dynamic app
* cast tags to np.uintc array
* more timers
* added example code in comments in the app file
* round elapsed
* fix typo
* fix typo
---------
Co-authored-by: Harsha Vardhan Simhadri
Co-authored-by: harsha vardhan simhadri
* Harshasi/timer python app (#341)
* added timer and QPS to static search app
* search-only option for static index
* search-only option for static index
* exposing metric in static function
* Force error on warnings and add casts to test directory (#342)
* Force error on warnings and add casts to test directory
* Use size_t for index of point IDs
* Refactor iterator and conditions for printing labels
---------
Co-authored-by: David Kaczynski
* Enable Windows python bindings (#343)
* Use int64 for counter to fix Windows compilation error
* Fix Windows python bindings by adding an install_lib command to move the Windows build output into the python package
* Update to use Path instead of os
* Change batch_insert num_inserts signature to a signed type for OpenMP compatibility
* Update num_inserts to int32_t per PR request
---------
Co-authored-by: Nick Caurvina
* Use new macro (ENABLE_CUSTOM_LOGGER) to turn on the custom logger (#345)
* custom logger
---------
Co-authored-by: Yiyong Lin
* updating from C++14 to C++17 (#352)
* updating from C++14 to C++17
* adding CMAKE_CXX_STANDARD flag
* CICD Refactor (#354)
* Refactored the build processes. Broke things into components as much as possible. We have standalone actions for the build processes to make sure they are consistent across push or PR builds, a format check that doesn't rely on cmake being there to work, and centralized our randomized data generation into a single action that can be called in each section. We are now reusing as many of the steps as we can without copy/pasting, which should ensure we're not making mistakes.
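Several commits above wrestle with tag dtypes ("np.uintc vs. int", "cast tags to np.uintc array") for batch_insert. A dependency-free sketch of the validation such a cast implies (the helper name is hypothetical, and this is only an analogue of the numpy cast, not diskannpy code):

```python
UINT32_MAX = 2**32 - 1  # np.uintc is a 32-bit unsigned int on common platforms

def as_uint32_tags(tags) -> list[int]:
    """Coerce user-supplied tags to the unsigned 32-bit ints the native
    layer expects, rejecting values that would silently wrap or truncate
    (the failure mode a blind cast to np.uintc can hide)."""
    out = []
    for t in tags:
        v = int(t)
        if not 0 <= v <= UINT32_MAX:
            raise ValueError(f"tag {t!r} does not fit in uint32")
        out.append(v)
    return out
```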
* Fixing the dynamic tests; the paths to the data were wrong
---------
Co-authored-by: yashpatel007
* Fix the disparity between disk and memory search for the universal label (#347)
* UNV search fix for memory
* two places to update
* clang format
* unify the find_common_filters function
* fix comments - only return the size of common filters from the find_common_filters function
* dummy comments
* clang format
* Reduce repetitive calls
* changing the name and return type of the function
* Remove compute_groundtruth from labels.yml (#363)
Co-authored-by: Yiyong Lin
* Handle some corner cases in generate_cache_list_from_sample_queries (#361)
Co-authored-by: Yiyong Lin
* Reduce the size of coord_scratch in SSDQueryScratch to reduce memory usage (#362)
* Remove useless coord_scratch in SSDQueryScratch to reduce memory usage
---------
Co-authored-by: Yiyong Lin
* Upload data and binary files to artifacts in CI workflows (#366)
* Upload data and binary files to an artifact so that we can debug issues locally when the workflows fail
* use a different artifact name for different scenarios
---------
Co-authored-by: Yiyong Lin
* Python Type Enhancements (#364)
* Adding cosine distance - I didn't know we had that as a first-level distance metric
* Making our mkl and iomp linking game more rigorously defined for the ubuntus
* Included latest as a path fragment twice by accident
* libmkl_def.so is named something different when installed via the intel oneapi installer
* Making a number of changes to homogenize our api (same parameters, minimize parameters as much as possible, etc.)
* Stashing this and going to work on the CICD stuff; it's driving me nuts
* Fairly happy with the Python API now. Documentation needs another pass, the @overloads in the .pyi files need to be addressed, and documentation checked again.
The apps folder also needs updating to use fire instead of argparse
* Updated the build not to use tcmalloc for pybind, and fixed the pyproject.toml so that cibuildwheel can actually successfully build our project.
* Making a change to in-mem-static for the new api and also adjusting the comment in in-mem-dynamic a bit, though... I probably shouldn't have
* Add unit test project based on boost_unit_test_framework (#365)
* Add unit test project based on boost_unit_test_framework
* Add another dockerfile for developers
* update path
---------
Co-authored-by: Yiyong Lin
* Fix inefficiency in constructing the reverse label map (#373)
* single loop for reverse label map
* clang formatting
* unnecessary comments removed
* minor
---------
Co-authored-by: Varun Sivashankar
* fixed a bug with loading medoids for sharded filtered index, and adde… (#368)
* fixed a bug with loading medoids for a sharded filtered index, and added better caching for the filtered index
clang-format
fixed minor cout error
addressed Yiyong's comments, and fixed a bug in finding the medoid in a sharded+filtered index
Fixed Windows compile errors (warnings)
Fix inefficiency in constructing the reverse label map (#373)
* single loop for reverse label map
* clang formatting
* unnecessary comments removed
* minor
---------
Co-authored-by: Varun Sivashankar
clang-formatted
* minor cleanup
* clang-format
---------
Co-authored-by: ravishankar
* patelyash/index factory (#340)
* gi# This is a combination of 2 commits.
remove _u, _s typedefs
* added some seed files
* New distance metric hierarchy
* Refactoring changes
* Fixing compile errors in refactored code
* Fixing compile errors
* DiskANN builds with initial refactoring changes
* Saving changes for Ravi
* More refactoring
* Refactor
* Fixed most of the bugs related to _data
* Post-merge with main
* Refactored version which compiles on Windows
* now compiles on linux
* minor clean-up
* minor bug fix
* minor bug
* clang format fix + build error fix
* clang format fix
* minor changes
* added back the fast_l2 feature
* added back set_start_points in index.cpp
* Version for review
* Incorporating Harsha's comments - 2
* move implementation of abstract data store methods to a cpp file
* clang format
* clang format
* Added slot manager file (empty) and fixed compile errors
* fixed a linux compile error
* clang
* debugging workflow failure
* clang
* more debug
* more debug
* debug for workflow
* remove slot manager
* Removed the #ifdef WINDOWS directive from class definitions
* Refactoring alignment factor into the distance hierarchy
* Fixing cosine distance
* Ensuring we always call preprocess_query
* Fixed distance invocations
* fixed cosine bug, clang-formatted
* cleaned up and added comments
* clang-formatted
* more clang-format
* clang-format 3
* remove deleted code in scratch.cpp
* reverted clang to Microsoft
* small change
* Removed slot_manager from this PR
* newline at EOF in in_mem_graph_store.cpp
* rename distance_metric to distance_fn
* resolving PR comments
* minor bug fix for initialization
* creating index_factory
* using index factory to build the in-memory index
* clang format fix
* minor bug fix
* fixing build error
* replacing mem_store with abstract_mem_store + injecting data_store into Index
* minor fix
* clang format fix
* commenting out data_store injection to prevent double invocation and memory leak (for now)
* fixing the build for filters
* moving abstract index to abstract_index.h
* IndexBuildParamsBuilder to build IndexBuildParams properly with error checking
* fixing build errors
* fixing minor error
* refactoring index search to be simple
* clang format fix
* refactoring search_mem_index to use the index factory
* clang fix
* minor fix
* minor fix for build
* restore fast-l2 optimization
* removing comments
* removing comments
* adding templating to IndexFactory (can't avoid it anymore)
* fixing build error
* fixing ubuntu build error
* ubuntu build exception fix
* passing num_pq_bytes
* giving one more shot to a config-driven arch with boost::any (type erasure)
* clang fix
* modifying search to use boost::any
* fixing ubuntu build errors/warnings
* created IndexConfigBuilder and fixed a typo
* fixing error in pq build
* some comments + lazy_delete impl
* bumping to C++17 and replacing boost::any with std::any
* clang fix
* C++17 for ubuntu
* minor fix
* converting search to batch_search + a vector wrapper using std::any to store a vector as a shared_ptr
* adding AnyVector to encapsulate a vector in std::any + adding a basic yaml parser (WIP)
* adding wrapper code for vector and set, checked with Andrija
* fixing ubuntu build error
* trying to resolve ubuntu build error
* testing test_streaming_index with IndexFactory
* fixing ubuntu build error
* fixing search for test_insert_delete_consolidate
* refactored test_streaming_scenario
* refactored test_insert_delete_consolidate to use AbstractIndex and IndexFactory
* fixing ubuntu build error
* making the build method in abstract index consistent
* some code cleanup + abstract cpp to add implementation
* removing comments and code cleanup
* build error fix
* fixing -Wreorder warning
* separating build structs into their own header + refactor search and remove batch search
* fixing ubuntu build errors
* resolving segfault from search_mem_index
* fixing query_result_tag allocation
* minor update
* search fix
* trying to fix windows-latest build for dynamic index
* adding temp logging to debug windows-latest build issue
* removing logging for debug
* fixing windows-latest build error for dynamic index search
* moving any wrappers to a separate file + organizing code
* fixing check error
* updating private var naming convention
* minor update
* unraveling search methods in abstract index. Iteration 1
* minor fix
* remove unused vars
* returning a unique_ptr to AbstractIndex from the index factory
* adding implementation from abstract_index.h to abstract_index.cpp
* making the abstract index api more explicit (experiment)
* some code cleanup
* removing detected memory leaks (free up index)
* separating enums for data and graph strategy
* Index ctor(config) now uses the injected datastore from IndexFactory
* populate distance in index in the new config ctor
* resolving some comments from Andrija
* Resolving some restructuring comments by Andrija
* minor fix
* fixing ubuntu build error
* warning fix
* simplified get() in anywrappers
* making index config a unique_ptr owned by IndexFactory
* removing complex if/else calling recursively + added unimplemented TagT to AbsIdx
* renaming get_instance to create_instance
* clang format fix
* removing const_cast from any_wrapper
* fixing Andrija's comments
* removing warnings
---------
Co-authored-by: harsha vardhan simhadri
Co-authored-by: Gopal Srinivasa
Co-authored-by: ravishankar
Co-authored-by: Harsha Vardhan Simhadri
* patelyash/index factory (#340) (#380)
---------
Co-authored-by: Yash Patel <47032340+yashpatel007@users.noreply.github.com>
Co-authored-by: harsha vardhan simhadri
Co-authored-by: Gopal Srinivasa
Co-authored-by: ravishankar
Co-authored-by: Harsha Vardhan Simhadri
* hot fix for python build (#383)
* some bug fixes when EXEC_EnV_OLS is enabled (#377)
* some bug fix when enabling EXEC_EnV_OLS
* avoid unit test failure
* unit test testing
* changed based on Gopal's suggestion
* update load_impl(AlignedFileReader &reader)
* change load_impl to be identical to objectstore
* remove blank
* Output distance file in memory index search (#382)
* Output distance file
* fix
---------
Co-authored-by: Shengjie Qian
* Add WIN macro for non-win function (#360)
* Add WIN macro for non-win function
* fix vc16 compile issue
* fix compile issue
* fix compile issue
* fix compile issue
* clean up code
* small EXEC_ENV_OLS bug fix (#387)
* small bug fix
* test ubuntu failure
* formatting
* re-triggering unit tests
* Python Refactor (#385)
* Refactor of diskannpy module code.
* 0.5.0.rc1 for python and enabling the build-python portion of the pr-test process.
* clang-format changes
* In theory this should speed up the python build drastically by only building the wheel for the python version and OS we're attempting to fan out to in our CICD job tree
* Missed a dollar sign
* Copy/pasting left a CICD step name that implied we were running a code formatting check when instead we were building a wheel. This is now fixed.
* In theory, readying the release action too. We won't know if it works until it merges and we cut a release, but at least the paths have been fixed
* Designated initializers just happened to work on linux but shouldn't have, as they weren't added until C++20
* Formatting
* Jinweizhang/filter paramsfix (#388)
* small bug fix
* test ubuntu failure
* formatting
* re-triggering unit tests
* caused error; remove two character params
* caused error; remove two character params
* unit test fix
* clean up code
* add more accurate error handling
* fix filter build
* re-trigger test
* try lower recall number
* test with more values
* revert back to test unit test
* Update python-release.yml
GitHub Actions fix: the composite action `python-wheel` publishes wheels to the `wheels` artifact. The `python-release` workflow then looks for them in the `dist` artifact, which does not exist. This is a CICD change only.
* Fixed inputs typo (#391)
* Fixed inputs typo
* Action 'checkout@v2' is deprecated
* Update pyproject.toml
Trying a new release of the python lib to see if there was a packaging error in the publication of rc1.
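The long index-factory refactor above converges on a recognizable pattern: a config-driven factory (`create_instance`) returning a `unique_ptr` to an `AbstractIndex`. A toy Python analogue of that shape, with all names standing in for the C++ ones and a brute-force search as a placeholder:

```python
from abc import ABC, abstractmethod

class AbstractIndex(ABC):
    @abstractmethod
    def build(self, data): ...
    @abstractmethod
    def search(self, query, k: int): ...

class InMemoryIndex(AbstractIndex):
    def __init__(self, config: dict):
        self._config = config  # factory hands ownership of config to the index
        self._data: list = []
    def build(self, data):
        self._data = list(data)
    def search(self, query, k: int):
        # brute-force placeholder for graph search, 1-D points only
        return sorted(self._data, key=lambda v: abs(v - query))[:k]

class IndexFactory:
    _registry = {"memory": InMemoryIndex}
    @classmethod
    def create_instance(cls, config: dict) -> AbstractIndex:
        """Pick a concrete index from config, mirroring the C++ factory's
        unique_ptr<AbstractIndex> handoff."""
        try:
            index_cls = cls._registry[config["index_type"]]
        except KeyError as e:
            raise ValueError(f"unknown index_type: {config.get('index_type')}") from e
        return index_cls(config)
```

The registry/lookup step is what replaces the "complex if/else calling recursively" one of the commits mentions removing.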
* Fixed param documentation (#393)
* Fixed param name in comments
* Hide rust/target
* Bypass errors in logging for non-msft-prod environments (#392)
* Removed the logger and verified that the logging capability is the root cause of our consistent segfault errors in python. Perhaps it also will fix any issues in our label test too? I'd like to push it to GH and see.
* Formatting fixes
* Revert "Formatting fixes"
This reverts commit 9042595614c0f3b5e72f61090538abdb6510af14.
* Revert "Removed the logger and verified that the logging capability is the root cause of our consistent segfault errors in python. Perhaps it also will fix any issues in our label test too? I'd like to push it to GH and see."
This reverts commit 7561009932ff109ed386c4f5d50983859e49b9e7.
* The custom logging implementation is causing segfaults in python. We're not sure exactly where, but this is the easiest and quickest way to get a working python release.
* All the integration tests are failing, and there's a chance the virtual dtor on AbstractDataStore might be the culprit, though I am not sure why. I'm hoping it is, so it won't fall on the logging changes.
* Formatting. Again.
* Improve help formatting in CLI tools (#390)
* Added utilities to standardize help across CLI tools. #370
* Made three option groupings (required/optional/print)
* Moved common parameter descriptions to a common file. #370
* Updated usage statement for search_disk_app. #370
* Updated range_search_disk_index to use the new required/optional format. #370
* Updated test apps to use the new help format. #370
* Fixed format issue. #370
* Updated help format for the 'build' apps. #370
* Fixed code formatting. #370
* Added src/*.hpp to the clang format. #370
* Moved header into the headers directory. #370
* Added missing configs. #370
* Removed superfluous paths from include. #370
* Added #pragma once. #370
* Typo fixes. #370
* Fixed capitalization of constant. #370
* Make fail_if_recall description more accurate. #370
* Changed to using set notation. #370
* Better explanations for some options. #370
* Added short explanation of the file format. #370
---------
Co-authored-by: Jon McLean
Co-authored-by: Jonathan McLean
* Python build with a far more portable wheel (#396)
* Identified the appropriate build flags to get a working python build that doesn't rely on -march=native or -mtune=native. We've run benchmarks on multiple computers that indicate the only important flag other than -mavx2 -msse2 -mfma is -funroll-loops. Optimization levels such as -O1, -O2, or -O3 actually make for less performant code. -Ofast is unavailable for use in Python, as it causes problems with floating-point math in Python
* 1.22 was left in a comment despite 1.25 being the value specified
* Python 3.8 is not supported by numpy 1.25, so we're removing it.
* Jomclean/write timings (#397)
* Work-in-progress commit adding JSON output for timings. in-mem-static is complete
* Added timings to dynamic and total time to static
* Update pyproject.toml (#398)
Using the correct README for our publication to pypi.
* Added filename to log (#399)
* Jinwei/fix in memory compile error (#401)
* small bug fix
* test ubuntu failure
* formatting
* re-triggering unit tests
* add small fix for in_mem_data_store when EXEC_ENV_OLS is enabled
* fix: use the passed-in io_limit (#403)
* fix: use the passed-in io_limit
* fix to be clang-formatted
* DynamicMemoryIndex bug fixes (#404)
* While simply creating a unit test to repro issue #400, I found a number of bugs that I needed to address just to get it to work the way I had intended. This does not yet have what I would consider a comprehensive suite of test coverage for the DynamicMemoryIndex, but we at least do save it with the metadata file, we can load it correctly, and saving *always* runs consolidate_deletes() prior to save if any item has been marked for deletion prior to save.
* We actually cannot save without compacting before save anyway.
Removing the parameter from save() and hardcoding it to True until we can actually support it.
* Addressing some PR comments and readying a 0.5.0.rc5 release
* Pass nullptr as nullT when creating thread_data that's of ConcurrentQueue type; otherwise the default null_T is uninitialized and could point to arbitrary memory (#408)
* Preparing for the 0.6.0 diskannpy release (#407)
* Some early staging for README updates and pyproject updates for a 0.6.0 release of diskannpy.
* Trying to fix the CI badge to point toward main's latest build
* Updating documentation for pdoc generation
* Documentation updates. Tightened up the API to drop list support (there were entirely too many cases where it wouldn't work, and it's easier to just tell people to convert it themselves)
* Some module reorganization to make pdoc actually display the docstrings for variables re-exported at the top level
* A copy-paste happened that shouldn't have.
* Updating the apps to use the new 0.6.0 api
* Addressing PR feedback
* Some of the documentation changes didn't get made in both from_file and the constructor
* Added PDoc workflow to publish GitHub Pages documentation (#412)
* Added PDoc workflow
* Added documentation to the push-test workflow
* Added diskannpy to the env for pdoc to use
* Initial commit of doc publish workflow
* Tried heredoc to get python version
* Tried another way of getting the version
* Tried another way of getting the version
* Moved to docs/python path
* Removing the test harness
* Add dependencies per wheel
* Moved dependency tree to the 'push' file so it runs on push
* Added label name to the dependency file
* Trying matrix.os to get the os and version
* Moved doc generation from push-test to python-release. Will add 'dev' doc generation to push-test
* Publish latest/version docs only on release. Publish docs for every dev build on main.
* Install the local-file version of the library * Disable branch check so I can test the install * Use python build to build a wheel for use in documentation * Tried changing to python instead of python3 * Added checkout depth in order to get boost * Use the python build action to create wheel for documentation * Revert "Use the python build action to create wheel for documentation" This reverts commit d900c1d42c0f4bc8295955e0d6da7a868a073661. * Added linux environment setup * Made only publish dev when on main and added comments --------- Co-authored-by: Jonathan McLean * Update README.md (#416) * moved ssd index defaults to defaults.h (#415) * moved ssd index constants to defaults.h * Add Performance Tests (#421) * Have a working dockerfile to run perf tests and report the times they take. We can also capture stdout/stderr with it for further information, especially for tools that report internal latencies. * Slight changes to the perf test script, a perf.yml for the github action * allow multi-sector layout for large vectors (#417) * make sector node an inline function * convert offset_node macro to inline method * rename member vars to start with underscore in pq_flash_index.h * added support in create_disk_index * add read sector util * load_cache_list now uses read_blocks util * allow nullptr for read_nodes * BFS cache generation uses util * add num_sectors info to cache_beam_Search * add CI test for 1020,1024,1536D float and 4096D int8 rand vector on disk * Consolidate Index Constructors (#418) * initial commit * updating python bindings to use new ctor * python binding error fix * error fix * reverting some changes -> experiment * removing redundnt code from native index * python build error fix * tyring to resolve python build error * attempt at python build fix * adding IndexSearchParams * setting search threads to non zero * minor check removed * eperiment 3-> making distance fully owned by data_store * exp 3 clang fix * exp 4 * making distance as 
unique_ptr * trying to fix build * finally fixing problem * some minor fix * adding dll export to index_factory static function * adding dll export for static fn in index_factory * code cleanup * resolving gopal's comments * resolving build failures * Add convenience functions for parsing the PQ index (#349) * move read_nodes to public, add get_pq_vector and get_num_points * clang-format * Match new private var naming convention * more private (_) fixes * VID->vid * VID->vid cpp * fix OLS build (#428) * fix OLS build * Add a build to CI with feature flags enabled * In Memory Graph Store (#395) * inmem_graph_store initial impl * barebones of in mem graph store * refactoring index to use index factory * clang format fix * making enum to enum class (c++ 11 style) for scope resolution with same enum values * cleaning up API for GraphSore * moving _nd back to index class * resolving PR comments * error fix * error fix for dynamic * resolving PR comments * removing _num_frozen_point from graph store * minor fix * moving _start back to main + minor update in graph store api to support that * adding requested changes from Gopal * removing reservations * resolving namespace resolution for defaults after build failure * minor update * minor update * speeding up location update logic while repositioning * updated with reserving mem for graph neighbours upfront * build error fix * minor update in assert * initial commit * updating python bindings to use new ctor * python binding error fix * error fix * reverting some changes -> experiment * removing redundnt code from native index * python build error fix * tyring to resolve python build error * attempt at python build fix * adding IndexSearchParams * setting search threads to non zero * minor check removed * eperiment 3-> making distance fully owned by data_store * exp 3 clang fix * exp 4 * making distance as unique_ptr * trying to fix build * finally fixing problem * some minor fix * adding dll export to index_factory static 
function
* adding dll export for static fn in index_factory
* code cleanup
* resolving errors after merge
* resolving build errors
* fixing build error for stitched index
* resolving build errors
* removing max_observed_degree set()
* removing comments + typo fix
* replacing add_neighbour with set_neighbours where we can
* error fix
* Undo mistake, let frontier read in PQ flash index be asynchronous (#434)
* Undo mistake, let frontier read in PQ flash index be asynchronous
* address changes requested
* Reduce CI tests for multi-sector disk layout from 10K to 5K points so… (#439)
* Reduce CI tests for multi-sector disk layout from 10K to 5K points so they run faster
* turn off 1024D
* hot fix definite mem leaks (#440)
* add num_threads to IndexWriteParams in sharded build (#438)
* Added clarity to the universal label (#442)
* Remove IndexWriteParams from build method. (#441)
* removing write_params from build and taking it upfront in Index ctor
* renaming build_params to filter params
* Type hints and returns actually align this time. (#444)
* working draft PR for cleaning up disk based filter search (#414)
* made changes to clean up filter number conversion, and fixed bug with universal filter search
* minor typecast fix
---------
Co-authored-by: rakri
* Fixes #432, bug in using openmp with gcc and omp_get_num_threads() (#445)
* Fixes #432: with gcc, omp_get_num_threads() only reports the number of threads collaborating on the current code region, not the number available overall. I made this error when I transitioned us from omp_get_num_procs() about 5 or 6 months ago, and only with bug #432 did I really get to see how problematic my naive expectations were.
* Removed cosine distance metric from disk index until we can properly fix it in pqflashindex. Documented which distance metrics can be used with which vector dtypes in tables in the documentation.
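The cosine and distance-dtype notes above hinge on a detail worth spelling out: cosine distance only reduces to a plain dot product when the vectors are unit-normalized; on unnormalized data the norms must be divided out explicitly. A minimal illustrative sketch (not DiskANN's actual implementation):

```python
import math

def cosine_distance(a, b):
    # 1 - cos(theta); correct for any nonzero vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

def dot_product_shortcut(a, b):
    # Only equals cosine distance when both vectors have norm 1
    return 1.0 - sum(x * y for x, y in zip(a, b))

a, b = [3.0, 4.0], [4.0, 3.0]            # unnormalized, both norm 5
a_unit = [x / 5.0 for x in a]
b_unit = [x / 5.0 for x in b]

# On unit vectors the two formulas agree...
assert abs(cosine_distance(a_unit, b_unit) - dot_product_shortcut(a_unit, b_unit)) < 1e-9
# ...but on unnormalized vectors the shortcut is badly wrong
assert abs(cosine_distance(a, b) - dot_product_shortcut(a, b)) > 1.0
```

This is why an index that silently takes the dot-product shortcut misbehaves on unnormalized input, and why the supported metric/dtype combinations were documented explicitly.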
* Preparing for 0.6.1 release (#447)
* Release documentation from the release tag instead of main (#448)
* Build streaming index of labeled data (#376)
* Add bool param for building a graph of labeled data
* Add arguments for building labeled index
* Pass arguments for labeled index
* Light renaming
* Handle labels in insert_point
* Fix missing semicolon
* Add initial label handling logic
* Use unlabeled algo for uniquely labeled point
* Ignore frozen points when checking labels
* Fix missing newline
* Move label-specific logic to threadsafe zone
* Check for frozen points when asserting num points and num labeled points
* Fix file name concatenation for label metadata
* inmem_graph_store initial impl
* Use Lbuild to append to pruned_list during filter build
* Add label counts for deleting from streaming index
* Fix typo
* Fix conditions for testing
* Add medoid search to support deleting label medoids from graph
* resolving error with bfs_medoid_search()
* trying to create 2 pruned_lists and combine them
* Clear pool between calls to search_for_point_and_prune.
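The label bookkeeping in the bullets above boils down to two maps kept alongside the graph: a per-point label list and a per-label point count, which together determine when a label's medoid must be replaced on delete. A hypothetical sketch of that bookkeeping (names like `location_to_labels` are illustrative, not DiskANN's actual members):

```python
from collections import defaultdict

location_to_labels = {}            # point id -> list of labels on that point
label_counts = defaultdict(int)    # label -> number of points carrying it

def insert_point(loc, labels):
    location_to_labels[loc] = list(labels)
    for lbl in labels:
        label_counts[lbl] += 1

def delete_point(loc):
    # Returns the labels whose last member just left the index;
    # a real index would also need to pick a new medoid for any label
    # whose current medoid is `loc`.
    emptied = []
    for lbl in location_to_labels.pop(loc):
        label_counts[lbl] -= 1
        if label_counts[lbl] == 0:
            emptied.append(lbl)
    return emptied

insert_point(0, ["red", "blue"])
insert_point(1, ["red"])
assert delete_point(0) == ["blue"]   # "red" survives on point 1
assert delete_point(1) == ["red"]
```

The frozen-point bullets reflect the same accounting: frozen points act as per-label medoids, so repositioning them during compaction must update these maps as well.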
Fix integer math
* Update pruned_list algo for link method
* making fz_points be medoids for labels encountered
* repositioning medoids as well because they are fz points when compacting data
* removing unneeded method
* rebasing from main
* adding tests in yml workflow for dynamic index with labels
* quick fix
* removing combining of unfiltered + filtered list for now
* trying to resolve disk search poor performance
* increasing L size while searching disk index
* minor rollback
* updating dynamic-label to not use tag file while computing GT
* altering some test search L values
* adding unfiltered search for filtered batch build index
* adding compute gt for zipf dist labels in labels workflow
* searching filtered streaming index with popular label for now
* reposition fz points as medoids for filtered dynamic build
* minor renaming vars
* separate function for insert point with labels and without labels
* clang error fix
* barebones of in mem graph store
* refactoring index to use index factory
* clang format fix
* Windows build fix
* making enum to enum class (C++11 style) for scope resolution with same enum values
* cleaning up API for GraphStore
* resolving comments
* clang error fix
* adding some comments
* moving _nd back to index class
* removing function reposition_medoids; it's not required, incorporated into reposition_points
* altering -L (32->5) and -R (16->32) while building filtered disk index to work well with modified connections in algo
* updating docs -> dynamic_index.md to have info on how to build and search filtered dynamic index
* updating docs
* updating _pts_to_labels when repositioning fz_points
* error fix
* clang fix
* making sure _pts_to_labels are not empty
* fixing dynamic-label build error
* code improvements
* adding logic for test_ins_del_consolidate to support filtered index
* resolving PR comments
* error fix
* error fix for dynamic
* now test insert delete consolidate supports building filters
* lowering recall in case of
test insert delete consolidate
* resolving PR comments
* removing _num_frozen_point from graph store
* minor fix
* moving _start back to main + minor update in graph store api to support that
* adding a lock before detect_common_filter + minor naming improvement
* adding requested changes from Gopal
* removing reservations
* resolving namespace resolution for defaults after build failure
* minor update
* minor update
* speeding up location update logic while repositioning
* updated with reserving mem for graph neighbours upfront
* build error fix
* minor update in assert
* initial commit
* updating python bindings to use new ctor
* python binding error fix
* error fix
* reverting some changes -> experiment
* removing redundant code from native index
* python build error fix
* trying to resolve python build error
* attempt at python build fix
* adding IndexSearchParams
* setting search threads to non-zero
* minor check removed
* experiment 3 -> making distance fully owned by data_store
* exp 3 clang fix
* exp 4
* making distance as unique_ptr
* trying to fix build
* finally fixing problem
* some minor fix
* adding dll export to index_factory static function
* adding dll export for static fn in index_factory
* code cleanup
* resolving errors after merge
* resolving build errors
* fixing build error for stitched index
* resolving build errors
* removing max_observed_degree set()
* removing comments + typo fix
* replacing add_neighbour with set_neighbours where we can
* error fix
* minor fix
* fixing error introduced while rebasing
* fixing error for dynamic filtered index
* resolving dynamic build deadlock error
* resolving error with test_insert_del_consolidate for dynamic filter build
* minor code cleanup
* refactoring fz_pts and filter_index to be property of IndexConfig and hence Index
* removing write_params from build()
* removing write_params from build and taking it upfront in Index ctor
* minor fix
* renaming build_params to filter params
* fixing errors on auto
merge
* auto decide universal_label experiment
* resolving bug with universal label
* resolving dynamic labels error, if there are unused fz points
* exposing set_universal_label() through abstract index
* minor update: sanity check
* minor update to search
* including tag file while computing GT
* generating compacted label file and using it in generate GT
* minor fix
* resolving new PR comments (minor typo fixes)
* renaming _pts_to_labels to _tag_to_labels + adding a warning for consolidate deletes and quality of index
* minor name change + code cleanup
* clang format fix
* adding locks for filter data_structures
* avoiding deadlock
* universal label definition update
* reverting locks on _location_to_labels as it's causing problems with large dataset
* adding locks for _label_to_medoid_id
* Update dynamic_index.md
* Update dynamic-labels.yml
* renaming some variables
---------
Co-authored-by: David Kaczynski
Co-authored-by: yashpatel007
Co-authored-by: Yash Patel <47032340+yashpatel007@users.noreply.github.com>
Co-authored-by: Harsha Vardhan Simhadri
* Fix typo in SSD_index.md (#466)
* add check for .enc extension to support encryption (#467)
* add check for .enc extension to support encryption
* check rotation_matrix file in file blobs
* read from MemoryMappedFile when EXEC_ENV_OLS is defined (#471)
* read from MemoryMappedFile when EXEC_ENV_OLS is defined
* fix is_open/close which stringstream does not have
* fix formatting to comply with clang
* fix labels.yml: create tmp directory before search_disk_index is run
* fix to reset stream after reads
* rename 'content' variable to avoid duplicates (#475)
* read file in one pass (#460)
* read whole label file into memory, use string find instead of stringstream
* format doc
* Bump rustix from 0.37.20 to 0.37.25 in /rust (#479)
Bumps [rustix](https://github.com/bytecodealliance/rustix) from 0.37.20 to 0.37.25.
- [Release notes](https://github.com/bytecodealliance/rustix/releases)
- [Commits](https://github.com/bytecodealliance/rustix/compare/v0.37.20...v0.37.25)
---
updated-dependencies:
- dependency-name: rustix
  dependency-type: indirect
...
Signed-off-by: dependabot[bot]
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* correct index_path_prefix in __init__ function of static disk index (#483)
* Adding Filtered Index support to Python bindings (#482)
* Halfway approach to the new indexfactory, but it doesn't have the same featureset as the old way. Committing this for posterity but reverting my changes ultimately
* Revert "Halfway approach to the new indexfactory, but it doesn't have the same featureset as the old way. Committing this for posterity but reverting my changes ultimately"
This reverts commit 03dccb599449881f64664a10b397a790a7d00985.
* Adding filtered search. API is going to change still.
* Further enhancements to the new filter capability in the static memory index.
* Ran automatic formatting
* Fixing my logic and ensuring the unit tests pass.
* Setting this up as an rc build first
* list[list[Hashable]] -> list[list[str]]
* Adding halfway to a solution where we query for more items than exist in the filter set. We need to replicate this behavior across all indices though - dynamic, static disk and memory w/o filters, etc.
* Removing the import of Hashable too
* Fixing index_prefix_path bug in python for StaticMemoryIndex (#491)
* Fixing the same bug I had in static disk index inside of static memory index as well.
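The over-query edge case noted in the #482 bullets above — asking for more neighbors than the filter set contains — can be sketched in plain Python. The clamping policy shown (simply returning fewer results) is a hypothetical illustration, not necessarily the behavior diskannpy settled on:

```python
def filtered_knn(points, labels, query_label, k, distance_to_query):
    # Restrict candidates to points carrying the query label,
    # then return up to k of them, nearest first; when the filter
    # set has fewer than k members, the result is simply shorter.
    candidates = [i for i in range(len(points)) if query_label in labels[i]]
    candidates.sort(key=lambda i: distance_to_query(points[i]))
    return candidates[:k]

points = [1.0, 2.0, 3.0, 4.0]
labels = [{"a"}, {"a"}, {"b"}, {"a", "b"}]
dist = lambda p: abs(p - 2.5)

assert filtered_knn(points, labels, "b", 10, dist) == [2, 3]   # only 2 matches
assert len(filtered_knn(points, labels, "a", 2, dist)) == 2
```

Whatever policy is chosen (shorter results, padding, or an error), the bullet's point stands: it has to be consistent across the dynamic, static disk, and static memory indices.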
* Unit tests and a better understanding of why the unit tests were successful despite this bug
* Handle io_setup error properly (#465)
* Address race condition in `iterate_to_fixed_point` (#478)
Co-authored-by: Siddharth Gollapudi
* Use TCMalloc to fix system memory leak (#494)
* add fix for memory leak
* cmake change to enable tcmalloc
* add hot fix for cmake for boost and tcmalloc
* fix indentation
* indentation
* change cmake set on after cmake_minimum_required
* unset tcmalloc for PYBIND
* unset environment variable beforehand
* set off
* exclude the compile def for pybind
* disable for pybind
* Adding a new PQ Distance Metric and PQ Data Store (#384)
* Added PQ distance hierarchy
Changes to CMakeLists
PQDataStore version that builds correctly
Clang-format
* Fixing compile issues after rebase to main
* minor renaming functions
* fixed small bug post rebasing with index factory
* Changes to index factory to support PQDataStore
* Merged graph_store and pq_data_store
* Implementing preprocessing for inmemdatastore
* Incorporating code review comments
* minor bugfix for PQ data allocation
* clang-formatted
* Incorporating CR comments
* Fixing compile error
* minor bug fix + clang-format
* Update pq.h
* Fixing warnings about struct/class incompatibility
---------
Co-authored-by: Gopal Srinivasa
Co-authored-by: ravishankar
Co-authored-by: gopalrs <33950290+gopalrs@users.noreply.github.com>
* Bump zerocopy from 0.6.1 to 0.6.6 in /rust (#499)
Bumps [zerocopy](https://github.com/google/zerocopy) from 0.6.1 to 0.6.6.
- [Release notes](https://github.com/google/zerocopy/releases)
- [Changelog](https://github.com/google/zerocopy/blob/main/CHANGELOG.md)
- [Commits](https://github.com/google/zerocopy/commits)
---
updated-dependencies:
- dependency-name: zerocopy
  dependency-type: indirect
...
Signed-off-by: dependabot[bot]
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Fix calculation of current_point_offset in test_insert_consolidate_deletes (#501)
The program builds the streaming index after two optional steps: 1) skipping S points from the input file and 2) batch building an initial index using B points from the input file. After these two steps, the offset into the input file should be S + B, but the current code first sets it to S in line 163 and then overwrites it with B in line 249, instead of adding B to the offset. The tool that `test_insert_deletes_consolidate` was based on used `+=` in the modified line.
* add 16-byte tag type (#506)
* add 16 bytes tag type
* clean up code
* format doc
* fix compile issue
* fix compile issue
* revert change
* format doc
* separate static search and streaming search
* clean up code
* resolve comment
* format doc
* fix test
* resolve comment
* Rakri/cosine bug fix (#450)
* compiles, but need to verify
* fixed windows compiler warning
* minor typo
* added cosine unit test with unnormalized data
* minor typo in user prompt cosine/l2
* cosine was already supported in groundtruth, edited the message to say so
* clang-format
---------
Co-authored-by: rakri
* Version bump 0.7.0rc2->0.7.0 (#510)
* Version bump 0.7.0rc2->0.7.0
Preparing diskannpy for 0.7.0 release (filter support, static memory indices only)
* Update pyproject.toml
the GPG key from (presumably) 2019 is no longer valid
* Update pyproject.toml
* Update python-release.yml
By default, GITHUB_TOKEN no longer has write permissions - you have to explicitly ask for it in the specific job that needs it. We use write permissions to update the GitHub release action that updates the published build artifacts with the results of the release flow.
* Allow documentation to be published to our gh-pages branch (#511)
* Update push-test.yml (#512)
* Bug fix for dlvs (#509)
* Fix small bugs for DLVS path.
* Easier for user to use.
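The #501 fix described above is pure arithmetic: after skipping S points and batch-building with B points, the file offset must accumulate both, so the assignment in the second step should have been `+=`. A minimal sketch of the difference (variable names are illustrative):

```python
def compute_offset(points_to_skip, batch_build_points, use_buggy_assignment=False):
    """Offset into the input file after the two optional startup steps."""
    current_point_offset = points_to_skip          # step 1: skip S points
    if use_buggy_assignment:
        current_point_offset = batch_build_points  # bug: '=' discards S
    else:
        current_point_offset += batch_build_points # fix: offset becomes S + B
    return current_point_offset

S, B = 1000, 5000
assert compute_offset(S, B) == S + B                          # 6000, as intended
assert compute_offset(S, B, use_buggy_assignment=True) == B   # 5000, the bug
```

The bug is invisible whenever S is zero, which is why it survived until someone combined both optional steps.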
---------
Co-authored-by: REDMOND\ninchen
* add wait() method to AlignedFileReader (#518)
* Add simplified functions for product quantization (#514)
* Add simplified functions for product quantization
* Fixing formatting errors
* Fixing clang-format issue
* Fixing another set of clang-format issues
---------
Co-authored-by: Michael Popov (from Dev Box)
* Create in memory data store/graph store with at least max_points as 1 (#523)
* create in memory data store/graph store with at least max_points as 1
* fix code formatting
* replace callback driven wait with new Wait() method (#526)
* wait on completeCount if callback is used (#532)
* Fix PQScratch memory leak (#522)
* fix memory leak
* FIXED clang-format error
* FIXED SSDQueryScratch Destroy OOM
* fix compile issue
* add interface
* add interface
* change interface
* move function to public
* remove hard-coded universal label num
* fix convert issue
* fix some issues
* Bump openssl from 0.10.55 to 0.10.60 in /rust (#496)
Bumps [openssl](https://github.com/sfackler/rust-openssl) from 0.10.55 to 0.10.60.
- [Release notes](https://github.com/sfackler/rust-openssl/releases)
- [Commits](https://github.com/sfackler/rust-openssl/compare/openssl-v0.10.55...openssl-v0.10.60)
---
updated-dependencies:
- dependency-name: openssl
  dependency-type: indirect
...
Signed-off-by: dependabot[bot]
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* fix issues
* fix issues
* tune perf
* test remove lock
* try shared lock
* change to shared lock
* try prefetch
* fix some issues
* fix issue
* skip unfiltered search when Lindex = 1
* reserve queue size with max search list
* revert change
* revert change
* clean up code
---------
Signed-off-by: dependabot[bot]
Co-authored-by: jinwei14
Co-authored-by: Yash Patel <47032340+yashpatel007@users.noreply.github.com>
Co-authored-by: patelyash
Co-authored-by: Varun Sivashankar
Co-authored-by: David Kaczynski
Co-authored-by: ravishankar
Co-authored-by: David Kaczynski
Co-authored-by: Siddharth Gollapudi
Co-authored-by: Neelam Mahapatro
Co-authored-by: Harsha Vardhan Simhadri
Co-authored-by: Harsha Vardhan Simhadri
Co-authored-by: Dax Pryce
Co-authored-by: Jakub Tarnawski
Co-authored-by: Yiyong Lin
Co-authored-by: Yiyong Lin
Co-authored-by: Andrija Antonijevic
Co-authored-by: Neelam Mahapatro <37527155+NeelamMahapatro@users.noreply.github.com>
Co-authored-by: harsha vardhan simhadri
Co-authored-by: gopalrs <33950290+gopalrs@users.noreply.github.com>
Co-authored-by: Gopal Srinivasa
Co-authored-by: yashpatel007
Co-authored-by: nicaurvi
Co-authored-by: Nick Caurvina
Co-authored-by: Varun Sivashankar <44419819+varunsivashankar@users.noreply.github.com>
Co-authored-by: rakri <78582691+rakri@users.noreply.github.com>
Co-authored-by: varat73 <124637813+varat73@users.noreply.github.com>
Co-authored-by: JieCin <1875919175@qq.com>
Co-authored-by: Shengjie Qian
Co-authored-by: Jon McLean <4429525+jonmclean@users.noreply.github.com>
Co-authored-by: Jon McLean
Co-authored-by: Jonathan McLean
Co-authored-by: litan1 <106347144+ltan1ms@users.noreply.github.com>
Co-authored-by: Philip Adams <35666630+PhilipBAdams@users.noreply.github.com>
Co-authored-by: Shawn Zhong
Co-authored-by: Huisheng Liu
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Xiangyu Wang Co-authored-by: Siddharth Gollapudi Co-authored-by: NingyuanChen Co-authored-by: REDMOND\ninchen Co-authored-by: Michael Popov Co-authored-by: Michael Popov (from Dev Box) Co-authored-by: luyuncheng --- .github/actions/build/action.yml | 13 +- .../generate-high-dim-random/action.yml | 28 + .github/actions/generate-random/action.yml | 5 +- .github/workflows/build-python-pdoc.yml | 80 ++ .github/workflows/disk-pq.yml | 10 + .github/workflows/dynamic-labels.yml | 102 ++ .github/workflows/labels.yml | 27 +- .github/workflows/multi-sector-disk-pq.yml | 60 + .github/workflows/perf.yml | 26 + .github/workflows/pr-test.yml | 6 + .github/workflows/push-test.yml | 18 + .github/workflows/python-release.yml | 7 + AnyBuildLogs/latest.txt | 1 + CMakeLists.txt | 22 +- README.md | 10 +- apps/build_disk_index.cpp | 2 + apps/build_memory_index.cpp | 74 +- apps/build_stitched_index.cpp | 4 +- apps/search_disk_index.cpp | 12 +- apps/search_memory_index.cpp | 24 +- apps/test_insert_deletes_consolidate.cpp | 142 +- apps/test_streaming_scenario.cpp | 192 ++- apps/utils/compute_groundtruth.cpp | 3 +- .../utils/compute_groundtruth_for_filters.cpp | 5 - apps/utils/count_bfs_levels.cpp | 3 +- apps/utils/rand_data_gen.cpp | 52 +- include/abstract_data_store.h | 22 +- include/abstract_graph_store.h | 49 +- include/abstract_index.h | 28 +- include/abstract_scratch.h | 35 + include/aligned_file_reader.h | 5 + include/defaults.h | 10 + include/distance.h | 2 +- include/filter_utils.h | 4 + include/in_mem_data_store.h | 20 +- include/in_mem_graph_store.h | 38 +- include/index.h | 107 +- include/index_build_params.h | 37 +- include/index_config.h | 94 +- include/index_factory.h | 22 +- include/natural_number_map.h | 3 - include/parameters.h | 30 +- include/pq.h | 50 +- include/pq_common.h | 30 + include/pq_data_store.h | 97 ++ include/pq_flash_index.h | 151 ++- include/pq_l2_distance.h | 87 ++ include/pq_scratch.h | 23 + include/program_options_utils.hpp | 4 +- 
include/quantized_distance.h | 56 + include/scratch.h | 35 +- include/tag_uint128.h | 68 + include/types.h | 1 + include/utils.h | 40 +- include/windows_slim_lock.h | 10 + pyproject.toml | 5 +- python/README.md | 2 +- python/include/builder.h | 5 +- python/include/static_disk_index.h | 19 +- python/include/static_memory_index.h | 26 +- python/src/_builder.py | 79 +- python/src/_builder.pyi | 12 +- python/src/_common.py | 22 +- python/src/_dynamic_memory_index.py | 8 +- python/src/_static_disk_index.py | 12 +- python/src/_static_memory_index.py | 55 +- python/src/builder.cpp | 69 +- python/src/diskann_bindings.cpp | 1 - python/src/dynamic_memory_index.cpp | 34 +- python/src/module.cpp | 5 +- python/src/static_disk_index.cpp | 5 +- python/src/static_memory_index.cpp | 34 +- python/tests/test_dynamic_memory_index.py | 33 +- python/tests/test_static_disk_index.py | 58 +- python/tests/test_static_memory_index.py | 164 ++- rust/Cargo.lock | 42 +- scripts/dev/install-dev-deps-ubuntu.bash | 2 +- scripts/perf/Dockerfile | 31 + scripts/perf/README.md | 20 + scripts/perf/perf_test.sh | 40 + src/CMakeLists.txt | 2 +- src/abstract_data_store.cpp | 1 - src/abstract_index.cpp | 168 ++- src/disk_utils.cpp | 186 ++- src/distance.cpp | 8 +- src/dll/CMakeLists.txt | 7 +- src/filter_utils.cpp | 75 +- src/in_mem_data_store.cpp | 45 +- src/in_mem_graph_store.cpp | 225 +++- src/index.cpp | 1148 ++++++++--------- src/index_factory.cpp | 105 +- src/linux_aligned_file_reader.cpp | 12 +- src/natural_number_map.cpp | 2 + src/partition.cpp | 2 +- src/pq.cpp | 140 +- src/pq_data_store.cpp | 260 ++++ src/pq_flash_index.cpp | 887 +++++++------ src/pq_l2_distance.cpp | 284 ++++ src/restapi/search_wrapper.cpp | 4 +- src/scratch.cpp | 71 +- src/utils.cpp | 45 +- .../index_write_parameters_builder_tests.cpp | 7 +- unit_tester.sh | 16 +- workflows/SSD_index.md | 6 +- workflows/dynamic_index.md | 43 +- workflows/filtered_ssd_index.md | 2 +- 106 files changed, 4852 insertions(+), 1768 deletions(-) 
create mode 100644 .github/actions/generate-high-dim-random/action.yml create mode 100644 .github/workflows/build-python-pdoc.yml create mode 100644 .github/workflows/dynamic-labels.yml create mode 100644 .github/workflows/multi-sector-disk-pq.yml create mode 100644 .github/workflows/perf.yml create mode 100644 AnyBuildLogs/latest.txt create mode 100644 include/abstract_scratch.h create mode 100644 include/pq_common.h create mode 100644 include/pq_data_store.h create mode 100644 include/pq_l2_distance.h create mode 100644 include/pq_scratch.h create mode 100644 include/quantized_distance.h create mode 100644 include/tag_uint128.h delete mode 100644 python/src/diskann_bindings.cpp create mode 100644 scripts/perf/Dockerfile create mode 100644 scripts/perf/README.md create mode 100644 scripts/perf/perf_test.sh create mode 100644 src/pq_data_store.cpp create mode 100644 src/pq_l2_distance.cpp diff --git a/.github/actions/build/action.yml b/.github/actions/build/action.yml index 2b470d9dc..219d9d630 100644 --- a/.github/actions/build/action.yml +++ b/.github/actions/build/action.yml @@ -25,4 +25,15 @@ runs: mkdir dist mklink /j .\dist\bin .\x64\Release\ shell: cmd - # ------------ End Windows Build --------------- \ No newline at end of file + # ------------ End Windows Build --------------- + # ------------ Windows Build With EXEC_ENV_OLS and USE_BING_INFRA --------------- + - name: Add VisualStudio command line tools into path + if: runner.os == 'Windows' + uses: ilammy/msvc-dev-cmd@v1 + - name: Run configure and build for Windows with Bing feature flags + if: runner.os == 'Windows' + run: | + mkdir build_bing && cd build_bing && cmake .. -DEXEC_ENV_OLS=1 -DUSE_BING_INFRA=1 -DUNIT_TEST=True && msbuild diskann.sln /m /nologo /t:Build /p:Configuration="Release" /property:Platform="x64" -consoleloggerparameters:"ErrorsOnly;Summary" + cd .. 
+ shell: cmd + # ------------ End Windows Build --------------- diff --git a/.github/actions/generate-high-dim-random/action.yml b/.github/actions/generate-high-dim-random/action.yml new file mode 100644 index 000000000..65e9b7e38 --- /dev/null +++ b/.github/actions/generate-high-dim-random/action.yml @@ -0,0 +1,28 @@ +name: 'Generating Random Data (Basic)' +description: 'Generates the random data files used in acceptance tests' +runs: + using: "composite" + steps: + - name: Generate Random Data (Basic) + run: | + mkdir data + + echo "Generating random 1020,1024,1536D float and 4096D int8 vectors for index" + dist/bin/rand_data_gen --data_type float --output_file data/rand_float_1020D_5K_norm1.0.bin -D 1020 -N 5000 --norm 1.0 + #dist/bin/rand_data_gen --data_type float --output_file data/rand_float_1024D_5K_norm1.0.bin -D 1024 -N 5000 --norm 1.0 + dist/bin/rand_data_gen --data_type float --output_file data/rand_float_1536D_5K_norm1.0.bin -D 1536 -N 5000 --norm 1.0 + dist/bin/rand_data_gen --data_type int8 --output_file data/rand_int8_4096D_5K_norm1.0.bin -D 4096 -N 5000 --norm 1.0 + + echo "Generating random 1020,1024,1536D float and 4096D int8 vectors for query" + dist/bin/rand_data_gen --data_type float --output_file data/rand_float_1020D_1K_norm1.0.bin -D 1020 -N 1000 --norm 1.0 + #dist/bin/rand_data_gen --data_type float --output_file data/rand_float_1024D_1K_norm1.0.bin -D 1024 -N 1000 --norm 1.0 + dist/bin/rand_data_gen --data_type float --output_file data/rand_float_1536D_1K_norm1.0.bin -D 1536 -N 1000 --norm 1.0 + dist/bin/rand_data_gen --data_type int8 --output_file data/rand_int8_4096D_1K_norm1.0.bin -D 4096 -N 1000 --norm 1.0 + + echo "Computing ground truth for 1020,1024,1536D float and 4096D int8 vectors for query" + dist/bin/compute_groundtruth --data_type float --dist_fn l2 --base_file data/rand_float_1020D_5K_norm1.0.bin --query_file data/rand_float_1020D_1K_norm1.0.bin --gt_file data/l2_rand_float_1020D_5K_norm1.0_1020D_1K_norm1.0_gt100 --K 100 + 
#dist/bin/compute_groundtruth --data_type float --dist_fn l2 --base_file data/rand_float_1024D_5K_norm1.0.bin --query_file data/rand_float_1024D_1K_norm1.0.bin --gt_file data/l2_rand_float_1024D_5K_norm1.0_1024D_1K_norm1.0_gt100 --K 100 + dist/bin/compute_groundtruth --data_type float --dist_fn l2 --base_file data/rand_float_1536D_5K_norm1.0.bin --query_file data/rand_float_1536D_1K_norm1.0.bin --gt_file data/l2_rand_float_1536D_5K_norm1.0_1536D_1K_norm1.0_gt100 --K 100 + dist/bin/compute_groundtruth --data_type int8 --dist_fn l2 --base_file data/rand_int8_4096D_5K_norm1.0.bin --query_file data/rand_int8_4096D_1K_norm1.0.bin --gt_file data/l2_rand_int8_4096D_5K_norm1.0_4096D_1K_norm1.0_gt100 --K 100 + + shell: bash diff --git a/.github/actions/generate-random/action.yml b/.github/actions/generate-random/action.yml index 75554773e..2755067df 100644 --- a/.github/actions/generate-random/action.yml +++ b/.github/actions/generate-random/action.yml @@ -9,18 +9,21 @@ runs: echo "Generating random vectors for index" dist/bin/rand_data_gen --data_type float --output_file data/rand_float_10D_10K_norm1.0.bin -D 10 -N 10000 --norm 1.0 + dist/bin/rand_data_gen --data_type float --output_file data/rand_float_10D_10K_unnorm.bin -D 10 -N 10000 --rand_scaling 2.0 dist/bin/rand_data_gen --data_type int8 --output_file data/rand_int8_10D_10K_norm50.0.bin -D 10 -N 10000 --norm 50.0 dist/bin/rand_data_gen --data_type uint8 --output_file data/rand_uint8_10D_10K_norm50.0.bin -D 10 -N 10000 --norm 50.0 echo "Generating random vectors for query" dist/bin/rand_data_gen --data_type float --output_file data/rand_float_10D_1K_norm1.0.bin -D 10 -N 1000 --norm 1.0 + dist/bin/rand_data_gen --data_type float --output_file data/rand_float_10D_1K_unnorm.bin -D 10 -N 1000 --rand_scaling 2.0 dist/bin/rand_data_gen --data_type int8 --output_file data/rand_int8_10D_1K_norm50.0.bin -D 10 -N 1000 --norm 50.0 dist/bin/rand_data_gen --data_type uint8 --output_file data/rand_uint8_10D_1K_norm50.0.bin -D 10 
-N 1000 --norm 50.0 - + echo "Computing ground truth for floats across l2, mips, and cosine distance functions" dist/bin/compute_groundtruth --data_type float --dist_fn l2 --base_file data/rand_float_10D_10K_norm1.0.bin --query_file data/rand_float_10D_1K_norm1.0.bin --gt_file data/l2_rand_float_10D_10K_norm1.0_10D_1K_norm1.0_gt100 --K 100 dist/bin/compute_groundtruth --data_type float --dist_fn mips --base_file data/rand_float_10D_10K_norm1.0.bin --query_file data/rand_float_10D_1K_norm1.0.bin --gt_file data/mips_rand_float_10D_10K_norm1.0_10D_1K_norm1.0_gt100 --K 100 dist/bin/compute_groundtruth --data_type float --dist_fn cosine --base_file data/rand_float_10D_10K_norm1.0.bin --query_file data/rand_float_10D_1K_norm1.0.bin --gt_file data/cosine_rand_float_10D_10K_norm1.0_10D_1K_norm1.0_gt100 --K 100 + dist/bin/compute_groundtruth --data_type float --dist_fn cosine --base_file data/rand_float_10D_10K_unnorm.bin --query_file data/rand_float_10D_1K_unnorm.bin --gt_file data/cosine_rand_float_10D_10K_unnorm_10D_1K_unnorm_gt100 --K 100 echo "Computing ground truth for int8s across l2, mips, and cosine distance functions" dist/bin/compute_groundtruth --data_type int8 --dist_fn l2 --base_file data/rand_int8_10D_10K_norm50.0.bin --query_file data/rand_int8_10D_1K_norm50.0.bin --gt_file data/l2_rand_int8_10D_10K_norm50.0_10D_1K_norm50.0_gt100 --K 100 diff --git a/.github/workflows/build-python-pdoc.yml b/.github/workflows/build-python-pdoc.yml new file mode 100644 index 000000000..28766ad02 --- /dev/null +++ b/.github/workflows/build-python-pdoc.yml @@ -0,0 +1,80 @@ +name: DiskANN Build PDoc Documentation +on: [workflow_call] +jobs: + build-reference-documentation: + permissions: + contents: write + runs-on: ubuntu-latest + steps: + - name: Checkout repository + uses: actions/checkout@v3 + with: + fetch-depth: 1 + - name: Set up Python 3.9 + uses: actions/setup-python@v2 + with: + python-version: 3.9 + - name: Install python build + run: python -m pip install build + 
shell: bash + # Install required dependencies + - name: Prepare Linux environment + run: | + sudo scripts/dev/install-dev-deps-ubuntu.bash + shell: bash + # We need to build the wheel in order to run pdoc. pdoc does not seem to work if you just point it at + # our source directory. + - name: Building Python Wheel for documentation generation + run: python -m build --wheel --outdir documentation_dist + shell: bash + - name: "Run Reference Documentation Generation" + run: | + pip install pdoc pipdeptree + pip install documentation_dist/*.whl + echo "documentation" > dependencies_documentation.txt + pipdeptree >> dependencies_documentation.txt + pdoc -o docs/python/html diskannpy + - name: Create version environment variable + run: | + echo "DISKANN_VERSION=$(python <> $GITHUB_ENV + - name: Archive documentation version artifact + uses: actions/upload-artifact@v2 + with: + name: dependencies + path: | + dependencies_documentation.txt + - name: Archive documentation artifacts + uses: actions/upload-artifact@v2 + with: + name: documentation-site + path: | + docs/python/html + # Publish to /dev if we are on the "main" branch + - name: Publish reference docs for latest development version (main branch) + uses: peaceiris/actions-gh-pages@v3 + if: github.ref == 'refs/heads/main' + with: + github_token: ${{ secrets.GITHUB_TOKEN }} + publish_dir: docs/python/html + destination_dir: docs/python/dev + # Publish to / if we are releasing + - name: Publish reference docs by version (main branch) + uses: peaceiris/actions-gh-pages@v3 + if: github.event_name == 'release' + with: + github_token: ${{ secrets.GITHUB_TOKEN }} + publish_dir: docs/python/html + destination_dir: docs/python/${{ env.DISKANN_VERSION }} + # Publish to /latest if we are releasing + - name: Publish latest reference docs (main branch) + uses: peaceiris/actions-gh-pages@v3 + if: github.event_name == 'release' + with: + github_token: ${{ secrets.GITHUB_TOKEN }} + publish_dir: docs/python/html + destination_dir: 
docs/python/latest diff --git a/.github/workflows/disk-pq.yml b/.github/workflows/disk-pq.yml index 35c662184..6e71e7999 100644 --- a/.github/workflows/disk-pq.yml +++ b/.github/workflows/disk-pq.yml @@ -34,6 +34,11 @@ jobs: run: | dist/bin/build_disk_index --data_type float --dist_fn l2 --data_path data/rand_float_10D_10K_norm1.0.bin --index_path_prefix data/disk_index_l2_rand_float_10D_10K_norm1.0_diskfull_oneshot -R 16 -L 32 -B 0.00003 -M 1 dist/bin/search_disk_index --data_type float --dist_fn l2 --fail_if_recall_below 70 --index_path_prefix data/disk_index_l2_rand_float_10D_10K_norm1.0_diskfull_oneshot --result_path /tmp/res --query_file data/rand_float_10D_1K_norm1.0.bin --gt_file data/l2_rand_float_10D_10K_norm1.0_10D_1K_norm1.0_gt100 --recall_at 5 -L 5 12 -W 2 --num_nodes_to_cache 10 -T 16 + - name: build and search disk index (one shot graph build, cosine, no diskPQ) (float) + if: success() || failure() + run: | + dist/bin/build_disk_index --data_type float --dist_fn cosine --data_path data/rand_float_10D_10K_unnorm.bin --index_path_prefix data/disk_index_cosine_rand_float_10D_10K_unnorm_diskfull_oneshot -R 16 -L 32 -B 0.00003 -M 1 + dist/bin/search_disk_index --data_type float --dist_fn cosine --fail_if_recall_below 70 --index_path_prefix data/disk_index_cosine_rand_float_10D_10K_unnorm_diskfull_oneshot --result_path /tmp/res --query_file data/rand_float_10D_1K_unnorm.bin --gt_file data/cosine_rand_float_10D_10K_unnorm_10D_1K_unnorm_gt100 --recall_at 5 -L 5 12 -W 2 --num_nodes_to_cache 10 -T 16 - name: build and search disk index (one shot graph build, L2, no diskPQ) (int8) if: success() || failure() run: | @@ -66,6 +71,11 @@ jobs: run: | dist/bin/build_disk_index --data_type float --dist_fn l2 --data_path data/rand_float_10D_10K_norm1.0.bin --index_path_prefix data/disk_index_l2_rand_float_10D_10K_norm1.0_diskfull_sharded -R 16 -L 32 -B 0.00003 -M 0.00006 dist/bin/search_disk_index --data_type float --dist_fn l2 --fail_if_recall_below 70 
--index_path_prefix data/disk_index_l2_rand_float_10D_10K_norm1.0_diskfull_sharded --result_path /tmp/res --query_file data/rand_float_10D_1K_norm1.0.bin --gt_file data/l2_rand_float_10D_10K_norm1.0_10D_1K_norm1.0_gt100 --recall_at 5 -L 5 12 -W 2 --num_nodes_to_cache 10 -T 16 + - name: build and search disk index (sharded graph build, cosine, no diskPQ) (float) + if: success() || failure() + run: | + dist/bin/build_disk_index --data_type float --dist_fn cosine --data_path data/rand_float_10D_10K_unnorm.bin --index_path_prefix data/disk_index_cosine_rand_float_10D_10K_unnorm_diskfull_sharded -R 16 -L 32 -B 0.00003 -M 0.00006 + dist/bin/search_disk_index --data_type float --dist_fn cosine --fail_if_recall_below 70 --index_path_prefix data/disk_index_cosine_rand_float_10D_10K_unnorm_diskfull_sharded --result_path /tmp/res --query_file data/rand_float_10D_1K_unnorm.bin --gt_file data/cosine_rand_float_10D_10K_unnorm_10D_1K_unnorm_gt100 --recall_at 5 -L 5 12 -W 2 --num_nodes_to_cache 10 -T 16 - name: build and search disk index (sharded graph build, L2, no diskPQ) (int8) run: | dist/bin/build_disk_index --data_type int8 --dist_fn l2 --data_path data/rand_int8_10D_10K_norm50.0.bin --index_path_prefix data/disk_index_l2_rand_int8_10D_10K_norm50.0_diskfull_sharded -R 16 -L 32 -B 0.00003 -M 0.00006 diff --git a/.github/workflows/dynamic-labels.yml b/.github/workflows/dynamic-labels.yml new file mode 100644 index 000000000..0f3b56eb9 --- /dev/null +++ b/.github/workflows/dynamic-labels.yml @@ -0,0 +1,102 @@ +name: Dynamic-Labels +on: [workflow_call] +jobs: + acceptance-tests-dynamic: + name: Dynamic-Labels + strategy: + fail-fast: false + matrix: + os: [ubuntu-latest, windows-2019, windows-latest] + runs-on: ${{matrix.os}} + defaults: + run: + shell: bash + steps: + - name: Checkout repository + if: ${{ runner.os == 'Linux' }} + uses: actions/checkout@v3 + with: + fetch-depth: 1 + - name: Checkout repository + if: ${{ runner.os == 'Windows' }} + uses: actions/checkout@v3 + 
with: + fetch-depth: 1 + submodules: true + - name: DiskANN Build CLI Applications + uses: ./.github/actions/build + + - name: Generate Data + uses: ./.github/actions/generate-random + + - name: Generate Labels + run: | + echo "Generating synthetic labels and computing ground truth for filtered search with universal label" + dist/bin/generate_synthetic_labels --num_labels 50 --num_points 10000 --output_file data/rand_labels_50_10K.txt --distribution_type random + + echo "Generating synthetic labels with a zipf distribution and computing ground truth for filtered search with universal label" + dist/bin/generate_synthetic_labels --num_labels 50 --num_points 10000 --output_file data/zipf_labels_50_10K.txt --distribution_type zipf + + - name: Test a streaming index (float) with labels (Zipf distributed) + run: | + dist/bin/test_streaming_scenario --data_type float --dist_fn l2 --data_path data/rand_float_10D_10K_norm1.0.bin --universal_label 0 --label_file data/zipf_labels_50_10K.txt --index_path_prefix data/index_zipf_stream -R 64 --FilteredLbuild 200 -L 50 --alpha 1.2 --insert_threads 8 --consolidate_threads 8 --max_points_to_insert 10000 --active_window 4000 --consolidate_interval 2000 --start_point_norm 3.2 --unique_labels_supported 51 + + echo "Computing groundtruth with filter" + dist/bin/compute_groundtruth_for_filters --data_type float --universal_label 0 --filter_label 1 --dist_fn l2 --base_file data/index_zipf_stream.after-streaming-act4000-cons2000-max10000.data --query_file data/rand_float_10D_1K_norm1.0.bin --K 100 --gt_file data/gt100_zipf_base-act4000-cons2000-max10000_1 --label_file data/index_zipf_stream.after-streaming-act4000-cons2000-max10000_raw_labels.txt --tags_file data/index_zipf_stream.after-streaming-act4000-cons2000-max10000.tags + echo "Searching with filter" + dist/bin/search_memory_index --data_type float --dist_fn l2 --filter_label 1 --fail_if_recall_below 40 --index_path_prefix 
data/index_zipf_stream.after-streaming-act4000-cons2000-max10000 --result_path data/res_stream --query_file data/rand_float_10D_1K_norm1.0.bin --gt_file data/gt100_zipf_base-act4000-cons2000-max10000_1 -K 10 -L 20 40 60 80 100 150 -T 64 --dynamic true --tags 1 + + echo "Computing groundtruth w/o filter" + dist/bin/compute_groundtruth --data_type float --dist_fn l2 --base_file data/index_zipf_stream.after-streaming-act4000-cons2000-max10000.data --query_file data/rand_float_10D_1K_norm1.0.bin --K 100 --gt_file data/gt100_zipf_base-act4000-cons2000-max10000 + echo "Searching without filter" + dist/bin/search_memory_index --data_type float --dist_fn l2 --fail_if_recall_below 70 --index_path_prefix data/index_zipf_stream.after-streaming-act4000-cons2000-max10000 --result_path res_stream --query_file data/rand_float_10D_1K_norm1.0.bin --gt_file data/gt100_zipf_base-act4000-cons2000-max10000 -K 10 -L 20 40 60 80 100 -T 64 + + - name: Test a streaming index (float) with labels (random distributed) + run: | + dist/bin/test_streaming_scenario --data_type float --dist_fn l2 --data_path data/rand_float_10D_10K_norm1.0.bin --universal_label 0 --label_file data/rand_labels_50_10K.txt --index_path_prefix data/index_rand_stream -R 64 --FilteredLbuild 200 -L 50 --alpha 1.2 --insert_threads 8 --consolidate_threads 8 --max_points_to_insert 10000 --active_window 4000 --consolidate_interval 2000 --start_point_norm 3.2 --unique_labels_supported 51 + + echo "Computing groundtruth with filter" + dist/bin/compute_groundtruth_for_filters --data_type float --universal_label 0 --filter_label 1 --dist_fn l2 --base_file data/index_rand_stream.after-streaming-act4000-cons2000-max10000.data --query_file data/rand_float_10D_1K_norm1.0.bin --K 100 --gt_file data/gt100_rand_base-act4000-cons2000-max10000_1 --label_file data/index_rand_stream.after-streaming-act4000-cons2000-max10000_raw_labels.txt --tags_file data/index_rand_stream.after-streaming-act4000-cons2000-max10000.tags + echo "Searching 
with filter" + dist/bin/search_memory_index --data_type float --dist_fn l2 --filter_label 1 --fail_if_recall_below 40 --index_path_prefix data/index_rand_stream.after-streaming-act4000-cons2000-max10000 --result_path data/res_stream --query_file data/rand_float_10D_1K_norm1.0.bin --gt_file data/gt100_rand_base-act4000-cons2000-max10000_1 -K 10 -L 20 40 60 80 100 150 -T 64 --dynamic true --tags 1 + + echo "Computing groundtruth w/o filter" + dist/bin/compute_groundtruth --data_type float --dist_fn l2 --base_file data/index_rand_stream.after-streaming-act4000-cons2000-max10000.data --query_file data/rand_float_10D_1K_norm1.0.bin --K 100 --gt_file data/gt100_rand_base-act4000-cons2000-max10000 + echo "Searching without filter" + dist/bin/search_memory_index --data_type float --dist_fn l2 --fail_if_recall_below 70 --index_path_prefix data/index_rand_stream.after-streaming-act4000-cons2000-max10000 --result_path res_stream --query_file data/rand_float_10D_1K_norm1.0.bin --gt_file data/gt100_rand_base-act4000-cons2000-max10000 -K 10 -L 20 40 60 80 100 -T 64 + + - name: Test Insert Delete Consolidate (float) with labels (zipf distributed) + run: | + dist/bin/test_insert_deletes_consolidate --data_type float --dist_fn l2 --universal_label 0 --label_file data/zipf_labels_50_10K.txt --FilteredLbuild 70 --data_path data/rand_float_10D_10K_norm1.0.bin --index_path_prefix data/index_zipf_ins_del -R 64 -L 10 --alpha 1.2 --points_to_skip 0 --max_points_to_insert 7500 --beginning_index_size 0 --points_per_checkpoint 1000 --checkpoints_per_snapshot 0 --points_to_delete_from_beginning 2500 --start_deletes_after 5000 --do_concurrent true --start_point_norm 3.2 --unique_labels_supported 51 + + echo "Computing groundtruth with filter" + dist/bin/compute_groundtruth_for_filters --data_type float --filter_label 5 --universal_label 0 --dist_fn l2 --base_file data/index_zipf_ins_del.after-concurrent-delete-del2500-7500.data --query_file data/rand_float_10D_1K_norm1.0.bin --K 100 --gt_file 
data/gt100_zipf_random10D_1K_wlabel_5 --label_file data/index_zipf_ins_del.after-concurrent-delete-del2500-7500_raw_labels.txt --tags_file data/index_zipf_ins_del.after-concurrent-delete-del2500-7500.tags + echo "Searching with filter" + dist/bin/search_memory_index --data_type float --dist_fn l2 --filter_label 5 --fail_if_recall_below 10 --index_path_prefix data/index_zipf_ins_del.after-concurrent-delete-del2500-7500 --result_path data/res_zipf_stream --query_file data/rand_float_10D_1K_norm1.0.bin --gt_file data/gt100_zipf_random10D_1K_wlabel_5 -K 10 -L 20 40 60 80 100 150 -T 64 --dynamic true --tags 1 + + echo "Computing groundtruth w/o filter" + dist/bin/compute_groundtruth --data_type float --dist_fn l2 --base_file data/index_zipf_ins_del.after-concurrent-delete-del2500-7500.data --query_file data/rand_float_10D_1K_norm1.0.bin --K 100 --gt_file data/gt100_zipf_random10D_1K + echo "Searching without filter" + dist/bin/search_memory_index --data_type float --dist_fn l2 --fail_if_recall_below 70 --index_path_prefix data/index_zipf_ins_del.after-concurrent-delete-del2500-7500 --result_path res_stream --query_file data/rand_float_10D_1K_norm1.0.bin --gt_file data/gt100_zipf_random10D_1K -K 10 -L 20 40 60 80 100 -T 64 + + - name: Test Insert Delete Consolidate (float) with labels (random distributed) + run: | + dist/bin/test_insert_deletes_consolidate --data_type float --dist_fn l2 --universal_label 0 --label_file data/rand_labels_50_10K.txt --FilteredLbuild 70 --data_path data/rand_float_10D_10K_norm1.0.bin --index_path_prefix data/index_rand_ins_del -R 64 -L 10 --alpha 1.2 --points_to_skip 0 --max_points_to_insert 7500 --beginning_index_size 0 --points_per_checkpoint 1000 --checkpoints_per_snapshot 0 --points_to_delete_from_beginning 2500 --start_deletes_after 5000 --do_concurrent true --start_point_norm 3.2 --unique_labels_supported 51 + + echo "Computing groundtruth with filter" + dist/bin/compute_groundtruth_for_filters --data_type float --filter_label 5 
--universal_label 0 --dist_fn l2 --base_file data/index_rand_ins_del.after-concurrent-delete-del2500-7500.data --query_file data/rand_float_10D_1K_norm1.0.bin --K 100 --gt_file data/gt100_rand_random10D_1K_wlabel_5 --label_file data/index_rand_ins_del.after-concurrent-delete-del2500-7500_raw_labels.txt --tags_file data/index_rand_ins_del.after-concurrent-delete-del2500-7500.tags + echo "Searching with filter" + dist/bin/search_memory_index --data_type float --dist_fn l2 --filter_label 5 --fail_if_recall_below 40 --index_path_prefix data/index_rand_ins_del.after-concurrent-delete-del2500-7500 --result_path data/res_rand_stream --query_file data/rand_float_10D_1K_norm1.0.bin --gt_file data/gt100_rand_random10D_1K_wlabel_5 -K 10 -L 20 40 60 80 100 150 -T 64 --dynamic true --tags 1 + + echo "Computing groundtruth w/o filter" + dist/bin/compute_groundtruth --data_type float --dist_fn l2 --base_file data/index_rand_ins_del.after-concurrent-delete-del2500-7500.data --query_file data/rand_float_10D_1K_norm1.0.bin --K 100 --gt_file data/gt100_rand_random10D_1K + echo "Searching without filter" + dist/bin/search_memory_index --data_type float --dist_fn l2 --fail_if_recall_below 70 --index_path_prefix data/index_rand_ins_del.after-concurrent-delete-del2500-7500 --result_path res_stream --query_file data/rand_float_10D_1K_norm1.0.bin --gt_file data/gt100_rand_random10D_1K -K 10 -L 20 40 60 80 100 -T 64 + + - name: upload data and bin + uses: actions/upload-artifact@v3 + with: + name: dynamic + path: | + ./dist/** + ./data/** diff --git a/.github/workflows/labels.yml b/.github/workflows/labels.yml index e811c1ff5..5555f7f84 100644 --- a/.github/workflows/labels.yml +++ b/.github/workflows/labels.yml @@ -27,7 +27,7 @@ jobs: uses: ./.github/actions/build - name: Generate Data - uses: ./.github/actions/generate-random + uses: ./.github/actions/generate-random - name: Generate Labels run: | @@ -55,11 +55,16 @@ jobs: dist/bin/build_memory_index --data_type uint8 --dist_fn cosine 
--FilteredLbuild 90 --universal_label 0 --data_path data/rand_uint8_10D_10K_norm50.0.bin --label_file data/rand_labels_50_10K.txt --index_path_prefix data/index_cosine_rand_uint8_10D_10K_norm50_wlabel dist/bin/search_memory_index --data_type uint8 --dist_fn l2 --filter_label 10 --fail_if_recall_below 70 --index_path_prefix data/index_l2_rand_uint8_10D_10K_norm50_wlabel --query_file data/rand_uint8_10D_1K_norm50.0.bin --recall_at 10 --result_path temp --gt_file data/l2_rand_uint8_10D_10K_norm50.0_10D_1K_norm50.0_gt100_wlabel -L 16 32 dist/bin/search_memory_index --data_type uint8 --dist_fn cosine --filter_label 10 --fail_if_recall_below 70 --index_path_prefix data/index_cosine_rand_uint8_10D_10K_norm50_wlabel --query_file data/rand_uint8_10D_1K_norm50.0.bin --recall_at 10 --result_path temp --gt_file data/cosine_rand_uint8_10D_10K_norm50.0_10D_1K_norm50.0_gt100_wlabel -L 16 32 + + echo "Searching without filters" + dist/bin/search_memory_index --data_type uint8 --dist_fn l2 --fail_if_recall_below 70 --index_path_prefix data/index_l2_rand_uint8_10D_10K_norm50_wlabel --query_file data/rand_uint8_10D_1K_norm50.0.bin --recall_at 10 --result_path temp --gt_file data/l2_rand_uint8_10D_10K_norm50.0_10D_1K_norm50.0_gt100 -L 32 64 + dist/bin/search_memory_index --data_type uint8 --dist_fn cosine --fail_if_recall_below 70 --index_path_prefix data/index_cosine_rand_uint8_10D_10K_norm50_wlabel --query_file data/rand_uint8_10D_1K_norm50.0.bin --recall_at 10 --result_path temp --gt_file data/cosine_rand_uint8_10D_10K_norm50.0_10D_1K_norm50.0_gt100 -L 32 64 + - name: build and search disk index with labels using L2 and Cosine metrics (random distributed labels) if: success() || failure() run: | - dist/bin/build_disk_index --data_type uint8 --dist_fn l2 --universal_label 0 --FilteredLbuild 90 --data_path data/rand_uint8_10D_10K_norm50.0.bin --label_file data/rand_labels_50_10K.txt --index_path_prefix data/disk_index_l2_rand_uint8_10D_10K_norm50_wlabel -R 16 -L 32 -B 0.00003 -M 1 - 
dist/bin/search_disk_index --data_type uint8 --dist_fn l2 --filter_label 10 --fail_if_recall_below 50 --index_path_prefix data/disk_index_l2_rand_uint8_10D_10K_norm50_wlabel --result_path /tmp/res --query_file data/rand_uint8_10D_1K_norm50.0.bin --gt_file data/l2_rand_uint8_10D_10K_norm50.0_10D_1K_norm50.0_gt100_wlabel --recall_at 5 -L 5 12 -W 2 --num_nodes_to_cache 10 -T 16 + dist/bin/build_disk_index --data_type uint8 --dist_fn l2 --universal_label 0 --FilteredLbuild 90 --data_path data/rand_uint8_10D_10K_norm50.0.bin --label_file data/rand_labels_50_10K.txt --index_path_prefix data/disk_index_l2_rand_uint8_10D_10K_norm50_wlabel -R 32 -L 5 -B 0.00003 -M 1 + dist/bin/search_disk_index --data_type uint8 --dist_fn l2 --filter_label 10 --fail_if_recall_below 50 --index_path_prefix data/disk_index_l2_rand_uint8_10D_10K_norm50_wlabel --result_path temp --query_file data/rand_uint8_10D_1K_norm50.0.bin --gt_file data/l2_rand_uint8_10D_10K_norm50.0_10D_1K_norm50.0_gt100_wlabel --recall_at 5 -L 5 12 -W 2 --num_nodes_to_cache 10 -T 16 - name: build and search in-memory index with labels using L2 and Cosine metrics (zipf distributed labels) if: success() || failure() run: | @@ -67,18 +72,26 @@ jobs: dist/bin/build_memory_index --data_type uint8 --dist_fn cosine --FilteredLbuild 90 --universal_label 0 --data_path data/rand_uint8_10D_10K_norm50.0.bin --label_file data/zipf_labels_50_10K.txt --index_path_prefix data/index_cosine_zipf_uint8_10D_10K_norm50_wlabel dist/bin/search_memory_index --data_type uint8 --dist_fn l2 --filter_label 5 --fail_if_recall_below 70 --index_path_prefix data/index_l2_zipf_uint8_10D_10K_norm50_wlabel --query_file data/rand_uint8_10D_1K_norm50.0.bin --recall_at 10 --result_path temp --gt_file data/l2_zipf_uint8_10D_10K_norm50.0_10D_1K_norm50.0_gt100_wlabel -L 16 32 dist/bin/search_memory_index --data_type uint8 --dist_fn cosine --filter_label 5 --fail_if_recall_below 70 --index_path_prefix data/index_cosine_zipf_uint8_10D_10K_norm50_wlabel 
--query_file data/rand_uint8_10D_1K_norm50.0.bin --recall_at 10 --result_path temp --gt_file data/cosine_zipf_uint8_10D_10K_norm50.0_10D_1K_norm50.0_gt100_wlabel -L 16 32 + + echo "Searching without filters" + dist/bin/compute_groundtruth --data_type uint8 --dist_fn l2 --base_file data/rand_uint8_10D_10K_norm50.0.bin --query_file data/rand_uint8_10D_1K_norm50.0.bin --gt_file data/l2_zipf_uint8_10D_10K_norm50.0_10D_1K_norm50.0_gt100 --K 100 + dist/bin/compute_groundtruth --data_type uint8 --dist_fn cosine --base_file data/rand_uint8_10D_10K_norm50.0.bin --query_file data/rand_uint8_10D_1K_norm50.0.bin --gt_file data/cosine_zipf_uint8_10D_10K_norm50.0_10D_1K_norm50.0_gt100 --K 100 + dist/bin/search_memory_index --data_type uint8 --dist_fn l2 --fail_if_recall_below 70 --index_path_prefix data/index_l2_zipf_uint8_10D_10K_norm50_wlabel --query_file data/rand_uint8_10D_1K_norm50.0.bin --recall_at 10 --result_path temp --gt_file data/l2_zipf_uint8_10D_10K_norm50.0_10D_1K_norm50.0_gt100 -L 32 64 + dist/bin/search_memory_index --data_type uint8 --dist_fn cosine --fail_if_recall_below 70 --index_path_prefix data/index_cosine_zipf_uint8_10D_10K_norm50_wlabel --query_file data/rand_uint8_10D_1K_norm50.0.bin --recall_at 10 --result_path temp --gt_file data/cosine_zipf_uint8_10D_10K_norm50.0_10D_1K_norm50.0_gt100 -L 32 64 + - name: build and search disk index with labels using L2 and Cosine metrics (zipf distributed labels) if: success() || failure() run: | - dist/bin/build_disk_index --data_type uint8 --dist_fn l2 --universal_label 0 --FilteredLbuild 90 --data_path data/rand_uint8_10D_10K_norm50.0.bin --label_file data/zipf_labels_50_10K.txt --index_path_prefix data/disk_index_l2_zipf_uint8_10D_10K_norm50_wlabel -R 16 -L 32 -B 0.00003 -M 1 - dist/bin/search_disk_index --data_type uint8 --dist_fn l2 --filter_label 5 --fail_if_recall_below 50 --index_path_prefix data/disk_index_l2_zipf_uint8_10D_10K_norm50_wlabel --result_path /tmp/res --query_file 
data/rand_uint8_10D_1K_norm50.0.bin --gt_file data/l2_zipf_uint8_10D_10K_norm50.0_10D_1K_norm50.0_gt100_wlabel --recall_at 5 -L 5 12 -W 2 --num_nodes_to_cache 10 -T 16 + dist/bin/build_disk_index --data_type uint8 --dist_fn l2 --universal_label 0 --FilteredLbuild 90 --data_path data/rand_uint8_10D_10K_norm50.0.bin --label_file data/zipf_labels_50_10K.txt --index_path_prefix data/disk_index_l2_zipf_uint8_10D_10K_norm50_wlabel -R 32 -L 5 -B 0.00003 -M 1 + dist/bin/search_disk_index --data_type uint8 --dist_fn l2 --filter_label 5 --fail_if_recall_below 50 --index_path_prefix data/disk_index_l2_zipf_uint8_10D_10K_norm50_wlabel --result_path temp --query_file data/rand_uint8_10D_1K_norm50.0.bin --gt_file data/l2_zipf_uint8_10D_10K_norm50.0_10D_1K_norm50.0_gt100_wlabel --recall_at 5 -L 5 12 -W 2 --num_nodes_to_cache 10 -T 16 + - name : build and search in-memory and disk index (without universal label, zipf distributed) if: success() || failure() run: | dist/bin/build_memory_index --data_type uint8 --dist_fn l2 --FilteredLbuild 90 --data_path data/rand_uint8_10D_10K_norm50.0.bin --label_file data/zipf_labels_50_10K.txt --index_path_prefix data/index_l2_zipf_uint8_10D_10K_norm50_wlabel_nouniversal - dist/bin/build_disk_index --data_type uint8 --dist_fn l2 --FilteredLbuild 90 --data_path data/rand_uint8_10D_10K_norm50.0.bin --label_file data/zipf_labels_50_10K.txt --index_path_prefix data/disk_index_l2_zipf_uint8_10D_10K_norm50_wlabel_nouniversal -R 16 -L 32 -B 0.00003 -M 1 + dist/bin/build_disk_index --data_type uint8 --dist_fn l2 --FilteredLbuild 90 --data_path data/rand_uint8_10D_10K_norm50.0.bin --label_file data/zipf_labels_50_10K.txt --index_path_prefix data/disk_index_l2_zipf_uint8_10D_10K_norm50_wlabel_nouniversal -R 32 -L 5 -B 0.00003 -M 1 dist/bin/search_memory_index --data_type uint8 --dist_fn l2 --filter_label 5 --fail_if_recall_below 70 --index_path_prefix data/index_l2_zipf_uint8_10D_10K_norm50_wlabel_nouniversal --query_file 
data/rand_uint8_10D_1K_norm50.0.bin --recall_at 10 --result_path temp --gt_file data/l2_zipf_uint8_10D_10K_norm50.0_10D_1K_norm50.0_gt100_wlabel_nouniversal -L 16 32 - dist/bin/search_disk_index --data_type uint8 --dist_fn l2 --filter_label 5 --index_path_prefix data/disk_index_l2_zipf_uint8_10D_10K_norm50_wlabel_nouniversal --result_path /tmp/res --query_file data/rand_uint8_10D_1K_norm50.0.bin --gt_file data/l2_zipf_uint8_10D_10K_norm50.0_10D_1K_norm50.0_gt100_wlabel_nouniversal --recall_at 5 -L 5 12 -W 2 --num_nodes_to_cache 10 -T 16 + dist/bin/search_disk_index --data_type uint8 --dist_fn l2 --filter_label 5 --index_path_prefix data/disk_index_l2_zipf_uint8_10D_10K_norm50_wlabel_nouniversal --result_path temp --query_file data/rand_uint8_10D_1K_norm50.0.bin --gt_file data/l2_zipf_uint8_10D_10K_norm50.0_10D_1K_norm50.0_gt100_wlabel_nouniversal --recall_at 5 -L 5 12 -W 2 --num_nodes_to_cache 10 -T 16 - name: Generate combined GT for each query with a separate label and search if: success() || failure() run: | diff --git a/.github/workflows/multi-sector-disk-pq.yml b/.github/workflows/multi-sector-disk-pq.yml new file mode 100644 index 000000000..8ea55c88d --- /dev/null +++ b/.github/workflows/multi-sector-disk-pq.yml @@ -0,0 +1,60 @@ +name: Disk With PQ +on: [workflow_call] +jobs: + acceptance-tests-disk-pq: + name: Disk, PQ + strategy: + fail-fast: false + matrix: + os: [ubuntu-latest, windows-2019, windows-latest] + runs-on: ${{matrix.os}} + defaults: + run: + shell: bash + steps: + - name: Checkout repository + if: ${{ runner.os == 'Linux' }} + uses: actions/checkout@v3 + with: + fetch-depth: 1 + - name: Checkout repository + if: ${{ runner.os == 'Windows' }} + uses: actions/checkout@v3 + with: + fetch-depth: 1 + submodules: true + - name: DiskANN Build CLI Applications + uses: ./.github/actions/build + + - name: Generate Data + uses: ./.github/actions/generate-high-dim-random + + - name: build and search disk index (1020D, one shot graph build, L2, no diskPQ) 
(float) + if: success() || failure() + run: | + dist/bin/build_disk_index --data_type float --dist_fn l2 --data_path data/rand_float_1020D_5K_norm1.0.bin --index_path_prefix data/disk_index_l2_rand_float_1020D_5K_norm1.0_diskfull_oneshot -R 32 -L 500 -B 0.003 -M 1 + dist/bin/search_disk_index --data_type float --dist_fn l2 --fail_if_recall_below 70 --index_path_prefix data/disk_index_l2_rand_float_1020D_5K_norm1.0_diskfull_oneshot --result_path /tmp/res --query_file data/rand_float_1020D_1K_norm1.0.bin --gt_file data/l2_rand_float_1020D_5K_norm1.0_1020D_1K_norm1.0_gt100 --recall_at 5 -L 250 -W 2 --num_nodes_to_cache 100 -T 16 + #- name: build and search disk index (1024D, one shot graph build, L2, no diskPQ) (float) + # if: success() || failure() + # run: | + # dist/bin/build_disk_index --data_type float --dist_fn l2 --data_path data/rand_float_1024D_5K_norm1.0.bin --index_path_prefix data/disk_index_l2_rand_float_1024D_5K_norm1.0_diskfull_oneshot -R 32 -L 500 -B 0.003 -M 1 + # dist/bin/search_disk_index --data_type float --dist_fn l2 --fail_if_recall_below 70 --index_path_prefix data/disk_index_l2_rand_float_1024D_5K_norm1.0_diskfull_oneshot --result_path /tmp/res --query_file data/rand_float_1024D_1K_norm1.0.bin --gt_file data/l2_rand_float_1024D_5K_norm1.0_1024D_1K_norm1.0_gt100 --recall_at 5 -L 250 -W 2 --num_nodes_to_cache 100 -T 16 + - name: build and search disk index (1536D, one shot graph build, L2, no diskPQ) (float) + if: success() || failure() + run: | + dist/bin/build_disk_index --data_type float --dist_fn l2 --data_path data/rand_float_1536D_5K_norm1.0.bin --index_path_prefix data/disk_index_l2_rand_float_1536D_5K_norm1.0_diskfull_oneshot -R 32 -L 500 -B 0.003 -M 1 + dist/bin/search_disk_index --data_type float --dist_fn l2 --fail_if_recall_below 70 --index_path_prefix data/disk_index_l2_rand_float_1536D_5K_norm1.0_diskfull_oneshot --result_path /tmp/res --query_file data/rand_float_1536D_1K_norm1.0.bin --gt_file 
data/l2_rand_float_1536D_5K_norm1.0_1536D_1K_norm1.0_gt100 --recall_at 5 -L 250 -W 2 --num_nodes_to_cache 100 -T 16 + + - name: build and search disk index (4096D, one shot graph build, L2, no diskPQ) (int8) + if: success() || failure() + run: | + dist/bin/build_disk_index --data_type int8 --dist_fn l2 --data_path data/rand_int8_4096D_5K_norm1.0.bin --index_path_prefix data/disk_index_l2_rand_int8_4096D_5K_norm1.0_diskfull_oneshot -R 32 -L 500 -B 0.003 -M 1 + dist/bin/search_disk_index --data_type int8 --dist_fn l2 --fail_if_recall_below 70 --index_path_prefix data/disk_index_l2_rand_int8_4096D_5K_norm1.0_diskfull_oneshot --result_path /tmp/res --query_file data/rand_int8_4096D_1K_norm1.0.bin --gt_file data/l2_rand_int8_4096D_5K_norm1.0_4096D_1K_norm1.0_gt100 --recall_at 5 -L 250 -W 2 --num_nodes_to_cache 100 -T 16 + + - name: upload data and bin + uses: actions/upload-artifact@v3 + with: + name: multi-sector-disk-pq + path: | + ./dist/** + ./data/** diff --git a/.github/workflows/perf.yml b/.github/workflows/perf.yml new file mode 100644 index 000000000..1595a4221 --- /dev/null +++ b/.github/workflows/perf.yml @@ -0,0 +1,26 @@ +name: DiskANN Nightly Performance Metrics +on: + schedule: + - cron: "41 14 * * *" # 14:41 UTC, 7:41 PDT, 8:41 PST, 08:11 IST +jobs: + perf-test: + name: Run Perf Test from main + runs-on: ubuntu-latest + steps: + - name: Checkout repository + uses: actions/checkout@v3 + with: + fetch-depth: 1 + - name: Build Perf Container + run: | + docker build --build-arg GIT_COMMIT_ISH="$GITHUB_SHA" -t perf -f scripts/perf/Dockerfile scripts + - name: Performance Tests + run: | + mkdir metrics + docker run -v ./metrics:/app/logs perf &> ./metrics/combined_stdouterr.log + - name: Upload Metrics Logs + uses: actions/upload-artifact@v3 + with: + name: metrics + path: | + ./metrics/** diff --git a/.github/workflows/pr-test.yml b/.github/workflows/pr-test.yml index 38eefb3ff..f84953b8c 100644 --- a/.github/workflows/pr-test.yml +++ 
b/.github/workflows/pr-test.yml @@ -18,12 +18,18 @@ jobs: disk-pq: name: Disk with PQ uses: ./.github/workflows/disk-pq.yml + multi-sector-disk-pq: + name: Multi-sector Disk with PQ + uses: ./.github/workflows/multi-sector-disk-pq.yml labels: name: Labels uses: ./.github/workflows/labels.yml dynamic: name: Dynamic uses: ./.github/workflows/dynamic.yml + dynamic-labels: + name: Dynamic Labels + uses: ./.github/workflows/dynamic-labels.yml python: name: Python uses: ./.github/workflows/build-python.yml diff --git a/.github/workflows/push-test.yml b/.github/workflows/push-test.yml index 4de999014..89e6ae018 100644 --- a/.github/workflows/push-test.yml +++ b/.github/workflows/push-test.yml @@ -6,6 +6,13 @@ jobs: fail-fast: true name: DiskANN Common Build Checks uses: ./.github/workflows/common.yml + build-documentation: + permissions: + contents: write + strategy: + fail-fast: true + name: DiskANN Build Documentation + uses: ./.github/workflows/build-python-pdoc.yml build: strategy: fail-fast: false @@ -28,6 +35,17 @@ with: fetch-depth: 1 submodules: true + - name: Build diskannpy dependency tree + run: | + pip install diskannpy pipdeptree + echo "dependencies" > dependencies_${{ matrix.os }}.txt + pipdeptree >> dependencies_${{ matrix.os }}.txt + - name: Archive diskannpy dependencies artifact + uses: actions/upload-artifact@v3 + with: + name: dependencies + path: | + dependencies_${{ matrix.os }}.txt - name: DiskANN Build CLI Applications uses: ./.github/actions/build # python: diff --git a/.github/workflows/python-release.yml b/.github/workflows/python-release.yml index a1e72ad90..a15d4d161 100644 --- a/.github/workflows/python-release.yml +++ b/.github/workflows/python-release.yml @@ -6,7 +6,14 @@ jobs: python-release-wheels: name: Python uses: ./.github/workflows/build-python.yml + build-documentation: + strategy: + fail-fast: true + name: DiskANN Build Documentation + uses: ./.github/workflows/build-python-pdoc.yml release: + permissions: + contents: write
runs-on: ubuntu-latest needs: python-release-wheels steps: diff --git a/AnyBuildLogs/latest.txt b/AnyBuildLogs/latest.txt new file mode 100644 index 000000000..38b4a947f --- /dev/null +++ b/AnyBuildLogs/latest.txt @@ -0,0 +1 @@ +20231019-111207-d314f8bf \ No newline at end of file diff --git a/CMakeLists.txt b/CMakeLists.txt index 89530f818..3d3d2b860 100644 --- a/CMakeLists.txt +++ b/CMakeLists.txt @@ -14,6 +14,7 @@ # such behavior. # Contact for this feature: gopalrs. + # Some variables like MSVC are defined only after project(), so put that first. cmake_minimum_required(VERSION 3.15) project(diskann) @@ -52,6 +53,9 @@ endif() include_directories(${PROJECT_SOURCE_DIR}/include) +if(NOT PYBIND) + set(DISKANN_RELEASE_UNUSED_TCMALLOC_MEMORY_AT_CHECKPOINTS ON) +endif() # It's necessary to include tcmalloc headers only if calling into MallocExtension interface. # For using tcmalloc in DiskANN tools, it's enough to just link with tcmalloc. if (DISKANN_RELEASE_UNUSED_TCMALLOC_MEMORY_AT_CHECKPOINTS) @@ -92,7 +96,9 @@ if (MSVC) set(Boost_USE_STATIC_LIBS ON) endif() -find_package(Boost COMPONENTS program_options) +if(NOT MSVC) + find_package(Boost COMPONENTS program_options) +endif() # For Windows, fall back to nuget version if find_package didn't find it. if (MSVC AND NOT Boost_FOUND) @@ -219,13 +225,13 @@ if (MSVC) # Tell CMake how to build the tcmalloc linker library from the submodule. 
add_custom_target(build_libtcmalloc_minimal DEPENDS ${TCMALLOC_LINK_LIBRARY}) add_custom_command(OUTPUT ${TCMALLOC_LINK_LIBRARY} - COMMAND ${CMAKE_VS_MSBUILD_COMMAND} gperftools.sln /m /nologo - /t:libtcmalloc_minimal /p:Configuration="Release-Patch" - /property:Platform="x64" - /p:PlatformToolset=v${MSVC_TOOLSET_VERSION} - /p:WindowsTargetPlatformVersion=${CMAKE_VS_WINDOWS_TARGET_PLATFORM_VERSION} - WORKING_DIRECTORY ${PROJECT_SOURCE_DIR}/gperftools) - + COMMAND ${CMAKE_VS_MSBUILD_COMMAND} gperftools.sln /m /nologo + /t:libtcmalloc_minimal /p:Configuration="Release-Patch" + /property:Platform="x64" + /p:PlatformToolset=v${MSVC_TOOLSET_VERSION} + /p:WindowsTargetPlatformVersion=${CMAKE_VS_WINDOWS_TARGET_PLATFORM_VERSION} + WORKING_DIRECTORY ${PROJECT_SOURCE_DIR}/gperftools) + add_library(libtcmalloc_minimal_for_exe STATIC IMPORTED) add_library(libtcmalloc_minimal_for_dll STATIC IMPORTED) diff --git a/README.md b/README.md index 2922c16c1..a20a1d671 100644 --- a/README.md +++ b/README.md @@ -1,13 +1,15 @@ # DiskANN -[![DiskANN Paper](https://img.shields.io/badge/Paper-NeurIPS%3A_DiskANN-blue)](https://papers.nips.cc/paper/9527-rand-nsg-fast-accurate-billion-point-nearest-neighbor-search-on-a-single-node.pdf) -[![DiskANN Paper](https://img.shields.io/badge/Paper-Arxiv%3A_Fresh--DiskANN-blue)](https://arxiv.org/abs/2105.09613) -[![DiskANN Paper](https://img.shields.io/badge/Paper-Filtered--DiskANN-blue)](https://harsha-simhadri.org/pubs/Filtered-DiskANN23.pdf) [![DiskANN Main](https://github.com/microsoft/DiskANN/actions/workflows/push-test.yml/badge.svg?branch=main)](https://github.com/microsoft/DiskANN/actions/workflows/push-test.yml) [![PyPI version](https://img.shields.io/pypi/v/diskannpy.svg)](https://pypi.org/project/diskannpy/) [![Downloads shield](https://pepy.tech/badge/diskannpy)](https://pepy.tech/project/diskannpy) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) +[![DiskANN 
Paper](https://img.shields.io/badge/Paper-NeurIPS%3A_DiskANN-blue)](https://papers.nips.cc/paper/9527-rand-nsg-fast-accurate-billion-point-nearest-neighbor-search-on-a-single-node.pdf) +[![DiskANN Paper](https://img.shields.io/badge/Paper-Arxiv%3A_Fresh--DiskANN-blue)](https://arxiv.org/abs/2105.09613) +[![DiskANN Paper](https://img.shields.io/badge/Paper-Filtered--DiskANN-blue)](https://harsha-simhadri.org/pubs/Filtered-DiskANN23.pdf) + + DiskANN is a suite of scalable, accurate and cost-effective approximate nearest neighbor search algorithms for large-scale vector search that support real-time changes and simple filters. This code is based on ideas from the [DiskANN](https://papers.nips.cc/paper/9527-rand-nsg-fast-accurate-billion-point-nearest-neighbor-search-on-a-single-node.pdf), [Fresh-DiskANN](https://arxiv.org/abs/2105.09613) and the [Filtered-DiskANN](https://harsha-simhadri.org/pubs/Filtered-DiskANN23.pdf) papers with further improvements. This code forked off from [code for NSG](https://github.com/ZJULearning/nsg) algorithm. 
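Several of the CI additions in this patch exercise the new `--dist_fn cosine` path on *unnormalized* data (`rand_float_10D_10K_unnorm.bin`). A minimal pure-Python sketch, not DiskANN code, of why that matters: on unit-norm vectors, squared L2 distance and cosine distance are equivalent up to a factor of 2, so only unnormalized inputs meaningfully distinguish the two metrics.

```python
# Illustrative sketch (not DiskANN's implementation) of the two metrics
# selected by --dist_fn l2 and the newly added --dist_fn cosine.
import math

def l2_sq(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine_distance(a, b):
    """1 - cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)

# For unit-norm a, b: ||a - b||^2 = 2 * (1 - cos(a, b)).
a = [1.0, 0.0]
b = [math.cos(math.pi / 3), math.sin(math.pi / 3)]  # unit vector at 60 degrees
print(round(cosine_distance(a, b), 6))  # 0.5
print(round(l2_sq(a, b), 6))            # 1.0, i.e. 2 * 0.5
```

This identity is why an L2-only index can serve cosine queries when data is pre-normalized; the patch instead plumbs cosine through as a first-class metric so unnormalized data works directly.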
@@ -105,7 +107,7 @@ Please cite this software in your work as: author = {Simhadri, Harsha Vardhan and Krishnaswamy, Ravishankar and Srinivasa, Gopal and Subramanya, Suhas Jayaram and Antonijevic, Andrija and Pryce, Dax and Kaczynski, David and Williams, Shane and Gollapudi, Siddarth and Sivashankar, Varun and Karia, Neel and Singh, Aditi and Jaiswal, Shikhar and Mahapatro, Neelam and Adams, Philip and Tower, Bryan and Patel, Yash}}, title = {{DiskANN: Graph-structured Indices for Scalable, Fast, Fresh and Filtered Approximate Nearest Neighbor Search}}, url = {https://github.com/Microsoft/DiskANN}, - version = {0.6.0}, + version = {0.6.1}, year = {2023} } ``` diff --git a/apps/build_disk_index.cpp b/apps/build_disk_index.cpp index b617a5f4a..f48b61726 100644 --- a/apps/build_disk_index.cpp +++ b/apps/build_disk_index.cpp @@ -107,6 +107,8 @@ int main(int argc, char **argv) metric = diskann::Metric::L2; else if (dist_fn == std::string("mips")) metric = diskann::Metric::INNER_PRODUCT; + else if (dist_fn == std::string("cosine")) + metric = diskann::Metric::COSINE; else { std::cout << "Error. 
Only l2 and mips distance functions are supported" << std::endl; diff --git a/apps/build_memory_index.cpp b/apps/build_memory_index.cpp index 92b269f4f..544e42dee 100644 --- a/apps/build_memory_index.cpp +++ b/apps/build_memory_index.cpp @@ -22,50 +22,6 @@ namespace po = boost::program_options; -template -int build_in_memory_index(const diskann::Metric &metric, const std::string &data_path, const uint32_t R, - const uint32_t L, const float alpha, const std::string &save_path, const uint32_t num_threads, - const bool use_pq_build, const size_t num_pq_bytes, const bool use_opq, - const std::string &label_file, const std::string &universal_label, const uint32_t Lf) -{ - diskann::IndexWriteParameters paras = diskann::IndexWriteParametersBuilder(L, R) - .with_filter_list_size(Lf) - .with_alpha(alpha) - .with_saturate_graph(false) - .with_num_threads(num_threads) - .build(); - std::string labels_file_to_use = save_path + "_label_formatted.txt"; - std::string mem_labels_int_map_file = save_path + "_labels_map.txt"; - - size_t data_num, data_dim; - diskann::get_bin_metadata(data_path, data_num, data_dim); - - diskann::Index index(metric, data_dim, data_num, false, false, false, use_pq_build, num_pq_bytes, - use_opq); - auto s = std::chrono::high_resolution_clock::now(); - if (label_file == "") - { - index.build(data_path.c_str(), data_num, paras); - } - else - { - convert_labels_string_to_int(label_file, labels_file_to_use, mem_labels_int_map_file, universal_label); - if (universal_label != "") - { - LabelT unv_label_as_num = 0; - index.set_universal_label(unv_label_as_num); - } - index.build_filtered_index(data_path.c_str(), labels_file_to_use, data_num, paras); - } - std::chrono::duration diff = std::chrono::high_resolution_clock::now() - s; - - std::cout << "Indexing time: " << diff.count() << "\n"; - index.save(save_path.c_str()); - if (label_file != "") - std::remove(labels_file_to_use.c_str()); - return 0; -} - int main(int argc, char **argv) { std::string data_type, 
dist_fn, data_path, index_path_prefix, label_file, universal_label, label_type; @@ -164,35 +120,37 @@ int main(int argc, char **argv) size_t data_num, data_dim; diskann::get_bin_metadata(data_path, data_num, data_dim); + auto index_build_params = diskann::IndexWriteParametersBuilder(L, R) + .with_filter_list_size(Lf) + .with_alpha(alpha) + .with_saturate_graph(false) + .with_num_threads(num_threads) + .build(); + + auto filter_params = diskann::IndexFilterParamsBuilder() + .with_universal_label(universal_label) + .with_label_file(label_file) + .with_save_path_prefix(index_path_prefix) + .build(); auto config = diskann::IndexConfigBuilder() .with_metric(metric) .with_dimension(data_dim) .with_max_points(data_num) - .with_data_load_store_strategy(diskann::MEMORY) + .with_data_load_store_strategy(diskann::DataStoreStrategy::MEMORY) + .with_graph_load_store_strategy(diskann::GraphStoreStrategy::MEMORY) .with_data_type(data_type) .with_label_type(label_type) .is_dynamic_index(false) + .with_index_write_params(index_build_params) .is_enable_tags(false) .is_use_opq(use_opq) .is_pq_dist_build(use_pq_build) .with_num_pq_chunks(build_PQ_bytes) .build(); - auto index_build_params = diskann::IndexWriteParametersBuilder(L, R) - .with_filter_list_size(Lf) - .with_alpha(alpha) - .with_saturate_graph(false) - .with_num_threads(num_threads) - .build(); - - auto build_params = diskann::IndexBuildParamsBuilder(index_build_params) - .with_universal_label(universal_label) - .with_label_file(label_file) - .with_save_path_prefix(index_path_prefix) - .build(); auto index_factory = diskann::IndexFactory(config); auto index = index_factory.create_instance(); - index->build(data_path, data_num, build_params); + index->build(data_path, data_num, filter_params); index->save(index_path_prefix.c_str()); index.reset(); return 0; diff --git a/apps/build_stitched_index.cpp b/apps/build_stitched_index.cpp index 80481f8b0..60e38c1be 100644 --- a/apps/build_stitched_index.cpp +++ 
b/apps/build_stitched_index.cpp @@ -285,7 +285,9 @@ void prune_and_save(path final_index_path_prefix, path full_index_path_prefix, p auto pruning_index_timer = std::chrono::high_resolution_clock::now(); diskann::get_bin_metadata(input_data_path, number_of_label_points, dimension); - diskann::Index index(diskann::Metric::L2, dimension, number_of_label_points, false, false); + + diskann::Index index(diskann::Metric::L2, dimension, number_of_label_points, nullptr, nullptr, 0, false, false, + false, false, 0, false); // not searching this index, set search_l to 0 index.load(full_index_path_prefix.c_str(), num_threads, 1); diff --git a/apps/search_disk_index.cpp b/apps/search_disk_index.cpp index b46b37aef..7e2a7ac6d 100644 --- a/apps/search_disk_index.cpp +++ b/apps/search_disk_index.cpp @@ -118,13 +118,13 @@ int search_disk_index(diskann::Metric &metric, const std::string &index_path_pre { return res; } - // cache bfs levels + std::vector node_list; - diskann::cout << "Caching " << num_nodes_to_cache << " BFS nodes around medoid(s)" << std::endl; - //_pFlashIndex->cache_bfs_levels(num_nodes_to_cache, node_list); - if (num_nodes_to_cache > 0) - _pFlashIndex->generate_cache_list_from_sample_queries(warmup_query_file, 15, 6, num_nodes_to_cache, num_threads, - node_list); + diskann::cout << "Caching " << num_nodes_to_cache << " nodes around medoid(s)" << std::endl; + _pFlashIndex->cache_bfs_levels(num_nodes_to_cache, node_list); + // if (num_nodes_to_cache > 0) + // _pFlashIndex->generate_cache_list_from_sample_queries(warmup_query_file, 15, 6, num_nodes_to_cache, + // num_threads, node_list); _pFlashIndex->load_cache_list(node_list); node_list.clear(); node_list.shrink_to_fit(); diff --git a/apps/search_memory_index.cpp b/apps/search_memory_index.cpp index 44817242c..1a9acc285 100644 --- a/apps/search_memory_index.cpp +++ b/apps/search_memory_index.cpp @@ -74,7 +74,8 @@ int search_memory_index(diskann::Metric &metric, const std::string &index_path, .with_metric(metric) 
.with_dimension(query_dim) .with_max_points(0) - .with_data_load_store_strategy(diskann::MEMORY) + .with_data_load_store_strategy(diskann::DataStoreStrategy::MEMORY) + .with_graph_load_store_strategy(diskann::GraphStoreStrategy::MEMORY) .with_data_type(diskann_type_to_name()) .with_label_type(diskann_type_to_name()) .with_tag_type(diskann_type_to_name()) @@ -130,7 +131,7 @@ int search_memory_index(diskann::Metric &metric, const std::string &index_path, std::vector> query_result_dists(Lvec.size()); std::vector latency_stats(query_num, 0); std::vector cmp_stats; - if (not tags) + if (not tags || filtered_search) { cmp_stats = std::vector(query_num, 0); } @@ -162,7 +163,7 @@ int search_memory_index(diskann::Metric &metric, const std::string &index_path, for (int64_t i = 0; i < (int64_t)query_num; i++) { auto qs = std::chrono::high_resolution_clock::now(); - if (filtered_search) + if (filtered_search && !tags) { std::string raw_filter = query_filters.size() == 1 ? query_filters[0] : query_filters[i]; @@ -178,8 +179,19 @@ int search_memory_index(diskann::Metric &metric, const std::string &index_path, } else if (tags) { - index->search_with_tags(query + i * query_aligned_dim, recall_at, L, - query_result_tags.data() + i * recall_at, nullptr, res); + if (!filtered_search) + { + index->search_with_tags(query + i * query_aligned_dim, recall_at, L, + query_result_tags.data() + i * recall_at, nullptr, res); + } + else + { + std::string raw_filter = query_filters.size() == 1 ? 
query_filters[0] : query_filters[i]; + + index->search_with_tags(query + i * query_aligned_dim, recall_at, L, + query_result_tags.data() + i * recall_at, nullptr, res, true, raw_filter); + } + for (int64_t r = 0; r < (int64_t)recall_at; r++) { query_result_ids[test_id][recall_at * i + r] = query_result_tags[recall_at * i + r]; @@ -220,7 +232,7 @@ int search_memory_index(diskann::Metric &metric, const std::string &index_path, float avg_cmps = (float)std::accumulate(cmp_stats.begin(), cmp_stats.end(), 0) / (float)query_num; - if (tags) + if (tags && !filtered_search) { std::cout << std::setw(4) << L << std::setw(12) << displayed_qps << std::setw(20) << (float)mean_latency << std::setw(15) << (float)latency_stats[(uint64_t)(0.999 * query_num)]; diff --git a/apps/test_insert_deletes_consolidate.cpp b/apps/test_insert_deletes_consolidate.cpp index 700f4d7b6..97aed1864 100644 --- a/apps/test_insert_deletes_consolidate.cpp +++ b/apps/test_insert_deletes_consolidate.cpp @@ -11,6 +11,7 @@ #include #include "utils.h" +#include "filter_utils.h" #include "program_options_utils.hpp" #include "index_factory.h" @@ -91,16 +92,23 @@ std::string get_save_filename(const std::string &save_path, size_t points_to_ski return final_path; } -template +template void insert_till_next_checkpoint(diskann::AbstractIndex &index, size_t start, size_t end, int32_t thread_count, T *data, - size_t aligned_dim) + size_t aligned_dim, std::vector> &location_to_labels) { diskann::Timer insert_timer; - #pragma omp parallel for num_threads(thread_count) schedule(dynamic) for (int64_t j = start; j < (int64_t)end; j++) { - index.insert_point(&data[(j - start) * aligned_dim], 1 + static_cast(j)); + if (!location_to_labels.empty()) + { + index.insert_point(&data[(j - start) * aligned_dim], 1 + static_cast(j), + location_to_labels[j - start]); + } + else + { + index.insert_point(&data[(j - start) * aligned_dim], 1 + static_cast(j)); + } } const double elapsedSeconds = insert_timer.elapsed() / 1000000.0; 
std::cout << "Insertion time " << elapsedSeconds << " seconds (" << (end - start) / elapsedSeconds @@ -141,35 +149,50 @@ void build_incremental_index(const std::string &data_path, diskann::IndexWritePa size_t max_points_to_insert, size_t beginning_index_size, float start_point_norm, uint32_t num_start_pts, size_t points_per_checkpoint, size_t checkpoints_per_snapshot, const std::string &save_path, size_t points_to_delete_from_beginning, - size_t start_deletes_after, bool concurrent) + size_t start_deletes_after, bool concurrent, const std::string &label_file, + const std::string &universal_label) { size_t dim, aligned_dim; size_t num_points; diskann::get_bin_metadata(data_path, num_points, dim); aligned_dim = ROUND_UP(dim, 8); + bool has_labels = label_file != ""; + using TagT = uint32_t; + using LabelT = uint32_t; + + size_t current_point_offset = points_to_skip; + const size_t last_point_threshold = points_to_skip + max_points_to_insert; bool enable_tags = true; using TagT = uint32_t; - auto data_type = diskann_type_to_name(); - auto tag_type = diskann_type_to_name(); + auto index_search_params = diskann::IndexSearchParams(params.search_list_size, params.num_threads); diskann::IndexConfig index_config = diskann::IndexConfigBuilder() .with_metric(diskann::L2) .with_dimension(dim) .with_max_points(max_points_to_insert) .is_dynamic_index(true) .with_index_write_params(params) - .with_search_threads(params.num_threads) - .with_initial_search_list_size(params.search_list_size) - .with_data_type(data_type) - .with_tag_type(tag_type) - .with_data_load_store_strategy(diskann::MEMORY) + .with_index_search_params(index_search_params) + .with_data_type(diskann_type_to_name()) + .with_tag_type(diskann_type_to_name()) + .with_label_type(diskann_type_to_name()) + .with_data_load_store_strategy(diskann::DataStoreStrategy::MEMORY) + .with_graph_load_store_strategy(diskann::GraphStoreStrategy::MEMORY) .is_enable_tags(enable_tags) + .is_filtered(has_labels) + 
.with_num_frozen_pts(num_start_pts) .is_concurrent_consolidate(concurrent) .build(); diskann::IndexFactory index_factory = diskann::IndexFactory(index_config); auto index = index_factory.create_instance(); + if (universal_label != "") + { + LabelT u_label = 0; + index->set_universal_label(u_label); + } + if (points_to_skip > num_points) { throw diskann::ANNException("Asked to skip more points than in data file", -1, __FUNCSIG__, __FILE__, __LINE__); @@ -187,9 +210,6 @@ void build_incremental_index(const std::string &data_path, diskann::IndexWritePa << " points since the data file has only that many" << std::endl; } - size_t current_point_offset = points_to_skip; - const size_t last_point_threshold = points_to_skip + max_points_to_insert; - if (beginning_index_size > max_points_to_insert) { beginning_index_size = max_points_to_insert; @@ -215,7 +235,7 @@ void build_incremental_index(const std::string &data_path, diskann::IndexWritePa if (beginning_index_size > 0) { - index->build(data, beginning_index_size, params, tags); + index->build(data, beginning_index_size, tags); } else { @@ -226,7 +246,7 @@ void build_incremental_index(const std::string &data_path, diskann::IndexWritePa std::cout << "Initial non-incremental index build time for " << beginning_index_size << " points took " << elapsedSeconds << " seconds (" << beginning_index_size / elapsedSeconds << " points/second)\n "; - current_point_offset = beginning_index_size; + current_point_offset += beginning_index_size; if (points_to_delete_from_beginning > max_points_to_insert) { @@ -235,8 +255,21 @@ void build_incremental_index(const std::string &data_path, diskann::IndexWritePa << " points since the data file has only that many" << std::endl; } + std::vector> location_to_labels; if (concurrent) { + // handle labels + const auto save_path_inc = get_save_filename(save_path + ".after-concurrent-delete-", points_to_skip, + points_to_delete_from_beginning, last_point_threshold); + std::string labels_file_to_use = 
save_path_inc + "_label_formatted.txt"; + std::string mem_labels_int_map_file = save_path_inc + "_labels_map.txt"; + if (has_labels) + { + convert_labels_string_to_int(label_file, labels_file_to_use, mem_labels_int_map_file, universal_label); + auto parse_result = diskann::parse_formatted_label_file(labels_file_to_use); + location_to_labels = std::get<0>(parse_result); + } + int32_t sub_threads = (params.num_threads + 1) / 2; bool delete_launched = false; std::future delete_task; @@ -251,7 +284,8 @@ void build_incremental_index(const std::string &data_path, diskann::IndexWritePa auto insert_task = std::async(std::launch::async, [&]() { load_aligned_bin_part(data_path, data, start, end - start); - insert_till_next_checkpoint(*index, start, end, sub_threads, data, aligned_dim); + insert_till_next_checkpoint(*index, start, end, sub_threads, data, aligned_dim, + location_to_labels); }); insert_task.wait(); @@ -271,12 +305,21 @@ void build_incremental_index(const std::string &data_path, diskann::IndexWritePa delete_task.wait(); std::cout << "Time Elapsed " << timer.elapsed() / 1000 << "ms\n"; - const auto save_path_inc = get_save_filename(save_path + ".after-concurrent-delete-", points_to_skip, - points_to_delete_from_beginning, last_point_threshold); index->save(save_path_inc.c_str(), true); } else { + const auto save_path_inc = get_save_filename(save_path + ".after-delete-", points_to_skip, + points_to_delete_from_beginning, last_point_threshold); + std::string labels_file_to_use = save_path_inc + "_label_formatted.txt"; + std::string mem_labels_int_map_file = save_path_inc + "_labels_map.txt"; + if (has_labels) + { + convert_labels_string_to_int(label_file, labels_file_to_use, mem_labels_int_map_file, universal_label); + auto parse_result = diskann::parse_formatted_label_file(labels_file_to_use); + location_to_labels = std::get<0>(parse_result); + } + size_t last_snapshot_points_threshold = 0; size_t num_checkpoints_till_snapshot = checkpoints_per_snapshot; @@ -287,7 
+330,8 @@ void build_incremental_index(const std::string &data_path, diskann::IndexWritePa std::cout << std::endl << "Inserting from " << start << " to " << end << std::endl; load_aligned_bin_part(data_path, data, start, end - start); - insert_till_next_checkpoint(*index, start, end, (int32_t)params.num_threads, data, aligned_dim); + insert_till_next_checkpoint(*index, start, end, (int32_t)params.num_threads, data, + aligned_dim, location_to_labels); if (checkpoints_per_snapshot > 0 && --num_checkpoints_till_snapshot == 0) { @@ -320,8 +364,7 @@ void build_incremental_index(const std::string &data_path, diskann::IndexWritePa { delete_from_beginning(*index, params, points_to_skip, points_to_delete_from_beginning); } - const auto save_path_inc = get_save_filename(save_path + ".after-delete-", points_to_skip, - points_to_delete_from_beginning, last_point_threshold); + index->save(save_path_inc.c_str(), true); } @@ -337,6 +380,10 @@ int main(int argc, char **argv) points_to_delete_from_beginning, start_deletes_after; bool concurrent; + // label options + std::string label_file, label_type, universal_label; + std::uint32_t Lf, unique_labels_supported; + po::options_description desc{program_options_utils::make_program_description("test_insert_deletes_consolidate", "Test insert deletes & consolidate")}; try @@ -385,6 +432,24 @@ int main(int argc, char **argv) po::value(&start_deletes_after)->default_value(0), ""); optional_configs.add_options()("start_point_norm", po::value(&start_point_norm)->default_value(0), "Set the start point to a random point on a sphere of this radius"); + + // optional params for filters + optional_configs.add_options()("label_file", po::value(&label_file)->default_value(""), + "Input label file in txt format for Filtered Index search. 
" + "The file should contain comma separated filters for each node " + "with each line corresponding to a graph node"); + optional_configs.add_options()("universal_label", po::value(&universal_label)->default_value(""), + "Universal label, if using it, only in conjunction with labels_file"); + optional_configs.add_options()("FilteredLbuild,Lf", po::value(&Lf)->default_value(0), + "Build complexity for filtered points, higher value " + "results in better graphs"); + optional_configs.add_options()("label_type", po::value(&label_type)->default_value("uint"), + "Storage type of Labels , default value is uint which " + "will consume memory 4 bytes per filter"); + optional_configs.add_options()("unique_labels_supported", + po::value(&unique_labels_supported)->default_value(0), + "Number of unique labels supported by the dynamic index."); + optional_configs.add_options()( "num_start_points", po::value(&num_start_pts)->default_value(diskann::defaults::NUM_FROZEN_POINTS_DYNAMIC), @@ -418,30 +483,41 @@ int main(int argc, char **argv) return -1; } + bool has_labels = false; + if (!label_file.empty() || label_file != "") + { + has_labels = true; + } + + if (num_start_pts < unique_labels_supported) + { + num_start_pts = unique_labels_supported; + } + try { diskann::IndexWriteParameters params = diskann::IndexWriteParametersBuilder(L, R) .with_max_occlusion_size(500) .with_alpha(alpha) .with_num_threads(num_threads) - .with_num_frozen_points(num_start_pts) + .with_filter_list_size(Lf) .build(); if (data_type == std::string("int8")) - build_incremental_index(data_path, params, points_to_skip, max_points_to_insert, - beginning_index_size, start_point_norm, num_start_pts, - points_per_checkpoint, checkpoints_per_snapshot, index_path_prefix, - points_to_delete_from_beginning, start_deletes_after, concurrent); + build_incremental_index( + data_path, params, points_to_skip, max_points_to_insert, beginning_index_size, start_point_norm, + num_start_pts, points_per_checkpoint, 
checkpoints_per_snapshot, index_path_prefix, + points_to_delete_from_beginning, start_deletes_after, concurrent, label_file, universal_label); else if (data_type == std::string("uint8")) - build_incremental_index(data_path, params, points_to_skip, max_points_to_insert, - beginning_index_size, start_point_norm, num_start_pts, - points_per_checkpoint, checkpoints_per_snapshot, index_path_prefix, - points_to_delete_from_beginning, start_deletes_after, concurrent); + build_incremental_index( + data_path, params, points_to_skip, max_points_to_insert, beginning_index_size, start_point_norm, + num_start_pts, points_per_checkpoint, checkpoints_per_snapshot, index_path_prefix, + points_to_delete_from_beginning, start_deletes_after, concurrent, label_file, universal_label); else if (data_type == std::string("float")) build_incremental_index(data_path, params, points_to_skip, max_points_to_insert, beginning_index_size, start_point_norm, num_start_pts, points_per_checkpoint, checkpoints_per_snapshot, index_path_prefix, points_to_delete_from_beginning, - start_deletes_after, concurrent); + start_deletes_after, concurrent, label_file, universal_label); else std::cout << "Unsupported type. 
Use float/int8/uint8" << std::endl; } diff --git a/apps/test_streaming_scenario.cpp b/apps/test_streaming_scenario.cpp index 55e4e61cf..5a43a69f3 100644 --- a/apps/test_streaming_scenario.cpp +++ b/apps/test_streaming_scenario.cpp @@ -13,6 +13,7 @@ #include #include "utils.h" +#include "filter_utils.h" #include "program_options_utils.hpp" #ifndef _WINDOWS @@ -84,9 +85,9 @@ std::string get_save_filename(const std::string &save_path, size_t active_window return final_path; } -template +template void insert_next_batch(diskann::AbstractIndex &index, size_t start, size_t end, size_t insert_threads, T *data, - size_t aligned_dim) + size_t aligned_dim, std::vector> &pts_to_labels) { try { @@ -97,7 +98,18 @@ void insert_next_batch(diskann::AbstractIndex &index, size_t start, size_t end, #pragma omp parallel for num_threads((int32_t)insert_threads) schedule(dynamic) reduction(+ : num_failed) for (int64_t j = start; j < (int64_t)end; j++) { - if (index.insert_point(&data[(j - start) * aligned_dim], 1 + static_cast(j)) != 0) + int insert_result = -1; + if (pts_to_labels.size() > 0) + { + insert_result = index.insert_point(&data[(j - start) * aligned_dim], 1 + static_cast(j), + pts_to_labels[j - start]); + } + else + { + insert_result = index.insert_point(&data[(j - start) * aligned_dim], 1 + static_cast(j)); + } + + if (insert_result != 0) { std::cerr << "Insert failed " << j << std::endl; num_failed++; @@ -113,6 +125,7 @@ void insert_next_batch(diskann::AbstractIndex &index, size_t start, size_t end, catch (std::system_error &e) { std::cout << "Exiting after catching exception in insertion task: " << e.what() << std::endl; + exit(-1); } } @@ -167,40 +180,54 @@ void delete_and_consolidate(diskann::AbstractIndex &index, diskann::IndexWritePa } } -template +template void build_incremental_index(const std::string &data_path, const uint32_t L, const uint32_t R, const float alpha, const uint32_t insert_threads, const uint32_t consolidate_threads, size_t max_points_to_insert, 
size_t active_window, size_t consolidate_interval, - const float start_point_norm, uint32_t num_start_pts, const std::string &save_path) + const float start_point_norm, uint32_t num_start_pts, const std::string &save_path, + const std::string &label_file, const std::string &universal_label, const uint32_t Lf) { const uint32_t C = 500; const bool saturate_graph = false; - using TagT = uint32_t; - using LabelT = uint32_t; + bool has_labels = label_file != ""; diskann::IndexWriteParameters params = diskann::IndexWriteParametersBuilder(L, R) .with_max_occlusion_size(C) .with_alpha(alpha) .with_saturate_graph(saturate_graph) .with_num_threads(insert_threads) - .with_num_frozen_points(num_start_pts) + .with_filter_list_size(Lf) .build(); + auto index_search_params = diskann::IndexSearchParams(L, insert_threads); diskann::IndexWriteParameters delete_params = diskann::IndexWriteParametersBuilder(L, R) .with_max_occlusion_size(C) .with_alpha(alpha) .with_saturate_graph(saturate_graph) .with_num_threads(consolidate_threads) + .with_filter_list_size(Lf) .build(); size_t dim, aligned_dim; size_t num_points; + std::vector> pts_to_labels; + + const auto save_path_inc = + get_save_filename(save_path + ".after-streaming-", active_window, consolidate_interval, max_points_to_insert); + std::string labels_file_to_use = save_path_inc + "_label_formatted.txt"; + std::string mem_labels_int_map_file = save_path_inc + "_labels_map.txt"; + if (has_labels) + { + convert_labels_string_to_int(label_file, labels_file_to_use, mem_labels_int_map_file, universal_label); + auto parse_result = diskann::parse_formatted_label_file(labels_file_to_use); + pts_to_labels = std::get<0>(parse_result); + } + diskann::get_bin_metadata(data_path, num_points, dim); diskann::cout << "metadata: file " << data_path << " has " << num_points << " points in " << dim << " dims" << std::endl; aligned_dim = ROUND_UP(dim, 8); - auto index_config = diskann::IndexConfigBuilder() .with_metric(diskann::L2) 
.with_dimension(dim) @@ -208,20 +235,28 @@ void build_incremental_index(const std::string &data_path, const uint32_t L, con .is_dynamic_index(true) .is_enable_tags(true) .is_use_opq(false) + .is_filtered(has_labels) .with_num_pq_chunks(0) .is_pq_dist_build(false) - .with_search_threads(insert_threads) - .with_initial_search_list_size(L) + .with_num_frozen_pts(num_start_pts) .with_tag_type(diskann_type_to_name()) .with_label_type(diskann_type_to_name()) .with_data_type(diskann_type_to_name()) .with_index_write_params(params) - .with_data_load_store_strategy(diskann::MEMORY) + .with_index_search_params(index_search_params) + .with_data_load_store_strategy(diskann::DataStoreStrategy::MEMORY) + .with_graph_load_store_strategy(diskann::GraphStoreStrategy::MEMORY) .build(); diskann::IndexFactory index_factory = diskann::IndexFactory(index_config); auto index = index_factory.create_instance(); + if (universal_label != "") + { + LabelT u_label = 0; + index->set_universal_label(u_label); + } + if (max_points_to_insert == 0) { max_points_to_insert = num_points; @@ -255,7 +290,8 @@ void build_incremental_index(const std::string &data_path, const uint32_t L, con auto insert_task = std::async(std::launch::async, [&]() { load_aligned_bin_part(data_path, data, 0, active_window); - insert_next_batch(*index, (size_t)0, active_window, params.num_threads, data, aligned_dim); + insert_next_batch(*index, (size_t)0, active_window, params.num_threads, data, aligned_dim, + pts_to_labels); }); insert_task.wait(); @@ -265,7 +301,8 @@ void build_incremental_index(const std::string &data_path, const uint32_t L, con auto end = std::min(start + consolidate_interval, max_points_to_insert); auto insert_task = std::async(std::launch::async, [&]() { load_aligned_bin_part(data_path, data, start, end - start); - insert_next_batch(*index, start, end, params.num_threads, data, aligned_dim); + insert_next_batch(*index, start, end, params.num_threads, data, aligned_dim, + pts_to_labels); }); 
insert_task.wait(); @@ -285,8 +322,7 @@ void build_incremental_index(const std::string &data_path, const uint32_t L, con delete_tasks[delete_tasks.size() - 1].wait(); std::cout << "Time Elapsed " << timer.elapsed() / 1000 << "ms\n"; - const auto save_path_inc = - get_save_filename(save_path + ".after-streaming-", active_window, consolidate_interval, max_points_to_insert); + index->save(save_path_inc.c_str(), true); diskann::aligned_free(data); @@ -294,9 +330,8 @@ void build_incremental_index(const std::string &data_path, const uint32_t L, con int main(int argc, char **argv) { - std::string data_type, dist_fn, data_path, index_path_prefix; - uint32_t insert_threads, consolidate_threads; - uint32_t R, L, num_start_pts; + std::string data_type, dist_fn, data_path, index_path_prefix, label_file, universal_label, label_type; + uint32_t insert_threads, consolidate_threads, R, L, num_start_pts, Lf, unique_labels_supported; float alpha, start_point_norm; size_t max_points_to_insert, active_window, consolidate_interval; @@ -352,6 +387,22 @@ int main(int argc, char **argv) "Set the number of random start (frozen) points to use when " "inserting and searching"); + optional_configs.add_options()("label_file", po::value(&label_file)->default_value(""), + "Input label file in txt format for Filtered Index search. 
" + "The file should contain comma separated filters for each node " + "with each line corresponding to a graph node"); + optional_configs.add_options()("universal_label", po::value(&universal_label)->default_value(""), + "Universal label, if using it, only in conjunction with labels_file"); + optional_configs.add_options()("FilteredLbuild,Lf", po::value(&Lf)->default_value(0), + "Build complexity for filtered points, higher value " + "results in better graphs"); + optional_configs.add_options()("label_type", po::value(&label_type)->default_value("uint"), + "Storage type of Labels , default value is uint which " + "will consume memory 4 bytes per filter"); + optional_configs.add_options()("unique_labels_supported", + po::value(&unique_labels_supported)->default_value(0), + "Number of unique labels supported by the dynamic index."); + // Merge required and optional parameters desc.add(required_configs).add(optional_configs); @@ -363,13 +414,6 @@ int main(int argc, char **argv) return 0; } po::notify(vm); - if (start_point_norm == 0) - { - std::cout << "When beginning_index_size is 0, use a start point with " - "appropriate norm" - << std::endl; - return -1; - } } catch (const std::exception &ex) { @@ -377,22 +421,92 @@ int main(int argc, char **argv) return -1; } + // Validate arguments + if (start_point_norm == 0) + { + std::cout << "When beginning_index_size is 0, use a start point with " + "appropriate norm" + << std::endl; + return -1; + } + + if (label_type != std::string("ushort") && label_type != std::string("uint")) + { + std::cerr << "Invalid label type. Supported types are uint and ushort" << std::endl; + return -1; + } + + if (data_type != std::string("int8") && data_type != std::string("uint8") && data_type != std::string("float")) + { + std::cerr << "Invalid data type. Supported types are int8, uint8 and float" << std::endl; + return -1; + } + + // TODO: Are additional distance functions supported? 
+ if (dist_fn != std::string("l2") && dist_fn != std::string("mips")) + { + std::cerr << "Invalid distance function. Supported functions are l2 and mips" << std::endl; + return -1; + } + + if (num_start_pts < unique_labels_supported) + { + num_start_pts = unique_labels_supported; + } + try { - if (data_type == std::string("int8")) - build_incremental_index(data_path, L, R, alpha, insert_threads, consolidate_threads, - max_points_to_insert, active_window, consolidate_interval, start_point_norm, - num_start_pts, index_path_prefix); - else if (data_type == std::string("uint8")) - build_incremental_index(data_path, L, R, alpha, insert_threads, consolidate_threads, - max_points_to_insert, active_window, consolidate_interval, - start_point_norm, num_start_pts, index_path_prefix); + if (data_type == std::string("uint8")) + { + if (label_type == std::string("ushort")) + { + build_incremental_index( + data_path, L, R, alpha, insert_threads, consolidate_threads, max_points_to_insert, active_window, + consolidate_interval, start_point_norm, num_start_pts, index_path_prefix, label_file, + universal_label, Lf); + } + else if (label_type == std::string("uint")) + { + build_incremental_index( + data_path, L, R, alpha, insert_threads, consolidate_threads, max_points_to_insert, active_window, + consolidate_interval, start_point_norm, num_start_pts, index_path_prefix, label_file, + universal_label, Lf); + } + } + else if (data_type == std::string("int8")) + { + if (label_type == std::string("ushort")) + { + build_incremental_index( + data_path, L, R, alpha, insert_threads, consolidate_threads, max_points_to_insert, active_window, + consolidate_interval, start_point_norm, num_start_pts, index_path_prefix, label_file, + universal_label, Lf); + } + else if (label_type == std::string("uint")) + { + build_incremental_index( + data_path, L, R, alpha, insert_threads, consolidate_threads, max_points_to_insert, active_window, + consolidate_interval, start_point_norm, num_start_pts, 
index_path_prefix, label_file, + universal_label, Lf); + } + } else if (data_type == std::string("float")) - build_incremental_index(data_path, L, R, alpha, insert_threads, consolidate_threads, - max_points_to_insert, active_window, consolidate_interval, start_point_norm, - num_start_pts, index_path_prefix); - else - std::cout << "Unsupported type. Use float/int8/uint8" << std::endl; + { + if (label_type == std::string("ushort")) + { + build_incremental_index( + data_path, L, R, alpha, insert_threads, consolidate_threads, max_points_to_insert, active_window, + consolidate_interval, start_point_norm, num_start_pts, index_path_prefix, label_file, + universal_label, Lf); + } + else if (label_type == std::string("uint")) + { + build_incremental_index( + data_path, L, R, alpha, insert_threads, consolidate_threads, max_points_to_insert, active_window, + consolidate_interval, start_point_norm, num_start_pts, index_path_prefix, label_file, + universal_label, Lf); + } + } } catch (const std::exception &e) { diff --git a/apps/utils/compute_groundtruth.cpp b/apps/utils/compute_groundtruth.cpp index f33a26b84..da32fd7c6 100644 --- a/apps/utils/compute_groundtruth.cpp +++ b/apps/utils/compute_groundtruth.cpp @@ -499,7 +499,8 @@ int main(int argc, char **argv) desc.add_options()("help,h", "Print information on arguments"); desc.add_options()("data_type", po::value(&data_type)->required(), "data type "); - desc.add_options()("dist_fn", po::value(&dist_fn)->required(), "distance function "); + desc.add_options()("dist_fn", po::value(&dist_fn)->required(), + "distance function "); desc.add_options()("base_file", po::value(&base_file)->required(), "File containing the base vectors in binary format"); desc.add_options()("query_file", po::value(&query_file)->required(), diff --git a/apps/utils/compute_groundtruth_for_filters.cpp b/apps/utils/compute_groundtruth_for_filters.cpp index 5be7135e1..52e586475 100644 --- a/apps/utils/compute_groundtruth_for_filters.cpp +++ 
b/apps/utils/compute_groundtruth_for_filters.cpp @@ -415,11 +415,6 @@ inline void parse_label_file_into_vec(size_t &line_cnt, const std::string &map_f lbls.push_back(token); labels.insert(token); } - if (lbls.size() <= 0) - { - std::cout << "No label found"; - exit(-1); - } std::sort(lbls.begin(), lbls.end()); pts_to_labels.push_back(lbls); } diff --git a/apps/utils/count_bfs_levels.cpp b/apps/utils/count_bfs_levels.cpp index ddc4eaf0b..6dd2d6233 100644 --- a/apps/utils/count_bfs_levels.cpp +++ b/apps/utils/count_bfs_levels.cpp @@ -27,7 +27,8 @@ template void bfs_count(const std::string &index_path, uint32_t dat { using TagT = uint32_t; using LabelT = uint32_t; - diskann::Index index(diskann::Metric::L2, data_dims, 0, false, false); + diskann::Index index(diskann::Metric::L2, data_dims, 0, nullptr, nullptr, 0, false, false, false, + false, 0, false); std::cout << "Index class instantiated" << std::endl; index.load(index_path.c_str(), 1, 100); std::cout << "Index loaded" << std::endl; diff --git a/apps/utils/rand_data_gen.cpp b/apps/utils/rand_data_gen.cpp index a6f9305c8..e89ede800 100644 --- a/apps/utils/rand_data_gen.cpp +++ b/apps/utils/rand_data_gen.cpp @@ -11,23 +11,31 @@ namespace po = boost::program_options; -int block_write_float(std::ofstream &writer, size_t ndims, size_t npts, float norm) +int block_write_float(std::ofstream &writer, size_t ndims, size_t npts, bool normalization, float norm, + float rand_scale) { auto vec = new float[ndims]; std::random_device rd{}; std::mt19937 gen{rd()}; std::normal_distribution<> normal_rand{0, 1}; + std::uniform_real_distribution<> unif_dis(1.0, rand_scale); for (size_t i = 0; i < npts; i++) { float sum = 0; + float scale = 1.0f; + if (rand_scale > 1.0f) + scale = (float)unif_dis(gen); for (size_t d = 0; d < ndims; ++d) - vec[d] = (float)normal_rand(gen); - for (size_t d = 0; d < ndims; ++d) - sum += vec[d] * vec[d]; - for (size_t d = 0; d < ndims; ++d) - vec[d] = vec[d] * norm / std::sqrt(sum); + vec[d] = scale * 
(float)normal_rand(gen); + if (normalization) + { + for (size_t d = 0; d < ndims; ++d) + sum += vec[d] * vec[d]; + for (size_t d = 0; d < ndims; ++d) + vec[d] = vec[d] * norm / std::sqrt(sum); + } writer.write((char *)vec, ndims * sizeof(float)); } @@ -104,8 +112,8 @@ int main(int argc, char **argv) { std::string data_type, output_file; size_t ndims, npts; - float norm; - + float norm, rand_scaling; + bool normalization = false; try { po::options_description desc{"Arguments"}; @@ -117,7 +125,11 @@ int main(int argc, char **argv) "File name for saving the random vectors"); desc.add_options()("ndims,D", po::value(&ndims)->required(), "Dimensoinality of the vector"); desc.add_options()("npts,N", po::value(&npts)->required(), "Number of vectors"); - desc.add_options()("norm", po::value(&norm)->required(), "Norm of the vectors"); + desc.add_options()("norm", po::value(&norm)->default_value(-1.0f), + "Norm of the vectors (if not specified, vectors are not normalized)"); + desc.add_options()("rand_scaling", po::value(&rand_scaling)->default_value(1.0f), + "Each vector will be scaled (if not explicitly normalized) by a factor randomly chosen from " + "[1, rand_scale]. Only applicable for floating point data"); po::variables_map vm; po::store(po::parse_command_line(argc, argv, desc), vm); if (vm.count("help")) @@ -139,9 +151,20 @@ int main(int argc, char **argv) return -1; } - if (norm <= 0.0) + if (norm > 0.0) + { + normalization = true; + } + + if (rand_scaling < 1.0) + { + std::cout << "We will only scale the vector norms randomly in [1, value], so value must be >= 1." << std::endl; + return -1; + } + + if ((rand_scaling > 1.0) && (normalization == true)) { - std::cerr << "Error: Norm must be a positive number" << std::endl; + std::cout << "Data cannot be normalized and randomly scaled at same time. Use one or the other." 
<< std::endl; return -1; } @@ -155,6 +178,11 @@ int main(int argc, char **argv) << std::endl; return -1; } + if (rand_scaling > 1.0) + { + std::cout << "Data scaling only supported for floating point data." << std::endl; + return -1; + } } try @@ -177,7 +205,7 @@ int main(int argc, char **argv) size_t cblk_size = std::min(npts - i * blk_size, blk_size); if (data_type == std::string("float")) { - ret = block_write_float(writer, ndims, cblk_size, norm); + ret = block_write_float(writer, ndims, cblk_size, normalization, norm, rand_scaling); } else if (data_type == std::string("int8")) { diff --git a/include/abstract_data_store.h b/include/abstract_data_store.h index d858c8eef..165ada696 100644 --- a/include/abstract_data_store.h +++ b/include/abstract_data_store.h @@ -13,6 +13,8 @@ namespace diskann { +template class AbstractScratch; + template class AbstractDataStore { public: @@ -65,7 +67,7 @@ template class AbstractDataStore // streaming setting virtual void get_vector(const location_t i, data_t *dest) const = 0; virtual void set_vector(const location_t i, const data_t *const vector) = 0; - virtual void prefetch_vector(const location_t loc) = 0; + virtual void prefetch_vector(const location_t loc) const = 0; // internal shuffle operations to move around vectors // will bulk-move all the vectors in [old_start_loc, old_start_loc + @@ -78,11 +80,18 @@ template class AbstractDataStore // num_points) to zero virtual void copy_vectors(const location_t from_loc, const location_t to_loc, const location_t num_points) = 0; - // metric specific operations - + // With the PQ Data Store PR, we have also changed iterate_to_fixed_point to NOT take the query + // from the scratch object. Therefore every data store has to implement preprocess_query which + // at the least will be to copy the query into the scratch object. So making this pure virtual. 
+ virtual void preprocess_query(const data_t *aligned_query, + AbstractScratch<data_t> *query_scratch = nullptr) const = 0; + // distance functions. virtual float get_distance(const data_t *query, const location_t loc) const = 0; virtual void get_distance(const data_t *query, const location_t *locations, const uint32_t location_count, - float *distances) const = 0; + float *distances, AbstractScratch<data_t> *scratch_space = nullptr) const = 0; + // Specific overload for index.cpp. + virtual void get_distance(const data_t *preprocessed_query, const std::vector<location_t> &ids, + std::vector<float> &distances, AbstractScratch<data_t> *scratch_space) const = 0; virtual float get_distance(const location_t loc1, const location_t loc2) const = 0; // stats of the data stored in store @@ -90,7 +99,10 @@ template <typename data_t> class AbstractDataStore // in the dataset virtual location_t calculate_medoid() const = 0; - virtual Distance<data_t> *get_dist_fn() = 0; + // REFACTOR PQ TODO: Each data store knows about its distance function, so this is + // redundant. However, we don't have an OptimizedDataStore yet, and to preserve code + // compatibility, we are exposing this function.
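Per the comment above, the minimal contract of `preprocess_query` is to copy the query into the scratch object (PQ-based stores additionally quantize it). A simplified standalone sketch of that minimal behaviour, using hypothetical stand-in types rather than the real `AbstractScratch`:

```cpp
#include <cstddef>
#include <vector>

// Simplified stand-in for the real scratch type; the real class keeps an
// aligned raw buffer rather than a std::vector.
struct QueryScratch
{
    std::vector<float> aligned_query;
};

// Minimal preprocess_query sketch: copy the query into scratch, which is
// the least every data store must do per the comment above.
inline void preprocess_query(const float *query, std::size_t dim, QueryScratch &scratch)
{
    scratch.aligned_query.assign(query, query + dim);
}
```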
+ virtual Distance<data_t> *get_dist_fn() const = 0; // search helpers // if the base data is aligned per the request of the metric, this will tell diff --git a/include/abstract_graph_store.h b/include/abstract_graph_store.h index 387c8f675..4d6906ca4 100644 --- a/include/abstract_graph_store.h +++ b/include/abstract_graph_store.h @@ -5,7 +5,6 @@ #include <string> #include <vector> - #include "types.h" namespace diskann @@ -14,20 +13,56 @@ namespace diskann class AbstractGraphStore { public: - AbstractGraphStore(const size_t max_pts) : _capacity(max_pts) + AbstractGraphStore(const size_t total_pts, const size_t reserve_graph_degree) + : _capacity(total_pts), _reserve_graph_degree(reserve_graph_degree) { } virtual ~AbstractGraphStore() = default; - virtual int load(const std::string &index_path_prefix) = 0; - virtual int store(const std::string &index_path_prefix) = 0; + // returns tuple of <nodes, width, medoid> + virtual std::tuple<uint32_t, uint32_t, size_t> load(const std::string &index_path_prefix, + const size_t num_points) = 0; + virtual int store(const std::string &index_path_prefix, const size_t num_points, const size_t num_fz_points, + const uint32_t start) = 0; + + // not synchronised, user should use lock when necessary.
+ virtual const std::vector &get_neighbours(const location_t i) const = 0; + virtual void add_neighbour(const location_t i, location_t neighbour_id) = 0; + virtual void clear_neighbours(const location_t i) = 0; + virtual void swap_neighbours(const location_t a, location_t b) = 0; + + virtual void set_neighbours(const location_t i, std::vector &neighbours) = 0; + + virtual size_t resize_graph(const size_t new_size) = 0; + virtual void clear_graph() = 0; + + virtual uint32_t get_max_observed_degree() = 0; + + // set during load + virtual size_t get_max_range_of_graph() = 0; + + // Total internal points _max_points + _num_frozen_points + size_t get_total_points() + { + return _capacity; + } - virtual void get_adj_list(const location_t i, std::vector &neighbors) = 0; - virtual void set_adj_list(const location_t i, std::vector &neighbors) = 0; + protected: + // Internal function, changes total points when resize_graph is called. + void set_total_points(size_t new_capacity) + { + _capacity = new_capacity; + } + + size_t get_reserve_graph_degree() + { + return _reserve_graph_degree; + } private: size_t _capacity; + size_t _reserve_graph_degree; }; -} // namespace diskann +} // namespace diskann \ No newline at end of file diff --git a/include/abstract_index.h b/include/abstract_index.h index 1a32bf8da..7c84a8ec9 100644 --- a/include/abstract_index.h +++ b/include/abstract_index.h @@ -42,11 +42,10 @@ class AbstractIndex virtual ~AbstractIndex() = default; virtual void build(const std::string &data_file, const size_t num_points_to_load, - IndexBuildParams &build_params) = 0; + IndexFilterParams &build_params) = 0; template - void build(const data_type *data, const size_t num_points_to_load, const IndexWriteParameters ¶meters, - const std::vector &tags); + void build(const data_type *data, const size_t num_points_to_load, const std::vector &tags); virtual void save(const char *filename, bool compact_before_save = false) = 0; @@ -63,7 +62,8 @@ class AbstractIndex // 
Initialize space for res_vectors before calling. template <typename data_type, typename tag_type> size_t search_with_tags(const data_type *query, const uint64_t K, const uint32_t L, tag_type *tags, - float *distances, std::vector<data_type *> &res_vectors); + float *distances, std::vector<data_type *> &res_vectors, bool use_filters = false, + const std::string filter_label = ""); // Added search overload that takes L as parameter, so that we // can customize L on a per-query basis without tampering with "Parameters" @@ -79,10 +79,17 @@ class AbstractIndex const size_t K, const uint32_t L, IndexType *indices, float *distances); + // insert point with labels; labels must be present for a filtered index + template <typename data_type, typename tag_type, typename label_type> + int insert_point(const data_type *point, const tag_type tag, const std::vector<label_type> &labels); + + // insert point for unfiltered index build; do not use with a filtered index template <typename data_type, typename tag_type> int insert_point(const data_type *point, const tag_type tag); + // delete point with tag; returns -1 if the point cannot be deleted template <typename tag_type> int lazy_delete(const tag_type &tag); + // batch delete tags; populates failed_tags with any tags that could not be deleted.
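The batch `lazy_delete` contract described in the comment above can be sketched in isolation. This is a hypothetical simplification (a plain tag set standing in for the index's tag-to-location map), not the real implementation:

```cpp
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical sketch of the batch lazy_delete contract: tags that cannot
// be deleted (here: tags not present in the index) are collected into
// failed_tags instead of aborting the whole batch.
inline void lazy_delete_batch(std::unordered_set<std::uint32_t> &index_tags,
                              const std::vector<std::uint32_t> &tags,
                              std::vector<std::uint32_t> &failed_tags)
{
    for (std::uint32_t tag : tags)
    {
        if (index_tags.erase(tag) == 0)
            failed_tags.push_back(tag);
    }
}
```

The caller inspects `failed_tags` afterwards; an empty vector means every requested tag was marked for deletion.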
template void lazy_delete(const std::vector &tags, std::vector &failed_tags); @@ -97,14 +104,19 @@ class AbstractIndex // memory should be allocated for vec before calling this function template int get_vector_by_tag(tag_type &tag, data_type *vec); + template void set_universal_label(const label_type universal_label); + + virtual bool is_label_valid(const std::string &raw_label) const = 0; + virtual bool is_set_universal_label() const = 0; + private: - virtual void _build(const DataType &data, const size_t num_points_to_load, const IndexWriteParameters ¶meters, - TagVector &tags) = 0; + virtual void _build(const DataType &data, const size_t num_points_to_load, TagVector &tags) = 0; virtual std::pair _search(const DataType &query, const size_t K, const uint32_t L, std::any &indices, float *distances = nullptr) = 0; virtual std::pair _search_with_filters(const DataType &query, const std::string &filter_label, const size_t K, const uint32_t L, std::any &indices, float *distances) = 0; + virtual int _insert_point(const DataType &data_point, const TagType tag, Labelvector &labels) = 0; virtual int _insert_point(const DataType &data_point, const TagType tag) = 0; virtual int _lazy_delete(const TagType &tag) = 0; virtual void _lazy_delete(TagVector &tags, TagVector &failed_tags) = 0; @@ -112,7 +124,9 @@ class AbstractIndex virtual void _set_start_points_at_random(DataType radius, uint32_t random_seed = 0) = 0; virtual int _get_vector_by_tag(TagType &tag, DataType &vec) = 0; virtual size_t _search_with_tags(const DataType &query, const uint64_t K, const uint32_t L, const TagType &tags, - float *distances, DataVector &res_vectors) = 0; + float *distances, DataVector &res_vectors, bool use_filters = false, + const std::string filter_label = "") = 0; virtual void _search_with_optimized_layout(const DataType &query, size_t K, size_t L, uint32_t *indices) = 0; + virtual void _set_universal_label(const LabelType universal_label) = 0; }; } // namespace diskann diff --git 
a/include/abstract_scratch.h b/include/abstract_scratch.h new file mode 100644 index 000000000..b42a836f6 --- /dev/null +++ b/include/abstract_scratch.h @@ -0,0 +1,35 @@ +#pragma once +namespace diskann +{ + +template <typename data_t> class PQScratch; + +// By somewhat more than a coincidence, it seems that both InMemQueryScratch +// and SSDQueryScratch have the aligned query and PQScratch objects. So we +// can put them in a neat hierarchy and keep PQScratch as a standalone class. +template <typename data_t> class AbstractScratch +{ + public: + AbstractScratch() = default; + // This class does not take any responsibility for memory management of + // its members. It is the responsibility of the derived classes to do so. + virtual ~AbstractScratch() = default; + + // Scratch objects should not be copied + AbstractScratch(const AbstractScratch &) = delete; + AbstractScratch &operator=(const AbstractScratch &) = delete; + + data_t *aligned_query_T() + { + return _aligned_query_T; + } + PQScratch<data_t> *pq_scratch() + { + return _pq_scratch; + } + + protected: + data_t *_aligned_query_T = nullptr; + PQScratch<data_t> *_pq_scratch = nullptr; +}; +} // namespace diskann diff --git a/include/aligned_file_reader.h b/include/aligned_file_reader.h index f5e2af5c3..f39d5da39 100644 --- a/include/aligned_file_reader.h +++ b/include/aligned_file_reader.h @@ -117,4 +117,9 @@ class AlignedFileReader // process batch of aligned requests in parallel // NOTE :: blocking call virtual void read(std::vector<AlignedRead> &read_reqs, IOContext &ctx, bool async = false) = 0; + +#ifdef USE_BING_INFRA + // wait for completion of one request in a batch of requests + virtual void wait(IOContext &ctx, int &completedIndex) = 0; +#endif }; diff --git a/include/defaults.h b/include/defaults.h index 2f157cb25..ef1750fcf 100644 --- a/include/defaults.h +++ b/include/defaults.h @@ -11,9 +11,19 @@ namespace defaults const float ALPHA = 1.2f; const uint32_t NUM_THREADS = 0; const uint32_t MAX_OCCLUSION_SIZE = 750; +const bool HAS_LABELS = false; const uint32_t
FILTER_LIST_SIZE = 0; const uint32_t NUM_FROZEN_POINTS_STATIC = 0; const uint32_t NUM_FROZEN_POINTS_DYNAMIC = 1; + +// In-mem index related limits +const float GRAPH_SLACK_FACTOR = 1.3f; + +// SSD Index related limits +const uint64_t MAX_GRAPH_DEGREE = 512; +const uint64_t SECTOR_LEN = 4096; +const uint64_t MAX_N_SECTOR_READS = 128; + // following constants should always be specified, but are useful as a // sensible default at cli / python boundaries const uint32_t MAX_DEGREE = 64; diff --git a/include/distance.h b/include/distance.h index 8b20e586b..f3b1de25a 100644 --- a/include/distance.h +++ b/include/distance.h @@ -64,7 +64,7 @@ template class Distance // Providing a default implementation for the virtual destructor because we // don't expect most metric implementations to need it. - DISKANN_DLLEXPORT virtual ~Distance(); + DISKANN_DLLEXPORT virtual ~Distance() = default; protected: diskann::Metric _distance_metric; diff --git a/include/filter_utils.h b/include/filter_utils.h index df1970be4..55f7aed28 100644 --- a/include/filter_utils.h +++ b/include/filter_utils.h @@ -57,6 +57,10 @@ DISKANN_DLLEXPORT void generate_label_indices(path input_data_path, path final_i DISKANN_DLLEXPORT load_label_index_return_values load_label_index(path label_index_path, uint32_t label_number_of_points); +template +DISKANN_DLLEXPORT std::tuple>, tsl::robin_set> parse_formatted_label_file( + path label_file); + DISKANN_DLLEXPORT parse_label_file_return_values parse_label_file(path label_data_path, std::string universal_label); template diff --git a/include/in_mem_data_store.h b/include/in_mem_data_store.h index 0509b3b82..eaa1562e0 100644 --- a/include/in_mem_data_store.h +++ b/include/in_mem_data_store.h @@ -21,7 +21,7 @@ namespace diskann template class InMemDataStore : public AbstractDataStore { public: - InMemDataStore(const location_t capacity, const size_t dim, std::shared_ptr> distance_fn); + InMemDataStore(const location_t capacity, const size_t dim, std::unique_ptr> 
distance_fn); virtual ~InMemDataStore(); virtual location_t load(const std::string &filename) override; @@ -38,20 +38,26 @@ template class InMemDataStore : public AbstractDataStore *query_scratch) const override; + + virtual float get_distance(const data_t *preprocessed_query, const location_t loc) const override; virtual float get_distance(const location_t loc1, const location_t loc2) const override; - virtual void get_distance(const data_t *query, const location_t *locations, const uint32_t location_count, - float *distances) const override; + + virtual void get_distance(const data_t *preprocessed_query, const location_t *locations, + const uint32_t location_count, float *distances, + AbstractScratch *scratch) const override; + virtual void get_distance(const data_t *preprocessed_query, const std::vector &ids, + std::vector &distances, AbstractScratch *scratch_space) const override; virtual location_t calculate_medoid() const override; - virtual Distance *get_dist_fn() override; + virtual Distance *get_dist_fn() const override; virtual size_t get_alignment_factor() const override; @@ -73,7 +79,7 @@ template class InMemDataStore : public AbstractDataStore> _distance_fn; + std::unique_ptr> _distance_fn; // in case we need to save vector norms for optimization std::shared_ptr _pre_computed_norms; diff --git a/include/in_mem_graph_store.h b/include/in_mem_graph_store.h index 98a9e4dc5..d0206a7d6 100644 --- a/include/in_mem_graph_store.h +++ b/include/in_mem_graph_store.h @@ -11,13 +11,41 @@ namespace diskann class InMemGraphStore : public AbstractGraphStore { public: - InMemGraphStore(const size_t max_pts); + InMemGraphStore(const size_t total_pts, const size_t reserve_graph_degree); - int load(const std::string &index_path_prefix); - int store(const std::string &index_path_prefix); + // returns tuple of + virtual std::tuple load(const std::string &index_path_prefix, + const size_t num_points) override; + virtual int store(const std::string &index_path_prefix, const 
size_t num_points, const size_t num_frozen_points, + const uint32_t start) override; - void get_adj_list(const location_t i, std::vector &neighbors); - void set_adj_list(const location_t i, std::vector &neighbors); + virtual const std::vector &get_neighbours(const location_t i) const override; + virtual void add_neighbour(const location_t i, location_t neighbour_id) override; + virtual void clear_neighbours(const location_t i) override; + virtual void swap_neighbours(const location_t a, location_t b) override; + + virtual void set_neighbours(const location_t i, std::vector &neighbors) override; + + virtual size_t resize_graph(const size_t new_size) override; + virtual void clear_graph() override; + + virtual size_t get_max_range_of_graph() override; + virtual uint32_t get_max_observed_degree() override; + + protected: + virtual std::tuple load_impl(const std::string &filename, size_t expected_num_points); +#ifdef EXEC_ENV_OLS + virtual std::tuple load_impl(AlignedFileReader &reader, size_t expected_num_points); +#endif + + int save_graph(const std::string &index_path_prefix, const size_t active_points, const size_t num_frozen_points, + const uint32_t start); + + private: + size_t _max_range_of_graph = 0; + uint32_t _max_observed_degree = 0; + + std::vector> _graph; }; } // namespace diskann diff --git a/include/index.h b/include/index.h index f341a3db2..320942013 100644 --- a/include/index.h +++ b/include/index.h @@ -19,9 +19,13 @@ #include "windows_customizations.h" #include "scratch.h" #include "in_mem_data_store.h" +#include "in_mem_graph_store.h" #include "abstract_index.h" #include +#include "quantized_distance.h" +#include "pq_data_store.h" + #define OVERHEAD_FACTOR 1.1 #define EXPAND_IF_FULL 0 #define DEFAULT_MAXC 750 @@ -32,7 +36,7 @@ namespace diskann inline double estimate_ram_usage(size_t size, uint32_t dim, uint32_t datasize, uint32_t degree) { double size_of_data = ((double)size) * ROUND_UP(dim, 8) * datasize; - double size_of_graph = ((double)size) * 
degree * sizeof(uint32_t) * GRAPH_SLACK_FACTOR; + double size_of_graph = ((double)size) * degree * sizeof(uint32_t) * defaults::GRAPH_SLACK_FACTOR; double size_of_locks = ((double)size) * sizeof(non_recursive_mutex); double size_of_outer_vector = ((double)size) * sizeof(ptrdiff_t); @@ -163,21 +167,18 @@ template clas public: // Constructor for Bulk operations and for creating the index object solely // for loading a prexisting index. - DISKANN_DLLEXPORT Index(Metric m, const size_t dim, const size_t max_points = 1, const bool dynamic_index = false, - const bool enable_tags = false, const bool concurrent_consolidate = false, - const bool pq_dist_build = false, const size_t num_pq_chunks = 0, - const bool use_opq = false, const size_t num_frozen_pts = 0, - const bool init_data_store = true); + DISKANN_DLLEXPORT Index(const IndexConfig &index_config, std::shared_ptr> data_store, + std::unique_ptr graph_store, + std::shared_ptr> pq_data_store = nullptr); // Constructor for incremental index - DISKANN_DLLEXPORT Index(Metric m, const size_t dim, const size_t max_points, const bool dynamic_index, - const IndexWriteParameters &indexParameters, const uint32_t initial_search_list_size, - const uint32_t search_threads, const bool enable_tags = false, - const bool concurrent_consolidate = false, const bool pq_dist_build = false, - const size_t num_pq_chunks = 0, const bool use_opq = false); - - DISKANN_DLLEXPORT Index(const IndexConfig &index_config, std::unique_ptr> data_store - /* std::unique_ptr graph_store*/); + DISKANN_DLLEXPORT Index(Metric m, const size_t dim, const size_t max_points, + const std::shared_ptr index_parameters, + const std::shared_ptr index_search_params, + const size_t num_frozen_pts = 0, const bool dynamic_index = false, + const bool enable_tags = false, const bool concurrent_consolidate = false, + const bool pq_dist_build = false, const size_t num_pq_chunks = 0, + const bool use_opq = false, const bool filtered_index = false); DISKANN_DLLEXPORT 
~Index(); @@ -203,31 +204,31 @@ template clas // Batch build from a file. Optionally pass tags vector. DISKANN_DLLEXPORT void build(const char *filename, const size_t num_points_to_load, - const IndexWriteParameters ¶meters, const std::vector &tags = std::vector()); // Batch build from a file. Optionally pass tags file. - DISKANN_DLLEXPORT void build(const char *filename, const size_t num_points_to_load, - const IndexWriteParameters ¶meters, const char *tag_filename); + DISKANN_DLLEXPORT void build(const char *filename, const size_t num_points_to_load, const char *tag_filename); // Batch build from a data array, which must pad vectors to aligned_dim - DISKANN_DLLEXPORT void build(const T *data, const size_t num_points_to_load, const IndexWriteParameters ¶meters, - const std::vector &tags); + DISKANN_DLLEXPORT void build(const T *data, const size_t num_points_to_load, const std::vector &tags); + // Based on filter params builds a filtered or unfiltered index DISKANN_DLLEXPORT void build(const std::string &data_file, const size_t num_points_to_load, - IndexBuildParams &build_params); + IndexFilterParams &filter_params); // Filtered Support DISKANN_DLLEXPORT void build_filtered_index(const char *filename, const std::string &label_file, - const size_t num_points_to_load, IndexWriteParameters ¶meters, + const size_t num_points_to_load, const std::vector &tags = std::vector()); DISKANN_DLLEXPORT void set_universal_label(const LabelT &label); // Get converted integer label from string to int map (_label_map) - DISKANN_DLLEXPORT LabelT get_converted_label(const std::string &raw_label); + DISKANN_DLLEXPORT LabelT get_converted_label(const std::string &raw_label) const; + + DISKANN_DLLEXPORT bool is_label_valid(const std::string& raw_label) const override; - DISKANN_DLLEXPORT bool is_label_valid(const std::string& raw_label); + DISKANN_DLLEXPORT bool is_set_universal_label() const override; // Set starting point of an index before inserting any points incrementally. 
// The data count should be equal to _num_frozen_pts * _aligned_dim. @@ -251,7 +252,8 @@ template clas // Initialize space for res_vectors before calling. DISKANN_DLLEXPORT size_t search_with_tags(const T *query, const uint64_t K, const uint32_t L, TagT *tags, - float *distances, std::vector &res_vectors); + float *distances, std::vector &res_vectors, bool use_filters = false, + const std::string filter_label = ""); // Filter support search template @@ -262,6 +264,9 @@ template clas // Will fail if tag already in the index or if tag=0. DISKANN_DLLEXPORT int insert_point(const T *point, const TagT tag); + // Will fail if tag already in the index or if tag=0. + DISKANN_DLLEXPORT int insert_point(const T *point, const TagT tag, const std::vector &label); + // call this before issuing deletions to sets relevant flags DISKANN_DLLEXPORT int enable_delete(); @@ -313,8 +318,7 @@ template clas protected: // overload of abstract index virtual methods - virtual void _build(const DataType &data, const size_t num_points_to_load, const IndexWriteParameters ¶meters, - TagVector &tags) override; + virtual void _build(const DataType &data, const size_t num_points_to_load, TagVector &tags) override; virtual std::pair _search(const DataType &query, const size_t K, const uint32_t L, std::any &indices, float *distances = nullptr) override; @@ -324,6 +328,7 @@ template clas float *distances) override; virtual int _insert_point(const DataType &data_point, const TagType tag) override; + virtual int _insert_point(const DataType &data_point, const TagType tag, Labelvector &labels) override; virtual int _lazy_delete(const TagType &tag) override; @@ -338,7 +343,10 @@ template clas virtual void _search_with_optimized_layout(const DataType &query, size_t K, size_t L, uint32_t *indices) override; virtual size_t _search_with_tags(const DataType &query, const uint64_t K, const uint32_t L, const TagType &tags, - float *distances, DataVector &res_vectors) override; + float *distances, DataVector 
&res_vectors, bool use_filters = false, + const std::string filter_label = "") override; + + virtual void _set_universal_label(const LabelType universal_label) override; // No copy/assign. Index(const Index &) = delete; @@ -346,7 +354,7 @@ template clas // Use after _data and _nd have been populated // Acquire exclusive _update_lock before calling - void build_with_data_populated(const IndexWriteParameters ¶meters, const std::vector &tags); + void build_with_data_populated(const std::vector &tags); // generates 1 frozen point that will never be deleted from the graph // This is not visible to the user @@ -367,9 +375,9 @@ template clas // with iterate_to_fixed_point. std::vector get_init_ids(); - std::pair iterate_to_fixed_point(const T *node_coords, const uint32_t Lindex, - const std::vector &init_ids, - InMemQueryScratch *scratch, bool use_filter, + // The query to use is placed in scratch->aligned_query + std::pair iterate_to_fixed_point(InMemQueryScratch *scratch, const uint32_t Lindex, + const std::vector &init_ids, bool use_filter, const std::vector &filters, bool search_invocation); void search_for_point_and_prune(int location, uint32_t Lindex, std::vector &pruned_list, @@ -396,7 +404,7 @@ template clas void inter_insert(uint32_t n, std::vector &pruned_list, InMemQueryScratch *scratch); // Acquire exclusive _update_lock before calling - void link(const IndexWriteParameters ¶meters); + void link(); // Acquire exclusive _tag_lock and _delete_lock before calling int reserve_location(); @@ -449,16 +457,15 @@ template clas private: // Distance functions Metric _dist_metric = diskann::L2; - std::shared_ptr> _distance; // Data - std::unique_ptr> _data_store; - char *_opt_graph = nullptr; + std::shared_ptr> _data_store; // Graph related data structures - std::vector> _final_graph; + std::unique_ptr _graph_store; + + char *_opt_graph = nullptr; - T *_data = nullptr; // coordinates of all base points // Dimensions size_t _dim = 0; size_t _nd = 0; // number of active 
points i.e. existing in the graph @@ -470,15 +477,14 @@ template clas // needed for a dynamic index. The frozen points have consecutive locations. // See also _start below. size_t _num_frozen_pts = 0; - size_t _max_range_of_loaded_graph = 0; + size_t _frozen_pts_used = 0; size_t _node_size; size_t _data_len; size_t _neighbor_len; - uint32_t _max_observed_degree = 0; - // Start point of the search. When _num_frozen_pts is greater than zero, - // this is the location of the first frozen point. Otherwise, this is a - // location of one of the points in index. + // Start point of the search. When _num_frozen_pts is greater than zero, + // this is the location of the first frozen point. Otherwise, this is a + // location of one of the points in index. uint32_t _start = 0; bool _has_built = false; @@ -492,11 +498,14 @@ template clas // Filter Support bool _filtered_index = false; - std::vector> _pts_to_labels; + // Location to label is only updated during insert_point(), all other reads are protected by + // default as a location can only be released at end of consolidate deletes + std::vector> _location_to_labels; tsl::robin_set _labels; std::string _labels_file; - std::unordered_map _label_to_medoid_id; + std::unordered_map _label_to_start_id; std::unordered_map _medoid_counts; + bool _use_universal_label = false; LabelT _universal_label = 0; uint32_t _filterIndexingQueueSize; @@ -507,6 +516,7 @@ template clas uint32_t _indexingRange; uint32_t _indexingMaxC; float _indexingAlpha; + uint32_t _indexingThreads; // Query scratch data structures ConcurrentQueue *> _query_scratch; @@ -515,7 +525,10 @@ template clas bool _pq_dist = false; bool _use_opq = false; size_t _num_pq_chunks = 0; - uint8_t *_pq_data = nullptr; + // REFACTOR + // uint8_t *_pq_data = nullptr; + std::shared_ptr> _pq_distance_fn = nullptr; + std::shared_ptr> _pq_data_store = nullptr; bool _pq_generated = false; FixedChunkPQTable _pq_table; @@ -545,11 +558,11 @@ template clas std::shared_timed_mutex // 
Ensure only one consolidate or compact_data is _consolidate_lock; // ever active std::shared_timed_mutex // RW lock for _tag_to_location, - _tag_lock; // _location_to_tag, _empty_slots, _nd, _max_points + _tag_lock; // _location_to_tag, _empty_slots, _nd, _max_points, _label_to_start_id std::shared_timed_mutex // RW Lock on _delete_set and _data_compacted _delete_lock; // variable - // Per node lock, cardinality=_max_points + // Per node lock, cardinality=_max_points + _num_frozen_points std::vector _locks; simple_bitmask_buf _bitmask_buf; diff --git a/include/index_build_params.h b/include/index_build_params.h index ff68c5001..0233fcec4 100644 --- a/include/index_build_params.h +++ b/include/index_build_params.h @@ -3,31 +3,31 @@ namespace diskann { -struct IndexBuildParams +struct IndexFilterParams { public: - diskann::IndexWriteParameters index_write_params; std::string save_path_prefix; std::string label_file; + std::string tags_file; std::string universal_label; uint32_t filter_threshold = 0; private: - IndexBuildParams(const IndexWriteParameters &index_write_params, const std::string &save_path_prefix, - const std::string &label_file, const std::string &universal_label, uint32_t filter_threshold) - : index_write_params(index_write_params), save_path_prefix(save_path_prefix), label_file(label_file), - universal_label(universal_label), filter_threshold(filter_threshold) + IndexFilterParams(const std::string &save_path_prefix, const std::string &label_file, + const std::string &universal_label, uint32_t filter_threshold) + : save_path_prefix(save_path_prefix), label_file(label_file), universal_label(universal_label), + filter_threshold(filter_threshold) { } - friend class IndexBuildParamsBuilder; + friend class IndexFilterParamsBuilder; }; -class IndexBuildParamsBuilder +class IndexFilterParamsBuilder { public: - IndexBuildParamsBuilder(const diskann::IndexWriteParameters ¶s) : _index_write_params(paras){}; + IndexFilterParamsBuilder() = default; - 
IndexBuildParamsBuilder &with_save_path_prefix(const std::string &save_path_prefix) + IndexFilterParamsBuilder &with_save_path_prefix(const std::string &save_path_prefix) { if (save_path_prefix.empty() || save_path_prefix == "") throw ANNException("Error: save_path_prefix can't be empty", -1); @@ -35,37 +35,36 @@ class IndexBuildParamsBuilder return *this; } - IndexBuildParamsBuilder &with_label_file(const std::string &label_file) + IndexFilterParamsBuilder &with_label_file(const std::string &label_file) { this->_label_file = label_file; return *this; } - IndexBuildParamsBuilder &with_universal_label(const std::string &univeral_label) + IndexFilterParamsBuilder &with_universal_label(const std::string &univeral_label) { this->_universal_label = univeral_label; return *this; } - IndexBuildParamsBuilder &with_filter_threshold(const std::uint32_t &filter_threshold) + IndexFilterParamsBuilder &with_filter_threshold(const std::uint32_t &filter_threshold) { this->_filter_threshold = filter_threshold; return *this; } - IndexBuildParams build() + IndexFilterParams build() { - return IndexBuildParams(_index_write_params, _save_path_prefix, _label_file, _universal_label, - _filter_threshold); + return IndexFilterParams(_save_path_prefix, _label_file, _universal_label, _filter_threshold); } - IndexBuildParamsBuilder(const IndexBuildParamsBuilder &) = delete; - IndexBuildParamsBuilder &operator=(const IndexBuildParamsBuilder &) = delete; + IndexFilterParamsBuilder(const IndexFilterParamsBuilder &) = delete; + IndexFilterParamsBuilder &operator=(const IndexFilterParamsBuilder &) = delete; private: - diskann::IndexWriteParameters _index_write_params; std::string _save_path_prefix; std::string _label_file; + std::string _tags_file; std::string _universal_label; uint32_t _filter_threshold = 0; }; diff --git a/include/index_config.h b/include/index_config.h index b291c744d..452498b01 100644 --- a/include/index_config.h +++ b/include/index_config.h @@ -3,14 +3,16 @@ namespace diskann 
{ -enum DataStoreStrategy +enum class DataStoreStrategy { MEMORY }; -enum GraphStoreStrategy +enum class GraphStoreStrategy { + MEMORY }; + struct IndexConfig { DataStoreStrategy data_strategy; @@ -25,6 +27,7 @@ struct IndexConfig bool pq_dist_build; bool concurrent_consolidate; bool use_opq; + bool filtered_index; size_t num_pq_chunks; size_t num_frozen_pts; @@ -33,24 +36,23 @@ struct IndexConfig std::string tag_type; std::string data_type; + // Params for building index std::shared_ptr index_write_params; - - uint32_t search_threads; - uint32_t initial_search_list_size; + // Params for searching index + std::shared_ptr index_search_params; private: IndexConfig(DataStoreStrategy data_strategy, GraphStoreStrategy graph_strategy, Metric metric, size_t dimension, size_t max_points, size_t num_pq_chunks, size_t num_frozen_points, bool dynamic_index, bool enable_tags, - bool pq_dist_build, bool concurrent_consolidate, bool use_opq, const std::string &data_type, - const std::string &tag_type, const std::string &label_type, - std::shared_ptr index_write_params, uint32_t search_threads, - uint32_t initial_search_list_size) + bool pq_dist_build, bool concurrent_consolidate, bool use_opq, bool filtered_index, + std::string &data_type, const std::string &tag_type, const std::string &label_type, + std::shared_ptr index_write_params, + std::shared_ptr index_search_params) : data_strategy(data_strategy), graph_strategy(graph_strategy), metric(metric), dimension(dimension), max_points(max_points), dynamic_index(dynamic_index), enable_tags(enable_tags), pq_dist_build(pq_dist_build), - concurrent_consolidate(concurrent_consolidate), use_opq(use_opq), num_pq_chunks(num_pq_chunks), - num_frozen_pts(num_frozen_points), label_type(label_type), tag_type(tag_type), data_type(data_type), - index_write_params(index_write_params), search_threads(search_threads), - initial_search_list_size(initial_search_list_size) + concurrent_consolidate(concurrent_consolidate), use_opq(use_opq), 
filtered_index(filtered_index), + num_pq_chunks(num_pq_chunks), num_frozen_pts(num_frozen_points), label_type(label_type), tag_type(tag_type), + data_type(data_type), index_write_params(index_write_params), index_search_params(index_search_params) { } @@ -60,9 +62,7 @@ struct IndexConfig class IndexConfigBuilder { public: - IndexConfigBuilder() - { - } + IndexConfigBuilder() = default; IndexConfigBuilder &with_metric(Metric m) { @@ -124,6 +124,12 @@ class IndexConfigBuilder return *this; } + IndexConfigBuilder &is_filtered(bool is_filtered) + { + this->_filtered_index = is_filtered; + return *this; + } + IndexConfigBuilder &with_num_pq_chunks(size_t num_pq_chunks) { this->_num_pq_chunks = num_pq_chunks; @@ -160,15 +166,31 @@ class IndexConfigBuilder return *this; } - IndexConfigBuilder &with_search_threads(uint32_t search_threads) + IndexConfigBuilder &with_index_write_params(std::shared_ptr index_write_params_ptr) { - this->_search_threads = search_threads; + if (index_write_params_ptr == nullptr) + { + diskann::cout << "Passed empty build_params while creating index config" << std::endl; + return *this; + } + this->_index_write_params = index_write_params_ptr; return *this; } - IndexConfigBuilder &with_initial_search_list_size(uint32_t search_list_size) + IndexConfigBuilder &with_index_search_params(IndexSearchParams &search_params) { - this->_initial_search_list_size = search_list_size; + this->_index_search_params = std::make_shared(search_params); + return *this; + } + + IndexConfigBuilder &with_index_search_params(std::shared_ptr search_params_ptr) + { + if (search_params_ptr == nullptr) + { + diskann::cout << "Passed empty search_params while creating index config" << std::endl; + return *this; + } + this->_index_search_params = search_params_ptr; return *this; } @@ -177,19 +199,28 @@ class IndexConfigBuilder if (_data_type == "" || _data_type.empty()) throw ANNException("Error: data_type can not be empty", -1); - if (_dynamic_index && _index_write_params
!= nullptr) + if (_dynamic_index && _num_frozen_pts == 0) { - if (_search_threads == 0) - throw ANNException("Error: please pass search_threads for building dynamic index.", -1); + _num_frozen_pts = 1; + } - if (_initial_search_list_size == 0) + if (_dynamic_index) + { + if (_index_search_params != nullptr && _index_search_params->initial_search_list_size == 0) throw ANNException("Error: please pass initial_search_list_size for building dynamic index.", -1); } + // sanity check + if (_dynamic_index && _num_frozen_pts == 0) + { + diskann::cout << "_num_frozen_pts passed as 0 for dynamic_index. Setting it to 1 for safety." << std::endl; + _num_frozen_pts = 1; + } + return IndexConfig(_data_strategy, _graph_strategy, _metric, _dimension, _max_points, _num_pq_chunks, _num_frozen_pts, _dynamic_index, _enable_tags, _pq_dist_build, _concurrent_consolidate, - _use_opq, _data_type, _tag_type, _label_type, _index_write_params, _search_threads, - _initial_search_list_size); + _use_opq, _filtered_index, _data_type, _tag_type, _label_type, _index_write_params, + _index_search_params); } IndexConfigBuilder(const IndexConfigBuilder &) = delete; @@ -208,17 +239,16 @@ class IndexConfigBuilder bool _pq_dist_build = false; bool _concurrent_consolidate = false; bool _use_opq = false; + bool _filtered_index{defaults::HAS_LABELS}; size_t _num_pq_chunks = 0; - size_t _num_frozen_pts = 0; + size_t _num_frozen_pts{defaults::NUM_FROZEN_POINTS_STATIC}; - std::string _label_type = "uint32"; - std::string _tag_type = "uint32"; + std::string _label_type{"uint32"}; + std::string _tag_type{"uint32"}; std::string _data_type; std::shared_ptr _index_write_params; - - uint32_t _search_threads; - uint32_t _initial_search_list_size; + std::shared_ptr _index_search_params; }; } // namespace diskann diff --git a/include/index_factory.h b/include/index_factory.h index 3d1eb7992..80bc40dba 100644 --- a/include/index_factory.h +++ b/include/index_factory.h @@ -1,6 +1,7 @@ #include "index.h" #include 
"abstract_graph_store.h" #include "in_mem_graph_store.h" +#include "pq_data_store.h" namespace diskann { @@ -10,14 +11,25 @@ class IndexFactory DISKANN_DLLEXPORT explicit IndexFactory(const IndexConfig &config); DISKANN_DLLEXPORT std::unique_ptr create_instance(); - private: - void check_config(); + DISKANN_DLLEXPORT static std::unique_ptr construct_graphstore( + const GraphStoreStrategy strategy, const size_t size, const size_t reserve_graph_degree); template - std::unique_ptr> construct_datastore(DataStoreStrategy stratagy, size_t num_points, - size_t dimension); + DISKANN_DLLEXPORT static std::shared_ptr> construct_datastore(DataStoreStrategy strategy, + size_t num_points, + size_t dimension, Metric m); + // For now PQDataStore incorporates within itself all variants of quantization that we support. In the + // future it may be necessary to introduce an AbstractPQDataStore class to separate various quantization + // flavours. + template + DISKANN_DLLEXPORT static std::shared_ptr> construct_pq_datastore(DataStoreStrategy strategy, + size_t num_points, size_t dimension, + Metric m, size_t num_pq_chunks, + bool use_opq); + template static Distance *construct_inmem_distance_fn(Metric m); - std::unique_ptr construct_graphstore(GraphStoreStrategy stratagy, size_t size); + private: + void check_config(); template std::unique_ptr create_instance(); diff --git a/include/natural_number_map.h b/include/natural_number_map.h index 820ac3fdf..e846882a8 100644 --- a/include/natural_number_map.h +++ b/include/natural_number_map.h @@ -26,9 +26,6 @@ template class natural_number_map { public: static_assert(std::is_trivial::value, "Key must be a trivial type"); - // Some of the class member prototypes are done with this assumption to - // minimize verbosity since it's the only use case. - static_assert(std::is_trivial::value, "Value must be a trivial type"); // Represents a reference to a element in the map. Used while iterating // over map entries.
diff --git a/include/parameters.h b/include/parameters.h index 81a336da7..0206814bd 100644 --- a/include/parameters.h +++ b/include/parameters.h @@ -23,21 +23,30 @@ class IndexWriteParameters const float alpha; const uint32_t num_threads; const uint32_t filter_list_size; // Lf - const uint32_t num_frozen_points; - private: IndexWriteParameters(const uint32_t search_list_size, const uint32_t max_degree, const bool saturate_graph, const uint32_t max_occlusion_size, const float alpha, const uint32_t num_threads, - const uint32_t filter_list_size, const uint32_t num_frozen_points) + const uint32_t filter_list_size) : search_list_size(search_list_size), max_degree(max_degree), saturate_graph(saturate_graph), max_occlusion_size(max_occlusion_size), alpha(alpha), num_threads(num_threads), - filter_list_size(filter_list_size), num_frozen_points(num_frozen_points) + filter_list_size(filter_list_size) { } friend class IndexWriteParametersBuilder; }; +class IndexSearchParams +{ + public: + IndexSearchParams(const uint32_t initial_search_list_size, const uint32_t num_search_threads) + : initial_search_list_size(initial_search_list_size), num_search_threads(num_search_threads) + { + } + const uint32_t initial_search_list_size; // search L + const uint32_t num_search_threads; // search threads +}; + class IndexWriteParametersBuilder { /** @@ -72,7 +81,7 @@ class IndexWriteParametersBuilder IndexWriteParametersBuilder &with_num_threads(const uint32_t num_threads) { - _num_threads = num_threads == 0 ? omp_get_num_threads() : num_threads; + _num_threads = num_threads == 0 ? 
omp_get_num_procs() : num_threads; return *this; } @@ -82,22 +91,16 @@ class IndexWriteParametersBuilder return *this; } - IndexWriteParametersBuilder &with_num_frozen_points(const uint32_t num_frozen_points) - { - _num_frozen_points = num_frozen_points; - return *this; - } - IndexWriteParameters build() const { return IndexWriteParameters(_search_list_size, _max_degree, _saturate_graph, _max_occlusion_size, _alpha, - _num_threads, _filter_list_size, _num_frozen_points); + _num_threads, _filter_list_size); } IndexWriteParametersBuilder(const IndexWriteParameters &wp) : _search_list_size(wp.search_list_size), _max_degree(wp.max_degree), _max_occlusion_size(wp.max_occlusion_size), _saturate_graph(wp.saturate_graph), _alpha(wp.alpha), - _filter_list_size(wp.filter_list_size), _num_frozen_points(wp.num_frozen_points) + _filter_list_size(wp.filter_list_size) { } IndexWriteParametersBuilder(const IndexWriteParametersBuilder &) = delete; @@ -111,7 +114,6 @@ class IndexWriteParametersBuilder float _alpha{defaults::ALPHA}; uint32_t _num_threads{defaults::NUM_THREADS}; uint32_t _filter_list_size{defaults::FILTER_LIST_SIZE}; - uint32_t _num_frozen_points{defaults::NUM_FROZEN_POINTS_STATIC}; }; } // namespace diskann diff --git a/include/pq.h b/include/pq.h index acfa1b30a..3e6119f22 100644 --- a/include/pq.h +++ b/include/pq.h @@ -4,13 +4,7 @@ #pragma once #include "utils.h" - -#define NUM_PQ_BITS 8 -#define NUM_PQ_CENTROIDS (1 << NUM_PQ_BITS) -#define MAX_OPQ_ITERS 20 -#define NUM_KMEANS_REPS_PQ 12 -#define MAX_PQ_TRAINING_SET_SIZE 256000 -#define MAX_PQ_CHUNKS 512 +#include "pq_common.h" namespace diskann { @@ -53,40 +47,6 @@ class FixedChunkPQTable void populate_chunk_inner_products(const float *query_vec, float *dist_vec); }; -template struct PQScratch -{ - float *aligned_pqtable_dist_scratch = nullptr; // MUST BE AT LEAST [256 * NCHUNKS] - float *aligned_dist_scratch = nullptr; // MUST BE AT LEAST diskann MAX_DEGREE - uint8_t *aligned_pq_coord_scratch = nullptr; // MUST 
BE AT LEAST [N_CHUNKS * MAX_DEGREE] - float *rotated_query = nullptr; - float *aligned_query_float = nullptr; - - PQScratch(size_t graph_degree, size_t aligned_dim) - { - diskann::alloc_aligned((void **)&aligned_pq_coord_scratch, - (size_t)graph_degree * (size_t)MAX_PQ_CHUNKS * sizeof(uint8_t), 256); - diskann::alloc_aligned((void **)&aligned_pqtable_dist_scratch, 256 * (size_t)MAX_PQ_CHUNKS * sizeof(float), - 256); - diskann::alloc_aligned((void **)&aligned_dist_scratch, (size_t)graph_degree * sizeof(float), 256); - diskann::alloc_aligned((void **)&aligned_query_float, aligned_dim * sizeof(float), 8 * sizeof(float)); - diskann::alloc_aligned((void **)&rotated_query, aligned_dim * sizeof(float), 8 * sizeof(float)); - - memset(aligned_query_float, 0, aligned_dim * sizeof(float)); - memset(rotated_query, 0, aligned_dim * sizeof(float)); - } - - void set(size_t dim, T *query, const float norm = 1.0f) - { - for (size_t d = 0; d < dim; ++d) - { - if (norm != 1.0f) - rotated_query[d] = aligned_query_float[d] = static_cast(query[d]) / norm; - else - rotated_query[d] = aligned_query_float[d] = static_cast(query[d]); - } - } -}; - void aggregate_coords(const std::vector &ids, const uint8_t *all_coords, const uint64_t ndims, uint8_t *out); void pq_dist_lookup(const uint8_t *pq_ids, const size_t n_pts, const size_t pq_nchunks, const float *pq_dists, @@ -107,11 +67,19 @@ DISKANN_DLLEXPORT int generate_opq_pivots(const float *train_data, size_t num_tr unsigned num_pq_chunks, std::string opq_pivots_path, bool make_zero_mean = false); +DISKANN_DLLEXPORT int generate_pq_pivots_simplified(const float *train_data, size_t num_train, size_t dim, + size_t num_pq_chunks, std::vector &pivot_data_vector); + template int generate_pq_data_from_pivots(const std::string &data_file, unsigned num_centers, unsigned num_pq_chunks, const std::string &pq_pivots_path, const std::string &pq_compressed_vectors_path, bool use_opq = false); +DISKANN_DLLEXPORT int 
generate_pq_data_from_pivots_simplified(const float *data, const size_t num, + const float *pivot_data, const size_t pivots_num, + const size_t dim, const size_t num_pq_chunks, + std::vector &pq); + template void generate_disk_quantized_data(const std::string &data_file_to_use, const std::string &disk_pq_pivots_path, const std::string &disk_pq_compressed_vectors_path, diff --git a/include/pq_common.h b/include/pq_common.h new file mode 100644 index 000000000..c6a3a5739 --- /dev/null +++ b/include/pq_common.h @@ -0,0 +1,30 @@ +#pragma once + +#include +#include + +#define NUM_PQ_BITS 8 +#define NUM_PQ_CENTROIDS (1 << NUM_PQ_BITS) +#define MAX_OPQ_ITERS 20 +#define NUM_KMEANS_REPS_PQ 12 +#define MAX_PQ_TRAINING_SET_SIZE 256000 +#define MAX_PQ_CHUNKS 512 + +namespace diskann +{ +inline std::string get_quantized_vectors_filename(const std::string &prefix, bool use_opq, uint32_t num_chunks) +{ + return prefix + (use_opq ? "_opq" : "_pq") + std::to_string(num_chunks) + "_compressed.bin"; +} + +inline std::string get_pivot_data_filename(const std::string &prefix, bool use_opq, uint32_t num_chunks) +{ + return prefix + (use_opq ? "_opq" : "_pq") + std::to_string(num_chunks) + "_pivots.bin"; +} + +inline std::string get_rotation_matrix_suffix(const std::string &pivot_data_filename) +{ + return pivot_data_filename + "_rotation_matrix.bin"; +} + +} // namespace diskann diff --git a/include/pq_data_store.h b/include/pq_data_store.h new file mode 100644 index 000000000..227b8a6af --- /dev/null +++ b/include/pq_data_store.h @@ -0,0 +1,97 @@ +#pragma once +#include +#include "distance.h" +#include "quantized_distance.h" +#include "pq.h" +#include "abstract_data_store.h" + +namespace diskann +{ +// REFACTOR TODO: By default, the PQDataStore is an in-memory datastore because both Vamana and +// DiskANN treat it the same way. But with DiskPQ, that may need to change.
+template class PQDataStore : public AbstractDataStore +{ + + public: + PQDataStore(size_t dim, location_t num_points, size_t num_pq_chunks, std::unique_ptr> distance_fn, + std::unique_ptr> pq_distance_fn); + PQDataStore(const PQDataStore &) = delete; + PQDataStore &operator=(const PQDataStore &) = delete; + ~PQDataStore(); + + // Load quantized vectors from a set of files. Here filename is treated + // as a prefix and the files are assumed to be named with DiskANN + // conventions. + virtual location_t load(const std::string &file_prefix) override; + + // Save quantized vectors to a set of files whose names start with + // file_prefix. + // Currently, the plan is to save the quantized vectors to the quantized + // vectors file. + virtual size_t save(const std::string &file_prefix, const location_t num_points) override; + + // Since the base class function is pure virtual, we need to declare it here, even though the alignment concept + // is not needed for quantized data stores. + virtual size_t get_aligned_dim() const override; + + // Populate quantized data from unaligned data using PQ functionality + virtual void populate_data(const data_t *vectors, const location_t num_pts) override; + virtual void populate_data(const std::string &filename, const size_t offset) override; + + virtual void extract_data_to_bin(const std::string &filename, const location_t num_pts) override; + + virtual void get_vector(const location_t i, data_t *target) const override; + virtual void set_vector(const location_t i, const data_t *const vector) override; + virtual void prefetch_vector(const location_t loc) const override; + + virtual void move_vectors(const location_t old_location_start, const location_t new_location_start, + const location_t num_points) override; + virtual void copy_vectors(const location_t from_loc, const location_t to_loc, const location_t num_points) override; + + virtual void preprocess_query(const data_t *query, AbstractScratch *scratch) const override; + + virtual
float get_distance(const data_t *query, const location_t loc) const override; + virtual float get_distance(const location_t loc1, const location_t loc2) const override; + + // NOTE: Caller must invoke "PQDistance->preprocess_query" ONCE before calling + // this function. + virtual void get_distance(const data_t *preprocessed_query, const location_t *locations, + const uint32_t location_count, float *distances, + AbstractScratch *scratch_space) const override; + + // NOTE: Caller must invoke "PQDistance->preprocess_query" ONCE before calling + // this function. + virtual void get_distance(const data_t *preprocessed_query, const std::vector &ids, + std::vector &distances, AbstractScratch *scratch_space) const override; + + // We are returning the distance function that is used for full precision + // vectors here, not the PQ distance function. This is because the callers + // all are expecting a Distance not QuantizedDistance. + virtual Distance *get_dist_fn() const override; + + virtual location_t calculate_medoid() const override; + + virtual size_t get_alignment_factor() const override; + + protected: + virtual location_t expand(const location_t new_size) override; + virtual location_t shrink(const location_t new_size) override; + + virtual location_t load_impl(const std::string &filename); +#ifdef EXEC_ENV_OLS + virtual location_t load_impl(AlignedFileReader &reader); +#endif + + private: + uint8_t *_quantized_data = nullptr; + size_t _num_chunks = 0; + + // REFACTOR TODO: Doing this temporarily before refactoring OPQ into + // its own class. Remove later. 
+ bool _use_opq = false; + + Metric _distance_metric; + std::unique_ptr> _distance_fn = nullptr; + std::unique_ptr> _pq_distance_fn = nullptr; +}; +} // namespace diskann diff --git a/include/pq_flash_index.h b/include/pq_flash_index.h index 6f210b20d..2b26f1177 100644 --- a/include/pq_flash_index.h +++ b/include/pq_flash_index.h @@ -37,14 +37,14 @@ template class PQFlashIndex #ifdef EXEC_ENV_OLS DISKANN_DLLEXPORT int load_from_separate_paths(diskann::MemoryMappedFiles &files, uint32_t num_threads, - const char *index_filepath, const char *pivots_filepath, - const char *compressed_filepath, const char* labels_filepath, const char* labels_to_medoids_filepath, - const char* labels_map_filepath, const char* unv_label_filepath); + const char* index_filepath, const char* pivots_filepath, + const char* compressed_filepath, const char* labels_filepath, const char* labels_to_medoids_filepath, + const char* labels_map_filepath, const char* unv_label_filepath); #else DISKANN_DLLEXPORT int load_from_separate_paths(uint32_t num_threads, const char *index_filepath, - const char *pivots_filepath, const char *compressed_filepath, - const char *labels_filepath, const char *labels_to_medoids_filepath, - const char *labels_map_filepath, const char* unv_label_filepath); + const char* pivots_filepath, const char* compressed_filepath, + const char* labels_filepath, const char* labels_to_medoids_filepath, + const char* labels_map_filepath, const char* unv_label_filepath); #endif DISKANN_DLLEXPORT void load_cache_list(std::vector &node_list); @@ -99,6 +99,20 @@ template class PQFlashIndex DISKANN_DLLEXPORT diskann::Metric get_metric(); + // + // node_ids: input list of node_ids to be read + // coord_buffers: pointers to pre-allocated buffers that coords need to be copied to. If null, don't copy.
+ // nbr_buffers: pre-allocated buffers to copy neighbors into + // + // returns a vector of bool one for each node_id: true if read is success, else false + // + DISKANN_DLLEXPORT std::vector read_nodes(const std::vector &node_ids, + std::vector &coord_buffers, + std::vector> &nbr_buffers); + + DISKANN_DLLEXPORT std::vector get_pq_vector(std::uint64_t vid); + DISKANN_DLLEXPORT uint64_t get_num_points(); + protected: DISKANN_DLLEXPORT void use_medoids_data_as_centroids(); DISKANN_DLLEXPORT void setup_thread_data(uint64_t nthreads, uint64_t visited_reserve = 4096); @@ -106,100 +120,121 @@ template class PQFlashIndex DISKANN_DLLEXPORT void set_universal_label(const LabelT &label); private: - DISKANN_DLLEXPORT inline bool point_has_label(uint32_t point_id, uint32_t label_id); - std::unordered_map load_label_map(const std::string &map_file); - DISKANN_DLLEXPORT void parse_label_file(const std::string &map_file, size_t &num_pts_labels); + DISKANN_DLLEXPORT inline bool point_has_label(uint32_t point_id, LabelT label_id); + std::unordered_map load_label_map(std::basic_istream& infile); + DISKANN_DLLEXPORT void parse_label_file(std::basic_istream& infile, size_t &num_pts_labels); DISKANN_DLLEXPORT void get_label_file_metadata(const std::string &fileContent, uint32_t &num_pts, uint32_t &num_total_labels); - DISKANN_DLLEXPORT inline int32_t get_filter_number(const LabelT &filter_label); DISKANN_DLLEXPORT void generate_random_labels(std::vector &labels, const uint32_t num_labels, const uint32_t nthreads); + void reset_stream_for_reading(std::basic_istream &infile); + + // sector # on disk where node_id is present with in the graph part + DISKANN_DLLEXPORT uint64_t get_node_sector(uint64_t node_id); + + // ptr to start of the node + DISKANN_DLLEXPORT char *offset_to_node(char *sector_buf, uint64_t node_id); + + // returns region of `node_buf` containing [NNBRS][NBR_ID(uint32_t)] + DISKANN_DLLEXPORT uint32_t *offset_to_node_nhood(char *node_buf); + + // returns region of 
`node_buf` containing [COORD(T)] + DISKANN_DLLEXPORT T *offset_to_node_coords(char *node_buf); size_t search_string_range(const std::string& str, char ch, size_t start, size_t end); - // index info + + // index info for multi-node sectors // nhood of node `i` is in sector: [i / nnodes_per_sector] // offset in sector: [(i % nnodes_per_sector) * max_node_len] - // nnbrs of node `i`: *(unsigned*) (buf) - // nbrs of node `i`: ((unsigned*)buf) + 1 - - uint64_t max_node_len = 0, nnodes_per_sector = 0, max_degree = 0; + // + // index info for multi-sector nodes + // nhood of node `i` is in sector: [i * DIV_ROUND_UP(_max_node_len, SECTOR_LEN)] + // offset in sector: [0] + // + // Common info + // coords start at offset + // #nbrs of node `i`: *(unsigned*) (offset + disk_bytes_per_point) + // nbrs of node `i` : (unsigned*) (offset + disk_bytes_per_point + 1) + + uint64_t _max_node_len = 0; + uint64_t _nnodes_per_sector = 0; // 0 for multi-sector nodes, >0 for multi-node sectors + uint64_t _max_degree = 0; // Data used for searching with re-order vectors - uint64_t ndims_reorder_vecs = 0, reorder_data_start_sector = 0, nvecs_per_sector = 0; + uint64_t _ndims_reorder_vecs = 0; + uint64_t _reorder_data_start_sector = 0; + uint64_t _nvecs_per_sector = 0; diskann::Metric metric = diskann::Metric::L2; // used only for inner product search to re-scale the result value // (due to the pre-processing of base during index build) - float max_base_norm = 0.0f; + float _max_base_norm = 0.0f; // data info - uint64_t num_points = 0; - uint64_t num_frozen_points = 0; - uint64_t frozen_location = 0; - uint64_t data_dim = 0; - uint64_t disk_data_dim = 0; // will be different from data_dim only if we use - // PQ for disk data (very large dimensionality) - uint64_t aligned_dim = 0; - uint64_t disk_bytes_per_point = 0; - - std::string disk_index_file; - std::vector> node_visit_counter; + uint64_t _num_points = 0; + uint64_t _num_frozen_points = 0; + uint64_t _frozen_location = 0; + uint64_t
_data_dim = 0; + uint64_t _aligned_dim = 0; + uint64_t _disk_bytes_per_point = 0; // Number of bytes + + std::string _disk_index_file; + std::vector> _node_visit_counter; // PQ data - // n_chunks = # of chunks ndims is split into - // data: char * n_chunks + // _n_chunks = # of chunks ndims is split into + // data: char * _n_chunks // chunk_size = chunk size of each dimension chunk - // pq_tables = float* [[2^8 * [chunk_size]] * n_chunks] + // pq_tables = float* [[2^8 * [chunk_size]] * _n_chunks] uint8_t *data = nullptr; - uint64_t n_chunks; - FixedChunkPQTable pq_table; + uint64_t _n_chunks; + FixedChunkPQTable _pq_table; // distance comparator - std::shared_ptr> dist_cmp; - std::shared_ptr> dist_cmp_float; + std::shared_ptr> _dist_cmp; + std::shared_ptr> _dist_cmp_float; // for very large datasets: we use PQ even for the disk resident index - bool use_disk_index_pq = false; - uint64_t disk_pq_n_chunks = 0; - FixedChunkPQTable disk_pq_table; + bool _use_disk_index_pq = false; + uint64_t _disk_pq_n_chunks = 0; + FixedChunkPQTable _disk_pq_table; // medoid/start info // graph has one entry point by default, // we can optionally have multiple starting points - uint32_t *medoids = nullptr; + uint32_t *_medoids = nullptr; // defaults to 1 - size_t num_medoids; + size_t _num_medoids; // by default, it is empty. 
If there are multiple // centroids, we pick the medoid corresponding to the // closest centroid as the starting point of search - float *centroid_data = nullptr; + float *_centroid_data = nullptr; - // nhood_cache - unsigned *nhood_cache_buf = nullptr; - tsl::robin_map> nhood_cache; + // nhood_cache; the uint32_t in _nhood_cache are offsets into _nhood_cache_buf + unsigned *_nhood_cache_buf = nullptr; + tsl::robin_map> _nhood_cache; - // coord_cache - T *coord_cache_buf = nullptr; - tsl::robin_map coord_cache; + // coord_cache; the T* in _coord_cache are offsets into _coord_cache_buf + T *_coord_cache_buf = nullptr; + tsl::robin_map _coord_cache; // thread-specific scratch - ConcurrentQueue *> thread_data; - uint64_t max_nthreads; - bool load_flag = false; - bool count_visited_nodes = false; - bool reorder_data_exists = false; - uint64_t reoreder_data_offset = 0; + ConcurrentQueue *> _thread_data; + uint64_t _max_nthreads; + bool _load_flag = false; + bool _count_visited_nodes = false; + bool _reorder_data_exists = false; + uint64_t _reoreder_data_offset = 0; // filter support uint32_t *_pts_to_label_offsets = nullptr; - uint32_t *_pts_to_labels = nullptr; - tsl::robin_set _labels; + uint32_t *_pts_to_label_counts = nullptr; + LabelT *_pts_to_labels = nullptr; std::unordered_map> _filter_to_medoid_ids; bool _use_universal_label = false; - uint32_t _universal_filter_num; - std::vector _filter_list; + LabelT _universal_filter_label; tsl::robin_set _dummy_pts; tsl::robin_set _has_dummy_pts; tsl::robin_map _dummy_to_real_map; @@ -210,7 +245,7 @@ template class PQFlashIndex // Set to a larger value than the actual header to accommodate // any additions we make to the header. This is an outer limit // on how big the header can be.
- static const int HEADER_SIZE = SECTOR_LEN; + static const int HEADER_SIZE = defaults::SECTOR_LEN; char *getHeaderBytes(); #endif }; diff --git a/include/pq_l2_distance.h b/include/pq_l2_distance.h new file mode 100644 index 000000000..e6fc6e41b --- /dev/null +++ b/include/pq_l2_distance.h @@ -0,0 +1,87 @@ +#pragma once +#include "quantized_distance.h" + +namespace diskann +{ +template class PQL2Distance : public QuantizedDistance +{ + public: + // REFACTOR TODO: We could take a file prefix here and load the + // PQ pivots file, so that the distance object is initialized + // immediately after construction. But this would not work well + // with our data store concept where the store is created first + // and data populated after. + // REFACTOR TODO: Ideally, we should only read the num_chunks from + // the pivots file. However, we read the pivots file only later, but + // clients can call functions like get__filename without calling + // load_pivot_data. Hence this. The TODO is whether we should check + // that the num_chunks from the file is the same as this one. + + PQL2Distance(uint32_t num_chunks, bool use_opq = false); + + virtual ~PQL2Distance() override; + + virtual bool is_opq() const override; + + virtual std::string get_quantized_vectors_filename(const std::string &prefix) const override; + virtual std::string get_pivot_data_filename(const std::string &prefix) const override; + virtual std::string get_rotation_matrix_suffix(const std::string &pq_pivots_filename) const override; + +#ifdef EXEC_ENV_OLS + virtual void load_pivot_data(MemoryMappedFiles &files, const std::string &pq_table_file, + size_t num_chunks) override; +#else + virtual void load_pivot_data(const std::string &pq_table_file, size_t num_chunks) override; +#endif + + // Number of chunks in the PQ table. Depends on the compression level used. 
+ // Has to be < ndim + virtual uint32_t get_num_chunks() const override; + + // Preprocess the query by computing chunk distances from the query vector to + // various centroids. Since we don't want this class to do scratch management, + // we will take a PQScratch object which can come either from Index class or + // PQFlashIndex class. + virtual void preprocess_query(const data_t *aligned_query, uint32_t original_dim, + PQScratch &pq_scratch) override; + + // Distance function used for graph traversal. This function must be called + // after + // preprocess_query. The reason we do not call preprocess ourselves is because + // that function has to be called once per query, while this function is + // called at each iteration of the graph walk. NOTE: This function expects + // 1. the query to be preprocessed using preprocess_query() + // 2. the scratch object to contain the quantized vectors corresponding to ids + // in aligned_pq_coord_scratch. Done by calling aggregate_coords() + // + virtual void preprocessed_distance(PQScratch &pq_scratch, const uint32_t id_count, + float *dists_out) override; + + // Same as above, but returns the distances in a vector instead of an array. + // Convenience function for index.cpp. + virtual void preprocessed_distance(PQScratch &pq_scratch, const uint32_t n_ids, + std::vector &dists_out) override; + + // Currently this function is required for DiskPQ. 
However, it too can be + // subsumed under preprocessed_distance if we add the appropriate scratch + // variables to PQScratch and initialize them in + // pq_flash_index.cpp::disk_iterate_to_fixed_point() + virtual float brute_force_distance(const float *query_vec, uint8_t *base_vec) override; + + protected: + // assumes pre-processed query + virtual void prepopulate_chunkwise_distances(const float *query_vec, float *dist_vec); + + // assumes no rotation is involved + // virtual void inflate_vector(uint8_t *base_vec, float *out_vec); + + float *_tables = nullptr; // pq_tables = float array of size [256 * ndims] + uint64_t _ndims = 0; // ndims = true dimension of vectors + uint64_t _num_chunks = 0; + bool _is_opq = false; + uint32_t *_chunk_offsets = nullptr; + float *_centroid = nullptr; + float *_tables_tr = nullptr; // same as pq_tables, but col-major + float *_rotmat_tr = nullptr; +}; +} // namespace diskann diff --git a/include/pq_scratch.h b/include/pq_scratch.h new file mode 100644 index 000000000..95f1b1395 --- /dev/null +++ b/include/pq_scratch.h @@ -0,0 +1,23 @@ +#pragma once +#include +#include "pq_common.h" +#include "utils.h" + +namespace diskann +{ + +template class PQScratch +{ + public: + float *aligned_pqtable_dist_scratch = nullptr; // MUST BE AT LEAST [256 * NCHUNKS] + float *aligned_dist_scratch = nullptr; // MUST BE AT LEAST diskann MAX_DEGREE + uint8_t *aligned_pq_coord_scratch = nullptr; // AT LEAST [N_CHUNKS * MAX_DEGREE] + float *rotated_query = nullptr; + float *aligned_query_float = nullptr; + + PQScratch(size_t graph_degree, size_t aligned_dim); + void initialize(size_t dim, const T *query, const float norm = 1.0f); + virtual ~PQScratch(); +}; + +} // namespace diskann \ No newline at end of file diff --git a/include/program_options_utils.hpp b/include/program_options_utils.hpp index 71077b7b2..2be60595b 100644 --- a/include/program_options_utils.hpp +++ b/include/program_options_utils.hpp @@ -73,7 +73,9 @@ const char *LABEL_FILE = "Input 
label file in txt format for Filtered Index buil const char *UNIVERSAL_LABEL = "Universal label, Use only in conjunction with label file for filtered index build. If a " "graph node has all the labels against it, we can assign a special universal filter to the " - "point instead of comma separated filters for that point"; + "point instead of comma separated filters for that point. The universal label should be assigned to nodes " + "in the labels file instead of listing all labels for a node. DiskANN will not automatically assign a " + "universal label to a node."; const char *FILTERED_LBUILD = "Build complexity for filtered points, higher value results in better graphs"; } // namespace program_options_utils diff --git a/include/quantized_distance.h b/include/quantized_distance.h new file mode 100644 index 000000000..cc4aea929 --- /dev/null +++ b/include/quantized_distance.h @@ -0,0 +1,56 @@ +#pragma once +#include +#include +#include +#include "abstract_scratch.h" + +namespace diskann +{ +template class PQScratch; + +template class QuantizedDistance +{ + public: + QuantizedDistance() = default; + QuantizedDistance(const QuantizedDistance &) = delete; + QuantizedDistance &operator=(const QuantizedDistance &) = delete; + virtual ~QuantizedDistance() = default; + + virtual bool is_opq() const = 0; + virtual std::string get_quantized_vectors_filename(const std::string &prefix) const = 0; + virtual std::string get_pivot_data_filename(const std::string &prefix) const = 0; + virtual std::string get_rotation_matrix_suffix(const std::string &pq_pivots_filename) const = 0; + + // Loading the PQ centroid table need not be part of the abstract class. + // However, we want to indicate that this function will change once we have a + // file reader hierarchy, so leave it here as-is. 
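The chunked PQ scheme described in the comments above — preprocess the query once into a per-chunk table of query-to-centroid distances, then score each candidate by summing table lookups over its quantized codes — can be sketched as follows. This is an illustrative model only; the array shapes and function names are assumptions, not the DiskANN API.

```python
import numpy as np

def preprocess_query(query_chunks, centroids):
    # query_chunks: (n_chunks, chunk_dim); centroids: (n_chunks, 256, chunk_dim).
    # Returns a (n_chunks, 256) table of squared query-to-centroid distances,
    # computed once per query.
    return ((centroids - query_chunks[:, None, :]) ** 2).sum(axis=2)

def preprocessed_distance(table, codes):
    # codes: (n_ids, n_chunks) uint8 PQ codes. One table lookup per chunk,
    # summed across chunks, gives the approximate squared L2 distance for
    # each candidate id.
    n_chunks = table.shape[0]
    return table[np.arange(n_chunks), codes].sum(axis=1)
```

This is why `preprocess_query` is called once per query while `preprocessed_distance` is called at every iteration of the graph walk: the expensive centroid comparisons are amortized into the table.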
+#ifdef EXEC_ENV_OLS + virtual void load_pivot_data(MemoryMappedFiles &files, const std::string &pq_table_file, size_t num_chunks) = 0; +#else + virtual void load_pivot_data(const std::string &pq_table_file, size_t num_chunks) = 0; +#endif + + // Number of chunks in the PQ table. Depends on the compression level used. + // Has to be < ndim + virtual uint32_t get_num_chunks() const = 0; + + // Preprocess the query by computing chunk distances from the query vector to + // various centroids. Since we don't want this class to do scratch management, + // we will take a PQScratch object which can come either from Index class or + // PQFlashIndex class. + virtual void preprocess_query(const data_t *query_vec, uint32_t query_dim, PQScratch &pq_scratch) = 0; + + // Workhorse + // This function must be called after preprocess_query + virtual void preprocessed_distance(PQScratch &pq_scratch, const uint32_t id_count, float *dists_out) = 0; + + // Same as above, but convenience function for index.cpp. + virtual void preprocessed_distance(PQScratch &pq_scratch, const uint32_t n_ids, + std::vector &dists_out) = 0; + + // Currently this function is required for DiskPQ.
However, it too can be subsumed + // under preprocessed_distance if we add the appropriate scratch variables to + // PQScratch and initialize them in pq_flash_index.cpp::disk_iterate_to_fixed_point() + virtual float brute_force_distance(const float *query_vec, uint8_t *base_vec) = 0; +}; +} // namespace diskann diff --git a/include/scratch.h b/include/scratch.h index dd84c7f2f..bfb5e5a62 100644 --- a/include/scratch.h +++ b/include/scratch.h @@ -11,30 +11,23 @@ #include "tsl/robin_map.h" #include "tsl/sparse_map.h" +#include "aligned_file_reader.h" +#include "abstract_scratch.h" #include "neighbor.h" +#include "defaults.h" #include "concurrent_queue.h" -#include "pq.h" -#include "aligned_file_reader.h" - -// In-mem index related limits -#define GRAPH_SLACK_FACTOR 1.3 - -// SSD Index related limits -#define MAX_GRAPH_DEGREE 512 -#define SECTOR_LEN (size_t)4096 -#define MAX_N_SECTOR_READS 128 namespace diskann { +template class PQScratch; // -// Scratch space for in-memory index based search +// AbstractScratch space for in-memory index based search // -template class InMemQueryScratch +template class InMemQueryScratch : public AbstractScratch { public: ~InMemQueryScratch(); - // REFACTOR TODO: move all parameters to a new class. InMemQueryScratch(uint32_t search_l, uint32_t indexing_l, uint32_t r, uint32_t maxc, size_t dim, size_t aligned_dim, size_t alignment_factor, bool init_pq_scratch = false, size_t bitmask_size = 0); void resize_for_new_L(uint32_t new_search_l); @@ -54,11 +47,11 @@ template class InMemQueryScratch } inline T *aligned_query() { - return _aligned_query; + return this->_aligned_query_T; } inline PQScratch *pq_scratch() { - return _pq_scratch; + return this->_pq_scratch; } inline std::vector &pool() { @@ -111,10 +104,6 @@ template class InMemQueryScratch uint32_t _R; uint32_t _maxc; - T *_aligned_query = nullptr; - - PQScratch *_pq_scratch = nullptr; - // _pool stores all neighbors explored from best_L_nodes. 
// Usually around L+R, but could be higher. // Initialized to 3L+R for some slack, expands as needed. @@ -153,10 +142,10 @@ template class InMemQueryScratch }; // -// Scratch space for SSD index based search +// AbstractScratch space for SSD index based search // -template class SSDQueryScratch +template class SSDQueryScratch : public AbstractScratch { public: T *coord_scratch = nullptr; // MUST BE AT LEAST [sizeof(T) * data_dim] @@ -164,10 +153,6 @@ template class SSDQueryScratch char *sector_scratch = nullptr; // MUST BE AT LEAST [MAX_N_SECTOR_READS * SECTOR_LEN] size_t sector_idx = 0; // index of next [SECTOR_LEN] scratch to use - T *aligned_query_T = nullptr; - - PQScratch *_pq_scratch; - tsl::robin_set visited; NeighborPriorityQueue retset; std::vector full_retset; diff --git a/include/tag_uint128.h b/include/tag_uint128.h new file mode 100644 index 000000000..642de3159 --- /dev/null +++ b/include/tag_uint128.h @@ -0,0 +1,68 @@ +#pragma once +#include +#include + +namespace diskann +{ +#pragma pack(push, 1) + +struct tag_uint128 +{ + std::uint64_t _data1 = 0; + std::uint64_t _data2 = 0; + + bool operator==(const tag_uint128 &other) const + { + return _data1 == other._data1 && _data2 == other._data2; + } + + bool operator==(std::uint64_t other) const + { + return _data1 == other && _data2 == 0; + } + + tag_uint128 &operator=(const tag_uint128 &other) + { + _data1 = other._data1; + _data2 = other._data2; + + return *this; + } + + tag_uint128 &operator=(std::uint64_t other) + { + _data1 = other; + _data2 = 0; + + return *this; + } +}; + +#pragma pack(pop) +} // namespace diskann + +namespace std +{ +// Hash 128 input bits down to 64 bits of output. +// This is intended to be a reasonably good hash function. +inline std::uint64_t Hash128to64(const std::uint64_t &low, const std::uint64_t &high) +{ + // Murmur-inspired hashing. 
+ const std::uint64_t kMul = 0x9ddfea08eb382d69ULL; + std::uint64_t a = (low ^ high) * kMul; + a ^= (a >> 47); + std::uint64_t b = (high ^ a) * kMul; + b ^= (b >> 47); + b *= kMul; + return b; +} + +template <> struct hash +{ + size_t operator()(const diskann::tag_uint128 &key) const noexcept + { + return Hash128to64(key._data1, key._data2); // map -0 to 0 + } +}; + +} // namespace std \ No newline at end of file diff --git a/include/types.h b/include/types.h index b95848869..953d59a5f 100644 --- a/include/types.h +++ b/include/types.h @@ -17,5 +17,6 @@ using TagType = std::any; using LabelType = std::any; using TagVector = AnyWrapper::AnyVector; using DataVector = AnyWrapper::AnyVector; +using Labelvector = AnyWrapper::AnyVector; using TagRobinSet = AnyWrapper::AnyRobinSet; } // namespace diskann diff --git a/include/utils.h b/include/utils.h index f81f6a68b..2f3b9f9e5 100644 --- a/include/utils.h +++ b/include/utils.h @@ -27,6 +27,7 @@ typedef int FileHandle; #include "windows_customizations.h" #include "tsl/robin_set.h" #include "types.h" +#include "tag_uint128.h" #include #ifdef EXEC_ENV_OLS @@ -57,7 +58,7 @@ typedef int FileHandle; #define PBSTR "||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||" #define PBWIDTH 60 -inline bool file_exists(const std::string &name, bool dirCheck = false) +inline bool file_exists_impl(const std::string &name, bool dirCheck = false) { int val; #ifndef _WINDOWS @@ -94,6 +95,29 @@ inline bool file_exists(const std::string &name, bool dirCheck = false) } } +inline bool file_exists(const std::string &name, bool dirCheck = false) +{ +#ifdef EXEC_ENV_OLS + bool exists = file_exists_impl(name, dirCheck); + if (exists) + { + return true; + } + if (!dirCheck) + { + // try with .enc extension + std::string enc_name = name + ENCRYPTED_EXTENSION; + return file_exists_impl(enc_name, dirCheck); + } + else + { + return exists; + } +#else + return file_exists_impl(name, dirCheck); +#endif +} + inline void 
open_file_to_write(std::ofstream &writer, const std::string &filename) { writer.exceptions(std::ofstream::failbit | std::ofstream::badbit); @@ -153,6 +177,7 @@ inline int delete_file(const std::string &fileName) } } +// generates formatted_label and _labels_map file. inline void convert_labels_string_to_int(const std::string &inFileName, const std::string &outFileName, const std::string &mapFileName, const std::string &unv_label, uint32_t& unv_label_id) @@ -174,7 +199,7 @@ inline void convert_labels_string_to_int(const std::string &inFileName, const st if (string_int_map.find(token) == string_int_map.end()) { uint32_t nextId = (uint32_t)string_int_map.size() + 1; - string_int_map[token] = nextId; + string_int_map[token] = nextId; // nextId can never be 0 } lbls.push_back(string_int_map[token]); } @@ -989,6 +1014,17 @@ void block_convert(std::ofstream &writr, std::ifstream &readr, float *read_buf, DISKANN_DLLEXPORT void normalize_data_file(const std::string &inFileName, const std::string &outFileName); +inline std::string get_tag_string(std::uint64_t tag) +{ + return std::to_string(tag); +} + +inline std::string get_tag_string(const tag_uint128 &tag) +{ + std::string str = std::to_string(tag._data2) + "_" + std::to_string(tag._data1); + return str; +} + }; // namespace diskann struct PivotContainer diff --git a/include/windows_slim_lock.h b/include/windows_slim_lock.h index 5d0d65508..9f8b0329a 100644 --- a/include/windows_slim_lock.h +++ b/include/windows_slim_lock.h @@ -34,6 +34,11 @@ class windows_exclusive_slim_lock return AcquireSRWLockExclusive(&_lock); } + void lock_shared() + { + return AcquireSRWLockShared(&_lock); + } + bool try_lock() { return TryAcquireSRWLockExclusive(&_lock) != FALSE; @@ -44,6 +49,11 @@ class windows_exclusive_slim_lock return ReleaseSRWLockExclusive(&_lock); } + void unlock_shared() + { + return ReleaseSRWLockShared(&_lock); + } + private: SRWLOCK _lock; }; diff --git a/pyproject.toml b/pyproject.toml index fb4349fab..f6a39cfe7 100644 
--- a/pyproject.toml +++ b/pyproject.toml @@ -11,7 +11,7 @@ build-backend = "setuptools.build_meta" [project] name = "diskannpy" -version = "0.6.0" +version = "0.7.0" description = "DiskANN Python extension module" readme = "python/README.md" @@ -48,9 +48,10 @@ test-command = "python -m unittest discover {project}/python/tests" [tool.cibuildwheel.linux] before-build = [ "dnf makecache --refresh", + "dnf upgrade -y almalinux-release", "dnf install -y epel-release", "dnf config-manager -y --add-repo https://yum.repos.intel.com/mkl/setup/intel-mkl.repo", - "rpm --import https://yum.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS-2019.PUB", + "rpm --import https://yum.repos.intel.com/intel-gpg-keys/GPG-PUB-KEY-INTEL-SW-PRODUCTS.PUB", "dnf makecache --refresh -y", "dnf install -y wget make cmake gcc-c++ libaio-devel gperftools-libs libunwind-devel clang-tools-extra boost-devel boost-program-options intel-mkl-2020.4-912" ] diff --git a/python/README.md b/python/README.md index 1365fb422..a0c94759e 100644 --- a/python/README.md +++ b/python/README.md @@ -49,7 +49,7 @@ Please cite this software in your work as: author = {Simhadri, Harsha Vardhan and Krishnaswamy, Ravishankar and Srinivasa, Gopal and Subramanya, Suhas Jayaram and Antonijevic, Andrija and Pryce, Dax and Kaczynski, David and Williams, Shane and Gollapudi, Siddarth and Sivashankar, Varun and Karia, Neel and Singh, Aditi and Jaiswal, Shikhar and Mahapatro, Neelam and Adams, Philip and Tower, Bryan and Patel, Yash}}, title = {{DiskANN: Graph-structured Indices for Scalable, Fast, Fresh and Filtered Approximate Nearest Neighbor Search}}, url = {https://github.com/Microsoft/DiskANN}, - version = {0.6.0}, + version = {0.6.1}, year = {2023} } ``` diff --git a/python/include/builder.h b/python/include/builder.h index fc12976e7..6b1a5b4f3 100644 --- a/python/include/builder.h +++ b/python/include/builder.h @@ -20,7 +20,8 @@ template void build_memory_index(diskann::Metric metric, const std::string 
&vector_bin_path, const std::string &index_output_path, uint32_t graph_degree, uint32_t complexity, float alpha, uint32_t num_threads, bool use_pq_build, - size_t num_pq_bytes, bool use_opq, uint32_t filter_complexity, - bool use_tags = false); + size_t num_pq_bytes, bool use_opq, bool use_tags = false, + const std::string& filter_labels_file = "", const std::string& universal_label = "", + uint32_t filter_complexity = 0); } diff --git a/python/include/static_disk_index.h b/python/include/static_disk_index.h index 71a1b5aff..4a399ab3e 100644 --- a/python/include/static_disk_index.h +++ b/python/include/static_disk_index.h @@ -6,7 +6,6 @@ #include #include - #include #include @@ -21,7 +20,8 @@ namespace py = pybind11; -namespace diskannpy { +namespace diskannpy +{ #ifdef _WINDOWS typedef WindowsAlignedFileReader PlatformSpecificAlignedFileReader; @@ -29,8 +29,7 @@ typedef WindowsAlignedFileReader PlatformSpecificAlignedFileReader; typedef LinuxAlignedFileReader PlatformSpecificAlignedFileReader; #endif -template -class StaticDiskIndex +template class StaticDiskIndex { public: StaticDiskIndex(diskann::Metric metric, const std::string &index_path_prefix, uint32_t num_threads, @@ -40,13 +39,15 @@ class StaticDiskIndex void cache_sample_paths(size_t num_nodes_to_cache, const std::string &warmup_query_file, uint32_t num_threads); - NeighborsAndDistances search(py::array_t &query, uint64_t knn, - uint64_t complexity, uint64_t beam_width); + NeighborsAndDistances search(py::array_t &query, + uint64_t knn, uint64_t complexity, uint64_t beam_width); + + NeighborsAndDistances batch_search( + py::array_t &queries, uint64_t num_queries, uint64_t knn, + uint64_t complexity, uint64_t beam_width, uint32_t num_threads); - NeighborsAndDistances batch_search(py::array_t &queries, uint64_t num_queries, - uint64_t knn, uint64_t complexity, uint64_t beam_width, uint32_t num_threads); private: std::shared_ptr _reader; diskann::PQFlashIndex
_index; }; -} +} // namespace diskannpy diff --git a/python/include/static_memory_index.h b/python/include/static_memory_index.h index 33f3187ae..6ed5a0822 100644 --- a/python/include/static_memory_index.h +++ b/python/include/static_memory_index.h @@ -14,21 +14,27 @@ namespace py = pybind11; -namespace diskannpy { +namespace diskannpy +{ -template -class StaticMemoryIndex +template class StaticMemoryIndex { public: - StaticMemoryIndex(diskann::Metric m, const std::string &index_prefix, size_t num_points, - size_t dimensions, uint32_t num_threads, uint32_t initial_search_complexity); + StaticMemoryIndex(diskann::Metric m, const std::string &index_prefix, size_t num_points, size_t dimensions, + uint32_t num_threads, uint32_t initial_search_complexity); + + NeighborsAndDistances search(py::array_t &query, + uint64_t knn, uint64_t complexity); + + NeighborsAndDistances search_with_filter( + py::array_t &query, uint64_t knn, uint64_t complexity, + filterT filter); - NeighborsAndDistances search(py::array_t &query, uint64_t knn, - uint64_t complexity); + NeighborsAndDistances batch_search( + py::array_t &queries, uint64_t num_queries, uint64_t knn, + uint64_t complexity, uint32_t num_threads); - NeighborsAndDistances batch_search(py::array_t &queries, - uint64_t num_queries, uint64_t knn, uint64_t complexity, uint32_t num_threads); private: diskann::Index _index; }; -} \ No newline at end of file +} // namespace diskannpy \ No newline at end of file diff --git a/python/src/_builder.py b/python/src/_builder.py index 18e9e9fa0..013b7f2c9 100644 --- a/python/src/_builder.py +++ b/python/src/_builder.py @@ -1,6 +1,7 @@ # Copyright (c) Microsoft Corporation. All rights reserved. # Licensed under the MIT license. +import json import os import shutil from pathlib import Path @@ -70,6 +71,15 @@ def build_disk_index( in the format DiskANN's PQ Flash Index builder requires. This temp folder is deleted upon index creation completion or error. 
+ ## Distance Metric and Vector Datatype Restrictions + | Metric \ Datatype | np.float32 | np.uint8 | np.int8 | + |-------------------|------------|----------|---------| + | L2 | ✅ | ✅ | ✅ | + | MIPS | ✅ | ❌ | ❌ | + | Cosine [^bug-in-disk-cosine] | ❌ | ❌ | ❌ | + + [^bug-in-disk-cosine]: For StaticDiskIndex, Cosine distances are not currently supported. + ### Parameters - **data**: Either a `str` representing a path to a DiskANN vector bin file, or a numpy.ndarray, of a supported dtype, in 2 dimensions. Note that `vector_dtype` must be provided if data is a `str` @@ -119,6 +129,12 @@ def build_disk_index( vector_bin_path, vector_dtype_actual = _valid_path_and_dtype( data, vector_dtype, index_directory, index_prefix ) + _assert(dap_metric != _native_dap.COSINE, "Cosine is currently not supported in StaticDiskIndex") + if dap_metric == _native_dap.INNER_PRODUCT: + _assert( + vector_dtype_actual == np.float32, + "Integral vector dtypes (np.uint8, np.int8) are not supported with distance metric mips" + ) num_points, dimensions = vectors_metadata_from_file(vector_bin_path) @@ -159,8 +175,10 @@ def build_memory_index( num_pq_bytes: int = defaults.NUM_PQ_BYTES, use_opq: bool = defaults.USE_OPQ, vector_dtype: Optional[VectorDType] = None, - filter_complexity: int = defaults.FILTER_COMPLEXITY, tags: Union[str, VectorIdentifierBatch] = "", + filter_labels: Optional[list[list[str]]] = None, + universal_label: str = "", + filter_complexity: int = defaults.FILTER_COMPLEXITY, index_prefix: str = "ann", ) -> None: """ @@ -176,6 +194,14 @@ def build_memory_index( `diskannpy.DynamicMemoryIndex`, you **must** supply a valid value for the `tags` parameter. **Do not supply tags if the index is intended to be `diskannpy.StaticMemoryIndex`**! 
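The restriction table above, together with the asserts added to the build path, amounts to a small validation rule. A sketch (the helper name and string arguments are illustrative, not diskannpy API):

```python
def validate_metric_dtype(metric: str, dtype: str, disk_index: bool) -> None:
    # L2 accepts float32/uint8/int8; MIPS is float32-only; Cosine is
    # currently rejected for StaticDiskIndex builds.
    if disk_index and metric == "cosine":
        raise ValueError("Cosine is currently not supported in StaticDiskIndex")
    if metric == "mips" and dtype != "float32":
        raise ValueError(
            "Integral vector dtypes (np.uint8, np.int8) are not supported "
            "with distance metric mips"
        )
```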
+ ## Distance Metric and Vector Datatype Restrictions + + | Metric \ Datatype | np.float32 | np.uint8 | np.int8 | + |-------------------|------------|----------|---------| + | L2 | ✅ | ✅ | ✅ | + | MIPS | ✅ | ❌ | ❌ | + | Cosine | ✅ | ✅ | ✅ | + ### Parameters - **data**: Either a `str` representing a path to an existing DiskANN vector bin file, or a numpy.ndarray of a @@ -200,10 +226,20 @@ def build_memory_index( Default is `0`. - **use_opq**: Use optimized product quantization during build. - **vector_dtype**: Required if the provided `data` is of type `str`, else we use the `data.dtype` if np array. - - **filter_complexity**: Complexity to use when using filters. Default is 0. - - **tags**: A `str` representing a path to a pre-built tags file on disk, or a `numpy.ndarray` of uint32 ids - corresponding to the ordinal position of the vectors provided to build the index. Defaults to "". **This value - must be provided if you want to build a memory index intended for use with `diskannpy.DynamicMemoryIndex`**. + - **tags**: Tags can be defined either as a path on disk to an existing .tags file, or provided as a np.array of + the same length as the number of vectors. Tags are used to identify vectors in the index via your *own* + numbering conventions, and is absolutely required for loading DynamicMemoryIndex indices `from_file`. + - **filter_labels**: An optional, but exhaustive list of categories for each vector. This is used to filter + search results by category. If provided, this must be a list of lists, where each inner list is a list of + categories for the corresponding vector. For example, if you have 3 vectors, and the first vector belongs to + categories "a" and "b", the second vector belongs to category "b", and the third vector belongs to no categories, + you would provide `filter_labels=[["a", "b"], ["b"], []]`. If you do not want to provide categories for a + particular vector, you can provide an empty list. 
If you do not want to provide categories for any vectors, + you can provide `None` for this parameter (which is the default) + - **universal_label**: An optional label that indicates that this vector should be included in *every* search + in which it also meets the knn search criteria. + - **filter_complexity**: Complexity to use when using filters. Default is 0. 0 is strictly invalid if you are + using filters. - **index_prefix**: The prefix of the index files. Defaults to "ann". """ _assert( @@ -222,6 +258,10 @@ def build_memory_index( _assert_is_nonnegative_uint32(num_pq_bytes, "num_pq_bytes") _assert_is_nonnegative_uint32(filter_complexity, "filter_complexity") _assert(index_prefix != "", "index_prefix cannot be an empty string") + _assert( + filter_labels is None or filter_complexity > 0, + "if filter_labels is provided, filter_complexity must not be 0" + ) index_path = Path(index_directory) _assert( @@ -232,8 +272,18 @@ def build_memory_index( vector_bin_path, vector_dtype_actual = _valid_path_and_dtype( data, vector_dtype, index_directory, index_prefix ) + if dap_metric == _native_dap.INNER_PRODUCT: + _assert( + vector_dtype_actual == np.float32, + "Integral vector dtypes (np.uint8, np.int8) are not supported with distance metric mips" + ) num_points, dimensions = vectors_metadata_from_file(vector_bin_path) + if filter_labels is not None: + _assert( + len(filter_labels) == num_points, + "filter_labels must be the same length as the number of points" + ) if vector_dtype_actual == np.uint8: _builder = _native_dap.build_memory_uint8_index @@ -244,6 +294,21 @@ def build_memory_index( index_prefix_path = os.path.join(index_directory, index_prefix) + filter_labels_file = "" + if filter_labels is not None: + label_counts = {} + filter_labels_file = f"{index_prefix_path}_pylabels.txt" + with open(filter_labels_file, "w") as labels_file: + for labels in filter_labels: + for label in labels: + label_counts[label] = 1 if label not in label_counts else 
label_counts[label] + 1 + if len(labels) == 0: + print("default", file=labels_file) + else: + print(",".join(labels), file=labels_file) + with open(f"{index_prefix_path}_label_metadata.json", "w") as label_metadata_file: + json.dump(label_counts, label_metadata_file, indent=True) + if isinstance(tags, str) and tags != "": use_tags = True shutil.copy(tags, index_prefix_path + ".tags") @@ -271,8 +336,10 @@ def build_memory_index( use_pq_build=use_pq_build, num_pq_bytes=num_pq_bytes, use_opq=use_opq, - filter_complexity=filter_complexity, use_tags=use_tags, + filter_labels_file=filter_labels_file, + universal_label=universal_label, + filter_complexity=filter_complexity, ) _write_index_metadata( diff --git a/python/src/_builder.pyi b/python/src/_builder.pyi index 5014880c6..223e6c923 100644 --- a/python/src/_builder.pyi +++ b/python/src/_builder.pyi @@ -47,11 +47,11 @@ def build_memory_index( use_pq_build: bool, num_pq_bytes: int, use_opq: bool, - label_file: str, + tags: Union[str, VectorIdentifierBatch], + filter_labels: Optional[list[list[str]]], universal_label: str, filter_complexity: int, - tags: Optional[VectorIdentifierBatch], - index_prefix: str, + index_prefix: str ) -> None: ... @overload def build_memory_index( @@ -66,9 +66,9 @@ def build_memory_index( num_pq_bytes: int, use_opq: bool, vector_dtype: VectorDType, - label_file: str, + tags: Union[str, VectorIdentifierBatch], + filter_labels_file: Optional[list[list[str]]], universal_label: str, filter_complexity: int, - tags: Optional[str], - index_prefix: str, + index_prefix: str ) -> None: ... 
diff --git a/python/src/_common.py b/python/src/_common.py index 53f1dbcab..2b28802ff 100644 --- a/python/src/_common.py +++ b/python/src/_common.py @@ -211,6 +211,7 @@ def _ensure_index_metadata( distance_metric: Optional[DistanceMetric], max_vectors: int, dimensions: Optional[int], + warn_size_exceeded: bool = False, ) -> Tuple[VectorDType, str, np.uint64, np.uint64]: possible_metadata = _read_index_metadata(index_path_and_prefix) if possible_metadata is None: @@ -226,16 +227,17 @@ def _ensure_index_metadata( return vector_dtype, distance_metric, max_vectors, dimensions # type: ignore else: vector_dtype, distance_metric, num_vectors, dimensions = possible_metadata - if max_vectors is not None and num_vectors > max_vectors: - warnings.warn( - "The number of vectors in the saved index exceeds the max_vectors parameter. " - "max_vectors is being adjusted to accommodate the dataset, but any insertions will fail." - ) - max_vectors = num_vectors - if num_vectors == max_vectors: - warnings.warn( - "The number of vectors in the saved index equals max_vectors parameter. Any insertions will fail." - ) + if warn_size_exceeded: + if max_vectors is not None and num_vectors > max_vectors: + warnings.warn( + "The number of vectors in the saved index exceeds the max_vectors parameter. " + "max_vectors is being adjusted to accommodate the dataset, but any insertions will fail." + ) + max_vectors = num_vectors + if num_vectors == max_vectors: + warnings.warn( + "The number of vectors in the saved index equals max_vectors parameter. Any insertions will fail." 
+ ) return possible_metadata diff --git a/python/src/_dynamic_memory_index.py b/python/src/_dynamic_memory_index.py index 9570b8345..cdf643208 100644 --- a/python/src/_dynamic_memory_index.py +++ b/python/src/_dynamic_memory_index.py @@ -144,7 +144,7 @@ def from_file( f"The file {tags_file} does not exist in {index_directory}", ) vector_dtype, dap_metric, num_vectors, dimensions = _ensure_index_metadata( - index_prefix_path, vector_dtype, distance_metric, max_vectors, dimensions + index_prefix_path, vector_dtype, distance_metric, max_vectors, dimensions, warn_size_exceeded=True ) index = cls( @@ -309,7 +309,8 @@ def search( f"k_neighbors={k_neighbors} asked for, but list_size={complexity} was smaller. Increasing {complexity} to {k_neighbors}" ) complexity = k_neighbors - return self._index.search(query=_query, knn=k_neighbors, complexity=complexity) + neighbors, distances = self._index.search(query=_query, knn=k_neighbors, complexity=complexity) + return QueryResponse(identifiers=neighbors, distances=distances) def batch_search( self, @@ -351,13 +352,14 @@ def batch_search( complexity = k_neighbors num_queries, dim = queries.shape - return self._index.batch_search( + neighbors, distances = self._index.batch_search( queries=_queries, num_queries=num_queries, knn=k_neighbors, complexity=complexity, num_threads=num_threads, ) + return QueryResponseBatch(identifiers=neighbors, distances=distances) def save(self, save_path: str, index_prefix: str = "ann"): """ diff --git a/python/src/_static_disk_index.py b/python/src/_static_disk_index.py index 1ca93c0a4..bd532577e 100644 --- a/python/src/_static_disk_index.py +++ b/python/src/_static_disk_index.py @@ -79,9 +79,9 @@ def __init__( does not exist, you are required to provide it. - **index_prefix**: The prefix of the index files. Defaults to "ann". 
""" - index_prefix = _valid_index_prefix(index_directory, index_prefix) + index_prefix_path = _valid_index_prefix(index_directory, index_prefix) vector_dtype, metric, _, _ = _ensure_index_metadata( - index_prefix, + index_prefix_path, vector_dtype, distance_metric, 1, # it doesn't matter because we don't need it in this context anyway @@ -101,7 +101,7 @@ def __init__( _index = _native_dap.StaticDiskFloatIndex self._index = _index( distance_metric=dap_metric, - index_path_prefix=os.path.join(index_directory, index_prefix), + index_path_prefix=index_prefix_path, num_threads=num_threads, num_nodes_to_cache=num_nodes_to_cache, cache_mechanism=cache_mechanism, @@ -138,12 +138,13 @@ def search( ) complexity = k_neighbors - return self._index.search( + neighbors, distances = self._index.search( query=_query, knn=k_neighbors, complexity=complexity, beam_width=beam_width, ) + return QueryResponse(identifiers=neighbors, distances=distances) def batch_search( self, @@ -187,7 +188,7 @@ def batch_search( complexity = k_neighbors num_queries, dim = _queries.shape - return self._index.batch_search( + neighbors, distances = self._index.batch_search( queries=_queries, num_queries=num_queries, knn=k_neighbors, @@ -195,3 +196,4 @@ def batch_search( beam_width=beam_width, num_threads=num_threads, ) + return QueryResponseBatch(identifiers=neighbors, distances=distances) diff --git a/python/src/_static_memory_index.py b/python/src/_static_memory_index.py index 8b87cd561..e481403cf 100644 --- a/python/src/_static_memory_index.py +++ b/python/src/_static_memory_index.py @@ -1,6 +1,7 @@ # Copyright (c) Microsoft Corporation. All rights reserved. # Licensed under the MIT license. 
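The search wrappers changed above all follow the same pattern: the native index returns parallel `(neighbors, distances)` arrays, which are repackaged into a named response type before being handed back to the caller. A sketch of that pattern (types simplified; `QueryResponse` here is a stand-in for diskannpy's actual response class):

```python
from typing import Callable, NamedTuple, Sequence, Tuple

class QueryResponse(NamedTuple):
    identifiers: Sequence[int]
    distances: Sequence[float]

def wrap_search(raw_search: Callable[..., Tuple[Sequence[int], Sequence[float]]],
                query, knn: int, complexity: int) -> QueryResponse:
    # Clamp complexity to at least knn, as the wrappers above do, then
    # repackage the parallel arrays into a named tuple.
    complexity = max(complexity, knn)
    neighbors, distances = raw_search(query=query, knn=knn, complexity=complexity)
    return QueryResponse(identifiers=neighbors, distances=distances)
```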
+import json import os import warnings from typing import Optional @@ -43,6 +44,7 @@ def __init__( distance_metric: Optional[DistanceMetric] = None, vector_dtype: Optional[VectorDType] = None, dimensions: Optional[int] = None, + enable_filters: bool = False ): """ ### Parameters @@ -73,10 +75,24 @@ def __init__( - **dimensions**: The vector dimensionality of this index. All new vectors inserted must be the same dimensionality. **This value is only used if a `{index_prefix}_metadata.bin` file does not exist.** If it does not exist, you are required to provide it. + - **enable_filters**: Set to True to enable filtered search on an index that was built with filter labels. Defaults to False. """ - index_prefix = _valid_index_prefix(index_directory, index_prefix) + index_prefix_path = _valid_index_prefix(index_directory, index_prefix) + self._labels_map = {} + self._labels_metadata = {} + if enable_filters: + try: + with open(f"{index_prefix_path}_labels_map.txt", "r") as labels_map_if: + for line in labels_map_if: + (key, val) = line.split("\t") + self._labels_map[key] = int(val) + with open(f"{index_prefix_path}_label_metadata.json", "r") as labels_metadata_if: + self._labels_metadata = json.load(labels_metadata_if) + except: # noqa: E722 + # exceptions are basically presumed to be either file not found or file not formatted correctly + raise RuntimeError("Filter labels file was unable to be processed.") vector_dtype, metric, num_points, dims = _ensure_index_metadata( - index_prefix, + index_prefix_path, vector_dtype, distance_metric, 1, # it doesn't matter because we don't need it in this context anyway @@ -103,13 +119,13 @@ def __init__( distance_metric=dap_metric, num_points=num_points, dimensions=dims, - index_path=os.path.join(index_directory, index_prefix), + index_path=index_prefix_path, num_threads=num_threads, initial_search_complexity=initial_search_complexity, ) def search( - self, query: VectorLike, k_neighbors: int, complexity: int + self, query: VectorLike, k_neighbors: int, complexity: 
int, filter_label: str = "" ) -> QueryResponse: """ Searches the index by a single query vector. @@ -121,13 +137,25 @@ def search( - **complexity**: Size of distance ordered list of candidate neighbors to use while searching. List size increases accuracy at the cost of latency. Must be at least k_neighbors in size. """ + if filter_label != "": + if len(self._labels_map) == 0: + raise ValueError( + f"A filter label of {filter_label} was provided, but this class was not initialized with filters " + "enabled, e.g. StaticMemoryIndex(..., enable_filters=True)" + ) + if filter_label not in self._labels_map: + raise ValueError( + f"A filter label of {filter_label} was provided, but the external(str)->internal(np.uint32) labels map " + f"does not include that label." + ) + k_neighbors = min(k_neighbors, self._labels_metadata[filter_label]) _query = _castable_dtype_or_raise(query, expected=self._vector_dtype) _assert(len(_query.shape) == 1, "query vector must be 1-d") _assert( _query.shape[0] == self._dimensions, f"query vector must have the same dimensionality as the index; index dimensionality: {self._dimensions}, " f"query dimensionality: {_query.shape[0]}", - ) + ) _assert_is_positive_uint32(k_neighbors, "k_neighbors") _assert_is_nonnegative_uint32(complexity, "complexity") @@ -136,7 +164,19 @@ def search( f"k_neighbors={k_neighbors} asked for, but list_size={complexity} was smaller. 
Increasing {complexity} to {k_neighbors}" ) complexity = k_neighbors - return self._index.search(query=_query, knn=k_neighbors, complexity=complexity) + + if filter_label == "": + neighbors, distances = self._index.search(query=_query, knn=k_neighbors, complexity=complexity) + else: + filter = self._labels_map[filter_label] + neighbors, distances = self._index.search_with_filter( + query=_query, + knn=k_neighbors, + complexity=complexity, + filter=filter + ) + return QueryResponse(identifiers=neighbors, distances=distances) + def batch_search( self, @@ -178,10 +218,11 @@ def batch_search( complexity = k_neighbors num_queries, dim = _queries.shape - return self._index.batch_search( + neighbors, distances = self._index.batch_search( queries=_queries, num_queries=num_queries, knn=k_neighbors, complexity=complexity, num_threads=num_threads, ) + return QueryResponseBatch(identifiers=neighbors, distances=distances) diff --git a/python/src/builder.cpp b/python/src/builder.cpp index 4485d66e6..e02a86d6c 100644 --- a/python/src/builder.cpp +++ b/python/src/builder.cpp @@ -31,12 +31,37 @@ template void build_disk_index(diskann::Metric, const std::string &, co template void build_disk_index(diskann::Metric, const std::string &, const std::string &, uint32_t, uint32_t, double, double, uint32_t, uint32_t); +template +std::string prepare_filtered_label_map(diskann::Index &index, const std::string &index_output_path, + const std::string &filter_labels_file, const std::string &universal_label) +{ + std::string labels_file_to_use = index_output_path + "_label_formatted.txt"; + std::string mem_labels_int_map_file = index_output_path + "_labels_map.txt"; + convert_labels_string_to_int(filter_labels_file, labels_file_to_use, mem_labels_int_map_file, universal_label); + if (!universal_label.empty()) + { + uint32_t unv_label_as_num = 0; + index.set_universal_label(unv_label_as_num); + } + return labels_file_to_use; +} + +template std::string prepare_filtered_label_map(diskann::Index &, 
const std::string &, + const std::string &, const std::string &); + +template std::string prepare_filtered_label_map(diskann::Index &, + const std::string &, const std::string &, const std::string &); + +template std::string prepare_filtered_label_map(diskann::Index &, + const std::string &, const std::string &, const std::string &); + template void build_memory_index(const diskann::Metric metric, const std::string &vector_bin_path, const std::string &index_output_path, const uint32_t graph_degree, const uint32_t complexity, const float alpha, const uint32_t num_threads, const bool use_pq_build, - const size_t num_pq_bytes, const bool use_opq, const uint32_t filter_complexity, - const bool use_tags) + const size_t num_pq_bytes, const bool use_opq, const bool use_tags, + const std::string &filter_labels_file, const std::string &universal_label, + const uint32_t filter_complexity) { diskann::IndexWriteParameters index_build_params = diskann::IndexWriteParametersBuilder(complexity, graph_degree) .with_filter_list_size(filter_complexity) @@ -44,10 +69,15 @@ void build_memory_index(const diskann::Metric metric, const std::string &vector_ .with_saturate_graph(false) .with_num_threads(num_threads) .build(); + diskann::IndexSearchParams index_search_params = + diskann::IndexSearchParams(index_build_params.search_list_size, num_threads); size_t data_num, data_dim; diskann::get_bin_metadata(vector_bin_path, data_num, data_dim); - diskann::Index index(metric, data_dim, data_num, use_tags, use_tags, false, use_pq_build, - num_pq_bytes, use_opq); + + diskann::Index index(metric, data_dim, data_num, + std::make_shared(index_build_params), + std::make_shared(index_search_params), 0, + use_tags, use_tags, false, use_pq_build, num_pq_bytes, use_opq); if (use_tags) { @@ -60,23 +90,44 @@ void build_memory_index(const diskann::Metric metric, const std::string &vector_ size_t tag_dims = 1; diskann::load_bin(tags_file, tags_data, data_num, tag_dims); std::vector tags(tags_data, 
tags_data + data_num); - index.build(vector_bin_path.c_str(), data_num, index_build_params, tags); + if (filter_labels_file.empty()) + { + index.build(vector_bin_path.c_str(), data_num, tags); + } + else + { + auto labels_file = prepare_filtered_label_map(index, index_output_path, filter_labels_file, + universal_label); + index.build_filtered_index(vector_bin_path.c_str(), labels_file, data_num, tags); + } } else { - index.build(vector_bin_path.c_str(), data_num, index_build_params); + if (filter_labels_file.empty()) + { + index.build(vector_bin_path.c_str(), data_num); + } + else + { + auto labels_file = prepare_filtered_label_map(index, index_output_path, filter_labels_file, + universal_label); + index.build_filtered_index(vector_bin_path.c_str(), labels_file, data_num); + } } index.save(index_output_path.c_str()); } template void build_memory_index(diskann::Metric, const std::string &, const std::string &, uint32_t, uint32_t, - float, uint32_t, bool, size_t, bool, uint32_t, bool); + float, uint32_t, bool, size_t, bool, bool, const std::string &, + const std::string &, uint32_t); template void build_memory_index(diskann::Metric, const std::string &, const std::string &, uint32_t, uint32_t, - float, uint32_t, bool, size_t, bool, uint32_t, bool); + float, uint32_t, bool, size_t, bool, bool, const std::string &, + const std::string &, uint32_t); template void build_memory_index(diskann::Metric, const std::string &, const std::string &, uint32_t, uint32_t, - float, uint32_t, bool, size_t, bool, uint32_t, bool); + float, uint32_t, bool, size_t, bool, bool, const std::string &, + const std::string &, uint32_t); } // namespace diskannpy diff --git a/python/src/diskann_bindings.cpp b/python/src/diskann_bindings.cpp deleted file mode 100644 index 8b1378917..000000000 --- a/python/src/diskann_bindings.cpp +++ /dev/null @@ -1 +0,0 @@ - diff --git a/python/src/dynamic_memory_index.cpp b/python/src/dynamic_memory_index.cpp index af276b85f..d05e54d96 100644 --- 
a/python/src/dynamic_memory_index.cpp +++ b/python/src/dynamic_memory_index.cpp @@ -13,8 +13,7 @@ diskann::IndexWriteParameters dynamic_index_write_parameters(const uint32_t comp const bool saturate_graph, const uint32_t max_occlusion_size, const float alpha, const uint32_t num_threads, - const uint32_t filter_complexity, - const uint32_t num_frozen_points) + const uint32_t filter_complexity) { return diskann::IndexWriteParametersBuilder(complexity, graph_degree) .with_saturate_graph(saturate_graph) @@ -22,28 +21,25 @@ diskann::IndexWriteParameters dynamic_index_write_parameters(const uint32_t comp .with_alpha(alpha) .with_num_threads(num_threads) .with_filter_list_size(filter_complexity) - .with_num_frozen_points(num_frozen_points) .build(); } template -diskann::Index dynamic_index_builder(const diskann::Metric m, - const diskann::IndexWriteParameters &write_params, - const size_t dimensions, const size_t max_vectors, - const uint32_t initial_search_complexity, - const uint32_t initial_search_threads, - const bool concurrent_consolidation) +diskann::Index dynamic_index_builder( + const diskann::Metric m, const diskann::IndexWriteParameters &write_params, const size_t dimensions, + const size_t max_vectors, const uint32_t initial_search_complexity, const uint32_t initial_search_threads, + const bool concurrent_consolidation, const uint32_t num_frozen_points) { - const uint32_t _initial_search_threads = - initial_search_threads != 0 ? initial_search_threads : omp_get_num_threads(); + const uint32_t _initial_search_threads = initial_search_threads != 0 ? initial_search_threads : omp_get_num_procs(); + + auto index_search_params = diskann::IndexSearchParams(initial_search_complexity, _initial_search_threads); return diskann::Index( m, dimensions, max_vectors, - true, // dynamic_index - write_params, // used for insert - initial_search_complexity, // used to prepare the scratch space for searching. can / may - // be expanded if the search asks for a larger L. 
- _initial_search_threads, // also used for the scratch space - true, // enable_tags + std::make_shared(write_params), // index write params + std::make_shared(index_search_params), // index_search_params + num_frozen_points, // frozen_points + true, // dynamic_index + true, // enable_tags concurrent_consolidation, false, // pq_dist_build 0, // num_pq_chunks @@ -60,9 +56,9 @@ DynamicMemoryIndex
::DynamicMemoryIndex(const diskann::Metric m, const size_t const uint32_t initial_search_threads, const bool concurrent_consolidation) : _initial_search_complexity(initial_search_complexity != 0 ? initial_search_complexity : complexity), _write_parameters(dynamic_index_write_parameters(complexity, graph_degree, saturate_graph, max_occlusion_size, - alpha, num_threads, filter_complexity, num_frozen_points)), + alpha, num_threads, filter_complexity)), _index(dynamic_index_builder
(m, _write_parameters, dimensions, max_vectors, _initial_search_complexity, - initial_search_threads, concurrent_consolidation)) + initial_search_threads, concurrent_consolidation, num_frozen_points)) { } diff --git a/python/src/module.cpp b/python/src/module.cpp index 7aea9fc03..376515661 100644 --- a/python/src/module.cpp +++ b/python/src/module.cpp @@ -48,7 +48,8 @@ template inline void add_variant(py::module_ &m, const Variant &var m.def(variant.memory_builder_name.c_str(), &diskannpy::build_memory_index, "distance_metric"_a, "data_file_path"_a, "index_output_path"_a, "graph_degree"_a, "complexity"_a, "alpha"_a, "num_threads"_a, - "use_pq_build"_a, "num_pq_bytes"_a, "use_opq"_a, "filter_complexity"_a = 0, "use_tags"_a = false); + "use_pq_build"_a, "num_pq_bytes"_a, "use_opq"_a, "use_tags"_a = false, "filter_labels_file"_a = "", + "universal_label"_a = "", "filter_complexity"_a = 0); py::class_>(m, variant.static_memory_index_name.c_str()) .def(py::init inline void add_variant(py::module_ &m, const Variant &var "distance_metric"_a, "index_path"_a, "num_points"_a, "dimensions"_a, "num_threads"_a, "initial_search_complexity"_a) .def("search", &diskannpy::StaticMemoryIndex::search, "query"_a, "knn"_a, "complexity"_a) + .def("search_with_filter", &diskannpy::StaticMemoryIndex::search_with_filter, "query"_a, "knn"_a, + "complexity"_a, "filter"_a) .def("batch_search", &diskannpy::StaticMemoryIndex::batch_search, "queries"_a, "num_queries"_a, "knn"_a, "complexity"_a, "num_threads"_a); diff --git a/python/src/static_disk_index.cpp b/python/src/static_disk_index.cpp index 654f8ec30..9e86b0ad5 100644 --- a/python/src/static_disk_index.cpp +++ b/python/src/static_disk_index.cpp @@ -14,7 +14,8 @@ StaticDiskIndex
::StaticDiskIndex(const diskann::Metric metric, const std::st const uint32_t cache_mechanism) : _reader(std::make_shared()), _index(_reader, metric) { - int load_success = _index.load(num_threads, index_path_prefix.c_str()); + const uint32_t _num_threads = num_threads != 0 ? num_threads : omp_get_num_procs(); + int load_success = _index.load(_num_threads, index_path_prefix.c_str()); if (load_success != 0) { throw std::runtime_error("index load failed."); @@ -22,7 +23,7 @@ StaticDiskIndex
::StaticDiskIndex(const diskann::Metric metric, const std::st if (cache_mechanism == 1) { std::string sample_file = index_path_prefix + std::string("_sample_data.bin"); - cache_sample_paths(num_nodes_to_cache, sample_file, num_threads); + cache_sample_paths(num_nodes_to_cache, sample_file, _num_threads); } else if (cache_mechanism == 2) { diff --git a/python/src/static_memory_index.cpp b/python/src/static_memory_index.cpp index 3bd927174..d3ac079af 100644 --- a/python/src/static_memory_index.cpp +++ b/python/src/static_memory_index.cpp @@ -17,15 +17,17 @@ diskann::Index static_index_builder(const diskann::Me { throw std::runtime_error("initial_search_complexity must be a positive uint32_t"); } - + auto index_search_params = diskann::IndexSearchParams(initial_search_complexity, omp_get_num_procs()); return diskann::Index
(m, dimensions, num_points, - false, // not a dynamic_index - false, // no enable_tags/ids - false, // no concurrent_consolidate, - false, // pq_dist_build - 0, // num_pq_chunks - false, // use_opq = false - 0); // num_frozen_points + nullptr, // index write params + std::make_shared(index_search_params), // index search params + 0, // num frozen points + false, // not a dynamic_index + false, // no enable_tags/ids + false, // no concurrent_consolidate, + false, // pq_dist_build + 0, // num_pq_chunks + false); // use_opq = false } template @@ -34,7 +36,7 @@ StaticMemoryIndex
::StaticMemoryIndex(const diskann::Metric m, const std::str const uint32_t initial_search_complexity) : _index(static_index_builder
(m, num_points, dimensions, initial_search_complexity)) { - const uint32_t _num_threads = num_threads != 0 ? num_threads : omp_get_num_threads(); + const uint32_t _num_threads = num_threads != 0 ? num_threads : omp_get_num_procs(); _index.load(index_prefix.c_str(), _num_threads, initial_search_complexity); } @@ -49,12 +51,24 @@ NeighborsAndDistances StaticMemoryIndex
::search( return std::make_pair(ids, dists); } +template +NeighborsAndDistances StaticMemoryIndex
::search_with_filter( + py::array_t &query, const uint64_t knn, const uint64_t complexity, + const filterT filter) +{ + py::array_t ids(knn); + py::array_t dists(knn); + std::vector
empty_vector; + _index.search_with_filters(query.data(), filter, knn, complexity, ids.mutable_data(), dists.mutable_data()); + return std::make_pair(ids, dists); +} + template NeighborsAndDistances StaticMemoryIndex
::batch_search( py::array_t &queries, const uint64_t num_queries, const uint64_t knn, const uint64_t complexity, const uint32_t num_threads) { - const uint32_t _num_threads = num_threads != 0 ? num_threads : omp_get_num_threads(); + const uint32_t _num_threads = num_threads != 0 ? num_threads : omp_get_num_procs(); py::array_t ids({num_queries, knn}); py::array_t dists({num_queries, knn}); std::vector
empty_vector; diff --git a/python/tests/test_dynamic_memory_index.py b/python/tests/test_dynamic_memory_index.py index ff9c8981d..13d9b08db 100644 --- a/python/tests/test_dynamic_memory_index.py +++ b/python/tests/test_dynamic_memory_index.py @@ -40,6 +40,7 @@ def setUpClass(cls) -> None: build_random_vectors_and_memory_index(np.float32, "cosine", with_tags=True), build_random_vectors_and_memory_index(np.uint8, "cosine", with_tags=True), build_random_vectors_and_memory_index(np.int8, "cosine", with_tags=True), + build_random_vectors_and_memory_index(np.float32, "mips", with_tags=True), ] cls._example_ann_dir = cls._test_matrix[0][4] @@ -72,12 +73,15 @@ def test_recall_and_batch(self): ) k = 5 - diskann_neighbors, diskann_distances = index.batch_search( + batch_response = index.batch_search( query_vectors, k_neighbors=k, complexity=5, num_threads=16, ) + self.assertIsInstance(batch_response, dap.QueryResponseBatch) + + diskann_neighbors, diskann_distances = batch_response if metric == "l2" or metric == "cosine": knn = NearestNeighbors( n_neighbors=100, algorithm="auto", metric=metric @@ -115,7 +119,9 @@ def test_single(self): index.batch_insert(vectors=index_vectors, vector_ids=generated_tags) k = 5 - ids, dists = index.search(query_vectors[0], k_neighbors=k, complexity=5) + response = index.search(query_vectors[0], k_neighbors=k, complexity=5) + self.assertIsInstance(response, dap.QueryResponse) + ids, dists = response self.assertEqual(ids.shape[0], k) self.assertEqual(dists.shape[0], k) @@ -437,4 +443,27 @@ def _tiny_index(): warnings.simplefilter("error") # turns warnings into raised exceptions index.batch_insert(rng.random((2, 10), dtype=np.float32), np.array([15, 25], dtype=np.uint32)) + def test_zero_threads(self): + for ( + metric, + dtype, + query_vectors, + index_vectors, + ann_dir, + vector_bin_file, + generated_tags, + ) in self._test_matrix: + with self.subTest(msg=f"Testing dtype {dtype}"): + index = dap.DynamicMemoryIndex( + distance_metric="l2", + 
vector_dtype=dtype, + dimensions=10, + max_vectors=11_000, + complexity=64, + graph_degree=32, + num_threads=0, # explicitly asking it to use all available threads. + ) + index.batch_insert(vectors=index_vectors, vector_ids=generated_tags, num_threads=0) + k = 5 + ids, dists = index.batch_search(query_vectors, k_neighbors=k, complexity=5, num_threads=0) diff --git a/python/tests/test_static_disk_index.py b/python/tests/test_static_disk_index.py index 4ba544106..0397c321d 100644 --- a/python/tests/test_static_disk_index.py +++ b/python/tests/test_static_disk_index.py @@ -3,6 +3,7 @@ import shutil import unittest +from pathlib import Path from tempfile import mkdtemp import diskannpy as dap @@ -25,7 +26,7 @@ def _build_random_vectors_and_index(dtype, metric): complexity=32, search_memory_maximum=0.00003, build_memory_maximum=1, - num_threads=1, + num_threads=0, pq_disk_bytes=0, ) return metric, dtype, query_vectors, index_vectors, ann_dir @@ -38,6 +39,7 @@ def setUpClass(cls) -> None: _build_random_vectors_and_index(np.float32, "l2"), _build_random_vectors_and_index(np.uint8, "l2"), _build_random_vectors_and_index(np.int8, "l2"), + _build_random_vectors_and_index(np.float32, "mips"), ] cls._example_ann_dir = cls._test_matrix[0][4] @@ -62,13 +64,16 @@ def test_recall_and_batch(self): ) k = 5 - diskann_neighbors, diskann_distances = index.batch_search( + batch_response = index.batch_search( query_vectors, k_neighbors=k, complexity=5, beam_width=2, num_threads=16, ) + self.assertIsInstance(batch_response, dap.QueryResponseBatch) + + diskann_neighbors, diskann_distances = batch_response if metric == "l2": knn = NearestNeighbors( n_neighbors=100, algorithm="auto", metric="l2" @@ -93,9 +98,11 @@ def test_single(self): ) k = 5 - ids, dists = index.search( + response = index.search( query_vectors[0], k_neighbors=k, complexity=5, beam_width=2 ) + self.assertIsInstance(response, dap.QueryResponse) + ids, dists = response self.assertEqual(ids.shape[0], k) 
self.assertEqual(dists.shape[0], k) @@ -144,3 +151,48 @@ def test_value_ranges_batch_search(self): index.batch_search( queries=np.array([[]], dtype=np.single), **kwargs ) + + def test_zero_threads(self): + for metric, dtype, query_vectors, index_vectors, ann_dir in self._test_matrix: + with self.subTest(msg=f"Testing dtype {dtype}"): + index = dap.StaticDiskIndex( + distance_metric="l2", + vector_dtype=dtype, + index_directory=ann_dir, + num_threads=0, # Issue #432 + num_nodes_to_cache=10, + ) + + k = 5 + ids, dists = index.batch_search( + query_vectors, k_neighbors=k, complexity=5, beam_width=2, num_threads=0 + ) + + def test_relative_paths(self): + # Issue 483 and 491 both fixed errors that were somehow slipping past our unit tests + # os.path.join() acts as a semi-merge if you give it two paths that look absolute. + # since our unit tests are using absolute paths via tempfile.mkdtemp(), the double os.path.join() was never + # caught by our tests, but was very easy to trip when using relative paths + rel_dir = "tmp" + Path(rel_dir).mkdir(exist_ok=True) + try: + tiny_index_vecs = random_vectors(20, 10, dtype=np.float32, seed=12345) + dap.build_disk_index( + data=tiny_index_vecs, + distance_metric="l2", + index_directory=rel_dir, + graph_degree=16, + complexity=32, + search_memory_maximum=0.00003, + build_memory_maximum=1, + num_threads=0, + pq_disk_bytes=0, + ) + index = dap.StaticDiskIndex( + index_directory=rel_dir, + num_threads=16, + num_nodes_to_cache=10, + ) + + finally: + shutil.rmtree(rel_dir, ignore_errors=True) diff --git a/python/tests/test_static_memory_index.py b/python/tests/test_static_memory_index.py index cb7f0f01d..fe571be5d 100644 --- a/python/tests/test_static_memory_index.py +++ b/python/tests/test_static_memory_index.py @@ -1,12 +1,17 @@ # Copyright (c) Microsoft Corporation. All rights reserved. # Licensed under the MIT license. 
+import os import shutil import unittest +from pathlib import Path +from tempfile import mkdtemp + import diskannpy as dap import numpy as np from fixtures import build_random_vectors_and_memory_index, calculate_recall +from fixtures import random_vectors from sklearn.neighbors import NearestNeighbors @@ -20,6 +25,7 @@ def setUpClass(cls) -> None: build_random_vectors_and_memory_index(np.float32, "cosine"), build_random_vectors_and_memory_index(np.uint8, "cosine"), build_random_vectors_and_memory_index(np.int8, "cosine"), + build_random_vectors_and_memory_index(np.float32, "mips"), ] cls._example_ann_dir = cls._test_matrix[0][4] @@ -50,12 +56,15 @@ def test_recall_and_batch(self): ) k = 5 - diskann_neighbors, diskann_distances = index.batch_search( + batch_response = index.batch_search( query_vectors, k_neighbors=k, complexity=5, num_threads=16, ) + self.assertIsInstance(batch_response, dap.QueryResponseBatch) + + diskann_neighbors, diskann_distances = batch_response if metric in ["l2", "cosine"]: knn = NearestNeighbors( n_neighbors=100, algorithm="auto", metric=metric @@ -86,7 +95,9 @@ def test_single(self): ) k = 5 - ids, dists = index.search(query_vectors[0], k_neighbors=k, complexity=5) + response = index.search(query_vectors[0], k_neighbors=k, complexity=5) + self.assertIsInstance(response, dap.QueryResponse) + ids, dists = response self.assertEqual(ids.shape[0], k) self.assertEqual(dists.shape[0], k) @@ -160,3 +171,152 @@ def test_value_ranges_batch_search(self): index.batch_search( queries=np.array([[]], dtype=np.single), **kwargs ) + + def test_zero_threads(self): + for ( + metric, + dtype, + query_vectors, + index_vectors, + ann_dir, + vector_bin_file, + _, + ) in self._test_matrix: + with self.subTest(msg=f"Testing dtype {dtype}"): + index = dap.StaticMemoryIndex( + index_directory=ann_dir, + num_threads=0, + initial_search_complexity=32, + ) + + k = 5 + ids, dists = index.batch_search(query_vectors, k_neighbors=k, complexity=5, num_threads=0) + + def 
test_relative_paths(self): + # Issue 483 and 491 both fixed errors that were somehow slipping past our unit tests + # os.path.join() acts as a semi-merge if you give it two paths that look absolute. + # since our unit tests are using absolute paths via tempfile.mkdtemp(), the double os.path.join() was never + # caught by our tests, but was very easy to trip when using relative paths + rel_dir = "tmp" + Path(rel_dir).mkdir(exist_ok=True) + try: + tiny_index_vecs = random_vectors(20, 10, dtype=np.float32, seed=12345) + dap.build_memory_index( + data=tiny_index_vecs, + distance_metric="l2", + index_directory=rel_dir, + graph_degree=16, + complexity=32, + num_threads=0, + ) + index = dap.StaticMemoryIndex( + index_directory=rel_dir, + num_threads=0, + initial_search_complexity=32, + ) + + finally: + shutil.rmtree(rel_dir, ignore_errors=True) + + + +class TestFilteredStaticMemoryIndex(unittest.TestCase): + def test_simple_scenario(self): + vectors: np.ndarray = random_vectors(10000, 10, dtype=np.float32, seed=54321) + query_vectors: np.ndarray = random_vectors(10, 10, dtype=np.float32) + temp = mkdtemp() + labels = [] + for idx in range(0, vectors.shape[0]): + label_list = [] + if idx % 3 == 0: + label_list.append("even_by_3") + if idx % 5 == 0: + label_list.append("even_by_5") + if len(label_list) == 0: + label_list = ["neither"] + labels.append(label_list) + try: + dap.build_memory_index( + data=vectors, + distance_metric="l2", + index_directory=temp, + complexity=64, + graph_degree=32, + num_threads=16, + filter_labels=labels, + universal_label="all", + filter_complexity=128, + ) + index = dap.StaticMemoryIndex( + index_directory=temp, + num_threads=16, + initial_search_complexity=64, + enable_filters=True + ) + + k = 50 + probable_superset, _ = index.search(query_vectors[0], k_neighbors=k*2, complexity=128) + ids_1, _ = index.search(query_vectors[0], k_neighbors=k, complexity=64, filter_label="even_by_3") + self.assertTrue(all(id % 3 == 0 for id in ids_1)) + ids_2, 
_ = index.search(query_vectors[0], k_neighbors=k, complexity=64, filter_label="even_by_5") + self.assertTrue(all(id % 5 == 0 for id in ids_2)) + + in_superset = np.intersect1d(probable_superset, np.append(ids_1, ids_2)).shape[0] + self.assertTrue(in_superset / (k * 2) > 0.98) + finally: + shutil.rmtree(temp, ignore_errors=True) + + + def test_exhaustive_validation(self): + vectors: np.ndarray = random_vectors(10000, 10, dtype=np.float32, seed=54321) + query_vectors: np.ndarray = random_vectors(10, 10, dtype=np.float32) + temp = mkdtemp() + labels = [] + for idx in range(0, vectors.shape[0]): + label_list = [] + label_list.append("all") + if idx % 2 == 0: + label_list.append("even") + else: + label_list.append("odd") + if idx % 3 == 0: + label_list.append("by_three") + labels.append(label_list) + try: + dap.build_memory_index( + data=vectors, + distance_metric="l2", + index_directory=temp, + complexity=64, + graph_degree=32, + num_threads=16, + filter_labels=labels, + universal_label="", + filter_complexity=128, + ) + index = dap.StaticMemoryIndex( + index_directory=temp, + num_threads=16, + initial_search_complexity=64, + enable_filters=True + ) + + k = 5_000 + without_filter, _ = index.search(query_vectors[0], k_neighbors=k*2, complexity=128) + with_filter_but_label_all, _ = index.search( + query_vectors[0], k_neighbors=k*2, complexity=128, filter_label="all" + ) + intersection = np.intersect1d(without_filter, with_filter_but_label_all) + intersect_count = intersection.shape[0] + self.assertEqual(intersect_count, k*2) + + ids_1, _ = index.search(query_vectors[0], k_neighbors=k*10, complexity=128, filter_label="even") + # we ask for more than 5000. prior to the addition of the `_label_metadata.json` file + # asking for more k than we had items with that label would result in nonsense results past the first + 5000. 
+ self.assertTrue(all(id % 2 == 0 for id in ids_1)) + ids_2, _ = index.search(query_vectors[0], k_neighbors=k, complexity=128, filter_label="odd") + self.assertTrue(all(id % 2 != 0 for id in ids_2)) + + finally: + shutil.rmtree(temp, ignore_errors=True) diff --git a/rust/Cargo.lock b/rust/Cargo.lock index 2e58e9322..3a8a25223 100644 --- a/rust/Cargo.lock +++ b/rust/Cargo.lock @@ -122,6 +122,12 @@ version = "1.3.2" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "bef38d45163c2f1dde094a7dfd33ccf595c92905c8f8f4fdc18d06fb1037718a" +[[package]] +name = "bitflags" +version = "2.4.1" +source = "registry+https://github.com/rust-lang/crates.io-index" +checksum = "327762f6e5a765692301e5bb513e0d9fef63be86bbc14528052b1cd3e6f03e07" + [[package]] name = "build_and_insert_delete_memory_index" version = "0.1.0" @@ -268,7 +274,7 @@ checksum = "9a78fbdd3cc2914ddf37ba444114bc7765bbdcb55ec9cbe6fa054f0137400717" dependencies = [ "anstream", "anstyle", - "bitflags", + "bitflags 1.3.2", "clap_lex", "strsim", ] @@ -870,11 +876,11 @@ dependencies = [ [[package]] name = "openssl" -version = "0.10.55" +version = "0.10.60" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "345df152bc43501c5eb9e4654ff05f794effb78d4efe3d53abc158baddc0703d" +checksum = "79a4c6c3a2b158f7f8f2a2fc5a969fa3a068df6fc9dbb4a43845436e3af7c800" dependencies = [ - "bitflags", + "bitflags 2.4.1", "cfg-if", "foreign-types", "libc", @@ -902,9 +908,9 @@ checksum = "ff011a302c396a5197692431fc1948019154afc178baf7d8e37367442a4601cf" [[package]] name = "openssl-sys" -version = "0.9.90" +version = "0.9.96" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "374533b0e45f3a7ced10fcaeccca020e66656bc03dac384f852e4e5a7a8104a6" +checksum = "3812c071ba60da8b5677cc12bcb1d42989a65553772897a7e0355545a819838f" dependencies = [ "cc", "libc", @@ -1116,7 +1122,7 @@ version = "0.2.16" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = 
"fb5a58c1855b4b6819d59012155603f0b22ad30cad752600aadfcb695265519a" dependencies = [ - "bitflags", + "bitflags 1.3.2", ] [[package]] @@ -1125,7 +1131,7 @@ version = "0.3.5" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "567664f262709473930a4bf9e51bf2ebf3348f2e748ccc50dea20646858f8f29" dependencies = [ - "bitflags", + "bitflags 1.3.2", ] [[package]] @@ -1156,11 +1162,11 @@ checksum = "436b050e76ed2903236f032a59761c1eb99e1b0aead2c257922771dab1fc8c78" [[package]] name = "rustix" -version = "0.37.20" +version = "0.37.25" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "b96e891d04aa506a6d1f318d2771bcb1c7dfda84e126660ace067c9b474bb2c0" +checksum = "d4eb579851244c2c03e7c24f501c3432bed80b8f720af1d6e5b0e0f01555a035" dependencies = [ - "bitflags", + "bitflags 1.3.2", "errno", "io-lifetimes", "libc", @@ -1236,7 +1242,7 @@ version = "2.9.1" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "1fc758eb7bffce5b308734e9b0c1468893cae9ff70ebf13e7090be8dcbcc83a8" dependencies = [ - "bitflags", + "bitflags 1.3.2", "core-foundation", "core-foundation-sys", "libc", @@ -1613,7 +1619,7 @@ version = "0.1.2" source = "registry+https://github.com/rust-lang/crates.io-index" checksum = "e50d0fa665033a19ecefd281b4fb5481eba2972dedbb5ec129c9392a206d652f" dependencies = [ - "bitflags", + "bitflags 1.3.2", ] [[package]] @@ -1794,9 +1800,9 @@ dependencies = [ [[package]] name = "zerocopy" -version = "0.6.1" +version = "0.6.6" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = "332f188cc1bcf1fe1064b8c58d150f497e697f49774aa846f2dc949d9a25f236" +checksum = "854e949ac82d619ee9a14c66a1b674ac730422372ccb759ce0c39cabcf2bf8e6" dependencies = [ "byteorder", "zerocopy-derive", @@ -1804,11 +1810,11 @@ dependencies = [ [[package]] name = "zerocopy-derive" -version = "0.3.2" +version = "0.6.6" source = "registry+https://github.com/rust-lang/crates.io-index" -checksum = 
"6505e6815af7de1746a08f69c69606bb45695a17149517680f3b2149713b19a3" +checksum = "125139de3f6b9d625c39e2efdd73d41bdac468ccd556556440e322be0e1bbd91" dependencies = [ "proc-macro2", "quote", - "syn 1.0.109", + "syn 2.0.18", ] diff --git a/scripts/dev/install-dev-deps-ubuntu.bash b/scripts/dev/install-dev-deps-ubuntu.bash index 84f558ed6..09ad6ebb9 100755 --- a/scripts/dev/install-dev-deps-ubuntu.bash +++ b/scripts/dev/install-dev-deps-ubuntu.bash @@ -1,6 +1,6 @@ #!/bin/bash -apt install cmake \ +DEBIAN_FRONTEND=noninteractive apt install -y cmake \ g++ \ libaio-dev \ libgoogle-perftools-dev \ diff --git a/scripts/perf/Dockerfile b/scripts/perf/Dockerfile new file mode 100644 index 000000000..f900627ab --- /dev/null +++ b/scripts/perf/Dockerfile @@ -0,0 +1,31 @@ +#Copyright(c) Microsoft Corporation.All rights reserved. +#Licensed under the MIT license. + +FROM ubuntu:jammy + +# Can be provided at build to point to a specific commit-ish, by default builds from HEAD +ARG GIT_COMMIT_ISH=HEAD + +RUN apt update +RUN apt install -y software-properties-common +RUN add-apt-repository -y ppa:git-core/ppa +RUN apt update +RUN DEBIAN_FRONTEND=noninteractive apt install -y git time + +COPY dev/install-dev-deps-ubuntu.bash /app/fallback/install-dev-deps-ubuntu.bash +WORKDIR /app +RUN git clone https://github.com/microsoft/DiskANN.git +WORKDIR /app/DiskANN +RUN git checkout $GIT_COMMIT_ISH + +# we would prefer to use the deps requested at the same commit. if the script doesn't exist we'll use the current one. +RUN bash scripts/dev/install-dev-deps-ubuntu.bash || bash /app/fallback/install-dev-deps-ubuntu.bash + +RUN mkdir build +RUN cmake -S . 
-B build -DCMAKE_BUILD_TYPE=Release -DUNIT_TEST=True +RUN cmake --build build -- -j + +RUN mkdir /app/logs +COPY perf/perf_test.sh /app/DiskANN/perf_test.sh + +ENTRYPOINT bash perf_test.sh diff --git a/scripts/perf/README.md b/scripts/perf/README.md new file mode 100644 index 000000000..692eedca7 --- /dev/null +++ b/scripts/perf/README.md @@ -0,0 +1,20 @@ +# Performance Tests + +The bash scripts in this folder run a suite of performance +tests. + +When run periodically, the timing and recall metrics reported by these tests can +be used to identify performance improvements or regressions as +development continues. + +## Usage + +`docker build` must be run with the context directory set to `scripts` and the Dockerfile set to `scripts/perf/Dockerfile`, as in: +```bash +docker build [--build-arg GIT_COMMIT_ISH=] -f scripts/perf/Dockerfile scripts +``` + +We prefer to install the dependencies from the commit-ish we're building against, but because the dependency script was not stored +at a known path in all commits, we fall back to the copy currently at HEAD if it is not found. + +The `--build-arg GIT_COMMIT_ISH=` is optional and defaults to HEAD if not specified. diff --git a/scripts/perf/perf_test.sh b/scripts/perf/perf_test.sh new file mode 100644 index 000000000..a8d537f01 --- /dev/null +++ b/scripts/perf/perf_test.sh @@ -0,0 +1,40 @@ +#!/bin/bash + +function json_time { + command="$@" + echo "Executing $command" + /usr/bin/time --quiet -o /app/logs/time.log -a --format '{"command":"%C", "wallclock": %e, "user": %U, "sys": %S}' $command + ret=$? 
+ if [ $ret -ne 0 ]; then + echo "{\"command\": \"$command\", \"status_code\": $ret}" >> /app/logs/time.log + fi +} + +mkdir -p data +rm -f /app/logs/time.log +touch /app/logs/time.log +chmod 666 /app/logs/time.log + +if [ -d "build/apps" ]; then + export BASE_PATH="build/apps" +else + export BASE_PATH="build/tests" +fi + +json_time $BASE_PATH/utils/rand_data_gen --data_type float --output_file data/rand_float_768D_1M_norm1.0.bin -D 768 -N 1000000 --norm 1.0 +json_time $BASE_PATH/utils/rand_data_gen --data_type float --output_file data/rand_float_768D_10K_norm1.0.bin -D 768 -N 10000 --norm 1.0 + +json_time $BASE_PATH/utils/compute_groundtruth --data_type float --dist_fn l2 --base_file data/rand_float_768D_1M_norm1.0.bin --query_file data/rand_float_768D_10K_norm1.0.bin --gt_file data/l2_rand_float_768D_1M_norm1.0_768D_10K_norm1.0_gt100 --K 100 +json_time $BASE_PATH/utils/compute_groundtruth --data_type float --dist_fn mips --base_file data/rand_float_768D_1M_norm1.0.bin --query_file data/rand_float_768D_10K_norm1.0.bin --gt_file data/mips_rand_float_768D_1M_norm1.0_768D_10K_norm1.0_gt100 --K 100 +json_time $BASE_PATH/utils/compute_groundtruth --data_type float --dist_fn cosine --base_file data/rand_float_768D_1M_norm1.0.bin --query_file data/rand_float_768D_10K_norm1.0.bin --gt_file data/cosine_rand_float_768D_1M_norm1.0_768D_10K_norm1.0_gt100 --K 100 + +json_time $BASE_PATH/build_memory_index --data_type float --dist_fn l2 --data_path data/rand_float_768D_1M_norm1.0.bin --index_path_prefix data/index_l2_rand_float_768D_1M_norm1.0 +json_time $BASE_PATH/search_memory_index --data_type float --dist_fn l2 --index_path_prefix data/index_l2_rand_float_768D_1M_norm1.0 --query_file data/rand_float_768D_10K_norm1.0.bin --recall_at 10 --result_path temp --gt_file data/l2_rand_float_768D_1M_norm1.0_768D_10K_norm1.0_gt100 -L 100 32 +json_time $BASE_PATH/search_memory_index --data_type float --dist_fn fast_l2 --index_path_prefix data/index_l2_rand_float_768D_1M_norm1.0 
--query_file data/rand_float_768D_10K_norm1.0.bin --recall_at 10 --result_path temp --gt_file data/l2_rand_float_768D_1M_norm1.0_768D_10K_norm1.0_gt100 -L 100 32 + +json_time $BASE_PATH/build_memory_index --data_type float --dist_fn mips --data_path data/rand_float_768D_1M_norm1.0.bin --index_path_prefix data/index_mips_rand_float_768D_1M_norm1.0 +json_time $BASE_PATH/search_memory_index --data_type float --dist_fn mips --index_path_prefix data/index_mips_rand_float_768D_1M_norm1.0 --query_file data/rand_float_768D_10K_norm1.0.bin --recall_at 10 --result_path temp --gt_file data/mips_rand_float_768D_1M_norm1.0_768D_10K_norm1.0_gt100 -L 100 32 + +json_time $BASE_PATH/build_memory_index --data_type float --dist_fn cosine --data_path data/rand_float_768D_1M_norm1.0.bin --index_path_prefix data/index_cosine_rand_float_768D_1M_norm1.0 +json_time $BASE_PATH/search_memory_index --data_type float --dist_fn cosine --index_path_prefix data/index_cosine_rand_float_768D_1M_norm1.0 --query_file data/rand_float_768D_10K_norm1.0.bin --recall_at 10 --result_path temp --gt_file data/cosine_rand_float_768D_1M_norm1.0_768D_10K_norm1.0_gt100 -L 100 32 + diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt index 2206a01f7..cbca26440 100644 --- a/src/CMakeLists.txt +++ b/src/CMakeLists.txt @@ -13,7 +13,7 @@ else() linux_aligned_file_reader.cpp math_utils.cpp natural_number_map.cpp in_mem_data_store.cpp in_mem_graph_store.cpp natural_number_set.cpp memory_mapper.cpp partition.cpp pq.cpp - pq_flash_index.cpp scratch.cpp logger.cpp utils.cpp filter_utils.cpp index_factory.cpp abstract_index.cpp) + pq_flash_index.cpp scratch.cpp logger.cpp utils.cpp filter_utils.cpp index_factory.cpp abstract_index.cpp pq_l2_distance.cpp pq_data_store.cpp) if (RESTAPI) list(APPEND CPP_SOURCES restapi/search_wrapper.cpp restapi/server.cpp) endif() diff --git a/src/abstract_data_store.cpp b/src/abstract_data_store.cpp index a980bd545..0cff0152e 100644 --- a/src/abstract_data_store.cpp +++ 
b/src/abstract_data_store.cpp @@ -2,7 +2,6 @@ // Licensed under the MIT license. #include - #include "abstract_data_store.h" namespace diskann diff --git a/src/abstract_index.cpp b/src/abstract_index.cpp index 518f8b7dd..92665825f 100644 --- a/src/abstract_index.cpp +++ b/src/abstract_index.cpp @@ -6,12 +6,11 @@ namespace diskann { template -void AbstractIndex::build(const data_type *data, const size_t num_points_to_load, - const IndexWriteParameters ¶meters, const std::vector &tags) +void AbstractIndex::build(const data_type *data, const size_t num_points_to_load, const std::vector &tags) { auto any_data = std::any(data); auto any_tags_vec = TagVector(tags); - this->_build(any_data, num_points_to_load, parameters, any_tags_vec); + this->_build(any_data, num_points_to_load, any_tags_vec); } template @@ -25,12 +24,13 @@ std::pair AbstractIndex::search(const data_type *query, cons template size_t AbstractIndex::search_with_tags(const data_type *query, const uint64_t K, const uint32_t L, tag_type *tags, - float *distances, std::vector &res_vectors) + float *distances, std::vector &res_vectors, bool use_filters, + const std::string filter_label) { auto any_query = std::any(query); auto any_tags = std::any(tags); auto any_res_vectors = DataVector(res_vectors); - return this->_search_with_tags(any_query, K, L, any_tags, distances, any_res_vectors); + return this->_search_with_tags(any_query, K, L, any_tags, distances, any_res_vectors, use_filters, filter_label); } template @@ -57,6 +57,15 @@ int AbstractIndex::insert_point(const data_type *point, const tag_type tag) return this->_insert_point(any_point, any_tag); } +template +int AbstractIndex::insert_point(const data_type *point, const tag_type tag, const std::vector &labels) +{ + auto any_point = std::any(point); + auto any_tag = std::any(tag); + auto any_labels = Labelvector(labels); + return this->_insert_point(any_point, any_tag, any_labels); +} + template int AbstractIndex::lazy_delete(const tag_type &tag) { auto 
any_tag = std::any(tag); @@ -90,52 +99,46 @@ template int AbstractIndex::get_vector_b return this->_get_vector_by_tag(any_tag, any_data_ptr); } +template void AbstractIndex::set_universal_label(const label_type universal_label) +{ + auto any_label = std::any(universal_label); + this->_set_universal_label(any_label); +} + // exports template DISKANN_DLLEXPORT void AbstractIndex::build(const float *data, const size_t num_points_to_load, - const IndexWriteParameters ¶meters, const std::vector &tags); template DISKANN_DLLEXPORT void AbstractIndex::build(const int8_t *data, const size_t num_points_to_load, - const IndexWriteParameters ¶meters, const std::vector &tags); template DISKANN_DLLEXPORT void AbstractIndex::build(const uint8_t *data, const size_t num_points_to_load, - const IndexWriteParameters ¶meters, const std::vector &tags); template DISKANN_DLLEXPORT void AbstractIndex::build(const float *data, const size_t num_points_to_load, - const IndexWriteParameters ¶meters, const std::vector &tags); template DISKANN_DLLEXPORT void AbstractIndex::build(const int8_t *data, const size_t num_points_to_load, - const IndexWriteParameters ¶meters, const std::vector &tags); template DISKANN_DLLEXPORT void AbstractIndex::build(const uint8_t *data, const size_t num_points_to_load, - const IndexWriteParameters ¶meters, const std::vector &tags); template DISKANN_DLLEXPORT void AbstractIndex::build(const float *data, const size_t num_points_to_load, - const IndexWriteParameters ¶meters, const std::vector &tags); template DISKANN_DLLEXPORT void AbstractIndex::build(const int8_t *data, const size_t num_points_to_load, - const IndexWriteParameters ¶meters, const std::vector &tags); template DISKANN_DLLEXPORT void AbstractIndex::build(const uint8_t *data, const size_t num_points_to_load, - const IndexWriteParameters ¶meters, const std::vector &tags); template DISKANN_DLLEXPORT void AbstractIndex::build(const float *data, const size_t num_points_to_load, - const IndexWriteParameters 
¶meters, const std::vector &tags); template DISKANN_DLLEXPORT void AbstractIndex::build(const int8_t *data, const size_t num_points_to_load, - const IndexWriteParameters ¶meters, const std::vector &tags); template DISKANN_DLLEXPORT void AbstractIndex::build(const uint8_t *data, const size_t num_points_to_load, - const IndexWriteParameters ¶meters, const std::vector &tags); template DISKANN_DLLEXPORT std::pair AbstractIndex::search( @@ -160,61 +163,53 @@ template DISKANN_DLLEXPORT std::pair AbstractIndex::search_w const DataType &query, const std::string &raw_label, const size_t K, const uint32_t L, uint64_t *indices, float *distances); -template DISKANN_DLLEXPORT size_t AbstractIndex::search_with_tags(const float *query, const uint64_t K, - const uint32_t L, int32_t *tags, - float *distances, - std::vector &res_vectors); +template DISKANN_DLLEXPORT size_t AbstractIndex::search_with_tags( + const float *query, const uint64_t K, const uint32_t L, int32_t *tags, float *distances, + std::vector &res_vectors, bool use_filters, const std::string filter_label); -template DISKANN_DLLEXPORT size_t -AbstractIndex::search_with_tags(const uint8_t *query, const uint64_t K, const uint32_t L, - int32_t *tags, float *distances, std::vector &res_vectors); +template DISKANN_DLLEXPORT size_t AbstractIndex::search_with_tags( + const uint8_t *query, const uint64_t K, const uint32_t L, int32_t *tags, float *distances, + std::vector &res_vectors, bool use_filters, const std::string filter_label); -template DISKANN_DLLEXPORT size_t AbstractIndex::search_with_tags(const int8_t *query, - const uint64_t K, const uint32_t L, - int32_t *tags, float *distances, - std::vector &res_vectors); +template DISKANN_DLLEXPORT size_t AbstractIndex::search_with_tags( + const int8_t *query, const uint64_t K, const uint32_t L, int32_t *tags, float *distances, + std::vector &res_vectors, bool use_filters, const std::string filter_label); -template DISKANN_DLLEXPORT size_t 
AbstractIndex::search_with_tags(const float *query, const uint64_t K, - const uint32_t L, uint32_t *tags, - float *distances, - std::vector &res_vectors); +template DISKANN_DLLEXPORT size_t AbstractIndex::search_with_tags( + const float *query, const uint64_t K, const uint32_t L, uint32_t *tags, float *distances, + std::vector &res_vectors, bool use_filters, const std::string filter_label); template DISKANN_DLLEXPORT size_t AbstractIndex::search_with_tags( const uint8_t *query, const uint64_t K, const uint32_t L, uint32_t *tags, float *distances, - std::vector &res_vectors); + std::vector &res_vectors, bool use_filters, const std::string filter_label); -template DISKANN_DLLEXPORT size_t AbstractIndex::search_with_tags(const int8_t *query, - const uint64_t K, const uint32_t L, - uint32_t *tags, float *distances, - std::vector &res_vectors); +template DISKANN_DLLEXPORT size_t AbstractIndex::search_with_tags( + const int8_t *query, const uint64_t K, const uint32_t L, uint32_t *tags, float *distances, + std::vector &res_vectors, bool use_filters, const std::string filter_label); -template DISKANN_DLLEXPORT size_t AbstractIndex::search_with_tags(const float *query, const uint64_t K, - const uint32_t L, int64_t *tags, - float *distances, - std::vector &res_vectors); +template DISKANN_DLLEXPORT size_t AbstractIndex::search_with_tags( + const float *query, const uint64_t K, const uint32_t L, int64_t *tags, float *distances, + std::vector &res_vectors, bool use_filters, const std::string filter_label); -template DISKANN_DLLEXPORT size_t -AbstractIndex::search_with_tags(const uint8_t *query, const uint64_t K, const uint32_t L, - int64_t *tags, float *distances, std::vector &res_vectors); +template DISKANN_DLLEXPORT size_t AbstractIndex::search_with_tags( + const uint8_t *query, const uint64_t K, const uint32_t L, int64_t *tags, float *distances, + std::vector &res_vectors, bool use_filters, const std::string filter_label); -template DISKANN_DLLEXPORT size_t 
AbstractIndex::search_with_tags(const int8_t *query, - const uint64_t K, const uint32_t L, - int64_t *tags, float *distances, - std::vector &res_vectors); +template DISKANN_DLLEXPORT size_t AbstractIndex::search_with_tags( + const int8_t *query, const uint64_t K, const uint32_t L, int64_t *tags, float *distances, + std::vector &res_vectors, bool use_filters, const std::string filter_label); -template DISKANN_DLLEXPORT size_t AbstractIndex::search_with_tags(const float *query, const uint64_t K, - const uint32_t L, uint64_t *tags, - float *distances, - std::vector &res_vectors); +template DISKANN_DLLEXPORT size_t AbstractIndex::search_with_tags( + const float *query, const uint64_t K, const uint32_t L, uint64_t *tags, float *distances, + std::vector &res_vectors, bool use_filters, const std::string filter_label); template DISKANN_DLLEXPORT size_t AbstractIndex::search_with_tags( const uint8_t *query, const uint64_t K, const uint32_t L, uint64_t *tags, float *distances, - std::vector &res_vectors); + std::vector &res_vectors, bool use_filters, const std::string filter_label); -template DISKANN_DLLEXPORT size_t AbstractIndex::search_with_tags(const int8_t *query, - const uint64_t K, const uint32_t L, - uint64_t *tags, float *distances, - std::vector &res_vectors); +template DISKANN_DLLEXPORT size_t AbstractIndex::search_with_tags( + const int8_t *query, const uint64_t K, const uint32_t L, uint64_t *tags, float *distances, + std::vector &res_vectors, bool use_filters, const std::string filter_label); template DISKANN_DLLEXPORT void AbstractIndex::search_with_optimized_layout(const float *query, size_t K, size_t L, uint32_t *indices); @@ -239,6 +234,62 @@ template DISKANN_DLLEXPORT int AbstractIndex::insert_point(cons template DISKANN_DLLEXPORT int AbstractIndex::insert_point(const uint8_t *point, const uint64_t tag); template DISKANN_DLLEXPORT int AbstractIndex::insert_point(const int8_t *point, const uint64_t tag); +template DISKANN_DLLEXPORT int 
AbstractIndex::insert_point( + const float *point, const int32_t tag, const std::vector &labels); +template DISKANN_DLLEXPORT int AbstractIndex::insert_point( + const uint8_t *point, const int32_t tag, const std::vector &labels); +template DISKANN_DLLEXPORT int AbstractIndex::insert_point( + const int8_t *point, const int32_t tag, const std::vector &labels); + +template DISKANN_DLLEXPORT int AbstractIndex::insert_point( + const float *point, const uint32_t tag, const std::vector &labels); +template DISKANN_DLLEXPORT int AbstractIndex::insert_point( + const uint8_t *point, const uint32_t tag, const std::vector &labels); +template DISKANN_DLLEXPORT int AbstractIndex::insert_point( + const int8_t *point, const uint32_t tag, const std::vector &labels); + +template DISKANN_DLLEXPORT int AbstractIndex::insert_point( + const float *point, const int64_t tag, const std::vector &labels); +template DISKANN_DLLEXPORT int AbstractIndex::insert_point( + const uint8_t *point, const int64_t tag, const std::vector &labels); +template DISKANN_DLLEXPORT int AbstractIndex::insert_point( + const int8_t *point, const int64_t tag, const std::vector &labels); + +template DISKANN_DLLEXPORT int AbstractIndex::insert_point( + const float *point, const uint64_t tag, const std::vector &labels); +template DISKANN_DLLEXPORT int AbstractIndex::insert_point( + const uint8_t *point, const uint64_t tag, const std::vector &labels); +template DISKANN_DLLEXPORT int AbstractIndex::insert_point( + const int8_t *point, const uint64_t tag, const std::vector &labels); + +template DISKANN_DLLEXPORT int AbstractIndex::insert_point( + const float *point, const int32_t tag, const std::vector &labels); +template DISKANN_DLLEXPORT int AbstractIndex::insert_point( + const uint8_t *point, const int32_t tag, const std::vector &labels); +template DISKANN_DLLEXPORT int AbstractIndex::insert_point( + const int8_t *point, const int32_t tag, const std::vector &labels); + +template DISKANN_DLLEXPORT int 
AbstractIndex::insert_point( + const float *point, const uint32_t tag, const std::vector &labels); +template DISKANN_DLLEXPORT int AbstractIndex::insert_point( + const uint8_t *point, const uint32_t tag, const std::vector &labels); +template DISKANN_DLLEXPORT int AbstractIndex::insert_point( + const int8_t *point, const uint32_t tag, const std::vector &labels); + +template DISKANN_DLLEXPORT int AbstractIndex::insert_point( + const float *point, const int64_t tag, const std::vector &labels); +template DISKANN_DLLEXPORT int AbstractIndex::insert_point( + const uint8_t *point, const int64_t tag, const std::vector &labels); +template DISKANN_DLLEXPORT int AbstractIndex::insert_point( + const int8_t *point, const int64_t tag, const std::vector &labels); + +template DISKANN_DLLEXPORT int AbstractIndex::insert_point( + const float *point, const uint64_t tag, const std::vector &labels); +template DISKANN_DLLEXPORT int AbstractIndex::insert_point( + const uint8_t *point, const uint64_t tag, const std::vector &labels); +template DISKANN_DLLEXPORT int AbstractIndex::insert_point( + const int8_t *point, const uint64_t tag, const std::vector &labels); + template DISKANN_DLLEXPORT int AbstractIndex::lazy_delete(const int32_t &tag); template DISKANN_DLLEXPORT int AbstractIndex::lazy_delete(const uint32_t &tag); template DISKANN_DLLEXPORT int AbstractIndex::lazy_delete(const int64_t &tag); @@ -277,4 +328,7 @@ template DISKANN_DLLEXPORT int AbstractIndex::get_vector_by_tag template DISKANN_DLLEXPORT int AbstractIndex::get_vector_by_tag(uint64_t &tag, uint8_t *vec); template DISKANN_DLLEXPORT int AbstractIndex::get_vector_by_tag(uint64_t &tag, int8_t *vec); +template DISKANN_DLLEXPORT void AbstractIndex::set_universal_label(const uint16_t label); +template DISKANN_DLLEXPORT void AbstractIndex::set_universal_label(const uint32_t label); + } // namespace diskann diff --git a/src/disk_utils.cpp b/src/disk_utils.cpp index 08adb186c..faa9e7623 100644 --- a/src/disk_utils.cpp +++ 
b/src/disk_utils.cpp @@ -3,7 +3,7 @@ #include "common_includes.h" -#if defined(RELEASE_UNUSED_TCMALLOC_MEMORY_AT_CHECKPOINTS) && defined(DISKANN_BUILD) +#if defined(DISKANN_RELEASE_UNUSED_TCMALLOC_MEMORY_AT_CHECKPOINTS) && defined(DISKANN_BUILD) #include "gperftools/malloc_extension.h" #endif @@ -559,7 +559,21 @@ void breakup_dense_points(const std::string data_file, const std::string labels_ if (dummy_pt_ids.size() != 0) { diskann::cout << dummy_pt_ids.size() << " is the number of dummy points created" << std::endl; - data = (T *)std::realloc((void *)data, labels_per_point.size() * ndims * sizeof(T)); + + T *ptr = (T *)std::realloc((void *)data, labels_per_point.size() * ndims * sizeof(T)); + if (ptr == nullptr) + { + diskann::cerr << "Realloc failed while creating dummy points" << std::endl; + free(data); + data = nullptr; + throw diskann::ANNException("Realloc failed while expanding data.", -1, __FUNCTION__, __FILE__, + __LINE__); + } + else + { + data = ptr; + } + std::ofstream dummy_writer(out_metadata_file); assert(dummy_writer.is_open()); for (auto i = dummy_pt_ids.begin(); i != dummy_pt_ids.end(); i++) @@ -635,10 +649,12 @@ int build_merged_vamana_index(std::string base_file, diskann::Metric compareMetr .with_num_threads(num_threads) .build(); using TagT = uint32_t; - diskann::Index _index(compareMetric, base_dim, base_num, false, false, false, - build_pq_bytes > 0, build_pq_bytes, use_opq); + diskann::Index _index(compareMetric, base_dim, base_num, + std::make_shared(paras), nullptr, + defaults::NUM_FROZEN_POINTS_STATIC, false, false, false, + build_pq_bytes > 0, build_pq_bytes, use_opq, use_filters); if (!use_filters) - _index.build(base_file.c_str(), base_num, paras); + _index.build(base_file.c_str(), base_num); else { if (universal_label != "") @@ -646,7 +662,7 @@ int build_merged_vamana_index(std::string base_file, diskann::Metric compareMetr // LabelT unv_label_as_num = 0; _index.set_universal_label(universal_label_num); } - 
_index.build_filtered_index(base_file.c_str(), label_file, base_num, paras); + _index.build_filtered_index(base_file.c_str(), label_file, base_num); } _index.save(mem_index_path.c_str()); @@ -673,7 +689,7 @@ int build_merged_vamana_index(std::string base_file, diskann::Metric compareMetr Timer timer; int num_parts = partition_with_ram_budget(base_file, sampling_rate, ram_budget, 2 * R / 3, merged_index_prefix, 2); - diskann::cout << timer.elapsed_seconds_for_step("partitioning data") << std::endl; + diskann::cout << timer.elapsed_seconds_for_step("partitioning data ") << std::endl; std::string cur_centroid_filepath = merged_index_prefix + "_centroids.bin"; std::rename(cur_centroid_filepath.c_str(), centroids_file.c_str()); @@ -681,6 +697,10 @@ int build_merged_vamana_index(std::string base_file, diskann::Metric compareMetr timer.reset(); for (int p = 0; p < num_parts; p++) { +#if defined(DISKANN_RELEASE_UNUSED_TCMALLOC_MEMORY_AT_CHECKPOINTS) && defined(DISKANN_BUILD) + MallocExtension::instance()->ReleaseFreeMemory(); +#endif + std::string shard_base_file = merged_index_prefix + "_subshard-" + std::to_string(p) + ".bin"; std::string shard_ids_file = merged_index_prefix + "_subshard-" + std::to_string(p) + "_ids_uint32.bin"; @@ -691,16 +711,22 @@ int build_merged_vamana_index(std::string base_file, diskann::Metric compareMetr std::string shard_index_file = merged_index_prefix + "_subshard-" + std::to_string(p) + "_mem.index"; - diskann::IndexWriteParameters paras = - diskann::IndexWriteParametersBuilder(L, (2 * R / 3)).with_filter_list_size(Lf).build(); + diskann::IndexWriteParameters low_degree_params = diskann::IndexWriteParametersBuilder(L, 2 * R / 3) + .with_filter_list_size(Lf) + .with_saturate_graph(false) + .with_num_threads(num_threads) + .build(); uint64_t shard_base_dim, shard_base_pts; get_bin_metadata(shard_base_file, shard_base_pts, shard_base_dim); - diskann::Index _index(compareMetric, shard_base_dim, shard_base_pts, false, false, false, 
build_pq_bytes > 0, + + diskann::Index _index(compareMetric, shard_base_dim, shard_base_pts, + std::make_shared(low_degree_params), nullptr, + defaults::NUM_FROZEN_POINTS_STATIC, false, false, false, build_pq_bytes > 0, build_pq_bytes, use_opq); if (!use_filters) { - _index.build(shard_base_file.c_str(), shard_base_pts, paras); + _index.build(shard_base_file.c_str(), shard_base_pts); } else { @@ -710,7 +736,7 @@ int build_merged_vamana_index(std::string base_file, diskann::Metric compareMetr // LabelT unv_label_as_num = 0; _index.set_universal_label(universal_label_num); } - _index.build_filtered_index(shard_base_file.c_str(), shard_labels_file, shard_base_pts, paras); + _index.build_filtered_index(shard_base_file.c_str(), shard_labels_file, shard_base_pts); } _index.save(shard_index_file.c_str()); // copy universal label file from first shard to the final destination @@ -895,29 +921,31 @@ void create_disk_layout(const std::string base_file, const std::string mem_index if (vamana_frozen_num == 1) vamana_frozen_loc = medoid; max_node_len = (((uint64_t)width_u32 + 1) * sizeof(uint32_t)) + (ndims_64 * sizeof(T)); - nnodes_per_sector = SECTOR_LEN / max_node_len; + nnodes_per_sector = defaults::SECTOR_LEN / max_node_len; // 0 if max_node_len > SECTOR_LEN diskann::cout << "medoid: " << medoid << "B" << std::endl; diskann::cout << "max_node_len: " << max_node_len << "B" << std::endl; diskann::cout << "nnodes_per_sector: " << nnodes_per_sector << "B" << std::endl; - // SECTOR_LEN buffer for each sector - std::unique_ptr sector_buf = std::make_unique(SECTOR_LEN); + // defaults::SECTOR_LEN buffer for each sector + std::unique_ptr sector_buf = std::make_unique(defaults::SECTOR_LEN); + std::unique_ptr multisector_buf = std::make_unique(ROUND_UP(max_node_len, defaults::SECTOR_LEN)); std::unique_ptr node_buf = std::make_unique(max_node_len); uint32_t &nnbrs = *(uint32_t *)(node_buf.get() + ndims_64 * sizeof(T)); uint32_t *nhood_buf = (uint32_t *)(node_buf.get() + (ndims_64 * 
sizeof(T)) + sizeof(uint32_t)); // number of sectors (1 for meta data) - uint64_t n_sectors = ROUND_UP(npts_64, nnodes_per_sector) / nnodes_per_sector; + uint64_t n_sectors = nnodes_per_sector > 0 ? ROUND_UP(npts_64, nnodes_per_sector) / nnodes_per_sector + : npts_64 * DIV_ROUND_UP(max_node_len, defaults::SECTOR_LEN); uint64_t n_reorder_sectors = 0; uint64_t n_data_nodes_per_sector = 0; if (append_reorder_data) { - n_data_nodes_per_sector = SECTOR_LEN / (ndims_reorder_file * sizeof(float)); + n_data_nodes_per_sector = defaults::SECTOR_LEN / (ndims_reorder_file * sizeof(float)); n_reorder_sectors = ROUND_UP(npts_64, n_data_nodes_per_sector) / n_data_nodes_per_sector; } - uint64_t disk_index_file_size = (n_sectors + n_reorder_sectors + 1) * SECTOR_LEN; + uint64_t disk_index_file_size = (n_sectors + n_reorder_sectors + 1) * defaults::SECTOR_LEN; std::vector output_file_meta; output_file_meta.push_back(npts_64); @@ -936,20 +964,73 @@ void create_disk_layout(const std::string base_file, const std::string mem_index } output_file_meta.push_back(disk_index_file_size); - diskann_writer.write(sector_buf.get(), SECTOR_LEN); + diskann_writer.write(sector_buf.get(), defaults::SECTOR_LEN); std::unique_ptr cur_node_coords = std::make_unique(ndims_64); diskann::cout << "# sectors: " << n_sectors << std::endl; uint64_t cur_node_id = 0; - for (uint64_t sector = 0; sector < n_sectors; sector++) - { - if (sector % 100000 == 0) + + if (nnodes_per_sector > 0) + { // Write multiple nodes per sector + for (uint64_t sector = 0; sector < n_sectors; sector++) { - diskann::cout << "Sector #" << sector << "written" << std::endl; + if (sector % 100000 == 0) + { + diskann::cout << "Sector #" << sector << "written" << std::endl; + } + memset(sector_buf.get(), 0, defaults::SECTOR_LEN); + for (uint64_t sector_node_id = 0; sector_node_id < nnodes_per_sector && cur_node_id < npts_64; + sector_node_id++) + { + memset(node_buf.get(), 0, max_node_len); + // read cur node's nnbrs + 
vamana_reader.read((char *)&nnbrs, sizeof(uint32_t)); + + // sanity checks on nnbrs + assert(nnbrs > 0); + assert(nnbrs <= width_u32); + + // read node's nhood + vamana_reader.read((char *)nhood_buf, (std::min)(nnbrs, width_u32) * sizeof(uint32_t)); + if (nnbrs > width_u32) + { + vamana_reader.seekg((nnbrs - width_u32) * sizeof(uint32_t), vamana_reader.cur); + } + + // write coords of node first + // T *node_coords = data + ((uint64_t) ndims_64 * cur_node_id); + base_reader.read((char *)cur_node_coords.get(), sizeof(T) * ndims_64); + memcpy(node_buf.get(), cur_node_coords.get(), ndims_64 * sizeof(T)); + + // write nnbrs + *(uint32_t *)(node_buf.get() + ndims_64 * sizeof(T)) = (std::min)(nnbrs, width_u32); + + // write nhood next + memcpy(node_buf.get() + ndims_64 * sizeof(T) + sizeof(uint32_t), nhood_buf, + (std::min)(nnbrs, width_u32) * sizeof(uint32_t)); + + // get offset into sector_buf + char *sector_node_buf = sector_buf.get() + (sector_node_id * max_node_len); + + // copy node buf into sector_node_buf + memcpy(sector_node_buf, node_buf.get(), max_node_len); + cur_node_id++; + } + // flush sector to disk + diskann_writer.write(sector_buf.get(), defaults::SECTOR_LEN); } - memset(sector_buf.get(), 0, SECTOR_LEN); - for (uint64_t sector_node_id = 0; sector_node_id < nnodes_per_sector && cur_node_id < npts_64; sector_node_id++) + } + else + { // Write multi-sector nodes + uint64_t nsectors_per_node = DIV_ROUND_UP(max_node_len, defaults::SECTOR_LEN); + for (uint64_t i = 0; i < npts_64; i++) { + if ((i * nsectors_per_node) % 100000 == 0) + { + diskann::cout << "Sector #" << i * nsectors_per_node << "written" << std::endl; + } + memset(multisector_buf.get(), 0, nsectors_per_node * defaults::SECTOR_LEN); + memset(node_buf.get(), 0, max_node_len); // read cur node's nnbrs vamana_reader.read((char *)&nnbrs, sizeof(uint32_t)); @@ -968,25 +1049,20 @@ void create_disk_layout(const std::string base_file, const std::string mem_index // write coords of node first // T 
*node_coords = data + ((uint64_t) ndims_64 * cur_node_id); base_reader.read((char *)cur_node_coords.get(), sizeof(T) * ndims_64); - memcpy(node_buf.get(), cur_node_coords.get(), ndims_64 * sizeof(T)); + memcpy(multisector_buf.get(), cur_node_coords.get(), ndims_64 * sizeof(T)); // write nnbrs - *(uint32_t *)(node_buf.get() + ndims_64 * sizeof(T)) = (std::min)(nnbrs, width_u32); + *(uint32_t *)(multisector_buf.get() + ndims_64 * sizeof(T)) = (std::min)(nnbrs, width_u32); // write nhood next - memcpy(node_buf.get() + ndims_64 * sizeof(T) + sizeof(uint32_t), nhood_buf, + memcpy(multisector_buf.get() + ndims_64 * sizeof(T) + sizeof(uint32_t), nhood_buf, (std::min)(nnbrs, width_u32) * sizeof(uint32_t)); - // get offset into sector_buf - char *sector_node_buf = sector_buf.get() + (sector_node_id * max_node_len); - - // copy node buf into sector_node_buf - memcpy(sector_node_buf, node_buf.get(), max_node_len); - cur_node_id++; + // flush sector to disk + diskann_writer.write(multisector_buf.get(), nsectors_per_node * defaults::SECTOR_LEN); } - // flush sector to disk - diskann_writer.write(sector_buf.get(), SECTOR_LEN); } + if (append_reorder_data) { diskann::cout << "Index written. Appending reorder data..." 
<< std::endl; @@ -1001,7 +1077,7 @@ void create_disk_layout(const std::string base_file, const std::string mem_index diskann::cout << "Reorder data Sector #" << sector << "written" << std::endl; } - memset(sector_buf.get(), 0, SECTOR_LEN); + memset(sector_buf.get(), 0, defaults::SECTOR_LEN); for (uint64_t sector_node_id = 0; sector_node_id < n_data_nodes_per_sector && sector_node_id < npts_64; sector_node_id++) @@ -1013,7 +1089,7 @@ void create_disk_layout(const std::string base_file, const std::string mem_index memcpy(sector_buf.get() + (sector_node_id * vec_len), vec_buf.get(), vec_len); } // flush sector to disk - diskann_writer.write(sector_buf.get(), SECTOR_LEN); + diskann_writer.write(sector_buf.get(), defaults::SECTOR_LEN); } } diskann_writer.close(); @@ -1053,11 +1129,12 @@ int build_disk_index(const char *dataFilePath, const char *indexFilePath, const return -1; } - if (!std::is_same<T, float>::value && compareMetric == diskann::Metric::INNER_PRODUCT) + if (!std::is_same<T, float>::value && + (compareMetric == diskann::Metric::INNER_PRODUCT || compareMetric == diskann::Metric::COSINE)) { std::stringstream stream; - stream << "DiskANN currently only supports floating point data for Max " - "Inner Product Search. " + stream << "Disk-index build currently only supports floating point data for Max " - "Inner Product Search. " + "Inner Product Search/ cosine similarity. 
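The `create_disk_layout` changes above handle two packing regimes: nodes small enough to share a sector are packed `nnodes_per_sector` at a time into one `SECTOR_LEN` buffer, while oversized nodes get `DIV_ROUND_UP(max_node_len, SECTOR_LEN)` dedicated sectors each. A minimal sketch of the offset arithmetic this implies (the constant and helper names here are illustrative, not DiskANN's actual API):

```cpp
#include <cassert>
#include <cstdint>

// Assumed sector size; DiskANN's defaults::SECTOR_LEN is 4096 bytes.
constexpr uint64_t kSectorLen = 4096;

constexpr uint64_t div_round_up(uint64_t x, uint64_t y)
{
    return (x + y - 1) / y;
}

// File offset of node `id` when nodes are small enough to share a sector.
uint64_t small_node_offset(uint64_t id, uint64_t max_node_len)
{
    uint64_t nnodes_per_sector = kSectorLen / max_node_len;
    uint64_t sector = id / nnodes_per_sector; // which sector holds the node
    uint64_t slot = id % nnodes_per_sector;   // slot inside that sector
    return sector * kSectorLen + slot * max_node_len;
}

// File offset when one node spans several whole sectors (the new
// multi-sector path): nodes never share sectors in this regime.
uint64_t large_node_offset(uint64_t id, uint64_t max_node_len)
{
    uint64_t nsectors_per_node = div_round_up(max_node_len, kSectorLen);
    return id * nsectors_per_node * kSectorLen;
}
```

Either way, every node record still carries its coordinates, then `nnbrs`, then the neighbour ids, exactly as the `memcpy` sequence in the hunk lays them out.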
" << std::endl; throw diskann::ANNException(stream.str(), -1); } @@ -1119,6 +1196,10 @@ int build_disk_index(const char *dataFilePath, const char *indexFilePath, const std::string disk_pq_pivots_path = index_prefix_path + "_disk.index_pq_pivots.bin"; // optional, used if disk index must store pq data std::string disk_pq_compressed_vectors_path = index_prefix_path + "_disk.index_pq_compressed.bin"; + std::string prepped_base = + index_prefix_path + + "_prepped_base.bin"; // temp file for storing pre-processed base file for cosine/ mips metrics + bool created_temp_file_for_processed_data = false; // output a new base file which contains extra dimension with sqrt(1 - // ||x||^2/M^2) for every x, M is max norm of all points. Extra space on @@ -1129,14 +1210,26 @@ int build_disk_index(const char *dataFilePath, const char *indexFilePath, const std::cout << "Using Inner Product search, so need to pre-process base " "data into temp file. Please ensure there is additional " "(n*(d+1)*4) bytes for storing pre-processed base vectors, " - "apart from the intermin indices and final index." + "apart from the interim indices created by DiskANN and the final index." << std::endl; - std::string prepped_base = index_prefix_path + "_prepped_base.bin"; data_file_to_use = prepped_base; float max_norm_of_base = diskann::prepare_base_for_inner_products(base_file, prepped_base); std::string norm_file = disk_index_path + "_max_base_norm.bin"; diskann::save_bin(norm_file, &max_norm_of_base, 1, 1); diskann::cout << timer.elapsed_seconds_for_step("preprocessing data for inner product") << std::endl; + created_temp_file_for_processed_data = true; + } + else if (compareMetric == diskann::Metric::COSINE) + { + Timer timer; + std::cout << "Normalizing data for cosine to temporary file, please ensure there is additional " + "(n*d*4) bytes for storing normalized base vectors, " + "apart from the interim indices created by DiskANN and the final index." 
+ << std::endl; + data_file_to_use = prepped_base; + diskann::normalize_data_file(base_file, prepped_base); + diskann::cout << timer.elapsed_seconds_for_step("preprocessing data for cosine") << std::endl; + created_temp_file_for_processed_data = true; } uint32_t R = (uint32_t)atoi(param_list[0].c_str()); @@ -1226,10 +1319,10 @@ int build_disk_index(const char *dataFilePath, const char *indexFilePath, const // Gopal. Splitting diskann_dll into separate DLLs for search and build. // This code should only be available in the "build" DLL. -#if defined(RELEASE_UNUSED_TCMALLOC_MEMORY_AT_CHECKPOINTS) && defined(DISKANN_BUILD) +#if defined(DISKANN_RELEASE_UNUSED_TCMALLOC_MEMORY_AT_CHECKPOINTS) && defined(DISKANN_BUILD) MallocExtension::instance()->ReleaseFreeMemory(); #endif - + // Whether it is cosine or inner product, we still use the L2 metric due to the pre-processing. timer.reset(); diskann::build_merged_vamana_index(data_file_to_use.c_str(), diskann::Metric::L2, L, R, p_val, indexing_ram_budget, mem_index_path, medoids_path, centroids_path, @@ -1270,7 +1363,8 @@ int build_disk_index(const char *dataFilePath, const char *indexFilePath, const std::remove(augmented_labels_file.c_str()); std::remove(labels_file_to_use.c_str()); } - + if (created_temp_file_for_processed_data) + std::remove(prepped_base.c_str()); std::remove(mem_index_path.c_str()); if (use_disk_pq) std::remove(disk_pq_compressed_vectors_path.c_str()); diff --git a/src/distance.cpp b/src/distance.cpp index 31ab9d3ff..c2f88c85b 100644 --- a/src/distance.cpp +++ b/src/distance.cpp @@ -61,10 +61,6 @@ template <typename T> size_t Distance<T>::get_required_alignment() const return _alignment_factor; } -template <typename T> Distance<T>::~Distance() -{ -} - // // Cosine distance functions. 
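The cosine branch added above can reuse the plain L2 build because, for unit vectors, squared L2 distance is a monotone function of cosine similarity: ||a − b||² = 2 − 2·cos(a, b), so ranking by L2 over normalized data equals ranking by cosine. A small self-contained sketch of that preprocessing idea (not the actual `normalize_data_file` implementation):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Normalize a vector in place so that L2 search over the normalized base
// ranks candidates exactly as cosine similarity would.
inline void normalize(std::vector<float> &v)
{
    float norm_sq = 0.0f;
    for (float x : v)
        norm_sq += x * x;
    const float norm = std::sqrt(norm_sq);
    if (norm > 0.0f)
        for (float &x : v)
            x /= norm;
}

// Squared L2 distance; for unit vectors this equals 2 - 2 * dot(a, b).
inline float l2_sq(const std::vector<float> &a, const std::vector<float> &b)
{
    float d = 0.0f;
    for (std::size_t i = 0; i < a.size(); i++)
        d += (a[i] - b[i]) * (a[i] - b[i]);
    return d;
}

// Self-check helper: (3, 4) normalizes to the unit vector (0.6, 0.8).
inline bool normalize_demo_ok()
{
    std::vector<float> v{3.0f, 4.0f};
    normalize(v);
    return std::fabs(v[0] - 0.6f) < 1e-5f && std::fabs(v[1] - 0.8f) < 1e-5f;
}
```

This is also why the `// Whether it is cosine or inner product...` comment in the hunk can pass `diskann::Metric::L2` to `build_merged_vamana_index` unconditionally.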
// @@ -730,4 +726,8 @@ template DISKANN_DLLEXPORT class SlowDistanceL2<float>; template DISKANN_DLLEXPORT class SlowDistanceL2<int8_t>; template DISKANN_DLLEXPORT class SlowDistanceL2<uint8_t>; +template DISKANN_DLLEXPORT Distance<float> *get_distance_function(Metric m); +template DISKANN_DLLEXPORT Distance<int8_t> *get_distance_function(Metric m); +template DISKANN_DLLEXPORT Distance<uint8_t> *get_distance_function(Metric m); + } // namespace diskann diff --git a/src/dll/CMakeLists.txt b/src/dll/CMakeLists.txt index d00cfeb95..096d1b76e 100644 --- a/src/dll/CMakeLists.txt +++ b/src/dll/CMakeLists.txt @@ -2,14 +2,17 @@ #Licensed under the MIT license. add_library(${PROJECT_NAME} SHARED dllmain.cpp ../abstract_data_store.cpp ../partition.cpp ../pq.cpp ../pq_flash_index.cpp ../logger.cpp ../utils.cpp - ../windows_aligned_file_reader.cpp ../distance.cpp ../memory_mapper.cpp ../index.cpp - ../in_mem_data_store.cpp ../in_mem_graph_store.cpp ../math_utils.cpp ../disk_utils.cpp ../filter_utils.cpp + ../windows_aligned_file_reader.cpp ../distance.cpp ../pq_l2_distance.cpp ../memory_mapper.cpp ../index.cpp + ../in_mem_data_store.cpp ../pq_data_store.cpp ../in_mem_graph_store.cpp ../math_utils.cpp ../disk_utils.cpp ../filter_utils.cpp ../ann_exception.cpp ../natural_number_set.cpp ../natural_number_map.cpp ../scratch.cpp ../index_factory.cpp ../abstract_index.cpp) set(TARGET_DIR "$<$<CONFIG:Debug>:${CMAKE_LIBRARY_OUTPUT_DIRECTORY_DEBUG}>$<$<CONFIG:Release>:${CMAKE_LIBRARY_OUTPUT_DIRECTORY_RELEASE}>") set(DISKANN_DLL_IMPLIB "${TARGET_DIR}/${PROJECT_NAME}.lib") +if (NOT PYBIND) + target_compile_definitions(${PROJECT_NAME} PRIVATE DISKANN_RELEASE_UNUSED_TCMALLOC_MEMORY_AT_CHECKPOINTS DISKANN_BUILD) +endif() target_compile_definitions(${PROJECT_NAME} PRIVATE _USRDLL _WINDLL) target_compile_options(${PROJECT_NAME} PRIVATE /GL) target_include_directories(${PROJECT_NAME} PRIVATE ${DISKANN_MKL_INCLUDE_DIRECTORIES}) diff --git a/src/filter_utils.cpp b/src/filter_utils.cpp index 965762d1f..09d740e35 100644 --- a/src/filter_utils.cpp +++ b/src/filter_utils.cpp 
@@ -45,10 +45,13 @@ void generate_label_indices(path input_data_path, path final_index_path_prefix, size_t number_of_label_points, dimension; diskann::get_bin_metadata(curr_label_input_data_path, number_of_label_points, dimension); - diskann::Index<uint32_t> index(diskann::Metric::L2, dimension, number_of_label_points, false, false); + + diskann::Index<uint32_t> index(diskann::Metric::L2, dimension, number_of_label_points, + std::make_shared<diskann::IndexWriteParameters>(label_index_build_parameters), nullptr, + 0, false, false, false, false, 0, false); auto index_build_timer = std::chrono::high_resolution_clock::now(); - index.build(curr_label_input_data_path.c_str(), number_of_label_points, label_index_build_parameters); + index.build(curr_label_input_data_path.c_str(), number_of_label_points); std::chrono::duration<double> current_indexing_time = std::chrono::high_resolution_clock::now() - index_build_timer; @@ -258,6 +261,74 @@ parse_label_file_return_values parse_label_file(path label_data_path, std::strin return std::make_tuple(point_ids_to_labels, labels_to_number_of_points, all_labels); } +/* + * A templated function to parse a file of labels that are already represented + * as either uint16_t or uint32_t + * + * Returns two objects via std::tuple: + * 1. a vector of vectors of labels, where the outer vector is indexed by point id + * 2.
a set of all labels + */ +template <typename LabelT> +std::tuple<std::vector<std::vector<LabelT>>, tsl::robin_set<LabelT>> parse_formatted_label_file(std::string label_file) +{ + std::vector<std::vector<LabelT>> pts_to_labels; + tsl::robin_set<LabelT> labels; + + // Format of Label txt file: filters with comma separators + std::ifstream infile(label_file); + if (infile.fail()) + { + throw diskann::ANNException(std::string("Failed to open file ") + label_file, -1); + } + + std::string line, token; + uint32_t line_cnt = 0; + + while (std::getline(infile, line)) + { + line_cnt++; + } + pts_to_labels.resize(line_cnt, std::vector<LabelT>()); + + infile.clear(); + infile.seekg(0, std::ios::beg); + line_cnt = 0; + + while (std::getline(infile, line)) + { + std::istringstream iss(line); + std::vector<LabelT> lbls(0); + getline(iss, token, '\t'); + std::istringstream new_iss(token); + while (getline(new_iss, token, ',')) + { + token.erase(std::remove(token.begin(), token.end(), '\n'), token.end()); + token.erase(std::remove(token.begin(), token.end(), '\r'), token.end()); + LabelT token_as_num = static_cast<LabelT>(std::stoul(token)); + lbls.push_back(token_as_num); + labels.insert(token_as_num); + } + if (lbls.size() <= 0) + { + diskann::cout << "No label found"; + exit(-1); + } + std::sort(lbls.begin(), lbls.end()); + pts_to_labels[line_cnt] = lbls; + line_cnt++; + } + diskann::cout << "Identified " << labels.size() << " distinct label(s)" << std::endl; + + return std::make_tuple(pts_to_labels, labels); +} + +template DISKANN_DLLEXPORT std::tuple<std::vector<std::vector<uint16_t>>, tsl::robin_set<uint16_t>> +parse_formatted_label_file<uint16_t>(path label_file); + +template DISKANN_DLLEXPORT std::tuple<std::vector<std::vector<uint32_t>>, tsl::robin_set<uint32_t>> +parse_formatted_label_file<uint32_t>(path label_file); + template DISKANN_DLLEXPORT void generate_label_indices(path input_data_path, path final_index_path_prefix, label_set all_labels, uint32_t R, uint32_t L, float alpha, uint32_t num_threads); diff --git a/src/in_mem_data_store.cpp b/src/in_mem_data_store.cpp index f5f973917..28bb7ba4c 100644 --- a/src/in_mem_data_store.cpp +++ b/src/in_mem_data_store.cpp @@ -2,6 +2,7 @@ // 
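The per-line logic of `parse_formatted_label_file` above can be sketched in isolation: each line of the label file holds comma-separated numeric labels for one point, which are stripped of stray line endings, parsed, and sorted. A minimal standalone version of just that step (helper name is illustrative):

```cpp
#include <algorithm>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Parse one line of a formatted label file, e.g. "7,2,9" -> {2, 7, 9}.
// Mirrors the tokenize / erase-CRLF / stoul / sort sequence in the patch.
std::vector<uint32_t> parse_label_line(const std::string &line)
{
    std::vector<uint32_t> lbls;
    std::istringstream iss(line);
    std::string token;
    while (std::getline(iss, token, ','))
    {
        // strip stray CR/LF, as the token.erase calls above do
        token.erase(std::remove(token.begin(), token.end(), '\n'), token.end());
        token.erase(std::remove(token.begin(), token.end(), '\r'), token.end());
        lbls.push_back(static_cast<uint32_t>(std::stoul(token)));
    }
    std::sort(lbls.begin(), lbls.end());
    return lbls;
}
```

Sorting each point's label list up front is what lets later filtered-search code intersect label sets with linear merges instead of repeated lookups.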
Licensed under the MIT license. #include +#include "abstract_scratch.h" #include "in_mem_data_store.h" #include "utils.h" @@ -11,8 +12,8 @@ namespace diskann template InMemDataStore::InMemDataStore(const location_t num_points, const size_t dim, - std::shared_ptr> distance_fn) - : AbstractDataStore(num_points, dim), _distance_fn(distance_fn) + std::unique_ptr> distance_fn) + : AbstractDataStore(num_points, dim), _distance_fn(std::move(distance_fn)) { _aligned_dim = ROUND_UP(dim, _distance_fn->get_required_alignment()); alloc_aligned(((void **)&_data), this->_capacity * _aligned_dim * sizeof(data_t), 8 * sizeof(data_t)); @@ -157,6 +158,7 @@ void InMemDataStore::extract_data_to_bin(const std::string &filename, co template void InMemDataStore::get_vector(const location_t i, data_t *dest) const { + // REFACTOR TODO: Should we denormalize and return values? memcpy(dest, _data + i * _aligned_dim, this->_dim * sizeof(data_t)); } @@ -171,9 +173,26 @@ template void InMemDataStore::set_vector(const locatio } } -template void InMemDataStore::prefetch_vector(const location_t loc) +template void InMemDataStore::prefetch_vector(const location_t loc) const { - diskann::prefetch_vector((const char *)_data + _aligned_dim * (size_t)loc, sizeof(data_t) * _aligned_dim); + diskann::prefetch_vector((const char *)_data + _aligned_dim * (size_t)loc * sizeof(data_t), + sizeof(data_t) * _aligned_dim); +} + +template +void InMemDataStore::preprocess_query(const data_t *query, AbstractScratch *query_scratch) const +{ + if (query_scratch != nullptr) + { + memcpy(query_scratch->aligned_query_T(), query, sizeof(data_t) * this->get_dims()); + } + else + { + std::stringstream ss; + ss << "In InMemDataStore::preprocess_query: Query scratch is null"; + diskann::cerr << ss.str() << std::endl; + throw diskann::ANNException(ss.str(), -1); + } } template float InMemDataStore::get_distance(const data_t *query, const location_t loc) const @@ -183,10 +202,11 @@ template float 
InMemDataStore::get_distance(const data template void InMemDataStore::get_distance(const data_t *query, const location_t *locations, - const uint32_t location_count, float *distances) const + const uint32_t location_count, float *distances, + AbstractScratch *scratch_space) const { for (location_t i = 0; i < location_count; i++) - { + { distances[i] = _distance_fn->compare(query, _data + locations[i] * _aligned_dim, (uint32_t)this->_aligned_dim); } } @@ -198,6 +218,17 @@ float InMemDataStore::get_distance(const location_t loc1, const location (uint32_t)this->_aligned_dim); } +template +void InMemDataStore::get_distance(const data_t *preprocessed_query, const std::vector &ids, + std::vector &distances, AbstractScratch *scratch_space) const +{ + for (int i = 0; i < ids.size(); i++) + { + distances[i] = + _distance_fn->compare(preprocessed_query, _data + ids[i] * _aligned_dim, (uint32_t)this->_aligned_dim); + } +} + template location_t InMemDataStore::expand(const location_t new_size) { if (new_size == this->capacity()) @@ -358,7 +389,7 @@ template location_t InMemDataStore::calculate_medoid() return min_idx; } -template Distance *InMemDataStore::get_dist_fn() +template Distance *InMemDataStore::get_dist_fn() const { return this->_distance_fn.get(); } diff --git a/src/in_mem_graph_store.cpp b/src/in_mem_graph_store.cpp index e9bfd4e9e..c12b2514e 100644 --- a/src/in_mem_graph_store.cpp +++ b/src/in_mem_graph_store.cpp @@ -6,26 +6,237 @@ namespace diskann { +InMemGraphStore::InMemGraphStore(const size_t total_pts, const size_t reserve_graph_degree) + : AbstractGraphStore(total_pts, reserve_graph_degree) +{ + this->resize_graph(total_pts); + for (size_t i = 0; i < total_pts; i++) + { + _graph[i].reserve(reserve_graph_degree); + } +} + +std::tuple InMemGraphStore::load(const std::string &index_path_prefix, + const size_t num_points) +{ + return load_impl(index_path_prefix, num_points); +} +int InMemGraphStore::store(const std::string &index_path_prefix, const size_t 
num_points, + const size_t num_frozen_points, const uint32_t start) +{ + return save_graph(index_path_prefix, num_points, num_frozen_points, start); +} +const std::vector &InMemGraphStore::get_neighbours(const location_t i) const +{ + return _graph.at(i); +} + +void InMemGraphStore::add_neighbour(const location_t i, location_t neighbour_id) +{ + _graph[i].emplace_back(neighbour_id); + if (_max_observed_degree < _graph[i].size()) + { + _max_observed_degree = (uint32_t)(_graph[i].size()); + } +} + +void InMemGraphStore::clear_neighbours(const location_t i) +{ + _graph[i].clear(); +}; +void InMemGraphStore::swap_neighbours(const location_t a, location_t b) +{ + _graph[a].swap(_graph[b]); +}; + +void InMemGraphStore::set_neighbours(const location_t i, std::vector &neighbours) +{ + _graph[i].assign(neighbours.begin(), neighbours.end()); + if (_max_observed_degree < neighbours.size()) + { + _max_observed_degree = (uint32_t)(neighbours.size()); + } +} + +size_t InMemGraphStore::resize_graph(const size_t new_size) +{ + _graph.resize(new_size); + set_total_points(new_size); + return _graph.size(); +} -InMemGraphStore::InMemGraphStore(const size_t max_pts) : AbstractGraphStore(max_pts) +void InMemGraphStore::clear_graph() { + _graph.clear(); } -int InMemGraphStore::load(const std::string &index_path_prefix) +#ifdef EXEC_ENV_OLS +std::tuple InMemGraphStore::load_impl(AlignedFileReader &reader, size_t expected_num_points) { - return 0; + size_t expected_file_size; + size_t file_frozen_pts; + uint32_t start; + + auto max_points = get_max_points(); + int header_size = 2 * sizeof(size_t) + 2 * sizeof(uint32_t); + std::unique_ptr header = std::make_unique(header_size); + read_array(reader, header.get(), header_size); + + expected_file_size = *((size_t *)header.get()); + _max_observed_degree = *((uint32_t *)(header.get() + sizeof(size_t))); + start = *((uint32_t *)(header.get() + sizeof(size_t) + sizeof(uint32_t))); + file_frozen_pts = *((size_t *)(header.get() + sizeof(size_t) + 
sizeof(uint32_t) + sizeof(uint32_t))); + + diskann::cout << "From graph header, expected_file_size: " << expected_file_size + << ", _max_observed_degree: " << _max_observed_degree << ", _start: " << start + << ", file_frozen_pts: " << file_frozen_pts << std::endl; + + diskann::cout << "Loading vamana graph from reader..." << std::flush; + + // If user provides more points than max_points + // resize the _graph to the larger size. + if (get_total_points() < expected_num_points) + { + diskann::cout << "resizing graph to " << expected_num_points << std::endl; + this->resize_graph(expected_num_points); + } + + uint32_t nodes_read = 0; + size_t cc = 0; + size_t graph_offset = header_size; + while (nodes_read < expected_num_points) + { + uint32_t k; + read_value(reader, k, graph_offset); + graph_offset += sizeof(uint32_t); + std::vector tmp(k); + tmp.reserve(k); + read_array(reader, tmp.data(), k, graph_offset); + graph_offset += k * sizeof(uint32_t); + cc += k; + _graph[nodes_read].swap(tmp); + nodes_read++; + if (nodes_read % 1000000 == 0) + { + diskann::cout << "." << std::flush; + } + if (k > _max_range_of_graph) + { + _max_range_of_graph = k; + } + } + + diskann::cout << "done. 
Index has " << nodes_read << " nodes and " << cc << " out-edges, _start is set to " << start + << std::endl; + return std::make_tuple(nodes_read, start, file_frozen_pts); } -int InMemGraphStore::store(const std::string &index_path_prefix) +#endif + +std::tuple InMemGraphStore::load_impl(const std::string &filename, + size_t expected_num_points) { - return 0; + size_t expected_file_size; + size_t file_frozen_pts; + uint32_t start; + size_t file_offset = 0; // will need this for single file format support + + std::ifstream in; + in.exceptions(std::ios::badbit | std::ios::failbit); + in.open(filename, std::ios::binary); + in.seekg(file_offset, in.beg); + in.read((char *)&expected_file_size, sizeof(size_t)); + in.read((char *)&_max_observed_degree, sizeof(uint32_t)); + in.read((char *)&start, sizeof(uint32_t)); + in.read((char *)&file_frozen_pts, sizeof(size_t)); + size_t vamana_metadata_size = sizeof(size_t) + sizeof(uint32_t) + sizeof(uint32_t) + sizeof(size_t); + + diskann::cout << "From graph header, expected_file_size: " << expected_file_size + << ", _max_observed_degree: " << _max_observed_degree << ", _start: " << start + << ", file_frozen_pts: " << file_frozen_pts << std::endl; + + diskann::cout << "Loading vamana graph " << filename << "..." << std::flush; + + // If user provides more points than max_points + // resize the _graph to the larger size. 
+ if (get_total_points() < expected_num_points) + { + diskann::cout << "resizing graph to " << expected_num_points << std::endl; + this->resize_graph(expected_num_points); + } + + size_t bytes_read = vamana_metadata_size; + size_t cc = 0; + uint32_t nodes_read = 0; + while (bytes_read != expected_file_size) + { + uint32_t k; + in.read((char *)&k, sizeof(uint32_t)); + + if (k == 0) + { + diskann::cerr << "ERROR: Point found with no out-neighbours, point#" << nodes_read << std::endl; + } + + cc += k; + ++nodes_read; + std::vector tmp(k); + tmp.reserve(k); + in.read((char *)tmp.data(), k * sizeof(uint32_t)); + _graph[nodes_read - 1].swap(tmp); + bytes_read += sizeof(uint32_t) * ((size_t)k + 1); + if (nodes_read % 10000000 == 0) + diskann::cout << "." << std::flush; + if (k > _max_range_of_graph) + { + _max_range_of_graph = k; + } + } + + diskann::cout << "done. Index has " << nodes_read << " nodes and " << cc << " out-edges, _start is set to " << start + << std::endl; + return std::make_tuple(nodes_read, start, file_frozen_pts); +} + +int InMemGraphStore::save_graph(const std::string &index_path_prefix, const size_t num_points, + const size_t num_frozen_points, const uint32_t start) +{ + std::ofstream out; + open_file_to_write(out, index_path_prefix); + + size_t file_offset = 0; + out.seekp(file_offset, out.beg); + size_t index_size = 24; + uint32_t max_degree = 0; + out.write((char *)&index_size, sizeof(uint64_t)); + out.write((char *)&_max_observed_degree, sizeof(uint32_t)); + uint32_t ep_u32 = start; + out.write((char *)&ep_u32, sizeof(uint32_t)); + out.write((char *)&num_frozen_points, sizeof(size_t)); + + // Note: num_points = _nd + _num_frozen_points + for (uint32_t i = 0; i < num_points; i++) + { + uint32_t GK = (uint32_t)_graph[i].size(); + out.write((char *)&GK, sizeof(uint32_t)); + out.write((char *)_graph[i].data(), GK * sizeof(uint32_t)); + max_degree = _graph[i].size() > max_degree ? 
(uint32_t)_graph[i].size() : max_degree; + index_size += (size_t)(sizeof(uint32_t) * (GK + 1)); + } + out.seekp(file_offset, out.beg); + out.write((char *)&index_size, sizeof(uint64_t)); + out.write((char *)&max_degree, sizeof(uint32_t)); + out.close(); + return (int)index_size; } -void InMemGraphStore::get_adj_list(const location_t i, std::vector &neighbors) +size_t InMemGraphStore::get_max_range_of_graph() { + return _max_range_of_graph; } -void InMemGraphStore::set_adj_list(const location_t i, std::vector &neighbors) +uint32_t InMemGraphStore::get_max_observed_degree() { + return _max_observed_degree; } } // namespace diskann diff --git a/src/index.cpp b/src/index.cpp index d76690c1b..3d4ae2619 100644 --- a/src/index.cpp +++ b/src/index.cpp @@ -1,24 +1,27 @@ // Copyright (c) Microsoft Corporation. All rights reserved. // Licensed under the MIT license. -#include #include #include -#include "tsl/robin_set.h" -#include "tsl/robin_map.h" -#include "boost/dynamic_bitset.hpp" +#include +#include "boost/dynamic_bitset.hpp" +#include "index_factory.h" #include "memory_mapper.h" #include "timer.h" +#include "tsl/robin_map.h" +#include "tsl/robin_set.h" #include "windows_customizations.h" -#if defined(RELEASE_UNUSED_TCMALLOC_MEMORY_AT_CHECKPOINTS) && defined(DISKANN_BUILD) +#include "tag_uint128.h" +#if defined(DISKANN_RELEASE_UNUSED_TCMALLOC_MEMORY_AT_CHECKPOINTS) && defined(DISKANN_BUILD) #include "gperftools/malloc_extension.h" #endif #ifdef _WINDOWS #include #endif + #include "index.h" #include @@ -29,59 +32,35 @@ namespace diskann // Initialize an index with metric m, load the data of type T with filename // (bin), and initialize max_points template -Index::Index(Metric m, const size_t dim, const size_t max_points, const bool dynamic_index, - const IndexWriteParameters &indexParams, const uint32_t initial_search_list_size, - const uint32_t search_threads, const bool enable_tags, const bool concurrent_consolidate, - const bool pq_dist_build, const size_t 
num_pq_chunks, const bool use_opq) - : Index(m, dim, max_points, dynamic_index, enable_tags, concurrent_consolidate, pq_dist_build, num_pq_chunks, - use_opq, indexParams.num_frozen_points) -{ - if (dynamic_index) - { - this->enable_delete(); - } - _indexingQueueSize = indexParams.search_list_size; - _indexingRange = indexParams.max_degree; - _indexingMaxC = indexParams.max_occlusion_size; - _indexingAlpha = indexParams.alpha; - _filterIndexingQueueSize = indexParams.filter_list_size; - - uint32_t num_threads_indx = indexParams.num_threads; - uint32_t num_scratch_spaces = search_threads + num_threads_indx; - - initialize_query_scratch(num_scratch_spaces, initial_search_list_size, _indexingQueueSize, _indexingRange, - _indexingMaxC, dim); -} - -template -Index::Index(Metric m, const size_t dim, const size_t max_points, const bool dynamic_index, - const bool enable_tags, const bool concurrent_consolidate, const bool pq_dist_build, - const size_t num_pq_chunks, const bool use_opq, const size_t num_frozen_pts, - const bool init_data_store) - : _dist_metric(m), _dim(dim), _max_points(max_points), _num_frozen_pts(num_frozen_pts), - _dynamic_index(dynamic_index), _enable_tags(enable_tags), _indexingMaxC(DEFAULT_MAXC), _query_scratch(nullptr), - _pq_dist(pq_dist_build), _use_opq(use_opq), _num_pq_chunks(num_pq_chunks), - _delete_set(new tsl::robin_set), _conc_consolidate(concurrent_consolidate) -{ - if (dynamic_index && !enable_tags) +Index::Index(const IndexConfig &index_config, std::shared_ptr> data_store, + std::unique_ptr graph_store, + std::shared_ptr> pq_data_store) + : _dist_metric(index_config.metric), _dim(index_config.dimension), _max_points(index_config.max_points), + _num_frozen_pts(index_config.num_frozen_pts), _dynamic_index(index_config.dynamic_index), + _enable_tags(index_config.enable_tags), _indexingMaxC(DEFAULT_MAXC), _query_scratch(nullptr), + _pq_dist(index_config.pq_dist_build), _use_opq(index_config.use_opq), + 
_filtered_index(index_config.filtered_index), _num_pq_chunks(index_config.num_pq_chunks), + _delete_set(new tsl::robin_set), _conc_consolidate(index_config.concurrent_consolidate) +{ + if (_dynamic_index && !_enable_tags) { throw ANNException("ERROR: Dynamic Indexing must have tags enabled.", -1, __FUNCSIG__, __FILE__, __LINE__); } if (_pq_dist) { - if (dynamic_index) + if (_dynamic_index) throw ANNException("ERROR: Dynamic Indexing not supported with PQ distance based " "index construction", -1, __FUNCSIG__, __FILE__, __LINE__); - if (m == diskann::Metric::INNER_PRODUCT) + if (_dist_metric == diskann::Metric::INNER_PRODUCT) throw ANNException("ERROR: Inner product metrics not yet supported " "with PQ distance " "base index", -1, __FUNCSIG__, __FILE__, __LINE__); } - if (dynamic_index && _num_frozen_pts == 0) + if (_dynamic_index && _num_frozen_pts == 0) { _num_frozen_pts = 1; } @@ -93,77 +72,90 @@ Index::Index(Metric m, const size_t dim, const size_t max_point } const size_t total_internal_points = _max_points + _num_frozen_pts; - if (_pq_dist) - { - if (_num_pq_chunks > _dim) - throw diskann::ANNException("ERROR: num_pq_chunks > dim", -1, __FUNCSIG__, __FILE__, __LINE__); - alloc_aligned(((void **)&_pq_data), total_internal_points * _num_pq_chunks * sizeof(char), 8 * sizeof(char)); - std::memset(_pq_data, 0, total_internal_points * _num_pq_chunks * sizeof(char)); - } - _start = (uint32_t)_max_points; - _final_graph.resize(total_internal_points); - - if (init_data_store) - { - // Issue #374: data_store is injected from index factory. Keeping this for backward compatibility. - // distance is owned by data_store - if (m == diskann::Metric::COSINE && std::is_floating_point::value) - { - // This is safe because T is float inside the if block. - this->_distance.reset((Distance *)new AVXNormalizedCosineDistanceFloat()); - this->_normalize_vecs = true; - diskann::cout << "Normalizing vectors and using L2 for cosine " - "AVXNormalizedCosineDistanceFloat()." 
- << std::endl; - } - else - { - this->_distance.reset((Distance *)get_distance_function(m)); - } - // Note: moved this to factory, keeping this for backward compatibility. - _data_store = - std::make_unique>((location_t)total_internal_points, _dim, this->_distance); - } + _data_store = data_store; + _pq_data_store = pq_data_store; + _graph_store = std::move(graph_store); _locks = std::vector(total_internal_points); - - if (enable_tags) + if (_enable_tags) { _location_to_tag.reserve(total_internal_points); _tag_to_location.reserve(total_internal_points); } -} - -template -Index::Index(const IndexConfig &index_config, std::unique_ptr> data_store) - : Index(index_config.metric, index_config.dimension, index_config.max_points, index_config.dynamic_index, - index_config.enable_tags, index_config.concurrent_consolidate, index_config.pq_dist_build, - index_config.num_pq_chunks, index_config.use_opq, index_config.num_frozen_pts, false) -{ - _data_store = std::move(data_store); - _distance.reset(_data_store->get_dist_fn()); - - // enable delete by default for dynamic index if (_dynamic_index) { - this->enable_delete(); + this->enable_delete(); // enable delete by default for dynamic index + if (_filtered_index) + { + _location_to_labels.resize(total_internal_points); + } } - if (_dynamic_index && index_config.index_write_params != nullptr) + + if (index_config.index_write_params != nullptr) { _indexingQueueSize = index_config.index_write_params->search_list_size; _indexingRange = index_config.index_write_params->max_degree; _indexingMaxC = index_config.index_write_params->max_occlusion_size; _indexingAlpha = index_config.index_write_params->alpha; _filterIndexingQueueSize = index_config.index_write_params->filter_list_size; + _indexingThreads = index_config.index_write_params->num_threads; + _saturate_graph = index_config.index_write_params->saturate_graph; - uint32_t num_threads_indx = index_config.index_write_params->num_threads; - uint32_t num_scratch_spaces = 
index_config.search_threads + num_threads_indx; + if (index_config.index_search_params != nullptr) + { + std::uint32_t default_queue_size = (std::max)(_indexingQueueSize, _filterIndexingQueueSize); + uint32_t num_scratch_spaces = index_config.index_search_params->num_search_threads + _indexingThreads; + initialize_query_scratch(num_scratch_spaces, index_config.index_search_params->initial_search_list_size, + default_queue_size, _indexingRange, _indexingMaxC, _data_store->get_dims()); + } + } +} - initialize_query_scratch(num_scratch_spaces, index_config.initial_search_list_size, _indexingQueueSize, - _indexingRange, _indexingMaxC, _data_store->get_dims()); +template +Index::Index(Metric m, const size_t dim, const size_t max_points, + const std::shared_ptr index_parameters, + const std::shared_ptr index_search_params, const size_t num_frozen_pts, + const bool dynamic_index, const bool enable_tags, const bool concurrent_consolidate, + const bool pq_dist_build, const size_t num_pq_chunks, const bool use_opq, + const bool filtered_index) + : Index( + IndexConfigBuilder() + .with_metric(m) + .with_dimension(dim) + .with_max_points(max_points) + .with_index_write_params(index_parameters) + .with_index_search_params(index_search_params) + .with_num_frozen_pts(num_frozen_pts) + .is_dynamic_index(dynamic_index) + .is_enable_tags(enable_tags) + .is_concurrent_consolidate(concurrent_consolidate) + .is_pq_dist_build(pq_dist_build) + .with_num_pq_chunks(num_pq_chunks) + .is_use_opq(use_opq) + .is_filtered(filtered_index) + .with_data_type(diskann_type_to_name()) + .build(), + IndexFactory::construct_datastore(DataStoreStrategy::MEMORY, + (max_points == 0 ? (size_t)1 : max_points) + + (dynamic_index && num_frozen_pts == 0 ? (size_t)1 : num_frozen_pts), + dim, m), + IndexFactory::construct_graphstore(GraphStoreStrategy::MEMORY, + (max_points == 0 ? (size_t)1 : max_points) + + (dynamic_index && num_frozen_pts == 0 ? 
(size_t)1 : num_frozen_pts), + (size_t)((index_parameters == nullptr ? 0 : index_parameters->max_degree) * + defaults::GRAPH_SLACK_FACTOR * 1.05))) +{ + if (_pq_dist) + { + _pq_data_store = IndexFactory::construct_pq_datastore(DataStoreStrategy::MEMORY, max_points + num_frozen_pts, + dim, m, num_pq_chunks, use_opq); + } + else + { + _pq_data_store = _data_store; } } @@ -180,13 +172,6 @@ template Index::~I LockGuard lg(lock); } - // if (this->_distance != nullptr) - //{ - // delete this->_distance; - // this->_distance = nullptr; - // } - // REFACTOR - if (_opt_graph != nullptr) { delete[] _opt_graph; @@ -218,6 +203,7 @@ template size_t Index size_t Index size_t Index::save_data(std::string data_file) { // Note: at this point, either _nd == _max_points or any frozen points have - // been temporarily moved to _nd, so _nd + _num_frozen_points is the valid + // been temporarily moved to _nd, so _nd + _num_frozen_pts is the valid // location limit. return _data_store->save(data_file, (location_t)(_nd + _num_frozen_pts)); } @@ -262,34 +248,7 @@ template size_t Index size_t Index::save_graph(std::string graph_file) { - std::ofstream out; - open_file_to_write(out, graph_file); - - size_t file_offset = 0; // we will use this if we want - out.seekp(file_offset, out.beg); - size_t index_size = 24; - uint32_t max_degree = 0; - out.write((char *)&index_size, sizeof(uint64_t)); - out.write((char *)&_max_observed_degree, sizeof(uint32_t)); - uint32_t ep_u32 = _start; - out.write((char *)&ep_u32, sizeof(uint32_t)); - out.write((char *)&_num_frozen_pts, sizeof(size_t)); - // Note: at this point, either _nd == _max_points or any frozen points have - // been temporarily moved to _nd, so _nd + _num_frozen_points is the valid - // location limit. 
- for (uint32_t i = 0; i < _nd + _num_frozen_pts; i++) - { - uint32_t GK = (uint32_t)_final_graph[i].size(); - out.write((char *)&GK, sizeof(uint32_t)); - out.write((char *)_final_graph[i].data(), GK * sizeof(uint32_t)); - max_degree = _final_graph[i].size() > max_degree ? (uint32_t)_final_graph[i].size() : max_degree; - index_size += (size_t)(sizeof(uint32_t) * (GK + 1)); - } - out.seekp(file_offset, out.beg); - out.write((char *)&index_size, sizeof(uint64_t)); - out.write((char *)&max_degree, sizeof(uint32_t)); - out.close(); - return index_size; // number of bytes written + return _graph_store->store(graph_file, _nd + _num_frozen_pts, _num_frozen_pts, _start); } template @@ -336,14 +295,14 @@ void Index::save(const char *filename, bool compact_before_save { if (_filtered_index) { - if (_label_to_medoid_id.size() > 0) + if (_label_to_start_id.size() > 0) { std::ofstream medoid_writer(std::string(filename) + "_labels_to_medoids.txt"); if (medoid_writer.fail()) { throw diskann::ANNException(std::string("Failed to open file ") + filename, -1); } - for (auto iter : _label_to_medoid_id) + for (auto iter : _label_to_start_id) { medoid_writer << iter.first << ", " << iter.second << std::endl; } @@ -358,21 +317,51 @@ void Index::save(const char *filename, bool compact_before_save universal_label_writer.close(); } - if (_pts_to_labels.size() > 0) + if (_location_to_labels.size() > 0) { std::ofstream label_writer(std::string(filename) + "_labels.txt"); assert(label_writer.is_open()); - for (uint32_t i = 0; i < _pts_to_labels.size(); i++) + for (uint32_t i = 0; i < _nd + _num_frozen_pts; i++) { - for (uint32_t j = 0; j < (_pts_to_labels[i].size() - 1); j++) + for (uint32_t j = 0; j + 1 < _location_to_labels[i].size(); j++) { - label_writer << _pts_to_labels[i][j] << ","; + label_writer << _location_to_labels[i][j] << ","; } - if (_pts_to_labels[i].size() != 0) - label_writer << _pts_to_labels[i][_pts_to_labels[i].size() - 1]; + if (_location_to_labels[i].size() != 0) + 
label_writer << _location_to_labels[i][_location_to_labels[i].size() - 1]; + label_writer << std::endl; } label_writer.close(); + + // write compacted raw_labels if data hence _location_to_labels was also compacted + if (compact_before_save && _dynamic_index) + { + _label_map = load_label_map(std::string(filename) + "_labels_map.txt"); + std::unordered_map mapped_to_raw_labels; + // invert label map + for (const auto &[key, value] : _label_map) + { + mapped_to_raw_labels.insert({value, key}); + } + + // write updated labels + std::ofstream raw_label_writer(std::string(filename) + "_raw_labels.txt"); + assert(raw_label_writer.is_open()); + for (uint32_t i = 0; i < _nd + _num_frozen_pts; i++) + { + for (uint32_t j = 0; j + 1 < _location_to_labels[i].size(); j++) + { + raw_label_writer << mapped_to_raw_labels[_location_to_labels[i][j]] << ","; + } + if (_location_to_labels[i].size() != 0) + raw_label_writer + << mapped_to_raw_labels[_location_to_labels[i][_location_to_labels[i].size() - 1]]; + + raw_label_writer << std::endl; + } + raw_label_writer.close(); + } } } @@ -505,7 +494,8 @@ size_t Index::load_data(std::string filename) } #ifdef EXEC_ENV_OLS - // REFACTOR TODO: Must figure out how to support aligned reader in a clean manner. + // REFACTOR TODO: Must figure out how to support aligned reader in a clean + // manner. copy_aligned_data_from_file(reader, _data, file_num_points, file_dim, _data_store->get_aligned_dim()); #else _data_store->load(filename); // offset == 0. 
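The new `save()` branch above inverts `_label_map` (raw string label → numeric label) so that compacted numeric labels can be written back out as raw strings. The inversion itself can be sketched in isolation; it assumes the forward map is injective, i.e. each raw label maps to a unique numeric id:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>

// Invert a raw-label -> numeric-label map, as in the compacted raw_labels
// write path above. Assumes no two raw labels share a numeric id.
std::unordered_map<uint32_t, std::string> invert_label_map(
    const std::unordered_map<std::string, uint32_t> &label_map)
{
    std::unordered_map<uint32_t, std::string> inverted;
    inverted.reserve(label_map.size());
    for (const auto &[raw, mapped] : label_map)
        inverted.insert({mapped, raw});
    return inverted;
}
```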
@@ -556,12 +546,12 @@ void Index::load(const char *filename, uint32_t num_threads, ui _has_built = true; size_t tags_file_num_pts = 0, graph_num_pts = 0, data_file_num_pts = 0, label_num_pts = 0; -#ifndef EXEC_ENV_OLS + std::string mem_index_file(filename); std::string labels_file = mem_index_file + "_labels.txt"; std::string labels_to_medoids = mem_index_file + "_labels_to_medoids.txt"; std::string labels_map_file = mem_index_file + "_labels_map.txt"; -#endif + if (!_save_as_one_file) { // For DLVS Store, we will not support saving the index in multiple @@ -600,20 +590,19 @@ void Index::load(const char *filename, uint32_t num_threads, ui diskann::cerr << stream.str() << std::endl; throw diskann::ANNException(stream.str(), -1, __FUNCSIG__, __FILE__, __LINE__); } -#ifndef EXEC_ENV_OLS + if (file_exists(labels_file)) { _label_map = load_label_map(labels_map_file); parse_label_file_in_bitset(labels_file, label_num_pts, _label_map.size()); - - assert(label_num_pts == data_file_num_pts); + assert(label_num_pts == data_file_num_pts - _num_frozen_pts); if (file_exists(labels_to_medoids)) { std::ifstream medoid_stream(labels_to_medoids); std::string line, token; uint32_t line_cnt = 0; - _label_to_medoid_id.clear(); + _label_to_start_id.clear(); while (std::getline(medoid_stream, line)) { @@ -632,7 +621,7 @@ void Index::load(const char *filename, uint32_t num_threads, ui medoid = token_as_num; cnt++; } - _label_to_medoid_id[label] = medoid; + _label_to_start_id[label] = medoid; line_cnt++; } } @@ -647,7 +636,7 @@ void Index::load(const char *filename, uint32_t num_threads, ui universal_label_reader.close(); } } -#endif + _nd = data_file_num_pts - _num_frozen_pts; _empty_slots.clear(); _empty_slots.reserve(_max_points); @@ -668,7 +657,7 @@ void Index::load(const char *filename, uint32_t num_threads, ui // initialize_q_s(). 
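The `load()` path above rebuilds `_label_to_start_id` by reading `_labels_to_medoids.txt` line by line, splitting each line on a comma into a label and a medoid id. A simplified sketch of parsing one such line (the real code also counts tokens and handles malformed input):

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <utility>

// Parse one "label, medoid" line from the labels_to_medoids file read in
// load() above. Simplified: assumes exactly two comma-separated integers;
// std::stoul skips the leading whitespace after the comma.
std::pair<uint32_t, uint32_t> parse_label_medoid_line(const std::string &line)
{
    std::istringstream iss(line);
    std::string token;
    uint32_t vals[2] = {0, 0};
    int cnt = 0;
    while (cnt < 2 && std::getline(iss, token, ','))
    {
        vals[cnt++] = (uint32_t)std::stoul(token);
    }
    return {vals[0], vals[1]}; // (label, start/medoid id)
}
```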
if (_query_scratch.size() == 0) { - initialize_query_scratch(num_threads, search_l, search_l, (uint32_t)_max_range_of_loaded_graph, _indexingMaxC, + initialize_query_scratch(num_threads, search_l, search_l, (uint32_t)_graph_store->get_max_range_of_graph(), _indexingMaxC, _dim, _bitmask_buf._bitmask_size); } } @@ -704,131 +693,10 @@ template size_t Index::load_graph(std::string filename, size_t expected_num_points) { #endif - size_t expected_file_size; - size_t file_frozen_pts; - -#ifdef EXEC_ENV_OLS - int header_size = 2 * sizeof(size_t) + 2 * sizeof(uint32_t); - std::unique_ptr header = std::make_unique(header_size); - read_array(reader, header.get(), header_size); - - expected_file_size = *((size_t *)header.get()); - _max_observed_degree = *((uint32_t *)(header.get() + sizeof(size_t))); - _start = *((uint32_t *)(header.get() + sizeof(size_t) + sizeof(uint32_t))); - file_frozen_pts = *((size_t *)(header.get() + sizeof(size_t) + sizeof(uint32_t) + sizeof(uint32_t))); -#else - - size_t file_offset = 0; // will need this for single file format support - std::ifstream in; - in.exceptions(std::ios::badbit | std::ios::failbit); - in.open(filename, std::ios::binary); - in.seekg(file_offset, in.beg); - in.read((char *)&expected_file_size, sizeof(size_t)); - in.read((char *)&_max_observed_degree, sizeof(uint32_t)); - in.read((char *)&_start, sizeof(uint32_t)); - in.read((char *)&file_frozen_pts, sizeof(size_t)); - size_t vamana_metadata_size = sizeof(size_t) + sizeof(uint32_t) + sizeof(uint32_t) + sizeof(size_t); - -#endif - diskann::cout << "From graph header, expected_file_size: " << expected_file_size - << ", _max_observed_degree: " << _max_observed_degree << ", _start: " << _start - << ", file_frozen_pts: " << file_frozen_pts << std::endl; - - if (file_frozen_pts != _num_frozen_pts) - { - std::stringstream stream; - if (file_frozen_pts == 1) - { - stream << "ERROR: When loading index, detected dynamic index, but " - "constructor asks for static index. Exitting." 
- << std::endl; - } - else - { - stream << "ERROR: When loading index, detected static index, but " - "constructor asks for dynamic index. Exitting." - << std::endl; - } - diskann::cerr << stream.str() << std::endl; - throw diskann::ANNException(stream.str(), -1, __FUNCSIG__, __FILE__, __LINE__); - } - -#ifdef EXEC_ENV_OLS - diskann::cout << "Loading vamana graph from reader..." << std::flush; -#else - diskann::cout << "Loading vamana graph " << filename << "..." << std::flush; -#endif - - const size_t expected_max_points = expected_num_points - file_frozen_pts; - - // If user provides more points than max_points - // resize the _final_graph to the larger size. - if (_max_points < expected_max_points) - { - diskann::cout << "Number of points in data: " << expected_max_points - << " is greater than max_points: " << _max_points - << " Setting max points to: " << expected_max_points << std::endl; - _final_graph.resize(expected_max_points + _num_frozen_pts); - _max_points = expected_max_points; - } -#ifdef EXEC_ENV_OLS - uint32_t nodes_read = 0; - size_t cc = 0; - size_t graph_offset = header_size; - while (nodes_read < expected_num_points) - { - uint32_t k; - read_value(reader, k, graph_offset); - graph_offset += sizeof(uint32_t); - std::vector tmp(k); - tmp.reserve(k); - read_array(reader, tmp.data(), k, graph_offset); - graph_offset += k * sizeof(uint32_t); - cc += k; - _final_graph[nodes_read].swap(tmp); - nodes_read++; - if (nodes_read % 1000000 == 0) - { - diskann::cout << "." 
<< std::flush; - } - if (k > _max_range_of_loaded_graph) - { - _max_range_of_loaded_graph = k; - } - } -#else - size_t bytes_read = vamana_metadata_size; - size_t cc = 0; - uint32_t nodes_read = 0; - while (bytes_read != expected_file_size) - { - uint32_t k; - in.read((char *)&k, sizeof(uint32_t)); - - if (k == 0) - { - diskann::cerr << "ERROR: Point found with no out-neighbors, point#" << nodes_read << std::endl; - } - - cc += k; - ++nodes_read; - std::vector tmp(k); - tmp.reserve(k); - in.read((char *)tmp.data(), k * sizeof(uint32_t)); - _final_graph[nodes_read - 1].swap(tmp); - bytes_read += sizeof(uint32_t) * ((size_t)k + 1); - if (nodes_read % 10000000 == 0) - diskann::cout << "." << std::flush; - if (k > _max_range_of_loaded_graph) - { - _max_range_of_loaded_graph = k; - } - } -#endif - - diskann::cout << "done. Index has " << nodes_read << " nodes and " << cc << " out-edges, _start is set to " - << _start << std::endl; - return nodes_read; + auto res = _graph_store->load(filename, expected_num_points); + _start = std::get<1>(res); + _num_frozen_pts = std::get<2>(res); + return std::get<0>(res); } template @@ -855,7 +723,7 @@ template int Index std::shared_lock lock(_tag_lock); if (_tag_to_location.find(tag) == _tag_to_location.end()) { - diskann::cout << "Tag " << tag << " does not exist" << std::endl; + diskann::cout << "Tag " << get_tag_string(tag) << " does not exist" << std::endl; return -1; } @@ -867,14 +735,7 @@ template int Index template uint32_t Index::calculate_entry_point() { - // TODO: need to compute medoid with PQ data too, for now sample at random - if (_pq_dist) - { - size_t r = (size_t)rand() * (size_t)RAND_MAX + (size_t)rand(); - return (uint32_t)(r % (size_t)_nd); - } - - // TODO: This function does not support multi-threaded calculation of medoid. + // REFACTOR TODO: This function does not support multi-threaded calculation of medoid. // Must revisit if perf is a concern. 
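The removed `save_graph`/`load_graph` bodies above hand-roll a 24-byte Vamana metadata header (u64 index size, u32 max observed degree, u32 entry point, u64 frozen-point count) that `_graph_store->store`/`load` now encapsulates. A sketch of that layout, round-trippable through any iostream (illustrative only; the real code also tracks per-node adjacency lists after the header):

```cpp
#include <cassert>
#include <cstdint>
#include <iostream>
#include <sstream>

// The 24-byte header at the front of a Vamana graph file, per the removed
// save_graph/load_graph code above.
struct VamanaHeader
{
    uint64_t index_size; // total bytes, written again after the node loop
    uint32_t max_degree; // max observed out-degree
    uint32_t start;      // entry point id
    uint64_t frozen_pts; // number of frozen points
};

void write_header(std::ostream &out, const VamanaHeader &h)
{
    out.write((const char *)&h.index_size, sizeof(uint64_t));
    out.write((const char *)&h.max_degree, sizeof(uint32_t));
    out.write((const char *)&h.start, sizeof(uint32_t));
    out.write((const char *)&h.frozen_pts, sizeof(uint64_t));
}

VamanaHeader read_header(std::istream &in)
{
    VamanaHeader h;
    in.read((char *)&h.index_size, sizeof(uint64_t));
    in.read((char *)&h.max_degree, sizeof(uint32_t));
    in.read((char *)&h.start, sizeof(uint32_t));
    in.read((char *)&h.frozen_pts, sizeof(uint64_t));
    return h;
}
```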
return _data_store->calculate_medoid(); } @@ -897,19 +758,20 @@ template std::vector Inde return init_ids; } -// Find common filter between a node's labels and a given set of labels, while taking into account universal label +// Find common filter between a node's labels and a given set of labels, while +// taking into account universal label template bool Index::detect_common_filters(uint32_t point_id, bool search_invocation, const std::vector &incoming_labels) { - auto &curr_node_labels = _pts_to_labels[point_id]; + auto &curr_node_labels = _location_to_labels[point_id]; std::vector common_filters; std::set_intersection(incoming_labels.begin(), incoming_labels.end(), curr_node_labels.begin(), curr_node_labels.end(), std::back_inserter(common_filters)); if (common_filters.size() > 0) { - // This is to reduce the repetitive calls. If common_filters size is > 0 , we dont need to check further for - // universal label + // This is to reduce the repetitive calls. If common_filters size is > 0 , + // we dont need to check further for universal label return true; } if (_use_universal_label) @@ -931,8 +793,8 @@ bool Index::detect_common_filters(uint32_t point_id, bool searc template std::pair Index::iterate_to_fixed_point( - const T *query, const uint32_t Lsize, const std::vector &init_ids, InMemQueryScratch *scratch, - bool use_filter, const std::vector &filter_label, bool search_invocation) + InMemQueryScratch *scratch, const uint32_t Lsize, const std::vector &init_ids, bool use_filter, + const std::vector &filter_labels, bool search_invocation) { std::vector &expanded_nodes = scratch->pool(); NeighborPriorityQueue &best_L_nodes = scratch->best_l_nodes(); @@ -943,42 +805,12 @@ std::pair Index::iterate_to_fixed_point( std::vector &dist_scratch = scratch->dist_scratch(); assert(id_scratch.size() == 0); - // REFACTOR - // T *aligned_query = scratch->aligned_query(); - // memcpy(aligned_query, query, _dim * sizeof(T)); - // if (_normalize_vecs) - //{ - // normalize((float 
*)aligned_query, _dim); - // } - T *aligned_query = scratch->aligned_query(); std::vector& query_bitmask_buf = scratch->query_label_bitmask(); - float *query_float = nullptr; - float *query_rotated = nullptr; - float *pq_dists = nullptr; - uint8_t *pq_coord_scratch = nullptr; - // Intialize PQ related scratch to use PQ based distances - if (_pq_dist) - { - // Get scratch spaces - PQScratch *pq_query_scratch = scratch->pq_scratch(); - query_float = pq_query_scratch->aligned_query_float; - query_rotated = pq_query_scratch->rotated_query; - pq_dists = pq_query_scratch->aligned_pqtable_dist_scratch; - - // Copy query vector to float and then to "rotated" query - for (size_t d = 0; d < _dim; d++) - { - query_float[d] = (float)aligned_query[d]; - } - pq_query_scratch->set(_dim, aligned_query); - // center the query and rotate if we have a rotation matrix - _pq_table.preprocess_query(query_rotated); - _pq_table.populate_chunk_distances(query_rotated, pq_dists); + float *pq_dists = nullptr; - pq_coord_scratch = pq_query_scratch->aligned_pq_coord_scratch; - } + _pq_data_store->preprocess_query(aligned_query, scratch); if (expanded_nodes.size() > 0 || id_scratch.size() > 0) { @@ -1007,10 +839,8 @@ std::pair Index::iterate_to_fixed_point( }; // Lambda to batch compute query<-> node distances in PQ space - auto compute_dists = [this, pq_coord_scratch, pq_dists](const std::vector &ids, - std::vector &dists_out) { - diskann::aggregate_coords(ids, this->_pq_data, this->_num_pq_chunks, pq_coord_scratch); - diskann::pq_dist_lookup(pq_coord_scratch, ids.size(), this->_num_pq_chunks, pq_dists, dists_out); + auto compute_dists = [this, scratch, pq_dists](const std::vector &ids, std::vector &dists_out) { + _pq_data_store->get_distance(scratch->aligned_query(), ids, dists_out, scratch); }; // only support one filter label @@ -1029,9 +859,9 @@ std::pair Index::iterate_to_fixed_point( bitmask_full_val._mask = query_bitmask_buf.data(); } - for (size_t i = 0; i < filter_label.size(); i++) + 
for (size_t i = 0; i < filter_labels.size(); i++) { - auto bitmask_val = simple_bitmask::get_bitmask_val(filter_label[i]); + auto bitmask_val = simple_bitmask::get_bitmask_val(filter_labels[i]); bitmask_full_val.merge_bitmask_val(bitmask_val); } @@ -1074,14 +904,11 @@ std::pair Index::iterate_to_fixed_point( } float distance; - if (_pq_dist) - { - pq_dist_lookup(pq_coord_scratch, 1, this->_num_pq_chunks, pq_dists, &distance); - } - else - { - distance = _data_store->get_distance(aligned_query, id); - } + uint32_t ids[] = {id}; + float distances[] = {std::numeric_limits::max()}; + _pq_data_store->get_distance(aligned_query, ids, 1, distances, scratch); + distance = distances[0]; + Neighbor nn = Neighbor(id, distance); best_L_nodes.insert(nn); } @@ -1090,6 +917,7 @@ std::pair Index::iterate_to_fixed_point( uint32_t hops = 0; uint32_t cmps = 0; cmps += static_cast(init_ids.size()); + std::vector tmp_neighbor_list; while (best_L_nodes.has_unexpanded_node()) { @@ -1117,10 +945,10 @@ std::pair Index::iterate_to_fixed_point( // Find which of the nodes in des have not been visited before id_scratch.clear(); dist_scratch.clear(); + if (_dynamic_index) { - if (_dynamic_index) - _locks[n].lock(); - for (auto id : _final_graph[n]) + LockGuard guard(_locks[n]); + for (auto id : _graph_store->get_neighbours(n)) { assert(id < _max_points + _num_frozen_pts); @@ -1141,10 +969,41 @@ std::pair Index::iterate_to_fixed_point( } id_scratch.push_back(id); + } + } + else + { + tmp_neighbor_list.clear(); + _locks[n].lock_shared(); + auto& nbrs = _graph_store->get_neighbours(n); + tmp_neighbor_list.resize(nbrs.size()); + memcpy(tmp_neighbor_list.data(), nbrs.data(), nbrs.size() * sizeof(location_t)); + _locks[n].unlock_shared(); + for (auto id : tmp_neighbor_list) + { + assert(id < _max_points + _num_frozen_pts); + + if (!is_not_visited(id)) + { + continue; + } + cmps++; + if (use_filter) + { + // NOTE: NEED TO CHECK IF THIS CORRECT WITH NEW LOCKS. 
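In the static-index branch above, the search loop takes a shared (reader) lock only long enough to `memcpy` the adjacency list into `tmp_neighbor_list`, then releases it before doing per-neighbor filtering and distance work. A minimal sketch of that copy-under-shared-lock pattern using `std::shared_mutex` (the real code uses DiskANN's own lock type and the `_graph_store` accessor):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <mutex>
#include <shared_mutex>
#include <vector>

// Snapshot a neighbor list under a reader lock, then release the lock so
// the (potentially slow) per-neighbor work does not block writers.
std::vector<uint32_t> snapshot_neighbors(std::shared_mutex &lock, const std::vector<uint32_t> &nbrs)
{
    std::vector<uint32_t> copy;
    {
        std::shared_lock<std::shared_mutex> guard(lock);
        copy.resize(nbrs.size());
        std::memcpy(copy.data(), nbrs.data(), nbrs.size() * sizeof(uint32_t));
    } // reader lock released here; scan `copy` lock-free afterwards
    return copy;
}
```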
+ simple_bitmask bm(_bitmask_buf.get_bitmask(id), _bitmask_buf._bitmask_size); - if (_dynamic_index) - _locks[n].unlock(); + if (!bm.test_full_mask_val(bitmask_full_val)) + { + continue; + } + } + + + id_scratch.push_back(id); + + } } // Mark nodes visited @@ -1160,29 +1019,10 @@ std::pair Index::iterate_to_fixed_point( } } - // Compute distances to unvisited nodes in the expansion - if (_pq_dist) - { - assert(dist_scratch.capacity() >= id_scratch.size()); - compute_dists(id_scratch, dist_scratch); - } - else - { - assert(dist_scratch.size() == 0); - for (size_t m = 0; m < id_scratch.size(); ++m) - { - uint32_t id = id_scratch[m]; - - if (m + 1 < id_scratch.size()) - { - auto nextn = id_scratch[m + 1]; - _data_store->prefetch_vector(nextn); - } - - dist_scratch.push_back(_data_store->get_distance(aligned_query, id)); - } - } -// cmps += id_scratch.size(); + dist_scratch.resize(id_scratch.size()); + //assert(dist_scratch.capacity() >= id_scratch.size()); + compute_dists(id_scratch, dist_scratch); + cmps += (uint32_t)id_scratch.size(); // Insert pairs into the pool of candidates for (size_t m = 0; m < id_scratch.size(); ++m) @@ -1205,17 +1045,51 @@ void Index::search_for_point_and_prune(int location, uint32_t L if (!use_filter) { _data_store->get_vector(location, scratch->aligned_query()); - iterate_to_fixed_point(scratch->aligned_query(), Lindex, init_ids, scratch, false, unused_filter_label, false); + iterate_to_fixed_point(scratch, Lindex, init_ids, false, unused_filter_label, false); } else { + std::shared_lock tl(_tag_lock, std::defer_lock); + if (_dynamic_index) + tl.lock(); std::vector filter_specific_start_nodes; - for (auto &x : _pts_to_labels[location]) - filter_specific_start_nodes.emplace_back(_label_to_medoid_id[x]); + for (auto &x : _location_to_labels[location]) + filter_specific_start_nodes.emplace_back(_label_to_start_id[x]); + + if (_dynamic_index) + tl.unlock(); _data_store->get_vector(location, scratch->aligned_query()); - 
iterate_to_fixed_point(scratch->aligned_query(), filteredLindex, filter_specific_start_nodes, scratch, true, - _pts_to_labels[location], false); + iterate_to_fixed_point(scratch, filteredLindex, filter_specific_start_nodes, true, + _location_to_labels[location], false); + + if (Lindex > 0) + { + // combine candidate pools obtained with filter and unfiltered criteria. + std::set best_candidate_pool; + for (auto filtered_neighbor : scratch->pool()) + { + best_candidate_pool.insert(filtered_neighbor); + } + + // clear scratch for finding unfiltered candidates + scratch->clear(); + + _data_store->get_vector(location, scratch->aligned_query()); + iterate_to_fixed_point(scratch, Lindex, init_ids, false, unused_filter_label, false); + + for (auto unfiltered_neighbour : scratch->pool()) + { + // insert if this neighbour is not already in best_candidate_pool + if (best_candidate_pool.find(unfiltered_neighbour) == best_candidate_pool.end()) + { + best_candidate_pool.insert(unfiltered_neighbour); + } + } + + scratch->pool().clear(); + std::copy(best_candidate_pool.begin(), best_candidate_pool.end(), std::back_inserter(scratch->pool())); + } } auto &pool = scratch->pool(); @@ -1237,7 +1111,7 @@ void Index::search_for_point_and_prune(int location, uint32_t L prune_neighbors(location, pool, pruned_list, scratch); assert(!pruned_list.empty()); - assert(_final_graph.size() == _max_points + _num_frozen_pts); + assert(_graph_store->get_total_points() == _max_points + _num_frozen_pts); } template @@ -1348,9 +1222,8 @@ void Index::prune_neighbors(const uint32_t location, std::vecto return; } - _max_observed_degree = (std::max)(_max_observed_degree, range); - // If using _pq_build, over-write the PQ distances with actual distances + // REFACTOR PQ: TODO: How to get rid of this!? 
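The new `search_for_point_and_prune` logic above combines the filtered and unfiltered candidate pools by inserting both into an ordered `std::set` (which dedupes) and copying the result back into the scratch pool. A stand-alone sketch, with `Cand` as a hypothetical stand-in for `diskann::Neighbor`:

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Stand-in for diskann::Neighbor: ordered by distance, then id, so the set
// both dedupes exact duplicates and keeps candidates sorted.
struct Cand
{
    uint32_t id;
    float distance;
    bool operator<(const Cand &o) const
    {
        return distance < o.distance || (distance == o.distance && id < o.id);
    }
};

// Merge the filtered and unfiltered pools into one deduplicated,
// distance-sorted candidate list, as in the diff above.
std::vector<Cand> merge_pools(const std::vector<Cand> &filtered, const std::vector<Cand> &unfiltered)
{
    std::set<Cand> best;
    for (const auto &c : filtered)
        best.insert(c);
    for (const auto &c : unfiltered)
        best.insert(c); // set::insert is a no-op for duplicates
    return std::vector<Cand>(best.begin(), best.end());
}
```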
if (_pq_dist) { for (auto &ngh : pool) @@ -1386,27 +1259,40 @@ void Index::inter_insert(uint32_t n, std::vector &pru assert(!src_pool.empty()); + // des_pool contains the neighbors of the neighbors of n + std::vector copy_of_neighbors; + for (auto des : src_pool) { // des.loc is the loc of the neighbors of n assert(des < _max_points + _num_frozen_pts); - // des_pool contains the neighbors of the neighbors of n - std::vector copy_of_neighbors; + bool prune_needed = false; { - LockGuard guard(_locks[des]); - auto &des_pool = _final_graph[des]; - if (std::find(des_pool.begin(), des_pool.end(), n) == des_pool.end()) + copy_of_neighbors.clear(); + // LockGuard guard(_locks[des]); + _locks[des].lock_shared(); + auto &des_pool = _graph_store->get_neighbours(des); + copy_of_neighbors.reserve(des_pool.size() + 1); + for (auto& des_n : des_pool) { - if (des_pool.size() < (uint64_t)(GRAPH_SLACK_FACTOR * range)) + copy_of_neighbors.push_back(des_n); + } + _locks[des].unlock_shared(); + + if (std::find(copy_of_neighbors.begin(), copy_of_neighbors.end(), n) == copy_of_neighbors.end()) + { + if (copy_of_neighbors.size() < (uint64_t)(defaults::GRAPH_SLACK_FACTOR * range)) { - des_pool.emplace_back(n); + LockGuard guard(_locks[des]); + // des_pool.emplace_back(n); + _graph_store->add_neighbour(des, n); prune_needed = false; } else { - copy_of_neighbors.reserve(des_pool.size() + 1); - copy_of_neighbors = des_pool; + // copy_of_neighbors.reserve(des_pool.size() + 1); + // copy_of_neighbors = des_pool; copy_of_neighbors.push_back(n); prune_needed = true; } @@ -1418,7 +1304,7 @@ void Index::inter_insert(uint32_t n, std::vector &pru tsl::robin_set dummy_visited(0); std::vector dummy_pool(0); - size_t reserveSize = (size_t)(std::ceil(1.05 * GRAPH_SLACK_FACTOR * range)); + size_t reserveSize = (size_t)(std::ceil(1.05 * defaults::GRAPH_SLACK_FACTOR * range)); dummy_visited.reserve(reserveSize); dummy_pool.reserve(reserveSize); @@ -1436,7 +1322,7 @@ void Index::inter_insert(uint32_t n, 
std::vector &pru { LockGuard guard(_locks[des]); - _final_graph[des] = new_out_neighbors; + _graph_store->set_neighbours(des, new_out_neighbors); } } } @@ -1448,21 +1334,12 @@ void Index::inter_insert(uint32_t n, std::vector &pru inter_insert(n, pruned_list, _indexingRange, scratch); } -template -void Index::link(const IndexWriteParameters ¶meters) +template void Index::link() { - uint32_t num_threads = parameters.num_threads; + uint32_t num_threads = _indexingThreads; if (num_threads != 0) omp_set_num_threads(num_threads); - _saturate_graph = parameters.saturate_graph; - - _indexingQueueSize = parameters.search_list_size; - _filterIndexingQueueSize = parameters.filter_list_size; - _indexingRange = parameters.max_degree; - _indexingMaxC = parameters.max_occlusion_size; - _indexingAlpha = parameters.alpha; - /* visit_order is a vector that is initialized to the entire graph */ std::vector visit_order; std::vector pool, tmp; @@ -1485,11 +1362,6 @@ void Index::link(const IndexWriteParameters ¶meters) else _start = calculate_entry_point(); - for (size_t p = 0; p < _nd; p++) - { - _final_graph[p].reserve((size_t)(std::ceil(_indexingRange * GRAPH_SLACK_FACTOR * 1.05))); - } - diskann::Timer link_timer; #pragma omp parallel for schedule(dynamic, 2048) @@ -1497,24 +1369,25 @@ void Index::link(const IndexWriteParameters ¶meters) { auto node = visit_order[node_ctr]; + // Find and add appropriate graph edges ScratchStoreManager> manager(_query_scratch); auto scratch = manager.scratch_space(); - std::vector pruned_list; if (_filtered_index) { - search_for_point_and_prune(node, _indexingQueueSize, pruned_list, scratch, _filtered_index, - _filterIndexingQueueSize); + search_for_point_and_prune(node, _indexingQueueSize, pruned_list, scratch, true, _filterIndexingQueueSize); } else { search_for_point_and_prune(node, _indexingQueueSize, pruned_list, scratch); } + assert(pruned_list.size() > 0); + { LockGuard guard(_locks[node]); - _final_graph[node].reserve((size_t)(_indexingRange 
* GRAPH_SLACK_FACTOR * 1.05)); - _final_graph[node] = pruned_list; - assert(_final_graph[node].size() <= _indexingRange); + + _graph_store->set_neighbours(node, pruned_list); + assert(_graph_store->get_neighbours((location_t)node).size() <= _indexingRange); } inter_insert(node, pruned_list, scratch); @@ -1534,7 +1407,7 @@ void Index::link(const IndexWriteParameters ¶meters) for (int64_t node_ctr = 0; node_ctr < (int64_t)(visit_order.size()); node_ctr++) { auto node = visit_order[node_ctr]; - if (_final_graph[node].size() > _indexingRange) + if (_graph_store->get_neighbours((location_t)node).size() > _indexingRange) { ScratchStoreManager> manager(_query_scratch); auto scratch = manager.scratch_space(); @@ -1543,7 +1416,7 @@ void Index::link(const IndexWriteParameters ¶meters) std::vector dummy_pool(0); std::vector new_out_neighbors; - for (auto cur_nbr : _final_graph[node]) + for (auto cur_nbr : _graph_store->get_neighbours((location_t)node)) { if (dummy_visited.find(cur_nbr) == dummy_visited.end() && cur_nbr != node) { @@ -1554,9 +1427,8 @@ void Index::link(const IndexWriteParameters ¶meters) } prune_neighbors(node, dummy_pool, new_out_neighbors, scratch); - _final_graph[node].clear(); - for (auto id : new_out_neighbors) - _final_graph[node].emplace_back(id); + _graph_store->clear_neighbours((location_t)node); + _graph_store->set_neighbours((location_t)node, new_out_neighbors); } } if (_nd > 0) @@ -1580,7 +1452,7 @@ void Index::prune_all_neighbors(const uint32_t max_degree, cons { if ((size_t)node < _nd || (size_t)node >= _max_points) { - if (_final_graph[node].size() > range) + if (_graph_store->get_neighbours((location_t)node).size() > range) { tsl::robin_set dummy_visited(0); std::vector dummy_pool(0); @@ -1589,7 +1461,7 @@ void Index::prune_all_neighbors(const uint32_t max_degree, cons ScratchStoreManager> manager(_query_scratch); auto scratch = manager.scratch_space(); - for (auto cur_nbr : _final_graph[node]) + for (auto cur_nbr : 
_graph_store->get_neighbours((location_t)node)) { if (dummy_visited.find(cur_nbr) == dummy_visited.end() && cur_nbr != node) { @@ -1600,9 +1472,8 @@ void Index::prune_all_neighbors(const uint32_t max_degree, cons } prune_neighbors((uint32_t)node, dummy_pool, range, maxc, alpha, new_out_neighbors, scratch); - _final_graph[node].clear(); - for (auto id : new_out_neighbors) - _final_graph[node].emplace_back(id); + _graph_store->clear_neighbours((location_t)node); + _graph_store->set_neighbours((location_t)node, new_out_neighbors); } } } @@ -1613,7 +1484,7 @@ void Index::prune_all_neighbors(const uint32_t max_degree, cons { if (i < _nd || i >= _max_points) { - const std::vector &pool = _final_graph[i]; + const std::vector &pool = _graph_store->get_neighbours((location_t)i); max = (std::max)(max, pool.size()); min = (std::min)(min, pool.size()); total += pool.size(); @@ -1701,8 +1572,7 @@ void Index::set_start_points_at_random(T radius, uint32_t rando } template -void Index::build_with_data_populated(const IndexWriteParameters ¶meters, - const std::vector &tags) +void Index::build_with_data_populated(const std::vector &tags) { diskann::cout << "Starting index build with " << _nd << " points... 
" << std::endl; @@ -1726,24 +1596,25 @@ void Index::build_with_data_populated(const IndexWriteParameter } } - uint32_t index_R = parameters.max_degree; - uint32_t num_threads_index = parameters.num_threads; - uint32_t index_L = parameters.search_list_size; - uint32_t maxc = parameters.max_occlusion_size; + uint32_t index_R = _indexingRange; + uint32_t num_threads_index = _indexingThreads; + uint32_t index_L = _indexingQueueSize; + uint32_t maxc = _indexingMaxC; if (_query_scratch.size() == 0) { - initialize_query_scratch(5 + num_threads_index, index_L, index_L, index_R, maxc, + std::uint32_t default_queue_size = (std::max)(_indexingQueueSize, _filterIndexingQueueSize); + initialize_query_scratch(5 + num_threads_index, default_queue_size, default_queue_size, index_R, maxc, _data_store->get_aligned_dim()); } generate_frozen_point(); - link(parameters); + link(); size_t max = 0, min = SIZE_MAX, total = 0, cnt = 0; for (size_t i = 0; i < _nd; i++) { - auto &pool = _final_graph[i]; + auto &pool = _graph_store->get_neighbours((location_t)i); max = std::max(max, pool.size()); min = std::min(min, pool.size()); total += pool.size(); @@ -1753,17 +1624,14 @@ void Index::build_with_data_populated(const IndexWriteParameter diskann::cout << "Index built with degree: max:" << max << " avg:" << (float)total / (float)(_nd + _num_frozen_pts) << " min:" << min << " count(deg<2):" << cnt << std::endl; - _max_observed_degree = std::max((uint32_t)max, _max_observed_degree); _has_built = true; } template -void Index::_build(const DataType &data, const size_t num_points_to_load, - const IndexWriteParameters ¶meters, TagVector &tags) +void Index::_build(const DataType &data, const size_t num_points_to_load, TagVector &tags) { try { - this->build(std::any_cast(data), num_points_to_load, parameters, - tags.get>()); + this->build(std::any_cast(data), num_points_to_load, tags.get>()); } catch (const std::bad_any_cast &e) { @@ -1775,8 +1643,7 @@ void Index::_build(const DataType &data, const 
size_t num_point } } template -void Index::build(const T *data, const size_t num_points_to_load, - const IndexWriteParameters ¶meters, const std::vector &tags) +void Index::build(const T *data, const size_t num_points_to_load, const std::vector &tags) { if (num_points_to_load == 0) { @@ -1795,24 +1662,13 @@ void Index::build(const T *data, const size_t num_points_to_loa _nd = num_points_to_load; _data_store->populate_data(data, (location_t)num_points_to_load); - - // REFACTOR - // memcpy((char *)_data, (char *)data, _aligned_dim * _nd * sizeof(T)); - // if (_normalize_vecs) - //{ - // for (size_t i = 0; i < num_points_to_load; i++) - // { - // normalize(_data + _aligned_dim * i, _aligned_dim); - // } - // } } - build_with_data_populated(parameters, tags); + build_with_data_populated(tags); } template -void Index::build(const char *filename, const size_t num_points_to_load, - const IndexWriteParameters ¶meters, const std::vector &tags) +void Index::build(const char *filename, const size_t num_points_to_load, const std::vector &tags) { // idealy this should call build_filtered_index based on params passed @@ -1844,8 +1700,6 @@ void Index::build(const char *filename, const size_t num_points << " points, but " << "index can support only " << _max_points << " points as specified in constructor." << std::endl; - if (_pq_dist) - aligned_free(_pq_data); throw diskann::ANNException(stream.str(), -1, __FUNCSIG__, __FILE__, __LINE__); } @@ -1855,8 +1709,6 @@ void Index::build(const char *filename, const size_t num_points stream << "ERROR: Driver requests loading " << num_points_to_load << " points and file has only " << file_num_points << " points." << std::endl; - if (_pq_dist) - aligned_free(_pq_data); throw diskann::ANNException(stream.str(), -1, __FUNCSIG__, __FILE__, __LINE__); } @@ -1867,30 +1719,27 @@ void Index::build(const char *filename, const size_t num_points << "but file has " << file_dim << " dimension." 
<< std::endl; diskann::cerr << stream.str() << std::endl; - if (_pq_dist) - aligned_free(_pq_data); throw diskann::ANNException(stream.str(), -1, __FUNCSIG__, __FILE__, __LINE__); } + // REFACTOR PQ TODO: We can remove this if and add a check in the InMemDataStore + // to not populate_data if it has been called once. if (_pq_dist) { - double p_val = std::min(1.0, ((double)MAX_PQ_TRAINING_SET_SIZE / (double)file_num_points)); - - std::string suffix = _use_opq ? "_opq" : "_pq"; - suffix += std::to_string(_num_pq_chunks); - auto pq_pivots_file = std::string(filename) + suffix + "_pivots.bin"; - auto pq_compressed_file = std::string(filename) + suffix + "_compressed.bin"; - generate_quantized_data(std::string(filename), pq_pivots_file, pq_compressed_file, _dist_metric, p_val, - _num_pq_chunks, _use_opq); - - copy_aligned_data_from_file(pq_compressed_file.c_str(), _pq_data, file_num_points, _num_pq_chunks, - _num_pq_chunks); #ifdef EXEC_ENV_OLS - throw ANNException("load_pq_centroid_bin should not be called when " - "EXEC_ENV_OLS is defined.", - -1, __FUNCSIG__, __FILE__, __LINE__); + std::stringstream ss; + ss << "PQ Build is not supported in DLVS environment (i.e. if EXEC_ENV_OLS is defined)" << std::endl; + diskann::cerr << ss.str() << std::endl; + throw ANNException(ss.str(), -1, __FUNCSIG__, __FILE__, __LINE__); #else - _pq_table.load_pq_centroid_bin(pq_pivots_file.c_str(), _num_pq_chunks); + // REFACTOR TODO: Both in the previous code and in the current PQDataStore, + // we are writing the PQ files in the same path as the input file. Now we + // may not have write permissions to that folder, but we will always have + // write permissions to the output folder. So we should write the PQ files + // there. The problem is that the Index class gets the output folder prefix + // only at the time of save(), by which time we are too late. So leaving it + // as-is for now. 
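The pre-refactor PQ build removed above derived its artifact names from the data file: `<file>_pq<chunks>_pivots.bin` and `<file>_pq<chunks>_compressed.bin`, with `_opq` in place of `_pq` when OPQ is enabled. A sketch of that naming scheme (the REFACTOR TODO in the diff notes these land next to the input file, which may not be writable):

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Compose the PQ pivots/compressed file names from the data file path,
// chunk count, and OPQ flag, per the removed build() code above.
std::pair<std::string, std::string> pq_file_names(const std::string &data_file, size_t num_pq_chunks,
                                                  bool use_opq)
{
    std::string suffix = use_opq ? "_opq" : "_pq";
    suffix += std::to_string(num_pq_chunks);
    return {data_file + suffix + "_pivots.bin", data_file + suffix + "_compressed.bin"};
}
```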
+    _pq_data_store->populate_data(filename, 0U);
 #endif
     }
@@ -1901,12 +1750,11 @@ void Index::build(const char *filename, const size_t num_points
         std::unique_lock tl(_tag_lock);
         _nd = num_points_to_load;
     }
-    build_with_data_populated(parameters, tags);
+    build_with_data_populated(tags);
 }
 
 template 
-void Index::build(const char *filename, const size_t num_points_to_load,
-                  const IndexWriteParameters &parameters, const char *tag_filename)
+void Index::build(const char *filename, const size_t num_points_to_load, const char *tag_filename)
 {
     std::vector tags;
@@ -1945,44 +1793,37 @@ void Index::build(const char *filename, const size_t num_points
             }
         }
     }
-    build(filename, num_points_to_load, parameters, tags);
+    build(filename, num_points_to_load, tags);
 }
 
 template 
 void Index::build(const std::string &data_file, const size_t num_points_to_load,
-                  IndexBuildParams &build_params)
+                  IndexFilterParams &filter_params)
 {
-    std::string labels_file_to_use = build_params.save_path_prefix + "_label_formatted.txt";
-    std::string mem_labels_int_map_file = build_params.save_path_prefix + "_labels_map.txt";
-
     size_t points_to_load = num_points_to_load == 0 ?
_max_points : num_points_to_load; auto s = std::chrono::high_resolution_clock::now(); - if (build_params.label_file == "") + if (filter_params.label_file == "") { - this->build(data_file.c_str(), points_to_load, build_params.index_write_params); + this->build(data_file.c_str(), points_to_load); } else { // TODO: this should ideally happen in save() uint32_t unv_label_as_num = 0; - convert_labels_string_to_int(build_params.label_file, labels_file_to_use, mem_labels_int_map_file, - build_params.universal_label, unv_label_as_num); - if (build_params.universal_label != "") + std::string labels_file_to_use = filter_params.save_path_prefix + "_label_formatted.txt"; + std::string mem_labels_int_map_file = filter_params.save_path_prefix + "_labels_map.txt"; + convert_labels_string_to_int(filter_params.label_file, labels_file_to_use, mem_labels_int_map_file, + filter_params.universal_label, unv_label_as_num); + if (filter_params.universal_label != "") { - LabelT unv_label_as_num = 0; +// LabelT unv_label_as_num = 0; this->set_universal_label(unv_label_as_num); } - this->build_filtered_index(data_file.c_str(), labels_file_to_use, points_to_load, - build_params.index_write_params); + this->build_filtered_index(data_file.c_str(), labels_file_to_use, points_to_load); } std::chrono::duration diff = std::chrono::high_resolution_clock::now() - s; std::cout << "Indexing time: " << diff.count() << "\n"; - // cleanup - if (build_params.label_file != "") - { - // clean_up_artifacts({labels_file_to_use, mem_labels_int_map_file}, {}); - } } template @@ -2006,11 +1847,12 @@ std::unordered_map Index::load_label_map(c } template -LabelT Index::get_converted_label(const std::string &raw_label) +LabelT Index::get_converted_label(const std::string &raw_label) const { - if (_label_map.find(raw_label) != _label_map.end()) + auto iter = _label_map.find(raw_label); + if (iter != _label_map.end()) { - return _label_map[raw_label]; + return iter->second; } else if (_use_universal_label) { @@ 
-2023,7 +1865,13 @@ LabelT Index::get_converted_label(const std::string &raw_label)
 }
 
 template 
-bool Index::is_label_valid(const std::string& raw_label)
+bool Index::is_set_universal_label() const
+{
+    return _use_universal_label;
+}
+
+template 
+bool Index::is_label_valid(const std::string& raw_label) const
 {
     if (_label_map.find(raw_label) != _label_map.end())
     {
@@ -2051,7 +1899,7 @@ void Index::parse_label_file(const std::string &label_file, siz
     {
         line_cnt++;
     }
-    _pts_to_labels.resize(line_cnt, std::vector());
+    _location_to_labels.resize(line_cnt, std::vector());
 
     infile.clear();
     infile.seekg(0, std::ios::beg);
@@ -2071,13 +1919,9 @@ void Index::parse_label_file(const std::string &label_file, siz
             lbls.push_back(token_as_num);
             _labels.insert(token_as_num);
         }
-        if (lbls.size() <= 0)
-        {
-            diskann::cout << "No label found";
-            exit(-1);
-        }
+        std::sort(lbls.begin(), lbls.end());
-        _pts_to_labels[line_cnt] = lbls;
+        _location_to_labels[line_cnt] = lbls;
         line_cnt++;
     }
     num_points = (size_t)line_cnt;
@@ -2207,6 +2051,12 @@ void Index::parse_label_file_in_bitset(const std::string& label
     diskann::cout << "Identified " << _labels.size() << " distinct label(s)" << std::endl;
 }
 
+template 
+void Index::_set_universal_label(const LabelType universal_label)
+{
+    this->set_universal_label(std::any_cast(universal_label));
+}
+
 template 
 void Index::set_universal_label(const LabelT &label)
 {
@@ -2216,19 +2066,17 @@ void Index::set_universal_label(const LabelT &label)
 
 template 
 void Index::build_filtered_index(const char *filename, const std::string &label_file,
-                                 const size_t num_points_to_load, IndexWriteParameters &parameters,
-                                 const std::vector &tags)
+                                 const size_t num_points_to_load, const std::vector &tags)
 {
-    _labels_file = label_file; // original label file
     _filtered_index = true;
-    _label_to_medoid_id.clear();
+    _label_to_start_id.clear();
     size_t num_points_labels = 0;
 
     parse_label_file(label_file, num_points_labels); // determines medoid for each label and identifies
                                                      // the
points to label mapping - convert_pts_label_to_bitmask(_pts_to_labels, _bitmask_buf, _labels.size()); + convert_pts_label_to_bitmask(_location_to_labels, _bitmask_buf, _labels.size()); std::unordered_map> label_to_points; std::vector label_bitmask; @@ -2262,6 +2110,7 @@ void Index::build_filtered_index(const char *filename, const st labeled_points.emplace_back(point_id); } } + label_to_points[x] = labeled_points; } @@ -2291,11 +2140,11 @@ void Index::build_filtered_index(const char *filename, const st best_medoid = cur_cnd; } } - _label_to_medoid_id[curr_label] = best_medoid; + _label_to_start_id[curr_label] = best_medoid; _medoid_counts[best_medoid]++; } - this->build(filename, num_points_to_load, parameters, tags); + this->build(filename, num_points_to_load, tags); } template @@ -2356,9 +2205,9 @@ std::pair Index::search(const T *query, con std::shared_lock lock(_update_lock); - _distance->preprocess_query(query, _data_store->get_dims(), scratch->aligned_query()); - auto retval = - iterate_to_fixed_point(scratch->aligned_query(), L, init_ids, scratch, false, unused_filter_label, true); + _data_store->preprocess_query(query, scratch); + + auto retval = iterate_to_fixed_point(scratch, L, init_ids, false, unused_filter_label, true); NeighborPriorityQueue &best_L_nodes = scratch->best_l_nodes(); @@ -2403,12 +2252,12 @@ std::pair Index::_search_with_filters(const if (typeid(uint64_t *) == indices.type()) { auto ptr = std::any_cast(indices); - return this->search_with_filters(std::any_cast(query), converted_label, K, L, ptr, distances); + return this->search_with_filters(std::any_cast(query), converted_label, K, L, ptr, distances); } else if (typeid(uint32_t *) == indices.type()) { auto ptr = std::any_cast(indices); - return this->search_with_filters(std::any_cast(query), converted_label, K, L, ptr, distances); + return this->search_with_filters(std::any_cast(query), converted_label, K, L, ptr, distances); } else { @@ -2442,10 +2291,13 @@ std::pair 
Index::search_with_filters(const std::vector init_ids = get_init_ids(); std::shared_lock lock(_update_lock); + std::shared_lock tl(_tag_lock, std::defer_lock); + if (_dynamic_index) + tl.lock(); - if (_label_to_medoid_id.find(filter_label) != _label_to_medoid_id.end()) + if (_label_to_start_id.find(filter_label) != _label_to_start_id.end()) { - init_ids.emplace_back(_label_to_medoid_id[filter_label]); + init_ids.emplace_back(_label_to_start_id[filter_label]); } else { @@ -2453,13 +2305,13 @@ std::pair Index::search_with_filters(const << std::endl; // RKNOTE: If universal label found start there throw diskann::ANNException("No filtered medoid found. exitting ", -1); } + if (_dynamic_index) + tl.unlock(); + filter_vec.emplace_back(filter_label); - // REFACTOR - // T *aligned_query = scratch->aligned_query(); - // memcpy(aligned_query, query, _dim * sizeof(T)); - _distance->preprocess_query(query, _data_store->get_dims(), scratch->aligned_query()); - auto retval = iterate_to_fixed_point(scratch->aligned_query(), L, init_ids, scratch, true, filter_vec, true); + _data_store->preprocess_query(query, scratch); + auto retval = iterate_to_fixed_point(scratch, L, init_ids, true, filter_vec, true); auto best_L_nodes = scratch->best_l_nodes(); @@ -2468,9 +2320,8 @@ std::pair Index::search_with_filters(const { if (best_L_nodes[i].id < _max_points) { - // safe because Index uses uint32_t ids internally - // and IDType will be uint32_t or uint64_t indices[pos] = (IdType)best_L_nodes[i].id; + if (distances != nullptr) { #ifdef EXEC_ENV_OLS @@ -2496,12 +2347,13 @@ std::pair Index::search_with_filters(const template size_t Index::_search_with_tags(const DataType &query, const uint64_t K, const uint32_t L, - const TagType &tags, float *distances, DataVector &res_vectors) + const TagType &tags, float *distances, DataVector &res_vectors, + bool use_filters, const std::string filter_label) { try { return this->search_with_tags(std::any_cast(query), K, L, std::any_cast(tags), distances, 
- res_vectors.get>()); + res_vectors.get>(), use_filters, filter_label); } catch (const std::bad_any_cast &e) { @@ -2515,7 +2367,8 @@ size_t Index::_search_with_tags(const DataType &query, const ui template size_t Index::search_with_tags(const T *query, const uint64_t K, const uint32_t L, TagT *tags, - float *distances, std::vector &res_vectors) + float *distances, std::vector &res_vectors, bool use_filters, + const std::string filter_label) { if (K > (uint64_t)L) { @@ -2535,10 +2388,22 @@ size_t Index::search_with_tags(const T *query, const uint64_t K std::shared_lock ul(_update_lock); const std::vector init_ids = get_init_ids(); - const std::vector unused_filter_label; - _distance->preprocess_query(query, _data_store->get_dims(), scratch->aligned_query()); - iterate_to_fixed_point(scratch->aligned_query(), L, init_ids, scratch, false, unused_filter_label, true); + //_distance->preprocess_query(query, _data_store->get_dims(), + // scratch->aligned_query()); + _data_store->preprocess_query(query, scratch); + if (!use_filters) + { + const std::vector unused_filter_label; + iterate_to_fixed_point(scratch, L, init_ids, false, unused_filter_label, true); + } + else + { + std::vector filter_vec; + auto converted_label = this->get_converted_label(filter_label); + filter_vec.push_back(converted_label); + iterate_to_fixed_point(scratch, L, init_ids, true, filter_vec, true); + } NeighborPriorityQueue &best_L_nodes = scratch->best_l_nodes(); assert(best_L_nodes.size() <= L); @@ -2607,17 +2472,21 @@ template void Indexcopy_vectors((location_t)res, (location_t)_max_points, 1); } else { _data_store->copy_vectors((location_t)res, (location_t)_max_points, 1); } + _frozen_pts_used++; } template int Index::enable_delete() @@ -2667,7 +2536,7 @@ inline void Index::process_delete(const tsl::robin_set adj_list_lock; if (_conc_consolidate) adj_list_lock = std::unique_lock(_locks[loc]); - adj_list = _final_graph[loc]; + adj_list = _graph_store->get_neighbours((location_t)loc); } bool 
modify = false; @@ -2684,7 +2553,7 @@ inline void Index::process_delete(const tsl::robin_set ngh_lock; if (_conc_consolidate) ngh_lock = std::unique_lock(_locks[ngh]); - for (auto j : _final_graph[ngh]) + for (auto j : _graph_store->get_neighbours((location_t)ngh)) if (j != loc && old_delete_set.find(j) == old_delete_set.end()) expanded_nodes_set.insert(j); } @@ -2695,9 +2564,9 @@ inline void Index::process_delete(const tsl::robin_set adj_list_lock(_locks[loc]); - _final_graph[loc].clear(); + _graph_store->clear_neighbours((location_t)loc); for (auto &ngh : expanded_nodes_set) - _final_graph[loc].push_back(ngh); + _graph_store->add_neighbour((location_t)loc, ngh); } else { @@ -2712,7 +2581,7 @@ inline void Index::process_delete(const tsl::robin_set adj_list_lock(_locks[loc]); - _final_graph[loc] = occlude_list_output; + _graph_store->set_neighbours((location_t)loc, occlude_list_output); } } } @@ -2777,7 +2646,7 @@ consolidation_report Index::consolidate_deletes(const IndexWrit const uint32_t range = params.max_degree; const uint32_t maxc = params.max_occlusion_size; const float alpha = params.alpha; - const uint32_t num_threads = params.num_threads == 0 ? omp_get_num_threads() : params.num_threads; + const uint32_t num_threads = params.num_threads == 0 ? 
omp_get_num_procs() : params.num_threads; uint32_t num_calls_to_process_delete = 0; diskann::Timer timer; @@ -2827,6 +2696,17 @@ template void Index void Index new_adj_list; if ((new_location[old] < _max_points) // If point continues to exist || (old >= _max_points && old < _max_points + _num_frozen_pts)) { - new_adj_list.reserve(_final_graph[old].size()); - for (auto ngh_iter : _final_graph[old]) + new_adj_list.reserve(_graph_store->get_neighbours((location_t)old).size()); + for (auto ngh_iter : _graph_store->get_neighbours((location_t)old)) { if (empty_locations.find(ngh_iter) != empty_locations.end()) { @@ -2901,20 +2782,26 @@ template void Indexget_neighbours((location_t)old).swap(new_adj_list); + _graph_store->set_neighbours((location_t)old, new_adj_list); // Move the data and adj list to the correct position if (new_location[old] != old) { assert(new_location[old] < old); - _final_graph[new_location[old]].swap(_final_graph[old]); + _graph_store->swap_neighbours(new_location[old], (location_t)old); + + if (_filtered_index) + { + _location_to_labels[new_location[old]].swap(_location_to_labels[old]); + } _data_store->copy_vectors(old, new_location[old], 1); } } else { - _final_graph[old].clear(); + _graph_store->clear_neighbours((location_t)old); } } diskann::cerr << "#dangling references after data compaction: " << num_dangling << std::endl; @@ -2930,12 +2817,21 @@ template void Indexclear_neighbours((location_t)old); + } + if (_filtered_index) + { + for (size_t old = _nd; old < _max_points; old++) + { + _location_to_labels[old].clear(); + } } + _empty_slots.clear(); + // mark all slots after _nd as empty for (auto i = _nd; i < _max_points; i++) { _empty_slots.insert((uint32_t)i); @@ -2970,7 +2866,6 @@ template int Index location = _empty_slots.pop_any(); _delete_set->erase(location); } - ++_nd; return location; } @@ -3020,10 +2915,18 @@ void Index::reposition_points(uint32_t old_location_start, uint // integer arithmetic rules. 
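The one-line `consolidate_deletes` change above matters more than it looks: `omp_get_num_threads()` reports the size of the *current* thread team, which is 1 outside any parallel region, so the old default silently serialized consolidation; `omp_get_num_procs()` reports the machine's processor count. A standalone sketch of the same "0 means use all cores" convention, using the standard library in place of OpenMP (helper name hypothetical):

```cpp
#include <thread>

// Hypothetical helper mirroring the patched default: a requested thread
// count of 0 means "use every available core". Note the analogous OpenMP
// pitfall fixed above: omp_get_num_threads() returns the current team
// size (1 outside a parallel region), so it is the wrong source for a
// machine-wide default; omp_get_num_procs(), like hardware_concurrency()
// here, describes the hardware.
unsigned resolve_thread_count(unsigned requested)
{
    return requested == 0 ? std::thread::hardware_concurrency() : requested;
}
```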
const uint32_t location_delta = new_location_start - old_location_start; + std::vector updated_neighbours_location; for (uint32_t i = 0; i < _max_points + _num_frozen_pts; i++) - for (auto &loc : _final_graph[i]) + { + auto &i_neighbours = _graph_store->get_neighbours((location_t)i); + std::vector i_neighbours_copy(i_neighbours.begin(), i_neighbours.end()); + for (auto &loc : i_neighbours_copy) + { if (loc >= old_location_start && loc < old_location_start + num_locations) loc += location_delta; + } + _graph_store->set_neighbours(i, i_neighbours_copy); + } // The [start, end) interval which will contain obsolete points to be // cleared. @@ -3038,10 +2941,14 @@ void Index::reposition_points(uint32_t old_location_start, uint // to avoid modifying locations that are yet to be copied. for (uint32_t loc_offset = 0; loc_offset < num_locations; loc_offset++) { - assert(_final_graph[new_location_start + loc_offset].empty()); - _final_graph[new_location_start + loc_offset].swap(_final_graph[old_location_start + loc_offset]); + assert(_graph_store->get_neighbours(new_location_start + loc_offset).empty()); + _graph_store->swap_neighbours(new_location_start + loc_offset, old_location_start + loc_offset); + if (_dynamic_index && _filtered_index) + { + _location_to_labels[new_location_start + loc_offset].swap( + _location_to_labels[old_location_start + loc_offset]); + } } - // If ranges are overlapping, make sure not to clear the newly copied // data. if (mem_clear_loc_start < new_location_start + num_locations) @@ -3056,8 +2963,13 @@ void Index::reposition_points(uint32_t old_location_start, uint // to avoid modifying locations that are yet to be copied. 
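The rewritten `reposition_points` loop above copies each adjacency list out of the graph store, displaces every neighbor id falling inside the moved block by `new_location_start - old_location_start`, and writes the list back. The displacement relies on unsigned wraparound, so it is correct even when the block moves to a *lower* location. The id shift itself can be sketched standalone (function name hypothetical):

```cpp
#include <cstdint>
#include <vector>

// Hypothetical sketch of the id shift in reposition_points: any neighbor id
// inside the moved range [old_start, old_start + n) is displaced by
// (new_start - old_start). The delta is computed in uint32_t on purpose:
// when new_start < old_start it wraps, and adding the wrapped delta to an
// id wraps back to the correct smaller location.
std::vector<std::uint32_t> shift_ids(std::vector<std::uint32_t> ids,
                                     std::uint32_t old_start, std::uint32_t n,
                                     std::uint32_t new_start)
{
    const std::uint32_t delta = new_start - old_start; // may wrap; intended
    for (std::uint32_t &id : ids)
        if (id >= old_start && id < old_start + n)
            id += delta;
    return ids;
}
```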
for (uint32_t loc_offset = num_locations; loc_offset > 0; loc_offset--) { - assert(_final_graph[new_location_start + loc_offset - 1u].empty()); - _final_graph[new_location_start + loc_offset - 1u].swap(_final_graph[old_location_start + loc_offset - 1u]); + assert(_graph_store->get_neighbours(new_location_start + loc_offset - 1u).empty()); + _graph_store->swap_neighbours(new_location_start + loc_offset - 1u, old_location_start + loc_offset - 1u); + if (_dynamic_index && _filtered_index) + { + _location_to_labels[new_location_start + loc_offset - 1u].swap( + _location_to_labels[old_location_start + loc_offset - 1u]); + } } // If ranges are overlapping, make sure not to clear the newly copied @@ -3084,6 +2996,17 @@ template void Index void Index::resize(size_t new_max_points) @@ -3093,7 +3016,7 @@ template void Indexresize((location_t)new_internal_points); - _final_graph.resize(new_internal_points); + _graph_store->resize_graph(new_internal_points); _locks = std::vector(new_internal_points); if (_num_frozen_pts != 0) @@ -3130,11 +3053,37 @@ int Index::_insert_point(const DataType &point, const TagType t } } +template +int Index::_insert_point(const DataType &point, const TagType tag, Labelvector &labels) +{ + try + { + return this->insert_point(std::any_cast(point), std::any_cast(tag), + labels.get>()); + } + catch (const std::bad_any_cast &anycast_e) + { + throw new ANNException("Error:Trying to insert invalid data type" + std::string(anycast_e.what()), -1); + } + catch (const std::exception &e) + { + throw new ANNException("Error:" + std::string(e.what()), -1); + } +} + template int Index::insert_point(const T *point, const TagT tag) { + std::vector no_labels{0}; + return insert_point(point, tag, no_labels); +} + +template +int Index::insert_point(const T *point, const TagT tag, const std::vector &labels) +{ + assert(_has_built); - if (tag == static_cast(0)) + if (tag == 0) { throw diskann::ANNException("Do not insert point with tag 0. 
That is " "reserved for points hidden " @@ -3146,8 +3095,42 @@ int Index::insert_point(const T *point, const TagT tag) std::unique_lock tl(_tag_lock); std::unique_lock dl(_delete_lock); - // Find a vacant location in the data array to insert the new point auto location = reserve_location(); + if (_filtered_index) + { + if (labels.empty()) + { + release_location(location); + std::cerr << "Error: Can't insert point with tag " + get_tag_string(tag) + + " . there are no labels for the point." + << std::endl; + return -1; + } + + _location_to_labels[location] = labels; + + for (LabelT label : labels) + { + if (_labels.find(label) == _labels.end()) + { + if (_frozen_pts_used >= _num_frozen_pts) + { + throw ANNException( + "Error: For dynamic filtered index, the number of frozen points should be atleast equal " + "to number of unique labels.", + -1); + } + + auto fz_location = (int)(_max_points) + _frozen_pts_used; // as first _fz_point + _labels.insert(label); + _label_to_start_id[label] = (uint32_t)fz_location; + _location_to_labels[fz_location] = {label}; + _data_store->set_vector((location_t)fz_location, point); + _frozen_pts_used++; + } + } + } + if (location == -1) { #if EXPAND_IF_FULL @@ -3185,12 +3168,13 @@ int Index::insert_point(const T *point, const TagT tag) #else return -1; #endif - } + } // cant insert as active pts >= max_pts dl.unlock(); // Insert tag and mapping to location if (_enable_tags) { + // if tags are enabled and tag is already inserted. so we can't reuse that tag. 
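The filtered insert path above enforces an invariant for dynamic filtered indices: the first time a label appears, it permanently claims one frozen slot (located past `_max_points`) as that label's start point, so the index must be built with at least as many frozen points as distinct labels. A simplified standalone sketch of that bookkeeping (type and member names hypothetical):

```cpp
#include <cstdint>
#include <stdexcept>
#include <unordered_map>

// Hypothetical sketch of the per-label start-point accounting in the
// filtered insert path: frozen slots live at [max_points, max_points +
// num_frozen), and each previously unseen label consumes exactly one of
// them; running out reproduces the exception thrown above.
struct LabelStartTable
{
    std::uint32_t max_points;
    std::uint32_t num_frozen;
    std::uint32_t frozen_used = 0;
    std::unordered_map<std::uint32_t, std::uint32_t> label_to_start;

    // Returns the start location for `label`, claiming a frozen slot if new.
    std::uint32_t start_for(std::uint32_t label)
    {
        auto it = label_to_start.find(label);
        if (it != label_to_start.end())
            return it->second;
        if (frozen_used >= num_frozen)
            throw std::runtime_error("need at least one frozen point per unique label");
        const std::uint32_t loc = max_points + frozen_used++;
        label_to_start[label] = loc;
        return loc;
    }
};
```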
if (_tag_to_location.find(tag) != _tag_to_location.end()) { release_location(location); @@ -3202,37 +3186,41 @@ int Index::insert_point(const T *point, const TagT tag) } tl.unlock(); - _data_store->set_vector(location, point); + _data_store->set_vector(location, point); // update datastore // Find and add appropriate graph edges ScratchStoreManager> manager(_query_scratch); auto scratch = manager.scratch_space(); - std::vector pruned_list; + std::vector pruned_list; // it is the set best candidates to connect to this point if (_filtered_index) { + // when filtered the best_candidates will share the same label ( label_present > distance) search_for_point_and_prune(location, _indexingQueueSize, pruned_list, scratch, true, _filterIndexingQueueSize); } else { search_for_point_and_prune(location, _indexingQueueSize, pruned_list, scratch); } + assert(pruned_list.size() > 0); // should find atleast one neighbour (i.e frozen point acting as medoid) + { std::shared_lock tlock(_tag_lock, std::defer_lock); if (_conc_consolidate) tlock.lock(); LockGuard guard(_locks[location]); - _final_graph[location].clear(); - _final_graph[location].reserve((size_t)(_indexingRange * GRAPH_SLACK_FACTOR * 1.05)); + _graph_store->clear_neighbours(location); + std::vector neighbor_links; for (auto link : pruned_list) { if (_conc_consolidate) if (!_location_to_tag.contains(link)) continue; - _final_graph[location].emplace_back(link); + neighbor_links.emplace_back(link); } - assert(_final_graph[location].size() <= _indexingRange); + _graph_store->set_neighbours(location, neighbor_links); + assert(_graph_store->get_neighbours(location).size() <= _indexingRange); if (_conc_consolidate) tlock.unlock(); @@ -3281,7 +3269,7 @@ template int Index if (_tag_to_location.find(tag) == _tag_to_location.end()) { - diskann::cerr << "Delete tag not found " << tag << std::endl; + diskann::cerr << "Delete tag not found " << get_tag_string(tag) << std::endl; return -1; } assert(_tag_to_location[tag] < _max_points); 
@@ -3290,7 +3278,6 @@ template int Index _delete_set->insert(location); _location_to_tag.erase(location); _tag_to_location.erase(tag); - return 0; } @@ -3364,7 +3351,7 @@ template void Indexget_total_points() << std::endl; diskann::cout << "Location to tag size: " << _location_to_tag.size() << std::endl; diskann::cout << "Tag to location size: " << _tag_to_location.size() << std::endl; diskann::cout << "Number of empty slots: " << _empty_slots.size() << std::endl; @@ -3402,7 +3389,7 @@ template void Indexget_neighbours((location_t)node)) { if (!visited.test(nghbr)) { @@ -3416,12 +3403,6 @@ template void Index void -// Index::optimize_index_layout() -//{ // use after build or load -//} - // REFACTOR: This should be an OptimizedDataStore class template void Index::optimize_index_layout() { // use after build or load @@ -3434,10 +3415,10 @@ template void Indexget_aligned_dim()]; std::memset(cur_vec, 0, _data_store->get_aligned_dim() * sizeof(float)); _data_len = (_data_store->get_aligned_dim() + 1) * sizeof(float); - _neighbor_len = (_max_observed_degree + 1) * sizeof(uint32_t); + _neighbor_len = (_graph_store->get_max_observed_degree() + 1) * sizeof(uint32_t); _node_size = _data_len + _neighbor_len; _opt_graph = new char[_node_size * _nd]; - DistanceFastL2 *dist_fast = (DistanceFastL2 *)_data_store->get_dist_fn(); + auto dist_fast = (DistanceFastL2 *)(_data_store->get_dist_fn()); for (uint32_t i = 0; i < _nd; i++) { char *cur_node_offset = _opt_graph + i * _node_size; @@ -3447,24 +3428,17 @@ template void Indexget_neighbours(i).size(); std::memcpy(cur_node_offset, &k, sizeof(uint32_t)); - std::memcpy(cur_node_offset + sizeof(uint32_t), _final_graph[i].data(), k * sizeof(uint32_t)); - std::vector().swap(_final_graph[i]); + std::memcpy(cur_node_offset + sizeof(uint32_t), _graph_store->get_neighbours(i).data(), k * sizeof(uint32_t)); + // std::vector().swap(_graph_store->get_neighbours(i)); + _graph_store->clear_neighbours(i); } - _final_graph.clear(); - 
_final_graph.shrink_to_fit(); + _graph_store->clear_graph(); + _graph_store->resize_graph(0); delete[] cur_vec; } -// REFACTOR: once optimized layout becomes its own Data+Graph store, we should -// just invoke regular search -// template -// void Index::search_with_optimized_layout(const T *query, -// size_t K, size_t L, uint32_t *indices) -//{ -//} - template void Index::_search_with_optimized_layout(const DataType &query, size_t K, size_t L, uint32_t *indices) { @@ -3474,8 +3448,10 @@ void Index::_search_with_optimized_layout(const DataType &query } catch (const std::bad_any_cast &e) { - throw ANNException( - "Error: bad any cast while performing _search_with_optimized_layout() " + std::string(e.what()), -1); + throw ANNException("Error: bad any cast while performing " + "_search_with_optimized_layout() " + + std::string(e.what()), + -1); } catch (const std::exception &e) { @@ -3486,7 +3462,7 @@ void Index::_search_with_optimized_layout(const DataType &query template void Index::search_with_optimized_layout(const T *query, size_t K, size_t L, uint32_t *indices) { - DistanceFastL2 *dist_fast = (DistanceFastL2 *)_data_store->get_dist_fn(); + DistanceFastL2 *dist_fast = (DistanceFastL2 *)(_data_store->get_dist_fn()); NeighborPriorityQueue retset(L); std::vector init_ids(L); @@ -3596,6 +3572,9 @@ template DISKANN_DLLEXPORT class Index; template DISKANN_DLLEXPORT class Index; template DISKANN_DLLEXPORT class Index; template DISKANN_DLLEXPORT class Index; +template DISKANN_DLLEXPORT class Index; +template DISKANN_DLLEXPORT class Index; +template DISKANN_DLLEXPORT class Index; // Label with short int 2 byte template DISKANN_DLLEXPORT class Index; template DISKANN_DLLEXPORT class Index; @@ -3609,6 +3588,9 @@ template DISKANN_DLLEXPORT class Index; template DISKANN_DLLEXPORT class Index; template DISKANN_DLLEXPORT class Index; template DISKANN_DLLEXPORT class Index; +template DISKANN_DLLEXPORT class Index; +template DISKANN_DLLEXPORT class Index; +template 
DISKANN_DLLEXPORT class Index; template DISKANN_DLLEXPORT std::pair Index::search( const float *query, const size_t K, const uint32_t L, uint64_t *indices, float *distances); diff --git a/src/index_factory.cpp b/src/index_factory.cpp index c5607f4a0..35790f8d6 100644 --- a/src/index_factory.cpp +++ b/src/index_factory.cpp @@ -1,4 +1,5 @@ #include "index_factory.h" +#include "pq_l2_distance.h" namespace diskann { @@ -49,45 +50,100 @@ void IndexFactory::check_config() } } +template Distance *IndexFactory::construct_inmem_distance_fn(Metric metric) +{ + if (metric == diskann::Metric::COSINE && std::is_same::value) + { + return (Distance *)new AVXNormalizedCosineDistanceFloat(); + } + else + { + return (Distance *)get_distance_function(metric); + } +} + template -std::unique_ptr> IndexFactory::construct_datastore(DataStoreStrategy strategy, size_t num_points, - size_t dimension) +std::shared_ptr> IndexFactory::construct_datastore(DataStoreStrategy strategy, + size_t total_internal_points, size_t dimension, + Metric metric) { - const size_t total_internal_points = num_points + _config->num_frozen_pts; - std::shared_ptr> distance; + std::unique_ptr> distance; switch (strategy) { - case MEMORY: - if (_config->metric == diskann::Metric::COSINE && std::is_same::value) - { - distance.reset((Distance *)new AVXNormalizedCosineDistanceFloat()); - return std::make_unique>((location_t)total_internal_points, dimension, distance); - } - else - { - distance.reset((Distance *)get_distance_function(_config->metric)); - return std::make_unique>((location_t)total_internal_points, dimension, distance); - } - break; + case DataStoreStrategy::MEMORY: + distance.reset(construct_inmem_distance_fn(metric)); + return std::make_shared>((location_t)total_internal_points, dimension, + std::move(distance)); default: break; } return nullptr; } -std::unique_ptr IndexFactory::construct_graphstore(GraphStoreStrategy, size_t size) +std::unique_ptr IndexFactory::construct_graphstore(const 
GraphStoreStrategy strategy, + const size_t size, + const size_t reserve_graph_degree) +{ + switch (strategy) + { + case GraphStoreStrategy::MEMORY: + return std::make_unique(size, reserve_graph_degree); + default: + throw ANNException("Error : Current GraphStoreStratagy is not supported.", -1); + } +} + +template +std::shared_ptr> IndexFactory::construct_pq_datastore(DataStoreStrategy strategy, size_t num_points, + size_t dimension, Metric m, size_t num_pq_chunks, + bool use_opq) { - return std::make_unique(size); + std::unique_ptr> distance_fn; + std::unique_ptr> quantized_distance_fn; + + quantized_distance_fn = std::move(std::make_unique>((uint32_t)num_pq_chunks, use_opq)); + switch (strategy) + { + case DataStoreStrategy::MEMORY: + distance_fn.reset(construct_inmem_distance_fn(m)); + return std::make_shared>(dimension, (location_t)(num_points), num_pq_chunks, + std::move(distance_fn), std::move(quantized_distance_fn)); + default: + // REFACTOR TODO: We do support diskPQ - so we may need to add a new class for SSDPQDataStore! 
+ break; + } + return nullptr; } template std::unique_ptr IndexFactory::create_instance() { - size_t num_points = _config->max_points; + size_t num_points = _config->max_points + _config->num_frozen_pts; size_t dim = _config->dimension; // auto graph_store = construct_graphstore(_config->graph_strategy, num_points); - auto data_store = construct_datastore(_config->data_strategy, num_points, dim); - return std::make_unique>(*_config, std::move(data_store)); + auto data_store = construct_datastore(_config->data_strategy, num_points, dim, _config->metric); + std::shared_ptr> pq_data_store = nullptr; + + if (_config->data_strategy == DataStoreStrategy::MEMORY && _config->pq_dist_build) + { + pq_data_store = + construct_pq_datastore(_config->data_strategy, num_points + _config->num_frozen_pts, dim, + _config->metric, _config->num_pq_chunks, _config->use_opq); + } + else + { + pq_data_store = data_store; + } + size_t max_reserve_degree = + (size_t)(defaults::GRAPH_SLACK_FACTOR * 1.05 * + (_config->index_write_params == nullptr ? 0 : _config->index_write_params->max_degree)); + std::unique_ptr graph_store = + construct_graphstore(_config->graph_strategy, num_points + _config->num_frozen_pts, max_reserve_degree); + + // REFACTOR TODO: Must construct in-memory PQDatastore if strategy == ONDISK and must construct + // in-mem and on-disk PQDataStore if strategy == ONDISK and diskPQ is required. 
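In `create_instance` above, `max_reserve_degree` sizes each adjacency list at the configured maximum degree inflated by `GRAPH_SLACK_FACTOR` plus a further 5%, because pruning temporarily overfills lists before trimming them back. A sketch of the computation; the slack value 1.3 is an assumption about `defaults::GRAPH_SLACK_FACTOR`, not something stated in this diff:

```cpp
#include <cstddef>

// Assumed value of diskann::defaults::GRAPH_SLACK_FACTOR; the 1.05 factor
// appears verbatim in the patch above.
constexpr double GRAPH_SLACK_FACTOR = 1.3;

// Hypothetical sketch of the reserve sizing: per-node neighbor capacity
// reserved up front so pruning can overfill without reallocating.
std::size_t reserve_degree(std::size_t max_degree)
{
    return static_cast<std::size_t>(GRAPH_SLACK_FACTOR * 1.05 * static_cast<double>(max_degree));
}
```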
+ return std::make_unique>(*_config, data_store, + std::move(graph_store), pq_data_store); } std::unique_ptr IndexFactory::create_instance(const std::string &data_type, const std::string &tag_type, @@ -147,4 +203,11 @@ std::unique_ptr IndexFactory::create_instance(const std::string & throw ANNException("Error: unsupported label_type please choose from [uint/ushort]", -1); } +// template DISKANN_DLLEXPORT std::shared_ptr> IndexFactory::construct_datastore( +// DataStoreStrategy stratagy, size_t num_points, size_t dimension, Metric m); +// template DISKANN_DLLEXPORT std::shared_ptr> IndexFactory::construct_datastore( +// DataStoreStrategy stratagy, size_t num_points, size_t dimension, Metric m); +// template DISKANN_DLLEXPORT std::shared_ptr> IndexFactory::construct_datastore( +// DataStoreStrategy stratagy, size_t num_points, size_t dimension, Metric m); + } // namespace diskann diff --git a/src/linux_aligned_file_reader.cpp b/src/linux_aligned_file_reader.cpp index 47c7cb1fb..31bf5f827 100644 --- a/src/linux_aligned_file_reader.cpp +++ b/src/linux_aligned_file_reader.cpp @@ -147,10 +147,14 @@ void LinuxAlignedFileReader::register_thread() if (ret != 0) { lk.unlock(); - assert(errno != EAGAIN); - assert(errno != ENOMEM); - std::cerr << "io_setup() failed; returned " << ret << ", errno=" << errno << ":" << ::strerror(errno) - << std::endl; + if (ret == -EAGAIN) + { + std::cerr << "io_setup() failed with EAGAIN: Consider increasing /proc/sys/fs/aio-max-nr" << std::endl; + } + else + { + std::cerr << "io_setup() failed; returned " << ret << ": " << ::strerror(-ret) << std::endl; + } } else { diff --git a/src/natural_number_map.cpp b/src/natural_number_map.cpp index 9050831a2..a996dcf75 100644 --- a/src/natural_number_map.cpp +++ b/src/natural_number_map.cpp @@ -5,6 +5,7 @@ #include #include "natural_number_map.h" +#include "tag_uint128.h" namespace diskann { @@ -111,4 +112,5 @@ template class natural_number_map; template class natural_number_map; template class 
natural_number_map; template class natural_number_map; +template class natural_number_map; } // namespace diskann diff --git a/src/partition.cpp b/src/partition.cpp index 2d46f9faf..570d45c7d 100644 --- a/src/partition.cpp +++ b/src/partition.cpp @@ -11,7 +11,7 @@ #include "tsl/robin_map.h" #include "tsl/robin_set.h" -#if defined(RELEASE_UNUSED_TCMALLOC_MEMORY_AT_CHECKPOINTS) && defined(DISKANN_BUILD) +#if defined(DISKANN_RELEASE_UNUSED_TCMALLOC_MEMORY_AT_CHECKPOINTS) && defined(DISKANN_BUILD) #include "gperftools/malloc_extension.h" #endif diff --git a/src/pq.cpp b/src/pq.cpp index 86c68ce0a..d2b545c79 100644 --- a/src/pq.cpp +++ b/src/pq.cpp @@ -2,7 +2,9 @@ // Licensed under the MIT license. #include "mkl.h" - +#if defined(DISKANN_RELEASE_UNUSED_TCMALLOC_MEMORY_AT_CHECKPOINTS) && defined(DISKANN_BUILD) +#include "gperftools/malloc_extension.h" +#endif #include "pq.h" #include "partition.h" #include "math_utils.h" @@ -133,11 +135,13 @@ void FixedChunkPQTable::load_pq_centroid_bin(const char *pq_table_file, size_t n diskann::cout << "Loaded PQ Pivots: #ctrs: " << NUM_PQ_CENTROIDS << ", #dims: " << this->ndims << ", #chunks: " << this->n_chunks << std::endl; - if (file_exists(rotmat_file)) - { #ifdef EXEC_ENV_OLS + if (files.fileExists(rotmat_file)) + { diskann::load_bin(files, rotmat_file, (float *&)rotmat_tr, nr, nc); #else + if (file_exists(rotmat_file)) + { diskann::load_bin(rotmat_file, rotmat_tr, nr, nc); #endif if (nr != this->ndims || nc != this->ndims) @@ -340,6 +344,65 @@ void pq_dist_lookup(const uint8_t *pq_ids, const size_t n_pts, const size_t pq_n } } +// generate_pq_pivots_simplified is a simplified version of generate_pq_pivots. +// Input is provided in the in-memory buffer train_data. +// Output is stored in the in-memory buffer pivot_data_vector. +// Simplification is based on the following assumptions: +// dim % num_pq_chunks == 0 +// num_centers == 256 by default +// KMEANS_ITERS_FOR_PQ == 15 by default +// make_zero_mean is false by default. 
+// These assumptions allow us to make the function much simpler and avoid storing +// arrays of chunk_offsets and centroids. +// The compiler pragma for multi-threading support is removed from this implementation +// for the purpose of integration into systems that strictly control resource allocation. +int generate_pq_pivots_simplified(const float *train_data, size_t num_train, size_t dim, size_t num_pq_chunks, + std::vector &pivot_data_vector) +{ + if (num_pq_chunks > dim || dim % num_pq_chunks != 0) + { + return -1; + } + + const size_t num_centers = 256; + const size_t cur_chunk_size = dim / num_pq_chunks; + const uint32_t KMEANS_ITERS_FOR_PQ = 15; + + pivot_data_vector.resize(num_centers * dim); + std::vector cur_pivot_data_vector(num_centers * cur_chunk_size); + std::vector cur_data_vector(num_train * cur_chunk_size); + std::vector closest_center_vector(num_train); + + float *pivot_data = &pivot_data_vector[0]; + float *cur_pivot_data = &cur_pivot_data_vector[0]; + float *cur_data = &cur_data_vector[0]; + uint32_t *closest_center = &closest_center_vector[0]; + + for (size_t i = 0; i < num_pq_chunks; i++) + { + size_t chunk_offset = cur_chunk_size * i; + + for (int32_t j = 0; j < num_train; j++) + { + std::memcpy(cur_data + j * cur_chunk_size, train_data + j * dim + chunk_offset, + cur_chunk_size * sizeof(float)); + } + + kmeans::kmeanspp_selecting_pivots(cur_data, num_train, cur_chunk_size, cur_pivot_data, num_centers); + + kmeans::run_lloyds(cur_data, num_train, cur_chunk_size, cur_pivot_data, num_centers, KMEANS_ITERS_FOR_PQ, NULL, + closest_center); + + for (uint64_t j = 0; j < num_centers; j++) + { + std::memcpy(pivot_data + j * dim + chunk_offset, cur_pivot_data + j * cur_chunk_size, + cur_chunk_size * sizeof(float)); + } + } + + return 0; +} + // given training data in train_data of dimensions num_train * dim, generate // PQ pivots using k-means algorithm to partition the co-ordinates into // num_pq_chunks (if it divides dimension, else rounded) chunks,
and runs @@ -708,6 +771,75 @@ int generate_opq_pivots(const float *passed_train_data, size_t num_train, uint32 return 0; } +// generate_pq_data_from_pivots_simplified is a simplified version of generate_pq_data_from_pivots. +// Input is provided in the in-memory buffers data and pivot_data. +// Output is stored in the in-memory buffer pq. +// Simplification is based on the following assumptions: +// supporting only float data type +// dim % num_pq_chunks == 0, which results in a fixed chunk_size +// num_centers == 256 by default +// make_zero_mean is false by default. +// These assumptions allow us to make the function much simpler and avoid using +// arrays of chunk_offsets and centroids. +// The compiler pragma for multi-threading support is removed from this implementation +// for the purpose of integration into systems that strictly control resource allocation. +int generate_pq_data_from_pivots_simplified(const float *data, const size_t num, const float *pivot_data, + const size_t pivots_num, const size_t dim, const size_t num_pq_chunks, + std::vector &pq) +{ + if (num_pq_chunks == 0 || num_pq_chunks > dim || dim % num_pq_chunks != 0) + { + return -1; + } + + const size_t num_centers = 256; + const size_t chunk_size = dim / num_pq_chunks; + + if (pivots_num != num_centers * dim) + { + return -1; + } + + pq.resize(num * num_pq_chunks); + + std::vector cur_pivot_vector(num_centers * chunk_size); + std::vector cur_data_vector(num * chunk_size); + std::vector closest_center_vector(num); + + float *cur_pivot_data = &cur_pivot_vector[0]; + float *cur_data = &cur_data_vector[0]; + uint32_t *closest_center = &closest_center_vector[0]; + + for (size_t i = 0; i < num_pq_chunks; i++) + { + const size_t chunk_offset = chunk_size * i; + + for (int j = 0; j < num_centers; j++) + { + std::memcpy(cur_pivot_data + j * chunk_size, pivot_data + j * dim + chunk_offset, + chunk_size * sizeof(float)); + } + + for (int j = 0; j < num; j++) + { + for (size_t k = 0; k < chunk_size; k++) + 
{ + cur_data[j * chunk_size + k] = data[j * dim + chunk_offset + k]; + } + } + + math_utils::compute_closest_centers(cur_data, num, chunk_size, cur_pivot_data, num_centers, 1, closest_center); + + for (int j = 0; j < num; j++) + { + assert(closest_center[j] < num_centers); + pq[j * num_pq_chunks + i] = closest_center[j]; + } + } + + return 0; +} + // streams the base file (data_file), and computes the closest centers in each // chunk to generate the compressed data_file and stores it in // pq_compressed_vectors_path. @@ -921,7 +1053,7 @@ int generate_pq_data_from_pivots(const std::string &data_file, uint32_t num_cent } // Gopal. Splitting diskann_dll into separate DLLs for search and build. // This code should only be available in the "build" DLL. -#if defined(RELEASE_UNUSED_TCMALLOC_MEMORY_AT_CHECKPOINTS) && defined(DISKANN_BUILD) +#if defined(DISKANN_RELEASE_UNUSED_TCMALLOC_MEMORY_AT_CHECKPOINTS) && defined(DISKANN_BUILD) MallocExtension::instance()->ReleaseFreeMemory(); #endif compressed_file_writer.close(); diff --git a/src/pq_data_store.cpp b/src/pq_data_store.cpp new file mode 100644 index 000000000..2136c71e2 --- /dev/null +++ b/src/pq_data_store.cpp @@ -0,0 +1,260 @@ +#include + +#include "pq_data_store.h" +#include "pq.h" +#include "pq_scratch.h" +#include "utils.h" +#include "distance.h" + +namespace diskann +{ + +// REFACTOR TODO: Assuming that num_pq_chunks is known already. Must verify if +// this is true. 
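The encoding step in `generate_pq_data_from_pivots_simplified` reduces, per chunk, to a nearest-pivot search under squared L2, delegated to `math_utils::compute_closest_centers` in the patch. A minimal standalone sketch of that assignment for a single chunk — the brute-force scan and the helper name `closest_center` are illustrative, not DiskANN's actual implementation:

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

// Assign one point's chunk sub-vector to its nearest pivot (squared L2).
// `pivots` holds num_centers rows of chunk_size floats for this chunk.
uint8_t closest_center(const float *vec, const float *pivots, size_t num_centers, size_t chunk_size)
{
    size_t best = 0;
    float best_dist = std::numeric_limits<float>::max();
    for (size_t c = 0; c < num_centers; c++)
    {
        float d = 0.0f;
        for (size_t k = 0; k < chunk_size; k++)
        {
            const float diff = vec[k] - pivots[c * chunk_size + k];
            d += diff * diff;
        }
        if (d < best_dist)
        {
            best_dist = d;
            best = c;
        }
    }
    return (uint8_t)best; // valid because num_centers is capped at 256
}
```

With 256 centers the resulting index fits in one byte per chunk, which is exactly why `pq` is a `uint8_t` buffer of `num * num_pq_chunks` entries.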
+template +PQDataStore::PQDataStore(size_t dim, location_t num_points, size_t num_pq_chunks, + std::unique_ptr> distance_fn, + std::unique_ptr> pq_distance_fn) + : AbstractDataStore(num_points, dim), _quantized_data(nullptr), _num_chunks(num_pq_chunks), + _distance_metric(distance_fn->get_metric()) +{ + if (num_pq_chunks > dim) + { + throw diskann::ANNException("ERROR: num_pq_chunks > dim", -1, __FUNCSIG__, __FILE__, __LINE__); + } + _distance_fn = std::move(distance_fn); + _pq_distance_fn = std::move(pq_distance_fn); +} + +template PQDataStore::~PQDataStore() +{ + if (_quantized_data != nullptr) + { + aligned_free(_quantized_data); + _quantized_data = nullptr; + } +} + +template location_t PQDataStore::load(const std::string &filename) +{ + return load_impl(filename); +} +template size_t PQDataStore::save(const std::string &filename, const location_t num_points) +{ + return diskann::save_bin(filename, _quantized_data, this->capacity(), _num_chunks, 0); +} + +template size_t PQDataStore::get_aligned_dim() const +{ + return this->get_dims(); +} + +// Populate quantized data from regular data. 
+template void PQDataStore::populate_data(const data_t *vectors, const location_t num_pts) +{ + throw std::logic_error("Not implemented yet"); +} + +template void PQDataStore::populate_data(const std::string &filename, const size_t offset) +{ + if (_quantized_data != nullptr) + { + aligned_free(_quantized_data); + } + + uint64_t file_num_points = 0, file_dim = 0; + get_bin_metadata(filename, file_num_points, file_dim, offset); + this->_capacity = (location_t)file_num_points; + this->_dim = file_dim; + + double p_val = std::min(1.0, ((double)MAX_PQ_TRAINING_SET_SIZE / (double)file_num_points)); + + auto pivots_file = _pq_distance_fn->get_pivot_data_filename(filename); + auto compressed_file = _pq_distance_fn->get_quantized_vectors_filename(filename); + + generate_quantized_data(filename, pivots_file, compressed_file, _distance_metric, p_val, _num_chunks, + _pq_distance_fn->is_opq()); + + // REFACTOR TODO: Not sure of the alignment. Just copying from index.cpp + alloc_aligned(((void **)&_quantized_data), file_num_points * _num_chunks * sizeof(uint8_t), 1); + copy_aligned_data_from_file(compressed_file.c_str(), _quantized_data, file_num_points, _num_chunks, + _num_chunks); +#ifdef EXEC_ENV_OLS + throw ANNException("load_pq_centroid_bin should not be called when " + "EXEC_ENV_OLS is defined.", + -1, __FUNCSIG__, __FILE__, __LINE__); +#else + _pq_distance_fn->load_pivot_data(pivots_file.c_str(), _num_chunks); +#endif +} + +template +void PQDataStore::extract_data_to_bin(const std::string &filename, const location_t num_pts) +{ + throw std::logic_error("Not implemented yet"); +} + +template void PQDataStore::get_vector(const location_t i, data_t *target) const +{ + // REFACTOR TODO: Should we inflate the compressed vector here? 
+ if (i < this->capacity()) + { + throw std::logic_error("Not implemented yet."); + } + else + { + std::stringstream ss; + ss << "Requested vector " << i << " but only " << this->capacity() << " vectors are present"; + throw diskann::ANNException(ss.str(), -1); + } +} +template void PQDataStore::set_vector(const location_t i, const data_t *const vector) +{ + // REFACTOR TODO: Should we accept a normal vector and compress here? + // memcpy (_data + i * _num_chunks, vector, _num_chunks * sizeof(data_t)); + throw std::logic_error("Not implemented yet"); +} + +template void PQDataStore::prefetch_vector(const location_t loc) const +{ + const uint8_t *ptr = _quantized_data + ((size_t)loc) * _num_chunks * sizeof(data_t); + diskann::prefetch_vector((const char *)ptr, _num_chunks * sizeof(data_t)); +} + +template +void PQDataStore::move_vectors(const location_t old_location_start, const location_t new_location_start, + const location_t num_points) +{ + // REFACTOR TODO: Moving vectors is only for in-mem fresh. + throw std::logic_error("Not implemented yet"); +} + +template +void PQDataStore::copy_vectors(const location_t from_loc, const location_t to_loc, const location_t num_points) +{ + // REFACTOR TODO: Is the number of bytes correct? + memcpy(_quantized_data + to_loc * _num_chunks, _quantized_data + from_loc * _num_chunks, _num_chunks * num_points); +} + +// REFACTOR TODO: Currently, we take aligned_query as parameter, but this +// function should also do the alignment. 
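`preprocess_query` plus the batched `get_distance` overloads below follow the standard asymmetric-PQ pattern: the query is expanded once into a per-chunk lookup table of distances to every pivot, after which scoring one code costs only `num_chunks` table reads (the work `aggregate_coords` and `preprocessed_distance` split between them). A self-contained sketch with illustrative names, assuming the patch's row-major pivot layout of `num_centers x dim`:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Precompute, per chunk, the squared L2 distance from the query sub-vector
// to every pivot. Result is num_chunks x num_centers, row-major.
std::vector<float> build_pq_table(const float *query, const float *pivots, size_t num_chunks,
                                  size_t chunk_size, size_t num_centers)
{
    const size_t dim = num_chunks * chunk_size;
    std::vector<float> table(num_chunks * num_centers, 0.0f);
    for (size_t ch = 0; ch < num_chunks; ch++)
        for (size_t c = 0; c < num_centers; c++)
            for (size_t k = 0; k < chunk_size; k++)
            {
                const float diff = query[ch * chunk_size + k] - pivots[c * dim + ch * chunk_size + k];
                table[ch * num_centers + c] += diff * diff;
            }
    return table;
}

// Score one PQ code (one uint8_t per chunk) with a table lookup per chunk.
float pq_distance(const uint8_t *code, const std::vector<float> &table, size_t num_chunks,
                  size_t num_centers)
{
    float dist = 0.0f;
    for (size_t ch = 0; ch < num_chunks; ch++)
        dist += table[ch * num_centers + code[ch]];
    return dist;
}
```

The table build is O(dim * num_centers) per query, amortized over the many candidate codes scored during a search.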
+template +void PQDataStore::preprocess_query(const data_t *aligned_query, AbstractScratch *scratch) const +{ + if (scratch == nullptr) + { + throw diskann::ANNException("Scratch space is null", -1); + } + + PQScratch *pq_scratch = scratch->pq_scratch(); + + if (pq_scratch == nullptr) + { + throw diskann::ANNException("PQScratch space has not been set in the scratch object.", -1); + } + + _pq_distance_fn->preprocess_query(aligned_query, (location_t)this->get_dims(), *pq_scratch); +} + +template float PQDataStore::get_distance(const data_t *query, const location_t loc) const +{ + throw std::logic_error("Not implemented yet"); +} + +template float PQDataStore::get_distance(const location_t loc1, const location_t loc2) const +{ + throw std::logic_error("Not implemented yet"); +} + +template +void PQDataStore::get_distance(const data_t *preprocessed_query, const location_t *locations, + const uint32_t location_count, float *distances, + AbstractScratch *scratch_space) const +{ + if (scratch_space == nullptr) + { + throw diskann::ANNException("Scratch space is null", -1); + } + PQScratch *pq_scratch = scratch_space->pq_scratch(); + if (pq_scratch == nullptr) + { + throw diskann::ANNException("PQScratch not set in scratch space.", -1); + } + diskann::aggregate_coords(locations, location_count, _quantized_data, this->_num_chunks, + pq_scratch->aligned_pq_coord_scratch); + _pq_distance_fn->preprocessed_distance(*pq_scratch, location_count, distances); +} + +template +void PQDataStore::get_distance(const data_t *preprocessed_query, const std::vector &ids, + std::vector &distances, AbstractScratch *scratch_space) const +{ + if (scratch_space == nullptr) + { + throw diskann::ANNException("Scratch space is null", -1); + } + PQScratch *pq_scratch = scratch_space->pq_scratch(); + if (pq_scratch == nullptr) + { + throw diskann::ANNException("PQScratch not set in scratch space.", -1); + } + diskann::aggregate_coords(ids, _quantized_data, this->_num_chunks, 
pq_scratch->aligned_pq_coord_scratch); + _pq_distance_fn->preprocessed_distance(*pq_scratch, (location_t)ids.size(), distances); +} + +template location_t PQDataStore::calculate_medoid() const +{ + // REFACTOR TODO: Must calculate this just like we do with data store. + size_t r = (size_t)rand() * (size_t)RAND_MAX + (size_t)rand(); + return (uint32_t)(r % (size_t)this->capacity()); +} + +template size_t PQDataStore::get_alignment_factor() const +{ + return 1; +} + +template Distance *PQDataStore::get_dist_fn() const +{ + return _distance_fn.get(); +} + +template location_t PQDataStore::load_impl(const std::string &file_prefix) +{ + if (_quantized_data != nullptr) + { + aligned_free(_quantized_data); + } + auto quantized_vectors_file = _pq_distance_fn->get_quantized_vectors_filename(file_prefix); + + size_t num_points; + load_aligned_bin(quantized_vectors_file, _quantized_data, num_points, _num_chunks, _num_chunks); + this->_capacity = (location_t)num_points; + + auto pivots_file = _pq_distance_fn->get_pivot_data_filename(file_prefix); + _pq_distance_fn->load_pivot_data(pivots_file, _num_chunks); + + return this->_capacity; +} + +template location_t PQDataStore::expand(const location_t new_size) +{ + throw std::logic_error("Not implemented yet"); +} + +template location_t PQDataStore::shrink(const location_t new_size) +{ + throw std::logic_error("Not implemented yet"); +} + +#ifdef EXEC_ENV_OLS +template location_t PQDataStore::load_impl(AlignedFileReader &reader) +{ +} +#endif + +template DISKANN_DLLEXPORT class PQDataStore; +template DISKANN_DLLEXPORT class PQDataStore; +template DISKANN_DLLEXPORT class PQDataStore; + +} // namespace diskann \ No newline at end of file diff --git a/src/pq_flash_index.cpp b/src/pq_flash_index.cpp index 592f2d8b3..7e2bda4e2 100644 --- a/src/pq_flash_index.cpp +++ b/src/pq_flash_index.cpp @@ -4,6 +4,8 @@ #include "common_includes.h" #include "timer.h" +#include "pq.h" +#include "pq_scratch.h" #include "pq_flash_index.h" #include 
"cosine_similarity.h" #include @@ -18,40 +20,29 @@ #define READ_U32(stream, val) stream.read((char *)&val, sizeof(uint32_t)) #define READ_UNSIGNED(stream, val) stream.read((char *)&val, sizeof(unsigned)) -// sector # on disk where node_id is present with in the graph part -#define NODE_SECTOR_NO(node_id) (((uint64_t)(node_id)) / nnodes_per_sector + 1) - -// obtains region of sector containing node -#define OFFSET_TO_NODE(sector_buf, node_id) \ - ((char *)sector_buf + (((uint64_t)node_id) % nnodes_per_sector) * max_node_len) - -// returns region of `node_buf` containing [NNBRS][NBR_ID(uint32_t)] -#define OFFSET_TO_NODE_NHOOD(node_buf) (unsigned *)((char *)node_buf + disk_bytes_per_point) - -// returns region of `node_buf` containing [COORD(T)] -#define OFFSET_TO_NODE_COORDS(node_buf) (T *)(node_buf) - // sector # beyond the end of graph where data for id is present for reordering -#define VECTOR_SECTOR_NO(id) (((uint64_t)(id)) / nvecs_per_sector + reorder_data_start_sector) +#define VECTOR_SECTOR_NO(id) (((uint64_t)(id)) / _nvecs_per_sector + _reorder_data_start_sector) // sector # beyond the end of graph where data for id is present for reordering -#define VECTOR_SECTOR_OFFSET(id) ((((uint64_t)(id)) % nvecs_per_sector) * data_dim * sizeof(float)) +#define VECTOR_SECTOR_OFFSET(id) ((((uint64_t)(id)) % _nvecs_per_sector) * _data_dim * sizeof(float)) namespace diskann { template PQFlashIndex::PQFlashIndex(std::shared_ptr &fileReader, diskann::Metric m) - : reader(fileReader), metric(m), thread_data(nullptr) + : reader(fileReader), metric(m), _thread_data(nullptr) { + diskann::Metric metric_to_invoke = m; if (m == diskann::Metric::COSINE || m == diskann::Metric::INNER_PRODUCT) { if (std::is_floating_point::value) { - diskann::cout << "Cosine metric chosen for (normalized) float data." - "Changing distance to L2 to boost accuracy." 
+ diskann::cout << "Since data is floating point, we assume that it has been appropriately pre-processed " + "(normalization for cosine, and convert-to-l2 by adding extra dimension for MIPS). So we " + "shall invoke an l2 distance function." << std::endl; - metric = diskann::Metric::L2; + metric_to_invoke = diskann::Metric::L2; } else { @@ -61,8 +52,8 @@ PQFlashIndex::PQFlashIndex(std::shared_ptr &fileRe } } - this->dist_cmp.reset(diskann::get_distance_function(metric)); - this->dist_cmp_float.reset(diskann::get_distance_function(metric)); + this->_dist_cmp.reset(diskann::get_distance_function(metric_to_invoke)); + this->_dist_cmp_float.reset(diskann::get_distance_function(metric_to_invoke)); } template PQFlashIndex::~PQFlashIndex() @@ -74,19 +65,19 @@ template PQFlashIndex::~PQFlashIndex() } #endif - if (centroid_data != nullptr) - aligned_free(centroid_data); + if (_centroid_data != nullptr) + aligned_free(_centroid_data); // delete backing bufs for nhood and coord cache - if (nhood_cache_buf != nullptr) + if (_nhood_cache_buf != nullptr) { - delete[] nhood_cache_buf; - diskann::aligned_free(coord_cache_buf); + delete[] _nhood_cache_buf; + diskann::aligned_free(_coord_cache_buf); } - if (load_flag) + if (_load_flag) { diskann::cout << "Clearing scratch" << std::endl; - ScratchStoreManager> manager(this->thread_data); + ScratchStoreManager> manager(this->_thread_data); manager.destroy(); this->reader->deregister_all_threads(); reader->close(); @@ -95,11 +86,40 @@ template PQFlashIndex::~PQFlashIndex() { delete[] _pts_to_label_offsets; } - + if (_pts_to_label_counts != nullptr) + { + delete[] _pts_to_label_counts; + } if (_pts_to_labels != nullptr) { delete[] _pts_to_labels; } + if (_medoids != nullptr) + { + delete[] _medoids; + } +} + +template inline uint64_t PQFlashIndex::get_node_sector(uint64_t node_id) +{ + return 1 + (_nnodes_per_sector > 0 ? 
node_id / _nnodes_per_sector + : node_id * DIV_ROUND_UP(_max_node_len, defaults::SECTOR_LEN)); +} + +template +inline char *PQFlashIndex::offset_to_node(char *sector_buf, uint64_t node_id) +{ + return sector_buf + (_nnodes_per_sector == 0 ? 0 : (node_id % _nnodes_per_sector) * _max_node_len); +} + +template inline uint32_t *PQFlashIndex::offset_to_node_nhood(char *node_buf) +{ + return (unsigned *)(node_buf + _disk_bytes_per_point); +} + +template inline T *PQFlashIndex::offset_to_node_coords(char *node_buf) +{ + return (T *)(node_buf); } template @@ -112,13 +132,77 @@ void PQFlashIndex::setup_thread_data(uint64_t nthreads, uint64_t visi { #pragma omp critical { - SSDThreadData *data = new SSDThreadData(this->aligned_dim, visited_reserve); + SSDThreadData *data = new SSDThreadData(this->_aligned_dim, visited_reserve); this->reader->register_thread(); data->ctx = this->reader->get_ctx(); - this->thread_data.push(data); + this->_thread_data.push(data); } } - load_flag = true; + _load_flag = true; +} + +template +std::vector PQFlashIndex::read_nodes(const std::vector &node_ids, + std::vector &coord_buffers, + std::vector> &nbr_buffers) +{ + std::vector read_reqs; + std::vector retval(node_ids.size(), true); + + char *buf = nullptr; + auto num_sectors = _nnodes_per_sector > 0 ? 
1 : DIV_ROUND_UP(_max_node_len, defaults::SECTOR_LEN); + alloc_aligned((void **)&buf, node_ids.size() * num_sectors * defaults::SECTOR_LEN, defaults::SECTOR_LEN); + + // create read requests + for (size_t i = 0; i < node_ids.size(); ++i) + { + auto node_id = node_ids[i]; + + AlignedRead read; + read.len = num_sectors * defaults::SECTOR_LEN; + read.buf = buf + i * num_sectors * defaults::SECTOR_LEN; + read.offset = get_node_sector(node_id) * defaults::SECTOR_LEN; + read_reqs.push_back(read); + } + + // borrow thread data and issue reads + ScratchStoreManager> manager(this->_thread_data); + auto this_thread_data = manager.scratch_space(); + IOContext &ctx = this_thread_data->ctx; + reader->read(read_reqs, ctx); + + // copy reads into buffers + for (uint32_t i = 0; i < read_reqs.size(); i++) + { +#if defined(_WINDOWS) && defined(USE_BING_INFRA) // this block is to handle failed reads in + // production settings + if ((*ctx.m_pRequestsStatus)[i] != IOContext::READ_SUCCESS) + { + retval[i] = false; + continue; + } +#endif + + char *node_buf = offset_to_node((char *)read_reqs[i].buf, node_ids[i]); + + if (coord_buffers[i] != nullptr) + { + T *node_coords = offset_to_node_coords(node_buf); + memcpy(coord_buffers[i], node_coords, _disk_bytes_per_point); + } + + if (nbr_buffers[i].second != nullptr) + { + uint32_t *node_nhood = offset_to_node_nhood(node_buf); + auto num_nbrs = *node_nhood; + nbr_buffers[i].first = num_nbrs; + memcpy(nbr_buffers[i].second, node_nhood + 1, num_nbrs * sizeof(uint32_t)); + } + } + + aligned_free(buf); + + return retval; } template void PQFlashIndex::load_cache_list(std::vector &node_list) @@ -127,69 +211,48 @@ template void PQFlashIndex::load_cache_ size_t num_cached_nodes = node_list.size(); // borrow thread data - ScratchStoreManager> manager(this->thread_data); - auto this_thread_data = manager.scratch_space(); - IOContext &ctx = this_thread_data->ctx; + //ScratchStoreManager> manager(this->_thread_data); + //auto this_thread_data = 
manager.scratch_space(); + //IOContext &ctx = this_thread_data->ctx; - nhood_cache_buf = new uint32_t[num_cached_nodes * (max_degree + 1)]; - memset(nhood_cache_buf, 0, num_cached_nodes * (max_degree + 1)); + // Allocate space for neighborhood cache + _nhood_cache_buf = new uint32_t[num_cached_nodes * (_max_degree + 1)]; + memset(_nhood_cache_buf, 0, num_cached_nodes * (_max_degree + 1)); - size_t coord_cache_buf_len = num_cached_nodes * aligned_dim; - diskann::alloc_aligned((void **)&coord_cache_buf, coord_cache_buf_len * sizeof(T), 8 * sizeof(T)); - memset(coord_cache_buf, 0, coord_cache_buf_len * sizeof(T)); + // Allocate space for coordinate cache + size_t coord_cache_buf_len = num_cached_nodes * _aligned_dim; + diskann::alloc_aligned((void **)&_coord_cache_buf, coord_cache_buf_len * sizeof(T), 8 * sizeof(T)); + memset(_coord_cache_buf, 0, coord_cache_buf_len * sizeof(T)); size_t BLOCK_SIZE = 8; size_t num_blocks = DIV_ROUND_UP(num_cached_nodes, BLOCK_SIZE); - for (size_t block = 0; block < num_blocks; block++) { size_t start_idx = block * BLOCK_SIZE; size_t end_idx = (std::min)(num_cached_nodes, (block + 1) * BLOCK_SIZE); - std::vector read_reqs; - std::vector> nhoods; + + // Copy offset into buffers to read into + std::vector nodes_to_read; + std::vector coord_buffers; + std::vector> nbr_buffers; for (size_t node_idx = start_idx; node_idx < end_idx; node_idx++) { - AlignedRead read; - char *buf = nullptr; - alloc_aligned((void **)&buf, SECTOR_LEN, SECTOR_LEN); - nhoods.push_back(std::make_pair(node_list[node_idx], buf)); - read.len = SECTOR_LEN; - read.buf = buf; - read.offset = NODE_SECTOR_NO(node_list[node_idx]) * SECTOR_LEN; - read_reqs.push_back(read); + nodes_to_read.push_back(node_list[node_idx]); + coord_buffers.push_back(_coord_cache_buf + node_idx * _aligned_dim); + nbr_buffers.emplace_back(0, _nhood_cache_buf + node_idx * (_max_degree + 1)); } - reader->read(read_reqs, ctx); + // issue the reads + auto read_status = read_nodes(nodes_to_read, 
coord_buffers, nbr_buffers); - size_t node_idx = start_idx; - for (uint32_t i = 0; i < read_reqs.size(); i++) + // check for success and insert into the cache. + for (size_t i = 0; i < read_status.size(); i++) { -#if defined(_WINDOWS) && defined(USE_BING_INFRA) // this block is to handle failed reads in - // production settings - if ((*ctx.m_pRequestsStatus)[i] != IOContext::READ_SUCCESS) + if (read_status[i] == true) { - continue; + _coord_cache.insert(std::make_pair(nodes_to_read[i], coord_buffers[i])); + _nhood_cache.insert(std::make_pair(nodes_to_read[i], nbr_buffers[i])); } -#endif - auto &nhood = nhoods[i]; - char *node_buf = OFFSET_TO_NODE(nhood.second, nhood.first); - T *node_coords = OFFSET_TO_NODE_COORDS(node_buf); - T *cached_coords = coord_cache_buf + node_idx * aligned_dim; - memcpy(cached_coords, node_coords, disk_bytes_per_point); - coord_cache.insert(std::make_pair(nhood.first, cached_coords)); - - // insert node nhood into nhood_cache - uint32_t *node_nhood = OFFSET_TO_NODE_NHOOD(node_buf); - - auto nnbrs = *node_nhood; - uint32_t *nbrs = node_nhood + 1; - std::pair cnhood; - cnhood.first = nnbrs; - cnhood.second = nhood_cache_buf + node_idx * (max_degree + 1); - memcpy(cnhood.second, nbrs, nnbrs * sizeof(uint32_t)); - nhood_cache.insert(std::make_pair(nhood.first, cnhood)); - aligned_free(nhood.second); - node_idx++; } } diskann::cout << "..done." 
<< std::endl; @@ -210,24 +273,24 @@ void PQFlashIndex::generate_cache_list_from_sample_queries(std::strin std::vector &node_list) { #endif - if (num_nodes_to_cache >= this->num_points) + if (num_nodes_to_cache >= this->_num_points) { // for small num_points and big num_nodes_to_cache, use below way to get the node_list quickly - node_list.resize(this->num_points); - for (uint32_t i = 0; i < this->num_points; ++i) + node_list.resize(this->_num_points); + for (uint32_t i = 0; i < this->_num_points; ++i) { node_list[i] = i; } return; } - this->count_visited_nodes = true; - this->node_visit_counter.clear(); - this->node_visit_counter.resize(this->num_points); - for (uint32_t i = 0; i < node_visit_counter.size(); i++) + this->_count_visited_nodes = true; + this->_node_visit_counter.clear(); + this->_node_visit_counter.resize(this->_num_points); + for (uint32_t i = 0; i < _node_visit_counter.size(); i++) { - this->node_visit_counter[i].first = i; - this->node_visit_counter[i].second = 0; + this->_node_visit_counter[i].first = i; + this->_node_visit_counter[i].second = 0; } uint64_t sample_num, sample_dim, sample_aligned_dim; @@ -272,19 +335,19 @@ void PQFlashIndex::generate_cache_list_from_sample_queries(std::strin tmp_result_dists.data() + i, beamwidth, filtered_search, label_for_search, false); } - std::sort(this->node_visit_counter.begin(), node_visit_counter.end(), + std::sort(this->_node_visit_counter.begin(), _node_visit_counter.end(), [](std::pair &left, std::pair &right) { return left.second > right.second; }); node_list.clear(); node_list.shrink_to_fit(); - num_nodes_to_cache = std::min(num_nodes_to_cache, this->node_visit_counter.size()); + num_nodes_to_cache = std::min(num_nodes_to_cache, this->_node_visit_counter.size()); node_list.reserve(num_nodes_to_cache); for (uint64_t i = 0; i < num_nodes_to_cache; i++) { - node_list.push_back(this->node_visit_counter[i].first); + node_list.push_back(this->_node_visit_counter[i].first); } - this->count_visited_nodes = 
false; + this->_count_visited_nodes = false; diskann::aligned_free(samples); } @@ -299,27 +362,27 @@ void PQFlashIndex::cache_bfs_levels(uint64_t num_nodes_to_cache, std: tsl::robin_set node_set; // Do not cache more than 10% of the nodes in the index - uint64_t tenp_nodes = (uint64_t)(std::round(this->num_points * 0.1)); + uint64_t tenp_nodes = (uint64_t)(std::round(this->_num_points * 0.1)); if (num_nodes_to_cache > tenp_nodes) { diskann::cout << "Reducing nodes to cache from: " << num_nodes_to_cache << " to: " << tenp_nodes - << "(10 percent of total nodes:" << this->num_points << ")" << std::endl; + << "(10 percent of total nodes:" << this->_num_points << ")" << std::endl; num_nodes_to_cache = tenp_nodes == 0 ? 1 : tenp_nodes; } diskann::cout << "Caching " << num_nodes_to_cache << "..." << std::endl; // borrow thread data - ScratchStoreManager> manager(this->thread_data); - auto this_thread_data = manager.scratch_space(); - IOContext &ctx = this_thread_data->ctx; + //ScratchStoreManager> manager(this->_thread_data); + //auto this_thread_data = manager.scratch_space(); + //IOContext &ctx = this_thread_data->ctx; std::unique_ptr> cur_level, prev_level; cur_level = std::make_unique>(); prev_level = std::make_unique>(); - for (uint64_t miter = 0; miter < num_medoids && cur_level->size() < num_nodes_to_cache; miter++) + for (uint64_t miter = 0; miter < _num_medoids && cur_level->size() < num_nodes_to_cache; miter++) { - cur_level->insert(medoids[miter]); + cur_level->insert(_medoids[miter]); } if ((_filter_to_medoid_ids.size() > 0) && (cur_level->size() < num_nodes_to_cache)) @@ -373,53 +436,46 @@ void PQFlashIndex::cache_bfs_levels(uint64_t num_nodes_to_cache, std: diskann::cout << "." 
<< std::flush; size_t start = block * BLOCK_SIZE; size_t end = (std::min)((block + 1) * BLOCK_SIZE, nodes_to_expand.size()); - std::vector read_reqs; - std::vector> nhoods; + + std::vector nodes_to_read; + std::vector coord_buffers(end - start, nullptr); + std::vector> nbr_buffers; + for (size_t cur_pt = start; cur_pt < end; cur_pt++) { - char *buf = nullptr; - alloc_aligned((void **)&buf, SECTOR_LEN, SECTOR_LEN); - nhoods.emplace_back(nodes_to_expand[cur_pt], buf); - AlignedRead read; - read.len = SECTOR_LEN; - read.buf = buf; - read.offset = NODE_SECTOR_NO(nodes_to_expand[cur_pt]) * SECTOR_LEN; - read_reqs.push_back(read); + nodes_to_read.push_back(nodes_to_expand[cur_pt]); + nbr_buffers.emplace_back(0, new uint32_t[_max_degree + 1]); } // issue read requests - reader->read(read_reqs, ctx); + auto read_status = read_nodes(nodes_to_read, coord_buffers, nbr_buffers); // process each nhood buf - for (uint32_t i = 0; i < read_reqs.size(); i++) + for (uint32_t i = 0; i < read_status.size(); i++) { -#if defined(_WINDOWS) && defined(USE_BING_INFRA) // this block is to handle read failures in - // production settings - if ((*ctx.m_pRequestsStatus)[i] != IOContext::READ_SUCCESS) + if (read_status[i] == false) { continue; } -#endif - auto &nhood = nhoods[i]; - - // insert node coord into coord_cache - char *node_buf = OFFSET_TO_NODE(nhood.second, nhood.first); - uint32_t *node_nhood = OFFSET_TO_NODE_NHOOD(node_buf); - uint64_t nnbrs = (uint64_t)*node_nhood; - uint32_t *nbrs = node_nhood + 1; - // explore next level - for (uint64_t j = 0; j < nnbrs && !finish_flag; j++) + else { - if (node_set.find(nbrs[j]) == node_set.end()) - { - cur_level->insert(nbrs[j]); - } - if (cur_level->size() + node_set.size() >= num_nodes_to_cache) + uint32_t nnbrs = nbr_buffers[i].first; + uint32_t *nbrs = nbr_buffers[i].second; + + // explore next level + for (uint32_t j = 0; j < nnbrs && !finish_flag; j++) { - finish_flag = true; + if (node_set.find(nbrs[j]) == node_set.end()) + { + 
cur_level->insert(nbrs[j]); + } + if (cur_level->size() + node_set.size() >= num_nodes_to_cache) + { + finish_flag = true; + } } } - aligned_free(nhood.second); + delete[] nbr_buffers[i].second; } } @@ -446,64 +502,50 @@ void PQFlashIndex::cache_bfs_levels(uint64_t num_nodes_to_cache, std: template void PQFlashIndex::use_medoids_data_as_centroids() { - if (centroid_data != nullptr) - aligned_free(centroid_data); - alloc_aligned(((void **)¢roid_data), num_medoids * aligned_dim * sizeof(float), 32); - std::memset(centroid_data, 0, num_medoids * aligned_dim * sizeof(float)); + if (_centroid_data != nullptr) + aligned_free(_centroid_data); + alloc_aligned(((void **)&_centroid_data), _num_medoids * _aligned_dim * sizeof(float), 32); + std::memset(_centroid_data, 0, _num_medoids * _aligned_dim * sizeof(float)); // borrow ctx - ScratchStoreManager> manager(this->thread_data); - auto data = manager.scratch_space(); - IOContext &ctx = data->ctx; - diskann::cout << "Loading centroid data from medoids vector data of " << num_medoids << " medoid(s)" << std::endl; - for (uint64_t cur_m = 0; cur_m < num_medoids; cur_m++) - { - auto medoid = medoids[cur_m]; - // read medoid nhood - char *medoid_buf = nullptr; - alloc_aligned((void **)&medoid_buf, SECTOR_LEN, SECTOR_LEN); - std::vector medoid_read(1); - medoid_read[0].len = SECTOR_LEN; - medoid_read[0].buf = medoid_buf; - medoid_read[0].offset = NODE_SECTOR_NO(medoid) * SECTOR_LEN; - reader->read(medoid_read, ctx); - - // all data about medoid - char *medoid_node_buf = OFFSET_TO_NODE(medoid_buf, medoid); - - // add medoid coords to `coord_cache` - T *medoid_coords = new T[data_dim]; - T *medoid_disk_coords = OFFSET_TO_NODE_COORDS(medoid_node_buf); - memcpy(medoid_coords, medoid_disk_coords, disk_bytes_per_point); - - if (!use_disk_index_pq) - { - for (uint32_t i = 0; i < data_dim; i++) - centroid_data[cur_m * aligned_dim + i] = medoid_coords[i]; - } - else - { - disk_pq_table.inflate_vector((uint8_t *)medoid_coords, (centroid_data 
+ cur_m * aligned_dim)); - } + //ScratchStoreManager> manager(this->_thread_data); + //auto data = manager.scratch_space(); + //IOContext &ctx = data->ctx; + diskann::cout << "Loading centroid data from medoids vector data of " << _num_medoids << " medoid(s)" << std::endl; + + std::vector nodes_to_read; + std::vector medoid_bufs; + std::vector> nbr_bufs; - aligned_free(medoid_buf); - delete[] medoid_coords; + for (uint64_t cur_m = 0; cur_m < _num_medoids; cur_m++) + { + nodes_to_read.push_back(_medoids[cur_m]); + medoid_bufs.push_back(new T[_data_dim]); + nbr_bufs.emplace_back(0, nullptr); } -} -template -inline int32_t PQFlashIndex::get_filter_number(const LabelT &filter_label) -{ - int idx = -1; - for (uint32_t i = 0; i < _filter_list.size(); i++) + auto read_status = read_nodes(nodes_to_read, medoid_bufs, nbr_bufs); + + for (uint64_t cur_m = 0; cur_m < _num_medoids; cur_m++) { - if (_filter_list[i] == filter_label) + if (read_status[cur_m] == true) { - idx = i; - break; + if (!_use_disk_index_pq) + { + for (uint32_t i = 0; i < _data_dim; i++) + _centroid_data[cur_m * _aligned_dim + i] = medoid_bufs[cur_m][i]; + } + else + { + _disk_pq_table.inflate_vector((uint8_t *)medoid_bufs[cur_m], (_centroid_data + cur_m * _aligned_dim)); + } } + else + { + throw ANNException("Unable to read a medoid", -1, __FUNCSIG__, __FILE__, __LINE__); + } + delete[] medoid_bufs[cur_m]; } - return idx; } template @@ -514,38 +556,29 @@ void PQFlashIndex::generate_random_labels(std::vector &labels labels.clear(); labels.resize(num_labels); - uint64_t num_total_labels = - _pts_to_label_offsets[num_points - 1] + _pts_to_labels[_pts_to_label_offsets[num_points - 1]]; + uint64_t num_total_labels = _pts_to_label_offsets[_num_points - 1] + _pts_to_label_counts[_num_points - 1]; std::mt19937 gen(rd()); - std::uniform_int_distribution dis(0, num_total_labels); - - tsl::robin_set skip_locs; - for (uint32_t i = 0; i < num_points; i++) + if (num_total_labels == 0) { - 
skip_locs.insert(_pts_to_label_offsets[i]); + std::stringstream stream; + stream << "No labels found in data. Not sampling random labels "; + diskann::cerr << stream.str() << std::endl; + throw diskann::ANNException(stream.str(), -1, __FUNCSIG__, __FILE__, __LINE__); } + std::uniform_int_distribution dis(0, num_total_labels - 1); #pragma omp parallel for schedule(dynamic, 1) num_threads(nthreads) for (int64_t i = 0; i < num_labels; i++) { - bool found_flag = false; - while (!found_flag) - { - uint64_t rnd_loc = dis(gen); - if (skip_locs.find(rnd_loc) == skip_locs.end()) - { - found_flag = true; - labels[i] = _filter_list[_pts_to_labels[rnd_loc]]; - } - } + uint64_t rnd_loc = dis(gen); + labels[i] = (LabelT)_pts_to_labels[rnd_loc]; } } template -std::unordered_map PQFlashIndex::load_label_map(const std::string &labels_map_file) +std::unordered_map PQFlashIndex::load_label_map(std::basic_istream &map_reader) { std::unordered_map string_to_int_mp; - std::ifstream map_reader(labels_map_file); std::string line, token; LabelT token_as_num; std::string label_str; @@ -570,7 +603,7 @@ LabelT PQFlashIndex::get_converted_label(const std::string &filter_la } else if (_use_universal_label) { - return static_cast(_universal_filter_num); + return _universal_filter_label; } else { @@ -578,6 +611,13 @@ LabelT PQFlashIndex::get_converted_label(const std::string &filter_la } } +template +void PQFlashIndex::reset_stream_for_reading(std::basic_istream &infile) +{ + infile.clear(); + infile.seekg(0); +} + template bool PQFlashIndex::is_label_valid(const std::string& filter_label) { @@ -589,7 +629,6 @@ bool PQFlashIndex::is_label_valid(const std::string& filter_label) return false; } -// test commit template void PQFlashIndex::get_label_file_metadata(const std::string &fileContent, uint32_t &num_pts, uint32_t &num_total_labels) @@ -635,14 +674,14 @@ void PQFlashIndex::get_label_file_metadata(const std::string &fileCon } template -inline bool PQFlashIndex::point_has_label(uint32_t 
point_id, uint32_t label_id) +inline bool PQFlashIndex::point_has_label(uint32_t point_id, LabelT label_id) { uint32_t start_vec = _pts_to_label_offsets[point_id]; - uint32_t num_lbls = _pts_to_labels[start_vec]; + uint32_t num_lbls = _pts_to_label_counts[point_id]; bool ret_val = false; for (uint32_t i = 0; i < num_lbls; i++) { - if (_pts_to_labels[start_vec + 1 + i] == label_id) + if (_pts_to_labels[start_vec + i] == label_id) { ret_val = true; break; @@ -652,13 +691,8 @@ inline bool PQFlashIndex::point_has_label(uint32_t point_id, uint32_t } template -void PQFlashIndex::parse_label_file(const std::string &label_file, size_t &num_points_labels) +void PQFlashIndex::parse_label_file(std::basic_istream& infile, size_t &num_points_labels) { - std::ifstream infile(label_file, std::ios::binary); - if (infile.fail()) - { - throw diskann::ANNException(std::string("Failed to open file ") + label_file, -1); - } infile.seekg(0, std::ios::end); size_t file_size = infile.tellg(); @@ -666,8 +700,8 @@ void PQFlashIndex::parse_label_file(const std::string &label_file, si infile.seekg(0, std::ios::beg); infile.read(&buffer[0], file_size); - infile.close(); + std::string line; uint32_t line_cnt = 0; uint32_t num_pts_in_label_file; @@ -675,8 +709,9 @@ void PQFlashIndex::parse_label_file(const std::string &label_file, si get_label_file_metadata(buffer, num_pts_in_label_file, num_total_labels); _pts_to_label_offsets = new uint32_t[num_pts_in_label_file]; - _pts_to_labels = new uint32_t[num_pts_in_label_file + num_total_labels]; - uint32_t counter = 0; + _pts_to_label_counts = new uint32_t[num_pts_in_label_file]; + _pts_to_labels = new LabelT[num_total_labels]; + uint32_t labels_seen_so_far = 0; std::string label_str; size_t cur_pos = 0; @@ -689,10 +724,9 @@ void PQFlashIndex::parse_label_file(const std::string &label_file, si break; } - _pts_to_label_offsets[line_cnt] = counter; - uint32_t &num_lbls_in_cur_pt = _pts_to_labels[counter]; + _pts_to_label_offsets[line_cnt] = 
labels_seen_so_far; + uint32_t &num_lbls_in_cur_pt = _pts_to_label_counts[line_cnt]; num_lbls_in_cur_pt = 0; - counter++; size_t lbl_pos = cur_pos; size_t next_lbl_pos = 0; @@ -704,30 +738,26 @@ void PQFlashIndex::parse_label_file(const std::string &label_file, si next_lbl_pos = next_pos; } - if (next_lbl_pos > next_pos) // the last label in one line + if (next_lbl_pos > next_pos) // the last label in one line, just read to the end { next_lbl_pos = next_pos; } label_str.assign(buffer.c_str() + lbl_pos, next_lbl_pos - lbl_pos); - if (label_str[label_str.length() - 1] == '\t') + if (label_str[label_str.length() - 1] == '\t') // '\t' won't exist in label file? { label_str.erase(label_str.length() - 1); } LabelT token_as_num = (LabelT)std::stoul(label_str); - if (_labels.find(token_as_num) == _labels.end()) - { - _filter_list.emplace_back(token_as_num); - } - - _pts_to_labels[counter++] = token_as_num; + _pts_to_labels[labels_seen_so_far++] = (LabelT)token_as_num; num_lbls_in_cur_pt++; - _labels.insert(token_as_num); + // move to next label lbl_pos = next_lbl_pos + 1; } + // move to next line cur_pos = next_pos + 1; if (num_lbls_in_cur_pt == 0) @@ -740,12 +770,13 @@ void PQFlashIndex::parse_label_file(const std::string &label_file, si } num_points_labels = line_cnt; + reset_stream_for_reading(infile); } template void PQFlashIndex::set_universal_label(const LabelT &label) { _use_universal_label = true; - _universal_filter_num = (uint32_t)label; + _universal_filter_label = label; } #ifdef EXEC_ENV_OLS @@ -763,15 +794,14 @@ template int PQFlashIndex::load(uint32_ std::string labels_to_medoids = std::string(index_prefix) + "_labels_to_medoids.txt"; std::string labels_map_file = std::string(index_prefix) + "_labels_map.txt"; std::string univ_label_file = std::string(index_prefix) + "_universal_label.txt"; - #ifdef EXEC_ENV_OLS return load_from_separate_paths(files, num_threads, disk_index_file.c_str(), pq_table_bin.c_str(), - pq_compressed_vectors.c_str(), 
labels_file.c_str(), labels_to_medoids.c_str(), - labels_map_file.c_str(), univ_label_file.c_str()); + pq_compressed_vectors.c_str(), labels_file.c_str(), labels_to_medoids.c_str(), + labels_map_file.c_str(), univ_label_file.c_str()); #else return load_from_separate_paths(num_threads, disk_index_file.c_str(), pq_table_bin.c_str(), - pq_compressed_vectors.c_str(), labels_file.c_str(), labels_to_medoids.c_str(), - labels_map_file.c_str(), univ_label_file.c_str()); + pq_compressed_vectors.c_str(), labels_file.c_str(), labels_to_medoids.c_str(), + labels_map_file.c_str(), univ_label_file.c_str()); #endif } @@ -793,13 +823,13 @@ int PQFlashIndex::load_from_separate_paths(uint32_t num_threads, cons #endif std::string pq_table_bin = pivots_filepath; std::string pq_compressed_vectors = compressed_filepath; - std::string disk_index_file = index_filepath; - std::string medoids_file = std::string(disk_index_file) + "_medoids.bin"; - std::string centroids_file = std::string(disk_index_file) + "_centroids.bin"; + std::string _disk_index_file = index_filepath; + std::string medoids_file = std::string(_disk_index_file) + "_medoids.bin"; + std::string centroids_file = std::string(_disk_index_file) + "_centroids.bin"; std::string labels_file = (labels_filepath == nullptr ? "" : labels_filepath); std::string labels_to_medoids = (labels_to_medoids_filepath == nullptr ? "" : labels_to_medoids_filepath); - std::string dummy_map_file = std ::string(disk_index_file) + "_dummy_map.txt"; + std::string dummy_map_file = std ::string(_disk_index_file) + "_dummy_map.txt"; std::string labels_map_file = (labels_map_filepath == nullptr ? 
"" : labels_map_filepath); size_t num_pts_in_label_file = 0; @@ -810,7 +840,7 @@ int PQFlashIndex::load_from_separate_paths(uint32_t num_threads, cons get_bin_metadata(pq_table_bin, pq_file_num_centroids, pq_file_dim, METADATA_SIZE); #endif - this->disk_index_file = disk_index_file; + this->_disk_index_file = _disk_index_file; if (pq_file_num_centroids != 256) { @@ -818,13 +848,11 @@ int PQFlashIndex::load_from_separate_paths(uint32_t num_threads, cons return -1; } - this->data_dim = pq_file_dim; - // will reset later if we use PQ on disk - this->disk_data_dim = this->data_dim; + this->_data_dim = pq_file_dim; // will change later if we use PQ on disk or if we are using // inner product without PQ - this->disk_bytes_per_point = this->data_dim * sizeof(T); - this->aligned_dim = ROUND_UP(pq_file_dim, 8); + this->_disk_bytes_per_point = this->_data_dim * sizeof(T); + this->_aligned_dim = ROUND_UP(pq_file_dim, 8); size_t npts_u64, nchunks_u64; #ifdef EXEC_ENV_OLS @@ -833,17 +861,53 @@ int PQFlashIndex::load_from_separate_paths(uint32_t num_threads, cons diskann::load_bin(pq_compressed_vectors, this->data, npts_u64, nchunks_u64); #endif - this->num_points = npts_u64; - this->n_chunks = nchunks_u64; + this->_num_points = npts_u64; + this->_n_chunks = nchunks_u64; +#ifdef EXEC_ENV_OLS + if (files.fileExists(labels_file)) + { + FileContent &content_labels = files.getContent(labels_file); + std::stringstream infile(std::string((const char *)content_labels._content, content_labels._size)); +#else if (file_exists(labels_file)) { - parse_label_file(labels_file, num_pts_in_label_file); - assert(num_pts_in_label_file == this->num_points); - _label_map = load_label_map(labels_map_file); + std::ifstream infile(labels_file, std::ios::binary); + if (infile.fail()) + { + throw diskann::ANNException(std::string("Failed to open file ") + labels_file, -1); + } +#endif + parse_label_file(infile, num_pts_in_label_file); + assert(num_pts_in_label_file == this->_num_points); + +#ifndef 
EXEC_ENV_OLS
+    infile.close();
+#endif
+
+#ifdef EXEC_ENV_OLS
+    FileContent &content_labels_map = files.getContent(labels_map_file);
+    std::stringstream map_reader(std::string((const char *)content_labels_map._content, content_labels_map._size));
+#else
+    std::ifstream map_reader(labels_map_file);
+#endif
+    _label_map = load_label_map(map_reader);
+
+#ifndef EXEC_ENV_OLS
+    map_reader.close();
+#endif
+
+#ifdef EXEC_ENV_OLS
+    if (files.fileExists(labels_to_medoids))
+    {
+        FileContent &content_labels_to_medoids = files.getContent(labels_to_medoids);
+        std::stringstream medoid_stream(
+            std::string((const char *)content_labels_to_medoids._content, content_labels_to_medoids._size));
+#else
    if (file_exists(labels_to_medoids))
    {
        std::ifstream medoid_stream(labels_to_medoids);
        assert(medoid_stream.is_open());
+#endif

        std::string line, token;
        _filter_to_medoid_ids.clear();
@@ -871,22 +935,41 @@ int PQFlashIndex::load_from_separate_paths(uint32_t num_threads, cons
            throw FileException(labels_to_medoids, e, __FUNCSIG__, __FILE__, __LINE__);
        }
    }
-    std::string univ_label_file = (unv_label_filepath == nullptr ?
"" : unv_label_filepath); + +#ifdef EXEC_ENV_OLS + if (files.fileExists(univ_label_file)) + { + FileContent& content_univ_label = files.getContent(univ_label_file); + std::stringstream universal_label_reader( + std::string((const char*)content_univ_label._content, content_univ_label._size)); +#else if (file_exists(univ_label_file)) { std::ifstream universal_label_reader(univ_label_file); assert(universal_label_reader.is_open()); +#endif std::string univ_label; universal_label_reader >> univ_label; +#ifndef EXEC_ENV_OLS universal_label_reader.close(); +#endif LabelT label_as_num = (LabelT)std::stoul(univ_label); set_universal_label(label_as_num); } + +#ifdef EXEC_ENV_OLS + if (files.fileExists(dummy_map_file)) + { + FileContent &content_dummy_map = files.getContent(dummy_map_file); + std::stringstream dummy_map_stream( + std::string((const char *)content_dummy_map._content, content_dummy_map._size)); +#else if (file_exists(dummy_map_file)) { std::ifstream dummy_map_stream(dummy_map_file); assert(dummy_map_stream.is_open()); +#endif std::string line, token; while (std::getline(dummy_map_stream, line)) @@ -912,21 +995,24 @@ int PQFlashIndex::load_from_separate_paths(uint32_t num_threads, cons _real_to_dummy_map[real_id].emplace_back(dummy_id); } +#ifndef EXEC_ENV_OLS dummy_map_stream.close(); +#endif diskann::cout << "Loaded dummy map" << std::endl; } } #ifdef EXEC_ENV_OLS - pq_table.load_pq_centroid_bin(files, pq_table_bin.c_str(), nchunks_u64); + _pq_table.load_pq_centroid_bin(files, pq_table_bin.c_str(), nchunks_u64); #else - pq_table.load_pq_centroid_bin(pq_table_bin.c_str(), nchunks_u64); + _pq_table.load_pq_centroid_bin(pq_table_bin.c_str(), nchunks_u64); #endif - diskann::cout << "Loaded PQ centroids and in-memory compressed vectors. #points: " << num_points - << " #dim: " << data_dim << " #aligned_dim: " << aligned_dim << " #chunks: " << n_chunks << std::endl; + diskann::cout << "Loaded PQ centroids and in-memory compressed vectors. 
#points: " << _num_points + << " #dim: " << _data_dim << " #aligned_dim: " << _aligned_dim << " #chunks: " << _n_chunks + << std::endl; - if (n_chunks > MAX_PQ_CHUNKS) + if (_n_chunks > MAX_PQ_CHUNKS) { std::stringstream stream; stream << "Error loading index. Ensure that max PQ bytes for in-memory " @@ -935,23 +1021,26 @@ int PQFlashIndex::load_from_separate_paths(uint32_t num_threads, cons throw diskann::ANNException(stream.str(), -1, __FUNCSIG__, __FILE__, __LINE__); } - std::string disk_pq_pivots_path = this->disk_index_file + "_pq_pivots.bin"; - if (file_exists(disk_pq_pivots_path)) - { - use_disk_index_pq = true; + std::string disk_pq_pivots_path = this->_disk_index_file + "_pq_pivots.bin"; #ifdef EXEC_ENV_OLS - // giving 0 chunks to make the pq_table infer from the + if (files.fileExists(disk_pq_pivots_path)) + { + _use_disk_index_pq = true; + // giving 0 chunks to make the _pq_table infer from the // chunk_offsets file the correct value - disk_pq_table.load_pq_centroid_bin(files, disk_pq_pivots_path.c_str(), 0); + _disk_pq_table.load_pq_centroid_bin(files, disk_pq_pivots_path.c_str(), 0); #else - // giving 0 chunks to make the pq_table infer from the + if (file_exists(disk_pq_pivots_path)) + { + _use_disk_index_pq = true; + // giving 0 chunks to make the _pq_table infer from the // chunk_offsets file the correct value - disk_pq_table.load_pq_centroid_bin(disk_pq_pivots_path.c_str(), 0); + _disk_pq_table.load_pq_centroid_bin(disk_pq_pivots_path.c_str(), 0); #endif - disk_pq_n_chunks = disk_pq_table.get_num_chunks(); - disk_bytes_per_point = - disk_pq_n_chunks * sizeof(uint8_t); // revising disk_bytes_per_point since DISK PQ is used. - diskann::cout << "Disk index uses PQ data compressed down to " << disk_pq_n_chunks << " bytes per point." + _disk_pq_n_chunks = _disk_pq_table.get_num_chunks(); + _disk_bytes_per_point = + _disk_pq_n_chunks * sizeof(uint8_t); // revising disk_bytes_per_point since DISK PQ is used. 
+ diskann::cout << "Disk index uses PQ data compressed down to " << _disk_pq_n_chunks << " bytes per point." << std::endl; } @@ -962,15 +1051,15 @@ int PQFlashIndex::load_from_separate_paths(uint32_t num_threads, cons // DiskPriorityIO class. So, we need to estimate how many // bytes are needed to store the header and read in that many using our // 'standard' aligned file reader approach. - reader->open(disk_index_file); + reader->open(_disk_index_file); this->setup_thread_data(num_threads); - this->max_nthreads = num_threads; + this->_max_nthreads = num_threads; char *bytes = getHeaderBytes(); ContentBuf buf(bytes, HEADER_SIZE); std::basic_istream index_metadata(&buf); #else - std::ifstream index_metadata(disk_index_file, std::ios::binary); + std::ifstream index_metadata(_disk_index_file, std::ios::binary); #endif uint32_t nr, nc; // metadata itself is stored as bin format (nr is number of @@ -983,59 +1072,59 @@ int PQFlashIndex::load_from_separate_paths(uint32_t num_threads, cons READ_U64(index_metadata, disk_nnodes); READ_U64(index_metadata, disk_ndims); - if (disk_nnodes != num_points) + if (disk_nnodes != _num_points) { diskann::cout << "Mismatch in #points for compressed data file and disk " "index file: " - << disk_nnodes << " vs " << num_points << std::endl; + << disk_nnodes << " vs " << _num_points << std::endl; return -1; } size_t medoid_id_on_file; READ_U64(index_metadata, medoid_id_on_file); - READ_U64(index_metadata, max_node_len); - READ_U64(index_metadata, nnodes_per_sector); - max_degree = ((max_node_len - disk_bytes_per_point) / sizeof(uint32_t)) - 1; + READ_U64(index_metadata, _max_node_len); + READ_U64(index_metadata, _nnodes_per_sector); + _max_degree = ((_max_node_len - _disk_bytes_per_point) / sizeof(uint32_t)) - 1; - if (max_degree > MAX_GRAPH_DEGREE) + if (_max_degree > defaults::MAX_GRAPH_DEGREE) { std::stringstream stream; stream << "Error loading index. 
Ensure that max graph degree (R) does " "not exceed " - << MAX_GRAPH_DEGREE << std::endl; + << defaults::MAX_GRAPH_DEGREE << std::endl; throw diskann::ANNException(stream.str(), -1, __FUNCSIG__, __FILE__, __LINE__); } // setting up concept of frozen points in disk index for streaming-DiskANN - READ_U64(index_metadata, this->num_frozen_points); + READ_U64(index_metadata, this->_num_frozen_points); uint64_t file_frozen_id; READ_U64(index_metadata, file_frozen_id); - if (this->num_frozen_points == 1) - this->frozen_location = file_frozen_id; - if (this->num_frozen_points == 1) + if (this->_num_frozen_points == 1) + this->_frozen_location = file_frozen_id; + if (this->_num_frozen_points == 1) { - diskann::cout << " Detected frozen point in index at location " << this->frozen_location + diskann::cout << " Detected frozen point in index at location " << this->_frozen_location << ". Will not output it at search time." << std::endl; } - READ_U64(index_metadata, this->reorder_data_exists); - if (this->reorder_data_exists) + READ_U64(index_metadata, this->_reorder_data_exists); + if (this->_reorder_data_exists) { - if (this->use_disk_index_pq == false) + if (this->_use_disk_index_pq == false) { throw ANNException("Reordering is designed for used with disk PQ " "compression option", -1, __FUNCSIG__, __FILE__, __LINE__); } - READ_U64(index_metadata, this->reorder_data_start_sector); - READ_U64(index_metadata, this->ndims_reorder_vecs); - READ_U64(index_metadata, this->nvecs_per_sector); + READ_U64(index_metadata, this->_reorder_data_start_sector); + READ_U64(index_metadata, this->_ndims_reorder_vecs); + READ_U64(index_metadata, this->_nvecs_per_sector); } diskann::cout << "Disk-Index File Meta-data: "; - diskann::cout << "# nodes per sector: " << nnodes_per_sector; - diskann::cout << ", max node len (bytes): " << max_node_len; - diskann::cout << ", max node degree: " << max_degree << std::endl; + diskann::cout << "# nodes per sector: " << _nnodes_per_sector; + diskann::cout << 
", max node len (bytes): " << _max_node_len; + diskann::cout << ", max node degree: " << _max_degree << std::endl; #ifdef EXEC_ENV_OLS delete[] bytes; @@ -1045,10 +1134,10 @@ int PQFlashIndex::load_from_separate_paths(uint32_t num_threads, cons #ifndef EXEC_ENV_OLS // open AlignedFileReader handle to index_file - std::string index_fname(disk_index_file); + std::string index_fname(_disk_index_file); reader->open(index_fname); this->setup_thread_data(num_threads); - this->max_nthreads = num_threads; + this->_max_nthreads = num_threads; #endif @@ -1056,12 +1145,12 @@ int PQFlashIndex::load_from_separate_paths(uint32_t num_threads, cons if (files.fileExists(medoids_file)) { size_t tmp_dim; - diskann::load_bin(files, medoids_file, medoids, num_medoids, tmp_dim); + diskann::load_bin(files, norm_file, medoids_file, _medoids, _num_medoids, tmp_dim); #else if (file_exists(medoids_file)) { size_t tmp_dim; - diskann::load_bin(medoids_file, medoids, num_medoids, tmp_dim); + diskann::load_bin(medoids_file, _medoids, _num_medoids, tmp_dim); #endif if (tmp_dim != 1) @@ -1088,12 +1177,12 @@ int PQFlashIndex::load_from_separate_paths(uint32_t num_threads, cons { size_t num_centroids, aligned_tmp_dim; #ifdef EXEC_ENV_OLS - diskann::load_aligned_bin(files, centroids_file, centroid_data, num_centroids, tmp_dim, + diskann::load_aligned_bin(files, centroids_file, _centroid_data, num_centroids, tmp_dim, aligned_tmp_dim); #else - diskann::load_aligned_bin(centroids_file, centroid_data, num_centroids, tmp_dim, aligned_tmp_dim); + diskann::load_aligned_bin(centroids_file, _centroid_data, num_centroids, tmp_dim, aligned_tmp_dim); #endif - if (aligned_tmp_dim != aligned_dim || num_centroids != num_medoids) + if (aligned_tmp_dim != _aligned_dim || num_centroids != _num_medoids) { std::stringstream stream; stream << "Error loading centroids data file. 
Expected bin format "
@@ -1108,21 +1197,29 @@ int PQFlashIndex::load_from_separate_paths(uint32_t num_threads, cons
    }
    else
    {
-        num_medoids = 1;
-        medoids = new uint32_t[1];
-        medoids[0] = (uint32_t)(medoid_id_on_file);
+        _num_medoids = 1;
+        _medoids = new uint32_t[1];
+        _medoids[0] = (uint32_t)(medoid_id_on_file);
        use_medoids_data_as_centroids();
    }

-    std::string norm_file = std::string(disk_index_file) + "_max_base_norm.bin";
+    std::string norm_file = std::string(_disk_index_file) + "_max_base_norm.bin";
+#ifdef EXEC_ENV_OLS
+    if (files.fileExists(norm_file) && metric == diskann::Metric::INNER_PRODUCT)
+    {
+        uint64_t dumr, dumc;
+        float *norm_val;
+        diskann::load_bin(files, norm_file, norm_val, dumr, dumc);
+#else
    if (file_exists(norm_file) && metric == diskann::Metric::INNER_PRODUCT)
    {
        uint64_t dumr, dumc;
        float *norm_val;
        diskann::load_bin(norm_file, norm_val, dumr, dumc);
-        this->max_base_norm = norm_val[0];
-        diskann::cout << "Setting re-scaling factor of base vectors to " << this->max_base_norm << std::endl;
+#endif
+        this->_max_base_norm = norm_val[0];
+        diskann::cout << "Setting re-scaling factor of base vectors to " << this->_max_base_norm << std::endl;
        delete[] norm_val;
    }

    diskann::cout << "done.."
<< std::endl; @@ -1130,37 +1227,46 @@ int PQFlashIndex::load_from_separate_paths(uint32_t num_threads, cons } #ifdef USE_BING_INFRA -bool getNextCompletedRequest(const IOContext &ctx, size_t size, int &completedIndex) +bool getNextCompletedRequest(std::shared_ptr &reader, IOContext &ctx, size_t size, + int &completedIndex) { - bool waitsRemaining = false; - long completeCount = ctx.m_completeCount; - do + if ((*ctx.m_pRequests)[0].m_callback) { - for (int i = 0; i < size; i++) + bool waitsRemaining = false; + long completeCount = ctx.m_completeCount; + do { - auto ithStatus = (*ctx.m_pRequestsStatus)[i]; - if (ithStatus == IOContext::Status::READ_SUCCESS) + for (int i = 0; i < size; i++) { - completedIndex = i; - return true; + auto ithStatus = (*ctx.m_pRequestsStatus)[i]; + if (ithStatus == IOContext::Status::READ_SUCCESS) + { + completedIndex = i; + return true; + } + else if (ithStatus == IOContext::Status::READ_WAIT) + { + waitsRemaining = true; + } } - else if (ithStatus == IOContext::Status::READ_WAIT) + + // if we didn't find one in READ_SUCCESS, wait for one to complete. + if (waitsRemaining) { - waitsRemaining = true; + WaitOnAddress(&ctx.m_completeCount, &completeCount, sizeof(completeCount), 100); + // this assumes the knowledge of the reader behavior (implicit + // contract). need better factoring? } - } + } while (waitsRemaining); - // if we didn't find one in READ_SUCCESS, wait for one to complete. - if (waitsRemaining) - { - WaitOnAddress(&ctx.m_completeCount, &completeCount, sizeof(completeCount), 100); - // this assumes the knowledge of the reader behavior (implicit - // contract). need better factoring? 
- } - } while (waitsRemaining); - - completedIndex = -1; - return false; + completedIndex = -1; + return false; + } + else + { + reader->wait(ctx, completedIndex); + return completedIndex != -1; + } } #endif @@ -1201,16 +1307,17 @@ void PQFlashIndex::cached_beam_search(const T *query1, const uint64_t const uint32_t io_limit, const bool use_reorder_data, QueryStats *stats) { - int32_t filter_num = filter_label; - if (beam_width > MAX_N_SECTOR_READS) - throw ANNException("Beamwidth can not be higher than MAX_N_SECTOR_READS", -1, __FUNCSIG__, __FILE__, __LINE__); + uint64_t num_sector_per_nodes = DIV_ROUND_UP(_max_node_len, defaults::SECTOR_LEN); + if (beam_width > num_sector_per_nodes * defaults::MAX_N_SECTOR_READS) + throw ANNException("Beamwidth can not be higher than defaults::MAX_N_SECTOR_READS", -1, __FUNCSIG__, __FILE__, + __LINE__); - ScratchStoreManager> manager(this->thread_data); + ScratchStoreManager> manager(this->_thread_data); auto data = manager.scratch_space(); IOContext &ctx = data->ctx; auto query_scratch = &(data->scratch); - auto pq_query_scratch = query_scratch->_pq_scratch; + auto pq_query_scratch = query_scratch->pq_scratch(); // reset query scratch query_scratch->reset(); @@ -1218,36 +1325,39 @@ void PQFlashIndex::cached_beam_search(const T *query1, const uint64_t // copy query to thread specific aligned and allocated memory (for distance // calculations we need aligned data) float query_norm = 0; - T *aligned_query_T = query_scratch->aligned_query_T; + T *aligned_query_T = query_scratch->aligned_query_T(); float *query_float = pq_query_scratch->aligned_query_float; float *query_rotated = pq_query_scratch->rotated_query; - // if inner product, we laso normalize the query and set the last coordinate - // to 0 (this is the extra coordindate used to convert MIPS to L2 search) - if (metric == diskann::Metric::INNER_PRODUCT) + // normalization step. 
for cosine, we simply normalize the query + // for mips, we normalize the first d-1 dims, and add a 0 for last dim, since an extra coordinate was used to + // convert MIPS to L2 search + if (metric == diskann::Metric::INNER_PRODUCT || metric == diskann::Metric::COSINE) { - for (size_t i = 0; i < this->data_dim - 1; i++) + uint64_t inherent_dim = (metric == diskann::Metric::COSINE) ? this->_data_dim : (uint64_t)(this->_data_dim - 1); + for (size_t i = 0; i < inherent_dim; i++) { aligned_query_T[i] = query1[i]; query_norm += query1[i] * query1[i]; } - aligned_query_T[this->data_dim - 1] = 0; + if (metric == diskann::Metric::INNER_PRODUCT) + aligned_query_T[this->_data_dim - 1] = 0; query_norm = std::sqrt(query_norm); - for (size_t i = 0; i < this->data_dim - 1; i++) + for (size_t i = 0; i < inherent_dim; i++) { aligned_query_T[i] = (T)(aligned_query_T[i] / query_norm); } - pq_query_scratch->set(this->data_dim, aligned_query_T); + pq_query_scratch->initialize(this->_data_dim, aligned_query_T); } else { - for (size_t i = 0; i < this->data_dim; i++) + for (size_t i = 0; i < this->_data_dim; i++) { aligned_query_T[i] = query1[i]; } - pq_query_scratch->set(this->data_dim, aligned_query_T); + pq_query_scratch->initialize(this->_data_dim, aligned_query_T); } // pointers to buffers for data @@ -1257,12 +1367,14 @@ void PQFlashIndex::cached_beam_search(const T *query1, const uint64_t // sector scratch char *sector_scratch = query_scratch->sector_scratch; uint64_t §or_scratch_idx = query_scratch->sector_idx; + const uint64_t num_sectors_per_node = + _nnodes_per_sector > 0 ? 
1 : DIV_ROUND_UP(_max_node_len, defaults::SECTOR_LEN); // query <-> PQ chunk centers distances - pq_table.preprocess_query(query_rotated); // center the query and rotate if - // we have a rotation matrix + _pq_table.preprocess_query(query_rotated); // center the query and rotate if + // we have a rotation matrix float *pq_dists = pq_query_scratch->aligned_pqtable_dist_scratch; - pq_table.populate_chunk_distances(query_rotated, pq_dists); + _pq_table.populate_chunk_distances(query_rotated, pq_dists); // query <-> neighbor list float *dist_scratch = pq_query_scratch->aligned_dist_scratch; @@ -1271,8 +1383,8 @@ void PQFlashIndex::cached_beam_search(const T *query1, const uint64_t // lambda to batch compute query<-> node distances in PQ space auto compute_dists = [this, pq_coord_scratch, pq_dists](const uint32_t *ids, const uint64_t n_ids, float *dists_out) { - diskann::aggregate_coords(ids, n_ids, this->data, this->n_chunks, pq_coord_scratch); - diskann::pq_dist_lookup(pq_coord_scratch, n_ids, this->n_chunks, pq_dists, dists_out); + diskann::aggregate_coords(ids, n_ids, this->data, this->_n_chunks, pq_coord_scratch); + diskann::pq_dist_lookup(pq_coord_scratch, n_ids, this->_n_chunks, pq_dists, dists_out); }; Timer query_timer, io_timer, cpu_timer; @@ -1285,13 +1397,13 @@ void PQFlashIndex::cached_beam_search(const T *query1, const uint64_t float best_dist = (std::numeric_limits::max)(); if (!use_filter) { - for (uint64_t cur_m = 0; cur_m < num_medoids; cur_m++) + for (uint64_t cur_m = 0; cur_m < _num_medoids; cur_m++) { float cur_expanded_dist = - dist_cmp_float->compare(query_float, centroid_data + aligned_dim * cur_m, (uint32_t)aligned_dim); + _dist_cmp_float->compare(query_float, _centroid_data + _aligned_dim * cur_m, (uint32_t)_aligned_dim); if (cur_expanded_dist < best_dist) { - best_medoid = medoids[cur_m]; + best_medoid = _medoids[cur_m]; best_dist = cur_expanded_dist; } } @@ -1352,8 +1464,8 @@ void PQFlashIndex::cached_beam_search(const T *query1, const 
uint64_t
        {
            auto nbr = retset.closest_unexpanded();
            num_seen++;
-            auto iter = nhood_cache.find(nbr.id);
-            if (iter != nhood_cache.end())
+            auto iter = _nhood_cache.find(nbr.id);
+            if (iter != _nhood_cache.end())
            {
                cached_nhoods.push_back(std::make_pair(nbr.id, iter->second));
                if (stats != nullptr)
@@ -1365,9 +1477,9 @@ void PQFlashIndex::cached_beam_search(const T *query1, const uint64_t
            {
                frontier.push_back(nbr.id);
            }
-            if (this->count_visited_nodes)
+            if (this->_count_visited_nodes)
            {
-                reinterpret_cast &>(this->node_visit_counter[nbr.id].second).fetch_add(1);
+                reinterpret_cast &>(this->_node_visit_counter[nbr.id].second).fetch_add(1);
            }
        }

@@ -1381,10 +1493,11 @@ void PQFlashIndex::cached_beam_search(const T *query1, const uint64_t
                auto id = frontier[i];
                std::pair fnhood;
                fnhood.first = id;
-                fnhood.second = sector_scratch + sector_scratch_idx * SECTOR_LEN;
+                fnhood.second = sector_scratch + num_sectors_per_node * sector_scratch_idx * defaults::SECTOR_LEN;
                sector_scratch_idx++;
                frontier_nhoods.push_back(fnhood);
-                frontier_read_reqs.emplace_back(NODE_SECTOR_NO(((size_t)id)) * SECTOR_LEN, SECTOR_LEN, fnhood.second);
+                frontier_read_reqs.emplace_back(get_node_sector((size_t)id) * defaults::SECTOR_LEN,
+                                                num_sectors_per_node * defaults::SECTOR_LEN, fnhood.second);
                if (stats != nullptr)
                {
                    stats->n_4k++;
@@ -1395,7 +1508,7 @@ void PQFlashIndex::cached_beam_search(const T *query1, const uint64_t
                io_timer.reset();
#ifdef USE_BING_INFRA
                reader->read(frontier_read_reqs, ctx,
-                             true); // async reader windows.
+                             true); // asynchronous reader for Bing.
#else reader->read(frontier_read_reqs, ctx); // synchronous IO linux #endif @@ -1408,19 +1521,19 @@ void PQFlashIndex::cached_beam_search(const T *query1, const uint64_t // process cached nhoods for (auto &cached_nhood : cached_nhoods) { - auto global_cache_iter = coord_cache.find(cached_nhood.first); + auto global_cache_iter = _coord_cache.find(cached_nhood.first); T *node_fp_coords_copy = global_cache_iter->second; float cur_expanded_dist; - if (!use_disk_index_pq) + if (!_use_disk_index_pq) { - cur_expanded_dist = dist_cmp->compare(aligned_query_T, node_fp_coords_copy, (uint32_t)aligned_dim); + cur_expanded_dist = _dist_cmp->compare(aligned_query_T, node_fp_coords_copy, (uint32_t)_aligned_dim); } else { if (metric == diskann::Metric::INNER_PRODUCT) - cur_expanded_dist = disk_pq_table.inner_product(query_float, (uint8_t *)node_fp_coords_copy); + cur_expanded_dist = _disk_pq_table.inner_product(query_float, (uint8_t *)node_fp_coords_copy); else - cur_expanded_dist = disk_pq_table.l2_distance( // disk_pq does not support OPQ yet + cur_expanded_dist = _disk_pq_table.l2_distance( // disk_pq does not support OPQ yet query_float, (uint8_t *)node_fp_coords_copy); } full_retset.push_back(Neighbor((uint32_t)cached_nhood.first, cur_expanded_dist)); @@ -1446,8 +1559,8 @@ void PQFlashIndex::cached_beam_search(const T *query1, const uint64_t if (!use_filter && _dummy_pts.find(id) != _dummy_pts.end()) continue; - if (use_filter && !point_has_label(id, filter_num) - && (!_use_universal_label || !point_has_label(id, _universal_filter_num))) + if (use_filter && !(point_has_label(id, filter_label)) && + (!_use_universal_label || !point_has_label(id, _universal_filter_label))) continue; cmps++; float dist = dist_scratch[m]; @@ -1462,7 +1575,7 @@ void PQFlashIndex::cached_beam_search(const T *query1, const uint64_t long requestCount = static_cast(frontier_read_reqs.size()); // If we issued read requests and if a read is complete or there are // reads in wait state, then enter the 
while loop. - while (requestCount > 0 && getNextCompletedRequest(ctx, requestCount, completedIndex)) + while (requestCount > 0 && getNextCompletedRequest(reader, ctx, requestCount, completedIndex)) { assert(completedIndex >= 0); auto &frontier_nhood = frontier_nhoods[completedIndex]; @@ -1471,22 +1584,22 @@ void PQFlashIndex::cached_beam_search(const T *query1, const uint64_t for (auto &frontier_nhood : frontier_nhoods) { #endif - char *node_disk_buf = OFFSET_TO_NODE(frontier_nhood.second, frontier_nhood.first); - uint32_t *node_buf = OFFSET_TO_NODE_NHOOD(node_disk_buf); + char *node_disk_buf = offset_to_node(frontier_nhood.second, frontier_nhood.first); + uint32_t *node_buf = offset_to_node_nhood(node_disk_buf); uint64_t nnbrs = (uint64_t)(*node_buf); - T *node_fp_coords = OFFSET_TO_NODE_COORDS(node_disk_buf); - memcpy(data_buf, node_fp_coords, disk_bytes_per_point); + T *node_fp_coords = offset_to_node_coords(node_disk_buf); + memcpy(data_buf, node_fp_coords, _disk_bytes_per_point); float cur_expanded_dist; - if (!use_disk_index_pq) + if (!_use_disk_index_pq) { - cur_expanded_dist = dist_cmp->compare(aligned_query_T, data_buf, (uint32_t)aligned_dim); + cur_expanded_dist = _dist_cmp->compare(aligned_query_T, data_buf, (uint32_t)_aligned_dim); } else { if (metric == diskann::Metric::INNER_PRODUCT) - cur_expanded_dist = disk_pq_table.inner_product(query_float, (uint8_t *)data_buf); + cur_expanded_dist = _disk_pq_table.inner_product(query_float, (uint8_t *)data_buf); else - cur_expanded_dist = disk_pq_table.l2_distance(query_float, (uint8_t *)data_buf); + cur_expanded_dist = _disk_pq_table.l2_distance(query_float, (uint8_t *)data_buf); } full_retset.push_back(Neighbor(frontier_nhood.first, cur_expanded_dist)); uint32_t *node_nbrs = (node_buf + 1); @@ -1509,8 +1622,8 @@ void PQFlashIndex::cached_beam_search(const T *query1, const uint64_t if (!use_filter && _dummy_pts.find(id) != _dummy_pts.end()) continue; - if (use_filter && !point_has_label(id, filter_num) - && 
(!_use_universal_label || !point_has_label(id, _universal_filter_num))) + if (use_filter && !(point_has_label(id, filter_label)) && + (!_use_universal_label || !point_has_label(id, _universal_filter_label))) continue; cmps++; float dist = dist_scratch[m]; @@ -1538,7 +1651,7 @@ void PQFlashIndex::cached_beam_search(const T *query1, const uint64_t if (use_reorder_data) { - if (!(this->reorder_data_exists)) + if (!(this->_reorder_data_exists)) { throw ANNException("Requested use of reordering data which does " "not exist in index " @@ -1553,8 +1666,9 @@ void PQFlashIndex::cached_beam_search(const T *query1, const uint64_t for (size_t i = 0; i < full_retset.size(); ++i) { - vec_read_reqs.emplace_back(VECTOR_SECTOR_NO(((size_t)full_retset[i].id)) * SECTOR_LEN, SECTOR_LEN, - sector_scratch + i * SECTOR_LEN); + // MULTISECTORFIX + vec_read_reqs.emplace_back(VECTOR_SECTOR_NO(((size_t)full_retset[i].id)) * defaults::SECTOR_LEN, + defaults::SECTOR_LEN, sector_scratch + i * defaults::SECTOR_LEN); if (stats != nullptr) { @@ -1565,7 +1679,7 @@ void PQFlashIndex::cached_beam_search(const T *query1, const uint64_t io_timer.reset(); #ifdef USE_BING_INFRA - reader->read(vec_read_reqs, ctx, false); // sync reader windows. + reader->read(vec_read_reqs, ctx, true); // async reader windows. 
#else reader->read(vec_read_reqs, ctx); // synchronous IO linux #endif @@ -1577,8 +1691,9 @@ void PQFlashIndex::cached_beam_search(const T *query1, const uint64_t for (size_t i = 0; i < full_retset.size(); ++i) { auto id = full_retset[i].id; - auto location = (sector_scratch + i * SECTOR_LEN) + VECTOR_SECTOR_OFFSET(id); - full_retset[i].distance = dist_cmp->compare(aligned_query_T, (T *)location, (uint32_t)this->data_dim); + // MULTISECTORFIX + auto location = (sector_scratch + i * defaults::SECTOR_LEN) + VECTOR_SECTOR_OFFSET(id); + full_retset[i].distance = _dist_cmp->compare(aligned_query_T, (T *)location, (uint32_t)this->_data_dim); } std::sort(full_retset.begin(), full_retset.end()); @@ -1603,8 +1718,8 @@ void PQFlashIndex::cached_beam_search(const T *query1, const uint64_t distances[i] = (-distances[i]); // rescale to revert back to original norms (cancelling the // effect of base and query pre-processing) - if (max_base_norm != 0) - distances[i] *= (max_base_norm * query_norm); + if (_max_base_norm != 0) + distances[i] *= (_max_base_norm * query_norm); } } } @@ -1665,7 +1780,7 @@ uint32_t PQFlashIndex::range_search(const T *query1, const double ran template uint64_t PQFlashIndex::get_data_dim() { - return data_dim; + return _data_dim; } template diskann::Metric PQFlashIndex::get_metric() @@ -1705,6 +1820,18 @@ template char *PQFlashIndex::getHeaderB } #endif +template +std::vector PQFlashIndex::get_pq_vector(std::uint64_t vid) +{ + std::uint8_t *pqVec = &this->data[vid * this->_n_chunks]; + return std::vector(pqVec, pqVec + this->_n_chunks); +} + +template std::uint64_t PQFlashIndex::get_num_points() +{ + return _num_points; +} + // instantiations template class PQFlashIndex; template class PQFlashIndex; diff --git a/src/pq_l2_distance.cpp b/src/pq_l2_distance.cpp new file mode 100644 index 000000000..c08744c35 --- /dev/null +++ b/src/pq_l2_distance.cpp @@ -0,0 +1,284 @@ + +#include "pq.h" +#include "pq_l2_distance.h" +#include "pq_scratch.h" + +// block size 
for reading/processing large files and matrices in blocks +#define BLOCK_SIZE 5000000 + +namespace diskann +{ + +template +PQL2Distance::PQL2Distance(uint32_t num_chunks, bool use_opq) : _num_chunks(num_chunks), _is_opq(use_opq) +{ +} + +template PQL2Distance::~PQL2Distance() +{ +#ifndef EXEC_ENV_OLS + if (_tables != nullptr) + delete[] _tables; + if (_chunk_offsets != nullptr) + delete[] _chunk_offsets; + if (_centroid != nullptr) + delete[] _centroid; + if (_rotmat_tr != nullptr) + delete[] _rotmat_tr; +#endif + if (_tables_tr != nullptr) + delete[] _tables_tr; +} + +template bool PQL2Distance::is_opq() const +{ + return this->_is_opq; +} + +template +std::string PQL2Distance::get_quantized_vectors_filename(const std::string &prefix) const +{ + if (_num_chunks == 0) + { + throw diskann::ANNException("Must set num_chunks before calling get_quantized_vectors_filename", -1, + __FUNCSIG__, __FILE__, __LINE__); + } + return diskann::get_quantized_vectors_filename(prefix, _is_opq, (uint32_t)_num_chunks); +} +template std::string PQL2Distance::get_pivot_data_filename(const std::string &prefix) const +{ + if (_num_chunks == 0) + { + throw diskann::ANNException("Must set num_chunks before calling get_pivot_data_filename", -1, __FUNCSIG__, + __FILE__, __LINE__); + } + return diskann::get_pivot_data_filename(prefix, _is_opq, (uint32_t)_num_chunks); +} +template +std::string PQL2Distance::get_rotation_matrix_suffix(const std::string &pq_pivots_filename) const +{ + return diskann::get_rotation_matrix_suffix(pq_pivots_filename); +} + +#ifdef EXEC_ENV_OLS +template +void PQL2Distance::load_pivot_data(MemoryMappedFiles &files, const std::string &pq_table_file, + size_t num_chunks) +{ +#else +template +void PQL2Distance::load_pivot_data(const std::string &pq_table_file, size_t num_chunks) +{ +#endif + uint64_t nr, nc; + // std::string rotmat_file = get_opq_rot_matrix_filename(pq_table_file, + // false); + +#ifdef EXEC_ENV_OLS + size_t *file_offset_data; // since load_bin only 
sets the pointer, no need + // to delete. + diskann::load_bin(files, pq_table_file, file_offset_data, nr, nc); +#else + std::unique_ptr file_offset_data; + diskann::load_bin(pq_table_file, file_offset_data, nr, nc); +#endif + + bool use_old_filetype = false; + + if (nr != 4 && nr != 5) + { + diskann::cout << "Error reading pq_pivots file " << pq_table_file + << ". Offsets dont contain correct metadata, # offsets = " << nr << ", but expecting " << 4 + << " or " << 5; + throw diskann::ANNException("Error reading pq_pivots file at offsets data.", -1, __FUNCSIG__, __FILE__, + __LINE__); + } + + if (nr == 4) + { + diskann::cout << "Offsets: " << file_offset_data[0] << " " << file_offset_data[1] << " " << file_offset_data[2] + << " " << file_offset_data[3] << std::endl; + } + else if (nr == 5) + { + use_old_filetype = true; + diskann::cout << "Offsets: " << file_offset_data[0] << " " << file_offset_data[1] << " " << file_offset_data[2] + << " " << file_offset_data[3] << file_offset_data[4] << std::endl; + } + else + { + throw diskann::ANNException("Wrong number of offsets in pq_pivots", -1, __FUNCSIG__, __FILE__, __LINE__); + } + +#ifdef EXEC_ENV_OLS + diskann::load_bin(files, pq_table_file, tables, nr, nc, file_offset_data[0]); +#else + diskann::load_bin(pq_table_file, _tables, nr, nc, file_offset_data[0]); +#endif + + if ((nr != NUM_PQ_CENTROIDS)) + { + diskann::cout << "Error reading pq_pivots file " << pq_table_file << ". 
file_num_centers = " << nr + << " but expecting " << NUM_PQ_CENTROIDS << " centers"; + throw diskann::ANNException("Error reading pq_pivots file at pivots data.", -1, __FUNCSIG__, __FILE__, + __LINE__); + } + + this->_ndims = nc; + +#ifdef EXEC_ENV_OLS + diskann::load_bin(files, pq_table_file, centroid, nr, nc, file_offset_data[1]); +#else + diskann::load_bin(pq_table_file, _centroid, nr, nc, file_offset_data[1]); +#endif + + if ((nr != this->_ndims) || (nc != 1)) + { + diskann::cerr << "Error reading centroids from pq_pivots file " << pq_table_file << ". file_dim = " << nr + << ", file_cols = " << nc << " but expecting " << this->_ndims << " entries in 1 dimension."; + throw diskann::ANNException("Error reading pq_pivots file at centroid data.", -1, __FUNCSIG__, __FILE__, + __LINE__); + } + + int chunk_offsets_index = 2; + if (use_old_filetype) + { + chunk_offsets_index = 3; + } +#ifdef EXEC_ENV_OLS + diskann::load_bin(files, pq_table_file, chunk_offsets, nr, nc, file_offset_data[chunk_offsets_index]); +#else + diskann::load_bin(pq_table_file, _chunk_offsets, nr, nc, file_offset_data[chunk_offsets_index]); +#endif + + if (nc != 1 || (nr != num_chunks + 1 && num_chunks != 0)) + { + diskann::cerr << "Error loading chunk offsets file. numc: " << nc << " (should be 1). numr: " << nr + << " (should be " << num_chunks + 1 << " or 0 if we need to infer)" << std::endl; + throw diskann::ANNException("Error loading chunk offsets file", -1, __FUNCSIG__, __FILE__, __LINE__); + } + + this->_num_chunks = nr - 1; + diskann::cout << "Loaded PQ Pivots: #ctrs: " << NUM_PQ_CENTROIDS << ", #dims: " << this->_ndims + << ", #chunks: " << this->_num_chunks << std::endl; + + // For OPQ there will be a rotation matrix to load. 
+ if (this->_is_opq) + { + std::string rotmat_file = get_rotation_matrix_suffix(pq_table_file); +#ifdef EXEC_ENV_OLS + diskann::load_bin(files, rotmat_file, (float *&)rotmat_tr, nr, nc); +#else + diskann::load_bin(rotmat_file, _rotmat_tr, nr, nc); +#endif + if (nr != this->_ndims || nc != this->_ndims) + { + diskann::cerr << "Error loading rotation matrix file" << std::endl; + throw diskann::ANNException("Error loading rotation matrix file", -1, __FUNCSIG__, __FILE__, __LINE__); + } + } + + // alloc and compute transpose + _tables_tr = new float[256 * this->_ndims]; + for (size_t i = 0; i < 256; i++) + { + for (size_t j = 0; j < this->_ndims; j++) + { + _tables_tr[j * 256 + i] = _tables[i * this->_ndims + j]; + } + } +} + +template uint32_t PQL2Distance::get_num_chunks() const +{ + return static_cast(_num_chunks); +} + +// REFACTOR: Instead of doing half the work in the caller and half in this +// function, we let this function +// do all of the work, making it easier for the caller. +template +void PQL2Distance::preprocess_query(const data_t *aligned_query, uint32_t dim, PQScratch &scratch) +{ + // Copy query vector to float and then to "rotated" query + for (size_t d = 0; d < dim; d++) + { + scratch.aligned_query_float[d] = (float)aligned_query[d]; + } + scratch.initialize(dim, aligned_query); + + for (uint32_t d = 0; d < _ndims; d++) + { + scratch.rotated_query[d] -= _centroid[d]; + } + std::vector tmp(_ndims, 0); + if (_is_opq) + { + for (uint32_t d = 0; d < _ndims; d++) + { + for (uint32_t d1 = 0; d1 < _ndims; d1++) + { + tmp[d] += scratch.rotated_query[d1] * _rotmat_tr[d1 * _ndims + d]; + } + } + std::memcpy(scratch.rotated_query, tmp.data(), _ndims * sizeof(float)); + } + this->prepopulate_chunkwise_distances(scratch.rotated_query, scratch.aligned_pqtable_dist_scratch); +} + +template +void PQL2Distance::preprocessed_distance(PQScratch &pq_scratch, const uint32_t n_ids, float *dists_out) +{ + pq_dist_lookup(pq_scratch.aligned_pq_coord_scratch, n_ids, 
_num_chunks, pq_scratch.aligned_pqtable_dist_scratch, + dists_out); +} + +template +void PQL2Distance::preprocessed_distance(PQScratch &pq_scratch, const uint32_t n_ids, + std::vector &dists_out) +{ + pq_dist_lookup(pq_scratch.aligned_pq_coord_scratch, n_ids, _num_chunks, pq_scratch.aligned_pqtable_dist_scratch, + dists_out); +} + +template float PQL2Distance::brute_force_distance(const float *query_vec, uint8_t *base_vec) +{ + float res = 0; + for (size_t chunk = 0; chunk < _num_chunks; chunk++) + { + for (size_t j = _chunk_offsets[chunk]; j < _chunk_offsets[chunk + 1]; j++) + { + const float *centers_dim_vec = _tables_tr + (256 * j); + float diff = centers_dim_vec[base_vec[chunk]] - (query_vec[j]); + res += diff * diff; + } + } + return res; +} + +template +void PQL2Distance::prepopulate_chunkwise_distances(const float *query_vec, float *dist_vec) +{ + memset(dist_vec, 0, 256 * _num_chunks * sizeof(float)); + // chunk wise distance computation + for (size_t chunk = 0; chunk < _num_chunks; chunk++) + { + // sum (q-c)^2 for the dimensions associated with this chunk + float *chunk_dists = dist_vec + (256 * chunk); + for (size_t j = _chunk_offsets[chunk]; j < _chunk_offsets[chunk + 1]; j++) + { + const float *centers_dim_vec = _tables_tr + (256 * j); + for (size_t idx = 0; idx < 256; idx++) + { + double diff = centers_dim_vec[idx] - (query_vec[j]); + chunk_dists[idx] += (float)(diff * diff); + } + } + } +} + +template DISKANN_DLLEXPORT class PQL2Distance; +template DISKANN_DLLEXPORT class PQL2Distance; +template DISKANN_DLLEXPORT class PQL2Distance; + +} // namespace diskann \ No newline at end of file diff --git a/src/restapi/search_wrapper.cpp b/src/restapi/search_wrapper.cpp index dc9f5734e..001e36d39 100644 --- a/src/restapi/search_wrapper.cpp +++ b/src/restapi/search_wrapper.cpp @@ -100,7 +100,9 @@ InMemorySearch::InMemorySearch(const std::string &baseFile, const std::string { size_t dimensions, total_points = 0; diskann::get_bin_metadata(baseFile, total_points, 
dimensions); - _index = std::unique_ptr>(new diskann::Index(m, dimensions, total_points, false)); + auto search_params = diskann::IndexSearchParams(search_l, num_threads); + _index = std::unique_ptr>( + new diskann::Index(m, dimensions, total_points, nullptr, search_params, 0, false)); _index->load(indexFile.c_str(), num_threads, search_l); } diff --git a/src/scratch.cpp b/src/scratch.cpp index 745daa6a7..8b8427453 100644 --- a/src/scratch.cpp +++ b/src/scratch.cpp @@ -5,6 +5,7 @@ #include #include "scratch.h" +#include "pq_scratch.h" namespace diskann { @@ -24,18 +25,18 @@ InMemQueryScratch::InMemQueryScratch(uint32_t search_l, uint32_t indexing_l, throw diskann::ANNException(ss.str(), -1); } - alloc_aligned(((void **)&_aligned_query), aligned_dim * sizeof(T), alignment_factor * sizeof(T)); - memset(_aligned_query, 0, aligned_dim * sizeof(T)); + alloc_aligned(((void **)&this->_aligned_query_T), aligned_dim * sizeof(T), alignment_factor * sizeof(T)); + memset(this->_aligned_query_T, 0, aligned_dim * sizeof(T)); if (init_pq_scratch) - _pq_scratch = new PQScratch(MAX_GRAPH_DEGREE, aligned_dim); + this->_pq_scratch = new PQScratch(defaults::MAX_GRAPH_DEGREE, aligned_dim); else - _pq_scratch = nullptr; + this->_pq_scratch = nullptr; _occlude_factor.reserve(maxc); _inserted_into_pool_bs = new boost::dynamic_bitset<>(); - _id_scratch.reserve((size_t)std::ceil(1.5 * GRAPH_SLACK_FACTOR * _R)); - _dist_scratch.reserve((size_t)std::ceil(1.5 * GRAPH_SLACK_FACTOR * _R)); + _id_scratch.reserve((size_t)std::ceil(1.5 * defaults::GRAPH_SLACK_FACTOR * _R)); + _dist_scratch.reserve((size_t)std::ceil(1.5 * defaults::GRAPH_SLACK_FACTOR * _R)); resize_for_new_L(std::max(search_l, indexing_l)); @@ -77,12 +78,13 @@ template void InMemQueryScratch::resize_for_new_L(uint32_t new_l template InMemQueryScratch::~InMemQueryScratch() { - if (_aligned_query != nullptr) + if (this->_aligned_query_T != nullptr) { - aligned_free(_aligned_query); + aligned_free(this->_aligned_query_T); + 
this->_aligned_query_T = nullptr; } - delete _pq_scratch; + delete this->_pq_scratch; delete _inserted_into_pool_bs; } @@ -102,13 +104,14 @@ template SSDQueryScratch::SSDQueryScratch(size_t aligned_dim, si size_t coord_alloc_size = ROUND_UP(sizeof(T) * aligned_dim, 256); diskann::alloc_aligned((void **)&coord_scratch, coord_alloc_size, 256); - diskann::alloc_aligned((void **)§or_scratch, (size_t)MAX_N_SECTOR_READS * (size_t)SECTOR_LEN, SECTOR_LEN); - diskann::alloc_aligned((void **)&aligned_query_T, aligned_dim * sizeof(T), 8 * sizeof(T)); + diskann::alloc_aligned((void **)§or_scratch, defaults::MAX_N_SECTOR_READS * defaults::SECTOR_LEN, + defaults::SECTOR_LEN); + diskann::alloc_aligned((void **)&this->_aligned_query_T, aligned_dim * sizeof(T), 8 * sizeof(T)); - _pq_scratch = new PQScratch(MAX_GRAPH_DEGREE, aligned_dim); + this->_pq_scratch = new PQScratch(defaults::MAX_GRAPH_DEGREE, aligned_dim); memset(coord_scratch, 0, coord_alloc_size); - memset(aligned_query_T, 0, aligned_dim * sizeof(T)); + memset(this->_aligned_query_T, 0, aligned_dim * sizeof(T)); visited.reserve(visited_reserve); full_retset.reserve(visited_reserve); @@ -118,9 +121,9 @@ template SSDQueryScratch::~SSDQueryScratch() { diskann::aligned_free((void *)coord_scratch); diskann::aligned_free((void *)sector_scratch); - diskann::aligned_free((void *)aligned_query_T); + diskann::aligned_free((void *)this->_aligned_query_T); - delete[] _pq_scratch; + delete this->_pq_scratch; } template @@ -133,6 +136,39 @@ template void SSDThreadData::clear() scratch.reset(); } +template PQScratch::PQScratch(size_t graph_degree, size_t aligned_dim) +{ + diskann::alloc_aligned((void **)&aligned_pq_coord_scratch, + (size_t)graph_degree * (size_t)MAX_PQ_CHUNKS * sizeof(uint8_t), 256); + diskann::alloc_aligned((void **)&aligned_pqtable_dist_scratch, 256 * (size_t)MAX_PQ_CHUNKS * sizeof(float), 256); + diskann::alloc_aligned((void **)&aligned_dist_scratch, (size_t)graph_degree * sizeof(float), 256); + 
diskann::alloc_aligned((void **)&aligned_query_float, aligned_dim * sizeof(float), 8 * sizeof(float)); + diskann::alloc_aligned((void **)&rotated_query, aligned_dim * sizeof(float), 8 * sizeof(float)); + + memset(aligned_query_float, 0, aligned_dim * sizeof(float)); + memset(rotated_query, 0, aligned_dim * sizeof(float)); +} + +template PQScratch::~PQScratch() +{ + diskann::aligned_free((void *)aligned_pq_coord_scratch); + diskann::aligned_free((void *)aligned_pqtable_dist_scratch); + diskann::aligned_free((void *)aligned_dist_scratch); + diskann::aligned_free((void *)aligned_query_float); + diskann::aligned_free((void *)rotated_query); +} + +template void PQScratch::initialize(size_t dim, const T *query, const float norm) +{ + for (size_t d = 0; d < dim; ++d) + { + if (norm != 1.0f) + rotated_query[d] = aligned_query_float[d] = static_cast(query[d]) / norm; + else + rotated_query[d] = aligned_query_float[d] = static_cast(query[d]); + } +} + template DISKANN_DLLEXPORT class InMemQueryScratch; template DISKANN_DLLEXPORT class InMemQueryScratch; template DISKANN_DLLEXPORT class InMemQueryScratch; @@ -141,7 +177,12 @@ template DISKANN_DLLEXPORT class SSDQueryScratch; template DISKANN_DLLEXPORT class SSDQueryScratch; template DISKANN_DLLEXPORT class SSDQueryScratch; +template DISKANN_DLLEXPORT class PQScratch; +template DISKANN_DLLEXPORT class PQScratch; +template DISKANN_DLLEXPORT class PQScratch; + template DISKANN_DLLEXPORT class SSDThreadData; template DISKANN_DLLEXPORT class SSDThreadData; template DISKANN_DLLEXPORT class SSDThreadData; + } // namespace diskann diff --git a/src/utils.cpp b/src/utils.cpp index b675e656d..3773cda22 100644 --- a/src/utils.cpp +++ b/src/utils.cpp @@ -391,29 +391,30 @@ template void read_array(AlignedFileReader &reader, T *data, size_t if (data == nullptr) { throw diskann::ANNException("read_array requires an allocated buffer.", -1); - if (size * sizeof(T) > MAX_REQUEST_SIZE) - { - std::stringstream ss; - ss << "Cannot read more than " 
<< MAX_REQUEST_SIZE - << " bytes. Current request size: " << std::to_string(size) << " sizeof(T): " << sizeof(T) << std::endl; - throw diskann::ANNException(ss.str(), -1, __FUNCSIG__, __FILE__, __LINE__); - } - std::vector read_requests; - AlignedRead read_req; - read_req.buf = data; - read_req.len = size * sizeof(T); - read_req.offset = offset; - read_requests.push_back(read_req); - IOContext &ctx = reader.get_ctx(); - reader.read(read_requests, ctx); + } - if ((*(ctx.m_pRequestsStatus))[0] != IOContext::READ_SUCCESS) - { - std::stringstream ss; - ss << "Failed to read_array() of size: " << size * sizeof(T) << " at offset: " << offset << " from reader. " - << std::endl; - throw diskann::ANNException(ss.str(), -1, __FUNCSIG__, __FILE__, __LINE__); - } + if (size * sizeof(T) > MAX_REQUEST_SIZE) + { + std::stringstream ss; + ss << "Cannot read more than " << MAX_REQUEST_SIZE << " bytes. Current request size: " << std::to_string(size) + << " sizeof(T): " << sizeof(T) << std::endl; + throw diskann::ANNException(ss.str(), -1, __FUNCSIG__, __FILE__, __LINE__); + } + std::vector read_requests; + AlignedRead read_req; + read_req.buf = data; + read_req.len = size * sizeof(T); + read_req.offset = offset; + read_requests.push_back(read_req); + IOContext &ctx = reader.get_ctx(); + reader.read(read_requests, ctx); + + if ((*(ctx.m_pRequestsStatus))[0] != IOContext::READ_SUCCESS) + { + std::stringstream ss; + ss << "Failed to read_array() of size: " << size * sizeof(T) << " at offset: " << offset << " from reader. 
" + << std::endl; + throw diskann::ANNException(ss.str(), -1, __FUNCSIG__, __FILE__, __LINE__); } } diff --git a/tests/index_write_parameters_builder_tests.cpp b/tests/index_write_parameters_builder_tests.cpp index acd5e2227..0aa798da8 100644 --- a/tests/index_write_parameters_builder_tests.cpp +++ b/tests/index_write_parameters_builder_tests.cpp @@ -14,7 +14,6 @@ BOOST_AUTO_TEST_CASE(test_build) float alpha = (float)rand(); uint32_t filter_list_size = rand(); uint32_t max_occlusion_size = rand(); - uint32_t num_frozen_points = rand(); bool saturate_graph = true; diskann::IndexWriteParametersBuilder builder(search_list_size, max_degree); @@ -22,7 +21,6 @@ BOOST_AUTO_TEST_CASE(test_build) builder.with_alpha(alpha) .with_filter_list_size(filter_list_size) .with_max_occlusion_size(max_occlusion_size) - .with_num_frozen_points(num_frozen_points) .with_num_threads(0) .with_saturate_graph(saturate_graph); @@ -34,7 +32,6 @@ BOOST_AUTO_TEST_CASE(test_build) BOOST_TEST(alpha == parameters.alpha); BOOST_TEST(filter_list_size == parameters.filter_list_size); BOOST_TEST(max_occlusion_size == parameters.max_occlusion_size); - BOOST_TEST(num_frozen_points == parameters.num_frozen_points); BOOST_TEST(saturate_graph == parameters.saturate_graph); BOOST_TEST(parameters.num_threads > (uint32_t)0); @@ -43,8 +40,7 @@ BOOST_AUTO_TEST_CASE(test_build) { uint32_t num_threads = rand() + 1; saturate_graph = false; - builder.with_num_threads(num_threads) - .with_saturate_graph(saturate_graph); + builder.with_num_threads(num_threads).with_saturate_graph(saturate_graph); auto parameters = builder.build(); @@ -53,7 +49,6 @@ BOOST_AUTO_TEST_CASE(test_build) BOOST_TEST(alpha == parameters.alpha); BOOST_TEST(filter_list_size == parameters.filter_list_size); BOOST_TEST(max_occlusion_size == parameters.max_occlusion_size); - BOOST_TEST(num_frozen_points == parameters.num_frozen_points); BOOST_TEST(saturate_graph == parameters.saturate_graph); BOOST_TEST(num_threads == parameters.num_threads); diff 
--git a/unit_tester.sh b/unit_tester.sh index d19e62575..1ef96c025 100755 --- a/unit_tester.sh +++ b/unit_tester.sh @@ -43,29 +43,29 @@ while IFS= read -r line; do BUDGETBUILD=`bc <<< "scale=4; 0.0001 + ${FILESIZE}/(5*1024*1024*1024)"` BUDGETSERVE=`bc <<< "scale=4; 0.0001 + ${FILESIZE}/(10*1024*1024*1024)"` echo "=============================================================================================================================================" - echo "Running tests on ${DATASET} dataset, ${TYPE} datatype, $METRIC metric, ${BUDGETBUILD} GiB and ${BUDGETSERVE} GiB build and serve budget" + echo "Running apps on ${DATASET} dataset, ${TYPE} datatype, $METRIC metric, ${BUDGETBUILD} GiB and ${BUDGETSERVE} GiB build and serve budget" echo "=============================================================================================================================================" rm ${DISK}_* #echo "Going to run test on ${BASE} base, ${QUERY} query, ${TYPE} datatype, ${METRIC} metric, saving gt at ${GT}" echo "Computing Groundtruth" - #${BUILD_FOLDER}/tests/utils/compute_groundtruth ${TYPE} ${BASE} ${QUERY} 30 ${GT} ${METRIC} > /dev/null - ${BUILD_FOLDER}/tests/utils/compute_groundtruth --data_type ${TYPE} --base_file ${BASE} --query_file ${QUERY} --K 30 --gt_file ${GT} --dist_fn ${METRIC} > /dev/null + #${BUILD_FOLDER}/apps/utils/compute_groundtruth ${TYPE} ${BASE} ${QUERY} 30 ${GT} ${METRIC} > /dev/null + ${BUILD_FOLDER}/apps/utils/compute_groundtruth --data_type ${TYPE} --base_file ${BASE} --query_file ${QUERY} --K 30 --gt_file ${GT} --dist_fn ${METRIC} > /dev/null echo "Building Mem Index" -# /usr/bin/time ${BUILD_FOLDER}/tests/build_memory_index ${TYPE} ${METRIC} ${BASE} ${MEM} 32 50 1.2 0 > ${MBLOG} - /usr/bin/time ${BUILD_FOLDER}/tests/build_memory_index --data_type ${TYPE} --dist_fn ${METRIC} --data_path ${BASE} --index_path_prefix ${MEM} -R 32 -L 50 --alpha 1.2 -T 0 > ${MBLOG} +# /usr/bin/time ${BUILD_FOLDER}/apps/build_memory_index ${TYPE} ${METRIC} 
${BASE} ${MEM} 32 50 1.2 0 > ${MBLOG} + /usr/bin/time ${BUILD_FOLDER}/apps/build_memory_index --data_type ${TYPE} --dist_fn ${METRIC} --data_path ${BASE} --index_path_prefix ${MEM} -R 32 -L 50 --alpha 1.2 -T 0 > ${MBLOG} awk '/^Degree/' ${MBLOG} awk '/^Indexing/' ${MBLOG} echo "Searching Mem Index" - ${BUILD_FOLDER}/tests/search_memory_index --data_type ${TYPE} --dist_fn ${METRIC} --index_path_prefix ${MEM} -T 16 --query_file ${QUERY} --gt_file ${GT} -K 10 --result_path /tmp/res -L 10 20 30 40 50 60 70 80 90 100 > ${MSLOG} + ${BUILD_FOLDER}/apps/search_memory_index --data_type ${TYPE} --dist_fn ${METRIC} --index_path_prefix ${MEM} -T 16 --query_file ${QUERY} --gt_file ${GT} -K 10 --result_path /tmp/res -L 10 20 30 40 50 60 70 80 90 100 > ${MSLOG} awk '/===/{x=NR+10}(NR<=x){print}' ${MSLOG} echo "Building Disk Index" - ${BUILD_FOLDER}/tests/build_disk_index --data_type ${TYPE} --dist_fn ${METRIC} --data_path ${BASE} --index_path_prefix ${DISK} -R 32 -L 50 -B ${BUDGETSERVE} -M ${BUDGETBUILD} -T 32 --PQ_disk_bytes 0 > ${DBLOG} + ${BUILD_FOLDER}/apps/build_disk_index --data_type ${TYPE} --dist_fn ${METRIC} --data_path ${BASE} --index_path_prefix ${DISK} -R 32 -L 50 -B ${BUDGETSERVE} -M ${BUDGETBUILD} -T 32 --PQ_disk_bytes 0 > ${DBLOG} awk '/^Compressing/' ${DBLOG} echo "#shards in disk index" awk '/^Indexing/' ${DBLOG} echo "Searching Disk Index" - ${BUILD_FOLDER}/tests/search_disk_index --data_type ${TYPE} --dist_fn ${METRIC} --index_path_prefix ${DISK} --num_nodes_to_cache 10000 -T 10 -W 4 --query_file ${QUERY} --gt_file ${GT} -K 10 --result_path /tmp/res -L 20 40 60 80 100 > ${DSLOG} + ${BUILD_FOLDER}/apps/search_disk_index --data_type ${TYPE} --dist_fn ${METRIC} --index_path_prefix ${DISK} --num_nodes_to_cache 10000 -T 10 -W 4 --query_file ${QUERY} --gt_file ${GT} -K 10 --result_path /tmp/res -L 20 40 60 80 100 > ${DSLOG} echo "# shards used during index construction:" awk '/medoids/{x=NR+1}(NR<=x){print}' ${DSLOG} awk '/===/{x=NR+10}(NR<=x){print}' ${DSLOG} diff 
--git a/workflows/SSD_index.md b/workflows/SSD_index.md index f86856796..e95ece97f 100644 --- a/workflows/SSD_index.md +++ b/workflows/SSD_index.md @@ -11,7 +11,7 @@ The arguments are as follows: 3. **--data_file**: The input data over which to build an index, in .bin format. The first 4 bytes represent number of points as an integer. The next 4 bytes represent the dimension of data as an integer. The following `n*d*sizeof(T)` bytes contain the contents of the data one data point in time. `sizeof(T)` is 1 for byte indices, and 4 for float indices. This will be read by the program as int8_t for signed indices, uint8_t for unsigned indices or float for float indices. 4. **--index_path_prefix**: the index will span a few files, all beginning with the specified prefix path. For example, if you provide `~/index_test` as the prefix path, build generates files such as `~/index_test_pq_pivots.bin, ~/index_test_pq_compressed.bin, ~/index_test_disk.index, ...`. There may be between 8 and 10 files generated with this prefix depending on how the index is constructed. 5. **-R (--max_degree)** (default is 64): the degree of the graph index, typically between 60 and 150. Larger R will result in larger indices and longer indexing times, but better search quality. -6. **-L (--Lbuild)** (default is 100): the size of search listduring index build. Typical values are between 75 to 200. Larger values will take more time to build but result in indices that provide higher recall for the same search complexity. Use a value for L value that is at least the value of R unless you need to build indices really quickly and can somewhat compromise on quality. +6. **-L (--Lbuild)** (default is 100): the size of search list during index build. Typical values are between 75 to 200. Larger values will take more time to build but result in indices that provide higher recall for the same search complexity. 
Use a value of L that is at least the value of R, unless you need to build indices really quickly and can somewhat compromise on quality. 7. **-B (--search_DRAM_budget)**: bound on the memory footprint of the index at search time in GB. Once built, the index will use up only the specified RAM limit, the rest will reside on disk. This will dictate how aggressively we compress the data vectors to store in memory. Larger will yield better performance at search time. For an n point index, to use b byte PQ compressed representation in memory, use `B = ((n * b) / 2^30 + (250000*(4*R + sizeof(T)*ndim)) / 2^30)`. The second term in the summation is to allow some buffer for caching about 250,000 nodes from the graph in memory while serving. If you are not sure about this term, add 0.25GB to the first term. 8. **-M (--build_DRAM_budget)**: Limit on the memory allowed for building the index in GB. If you specify a value less than what is required to build the index in one pass, the index is built using a divide and conquer approach so that sub-graphs will fit in the RAM budget. The sub-graphs are overlayed to build the overall index. This approach can be up to 1.5 times slower than building the index in one shot. Allocate as much memory as your RAM allows. 9. **-T (--num_threads)** (default is to get_omp_num_procs()): number of threads used by the index build process. Since the code is highly parallel, the indexing time improves almost linearly with the number of threads (subject to the cores available on the machine and DRAM bandwidth). @@ -34,7 +34,7 @@ The arguments are as follows: 8. **--gt_file**: The ground truth file for the queries in arg (7) and the data file used in index construction. 
The binary file must start with *n*, the number of queries (4 bytes), followed by *d*, the number of ground truth elements per query (4 bytes), followed by `n*d` entries per query representing the d closest IDs per query in integer format, followed by `n*d` entries representing the corresponding distances (float). Total file size is `8 + 4*n*d + 4*n*d` bytes. The groundtruth file, if not available, can be calculated using the program `apps/utils/compute_groundtruth`. Use "null" if you do not have this file and if you do not want to compute recall. 9. **K**: search for *K* neighbors and measure *K*-recall@*K*, meaning the intersection between the retrieved top-*K* nearest neighbors and ground truth *K* nearest neighbors. 10. **result_output_prefix**: Search results will be stored in files with specified prefix, in bin format. -11. **-L (--search_list)**: A list of search_list sizes to perform search with. Larger parameters will result in slower latencies, but higher accuracies. Must be atleast the value of *K* in arg (9). +11. **-L (--search_list)**: A list of search_list sizes to perform search with. Larger parameters will result in slower latencies, but higher accuracies. Must be at least the value of *K* in arg (9). Example with BIGANN: @@ -60,7 +60,7 @@ Now build and search the index and measure the recall using ground truth compute ./apps/search_disk_index --data_type float --dist_fn l2 --index_path_prefix data/sift/disk_index_sift_learn_R32_L50_A1.2 --query_file data/sift/sift_query.fbin --gt_file data/sift/sift_query_learn_gt100 -K 10 -L 10 20 30 40 50 100 --result_path data/sift/res --num_nodes_to_cache 10000 ``` -The search might be slower on machine with remote SSDs. The output lists the quer throughput, the mean and 99.9pc latency in microseconds and mean number of 4KB IOs to disk for each `L` parameter provided. +The search might be slower on machines with remote SSDs. 
The output lists the query throughput, the mean and 99.9pc latency in microseconds, and the mean number of 4KB IOs to disk for each `L` parameter provided. ``` L Beamwidth QPS Mean Latency 99.9 Latency Mean IOs CPU (s) Recall@10 diff --git a/workflows/dynamic_index.md b/workflows/dynamic_index.md index ca3bfbf68..17c3fb3bf 100644 --- a/workflows/dynamic_index.md +++ b/workflows/dynamic_index.md @@ -22,6 +22,17 @@ The program then simultaneously inserts newer points drawn from the file and del in chunks of `consolidate_interval` points so that the number of active points in the index is approximately `active_window`. It terminates when the end of the data file is reached, and the final index has `active_window + consolidate_interval` number of points. +The streaming index also supports filters: use the `insert_point` function overloads to insert points either as before (without labels) or with labels. +Additional options to support this have been added to `apps/test_streaming_scenario`; please refer to the program arguments for more details. + +--- +> Note +* The index does not support mixing labeled and unlabeled points: either all points have labels or none do. +* You can also search an index built with filters without specifying a filter. + +> WARNING: Deleting points from a filtered build may degrade index quality and reduce recall. +--- + `apps/test_insert_deletes_consolidate` to try inserting, lazy deletes and consolidate_delete --------------------------------------------------------------------------------------------- @@ -63,7 +74,13 @@ The arguments are as follows: 12. **--consolidate_interval**: Granularity at which insert and delete functions are called. 13. **--start_point_norm**: Set the starting node to a random point on a sphere of this radius. A reasonable choice is to set this to the average norm of the data stream. +**To build with filters, add these optional parameters:** +14.
**--label_file**: Filter data for each point, in `.txt` format. Line `i` of the file consists of a comma-separated list of labels corresponding to point `i` in the file passed via `--data_file`. +15. **--FilteredLbuild**: If building a filtered index, we maintain a separate search list from the one provided by `--Lbuild/-L`. +16. **--num_start_points**: The number of frozen points; it should be greater than the number of unique labels. +17. **--universal_label**: Optionally, the label data may contain a special "universal" label. A point with the universal label can be matched against a query with any label. Note that if a point has the universal label, then the filter data must only have the universal label on the corresponding line. +18. **--label_type**: Optionally, the type used to store labels: either `uint` or `short`. Defaults to `uint`. To search the generated index, use the `apps/search_memory_index` program: --------------------------------------------------------------------------- @@ -83,6 +100,9 @@ The arguments are as follows: 10. **--dynamic** (default false): whether the index being searched is dynamic or not. 11. **--tags** (default false): whether to search with tags. This should be used if point *i* in the ground truth file does not correspond to the point in the *i*th position in the loaded index. +**To search with filters, add:** + +12. **--filter_label**: Filter for each query. For each query, a search is performed with this filter. Example with BIGANN: -------------------- @@ -126,7 +146,13 @@ gt_file=data/sift/gt100_learn-conc-${deletes}-${inserts} are inserted, start deleting the first 10000 points while inserting points 40000--50000. Then delete points 10000--20000 while inserting points 50000--60000, and so on until the index is left with points 60000-100000. + +Generate labels for a filtered build as follows; this generates 50 unique Zipf-distributed labels for a 100K-point dataset.
+``` +~/DiskANN/build/apps/utils/generate_synthetic_labels --num_labels 50 --num_points 100000 --output_file data/zipf_labels_50_100K.txt --distribution_type zipf ``` + +```bash type='float' data='data/sift/sift_learn.fbin' query='data/sift/sift_query.fbin' @@ -139,8 +165,23 @@ active=20000 cons_int=10000 index=${index_prefix}.after-streaming-act${active}-cons${cons_int}-max${inserts} gt=data/sift/gt100_learn-act${active}-cons${cons_int}-max${inserts} +filter_label=1 + +## filter options +universal_label='0' +label_file='data/zipf_labels_50_100K.txt' +num_start_points=50 +gt_filtered=data/sift/gt100_learn-act${active}-cons${cons_int}-max${inserts}_wlabel_${filter_label} + +# Without filters (build and search) ./apps/test_streaming_scenario --data_type ${type} --dist_fn l2 --data_path ${data} --index_path_prefix ${index_prefix} -R 64 -L 600 --alpha 1.2 --insert_threads ${ins_thr} --consolidate_threads ${cons_thr} --max_points_to_insert ${inserts} --active_window ${active} --consolidate_interval ${cons_int} --start_point_norm 508; ./apps/utils/compute_groundtruth --data_type ${type} --dist_fn l2 --base_file ${index}.data --query_file ${query} --K 100 --gt_file ${gt} --tags_file ${index}.tags ./apps/search_memory_index --data_type ${type} --dist_fn l2 --index_path_prefix ${index} --result_path ${result} --query_file ${query} --gt_file ${gt} -K 10 -L 20 40 60 80 100 -T 64 --dynamic true --tags 1 -``` \ No newline at end of file + +# With filters (build and search) + +./apps/test_streaming_scenario --data_type ${type} --num_start_points ${num_start_points} --label_file ${label_file} --universal_label ${universal_label} --dist_fn l2 --data_path ${data} --index_path_prefix ${index_prefix} -R 64 -L 600 --alpha 1.2 --insert_threads ${ins_thr} --consolidate_threads ${cons_thr} --max_points_to_insert ${inserts} --active_window ${active} --consolidate_interval ${cons_int} --start_point_norm 508; +./apps/utils/compute_groundtruth_for_filters --data_type ${type} --dist_fn
l2 --base_file ${index}.data --query_file ${query} --K 100 --gt_file ${gt_filtered} --label_file ${label_file} --universal_label ${universal_label} --filter_label ${filter_label} +./apps/search_memory_index --data_type ${type} --filter_label ${filter_label} --dist_fn l2 --index_path_prefix ${index} --result_path ${result} --query_file ${query} --gt_file ${gt_filtered} -K 10 -L 20 40 60 80 100 -T 64 --dynamic true --tags 1 +``` diff --git a/workflows/filtered_ssd_index.md b/workflows/filtered_ssd_index.md index 272100e6d..7457d8c9b 100644 --- a/workflows/filtered_ssd_index.md +++ b/workflows/filtered_ssd_index.md @@ -21,7 +21,7 @@ To generate an SSD-friendly index, use the `apps/build_disk_index` program. 11. **--build_PQ_bytes** (default is 0): Set to a positive value less than the dimensionality of the data to enable faster index build with PQ based distance comparisons. 12. **--use_opq**: use the flag to use OPQ rather than PQ compression. OPQ is more space efficient for some high dimensional datasets, but also needs a bit more build time. 13. **--label_file**: Filter data for each point, in `.txt` format. Line `i` of the file consists of a comma-separated list of filters corresponding to point `i` in the file passed via `--data_file`. -14. **--universal_label**: Optionally, the the filter data may contain a "wild-card" filter corresponding to all filters. This is referred to as a universal label. Note that if a point has the universal label, then the filter data must only have the universal label on the line corresponding to said point. +14. **--universal_label**: Optionally, the label data may contain a special "universal" label. A point with the universal label can be matched against a query with any label. Note that if a point has the universal label, then the filter data must only have the universal label on the corresponding line. 15. **--FilteredLbuild**: If building a filtered index, we maintain a separate search list from the one provided by `--Lbuild`. 16.
**--filter_threshold**: Threshold used to break up dense points: a point whose number of labels exceeds this threshold is split internally into duplicate graph nodes, each carrying at most this many labels. The default value is zero, in which case dense points are not broken up.
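As an illustration of the `--label_file` format described above (and of how the universal label behaves at query time), here is a small sketch; the file path, point count, and label values are invented for the example, with `0` playing the role of the universal label:

```shell
# Hypothetical 5-point label file: line i lists the comma-separated labels
# of point i. The point on line 4 carries only the universal label '0',
# so it matches queries with any filter.
cat > /tmp/toy_labels.txt <<'EOF'
1,2
2
1,3
0
3,2
EOF
# Points eligible for a query filtered on label '3': those whose line
# contains label 3 or the universal label 0 (here: lines 3, 4, and 5).
grep -n -E '(^|,)(3|0)(,|$)' /tmp/toy_labels.txt
```

Note that the universal-label point's line contains only `0`; mixing the universal label with other labels on the same line is not allowed, per the `--universal_label` description above.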