Reduce repo indexer disk usage #3452

Merged (1 commit) on Feb 5, 2018

@ethantkoenig ethantkoenig commented Feb 3, 2018

Reduces disk usage of the repo (i.e. code) indexer:

I saw a roughly 3x (1.5GB -> 500MB) reduction in disk usage as a result of these changes (of course, mileage will vary depending on what type of text/code you are indexing).

Also introduces migration-like versions to the issue and repo indexers to facilitate changes (which will typically require rebuilding the index).

Yes, this PR shamelessly pulls in https://github.com/ethantkoenig/rupture as a dependency to facilitate tracking indexer versions and migrations; I am aware of no other alternatives.
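
To illustrate the idea of migration-like index versions, here is a minimal sketch using only the Go standard library. It is not the rupture API or the code in this PR: the `indexMetadata` type, the `index_meta.json` file name, and the `readVersion` and `prepareIndexDir` helpers are all hypothetical. The assumption is that a version number is stored next to the index, and that a missing or mismatched version means the index directory is discarded and must be rebuilt.

```go
package main

import (
	"encoding/json"
	"errors"
	"io/fs"
	"os"
	"path/filepath"
)

// indexMetadata is a hypothetical on-disk record of the index format version.
type indexMetadata struct {
	Version int `json:"version"`
}

// metadataFile is an assumed file name, chosen only for this sketch.
const metadataFile = "index_meta.json"

// readVersion returns the version stored in dir. The boolean is false
// when no metadata file exists (for example, on first start-up).
func readVersion(dir string) (int, bool, error) {
	data, err := os.ReadFile(filepath.Join(dir, metadataFile))
	if errors.Is(err, fs.ErrNotExist) {
		return 0, false, nil
	}
	if err != nil {
		return 0, false, err
	}
	var meta indexMetadata
	if err := json.Unmarshal(data, &meta); err != nil {
		return 0, false, err
	}
	return meta.Version, true, nil
}

// prepareIndexDir makes sure dir is ready to hold an index of the wanted
// version. It returns true when the old contents were missing or stale
// and have been wiped, so the caller must repopulate the index.
func prepareIndexDir(dir string, wanted int) (bool, error) {
	stored, ok, err := readVersion(dir)
	if err != nil {
		return false, err
	}
	if ok && stored == wanted {
		// The existing index is already in the expected format.
		return false, nil
	}
	// Missing or outdated: remove everything and record the new version.
	if err := os.RemoveAll(dir); err != nil {
		return false, err
	}
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return false, err
	}
	data, err := json.Marshal(indexMetadata{Version: wanted})
	if err != nil {
		return false, err
	}
	if err := os.WriteFile(filepath.Join(dir, metadataFile), data, 0o644); err != nil {
		return false, err
	}
	return true, nil
}
```

With a scheme like this, changing the index mapping only requires bumping the version constant passed to `prepareIndexDir`; existing installations would then rebuild their index on the next start.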

@@ -70,9 +73,15 @@ func createIssueIndexer() error {
mapping := bleve.NewIndexMapping()
docMapping := bleve.NewDocumentMapping()

numericFieldMapping := bleve.NewNumericFieldMapping()
numericFieldMapping.Store = false
numericFieldMapping.IncludeInAll = false
docMapping.AddFieldMappingsAt("RepoID", bleve.NewNumericFieldMapping())

Member

Should this use numericFieldMapping?

Member Author

Yes, fixed

@tboerger tboerger added the lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. label Feb 3, 2018
@lafriks lafriks added the type/enhancement An improvement of existing functionality label Feb 3, 2018
@lafriks lafriks added this to the 1.5.0 milestone Feb 3, 2018
@lafriks lafriks added the type/changelog Adds the changelog for a new Gitea version label Feb 3, 2018
@ethantkoenig ethantkoenig force-pushed the repo_indexer_disk_usage branch 3 times, most recently from 43870d9 to c90f7af Compare February 4, 2018 03:57
codecov-io commented Feb 4, 2018

Codecov Report

Merging #3452 into master will decrease coverage by <.01%.
The diff coverage is 54.32%.

@@            Coverage Diff             @@
##           master    #3452      +/-   ##
==========================================
- Coverage   35.67%   35.67%   -0.01%     
==========================================
  Files         281      281              
  Lines       40697    40671      -26     
==========================================
- Hits        14519    14508      -11     
+ Misses      24031    24020      -11     
+ Partials     2147     2143       -4
| Impacted Files | Coverage Δ | |
|---|---|---|
| models/issue_indexer.go | 67.81% <0%> (ø) | ⬆️ |
| modules/indexer/indexer.go | 63.26% <40.9%> (-14.24%) | ⬇️ |
| models/repo_indexer.go | 43.85% <50%> (-3.58%) | ⬇️ |
| modules/indexer/repo.go | 63.47% <53.33%> (+2.6%) | ⬆️ |
| modules/indexer/issue.go | 67.56% <78.94%> (+8.35%) | ⬆️ |
| models/repo_list.go | 65.62% <0%> (-1.57%) | ⬇️ |
| models/error.go | 32.73% <0%> (-0.4%) | ⬇️ |
| models/repo.go | 42.98% <0%> (+0.18%) | ⬆️ |

... and 2 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 283e87d...55a3db8. Read the comment docs.

@tboerger tboerger added lgtm/need 1 This PR needs approval from one additional maintainer to be merged. and removed lgtm/need 2 This PR needs two approvals by maintainers to be considered for merging. labels Feb 4, 2018
"strconv"

"code.gitea.io/gitea/modules/setting"

Member

Add empty line

@ethantkoenig
Member Author

@appleboy Done

@tboerger tboerger added lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore. and removed lgtm/need 1 This PR needs approval from one additional maintainer to be merged. labels Feb 5, 2018
lafriks commented Feb 5, 2018

@ethantkoenig please resolve conflicts

@ethantkoenig
Member Author

@lafriks Resolved

@lafriks lafriks merged commit a89592d into go-gitea:master Feb 5, 2018
@ethantkoenig ethantkoenig deleted the repo_indexer_disk_usage branch February 21, 2018 05:50
aswild added a commit to aswild/gitea that referenced this pull request Jul 6, 2018
* SECURITY
  * Limit uploaded avatar image-size to 4096x3072 by default (go-gitea#4353)
  * Do not allow to reuse TOTP passcode (go-gitea#3878)
* FEATURE
  * Add cli commands to regen hooks & keys (go-gitea#3979)
  * Add support for FIDO U2F (go-gitea#3971)
  * Added user language setting (go-gitea#3875)
  * LDAP Public SSH Keys synchronization (go-gitea#1844)
  * Add topic support (go-gitea#3711)
  * Multiple assignees (go-gitea#3705)
  * Add protected branch whitelists for merging (go-gitea#3689)
  * Global code search support (go-gitea#3664)
  * Add label descriptions (go-gitea#3662)
  * Add issue search via API (go-gitea#3612)
  * Add repository setting to enable/disable health checks (go-gitea#3607)
  * Emoji Autocomplete (go-gitea#3433)
  * Implements generator cli for secrets (go-gitea#3531)
* ENHANCEMENT
  * Add more webhooks support and refactor webhook templates directory (go-gitea#3929)
  * Add new option to allow only OAuth2/OpenID user registration (go-gitea#3910)
  * Add option to use paged LDAP search when synchronizing users (go-gitea#3895)
  * Symlink icons (go-gitea#1416)
  * Improve release page UI (go-gitea#3693)
  * Add admin dashboard option to run health checks (go-gitea#3606)
  * Add branch link in branch list (go-gitea#3576)
  * Reduce sql query times in retrieveFeeds (go-gitea#3547)
  * Option to enable or disable swagger endpoints (go-gitea#3502)
  * Add missing licenses (go-gitea#3497)
  * Reduce repo indexer disk usage (go-gitea#3452)
  * Enable caching on assets and avatars (go-gitea#3376)
  * Add repository search ordered by stars/forks. Forks column in admin repo list (go-gitea#3969)
  * Add Environment Variables to Docker template (go-gitea#4012)
  * LFS: make HTTP auth period configurable (go-gitea#4035)
  * Add config path as an optional flag when changing pass via CLI (go-gitea#4184)
  * Refactor User Settings sections (go-gitea#3900)
  * Allow square brackets in external issue patterns (go-gitea#3408)
  * Add Attachment API (go-gitea#3478)
  * Add EnableTimetracking option to app settings (go-gitea#3719)
  * Add config option to enable or disable log executed SQL (go-gitea#3726)
  * Shows total tracked time in issue and milestone list (go-gitea#3341)
* TRANSLATION
  * Improve English grammar and consistency (go-gitea#3614)
* DEPLOYMENT
  * Allow Gitea to run as different USER in Docker (go-gitea#3961)
  * Provide compressed release binaries (go-gitea#3991)
  * Sign release binaries (go-gitea#4188)
@go-gitea go-gitea locked and limited conversation to collaborators Nov 23, 2020
@delvh delvh removed the type/changelog Adds the changelog for a new Gitea version label Oct 7, 2023
Labels
lgtm/done This PR has enough approvals to get merged. There are no important open reservations anymore.
type/enhancement An improvement of existing functionality
6 participants