Strange Single Node/Multi Node Insert Performance #23976
I've done some quick testing this morning against different versions (operations per second here are jobs per second). Unfortunately I'm getting worse results with v2 in single-node mode. I can run the 3-node cluster on v2 soon, however. Single Node Postgres:
Single Node Cockroach v1.1.6:
Single Node Cockroach v2.0-beta.20180312:
Here's an updated script that outputs operations per second and can be run against Postgres too.

Updated Go Script:

package main
import (
"context"
"fmt"
"os"
"strconv"
"sync"
"time"
otgorm "github.com/echo-health/opentracing-gorm"
"github.com/jinzhu/gorm"
_ "github.com/jinzhu/gorm/dialects/postgres"
_ "github.com/mattes/migrate/source/file"
"github.com/prometheus/common/log"
uuid "github.com/satori/go.uuid"
"github.com/segmentio/ksuid"
"github.com/wawandco/fako"
)
var (
defaultConnString = "host=localhost user=postgres dbname=postgres sslmode=disable"
defaultMaxConn = 20
defaultPoolName = "default"
lock = &sync.Mutex{}
conns = map[string]*gorm.DB{}
)
const schema = `
DROP TABLE IF EXISTS users, registered_addresses, user_relationships, user_exemptions;
CREATE TABLE IF NOT EXISTS users (
id text PRIMARY KEY,
firebase_id text NOT NULL,
practice_id text NOT NULL,
title text NOT NULL,
first_name text NOT NULL,
last_name text NOT NULL,
gender text NOT NULL,
date_of_birth timestamp NOT NULL,
create_time timestamp DEFAULT now(),
update_time timestamp DEFAULT now()
);
CREATE UNIQUE INDEX IF NOT EXISTS idx_firebase_id ON users (firebase_id);
CREATE TABLE IF NOT EXISTS registered_addresses (
id text PRIMARY KEY,
user_id text NOT NULL REFERENCES users (id),
line1 text NOT NULL,
line2 text,
city text NOT NULL,
postcode text NOT NULL,
country text NOT NULL,
lat numeric NOT NULL,
lng numeric NOT NULL
);
CREATE UNIQUE INDEX IF NOT EXISTS idx_registered_addresses_pid ON registered_addresses (user_id);
CREATE TABLE IF NOT EXISTS user_relationships (
user_id text NOT NULL REFERENCES users (id),
account_id text NOT NULL,
relationship text NOT NULL,
create_time timestamp DEFAULT now(),
update_time timestamp DEFAULT now(),
PRIMARY KEY(user_id, account_id)
);
CREATE INDEX IF NOT EXISTS idx_patients_accounts_id ON user_relationships (account_id);
CREATE TABLE IF NOT EXISTS user_exemptions (
id text PRIMARY KEY,
user_id text NOT NULL REFERENCES users (id),
code text NOT NULL,
status text NOT NULL,
url text NOT NULL,
expiry timestamp DEFAULT now(),
create_time timestamp DEFAULT now(),
update_time timestamp DEFAULT now()
);
CREATE INDEX IF NOT EXISTS idx_exemptions_pid ON user_exemptions (user_id);
`
type User struct {
ID string
PracticeID string
FirebaseID string
Title string `fako:"title"`
FirstName string `fako:"first_name"`
LastName string `fako:"last_name"`
Gender string
DateOfBirth time.Time
CreatedAt time.Time `gorm:"column:create_time"`
UpdatedAt time.Time `gorm:"column:update_time"`
Relationships []*UserRelationship
Exemptions []*UserExemption
RegisteredAddress RegisteredAddress
}
type UserRelationship struct {
UserID string `gorm:"primary_key"`
AccountID string `gorm:"primary_key" fako:"simple_password"`
Relationship string `fako:"first_name"`
CreatedAt time.Time `gorm:"column:create_time"`
UpdatedAt time.Time `gorm:"column:update_time"`
}
type RegisteredAddress struct {
ID string
UserID string
Line1 string `fako:"street_address"`
Line2 string `fako:"street"`
City string `fako:"city"`
Postcode string `fako:"zip"`
Country string `fako:"country"`
Lat float64
Lng float64
}
type UserExemption struct {
ID string
UserID string
URL string `fako:"characters"`
Code string
Status string
Expiry *time.Time
CreatedAt time.Time `gorm:"column:create_time"`
UpdatedAt time.Time `gorm:"column:update_time"`
}
func UserID() string {
if os.Getenv("USE_UUID") != "" {
u, _ := uuid.NewV4()
return u.String()
}
return "user_" + ksuid.New().String()
}
func ExemptionID() string {
if os.Getenv("USE_UUID") != "" {
u, _ := uuid.NewV4()
return u.String()
}
return "user_ex_" + ksuid.New().String()
}
func NominationRequestID() string {
if os.Getenv("USE_UUID") != "" {
u, _ := uuid.NewV4()
return u.String()
}
return "user_nom_" + ksuid.New().String()
}
func AddrID() string {
if os.Getenv("USE_UUID") != "" {
u, _ := uuid.NewV4()
return u.String()
}
return "addr_" + ksuid.New().String()
}
func (p *User) BeforeCreate(scope *gorm.Scope) error {
if p.ID == "" {
p.ID = UserID()
scope.SetColumn("ID", p.ID)
}
if p.FirebaseID == "" {
p.FirebaseID = p.ID
}
return nil
}
func (p *RegisteredAddress) BeforeCreate(scope *gorm.Scope) error {
if p.ID == "" {
p.ID = AddrID()
return scope.SetColumn("ID", p.ID)
}
return nil
}
func (p *UserExemption) BeforeCreate(scope *gorm.Scope) error {
if p.ID == "" {
p.ID = ExemptionID()
return scope.SetColumn("ID", p.ID)
}
return nil
}
func GetDB(ctx context.Context) (*gorm.DB, error) {
return GetDBFromPool(ctx, defaultPoolName)
}
func GetDBFromPool(ctx context.Context, pool string) (*gorm.DB, error) {
lock.Lock()
defer lock.Unlock()
if conns[pool] != nil {
db := otgorm.SetSpanToGorm(ctx, conns[pool])
return db, nil
}
connString := defaultConnString
if os.Getenv("POSTGRES_URL") != "" {
connString = os.Getenv("POSTGRES_URL")
}
log.Debugf("Connecting to PG at %s for pool %s", connString, pool)
var err error
conn, err := connect(connString)
if err != nil {
db := otgorm.SetSpanToGorm(ctx, conn)
return db, err
}
conns[pool] = conn
return conns[pool], err
}
func connect(connString string) (*gorm.DB, error) {
driver := "postgres"
if os.Getenv("CLOUDSQL") == "true" {
driver = "cloudsqlpostgres"
}
conn, err := gorm.Open(driver, connString)
if err != nil {
return nil, err
}
maxConns := defaultMaxConn
if os.Getenv("DB_MAX_CONNS") != "" {
i, err := strconv.Atoi(os.Getenv("DB_MAX_CONNS"))
if err != nil {
return nil, err
}
maxConns = i
}
conn.DB().SetMaxOpenConns(maxConns)
if os.Getenv("DEBUG_SQL") == "true" {
conn.LogMode(true)
}
otgorm.AddGormCallbacks(conn)
conn = conn.Set("gorm:auto_preload", true)
return conn, nil
}
func Get(ctx context.Context, id string) (*User, error) {
db, err := GetDB(ctx)
if err != nil {
return nil, err
}
var p User
err = db.Where("id = ? OR firebase_id = ?", id, id).
First(&p).Error
return &p, err
}
func Create(ctx context.Context, new *User) error {
db, err := GetDBFromPool(ctx, "create")
if err != nil {
return err
}
old, err := Get(ctx, new.FirebaseID)
if err == gorm.ErrRecordNotFound {
return db.Create(new).Error
}
if err != nil {
return err
}
tx := db.Begin()
// first delete any relationships that have been removed from firebase
for _, rel := range old.Relationships {
inFirebase := false
for _, a := range new.Relationships {
if a.AccountID == rel.AccountID {
inFirebase = true
}
}
if !inFirebase {
if err = tx.Delete(rel).Error; err != nil {
tx.Rollback()
return err
}
}
}
new.ID = old.ID
new.RegisteredAddress.ID = old.RegisteredAddress.ID
err = tx.Save(&new).Error
if err != nil {
tx.Rollback()
return err
}
return tx.Commit().Error
}
func worker(id int, jobs <-chan *User) {
// log.Infof("Booting worker %d", id)
for j := range jobs {
err := Create(context.Background(), j)
if err != nil {
log.Fatal(err)
}
}
}
func main() {
db, err := GetDB(context.Background())
if err != nil {
log.Fatal(err)
}
err = db.Exec(schema).Error
if err != nil {
log.Fatal(err)
}
numberOfInserts := 10000
jobs := make(chan *User)
for w := 1; w <= 20; w++ {
go worker(w, jobs)
}
ops := 0
lastOps := 0
go func() {
for i := 0; i < numberOfInserts; i++ {
u := &User{}
fako.Fill(u)
ra := &RegisteredAddress{}
fako.Fill(ra)
u.RegisteredAddress = *ra
for i := 0; i < 2; i++ {
r := &UserRelationship{}
fako.Fill(r)
u.Relationships = append(u.Relationships, r)
}
for i := 0; i < 2; i++ {
r := &UserExemption{}
fako.Fill(r)
u.Exemptions = append(u.Exemptions, r)
}
jobs <- u
ops++
lastOps++
}
}()
now := time.Now()
lastNow := time.Now()
ticker := time.NewTicker(1 * time.Second)
for range ticker.C {
elapsed := now.Sub(lastNow)
fmt.Printf("opts = %+v\n", float64(lastOps-ops)/elapsed.Seconds())
lastNow = time.Now()
lastOps = 0
}
}
Thanks for the report, @arbarlow. I haven't dug into your load generator yet, but there absolutely should not be a decrease in performance between v1.1.6 and v2.0-beta. In fact, you should generally see a performance increase when moving to v2.0. We'll try to get someone to dig into this next week to understand what is going on.
Thanks for the report and provided script @arbarlow! I was able to get it up and running without any issues. The first thing I noticed was that the reported throughput seemed to be steadily dropping while running the loadgen. I realized that this was due to a small logic error that is fixed with the following diff:

@@ -6,6 +6,7 @@ import (
"os"
"strconv"
"sync"
+ "sync/atomic"
"time"
otgorm "github.com/echo-health/opentracing-gorm"
@@ -364,8 +365,7 @@ func main() {
go worker(w, jobs)
}
- ops := 0
- lastOps := 0
+ var ops int64
go func() {
for i := 0; i < numberOfInserts; i++ {
u := &User{}
@@ -388,20 +388,16 @@ func main() {
}
jobs <- u
- ops++
- lastOps++
+ atomic.AddInt64(&ops, 1)
}
}()
- now := time.Now()
lastNow := time.Now()
ticker := time.NewTicker(1 * time.Second)
-
for range ticker.C {
- elapsed := now.Sub(lastNow)
- fmt.Printf("opts = %+v\n", float64(lastOps-ops)/elapsed.Seconds())
-
+ elapsed := time.Since(lastNow)
+ elapsedOps := atomic.SwapInt64(&ops, 0)
+ fmt.Printf("opts = %+v\n", float64(elapsedOps)/elapsed.Seconds())
lastNow = time.Now()
- lastOps = 0
}
}

With this issue fixed, I was able to see steady throughput numbers when running against Cockroach. I added some timing instrumentation and it immediately jumped out that the `WHERE id = ? OR firebase_id = ?` lookup in `Get` was dominating each operation.

EDIT: I just verified that this is the case. Postgres uses both indexes to turn this into a pair of point lookups. Is there a way this query could be re-written to take better advantage of your indexing structure?

For the sake of continuing the exploration, I rewrote this as a pair of point lookups.

I also tested against Cockroach 1.1.6 and 2.0 and did not see a performance regression. In fact, in my testing the throughput increased by 7%, which is close to what we'd expect.
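One possible shape for that rewrite (a sketch, not necessarily the exact change used above) is to split the `OR` predicate into two single-index point lookups and try them in turn. The helper below only builds the SQL strings; lib/pq-style `$1` placeholders and the `users` schema from the script are assumed:

```go
package main

import "fmt"

// syncLookupQueries splits `WHERE id = $1 OR firebase_id = $1` into two
// separate point lookups so each can be served by its own unique index
// (the primary key on id, and idx_firebase_id on firebase_id).
func syncLookupQueries() (byID, byFirebaseID string) {
	byID = "SELECT * FROM users WHERE id = $1 LIMIT 1"
	byFirebaseID = "SELECT * FROM users WHERE firebase_id = $1 LIMIT 1"
	return byID, byFirebaseID
}

func main() {
	a, b := syncLookupQueries()
	fmt.Println(a)
	fmt.Println(b)
	// The caller would try the id lookup first and fall back to the
	// firebase_id lookup only when no row is found.
}
```

Each query is a single-key lookup on a unique index, so neither depends on the optimizer being able to split the `OR` across indexes.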
Hi there,
I'm doing some preliminary testing of Cockroach and was keen to test it on one of our usual workloads. This is an application that takes RPCs and mostly performs writes or "syncs", i.e. the object may or may not exist, and the application "syncs" it by performing a write or a read.
With a single node I see excellent initial performance (1000 qps, 110 tps) which then dwindles off (400 qps, 43 tps), which is strange, but initially acceptable.
However, when adding more nodes the performance doesn't increase and in fact drops to about 90% of the lower single-node qps (350-400 qps).
I have a feeling that my workload is creating lots of contention, but I don't know where to start looking to debug that issue. I don't see CPU, memory or networking being pushed to its limits.
I'm running 3 x 4CPU, 16GB RAM, 375GB local SSD on Google Cloud with TCP load balancing.
Here's a Go script that has the same performance as above and is very similar to our workload; I've even added interleaved tables.
A Go script for testing
In order to run the script you'll need to create a database and some permissions.
I'm using Cockroach 1.1.6
Any help is appreciated and I'm willing to help run more tests also. Thanks!