title | summary |
---|---|
Update Data |
Learn about how to update data and batch update data. |
This document describes how to use the following SQL statements to update the data in TiDB with various programming languages:
- UPDATE: Used to modify the data in the specified table.
- INSERT ON DUPLICATE KEY UPDATE: Used to insert data and update this data if there is a primary key or unique key conflict. It is not recommended to use this statement if there are multiple unique keys (including primary keys). This is because this statement updates the data once it detects any unique key (including primary key) conflict. When there are more than one row conflicts, it updates only one row.
Before reading this document, you need to prepare the following:
- Build a TiDB Cluster in TiDB Cloud (Developer Tier).
- Read Schema Design Overview, Create a Database, Create a Table, and Create Secondary Indexes.
- If you want to
UPDATE
data, you need to insert data first.
To update an existing row in a table, you need to use an UPDATE
statement with a WHERE
clause to filter the columns for updating.
Note:
If you need to update a large number of rows, for example, more than ten thousand, it is recommended that you do NOT doing a complete update at once, but rather updating a portion at a time iteratively until all rows are updated. You can write scripts or programs to loop this operation. See Bulk-update for details.
In SQL, the UPDATE
statement is generally in the following form:
{{< copyable "sql" >}}
UPDATE {table} SET {update_column} = {update_value} WHERE {filter_column} = {filter_value}
Parameter Name | Description |
---|---|
{table} |
Table Name |
{update_column} |
Column names to be updated |
{update_value} |
Column values to be updated |
{filter_column} |
Column names matching filters |
{filter_value} |
Column values matching filters |
For detailed information, see UPDATE syntax.
The following are some best practices for updating data:
- Always specify the
WHERE
clause in theUPDATE
statement. If theUPDATE
statement does not have aWHERE
clause, TiDB will update ALL ROWS in the table.
- Use bulk-update when you need to update a large number of rows (for example, more than ten thousand). Because TiDB limits the size of a single transaction (txn-total-size-limit, 100 MB by default), too many data updates at once will result in holding locks for too long (pessimistic transactions) or cause conflicts (optimistic transactions).
- Use bulk-update when you need to update a large number of rows (for example, more than ten thousand). Because TiDB limits the size of a single transaction to 100 MB by default, too many data updates at once will result in holding locks for too long (pessimistic transactions) or cause conflicts (optimistic transactions).
Suppose an author changes her name to Helen Haruki. You need to change the authors table. Assume that her unique id
is 1, and the filter should be: id = 1
.
{{< copyable "sql" >}}
UPDATE `authors` SET `name` = "Helen Haruki" WHERE `id` = 1;
{{< copyable "" >}}
// ds is an entity of com.mysql.cj.jdbc.MysqlDataSource
try (Connection connection = ds.getConnection()) {
PreparedStatement pstmt = connection.prepareStatement("UPDATE `authors` SET `name` = ? WHERE `id` = ?");
pstmt.setString(1, "Helen Haruki");
pstmt.setInt(2, 1);
pstmt.executeUpdate();
} catch (SQLException e) {
e.printStackTrace();
}
If you need to insert new data into a table, but if there are unique key (a primary key is also a unique key) conflicts, the first conflicted record will be updated. You can use INSERT ... ON DUPLICATE KEY UPDATE ...
statement to insert or update.
In SQL, the INSERT ... ON DUPLICATE KEY UPDATE ...
statement is generally in the following form:
{{< copyable "sql" >}}
INSERT INTO {table} ({columns}) VALUES ({values})
ON DUPLICATE KEY UPDATE {update_column} = {update_value};
Parameter Name | Description |
---|---|
{table} |
Table name |
{columns} |
Column names to be inserted |
{values} |
Column values to be inserted |
{update_column} |
Column names to be updated |
{update_value} |
Column values to be updated |
- Use
INSERT ON DUPLICATE KEY UPDATE
only for a table with one unique key. This statement updates the data if any UNIQUE KEY (including the primary key) conflicts are detected. If there are more than one row of conflicts, only one row will be updated. Therefore, it is not recommended to use theINSERT ON DUPLICATE KEY UPDATE
statement in tables with multiple unique keys unless you can guarantee that there is only one row of conflict. - Use this statement when you create data or update data.
For example, you need to update the ratings table to include the user's ratings for the book. If the user has not yet rated the book, a new rating will be created. If the user has already rated it, his previous rating will be updated.
In the following example, the primary key is the joint primary keys of book_id
and user_id
. A user user_id = 1
gives a rating of 5
to a book book_id = 1000
.
{{< copyable "sql" >}}
INSERT INTO `ratings`
(`book_id`, `user_id`, `score`, `rated_at`)
VALUES
(1000, 1, 5, NOW())
ON DUPLICATE KEY UPDATE `score` = 5, `rated_at` = NOW();
{{< copyable "" >}}
// ds is an entity of com.mysql.cj.jdbc.MysqlDataSource
try (Connection connection = ds.getConnection()) {
PreparedStatement p = connection.prepareStatement("INSERT INTO `ratings` (`book_id`, `user_id`, `score`, `rated_at`)
VALUES (?, ?, ?, NOW()) ON DUPLICATE KEY UPDATE `score` = ?, `rated_at` = NOW()");
p.setInt(1, 1000);
p.setInt(2, 1);
p.setInt(3, 5);
p.setInt(4, 5);
p.executeUpdate();
} catch (SQLException e) {
e.printStackTrace();
}
When you need to update multiple rows of data in a table, you can use INSERT ON DUPLICATE KEY UPDATE
with the WHERE
clause to filter the data that needs to be updated.
However, if you need to update a large number of rows (for example, more than ten thousand), it is recommended that you update the data iteratively, that is, updating only a portion of the data at each iteration until the update is complete. This is because TiDB limits the size of a single transaction (txn-total-size-limit, 100 MB by default). Too many data updates at once will result in holding locks for too long (pessimistic transactions, or causing conflicts (optimistic transactions). You can use a loop in your program or script to complete the operation.
However, if you need to update a large number of rows (for example, more than ten thousand), it is recommended that you update the data iteratively, that is, updating only a portion of the data at each iteration until the update is complete. This is because TiDB limits the size of a single transaction to 100 MB by default. Too many data updates at once will result in holding locks for too long (pessimistic transactions, or causing conflicts (optimistic transactions). You can use a loop in your program or script to complete the operation.
This section provides examples of writing scripts to handle iterative updates. This example shows how a combination of SELECT
and UPDATE
should be done to complete a bulk-update.
First, you should write a SELECT
query in a loop of your application or script. The return value of this query can be used as the primary key for the rows that need to be updated. Note that when defining this SELECT
query, you need to use the WHERE
clause to filter the rows that need to be updated.
Suppose that you have had a lot of book ratings from users on your bookshop
website over the past year, but the original design of a 5-point scale has resulted in a lack of differentiation in book ratings. Most books are rated 3
. You decide to switch from a 5-point scale to a 10-point scale to differentiate ratings.
You need to multiply by 2
the data in the ratings
table from the previous 5-point scale, and add a new column to the ratings table to indicate whether the rows have been updated. Using this column, you can filter out rows that have been updated in SELECT
, which will prevent the script from crashing and updating the rows multiple times, resulting in unreasonable data.
For example, you create a column named ten_point
with the data type BOOL as an identifier of whether it is a 10-point scale:
{{< copyable "sql" >}}
ALTER TABLE `bookshop`.`ratings` ADD COLUMN `ten_point` BOOL NOT NULL DEFAULT FALSE;
Note:
This bulk-update application uses the DDL statements to make schema changes to the data tables. All DDL change operations for TiDB are executed online. For more information, see ADD COLUMN.
In Golang, a bulk-update application is similar to the following:
package main
import (
"database/sql"
"fmt"
_ "github.com/go-sql-driver/mysql"
"strings"
"time"
)
func main() {
db, err := sql.Open("mysql", "root:@tcp(127.0.0.1:4000)/bookshop")
if err != nil {
panic(err)
}
defer db.Close()
bookID, userID := updateBatch(db, true, 0, 0)
fmt.Println("first time batch update success")
for {
time.Sleep(time.Second)
bookID, userID = updateBatch(db, false, bookID, userID)
fmt.Printf("batch update success, [bookID] %d, [userID] %d\n", bookID, userID)
}
}
// updateBatch select at most 1000 lines data to update score
func updateBatch(db *sql.DB, firstTime bool, lastBookID, lastUserID int64) (bookID, userID int64) {
// select at most 1000 primary keys in five-point scale data
var err error
var rows *sql.Rows
if firstTime {
rows, err = db.Query("SELECT `book_id`, `user_id` FROM `bookshop`.`ratings` " +
"WHERE `ten_point` != true ORDER BY `book_id`, `user_id` LIMIT 1000")
} else {
rows, err = db.Query("SELECT `book_id`, `user_id` FROM `bookshop`.`ratings` "+
"WHERE `ten_point` != true AND `book_id` > ? AND `user_id` > ? "+
"ORDER BY `book_id`, `user_id` LIMIT 1000", lastBookID, lastUserID)
}
if err != nil || rows == nil {
panic(fmt.Errorf("error occurred or rows nil: %+v", err))
}
// joint all id with a list
var idList []interface{}
for rows.Next() {
var tempBookID, tempUserID int64
if err := rows.Scan(&tempBookID, &tempUserID); err != nil {
panic(err)
}
idList = append(idList, tempBookID, tempUserID)
bookID, userID = tempBookID, tempUserID
}
bulkUpdateSql := fmt.Sprintf("UPDATE `bookshop`.`ratings` SET `ten_point` = true, "+
"`score` = `score` * 2 WHERE (`book_id`, `user_id`) IN (%s)", placeHolder(len(idList)))
db.Exec(bulkUpdateSql, idList...)
return bookID, userID
}
// placeHolder format SQL place holder
func placeHolder(n int) string {
holderList := make([]string, n/2, n/2)
for i := range holderList {
holderList[i] = "(?,?)"
}
return strings.Join(holderList, ",")
}
In each iteration, SELECT
queries in order of the primary key. It selects primary key values for up to 1000
rows that have not been updated to the 10-point scale (ten_point
is false
). Each SELECT
statement selects primary keys larger than the largest of the previous SELECT
results to prevent duplication. Then, it uses bulk-update, multiples its score
column by 2
, and sets ten_point
to true
. The purpose of updating ten_point
is to prevent the update application from repeatedly updating the same row in case of restart after crashing, which can cause data corruption. time.Sleep(time.Second)
in each loop makes the update application pause for 1 second to prevent the update application from consuming too many hardware resources.
In Java (JDBC), a bulk-update application might be similar to the following:
Code:
{{< copyable "" >}}
package com.pingcap.bulkUpdate;
import com.mysql.cj.jdbc.MysqlDataSource;
import java.sql.*;
import java.util.LinkedList;
import java.util.List;
import java.util.concurrent.TimeUnit;
public class BatchUpdateExample {
static class UpdateID {
private Long bookID;
private Long userID;
public UpdateID(Long bookID, Long userID) {
this.bookID = bookID;
this.userID = userID;
}
public Long getBookID() {
return bookID;
}
public void setBookID(Long bookID) {
this.bookID = bookID;
}
public Long getUserID() {
return userID;
}
public void setUserID(Long userID) {
this.userID = userID;
}
@Override
public String toString() {
return "[bookID] " + bookID + ", [userID] " + userID ;
}
}
public static void main(String[] args) throws InterruptedException {
// Configure the example database connection.
// Create a mysql data source instance.
MysqlDataSource mysqlDataSource = new MysqlDataSource();
// Set server name, port, database name, username and password.
mysqlDataSource.setServerName("localhost");
mysqlDataSource.setPortNumber(4000);
mysqlDataSource.setDatabaseName("bookshop");
mysqlDataSource.setUser("root");
mysqlDataSource.setPassword("");
UpdateID lastID = batchUpdate(mysqlDataSource, null);
System.out.println("first time batch update success");
while (true) {
TimeUnit.SECONDS.sleep(1);
lastID = batchUpdate(mysqlDataSource, lastID);
System.out.println("batch update success, [lastID] " + lastID);
}
}
public static UpdateID batchUpdate (MysqlDataSource ds, UpdateID lastID) {
try (Connection connection = ds.getConnection()) {
UpdateID updateID = null;
PreparedStatement selectPs;
if (lastID == null) {
selectPs = connection.prepareStatement(
"SELECT `book_id`, `user_id` FROM `bookshop`.`ratings` " +
"WHERE `ten_point` != true ORDER BY `book_id`, `user_id` LIMIT 1000");
} else {
selectPs = connection.prepareStatement(
"SELECT `book_id`, `user_id` FROM `bookshop`.`ratings` "+
"WHERE `ten_point` != true AND `book_id` > ? AND `user_id` > ? "+
"ORDER BY `book_id`, `user_id` LIMIT 1000");
selectPs.setLong(1, lastID.getBookID());
selectPs.setLong(2, lastID.getUserID());
}
List<Long> idList = new LinkedList<>();
ResultSet res = selectPs.executeQuery();
while (res.next()) {
updateID = new UpdateID(
res.getLong("book_id"),
res.getLong("user_id")
);
idList.add(updateID.getBookID());
idList.add(updateID.getUserID());
}
if (idList.isEmpty()) {
System.out.println("no data should update");
return null;
}
String updateSQL = "UPDATE `bookshop`.`ratings` SET `ten_point` = true, "+
"`score` = `score` * 2 WHERE (`book_id`, `user_id`) IN (" +
placeHolder(idList.size() / 2) + ")";
PreparedStatement updatePs = connection.prepareStatement(updateSQL);
for (int i = 0; i < idList.size(); i++) {
updatePs.setLong(i + 1, idList.get(i));
}
int count = updatePs.executeUpdate();
System.out.println("update " + count + " data");
return updateID;
} catch (SQLException e) {
e.printStackTrace();
}
return null;
}
public static String placeHolder(int n) {
StringBuilder sb = new StringBuilder();
for (int i = 0; i < n ; i++) {
sb.append(i == 0 ? "(?,?)" : ",(?,?)");
}
return sb.toString();
}
}
hibernate.cfg.xml
configuration:
{{< copyable "" >}}
<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE hibernate-configuration PUBLIC
"-//Hibernate/Hibernate Configuration DTD 3.0//EN"
"http://www.hibernate.org/dtd/hibernate-configuration-3.0.dtd">
<hibernate-configuration>
<session-factory>
<!-- Database connection settings -->
<property name="hibernate.connection.driver_class">com.mysql.cj.jdbc.Driver</property>
<property name="hibernate.dialect">org.hibernate.dialect.TiDBDialect</property>
<property name="hibernate.connection.url">jdbc:mysql://localhost:4000/movie</property>
<property name="hibernate.connection.username">root</property>
<property name="hibernate.connection.password"></property>
<property name="hibernate.connection.autocommit">false</property>
<property name="hibernate.jdbc.batch_size">20</property>
<!-- Optional: Show SQL output for debugging -->
<property name="hibernate.show_sql">true</property>
<property name="hibernate.format_sql">true</property>
</session-factory>
</hibernate-configuration>
In each iteration, SELECT
queries in order of the primary key. It selects primary key values for up to 1000
rows that have not been updated to the 10-point scale (ten_point
is false
). Each SELECT
statement selects primary keys larger than the largest of the previous SELECT
results to prevent duplication. Then, it uses bulk-update, multiples its score
column by 2
, and sets ten_point
to true
. The purpose of updating ten_point
is to prevent the update application from repeatedly updating the same row in case of restart after crashing, which can cause data corruption. TimeUnit.SECONDS.sleep(1);
in each loop makes the update application pause for 1 second to prevent the update application from consuming too many hardware resources.