paško edited this page Sep 13, 2021 · 26 revisions

Why implement i2c for the esp8266 in Assembly?

Because I can 😃

Yes, it is true: there are a couple of good i2c implementations for both the Arduino tool chain and the native one. However, for various reasons I decided to implement i2c in inline Xtensa assembly language:

  • Robust i2c single master implementation, including SCL clock stretching support and handling of i2c errors
  • Clean and exact timing for SCL and SDA, e.g., removal of the ACK spike
  • Fast mode plus SCL speeds: 800 KHz @ 80 MHz, 1 MHz @ 160 MHz
  • Introduction of i2c transactions: group any combination of i2c writes and reads together, either with or without repeated start.

Note: You should be familiar with the i2c protocol; refer, for instance, to i2c-bus.org or i2c.info. Furthermore, before doing anything in software, even with this library, make sure your i2c hardware setup is correct: calculate the correct value of the pullup resistors, especially for high SCL rates > 400 KHz, as shown in the EDN or TI documentation. Connect your i2c slaves correctly, e.g., do not forget to connect the (not) RESET pin of an MCP23017 as well.
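The pullup calculation boils down to two bounds, as commonly given in such application notes: a lower bound set by the sink current of the weakest driver, and an upper bound set by the allowed rise time and the bus capacitance. The sketch below is plain C; the function names are hypothetical, and the example values (3.3 V supply, 0.4 V maximum LOW output voltage, 3 mA sink current, 100 pF bus capacitance, 120 nsec rise time) are assumptions that you should replace with numbers from your own datasheets.

```c
/* Lower bound: when a device pulls the line LOW to vol_max_v, the current
   through the pullup must not exceed the sink current iol_a. */
double rp_min_ohm(double vcc_v, double vol_max_v, double iol_a) {
    return (vcc_v - vol_max_v) / iol_a;
}

/* Upper bound: an RC rise from 30 % to 70 % of VCC takes
   ln(0.7 / 0.3) * R * C, i.e. about 0.8473 * R * C, which must not exceed
   the allowed rise time tr_s for the bus capacitance cb_f. */
double rp_max_ohm(double tr_s, double cb_f) {
    return tr_s / (0.8473 * cb_f);
}
```

With these assumed values, rp_min_ohm(3.3, 0.4, 0.003) gives about 967 Ohm and rp_max_ohm(120e-9, 100e-12) about 1416 Ohm, so any pullup between the two, e.g. 1 kOhm, would satisfy both bounds.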

Wiki Overview

I2C Transactions

Communication between the i2c master and its slaves usually involves a combination of i2c commands, i.e., writes and reads. For instance, you first write the register address pointer and then read out a couple of bytes from that register. From an implementation point of view, you would like to know whether the communication as a whole succeeded or failed, without having to handle errors after each command. Also, the slaves on your bus often support different maximum SCL speeds, and you want to "talk" to each of them at its maximum speed. Setting the SCL speed should therefore be bound to a transaction and not to the i2c bus setup.

Below is an example of two transactions to different slaves with different SCL speeds.

Transactions with different SCL speeds

What is a transaction in brzo i2c?

  • All i2c commands belonging to a transaction are bracketed between brzo_i2c_start_transaction and brzo_i2c_end_transaction
  • Each transaction can contain an arbitrary combination of i2c reads and writes; each command can be executed with or without a repeated start
  • A transaction is bound to one i2c slave and all commands are executed at the same SCL speed. Thus, brzo_i2c_start_transaction takes the slave address and SCL speed in KHz as input parameters.
  • brzo_i2c_end_transaction ends the current transaction and returns a code: zero if all commands of the transaction were successful, or a non-zero error code otherwise.

As an example, take a transaction to an ADT7420 sensor at address 0x49 at 400 KHz: first, select the configuration register (write: 0x03) and select 16-bit resolution (write: 0xA0). Second, select the temperature value most significant byte register (write: 0x00), and third, read two bytes of data.

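// Precondition: brzo_i2c_setup(...) was called to set up i2c
uint8_t buffer[2]; // bytes to write, later the bytes read back
uint8_t error;     // result of the whole transaction, 0 = success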

brzo_i2c_start_transaction(0x49, 400);
  buffer[0] = 0x03;
  buffer[1] = 0xA0;
  brzo_i2c_write(buffer, 2, false); // write 2 bytes, no repeated start
  buffer[0] = 0x00;
  brzo_i2c_write(buffer, 1, true); // write 1 byte with repeated start
  brzo_i2c_read(buffer, 2, false); // read 2 bytes, no repeated start
error = brzo_i2c_end_transaction();
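If error is 0, buffer now holds the temperature MSB and LSB. A minimal sketch of turning them into degrees Celsius, assuming the 16-bit resolution selected by writing 0xA0 and the ADT7420 datasheet scaling of 1/128 °C per LSB in two's complement; the function name is hypothetical.

```c
#include <stdint.h>

/* Converts the two bytes read from the ADT7420 temperature registers
   (MSB first) in 16-bit resolution mode to degrees Celsius. */
double adt7420_raw16_to_celsius(const uint8_t buffer[2]) {
    int32_t raw = ((int32_t)buffer[0] << 8) | buffer[1];
    if (raw >= 32768) {
        raw -= 65536; /* negative temperatures are stored in two's complement */
    }
    return raw / 128.0; /* 1 LSB = 1/128 degC = 0.0078125 degC */
}
```

For example, the bytes 0x0C, 0x80 (raw value 3200) correspond to 25.0 °C, and 0xFF, 0x80 to -1.0 °C.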

Note:

  • All commands inside a transaction are executed immediately; they are not queued up until brzo_i2c_end_transaction() is called. (This is different from Wire.endTransmission().)
  • A transaction must always end with a STOP sequence. Therefore, the last command must not use a repeated start; otherwise you will most likely stall the i2c bus!
  • If the currently executed i2c command fails, subsequent commands are not executed.
  • brzo_i2c_setup(...) must have been called at least once before brzo_i2c_start_transaction(...), typically in the "setup part" of your code.

How to migrate from Wire Library to brzo i2c

The "classic" Arduino Wire Library (an i2c master library) is available for the esp8266 as well. For many i2c devices, example code is available on the basis of the Wire Library. This section gives you some hints on how to migrate Wire Library code to brzo i2c.

The main conceptual difference from the Wire Library is brzo i2c transactions. With the Wire Library, i2c writes follow this pattern:

Wire.beginTransmission(...);
Wire.write(...);
Wire.endTransmission();

Behind the scenes of the Wire Library, the following takes place: Wire.beginTransmission(...) sets up some internal state, Wire.write(...) writes data to an internal Wire Library buffer, and only in Wire.endTransmission() does the i2c communication take place: the content of the buffer is transmitted to the i2c device, i.e., START, send slave address, write first byte, ..., STOP.

In Wire Library code you very often find more than one Wire.write(...), e.g.,

1  Wire.beginTransmission(...);
2    Wire.write(A); // e.g., write a Register Pointer Byte
3    Wire.write(B); // e.g., write some Data to fill that Register
4    Wire.write(C); // cont.
5  Wire.endTransmission();

Now, the following brzo_i2c code does not transmit the same i2c sequence as the Wire Library code:

6  brzo_i2c_start_transaction(...);
7    brzo_i2c_write(&A, 1, true);
8    buffer[0] = B;
9    buffer[1] = C;
10   brzo_i2c_write(buffer, 2, false);
11 resultcode = brzo_i2c_end_transaction();

The Wire Library code has the following i2c sequence:

a START
b Transmit Slave Address
c Transmit A 
d Transmit B, C
e STOP

Whereas the brzo_i2c code has the following i2c sequence:

f START
g Transmit Slave Address
h Transmit A
  (NO STOP)
i (REPEATED) START
j Transmit Slave Address
k Transmit B, C
l STOP

In the Wire Library code, lines 1-4 do not transmit anything over i2c yet; instead, an internal buffer is filled with all the data to be transmitted. The whole transmission, i.e., the complete i2c sequence a-e, takes place in line 5.

In brzo_i2c, line 7 transmits f, g, h (no STOP, since repeated start is set to true). Then, line 10 does i, j, k, l.

Therefore, you have to put all data to be transmitted into a buffer first and then send it with a single brzo_i2c_write. In the example above this would mean:

buffer[0] = A;
buffer[1] = B;
buffer[2] = C;

brzo_i2c_start_transaction(...);
  brzo_i2c_write(buffer, 3, false);
resultcode = brzo_i2c_end_transaction();
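The same buffer layout can be wrapped in a small helper. The sketch below is plain C and uses a hypothetical function, pack_register_write, that is not part of brzo i2c; it only prepares the bytes so that one brzo_i2c_write(buffer, len, false) produces the sequence a-e.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: places a register pointer byte followed by n data
   bytes in one buffer. Returns the number of bytes to send, or 0 if out
   (of size out_size) is too small to hold reg plus the n data bytes. */
size_t pack_register_write(uint8_t reg, const uint8_t *data, size_t n,
                           uint8_t *out, size_t out_size) {
    if (n >= out_size) {
        return 0; /* no room for the register byte plus n data bytes */
    }
    out[0] = reg;
    if (n > 0) {
        memcpy(&out[1], data, n);
    }
    return n + 1;
}
```

For example, with uint8_t data[2] = {B, C} and uint8_t buffer[3], pack_register_write(A, data, 2, buffer, sizeof buffer) returns 3 and leaves A, B, C in buffer, ready for brzo_i2c_write(buffer, 3, false) inside a transaction.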

Concerning i2c reads, Wire Library code usually looks like this:

Wire.requestFrom(..., n);
... = Wire.read();

Concerning Wire.requestFrom(..., n): behind the scenes, this statement performs the i2c communication (START, send slave address, read first byte, ..., STOP) and puts the n bytes read into a Wire Library buffer. Subsequent calls to Wire.read() do not perform any i2c communication; they just return the data from the buffer.

If you would like to turn the above Wire Library code snippets into brzo i2c, you could do:

brzo_i2c_start_transaction(..., ...);
  brzo_i2c_write(..., ..., false);
returnCode = brzo_i2c_end_transaction();

and

brzo_i2c_start_transaction(..., ...);
  brzo_i2c_read(..., n, false);
returnCode = brzo_i2c_end_transaction();

Note that in brzo i2c you have to explicitly pass a parameter for repeated starts in every read or write. Since the default behaviour of the Wire Library is to not have a repeated start, you have to pass false in the corresponding brzo i2c commands.

While the above approach will work, it is not the recommended way to migrate to brzo i2c. Instead, you should have a look at the datasheet of your i2c device and figure out how to form i2c transactions. For instance, suppose you have to write 1 byte to a configuration register with repeated start and then read out 4 bytes from the i2c device without repeated start: you would put both commands into one brzo i2c transaction. Please refer to the example code for some i2c devices.

SCL Speed in Fast Mode Plus

The i2c specification allows fast mode plus SCL speeds of up to 1 MHz. brzo i2c supports this if the esp8266 CPU speed is set to 160 MHz. Provided an i2c device supports fast mode plus up to 1 MHz, the typical limiting factor for higher SCL speeds is the minimum pullup resistor value. In the scope picture below, the SCL signal looks good at 800 KHz: the rise time is 126 nsec, slightly above the 120 nsec maximum that the i2c specification gives for fast mode plus. In that setting, the pullup resistor already had the calculated minimum value of 1 kOhm. Other min/max values of the i2c specification, like the maximum LOW-level input voltage VIL, are met as well. You should always check your signals with a scope.

i2c 800 KHz Details
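Assuming the SCL line behaves like a simple RC network, the 30 % to 70 % rise time follows from the exponential charging curve. Applied to the measurement above, this gives a rough estimate (not a measured value) of the effective bus capacitance and of the pullup that 120 nsec would require:

```latex
t_r = R_p C_b \ln\frac{0.7}{0.3} \approx 0.8473\, R_p C_b

C_b \approx \frac{t_r}{0.8473\, R_p}
    = \frac{126\,\mathrm{ns}}{0.8473 \cdot 1\,\mathrm{k\Omega}}
    \approx 149\,\mathrm{pF}

R_p \le \frac{120\,\mathrm{ns}}{0.8473 \cdot 149\,\mathrm{pF}} \approx 0.95\,\mathrm{k\Omega}
```

Under this model, meeting the fast mode plus limit would need a pullup just below the 1 kOhm minimum, which is consistent with the rise time ending up slightly above 120 nsec in this setting.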

Critical timing during Read

Selecting an SCL speed that is too high for a specific i2c device may lead to "strange" errors during i2c reads, which cannot be properly detected as i2c errors. Instead, you will read back a byte of value 255. In the pictures below, the i2c slave takes around 468 ns before it begins to pull down SDA in order to signal an ACK to the master. The first picture shows the situation at 800 KHz, which still works properly. However, in the second picture at 900 KHz the SCL half cycle is too short: while the master still keeps SCL high, the slave releases SDA again. This is a sort of STOP condition by the slave! Now the moment at which the master samples SDA becomes critical: will it read a LOW or a HIGH value? SDA is sampled towards the end of the half cycle in which SCL is high, i.e., just around the minimum of SDA. Thus, SDA is sampled as LOW and interpreted as the slave having ACKnowledged the read address. During the subsequent read of the data byte, the master reads all bits as 1, as the third picture shows. Since the i2c device does not support CRC checking, the master has no way to detect such a read error!

To minimize the risk of such a scenario, the implementation slightly stretches the first half of the 9th clock cycle during an i2c read.

Read ok at 800 KHz

Read fail at 900 KHz

Read fail at 900 KHz, 255 being read
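The arithmetic behind these pictures is simple. Assuming a symmetric SCL clock (an assumption, since the implementation adjusts individual half cycles), the sketch below computes the SCL high half cycle and how much of it remains after the roughly 468 ns the slave needs before pulling SDA low. The function names are hypothetical.

```c
/* SCL half cycle in nanoseconds for a symmetric clock of f_khz KHz. */
double scl_half_period_ns(double f_khz) {
    return 1e6 / (2.0 * f_khz);
}

/* Time left in the half cycle once the slave starts driving SDA
   (negative if the slave is slower than the half cycle). */
double ack_margin_ns(double f_khz, double slave_delay_ns) {
    return scl_half_period_ns(f_khz) - slave_delay_ns;
}
```

At 800 KHz the half cycle is 625 ns, leaving 157 ns after 468 ns; at 900 KHz it is about 556 ns, leaving only about 88 ns.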

I2C setup and Clock Stretching

Before using any i2c transactions, brzo_i2c_setup(...) needs to be called at least once. Besides configuring the SDA and SCL pins as open drain outputs, it sets the maximum time the master waits for SCL clock stretching. I2c slaves may stretch the clock by holding SCL low, signalling the master to wait before the next command can be transmitted. Clock stretching is handled in brzo_i2c_write and brzo_i2c_read.

Bus stall: If the timeout is exceeded, the slave will still pull SCL low and the master cannot send a STOP sequence. Hence, we have a bus stall!

Be very careful with i2c devices that stretch the clock for a long time: if you use i2c slaves in clock stretching mode, the corresponding brzo i2c transaction takes as long as the i2c slave stretches the clock. By default, brzo i2c reads and writes disable all interrupts, so you will be without any interrupts for a rather "long time". You can change this global behaviour via BRZO_I2C_DISABLE_INTERRUPTS, see readme.

The following scope picture shows an SCL stretch of around 42 msec with an HTU21 sensor. In this case the timeout was set to 50 msec: brzo_i2c_setup(..., ..., 50000);. See also the HTU21 example.

Clock Stretch of 42 msec
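The following C model illustrates what the timeout protects against. brzo i2c does this in assembly inside brzo_i2c_write and brzo_i2c_read, so the function, the callback types and the simulated slave below are hypothetical and only show the idea: wait while SCL is held low, but give up once the timeout has elapsed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical hardware hooks: on the esp8266 these would sample the SCL
   pin and read a free-running microsecond counter. */
typedef bool (*scl_is_high_fn)(void *ctx);
typedef uint32_t (*now_us_fn)(void *ctx);

/* Waits while a slave stretches the clock by holding SCL low.
   Returns true once SCL is released, false if timeout_us elapses first. */
bool wait_for_scl_release(scl_is_high_fn scl_is_high, now_us_fn now_us,
                          void *ctx, uint32_t timeout_us) {
    uint32_t start = now_us(ctx);
    while (!scl_is_high(ctx)) {
        /* Unsigned subtraction stays correct across counter wrap-around. */
        if ((uint32_t)(now_us(ctx) - start) >= timeout_us) {
            return false;
        }
    }
    return true;
}

/* Simulated slave, for illustration only: holds SCL low until release_us;
   every read of the clock advances simulated time by step_us. */
struct sim_bus {
    uint32_t now_us;
    uint32_t release_us;
    uint32_t step_us;
};

bool sim_scl_is_high(void *ctx) {
    struct sim_bus *bus = ctx;
    return bus->now_us >= bus->release_us;
}

uint32_t sim_now_us(void *ctx) {
    struct sim_bus *bus = ctx;
    bus->now_us += bus->step_us;
    return bus->now_us;
}
```

With a simulated stretch of 42 msec, a timeout of 50000 usec lets the wait succeed, while a timeout of 40000 usec makes it give up, which on a real bus is the situation where a stall can occur.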

I2C Spikes and how they are removed

One spike can occur during an i2c write when the slave sends an acknowledge (ACK): the so-called ACK spike. It occurs in the 9th cycle, after 8 cycles were used to transmit a byte.

I2C Write ACK Spike

Another spike can occur between reads of two bytes when the MSBit of the second byte is zero. In the example below you can see the spike in the first cycle of the second byte.

I2C Read Spike

Since i2c is implemented in assembly, we have full control and can thus remove those spikes by adjusting the timing in the corresponding SCL cycles. The i2c specification defines the maximum size of spikes/glitches, referred to as tsp: The "pulse width of spikes that must be suppressed by the input filter". Therefore, if there is a way to remove them in software, one should definitely do it.

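In general, there is a simple busy wait for each half cycle of SCL:

MOV.N %[r_temp1], %[r_iteration_scl_halfcycle]; // load the loop counter
l_delay:
  ADDI.N %[r_temp1], %[r_temp1], -1;            // decrement the counter
  NOP;                                          // pad the loop body
BNEZ  %[r_temp1], l_delay;                      // repeat until the counter is zero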

The number of iterations, iteration_scl_halfcycle, depends on the CPU clock speed (80 or 160 MHz) and the selected SCL frequency. To reduce those spikes, there are shorter delays, represented as iteration_remove_spike. In brzo_i2c_write(...) and brzo_i2c_read(...) these shorter waits allow precise timing of the corresponding events, e.g., the release of SDA. The lengths of those spike-reducing waits were determined by testing several i2c slaves at different SCL speeds.
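As a rough sketch of how such an iteration count can be derived, assume each loop iteration costs a fixed number of CPU cycles and the code around the loop adds a fixed overhead. Both numbers are placeholders, not the values used by brzo i2c, whose delays were tuned by measurement; the function name is hypothetical.

```c
#include <stdint.h>

/* Busy-wait iterations for one SCL half cycle, assuming each iteration
   costs cycles_per_iteration CPU cycles and the surrounding instructions
   cost overhead_cycles per half cycle. */
uint32_t halfcycle_iterations(uint32_t cpu_hz, uint32_t scl_hz,
                              uint32_t cycles_per_iteration,
                              uint32_t overhead_cycles) {
    uint32_t half_cycle = cpu_hz / (2u * scl_hz); /* CPU cycles per half cycle */
    uint32_t n = 0;
    if (half_cycle > overhead_cycles) {
        n = (half_cycle - overhead_cycles) / cycles_per_iteration;
    }
    /* Never return 0: with MOV / ADDI -1 / BNEZ, a start value of 0 would
       wrap to -1 and the loop would run about 2^32 times instead of once. */
    return n > 0 ? n : 1;
}
```

With the assumed 3 cycles per iteration and 10 cycles of overhead, 400 KHz at 80 MHz gives 100 cycles per half cycle and 30 iterations, while 1 MHz at 160 MHz gives 80 cycles and 23 iterations. The computation also shows why the iteration count has to be derived from both the CPU clock and the SCL frequency.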

Now, the i2c writes and reads have no more spikes:

i2c ACK Spike removed

i2c ACK Spike removed

I2C Acknowledge Polling

Acknowledge polling is only found with EEPROMs, e.g., from ATMEL or Microchip. EEPROMs need internal write cycles to actually store the data. These internal write cycles take up to a maximum time tw to complete, but very often the actually needed time is much shorter than tw. Therefore, there is a mechanism called acknowledge polling, where the master polls for an ACK that signals the completion of the internal write cycle. The basic operation is as follows:

ACK Polling
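A sketch of that polling loop in plain C, assuming a hypothetical probe callback that addresses the EEPROM and reports whether it answered with ACK or NACK. The attempt limit stands in for the maximum write time tw from the datasheet, and the simulated EEPROM is for illustration only; none of these names are part of brzo i2c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical probe: sends START + slave address and returns true if the
   EEPROM acknowledged (a real implementation would also send STOP). */
typedef bool (*ack_probe_fn)(void *ctx, uint8_t slave_address);

/* Keeps addressing the EEPROM until it ACKs, i.e. its internal write cycle
   has completed, or max_attempts is reached. Returns the number of attempts
   used, or 0 if the device never acknowledged. */
uint32_t ack_poll(ack_probe_fn probe, void *ctx, uint8_t slave_address,
                  uint32_t max_attempts) {
    for (uint32_t i = 0; i < max_attempts; ++i) {
        if (probe(ctx, slave_address)) {
            return i + 1;
        }
    }
    return 0;
}

/* Simulated EEPROM, for illustration only: NACKs while busy_polls > 0. */
struct sim_eeprom {
    uint32_t busy_polls;
};

bool sim_eeprom_probe(void *ctx, uint8_t slave_address) {
    struct sim_eeprom *eeprom = ctx;
    (void)slave_address;
    if (eeprom->busy_polls > 0) {
        eeprom->busy_polls--;
        return false; /* NACK: internal write cycle still running */
    }
    return true; /* ACK: ready for the next command */
}
```

For example, a simulated EEPROM that is busy for three polls is acknowledged on the fourth attempt, so ack_poll returns 4; if the attempt limit is too small, it returns 0.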

Read NACKs with the Rigol Scope

When the master has read the last byte from the slave, it has to send a not acknowledge (NACK) instead of an ACK. On the Rigol scope with the i2c decode option, this is depicted with a "?". Yessss, an i2c decoding error and this NACK use the same ? symbol! OK, it is written in the Rigol manual, but...

It took me a couple of hours to figure out that this is not an error 😡

I2C NACK and Rigol Scope Questionmark

And in detail, the 9th clock cycle with the NACK and the STOP sequence:

I2C NACK Details

Force all Functions into Instruction RAM

All functions which need exact i2c timing are decorated with ICACHE_RAM_ATTR. This is especially important for the Arduino tool chain, because we want to force these functions to always be kept in (instruction) RAM. This is the other way around compared to the native tool chain, where "Functions decorated with ICACHE_FLASH_ATTR are compiled to the irom section,CPU will read the function code out of FLASH chip if needed. It will be loaded in CACHE and run only if it is called". If ICACHE_RAM_ATTR is missing, the Arduino tool chain treats these functions the way the native tool chain treats functions decorated with ICACHE_FLASH_ATTR. Thus, omitting ICACHE_RAM_ATTR leads to very strange and unpredictable timing behaviour, cf. my issue. If you use brzo i2c inside another function, do not forget to decorate that function with ICACHE_RAM_ATTR as well. Otherwise you may face the same timing issues again when that function is called.

"Atomic" reads and writes

From a macroscopic view, brzo_i2c_write and brzo_i2c_read are "atomic" in the sense that all interrupts are disabled: at the beginning of reads and writes, all interrupts are disabled with RSIL %[r_temp1], 15; and they are re-enabled at the end with RSIL %[r_temp1], 0;. This is the default behaviour; you can change it globally by setting BRZO_I2C_DISABLE_INTERRUPTS to 0, which "allows" interrupts, see readme.

Why disabling all Interrupts?

During an active i2c read or write, it is not foreseeable what will happen if an interrupt occurs. It depends on the behaviour of both the i2c master and the i2c slave: in the best case, it leads to a stretched SCL (half-)cycle and the slave just ignores this. But other slaves might get "confused", especially if an interrupt occurs during ACK/NACK phases or during restarts, and you might end up with an i2c bus stall. So, in order to minimize such unpredictable behaviour, all interrupts are disabled.

Depending on what else you would like to do besides i2c communications, it can be useful to allow interrupts during i2c reads or writes.

Note that the software watchdog is not disabled during reads and writes.

Maximum duration of reads and writes

Besides interrupts, there are two watchdogs on the esp8266, a soft and a hard one, i.e., a software and a hardware watchdog. Assuming that the software watchdog is fed before a read or write, its timeout gives an upper bound for the maximum duration of reads or writes. According to Espressif (although the thread was about the RTOS SDK), the timeout is around 1 sec.

Besides the watchdogs, the CPU should not be occupied for too long with interrupts disabled. Older SDK documentation said "a task should not occupy CPU more than 10 ms (milli seconds), otherwise may cause Wi-Fi connection break". The recent SDK documentation says "If interrupt is disabled, CPU can only be occupied in us range and the time should not be more than 10 us (microseconds)", which is not feasible from my point of view. So again, the bottom line is: if interrupts are disabled, make sure that i2c reads and writes don't take too long, especially in the context of clock stretching.

Therefore:

  • Try to break up long single writes and reads into smaller pieces, i.e., instead of sending n bytes with one brzo_i2c_write(), send fewer than n bytes per brzo_i2c_write(). This allows interrupts to occur between i2c writes or reads.
  • If your i2c transaction is very long (more than 10 msec), for instance because you use a for loop to send data, add a delay(0) or yield() after each iteration of the loop.
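A minimal sketch of both hints in plain C. The callbacks are hypothetical: write_chunk stands for a function that performs one brzo_i2c_write(...), and after_chunk for one that calls yield() or delay(0). Keep in mind that each chunk becomes a separate i2c write with its own START (or repeated start) and slave address, as in the Wire Library migration example, so splitting only works if your device accepts the data in such pieces. The recording functions at the end exist only to make the sketch testable.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callbacks, see the lead-in above. */
typedef void (*write_chunk_fn)(void *ctx, const uint8_t *data, size_t len);
typedef void (*after_chunk_fn)(void *ctx);

/* Splits a long write into writes of at most max_chunk bytes and calls
   after_chunk after each one. Returns the number of chunks written. */
size_t write_in_chunks(const uint8_t *data, size_t len, size_t max_chunk,
                       write_chunk_fn write_chunk, after_chunk_fn after_chunk,
                       void *ctx) {
    size_t chunks = 0;
    if (max_chunk == 0) {
        return 0;
    }
    for (size_t offset = 0; offset < len; offset += max_chunk) {
        size_t n = len - offset < max_chunk ? len - offset : max_chunk;
        write_chunk(ctx, data + offset, n);
        after_chunk(ctx); /* room for interrupts and background tasks */
        ++chunks;
    }
    return chunks;
}

/* Recorder used instead of real i2c writes, for illustration only. */
struct chunk_log {
    size_t bytes;
    size_t largest;
    size_t yields;
};

void log_write(void *ctx, const uint8_t *data, size_t len) {
    struct chunk_log *rec = ctx;
    (void)data;
    rec->bytes += len;
    if (len > rec->largest) {
        rec->largest = len;
    }
}

void log_yield(void *ctx) {
    struct chunk_log *rec = ctx;
    rec->yields++;
}
```

For example, 100 bytes with max_chunk = 32 are sent as four writes of 32, 32, 32 and 4 bytes, with a yield after each of them.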

Inline Assembly

You need the Xtensa Overview handbook and Instruction Set Architecture Reference Manual. For the inline assembly I used both the keyword volatile and the clobber "memory", because GPIOs are accessed through memory mapped I/O. Since I wanted to let the compiler choose the registers, I used the output and input operand lists. Another approach would be to use registers manually and put them on the clobber list.

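asm volatile (
  "Assembly instruction;"
  "...;"
  : [r_set] "+r" (a_set), ...              // output operands ("+r": read and written)
  : [r_sda_bitmask] "r" (sda_bitmask), ... // input operands
  : "memory"                               // clobber: memory may change (memory mapped I/O)
);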

Memory mapped I/O

All 16 GPIO pins are mapped to these memory addresses. To save some registers, I just used r_set to hold the set address: MOVI %[r_set], 0x60000304 (note: the compiler will expand this statement into correct immediate moves). All other memory addresses to clear and read GPIO values are relative to r_set. Note that the offsets are decimal in inline assembly, but the memory addresses are in hex! Since there are only 16 GPIO pins, 16-bit loads and stores are enough.

L16UI %[r_in_value], %[r_set], 20;   // offset is 20d = 14h => in: 0x60000318
MEMW;
S16I  %[r_scl_bitmask], %[r_set], 4; // clear: 0x60000308
MEMW;

You should use at least one MEMW between every load and store to a volatile memory address.
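The offset arithmetic can be double-checked in plain C. The macro and function names are made up for illustration; the addresses are the ones given above.

```c
#include <stdint.h>

/* GPIO register addresses as given on this page (names are illustrative). */
#define GPIO_SET_ADDR   0x60000304u /* held in r_set */
#define GPIO_CLEAR_ADDR 0x60000308u /* write a bitmask here to drive pins LOW */
#define GPIO_IN_ADDR    0x60000318u /* read the current pin levels here */

/* Offset relative to r_set, i.e. the decimal number used in L16UI/S16I. */
uint32_t offset_from_set(uint32_t address) {
    return address - GPIO_SET_ADDR;
}

/* With only 16 GPIO pins, a pin's bitmask fits into 16 bits. */
uint16_t pin_bitmask(unsigned pin) {
    return (uint16_t)(1u << pin); /* valid for pin 0..15 */
}
```

offset_from_set(GPIO_IN_ADDR) is 20 (0x14) and offset_from_set(GPIO_CLEAR_ADDR) is 4, matching the offsets in the snippet above.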

BALL and BNALL

I found these instructions very useful. BALL means branch if all bits are set; BNALL means branch if not all bits are set. A typical use case is branching when a specific pin is high, e.g., when SDA is high. brzo_i2c_setup sets the bitmasks corresponding to the given SDA and SCL pins. These bitmasks are used in the BALL or BNALL instructions, see code snippet:

// Sample SCL value
L16UI  %[r_in_value], %[r_set], 20; // offset is 20d = 14h => in: 0x60000318
MEMW;
// Branch if SCL = 1, i.e. no stretching
BALL   %[r_in_value], %[r_scl_bitmask], l_no_stretch;
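For readers more at home in C, here is a sketch of the two branch conditions as boolean functions (the names are hypothetical; the semantics follow the Xtensa ISA): BALL is taken when every bit of the mask is also set in the tested value, BNALL when at least one of them is clear.

```c
#include <stdbool.h>
#include <stdint.h>

/* BALL as, at: branch if all bits set in at are also set in as. */
bool ball_taken(uint32_t as, uint32_t at) {
    return (~as & at) == 0;
}

/* BNALL as, at: branch if at least one bit set in at is clear in as. */
bool bnall_taken(uint32_t as, uint32_t at) {
    return !ball_taken(as, at);
}
```

In the snippet above, r_in_value holds the sampled pin levels and r_scl_bitmask the SCL bitmask, so ball_taken(r_in_value, r_scl_bitmask) corresponds to "SCL is high, no stretching".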

A couple of hints

Inline assembly is very hard to debug. Be careful with program flow involving jumps or branches. Furthermore, if you misspell assembly instructions, the compiler will just tell you "there was somewhere junk in your inline assembly", but it won't tell you where exactly the problem is. So, after changing a few lines of inline assembly, always re-compile.

Also, be careful with memory accesses! If you mess it up in assembly, your esp8266 will crash 😧 And finding the cause of the exception can be time consuming.

If you need to access the byte array of the buffer, &data[0] passes a pointer to the first array element to the inline assembly.

btw: Why the heck is this library called brzo i2c?

In Croatian brzo means fast... 🐟