Skip to content

Improve pin I/O performance #264

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
matthijskooijman opened this issue Jan 27, 2016 · 6 comments
Open

Improve pin I/O performance #264

matthijskooijman opened this issue Jan 27, 2016 · 6 comments

Comments

@matthijskooijman
Copy link
Collaborator

It is well-known that digitalWrite() and friends are quite slow, taking dozens of clockcycles for something that can sometimes be done in just 1 or 2 when using direct I/O. Especially when the arguments are constants, it should be possible to severely optimize this.

In addition to the high-level functions like digitalWrite(), there are also more low level (and AVR-specific) macros like digitalPinToPort(), which currently use a PROGMEM-stored table to perform translations. Currently, these always result in runtime lookups, even when the argument is constant, because:

  1. The compiler does not optimize through things like pgm_read_byte() (haven't verified this, though)
  2. The contents of these tables is only available in the compilation unit that defines ARDUINO_MAIN (wiring_digital.c IIRC)

It would be good to solve these 2 problems to optimize the low-level macros, and additionally improve the higher-level functions to allow (or force) inlining them when the arguments are constants. arduino/Arduino#1285 took a swing at the latter and contains a lot of useful experience. For the low-level problems, arduino/Arduino#1285 modified pins_arduino.h to put the lookup table contents into a macro, but that doesn't seem very elegant or robust to me.

An alternative I came up with was to define a static table in all compilation units (except for the ARDUINO_MAIN one) to use for compile-time lookups, and a non-static table in the ARDUINO_MAIN compilation unit to use for all runtime lookups. In short, this would look something like this:

Arduino.h:

// Use a function to ensure pin is only evaluated once. This might
// also work with a do while loop and/or local variable?
#define digitalPinToPort(P) digitalPinToPortFunction(pin)
static inline uint8_t digitalPinToPortFunction(uint8_t pin) __attribute__((always_inline));
static inline uint8_t digitalPinToPortFunction(uint8_t pin) {
#if !defined(ARDUINO_MAIN)
  if __builtin_constant_p(pin)
    return digital_pin_to_port[P];
  else
#endif
    return pgm_read_byte( digital_pin_to_port_PGM + pin );
}

#ifdef ARDUINO_MAIN
#define DIGITAL_PIN_TO_PORT_TABLE const uint8_t PROGMEM digital_pin_to_port_PGM[]
#else
#define DIGITAL_PIN_TO_PORT_TABLE static const uint8_t PROGMEM digital_pin_to_port[]
#endif

pins_arduino.h:

DIGITAL_PIN_TO_PORT_TABLE = {
    PD, /* 0 */
    PD,
    etc.
}

This discussion was started after merging arduino/Arduino#121, where @PaulStoffregen pointed out the performance implications of that merge.

@PaulStoffregen
Copy link
Contributor

Over and over again, this general idea has failed to gain traction because it changes the way the core libraries define their pin mapping. Now that so many derivative boards exist, based on Arduino core lib with only minor customizations, changing will be incredibly painful.

I personally feel any effort on this is a complete waste of time, unless @cmaglie & @mbanzi & @damellis are 100% on board with such a large change. Many people have wasted so much time, with no chance of Arduino ever adopting these sorts of improvements.

@sandeepmistry sandeepmistry transferred this issue from arduino/Arduino Sep 16, 2019
@WestfW
Copy link

WestfW commented Mar 8, 2020

oh; you don't even need the extra table, with -flto. The compiler is smart/dumb enough to do lookups in constant pgmspace if you "forget" the "pgm_read_xxx" function. See https://github.com/WestfW/Duino-hacks/blob/master/fastdigitalIO/fastdigitalIO.h#L80 which uses all of the existing tables.
(without -flto, the tables are external and can't be read.)

Alas, with the current level of optimization, it's difficult to check/test this sort of code.

@matthijskooijman
Copy link
Collaborator Author

I was recently thinking exactly the same. Even more, with __builtin_constant_p I think you could even do this in the main digitalWrite, without needing a separate constant-only version for this (though I'm not 100% sure if __builtin_constant_p works well enough with LTO, but worst case it returns false when it could return true and things are unchanged from what they are now.

I have yet to try this (but will not get around to this soon, I'm afraid).

@PaulStoffregen
Copy link
Contributor

As far as I know, the idea to use __builtin_constant_p was first proposed by Ben Combee on the developer mail list on March 14, 2009. I started using it for Teensy in November 2009. After more than 10 years of use by many thousands of people with Teensy, I believe it's safe to say it does indeed work quite well, at least on AVR at 16 MHz. Issues do come up with very fast ARM chips and pins with slew rate limiting.

@WestfW
Copy link

WestfW commented Mar 8, 2020

Traditionally, the "fast" implementations of digitalWrite() have been very "dangerous", because they've used a mechanism separate from the pgmspace tables for translating pin number to port/bit, and there was significant risk that the two methods would get out-of-sync (especially when creating new boards.)
But using the compiler features mentioned in https://github.com/WestfW/Duino-hacks/blob/master/fastdigitalIO/fastdigitalIO.h means that the same tables can be used for both, with the "fast" code retrieving constant values from the tables at compile time...

I'd sort-of like to see a digitalWriteFast() appear in the megaAVR core first, where the pgmspace issues are more academic, the performance hit in the normal digitalWrite is worse due to the input-pullup backward compatibility, and we could get some experience on a smaller user-base with less legacy code.

@neu-rah
Copy link

neu-rah commented Mar 9, 2020

this would be a good chance to insert user definable function addresses to be used when pin access fail, as in not in table, by default they can just return but there is a chance to implement virtual pin handlers. The impact would be null because they only express on fail, and even then if on default just return they are inlined.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants