SPI Plugin - TI TLC5971 support #1399

s-light · 2018-04-11T19:44:57Z

This pull request adds support for the TI TLC5971 12-channel, 16-bit LED driver chips to the SPI plugin.

It's basically working:
as far as I have seen, if a configuration works, it works on every start.
But some configurations lead to random malloc(): memory corruption and Received Segmentation fault crashes.
That is not good 🪲
Currently I think it has to do with the number of pixels/ports in use, but I have no evidence for this yet.

I am currently out of ideas for how to track down the cause of this,
and hope one of you can give me an idea of how to start debugging it.
It definitely has something to do with my code, as it only shows up when I use the plugin with the new pixel type.

peternewman

Some initial review comments

peternewman · 2018-04-12T00:37:33Z

plugins/spi/SPIOutput.cpp

+  // calculate DMX-start-address
+  const unsigned int first_slot = m_start_address - 1;  // 0 offset
+
+  // calculate how much channels for full devices are available in dmx_buffer


SPaG: How many

peternewman · 2018-04-12T00:38:06Z

plugins/spi/SPIOutput.cpp


+  personalities.insert(personalities.begin() + PERS_TLC5971_INDIVIDUAL - 1,
+    Personality(m_pixel_count * TLC5971_SLOTS_PER_DEVICE,
+      "TLC5971 Individual Control (16bit per channel)"));


We probably ideally want both 8 and 16bit individual and combined options eventually, but feel free to fix the underlying issues first.

Yes, that is the plan once the rest works :-)
For the combined options there is more than one way to solve it:

- repeat/copy one set of driver channels (24 ch) to all others
- repeat/copy the first 3 LED values (3 or 6 ch for 8- or 16-bit modes) to all other positions

Which do you think makes more sense?

So it's a 4 channel RGB driver right, with global dimmer or similar over each RGB group? You could also use it as 3 channel RGBA drivers (although the global dimmer wouldn't align).

I suspect the latter option is likely to be a better fit for most people, but it's kind of hard to tell.
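As a rough illustration of the latter option in 16-bit mode, the sketch below repeats one RGB value across every LED position of a chain. All names are hypothetical and not taken from the PR; it assumes 6 DMX slots for one 16-bit RGB value and 4 RGB groups per device.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical constants for this sketch.
const std::size_t kSlotsPerRgb16 = 6;       // R, G, B as two bytes each
const std::size_t kRgbGroupsPerDevice = 4;  // 4 x RGB = 12 channels

// Repeats one 16-bit RGB value (6 DMX slots starting at first_slot) across
// every LED position of device_count chained devices.
// Returns an empty vector if the DMX data is too short.
std::vector<uint8_t> ExpandCombinedRgb16(const uint8_t *dmx,
                                         std::size_t dmx_size,
                                         std::size_t first_slot,
                                         std::size_t device_count) {
  std::vector<uint8_t> out;
  if (dmx == nullptr || first_slot > dmx_size ||
      dmx_size - first_slot < kSlotsPerRgb16) {
    return out;
  }
  const std::size_t positions = device_count * kRgbGroupsPerDevice;
  out.reserve(positions * kSlotsPerRgb16);
  for (std::size_t i = 0; i < positions; ++i) {
    out.insert(out.end(), dmx + first_slot,
               dmx + first_slot + kSlotsPerRgb16);
  }
  assert(out.size() == positions * kSlotsPerRgb16);
  return out;
}
```

The result has the same layout as the individual-control personality (24 bytes per device), so the same packet-building code could consume it.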

The main solution would be to implement http://rdm.openlighting.org/pid/display?manufacturer=31344&pid=32773 and then personalities can be independent of driver type and just offer a range of sensible ones.

Yes, it's meant as 4xRGB. I think it's not really a global dimmer; it's more a correction value per color group. (But it is a long time since I actually read the datasheet / wrote this code, so possibly there is both: a correction and a color-group dimming.)
As far as I know, none of the libraries I have found let you control any of the 'advanced' features.
For the PIXEL_TYPE thing, I think that is handled in #871.
Do you mean it makes more sense for me to try to do this first? (I am currently unaware of how much work it is / where to start, but I can read up on this.)

I don't think #871 should be masses of work @s-light , it should essentially be just tracking another variable and then using both to work out what function to run, rather than just personality. We probably also need to double check the RDM spec, but I think in theory every fixture should offer all personality sets, but perhaps just NAck the irrelevant ones (like a 24 channel clone on a normal RGB WS2801 or whatever). I suspect it broadly makes more sense to do it first, although I guess the bulk of the code to write is in the functions that actually process the DMX, so perhaps it doesn't make that much difference overall.
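A minimal sketch of the "track another variable and use both" idea, with a lookup keyed on pixel type and personality. The enum values, class, and handler signature are all hypothetical and only illustrate the dispatch; the real plugin code is structured differently.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical identifiers for this sketch.
enum class PixelType { WS2801, TLC5971 };
enum class Mode { INDIVIDUAL, COMBINED };

typedef std::function<std::string()> Handler;

class Dispatcher {
 public:
  // Registers the function to run for one (pixel type, personality) pair.
  void Register(PixelType type, Mode mode, Handler handler) {
    assert(static_cast<bool>(handler));
    m_handlers[std::make_pair(type, mode)] = handler;
  }

  // Runs the handler for the pair. Returns false if the combination is not
  // supported, which a caller could translate into a NACK.
  bool Run(PixelType type, Mode mode, std::string *result) const {
    std::map<Key, Handler>::const_iterator iter =
        m_handlers.find(std::make_pair(type, mode));
    if (iter == m_handlers.end()) {
      return false;
    }
    *result = iter->second();
    return true;
  }

 private:
  typedef std::pair<PixelType, Mode> Key;
  std::map<Key, Handler> m_handlers;
};
```

Unregistered combinations simply fail the lookup, which matches the idea of offering all personality sets and rejecting the irrelevant ones.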

We should probably try and add the PIXEL_TYPE PIDs to the web UI, which will be interesting as the first manufacturer specific ones, but I can probably handle that bit. As well as that stuff needing to go in the config file.

peternewman · 2018-04-12T00:39:08Z

plugins/spi/SPIOutput.cpp

+  // Device ..
+  // Device 2
+  // Device 1
+  // short brake of 8x period of clock (666ns .. 2.74ms) to generate latchpulse


peternewman · 2018-04-12T00:40:28Z

plugins/spi/SPIOutput.cpp

+  const unsigned int first_slot = m_start_address - 1;  // 0 offset
+
+  // calculate how much channels for full devices are available in dmx_buffer
+  uint16_t devices_in_buffer =


I think this can be a uint8_t, or at least the value will always be < 255, or does it need to be a uint16_t as that's what buffer.Size() returns?
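For reference, the arithmetic behind the "always < 255" remark can be sketched as follows. This is not the PR's code; the function and constant names are made up, and it assumes 24 slots per device and a 512-slot universe.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

const unsigned int kSlotsPerDevice = 24;  // 12 channels x 16 bit
const unsigned int kUniverseSize = 512;   // DMX slots per universe

// At most 21 complete devices fit into one universe, so uint8_t is enough.
static_assert(kUniverseSize / kSlotsPerDevice == 21,
              "21 full devices per universe");

// Number of complete devices that can be filled from the DMX buffer,
// starting at the zero-based first_slot and capped at pixel_count.
uint8_t DevicesInBuffer(unsigned int buffer_size, unsigned int first_slot,
                        unsigned int pixel_count) {
  if (first_slot >= buffer_size) {
    return 0;
  }
  const unsigned int available =
      (buffer_size - first_slot) / kSlotsPerDevice;
  const unsigned int devices = std::min(available, pixel_count);
  assert(devices <= available);
  // Clamp defensively in case a caller passes an oversized buffer.
  return static_cast<uint8_t>(std::min(devices, 255u));
}
```

Whether the variable is declared as uint8_t or uint16_t, the important part is that a partially filled device at the end of the buffer is not counted.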

peternewman · 2018-04-12T00:40:55Z

plugins/spi/SPIOutput.cpp

+    return;
+  }
+
+  // rename m_pxiel_count for easier understanding.


peternewman · 2018-04-12T00:49:59Z

plugins/spi/SPIOutput.h

+  PACK(
+  struct TLC5971_packet_config_fields_t{
+    //  Write Command (6Bit)
+    uint8_t WRCMD : 6;


Okay I've learnt some new syntax here. 😄

peternewman · 2018-04-12T00:50:46Z

plugins/spi/SPIOutput.h

+
+  union TLC5971_packet_gsdata_t {
+    uint8_t bytes[24];
+    // the uint16_t will not work everywhere because of endianess problems..


SPaG: endianness

I think some use of our Host to Network code should fix that issue.

I will have a look at this.

http://docs.openlighting.org/ola/doc/latest/namespaceola_1_1network.html#ae9263a09db6563136f1e46af389dbf0f
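One way to sidestep the endianness problem without a uint16_t view in the union is to write each value byte by byte with shifts. The sketch below is an assumption-laden illustration, not the PR's implementation: the names are hypothetical, the 32-bit header (6-bit write command plus the remaining configuration bits) is treated as an opaque value supplied by the caller, and it assumes the chip expects every field most significant byte first with gs[0] sent first.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

const std::size_t kChannels = 12;
const std::size_t kPacketSize = 28;  // 224 bit, as noted in the PR

// Writes a 16-bit value most significant byte first, independent of the
// host byte order.
inline void PutBE16(uint8_t *dst, uint16_t value) {
  dst[0] = static_cast<uint8_t>(value >> 8);
  dst[1] = static_cast<uint8_t>(value & 0xFF);
}

// Builds one 28-byte packet: 4 header bytes followed by 12 x 16-bit
// grayscale values. The header layout is left to the caller.
void PackPacket(uint32_t header, const uint16_t (&gs)[kChannels],
                uint8_t (&out)[kPacketSize]) {
  out[0] = static_cast<uint8_t>(header >> 24);
  out[1] = static_cast<uint8_t>((header >> 16) & 0xFF);
  out[2] = static_cast<uint8_t>((header >> 8) & 0xFF);
  out[3] = static_cast<uint8_t>(header & 0xFF);
  for (std::size_t i = 0; i < kChannels; ++i) {
    PutBE16(out + 4 + 2 * i, gs[i]);
  }
}
```

The Host to Network helpers linked above achieve the same byte order for whole values; the shift version just avoids depending on how the compiler lays out the union or bit-fields.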

peternewman · 2018-04-12T11:34:20Z

Can you give us some working and broken configurations to compare please?

s-light · 2018-04-12T12:48:09Z

Thanks for your feedback!

I will test some configurations for comparison this evening and write up what I find.
Last time I just had to enable 4 ports
(all at 16 'pixels'; every pixel of this type has 12 LEDs at 16 bit, so 24 channels, as that is the smallest amount of data I need to generate a packet for one driver chip)
and it crashed. 1, 2 and 3 ports worked fine before I tested 4, all with the same 16 pixels. But when I went back to 3 or 2 after a crash, it also crashed; only 1 port worked then.
So I want to make sure I test this in a more structured way (for example, whether a restart of the system makes a difference).

s-light · 2018-04-12T23:19:40Z

Here are my tests / configurations.
As a basis I used this set of files:
LEDBoard_Layout_Sun/sw/ETH_SPI_bridge/ola_config/target_config

I only changed things in ola-spi.conf.

TLDR result

As far as I can tell it's related to the pixel-count value only.
These are the outcomes per value:

| result | pixel-count |
| --- | --- |
| works | 20, 21 |
| crashes on normal exit | 9, 10 |
| crashes after a short time | 8, 12 |
| crashes immediately | 1, 2, 3, 4, 5, 6, 7, 11, 13, 15, 16, 17, 18, 19 |

(more than 21 is not possible because of the universe limit: 21*24=504)

So much for tonight. I think I will read through the code once more tomorrow and hope I find something that looks weird to me.

details

(i wrote it while testing.. to get it documented)

1. try the config as is - crash

*** Error in '/usr/local/bin/olad': double free or corruption (!prev): 0x0009ebb0 ***
Aborted

2. enable only 1 port with pixel-count 20 - works

(only posting here what i have changed / is relevant..)

spidev32766.0-ports = 1
spidev32766.0-0-pixel-count = 20

3. changing pixel-count to 21 - works

spidev32766.0-0-pixel-count = 21

4. change pixel-count to 1 - crashes

spidev32766.0-0-pixel-count = 1

error message is

*** Error in '/usr/local/bin/olad': malloc(): smallbin double linked list corrupted: 0x000f1e38 ***
Aborted

6. changing pixel-count to 20 and port count to 2 - works

spidev32766.0-ports = 2
spidev32766.0-0-pixel-count = 20
spidev32766.0-1-pixel-count = 20

7. changing all pixel-count to 20 and port count to 12 - does not crash

spidev32766.0-ports = 12
spidev32766.0-*-pixel-count = 20

error

plugins/spi/SPIWriter.cpp:119: Failed to write all the SPI data: Message too long

8. changing all pixel-count to 20 and port count to 7 - works

spidev32766.0-ports = 7
spidev32766.0-*-pixel-count = 20
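A possible explanation for "Message too long" with 12 ports but not with 7 is the size of a single SPI transfer. The sketch below only does the arithmetic and rests on two assumptions that are not confirmed in this thread: that the data for all ports ends up in one write, and that the kernel's spidev buffer is at its default of 4096 bytes. The constant and function names are hypothetical, and latch bytes are ignored.

```cpp
#include <cassert>

const unsigned int kBytesPerDevice = 28;         // 224 bit per TLC5971 packet
const unsigned int kAssumedSpidevBufsiz = 4096;  // assumption: spidev default

// Bytes needed if every port carries pixel_count devices and all ports are
// sent in a single transfer (assumption, depends on the backend in use).
unsigned int TransferSize(unsigned int ports, unsigned int pixel_count) {
  // Keep the sketch free of overflow for absurd inputs.
  assert(ports <= 64 && pixel_count <= 1024);
  return ports * pixel_count * kBytesPerDevice;
}

bool FitsInOneTransfer(unsigned int ports, unsigned int pixel_count) {
  return TransferSize(ports, pixel_count) <= kAssumedSpidevBufsiz;
}
```

Under these assumptions 7 ports at 20 pixels need 3920 bytes and fit, while 12 ports need 6720 bytes and do not, which would match steps 7 and 8.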

9. enable only 1 port with pixel-count 2 - crash

spidev32766.0-ports = 1
spidev32766.0-0-pixel-count = 2

error

Received Segmentation fault
^Ccommon/thread/SignalThread.cpp:115: Received signal: Interrupt
Killed

10. pixel-count 3..8

| pixel-count | error message |
| --- | --- |
| 3 | `Segmentation fault` |
| 4 | `*** Error in '/usr/local/bin/olad': malloc(): smallbin double linked list corrupted: 0x00094f70 ***` `Aborted` |
| 5 | `*** Error in '/usr/local/bin/olad': malloc(): smallbin double linked list corrupted: 0x000851f8 ***` `Aborted` |
| 6 | `*** Error in '/usr/local/bin/olad': malloc(): smallbin double linked list corrupted: 0x0008a738 ***` `Aborted` |
| 7 | `*** Error in '/usr/local/bin/olad': malloc(): smallbin double linked list corrupted: 0x0009db50 ***` `Aborted` |
| 8 | worked for a short moment, then `*** Error in '/usr/local/bin/olad': corrupted double-linked list: 0x00051160 ***` `Aborted` |

11. pixel-count 9 - works - kind of

Works as long as it's running. The moment I hit Ctrl+C to exit I get

*** Error in '/usr/local/bin/olad': double free or corruption (!prev): 0x0009af60 ***
Aborted

12. pixel-count 10 - works - kind of

Similar to 9, but on one occasion I got a Segmentation fault and after this had to kill it manually:

olad/AvahiDiscoveryAgent.cpp:236: State for OLA Server._http._tcp,_ola, group 0xead00 changed to AVAHI_ENTRY_GROUP_ESTABLISHED
^Ccommon/thread/SignalThread.cpp:115: Received signal: Interrupt
common/http/HTTPServer.cpp:537: Notifying HTTP server thread to stop
common/http/HTTPServer.cpp:539: Waiting for HTTP server thread to exit
common/http/HTTPServer.cpp:541: HTTP server thread exited
Received Segmentation fault
Killed

13. pixel-count 11..19

| pixel-count | error message |
| --- | --- |
| 11 | `*** Error in '/usr/local/bin/olad': corrupted double-linked list: 0x000ecdf8 ***` `Aborted` |
| 12 | worked for a moment, then `*** Error in '/usr/local/bin/olad': corrupted double-linked list: 0x000a5280 ***` `Aborted` |
| 13 | `*** Error in '/usr/local/bin/olad': corrupted double-linked list: 0x000a5280 ***` `Aborted` |
| 14 | worked, kind of, like 9 |
| 15 | `*** Error in '/usr/local/bin/olad': malloc(): smallbin double linked list corrupted: 0x0009ef28 ***` `Aborted` |
| 16 | crash like 10, but had to kill it every time I tried |
| 17 | crash like 16 |
| 18 | crash like 16 |
| 19 | `*** Error in '/usr/local/bin/olad': malloc(): memory corruption: 0x0008ae48 ***` `Aborted` |

peternewman · 2018-04-13T09:43:19Z

Possibly relevant:
https://stackoverflow.com/questions/19534051/glibc-detect-smallbin-linked-list-corrupted?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa

I'd look at what you checkout from the SPI buffer, and how you access that afterwards.

On a different note, if you update your branch compared to master, the Travis build should start working again.

peternewman

A few random thoughts on what might be breaking it.

peternewman · 2018-04-13T09:47:26Z

plugins/spi/SPIOutput.cpp

+    // should return 28byte = 224bit
+
+    // copy data to output buffer
+    // memcpy(output + spi_offset, device_data.bytes, sizeof(TLC5971_packet_t));


Does the memcpy not work? Why not, that ought to be quicker and safer.
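If the memcpy is reinstated, a bounds check against the length of the checked-out buffer would make an overrun fail loudly instead of corrupting the heap. The following is only a sketch with hypothetical names; output and output_length stand in for whatever the backend actually hands out, which is not shown in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

const std::size_t kPacketSize = 28;  // 224 bit per device

// Copies one device packet into slot device_index of the output buffer.
// Returns false, without writing, if the packet would not fit completely.
bool CopyPacket(uint8_t *output, std::size_t output_length,
                std::size_t device_index,
                const uint8_t (&packet)[kPacketSize]) {
  if (output == nullptr) {
    return false;
  }
  // Only whole packets count: index must be below the number of full slots.
  if (device_index >= output_length / kPacketSize) {
    return false;
  }
  const std::size_t offset = device_index * kPacketSize;
  assert(offset + kPacketSize <= output_length);
  std::memcpy(output + offset, packet, kPacketSize);
  return true;
}
```

Logging every rejected copy would show quickly whether the offset calculation and the requested buffer size disagree for the failing pixel counts.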

peternewman · 2018-04-13T09:49:51Z

plugins/spi/SPIOutput.cpp

+  }  // for devices_in_buffer end
+
+  // write output back
+  m_backend->Commit(m_output_number);


Low-tech debug: add a log line before this. Is it this call causing the issue, or the code above?
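A related low-tech technique, different from the log line suggested here, is to build the data in a scratch buffer with guard bytes on both sides and check them before committing. The class below is a hypothetical debugging aid, not part of OLA; it only shows the idea.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

const uint8_t kGuardByte = 0xA5;
const std::size_t kGuardSize = 16;

// Scratch buffer with guard bytes before and after the usable area.
class GuardedBuffer {
 public:
  explicit GuardedBuffer(std::size_t size)
      : m_size(size), m_storage(size + 2 * kGuardSize, kGuardByte) {}

  uint8_t *Data() { return m_storage.data() + kGuardSize; }
  std::size_t Size() const { return m_size; }

  // True while nothing has written outside [Data(), Data() + Size()).
  bool GuardsIntact() const {
    for (std::size_t i = 0; i < kGuardSize; ++i) {
      if (m_storage[i] != kGuardByte ||
          m_storage[kGuardSize + m_size + i] != kGuardByte) {
        return false;
      }
    }
    return true;
  }

 private:
  std::size_t m_size;
  std::vector<uint8_t> m_storage;
};
```

If the guards are already damaged right before the commit, the overrun happens in the packet-building loop above rather than in the commit itself. It only catches writes that land within 16 bytes of the buffer, so a memory checker remains the more thorough option.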

s-light added 11 commits January 6, 2018 16:17

added TLC5971 test cases

243b150

Merge branch 'APA102PixelBrightness' into SPI_TLC5971_new

6f9d7c1

readded TLC5971 code

017fce5

Merge branch 'APA102PixelBrightness' into SPI_TLC5971_new

394b1ef

fixed mergin bug

3ead26f

Merge branch 'APA102PixelBrightness' into SPI_TLC5971_new

888807a

Merge branch 'APA102PixelBrightness' into SPI_TLC5971_new

d8ab05e

Merge branch 'SPIPlugin_PortCount' into SPI_TLC5971_new

38b651e

Merge branch 'SPIPlugin_PortCount' into SPI_TLC5971_new

04528fc

Merge remote-tracking branch 'origin/master' into SPI_TLC5971_new

5360d10

fixed APA change

3607990

peternewman requested changes Apr 12, 2018

fix SPaG

04da39e

peternewman reviewed Apr 13, 2018

s-light added 2 commits April 13, 2018 22:43

Merge branch 'master' into SPI_TLC5971_new

7fe9ea4

Merge branch 'master' into SPI_TLC5971_new

1e0eddd

peternewman added this to the 0.11.0 milestone May 2, 2018
