Skip to content
This repository was archived by the owner on Mar 6, 2020. It is now read-only.

Conversation

@machinekoder
Copy link
Member

Params are deprecated (they can't be connected to signals).

@ArcEye
Copy link

ArcEye commented Jan 4, 2018

This build has errored out on the runtests.
The test counter-encoder.0, which uses stepgen has dumped a core file and is stuck in a loop

msgd:0 stopped
rtapi:0 stopped
test.hal:1: Warning: HAL_LIB: HAL will pretend that the exact base period requested is possible.
This mode is not suitable for running real hardware.
test.hal:6: insmod failed, returned -1:
rtapi_rpc(): reply timeout
See /var/log/linuxcnc.log for more information.
halcmd: cant connect to rtapi_app: -1 (uri= uuid=a42c8c6b-4025-4f83-ba28-dad21114744a): rtapi_rpc(): reply timeout

halcmd: the rtapi:0 RT demon is not running - please investigate /var/log/linuxcnc.log
halcmd: the msgd:0 logger demon is not running - please investigate /var/log/linuxcnc.log

Line 6 in test.hal is the line
loadrt stepgen step_type=2

As these PRs are linked, do you want to close #1345 and #1346 for now until you fix this one?
Then I can cancel all the builds that are stacked instead of building them on top of the old stepgen component.

@ArcEye
Copy link

ArcEye commented Jan 4, 2018

OK, guess you are busy.
I have canceled the build for #1345, which appears to be dependent, aborted #1344, which will leave #1346, which should not have any dependency or difficulty with tests, to complete.

When you have sorted #1344, just need to close and reopen #1345 which should queue it for test build again.

if (retval != 0) { return retval; }
/* export parameter for position scaling */
retval = hal_param_float_newf(HAL_RW, &(addr->pos_scale), comp_id,
retval = hal_pin_float_newf(HAL_IN, &(addr->pos_scale), comp_id,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be HAL_IO

if (retval != 0) { return retval; }
/* export param for scaled velocity (frequency in Hz) */
retval = hal_param_float_newf(HAL_RO, &(addr->freq), comp_id,
retval = hal_pin_float_newf(HAL_IN, &(addr->freq), comp_id,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be HAL_OUT


/* export param variable for raw counts */
retval = hal_param_s32_newf(HAL_RO, &(addr->rawcount), comp_id,
retval = hal_pin_s32_newf(HAL_IN, &(addr->rawcount), comp_id,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be HAL_OUT

@machinekoder
Copy link
Member Author

Let's see if that fixes the problem.

@ArcEye
Copy link

ArcEye commented Mar 27, 2018

Had to abort that build.

It failed in exactly the same way as previously, dumped core and stuck in endless loop after

msgd:0 stopped
rtapi:0 stopped
test.hal:1: Warning: HAL_LIB: HAL will pretend that the exact base period requested is possible.
This mode is not suitable for running real hardware.
test.hal:6: insmod failed, returned -1:
rtapi_rpc(): reply timeout
See /var/log/linuxcnc.log for more information.
halcmd: cant connect to rtapi_app: -1 (uri= uuid=a42c8c6b-4025-4f83-ba28-dad21114744a): rtapi_rpc(): reply timeout

halcmd: the rtapi:0 RT demon is not running - please investigate /var/log/linuxcnc.log
halcmd: the msgd:0 logger demon is not running - please investigate /var/log/linuxcnc.log

Think the only way forward is to runtests locally and try to find why insmod does not like the module now, by looking at the linuxcnc.log.
I cannot immediately see any instance of param assignment not changed to pointer indirection assignment etc.

@ArcEye
Copy link

ArcEye commented Mar 27, 2018

A quick run through confirms in a local test

Output from linuxcnc.log confirms segfault on loading.
Nothing in dmesg however, so not a GP fault.

A backtrace through the core file will hopefully tell you what it is.

Mar 27 11:41:42 INTEL-i7 msgd:0: ulapi:32362:user _ulapi_init(): ulapi posix unknown loaded
Mar 27 11:41:42 INTEL-i7 msgd:0: ulapi:32362:user halg_xinitfv:271 HAL: singleton component 'hal_lib32362' id=92 initialized
Mar 27 11:41:42 INTEL-i7 rtapi:0: 1:rtapi_app:32353:user signal 11 - 'Segmentation fault' received, dumping core (current dir=/media/sdc/sdc7/home/mick/src/machinekit/tests/counter-encoder.0)
Mar 27 11:41:42 INTEL-i7 rtapi:0: 1:rtapi_app:32353:user  --- rtapi_app backtrace: ---
Mar 27 11:41:42 INTEL-i7 msgd:0: hal_lib:32353:rt halg_xinitfv:90 HAL: initializing component 'stepgen' type=1 arg1=0 arg2=0/0x0
Mar 27 11:41:42 INTEL-i7 rtapi:0: 1:rtapi_app:32353:user 7f7dc7802bb7 export_stepgen   (hal/components/stepgen.c:1160)
Mar 27 11:41:42 INTEL-i7 rtapi:0: 1:rtapi_app:32353:user 7f7dc78010ee rtapi_app_main   (hal/components/stepgen.c:504)
Mar 27 11:41:42 INTEL-i7 rtapi:0: 1:rtapi_app:32353:user 560192405b7b do_load_cmd      (rtapi/posix/rtapi_app.cc:647)
Mar 27 11:41:42 INTEL-i7 rtapi:0: 1:rtapi_app:32353:user 5601924087f9 rtapi_request    (rtapi/posix/rtapi_app.cc:944)
Mar 27 11:41:42 INTEL-i7 rtapi:0: 1:rtapi_app:32353:user 7f7dcd6c7055 zloop_start ??:0
Mar 27 11:41:42 INTEL-i7 rtapi:0: 1:rtapi_app:32353:user 560192407f04 mainloop         (rtapi/posix/rtapi_app.cc:1407)
Mar 27 11:41:42 INTEL-i7 rtapi:0: 1:rtapi_app:32353:user 560192401b16 main             (rtapi/posix/rtapi_app.cc:1762)
Mar 27 11:41:42 INTEL-i7 rtapi:0: 1:rtapi_app:32353:user 7f7dcbf16a86 __libc_start_main ??:0
Mar 27 11:41:42 INTEL-i7 rtapi:0: 1:rtapi_app:32353:user 560192401c19 _start ??:0
Mar 27 11:41:42 INTEL-i7 rtapi:0: 1:rtapi_app:32353:user ffffffffffffffff ?? ??:0
Mar 27 11:41:42 INTEL-i7 rtapi:0: 1:rtapi_app:32353:user  --------------------
Mar 27 11:41:42 INTEL-i7 msgd:0: rtapi_app exit detected - scheduled shutdown

@ArcEye
Copy link

ArcEye commented Mar 27, 2018

OK think I have it.
It could only be failing in the initial creation stage and gbd confirmed it was pin initialisation in
export_stepgen(...) that caused it

(gdb) thread 1
[Switching to thread 1 (Thread 0x7f7dcef1b5c0 (LWP 32353))]
#0  0x00007f7dc7802bb7 in export_stepgen (num=0, addr=0x7f7dc8ed0e80, step_type=2, pos_mode=1) at hal/components/stepgen.c:1160
1160		*(addr->dir_setup) = 0;
(gdb) backtrace
#0  0x00007f7dc7802bb7 in export_stepgen (num=0, addr=0x7f7dc8ed0e80, step_type=2, pos_mode=1) at hal/components/stepgen.c:1160
#1  0x00007f7dc78010ef in rtapi_app_main () at hal/components/stepgen.c:504
#2  0x0000560192405b7c in do_load_cmd (name="stepgen", args=..., pbreply=..., instance=<optimized out>) at rtapi/posix/rtapi_app.cc:647
#3  0x00005601924087fa in rtapi_request (loop=<optimized out>, socket=0x5601936a9d70, arg=<optimized out>) at rtapi/posix/rtapi_app.cc:944
#4  0x00007f7dcd6c7056 in zloop_start () from /usr/lib/x86_64-linux-gnu/libczmq.so.4
#5  0x0000560192407f05 in mainloop (argc=<optimized out>, argv=0x7ffcce050468) at rtapi/posix/rtapi_app.cc:1407
#6  0x0000560192401b17 in main (argc=2, argv=0x7ffcce050468) at rtapi/posix/rtapi_app.cc:1762
(gdb) up 0
#0  0x00007f7dc7802bb7 in export_stepgen (num=0, addr=0x7f7dc8ed0e80, step_type=2, pos_mode=1) at hal/components/stepgen.c:1160
1160		*(addr->dir_setup) = 0;
(gdb) list
1155	    if ( step_type == 0 ) {
1156		*(addr->dir_hold_dly) = 1;
1157		*(addr->dir_setup) = 1;
1158	    } else {
1159		*(addr->dir_hold_dly) = 1;
1160		*(addr->dir_setup) = 0;
1161	    }
1162	    /* set 'old' values to make update_freq validate the timing params */
1163	    addr->old_step_len = ~0;
1164	    addr->old_step_space = ~0;

Line 1160 appears perfectly valid until I noticed that the argv contains step_type set to 2
export_stepgen (num=0, addr=0x7f7dc8ed0e80, step_type=2, pos_mode=1)

https://github.com/machinekoder/machinekit/blob/691cd5d6a6fffc871eae0790cbdc9ae0a04223f3/src/hal/components/stepgen.c#L1094
creates a dir_setup pin if step_type=0, but does not if it isnt

So when it is set to step_type=2 the initialisation assignment at L:1160 accesses ??? because it was not created.

The original had the same line
https://github.com/machinekit/machinekit/blob/master/src/hal/components/stepgen.c#L1160
but I think got away with it, because the struct allocated space for hal_u32_t, which could
be zeroed, but the new code being a pointer has to point somewhere to be valid.

Think if you comment out line 1160 it should work.
(subject for checking for any other similar lazy assignments from the old code 😄 )

@ArcEye
Copy link

ArcEye commented Mar 27, 2018

In fact it gets worse.

The dir_setup pin was accessed later when attached to a thread, even though it does not exist.

Also the same sloppy assignments to params that didn't exist were made with step_space param as well in the original code and that causes a crash later when attached to a thread and referenced as a pin.

I now have it running and passing the runtest, by creating the pins irrespective of step_type, because the later code references them without testing what the step_type is.

I'll try to do a PR to your branch for this.

@ArcEye
Copy link

ArcEye commented Mar 27, 2018

machinekoder#3 done

It won't merge as is. but you can see the changes easily enough

struct stepgen_t contained space for params, but not all were created.
Later assignments and comparisons worked even though the param did
not exist, because the original hal_xxx_t existed in the struct.

When params were converted to pointers to enable real pin creation
these shortcuts caused segfaults addressing null pointers

Signed-off-by: Mick <arceye@mgware.co.uk>
@ArcEye
Copy link

ArcEye commented Mar 28, 2018

The failures are because of runtests that try to print params, which are now pins; or get more pins than previous, so the result does not match expected.

Did not test a full runtests, just the original failure test.

I'll adjust the expected output and PR to your branch in a while

@ArcEye
Copy link

ArcEye commented Mar 28, 2018

machinekoder#4 applies, tests pass locally

@luminize
Copy link

no activity on this PR for 6 months, stale, closing

@luminize luminize closed this Oct 10, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants