Take threaddump on hung surefire tests


Debraj Manna
Sometimes I have Maven Surefire tests that hang, due to either races or
deadlocks.

When this happens I have to find out which slave is being used, log on to
that slave, sudo to the jenkins account, and run either jstack or kill -3.

I am looking for a simple solution, such as running jstack / kill -3 when
someone presses the Abort button in Jenkins.

Can someone suggest how I can automate this, or a better way of handling
it?

Re: Take threaddump on hung surefire tests

Enrico Olivelli
Hi,
you can create a simple listener like this one:
https://github.com/apache/bookkeeper/blob/master/bookkeeper-common/src/test/java/org/apache/bookkeeper/common/testing/util/TimedOutTestsListener.java

See this pom.xml file for how to enable it:
https://github.com/apache/bookkeeper/blob/2f996dcf0159f945f7ec97ce7402e5d293009444/bookkeeper-server/pom.xml#L212

Hope that helps.

Enrico
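
A listener of this kind has to produce the dump from inside the test JVM,
which the JDK supports through ThreadMXBean. The following is only a minimal
sketch of that step, not the BookKeeper class linked above: ThreadDumper and
dump() are hypothetical names chosen for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

/** Hypothetical helper (not the BookKeeper class): a jstack-like dump of the current JVM. */
final class ThreadDumper {

    private ThreadDumper() {
    }

    /** Returns the state and full stack trace of every live thread, plus any deadlocked threads. */
    static String dump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // Request lock details only where this JVM supports them.
        boolean monitors = threads.isObjectMonitorUsageSupported();
        boolean synchronizers = threads.isSynchronizerUsageSupported();

        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : threads.dumpAllThreads(monitors, synchronizers)) {
            appendThread(out, info);
        }

        // Both deadlock queries return null when no deadlock is found.
        long[] deadlocked = synchronizers
                ? threads.findDeadlockedThreads()
                : threads.findMonitorDeadlockedThreads();
        if (deadlocked != null) {
            out.append("Deadlocked threads:\n");
            for (ThreadInfo info : threads.getThreadInfo(deadlocked)) {
                if (info != null) { // a thread may have exited in the meantime
                    out.append("    \"").append(info.getThreadName()).append("\"\n");
                }
            }
        }
        return out.toString();
    }

    private static void appendThread(StringBuilder out, ThreadInfo info) {
        out.append('"').append(info.getThreadName()).append("\" ")
           .append(info.getThreadState()).append('\n');
        // ThreadInfo.toString() truncates deep stacks, so print every frame explicitly.
        for (StackTraceElement frame : info.getStackTrace()) {
            out.append("    at ").append(frame).append('\n');
        }
        out.append('\n');
    }

    public static void main(String[] args) {
        System.out.println(dump());
    }
}
```

A JUnit 4 RunListener could call ThreadDumper.dump() from its testFailure
callback and print the result, so the dump ends up in the build log without
anyone logging on to the build machine.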

On Thu, Oct 3, 2019 at 14:49, Debraj Manna <[hidden email]> wrote:

> Sometimes I have Maven Surefire tests that hang, due to either races or
> deadlocks.
>
> When this happens I have to find out which slave is being used, log on to
> that slave, sudo to the jenkins account, and run either jstack or kill -3.
>
> I am looking for a simple solution, such as running jstack / kill -3 when
> someone presses the Abort button in Jenkins.
>
> Can someone suggest how I can automate this, or a better way of handling
> it?

Re: Take threaddump on hung surefire tests

Tibor Digana
In reply to this post by Debraj Manna
Hi Debraj,

That is a nice technical idea from Enrico.
If you apply it and are convinced that it would work properly for the whole
Java community, feel free to share it, and we can discuss how we would adopt
your solution in the Surefire project.

Cheers
Tibor17


Re: Take threaddump on hung surefire tests

Debraj Manna
Yeah sure ... thanks.

On Thu, Oct 3, 2019 at 7:50 PM Tibor Digana <[hidden email]> wrote:

> Hi Debraj,
>
> That is a nice technical idea from Enrico.
> If you apply it and are convinced that it would work properly for the whole
> Java community, feel free to share it, and we can discuss how we would adopt
> your solution in the Surefire project.
>
> Cheers
> Tibor17

Re: Take threaddump on hung surefire tests

Debraj Manna
Enrico,

If I understand the approach correctly, all my JUnit 4 tests need a timeout
specified (via either @Test or a @Rule) before I can use the listener. The
problem is that we have more than 2000 tests, and specifying a timeout in
each test or class is cumbersome.

Correct me if I have misunderstood anything.



On Fri, Oct 4, 2019 at 3:18 PM Debraj Manna <[hidden email]>
wrote:

> Yeah sure ... thanks.

Re: Take threaddump on hung surefire tests

Enrico Olivelli
On Fri, Oct 4, 2019, 16:30 Debraj Manna <[hidden email]> wrote:

> Enrico,
>
> If I understand the approach correctly, all my JUnit 4 tests need a timeout
> specified (via either @Test or a @Rule) before I can use the listener. The
> problem is that we have more than 2000 tests, and specifying a timeout in
> each test or class is cumbersome.
>


We don't have timeout Rules in BookKeeper, so there is something wrong
with your analysis.

I have other variations of that listener.
I don't have time to check this weekend.
I will check as soon as I can.

Enrico


Re: Take threaddump on hung surefire tests

Tibor Digana
In reply to this post by Debraj Manna
Hi Debraj,

It depends on your requirements.

As I initially understood your email, you want every test method to have an
implicit timeout, even without an explicit timeout annotation.
In that case testStarted is the important event: your code sees when the
test method has started, so the timeout can easily be computed.
Of course, you need an extra thread that checks for the end events.
https://junit.org/junit4/javadoc/4.12/org/junit/runner/notification/RunListener.html#testStarted(org.junit.runner.Description)
Your code can call the method "pleaseStop()" on the listener. I think it
would not be a reliable algorithm in all use cases if you use forkCount > 1,
and there you would need our support.

Is this what you need, a global timeout with the same value for every
test method?

Cheers
Tibor
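
The outline above (note when a test starts, and let an extra thread watch
for the end event) can be sketched with the JDK alone. HangWatchdog and its
methods are hypothetical names, not JUnit or Surefire API. The sketch
assumes tests run one at a time in the JVM, and it only reports a hang; it
does not stop the run.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical watchdog: reports a test that has not finished within a global timeout. */
final class HangWatchdog implements AutoCloseable {

    private final ScheduledExecutorService timer;
    private final long timeoutMillis;
    private final Consumer<String> onHang;
    private ScheduledFuture<?> pending; // guarded by "this"

    HangWatchdog(long timeoutMillis, Consumer<String> onHang) {
        this.timeoutMillis = timeoutMillis;
        this.onHang = onHang;
        // Daemon thread, so the watchdog never keeps the JVM alive on its own.
        this.timer = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "hang-watchdog");
            t.setDaemon(true);
            return t;
        });
    }

    /** Call when a test starts, for example from a listener's testStarted callback. */
    synchronized void testStarted(String testName) {
        cancelPending();
        pending = timer.schedule(() -> onHang.accept(testName), timeoutMillis, TimeUnit.MILLISECONDS);
    }

    /** Call when a test ends, for example from a listener's testFinished callback. */
    synchronized void testFinished() {
        cancelPending();
    }

    private void cancelPending() {
        if (pending != null) {
            pending.cancel(false); // the timer is no longer needed for this test
            pending = null;
        }
    }

    @Override
    public void close() {
        timer.shutdownNow();
    }
}
```

A JUnit 4 RunListener could forward its testStarted and testFinished events
to such a watchdog and pass a callback that prints a thread dump, which
gives every test the same timeout without touching the test classes.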


Re: Take threaddump on hung surefire tests

Debraj Manna
Thanks again, Enrico. I will try to find out from the Apache BookKeeper code
or ask on the Apache BookKeeper mailing list.

Yes, Tibor, I am looking for a global timeout without explicitly adding a
timeout annotation to all of my tests / classes. I am not using
forkCount > 1. I get the testStarted part, but calling "pleaseStop() on the
listener" is still not clear to me. Are you suggesting that I add this to
each of my test classes? Can you explain this a bit more?

On Fri, Oct 4, 2019 at 11:52 PM Tibor Digana <[hidden email]> wrote:

> Hi Debraj,
>
> It depends on your requirements.
>
> As I initially understood your email, you want every test method to have an
> implicit timeout, even without an explicit timeout annotation.
> In that case testStarted is the important event: your code sees when the
> test method has started, so the timeout can easily be computed.
> Of course, you need an extra thread that checks for the end events.
>
> https://junit.org/junit4/javadoc/4.12/org/junit/runner/notification/RunListener.html#testStarted(org.junit.runner.Description)
> Your code can call the method "pleaseStop()" on the listener. I think it
> would not be a reliable algorithm in all use cases if you use forkCount > 1,
> and there you would need our support.
>
> Is this what you need, a global timeout with the same value for every
> test method?
>
> Cheers
> Tibor

Re: Take threaddump on hung surefire tests

Tibor Digana
Note that Surefire instantiates new JUnitCore() and then runs the test
class(es).
The method pleaseStop() stops this execution so that no further test method
is started.

On Sat, Oct 5, 2019 at 2:17 PM Debraj Manna <[hidden email]>
wrote:

> Thanks again, Enrico. I will try to find out from the Apache BookKeeper code
> or ask on the Apache BookKeeper mailing list.
>
> Yes, Tibor, I am looking for a global timeout without explicitly adding a
> timeout annotation to all of my tests / classes. I am not using
> forkCount > 1. I get the testStarted part, but calling "pleaseStop() on the
> listener" is still not clear to me. Are you suggesting that I add this to
> each of my test classes? Can you explain this a bit more?

Re: Take threaddump on hung surefire tests

Tibor Digana
In reply to this post by Debraj Manna
Some users may need to stop the execution when a test hangs.
Other users would perhaps prefer to interrupt the test and continue with the
next test.
This would lead to a configuration with more than one config value, i.e. a
POJO in a new config parameter.
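
To make the idea of a POJO parameter concrete, such a bean might look
roughly like the sketch below. This is purely illustrative: HungTestConfig,
its fields, the Action values and the defaults are all invented here and do
not correspond to any existing Surefire parameter.

```java
import java.util.Objects;

/** Hypothetical configuration bean; every name and default here is invented for illustration. */
final class HungTestConfig {

    /** What to do once a test exceeds the timeout. */
    enum Action { STOP_RUN, INTERRUPT_AND_CONTINUE }

    private long timeoutSeconds = 600;        // assumed default
    private Action action = Action.STOP_RUN;  // assumed default

    long getTimeoutSeconds() {
        return timeoutSeconds;
    }

    void setTimeoutSeconds(long timeoutSeconds) {
        if (timeoutSeconds <= 0) {
            throw new IllegalArgumentException("timeoutSeconds must be positive");
        }
        this.timeoutSeconds = timeoutSeconds;
    }

    Action getAction() {
        return action;
    }

    void setAction(Action action) {
        this.action = Objects.requireNonNull(action, "action");
    }
}
```

The two Action values mirror the two user preferences described above:
stop the whole execution, or interrupt only the hung test and continue.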


Re: Take threaddump on hung surefire tests

Debraj Manna
Enrico, Apache BookKeeper seems to be using a timeout in @Test, as I can see
from a search of their GitHub repo:
<https://github.com/apache/bookkeeper/search?q=%40Test%28timeout+%3D&unscoped_q=%40Test%28timeout+%3D>

Tibor, how can I pass an instance of RunNotifier to a class extending
RunListener, so that I can call pleaseStop() from the listener when the
timeout of a test expires? Is there any example code I can refer to?

I am using JUnit 4.12 and maven-surefire 2.22.2.

On Sat, Oct 5, 2019 at 6:27 PM Tibor Digana <[hidden email]> wrote:

> Some users may need to stop the execution when a test hangs.
> Other users would perhaps prefer to interrupt the test and continue with the
> next test.
> This would lead to a configuration with more than one config value, i.e. a
> POJO in a new config parameter.

Re: Take threaddump on hung surefire tests

Tibor Digana
pleaseStop() can be called only inside Surefire.
So your solution has to be embedded in Surefire, and the feature has to be
configured.
Unfortunately, this is by design in JUnit.
RunNotifier cannot be exposed because there must be only one instance, and
we need to instantiate it and keep it under control in Surefire.
Therefore this feature has to be internal and implement a hook between
RunListener and RunNotifier, which already exists in Surefire.
This is the support I was talking about in one of my previous emails.

T

On Sat, Oct 5, 2019 at 5:57 PM Debraj Manna <[hidden email]>
wrote:

> Enrico, Apache BookKeeper seems to be using a timeout in @Test, as I can see
> from a search of their GitHub repo:
> <https://github.com/apache/bookkeeper/search?q=%40Test%28timeout+%3D&unscoped_q=%40Test%28timeout+%3D>
>
> Tibor, how can I pass an instance of RunNotifier to a class extending
> RunListener, so that I can call pleaseStop() from the listener when the
> timeout of a test expires? Is there any example code I can refer to?
>
> I am using JUnit 4.12 and maven-surefire 2.22.2.

Re: Take threaddump on hung surefire tests

Debraj Manna
Thanks Tibor. It is clear now.

On Sat 5 Oct, 2019, 9:36 PM Tibor Digana, <[hidden email]> wrote:

> pleaseStop() can be called only inside Surefire.
> So your solution has to be embedded in Surefire, and the feature has to be
> configured.
> Unfortunately, this is by design in JUnit.
> RunNotifier cannot be exposed because there must be only one instance, and
> we need to instantiate it and keep it under control in Surefire.
> Therefore this feature has to be internal and implement a hook between
> RunListener and RunNotifier, which already exists in Surefire.
> This is the support I was talking about in one of my previous emails.
>
> T