[Commented] (SUREFIRE-1302) Surefire does not wait long enough for the forked VM and assumes it to be dead

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/SUREFIRE-1302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16017086#comment-16017086 ]

Olivier Peyrusse commented on SUREFIRE-1302:

Hello [~tibor17],
I will try adding more details about the logs and printing the ping times.
However, I am still not comfortable with your heuristics, as they depend heavily on the GC configuration and on the sizes of the eden, survivor and old gen spaces: when the heap starts to fill up, it takes much longer to find free space.
I stand by my position that the timer should wait for 1 minute without a ping and only then kill the forked VM, as we are dealing with tests here, not a real-time application that needs fast recovery.

Best regards

> Surefire does not wait long enough for the forked VM and assumes it to be dead
> ------------------------------------------------------------------------------
>                 Key: SUREFIRE-1302
>                 URL: https://issues.apache.org/jira/browse/SUREFIRE-1302
>             Project: Maven Surefire
>          Issue Type: Request
>          Components: Maven Surefire Plugin
>    Affects Versions: 2.19.1
>            Reporter: Yuriy Zaplavnov
>            Assignee: Tibor Digana
>             Fix For: 2.20.1
>         Attachments: 2017-05-18T05-48-08_685-jvmRun1.dumpstream, surefire-logs, surefire-tests-terminated-master-aa9330316038f6b46316ce36ff40714ffc7cf299.zip, tests_log_01.txt, tests_log_02.txt
> This issue happens because surefire kills the forked container if it times out waiting for the 'ping'.
> In the org.apache.maven.surefire.booter.ForkedBooter class there is a hardcoded constant PING_TIMEOUT_IN_SECONDS = 20, which is used in the following method:
> {code}
> private static ScheduledFuture<?> listenToShutdownCommands( CommandReader reader )
>     {
>         reader.addShutdownListener( createExitHandler( reader ) );
>         AtomicBoolean pingDone = new AtomicBoolean( true );
>         reader.addNoopListener( createPingHandler( pingDone ) );
>         return JVM_TERMINATOR.scheduleAtFixedRate( createPingJob( pingDone, reader ),
>                                                    0, PING_TIMEOUT_IN_SECONDS, SECONDS );
>     }
> {code}
> to create the ScheduledFuture.
> In some cases the forked container might respond a bit later than expected, and surefire kills it:
> {code}
> private static Runnable createPingJob( final AtomicBoolean pingDone, final CommandReader reader  )
>     {
>         return new Runnable()
>         {
>             public void run()
>             {
>                 boolean hasPing = pingDone.getAndSet( false );
>                 if ( !hasPing )
>                 {
>                     exit( 1, KILL, reader, true );
>                 }
>             }
>         };
>     }
> {code}
> As long as we need to terminate it anyway, it would be really helpful if the problem could be solved by making PING_TIMEOUT_IN_SECONDS configurable, with the ability to specify the value from maven-surefire-plugin.
> That would allow the timeout to be tuned to the needs and characteristics of the projects where surefire runs.
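
For illustration only, here is a minimal sketch of how the ping watchdog quoted above might take its period from configuration instead of a constant. It is not Surefire's actual code: the class name ConfigurablePingWatchdog, the system property example.pingTimeoutInSeconds, and the onMissedPing callback (standing in for the exit( 1, KILL, reader, true ) call) are all hypothetical names invented for this example. Only the default of 20 seconds and the getAndSet( false ) check are taken from the issue description.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

import static java.util.concurrent.TimeUnit.SECONDS;

/** Hypothetical sketch; none of these names exist in Surefire. */
final class ConfigurablePingWatchdog
{
    /** Same value as the hardcoded PING_TIMEOUT_IN_SECONDS quoted in the issue. */
    static final int DEFAULT_PING_TIMEOUT_IN_SECONDS = 20;

    /** Assumption: a made-up property name, used here only for the demo. */
    static final String TIMEOUT_PROPERTY = "example.pingTimeoutInSeconds";

    private ConfigurablePingWatchdog()
    {
    }

    /** Returns the configured timeout, or the default if it is missing, non-numeric or not positive. */
    static int resolvePingTimeoutInSeconds( String configured )
    {
        if ( configured == null || configured.trim().isEmpty() )
        {
            return DEFAULT_PING_TIMEOUT_IN_SECONDS;
        }
        try
        {
            int value = Integer.parseInt( configured.trim() );
            return value > 0 ? value : DEFAULT_PING_TIMEOUT_IN_SECONDS;
        }
        catch ( NumberFormatException e )
        {
            return DEFAULT_PING_TIMEOUT_IN_SECONDS;
        }
    }

    /** Same check as the quoted createPingJob, with the kill action passed in as a callback. */
    static Runnable createPingJob( final AtomicBoolean pingDone, final Runnable onMissedPing )
    {
        return new Runnable()
        {
            public void run()
            {
                // Consume the ping flag; if nobody set it since the last run, report a missed ping.
                boolean hasPing = pingDone.getAndSet( false );
                if ( !hasPing )
                {
                    onMissedPing.run();
                }
            }
        };
    }

    /** Schedules the job with the resolved period instead of a compile-time constant. */
    static ScheduledFuture<?> schedule( ScheduledExecutorService scheduler, Runnable pingJob,
                                        int timeoutInSeconds )
    {
        return scheduler.scheduleAtFixedRate( pingJob, 0, timeoutInSeconds, SECONDS );
    }

    public static void main( String[] args )
    {
        int timeout = resolvePingTimeoutInSeconds( System.getProperty( TIMEOUT_PROPERTY ) );
        System.out.println( "ping timeout in seconds: " + timeout );

        // Drive the job by hand (no scheduler) to show when a missed ping is reported.
        AtomicBoolean pingDone = new AtomicBoolean( true );
        final AtomicInteger missed = new AtomicInteger();
        Runnable job = createPingJob( pingDone, new Runnable()
        {
            public void run()
            {
                missed.incrementAndGet();
            }
        } );
        job.run(); // consumes the initial "true"
        job.run(); // no ping arrived in between
        System.out.println( "missed pings: " + missed.get() );
    }
}
```

Under this sketch, running with -Dexample.pingTimeoutInSeconds=60 would give the one-minute window suggested in the comment above, while an unset or invalid value falls back to the current 20 seconds.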
