RE: [VOTE] Maven incremental build for BIG-sized projects with local and remote caching [I]


Maximilian Novikov
Classification: For internal use only

OK, it looks like this is the only option for us: create an extension and override MojoExecutor.

> The only challenge is an exhaustive test suite, since your current impl can easily fake a passing build (as Gradle does today if you don't disable the daemon and state cache on the CI).
I assume we are safe here. Our solution provides incremental builds with per-project granularity, so there is no need for smart "test relationship discovery" within a project.

-----Original Message-----
From: Romain Manni-Bucau [mailto:[hidden email]]
Sent: Saturday, September 14, 2019 11:48 AM
To: Maven Developers List <[hidden email]>
Subject: Re: [VOTE] Maven incremental build for BIG-sized projects with local and remote caching

On Sat, Sep 14, 2019 at 08:00, Alexander Ashitkin <[hidden email]> wrote:

> Indeed, what we have is a kind of option 2, with variations. The current
> implementation is an opt-in feature driven by configuration, with some
> metadata describing the required cache behavior and hints.
>
> A Maven extension is an option, but we would love to have this in Maven
> itself, which I believe is in the interest of the Maven community. An
> extension is a route we are trying to avoid, and we are not even sure it
> could be implemented as an extension, since it requires changes in Maven core.
>

No real change is required in Maven core here, since Guice lets you override any bean, or you can even just rewrite the POM to remove modules and rebuild only the minimum set (keeping downstream projects).
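Selecting "the minimum set" usually means the changed modules plus everything downstream of them. Maven's own `-pl <modules> -amd` (`--also-make-dependents`) selection computes this same closure. The following is a minimal, hypothetical sketch of that closure over an in-reactor dependency graph. The class name, the method name and the map-of-lists graph representation are assumptions made for illustration; they are not Maven APIs.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: which modules must be rebuilt after a change? */
final class DownstreamSelector {

    private DownstreamSelector() {
    }

    /**
     * @param upstreamOf module -> modules it depends on inside the reactor
     * @param changed    modules whose inputs were modified
     * @return the changed modules plus every module that transitively depends on them
     */
    static Set<String> modulesToRebuild(Map<String, List<String>> upstreamOf, Set<String> changed) {
        // Invert the edges so we can walk from a module to its dependents.
        Map<String, List<String>> dependentsOf = new HashMap<>();
        for (Map.Entry<String, List<String>> entry : upstreamOf.entrySet()) {
            for (String upstream : entry.getValue()) {
                dependentsOf.computeIfAbsent(upstream, k -> new ArrayList<>()).add(entry.getKey());
            }
        }
        // Breadth-first walk; the TreeSet keeps the result sorted and deterministic.
        Set<String> result = new TreeSet<>(changed);
        Deque<String> queue = new ArrayDeque<>(changed);
        while (!queue.isEmpty()) {
            String module = queue.poll();
            for (String dependent : dependentsOf.getOrDefault(module, List.of())) {
                if (result.add(dependent)) {
                    queue.add(dependent);
                }
            }
        }
        return result;
    }
}
```

Consider a reactor where `web` depends on `api`, and both `api` and `batch` depend on `core`. A change in `api` selects `[api, web]`, while a change in `core` selects `[api, batch, core, web]`. Skipping the remaining modules is only safe if they do not contribute state, such as Maven properties, that later modules rely on.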

The only challenge is an exhaustive test suite, since your current impl can easily fake a passing build (as Gradle does today if you don't disable the daemon and state cache on the CI).

Side note: test relationship discovery is close to AOT in terms of implementation and is very, very slow, so it can be worse than running the full suite in simple projects, and it still raises the integration-test (IT) question.

So, given the numerous open questions ("?") around a core solution, an extension is the way to go.
Now, if a Guice bean in core would help you write your extension, that can surely be reviewed more easily IMHO.

Hope it helps.


> Thanks in advance, Aleks
>
> On 2019/09/13 21:37:15, Romain Manni-Bucau <[hidden email]> wrote:
> > There are multiple possible kinds of incremental support:
> >
> > 1. SCM-related: do a status and rebuild the downstream reactor.
> > 2. Full and module build graph: this seems to be the one you target,
> >    i.e. bypass modules without changes. Note that it only works if the
> >    upstream graph is taken into account.
> > 3. Full build: each mojo has incremental support, so the full build
> >    gets it. The issue is that it requires each mojo to know whether it
> >    needs to be executed, or to give enough information to the mojo
> >    executor to decide (Gradle requires all inputs/outputs to assume
> >    this state, which is still just a heuristic and not 100% reliable).
> >
> > In the current state, 2 sounds like a good option, since 3 can require
> > a lot of work for external plugins (today's builds use far more
> > non-Maven-provided plugins than core plugins).
> > Now, we should be able to activate it or not, so having a
> > cacheLocation config in settings.xml could be good.
> >
> > Side notes:
> >
> > 1. Having it on by default will break builds: the reactor is
> >    deterministic, and bypassing a module can break a build since that
> >    module can initialize Maven properties (for example) for the next
> >    modules.
> > 2. You can't find all in/out paths from the POM in general, so your
> >    algorithm is not generic; a meta config may be needed in .mvn.
> > 3. We should let a mojo disable this to replace the default logic
> >    (Surefire is a good example where it must be refined, and it can
> >    save hours there ;)).
> > 4. Let's try to implement it as a Maven extension first, then, if it
> >    works well on multiple big projects, move it to core?
> >
> > Romain
> >
> >
> >
> > On Fri, Sep 13, 2019 at 23:18, Tibor Digana <[hidden email]> wrote:
> >
> > > In theory, an incremental compiler would make it faster.
> > > But this can only be shown if you present a demo project which has
> > > trivial tests taking much less time to complete than the compiler.
> > >
> > > In reality, the tests in huge projects take significantly longer
> > > than the compiler.
> > > Some developers say "switch off all the tests" in the release
> > > phase, but that's wrong because then quality goes down and
> > > methodologies are broken.
> > >
> > > I see a big problem in that we do not have an interface between
> > > Surefire and the Compiler plugin negotiating which tests have been
> > > modified, including modules and classes in the entire structure.
> > >
> > > Having an incremental compiler is easy: just use compiler:3.8.1 or
> > > the Takari compiler.
> > > But IMO the biggest performance benefit would come from having a
> > > truly incremental test executor.
> > >
> > > On Fri, Sep 13, 2019 at 10:46 PM Maximilian Novikov <[hidden email]> wrote:
> > >
> > > > Hi All,
> > > >
> > > > *We want to create an upstream change to Maven* to support true
> > > > incremental builds for large projects.
> > > >
> > > > To raise a pull request we have to go through a long chain of Deutsche
> > > > Bank’s internal procedures. So, *before starting the process, we
> > > > would like to get your feedback regarding this feature*.
> > > >
> > > > *Motivation:*
> > > >
> > > > Our project is hosted in a mono-repo and contains ~600 modules. All
> > > > modules have the same SNAPSHOT version.
> > > >
> > > > There is a lot of test automation around this; everything is tested
> > > > before it is merged into the release branch.
> > > >
> > > > The current setup helps us simplify build/release/dependency
> > > > management for 10+ teams that contribute to the codebase. We can
> > > > release everything in one click.
> > > >
> > > > The major drawback of this approach is build time: *a full local build
> > > > took 45-60 min (-T8), a CI build ~25 min (-T16)*.
> > > >
> > > > To speed up our build we needed two features: incremental builds and
> > > > a shared cache.
> > > >
> > > > Initially we considered migrating to Gradle or Bazel. As the migration
> > > > costs for those tools were too high, we decided to add similar
> > > > functionality to Maven.
> > > >
> > > > Current results: *1-2 min for a local build (-T8) if the build was
> > > > cached by CI; a CI build takes ~5 min (-T16).*
> > > >
> > > > *Feature description:*
> > > >
> > > > The idea is to calculate a checksum of the inputs and save the
> > > > outputs in a cache.
> > > >
> > > > [image: image2019-8-27_20-0-14.png]
> > > >
> > > > Each node's checksum is calculated from:
> > > >
> > > > · Effective POM hash
> > > > · Sources hash
> > > > · Dependencies hash (dependencies within the multi-module project)
> > > >
> > > > Project source inputs are searched for inside the project, plus all
> > > > paths from plugin configurations:
> > > >
> > > > [image: image2019-8-30_10-28-56.png]
> > > >
> > > > How it works in practice:
> > > >
> > > > 1. CI: runs builds and stores outputs in the shared cache.
> > > > 2. CI: reuses outputs for the same inputs, so build time decreases.
> > > > 3. Locally: when I check out a branch and run 'install' for the whole
> > > >    project, I get all up-to-date snapshots for this branch from the
> > > >    remote cache.
> > > > 4. Locally: if I change multiple modules in the tree, only the
> > > >    changed subtree is rebuilt.
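These steps imply a lookup order: the local cache first, then the shared remote cache, and only then a real build, with CI publishing results for everyone else. That order can be sketched as follows. This is a hypothetical illustration, not part of the proposal or of Maven. The class name, the in-memory maps standing in for the local cache and the shared remote storage, and the `publishToRemote` flag are assumptions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical two-level cache: local first, then shared remote, else build. */
final class TwoLevelBuildCache {

    enum Source { LOCAL, REMOTE, BUILT }

    private final Map<String, byte[]> local = new HashMap<>();
    private final Map<String, byte[]> remote;   // shared storage, e.g. filled by CI
    private final boolean publishToRemote;      // true on CI, false on developer machines

    TwoLevelBuildCache(Map<String, byte[]> remote, boolean publishToRemote) {
        this.remote = remote;
        this.publishToRemote = publishToRemote;
    }

    /** Makes the output for cacheKey available locally and reports where it came from. */
    Source resolve(String cacheKey, Supplier<byte[]> build) {
        if (local.containsKey(cacheKey)) {
            return Source.LOCAL;
        }
        byte[] cached = remote.get(cacheKey);
        if (cached != null) {
            local.put(cacheKey, cached);   // download once, reuse locally afterwards
            return Source.REMOTE;
        }
        byte[] output = build.get();       // cache miss: run the actual build for this module
        local.put(cacheKey, output);
        if (publishToRemote) {
            remote.put(cacheKey, output);
        }
        return Source.BUILT;
    }

    byte[] output(String cacheKey) {
        return local.get(cacheKey);
    }
}
```

A real implementation would presumably also have to restore the cached outputs into the module's build directory and cope with concurrent writers to the shared cache. This sketch leaves both concerns out.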
> > > >
> > > > The impact on the current Maven codebase is very localized
> > > > (MojoExecutor, where we injected a cache controller).
> > > >
> > > > Caching can be activated/deactivated by a property, so the current
> > > > Maven flow will work as is.
> > > >
> > > > And the big plus is that you don't need to rework your current
> > > > project. Caching should work out of the box; you just need to add
> > > > config in the .mvn folder.
> > > >
> > > > Please let us know what you think. We are ready to invest in this
> > > > feature and address any further feedback.
> > > >
> > > >
> > > >
> > > > Kind regards,
> > > >
> > > > Max
> > > >
> > > >
> > > >
> > > >
> > > > ---
> > > > This e-mail may contain confidential and/or privileged information. If
> > > > you are not the intended recipient (or have received this e-mail in
> > > > error) please notify the sender immediately and delete this e-mail. Any
> > > > unauthorized copying, disclosure or distribution of the material in
> > > > this e-mail is strictly forbidden.
> > > >
> > > > Please refer to https://www.db.com/disclosures for additional EU
> > > > corporate and regulatory disclosures and to
> > > > http://www.db.com/unitedkingdom/content/privacy.htm for information
> > > > about privacy.
> > > >
> > >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>



Re: [VOTE] Maven incremental build for BIG-sized projects with local and remote caching [I]

Tibor Digana
I was the person who talked with Robert and Herve about incremental builds
at the ASF conference in Budapest in 2015.
I had an idea of using a Java agent to optimize the test set in Surefire,
but we all understood that a generic solution with full guarantees for all
users is not possible, and attempting one could ultimately damage Maven's
reputation. Such a solution has limitations and some drawbacks. The problem
is that users expect only a better outcome, with no drawbacks.

But what I see as a possible solution is to move the responsibility for the
stability of the entire build system to the particular user. This means that
the incremental-build facility should be isolated outside of Maven Core/Dist,
and users should add the extensions to their POM themselves. As soon as the
build becomes unstable, the user has to act, either by removing the
extension or by contacting its developer to get it fixed. Maven itself would
be the golden software again.

Extensions have a certain granularity, and no extension exists yet that
would optimize the entire build with no drawbacks.
Maybe we can all see what the possibilities are. We can enumerate them:
1. optimizations in caches
2. optimizations at the SCM level
3. optimizations at the compiler level
4. optimizations at the test-set and test execution level

And some developers may add more extensions, such as:
1. optimizations at the packaging level (JAR, WAR, etc.)
2. optimizations at the resources level

What may be missing:
1. an appropriate integration of the extensions with CI systems

I think the concept of Maven is good, since it is extensible software,
which is exactly what this discussion about extensions relies on.
The only thing missing is a large community of very active developers who
improve the quality of these tools and enlarge the group of satisfied users.

Maven could list all these extensions on its website and help users pick
the best-fitting extension.
The extensions should be well documented and should list all their
limitations.

Optionally, the extensions could become part of the ASF, and activity in
these projects would grow.
We have had incubator projects in the ASF, so we could again bring the
extensions into the ASF organization this way.

Cheers
Tibor17


On Wed, Sep 18, 2019 at 12:48 PM Falko Modler <[hidden email]> wrote:

> Hi Maximilian,
>
> > 2. No IDE integration
>
> IDEs usually have their own mechanisms to build incrementally. They also
> execute Maven core in their own special way, often very different from the
> default command line execution.
>
> > 3. Further advanced optimizations don't look possible
>
> Feature requests are welcome! :-)
>
> Otherwise I wish you good luck on your path.
>
> Best regards,
>
> Falko
>
>
> On 18 September 2019 09:52:52 CEST, Maximilian Novikov <[hidden email]> wrote:
> >
> >Hi Falko,
> >
> >I saw this project.
> >It can help in some cases, but to build fast you need:
> >  1. Incremental build
> >  2. Remote cache (shared cache)
> >
> >gitflow-incremental-builder helps to cover #1. BTW, I still see some
> >limitations here:
> >  1. It creates coupling with Git
> >  2. No IDE integration
> >  3. Further advanced optimizations don't look possible
> >
> >Our idea was to move from workaround solutions (as Tibor classified
> >them) to a native solution.
> >
> >Thanks for sharing this.
> >
> >Kind regards,
> >Max
> >
> >
> >-----Original Message-----
> >From: Falko Modler [mailto:[hidden email]]
> >Sent: Wednesday, September 18, 2019 1:15 AM
> >To: [hidden email]
> >Subject: Re: [VOTE] Maven incremental build for BIG-sized projects with
> >local and remote caching
> >
> >Hi there,
> >
> >I must admit that I did not read everything, but in case you are using
> >Git, this extension might help:
> >
> >https://github.com/vackosar/gitflow-incremental-builder
> >
> >It is a Maven extension and it is _not_ limited to Gitflow setups!
> >
> >Disclaimer: I am not the owner of this project but I am a
> >"Collaborator"
> >(I can cut releases etc.).
> >
> >Feedback is very much appreciated.
> >
> >
> >PS: I hope this works, never posted to a mailing list before. :-O
> >
> >
> >Best regards,
> >
> >Falko
> >
> >

Re: [VOTE] Maven incremental build for BIG-sized projects with local and remote caching [I]

Romain Manni-Bucau
> no go for this change

That is likely the easiest way to interpret it (and it reduces the short-term
investment), but I don't read it this way. I read it more as "it needs more
investment to work and be a core feature usable by everyone, so let's stage
the inclusion with intermediate deliveries" (but it is still a "go go go"
IMHO).
I really love the first step being an extension, letting people test it and
give feedback on the broken cases until it is mature enough to be integrated
more deeply.

Romain

On Mon, Sep 23, 2019 at 15:10, Maximilian Novikov <[hidden email]> wrote:

>
> Hi Enrico,
>
> It's not open source for now. Making it open source requires significant
> effort to comply with DB policies/procedures.
> That's why we started this discussion: we don't want to waste time if the
> change is conceptually wrong (from the Maven dev community's perspective)
> and will be rejected.
>
> To sum up the discussion: no go for this change.
>
> In time, we will wrap the caching module in an extension.
>
> Regards,
> Max
>
> -----Original Message-----
> From: Enrico Olivelli [mailto:[hidden email]]
> Sent: Saturday, September 21, 2019 9:50 AM
> To: Maven Developers List <[hidden email]>
> Subject: Re: [VOTE] Maven incremental build for BIG-sized projects with
> local and remote caching
>
> Hi Maximilian,
> Is there any way to see this work? Is it already open source? (I am sorry,
> maybe I missed some email with links.)
>
> Enrico
>
> On Fri, Sep 20, 2019 at 19:30, Alexander Ashitkin <[hidden email]> wrote:
>
> > Hi Martijn,
> > thanks for the positive feedback.
> >
> > Regarding the IDE part, yes, you're right about integration, but there
> > are still important cases where the cache helps:
> > 1) you need to navigate less in the project, as top-level targets are
> > fast enough that you don't need to drill down
> > 2) if you need to build only part of the project (say, only the REST
> > modules of Wicket), you need up-to-date dependencies of those modules
> > that are not active in the subproject, and the cache restores the
> > missing pieces for you without rebuilding the rest of the project
> > 3) if you need to test the project and invoke tests, the cache saves
> > your time (as Gradle does) on unchanged pieces
> > 4) and because tests run faster, you can try running slow tests, which
> > are often too expensive during rapid development
> >
> > So Maven integration in IntelliJ works nicely. There is nothing super
> > smart here; I'm just sharing how I benefit from the cache in everyday
> > IDE work.
> >
> > Thank you!
> >
> > On 2019/09/19 11:28:48, Martijn Dashorst <[hidden email]>
> > wrote:
> > > On Thu, Sep 19, 2019 at 7:48 AM Alexander Ashitkin
> > > <[hidden email]> wrote:
> > > > Configuration:
> > > > * verify -T4 -P default,all-shapshots-repos
> > > > * my project config (might be suboptimal for Wicket)
> > > > * Scala tests disabled in 2 modules (they caused a bytecode version
> > > >   conflict on my machine)
> > > >
> > > > Results:
> > > > Clean state (cache disabled):                     15:58 min
> > > > Second run, target up to date (cache disabled):   10:20 min
> > > > Fully cached (no changes):                        17.507 s
> > > > wicketstuff-jwicket-tooltip-wtooltips changed:    34.936 s
> > > > wicketstuff-rest-utils changed:                   54.040 s
> > > >
> > > > If you want to try other modules - please let me know.
> > >
> > > Nice results!
> > >
> > > > Regarding the IDE: it's a usual Maven installation, so any IDE with
> > > > Maven integration should benefit from the cache when a Maven action
> > > > is invoked.
> > >
> > > My instinct says that an IDE such as Eclipse won't benefit much from
> > > it, as it has its own build lifecycle. Only when you invoke a
> > > command-line Maven action (such as generate-sources) might one see a
> > > benefit.
> > >
> > > So in day-to-day life the caching might not be as beneficial for
> > > developers, but command-line builds happen often enough to make this
> > > matter.
> > >
> > > Martijn
> > >