RE: [VOTE] Maven incremental build for BIG-sized projects with local and remote caching [I]

Maximilian Novikov
Classification: For internal use only

This is drifting off-topic.

Monorepo != single build
Monorepo != monolithic application

We moved from poly-repo to monorepo 3 years ago and see value in this.
Let's just accept that this approach exists. It has its own benefits and drawbacks compared to poly-repo.

-----Original Message-----
From: Romain Manni-Bucau [mailto:[hidden email]]
Sent: Saturday, September 14, 2019 7:42 PM
To: Maven Developers List <[hidden email]>
Subject: Re: [VOTE] Maven incremental build for BIG-sized projects with local and remote caching

Hope I didn't miss it, but how is monorepo = single build?

It also works well without a common parent, and that is unrelated to monorepo, which generally uses local relative paths (at least in the references you quoted, which are also not about Java ;)).

This is unrelated to making Maven better at incremental builds, but both tracks can help you get very fast build feedback.

On Sat, 14 Sep 2019 at 17:35, Robert Scholte <[hidden email]> wrote:

> https://issues.apache.org/jira/browse/MPLUGIN-350 is the issue to
> start with.
>
> Please read all the comments, because my original thought won't work.
>
> thanks,
> Robert
>
> On Sat, 14 Sep 2019 17:10:13 +0200, Alexander Ashitkin
> <[hidden email]> wrote:
>
> > We checked, and the price of $550 per user makes us think twice about
> > the best way to proceed here :-)
> > Regarding the plugin API: yes, changes are desirable to make the Maven
> > model cache-friendly, both in the plugin invocation model and in the
> > Mojo#execute input/output APIs. But it is possible to work with the
> > current model using a declarative approach.
> >
> > Thanks in advance
> >
> > On 2019/09/14 10:45:24, Tibor Digana <[hidden email]> wrote:
> >> But I do not understand why Maven should be responsible for the
> >> project cache control/management of "/target" directories.
> >> It is the responsibility of the build manager, which is Jenkins.
> >> Jenkins has the ability to archive files, and that feature
> >> already exists in Jenkins.
> >>
> >> So Jenkins has full knowledge of:
> >>
> >> 1. how long the workspace content remains intact
> >> 2. what the commit hash is for the last build/job/branch
> >> 3. and which commit was successful
> >>
> >> If the target directories remain intact (or are restored from an
> >> archive) in the workspace for a very long time, and the workspace is
> >> reused by the next build, then I would say that the improvement
> >> should work as it is at the CI level.
> >>
> >> Maybe the only thing needed is an improvement in Maven where we
> >> would obtain the list of modules or directories changed in the
> >> current commit.
> >> Then Maven can highly optimize its own build steps and build
> >> only those modules which have been changed and their dependent
> >> modules.
> >> So an interface between CI and Maven is needed, in a kind of
> >> extension, or the class MavenCli can be extended with some new
> >> entry point.
> >>
> >> But I do not think that Maven has to take over the responsibilities
> >> of CI (project cache management); that's not our task, I would say,
> >> and we as Maven would never know all about the miscellaneous CI
> >> specifics and therefore we would not cope with CI-related troubles.
> >>
> >> Cheers
> >> Tibor17
> >>
> >>
> >>
> >> On Sat, Sep 14, 2019 at 11:08 AM Robert Scholte
> >> <[hidden email]>
> >> wrote:
> >>
> >> > On Fri, 13 Sep 2019 23:37:15 +0200, Romain Manni-Bucau
> >> > <[hidden email]> wrote:
> >> >
> >> > > There are multiple possible kinds of incremental support:
> >> > >
> >> > > 1. SCM related: do a status and rebuild the downstream reactor.
> >> > > 2. Full and module build graph: seems it is the one you target,
> >> > > i.e. bypass modules without change. Note that it only works if the
> >> > > upstream graph is taken into account.
> >> > > 3. Full build: each mojo has incremental support, so the full
> >> > > build gets it.
> >> > > The issue is that it requires each mojo to know if it needs to be
> >> > > executed, or to give enough info to the mojo executor to do so
> >> > > (Gradle requires all inputs/outputs to assume this state, which is
> >> > > still just a heuristic and not 100% reliable).
> >> > >
> >> > > In the current state, 2. sounds like a good option since 3. can
> >> > > require a lot of work for external plugins (today's builds have a
> >> > > lot more non-Maven-provided plugins than core plugins).
> >> > > Now, we should be able to activate it or not, so having a
> >> > > cacheLocation config in settings.xml can be good.
> >> > >
> >> > > Side notes:
> >> > >
> >> > > 1. Having it on by default will break builds: the reactor is
> >> > > deterministic, and bypassing a module can break a build since it
> >> > > can init Maven properties, for example, for the next modules.
> >> > > 2. You can't find all in/out paths from the pom in general, so your
> >> > > algo is not generic; a meta config can be needed in .mvn.
> >> > > 3. We should let a mojo be able to disable that to replace the
> >> > > default logic (surefire is a good example where it must be refined,
> >> > > and it can save hours there ;)).
> >> > > 4. Let's try to impl it as a mvn extension first, then if it works
> >> > > well on multiple big projects get it to core?
> >> >
> >> > Did anyone Google for "maven extension build cache"? There are
> >> > already commercial solutions for it.
> >> > Even though I would like to see improvements in this area, the
> >> > old architecture of Maven makes it quite hard to move to that
> >> > situation. First of all, it requires changes to the Plugin API
> >> > (without breaking backwards compatibility) to have support out of
> >> > the box.
> >> >
> >> > Robert
> >> >
> >> > >
> >> > > Romain
> >> > >
> >> > >
> >> > >
> >> > > On Fri, 13 Sep 2019 at 23:18, Tibor Digana <[hidden email]> wrote:
> >> > >
> >> > >> In theory, the incremental compiler would make it faster.
> >> > >> But this can be said only if you present a demo project that has
> >> > >> trivial tests taking much less time to complete than the compiler.
> >> > >>
> >> > >> In reality, the tests in huge projects take significantly longer
> >> > >> than the compiler.
> >> > >> Some developers say "switch off all the tests" in the release
> >> > >> phase, but that's wrong because then the quality goes down and
> >> > >> methodologies are broken.
> >> > >>
> >> > >> I can see a big problem in that we do not have an interface
> >> > >> between Surefire and the Compiler plugin negotiating which tests
> >> > >> have been modified, including modules and classes in the entire
> >> > >> structure.
> >> > >>
> >> > >> Having an incremental compiler is easy: just use compiler:3.8.1 or
> >> > >> the Takari compiler.
> >> > >> But IMO the biggest benefit in performance would come from having
> >> > >> a truly incremental test executor.
> >> > >>
> >> > >> On Fri, Sep 13, 2019 at 10:46 PM Maximilian Novikov <
> >> > >> [hidden email]> wrote:
> >> > >>
> >> > >> > Hi All,
> >> > >> >
> >> > >> >
> >> > >> >
> >> > >> > *We want to create an upstream change to Maven* to support true
> >> > >> > incremental builds for big-sized projects.
> >> > >> >
> >> > >> > To raise a pull request we have to pass a long chain of Deutsche
> >> > >> > Bank's internal procedures. So, *before starting the process we
> >> > >> > would like to get your feedback regarding this feature*.
> >> > >> >
> >> > >> >
> >> > >> >
> >> > >> > *Motivation:*
> >> > >> >
> >> > >> >
> >> > >> >
> >> > >> > Our project is hosted in a mono-repo and contains ~600 modules.
> >> > >> > All modules have the same SNAPSHOT version.
> >> > >> >
> >> > >> > There is a lot of test automation around this; everything is
> >> > >> > tested before merge into the release branch.
> >> > >> >
> >> > >> >
> >> > >> >
> >> > >> > The current setup helps us simplify build/release/dependency
> >> > >> > management for the 10+ teams that contribute to the codebase. We
> >> > >> > can release everything in 1 click.
> >> > >> >
> >> > >> > The major drawback of this approach is build time: *a full local
> >> > >> > build took 45-60 min (-T8), a CI build ~25 min (-T16)*.
> >> > >> >
> >> > >> >
> >> > >> >
> >> > >> > To speed up our build we needed 2 features: incremental build
> >> > >> > and shared cache.
> >> > >> >
> >> > >> > Initially we started to think about migration to Gradle or
> >> > >> > Bazel. As migration costs for the mentioned tools were too high,
> >> > >> > we decided to add similar functionality to Maven.
> >> > >> >
> >> > >> >
> >> > >> >
> >> > >> > Current results: *1-2 min for a local build (-T8) if the build
> >> > >> > was cached by CI, CI build ~5 min (-T16)*.
> >> > >> >
> >> > >> >
> >> > >> > *Feature description:*
> >> > >> >
> >> > >> >
> >> > >> >
> >> > >> > The idea is to calculate a checksum for inputs and save outputs
> >> > >> > in a cache.
> >> > >> >
> >> > >> > [image: image2019-8-27_20-0-14.png]
> >> > >> >
> >> > >> > Each node's checksum is calculated from:
> >> > >> >
> >> > >> >
> >> > >> >
> >> > >> > · Effective POM hash
> >> > >> >
> >> > >> > · Sources hash
> >> > >> >
> >> > >> > · Dependencies hash (dependencies within the multi-module project)
> >> > >> >
> >> > >> >
> >> > >> >
> >> > >> > Project source inputs are searched for inside the project, plus
> >> > >> > all paths from plugin configuration:
> >> > >> >
> >> > >> > [image: image2019-8-30_10-28-56.png]
> >> > >> >
> >> > >> > How it works in practice:
> >> > >> >
> >> > >> >
> >> > >> >
> >> > >> > 1. CI: runs builds and stores outputs in the shared cache
> >> > >> >
> >> > >> > 2. CI: reuses outputs for the same inputs, so build time decreases
> >> > >> >
> >> > >> > 3. Locally: when I check out a branch and run 'install' for the
> >> > >> > whole project, I get all up-to-date snapshots from the remote
> >> > >> > cache for this branch
> >> > >> >
> >> > >> > 4. Locally: if I change multiple modules in the tree, only the
> >> > >> > changed subtree is rebuilt
> >> > >> >
> >> > >> >
> >> > >> >
> >> > >> > The impact on the current Maven codebase is very localized
> >> > >> > (MojoExecutor, where we injected a cache controller).
> >> > >> >
> >> > >> > Caching can be activated/deactivated by a property, so the
> >> > >> > current Maven flow will work as is.
> >> > >> >
> >> > >> >
> >> > >> >
> >> > >> > And the big plus is that you don't need to rework your current
> >> > >> > project. Caching should work out of the box; you just need to
> >> > >> > add config in the .mvn folder.
> >> > >> >
> >> > >> >
> >> > >> >
> >> > >> > Please let us know what you think. We are ready to invest in
> >> > >> > this feature and address any further feedback.
> >> > >> >
> >> > >> >
> >> > >> >
> >> > >> > Kind regards,
> >> > >> >
> >> > >> > Max
> >> > >> >
> >> > >> >
> >> > >> >
> >> > >> >
> >> > >> > ---
> >> > >> > This e-mail may contain confidential and/or privileged
> >> > >> > information. If you are not the intended recipient (or have
> >> > >> > received this e-mail in error) please notify the sender
> >> > >> > immediately and delete this e-mail. Any unauthorized copying,
> >> > >> > disclosure or distribution of the material in this e-mail is
> >> > >> > strictly forbidden.
> >> > >> >
> >> > >> > Please refer to https://www.db.com/disclosures for additional EU
> >> > >> > corporate and regulatory disclosures and to
> >> > >> > http://www.db.com/unitedkingdom/content/privacy.htm for
> >> > >> > information about privacy.
> >> > >> >
> >> >
> >> > ---------------------------------------------------------------------
> >> > To unsubscribe, e-mail: [hidden email]
> >> > For additional commands, e-mail: [hidden email]
> >> >
> >> >
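
As a rough illustration of the checksum scheme described in the original proposal above (effective POM hash, sources hash, and dependencies hash per node), here is a minimal Python sketch. The module names, the dictionary layout, and the choice of SHA-256 are assumptions made for this example; it is not the implementation discussed in the thread.

```python
import hashlib


def sha256_hex(*parts: str) -> str:
    """Hash an ordered sequence of strings into one hex digest."""
    digest = hashlib.sha256()
    for part in parts:
        digest.update(part.encode("utf-8"))
        digest.update(b"\0")  # separator so ("ab", "c") differs from ("a", "bc")
    return digest.hexdigest()


def node_checksums(modules: dict) -> dict:
    """Compute one checksum per module from POM, sources and dependencies.

    `modules` maps a module name to a dict with hypothetical keys:
    "pom" (effective POM text), "sources" (path -> content) and
    "deps" (names of in-project dependencies). Assumes no dependency cycles.
    """
    result = {}

    def visit(name: str) -> str:
        if name in result:
            return result[name]
        module = modules[name]
        pom_hash = sha256_hex(module["pom"])
        sources_hash = sha256_hex(
            *(f"{path}\0{content}" for path, content in sorted(module["sources"].items()))
        )
        # A dependency's checksum feeds into its dependents, so a change
        # propagates down the module graph.
        deps_hash = sha256_hex(*(visit(dep) for dep in sorted(module["deps"])))
        result[name] = sha256_hex(pom_hash, sources_hash, deps_hash)
        return result[name]

    for name in modules:
        visit(name)
    return result


def demo_modules() -> dict:
    """Three hypothetical modules: app depends on core, docs stands alone."""
    return {
        "core": {"pom": "<project>core</project>", "sources": {"A.java": "class A {}"}, "deps": []},
        "app": {"pom": "<project>app</project>", "sources": {"B.java": "class B {}"}, "deps": ["core"]},
        "docs": {"pom": "<project>docs</project>", "sources": {"index.md": "hello"}, "deps": []},
    }
```

In this sketch, editing a source file in `core` changes the checksums of `core` and `app` but leaves `docs` untouched, so cached outputs for `docs` could still be reused.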
>
>
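
Tibor's suggestion earlier in the thread (build only the modules changed in the current commit plus their dependent modules) amounts to a walk over the reversed dependency graph. The sketch below assumes a hypothetical `dependencies` map from module name to the in-project modules it depends on; it only illustrates the selection step, not any existing Maven or CI interface. Maven's existing `-pl` option combined with `-amd` selects projects in a comparable way.

```python
from collections import deque


def modules_to_rebuild(dependencies: dict, changed: set) -> set:
    """Return the changed modules plus everything that depends on them."""
    # Invert "module -> its dependencies" into "module -> its dependents".
    dependents = {name: set() for name in dependencies}
    for name, deps in dependencies.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(name)

    # Breadth-first walk from the changed modules through their dependents.
    to_build = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in to_build:
                to_build.add(dependent)
                queue.append(dependent)
    return to_build


# Hypothetical module graph: app -> api -> core, docs is independent.
example_graph = {"core": [], "api": ["core"], "app": ["api"], "docs": []}
```

With `example_graph`, a change in `api` selects `api` and `app`, while `core` and `docs` can be skipped.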

