RE: [VOTE] Maven incremental build for BIG-sized projects with local and remote caching


RE: [VOTE] Maven incremental build for BIG-sized projects with local and remote caching

Alexander Ashitkin-2
Hi
Yes, we tried, but Takari is a somewhat different story: it is a smarter scheduler that gives you some boost over the default lifecycle scheduler but still requires you to build your modules.
This feature is a true incremental build: you don't build modules that have not changed at all, and build only the modified ones. The required build state for skipped modules is restored from the cache.
So for our 600 modules, build time is down to 1 minute from ~40 minutes, and even a single-threaded build benefits from the cache. Takari just doesn't do that.

Kindly yours
Aleks

From: Tamás Cservenák [mailto:[hidden email]]
Sent: Friday, September 13, 2019 4:54 PM
To: Maven Developers List <[hidden email]>
Cc: Alexander Ashitkin <[hidden email]>
Subject: Re: [VOTE] Maven incremental build for BIG-sized projects with local and remote caching

Hi there,

just a shot in the dark: have you tried any of the existing stuff, like Takari Lifecycle, before modding Maven itself? (http://takari.io/book/40-lifecycle.html)

Thanks,
T

On Fri, Sep 13, 2019 at 10:46 PM Maximilian Novikov <[hidden email]> wrote:
Hi All,

We want to contribute an upstream change to Maven to support true incremental builds for big projects.
To raise a pull request, we have to go through a long chain of Deutsche Bank's internal procedures. So, before starting that process, we would like to get your feedback on this feature.

Motivation:

Our project is hosted in a mono-repo and contains ~600 modules. All modules have the same SNAPSHOT version.
There is a lot of test automation around this; everything is tested before merging into the release branch.

The current setup helps us simplify build/release/dependency management for the 10+ teams that contribute to the codebase. We can release everything in one click.
The major drawback of this approach is build time: a full local build took 45-60 min (-T8), and a CI build ~25 min (-T16).

To speed up our build we needed two features: incremental build and a shared cache.
Initially we considered migrating to Gradle or Bazel. As the migration costs for those tools were too high, we decided to add similar functionality to Maven.

Current results: 1-2 min for a local build (-T8) if the build was cached by CI, and ~5 min for a CI build (-T16).

Feature description:

The idea is to calculate a checksum over the inputs and save the outputs in a cache.
Each node's checksum is calculated from:

• Effective POM hash
• Sources hash
• Dependencies hash (dependencies within the multi-module project)
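
The three inputs listed above could be combined roughly as follows. This is only a minimal Java sketch, not the actual implementation: the class name NodeChecksum, its methods, the choice of SHA-256, and the representation of POMs and sources as plain strings are all assumptions made for illustration. It also shows why a change in module A invalidates a dependent module B: A's checksum is one of B's inputs.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Maven.
public class NodeChecksum {

    // Hex-encoded SHA-256 over the parts; each part is followed by a zero
    // byte so that ("ab", "c") and ("a", "bc") produce different digests.
    public static String sha256(List<String> parts) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (String part : parts) {
                md.update(part.getBytes(StandardCharsets.UTF_8));
                md.update((byte) 0);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Node checksum = hash(effective POM hash, sources hash, dependency checksums).
    // Dependency checksums are assumed to be passed in a stable order.
    public static String checksum(String effectivePom, String sources,
                                  List<String> dependencyChecksums) {
        List<String> parts = new ArrayList<>();
        parts.add(sha256(List.of(effectivePom)));
        parts.add(sha256(List.of(sources)));
        parts.addAll(dependencyChecksums);
        return sha256(parts);
    }

    public static void main(String[] args) {
        // Module B depends on module A (both inside the multi-module project).
        String a1 = checksum("<pom A>", "class A {}", List.of());
        String b1 = checksum("<pom B>", "class B {}", List.of(a1));

        // Only A's sources change; B's own POM and sources stay the same.
        String a2 = checksum("<pom A>", "class A { int x; }", List.of());
        String b2 = checksum("<pom B>", "class B {}", List.of(a2));

        System.out.println("A checksum changed: " + !a1.equals(a2));
        System.out.println("B checksum changed: " + !b1.equals(b2));
    }
}
```

Because the dependency checksums feed into the dependent module's checksum, a change propagates down the whole subtree, while unrelated modules keep their old checksum and stay cacheable.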

Project source inputs are searched for inside the project and in all paths taken from plugin configuration.

How it works in practice:



1. CI: runs builds and stores outputs in the shared cache
2. CI: reuses outputs for the same inputs, so build time decreases
3. Locally: when I check out a branch and run 'install' for the whole project, I get all current snapshots from the remote cache for this branch
4. Locally: if I change multiple modules in the tree, only the changed subtree is rebuilt
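
The four steps above can be pictured with a small Java sketch. It is an illustration under stated assumptions, not the proposed code: the class BuildCache, its resolve method, the in-memory maps standing in for the local and remote caches, and the publishToRemote flag (true for CI, false for a developer machine) are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical cache front end, keyed by a module's input checksum.
public class BuildCache {
    private final Map<String, String> local = new HashMap<>();
    private final Map<String, String> remote;   // shared between CI and developers
    private final boolean publishToRemote;      // assumption: only CI publishes
    private int buildsExecuted = 0;

    public BuildCache(Map<String, String> remote, boolean publishToRemote) {
        this.remote = remote;
        this.publishToRemote = publishToRemote;
    }

    // Returns the outputs for the given checksum; runs the build only on a miss
    // in both the local and the remote cache.
    public String resolve(String checksum, Supplier<String> build) {
        String output = local.get(checksum);
        if (output == null) {
            output = remote.get(checksum);
        }
        if (output == null) {
            output = build.get();
            buildsExecuted++;
            if (publishToRemote) {
                remote.put(checksum, output);
            }
        }
        local.put(checksum, output);
        return output;
    }

    public int buildsExecuted() {
        return buildsExecuted;
    }

    public static void main(String[] args) {
        Map<String, String> shared = new HashMap<>();

        // Steps 1-2: CI builds once and publishes; a second CI run reuses outputs.
        BuildCache ci = new BuildCache(shared, true);
        ci.resolve("checksum-of-A", () -> "A.jar");
        ci.resolve("checksum-of-A", () -> "A.jar");
        System.out.println("CI builds: " + ci.buildsExecuted());

        // Steps 3-4: a developer restores unchanged modules and rebuilds changed ones.
        BuildCache dev = new BuildCache(shared, false);
        dev.resolve("checksum-of-A", () -> "A.jar");           // restored from remote
        dev.resolve("checksum-of-changed-B", () -> "B.jar");   // rebuilt locally
        System.out.println("Developer builds: " + dev.buildsExecuted());
    }
}
```

The design choice worth noting is that the lookup key is the input checksum, not the module name or version, so every module can keep the same SNAPSHOT version while still being cached per set of inputs.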

The impact on the current Maven codebase is very localized (MojoExecutor, where we injected a cache controller).
Caching can be activated/deactivated by a property, so the current Maven flow will work as is.

And the big plus is that you don't need to rework your current project. Caching should work out of the box; you just need to add a config in the .mvn folder.

Please let us know what you think. We are ready to invest in this feature and address any further feedback.

Kind regards,
Max



---
This e-mail may contain confidential and/or privileged information. If you are not the intended recipient (or have received this e-mail in error) please notify the sender immediately and delete this e-mail. Any unauthorized copying, disclosure or distribution of the material in this e-mail is strictly forbidden.

Please refer to https://www.db.com/disclosures for additional EU corporate and regulatory disclosures and to http://www.db.com/unitedkingdom/content/privacy.htm for information about privacy.



Re: [VOTE] Maven incremental build for BIG-sized projects with local and remote caching

Alexander Ashitkin
Hi Enrico,
Thanks for the feedback. That's a side discussion about the best approach to project layout. Monorepo has its own advocates, and it is easy to find posts describing why Google, Microsoft, or Facebook go monorepo.
Contrary to that way of thinking, we are ready to go live globally in an emergency scenario. If, say, a zero-day vulnerability is discovered in one of the low-level, widely reused core libraries, we need just one click to build/test/deploy and safely go live globally, with the whole estate updated on a scale of thousands of processes. And you know, there are people in the world who think that a codebase scattered across small repos is difficult to maintain and that snapshots are evil. It all depends.
Honestly, I think it is a kind of reversed approach when your build system defines how your software development processes work. Google has its own vision and simply implemented Bazel, and that is the correct approach. By the way, Bazel is perfect for such a scenario, but costly to migrate to for an existing project.

So if you choose a monorepo as we did, it is normal to work on just a part of the project. You just need a way to deal with the scalability challenges:
a) you hit hardware and infrastructure limitations and need to address them in some way;
b) you need an incremental build so that you can work on a subpart of the project while contributing to the shared codebase.

Sincerely yours, Aleks

On 2019/09/14 08:41:37, Enrico Olivelli <[hidden email]> wrote:

> I feel that, in general, having a huge monolithic project is kind of a
> project smell.
> Btw I have some big project with 100+ modules so I can see your pain.
> In day-to-day work a single developer doesn't work on all of the
> modules; usually you touch 1-2 modules and maybe some integration/system
> tests.
> If you need to rebuild the full project for every change, maybe there is
> something wrong with the overall design.
>
> I think you have your motivations for your layout, so let's talk about your
> proposal.
>
> If you have a way to split your project into subsystems, you can use a
> shared remote repository for deploying snapshots in order to share
> intermediate results with other developers.
>
> If your goal is to be ready for releases, I don't get your point. Usually
> you work with snapshots, and for a release you have to rebuild the full
> codebase once and only once in order to ensure that you have a consistent build
> of the project.
> With all of these temporary caches, how do you ensure that the final
> artifacts are the intended ones?
>
>
> Side note: this is not a real VOTE thread.
>
> Just my 2 cents.
>
> Don't get me wrong, I admire your will to improve the Maven ecosystem with this
> cool feature! Thank you for contributing your work. We will try to get the
> best
>
> Enrico
>
> On Sat, Sep 14, 2019, 08:29 Laird Nelson <[hidden email]> wrote:
>
> > On Fri, Sep 13, 2019 at 11:01 PM Alexander Ashitkin <
> > [hidden email]> wrote:
> >
> > > This feature is true incremental build – you don’t build modules which
> > > were not changed at all and build only modified/changed ones.
> > >
> >
> > Suppose module B depends on module A and I change A.  Does B get rebuilt in
> > your system?
> >
> > Best,
> > Laird
> >
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: [VOTE] Maven incremental build for BIG-sized projects with local and remote caching

Tibor Digana
Alexander,
Enrico is really right. Today it is microservices, where every
microservice is in a separate SCM repo.

Microservices were just an example, but in my experience you
can always find the lowest-level modules in the hierarchy which do not change
much and segregate them into other SCM repos. Those should undergo the
release process, share release versions, and avoid sharing SNAPSHOT
versions.

You can find the top roots, which are actually applications. If you have 10
WAR files as a result of the build and all of them should be deployed, then
there is a strong reason to split them into separate SCM repos.

Then this separation concept will guide you to isolate the middle layers
into isolated services as JAR files. And then you end up with microservices
and SOA services rather than JAR files, or you will be much closer to them. The huge
monolith project is gone.

The whole development process will be faster and more flexible than it was
before. Just try!

Cheers
Tibor17


Re: [VOTE] Maven incremental build for BIG-sized projects with local and remote caching

Alexander Ashitkin
Tibor,
Let me please share a personal opinion.
To move this conversation forward, I would kindly ask you to refrain from judgements and speculation about our project. Speaking on behalf of the community carries a certain responsibility, after all. I guess your knowledge of our platform, its architecture, use cases, requirements, and infrastructure is not that extensive. In general, judgements and speculation without a basis are very thin ice, on which it is very easy to lose credibility. Thanks for sharing with us such important concepts as microservices, NoSQL, and all the other important words. I believe that was done with good intentions, not with the intention to insult.

Second, as Maven users, we came to the community with (a) our case and (b) a proposal. Telling users that their case is wrong, irrelevant, etc. is counterproductive as such. Fitting all customers into your own vision is a perfect recipe for product stagnation. Ignoring the cases that customers bring to you is a way to miss opportunities for product growth.

It would be productive to focus on our needs and how Maven could address them. Other constructive input would be guidance on a proper feature implementation and next steps. Speculating about the project does not help at all and is not the topic we are interested in.

Thank you
Aleks

On 2019/09/14 20:37:03, Tibor Digana <[hidden email]> wrote:

> Hello Maximilian,
>
> So now the next step is to break the traditional dependencies in Maven and
> isolate the services via web-services, e.g. JAX-RS or JAX-WS and you would
> not "touch" the POMs.
> You need to use Logstash, Kibana, Elasticsearch, and Zipkin because the
> logs won't be aggregated without these frameworks.
> This would require you to spend some time and develop automatic deployment
> and reliable CI.
>
> The monolith would then exist at the infrastructure level but not at the code level.
> There you can write integration tests in every service. The input XML/JSON
> received from another service can be mocked with mock data. The service and
> its project, as well as the tests, still remain isolated at the project level.
> The tests would become documentation, and the data (XML/JSON) would be a
> specification for another team.
> In this setup a particular piece of functionality would end up in the right
> place. Shared data would no longer be a workaround. Sharing something can
> easily happen in a monolith project.
>
> The worst situation is if you share the database between the services,
> because then you really have to deploy many services.
> One way is, for instance, an architecture where you have one NoSQL database
> for one webapp, and an RDBMS as master data.
> Each webapp has another NoSQL database.
> Then the services would read only from one NoSQL database and write to the RDBMS master
> data, with JMS streaming the data back to the NoSQL databases via a data/event bus.
>
> It is more about infrastructure and such isolation.
> Since every app has an isolated database, not all services have to change
> just because a new feature required a database migration to new tables and
> relations.
> The probability of a change in a service would be smaller.
>
> Then you have DDD and CQRS, but Event Sourcing only partially.
>
> Cheers
> Tibor17
>
>
> On Sat, Sep 14, 2019 at 9:35 PM Maximilian Novikov <
> [hidden email]> wrote:
>
> > Tibor,
> >
> > We understand your position.
> >
> > We moved from separate SCMs to one SCM. We can move back, but we don't
> > want to.
> >
> > In a single SCM we like:
> > 1. Atomic commits
> > 2. A single point of responsibility.
> > If someone makes an incompatible change in a shared library, they are responsible
> > for updating all usages. At first glance it may look like a slowdown in
> > development, but it helps us avoid accumulating technical debt. We never
> > get into a situation where projects A, B, C, D... depend on different versions
> > of a shared library and we need to make a major upgrade that can block the release
> > of some apps, etc.
> >
> > We now release 20+ client apps and 50+ backend components every week or
> > even more often. With multiple SCMs we would need to hire a team of release
> > managers and build engineers to coordinate and support this.
> >
> > Again, we are not selling our approach. We implemented a feature that was missing for
> > us.
> >
> > PS. Just think about why commercial products like Gradle Maven Extensions
> > appeared.
> >
> >
> > From: Tibor Digana <[hidden email]>
> > Date: Saturday, 14 Sep 2019, 9:43 PM
> > To: Maven Developers List <[hidden email]>
> > Subject: Re: [VOTE] Maven incremental build for BIG-sized projects with
> > local and remote caching
> >
> > Alexander,
> > Enrico is really right. Today it is microservices, where every
> > microservice is in a separate SCM repo.
> >
> > Microservices were just an example, but in my experience you
> > can always find the lowest-level modules in the hierarchy which do not change
> > much and segregate them into other SCM repos. Those should undergo the
> > release process, share release versions, and avoid sharing SNAPSHOT
> > versions.
> >
> > You can find the top roots, which are actually applications. If you have 10
> > WAR files as a result of the build and all of them should be deployed, then
> > there is a strong reason to split them into separate SCM repos.
> >
> > Then this separation concept will guide you to isolate the middle layers
> > into isolated services as JAR files. And then you end up with microservices
> > and SOA services rather than JAR files, or you will be much closer to them. The huge
> > monolith project is gone.
> >
> > The whole development process will be faster and more flexible than it was
> > before. Just try!
> >
> > Cheers
> > Tibor17
> >
