versioning by hashes to speedup multi-module build (a'la nix package manager)

Anton Vodonosov
Hello.

In order to speed up the build of a multi-module project, I'd like to reuse artifacts of modules that haven't changed.
Manual versioning is tedious and error-prone.

Is it possible to automatically assign each module a version computed as hash-of(hash-of(module sources) + hashes of all dependencies)?

With this approach, every change in the code automatically changes the hash-based version of every dependent module.

This would be similar to the Nix package manager.

How can I do that in Maven?
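
For illustration, the scheme asked about above could be sketched roughly as follows. This is only a sketch under assumptions: the module names, the `dependencies` mapping and both function names are hypothetical, the per-module source hashes are taken as given, and none of this is an existing Maven feature.

```python
import hashlib


def combine(source_hash, dependency_versions):
    """hash-of(hash-of(module sources) + hash versions of all dependencies)."""
    digest = hashlib.sha256(source_hash.encode("utf-8"))
    # Sort so the result does not depend on declaration order.
    for version in sorted(dependency_versions):
        digest.update(b"\n" + version.encode("utf-8"))
    return digest.hexdigest()


def hash_versions(source_hashes, dependencies):
    """Return {module: hash version}; assumes the dependency graph is acyclic."""
    versions = {}

    def visit(module):
        if module not in versions:
            deps = [visit(dep) for dep in dependencies.get(module, ())]
            versions[module] = combine(source_hashes[module], deps)
        return versions[module]

    for module in source_hashes:
        visit(module)
    return versions


# Hypothetical project: "app" depends on "core"; "docs" stands alone.
dependencies = {"app": ["core"]}
before = hash_versions({"core": "c1", "app": "a1", "docs": "d1"}, dependencies)
after = hash_versions({"core": "c2", "app": "a1", "docs": "d1"}, dependencies)
```

Here only the sources of `core` changed between `before` and `after`, so `core` and `app` get new versions while `docs` keeps its old one.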

Best regards,
- Anton  


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: versioning by hashes to speedup multi-module build (a'la nix package manager)

Thomas Broyer-2
On Sun, Feb 2, 2020 at 17:48, Anton Vodonosov <[hidden email]> wrote:

> Hello.
>
> In order to speed up the build of a multi-module project, I'd like to
> reuse artifacts of modules that haven't changed.
> Manual versioning is tedious and error-prone.
>
> Is it possible to automatically assign versions to modules computed as a
> hash-of( hash-of(module sources) + hashes of all dependencies)?
>

Please define "module sources".
Hint: you can't, at least not without knowing how all plugins work. Gradle
Enterprise, FWIW, tries to have such knowledge in order to solve this exact
issue.

Also, you'll probably want to include system properties (or at least Maven
properties) and some environment information (e.g. which JDK) in the hash.
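
A minimal sketch of what mixing such environment information into the hash might look like. The function name, the choice of inputs and the sample values are all assumptions made for illustration; which properties actually matter depends on the build.

```python
import hashlib


def inputs_hash(sources_hash, jdk_version, properties):
    """Mix environment information into a module's input hash."""
    digest = hashlib.sha256()
    digest.update(f"sources={sources_hash}\n".encode("utf-8"))
    digest.update(f"jdk={jdk_version}\n".encode("utf-8"))
    # Sorted so the order in which properties were set does not matter.
    for key in sorted(properties):
        digest.update(f"{key}={properties[key]}\n".encode("utf-8"))
    return digest.hexdigest()


# Hypothetical values: same sources and properties, different JDK.
base = inputs_hash("abc123", "11.0.6", {"skipTests": "false"})
other_jdk = inputs_hash("abc123", "1.8.0_242", {"skipTests": "false"})
```

With these inputs, `base` and `other_jdk` differ, so an artifact built on one JDK would not be reused for a build on the other.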

> In this approach, every change in code will modify such hash-based version
> of all dependent modules automatically.
>
> This would be similar to Nix package manager.
>
> How to do that in maven?
>

As said above, you could try Gradle Enterprise. Takari had something in the
works too a few years ago.
…or if that's really problematic for you, then migrate to another build
tool, such as Gradle or Bazel.


Re: versioning by hashes to speedup multi-module build (a'la nix package manager)

Enrico Olivelli
(Apologies for top-posting.)

This thread is about a set of requested features (caching and parallel
execution of mojos) that we have been discussing on the dev@ mailing list.
As said in this thread, the first showstopper for Maven is that we do not
have a clear definition of inputs and outputs for each plugin.
There are proposals, but no one is actively spending engineering
time on these topics.

Any help is much appreciated; please join us on dev@.

Best regards
Enrico



Re: versioning by hashes to speedup multi-module build (a'la nix package manager)

Anton Vodonosov


03.02.2020, 00:15, "Enrico Olivelli" <[hidden email]>:

> (Apologises for top posting )
>
> This thread is about a bunch of requested features (cache and parallel
> executions of mojos) that we have been discussing on dev@ mailing list.
> As said in this thread the first show stopper for Maven is that we do not
> have a clear definition of input and outputs for each plugin.
> There are proposals but actually no one is spending actively engineering
> time on these topics.
>
> Any help is very appreciated, please join us on dev@
>
> Best regards
> Enrico

What threads in dev list do you recommend to follow in regard
to this subject?


Re: versioning by hashes to speedup multi-module build (a'la nix package manager)

Anton Vodonosov
Thomas Broyer, Enrico Olivelli,

I consider the whole directory where the module's
pom.xml resides, excluding the target/ dir,
as the input, and the final module artifacts as
the output.
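
A rough sketch of hashing such an input directory. The function name is hypothetical, and as a simplifying assumption it skips every directory named `target` at any depth and includes the directories of nested submodules in the parent's hash.

```python
import hashlib
import os


def module_sources_hash(module_dir):
    """Hash every file under module_dir, skipping any directory named target."""
    digest = hashlib.sha256()
    for root, dirs, files in os.walk(module_dir):
        # Prune target/ in place and sort so traversal order is deterministic.
        dirs[:] = sorted(d for d in dirs if d != "target")
        for name in sorted(files):
            path = os.path.join(root, name)
            relative = os.path.relpath(path, module_dir).replace(os.sep, "/")
            # Include the relative path so renames change the hash too.
            digest.update(relative.encode("utf-8") + b"\0")
            with open(path, "rb") as handle:
                digest.update(handle.read())
            digest.update(b"\0")
    return digest.hexdigest()
```

Calling `module_sources_hash` on a module directory before and after a build should give the same value as long as only files under target/ were written.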

Even if some plugins allow sources outside the
pom.xml's directory (out of curiosity, is that possible?),
requiring everything to live inside it is an acceptable
restriction on project structure, IMO.

The version hash approach I described may cause
some redundant work. For example, if only dependencies
changed, usually only re-testing is needed;
re-compilation and re-packaging are not necessary
(unless a dependency generates or instruments
code at compilation time, or you package an uberjar
or war, which includes dependency artifacts).
But for stability it is better to compromise and
accept some redundant work than to go into such
complexities as intermediate results of individual
plugins, the distinction between test and prod sources,
and types of dependencies. Even the simplest hash
versioning can potentially give significant
speedups for large multi-module projects.

I've spent this weekend trying to create a script
for such version hashes, but I haven't completed
it (yet) due to various obstacles in Maven's
behaviour: it is impossible to use a property expression
in the project/version element, and the dependency:tree
goal requires an artifact of the current version
to be present in the ~/.m2 folder (although
it doesn't look into the artifact's content,
so it can even be a fake artifact).

Maybe someday I'll make more progress.


Best regards,
- Anton


Re: versioning by hashes to speedup multi-module build (a'la nix package manager)

Anton Vodonosov
Ha, only after completing the script (a slow one, though)
did I discover that Maven rebuilds modules even if
an artifact of the same version already exists in the artifact
repository.

I had hoped that Maven, on finding a non-SNAPSHOT artifact
in an artifact repository, would just use it
and not build the same version of the module again.

Is there a way to tell Maven to do so?

Best regards,
- Anton

Re: versioning by hashes to speedup multi-module build (a'la nix package manager)

Jason Young
It seems Maven itself never omits re-doing anything except for downloading
artifacts from a remote repository. The command you give to Maven and the
configuration of your projects dictates what it will do, no matter what
happened in the previous build. You _can_ omit projects in a multi-module
project by manually specifying which projects to build:
https://blog.sonatype.com/2009/10/maven-tips-and-tricks-advanced-reactor-options/

Not what you're looking for, but maybe useful: We use one plugin that will
skip whole projects that have not changed WRT a given Git branch:
https://github.com/vackosar/gitflow-incremental-builder. With careful
configuration, this is an effective shortcut without sacrificing
repeatability.

Gradle advertises as a feature that it will not rebuild if rebuilding is
not required, or something to that effect. I assume some configuration is
sometimes required.

Re: versioning by hashes to speedup multi-module build (a'la nix package manager)

Anton Vodonosov


04.02.2020, 23:32, "Jason Young" <[hidden email]>:
>
> Not what you're looking for, but maybe useful: We use one plugin that will
> skip whole projects that have not changed WRT a given Git branch:
> https://github.com/vackosar/gitflow-incremental-builder. With careful
> configuration, this is an effective shortcut without sacrificing
> repeatability.

How do you use it? I mean, if you need to start the full system
(locally, or by deploying it to a QA server)
but the plugin has built only the changed modules,
how do you download the rest?

Do you use this plugin in CI?

Re: versioning by hashes to speedup multi-module build (a'la nix package manager)

Jason Young
Good questions. First of all, this plugin is CI-agnostic, but it does
require the project to exist in a `git` repository, whether that is in CI
or on your machine. Check the github page I linked to for more instructions
on how it determines what projects in a reactor are considered "changed"
and need to be built versus which are not changed and will be omitted from
the reactor.

In every Maven build, every dependency is checked this way:

   1. If it is a project in the reactor, use the artifact of that project.
   2. Otherwise, if the artifact is in the local repo, use that artifact.
   3. Last resort: Download from the remote repository.

There are some other rules omitted above, e.g. when to download a fresh
SNAPSHOT artifact based on your chosen snapshot policy, etc., but that's
the gist of it: Maven will obtain the artifact if it is not present, no
further configuration needed.
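
The three steps above can be modelled with a toy sketch like the one below. It is not Maven's actual resolver: the `resolve` function and the dictionaries standing in for the reactor, the local repo and the remote repo are hypothetical, and snapshot update policies are ignored.

```python
def resolve(artifact, reactor, local_repo, remote_repo):
    """Return (location, path) for an artifact using the three steps above."""
    if artifact in reactor:
        return ("reactor", reactor[artifact])
    if artifact in local_repo:
        return ("local", local_repo[artifact])
    if artifact in remote_repo:
        # A real build would download the file into the local repository here.
        local_repo[artifact] = remote_repo[artifact]
        return ("remote", local_repo[artifact])
    raise LookupError(f"cannot resolve {artifact}")


# Stand-ins for the three places an artifact can come from.
reactor = {"B": "B/target/B.jar"}
local_repo = {"B": "~/.m2/repository/.../B.jar"}

in_reactor = resolve("B", reactor, local_repo, {})
outside_reactor = resolve("B", {}, local_repo, {})
```

When B is part of the reactor, `in_reactor` points at the freshly built jar; when B is left out of the reactor, `outside_reactor` falls back to the copy in the local repo.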

E.g. let's say you have one project that names 2 other projects A and B as
submodules, and A depends on B. If you run `mvn install -pl A` (NOT SURE
about that syntax), then Maven will look for B.jar from your local repo,
and resort to checking your remote repo (e.g. Maven Central) if it's not
there. But if you omit the `-pl A` part, Maven will build B, then build A
using B.jar.

Essentially, the plugin I linked to determines the project list based on
what has changed and what has not. Maven then decides whether to use
B/target/B.jar, ~/.m2/repository/.../B.jar, or to look to Maven Central.

HTH.

Re: versioning by hashes to speedup multi-module build (a'la nix package manager)

Anton Vodonosov
After learning from gitflow-incremental-builder how to remove
modules from mavenSession when we want to skip them,
I implemented a "version as a hash of sources and dependency
tree" solution:
https://github.com/avodonosov/hashver-maven-plugin

It relies on using property expressions as versions.
A build extension loads values for those properties from a file.
This can be a file with "normal" Maven versions maintained
manually in source control, or a file generated by the "hashver"
mojo when the user wants to use hash versions and avoid
building unchanged modules. The extension makes it possible to
skip a module's build if an artifact of the same version exists.
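
The skip decision described above can be illustrated with a toy sketch. This is not the plugin's code: the function name, the module names and the hash values are made up, and the set of already published (module, version) pairs is assumed to be known.

```python
def modules_to_build(versions, published):
    """Return modules whose (module, version) pair has no existing artifact."""
    return [
        module
        for module, version in versions.items()
        if (module, version) not in published
    ]


# Hypothetical hash versions for the current sources.
versions = {"core": "3f2a", "app": "9c1e"}
# Artifacts that already exist: "app" was published with an older hash.
published = {("core", "3f2a"), ("app", "77b0")}

to_build = modules_to_build(versions, published)
```

With these values `to_build` contains only `app`, since an artifact for the current hash version of `core` already exists.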

Using property expressions for versions has two drawbacks:
1. Calculation of hash versions has to be a separate Maven
    invocation; it can't be done in the same Maven session
    as the main goal, because calculating the hash versions
    requires building the dependency tree, and after the dependency
    tree is built it's impossible (I believe) to inject newly calculated
    versions into the Maven structures so that they are in effect
    when the main goal is executed.
2. This approach is a bit intrusive: the user needs to adjust
    their pom.xml files to use property expressions in place
    of versions.

The advantage of this approach is that Maven downloads
the artifact of a skipped module automatically if that module
is a dependency of a changed module.

If it were possible to annotate artifacts with the "src hash"
using some attribute other than the version, and to hook into
Maven's artifact download logic to find artifacts by this
attribute, then generation of hash versions could be done
in the same session as the main goal, and the solution
could probably be applied without any modifications
to the user's pom.

I noticed that when uploading snapshot artifacts, Maven
adds a timestamp (and a sequential "build number"?)
to the artifact name. And when resolving artifacts,
a special versionResolver object is invoked, which
for a SNAPSHOT sends a metadata request
to the repository to retrieve the real, timestamped
name of the artifact corresponding to the snapshot.
This logic could be adjusted to use the "src hash"
instead of the timestamp, and to look up in the remote
repo an artifact matching the current module's
"src hash". Any advice on how to do that?

In the long run, I believe it is desirable for Maven
to natively support the notion of a "build inputs hash"
for modules and artifacts. It allows significant build
time savings while remaining deterministic and stable.


