Skip to content

Add an option to execute the plugin with a custom encoding setting #70

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

eyalroth
Copy link
Contributor

@eyalroth eyalroth commented Apr 4, 2018

The data scoverage is covering is often encoded with custom charsets, which may lead to problems with reading / writing the coverage data, since scalac plugin relies on the default encoding configured on the JVM and doesn't fully support configuring custom encoding with other means.

This could be seen in multiple locations such as:

This means that to overcome such problems one must be able to set the default JVM encoding when using the scalac plugin. This PR adds the ability to do so with the gradle plugin.

It makes sure to execute all the launched processes / JVMs -- including both the report and aggregate apps and the scalac compilation with the compiler plugin -- with the custom encoding as the default JVM encoding (using the -Dfile.encoding=encoding argument).

This PR fixes #68.

@@ -146,10 +150,23 @@ class ScoverageExtension {
if (extension.highlighting) {
parameters.add('-Yrangepos')
}
if (extension.encoding) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part was not actually necessary in my build, but I suspect that this is only because I was using the Gradle daemon, where as in the case of not using it, it may be that the other settings (in the fork options) will be irrelevant and this one will actually matter.

Copy link
Contributor

@maiflai maiflai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks very much for this - it's great to find fellow-minded folks.

It seems that there is already a property on ScalaCompileOptions, which can be set globally with:

tasks.withType(ScalaCompile) {
 scalaCompileOptions.encoding = 'UTF-8' 
}

I think I prefer that this plugin respect the compiler configuration rather than introducing a new place to configure things. The report task reports on something that has to be compiled in.

I think this makes the PR much simpler, I'll see if I can get a fix out later today unless you're keen to take a closer look?

Cheers,
Stu.

@eyalroth
Copy link
Contributor Author

eyalroth commented Apr 8, 2018

That was one of the things I tried at first actually, as I already mentioned in the issue (#68):

gradle.projectsEvaluated {
    tasks.withType(AbstractScalaCompile) {
        options.encoding = "UTF-8"
        scalaCompileOptions.setEncoding("UTF-8")
        List<String> parameters = ['-encoding', 'UTF-8']
        List<String> existingParameters = scalaCompileOptions.additionalParameters
        if (existingParameters) {
            parameters.addAll(existingParameters)
        }
        scalaCompileOptions.additionalParameters = parameters
    }
}

AbstractScalaCompile is much like using ScalaCompile, and setEncoding is equivalent to .encoding = . That didn't work for me.

The only thing that actually worked was adding the JVM encoding argument to the fork options. I'm guessing this has something to do with how the Gradle daemon executes the tasks. I added the JVM argument to the non-fork options as well only to try cover cases when executing gradle without the daemon (but haven't actually tested this).

Other than this, I added the JVM argument to both of the applications' (report and aggregate) execution since those make use of "third party" code that doesn't support custom encoding other than this method.

@maiflai
Copy link
Contributor

maiflai commented Apr 8, 2018

Sorry for the confusion.

I've created #71 - the intention was to actually respect the values that are being configured on the ScalaCompile tasks, rather than to add additional configuration options for this plugin.

I've added this to the check task as it too uses the default encoding when parsing XML.

Would it be possible to see if the PR works for you? I still need to figure out how to test this automatically.

@eyalroth
Copy link
Contributor Author

eyalroth commented Apr 9, 2018

Again, the problem is not just with the gradle plugin code, but with the scalac plugin as well. The scalac plugin is generating a coverage XML with corrupted data.

I've managed to find the problematic source code that causes this problem. The source code has a literal string with the character \u201c, which is e2809c in hex. For some reason, the scalac plugin converts this to a mere 93 hex in the XML.

To reproduce / automatically test this, you could create a project with the following source code:

object Foo {
  def foo = "this is unicode - “ - this is unicode"
}

And reportScoverage will fail (at least it does on my environment).

I don't know exactly why this is happening. I tried to reproduce this with my own code writing this character to a file and it wrote it just fine. Perhaps there's something going on when the plugin / compiler reads the source code. Again, this could only be solved if the fork JVM arguments are set to the right encoding.

I don't mind relying on the encoding configured for the ScalaCompile task instead of an additional parameter for the plugin, though I still think this case should be documented in the readme.md in a "troubleshooting" section.

@maiflai
Copy link
Contributor

maiflai commented Apr 9, 2018

Ah, ok. It looks as thought the current version of scalac plugin should produce UTF-8 regardless of any specified encoding.

It may be that SBT adds the fork option to set encoding as well as passing the command-line argument to the compiler (as you have done above).

I'll see if I can take a closer look tomorrow; this may also involve an upstream PR or two :-/.

Stu.

@gslowikowski
Copy link
Member

For me a test with this code:

object Foo {
  def foo = "this is unicode - “ - this is unicode"
}

works. The files scoverage.coverage.xml and Foo.scala.html in html report contain properly encoded left upper quotation mark character.
I have encoding set to UTF-8 by default.

@eyalroth
Copy link
Contributor Author

Well, I'm running on Windows 10 and my default Scala codec is windows-1252:

scala> def foo()(implicit codec: scala.io.Codec) = codec
scala> foo()
res0: scala.io.Codec = windows-1252

Perhaps that has something to do with that...

@gslowikowski
Copy link
Member

gslowikowski commented Apr 10, 2018

@eyalroth can you confirm that with encoding set to UTF-8 it works for you?

@eyalroth
Copy link
Contributor Author

Well, yes, this is what this is about - forcing UTF-8 as the default encoding on all of the JVM processes involved in the build. When the default encoding is UTF-8 then everything works.

@gslowikowski
Copy link
Member

So Gradle plugin must provide a way to set desired encoding (like in #70 or #71) and pass it to forked JVM.

@gslowikowski
Copy link
Member

Don't force UTF-8. Any user-defined encoding should work (provided source files are properly encoded).

@eyalroth
Copy link
Contributor Author

Yep, obviously we don't want to force UTF-8 specifically, but only what the user has defined (via scalaCompileOptions.setEncoding(???), I guess).

It would have been better if there was a way to invoke the compiler / scalac plugin with custom encoding, but this method should work for the Gradle plugin as well.

@gslowikowski
Copy link
Member

I've changed my system encoding to Windows-1250 and:

  • generated html reports are not encoded properly on "Code Grid" tab, but encoded properly on "Statement List" tab, scoverage.coverage.xml report is not encoded properly
  • after adding scalacOptions ++= Seq("-encoding", "UTF-8") in SBT or <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding> in Maven html reports are encoded properly on both tabs, scoverage.coverage.xml report is not encoded properly

Conclusions:

  • scoverage.coverage.xml file should always be written and read in UTF-8 IMO (Serializer#serialize and Serializer#deserialize). This is internal XML file so I see no reason in using other, system default so unpredictable, encoding. This needs separate PR in scalac-scoverage-plugin`.
  • properly passing encoding to scalac via scalacOptions (in Maven project.build.sourceEncoding is converted to scalacOptions) is the only requirement to properly render HTML reports.

@maiflai
Copy link
Contributor

maiflai commented Apr 27, 2018

Sorry, I'm struggling to test this on my local machine.

I've set the fork option as you suggest, please could you see if #71 helps?

Thanks,
Stu

@maiflai
Copy link
Contributor

maiflai commented May 6, 2018

Hi,

I've managed to source a build environment with Windows 10, and I can't seem to reproduce the issue with a simple Gradle project.

I did have to explicitly state the encoding. If I don't, I see

IO error while decoding ...\example\src\main\scala\hello\World.scala with UTF-8
Please try specifying another one using the -encoding option

If I do specify the encoding

tasks.withType(ScalaCompile) {
    scalaCompileOptions.encoding='windows-1252'
}

Then reportScoverage faithfully reproduces the magic quotes.

println("this is unicode - “ - this is unicode")

This is with gradle-scoverage-2.1.0, scalac-scoverage-1.3.1, and scalac 2.12.5

Sorry to be a pain, but please could you share an example repository that fails so that I can take a closer look?

Thanks,
Stu.

@maiflai
Copy link
Contributor

maiflai commented May 7, 2018

I've uploaded my test repository to https://github.com/maiflai/gradle-scoverage-encoding - please could you let me know if it builds?

Thanks,
Stu

@eyalroth
Copy link
Contributor Author

eyalroth commented May 7, 2018

@maiflai Your system's encoding has to be changed to Windows-1250. This cannot be done with Gradle, but you'd have to configure your Windows machine as so.

@maiflai
Copy link
Contributor

maiflai commented May 8, 2018

Sorry, I'm not sure I follow.

My Windows machine is configured for the United States and reports that it is using Cp-1250.

How do I make my system encoding match yours?

maiflai/gradle-scoverage-encoding@91658cc

Task :showEncoding
Cp1252

@eyalroth
Copy link
Contributor Author

eyalroth commented May 8, 2018

I believe cp-1250 is equivalent to windows-1250 ("cp" stands for code page). Mine is actually windows-1252, but looks like the problem exists with both encodings, which are pretty standard on windows machines.

Can you please check the encoding with a scala REPL using the code I used in one of my previous comments (the one with the implicit codec variable)?

@maiflai
Copy link
Contributor

maiflai commented May 8, 2018

$ bin/scala
Welcome to Scala 2.12.6 (OpenJDK Client VM, Java 1.8.0_40).
Type in expressions for evaluation. Or try :help.

scala> def foo()(implicit codec: scala.io.Codec) = codec
foo: ()(implicit codec: scala.io.Codec)scala.io.Codec

scala> foo()
res0: scala.io.Codec = windows-1252

@maiflai
Copy link
Contributor

maiflai commented May 8, 2018

does my test repository build?

@eyalroth
Copy link
Contributor Author

@maiflai well, not exactly.

If you remove this part from the build.grade:

tasks.withType(ScalaCompile) {
    scalaCompileOptions.encoding='windows-1252'
}

Then the compilation fails with:

IO error while decoding World.scala with UTF-8
Please try specifying another one using the -encoding option

Which is strange and shouldn't normally happen. I realized that the problem is with the World.scala file itself, which is not encoded correctly. As an example, take a look at how IntelliJ reads this file:
intellij

Creating a file with proper UTF-8 encoding makes the compilation work. Re-adding the scalaCompileOptions.encoding part to the gradle actually fails the compilation again with a similar exception, though this time it is windows-1252:

IO error while decoding World.scala with windows-1252

Anyhow, after encoding the file properly and removing that gradle part, reportScoverage actually seems to work with \u201C (), but doesn't work with \u201D ().

@eyalroth
Copy link
Contributor Author

See #117.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

UnmappableCharacterException / MalformedInputException (problems with encoding)
3 participants