
UnmappableCharacterException / MalformedInputException (problems with encoding) #68

Closed
eyalroth opened this issue Apr 3, 2018 · 10 comments · Fixed by #117

Comments

@eyalroth
Contributor

eyalroth commented Apr 3, 2018

I have a hybrid scala/java gradle project with multiple sub-projects and I'm trying to make scoverage work for me.

reportScoverage fails to execute on one of my sub-projects with the following exception:

Exception in thread "main" java.nio.charset.UnmappableCharacterException: Input length = 1
        at java.nio.charset.CoderResult.throwException(CoderResult.java:282)
        at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:339)
        at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
        at java.io.InputStreamReader.read(InputStreamReader.java:184)
        at java.io.BufferedReader.read1(BufferedReader.java:210)
        at java.io.BufferedReader.read(BufferedReader.java:286)
        at java.io.Reader.read(Reader.java:140)
        at scala.io.BufferedSource.mkString(BufferedSource.scala:96)
        at scoverage.report.CodeGrid.source(CodeGrid.scala:63)
        at scoverage.report.CodeGrid.<init>(CodeGrid.scala:17)
        at scoverage.report.ScoverageHtmlWriter.filePage(ScoverageHtmlWriter.scala:78)
        at scoverage.report.ScoverageHtmlWriter.scoverage$report$ScoverageHtmlWriter$$writeFile(ScoverageHtmlWriter.scala:44)
        at scoverage.report.ScoverageHtmlWriter$$anonfun$scoverage$report$ScoverageHtmlWriter$$writePackage$1.apply(ScoverageHtmlWriter.scala:37)
        at scoverage.report.ScoverageHtmlWriter$$anonfun$scoverage$report$ScoverageHtmlWriter$$writePackage$1.apply(ScoverageHtmlWriter.scala:37)
        at scala.collection.immutable.List.foreach(List.scala:381)
        at scoverage.report.ScoverageHtmlWriter.scoverage$report$ScoverageHtmlWriter$$writePackage(ScoverageHtmlWriter.scala:37)
        at scoverage.report.ScoverageHtmlWriter$$anonfun$write$1.apply(ScoverageHtmlWriter.scala:27)
        at scoverage.report.ScoverageHtmlWriter$$anonfun$write$1.apply(ScoverageHtmlWriter.scala:27)
        at scala.collection.immutable.List.foreach(List.scala:381)
        at scoverage.report.ScoverageHtmlWriter.write(ScoverageHtmlWriter.scala:27)
        at org.scoverage.ScoverageWriter.write(ScoverageWriter.java:65)
        at org.scoverage.SingleReportApp.main(SingleReportApp.java:41)

I found an issue on the SBT plugin issue tracker (sbt-scoverage#204) with the same exception, and it implied using the proper encoding setting when invoking the scala compiler. Well then, I added this to my project:

gradle.projectsEvaluated {
    tasks.withType(AbstractScalaCompile) {
        options.encoding = "UTF-8"
        scalaCompileOptions.setEncoding("UTF-8")
        List<String> parameters = ['-encoding', 'UTF-8']
        List<String> existingParameters = scalaCompileOptions.additionalParameters
        if (existingParameters) {
            parameters.addAll(existingParameters)
        }
        scalaCompileOptions.additionalParameters = parameters
    }
}

This code is overkill, just to make sure Scala compiles with UTF-8, but it didn't change a thing.

I looked at the source code where the exception is thrown, CodeGrid.source(), and I saw that it uses a custom encoding propagated by ScoverageHtmlWriter. The problem here, though, is that ScoverageHtmlWriter is never initialized with any custom encoding; not in the original SBT plugin code nor in this project's ScoverageWriter.

This obviously should be fixed. But anyway I thought "let's just try overriding the default encoding with a simple -Dfile.encoding=utf-8". Well, this caused an exception in an even earlier stage:

Exception in thread "main" java.nio.charset.MalformedInputException: Input length = 1
        at java.nio.charset.CoderResult.throwException(CoderResult.java:281)
        at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:339)
        at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
        at java.io.InputStreamReader.read(InputStreamReader.java:184)
        at java.io.BufferedReader.read1(BufferedReader.java:210)
        at java.io.BufferedReader.read(BufferedReader.java:286)
        at java.io.Reader.read(Reader.java:140)
        at scala.io.BufferedSource.mkString(BufferedSource.scala:96)
        at scoverage.Serializer$.deserialize(Serializer.scala:130)
        at scoverage.Serializer.deserialize(Serializer.scala)
        at org.scoverage.SingleReportApp.main(SingleReportApp.java:36)

Again, problems with encoding. I'm assuming this is a problem with the original SBT plugin and not with the gradle plugin, but I thought I'd file the issue here since I'm using the gradle plugin (well, I'm trying to use it, but it seems it needs to be fixed in order for it to be relevant to me).
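
For readers who want to reproduce the second exception in isolation, here is a minimal Java sketch. It assumes the file being deserialized was written in a single-byte encoding such as Windows-1252 and is then decoded strictly as UTF-8; that is an assumption for illustration, not something confirmed at this point in the thread. The class and method names are hypothetical and are not part of scoverage.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of scoverage or the Gradle plugin.
class MalformedInputDemo {

    /** Strictly decodes bytes as UTF-8, reporting bad input instead of replacing it. */
    static String decodeStrictUtf8(byte[] bytes) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    /** Returns true when strict UTF-8 decoding fails with MalformedInputException. */
    static boolean isMalformedUtf8(byte[] bytes) {
        try {
            decodeStrictUtf8(bytes);
            return false;
        } catch (MalformedInputException e) {
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: text containing a non-ASCII character, written as Windows-1252.
        byte[] writtenAsCp1252 =
                "caf\u00E9 au lait".getBytes(Charset.forName("windows-1252"));
        System.out.println("malformed as UTF-8: " + isMalformedUtf8(writtenAsCp1252));
    }
}
```

The point of the sketch is only that a byte sequence valid in one encoding can be rejected outright by another, so changing the reader's encoding without changing the writer's can move the failure rather than remove it.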

@maiflai
Contributor

maiflai commented Apr 3, 2018

Hi,

How are you passing the file.encoding system property please?

You may find that exporting an environment variable before launching Gradle makes this work transparently.

export JAVA_TOOL_OPTIONS=-Dfile.encoding=UTF-8

Thanks,
Stu

@eyalroth
Contributor Author

eyalroth commented Apr 3, 2018

It worked, but IMHO this is only a workaround, and not a very good one. Making sure this variable is set on every environment (devs and CI) would result in build inconsistencies for sure. I couldn't find a way to define these variables on a per-project basis other than placing the export in the wrapper scripts, but I prefer not to have these scripts in my VCS but rather rely on the Wrapper task.

The scoverage plugins should allow overriding the default JVM encoding. Until that is implemented (if at all), perhaps there is a different way of injecting the default encoding into the actual Gradle process executing the report?

Note that the following configuration didn't work either:

configure(reportScoverage) {
    systemProperty "file.encoding", "utf-8"
}

That makes sense, since SingleReportApp is executed in its own JVM. Maybe add support for passing custom VM args? Or maybe even just the default encoding (which would be translated into a VM arg in ScoverageReport.groovy)?

@maiflai
Contributor

maiflai commented Apr 3, 2018

Yes, this was just a way to test that the default JVM encoding was to blame.

I seem to recall Gradle washing their hands of this a long time ago; Maven does support a project level encoding.

I suspect the reporting task is not respecting the configured system properties; I will take a look tomorrow.

@maiflai
Contributor

maiflai commented Apr 3, 2018

That said - are you not using the gradle wrapper scripts everywhere?

This gives you a convenient location to configure the build environment consistently.

@eyalroth
Contributor Author

eyalroth commented Apr 3, 2018

> I suspect the reporting task is not respecting the configured system properties; I will take a look tomorrow.

It seems so. Perhaps this only happens while using the gradle daemon (child JVMs spawned from within the daemon will not "inherit" its arguments). There could be a relatively easy workaround here by adjusting ScoverageReport.groovy as I've mentioned earlier.

> That said - are you not using the gradle wrapper scripts everywhere?

Well, I make sure to use gradlew and I configure my IDE to use the gradle wrapper task as well, but I honestly have nothing else configured there. I try to make my code and builds cross platform and not rely on environment variables :)

@eyalroth
Contributor Author

eyalroth commented Apr 4, 2018

I added a PR which allows configuring the plugin with a custom encoding, thus preventing the exceptions.

@gslowikowski
Member

Hi guys.

Gradle plugin should call this ScoverageWriter constructor (with sourceEncoding parameter) like in SBT plugin or in Maven plugin.

@eyalroth
Contributor Author

eyalroth commented Apr 9, 2018

@gslowikowski Thanks for joining in. That is true, but it would hardly solve the problem. Take a look at the PR I created (#70) for further details on this.

@kknd22

kknd22 commented Jun 13, 2018

We have seen this behavior as well. However, it seems to happen very inconsistently: it occurs on some CI workers but not on others. Any reason why?
Thanks
-cl

@eyalroth
Contributor Author

eyalroth commented Sep 16, 2019

I've gotten back to investigating this issue and I believe I have the full picture in mind.

There are basically three steps to Scoverage:

  1. Instrumenting the source files via the compiler. This includes generating the scoverage.coverage.xml file, which is a sort of a "map" of the source files.
  2. Running the tests, which generate the measurement files scoverage.measurements.X.
  3. Generating a report based on the "mapping" file from step 1 and the measurement files from step 2.

First off, there is an encoding problem in the scalac plugin with the creation of the "mapping" file in step 1. The scalac plugin relies on the JVM encoding instead of the -encoding compiler option. This is discussed in my (outdated) PR #70, and eventually results in a badly encoded "mapping" file. I believe this should be fixed in the original scalac plugin, and shouldn't be handled by the Gradle plugin at all.
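
To illustrate the general idea of not relying on the JVM default when writing and reading such a file, here is a small Java sketch. It is not the scalac plugin's code; the class name, method names and temp-file prefix are hypothetical. It only shows that passing the same explicit charset to both the writer and the reader makes the round trip independent of file.encoding.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; illustrates explicit-charset I/O only.
class ExplicitCharsetIo {

    /** Writes text with the given charset instead of the JVM default. */
    static void write(Path file, String text, Charset charset) {
        try (Writer out = Files.newBufferedWriter(file, charset)) {
            out.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the whole file with the given charset instead of the JVM default. */
    static String read(Path file, Charset charset) {
        try {
            return new String(Files.readAllBytes(file), charset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes text to a temporary file and reads it back with the same charset. */
    static String roundTrip(String text, Charset charset) {
        try {
            Path tmp = Files.createTempFile("coverage-sketch", ".xml");
            try {
                write(tmp, text, charset);
                return read(tmp, charset);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String text = "<source>val quote = \u201D</source>";
        System.out.println(text.equals(roundTrip(text, StandardCharsets.UTF_8)));
    }
}
```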

The problem that the Gradle plugin is responsible for is the failure to generate an HTML report file. The ScoverageHtmlWriter actually reads the original source files when generating a report; therefore, it must read them with the right encoding. Right now it uses the default JVM encoding to read the source files, but ScoverageHtmlWriter also accepts an encoding parameter. The ScoverageXmlWriter avoids this problem by not reading the source files (presumably since it doesn't include source code in the report).

What then happened in my original case, and how come some Unicode characters work and some don't? Well, my default OS encoding (and therefore my default JVM encoding) is Windows-1252, while my source files are encoded in UTF-8. In one of my files I had the ” (U+201D) character, which is encoded as 0xE2 0x80 0x9D in UTF-8, and that's how my source file was written on the file system. Thing is, 0x9D is not mapped in Windows-1252, so when trying to read that file with this encoding, an exception is thrown. How come the Scala compiler is able to read this file? That's because its default encoding is UTF-8 and is not based on the default JVM encoding.
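
The byte-level claim above can be checked with a short Java sketch. The class and method names are hypothetical; the only library calls used are the standard java.nio.charset decoders, configured to report errors the way a strict reader does.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnmappableCharacterException;

// Hypothetical demo class; not part of scoverage.
class QuoteDecodingDemo {

    /** Formats bytes as space-separated upper-case hex pairs. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /** Decodes strictly and names the outcome instead of throwing. */
    static String strictDecodeOutcome(byte[] bytes, Charset charset) {
        try {
            charset.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return "ok";
        } catch (UnmappableCharacterException e) {
            return "unmappable";
        } catch (MalformedInputException e) {
            return "malformed";
        } catch (CharacterCodingException e) {
            return "other";
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u201D".getBytes(StandardCharsets.UTF_8);
        System.out.println("UTF-8 bytes:        " + hex(utf8));
        System.out.println("read as UTF-8:      "
                + strictDecodeOutcome(utf8, StandardCharsets.UTF_8));
        System.out.println("read as win-1252:   "
                + strictDecodeOutcome(utf8, Charset.forName("windows-1252")));
    }
}
```

Characters whose UTF-8 bytes all happen to be mapped in Windows-1252 decode without an exception (as mojibake), which would explain why only some characters trigger the failure.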

What the Gradle plugin should do, then, is invoke the ScoverageHtmlWriter with the source files' encoding, which is either configured via the Gradle Scala plugin's scalaCompileOptions or defaults to UTF-8, much like the Scala compiler.
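
A rough Java sketch of that fallback rule follows. It is not the plugin's implementation: the class name, method names and the idea of passing the configured value as a plain string are assumptions made for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; the real plugin may resolve the encoding differently.
class SourceEncodingResolver {

    /** Uses the configured encoding when present, otherwise falls back to UTF-8. */
    static Charset resolve(String configuredEncoding) {
        if (configuredEncoding == null || configuredEncoding.trim().isEmpty()) {
            return StandardCharsets.UTF_8;
        }
        return Charset.forName(configuredEncoding.trim());
    }

    /** Decodes source bytes with the resolved encoding, never the JVM default. */
    static String decodeSource(byte[] sourceBytes, String configuredEncoding) {
        return new String(sourceBytes, resolve(configuredEncoding));
    }

    public static void main(String[] args) {
        byte[] quote = {(byte) 0xE2, (byte) 0x80, (byte) 0x9D};
        // No encoding configured: the sketch falls back to UTF-8.
        System.out.println(decodeSource(quote, null).equals("\u201D"));
    }
}
```

The resolved charset would then be handed to whatever reads the source files for the report, so that the result no longer depends on the machine's default encoding.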
