How to process images and videos within Java JVM

We'll discuss two options: metadata parsing & thumbnail generation

Processing images, let alone videos, within the JVM has always been a challenging task. The ImageIO classes have come a long way since JDK 7, but they still come with the usual SDK bugs and do not always give you what you expect (poor image quality, incomplete support for the various JPEG variants, ...). In the end you are better off with open source tools written specifically for image processing, such as ImageMagick and GraphicsMagick. These are also what we use in our ImageServer Across module to generate thumbnails and variants for images, PDFs, ...

Recently we were involved in a project where we had to display and play audio/video files which had been uploaded by a customer. The page also showed some metadata from the media asset and files would be rejected after upload (e.g. if the bitrate or other metadata was not adequate). In short we had to parse metadata for all kinds of audio and video assets and then render this media file to the customer. We aren't talking about a Netflix streaming platform here, just some basic audio/video streaming.

We looked for libraries that could parse video files (MXF files, in this case) to extract the metadata. There are libraries like Netflix Photon. But would you really want to parse and read these files in the JVM? The short answer is no: you don't want all this crud in your Java memory.

So what are the options?

Metadata parsing

We looked at ffmpeg and MediaInfo for this.

If you have ever converted your personal (S)VCD or DVD discs to MKV (the Matroska container), or to AVI or MPEG back in the day, you will surely have noticed that ffmpeg is the de facto tool for converting and parsing media files.

MediaInfo is a tool which was suggested by the customer and provides structured metadata probing from media files.

The parser we wrote supports both ffmpeg and MediaInfo for flexibility and maps the JSON from these tools onto the same data structure. Both give similar output.
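Since the point is to keep the parsing itself out of the JVM, the Java side only has to start the tool and collect its JSON. The following is a minimal sketch of that idea, not the parser described here: the class name `MediaProbe` and its methods are hypothetical, and it assumes `ffprobe` and `mediainfo` are available on the PATH.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.List;

/** Hypothetical helper: runs a metadata probe as an external process and returns its JSON output. */
final class MediaProbe {

    /** ffprobe command line, using the same flags as the example in this post. */
    static List<String> ffprobeCommand(String file) {
        return List.of("ffprobe", "-show_format", "-show_streams",
                "-print_format", "json", "-loglevel", "0", file);
    }

    /** mediainfo command line, using the same flag as the example in this post. */
    static List<String> mediainfoCommand(String file) {
        return List.of("mediainfo", "--output=JSON", file);
    }

    /** Runs the command and returns everything it wrote to standard output. */
    static String run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectError(ProcessBuilder.Redirect.DISCARD) // only stdout carries the JSON
                .start();
        // Read stdout completely before waiting, so a large output cannot block the child.
        String output = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException(command.get(0) + " exited with code " + exitCode);
        }
        return output;
    }
}
```

Calling `MediaProbe.run(MediaProbe.ffprobeCommand("some.wav"))` would then return the JSON as a string, ready to be handed to the JSON library of your choice.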

ffmpeg probe

$ ffprobe -show_format -show_streams audiocheck.net_polarity_guitarOK.wav -print_format json -loglevel 0
{
    "streams": [
        {
            "index": 0,
            "codec_name": "pcm_s16le",
            "codec_long_name": "PCM signed 16-bit little-endian",
            "codec_type": "audio",
            "codec_time_base": "1/44100",
            "codec_tag_string": "[1][0][0][0]",
            "codec_tag": "0x0001",
            "sample_fmt": "s16",
            "sample_rate": "44100",
            "channels": 2,
            "bits_per_sample": 16,
            "r_frame_rate": "0/0",
            "avg_frame_rate": "0/0",
            "time_base": "1/44100",
            "duration_ts": 224041,
            "duration": "5.080295",
            "bit_rate": "1411200",
            "disposition": {
                "default": 0,
                "dub": 0,
                "original": 0,
                "comment": 0,
                "lyrics": 0,
                "karaoke": 0,
                "forced": 0,
                "hearing_impaired": 0,
                "visual_impaired": 0,
                "clean_effects": 0,
                "attached_pic": 0,
                "timed_thumbnails": 0
            }
        }
    ],
    "format": {
        "filename": "audiocheck.net_polarity_guitarOK.wav",
        "nb_streams": 1,
        "nb_programs": 0,
        "format_name": "wav",
        "format_long_name": "WAV / WAVE (Waveform Audio)",
        "duration": "5.080295",
        "size": "896208",
        "bit_rate": "1411269",
        "probe_score": 99
    }
}

$ mediainfo --output=JSON audiocheck.net_polarity_guitarOK.wav
{
"media": {
"@ref": "audiocheck.net_polarity_guitarOK.wav",
"track": [
{
"@type": "General",
"AudioCount": "1",
"FileExtension": "wav",
"Format": "Wave",
"FileSize": "896208",
"Duration": "5.080",
"OverallBitRate_Mode": "CBR",
"OverallBitRate": "1411351",
"StreamSize": "44",
"File_Modified_Date": "UTC 2020-03-03 12:02:30",
"File_Modified_Date_Local": "2020-03-03 13:02:30"
},
{
"@type": "Audio",
"Format": "PCM",
"Format_Settings_Endianness": "Little",
"Format_Settings_Sign": "Signed",
"CodecID": "1",
"Duration": "5.080",
"BitRate_Mode": "CBR",
"BitRate": "1411200",
"Channels": "2",
"SamplingRate": "44100",
"SamplingCount": "224028",
"BitDepth": "16",
"StreamSize": "896164",
"StreamSize_Proportion": "0.99995"
}
]
}
}

Note that on a stock Debian install you need to install newer .deb packages than the ones in the distribution's repository; otherwise you will be stuck with a (very) old version that has no JSON output.

Wrapping these outputs to a common data structure was more than enough to do our metadata processing checks and store some of the metadata for display purposes (e.g. the duration and the format of the media file).
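To make the idea of a common data structure concrete, here is a small sketch. It is an illustration under assumptions, not our actual parser: the record `MediaMetadata`, its field names and the bit rate threshold are made up, and it assumes the JSON has already been parsed into simple string maps by whatever JSON library you use. The keys are the ones visible in the two outputs above.

```java
import java.util.Map;

/** Hypothetical common structure for the fields both tools report. */
record MediaMetadata(String format, double durationSeconds, long bitRate) {

    /** Maps the "format" object of the ffprobe JSON (already parsed into a map). */
    static MediaMetadata fromFfprobeFormat(Map<String, String> format) {
        return new MediaMetadata(
                format.get("format_name"),
                Double.parseDouble(format.get("duration")),
                Long.parseLong(format.get("bit_rate")));
    }

    /** Maps the "General" track of the MediaInfo JSON (already parsed into a map). */
    static MediaMetadata fromMediaInfoGeneral(Map<String, String> general) {
        return new MediaMetadata(
                general.get("Format"),
                Double.parseDouble(general.get("Duration")),
                Long.parseLong(general.get("OverallBitRate")));
    }

    /** Example upload check; the threshold is supplied by the caller. */
    boolean hasMinimumBitRate(long minimumBitRate) {
        return bitRate >= minimumBitRate;
    }
}
```

Note that the two tools do not agree on every value: for the same file ffprobe reports the format as "wav" and MediaInfo as "Wave", and the overall bit rates differ slightly, so any check built on top of this should not depend on exact equality between the two sources.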

Thumbnail generation

For thumbnail generation, there were two requirements. An audio file would have to generate a waveform. A video file would have to generate a good thumbnail for that video.

Based on the metadata above, you can quickly tell whether the uploaded media file is an audio file or a video file (a video file has a video stream/track).
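As a sketch of that check: the hypothetical helper below takes the stream types reported by either tool (the `codec_type` values from ffprobe, or the `@type` values from MediaInfo) and classifies the upload. The class and enum names are assumptions made for this example.

```java
import java.util.List;

/** Hypothetical helper that classifies an upload by the types of its streams/tracks. */
final class MediaKind {

    enum Kind { AUDIO, VIDEO, UNKNOWN }

    /** streamTypes holds one entry per stream or track, e.g. "audio", "video", "General". */
    static Kind classify(List<String> streamTypes) {
        // Any video stream makes it a video file, even when audio streams are present too.
        if (streamTypes.stream().anyMatch(type -> type.equalsIgnoreCase("video"))) {
            return Kind.VIDEO;
        }
        if (streamTypes.stream().anyMatch(type -> type.equalsIgnoreCase("audio"))) {
            return Kind.AUDIO;
        }
        return Kind.UNKNOWN;
    }
}
```

For the WAV file above, `MediaKind.classify(List.of("audio"))` yields `AUDIO`. One caveat worth checking in a real implementation: audio files with embedded cover art can show up with an extra video stream in ffprobe (flagged by `attached_pic` in the disposition), which this simple version would treat as video.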

Each of the two follows a different path for thumbnail generation.

To display the waveform on overview pages, we simply use ffmpeg to generate a waveform image with the following command:

$ ffmpeg -y -i inputfile -filter_complex "showwavespic=colors=#007bff:split_channels=1" -frames:v 1 -c:v png -loglevel -8 waveform.png

This generates a waveform in PNG format, with the audio channels drawn separately (waveform.png is a placeholder output name). Once the image is generated, we upload it to our Across ImageServer.

On the details page of the audio asset, we use WaveSurfer to play the audio file and render the audio channels. Nothing special there.

Video thumbnail generation and video playing

To display a thumbnail on overview pages, we can use the ffmpeg thumbnail filter:

$ ffmpeg -i inputFile -vf "thumbnail" -frames:v 1 thumbnail.png

This filter is quite good at guesstimating a good thumbnail picture (thumbnail.png is a placeholder output name). You can also do fancier things, like:

$ ffmpeg -ss 3 -i inputFile -vf "select=gt(scene\,0.5)" -frames:v 5 -vsync vfr out%02d.png

This generates up to 5 thumbnail frames, skipping the first 3 seconds (these might be credits) and grabbing the frames where the "scene change" is bigger than 50%.

In the end, the customer decided that the frame one second before the end would be best for their purpose, since that frame usually contains the closing packshot of the commercial.

Since the videos are 25 fps, the command we ended up with was the following, where 89 is the total number of frames minus 26 and thumbnail.png is a placeholder output name. Yes, 26: 25 frames for the final second, plus one because ffmpeg counts frames from zero.

$ ffmpeg -i inputFile -vf "select=gte(n\,89)" -frames:v 1 thumbnail.png
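The frame arithmetic is easy to get wrong by one, so here is a small worked sketch. The class and method names are hypothetical, and the 115-frame clip used in the note below is an assumed length, chosen only because it reproduces the 89 from the command above. How you obtain the total frame count is left open.

```java
/** Hypothetical helper that builds the select filter for the frame one second before the end. */
final class LastSecondFrame {

    /** Zero-based index of the frame that lies one second before the last frame. */
    static long frameIndex(long totalFrames, int framesPerSecond) {
        // The last frame has index totalFrames - 1; step back framesPerSecond frames from there.
        // Clips shorter than one second fall back to the first frame.
        return Math.max(0, totalFrames - 1 - framesPerSecond);
    }

    /** Filter expression to pass to ffmpeg's -vf option. */
    static String selectFilter(long totalFrames, int framesPerSecond) {
        // The backslash escapes the comma for ffmpeg's filter parser, as in the shell command.
        return "select=gte(n\\," + frameIndex(totalFrames, framesPerSecond) + ")";
    }
}
```

For a 115-frame clip at 25 fps, `frameIndex(115, 25)` is 115 - 1 - 25 = 89, so `selectFilter(115, 25)` produces the same `select=gte(n\,89)` expression as the command above.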

The generated thumbnail is then uploaded to ImageServer, and that's that. Now ... to play the video file ...

Well, MXF files are not supported by video players on the web, so the best bet was to transcode this container format to MP4 (the most widely compatible cross-browser format these days).

Luckily, ffmpeg comes to the rescue, though it can be challenging to find the right command to generate an MP4 that plays in most browsers.

$ ffmpeg -y -i inputFile -vcodec libx264 -pix_fmt yuv420p -profile:v baseline -level 3 transcodedFile

This command generates an MP4 file with the H.264 baseline profile and the YUV420P pixel format. This profile and pixel format make sure the video displays properly in Safari (on Mac).

The transcoded file is stored using the Across FileRepositoryModule in a backing store (Azure Blob Storage in this case, but it also supports AWS S3 or a local store).

Now ... to really play the video file ...

We need a video player for the web to achieve this. The most common library there is videojs, which is easy to set up and customizable enough for our purposes.

Just providing the <video> tag with the correct URL immediately yielded results in Firefox and Chrome; Safari, however, refused to play the file.

Safari tries to be a bit special, as always with Apple things, by adding Range headers to the HTTP request. This avoids sending all the bytes of the video file over the wire in one go.

Instead, the HTTP Range headers specify which byte ranges need to be fetched, and the server has to answer with just those bytes.

This can easily be done with the ResourceRegion construct in Spring Boot.
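To show what the server has to do with such a header, here is a plain-Java sketch of resolving a single Range header against the file length. It is deliberately not the Spring ResourceRegion API, which takes care of this for you; the record `ByteRange` and its methods are hypothetical, and multi-range requests are left out.

```java
/** Hypothetical illustration: one HTTP byte range resolved against a known file length. */
record ByteRange(long start, long end) {

    /** Parses "bytes=start-end", "bytes=start-" or "bytes=-suffixLength". */
    static ByteRange parse(String header, long fileLength) {
        if (header == null || !header.startsWith("bytes=") || header.contains(",")) {
            throw new IllegalArgumentException("Unsupported Range header: " + header);
        }
        String spec = header.substring("bytes=".length()).trim();
        int dash = spec.indexOf('-');
        if (dash < 0) {
            throw new IllegalArgumentException("Malformed Range header: " + header);
        }
        String first = spec.substring(0, dash).trim();
        String last = spec.substring(dash + 1).trim();
        long start;
        long end;
        if (first.isEmpty()) {
            // Suffix form: the last N bytes of the file.
            start = Math.max(0, fileLength - Long.parseLong(last));
            end = fileLength - 1;
        } else {
            start = Long.parseLong(first);
            // An open or oversized end is clamped to the last byte of the file.
            end = last.isEmpty() ? fileLength - 1 : Math.min(Long.parseLong(last), fileLength - 1);
        }
        if (start > end || start >= fileLength) {
            throw new IllegalArgumentException("Range not satisfiable: " + header);
        }
        return new ByteRange(start, end);
    }

    /** Number of bytes to send; both bounds are inclusive. */
    long length() {
        return end - start + 1;
    }

    /** Value of the Content-Range header for the 206 Partial Content response. */
    String contentRange(long fileLength) {
        return "bytes " + start + "-" + end + "/" + fileLength;
    }
}
```

A request with `Range: bytes=0-1` (a small probe of the kind Safari is commonly seen sending first) against a 896208-byte file resolves to 2 bytes and a `Content-Range` of `bytes 0-1/896208`. The server replies with status 206 and only that slice of the file.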

In the end, the setup was able to:

  • Extract metadata from any media file

  • Generate thumbnails for media files (a waveform for audio and a thumbnail for video)

  • Play audio files via Wavesurfer

  • Play video files via VideoJS

This blog was written by the specialists at Foreach.

Foreach is now part of iO.
