Friday, December 07, 2012

LibreOffice Visio Import Filter: 20 years of drawings opened in your favourite office suite

It is true that support for the most widely used Microsoft Visio file formats in LibreOffice will celebrate its first birthday next February. And I will gladly have a birthday talk with any of you who will be freezing in Brussels during FOSDEM 2013. Nonetheless, even though libvisio had already been in development for several months, the Visio story was far from finished when we released that day. As I already mentioned in another blog post concerning the reverse-engineering of file formats, assessing conversion quality in this kind of case is illusory before real users get to stress-test the filter with real-life documents.
Since the first release of our filter in LibreOffice 3.5.0, we have been improving it thanks to bug reports from our users. I would like to say a big thank you to all those who took the trouble to submit reports in our bugzilla. Without you, guys, this filter would be only a moot exercise.
But wait... Do I write this blog now only to thank the people who contributed to the current quality of the filter? Yes, to a large extent! Nevertheless, I know that the distinguished readers of this blog would like to have some news. And, yes, we have some news.
The libvisio library underwent heavy re-factoring as we started to understand more and more details about the underlying file-format.
  1. A particular bug report about files imported as empty pages provided us with a document structure that we had never seen before. This resulted in a more generic parser and a unification of the way we parse master shapes and visible pages.
  2. This re-factoring in its turn allowed us to extend our file-format coverage to all earlier binary Visio file-format versions. We now support all binary Visio documents starting from Visio 1 (released in 1992).
  3. Extending the support to earlier file-format versions allowed us to better understand the development of the file-format, to find more information that we did not parse before, and improve the conversion quality for other binary versions too.
  4. Another re-factoring came with our work to support the XML-based Visio file-formats, namely the "XML Drawing" also known as *.vdx; and the Microsoft Visio 2013 new file-format, known as *.vsdx.
So the news is that LibreOffice 4.0.0 will be able to open ALL Visio files, from Visio 1 (released in 1992) up to Microsoft Visio 2013 (released just some weeks ago).
And since the readers of this blog are more interested in pictures than in pointless words, here come some candies for your eyes:
File opened in Visio 1.0; the same file opened in LibreOffice 4.0.0 beta1

VSDX file opened in Microsoft Visio 2013; the same file opened in LibreOffice 4.0.0 beta1
So, download LibreOffice 4.0.0 beta1 and help us test the new big release. We are interested in bug reports that help us improve our quality. And for those who would love to support us with donations, just click here:
Donate for LibreOffice

Monday, November 26, 2012

LibreOffice CorelDraw import filter: improvements by user input

It has been a long time since I last communicated with the distinguished readership of my blog. A hard decision had to be made between producing code and producing literature. The code won until now. But now I have found the time to lift my head up from the coding, so the literature is back.

Many of you might be wondering what has happened since my post last June about text support in CorelDraw files. Things are going pretty well. Since the CorelDraw import filter was released with LibreOffice 3.6, users have started to use the feature and report bugs. We have been working on fixing them and improving libcdr's quality.

Quick overview of the reverse-engineering process

From my discussions with our users and developers on-line and during some of the conferences that I attended, I realize that there is a slight misunderstanding among the general public about how reverse-engineering works. So, here are some thoughts that may help to understand it a bit better:

At the beginning of the process, there is a file-format. We don't know anything about its internal structure. There is no documentation whatsoever about it. One tries to generate a file in this file-format and examine it in a hexadecimal viewer. Next, one makes some small change in the document and examines what changed in the file itself. Eventually, after many iterations, one might find regularities and some structure that helps to divide the file into several sections or blocks of more manageable size. It is essential in this phase that one can encode this information into some kind of introspection tool, since a plain hexadecimal viewer is not a very productive tool in the long run. For the introspection of documents we use Valek Filippov's oletoy, a Python tool that stores our knowledge about the structure of different file-formats.
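
To make the "change one thing and see what moved" step a bit more concrete, here is a minimal sketch in Python. It is not taken from oletoy; the function names `diff_bytes` and `format_run` are made up for this illustration, and the two byte strings merely stand in for a file saved before and after one small edit.

```python
from itertools import zip_longest


def diff_bytes(before, after):
    """Return (offset, old_bytes, new_bytes) for every run where the inputs differ."""
    runs = []
    start = None
    for offset, (old, new) in enumerate(zip_longest(before, after)):
        if old != new:
            if start is None:
                start = offset  # a new differing run begins here
        elif start is not None:
            runs.append((start, before[start:offset], after[start:offset]))
            start = None
    if start is not None:
        # the inputs differ up to the end (for example, one file is longer)
        runs.append((start, before[start:], after[start:]))
    return runs


def format_run(run):
    """Render one differing run roughly the way a hexadecimal viewer would."""
    offset, old, new = run
    return f"0x{offset:08x}: {old.hex(' ')} -> {new.hex(' ')}"


if __name__ == "__main__":
    # Hypothetical contents: the same toy "file" before and after a small edit
    saved_before = b"HEAD\x10\x00\x00\x00shape\x01\x02"
    saved_after = b"HEAD\x10\x00\x00\x00shape\x07\x02"
    for run in diff_bytes(saved_before, saved_after):
        print(format_run(run))
```

In practice the interesting part starts after this: a run that changes every time the same property is edited is a candidate for the place where that property is stored.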

Once there is enough information about how to parse the document structure, the next target is to get some visible results. In order to get them quickly, all libraries such as libcdr or libvisio use libwpg's interface. Reusing this interface saves considerable time, since there are already working generators of ODG and SVG driven by the callbacks of this interface. Having visible results early in the development/reverse-engineering cycle also makes it possible to visually assess the import results and correct them if necessary. Eventually, one may notice that some necessary information is missing and go back to reverse-engineering to find it.
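
The idea of a parser that only fires drawing callbacks, with the output generator plugged in behind them, can be sketched roughly as follows. This is an illustration in Python and not libwpg's actual interface: the class `SvgPainter`, its method names and the toy record layout are all assumptions made for this example.

```python
from xml.sax.saxutils import quoteattr


class SvgPainter:
    """Hypothetical callback receiver that emits SVG (not libwpg's real API)."""

    def __init__(self):
        self.parts = []

    def start_page(self, width, height):
        self.parts.append(
            f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
        )

    def draw_rectangle(self, x, y, width, height, fill="none"):
        self.parts.append(
            f'<rect x="{x}" y="{y}" width="{width}" height="{height}" fill={quoteattr(fill)}/>'
        )

    def draw_polyline(self, points, stroke="black"):
        coords = " ".join(f"{x},{y}" for x, y in points)
        self.parts.append(
            f'<polyline points="{coords}" fill="none" stroke={quoteattr(stroke)}/>'
        )

    def end_page(self):
        self.parts.append("</svg>")

    def result(self):
        return "\n".join(self.parts)


def parse_toy_document(document, painter):
    """Stand-in for a format parser: walks decoded records and fires callbacks."""
    painter.start_page(document["width"], document["height"])
    for shape in document["shapes"]:
        if shape["kind"] == "rect":
            painter.draw_rectangle(*shape["box"], fill=shape.get("fill", "none"))
        elif shape["kind"] == "polyline":
            painter.draw_polyline(shape["points"])
    painter.end_page()


if __name__ == "__main__":
    toy = {
        "width": 100,
        "height": 50,
        "shapes": [
            {"kind": "rect", "box": (10, 10, 30, 20), "fill": "red"},
            {"kind": "polyline", "points": [(0, 0), (50, 25), (100, 0)]},
        ],
    }
    painter = SvgPainter()
    parse_toy_document(toy, painter)
    print(painter.result())
```

The point of the design is that the parser never needs to know which output format is behind the callbacks, so a second painter producing another format could be swapped in without touching the parsing code.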

Users' feedback is essential

The support of reverse-engineered file-formats is a constant work-in-progress. A subtle dance between implementation and information digging. In this process, user feedback is an essential element. The theories about the meaning of some information inside a file hold only until a file comes along to falsify them. Even a complex file generated by a developer is easily beaten by real-life documents. And each file that shows a "weird" bug advances the understanding of the file-format. Let us look at this example:

After the release of LibreOffice 3.6.1, we got a not-so-good assessment of the quality of the CorelDraw import filter in the c't review on heise.de. Those of you who understand German can delight in the nuanced evaluation:

Ein neuer Import-Filter in Draw öffnet jetzt auch CorelDraw-Dateien, was uns im Test allerdings nur mit sehr einfachen Zeichnungen fehlerfrei gelang. In dieser Form ist er schlicht unbrauchbar.

Which can be mildly translated into English (given the understatements so common in en-GB):

A new import filter in Draw now also opens CorelDraw files, which in our test we managed to do without errors only with very simple drawings. In this form, it is rather unusable.

Since we are really concerned about the quality of our software, we are thankful for any bug report, whether it is brought to us in a friendly manner or otherwise. This specific bug report helped us to understand how chains of matrix transforms are stored in newer CorelDraw files. And since a picture speaks louder than a thousand words, compare the document c't was referring to opened in LibreOffice 3.6.2 and then in LibreOffice 3.6.3, after we fixed the position bits.

File opened in LibreOffice 3.6.2; the same file opened in LibreOffice 3.6.3
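
For readers wondering why a chain of matrix transforms can put shapes in the wrong place, here is a small sketch of the general mathematics in Python. It says nothing about how CorelDraw actually stores the chain, which is not described here; the six-number layout `(a, b, c, d, e, f)` and the function names are assumptions chosen for the illustration.

```python
from functools import reduce

# A 2D affine transform (a, b, c, d, e, f) maps (x, y) to
# (a*x + c*y + e, b*x + d*y + f).
IDENTITY = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)


def compose(first, second):
    """Return the single transform equal to applying `first`, then `second`."""
    a1, b1, c1, d1, e1, f1 = first
    a2, b2, c2, d2, e2, f2 = second
    return (
        a2 * a1 + c2 * b1,
        b2 * a1 + d2 * b1,
        a2 * c1 + c2 * d1,
        b2 * c1 + d2 * d1,
        a2 * e1 + c2 * f1 + e2,
        b2 * e1 + d2 * f1 + f2,
    )


def apply(transform, point):
    """Apply one affine transform to an (x, y) point."""
    a, b, c, d, e, f = transform
    x, y = point
    return (a * x + c * y + e, b * x + d * y + f)


def flatten(chain):
    """Collapse a chain of transforms, applied in list order, into one."""
    return reduce(compose, chain, IDENTITY)


if __name__ == "__main__":
    scale = (2.0, 0.0, 0.0, 2.0, 0.0, 0.0)       # double the size
    translate = (1.0, 0.0, 0.0, 1.0, 10.0, 0.0)  # move 10 units to the right
    print(apply(flatten([scale, translate]), (1.0, 1.0)))
    print(apply(flatten([translate, scale]), (1.0, 1.0)))
```

The two printed points differ, because the order of the chain matters: a reader of the file that applies the same matrices in a different order than the writer intended draws the right shapes at the wrong positions.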

So feel encouraged to submit bugs against the CorelDraw import filter, or — even better — send us patches for your favorite itch.


Monday, July 02, 2012

Susan's Book on Intellectual Property and Access to Education

I am happy to announce the upcoming book of my dear wife. A must read for all interested in intellectual property, in access to copyrighted materials and in development issues.

This book originates from a PhD thesis defended at the Graduate Institute of International and Development Studies, Geneva, Switzerland. The thesis was awarded the mention "summa cum laude".

Please check with your libraries whether they know about the book and advise them strongly to purchase it for the greater good of humanity :)

Tuesday, June 12, 2012

LibreOffice CorelDraw Import filter - text support hatches out

Uff, it is done!!!

We had already started to work on the text support inside libcdr before the Libre Graphics Meeting in Vienna. We worked hard during the talks and the long evenings after having eaten some portions of Wiener Schnitzel.

Now we are proud to announce that yesterday we managed to release libcdr-0.0.8 with "basic initial primitive [u]ncomplete" (hereafter BIPU) text support. At the moment, we support only a couple of parameters, such as font face and font size, and we are able to detect the encoding and produce a corresponding utf-8 string. Far from being perfect, it is nonetheless a milestone, because in the FOSS world there was no support for CorelDraw text before.

We know that you prefer to look at nice pictures instead of reading bad text. So, here is what your heart desires.
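
The step from bytes in a legacy encoding to a utf-8 string can be illustrated with a few lines of Python. This is only a sketch of the general idea and not libcdr's code; how libcdr detects the encoding is not described here, so the helper names `to_utf8` and `decode_first_match` and the list of candidate codecs are assumptions made for this example.

```python
def to_utf8(raw, codec):
    """Decode legacy-encoded bytes with a known codec and re-encode them as UTF-8."""
    return raw.decode(codec, errors="replace").encode("utf-8")


def decode_first_match(raw, candidates):
    """Return (codec, text) for the first candidate codec that decodes strictly.

    This is a naive fallback for when the encoding is not known for sure;
    a codec that accepts the bytes is not guaranteed to be the right one.
    """
    for codec in candidates:
        try:
            return codec, raw.decode(codec)
        except UnicodeDecodeError:
            continue
    return None, raw.decode("latin-1")  # latin-1 maps every byte, so this never fails


if __name__ == "__main__":
    # Hypothetical text run as it might sit in a file using a Central European code page
    raw = "čeština".encode("cp1250")
    print(to_utf8(raw, "cp1250"))
    print(decode_first_match(raw, ["utf-8", "cp1250"]))
```

When the file states its encoding, the first function is all that is needed; the second one only shows why guessing is fragile and why detecting the encoding from the document itself is worth the effort.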

A simple document with text in CorelDraw 7:

fancytext_cdr7.cdr in CorelDraw 7

The same document opened in a build of LibreOffice from yesterday:

fancytext_cdr7.cdr in CorelDraw 7

At the moment, libcdr is able to convert text in CorelDraw documents from versions 7 to 16. Nonetheless, we already know roughly how to read it in files of lower versions and we will add that support in the next release. In the same way, we will extend our support to other text properties, like font colour, transparency, effects, paragraph alignments, character positions, etc.

How can I test it? All this goodness will be part of LibreOffice 3.6.0 release. You will be able to test the text support in the 3.6.0 beta2 pre-release. For the brave, any of the daily builds that are built from a code checkout after June 11th also include libcdr-0.0.8 and thus the text support in CorelDraw files.

As usual, this is a free and open source software project and, as such, it delights in developers who want to help. So, if you feel the itch, patches can be sent to the libreoffice-dev mailing list. And do not forget to find a way to join the #libreoffice-dev channel at irc.freenode.net in order to meet other developers. We can promise you that you will feel at home in the LibreOffice community.

Wednesday, June 06, 2012

LibreOffice MS Publisher Import filter - young but strong baby

As Sophie Gauthier announced in the language of Voltaire, LibreOffice was branched for the beta phase in view of the 3.6 release. This is a major step towards bringing the features we have been working on during the last half year to the end users. But it is also an opportunity to bring to the main codebase all the nifty features that were developed in feature branches and targeted at the next big release, presumably 3.7.

This is how the first version of our new Microsoft Publisher import filter landed in master. This filter is being developed by Brennan Vincent from the University of Arizona as part of the Google Summer of Code. Although it is a work in progress and for the time being supports only the Publisher 2003 file-format, the progress is spectacular. Brennan has been busy as a bee since long before the start of the program. Only two weeks after the official kick-off, we have a first (non-)release, libmspub-0.0.0.

And since, as the careful readers of this blog already know, a picture speaks louder than a thousand words, here are the pics:

A random document from the Internet opened in Microsoft Publisher 2003:

Document in Publisher 2003

The same document opened in LibreOffice master build from yesterday:

The same document in LibreOffice Draw

Valek Filippov and I are having a lot of fun mentoring this project. If any of the distinguished readership wants to join this effort, the code of libmspub lives in the LibreOffice freedesktop.org repository. Patches can be sent to the libreoffice-dev mailing list. And do not forget to find a way to join the #libreoffice-dev channel at irc.freenode.net in order to meet other developers.

You will never regret the decision to get involved in LibreOffice.

Monday, April 23, 2012

Google Summer of Code 2012 - accepted projects for LibreOffice

Google announced today the accepted students for Google Summer of Code 2012.

The students working on LibreOffice will be:

Student | Title | Mentor
Andrzej Hunt | Smartphone remote control for LibreOffice Impress | Muthu Subramanian
ArturoPL | Tooling - More and better tests | Michael Stahl
Brennan Vincent | Implementing a Microsoft Publisher import filter for LibreOffice | Valek Filippov
Daniel Bankston | Calc Performance Improvements | Kohei Yoshida
Daniel Korostil | Lightproof improvements | László Németh
Gökcen Eraslan | Signed PDF export | Stephan Bergmann
iainb | Java GUI for Libre-Office Based Android App(s) | Tor Lillqvist
Marco Cecchetti | Enhanced Impress svg export filter | Thorsten Behrens
Matúš Kukan | Telepathy for collaboration | Eike Rathke (erAck)
Rafael | New templates picking UI | Cédric Bosdonnat

Let the summer start immediately and let quality code fall like a spring rain!

Monday, April 02, 2012

LibreOffice CorelDraw Import filter - the best file-format coverage in the FOSS world

I just realized that it has been a long, long time since I last blogged about libcdr and the CorelDraw import filter in LibreOffice. Those who know me well can imagine that it is much more fun to write code than to write blogs. Nonetheless, one serious breakthrough happened this weekend and I cannot prevent myself from climbing onto the rooftops and shouting.

On the 20th of March 2012, Corel released a new version of CorelDraw Graphics Suite, X6. We got the information from this Wikipedia page and downloaded the evaluation version on Friday. Although it was usual to see the file-format mutate a bit with every released version, this release changed the file-format substantially as far as the RIFF chunks are concerned. To cut a long story short, we managed to get the last pieces reverse-engineered today and we released libcdr-0.0.6 with support for all 32-bit CorelDraw formats, from version 6 to 16.
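
For those who have never looked inside a RIFF container, here is a minimal sketch in Python of how the generic chunk layout is walked: a four-character identifier, a 32-bit little-endian size, then the payload padded to an even length, with "RIFF" and "LIST" chunks nesting further chunks. It shows only this generic layout, not the CorelDraw-specific chunks and not what changed in X6; the function name `walk_chunks` and the chunk names in the demo data are made up.

```python
import struct

CONTAINERS = (b"RIFF", b"LIST")  # chunk types whose payload holds nested chunks


def walk_chunks(data, offset=0, end=None, depth=0):
    """Yield (depth, fourcc, form_type, payload_offset, size) for each chunk."""
    if end is None:
        end = len(data)
    while offset + 8 <= end:
        # 4-byte identifier followed by a little-endian 32-bit payload size
        fourcc, size = struct.unpack_from("<4sI", data, offset)
        payload = offset + 8
        if fourcc in CONTAINERS and size >= 4:
            # containers start with a 4-byte form/list type, then nested chunks
            form_type = data[payload:payload + 4]
            yield depth, fourcc, form_type, payload, size
            yield from walk_chunks(
                data, payload + 4, min(payload + size, end), depth + 1
            )
        else:
            yield depth, fourcc, None, payload, size
        offset = payload + size + (size & 1)  # payloads are padded to even length


if __name__ == "__main__":
    # Hypothetical container with two made-up leaf chunks
    demo = (
        b"RIFF\x1a\x00\x00\x00TEST"
        b"abcd\x03\x00\x00\x00xyz\x00"
        b"efgh\x02\x00\x00\x00hi"
    )
    for depth, fourcc, form_type, payload, size in walk_chunks(demo):
        print("  " * depth, fourcc, form_type, hex(payload), size)
```

A walker like this is typically the very first thing to break when a vendor changes the container layout, which is why a substantial change at this level affects everything built on top of it.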

The new release tarball was integrated into LibreOffice, which became the first and only FOSS application that supports versions 6, 15 and 16 of the CorelDraw file-format. This goodness will be part of our 3.6 release later this year. For those who do not know fear, the feature can be tested in daily builds that will start to appear tomorrow morning here.

I know that the distinguished readership prefers pictures to words. Here is this simple document in CorelDraw X6 format:

Terra in Corel 1  Terra in Corel 2

Here is the same document opened by LibreOffice Draw:

Terra in LibreOffice Draw

And here is the libcdr-generated SVG opened in Inkscape:

Terra converted to SVG

If you are tempted and think that it might be fun to participate in a reverse-engineering endeavour, Valek and I have two project proposals for Google Summer of Code 2012. The first is the implementation of an MS Publisher import filter for LibreOffice and the second is to help improve and extend the CorelDraw import filter I am currently blogging about. Try to apply with LibreOffice and your life will never be the same again.

Be aware though that the application deadline is the 6th of April and you will need to accomplish a simple programming task in order to be eligible. More details in this blog.