To install the OxGarage service from source, you need Maven. To run OxGarage, you also need the TEI P5 stylesheets (e.g. tei-p5-xsl2_5.37_all.deb, or whichever version is current). To use all of OxGarage's conversions, you need OpenOffice.org installed (version 3.0.0 or later). Install these before you proceed, then use the Makefile to install OxGarage:
- Go to the oxgarage source directory.
- Run $ make setup – this sets up all the necessary Maven dependencies.
- Run $ make build – this runs mvn install.
- Run $ make install – this takes the created .war files and copies them into the Tomcat directory.
- Run $ make debianize – this replaces the copy of the stylesheets with a link to the directory where the stylesheets are installed. After this, you can change the stylesheets without rebuilding the application or copying them again.
- Run $ make test – this runs some test conversions and compares them with the expected results.
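The steps above can be scripted. The following is a minimal sketch, assuming the make targets are run in the order listed and that each must succeed before the next starts; the names `MAKE_TARGETS`, `make_commands` and `run_install` are hypothetical and not part of OxGarage.

```python
import subprocess

# Make targets in the order given in the installation steps.
MAKE_TARGETS = ["setup", "build", "install", "debianize", "test"]


def make_commands(targets=MAKE_TARGETS):
    """Return one command line (as an argument list) per make target."""
    return [["make", target] for target in targets]


def run_install(source_dir, runner=subprocess.run):
    """Run every make target inside the OxGarage source directory.

    check=True makes subprocess.run raise CalledProcessError on the
    first failing target, so later steps are not attempted.
    """
    for command in make_commands():
        runner(command, cwd=str(source_dir), check=True)
```

Calling `run_install("/path/to/oxgarage")` from a deployment script would then run all five steps; the path is a placeholder for your own source directory.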
- Domain name in ege-webclient/WEB-INF/web.xml. You need to edit this file and put your own domain name and port in place of the default localhost settings.
If you leave the localhost settings in place, the site might work in some browsers, but other browsers will refuse to render it because they will treat its requests as cross-site scripting (localhost is a different domain name from www.your-domain.com).
Some configuration is also hard-coded in the Java source files. If you have access to the sources, you can change these settings too; otherwise the files must be located exactly where the servlet looks for them. Some of these locations are referred to from several different places, so double check that you have changed all of them.
- tei-config directory – this directory contains links to the stylesheets directory and to the local directory, both of which are needed for performing conversions. It should be located in WEB-INF/lib/ in the main webservice directory.
- webservice/WEB-INF/lib/ – this directory should also contain all the necessary jar files (see the list below).
- webservice/WEB-INF/config/ – this folder contains the fileExt.xml file, which provides the appropriate file extension for every mime-type used in OxGarage.
- webservice/WEB-INF/locale/ – this folder contains the files holding the descriptions of conversion properties that are displayed to the user. There is one file per language (currently English and Polish, although some entries are missing in the Polish version). As far as I know, choosing a language is not implemented yet, but at least one of these files is still required.
The service requires read permissions for the stylesheets directory and for the local directory, both of which are linked from the tei-config directory. It does not need write permissions for these directories. However, it also uses directories for caching and for temporary files, and these it must be able to write to.
- In EGEConstants.java, there are Strings that define the data directory for the service. The service uses this directory for caching and also as a temporary directory. Currently it is set to a directory inside the home directory of the user running the service. On my machine the home directory was /root/, so the service used /root/.ege as its data directory.
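The permission requirements above can be checked before starting the service. This is a minimal sketch, assuming the data directory is `.ege` inside the home directory of the user running the servlet, as described above; the function names are hypothetical and the directories you pass in depend on your installation.

```python
import os
from pathlib import Path


def default_data_dir(home=None):
    """Return <home>/.ege, the data directory described above."""
    base = Path(home) if home is not None else Path.home()
    return base / ".ege"


def check_permissions(read_dirs, write_dirs):
    """Return a list of human-readable problems (empty if all is well).

    Directories need the execute bit as well as read or write, because
    without it their entries cannot be accessed.
    """
    problems = []
    for directory in read_dirs:
        if not os.access(directory, os.R_OK | os.X_OK):
            problems.append(f"cannot read {directory}")
    for directory in write_dirs:
        if not os.access(directory, os.W_OK | os.X_OK):
            problems.append(f"cannot write to {directory}")
    return problems
```

Run this as the same user that runs the servlet container, since `os.access` reports permissions for the current user only.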
The program is divided into 8 parts: the API, the framework, 4 plug-ins (1 validator and 3 converters), the web service and the web client. The API offers only the base on which the framework is built. The role of the framework is to find all provided plug-ins, initialize them and calculate all possible input types and conversion paths. To do this, it asks each converter for a list of all the conversions it can perform. The framework then constructs a directed, weighted graph in which document types are nodes and conversions are edges. Edge weights are assigned by a subjective judgement of how good or bad the resulting document looks: the better the document looks, the lower the weight. When several routes lead from the input format to the output format, the weights along each route are summed and only the path with the minimal total weight is offered to the user.
The framework also executes the chosen path. It performs the necessary conversions in a chain of threads, where each thread passes its result to the next until the desired output format is reached. Each thread performs exactly one conversion and uses a converter to do it.
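The path selection described above can be illustrated with a short sketch. It uses Dijkstra's algorithm, which is one standard way to find a minimal-weight path in a directed graph with non-negative weights; the text does not say which algorithm the framework actually uses, and the format names and weights below are made up for illustration.

```python
import heapq


def cheapest_path(edges, source, target):
    """Return (total_weight, [formats...]) for the lightest path, or None.

    edges is an iterable of (from_format, to_format, weight) tuples
    with non-negative weights.
    """
    graph = {}
    for src, dst, weight in edges:
        graph.setdefault(src, []).append((dst, weight))

    queue = [(0, source, [source])]  # (cost so far, node, path taken)
    settled = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in settled:
            continue  # already expanded via a path that was at least as cheap
        settled.add(node)
        for neighbour, weight in graph.get(node, []):
            heapq.heappush(queue, (cost + weight, neighbour, path + [neighbour]))
    return None


# Hypothetical conversions and weights (lower means a better-looking result).
EXAMPLE_EDGES = [
    ("docx", "TEI", 5),
    ("TEI", "HTML", 3),
    ("HTML", "PDF", 7),
    ("docx", "PDF", 20),
]
```

With these example weights, `cheapest_path(EXAMPLE_EDGES, "docx", "PDF")` prefers the three-step route through TEI and HTML (total weight 15) over the direct conversion (weight 20).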
The role of the validator is to validate documents before conversion. This stops the user from transforming a malformed document, which could cause an error during conversion or an unexpected result. Unfortunately, the validator can only validate very few document formats (some XML documents), so it is not used very often.
Then there are the converters, which convert documents from one format to another. Each converter must be able to provide a list of all the conversions it can do and to perform a conversion. Currently there are 3 converters: XslConverter, TEIConverter and OOConverter. XslConverter and TEIConverter use XSL stylesheets to convert between different forms of XML documents. The main difference between them is that TEIConverter is used for more complex conversions, e.g. conversions to and from docx and odt. OOConverter uses the JODConverter library to start OpenOffice.org in headless mode and then calls it to convert a document. More plug-ins (both converters and validators) can be added quite easily. If you are interested in this, I suggest reading http://enrich-ege.sourceforge.net/creation.html.
The web service is a servlet that uses the framework to perform conversions. It is RESTful, and you can control it using only GET and POST requests. First, send a GET request asking for all possible input formats. Then send another GET request to get all possible output formats for a given input format. Finally, POST your file to the URL of the chosen conversion. This can be particularly useful for batch processing a large number of files. For more information read http://enrich-ege.sourceforge.net/restws.html. Of course, if you already know the URL for the conversion, it is enough to POST your file to that URL without going through the earlier steps.
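The last step, posting a file to an already known conversion URL, might look like the sketch below. It is written under several assumptions that the text above does not confirm: that the service accepts a multipart/form-data upload, and that the form field name (`file` here) is what the servlet expects. The function names are hypothetical and the conversion URL must come from your own installation; check http://enrich-ege.sourceforge.net/restws.html for the actual request format.

```python
import urllib.request
import uuid


def build_upload_request(conversion_url, file_name, data, field_name="file"):
    """Build a multipart/form-data POST request carrying one file.

    field_name is an assumption; adjust it to whatever the service expects.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; '
        f'filename="{file_name}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        conversion_url,
        data=head + data + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def convert(conversion_url, file_name, data):
    """Send the file and return the raw bytes of the response."""
    request = build_upload_request(conversion_url, file_name, data)
    with urllib.request.urlopen(request) as response:
        return response.read()
```

For batch processing, `convert` can be called in a loop over the files of a directory, writing each response to an output file.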
New conversions can be added in two ways: you can either build a new converter or add new conversions to an existing converter. The procedure differs for each converter; very brief instructions follow. After you have added a format, you also need to add the new mime-type and extension pair to the fileExt.xml file in the web service directory. If a document format is defined in several converters, it is strongly advised to use the same format description, format name and format mime-type in each of them.
Adding a conversion to XslConverter is very easy. All you have to do is add the new stylesheets to your stylesheets directory and provide a plugin.xml file specifying some properties of the conversion. For an example of such a file, see the profiles/default/csv folder in your stylesheets directory. Once this is done, you only need to refresh the web client page and the new conversion should appear. Note that you can also add new conversions by defining them in the ege-xsl-converter/src/main/resources/META-INF/plugin.xml file, but then you have to recompile the whole application.
Adding a conversion to TEIConverter is a bit more difficult. First you need to add the conversion information to the Format.java file. Then you need to define the conversion in the TEIconverter.java file. You might also need to look into ConverterConfiguration.java to change some conversion settings. When everything is finished, you need to rebuild and redeploy the whole application.
To add a conversion to OOConverter, you need to add the document format to one of these files: InputTextFormat.java, InputSpreadsheetFormat.java, InputPresentationFormat.java, OutputTextFormat.java, OutputSpreadsheetFormat.java, OutputPresentationFormat.java. Then you need to change some of the other Java files, depending on how the JODConverter library supports the new format.
As mentioned before, each conversion is assigned a weight according to how much we trust the result: the better the result, the lower the weight. This is necessary because there can be a very large number of possible ways to get from an input format to an output format. The program therefore always chooses the path with the smallest total weight, calculated as the sum of the weights of all the conversions that form the path. If more than one path has the smallest total weight, one of them is chosen non-deterministically.
However, over time the conversions will surely become more refined and produce better results. You might therefore want to change the weights so that the service uses the current best conversions more often. Again, what you need to do to change the weights depends quite a lot on the converter.
To change the weights in XslConverter, you need to change the value of the “cost” parameter in the plugin.xml file. This file can be found in the ege-xsl-converter/src/main/resources/META-INF directory. If the conversion you are looking for is not there, it may have been added by a definition in the stylesheets directory. In that case, you need to find the appropriate plugin.xml file in your stylesheets directory.
In OOConverter, the weight of a conversion is calculated as the sum of the input format's weight and the output format's weight. So if, for example, a new version of OpenOffice.org becomes much better at reading docx files and you would like to reflect this in the weights, you need to find the appropriate input type in the appropriate file. In this case it would be DOCX in InputTextFormat.java. Then you simply change the value of its cost variable.
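The OOConverter weighting rule can be sketched as follows. The cost values are invented for illustration and the dictionaries only stand in for the cost variables defined in the Java format files; they are not the real OxGarage values.

```python
# Hypothetical per-format costs; the real ones live in the
# Input*Format.java and Output*Format.java files.
INPUT_COST = {"DOCX": 5, "ODT": 1}
OUTPUT_COST = {"PDF": 2, "ODT": 1}


def conversion_weight(input_format, output_format,
                      input_cost=INPUT_COST, output_cost=OUTPUT_COST):
    """Weight of one conversion: input cost plus output cost."""
    return input_cost[input_format] + output_cost[output_format]
```

Lowering the DOCX input cost lowers the weight of every conversion that starts from docx at once, which is why a single change in InputTextFormat.java is enough in the example above.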
- The TEI → ODT conversion produces corrupt documents if the TEI XML document contains images.
- When converting a spreadsheet document that contains more than one sheet, only the first sheet is converted. This is because OpenOffice.org can only convert the active sheet, and in headless mode it always treats the first sheet as the active one.
- When converting TEI → HTML → PDF, pictures get stretched slightly horizontally. This is probably caused by OpenOffice.org running in headless mode.