IJEDE – Article Converter (Word to HTML)

Introduction #

This document guides you through the structure and details of the “IJEDE Converter”, a tool that converts Word documents into HTML for IJEDE articles. This also serves as a base for the Future Earth Converter since they are largely the same.

Important Information #

At its core this converter uses a tool called mammoth.js, which can be viewed here on GitHub: https://github.com/mwilliamson/mammoth.js. If you want to learn more about how the converter works and what other tools it provides, please go through the docs there.

Since mammoth.js returns the converted HTML as a string, this converter relies mainly on string manipulation methods to format the final document to IJEDE standards. Please be familiar with the following methods, since they are the most commonly used:

replace() and replaceAll()

substring()

indexOf()
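As a quick refresher, the sketch below shows the three kinds of methods on a made-up HTML fragment. The fragment and the variable names are assumptions for illustration only and are not taken from the converter.

```javascript
// A made-up HTML fragment standing in for the string the converter works on.
const html = "<h1>Introduction</h1><p>First paragraph.</p><h1>References</h1><p>Smith (2020).</p>";

// indexOf() returns the position of the first match, or -1 if there is none.
const referencesStart = html.indexOf("<h1>References</h1>");

// substring() copies the characters between two positions.
const introduction = html.substring(0, referencesStart);
const references = html.substring(referencesStart);

// replace() with a string pattern changes only the first match;
// replaceAll() changes every match.
const firstOnly = html.replace("<h1>", '<h1 class="section">');
const everyHeading = html.replaceAll("<h1>", '<h1 class="section">');

console.log(introduction);
console.log(references);
console.log(firstOnly);
console.log(everyHeading);
```

Note that replace() with a plain string only touches the first occurrence, which is a common source of "my change only worked once" surprises.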

There is another version of this converter in the folder “French Converter”. It is a copy of the IJEDE converter, but with some additions so that it can convert French IJEDE articles. There has only been one French IJEDE article so far (as of Aug 2025), so there is room for improvement/change if another French article comes along.

Requirements and Formatting #

In this subsection, I describe the minimum requirements that the Word document must meet for the converter to work:

The main headings in the document must follow the sequence below (the end notes are automatically generated, so there’s no header that’s explicitly required for them). The Long/Image Description(s) are also not required, but if they exist in the document then they need to be in this order.

Introduction

References

Author(s)

Long Description(s)

Image Description(s)

If an IJEDE article doesn’t follow the above heading order, I would recommend directly editing the article and moving the sections into that order, instead of adjusting the code. This way, once the sections are partitioned into their respective variables in the formatTags() function (which will be explained later), you can format the HTML string in whatever order you want.

Headings are to be limited to Headings 1-6, since that is what’s available in HTML.

In the Word document, blockquotes must be formatted as Heading 7 or Quote.
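The blockquote requirement can be pictured as a mammoth.js style map, which maps Word paragraph styles to HTML elements. The entries below are a minimal sketch written for this document, not copied from the converter, and the exact style names may need adjusting to match the Word file.

```javascript
// Sketch of a mammoth.js style map (assumed entries, not the converter's actual map).
// Each entry reads: "Word paragraph style => HTML element".
const styleMap = [
  "p[style-name='Heading 7'] => blockquote:fresh",
  "p[style-name='Quote'] => blockquote:fresh",
];

// In the browser, the map would be passed to mammoth.js as an option, for example:
// mammoth.convertToHtml({ arrayBuffer: arrayBuffer }, { styleMap: styleMap })
//   .then((result) => console.log(result.value)); // result.value is the HTML string

console.log(styleMap.join("\n"));
```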

Additional Information #

At a minimum, the converter only needs the Word file to be uploaded. So, whenever you need to quickly see the output of a change you made, uploading just the Word document is the fastest way to do so.

There are also comments within the HTML converter to guide you, both as section dividers and details about specific line(s) of code. If there’s anything that’s not covered or clear in this document, please look at the converter comments.

To test the converter, I used the VSCode extension “Live Server”, so I recommend installing that (or your preferred IDE equivalent) before you start editing the converter.

In the case of the French converter, it assumes the same structure as described in the Requirements and Formatting section above but assumes that the appendix (or appendixes) goes between the References and Author(s) sections. If another French article needs to be converted and the appendix is in another place, then the code or the document sections will need to be adjusted.

The following methods are also used, but far less frequently than the string methods listed under Important Information, so only read up on them when necessary:

map()

sort()

split()

push()

Using the Converter #

Basic Usage #

Here is an overview of how to use the converter:

Fill out the necessary fields.

File name: Here is where you can name the HTML file the converter will output.

Author name(s): Here is where the author name(s) will go. Make sure what you put here matches what the author name(s) are in the Word document exactly. I usually just copy and paste directly from the Word document just to be sure. For two authors, please separate them by the word “and” instead of commas. For example, “Dr. A and Dr. B” instead of “Dr. A, Dr. B”.

Example for more than two authors: “Dr. A, Dr. B, and Dr. C”
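To make the expected input format concrete, here is a minimal sketch of how such a string could be split into individual names. The helper parseAuthors() is hypothetical and only illustrates the format; the converter's own handling may differ.

```javascript
// Hypothetical helper: splits the "Author name(s)" input into individual names.
// Handles "A and B" as well as "A, B, and C".
function parseAuthors(input) {
  return input
    .split(/,\s*and\s+|\s+and\s+|,\s*/)
    .map((name) => name.trim())
    .filter((name) => name.length > 0);
}

console.log(parseAuthors("Dr. A and Dr. B"));
console.log(parseAuthors("Dr. A, Dr. B, and Dr. C"));
```

A sketch like this would not cope with names that themselves contain commas, which is one more reason to copy the names exactly as they appear in the Word document.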

Table dimensions: Whether the tables in the document are one- or two-dimensional. The converter assumes a default of one (even when the field is left blank), and the value applies to every table in the document. If the document contains a mix of one- and two-dimensional tables, you will have to add extra styling in the code to adjust how the tables look.

Everything else: The rest is self-explanatory where you either have to copy + paste in the information (like the DOI link) or just fill out the fields according to the Word document information.

Images: An input box for the number of images in the document. Entering a number brings up a list of input boxes for the image filenames, which should be entered in the order the images appear in the document.

Upload your file.

Press the blue button labelled “Convert”.

A light grey button should appear below the “Convert” button. Click it to download your HTML file.

To convert another file, either refresh the page or press the green button labelled “Convert Another File”.

Code Structure #

Since this converter is a single HTML file, dividing it into sections will make it easier to understand the overall structure/flow of the converter.

Section One: Front-End HTML/CSS #

This section covers the HTML and CSS code that is responsible for the structure and visual appearance of the converter for the users. This section is where you would add extra inputs or change the styling of the converter itself, not the HTML document the user will download.

Section Two: Front-End JavaScript #

This section covers the JavaScript code responsible for the behaviour of the converter’s front end. This is where it assigns the user input(s) to variables and makes the modal work for the “Requirements” button.

This section also handles the number of image inputs and uses each filename in the “src” attribute of the corresponding image. If an image filename is missed, it falls back to a default filename.
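The fallback behaviour can be sketched roughly as follows. The function name resolveImageFilenames() and the default pattern "image1.png" are assumptions made for this example; check the converter's code for the real default.

```javascript
// Hypothetical sketch: pair each image with a filename, falling back to a default.
function resolveImageFilenames(imageCount, filenames) {
  const resolved = [];
  for (let i = 0; i < imageCount; i++) {
    const name = (filenames[i] || "").trim();
    // "image1.png", "image2.png", ... is an assumed default, not the converter's actual one.
    resolved.push(name !== "" ? name : `image${i + 1}.png`);
  }
  return resolved;
}

// Made-up HTML with two images; only the first filename was supplied.
const html = '<img alt="Figure 1"><img alt="Figure 2">';
const names = resolveImageFilenames(2, ["fig1.png"]);

// Insert the filenames into the "src" attribute in document order.
let imageIndex = 0;
const withSrc = html.replaceAll("<img ", () => `<img src="${names[imageIndex++]}" `);

console.log(withSrc);
```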

Section Three: Conversion JavaScript #

This section covers the JavaScript code responsible for the changes made to the HTML document that the user will eventually download. Here is where you’ll probably spend most of your time if you need to make any further changes to the converter.

Function Descriptions #

The following subsections outline the main functions used in the converter to format the initial HTML string given by the converter. Each function subsection will outline:

A general overview of the purpose of the function

What the parameters of the function are

What the function returns

Note. There are more functions than the ones listed, but those are mainly for:

Grabbing the user inputs and assigning them to variables.

Making the modal work for the “Requirements” button on the front page.

These functions are simple and ‘non-essential’ enough that they don’t need a dedicated section in this document.

convertFile(file File) #

Parameter: A file (with the extension “.docx”)

Returns: Nothing

Description: This function is activated when the user imports a Word file to the corresponding input. It then:

Takes that file and applies a style map to it (if necessary, read the GitHub repository for more information).

Formats the HTML string of the file.

Listens to the “Convert” button. When clicked:

It converts the HTML string into a downloadable HTML file

Adds the HTML file to an element

Disables all the inputs of the converter
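The "downloadable HTML file" step is commonly done with a Blob and an object URL. The sketch below shows that general technique under stated assumptions: the helper name, the element id "download-link" and the filename are all hypothetical, and the converter may do this differently.

```javascript
// Sketch: wrap an HTML string in a Blob so the browser can offer it as a download.
function createHtmlBlob(htmlString) {
  return new Blob([htmlString], { type: "text/html" });
}

const blob = createHtmlBlob("<!DOCTYPE html><html><body><p>Hello</p></body></html>");

// The DOM part only runs in a browser; "download-link" is a hypothetical <a> element id.
if (typeof document !== "undefined") {
  const link = document.getElementById("download-link");
  if (link) {
    link.href = URL.createObjectURL(blob); // temporary URL pointing at the Blob
    link.download = "article.html";        // suggested filename for the download
  }
}

console.log(blob.type, blob.size);
```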

formatOutput(string String) #

Parameter: String

Returns: String

Description: This function formats the raw HTML string of the user’s document before it gets converted into a downloadable HTML file and is probably the most important function. Here is where the HTML string is changed to fit IJEDE article style standards by using all the available formatting functions.

formatTables(string String) #

Parameter: String

Returns: String

Description: This function formats all the tables in the HTML string to IJEDE standards. This is the only function that uses a DOM parser to manipulate the various elements of the tables, so it is also (in my opinion) the most confusing to get used to. I recommend looking more closely at the code in this function to learn the overall structure and how to change each element of the tables.

This function also handles image titles, alt texts and filenames.

formatTags(string String) #

Parameter: String

Returns: String OR -1

Description: This is the longest function in the converter. It adds styles, classes and ids, and separates the HTML string into parts according to the existing headers. Separating the HTML string into its parts is the most important job of this function, since it makes it easier to format specific sections and to reorganise the final output.
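The idea of separating the string by headers can be sketched with indexOf() and substring(). This is a simplified illustration, not the converter's code: the function name splitSections(), the use of <h1> headings, the variable names other than mainContentText, and the assumption that -1 signals a missing required heading are all mine.

```javascript
// Hypothetical sketch: split an HTML string at the required headings.
function splitSections(html) {
  const referencesStart = html.indexOf("<h1>References</h1>");
  // Look for the Author(s) heading only after the References heading.
  const authorsStart =
    referencesStart === -1 ? -1 : html.indexOf("<h1>Author", referencesStart);

  // Assumption: a missing required heading is reported by returning -1.
  if (referencesStart === -1 || authorsStart === -1) {
    return -1;
  }

  return {
    mainContentText: html.substring(0, referencesStart),
    referencesText: html.substring(referencesStart, authorsStart),
    authorsText: html.substring(authorsStart),
  };
}

const sample =
  "<h1>Introduction</h1><p>Body.</p>" +
  "<h1>References</h1><p>Smith (2020).</p>" +
  "<h1>Authors</h1><p>Dr. A</p>";

console.log(splitSections(sample));
console.log(splitSections("<h1>Introduction</h1><p>No other headings.</p>"));
```

Once each section lives in its own variable, the parts can be formatted separately and joined back together in any order.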

formatAuthors(string String) #

Parameter: String

Returns: String

Description: This function changes the style of the author(s) information in the HTML string, taken from the information the user inputs into the converter. The author information must be exactly copied from the original Word document, or else the function won’t work.

formatHeaderAndCC(string String) #

Parameter: String

Returns: String

Description: This function replaces specific substrings within the HTML string with either the header image (at the top of the document), or the CC license image. The specific substrings are added in automatically, but if the placement ever needs to be changed then just removing the code and manually adding in the text will also work.
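A rough sketch of this placeholder-replacement idea is shown below. The placeholder strings, image paths and function name are invented for illustration; the converter uses its own substrings and images.

```javascript
// Hypothetical placeholders and image paths, chosen only for illustration.
function insertHeaderAndLicense(html) {
  return html
    .replace("[[HEADER_IMAGE]]", '<img src="header.png" alt="Journal header">')
    .replace("[[CC_LICENSE]]", '<img src="cc-license.png" alt="Creative Commons license">');
}

const page = "[[HEADER_IMAGE]]<p>Article text.</p>[[CC_LICENSE]]";
console.log(insertHeaderAndLicense(page));
```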

addAdditionalChanges(string String) #

Parameter: String

Returns: String

Description: This function simply adds any additional changes to the HTML string before it’s converted into a downloadable file. Here is where you add any code if you need to add a class, id, change the wording, or anything else in the HTML string.

updateImageCount() #

Parameter: N/A

Returns: N/A

Description: This function dynamically creates input fields for image filenames based on the number of images specified by the user.

updateImageFilenames() #

Parameter: N/A

Returns: N/A

Description: This function collects and stores all the image filenames from the dynamically created input fields.

When Making Changes #

In my experience, there are always unique changes I need to make to the HTML string that are specific to one article. Because this will continue happening in the future, here are some basic guidelines (with examples) that I would follow to make the process as streamlined as possible:

Make a Separate Copy of the Converter

In general, I would suggest creating a repository of past documents that were converted. In each new folder of the repository, I would include:

A copy of the base converter.

Word document of the article.

This way you don’t have to bother deleting the changes you make to the converter, and you have a growing library you can reference in the future.

Changing the HTML String #

Always use the addAdditionalChanges() function when making any changes to the HTML string like adding classes to existing tags or inserting new tags/elements. This limits the places where errors could occur and allows you to access the HTML string when it’s already had some processing done and has classes/ids available that you could take advantage of. Whenever I need to add a change, I typically follow these rules (roughly):

Use console.log(output) so that you know exactly what the HTML string looks like. It also makes changes faster because you can copy and paste the exact substring you need to change.

Use backticks (`) instead of single or double quotes (‘ or “) when writing the strings passed to these methods. A backtick string (a template literal) lets you insert the values of variables, which makes it much easier to insert data into the HTML string. Here are some examples of how I take advantage of this:

Instead of typing the data the user inputs (e.g. volume number, DOI link) into a string by hand, I can insert it like this:

<p id="article_details">ISSN: 2819-7046 – Volume ${volumeNo}, Issue ${issueNo}, ${articleDate}</p>

The ${} syntax inserts the value of a variable directly into the string. The ISSN is typed in manually because it is the same for every article. The same syntax can be used in more creative ways, such as inside loops. For example, in one article each image had an <a> tag whose id referred to the image number (e.g. id="Fig1"). Because I wanted to add a class of “figures” to the images, I used a for-loop that located each one by placing ${i} in the id.
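Both uses of template literals can be sketched together. The input values and the sample output string below are made up for this example; in the converter, the values come from the form fields and the string comes from the converted document.

```javascript
// Assumed input values; in the converter these come from the form fields.
const volumeNo = 3;
const issueNo = 2;
const articleDate = "2025";

// Template literal: ${...} is replaced by the value of the variable.
const articleDetails = `<p id="article_details">ISSN: 2819-7046 – Volume ${volumeNo}, Issue ${issueNo}, ${articleDate}</p>`;

// Made-up output standing in for the HTML string.
let output = '<a id="Fig1"></a><img src="a.png"><a id="Fig2"></a><img src="b.png">';

// Add the class "figures" to each numbered <a> tag by building the id with ${i}.
const figureCount = 2;
for (let i = 1; i <= figureCount; i++) {
  output = output.replace(`<a id="Fig${i}">`, `<a id="Fig${i}" class="figures">`);
}

console.log(articleDetails);
console.log(output);
```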

Adding CSS Styles #

When adding styles to newly created classes, add them to the extraStyles string variable. This way it’s easier to keep track of what you added and delete it once you’re finished converting the article you’re working on.

Changing Tables #

When changing the styling of the tables, it’s best to add in the extra code inside the innermost loop of the formatTables() function. The variables index (the table number), i (the row number) and j (the cell number) are the most important when wanting to add styles. To target a specific cell in a specific table, use the following code format:

if (index === x) { table.rows[i].cells[j].xxx = ""; }
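The pattern above can be tried out in isolation with the sketch below. The helper styleCell() is hypothetical, and the tables are plain objects standing in for the parsed <table> elements so that the example runs outside the browser; in the converter, the same rows and cells properties come from the DOM parser.

```javascript
// Hypothetical helper mirroring the nested loops described above:
// index = table number, i = row number, j = cell number.
function styleCell(tables, x, rowNo, cellNo, className) {
  tables.forEach((table, index) => {
    for (let i = 0; i < table.rows.length; i++) {
      for (let j = 0; j < table.rows[i].cells.length; j++) {
        if (index === x && i === rowNo && j === cellNo) {
          table.rows[i].cells[j].className = className;
        }
      }
    }
  });
}

// Plain objects standing in for parsed <table> elements.
const makeTable = (rowCount, cellCount) => ({
  rows: Array.from({ length: rowCount }, () => ({
    cells: Array.from({ length: cellCount }, () => ({ className: "" })),
  })),
});

const tables = [makeTable(2, 2), makeTable(2, 2)];

// Target the second cell of the first row in the second table.
styleCell(tables, 1, 0, 1, "highlight");

console.log(tables[1].rows[0].cells[1].className);
```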

Changing Specific Sections #

Sometimes it’s better to limit a change to one section of the HTML string, for example when you want to target specific elements in that section only. In this case, instead of adding code to the addAdditionalChanges() function (which targets the entire HTML string), I would recommend making the change at the very end of the formatTags() function, where each section is contained in its own variable (i.e. variables like beginningText, mainContentText, etc.). This way your changes are contained to only a portion of the HTML string.

Testing #

I have never come across a document that was smoothly converted in the first 10 tries. There is always something that needs to be changed, so here are some tips I have when going through the process of converting a document into its proper HTML counterpart.

Initial test #

As a general note, other than the Word file all other inputs are optional when converting a Word doc to HTML (assuming the prerequisites are met). Therefore, I just upload the Word document and convert it without any of the other inputs to start. This gives me a quick overview of what the converted file is going to look like and allows me to see any major issues. After the initial test, how you approach fixing the issues is up to you.

console.log() is your friend #

If you uncomment the console.log() in the addAdditionalChanges() function, then the HTML string will be printed to the browser console before you even convert the file. I find this extremely helpful, especially when I have to add/change something in a specific place, as it allows me to directly copy the part of the HTML string I need to replace. This makes the whole process easier and takes a lot of the guesswork out of it.
