Tag: Import

Import Content from HTML, MS Word documents or Lucid

Post author By wp_help_lucidcentral
Post date 22 June 2020

The Import content (HTML, MS Word or Lucid data) option allows you to define a set of rules for each Project Topic that will attempt to match existing content in HTML or MS Word documents, or in a Lucid key, and then import it into the Fact Sheet Fusion project. Each topic can have its own set of rules that define what to look for as well as what to exclude. The HTML and Lucid import options also support importing images and their captions.

Important Tip

Defining sets of rules that work well across lots of existing content can be tricky, especially if you haven't done this type of operation before. It is highly recommended that you try out your import rules on a new database and project before attempting to import into an existing project that already contains content. This will allow you to easily check your results. It is also worth checking the imported topics' code tab in the editor to ensure you are removing any undesirable formatting tags or unclosed tags. Another way to check the result is, of course, to export the entities and view the resulting fact sheets. This will again help determine whether your rules captured everything you need from the source documents.

It is also highly recommended that you first make a backup of any existing database and media store prior to importing.

Importing Content

Step 1.

The first step in undertaking an import is to define the Entities you wish to import content for. These entities must already exist within your project; new Entities cannot be added via the Import dialog. Entities can be filtered via the Filter by Subsets option at the top left of the Import dialog. Entities can also easily be excluded from the import process by leaving their check box in the Include column unselected. More information on this is detailed further below in Step 3.

Step 2.

Select the document format you are importing from, i.e. HTML, MS Word, or a supported Lucid key or data file.

Step 3.

Use the document folder browse button to select the folder where the HTML or MS Word documents, or the Lucid key or data files, are located. Once a valid folder or Lucid key file has been defined, the content importer will ask whether you wish to auto match the entities by scanning the folder or key for matching document types (HTML: .htm, .html; MS Word: .doc, .docx; Lucid: .lk4, .lkc4, .data, .xml).

Documents found will then be compared against your Entity list. Matching documents (or Lucid Entities) will be automatically mapped and their file names listed in the ‘Matching Filename’ column. Closely matched documents will be highlighted in light yellow, while less closely matched documents will be highlighted in a darker yellow. These less closely matched documents should be examined to determine whether they are the correct document to import for the given Entity. If the matched document is incorrect, type the correct file name into the ‘Matching Filename’ column, or use the browse option to select the correct file. You can also preview the selected document by clicking on the preview icon.
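The help text does not describe how the importer scores file names against Entity names. As a rough illustration only, the following Python sketch filters a list of file names by the extensions listed in Step 3 and scores each one against the Entity names with the standard library's `difflib.SequenceMatcher`. The function names (`auto_match`, `classify`), the similarity measure and the 0.8 threshold are all assumptions, not Fact Sheet Fusion's actual behaviour.

```python
from difflib import SequenceMatcher
from pathlib import Path

# File extensions listed in the help text for each import format.
SUPPORTED = {
    "html": {".htm", ".html"},
    "word": {".doc", ".docx"},
    "lucid": {".lk4", ".lkc4", ".data", ".xml"},
}


def classify(score: float) -> str:
    """Hypothetical thresholds; the real cut-offs are not documented."""
    if score == 1.0:
        return "exact"
    if score >= 0.8:
        return "close"  # would be shown in light yellow
    return "weak"       # would be shown in darker yellow: check it by hand


def auto_match(entities, filenames, fmt="html"):
    """Map each Entity name to its best-scoring file of the chosen format."""
    candidates = [f for f in filenames if Path(f).suffix.lower() in SUPPORTED[fmt]]
    result = {}
    for entity in entities:
        best, best_score = None, 0.0
        for name in candidates:
            # Compare lower-cased names, treating underscores as spaces.
            stem = Path(name).stem.lower().replace("_", " ")
            score = SequenceMatcher(None, entity.lower(), stem).ratio()
            if score > best_score:
                best, best_score = name, score
        if best is not None:
            result[entity] = (best, classify(best_score))
    return result


print(auto_match(["Acacia dealbata", "Banksia serrata"],
                 ["acacia_dealbata.html", "banksia_serata.htm", "notes.txt"]))
```

To scan a real folder, the file names could come from `[p.name for p in Path(folder).iterdir()]`. Files with unsupported extensions, such as `notes.txt` above, are never considered.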

Example of matched documents to entities.
Once you have mapped the Entities to the documents (or Lucid Entities) you wish to import from, select these Entities via the check boxes in the ‘Include’ column. You can select and deselect all Entities for inclusion or exclusion via the two check box options, as shown in the screen capture below.

Step 4.

In the middle of the Import dialog is the list of the current project's Topics. Each of these Topics can be used to define a set of rules that capture content from each Entity's matched document.

Select each Topic in turn to define the desired rules. Any Topic without rules will not have any content imported against it. After defining a rule or set of rules for a topic, you can test them against the selected entity documents. To test rules, select the topic containing the rules you wish to test, then select the entity matched document to run the rules against by clicking the corresponding play button next to the Matching Filename column. If the rule(s) you have defined match content within the document, the results will be displayed in a preview dialog.

Preview entity file, Play rule(s), and entity file browse buttons.

Once you have finished defining and testing your rule set, click the Import button. During the import, Fact Sheet Fusion shows its progress via a green progress bar at the bottom of the dialog. Detailed information on defining import rules is given below.

Tip

If you are getting unexpected results from your rule set, check the Fact Sheet Fusion log file (via Help...About menu) as errors and warnings will be logged there.

Import Rules

Loading and Saving Rules

Defining rules for multiple Topics and images can take some time to get right, particularly when examining documents for consistent content and tags to match against. You can save your rule set for later use by selecting the save icon (shortcut key Ctrl + S) or the Save As option located in the Topics Rules panel. This saves all Global and Topic rules to a file with the extension ‘.rules’, which can be loaded later via the Open button located to the left of the Save button. Loaded rules are matched against the current Topic list; any rule in a rules file that doesn’t match an existing Topic will be ignored.

Note

If a topic has been renamed since the rule set was saved, you will need to either redefine the rules for that topic or edit the rules file and update the topic label in your preferred text editor.

Look For Rules

Look for rules are the instructions given to the import algorithm to find content within each Entity's matched document. Multiple Look for rules can be defined, though each rule must be on its own line. Defining multiple Look for rules is very handy when the content you want to capture may be inconsistent between documents. For example, documents may use consistent headings that differ slightly due to pluralisation: some may contain the heading “Family:”, while others contain “Families:”.

Fragment content example:
<h4>Family: Proteaceae</h4>

A second fragment example showing a different identifier of interest, targeted for the same Topic:
<h4>Families: Myrtaceae, Mimosaceae and Rutaceae</h4>

Look for rules can be defined in a number of ways:

Note

The import algorithm uses an asterisk (*) as a wildcard character. A wildcard character is used to represent one or more characters when searching. The wildcard character is a reserved character when defining search rules.

Using the simple examples above dealing with capturing the family name(s), the following two rules would be defined:

<h4*Family:*</h4>
<h4*Families:*</h4>

Tip

The left string to find only defines part of the heading four tag. This is because HTML tags may contain additional style definitions, e.g. <h4 style="...">. This ensures any kind of heading four tag (plain or styled) is captured.

Only heading four tags that contain either ‘Family:’ or ‘Families:’ would be returned. However, since in this instance we already have a Topic called ‘Family’, we don’t need to retain the ‘Family:’ or ‘Families:’ part of the returned match. This is where we would use the Exclude rules to remove this text. Details of the Exclude rules are outlined further below in the Exclude Rules section.

Using a single wildcard returns everything matched between the strings to find, and also includes the strings to find in the returned matched text. For example, if we wanted to capture HTML tables and their content, we would need to search for the beginning and end table tags, but also retain them so as not to break the HTML formatting.

E.g.

<table*</table>

Tip

Note that the left string to find only defines part of the start table tag. This is because HTML tags can contain additional definitions, such as <table border="1" ...>. This ensures all beginning table tags are captured, no matter how they are defined.
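To make the single wildcard behaviour concrete, here is a minimal Python sketch. It assumes a match runs from each occurrence of the left string to find up to the first occurrence of the right string after it, with both strings kept, and that the wildcard needs at least one character (per the wildcard note above). The function name `find_single_wildcard` is hypothetical; this is an illustration, not Fact Sheet Fusion's own code.

```python
def find_single_wildcard(text: str, left: str, right: str) -> list:
    """Sketch of a 'left*right' Look for rule that keeps both strings to find.

    Each match runs from an occurrence of `left` to the first occurrence of
    `right` after it. The wildcard stands for one or more characters, so at
    least one character must separate the two strings.
    """
    matches = []
    pos = 0
    while True:
        start = text.find(left, pos)
        if start == -1:
            break
        end = text.find(right, start + len(left) + 1)
        if end == -1:
            break
        stop = end + len(right)
        matches.append(text[start:stop])  # both strings to find are retained
        pos = stop                        # resume searching after this match
    return matches


html = ('<p>Intro</p><table border="1"><tr><td>A</td></tr></table>'
        '<p>Between</p><table><tr><td>B</td></tr></table>')
for table in find_single_wildcard(html, "<table", "</table>"):
    print(table)
```

Because the left string is only ‘<table’, both the styled and the plain table tag are captured, each returned with its closing ‘</table>’ intact.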

Additional Rule Options

When using the single wildcard search option, it is not always desirable to keep the strings to find. To remove one from the returned match, use the removal token ‘[-]’: to remove the left string to find, add the token on the left; to remove the right string to find, add it on the right. Adding the removal token to both the left and right strings to find is similar to the double wildcard, but without the additional ability to match on inner content.

As an example, if we wanted to match individual images within a block of images, the only reliable next tag to search on may be the next image, e.g.

<div><img src="../../pict.jpg" width="500" height="600" /><img src="../../plant1.jpg" width="450" height="300" /><img src="../seed3.jpg" width="250" height="300" />
<img src="../../leaf223.jpg" width="100" height="300" border="1" /></div>

To capture these image tags we could use the following rule:

<img*<img[-]

In this example, everything from the beginning of an image tag up until the next partial image tag (i.e. ‘<img’) would be captured, although the ‘<img’ right-hand string to find would be discarded from the matched results.

Note

The above example would also need an additional rule to capture the last image as it would not have another beginning image tag for it to be matched on. See Greedy find option below.
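The removal token can be sketched by extending the single wildcard idea with two flags. This minimal Python sketch is an assumption about the behaviour, not Fact Sheet Fusion's code: `find_rule`, `drop_left` and `drop_right` are hypothetical names, and it assumes a discarded right string stays in the text so it can begin the next match, which is what allows ‘<img*<img[-]’ to pick up consecutive images.

```python
def find_rule(text, left, right, drop_left=False, drop_right=False):
    """Sketch of a 'left*right' rule with the optional [-] removal token.

    drop_left  models the token on the left  ('[-]left*right')
    drop_right models the token on the right ('left*right[-]')
    """
    matches = []
    pos = 0
    while True:
        start = text.find(left, pos)
        if start == -1:
            break
        end = text.find(right, start + len(left) + 1)
        if end == -1:
            break  # e.g. the last <img> has no following '<img' to stop at
        begin = start + len(left) if drop_left else start
        stop = end if drop_right else end + len(right)
        matches.append(text[begin:stop])
        # Assumption: a discarded right string stays available, so it can
        # begin the next match.
        pos = stop
    return matches


div = ('<div><img src="../../pict.jpg" width="500" height="600" />'
       '<img src="../../plant1.jpg" width="450" height="300" />'
       '<img src="../seed3.jpg" width="250" height="300" />\n'
       '<img src="../../leaf223.jpg" width="100" height="300" border="1" /></div>')

for match in find_rule(div, "<img", "<img", drop_right=True):
    print(match.strip())
```

Run against the div block above, this prints the first three image tags only; leaf223.jpg is never matched because no further ‘<img’ follows it, exactly as the note describes.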

The Greedy find option ‘[g]’ can be added to the end of any Look for rule and can be used in conjunction with the removal token. Consider again the div block of image tags shown above.

If we wanted to capture the last image within this div block, we don’t necessarily have anything unique to match on. We can’t use the last image tag (<img src="../../leaf223.jpg" width="100" height="300" border="1">) as the left string to find, since the file name and size will change from file to file. We could use the start of an image tag (<img), e.g.

<img*</div>

However, this would return everything from the first image tag found up to the end Div tag, i.e. all of the images in the block.

Using the Greedy option tells the matching algorithm to keep matching until it finds the last instance of the left string to find before the right string to find, e.g.

<img*</div>[g]

This would return only the last image tag, followed by the end Div tag.

We also don’t need the end Div tag (</div>) as this would add a “broken” HTML tag to our matched content. To strip this from the matched results we just need to add the removal token. E.g.

<img*</div>[g][-]

This would return just the last image tag, without the end Div tag.
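The Greedy option can be sketched as searching backwards: for each occurrence of the right string to find, take the last occurrence of the left string to find before it. This minimal Python sketch assumes that reading of the option (it matches the "closest image tag to the end tag" description in the Images section); `find_greedy` and `drop_right` are hypothetical names, and only the right-hand removal token is modelled.

```python
def find_greedy(text, left, right, drop_right=False):
    """Sketch of a 'left*right[g]' rule, optionally with [-] on the right.

    For each occurrence of `right`, the match starts at the LAST occurrence
    of `left` before it, i.e. the left string closest to the right string.
    """
    matches = []
    pos = 0
    while True:
        end = text.find(right, pos)
        if end == -1:
            break
        start = text.rfind(left, pos, end)
        # The wildcard needs at least one character between the two strings.
        if start != -1 and end - start > len(left):
            stop = end if drop_right else end + len(right)
            matches.append(text[start:stop])
        pos = end + len(right)
    return matches


div = ('<div><img src="../../pict.jpg" width="500" height="600" />'
       '<img src="../../plant1.jpg" width="450" height="300" />'
       '<img src="../seed3.jpg" width="250" height="300" />\n'
       '<img src="../../leaf223.jpg" width="100" height="300" border="1" /></div>')

print(find_greedy(div, "<img", "</div>", drop_right=True))
# ['<img src="../../leaf223.jpg" width="100" height="300" border="1" />']
```

Without `drop_right=True` (i.e. ‘<img*</div>[g]’), the same call returns the last image tag with ‘</div>’ still attached.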

Exclude Rules

Exclude rules use exactly the same rule types as the Look for rules; however, Exclude rules only work on the matched results of the Look for rules. Unlike Look for rules, Exclude rules can consist of rules that contain no wildcard characters. When no wildcard character is defined, the entire string is searched for and, if found, removed from the matched results. Exclude rules allow you to remove undesired content such as words or HTML tags. Each Exclude rule must be defined on its own line.

Some example Exclude rules:

Comment below here
Removes the string ‘Comment below here’

<!--*-->
Removes all HTML comments

<br>
Removes all line breaks

<font*>
Removes all start font tags

</font>
Removes all end font tags

<b*>
Removes all start bold tags (note that this also matches other tags beginning with ‘<b’, such as <br> and <body>)

</b>
Removes all end bold tags

<img*>
Removes all images

<div*>
Removes all start Div tags

</div>
Removes all end Div tags

Tip

If removing specific tags, ensure you remove both the start and end tags.
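The two kinds of Exclude rule can be sketched with Python's `re` module. This is a minimal sketch under stated assumptions: `apply_exclude_rules` is a hypothetical name, rules are applied in the order listed, and each ‘*’ is treated as one or more characters matched as few as possible. Fact Sheet Fusion's own matching may differ in detail.

```python
import re


def apply_exclude_rules(text: str, rules: list) -> str:
    """Sketch of Exclude rules applied, in order, to one Look for match.

    A rule without '*' is removed literally wherever it occurs. A rule with
    '*' becomes a pattern in which each '*' stands for one or more
    characters (matched as few as possible), and every match is removed.
    """
    for rule in rules:
        if "*" not in rule:
            text = text.replace(rule, "")
        else:
            parts = [re.escape(p) for p in rule.split("*")]
            pattern = re.compile(".+?".join(parts), re.DOTALL)
            text = pattern.sub("", text)
    return text


match = '<font color="red">Tall <br>shrub</font><!-- note -->'
print(apply_exclude_rules(match, ["<br>", "<font*>", "</font>", "<!--*-->"]))
# Tall shrub
```

Removing ‘<font*>’ and ‘</font>’ together follows the tip above: stripping only the start tag would leave a dangling end tag in the imported topic.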

Topic Rules

The Exclude check box excludes this Topic’s rules from the import process. This is useful, for example, if you have saved a set of rules for processing several folders of content but don’t want this Topic’s rules to be processed for one or more of them.
Look for rules defined for the selected Topic.
Exclude rules for the matched content of the Look for rules.
The ‘Replace, if topic already exists’ check box replaces any existing topic text for that Entity/Topic combination if matching results are returned. If it is not checked and text already exists for the Entity/Topic combination, no matched text will be saved.

Global Rules

Global rules are applied either for every topic or for matching images.

The Clean HTML check box removes any MS Word generated HTML formatting contained within the matched results of a rule.
Topics Exclude rules. Any topic Exclude rules defined here will be applied to every Topic result match after any specific Topic Exclude rules have been processed. These global Topic Exclude rules save you from having to define the same set of rules for every topic where you want to remove common elements.
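The order in which a Topic's rules take effect can be summarised in a short Python sketch: Topic Exclude rules first, then the global Topic Exclude rules, then the ‘Replace, if topic already exists’ decision. This is a hypothetical illustration: `import_topic_text` is an invented name, Exclude rules are treated as literal strings only (wildcards omitted for brevity), and joining several matches with newlines is an assumption, as the help text does not say how multiple matches are combined.

```python
def import_topic_text(matches, topic_excludes, global_excludes,
                      existing_text, replace_existing):
    """Hypothetical sketch of how one Topic's rules combine.

    matches          -- strings returned by the Topic's Look for rules
    topic_excludes   -- literal Exclude rules for this Topic
    global_excludes  -- literal global Topic Exclude rules
    Returns the text to store for the Entity/Topic pair.
    """
    results = []
    for match in matches:
        for rule in topic_excludes:    # Topic-specific rules run first...
            match = match.replace(rule, "")
        for rule in global_excludes:   # ...then the global rules
            match = match.replace(rule, "")
        if match.strip():
            results.append(match.strip())
    if not results:
        return existing_text           # nothing matched: leave the topic as is
    if existing_text and not replace_existing:
        return existing_text           # 'Replace' unchecked: keep the old text
    return "\n".join(results)          # assumption: matches joined by newlines


print(import_topic_text(["<h4>Family: Proteaceae</h4>"], ["Family:"],
                        ["<h4>", "</h4>"], "", False))
# Proteaceae
```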

Images

The Fact Sheet Fusion import algorithm can also capture images and their captions, saving them to your database media store and automatically attaching them to the Entity being imported against.

Note

Detection and retrieval of embedded images within MS Word documents is not currently supported.

Look for rules – The Image Look for rules are the same rule types as defined in the Import Rules outlined above. However in many instances images also have a corresponding caption. When matching on image tags any remaining content captured outside of the image tag is treated as the caption block. Consider the following example HTML that contains a table with images and their captions on the row below:

<table>
<tbody>
<tr>
<td><img src="../../pict.jpg" width="500" height="600"></td>
<tr>
<td><p>Example of the habit.</p></td>
</tr>
<tr>
<td><img src="../../plant1.jpg" width="450" height="300"></td>
</tr>
<tr>
<td><p>Example of the tree in flower.</p></td>
</tr>
<tr>
<td><img src="../seed3.jpg" width="250" height="300"></td>
<tr>
<td><p>Mature seed pod.</p></td>
</tr>
<td><img src="../../leaf223.jpg" width="100" height="300" border="1"></td>
<tr>
<td><p>Bipinnate leaves</p></td>
</tr>
</tbody>
</table>

We could use the following Image Look for rules:

<img*<img[-]
<img*</table>[g][-]

The first rule will return each image along with the content that follows it up to the next image tag, except for the last image, as there is no further image tag to match on.
The second rule uses the Greedy matching option to find the image tag closest to the end table tag, picking up the last image, and the removal token to discard the end table tag.

If we were to define just these two rules to find the desired images, we would be left with lots of broken inner table tags, such as row and cell tags, e.g.

First match:

<img src="../../pict.jpg" width="500" height="600"></td>
<tr>
<td><p>Example of the habit.</p></td>
</tr>
<tr>
<td>

Second match:

<img src="../../plant1.jpg" width="450" height="300"></td>
</tr>
<tr>
<td><p>Example of the tree in flower.</p></td>
</tr>
<tr>
<td>

Third match:

<img src="../seed3.jpg" width="250" height="300"></td>
<tr>
<td><p>Mature seed pod.</p></td>
</tr>
<td>

Final match:

<img src="../../leaf223.jpg" width="100" height="300" border="1"></td>
<tr>
<td><p>Bipinnate leaves</p></td>
</tr>
</tbody>

As you can see, we need to use the Image Exclude rules to remove the remaining undesirable tags.

Following on from the Look for rules, these Exclude rules could be used:

<tr>
</tr>
<td>
</td>
</tbody>

Each of the above string blocks consisting of tags will be removed from each match. E.g.

First match:

<img src="../../pict.jpg" width="500" height="600">
<p>Example of the habit.</p>

Second match:

<img src="../../plant1.jpg" width="450" height="300">
<p>Example of the tree in flower.</p>

Third match:

<img src="../seed3.jpg" width="250" height="300">
<p>Mature seed pod.</p>

Final match:

<img src="../../leaf223.jpg" width="100" height="300" border="1">
<p>Bipinnate leaves</p>

Given these matches, the import algorithm will take each image tag, find the corresponding image file and caption text, copy the image and register it in the media store, then attach it as an Entity image.
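Splitting a cleaned match into an image file and a caption might look like the Python sketch below. It is a hypothetical illustration, not the importer's code: `split_image_and_caption` is an invented name, it assumes double-quoted `src` attributes, it resolves the relative path against the folder of the source document using POSIX paths (`ntpath` would be needed for Windows-style paths), and it treats the tag-stripped text outside the image tag as the caption.

```python
import posixpath
import re

IMG_SRC = re.compile(r'<img\b[^>]*\bsrc="([^"]+)"[^>]*>', re.IGNORECASE)
TAG = re.compile(r"<[^>]+>")


def split_image_and_caption(match: str, document_path: str):
    """Hypothetical sketch: turn one cleaned image match into (path, caption)."""
    found = IMG_SRC.search(match)
    if not found:
        return None
    # Resolve the relative src against the folder of the source document.
    folder = posixpath.dirname(document_path)
    image_path = posixpath.normpath(posixpath.join(folder, found.group(1)))
    # Everything outside the <img> tag is the caption block; strip its tags
    # and collapse the remaining whitespace.
    rest = match[:found.start()] + match[found.end():]
    caption = " ".join(TAG.sub(" ", rest).split())
    return image_path, caption


match = '<img src="../../pict.jpg" width="500" height="600">\n<p>Example of the habit.</p>'
print(split_image_and_caption(match, "/data/site/plants/acacia/page.html"))
# ('/data/site/pict.jpg', 'Example of the habit.')
```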

The ‘Skip, if Entity already has images’ option will not attach or store any matched images from the import if the Entity already has images associated with it.

Replace existing images option, if selected, will overwrite images that already exist in the database media store.

Tags HTML, Import, Lucid, Word

Fact Sheet Fusion v2

Importing Fusion Databases

Post author By wp_help_lucidcentral
Post date 22 June 2020

If you have existing version one (v1) or version two (v2) Fact Sheet Fusion databases, you can easily import them via the import option in the Projects dialog. First, select the Fusion database you would like to import. Fusion v1 databases have a ‘.fusion’ file extension, while v2 databases have a ‘.fusion2’ extension. The database selection dialog filters on these file types.

Once a valid Fusion database has been selected, you can choose to import all of the available topics and entities, or select as many individual topics and entities as desired. They can be selected via the ‘Select All’ buttons, or individually by holding down the Shift or Control key while clicking items with the mouse.

Select Project

This option will only be available if a version 2 Fusion database has been selected to be imported. Since Fusion v2 databases can store one or more related projects, you can choose which project you wish to import.

Tip

If you wish to import multiple projects from the same Fusion v2 database, just select each project in turn (along with your preferred import options) and click the import button.

Select Entities by Subset

In very large Fusion v2 projects there can be many Entities to choose from when importing. If you have used Entity Subsets in the Fusion v2 database you are importing from, you can select the Entities to import by Subset.

Note

This option is only available when a v2 Fusion database has been selected.

Select Glossary Set

This option allows individual Glossary Sets to be imported. By default, if the ‘Include Glossary’ checkbox has been selected, all Glossary Sets will be imported along with their associated terms. If neither an existing Glossary Set in the receiving database has been selected nor a new Glossary Set name entered (see below), the selected Glossary Set's name will be used for the imported Glossary Set.

Note

This option is only available when a v2 Fusion database has been selected.

Match Entities by List

This button allows you to enter or load an Entity list that is then used to select Entities for import. This can be very useful if you are dealing with large groups of entities.

Please select or enter a new Glossary Set Name

Fact Sheet Fusion v2 introduced the concept of Glossary Sets (see the Glossary Set topic for more information). If you are importing glossary items from a v1 Fusion database, you must either supply a new Glossary Set name or select an existing Glossary Set in the Fusion v2 database.

If you are importing from a v2 database, this option may be left empty, in which case the Glossary Set name(s) from the source database will be used. If you define a Glossary Set name (new or existing), all the Glossary Sets from the source will be merged into the selected Glossary Set.

Note

If the same term exists multiple times across the selected Glossary Sets, the first term and definition imported will be used; any subsequent duplicate terms will be ignored.
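The "first term wins" merge described in the note can be sketched in a few lines of Python. The function name `merge_glossary_sets` is hypothetical, and comparing terms exactly (case-sensitively) is an assumption; the help text does not say how duplicate terms are compared.

```python
def merge_glossary_sets(glossary_sets):
    """Sketch of merging several Glossary Sets where the first term wins.

    glossary_sets -- iterable of lists of (term, definition) pairs, in the
    order they are imported. Terms are compared exactly (an assumption).
    """
    merged = {}
    for glossary in glossary_sets:
        for term, definition in glossary:
            if term not in merged:  # later duplicates are ignored
                merged[term] = definition
    return merged


first = [("bipinnate", "twice pinnate"), ("pod", "a dry fruit")]
second = [("pod", "a different definition"), ("habit", "general growth form")]
print(merge_glossary_sets([first, second]))
# {'bipinnate': 'twice pinnate', 'pod': 'a dry fruit', 'habit': 'general growth form'}
```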

Clean HTML

The Clean HTML option will attempt to clean any Microsoft Word (or other Office application) HTML formatting from each of the topics for the selected entities. This gives much better results in the exported fact sheets, since the Microsoft HTML formatting will no longer override the export template styles.

Include Media

You have the option of importing the media associated with the selected database. This media will be copied to the destination database’s media store location.

Include Glossary

If selected, glossary items and their definitions will be imported across to the selected project.

Importing

During the import process Fact Sheet Fusion will log all of the import actions to the log section in the interface. Any missing media, other problems, or errors will be listed here. Media that cannot be located will be automatically skipped during the import process. You can correct this either by copying the missing image(s) to the specified location and importing again, or by manually adding the images to the Project at a later time. Once the import process has completed, close the import dialog and open the project.

Tags Import

Fact Sheet Fusion v2

Importing Media

Post author By wp_help_lucidcentral
Post date 19 June 2020

The media import option is accessible via the Media Manager menu (Media…Import).

Import existing media via a CSV file. It is assumed the CSV file will contain a header row, which will be ignored when importing. The CSV column format is as follows:

[Entity Name], [Media Path], [Caption], [Photographer], [Copyright], [Comments], [Watermark Text], [Watermark Filename], [Review], [Exclude], [Sort Order], [Delete]

For example, the following row is valid:

Entity three, C:\images\ent3\e3_bottom_view.jpg,"Bottom View",,,Find a replacement image,"[EntityName] – copyright [year].",copyright.png,FALSE,FALSE,3,FALSE

As you can see from the example, each column must be present, even if empty. The columns must be separated by commas when viewed in a text editor. When viewed in Excel, each element is shown as a column; the comma separator isn’t shown within Excel and is not needed as part of the column data. Fields can optionally be wrapped in double quotes so that any commas they contain do not need escaping. The [Media Path] field must be the absolute path to the media item. The content fields can contain HTML tags. Any tags will be automatically cleaned, and any tags outside of the body tags will be removed.

The Delete field, if set to ‘TRUE’, will delete the image within the entity category, if found. If creating a CSV for the purpose of removing media, only the Entity Name, Media Path and Delete fields need to hold values, and the Media Path value can contain just the media item’s filename.

For example:

Entity name example, entity_one_20191.jpg,,,,,,,,,,TRUE
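Before importing, a CSV file can be checked against the twelve-column layout with Python's standard `csv` module. This is a minimal sketch, not part of Fact Sheet Fusion: `read_media_csv` is a hypothetical name, and converting TRUE/FALSE case-insensitively and Sort Order to an integer are assumptions about how the values are meant to be read. It skips the header row and rejects any row with the wrong number of columns.

```python
import csv
import io

COLUMNS = ["Entity Name", "Media Path", "Caption", "Photographer", "Copyright",
           "Comments", "Watermark Text", "Watermark Filename", "Review",
           "Exclude", "Sort Order", "Delete"]


def to_bool(value: str) -> bool:
    return value.strip().upper() == "TRUE"


def read_media_csv(text: str):
    """Parse media-import CSV text into dicts, checking the column count."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    next(reader, None)  # the first row is a header and is ignored
    rows = []
    for line_no, fields in enumerate(reader, start=2):
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(COLUMNS):
            raise ValueError(f"line {line_no}: expected {len(COLUMNS)} "
                             f"columns, got {len(fields)}")
        row = dict(zip(COLUMNS, (f.strip() for f in fields)))
        for key in ("Review", "Exclude", "Delete"):
            row[key] = to_bool(row[key])
        row["Sort Order"] = int(row["Sort Order"]) if row["Sort Order"] else None
        rows.append(row)
    return rows
```

Reading the file with `open(path, newline="")` and passing its contents to `read_media_csv` would report the first malformed row by line number, which is easier to act on than a failed import.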

Any errors occurring during the import process will be logged to the default FSF log file.

Tip

You can quickly access the log file via the Help…About menu.

Tags Import, Media, Media Manager

Fact Sheet Fusion v2

Media Categories

Post author By wp_help_lucidcentral
Post date 19 June 2020

Media Categories are containers (or folders) to help manage a large number of media. You can define as many Media Categories as desired. Media categories can hold any supported media types.

Adding a Category

Fact Sheet Fusion Add Media Category Dialog

To add a new media category either select the ‘Add’ option from the Categories menu, or right click in the Media Categories list and select ‘Add’ from the pop-up context menu. The add category dialog will appear. Enter the category name and click the add button. The new category will be added below the currently selected category.

Removing a Category

Delete Media Category confirmation dialog

To remove a Category, give it focus by selecting it in the Media Category list. Then either select ‘Delete’ from the Categories menu, right-click the selected Category and choose ‘Delete’ from the pop-up context menu, or simply press the Delete key. Before deletion you will be warned that you are about to delete the Category. There is no undo for this action.

Warning

If you delete a media category containing media items then all the associated media items will also be deleted. If these media items are linked to project elements such as entities or topics the media will automatically be unlinked from them.

Sorting Categories

You can manually sort Media Categories to your preferred order by drag and drop. Just select the desired category, hold down the left mouse button, drag the category to the desired position within the Media Category list, then release the mouse button. Alternatively, you can sort the Media Categories into ascending or descending order, either by selecting your preferred sort option from the Categories menu or by right-clicking the Media Category list and selecting it from the pop-up sort menu.

Importing Categories

If you wish to create several Media Categories at once then use the Import option. This will allow you to paste or type a list of categories. Each Media Category name should be placed onto a separate line. Once you have finished entering your Media Categories click the ‘Import’ button. The Media Categories will then be added to the list.

Tags Categories, Import, Media Manager, Sorting

Fact Sheet Fusion v2

Fusion Projects

Post author By wp_help_lucidcentral
Post date 18 June 2020

A Fact Sheet Fusion project is held within a Fusion database, which may contain one or many projects. A Fusion database holds all the data necessary for creating fact sheets.

Creating a Project

To create a new project, type its name into the project name text box, then click the add button. When typing the project name any invalid characters will be removed and the project name limited to a maximum of 256 characters. You will receive a confirmation message once the project has been created.

Once the project has been created it will be listed in the current projects list.

Newly created project listed in the current projects list

Opening a Project

To open an existing project within the selected Fusion database click on the open icon.

Renaming a Project

To rename a project, double click on the project name, edit the project name, then click on the update icon.

Importing into a Project

To import an older fusion database (version 1) click on the import icon for the desired project. See Importing version 1 fusion databases for more information.

Deleting a Project

A project can be deleted by selecting the delete icon for that project. You will be warned prior to the deletion.

Warning

There is no undo available once a project has been deleted. The only way to recover a project is to restore the database from a backup. See Backing up your fusion database for more information.