egaia2.git log: v2.dev..HEAD


8ba0479 (2018-04-22)  (HEAD, master)
    Clean formatting with autopep8

a4a709f (2018-04-22) 
    Export figures.docx to the current directory; do not assume we are in
    root.

88bc375 (2018-04-20) 
    Update local item CSS for docx conversions.

5ac9a7b (2018-04-20) 
    Rename pdf, html derivatives to indicate source.

50b4e83 (2018-04-20) 
    Re-enable flag to ignore large files in export.

385f645 (2018-04-20) 
    Add footnotes to markdown processing.

3b4527c (2018-03-19) 
    Improve HTML output from the ``make`` module.

    Include the "Table of Contents" in html output.

    Provide a "static" target that copies the static directory and generates
    error pages using the html template for the site. This is mainly going to
    be useful for development purposes, where the template and styles are
    modified.

    Remove link to font-awesome, since we don't use it.

    If a target file already exists in the public directory, don't just ignore
    it but create a new version if the source is newer. If possible, create a
    hardlink so that disk use is not increased.
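
    The refresh rule above can be sketched roughly as follows. This is an
    illustration only, not the code from the ``make`` module: the function
    name ``publish`` and the fallback to a plain copy are assumptions.

```python
import os
import shutil


def publish(source, target):
    """Refresh `target` when it is missing or older than `source`.

    Returns True if the target was (re)created, False if it was current.
    """
    if os.path.exists(target):
        if os.path.getmtime(target) >= os.path.getmtime(source):
            return False  # target is up to date; nothing to do
        os.remove(target)  # stale copy: replace it
    try:
        os.link(source, target)  # hardlink: no additional disk use
    except OSError:
        # Assumed fallback, e.g. when source and target are on
        # different filesystems and cannot be hardlinked.
        shutil.copy2(source, target)
    return True
```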

    Update the logic for downloads list presentation. Output a table with the 
    format, file type (source, h264, etc.), modification time, and size of each 
    file. Do not list video downloads for which a remote copy is available.

    Include source docx and svg files in the html catalogue, to allow users to 
    modify these and return to the archive curator.

    Update the pagination links in order to present a range of 3 pages on
    either side of the current one; otherwise we end up with an unwieldy list
    as the archive grows.
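
    A minimal sketch of such a pagination window, assuming 1-based page
    numbers. The name ``page_window`` and its parameters are hypothetical
    and are not taken from the ``make`` module.

```python
def page_window(current, total, radius=3):
    """Return the page numbers to link: `radius` pages either side of `current`."""
    first = max(1, current - radius)      # clip at the first page
    last = min(total, current + radius)   # clip at the last page
    return list(range(first, last + 1))


print(page_window(10, 40))  # window centred on page 10
print(page_window(2, 40))   # window clipped at the start of the list
```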

    For pages listing keywords, paginate at 500 items rather than 24.

    Use headers and paragraphs rather than a definition list for metadata.
    Among other things, this works better with tables.

    Add the missing "delete" parameter to the previews generation function.


789255a (2018-03-19) 
    Modify the ``roughcut`` tool to take CSV input.

    The new input format is more flexible, and will more easily allow us to
    develop tools that round-trip our data with standard EDLs for video editor 
    import/export. We now treat each UUID as a "reel" (we assume this is a 
    collection of source clips in a ``.vclips`` directory, and calculate
    offsets on that basis).

    Adjust the audio sample rate in title clips to match the rate for video
    clips, since a mismatch causes audio sync problems.

    Generate a finding aid based on the "caption" field for each edit. If there
    is caption text in a given line, extract a thumbnail image from the middle
    of the associated clip and print the thumbnail, time, and caption to a docx
    file (``BASENAME.df-video-description.UUID.docx``). Text from "title"
    fields will simply be included as full-width paragraphs. If a title is
    specified using the ``--title`` flag, that will be used as the title for
    the document (which will otherwise be empty).

    Require explicit flags telling the tool what to generate: ``--video`` or
    ``--finding-aid``.


daf5920 (2018-03-19) 
    Fix or handle some errors in the ``archivedotorg`` tool.

    Ensure that metadata strings are read as utf-8.

    Parse each of the metadata fields as a list (we no longer use strings).
    Allow for multiple "remote embed url" values.

    Since Internet Archive produces new derivatives, avoid confusion by
    stripping "df-h264" from the input filename, and simply use the UUID.

    Upload videos with the extension ".mpeg4", in order to force the creation
    of an mp4 derivative. This gets us a player with an HD/SD toggle.

    Add a ``--dry-run`` flag, which simply prints the metadata that will be 
    uploaded.

    Strip the trailing slash from the archive url that is included in the item 
    description.

    Only upload the core metadata fields (title, creator, description,
    subject).

    Provide an ``--update-docx`` flag, which will insert an embed URL into the 
    metadata document without actually uploading an item to the Internet
    Archive. This is primarily useful for situations where an upload was
    successful but the metadata update failed for some reason.


6080dfb (2018-03-19) 
    Add support for DCTERMS.tableOfContents and document template.

    Any table entered in the docx metadata document will be treated as a "table
    of contents", and parsed as a tabular array. One of the main intended uses
    for this function is item coding, as part of the research process.

    This change also introduces a default document template. Currently this 
    template includes style definitions for headers and text, but also provides
    a footer that will include the UUID, revision date, and page number for the 
    finding aid. If users change the document formatting for any reason, the
    default formatting can be reapplied with the command ``docx --update``.
    Currently the template is supplied within the static directory and cannot
    be modified.

    Also fix issues with the naming of headings used by LibreOffice vs.
    Microsoft Word.


ba43172 (2018-03-19) 
    Improve list handling of "special" directories (".dir" and
    ".vclips").

    Introduce an ``--include-dirs`` flag. Setting this flag will let us include 
    video stills or web page directories in our list of filtered items. Files 
    within those directories will not be filtered.

    Allow matching "orig*" for originals.


9f2d8e5 (2018-03-19) 
    In the ``bag`` tool, ensure that we use the labels from our config
    file.

    We now also require a flag for this command, to ensure that the command is
    only doing what we explicitly ask for. We now use "fast" validation, but
    updating the bag (the former default) remains very slow, as we have to
    calculate the hash of every file.


13d4754 (2018-03-19) 
    Overtly specify DCMI Type for vclip and docx items.

    This change fixes confusing metadata that result from parsing vclips as 
    inode/directories (not Moving Image), and docx as octet-stream due to a
    failure of python-magic on some of our files.


878e8b6 (2018-03-19) 
    Fix a coding error where we didn't apply filename sanitization.

59191ec (2018-03-19) 
    When parsing filenames, only look for the LAST matching UUID.

    This is prompted by a situation where we have a removable disk that is
    labelled with a UUID.
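
    One way to take only the last match is sketched below. The pattern and
    the name ``last_uuid`` are assumptions for illustration; the actual
    parser may use a different expression.

```python
import re

# Canonical 8-4-4-4-12 hexadecimal UUID form.
UUID_RE = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)


def last_uuid(path):
    """Return the last UUID found in `path`, or None if there is none."""
    matches = UUID_RE.findall(path)
    return matches[-1] if matches else None


# A mount point labelled with a UUID no longer shadows the item's own UUID.
path = ("/media/1b4e28ba-2fa1-11d2-883f-0016d3cca427/"
        "photo.3f2504e0-4f89-11d3-9a0c-0305e82c3301.jpg")
print(last_uuid(path))
```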

    We also provide a "getFormat()" function for derivatives, for use in the 
    html files list.


bfeb7d2 (2018-03-19) 
    Provide an ``--update`` flag for the ``derive`` tool.

    Move the fmtTime() function into the utils module.

    Introduce an "update()" function and flag that allow us to regenerate 
    derivatives where the derived file is older than the source. This is
    intended for use with editable files (mainly docx and svg), where we
    previously would have needed to force-regenerate everything or identify
    changed files and specify a UUID manually.


48e8f85 (2018-03-19) 
    Streamline docx to html conversion styles and functions.

    Move docx.Document instantiation and the fmtTime() function into the utils 
    module, since we reuse these in various places.

    Update the styles for docx to html conversion, so we have coverage of the
    main HTML5 semantic elements.


b958f95 (2018-03-19) 
    Provide a script to list the styles in docx templates.

a28f848 (2018-03-19) 
    Add docx templates for use by the docx and figs tools.

efe5442 (2018-03-19) 
    Rewrite the figs command to produce output in a separate document.

    Previously this command parsed docx-format source documents for archive
    UUIDs and added figures for matching items to the end of the document. This
    method proves a bit clumsy as it requires closing the document being
    edited, reopening, then copying and pasting from different parts in the
    same document. The revised tool takes a list of UUIDs supplied on the
    command line and adds them to a separate document.


8b67381 (2018-03-19) 
    Update CSS and provide Bootstrap 3 customization file.

de50d1b (2018-03-19) 
    Delete unused FontAwesome files.

d5fe53d (2018-01-16) 
    Disable multi-threaded processing for video stills (ffmpeg warning).

b03bcfa (2018-01-16) 
    Update derive command to produce HTML contact sheets, not PDF.

    The PDF output is useful as a standalone document, but does not provide an
    indication of the timing of each clip. Instead of this we will generate an
    HTML document with images embedded using "data:" URIs.
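
    Embedding an image through a "data:" URI amounts to base64-encoding the
    file contents, roughly as below. The helper names and the markup are
    hypothetical; the markup produced by ``derive`` is not shown in this log.

```python
import base64


def data_uri(image_bytes, mime="image/jpeg"):
    """Encode raw image bytes as a base64 "data:" URI."""
    payload = base64.b64encode(image_bytes).decode("ascii")
    return "data:{};base64,{}".format(mime, payload)


def still_markup(image_bytes, timecode):
    """Return an <img> with the timing of the clip as its caption."""
    return '<figure><img src="{}" alt=""><figcaption>{}</figcaption></figure>'.format(
        data_uri(image_bytes), timecode)
```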

    Reduce the thumbnail size to 240x180px.


c85e0d2 (2018-01-16) 
    Update roughcut parameters for better error handling.

    Correct ffmpeg input parameters to deal with a/v stream mismatch.

    Handle missing offset data.

    Document requirement for audio track.


303d0e4 (2018-01-16) 
    Reverse append order in appending to docx.

a32f8e1 (2018-01-16) 
    Preserve list order in appending to docx.

54555bd (2018-01-15) 
    Fix typo in roughcut module

2ca49ad (2018-01-15) 
    Remove unused --commands flag from the base command.

57008c4 (2018-01-15) 
    Add null image for blank thumbnails.

636bcf1 (2018-01-15) 
    Update the roughcut tool.

    Render all working clips to 1080p, h264, 23.97fps, so we can accept a wider
    range of input footage.

    Specify in- and out-points in the input file, rather than frame numbers, so
    that we can have more control over the duration of segments.

    See also the previous commit, which introduced a change to the clipboard
    text generated for the html stills by ``egaia derive``, so as to be
    compatible with the new roughcut input format.


6bae415 (2018-01-15) 
    Support the use of ``.vclips`` directories of video footage.

    This change necessitates recognizing these directories in the sanitize,
    tag, list, docx, and meta tools.

    Also included in this commit is an ``--untag`` flag for
    ``egaia tag``, which may be useful if clips within a video footage
    directory are accidentally tagged.


eb14aea (2018-01-10) 
    Collage: fix error in parsing of filenames when files list input is
    given.

cc51927 (2018-01-10) 
    Fix error on attempt to update new collection from empty docx files

ada5932 (2018-01-10) 
    Fix typo in previews directory path.

59bb9a1 (2018-01-10) 
    Update documentation for make command.

90dfb57 (2018-01-10) 
    Html generation (make): Process item and collection titles as lists.

    Separate the item and collection preview generation and retrieval into 
    distinct functions, so we can store previews to disk, reducing the 
    catalogue generation time.

    Convert targets to a list, retrieved as a positional argument.


917a64b (2018-01-10) 
    Docx parser: Ensure that the item/collection title is parsed as a
    list.

f876735 (2018-01-09) 
    Update derive strategies.

    Support pptx input.

    Generate html from plain-text input.

    Use a smaller text header size for video finding aids.

    Don't scale h264 videos.


9c759a3 (2018-01-09) 
    When creating truncated descriptions, don't truncate after the first
    sentence.

5034bc2 (2018-01-09) 
    Fix coding errors in roughcut input file processing.

fd09d50 (2018-01-09) 
    Improve html generation in make command.

    Don't embed audio in the responsive div.

    Reduce calls to parsefn.

    Sort items by input filename, rather than by uuid, in collection lists.

    Concatenate description field lists before generating the truncated version
    for preview.

    Style the container for preview images on item pages.


9e6d8f7 (2018-01-09) 
    Try different decoding strategies for filenames, to avoid character
    issues.

b45c9c3 (2018-01-09) 
    Update docx command.

    Don't sort lists; keep the order given, as this may be significant.

    Get the uuid for thumb retrieval from the input filename, not the metadata,
    as the former is more reliable.

    Catch an exception where python-docx is unable to recognize some of the 
    images generated by ffmpeg.

    Make sure that everything is a list, in csv2docx.

    Use hyphenated forms for flags (``--to-csv`` instead of ``--tocsv``) for 
    consistency with the rest of the program.

    Make the DOCX argument optional.

    Do a dry-run by default for csv to docx conversion.


ddb7bc9 (2018-01-09) 
    Add cover-size collage generation when 1-4 images are present.

b1afa84 (2018-01-09) 
    Fix flag typo in bag command.

59d7b2d (2018-01-08) 
    Fix field key error for remote embed urls

4ed05fd (2018-01-08) 
    In docx from csv, halt processing on empty metadata lists.

f548577 (2018-01-08) 
    Properly fix prefix handling in egaia collage.

24f5efd (2018-01-08) 
    Provide flag in egaia bag for rebuilding the collection finding aid.

4451a34 (2018-01-08) 
    Apply fix to prevent tagging of files in web archives.

5a334c6 (2018-01-08) 
    Fix collage function to allow empty prefix.

225202a (2018-01-08) 
    Remove reference to deprecated switch in egaia make.

1ac8d59 (2018-01-08) 
    Add missing import in egaia tag

e135446 (2018-01-08) 
    Update documentation.

7a402dd (2018-01-08) 
    Change reference from egaia init to egaia bag

ac233c8 (2018-01-08) 
    Move the clipboard.js code into the strings module

a70e579 (2018-01-08) 
    Allow the option to use prefixes for collage images.

da31d3a (2018-01-08) 
    Remove the commands list from the base command.

efde571 (2018-01-08) 
    Rename egaia web to egaia serve

21d3a9f (2018-01-08) 
    Remove unused command files that were committed by accident.

e9c34f3 (2018-01-07) 
    Simplify the base command system.

    Rename "egaia bagit" to "egaia bag", and incorporate the functionality of
    "egaia init".

    Simplify the flag names in "egaia docx" (``--tojson``, ``--fromcsv``, etc.).

    Move "egaia uuid" into "egaia tag".

    Remove the deprecated "update" command.


bb22d3c (2018-01-07) 
    Rename "pub" as "make".

    The command is now closer to the GNU Make tool conventions, with output 
    formats listed as named targets rather than as flags (e.g., "egaia make 
    item-pages"). Not all of the output formats are necessarily generated for
    publication; this is particularly the case with json, which is required by
    other tools.


de9f101 (2018-01-07) 
    Update docx2html conversion to make use of defined styles.

    We support HTML5 semantic elements (header, footer, figure/figcaption) and
    elements mapping to the default metadata fields recognized by Pandoc
    (.author, .date, .title). Additionally we provide mappings to blockquote,
    .pullquote, .publisher, .series-title, and .lede.

    There is also a ".wrap" div that allows us to set padding/margins for the
    main content.


9bb44ad (2018-01-07) 
    Fix figs command to use styles and accept relative path input.

    When appending images to docx, add with the 'Figure' and 'Caption' 
    paragraph styles.

    Also remove the collection root check, as this forces all input filenames
    to be relative to the collection root.


3977003 (2018-01-07) 
    Add some annotations (help comments) to the default config file.

220ad98 (2017-12-30) 
    Disable markdown extensions in md2html.

    We will run plain text fields through the Markdown converter, so as to
    generate clickable hyperlinks and so on. Metadata and other extensions can
    interfere with parsing.


cbdd267 (2017-12-30) 
    Treat missing metadata elements as empty lists when generating html.

b87b70c (2017-12-30) 
    Place docx files next to the originals.

9f055c4 (2017-12-30) 
    For thumbs, use input frame numbering starting with "1", to match page
    numbers in pdfs.

7ee5f3a (2017-12-30) 
    Use "metadata-xx.docx" for the collection metadata.

7fc6ed0 (2017-12-23) 
    Update documentation

2a8980b (2017-12-23) 
    Fix globbing in docx module to allow recursive matching

1530540 (2017-12-23) 
    Treat subject and description as (potentially empty) lists in pub
    output.

578d31d (2017-12-23) 
    Fix image sizing for docx files.

    Check if an image is in portrait or landscape mode, and size accordingly.

    Additionally, avoid checking dimensions for svg images, as this causes a 
    system crash.
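
    The orientation check can be sketched as choosing which dimension to
    constrain (python-docx scales the other dimension proportionally when
    only a width or only a height is given). The function name and the
    size limits here are assumptions, not the values used by the tool.

```python
def constrained_dimension(width_px, height_px,
                          max_width_in=6.0, max_height_in=8.0):
    """Return which dimension to fix, and its size in inches.

    Portrait images are limited by height, landscape (and square)
    images by width. The limits are hypothetical page dimensions.
    """
    if height_px > width_px:
        return ("height", max_height_in)
    return ("width", max_width_in)


print(constrained_dimension(4000, 3000))  # landscape image
print(constrained_dimension(3000, 4000))  # portrait image
```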


d3fba23 (2017-12-23) 
    Fix paths in csv2docx.

    Write the docx file to the same directory as the original file, or skip if
    the item is not present in the current collection.

    Provide a width for thumbnail images, rather than a height, so we don't 
    extend past the right margin (but this may cause us problems with very tall
    images).


8ce2f0b (2017-12-23) 
    Fix flag name error in docx

64f9b0c (2017-12-23) 
    Fix multi-line values and provide missing binaries in setup.

    Allow editing of multi-line values in the config file by concatenating and
    stripping the string.
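
    A sketch of the concatenate-and-strip step, assuming the pieces are
    joined with a single space (the separator actually used is not stated
    here, and ``join_multiline`` is a hypothetical name):

```python
def join_multiline(value):
    """Collapse a multi-line config value into a single line."""
    parts = (line.strip() for line in value.splitlines())
    return " ".join(part for part in parts if part)


raw = "title,\n    creator,\n    description"
print(join_multiline(raw))
```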

    Add montage, wget, etc. to the list of binaries to be added to the config
    file.


bed1d6f (2017-12-22) 
    Move option descriptions from command-line help into online docs.

    The command-line help text is far too verbose to be usable, and limited in
    its formatting. By moving the details into the manual, we are also able to
    take advantage of rst directives to include screenshots, etc.


11caeff (2017-12-22) 
    Change default metadata format from csv to docx.

    This major change is intended to accommodate simpler data entry for
    individual items -- particularly those with many metadata fields or with
    very long descriptions, for which a spreadsheet is clumsy. Docx is an
    inefficient storage format but has the advantage of being friendly to
    non-technical contributors. We retain batch editing capability via csv.

    As part of this change, we use localized metadata labels rather than the
    Dublin Core keys, so that metadata documents can be used for both editing
    and direct distribution. Mapping of labels to keys is provided in the
    config file.


acc4e4f (2017-12-22) 
    Provide support for intertitles in the roughcut tool.

7d83512 (2017-12-22) 
    Begin moving some long variables into a separate file (strings.py).

fd9c112 (2017-12-22) 
    Allow case-sensitive slugs.

d671b4c (2017-12-22) 
    Move the doc2html logic into utils, egaia_derive, and egaia_figs.

    We will now automatically convert all docx documents into html during the 
    derive process. Instead of injecting figures at this stage, we use the
    "figs" command to import figures directly into the source docx document,
    which can then be further edited and ultimately used as an illustrated
    source for both pdf and html derivatives.


2c0e2f8 (2017-12-22) 
    Deprecate and remove the doc2html script.

    We will move this into the derive command, to provide a more intuitive and
    streamlined workflow.


a8da969 (2017-12-22) 
    Add GPL text to the README file.

a826a9e (2017-10-12) 
    Update pub command formatting parameters and allow alternative item
    templates.

    Suppress the dc.format field in metadata output, because it is confusing
    (it describes only the original file, which is not necessarily the one
    shown online).

    Allow for other functions to use alternative item templates by passing 
    these as strings.

    In the default item preview template, list the author's name as a subtitle,
    as this helps to identify items in collection listings.

    Since we no longer supply a title in the csv metadata by default, use the
    original filename as a fallback in item descriptions.

    Allow dc.source to be a list.

    Rename "json_str" to "meta_str", since the string itself is not actually 
    json.

    Retrieve the thumbnail directly from the item directory, and not from the
    metadata.

    Fix a bug where thumbnail and medium images were being listed as downloads.

    Copy both collection thumbnail and the new collection cover image to output
    directories.


93c28b4 (2017-10-12) 
    Disable automatic writing of dc.title and dc.date metadata.

    It is convenient to have the metadata entered on the basis of the filename
    and modification/creation timestamp, but this metadata overwrites any
    user-supplied information entered prior to tagging (i.e., where metadata is
    entered manually based on the original filename field as key). Moreover, it
    turns out the date is quite often incorrect when files are imported from
    third-party sources (Internet Archive, Wikimedia Commons, Library of
    Congress, etc.) because the file creation date may refer to the upload date
    whereas we need the document creation date.


f211703 (2017-10-12) 
    Ensure that the exclusion pattern for the list command is in fact a
    string.

98cb186 (2017-10-12) 
    Update doc2html output formatting.

    In doc2html, set the document title as an h1 element, in order to prevent
    creating a blank section.

    Use a custom item preview template with full-size (cover) images.

    Use a 10-column width for non-masonry layouts, and place images as inline
    text blocks (in rows of two) rather than in the margin. This fixes unusable
    layouts that were generated in testing with variable proportions of images
    to text.

    Add bootstrap "table" class to tables for better formatting.


35d3cac (2017-10-12) 
    Make general fixes to derivatives production.

    Set density of pdf images so we don't end up with low-resolution 
    thumbnails.

    Return to 400x300 video image thumbnails, as larger thumbs can create 
    memory errors when we view the html overview page for a long video.

    Fix a quoting error in the clipboard javascript, and move the js into the
    function where it is used.

    Fix the incorrect output directory for video stills pdfs.

    Introduce a ``--force`` flag, which overwrites existing derivatives.


4a42b7b (2017-10-12) 
    Fix indentation bug in csv writer, so rows get updated properly

d0b8da3 (2017-10-12) 
    Create a separate collection-cover image for the collection description
    page.

    Also fix a blocking bug due to typo in the command line arguments in the 
    collage tool.


a7b20fa (2017-10-12) 
    Update documentation format settings and fix minor typos.

    Remove reference in docs to deprecated exhibit module.

    Fix sidebar width to avoid overflow.

    Output source code in the documentation.

    Fix typos in sample scripts in the documentation.


76389d1 (2017-10-12) 
    Update the utils.truncate function to allow for a full
    (non-truncated) item description string.

7f76d44 (2017-10-06) 
    Update the list of available commands in main help text.

04f4aba (2017-10-06) 
    Add "nolinks" and "json outdir" options to egaia pub.

    The ``--nolinks`` flag allows us to generate unlisted items, whose 
    description pages do not contain links to collection description or keyword
    index pages (as the items are not listed there).

    The ``--outdir`` flag for json output lets us export json metadata for 
    collection items to an arbitrary directory, for use by external tools.

    Item listings are now sorted by creation date and time, in reverse 
    chronological order (newest to oldest), rather than in the arbitrary order
    of their location on disk.


2bf089d (2017-10-06) 
    Introduce json import/export action for egaia csv.

    This utility is intended to support collection metadata translation. Json
    is preferable to csv in translation workflows, because reliable processing
    of newline-separated fields in csv is not guaranteed. Conversely, CSV is
    much better than json for user data entry, especially where we are trying
    to work with multiple items sharing similar metadata.

    The ``--import`` action for the csv command is non-destructive; it will 
    abort if the target metadata csv file already exists.


042f37f (2017-10-06) 
    Add details to docs on scripting, with sample shell scripts

cd9f8f0 (2017-10-01) 
    In derive tool, print error if Libreoffice conversion fails.

412118e (2017-10-01) 
    Update roughcut help string

f308ff8 (2017-10-01) 
    Fix bug in tag function, where tagged files were placed in root
    directory.

8fda86b (2017-10-01) 
    In pub command, rename --dumptemplate flag to --dump-template.

9b4736b (2017-10-01) 
    Set language in the alias template in doc2html

bf8f2c6 (2017-10-01) 
    Move the language switch example out of the config tool docstring, as it
    interferes with docopt parsing.

a343413 (2017-10-01) 
    Update documentation

5df3631 (2017-09-29) 
    Remove slugify dependency. Add the locale directory to package data.

c2a362c (2017-09-29) 
    Allow stopping and re-starting the server without socket re-use
    error.

2cb4147 (2017-09-29) 
    Provide our own slugify function.

    We no longer use slugify, which returned empty strings for Unicode
    characters and stripped some characters we wanted to keep in filenames
    (underscores and periods).
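
    A replacement along these lines might look like the following Python 3
    sketch. It is not the project's own function; the name ``slug_sketch``
    and the choice of "-" as the separator are assumptions.

```python
import re


def slug_sketch(text):
    """Make a filename-safe slug that keeps Unicode letters, "_" and "."."""
    # \w matches Unicode letters, digits, and underscores for str patterns;
    # every other run of characters (except "." and "-") becomes one "-".
    slug = re.sub(r"[^\w.\-]+", "-", text.strip())
    return slug.strip("-")


print(slug_sketch("Field notes_v2.final draft"))
```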


fcc5d14 (2017-09-29) 
    Update the pub tool to work more nicely with internationalized archives,
    &c.

    Open all json files using codecs, so as to force utf-8 data storage, and
    decode on input.

    Decode strings from the config file as utf-8.

    Use the "boolean=True" argument for getConfig.
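
    The "boolean=True" argument belongs to the project's own getConfig; as
    a rough stand-in, the Python 3 standard library's ``getboolean`` shows
    the kind of normalization involved. The helper name, section, and
    option below are hypothetical.

```python
import configparser


def read_flag(text, section, option):
    """Parse config text and return one option as a real boolean."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # getboolean accepts 1/yes/true/on and 0/no/false/off, ignoring case.
    return parser.getboolean(section, option)


print(read_flag("[archive]\nffv1 = yes\n", "archive", "ffv1"))
```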

    Retrieve translations for the language specified in the config file.

    Use the "makeSlug()" function from egaia_sanitize.

    Insert a subtitle in the document header, to differentiate between index
    types.

    Update the favicon path to use the static directory.

    For collection pages, use available metadata from the spreadsheet rather
    than from bag-info.txt where a collection description record is available.
    This change is primarily intended to facilitate translation, since the
    BagIt specification doesn't accommodate localized versions of the
    bag-info.txt file, and mixing translations in the same file is not
    compatible with most translation tools.

    Create language-specific keyword indexes.

    Don't normalize keywords, but instead store them as-is within the indexes,
    so that we can retain proper capitalization and hyphenation data; only the 
    filenames will be converted.

    Introduce an option to generate descriptions of different lengths when 
    generating item previews, depending on the context. This is used in
    doc2html, where we would like longer descriptions in the image-centric
    masonry exhibit layout, but short descriptions when we have smaller margin
    thumbnails.

    Update help text, including a new note and example for embedding a book
    from Internet Archive using the "remote_embed_url" metadata field.


fe9c605 (2017-09-29) 
    Minor change in debug comment.

bd252e9 (2017-09-29) 
    Update "derive" to fix bugs and provide clipboard copy from contact
    sheets.

    Add a javascript function from clipboard.js to video thumbnail html pages
    (contact sheets), that copies the image filenames to the clipboard when 
    clicked. This facilitates use of the ``roughcut`` tool, since the desired 
    images can simply be clicked on in the contact sheet and the filename then 
    pasted immediately into a plain text file used for generating a video clip.

    Enable ffv1 conversion, but only if explicitly allowed in the archive
    settings.

    Fix the directory path for derivative output, so that we use the absolute
    path rather than the current working directory.

    Fix a bug with the "frame" argument; it should be passed as a string rather 
    than as an integer, since "0" converts to "None".
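
    The failing code is not shown here; one plausible cause is a truthiness
    test on the argument, which the following sketch reproduces. The
    function ``frame_option`` is hypothetical.

```python
def frame_option(frame):
    """Mimic an option check that treats falsy values as "not given"."""
    return frame if frame else None


print(frame_option(0))    # the integer 0 is falsy, so it is dropped
print(frame_option("0"))  # a non-empty string is truthy, so it survives
```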


b8afc86 (2017-09-29) 
    Update the "csv" command to accommodate metadata updates from the
    command line.

    Fix the update function so that it is actually possible to update rows with 
    existing metadata.

    Provide a ``--set`` flag that allows us to modify item metadata from the 
    command line, without opening the CSV file directly, for use in scripts.


10e940a (2017-09-29) 
    Update the "config" command.

    Add documentation about path and an example for changing settings from the 
    command line.

    Allow setting multiple values in the same command.

    Add "ffv1 = False" as a default setting.

    Make use of the "boolean" flag for the config parser, which normalizes
    settings for us (e.g., the strings "True", "true", "1", and "yes" all
    convert to "True").


86d6997 (2017-09-29) 
    Add the collection description, title, and uuid to the csv file, to
    allow easier localization.

e800537 (2017-09-29) 
    Minor update to help text.

9abb540 (2017-09-29) 
    Fold exhibit tool into doc2html.

    The doc2html tool now gives two output options for processing Markdown or
    Word (docx) documents -- "masonry", which lays out images and text in two
    animated columns, and "figs", which places smaller image thumbnails in the
    margin to the right of the text.

    We don't have the same content flexibility as was given with the
    "exhibit" tool, but this change provides a more consistent input format and
    predictable outputs (customized item descriptions and images can be 
    counter-intuitive for the end user).


a5925e5 (2017-09-29) 
    Update Bootstrap theme and add default favicon.

e167af1 (2017-09-15) 
    Allow updating of metadata for existing tagged files.

    This is necessary if we have user-generated files that are updated
    (e.g., Markdown or docx), as the file size and creation date will change.

    Currently this is a non-destructive action, so to overwrite existing 
    metadata the fields need to be emptied manually first; this is to protect
    against cases where, for example, the manually-entered date for a
    photograph might be overwritten by the modification date of the digital
    file, which is probably not what we want.
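
    The non-destructive behaviour can be sketched as a merge that only
    fills empty fields. The function and field names are hypothetical and
    do not reflect the actual metadata layout.

```python
def fill_empty_fields(existing, detected):
    """Copy detected values into `existing` only where the field is empty."""
    merged = dict(existing)
    for key, value in detected.items():
        if not merged.get(key):  # missing or empty: safe to fill in
            merged[key] = value
    return merged


row = {"date": "1932", "size": ""}
detected = {"date": "2017-09-15", "size": "2048"}
print(fill_empty_fields(row, detected))  # the manual date is preserved
```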


5ccb2fe (2017-09-15) 
    Implement multilingual feature (content negotiation).

    Metadata spreadsheets and generated html files now include the language 
    code in their filename. "metadata.csv" will now be "metadata.en.csv" by 
    default (and will need to be renamed in existing collections); similarly,
    "index.html" files will now be generated as "index.en.html".

    We can translate metadata spreadsheets and subsequently generate catalogues
    in a supplementary language by toggling the configuration language, via
    ``egaia config --set language:xx`` (where "xx" is the ISO 639-1 language
    code). If the generated html documents for different languages are placed
    alongside one another in the same directories, it is possible to have the
    web server perform content negotiation.


ad7916e (2017-09-15) 
    Allow tools to use filenames instead of bare UUIDs for input.

    Wherever we need an input item identifier, we will parse the input to 
    extract the UUID from the input string. This is more user-friendly because
    it allows us to make use of shell completion. We can now type in the first
    few characters of a filename (``anth*``), or a wildcard and the first
    few characters of a UUID (``*cfbb*``), to get the complete filename on
    the command line.


f2b0339 (2017-09-15) 
    Allow the roughcut tool to take "segment-NNNN" input.

    This is more logical for manually generated lists (e.g., using intertitle
    clips), and potentially allows us to distinguish between clip types
    generated through different processes.


2b9622d (2017-09-15) 
    Move collection presence check into tools themselves.

    This method involves duplicated code, but allows us to run each tool with
    some options that don't actually require us to be in a collection
    (notably ``--help``).


d30842e (2017-09-13) 
    Minor fix in handling missing link fields for figures

1933d7b (2017-09-13) 
    Clean up egaia meta to remove references to non-implemented
    functions.

dbbd08b (2017-09-13) 
    Add csv update function and fix bug in csv module.

    This change allows us to update an existing metadata spreadsheet, merging
    new values into existing data. We can now edit a spreadsheet using the
    "original_filename" field as key, and update after tagging files with
    UUIDs. This accommodates a workflow in which items are individually added
    to a collection along with their metadata within the same session, as the
    metadata spreadsheet can remain open for editing within a graphical editor.
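
The merge behaviour might look something like the sketch below, which treats rows as dictionaries (as ``csv.DictReader`` would yield them) keyed on "original_filename". The function name ``merge_rows``, the rule that blank cells never overwrite existing values, and the sample column names are assumptions for illustration, not a description of the actual csv module.

```python
def merge_rows(existing, updates, key="original_filename"):
    """Merge ``updates`` into ``existing``, matching rows on ``key``.

    Non-empty values in an update replace the stored value; empty
    values leave the stored value alone. Rows with an unseen key are
    appended. Assumed semantics, for illustration only.
    """
    merged = {row[key]: dict(row) for row in existing}
    for row in updates:
        target = merged.setdefault(row[key], dict(row))
        for field, value in row.items():
            if value:
                target[field] = value
    return list(merged.values())


existing = [{"original_filename": "a.jpg", "uuid": "", "dc_title": "First"}]
updates = [
    {"original_filename": "a.jpg", "uuid": "1234", "dc_title": ""},
    {"original_filename": "b.jpg", "uuid": "", "dc_title": "Second"},
]
for row in merge_rows(existing, updates):
    print(row)
```

Keying on the original filename rather than the UUID is what lets a row be written before the file has been tagged and completed afterwards.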

    Add the Dublin Core fields here; they were removed from the config file in
    an earlier commit, but this module continued to reference egaia.cfg.


39532cb (2017-09-10) 
    Provide option to specify page or frame number in thumb image
    generation

92da6ab (2017-09-10) 
    Fix image url on html description page

05ad568 (2017-09-10) 
    Fix encoding issues in exhibit

702fbc1 (2017-09-10) 
    Rewrite the ``exhibit`` tool to use CSV input instead of YAML.

    The YAML format is convenient but error-prone if external collaborators are
    being asked to supply data, since precise indentation matters.


bd94018 (2017-09-10) 
    Remove unused config variables and add a --set flag for egaia
    config.

07b15ae (2017-09-10) 
    Update formatting.

    - Put the metadata tables inside panels, so they can be differentiated
     from the content of embedded pages.
    - Fix the uneven spacing in metadata lists containing paragraph elements.
    - Left-align images at the top of item and collection pages.
    - Update help text for the ``pub`` command, to reflect the option of
     using ``config --set``.


ae50bdb (2017-09-09) 
    Allow the public directory to be specified on the command line

418f172 (2017-09-09) 
    Rename some cli commands

    - egaia ia -> egaia archivedotorg
    - egaia mkcollage -> egaia collage
    - egaia mkcsv -> egaia csv
    - egaia getmeta -> egaia meta
    - egaia mkindex -> egaia pub
    - egaia mkuuid -> egaia uuid

    These commands were originally named for their core actions (e.g.,
    "make" or "get"), but they offer greater functionality than this.

    ``ia`` is renamed ``archivedotorg`` to avoid confusion with the ``ia`` 
    command provided by the Internet Archive package, which is installed along
    with egaia.


1d3b2a5 (2017-09-09) 
    Organize command line tools into categories in the documentation

49d2fbc (2017-09-09) 
    Remove generated documentation from the git repository.

09a29a9 (2017-09-06) 
    Fix version parsing function to recognize dev versions

5a101c1 (2017-09-06) 
    Use release format 'vN(.devN)'.

bd25355 (2017-09-01) 
    Add rm() function for deleting files or directories.

dec1af5 (2017-09-01) 
    Allow use of a filename or UUID in exhibit yaml.

787d4b4 (2017-09-01) 
    Clarify documentation about list filters.

c8fee47 (2017-09-01) 
    Create sections after conversion to html, to accommodate all input
    formats.

f822584 (2017-08-21) 
    Provide improvements and a "remove" flag for the mkindex command.

    - Generate previews directly from json files, rather than saving to disk.
     The performance penalty is minimal, and this allows for different output
     templates and formats.
    - Provide a "remove" flag that allows us to remove an item from an index or
     to "unpublish" an item or collection.
    - Provide short options that can be stacked (``egaia mkindex -jcidx``)
     instead of using the more ambiguous "collection" flag.
    - Allow multiple values for the "Contributor" metadata field.


e12ceab (2017-08-21) 
    Update the list of commands in egaia.py

bdaa4f2 (2017-08-21) 
    Add support for docx parsing. Rename md to doc2html.

1867c11 (2017-08-11) 
    Make narrower content width for unfiltered Markdown output.

3f7adf6 (2017-08-11) 
    Remove Markdown processing from egaia derive; fix clip names in egaia
    roughcut.

b47957d (2017-08-11) 
    Separate Markdown item processing into a separate egaia md command.

7a90348 (2017-08-08) 
    Add new tool for generating rough-cut videos from lists of still
    images.

88fa1fb (2017-08-08) 
    Add requirement for pyyaml to setup.py.

6b4fcff (2017-08-08) 
    Increase video stills size to 800x600.

72622ac (2017-08-08) 
    Enable recursion into video still directories in egaia list, as stills
    may be needed.

90ef1da (2017-08-08) 
    Add optional metadata parsing in Markdown conversion utility.

d979114 (2017-08-08) 
    Add simple web server for testing html output, allowing absolute
    paths.

98f74d2 (2017-08-08) 
    Add javascript for masonry layout to be used in exhibit pages.

dc00582 (2017-08-08) 
    Add new tool for generating exhibits, as manually curated lists of
    items, based on YAML files.

    This tool replaces the "featured_items" option for "egaia mkindex". It is
    more flexible as it allows for arbitrary combinations of text and items,
    and allows for exhibit pages to be treated as archive items.


44e85e8 (2017-06-28) 
    Document Markdown processing in egaia derive.

542f97e (2017-06-28) 
    Add localization strings for HTML output.

eb57965 (2017-06-27) 
    Fix formatting of help strings.

0e7ed19 (2017-06-27) 
    Fix dimensions of collage images to accommodate 2px borders.

4c253eb (2017-06-27) 
    Use Markdown meta extension in md2html().

649dfe3 (2017-06-27) 
    Allow specifying of csv path in loadCsv, so we can work outside a
    collection source.

d15200a (2017-06-27) 
    Generate html derivatives from Markdown documents within the
    archive.

    The raw Markdown will be parsed to locate items that are referenced in each
    line (e.g., "[my item](item/)") and a small gallery of thumbnails
    will be inserted above that line. This will of course break formatting if
    lines are hard-wrapped. There is currently no handling of items that are
    cited multiple times.


55037e9 (2017-06-26) 
    Use relative base href instead of calculating relative links.

    This change makes the templating system much simpler, but more importantly
    allows us to create links relative to the archive root in Markdown
    documents within the archive. So a document in
    mycollection/data/path/to/file.md can reference "item/uuid" rather than
    "../../item/uuid", for instance.

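One way to picture the relative base href: each generated page declares how many directory levels separate it from the archive root, and every link on the page is then written relative to the root. The sketch below computes that value from a page's path within the output tree. The function name ``relative_base_href`` and the sample paths are hypothetical.

```python
from pathlib import PurePosixPath


def relative_base_href(page_path):
    """Return a relative href that points from a page to the site root.

    ``page_path`` is the page's location relative to the root of the
    generated site, using forward slashes.
    """
    depth = len(PurePosixPath(page_path).parent.parts)
    return "../" * depth or "./"


print(relative_base_href("index.html"))            # page at the root
print(relative_base_href("item/uuid/index.html"))  # page two levels down
```

With the result placed in a ``<base href="...">`` element, a link such as "item/uuid" resolves from the archive root regardless of where the page itself sits.
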

d5db0f5 (2017-06-26) 
    Add "featured items" option to mkindex.

    This tool creates a manually curated index of featured items, based on a
    spreadsheet containing one or more of the following fields: item (UUID),
    title, description, image (UUID of an item whose medium image should be
    used). The index can serve as a gallery or portfolio page.


0864efe (2017-06-24) 
    Convert Markdown items to embeddable html, and include in the item
    description pages.

a4828db (2017-06-23) 
    Update documentation

e8dd46b (2017-06-23) 
    Update configuration defaults to list montage command and set ia_noindex
    to false

61bee86 (2017-06-23) 
    Fix bug in mkcsv where updateRow() would fail if headers couldn't be
    read.

c7740b8 (2017-06-23) 
    When listing collection files, don't recurse into video stills or web
    page directories

462112b (2017-06-23) 
    Return False if the mkcollage command fails due to an absence of images
    to process.

a09c844 (2017-06-23) 
    Fix some limitations of the Internet Archive tool.

    Handle upload errors. Treat fields with newlines as lists of multiple
    values, to avoid processing errors on upload (which are silently ignored).
    Instead of uploading original files, upload the new "df-h264" derivative
    format, which is optimized for streaming and works well for HD files.
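
The newline handling mentioned above could be sketched as follows: any metadata value that spans several lines becomes a list, and single-line values are left as strings. The function name ``split_multivalue`` and the sample field names are assumptions for illustration; this is not the upload code itself.

```python
def split_multivalue(metadata):
    """Turn multi-line string values into lists of stripped lines.

    Single-line values are returned as stripped strings. Blank lines
    inside a value are dropped.
    """
    result = {}
    for field, value in metadata.items():
        parts = [line.strip() for line in value.splitlines() if line.strip()]
        result[field] = parts if len(parts) > 1 else value.strip()
    return result


record = {"subject": "film\nanthropology", "title": "Harvest"}
print(split_multivalue(record))
```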


386c84c (2017-06-23) 
    Move generic functions to the utils module.

4aaf423 (2017-06-23) 
    Add full-size h264 video export, compatible with YouTube recommended
    settings.

7c169f3 (2017-06-23) 
    Refactor the mkhtml command, and rename mkindex.

    These changes make the html catalogue generation more atomic, and
    accommodate additional output formats. Item metadata is now stored in json
    files within the output directory, so that it is possible to reformat and
    regenerate indexes without relying on access to the source collection. Html
    index snippets with thumbnails are now stored in the item directories,
    rather than in pickles, to avoid data duplication and to enhance
    scalability. It is now possible to modify the metadata for an item and
    update the index without having to rebuild it entirely. (Previously, if we
    changed one of the keywords for a collection, it didn't get removed from
    the html index. The only way to fix it was to delete the index and 
    regenerate for ALL the collections.)

    An enhanced and more modular set of commands allows for the construction of
    json, description pages, and indexes in separate stages, which can benefit
    a planned workflow involving Makefiles.

    The html templates are now more cleanly separated from the index generation
    logic. In cases where thumbnails are not available, random placeholder
    images are now generated.


4970808 (2017-06-22) 
    Add "mkcollage" command.

    This tool creates a collage of four images from the collection, to serve as
    the cover or "thumbnail" image for the collection. With updates to the
    mkhtml command (separate commit), these larger collages will be used in the
    collections index page in order to accommodate longer descriptions and to
    distinguish collections visually from items.


c549e2c (2017-06-22) 
    Fix title and description parameters in the init command.

    The incorrect values were being passed to docopt when "title" or
    "collection" were set on the command line. This has been fixed, and the 
    tool additionally now prompts the user for a title and description if they
    are not entered through command line parameters.


707e7d5 (2017-06-17) 
    Add medium-size image derivatives for videos

6719601 (2017-06-17) 
    Allow list filters to match any type of derivative. Fix variable
    error.

54d24cc (2017-06-15) 
    Add documentation for the Internet Archive upload command.

f02ac0a (2017-06-15) 
    Introduce the "egaia ia" command to upload videos to the Internet
    Archive.

    This command will publish videos, along with their Dublin Core metadata, to
    the Internet Archive and embed the published versions in html pages
    generated by egaia. This command also allows you to update remote item
    metadata.


e01ab02 (2017-06-15) 
    Update documentation

3bfb4e7 (2017-06-13) 
    Remove transparency when deriving images from PDFs.

    Otherwise we can end up with black text on a black background. We now
    change the background to white, which should work in most cases.


d408957 (2017-06-13) 
    Automatically parse UUIDs from anywhere in a Markdown document.

    This change allows us to cite items from the archive in a Markdown document
    simply by creating links, e.g., "See the [link to this 
    item](#32d3f16e-d3ff-4fcb-9d1e-779e47ed0c7b)". Bare citations or uuids
    listed anywhere in the yaml metadata for the page will also work. If the
    items are available in the published pages of the archive, thumbnails will
    be included in the html output.


091480d (2017-06-13) 
    Move the version number to the bottom of the description in html
    docs.

0dd80cb (2017-06-04) 
    Create main video thumbnails separately from the contact sheet.

    This allows us to resize thumbnail images if we update the HTML theme. The
    video thumbnail images are now created at 320 px width, so they are 
    consistent with the size of all other thumbnail images.


c43402c (2017-06-04) 
    Specify noscroll for embedded video iframes

77c9a82 (2017-06-04) 
    Add features list to the README file

c93396e (2017-06-04) 
    Select thumbnails from about halfway through videos, not the first
    frame

1163f53 (2017-06-03) 
    Use sphinx-git to include a changelog in the documentation

6c1d40d (2017-06-03) 
    Use git describe for version numbers.

    This change implements . We
    call ``git describe`` to get the number of commits past the most recent 
    version tag. We are treating commits to the main branch of the repository 
    as patch or "development" releases.

    ``egaia --version`` will give us a string like "v2.0.56+git14676fb".

    The docs will show a string like "v2.0.56", which doesn't include the git 
    hash so is primarily meaningful with reference to the main repository.
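
A sketch of how the two strings above could be derived, assuming the input is the output of ``git describe --tags --long`` (of the form ``<tag>-<commits>-g<hash>``) and that the most recent tag is "v2.0". The function name ``format_version`` is hypothetical; the real implementation may parse the output differently.

```python
import re

# "git describe --long" output: <tag>-<commits since tag>-g<abbreviated hash>
DESCRIBE_PATTERN = re.compile(
    r"^(?P<tag>.+)-(?P<count>\d+)-g(?P<sha>[0-9a-f]+)$"
)


def format_version(describe_output, include_hash=True):
    """Convert ``git describe`` output into a dotted version string."""
    text = describe_output.strip()
    match = DESCRIBE_PATTERN.match(text)
    if match is None:
        # Not in the long format (for example, a bare tag): pass through.
        return text
    version = f"{match['tag']}.{match['count']}"
    if include_hash:
        version += f"+git{match['sha']}"
    return version


print(format_version("v2.0-56-g14676fb"))                      # command line
print(format_version("v2.0-56-g14676fb", include_hash=False))  # docs
```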


14676fb (2017-06-01) 
    Create full-size screenshots of web pages. Allow config of related
    tools.

7e3ad9b (2017-06-01) 
    Move remote embeds config setting from system to archive section

fe086ec (2017-06-01) 
    Copy attributes of files in html output.

77991da (2017-06-01) 
    Provide a command to list the number of items in a collection

07b0e98 (2017-05-30) 
    Create variables for wkhtmltopdf, wkhtmltoimage commands

04e1532 (2017-05-30) 
    Update formatting of html documentation

a898321 (2017-05-30) 
    Print usage as a single block in help documents

2571ae6 (2017-05-29) 
    Accommodate rst tables in docstrings

ac88272 (2017-05-29) 
    Add fuller documentation for derive command

d94dd57 (2017-05-25) 
    Remove icons in sample directory listing

981cc0c (2017-05-25) 
    Add documentation about item metadata fields

3e92191 (2017-05-25) 
    Remove remote embeds as a default option.

fbb6473 (2017-05-25) 
    Remove remote embeds as a default option.

ae2127a (2017-05-25) 
    Add font-awesome, so we can use icons as default thumbnails in html
    output.

ac6d9d1 (2017-05-25) 
    Provide parsing mechanism for web pages, using ".url" files.

ca44bdc (2017-05-22) 
    Update documentation with improved docstring parser.

6907704 (2017-05-22) 
    Avoid some code execution on module imports by not checking config files
    etc.

b7944d3 (2017-05-22) 
    Use uppercase tags, since lowercase proper names look silly.

cba5508 (2017-05-22) 
    Provide sample Makefile for building HTML output.

02faf81 (2017-05-22) 
    Update docs to reflect deprecated build tool.

ce664b6 (2017-05-22) 
    Deprecate and remove the build script.

    This script is intended to work like Make, but it doesn't do so as cleanly.
    Instead, we will provide a sample Makefile in the documentation that can be
    modified to suit the layout of an individual archive.


b83e2a0 (2017-05-21) 
    Update metadata on init.

b27c085 (2017-05-21) 
    Provide more complete html output feature.

    This commit provides several new features related to HTML output.
    - Indices are created for various fields: type, creator, coverage, and
     language. These are hyperlinked from item description pages and
     linked by default in the header menu.
    - Bootstrap is included to facilitate consistent and predictable theming
    - Thumbnail and medium-scale images are derived from PDFs. On item
     description pages, these link to the full document.
    - Support for embedded versions of remotely stored video content is
     provided. This does not obviate the need for local archival storage,
     but allows for the provision of video content in multiple formats
     and resolutions.
    - The base url variable is substituted at the time of page generation
    - Static page generation with item galleries is supported.


66b2b9d (2017-05-21) 
    Allow init command to run, by removing blocking collection check.

fef3d6a (2017-05-21) 
    Provide line-by-line parsing of doc strings into RST.

    This basic parser transforms tabular command/description lists into a
    definition list, recognizing continuation lines.


9d48caf (2017-05-21) 
    Move README from Markdown to reStructuredText, for use by Sphinx.

5f8c26b (2017-05-15) 
    Reactivate markdown processing

df03d0a (2017-05-15) 
    Link or copy files to distribution formats where no conversion is
    needed

75d73e8 (2017-05-15) 
    Accommodate Unicode input in the metadata

287c938 (2017-05-15) 
    Fix parameters

c832d83 (2017-05-15) 
    Fix init command.

    - Fix incorrect function name in egaia_mkcsv module
    - Allow description and title for a new bag to be set on the command line.
     Otherwise pass "none provided" as a string, instead of None, in order to
     prevent parse errors from bagit.


5a312b4 (2017-05-15) 
    In setup, do not prompt user for the value of archive.storage_path

fe248dc (2017-05-15) 
    Fix minor bugs blocking installation and setup.

    - Use the full module name with importlib for subcommands
    - Create directories for the configuration file if they do not
     yet exist
    - Fix a typo and update repository URL


975cb28 (2017-03-12) 
    Add parent collection name to the item description page; remove
    duplicate title

1df7ba6 (2017-03-09) 
    Link to tags from the dc_subject field on item description pages

af274be (2017-03-09) 
    Add support for keywords/tags index

0c000fc (2017-03-09) 
    Process static pages from a directory, not individual files.

da839d5 (2017-03-08) 
    Create documentation for egaia v2.1

6269de5 (2017-03-08) 
    update version and requirements for setup.py

86098e7 (2017-03-08) 
    update css

8aa6a8b (2017-03-08) 
    Apply miscellaneous docstring fixes

9afba4d (2017-03-08) 
    Complete HTML output command, supporting indexes and embedded media.

fd22ff6 (2017-03-08) 
    Change the working directory to the collection root, to be safe

4a0ae5c (2017-03-08) 
    Provide an exclusion filter for the list command

a3f0a2a (2017-03-08) 
    General fixes to derivatives production.

    - Enable production of a PDF version of video stills (because we want to
     avoid sending individual thumbs to HTML output)
    - Process everything in a collection by default, rather than using globs,
     which becomes confusing and sometimes gives file path errors
    - Ensure that all derivatives have ".df-" in the filename


8faad6a (2017-03-08) 
    Remove unnecessary comment and variable

c1d1b90 (2017-03-07) 
    Fix missing basename for video thumb output

b2b2538 (2017-03-07) 
    Update html output to add audio and video embeds.

b974960 (2017-03-03) 
    Fix setup.py

46696ad (2017-03-03) 
    Fix docstrings and function names for sphinx output

ced2973 (2017-03-03) 
    Fix missing comma in setup.py

3456549 (2017-02-28) 
    Add BagIt metadata fields and labels to the configuration file

e7223b5 (2017-02-28) 
    Use bumpversion for release management

d606c66 (2017-02-28) 
    Update stylesheet for HTML output

5b4664f (2017-02-27) 
    Provide collection-level descriptions and index pages in HTML output

1d45bb6 (2017-02-25) 
    Add html export functionality for item description pages.

    The dc fields in metadata are renamed from dc.* to dc_* in order to allow 
    parsing by the string template, used for variable substitution in html 
    templates.
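
The reason for the rename can be seen with Python's ``string.Template``, whose placeholder names may contain only letters, digits, and underscores: "$dc.title" would be read as the placeholder "dc" followed by the literal text ".title". The sketch below is illustrative; the ``render`` helper and the sample template are assumptions, not the project's template code.

```python
from string import Template


def render(template_text, metadata):
    """Substitute metadata into a template, mapping dc.* keys to dc_*."""
    safe = {key.replace(".", "_"): value for key, value in metadata.items()}
    # safe_substitute leaves unknown placeholders in place instead of raising.
    return Template(template_text).safe_substitute(safe)


page = render("<h1>$dc_title</h1><p>$dc_creator</p>", {"dc.title": "Harvest"})
print(page)
```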

    Thumbnail derivatives are now expected to have the pattern ".thumb-" to 
    facilitate filename matching.