Generating Checksums for Alfresco Content

Last week I implemented a feature in Alfresco that will generate an MD5 checksum for managed files as part of some proof-of-concept work Tribloom is doing for UC Berkeley Information Services and Technology to integrate their Alfresco-based collaboration service with the University of California Curation Center (UC3) Merritt preservation repository.

Checksums are commonly used to verify the integrity of files after download, so it was rather surprising to find that they haven't made their way into the Alfresco code base. In our case, we wanted to supply a web service with the checksums for files transmitted from Alfresco without requiring the user to generate them by hand. To do this, I created a custom aspect and policy handler that generates a checksum hash whenever the aspect is added to a file in Alfresco. The hash is stored as a property on the content, where the Alfresco user can read it or it can be passed to the web service for automated verification.

The following steps will illustrate how I went about adding this handy feature.

This example is based on Michael McCarthy's demo repository and Share template projects and was deployed against Alfresco Enterprise 3.4.4. The full source and Eclipse project can be downloaded from GitHub, and the code below assumes the file locations of my project, so feel free to fork it and follow along. I also deployed the repository and Share to separate Tomcat instances, which made it necessary to duplicate the resource bundle on both sides so that the aspect name property is available to each (refer to the latter part of step 6).

  1. Create a Custom Model
    This extension model defines the Hashable aspect and its two properties: a hash value and a hash type. The hash value is the checksum we are after and is generated automatically by the policy handler created later on. The hash type is the algorithm used; it is chosen by the user but restricted to a subset of the algorithms supported by java.security.MessageDigest: MD2, MD5 (our default), SHA-1, SHA-256, SHA-384, and SHA-512. This constraint is defined in the model as well. It all lives in the file /hashable-repo/config/alfresco/extension/model/demoModel.xml, as illustrated below:

    <?xml version="1.0" encoding="UTF-8"?>
    <model name="dm:demomodel" xmlns="http://www.alfresco.org/model/dictionary/1.0">
    	<description>Demo Content Model</description>
    	<author>Tribloom - Chris Paul</author>
    	<version>1.0</version>
    	<imports>
    		<import uri="http://www.alfresco.org/model/dictionary/1.0" prefix="d" />
    	</imports>
    	<namespaces>
    		<namespace uri="http://www.tribloom.com/model/demo/1.0" prefix="dm" />
    	</namespaces>
    	<aspects>
    		<aspect name="dm:hashable">
    			<title>Hashable</title>
    			<properties>
    				<property name="dm:hashType">
    					<title>Hash Type</title>
    					<type>d:text</type>
    					<default>md5</default>
    					<constraints>
    						<constraint type="LIST">
    							<parameter name="allowedValues">
    								<list>
    									<value>md2</value>
    									<value>md5</value>
    									<value>sha-1</value>
    									<value>sha-256</value>
    									<value>sha-384</value>
    									<value>sha-512</value>
    								</list>
    							</parameter>
    						</constraint>
    					</constraints>
    				</property>
    				<property name="dm:hashValue">
    					<title>Hash Value</title>
    					<type>d:text</type>
    				</property>
    			</properties>
    		</aspect>
    	</aspects>
    </model>
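    Because the model stores the algorithm names in lower case, it is worth confirming that MessageDigest accepts them as written. The following standalone sketch (the class name AllowedHashTypes is hypothetical) looks up each allowed value and reports its digest length. It relies on JCA algorithm names being case-insensitive and on the running JDK providing all six algorithms, which the standard JDK providers do.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AllowedHashTypes {
    // The values allowed by the dm:hashType LIST constraint in demoModel.xml
    static final List<String> ALLOWED = Arrays.asList(
            "md2", "md5", "sha-1", "sha-256", "sha-384", "sha-512");

    // Maps each allowed value to its digest length in bytes; throws if the
    // running JVM has no provider for one of them.
    static Map<String, Integer> digestLengths() throws NoSuchAlgorithmException {
        Map<String, Integer> lengths = new LinkedHashMap<>();
        for (String name : ALLOWED) {
            // JCA algorithm names are case-insensitive, so "sha-256" resolves to SHA-256
            lengths.put(name, MessageDigest.getInstance(name).getDigestLength());
        }
        return lengths;
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        // Each byte becomes two hex characters in the stored dm:hashValue
        digestLengths().forEach((name, len) ->
                System.out.println(name + " -> " + (len * 2) + " hex characters"));
    }
}
```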
  2. Build a Model Interface
    Next I created a Java interface to hold commonly used model constants: the namespace URI and the QNames of the aspect and its properties. This is optional, but it improves readability and is just plain convenient. It's defined in the file /hashable-repo/src/java/com/tribloom/demo/model/HashableModel.java:

    package com.tribloom.demo.model;
    
    import org.alfresco.service.namespace.QName;
    
    public interface HashableModel {
    	static final String DEMO_URI = "http://www.tribloom.com/model/demo/1.0";
    	static final QName ASPECT_HASHABLE = QName.createQName(DEMO_URI, "hashable");
    	static final QName PROP_HASH_TYPE = QName.createQName(DEMO_URI, "hashType");
    	static final QName PROP_HASH_VALUE = QName.createQName(DEMO_URI, "hashValue");
    }
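    The constants in HashableModel must match the names declared in demoModel.xml exactly, since a misspelled local name produces a QName that simply matches nothing. As a rough illustration that needs no Alfresco libraries, this hypothetical PrefixResolver expands a prefixed name such as dm:hashable, using the model's namespace declaration, into the {uri}localName form that Alfresco's QName uses for its string representation:

```java
import java.util.HashMap;
import java.util.Map;

public class PrefixResolver {
    // Prefix-to-URI mapping copied from the <namespaces> section of demoModel.xml
    private static final Map<String, String> NAMESPACES = new HashMap<>();
    static {
        NAMESPACES.put("dm", "http://www.tribloom.com/model/demo/1.0");
    }

    // Expands "prefix:localName" into "{uri}localName"
    static String expand(String prefixedName) {
        int colon = prefixedName.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Not a prefixed name: " + prefixedName);
        }
        String uri = NAMESPACES.get(prefixedName.substring(0, colon));
        if (uri == null) {
            throw new IllegalArgumentException("Unknown prefix in: " + prefixedName);
        }
        return "{" + uri + "}" + prefixedName.substring(colon + 1);
    }

    public static void main(String[] args) {
        // prints {http://www.tribloom.com/model/demo/1.0}hashable
        System.out.println(expand("dm:hashable"));
    }
}
```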
  3. Load the Custom Model via Spring
    The new model is loaded automatically by the Spring bootstrap loader but must be configured by creating a Spring context file. For this example, it will be located at /hashable-repo/config/alfresco/extension/demo-context.xml. I also added a ResourceBundle that will define a property for displaying the aspect name:

    <?xml version='1.0' encoding='UTF-8'?>
    <!DOCTYPE beans PUBLIC '-//SPRING//DTD BEAN//EN' 'http://www.springframework.org/dtd/spring-beans.dtd'>
    
    <beans>
    	<!-- Load the demo content model -->
    	<bean id="demo.extension.dictionaryBootstrap" parent="dictionaryModelBootstrap" depends-on="dictionaryBootstrap">
    		<property name="models">
    			<list>
    				<value>alfresco/extension/model/demoModel.xml</value>
    			</list>
    		</property>
    	</bean>
    	<!-- Load the demo properties file -->
    	<bean id="demo.extension.resourceBundle" class="org.alfresco.i18n.ResourceBundleBootstrapComponent">
    		<property name="resourceBundles">
    			<list>
    				<value>alfresco.extension.messages.demo</value>
    			</list>
    		</property>
    	</bean>
    </beans>
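    The resourceBundles value is a dotted base name rather than a file path. Assuming it follows the standard java.util.ResourceBundle naming convention, the sketch below (BundlePath is a hypothetical name) shows which classpath resource that base name maps to, and which localized file a French locale would map to:

```java
import java.util.Locale;
import java.util.ResourceBundle;

public class BundlePath {
    // Converts a dotted bundle base name into the classpath resource name
    // that java.util.ResourceBundle uses for .properties bundles.
    static String resourceFor(String baseName, Locale locale) {
        ResourceBundle.Control control =
                ResourceBundle.Control.getControl(ResourceBundle.Control.FORMAT_PROPERTIES);
        return control.toResourceName(control.toBundleName(baseName, locale), "properties");
    }

    public static void main(String[] args) {
        // prints alfresco/extension/messages/demo.properties
        System.out.println(resourceFor("alfresco.extension.messages.demo", Locale.ROOT));
        // prints alfresco/extension/messages/demo_fr.properties
        System.out.println(resourceFor("alfresco.extension.messages.demo", Locale.FRENCH));
    }
}
```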
  4. Add Properties for Display
    Two properties files supply the display labels for the aspect and its properties. The first sets the labels for the aspect properties in the repository web client and is located at /hashable-repo/config/alfresco/extension/webclient.properties:

    demo.property.hashtype.title=Hash Type
    demo.property.hashvalue.title=Hash Value

    The second provides a label for the aspect itself and is a custom resource bundle located at /hashable-repo/config/alfresco/extension/messages/demo.properties:

    aspect.dm_hashable=Hashable
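    The key aspect.dm_hashable appears to follow the convention Share uses for aspect labels: the prefix "aspect." followed by the aspect's prefixed name with the colon replaced by an underscore. A small sketch of that lookup follows; the class and method names are hypothetical, and falling back to the raw aspect name when no label is found is my own choice:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class AspectLabel {
    // Builds the message key for an aspect label, e.g. dm:hashable -> aspect.dm_hashable
    static String labelKey(String prefixedAspectName) {
        return "aspect." + prefixedAspectName.replace(':', '_');
    }

    // Resolves the label from properties text, falling back to the raw name
    static String label(String propertiesText, String prefixedAspectName) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(propertiesText));
        return props.getProperty(labelKey(prefixedAspectName), prefixedAspectName);
    }

    public static void main(String[] args) throws IOException {
        String demo = "aspect.dm_hashable=Hashable\n";
        System.out.println(label(demo, "dm:hashable")); // prints Hashable
    }
}
```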
  5. Make it all Visible in the Repository
    To make the Hashable aspect and its properties visible in the Alfresco repository web client, create the file /hashable-repo/config/alfresco/extension/web-client-config-custom.xml, which defines when and where those items are displayed:

    <alfresco-config>
    	<!-- Display hashable properties for content with the Hashable aspect -->
    	<config evaluator="aspect-name" condition="dm:hashable">
    		<property-sheet>
    			<show-property name="dm:hashType" display-label-id="demo.property.hashtype.title" />
    			<show-property name="dm:hashValue" display-label-id="demo.property.hashvalue.title" />
    		</property-sheet>
    	</config>
    	<!-- Activate the option to add the Hashable aspect in the Action Wizard -->
    	<config evaluator="string-compare" condition="Action Wizards">
    		<aspects>
    			<aspect name="dm:hashable" />
    		</aspects>
    		<!-- Offer this option to all content. This can be updated to reflect
    		     only the types that you want to make Hashable. -->
    		<subtypes>
    			<type name="cm:content" />
    		</subtypes>
    	</config>
    	<!-- Allow repo users to search for content with a given hash type -->
    	<config evaluator="string-compare" condition="Advanced Search">
    		<advanced-search>
    			<custom-properties>
    				<meta-data aspect="dm:hashable" property="dm:hashType"
    					display-label-id="demo.property.hashtype.title" />
    			</custom-properties>
    		</advanced-search>
    	</config>
    </alfresco-config>
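    The Advanced Search entry lets users filter by hash type in the web client. The equivalent query can also be run programmatically, for example through SearchService.query(storeRef, SearchService.LANGUAGE_LUCENE, query). This sketch (HashTypeQuery is a hypothetical name) only builds the query string; it assumes Alfresco's Lucene syntax, in which the colon in a prefixed property name is escaped with a backslash:

```java
public class HashTypeQuery {
    // Builds a Lucene query matching Hashable content whose dm:hashType equals
    // the given algorithm. The value is quoted so it is matched as one phrase,
    // with any embedded backslashes or quotes escaped.
    static String forHashType(String hashType) {
        String value = hashType.replace("\\", "\\\\").replace("\"", "\\\"");
        return "ASPECT:\"dm:hashable\" AND @dm\\:hashType:\"" + value + "\"";
    }

    public static void main(String[] args) {
        // prints ASPECT:"dm:hashable" AND @dm\:hashType:"sha-256"
        System.out.println(forHashType("sha-256"));
    }
}
```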
  6. Enable the Hashable Aspect in Share
    Displaying the aspect in Alfresco Share takes steps similar to those for the repository client: we define when and where it should be shown in the file /hashable-share/config/web-extension/share-config-custom.xml:

    <alfresco-config>
    	<!-- Display the Hashable aspect in the "Manage Aspects" pop-up -->
    	<config evaluator="string-compare" condition="DocumentLibrary">
    		<aspects>
    			<visible>
    				<aspect name="dm:hashable" />
    			</visible>
    			<addable></addable>
    			<removable></removable>
    		</aspects>
    	</config>
    	<!-- Display the Hashable properties -->
    	<config evaluator="node-type" condition="cm:content">
    		<forms>
    			<form>
    				<field-visibility>
    					<show id="dm:hashType" />
    					<show id="dm:hashValue" />
    				</field-visibility>
    				<appearance>
    					<field id="dm:hashValue" read-only="true" />
    				</appearance>
    			</form>
    		</forms>
    	</config>
    </alfresco-config>

    As stated earlier, if you are deploying the Alfresco repository and Share to separate Tomcat instances, you will have to duplicate the ResourceBundle creation to ensure the aspect label property is available on both clients. To do this, create the file /hashable-share/config/web-extension/messages/demo.properties:

    demo.property.hashtype.title=Hash Type
    demo.property.hashvalue.title=Hash Value
    aspect.dm_hashable=Hashable

    Wire this properties file into the Spring bootstrap loader at /hashable-share/config/web-extension/demo-custom-slingshot-application-context.xml:

    <?xml version='1.0' encoding='UTF-8'?>
    <!DOCTYPE beans PUBLIC '-//SPRING//DTD BEAN//EN' 'http://www.springframework.org/dtd/spring-beans.dtd'>
    
    <beans>
       <!-- Load the demo properties -->
       <bean id="demo.custom.resources" class="org.springframework.extensions.surf.util.ResourceBundleBootstrapComponent">
          <property name="resourceBundles">
             <list>
                <value>alfresco.web-extension.messages.demo</value>
             </list>
          </property>
       </bean>
    </beans>
  7. Build a Policy Handler to Generate the Checksum
    Now comes the fun part: writing the Java code that will do our heavy lifting. We'll create a Spring bean on the repository side that is injected with the necessary Alfresco services and registers itself to be notified of events in the Alfresco system. This class is notified whenever the Hashable aspect is applied to a piece of content, or whenever the metadata or content of a node with the Hashable aspect is updated, at which point it generates a new hash value. This is handled by implementing the following Alfresco policy interfaces:

    • org.alfresco.repo.content.ContentServicePolicies.OnContentUpdatePolicy
    • org.alfresco.repo.node.NodeServicePolicies.OnAddAspectPolicy
    • org.alfresco.repo.node.NodeServicePolicies.OnUpdatePropertiesPolicy

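    Each of these policies declares a callback method (onContentUpdate, onAddAspect, and onUpdateProperties). The init() method, which the Spring configuration names as the bean's init-method, wraps each callback in a JavaBehaviour (note the British spelling of "behaviour" in the Alfresco API) and binds it to the Hashable aspect, so Alfresco invokes the callback when the corresponding event occurs on a Hashable node. The TRANSACTION_COMMIT notification frequency queues each call until the transaction commits. The ContentHasher class lives at /hashable-repo/src/java/com/tribloom/demo/ContentHasher.java and is shown, in part, below (for the complete source, please see the GitHub repository linked at the beginning of this article):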

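    package com.tribloom.demo;
    
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.Serializable;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;
    import java.util.Map;
    
    import org.alfresco.model.ContentModel;
    import org.alfresco.repo.content.ContentServicePolicies.OnContentUpdatePolicy;
    import org.alfresco.repo.node.NodeServicePolicies.OnAddAspectPolicy;
    import org.alfresco.repo.node.NodeServicePolicies.OnUpdatePropertiesPolicy;
    import org.alfresco.repo.policy.Behaviour;
    import org.alfresco.repo.policy.BehaviourFilter;
    import org.alfresco.repo.policy.JavaBehaviour;
    import org.alfresco.repo.policy.PolicyComponent;
    import org.alfresco.service.cmr.repository.ContentReader;
    import org.alfresco.service.cmr.repository.ContentService;
    import org.alfresco.service.cmr.repository.NodeRef;
    import org.alfresco.service.cmr.repository.NodeService;
    import org.alfresco.service.namespace.NamespaceService;
    import org.alfresco.service.namespace.QName;
    
    import com.tribloom.demo.model.HashableModel;
    
    public class ContentHasher implements OnAddAspectPolicy, OnContentUpdatePolicy,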
    		OnUpdatePropertiesPolicy {
    ...
    	private Behaviour onContentUpdate;
    	private Behaviour onAddAspect;
    	private Behaviour onUpdateProperties;
    
    	// Dependencies that will be injected by Spring
    	private PolicyComponent policyComponent;
    	private NodeService nodeService;
    	private BehaviourFilter policyFilter;
    	private ContentService contentService;
    
    	public void init() {
    		onContentUpdate = new JavaBehaviour(this, "onContentUpdate",
    				Behaviour.NotificationFrequency.TRANSACTION_COMMIT);
    		policyComponent.bindClassBehaviour(QName.createQName(
    				NamespaceService.ALFRESCO_URI, "onContentUpdate"),
    				HashableModel.ASPECT_HASHABLE, onContentUpdate);
    		onAddAspect = new JavaBehaviour(this, "onAddAspect",
    				Behaviour.NotificationFrequency.TRANSACTION_COMMIT);
    		policyComponent.bindClassBehaviour(QName.createQName(
    				NamespaceService.ALFRESCO_URI, "onAddAspect"),
    				HashableModel.ASPECT_HASHABLE, onAddAspect);
    		onUpdateProperties = new JavaBehaviour(this, "onUpdateProperties",
    				Behaviour.NotificationFrequency.TRANSACTION_COMMIT);
    		policyComponent.bindClassBehaviour(QName.createQName(
    				NamespaceService.ALFRESCO_URI, "onUpdateProperties"),
    				HashableModel.ASPECT_HASHABLE, onUpdateProperties);
    	}
    
    	@Override
    	public void onUpdateProperties(NodeRef nodeRef,
    			Map<QName, Serializable> before, Map<QName, Serializable> after) {
    		if (!nodeService.exists(nodeRef)
    			|| !nodeService.hasAspect(nodeRef, HashableModel.ASPECT_HASHABLE)
    			|| nodeService.hasAspect(nodeRef, ContentModel.ASPECT_TEMPORARY)) {
    			if (logger.isDebugEnabled())
    				logger.debug("Cannot process nodeRef.");
    			return;
    		}
    		String oldHashType = (String) before.get(HashableModel.PROP_HASH_TYPE);
    		String newHashType = (String) after.get(HashableModel.PROP_HASH_TYPE);
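    		// Only recompute when the algorithm actually changed; content changes are
    		// handled by onContentUpdate. Returning here also keeps the property
    		// updates made by setHash from triggering another recalculation.
    		if (newHashType == null || newHashType.equals(oldHashType)) {
    			if (logger.isDebugEnabled())
    				logger.debug("Hash type unchanged or unset.");
    			return;
    		}
    		setHash(nodeRef, newHashType);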
    	}
    
    	@Override
    	public void onContentUpdate(NodeRef nodeRef, boolean newContent) {
    		if (!nodeService.exists(nodeRef)
    			|| !nodeService.hasAspect(nodeRef, HashableModel.ASPECT_HASHABLE)
    			|| nodeService.hasAspect(nodeRef, ContentModel.ASPECT_TEMPORARY)) {
    			if (logger.isDebugEnabled())
    				logger.debug("Cannot process nodeRef.");
    			return;
    		}
    		String digestType = (String) nodeService.getProperty(nodeRef, HashableModel.PROP_HASH_TYPE);
    		setHash(nodeRef, digestType);
    	}
    
    	@Override
    	public void onAddAspect(NodeRef nodeRef, QName aspectTypeQName) {
    		if (!nodeService.exists(nodeRef)
    			|| !nodeService.hasAspect(nodeRef, HashableModel.ASPECT_HASHABLE)
    			|| nodeService.hasAspect(nodeRef, ContentModel.ASPECT_TEMPORARY)
    			|| (nodeService.getProperty(nodeRef, HashableModel.PROP_HASH_VALUE) != null)) {
    			if (logger.isDebugEnabled())
    				logger.debug("Cannot process nodeRef.");
    			return;
    		}
    		setHash(nodeRef, DEFAULT_HASH_TYPE);
    	}
    
    	private void setHash(NodeRef nodeRef, String hashType) {
    		// Suspend versionable behaviours so these updates do not trigger auto-versioning
    		policyFilter.disableBehaviour(ContentModel.ASPECT_VERSIONABLE);
    		try {
    			ContentReader contentReader = contentService.getReader(nodeRef, ContentModel.PROP_CONTENT);
    			if (contentReader == null || contentReader.getSize() == 0) {
    				logger.error("Content is null or empty, removing aspect.");
    				nodeService.removeAspect(nodeRef, HashableModel.ASPECT_HASHABLE);
    				return;
    			}
    			InputStream contentStream = contentReader.getContentInputStream();
    			String hashValue = computeHash(contentStream, hashType);
    			if (hashValue == null) {
    				nodeService.removeAspect(nodeRef, HashableModel.ASPECT_HASHABLE);
    				return;
    			}
    			nodeService.setProperty(nodeRef, HashableModel.PROP_HASH_TYPE, hashType);
    			nodeService.setProperty(nodeRef, HashableModel.PROP_HASH_VALUE, hashValue);
    		} finally {
    			// Re-enable versioning even when one of the early returns above is taken
    			policyFilter.enableBehaviour(ContentModel.ASPECT_VERSIONABLE);
    		}
    	}
    
    	private String computeHash(InputStream contentStream, String hashType) {
    		MessageDigest messageDigest = null;
    		try {
    			messageDigest = MessageDigest.getInstance(hashType);
    		} catch (NoSuchAlgorithmException e) {
    			logger.error("Unable to process algorithm type: " + hashType);
    			return null;
    		}
    		messageDigest.reset();
    		byte[] buffer = new byte[BUFFER_SIZE];
    		int bytesRead = -1;
    		try {
    			while ((bytesRead = contentStream.read(buffer)) > -1) {
    				messageDigest.update(buffer, 0, bytesRead);
    			}
    		} catch (IOException e) {
    			logger.error("Unable to read content stream.", e);
    			return null;
    		} finally {
    			try {
    				contentStream.close();
    			} catch (IOException e) {
    				// Nothing useful to do if closing fails; the read has already finished
    			}
    		}
    		byte[] digest = messageDigest.digest();
    		return convertByteArrayToHex(digest);
    	}
    
    	private String convertByteArrayToHex(byte[] array) {
    		StringBuilder hashValue = new StringBuilder();
    		for (int i = 0; i < array.length; i++) {
    			String hex = Integer.toHexString(0xFF & array[i]);
    			if (hex.length() == 1) {
    				hashValue.append('0');
    			}
    			hashValue.append(hex);
    		}
    		return hashValue.toString().toUpperCase();
    	}
    ...
    }
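    The digest logic does not depend on Alfresco, so it can be tried in isolation. The sketch below is a variant rather than the author's code: it swaps the manual update loop for java.security.DigestInputStream and formats each byte with String.format("%02X", ...), which yields the same upper-case hex as convertByteArrayToHex. It uses try-with-resources, which needs Java 7 or later; on older JDKs, keep the explicit finally block used above. The class name StreamHasher is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class StreamHasher {
    private static final int BUFFER_SIZE = 8192;

    // Reads the stream to the end through a DigestInputStream, which updates
    // the digest on every read, then returns the digest as upper-case hex.
    static String hash(InputStream in, String hashType)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance(hashType);
        try (DigestInputStream dis = new DigestInputStream(in, digest)) {
            byte[] buffer = new byte[BUFFER_SIZE];
            while (dis.read(buffer) != -1) {
                // nothing to do here: read() has already updated the digest
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02X", b)); // two upper-case hex digits per byte
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        InputStream in = new ByteArrayInputStream("hello".getBytes(StandardCharsets.US_ASCII));
        System.out.println(hash(in, "md5")); // prints 5D41402ABC4B2A76B9719D911017C592
    }
}
```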

    Wire this class up by editing the Spring configuration file, /hashable-repo/config/alfresco/extension/demo-model-context.xml, and adding the bean:

    	<!-- Instantiate the ContentHasher which will generate digests for content -->
    	<bean id="demo.extension.contenthasher" class="com.tribloom.demo.ContentHasher"
    		init-method="init">
    		<property name="policyComponent">
    			<ref bean="policyComponent" />
    		</property>
    		<property name="nodeService">
    			<ref bean="NodeService" />
    		</property>
    		<property name="contentService">
    			<ref bean="ContentService" />
    		</property>
    		<property name="policyFilter">
    			<ref bean="policyBehaviourFilter" />
    		</property>
    	</bean>

Once the repository and Share extension files have been deployed and the Tomcat instances restarted, the Hashable aspect can be tested by adding it to a piece of content and checking that a hash value property, computed with the default MD5 algorithm, appears in the content's metadata. The algorithm can be changed by editing the item's metadata, and a new hash value will be generated automatically thanks to our policy handler.
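Since the point of storing the hash is downstream verification, here is a minimal sketch of the consumer side: it recomputes a local file's digest and compares it with a hash value copied from the Alfresco metadata. How the file and hash reach the client (a download, a web service call) is left out, and ChecksumVerifier is a hypothetical name. The comparison ignores case because ContentHasher stores upper-case hex while tools such as md5sum print lower case.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ChecksumVerifier {
    // Recomputes the digest of a local file and compares it, ignoring case,
    // with the expected hex string (e.g. the node's dm:hashValue).
    static boolean matches(Path file, String hashType, String expectedHex)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance(hashType);
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                digest.update(buffer, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString().equalsIgnoreCase(expectedHex.trim());
    }

    public static void main(String[] args) throws Exception {
        // Usage: java ChecksumVerifier <file> <hashType> <expectedHex>
        boolean ok = matches(Paths.get(args[0]), args[1], args[2]);
        System.out.println(ok ? "checksum OK" : "checksum MISMATCH");
    }
}
```

For example, `java ChecksumVerifier report.pdf md5 <hashValue>` prints "checksum OK" when the downloaded file matches the value stored in Alfresco.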
