edu.stanford.nlp.ie.merge
Class GenericMerger

java.lang.Object
  |
  +--edu.stanford.nlp.ie.merge.AbstractInstanceMerger
        |
        +--edu.stanford.nlp.ie.merge.GenericMerger
All Implemented Interfaces:
InstanceMerger, Serializable
Direct Known Subclasses:
WebSearchMerger

public class GenericMerger
extends AbstractInstanceMerger

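Provides basic merging functionality: Instances are ranked by the confidences of their fields, and pairs of Instances are merged greedily as long as merging does not lower the rank.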

See Also:
Serialized Form

Field Summary
protected  HashMap ignoredFields
          Stores the names of fields that are excluded from ranking and reconciled separately at the end of a merge (see ignoreField).
protected  HashMap ignoredPenalties
          Stores the names of fields that are allowed to conflict without incurring a conflict penalty (see suppressConflictPenalty).
 
Constructor Summary
GenericMerger()
          Empty constructor; subclasses can use their own constructors to call ignoreField and suppressConflictPenalty.
 
Method Summary
 boolean compatibleConcept(edu.unika.aifb.kaon.Concept c)
          By default, compatible with all concepts.
protected  void concatFields(edu.unika.aifb.kaon.Relation r, edu.unika.aifb.kaon.Instance i1, edu.unika.aifb.kaon.Instance i2, Confidence c1, Confidence c2, edu.unika.aifb.kaon.Instance newInstance, Confidence newConfidence)
          For the specified relation, takes the values from the two instances and concatenates them, delimited by a comma and a space.
 edu.unika.aifb.kaon.Instance getBestInstance(edu.unika.aifb.kaon.Instance[] instances, Confidence[] confidences)
          Calls getMergedInstances and returns the best one.
 double getConflictPenalty()
          Gets the conflict penalty, which is assessed when the same field has different values in two instances.
 void getMergedInstances(Vector instances, Vector confidences)
          Finds the best Instance according to getRank.
protected  double getMergedRank(edu.unika.aifb.kaon.Instance i1, edu.unika.aifb.kaon.Instance i2, Confidence c1, Confidence c2)
          The rank of a merge is the rank of the Instance resulting from the union of the fields of the two input Instances.
 double getMergePenalty()
          Gets the merge penalty, which is assessed every time two Instances are merged.
protected  double getRank(edu.unika.aifb.kaon.Instance i, Confidence c)
          Rank of an instance is the sum of the confidences for each field, as specified by the Confidence object.
 double getVacuousMergePenalty()
          Gets the "vacuous merge" penalty, which is assessed when a merged instance would contain no values from one of the original instances.
 void ignoreField(String fieldName)
          For the purposes of ranking, a relational field whose name matches the one passed into this method is ignored.
protected  edu.unika.aifb.kaon.Instance merge(edu.unika.aifb.kaon.Instance i1, edu.unika.aifb.kaon.Instance i2, Confidence c1, Confidence c2, Confidence resultingConfidence)
          Merges two Instances as described by getMergedRank.
protected  boolean mergeInstances(int index, Vector instances, Vector confidences)
          Given the index of the starting instance, compares all subsequent instances, looking for the best merge; returns false if no merge is found.
protected  void reconcileConflictedField(edu.unika.aifb.kaon.Relation r, edu.unika.aifb.kaon.Instance i1, edu.unika.aifb.kaon.Instance i2, Confidence c1, Confidence c2, edu.unika.aifb.kaon.Instance newInstance, Confidence newConfidence)
          Similar to reconcileIgnoredFields except it handles fields specified by suppressConflictPenalty.
protected  void reconcileIgnoredFields(edu.unika.aifb.kaon.Instance i1, edu.unika.aifb.kaon.Instance i2, Confidence c1, Confidence c2, edu.unika.aifb.kaon.Instance newInstance, Confidence newConfidence)
          When two instances are merged, fields specified by ignoreField are ignored until the end, when this method is called.
 void setConflictPenalty(double penalty)
          Sets the conflict penalty.
 void setMergePenalty(double penalty)
          Sets the merge penalty.
protected  void setOneField(edu.unika.aifb.kaon.Relation r, edu.unika.aifb.kaon.Instance i1, edu.unika.aifb.kaon.Instance i2, Confidence c1, Confidence c2, edu.unika.aifb.kaon.Instance newInstance, Confidence newConfidence)
          For the specified relation, takes the value from the instance with the higher confidence, or i1's value in case of a tie.
 void setVacuousMergePenalty(double penalty)
          Sets the vacuous merge penalty.
protected  void sortInstances(Vector instances, Vector confidences)
          Sorts the instances Vector by rank and sorts the confidences in parallel.
 void suppressConflictPenalty(String fieldName)
          Exempts a particular field from conflict penalties -- that is, when ranking a merger between two Instances, if they disagree on a particular field, a conflict penalty for that mismatch is not assessed if this method was called for that field.
 
Methods inherited from class edu.stanford.nlp.ie.merge.AbstractInstanceMerger
isEmpty, storeMerger
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

ignoredFields

protected HashMap ignoredFields
Stores the names of fields that are excluded from ranking and reconciled separately at the end of a merge (see ignoreField).


ignoredPenalties

protected HashMap ignoredPenalties
Stores the names of fields that are allowed to conflict without incurring a conflict penalty (see suppressConflictPenalty).

Constructor Detail

GenericMerger

public GenericMerger()
Empty constructor; subclasses can use their own constructors to call ignoreField and suppressConflictPenalty.

Method Detail

compatibleConcept

public boolean compatibleConcept(edu.unika.aifb.kaon.Concept c)
By default, compatible with all concepts.

Parameters:
c - the Concept to be checked against the Merger
Returns:
true if the Merger can be applied to Instances of the Concept

getConflictPenalty

public double getConflictPenalty()
Gets the conflict penalty, which is assessed when the same field has different values in two instances. Default is 0.5.

Returns:
the penalty

setConflictPenalty

public void setConflictPenalty(double penalty)
Sets the conflict penalty.

Parameters:
penalty - the new penalty value

getMergePenalty

public double getMergePenalty()
Gets the merge penalty, which is assessed every time two Instances are merged. Default is 0.25. One scenario where this is useful: suppose an incomplete Instance with a high global ranking can be merged with another incomplete Instance with a low global ranking, and the resulting merged Instance would be comparable to an existing Instance with a mid-range global ranking. A high merge penalty favors the existing Instance, while a low penalty favors the merged Instance derived from the high-ranked original.

Returns:
the penalty
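To make the scenario concrete, here is a small worked example in Java. It is a sketch under an assumed scoring model, not GenericMerger's implementation: the hypothetical class MergePenaltySketch and its method prefersMerged assume that the merge penalty is simply subtracted from the merged candidate's rank before that rank is compared with the existing Instance's rank.

```java
final class MergePenaltySketch {

    // Hypothetical scoring model (assumption): the merge penalty is subtracted
    // from the merged candidate's rank, and the candidate is preferred only if
    // it then strictly outranks the Instance that already exists.
    static boolean prefersMerged(double mergedRank, double existingRank, double mergePenalty) {
        return mergedRank - mergePenalty > existingRank;
    }
}
```

With a merged rank of 3.0 and an existing Instance ranked 2.9, prefersMerged(3.0, 2.9, 0.25) returns false (the default penalty favors the existing Instance), while prefersMerged(3.0, 2.9, 0.05) returns true.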

setMergePenalty

public void setMergePenalty(double penalty)
Sets the merge penalty.

Parameters:
penalty - the new penalty value

getVacuousMergePenalty

public double getVacuousMergePenalty()
Gets the "vacuous merge" penalty, which is assessed when a merged instance would contain no values from one of the original instances. This only concerns getMergedRank, so fields exempted by either ignoreField or suppressConflictPenalty do not count toward deciding whether a merge is vacuous. Default is 10, but it could be set to 0 if appropriate. Note that the normal merge penalty, if one is set, is also assessed for vacuous merges, in addition to this penalty.

Returns:
the penalty
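The vacuous-merge rule can be sketched as follows. This is a hypothetical, self-contained model rather than the library's code: Instances and Confidence objects are stood in for by field-keyed maps, conflicts are resolved in favor of the higher confidence (i1 on a tie, as documented for setOneField), and the names VacuousMergeSketch, isVacuous and totalMergePenalty are illustrative only.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class VacuousMergeSketch {

    // Hypothetical model (assumption): an Instance is a field -> value map and its
    // Confidence is a field -> confidence map. Returns true if the merged result
    // would keep no value from one of the two inputs.
    static boolean isVacuous(Map<String, String> v1, Map<String, Double> c1,
                             Map<String, String> v2, Map<String, Double> c2,
                             Set<String> exempt) {
        boolean keepsI1 = false;
        boolean keepsI2 = false;
        Set<String> fields = new HashSet<>(v1.keySet());
        fields.addAll(v2.keySet());
        for (String f : fields) {
            if (exempt.contains(f)) {
                continue; // ignored or conflict-suppressed fields do not count
            }
            String a = v1.get(f);
            String b = v2.get(f);
            if (b == null) {
                keepsI1 = true;          // only i1 has a value
            } else if (a == null) {
                keepsI2 = true;          // only i2 has a value
            } else if (a.equals(b)) {
                keepsI1 = true;          // agreement: both inputs contribute
                keepsI2 = true;
            } else if (c1.getOrDefault(f, 0.0) >= c2.getOrDefault(f, 0.0)) {
                keepsI1 = true;          // conflict won by i1 (ties go to i1)
            } else {
                keepsI2 = true;          // conflict won by i2
            }
        }
        return !keepsI1 || !keepsI2;
    }

    // The normal merge penalty applies to every merge; the vacuous-merge
    // penalty is added on top when the merge is vacuous.
    static double totalMergePenalty(boolean vacuous, double mergePenalty, double vacuousPenalty) {
        return mergePenalty + (vacuous ? vacuousPenalty : 0.0);
    }
}
```

For example, if i1 wins every conflicting field and i2 contributes no extra field, isVacuous returns true, and totalMergePenalty(true, 0.25, 10) yields 10.25 with the default penalties.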

setVacuousMergePenalty

public void setVacuousMergePenalty(double penalty)
Sets the vacuous merge penalty.

Parameters:
penalty - the new penalty value

ignoreField

public void ignoreField(String fieldName)
For the purposes of ranking, a relational field whose name matches the one passed into this method is ignored. For instance, calling gm.ignoreField("Address"); means that the rank of an Instance or the rank of a merged Instance does not depend on the confidence ranking or contents of the Address field.

During the actual merger of two instances, reconcileIgnoredFields is called; subclasses can override it to handle ignored fields as needed.


suppressConflictPenalty

public void suppressConflictPenalty(String fieldName)
Exempts a particular field from conflict penalties -- that is, when ranking a merger between two Instances, if they disagree on a particular field, a conflict penalty for that mismatch is not assessed if this method was called for that field.

During the merger of two instances, reconcileConflictedField is called to decide how to merge one of these fields when its values conflict.

Parameters:
fieldName - the name of the Relation

reconcileIgnoredFields

protected void reconcileIgnoredFields(edu.unika.aifb.kaon.Instance i1,
                                      edu.unika.aifb.kaon.Instance i2,
                                      Confidence c1,
                                      Confidence c2,
                                      edu.unika.aifb.kaon.Instance newInstance,
                                      Confidence newConfidence)
When two instances are merged, fields specified by ignoreField are set aside until the end, when this method is called. By default, the values of each ignored field are concatenated, delimited by a comma and a space, and the higher of the two confidence values is used. Overriding subclasses could instead choose one of the values using some general or field-specific criteria. Changes are made to newInstance and newConfidence.

Parameters:
i1 - the first instance to merge
i2 - the other instance to merge
c1 - the confidence object corresponding to i1
c2 - the confidence object corresponding to i2
newInstance - the Instance built so far from merging i1 and i2
newConfidence - the Confidence object built so far for newInstance

reconcileConflictedField

protected void reconcileConflictedField(edu.unika.aifb.kaon.Relation r,
                                        edu.unika.aifb.kaon.Instance i1,
                                        edu.unika.aifb.kaon.Instance i2,
                                        Confidence c1,
                                        Confidence c2,
                                        edu.unika.aifb.kaon.Instance newInstance,
                                        Confidence newConfidence)
Similar to reconcileIgnoredFields, except that it handles fields specified by suppressConflictPenalty, and only when a conflict arises. Unlike reconcileIgnoredFields, it handles one field per call rather than all such fields in bulk. By default, the two values of the conflicting field are concatenated, delimited by a comma and a space, and the higher of the two confidence values is used. Overriding subclasses could instead choose one of the values using some general or field-specific criteria. Changes are made to newInstance and newConfidence.

Parameters:
r - the relation corresponding to the field with the conflict
i1 - the first instance to merge
i2 - the other instance to merge
c1 - the confidence object corresponding to i1
c2 - the confidence object corresponding to i2
newInstance - the Instance built so far from merging i1 and i2
newConfidence - the Confidence object built so far for newInstance

concatFields

protected void concatFields(edu.unika.aifb.kaon.Relation r,
                            edu.unika.aifb.kaon.Instance i1,
                            edu.unika.aifb.kaon.Instance i2,
                            Confidence c1,
                            Confidence c2,
                            edu.unika.aifb.kaon.Instance newInstance,
                            Confidence newConfidence)
For the specified relation, takes the values from the two instances and concatenates them, delimited by a comma and a space. The higher of the two confidences is used. The results are stored in newInstance and newConfidence. Designed to be used by the reconcile* methods.

Parameters:
r - the relation corresponding to the field to merge
i1 - the first instance to merge
i2 - the other instance to merge
c1 - the confidence object corresponding to i1
c2 - the confidence object corresponding to i2
newInstance - the Instance built so far from merging i1 and i2
newConfidence - the Confidence object built so far for newInstance
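The concatenation behavior described above can be sketched in Java with field-keyed maps standing in for the KAON Instance and the Confidence class. The class and method names are hypothetical, and copying a lone value when only one instance has the field is an assumption, since the documentation describes only the two-value case.

```java
import java.util.Map;

final class ConcatFieldSketch {

    // Hypothetical model (assumption): Instances and Confidences are field-keyed maps.
    // Stores "value1, value2" and the higher of the two confidences for one field.
    static void concatField(String field,
                            Map<String, String> v1, Map<String, Double> c1,
                            Map<String, String> v2, Map<String, Double> c2,
                            Map<String, String> newValues, Map<String, Double> newConfidence) {
        String a = v1.get(field);
        String b = v2.get(field);
        if (a == null && b == null) {
            return; // nothing to merge for this field
        }
        // Assumption: if only one instance has the field, copy its value unchanged.
        String merged = (a == null) ? b : (b == null) ? a : a + ", " + b;
        newValues.put(field, merged);
        // A missing confidence counts as 0.0, so the present value's confidence wins.
        newConfidence.put(field, Math.max(c1.getOrDefault(field, 0.0), c2.getOrDefault(field, 0.0)));
    }
}
```

Merging "12 Main St" (confidence 0.4) with "Suite 5" (confidence 0.7) stores "12 Main St, Suite 5" with confidence 0.7.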

setOneField

protected void setOneField(edu.unika.aifb.kaon.Relation r,
                           edu.unika.aifb.kaon.Instance i1,
                           edu.unika.aifb.kaon.Instance i2,
                           Confidence c1,
                           Confidence c2,
                           edu.unika.aifb.kaon.Instance newInstance,
                           Confidence newConfidence)
For the specified relation, takes the value from the instance with the higher confidence, or i1's value in case of a tie. This value and its confidence are stored in newInstance and newConfidence, respectively. Designed to be used by the reconcile* methods.

Parameters:
r - the relation corresponding to the field to merge
i1 - the first instance to merge
i2 - the other instance to merge
c1 - the confidence object corresponding to i1
c2 - the confidence object corresponding to i2
newInstance - the Instance built so far from merging i1 and i2
newConfidence - the Confidence object built so far for newInstance
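The selection rule can be sketched in Java with field-keyed maps standing in for the KAON Instance and the Confidence class; SetOneFieldSketch is a hypothetical name. Falling back to whichever instance actually has a value, when only one of them does, is an added assumption.

```java
import java.util.Map;

final class SetOneFieldSketch {

    // Hypothetical model (assumption): Instances and Confidences are field-keyed maps.
    // Keeps the value with the higher confidence; ties go to i1.
    static void setOneField(String field,
                            Map<String, String> v1, Map<String, Double> c1,
                            Map<String, String> v2, Map<String, Double> c2,
                            Map<String, String> newValues, Map<String, Double> newConfidence) {
        String a = v1.get(field);
        String b = v2.get(field);
        if (a == null && b == null) {
            return; // neither instance has this field
        }
        double confA = c1.getOrDefault(field, 0.0);
        double confB = c2.getOrDefault(field, 0.0);
        // Use i1 if i2 has no value, or if i1 has a value with confidence >= i2's.
        boolean useI1 = b == null || (a != null && confA >= confB);
        newValues.put(field, useI1 ? a : b);
        newConfidence.put(field, useI1 ? confA : confB);
    }
}
```

With equal confidences, i1's value is kept; if i2's confidence is strictly higher, i2's value and confidence are stored instead.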

getRank

protected double getRank(edu.unika.aifb.kaon.Instance i,
                         Confidence c)
Rank of an instance is the sum of the confidences for each field, as specified by the Confidence object. Assuming all confidences are positive, this also rewards Instances with more fields filled in. It also includes the penalty and globalRanking of the Confidence object, which account for conflict penalties from previous mergers if this instance resulted from one. Fields specified by ignoreField are not included in the calculation.

Specified by:
getRank in class AbstractInstanceMerger
Parameters:
i - the Instance to rank
c - the Confidence object describing the Instance
Returns:
the rank of the Instance
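The ranking rule can be sketched as below. The field-confidence map stands in for the Confidence object, RankSketch is a hypothetical name, and combining the terms as sum + globalRanking - penalty is an assumption: the documentation says only that the penalty and globalRanking are included, not exactly how.

```java
import java.util.Map;
import java.util.Set;

final class RankSketch {

    // Hypothetical model (assumption): fieldConfidence maps field name -> confidence.
    // Sums the confidences of non-ignored fields, then applies the Confidence
    // object's globalRanking and accumulated penalty (combination is assumed).
    static double rank(Map<String, Double> fieldConfidence, Set<String> ignored,
                       double globalRanking, double penalty) {
        double sum = 0.0;
        for (Map.Entry<String, Double> e : fieldConfidence.entrySet()) {
            if (!ignored.contains(e.getKey())) {
                sum += e.getValue(); // more filled-in fields -> higher rank
            }
        }
        return sum + globalRanking - penalty;
    }
}
```

With confidences Name 0.9, Address 0.7 and Phone 0.5, Address ignored, a globalRanking of 0.2 and a penalty of 0.5, the sketch returns approximately 1.1.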

getMergedRank

protected double getMergedRank(edu.unika.aifb.kaon.Instance i1,
                               edu.unika.aifb.kaon.Instance i2,
                               Confidence c1,
                               Confidence c2)
The rank of a merge is the rank of the Instance resulting from the union of the fields of the two input Instances. If there are conflicting fields, then the higher-ranked field wins, but a uniform conflict penalty is assessed, as specified by conflictPenalty. Fields specified by ignoreField are not included in the calculation.

Past conflictPenalties are taken into account and are reflected in the current merged rank as well. However, there is no memory of what particular conflicting fields caused the previous conflictPenalties, so it is possible that a given field can cause several penalties over the course of several mergers.

Specified by:
getMergedRank in class AbstractInstanceMerger
Parameters:
i1 - one of the Instances
i2 - the other Instance
c1 - the Confidence object corresponding to i1
c2 - the Confidence object corresponding to i2
Returns:
the rank of the resulting Instance if i1 and i2 were merged, or -1 if the Instances are not mergeable
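The merged-rank computation for a single merge can be sketched with field-keyed maps in place of Instance and Confidence; MergedRankSketch is a hypothetical name. The sketch keeps the higher confidence per field and subtracts the uniform conflict penalty once per conflicting, non-suppressed field. The merge penalty, the vacuous-merge penalty, and penalties carried over from earlier merges are deliberately left out, and keeping the higher confidence when both values agree is an assumption.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class MergedRankSketch {

    // Hypothetical model (assumption): field -> value and field -> confidence maps
    // stand in for a KAON Instance and its Confidence object.
    static double mergedRank(Map<String, String> v1, Map<String, Double> c1,
                             Map<String, String> v2, Map<String, Double> c2,
                             Set<String> ignored, Set<String> suppressed,
                             double conflictPenalty) {
        Set<String> fields = new HashSet<>(v1.keySet());
        fields.addAll(v2.keySet()); // union of the fields of both instances
        double rank = 0.0;
        for (String f : fields) {
            if (ignored.contains(f)) {
                continue; // fields passed to ignoreField do not affect the rank
            }
            // The higher-confidence value wins (confidences assumed non-negative).
            rank += Math.max(c1.getOrDefault(f, 0.0), c2.getOrDefault(f, 0.0));
            boolean conflict = v1.containsKey(f) && v2.containsKey(f)
                    && !v1.get(f).equals(v2.get(f));
            if (conflict && !suppressed.contains(f)) {
                rank -= conflictPenalty; // uniform penalty per conflicting field
            }
        }
        return rank;
    }
}
```

For two instances that agree on Name, conflict on City, and where only i2 has a Phone value, the City conflict costs one conflictPenalty unless suppressConflictPenalty was called for City.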

merge

protected edu.unika.aifb.kaon.Instance merge(edu.unika.aifb.kaon.Instance i1,
                                             edu.unika.aifb.kaon.Instance i2,
                                             Confidence c1,
                                             Confidence c2,
                                             Confidence resultingConfidence)
Merges two Instances as described by getMergedRank. Fields stored in ignoredFields are handled by reconcileIgnoredFields. This method does not modify either input instance or either input confidence. It forces the merge regardless of how poor the resulting instance is, so getMergedRank should usually be called first to determine whether the merge is reasonable.

Specified by:
merge in class AbstractInstanceMerger
Parameters:
i1 - the first instance to merge
i2 - the other instance to merge
c1 - the confidence object corresponding to i1
c2 - the confidence object corresponding to i2
resultingConfidence - an instantiated but empty Confidence object that will store the confidence information of the resulting merged instance
Returns:
the Instance resulting from merging i1 and i2, or null if they can't be merged

getBestInstance

public edu.unika.aifb.kaon.Instance getBestInstance(edu.unika.aifb.kaon.Instance[] instances,
                                                    Confidence[] confidences)
Calls getMergedInstances and returns the best resulting instance. This is equivalent to calling getMergedInstances and taking the first element of the instances Vector afterward.

Parameters:
instances - an array of instances
confidences - an array of confidences indexed parallel to the array of instances
Returns:
the best (possibly merged) instance

getMergedInstances

public void getMergedInstances(Vector instances,
                               Vector confidences)
Finds the best Instance according to getRank, then finds the best merger with that Instance, provided that the resulting Instance does not score lower than the original best Instance. It keeps merging with other Instances as long as the score does not go down. The same process is then applied to the next-best unmerged Instance, and so on, until no productive mergers remain. The resulting merged Instances and any leftover unmerged Instances are returned as the only elements of the input instances Vector, and their confidences are returned in the parallel confidences Vector.

Parameters:
instances - a Vector of Instances to be merged; during the run of this method, this Vector is cleared, and when the method returns, the Vector will contain the resulting merged and leftover Instances in decreasing rank order
confidences - a Vector of Confidences parallel to the instances that will be treated the same way as the instances Vector and remain parallel to its contents.
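The greedy strategy described above can be sketched generically. This is an illustration, not the library's implementation: GreedyMergeSketch.mergeAll is a hypothetical method that takes the ranking, merged-ranking and merge operations as functions, works on a copy of a List rather than mutating parallel Vectors, and interprets "the score doesn't go down" as mergedRank >= rank.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.BinaryOperator;
import java.util.function.ToDoubleBiFunction;
import java.util.function.ToDoubleFunction;

final class GreedyMergeSketch {

    // Hypothetical generic version of the documented greedy loop.
    static <T> List<T> mergeAll(List<T> input, ToDoubleFunction<T> rank,
                                ToDoubleBiFunction<T, T> mergedRank, BinaryOperator<T> merge) {
        List<T> items = new ArrayList<>(input);
        items.sort(Comparator.comparingDouble(rank).reversed()); // best first
        for (int i = 0; i < items.size(); i++) {
            while (true) {
                // Find the best merge partner among the lower-ranked items.
                int best = -1;
                double bestRank = Double.NEGATIVE_INFINITY;
                for (int j = i + 1; j < items.size(); j++) {
                    double r = mergedRank.applyAsDouble(items.get(i), items.get(j));
                    if (r > bestRank) {
                        bestRank = r;
                        best = j;
                    }
                }
                // Stop when no partner exists or the best merge would lower the score.
                if (best < 0 || bestRank < rank.applyAsDouble(items.get(i))) {
                    break;
                }
                T merged = merge.apply(items.get(i), items.get(best));
                items.remove(best);   // remove by index; the partner is consumed
                items.set(i, merged); // keep merging into the same slot
            }
        }
        items.sort(Comparator.comparingDouble(rank).reversed()); // decreasing rank
        return items;
    }
}
```

With sets of strings as toy instances (rank = set size, merged rank = combined size minus 0.25 when the sets are disjoint, otherwise -1), the input {a, b}, {c}, {a, d} becomes {a, b, c}, {a, d}.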

sortInstances

protected void sortInstances(Vector instances,
                             Vector confidences)
Sorts the instances Vector by rank and sorts the confidences in parallel. This uses a sort with O(n^2) worst-case running time in order to keep memory use low; in practice, n should be small (fewer than about 20).
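An in-place O(n^2) sort that keeps two lists parallel can be sketched as a selection sort that swaps both lists in lockstep. Selection sort is chosen here only because it matches the documented worst case and needs no extra memory; the documentation does not say which quadratic sort GenericMerger uses, and ParallelSortSketch is a hypothetical name.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.ToDoubleBiFunction;

final class ParallelSortSketch {

    // In-place selection sort, highest rank first; confidences are swapped in
    // lockstep so that index k still pairs instances.get(k) with confidences.get(k).
    static <I, C> void sortParallel(List<I> instances, List<C> confidences,
                                    ToDoubleBiFunction<I, C> rank) {
        int n = instances.size();
        for (int i = 0; i < n - 1; i++) {
            int best = i;
            double bestRank = rank.applyAsDouble(instances.get(i), confidences.get(i));
            for (int j = i + 1; j < n; j++) {
                double r = rank.applyAsDouble(instances.get(j), confidences.get(j));
                if (r > bestRank) {
                    best = j;
                    bestRank = r;
                }
            }
            Collections.swap(instances, i, best);
            Collections.swap(confidences, i, best);
        }
    }
}
```

Sorting instances "low", "high", "mid" with confidences 0.1, 0.9, 0.5 (using the confidence as the rank) yields "high", "mid", "low" with 0.9, 0.5, 0.1.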


mergeInstances

protected boolean mergeInstances(int index,
                                 Vector instances,
                                 Vector confidences)
Given the index of the starting instance, compares all subsequent instances, looking for the best merge; returns false if no merge is found.



Stanford NLP Group