652. Find Duplicate Subtrees

Problem Statement

Pattern:

O(n^2) Solution

List<TreeNode> res;
HashMap<String, Integer> serialCount;
 
// post order serialise
String pos(TreeNode root){
	if(root == null) return "#";
	
	// serialise curr subtree
	String serial = root.val + "," + pos(root.left) + pos(root.right);
	
	// update serialCount and res
	int freq = serialCount.getOrDefault(serial, 0);
	serialCount.put(serial, ++freq);
	if (freq == 2) res.add(root);
	
	return serial;
}
 
public List<TreeNode> findDuplicateSubtrees (TreeNode root){
	res = new LinkedList<>();
	serialCount = new HashMap<>();
	pos(root);
	return res;
}

Notes

For every node, serialise its subtree convert its root, left, right values into a string. and save this serials frequency in a hashmap.
since we only have to return a single instance of duplicate subtrees, check if freq == 2, and add to result
effectively postorder traverse and serialise

Now, Why O(N^2)

at the bottom left most node/subtree , the size of the serial string is 1
at the root the size of the serial string is > n where n is number of nodes.
why does this matter?
because concatenation, and hashing operations complexity increases linearly with the size of the string
so at the root the concatenation has to concatenate a string of characters for all nodes = n, its child n-1, for its child n-2, so on and so forth
therefore O(n^2)

O(n) Solution

if we could instead of appending all the values, could somehow ‘cache’ the serial, effectively assign the serial an id, so we didnt have to concatenate, and append these serial s over and over again, and we could simply identify , concatenate and hash a serial based off if its id

int uniqueSerials = 1;
HashMap<String, Integer> serialID;
HashMap<Integer, Integer> idCount;
List<TreeNode> res;
 
int getSerial(String serial) {
	// get serialID if exists
	if (serialID.containsKey(serial)) return serialID.get(serial);
	// else get unique serial
	serialID.put(serial, ++uniqueSerials);
	return uniqueSerials;
}
 
// post order serialise
int pos(TreeNode root) {
	if (root == null) return 0;
	
	// serialise curr subtree
	String serial = pos(root.left) + "," + root.val + "," + pos(root.right);
	
	// get an ID for serial
	int srID = getSerial(serial);
	
	// update idCount and res
	int freq = idCount.getOrDefault(srID, 0);
	idCount.put(srID, ++freq);
	if (freq == 2) res.add(root);
	
	return srID;
}
 
public List<TreeNode> findDuplicateSubtrees(TreeNode root) {
	serialID = new HashMap<>();
	idCount = new HashMap<>();
	res = new LinkedList<>();
	pos(root);
	return res;
}

In the first approach, imagine that from left side you got a string like “2,3,4,5” etc while from right you got “7,8,9,10”, then you will be concatenating 4 + 4 operations(excluding commas and if you include them its even more).
Whereas with ids, this is always bounded to an integer at most from left side you will get 10 chars(-2147483647 or 2147483647) and from right you will get again max 10 chars. So at most your String concatenation will be 10 chars + 10 chars.

sidequests

Table of Contents

652. Find Duplicate Subtrees

Problem Statement

O(n^2) Solution

Notes

O(n) Solution

Graph

Backlinks