5. xml files
Warning
5.1. xml structure
<employees>
is the root tag.<office_worker>
is a child tag of the root tag.<firstName>
is a child tag of the office_worker tag.<?xml version="1.0" encoding="UTF-8" ?>
<employees>
<office_worker>
<firstName>John</firstName>
<lastName>Doe</lastName>
<gender>Male</gender>
</office_worker>
<office_worker>
<firstName>Peter</firstName>
<lastName>Jones</lastName>
<gender>Male</gender>
</office_worker>
<writer>
<firstName>Anna</firstName>
<lastName>Smith</lastName>
<gender>Female</gender>
</writer>
</employees>
5.2. Parse xml file
import xml.etree.ElementTree as ET
ET.parse
to parse an XML source into an element tree.- ET.parse(source, parser=None)
- Parameters:
source – a filename or file object containing XML data.
parser – an optional parser instance. If not given, the standard XMLParser parser is used.
Returns an ElementTree instance.
getroot()
to get the root of the element tree object.- ET.getroot()
- Returns the root element for the tree, ET.
- ET.findall(match, namespaces=None)
- Parameters:
match – may be a tag name or a path
namespaces – is an optional mapping from namespace prefix to full name. Pass ‘’ as prefix to move all unprefixed tag names in the expression into the given namespace.
Finds all matching subelements, by tag name or path, starting at the root of the tree.Returns a list containing all matching elements in document order.
tree = ET.parse(xml_file_path)
and parses it into the variable tree
.found_elememnts = tree.findall("office_worker")
is used to get a list containing the office_worker data as a list of element objects.for elem in found_elememnts:
iterates over each office_worker list of elements so the text of each element can be obtained.elem.find("firstName").text
import xml.etree.ElementTree as ET
xml_file_path = "files/employees.xml"
tree = ET.parse(xml_file_path)
employee_type = "office_worker"
found_elememnts = tree.findall(employee_type)
for elem in found_elememnts:
f_name = elem.find("firstName").text
l_name = elem.find("lastName").text
gen = elem.find("gender").text
print(f'{f_name} {l_name} is a {gen.lower()} {employee_type}')
employee_type = "writer"
found_elememnts = tree.findall(employee_type)
for elem in found_elememnts:
f_name = elem.find("firstName").text
l_name = elem.find("lastName").text
gen = elem.find("gender").text
print(f'{f_name} {l_name} is a {gen.lower()} {employee_type}')
John Doe is a male office_worker
Peter Jones is a male office_worker
Anna Smith is a female writer
Tasks
Avoid code duplication above by using a list of employee types and iterating over them.
Avoid code duplication above by using a list of employee types and iterating over them.
import xml.etree.ElementTree as ET
xml_file_path = "files/employees.xml"
tree = ET.parse(xml_file_path)
employee_types = ["office_worker", "writer"]
for employee_type in employee_types:
found_elememnts = tree.findall(employee_type)
for elem in found_elememnts:
f_name = elem.find("firstName").text
l_name = elem.find("lastName").text
gen = elem.find("gender").text
print(f'{f_name} {l_name} is a {gen.lower()} {employee_type}')
5.3. Edit tag text
root[1][0].text
.<?xml version="1.0" encoding="UTF-8" ?>
<employees>
...
<office_worker>
<firstName>Peter</firstName>
...
</office_worker>
</employees>
import xml.etree.ElementTree as ET
xml_file_path = "files/employees.xml"
xml_file_path2 = "files/employees2.xml"
tree = ET.parse(xml_file_path)
root = tree.getroot()
root[2][0].text = "Pete"
tree.write(xml_file_path2)
5.4. Fixing indenting
- ET.indent(tree, space=' ', level=0)
- Parameters:
tree – can be an Element or ElementTree.
space – space is the whitespace string that will be inserted for each indentation level, two space characters by default
level – For indenting partial subtrees inside of an already indented tree, pass the initial indentation level as level.
Appends whitespace to the subtree to indent the tree visually. This can be used to generate pretty-printed XML output.
5.5. New Elements
- ET.Element(tag, attrib={}, **extra)
- Parameters:
tag – tag is the element name
attrib – attrib is an optional dictionary, containing element attributes.
extra – extra contains additional attributes, given as keyword arguments.
Appends whitespace to the subtree to indent the tree visually. This can be used to generate pretty-printed XML output.Just use:ET.Element(tag)
.
5.6. Append element
employer4 = ET.Element("writer")
creates an element for the new employee.child = ET.Element("firstName")
creates an element for the firstName.child.text = "Jane"
adds the text value to firstName element.employer4.append(child)
appends the firstName element to the employer4 element.import xml.etree.ElementTree as ET
xml_file_path = "files/employees.xml"
xml_file_path2 = "files/employees2.xml"
tree = ET.parse(xml_file_path)
root = tree.getroot()
employer4 = ET.Element("writer")
child = ET.Element("firstName")
child.text = "Jane"
employer4.append(child)
child = ET.Element("lastName")
child.text = "Austin"
employer4.append(child)
child = ET.Element("gender")
child.text = "Female"
employer4.append(child)
root.append(employer4)
ET.indent(tree, space=' ', level=0)
tree.write(xml_file_path2)
<writer>
<firstName>Jane</firstName>
<lastName>Austin</lastName>
<gender>Female</gender>
</writer>
Tasks
Write a definition to append the employee data from a python dictionary to the xml and use it to add the employee dictionary: {“firstName”:”Jane”,”lastName”:”Austin”,”gender”:”Female”}
Write a definition to append the employee data from a python dictionary to the xml and use it to add the employee dictionary: {“firstName”:”Jane”,”lastName”:”Austin”,”gender”:”Female”} using a tag of “writer”
import xml.etree.ElementTree as ET
xml_file_path = "files/employees.xml"
xml_file_path2 = "files/employees2.xml"
tree = ET.parse(xml_file_path)
root = tree.getroot()
def append_dict_to_xml(tag, d):
# create new element
elem = ET.Element(tag)
for key, val in d.items():
# create child element
child = ET.Element(key)
child.text = val
elem.append(child)
return elem
emp4_dict = {"firstName":"Jane","lastName":"Austin","gender":"Female"}
tag = "writer"
emp4_xml = append_dict_to_xml(tag, emp4_dict)
root.append(emp4_xml)
ET.indent(tree, space=' ', level=0)
tree.write(xml_file_path2)
<employees>
<office_worker>
<firstName>John</firstName>
<lastName>Doe</lastName>
<gender>Male</gender>
</office_worker>
<office_worker>
<firstName>Peter</firstName>
<lastName>Jones</lastName>
<gender>Male</gender>
</office_worker>
<writer>
<firstName>Anna</firstName>
<lastName>Smith</lastName>
<gender>Female</gender>
</writer>
<writer>
<firstName>Jane</firstName>
<lastName>Austin</lastName>
<gender>Female</gender>
</writer>
</employees>