DWCA Module#
Reading a DwCA file#
A Darwin Core Archive file is a compress zip
or tar.gz
file containing the component files in a star scheme and the descriptor files. Whole documentation of Darwin Core in the text guide.
Assuming the archive is inside a DwCArchive.zip
file containing:
meta.xml
: Mandatory descriptor file.eml.xml
: Metadata file.taxon.txt
: Core component.identifier.txt
: Identifier extension component.speciesprofile.txt
: Species profile extension component.reference.txt
: Reference extension component.
All of these files must be specified in the meta.xml
file as described in the text guide, section metafile content.
To read this file you can:
from dwca import DarwinCoreArchive
darwin_core = DarwinCoreArchive.from_archive("DwCArchive.zip")
To check the components on it:
print(darwin_core.core)
Core:
class: Taxon
filename: taxon.txt
content: 163461 entries
The extensions component are stores in a Python list
and can be access in the same way:
print(len(darwin_core.extension))
3
Check the first one:
print(darwin_core.extension[0])
Extension:
class: SpeciesProfile
filename: speciesprofile.txt
content: 153622 entries
And you can work with this data as an array of Python objects
, as numpy arrays` or as ``pandas DataFrames
darwin_core.core.data
[<Taxon urn:lsid:example.org:taxname:1>, <Taxon urn:lsid:example.org:taxname:2>, ...]
darwin_core.core.data.as_pandas()
Pending...
Writing a DwCA archive#
To generate a new Darwin Core Archive file you can use the same class and build that starting point:
from dwca import DarwinCoreArchive
from eml.resources import EMLResource
from eml.types import ResponsibleParty, IndividualName
# Define the metadata file future location
darwin_core = DarwinCoreArchive(metadata="eml.xml")
The guidelines suggest to add a metadata file in a standardized form. Alternatives suggest EML (Ecological Metadata Language), FGDC (Federal Geographic Data Committee) or ISO 19115.
For this package, we implemented EML support (Next section) for the metadata, and can be added and worked like this:
darwin_core.metadata.initialize_resource(
"Example for Darwin Core Archive",
ResponsibleParty(
individual_name=IndividualName(
_id="1"
last_name="Doe",
first_name="John",
salutation="Mr."
)
),
contact=[ResponsibleParty(_id="1", referencing=True)]
)
# Add core data
darwin_core.set_core("taxon.txt")
# Add an extension
darwin_core.add_extension("identifier.txt")
# Write the archive
with open("example.zip", "wb") as example_file:
darwin_core.to_file(example_file)
There are other ways to add data. Check the whole documentation for more information.