iode.Comments.convert_file

classmethod Comments.convert_file(input_file: str | Path, input_format: str | ImportFormats, save_file: str | Path, rule_file: str | Path, lang: str | TableLang = TableLang.ENGLISH, debug_file: str | Path = None)[source]

Convert an external file representing IODE comments to an IODE comments file (.cmt). The possible formats for the input file are:

  • Ascii: IODE-specific Ascii format for objects

  • Rotated Ascii: Ascii format for variables with series in columns

  • DIF: DIF format (Data Interchange Format)

  • DIF Belgostat: (old) exchange format specific to Belgostat

  • NIS: National Institute of Statistics Ascii format (old)

  • GEM: Ascii format of Chronos software

  • PRN-Aremos: Ascii format from Aremos software

  • TXT Belgostat: (old) Belgostat-specific exchange format

The rule file is a simple text file contains the rules for:

  • selecting the objects to be imported

  • determining the objects names.

Each rule consists of two fields:

  • the selection pattern, containing a description of the names concerned by the rule. This mask is defined in the same way as the search() method.

  • the transcoding algorithm for the names, which can contain : - + : indicates that the character must be included in the name - - : indicates that the character should be skipped - any other character: included in the name

Example:

B* C+-+ -> transforms B1234 into CB2, BCDEF into CBE, etc *X ++++++++++ -> keeps names ending in X unchanged * ++++++++++ -> keeps all names unchanged

Parameters:
input_filestr or Path

The path to the input file to be converted.

input_formatstr or ImportFormats

The format of the input file. Possible formats are ASCII, ROT_ASCII (Rotated Ascii), DIF, BISTEL, NIS, GEM, PRN, TXT (TXT Belgostat).

save_filestr or Path

The path to the output file where the IODE comments will be saved.

rule_filestr or Path

The path to the rule file that defines the selection and transcoding rules.

langstr or TableLang, optional

The language of the extracted comments. It is only used when a text appears in several languages in the input file. Currently, only the Belgostat DIF format uses this value, allowing you to select the language of the extracted comments. Default is ENGLISH.

debug_filestr or Path, optional

The path to the debug file where the debug information will be saved. If not provided, the debug information will be printed to the console.

Examples

>>> from pathlib import Path
>>> from iode import SAMPLE_DATA_DIR, comments, ImportFormats
>>> output_dir = getfixture('tmp_path')
>>> input_file = f"{SAMPLE_DATA_DIR}/fun_xode.ac.ref"
>>> input_format = ImportFormats.ASCII
>>> save_file = str(output_dir / "imported_cmt.cmt")
>>> rule_file = f"{SAMPLE_DATA_DIR}/rules.txt"
>>> debug_file = str(output_dir / "debug.log")
>>> # print rules
>>> with open(rule_file, "r") as f:
...     print(f.read())
...
AC*  KK_--+++++++++++++
*U   UU_++++++++++++++++
>>> # get list of comments with a name starting with 'AC' 
>>> # and ending with 'U' from the input file
>>> with open(input_file, "r") as f:
...     for line in f:
...         name = line.split(" ")[0]
...         if name.startswith("AC") or name.endswith("U"):
...             print(line.strip())
...
ACAF "Ondernemingen: ontvangen kapitaaloverdrachten."
ACAG "Totale overheid: netto ontvangen kapitaaloverdrachten."
DPU "Nominale afschrijvingen op de kapitaalvoorraad."
DPUU "Nominale afschrijvingen op de kapitaalvoorraad (aangepast: inkomensoptiek)."
IFU "Bruto kapitaalvorming: ondernemingen."
IHU "Bruto kapitaalvorming: gezinnen."
WBU "Totale loonmassa (inclusief werkgeversbijdragen)."
>>> # import comments from input_file to save_file
>>> # using the rules defined in rule_file
>>> comments.convert_file(input_file, input_format, save_file, rule_file, 'E', debug_file)
Reading object 1 : KK_AF
Reading object 2 : KK_AG
Reading object 3 : UU_DPU
Reading object 4 : UU_DPUU
Reading object 5 : UU_IFU
Reading object 6 : UU_IHU
Reading object 7 : UU_WBU
7 objects saved
>>> # check content of the saved file
>>> comments.load(save_file)
Loading ...\imported_cmt.cmt
7 objects loaded
>>> comments
Workspace: Comments
nb comments: 7
filename: ...\imported_cmt.cmt

  name                                        comments
KK_AF       Ondernemingen: ontvangen kapitaaloverdrachten.
KK_AG       Totale overheid: netto ontvangen kapitaaloverdrachten.
UU_DPU      Nominale afschrijvingen op de kapitaalvoorraad.
UU_DPUU     Nominale afschrijvingen op de kapitaalvoorraad (aangepast: inkomensoptiek).
UU_IFU      Bruto kapitaalvorming: ondernemingen.
UU_IHU      Bruto kapitaalvorming: gezinnen.
UU_WBU      Totale loonmassa (inclusief werkgeversbijdragen).

>>> # content of the debug file
>>> with open(debug_file, "r") as f:
...     for line in f:
...         print(line.strip())
...
ACAF -> KK_AF       (Rule KK_--+++++++++++++)
ACAG -> KK_AG       (Rule KK_--+++++++++++++)
DPU -> UU_DPU       (Rule UU_++++++++++++++++)
DPUU -> UU_DPUU     (Rule UU_++++++++++++++++)
IFU -> UU_IFU       (Rule UU_++++++++++++++++)
IHU -> UU_IHU       (Rule UU_++++++++++++++++)
WBU -> UU_WBU       (Rule UU_++++++++++++++++)