Generate

This module provides the functionality to generate synthetic records.

Before using this module you must have already:

  • Created a config

  • Trained a model

gretel_synthetics.generate.PredString

alias of gretel_synthetics.generate.pred_string

class gretel_synthetics.generate.gen_text(valid: bool = None, text: str = None, explain: str = None, delimiter: str = None)

A record that is yielded from the generate_text generator.

valid

True, False, or None. If the line passed a validation function, then this will be True. If the validation function raised an exception then this will be automatically set to False. If no validation function is used, then this value will be None.

text

The actual record as a string

explain

A string that describes why a record failed validation. This is the string representation of the Exception that is thrown in a validation function. This will only be set if validation fails, otherwise will be None.

delimiter

If the generated text consists of column/field-based records, this holds the delimiter used to separate the fields from each other.

as_dict() → dict

Serialize the generated record to a dictionary

values_as_list() → List[str]

Attempt to split the generated text on the provided delimiter

Returns

A list of values that are separated by the object’s delimiter, or None if there is no delimiter in the text
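The shape of a yielded record can be illustrated with a small stand-in class. This is a minimal sketch, not the library's implementation: GenTextSketch is a hypothetical name, and treating "no delimiter" as "the delimiter attribute is unset" is an assumption about how values_as_list() decides to return None.

```python
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class GenTextSketch:
    """Hypothetical stand-in that mirrors the documented gen_text fields."""

    valid: Optional[bool] = None
    text: Optional[str] = None
    explain: Optional[str] = None
    delimiter: Optional[str] = None

    def as_dict(self) -> dict:
        # Serialize all four fields to a plain dictionary
        return asdict(self)

    def values_as_list(self) -> Optional[List[str]]:
        # Assumption: return None when there is nothing to split on
        if self.delimiter is None or self.text is None:
            return None
        return self.text.split(self.delimiter)


record = GenTextSketch(valid=True, text="a,b,c", delimiter=",")
fields = record.values_as_list()   # split on the record's delimiter
as_mapping = record.as_dict()      # dictionary with the four field names
no_fields = GenTextSketch(text="abc").values_as_list()  # no delimiter set
```

With the real gen_text object the same two calls apply to each record yielded by generate_text.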

gretel_synthetics.generate.generate_text(config: BaseConfig, start_string: str = '<n>', line_validator: Callable = None, max_invalid: int = 1000, num_lines: int = None)

A generator that will load a model and start creating records.

Parameters
  • config – A configuration object, which you must have created previously

  • start_string – A prefix string that is used to seed the record generation. By default we use a newline (the '<n>' value shown in the signature), but you may substitute any initial value here, which will influence what the generator predicts.

  • line_validator – An optional callback validator function that takes the raw string value from the generator as a single argument. This validator can execute arbitrary code with the raw string value. The validator function may return a bool to indicate line validity. This boolean value will be set on the yielded gen_text object. Additionally, if the validator throws an exception, the gen_text object will be marked as having failed validation. If the validator returns None, we will assume successful validation.

  • max_invalid – If using a line_validator, this is the maximum number of invalid lines to generate. If the number of invalid lines exceeds this value, a RuntimeError will be raised.

  • num_lines – If not None, this will override the gen_lines value that is provided in the config

Simple validator example:

def my_validator(raw_line: str):
    parts = raw_line.split(',')
    if len(parts) != 5:
        raise Exception('record does not have 5 fields')
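The rules for how a validator's outcome ends up on the yielded record can be sketched as a small helper. This is an illustration of the documented behavior, not the library's code: apply_validator is a hypothetical name, and leaving explain as None when the validator returns False without raising is an assumption.

```python
from typing import Callable, Optional, Tuple


def apply_validator(
    raw_line: str,
    validator: Optional[Callable[[str], Optional[bool]]],
) -> Tuple[Optional[bool], Optional[str]]:
    """Hypothetical helper returning (valid, explain) per the documented rules."""
    if validator is None:
        # No validator supplied: validity is unknown
        return None, None
    try:
        result = validator(raw_line)
    except Exception as err:
        # An exception marks the line invalid; its string form is the explanation
        return False, str(err)
    if result is None:
        # Returning None is treated as successful validation
        return True, None
    return bool(result), None


def my_validator(raw_line: str):
    parts = raw_line.split(',')
    if len(parts) != 5:
        raise Exception('record does not have 5 fields')


ok = apply_validator("1,2,3,4,5", my_validator)
bad = apply_validator("1,2,3", my_validator)
unchecked = apply_validator("1,2,3", None)
```

Here ok carries a True validity flag, bad carries False together with the exception message, and unchecked carries None for both values.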

Note

gen_lines from the config is important for this function. If a line validator is not provided, each line will count towards the number of total generated lines. When the total lines generated is >= gen_lines we stop. If a line validator is provided, only valid lines will count towards the total number of lines generated. When the total number of valid lines generated is >= gen_lines, we stop.
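The counting rule can be sketched with a stand-in generator that consumes pre-made lines instead of a trained model. The names take_lines and has_comma are hypothetical, and raising before yielding the line that exceeds max_invalid is an assumption; the real generator may order these steps differently.

```python
import itertools
from typing import Callable, Iterable, Iterator, Optional, Tuple


def take_lines(
    raw_lines: Iterable[str],
    gen_lines: int,
    line_validator: Optional[Callable[[str], Optional[bool]]] = None,
    max_invalid: int = 1000,
) -> Iterator[Tuple[Optional[bool], str]]:
    """Hypothetical sketch: yield (valid, line) until gen_lines lines are counted."""
    source = iter(raw_lines)
    counted = 0
    invalid = 0
    while counted < gen_lines:
        raw = next(source, None)
        if raw is None:
            return  # the stand-in source ran out of lines
        if line_validator is None:
            counted += 1  # without a validator, every line counts
            yield None, raw
            continue
        try:
            result = line_validator(raw)
            valid = True if result is None else bool(result)
        except Exception:
            valid = False
        if valid:
            counted += 1  # with a validator, only valid lines count
        else:
            invalid += 1
            if invalid > max_invalid:
                raise RuntimeError("too many invalid lines")
        yield valid, raw


def has_comma(raw_line: str) -> bool:
    return "," in raw_line


validated = list(take_lines(itertools.cycle(["a,b", "bad", "c,d"]), 3, has_comma))
unvalidated = list(take_lines(itertools.cycle(["a,b", "bad", "c,d"]), 3))
```

With the validator, four records are yielded because the invalid line does not count towards the three requested; without it, exactly three records are yielded.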

Note

gen_chars controls the maximum number of characters a single generated line can have. If a newline character has not been generated before reaching this number, then the line will be returned. For example, if gen_chars is 180 and a newline has not been generated, the line will be returned once 180 characters have been created, no matter what. If this value is 0, then each line will keep generating until a newline is observed.
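The cutoff rule can be sketched on a plain character stream. This is only an illustration of the behavior described in the note: read_line is a hypothetical helper, and the real generator produces characters from a model rather than from a fixed string.

```python
from typing import Iterable


def read_line(chars: Iterable[str], gen_chars: int) -> str:
    """Hypothetical sketch: collect characters until a newline or the gen_chars cap."""
    buf = []
    for ch in chars:
        if ch == "\n":
            break  # a newline always ends the line
        buf.append(ch)
        if gen_chars > 0 and len(buf) >= gen_chars:
            break  # cap reached before any newline appeared
    return "".join(buf)


capped = read_line("abcdefgh\nxyz", gen_chars=5)
natural = read_line("abc\nxyz", gen_chars=5)
uncapped = read_line("abcdefgh\nxyz", gen_chars=0)
```

In this sketch capped is cut off at five characters, natural ends at the newline, and uncapped runs to the newline because a gen_chars of 0 disables the cap.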

Yields

A gen_text object for each record that is generated. The generator will stop after the max number of lines is reached (based on your config).

Raises

A RuntimeError if more than max_invalid invalid lines are generated