example.assets.example_components package

This module is auto-generated by azure-ml-component.

Assets included:

file:../../../../../components/**/*.yaml
azureml://feeds/azureml
azureml://feeds/huggingface

class example.assets.example_components.Datasets

Bases: object

property acronym_identification_default: Huggingface acronym_identification-default dataset

property ade_corpus_v2_ade_corpus_v2_classification: Huggingface ade_corpus_v2-Ade_corpus_v2_classification dataset

property ade_corpus_v2_ade_corpus_v2_drug_ade_relation: Huggingface ade_corpus_v2-Ade_corpus_v2_drug_ade_relation dataset

property ade_corpus_v2_ade_corpus_v2_drug_dosage_relation: Huggingface ade_corpus_v2-Ade_corpus_v2_drug_dosage_relation dataset

property adult_census_income_binary_classification_dataset: Census Income dataset

property adversarial_qa_adversarialqa: Huggingface adversarial_qa-adversarialQA dataset

property adversarial_qa_dbert: Huggingface adversarial_qa-dbert dataset

property adversarial_qa_dbidaf: Huggingface adversarial_qa-dbidaf dataset

property adversarial_qa_droberta: Huggingface adversarial_qa-droberta dataset

property aeslc_default: Huggingface aeslc-default dataset

property afrikaans_ner_corpus_afrikaans_ner_corpus: Huggingface afrikaans_ner_corpus-afrikaans_ner_corpus dataset

property ag_news_default: Huggingface ag_news-default dataset

property ai2_arc_arc_challenge: Huggingface ai2_arc-ARC-Challenge dataset

property ai2_arc_arc_easy: Huggingface ai2_arc-ARC-Easy dataset

property air_dialogue_air_dialogue_data: Huggingface air_dialogue-air_dialogue_data dataset

property air_dialogue_air_dialogue_kb: Huggingface air_dialogue-air_dialogue_kb dataset

property akhooli_gpt2_small_arabic: Huggingface akhooli/gpt2-small-arabic model

property akhooli_gpt2_small_arabic_poetry: Huggingface akhooli/gpt2-small-arabic-poetry model

property allegro_reviews_default: Huggingface allegro_reviews-default dataset

property allocine_allocine: Huggingface allocine-allocine dataset

property alt_alt_km: Huggingface alt-alt-km dataset

property alt_alt_my: Huggingface alt-alt-my dataset

property alt_alt_my_transliteration: Huggingface alt-alt-my-transliteration dataset

property alt_alt_my_west_transliteration: Huggingface alt-alt-my-west-transliteration dataset

property alt_alt_parallel: Huggingface alt-alt-parallel dataset

property alvaroalon2_biobert_chemical_ner: Huggingface alvaroalon2/biobert_chemical_ner model

property alvaroalon2_biobert_diseases_ner: Huggingface alvaroalon2/biobert_diseases_ner model

property amazon_polarity_amazon_polarity: Huggingface amazon_polarity-amazon_polarity dataset

property amazon_reviews_multi_all_languages: Huggingface amazon_reviews_multi-all_languages dataset

property amazon_reviews_multi_de: Huggingface amazon_reviews_multi-de dataset

property amazon_reviews_multi_en: Huggingface amazon_reviews_multi-en dataset

property amazon_reviews_multi_es: Huggingface amazon_reviews_multi-es dataset

property amazon_reviews_multi_fr: Huggingface amazon_reviews_multi-fr dataset

property amazon_reviews_multi_ja: Huggingface amazon_reviews_multi-ja dataset

property amazon_reviews_multi_zh: Huggingface amazon_reviews_multi-zh dataset

property ambig_qa_full: Huggingface ambig_qa-full dataset

property ambig_qa_light: Huggingface ambig_qa-light dataset

property amttl_amttl: Huggingface amttl-amttl dataset

property animal_images_dataset: This sample dataset is derived from Open Image Dataset and includes 3 animal categories (cat, dog, frog). Each category contains 10 images.

property anli_plain_text: Huggingface anli-plain_text dataset

property app_reviews_default: Huggingface app_reviews-default dataset

property aqua_rat_raw: Huggingface aqua_rat-raw dataset

property aqua_rat_tokenized: Huggingface aqua_rat-tokenized dataset

property ar_res_reviews_default: Huggingface ar_res_reviews-default dataset

property arcd_plain_text: Huggingface arcd-plain_text dataset

property arsentd_lev_default: Huggingface arsentd_lev-default dataset

property art_anli: Huggingface art-anli dataset

property asi_gpt_fr_cased_small: Huggingface asi/gpt-fr-cased-small model

property aslg_pc12_default: Huggingface aslg_pc12-default dataset

property asset_ratings: Huggingface asset-ratings dataset

property asset_simplification: Huggingface asset-simplification dataset

property assin2_default: Huggingface assin2-default dataset

property assin_full: Huggingface assin-full dataset

property assin_ptbr: Huggingface assin-ptbr dataset

property assin_ptpt: Huggingface assin-ptpt dataset

property atomic_atomic: Huggingface atomic-atomic dataset

property automobile_price_data_raw: Clean missing data module required. Prices of various automobiles against make, model and technical specifications

property autshumato_autshumato_en_tn: Huggingface autshumato-autshumato-en-tn dataset

property autshumato_autshumato_en_ts: Huggingface autshumato-autshumato-en-ts dataset

property autshumato_autshumato_en_ts_manual: Huggingface autshumato-autshumato-en-ts-manual dataset

property autshumato_autshumato_en_zu: Huggingface autshumato-autshumato-en-zu dataset

property avichr_hebert_sentiment_analysis: Huggingface avichr/heBERT_sentiment_analysis model

property bert_base_uncased: Huggingface bert-base-uncased model

property bertin_project_bertin_base_ner_conll2002_es: Huggingface bertin-project/bertin-base-ner-conll2002-es model

property bsc_tecla_tecla: Huggingface bsc/tecla-tecla dataset

property cahya_gpt2_small_indonesian_522m: Huggingface cahya/gpt2-small-indonesian-522M model

property cahya_gpt2_small_indonesian_story: Huggingface cahya/gpt2-small-indonesian-story model

property capreolus_bert_base_msmarco: Huggingface Capreolus/bert-base-msmarco model

property cardiffnlp_twitter_roberta_base_emotion: Huggingface cardiffnlp/twitter-roberta-base-emotion model

property chambliss_distilbert_for_food_extraction: Huggingface chambliss/distilbert-for-food-extraction model

property ckiplab_albert_base_chinese_ner: Huggingface ckiplab/albert-base-chinese-ner model

property ckiplab_albert_base_chinese_pos: Huggingface ckiplab/albert-base-chinese-pos model

property ckiplab_albert_base_chinese_ws: Huggingface ckiplab/albert-base-chinese-ws model

property ckiplab_albert_tiny_chinese_ws: Huggingface ckiplab/albert-tiny-chinese-ws model

property ckiplab_bert_base_chinese_ner: Huggingface ckiplab/bert-base-chinese-ner model

property ckiplab_bert_base_chinese_pos: Huggingface ckiplab/bert-base-chinese-pos model

property ckiplab_bert_base_chinese_ws: Huggingface ckiplab/bert-base-chinese-ws model

property colorfulscoop_gpt2_small_ja: Huggingface colorfulscoop/gpt2-small-ja model

property crm_appetency_labels_shared: CRM Appetency Labels

property crm_churn_labels_shared: CRM Churn Labels

property crm_dataset_shared: CRM Dataset

property crm_upselling_labels_shared: CRM Upselling Labels

property cross_encoder_ms_marco_electra_base: Huggingface cross-encoder/ms-marco-electra-base model

property cross_encoder_stsb_tinybert_l_4: Huggingface cross-encoder/stsb-TinyBERT-L-4 model

property datificate_gpt2_small_spanish: Huggingface datificate/gpt2-small-spanish model

property dbmdz_bert_base_cased_finetuned_conll03_english: Huggingface dbmdz/bert-base-cased-finetuned-conll03-english model

property distilbert_base_uncased_finetuned_sst_2_english: Huggingface distilbert-base-uncased-finetuned-sst-2-english model

property dslim_bert_base_ner: Huggingface dslim/bert-base-NER model

property dslim_bert_base_ner_uncased: Huggingface dslim/bert-base-NER-uncased model

property elastic_distilbert_base_cased_finetuned_conll03_english: Huggingface elastic/distilbert-base-cased-finetuned-conll03-english model

property elastic_distilbert_base_uncased_finetuned_conll03_english: Huggingface elastic/distilbert-base-uncased-finetuned-conll03-english model

property ethanyt_guwen_ner: Huggingface ethanyt/guwen-ner model

property ethanyt_guwen_punc: Huggingface ethanyt/guwen-punc model

property ferch423_gpt2_small_portuguese_wikipediabio: Huggingface Ferch423/gpt2-small-portuguese-wikipediabio model

property finiteautomata_bertweet_base_sentiment_analysis: Huggingface finiteautomata/bertweet-base-sentiment-analysis model

property finiteautomata_beto_sentiment_analysis: Huggingface finiteautomata/beto-sentiment-analysis model

property flight_delays_data: Flight Delays Data

property german_credit_card_uci_dataset: German Credit Card UCI dataset

property gilf_french_camembert_postag_model: Huggingface gilf/french-camembert-postag-model model

property glue_ax: Huggingface glue-ax dataset

property glue_cola: Huggingface glue-cola dataset

property glue_mnli: Huggingface glue-mnli dataset

property glue_mnli_matched: Huggingface glue-mnli_matched dataset

property glue_mnli_mismatched: Huggingface glue-mnli_mismatched dataset

property glue_mrpc: Huggingface glue-mrpc dataset

property glue_qnli: Huggingface glue-qnli dataset

property glue_qqp: Huggingface glue-qqp dataset

property glue_rte: Huggingface glue-rte dataset

property glue_sst2: Huggingface glue-sst2 dataset

property glue_stsb: Huggingface glue-stsb dataset

property glue_wnli: Huggingface glue-wnli dataset

property gronlp_gpt2_small_italian: Huggingface GroNLP/gpt2-small-italian model

property gunghio_distilbert_base_multilingual_cased_finetuned_conll2003_ner: Huggingface gunghio/distilbert-base-multilingual-cased-finetuned-conll2003-ner model

property hf_internal_testing_tiny_xlm_roberta: Huggingface hf-internal-testing/tiny-xlm-roberta model

property imdb_movie_titles: IMDB Movie Titles

property imdb_plain_text: Huggingface imdb-plain_text dataset

property jsfoon_slogan_generator: Huggingface jsfoon/slogan-generator model

property lilaboualili_bert_vanilla: Huggingface LilaBoualili/bert-vanilla model

property lordtt13_emo_mobilebert: Huggingface lordtt13/emo-mobilebert model

property maltehb_l_ctra_danish_electra_small_uncased_ner_dane: Huggingface Maltehb/-l-ctra-danish-electra-small-uncased-ner-dane model

property media1129_recipe_tag_model: Huggingface Media1129/recipe-tag-model model

property microsoft_codegpt_small_py: Huggingface microsoft/CodeGPT-small-py model

property microsoft_codegpt_small_py_adaptedgpt2: Huggingface microsoft/CodeGPT-small-py-adaptedGPT2 model

property microsoft_minilm_l12_h384_uncased: Huggingface microsoft/MiniLM-L12-H384-uncased model

property movie_ratings: Movie Ratings

property mrm8488_bert_spanish_cased_finetuned_ner: Huggingface mrm8488/bert-spanish-cased-finetuned-ner model

property mrm8488_bert_tiny_finetuned_sms_spam_detection: Huggingface mrm8488/bert-tiny-finetuned-sms-spam-detection model

property mrm8488_codebert_base_finetuned_stackoverflow_ner: Huggingface mrm8488/codebert-base-finetuned-stackoverflow-ner model

property mrm8488_mobilebert_finetuned_ner: Huggingface mrm8488/mobilebert-finetuned-ner model

property mrm8488_mobilebert_finetuned_pos: Huggingface mrm8488/mobilebert-finetuned-pos model

property myx4567_distilgpt2_finetuned_wikitext2: Huggingface MYX4567/distilgpt2-finetuned-wikitext2 model

property narsil_tiny_distilbert_sequence_classification: Huggingface Narsil/tiny-distilbert-sequence-classification model

property nateraw_bert_base_uncased_emotion: Huggingface nateraw/bert-base-uncased-emotion model

property oliverguhr_german_sentiment_bert: Huggingface oliverguhr/german-sentiment-bert model

property philschmid_distilroberta_base_ner_conll2003: Huggingface philschmid/distilroberta-base-ner-conll2003 model

property pierreguillou_gpt2_small_portuguese: Huggingface pierreguillou/gpt2-small-portuguese model

property pierrerappolt_disease_extraction: Huggingface pierrerappolt/disease-extraction model

property pranavpsv_gpt2_genre_story_generator: Huggingface pranavpsv/gpt2-genre-story-generator model

property prosusai_finbert: Huggingface ProsusAI/finbert model

property proycon_bert_ner_cased_sonar1_nld: Huggingface proycon/bert-ner-cased-sonar1-nld model

property recordedfuture_swedish_ner: Huggingface RecordedFuture/Swedish-NER model

property restaurant_customer_data: Contains customer features, such as drink_level, dress_preference and marital_status.

property restaurant_feature_data: Contains restaurant features, such as name, address and dress_code.

property restaurant_ratings: Contains ratings given by customers to restaurants on scale from 0 to 2.

property sgugger_tiny_distilbert_classification: Huggingface sgugger/tiny-distilbert-classification model

property squad_adversarial_addonesent: Huggingface squad_adversarial-AddOneSent dataset

property squad_adversarial_addsent: Huggingface squad_adversarial-AddSent dataset

property squad_es_v1_1_0: Huggingface squad_es-v1.1.0 dataset

property squad_it_default: Huggingface squad_it-default dataset

property squad_plain_text: Huggingface squad-plain_text dataset

property squad_v1_pt_default: Huggingface squad_v1_pt-default dataset

property squad_v2_squad_v2: Huggingface squad_v2-squad_v2 dataset

property squadshifts_amazon: Huggingface squadshifts-amazon dataset

property squadshifts_new_wiki: Huggingface squadshifts-new_wiki dataset

property squadshifts_nyt: Huggingface squadshifts-nyt dataset

property squadshifts_reddit: Huggingface squadshifts-reddit dataset

property sshleifer_tiny_ctrl: Huggingface sshleifer/tiny-ctrl model

property sshleifer_tiny_dbmdz_bert_large_cased_finetuned_conll03_english: Huggingface sshleifer/tiny-dbmdz-bert-large-cased-finetuned-conll03-english model

property sshleifer_tiny_distilbert_base_cased: Huggingface sshleifer/tiny-distilbert-base-cased model

property sshleifer_tiny_distilbert_base_uncased_finetuned_sst_2_english: Huggingface sshleifer/tiny-distilbert-base-uncased-finetuned-sst-2-english model

property sshleifer_tiny_gpt2: Huggingface sshleifer/tiny-gpt2 model

property sshleifer_tiny_xlnet_base_cased: Huggingface sshleifer/tiny-xlnet-base-cased model

property super_glue_axb: Huggingface super_glue-axb dataset

property super_glue_axg: Huggingface super_glue-axg dataset

property super_glue_boolq: Huggingface super_glue-boolq dataset

property super_glue_cb: Huggingface super_glue-cb dataset

property super_glue_copa: Huggingface super_glue-copa dataset

property super_glue_multirc: Huggingface super_glue-multirc dataset

property super_glue_record: Huggingface super_glue-record dataset

property super_glue_rte: Huggingface super_glue-rte dataset

property super_glue_wic: Huggingface super_glue-wic dataset

property super_glue_wsc: Huggingface super_glue-wsc dataset

property super_glue_wsc_fixed: Huggingface super_glue-wsc.fixed dataset

property swag_full: Huggingface swag-full dataset

property swag_regular: Huggingface swag-regular dataset

property swahili_news_swahili_news: Huggingface swahili_news-swahili_news dataset

property swahili_swahili: Huggingface swahili-swahili dataset

property swda_default: Huggingface swda-default dataset

property swedish_ner_corpus_default: Huggingface swedish_ner_corpus-default dataset

property swedish_reviews_plain_text: Huggingface swedish_reviews-plain_text dataset

property tab_fact_blind_test: Huggingface tab_fact-blind_test dataset

property tab_fact_tab_fact: Huggingface tab_fact-tab_fact dataset

property tamilmixsentiment_default: Huggingface tamilmixsentiment-default dataset

property tanzil_bg_en: Huggingface tanzil-bg-en dataset

property tanzil_bn_hi: Huggingface tanzil-bn-hi dataset

property tanzil_en_tr: Huggingface tanzil-en-tr dataset

property tanzil_fa_sv: Huggingface tanzil-fa-sv dataset

property tanzil_ru_zh: Huggingface tanzil-ru-zh dataset

property tapaco_en: Huggingface tapaco-en dataset

property tapaco_eo: Huggingface tapaco-eo dataset

property tapaco_es: Huggingface tapaco-es dataset

property tapaco_et: Huggingface tapaco-et dataset

property tapaco_eu: Huggingface tapaco-eu dataset

property tapaco_fi: Huggingface tapaco-fi dataset

property tapaco_fr: Huggingface tapaco-fr dataset

property tapaco_gl: Huggingface tapaco-gl dataset

property tapaco_gos: Huggingface tapaco-gos dataset

property textattack_bert_base_uncased_cola: Huggingface textattack/bert-base-uncased-CoLA model

property textattack_bert_base_uncased_imdb: Huggingface textattack/bert-base-uncased-imdb model

property textattack_bert_base_uncased_mnli: Huggingface textattack/bert-base-uncased-MNLI model

property textattack_bert_base_uncased_snli: Huggingface textattack/bert-base-uncased-snli model

property textattack_bert_base_uncased_sst_2: Huggingface textattack/bert-base-uncased-SST-2 model

property textattack_distilbert_base_uncased_imdb: Huggingface textattack/distilbert-base-uncased-imdb model

property textattack_distilbert_base_uncased_rotten_tomatoes: Huggingface textattack/distilbert-base-uncased-rotten-tomatoes model

property textattack_roberta_base_imdb: Huggingface textattack/roberta-base-imdb model

property textattack_xlnet_base_cased_imdb: Huggingface textattack/xlnet-base-cased-imdb model

property transformersbook_codepage_small: Huggingface transformersbook/codepage-small model

property uer_gpt2_chinese_poem: Huggingface uer/gpt2-chinese-poem model

property uer_roberta_base_finetuned_cluener2020_chinese: Huggingface uer/roberta-base-finetuned-cluener2020-chinese model

property unitary_toxic_bert: Huggingface unitary/toxic-bert model

property vblagoje_bert_english_uncased_finetuned_pos: Huggingface vblagoje/bert-english-uncased-finetuned-pos model

property vishnun_distilgpt2_finetuned_distilgpt2_med_articles: Huggingface vishnun/distilgpt2-finetuned-distilgpt2-med_articles model

property vishnun_distilgpt2_finetuned_tamilmixsentiment: Huggingface vishnun/distilgpt2-finetuned-tamilmixsentiment model

property weather_dataset: Weather Dataset

property wietsedv_bert_base_multilingual_cased_finetuned_conll2002_ner: Huggingface wietsedv/bert-base-multilingual-cased-finetuned-conll2002-ner model

property wikipedia_sp_500_dataset: Wikipedia SP 500 Dataset

property xlnet_base_cased: Huggingface xlnet-base-cased model

example.assets.example_components.azureml_add_columns(left_dataset: Optional[pathlib.Path] = None, right_dataset: Optional[pathlib.Path] = None) → example.assets.example_components._assets._AzuremlAddColumnsComponent

Adds a set of columns from one dataset to another.

Parameters

left_dataset (Path) – Left dataset
right_dataset (Path) – Right dataset

Output combined_dataset

Combined dataset

Type

combined_dataset: Output
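As a rough mental model of Add Columns, the sketch below concatenates two in-memory tables column-wise: every left column first, then every right column. The `Table` dict-of-lists representation is a hypothetical stand-in for the component's datasets, and both the equal-row-count check and rejecting duplicate column names are assumptions of this sketch, not documented behavior of `azureml_add_columns`.

```python
from typing import Dict, List

# Hypothetical stand-in for a dataset: column name -> column values.
Table = Dict[str, List[object]]


def add_columns(left: Table, right: Table) -> Table:
    """Sketch: place the right dataset's columns after the left dataset's.

    Assumes both tables have the same number of rows; rejecting duplicate
    column names is a simplification chosen for this sketch."""
    row_counts = {len(values) for values in left.values()} | {len(values) for values in right.values()}
    if len(row_counts) > 1:
        raise ValueError("both datasets must have the same number of rows")
    clash = set(left) & set(right)
    if clash:
        raise ValueError(f"duplicate column names: {sorted(clash)}")
    combined = dict(left)   # keeps the left column order
    combined.update(right)  # right columns are appended after them
    return combined


combined = add_columns({"id": [1, 2]}, {"price": [9.5, 7.0]})
print(combined)
```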

example.assets.example_components.azureml_add_rows(dataset1: Optional[pathlib.Path] = None, dataset2: Optional[pathlib.Path] = None) → example.assets.example_components._assets._AzuremlAddRowsComponent

Appends a set of rows from an input dataset to the end of another dataset.

Parameters

dataset1 (Path) – Dataset rows to be added to the output dataset first
dataset2 (Path) – Dataset rows to be appended to the first dataset

Output results_dataset

Dataset that contains all rows of both input datasets

Type

results_dataset: Output
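The row-appending behavior described above can be sketched with rows as plain dicts: the output holds every row of `dataset1`, followed by every row of `dataset2`. The requirement that all rows share the same column names is an assumption made here to keep the sketch simple; the actual schema rules of `azureml_add_rows` are not stated in this reference.

```python
from typing import Dict, List

# Hypothetical stand-in for one dataset row: column name -> value.
Row = Dict[str, object]


def add_rows(dataset1: List[Row], dataset2: List[Row]) -> List[Row]:
    """Sketch: rows of dataset1 come first, then rows of dataset2.

    Assumes (as a simplification) that every row has the same columns."""
    rows = list(dataset1) + list(dataset2)
    if rows:
        expected = set(rows[0])
        for index, row in enumerate(rows):
            if set(row) != expected:
                raise ValueError(f"row {index} has columns {sorted(row)}, expected {sorted(expected)}")
    return rows


results = add_rows([{"a": 1}, {"a": 2}], [{"a": 3}])
print(results)
```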

example.assets.example_components.azureml_apply_image_transformation(input_image_transformation: Optional[pathlib.Path] = None, input_image_directory: Optional[pathlib.Path] = None, mode: Optional[example.assets.example_components._assets._AzuremlApplyImageTransformationModeEnum] = None) → example.assets.example_components._assets._AzuremlApplyImageTransformationComponent

Applies an image transformation to an image directory.

Parameters

input_image_transformation (Path) – Input image transformation
input_image_directory (Path) – Input image directory
mode (_AzuremlApplyImageTransformationModeEnum) – Whether ‘Random’ transform operations are kept (for training) or excluded (for inference) (enum: [‘For training’, ‘For inference’])

Output output_image_directory

Output image directory

Type

output_image_directory: Output
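The `mode` parameter can be pictured as a filter over the transformation's operations: in training all operations run, while in inference the random ones are dropped so that the same input image always produces the same output. The operation names below are hypothetical placeholders, and treating "starts with `Random`" as the test for a random operation is an assumption of this sketch.

```python
from enum import Enum
from typing import List


class Mode(Enum):
    # Values mirror the enum labels documented for `mode`.
    FOR_TRAINING = "For training"
    FOR_INFERENCE = "For inference"


def select_operations(operations: List[str], mode: Mode) -> List[str]:
    """Sketch: keep every operation for training; drop random ones for inference."""
    if mode is Mode.FOR_TRAINING:
        return list(operations)
    return [op for op in operations if not op.startswith("Random")]


# Hypothetical operation names, for illustration only.
pipeline = ["Resize", "RandomHorizontalFlip", "CenterCrop", "RandomRotation", "ToTensor"]
print(select_operations(pipeline, Mode.FOR_INFERENCE))
```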

example.assets.example_components.azureml_apply_math_operation(input: Optional[pathlib.Path] = None, category: example.assets.example_components._assets._AzuremlApplyMathOperationCategoryEnum = _AzuremlApplyMathOperationCategoryEnum.basic, basic_func: example.assets.example_components._assets._AzuremlApplyMathOperationBasicFuncEnum = _AzuremlApplyMathOperationBasicFuncEnum.abs, basic_arg_type: example.assets.example_components._assets._AzuremlApplyMathOperationBasicArgTypeEnum = _AzuremlApplyMathOperationBasicArgTypeEnum.constant, basic_constant: float = 1, basic_column_selector: Optional[str] = None, compare_func: example.assets.example_components._assets._AzuremlApplyMathOperationCompareFuncEnum = _AzuremlApplyMathOperationCompareFuncEnum.equalto, compare_arg_type: example.assets.example_components._assets._AzuremlApplyMathOperationCompareArgTypeEnum = _AzuremlApplyMathOperationCompareArgTypeEnum.constant, compare_constant: float = 1, compare_column_selector: Optional[str] = None, operations_func: example.assets.example_components._assets._AzuremlApplyMathOperationOperationsFuncEnum = _AzuremlApplyMathOperationOperationsFuncEnum.add, operations_arg_type: example.assets.example_components._assets._AzuremlApplyMathOperationOperationsArgTypeEnum = _AzuremlApplyMathOperationOperationsArgTypeEnum.constant, operations_constant: float = 1, operations_column_selector: Optional[str] = None, rounding_func: example.assets.example_components._assets._AzuremlApplyMathOperationRoundingFuncEnum = _AzuremlApplyMathOperationRoundingFuncEnum.ceiling, rounding_arg_type: example.assets.example_components._assets._AzuremlApplyMathOperationRoundingArgTypeEnum = _AzuremlApplyMathOperationRoundingArgTypeEnum.constant, rounding_constant: float = 1, rounding_column_selector: Optional[str] = None, special_func: example.assets.example_components._assets._AzuremlApplyMathOperationSpecialFuncEnum = _AzuremlApplyMathOperationSpecialFuncEnum.beta, special_arg_type: example.assets.example_components._assets._AzuremlApplyMathOperationSpecialArgTypeEnum = _AzuremlApplyMathOperationSpecialArgTypeEnum.constant, special_constant: float = 1, special_column_selector: Optional[str] = None, trigonometric_func: example.assets.example_components._assets._AzuremlApplyMathOperationTrigonometricFuncEnum = _AzuremlApplyMathOperationTrigonometricFuncEnum.acos, column_selector: Optional[str] = None, output_mode: example.assets.example_components._assets._AzuremlApplyMathOperationOutputModeEnum = _AzuremlApplyMathOperationOutputModeEnum.append) → example.assets.example_components._assets._AzuremlApplyMathOperationComponent

Applies a mathematical operation to column values.

Parameters

input (Path) – DataFrameDirectory
category (_AzuremlApplyMathOperationCategoryEnum) – enum (enum: [‘Basic’, ‘Compare’, ‘Operations’, ‘Rounding’, ‘Special’, ‘Trigonometric’])
basic_func (_AzuremlApplyMathOperationBasicFuncEnum) – enum (optional, enum: [‘Abs’, ‘Atan2’, ‘Conj’, ‘Cuberoot’, ‘DoubleFactorial’, ‘Eps’, ‘Exp’, ‘Exp2’, ‘ExpMinus1’, ‘Factorial’, ‘Hypotenuse’, ‘ImaginaryPart’, ‘Ln’, ‘LnPlus1’, ‘Log’, ‘Log10’, ‘Log2’, ‘NthRoot’, ‘Pow’, ‘RealPart’, ‘Sqrt’, ‘SqrtPi’, ‘Square’])
basic_arg_type (_AzuremlApplyMathOperationBasicArgTypeEnum) – enum (optional, enum: [‘Constant’, ‘ColumnSet’])
basic_constant (float) – float (optional)
basic_column_selector (str) – ColumnPicker (optional)
compare_func (_AzuremlApplyMathOperationCompareFuncEnum) – enum (optional, enum: [‘EqualTo’, ‘GreaterThan’, ‘GreaterThanOrEqualTo’, ‘LessThan’, ‘LessThanOrEqualTo’, ‘NotEqualTo’, ‘PairMax’, ‘PairMin’])
compare_arg_type (_AzuremlApplyMathOperationCompareArgTypeEnum) – enum (optional, enum: [‘Constant’, ‘ColumnSet’])
compare_constant (float) – float (optional)
compare_column_selector (str) – ColumnPicker (optional)
operations_func (_AzuremlApplyMathOperationOperationsFuncEnum) – enum (optional, enum: [‘Add’, ‘Divide’, ‘Multiply’, ‘Subtract’])
operations_arg_type (_AzuremlApplyMathOperationOperationsArgTypeEnum) – enum (optional, enum: [‘Constant’, ‘ColumnSet’])
operations_constant (float) – float (optional)
operations_column_selector (str) – ColumnPicker (optional)
rounding_func (_AzuremlApplyMathOperationRoundingFuncEnum) – enum (optional, enum: [‘Ceiling’, ‘CeilingPower2’, ‘Floor’, ‘Mod’, ‘Quotient’, ‘Remainder’, ‘RoundDigits’, ‘RoundDown’, ‘RoundUp’, ‘ToEven’, ‘ToMultiple’, ‘ToOdd’, ‘Truncate’])
rounding_arg_type (_AzuremlApplyMathOperationRoundingArgTypeEnum) – enum (optional, enum: [‘Constant’, ‘ColumnSet’])
rounding_constant (float) – float (optional)
rounding_column_selector (str) – ColumnPicker (optional)
special_func (_AzuremlApplyMathOperationSpecialFuncEnum) – enum (optional, enum: [‘Beta’, ‘BetaLn’, ‘EllipticIntegralE’, ‘EllipticIntegralK’, ‘Erf’, ‘Erfc’, ‘ErfcScaled’, ‘ErfInverse’, ‘ExponentialIntegralEin’, ‘Gamma’, ‘GammaLn’, ‘GammaRegularizedP’, ‘GammaRegularizedPInverse’, ‘GammaRegularizedQ’, ‘GammaRegularizedQInverse’, ‘Polygamma’])
special_arg_type (_AzuremlApplyMathOperationSpecialArgTypeEnum) – enum (optional, enum: [‘Constant’, ‘ColumnSet’])
special_constant (float) – float (optional)
special_column_selector (str) – ColumnPicker (optional)
trigonometric_func (_AzuremlApplyMathOperationTrigonometricFuncEnum) – enum (optional, enum: [‘Acos’, ‘AcosDegrees’, ‘Acosh’, ‘Acot’, ‘AcotDegrees’, ‘Acoth’, ‘Acsc’, ‘AcscDegrees’, ‘Acsch’, ‘Arg’, ‘Asec’, ‘AsecDegrees’, ‘Asech’, ‘Asin’, ‘AsinDegrees’, ‘Asinh’, ‘Atan’, ‘AtanDegrees’, ‘Atanh’, ‘Cis’, ‘Cos’, ‘CosDegrees’, ‘Cosh’, ‘Cot’, ‘CotDegrees’, ‘Coth’, ‘Csc’, ‘CscDegrees’, ‘Csch’, ‘DegreesToRadians’, ‘RadiansToDegrees’, ‘Sec’, ‘SecDegrees’, ‘Sech’, ‘Sign’, ‘Sin’, ‘Sinc’, ‘SinDegrees’, ‘Sinh’, ‘Tan’, ‘TanDegrees’, ‘Tanh’])
column_selector (str) – ColumnPicker
output_mode (_AzuremlApplyMathOperationOutputModeEnum) – enum (enum: [‘Append’, ‘Inplace’, ‘ResultOnly’])

Output result_dataset

DataFrameDirectory

Type

result_dataset: Output
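The three `output_mode` values differ only in where the computed values end up. The sketch below applies a one-argument function (here the built-in `abs`, standing in for the `Basic` category's `Abs`) to the selected columns and shows `Append`, `Inplace`, and `ResultOnly`. The dict-of-lists table and the `Abs(x)` naming of result columns are hypothetical choices for this illustration, not the component's documented output schema.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a dataset: column name -> numeric values.
Table = Dict[str, List[float]]


def apply_math(table: Table, columns: List[str], func: Callable[[float], float],
               func_name: str, output_mode: str = "Append") -> Table:
    """Sketch of the three output modes (result column naming is hypothetical)."""
    computed = {col: [func(value) for value in table[col]] for col in columns}
    if output_mode == "Inplace":
        # Same columns in the same order; selected columns are overwritten.
        return {**table, **computed}
    renamed = {f"{func_name}({col})": values for col, values in computed.items()}
    if output_mode == "Append":
        # Original columns first, then one new column per selected column.
        return {**table, **renamed}
    if output_mode == "ResultOnly":
        # Only the computed columns are returned.
        return renamed
    raise ValueError(f"unknown output_mode: {output_mode}")


data = {"x": [-1.5, 2.0], "y": [3.0, -4.0]}
print(apply_math(data, ["x"], abs, "Abs", "Append"))
```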

example.assets.example_components.azureml_apply_sql_transformation(t1: Optional[pathlib.Path] = None, t2: Optional[pathlib.Path] = None, t3: Optional[pathlib.Path] = None, sqlquery: str = 'select * from t1') → example.assets.example_components._assets._AzuremlApplySqlTransformationComponent

Runs a SQLite query on input datasets to transform the data.

Parameters

t1 (Path) – DataFrameDirectory
t2 (Path) – DataFrameDirectory (optional)
t3 (Path) – DataFrameDirectory (optional)
sqlquery (str) – Script

Output result_dataset

DataFrameDirectory

Type

result_dataset: Output
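Because the component runs a SQLite query over inputs named `t1`, `t2`, and `t3`, its behavior can be approximated locally with Python's built-in `sqlite3` module: load each provided input into an in-memory database under those table names, then execute `sqlquery`. The `(columns, rows)` tuple format for inputs is a hypothetical stand-in for a DataFrameDirectory; the default query matches the documented default, `select * from t1`.

```python
import sqlite3
from typing import List, Optional, Sequence, Tuple

# Hypothetical stand-in for a DataFrameDirectory: (column names, rows).
TableData = Tuple[List[str], Sequence[tuple]]


def apply_sql(t1: TableData, t2: Optional[TableData] = None, t3: Optional[TableData] = None,
              sqlquery: str = "select * from t1") -> Tuple[List[str], List[tuple]]:
    """Sketch: expose the inputs as tables t1..t3 and run sqlquery in SQLite."""
    conn = sqlite3.connect(":memory:")
    try:
        for name, table in (("t1", t1), ("t2", t2), ("t3", t3)):
            if table is None:
                continue  # t2 and t3 are optional
            columns, rows = table
            # Quote column names; SQLite allows columns without declared types.
            col_defs = ", ".join('"' + c.replace('"', '""') + '"' for c in columns)
            conn.execute(f"CREATE TABLE {name} ({col_defs})")
            placeholders = ", ".join("?" for _ in columns)
            conn.executemany(f"INSERT INTO {name} VALUES ({placeholders})", rows)
        cursor = conn.execute(sqlquery)
        header = [description[0] for description in cursor.description]
        return header, cursor.fetchall()
    finally:
        conn.close()


people = (["id", "name"], [(1, "Ada"), (2, "Lin")])
scores = (["id", "score"], [(1, 90), (2, 85)])
print(apply_sql(people))
print(apply_sql(people, scores, sqlquery=(
    "select t1.name as name, t2.score as score "
    "from t1 join t2 on t1.id = t2.id order by t1.id")))
```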

example.assets.example_components.azureml_apply_transformation(transformation: Optional[pathlib.Path] = None, dataset: Optional[pathlib.Path] = None) → example.assets.example_components._assets._AzuremlApplyTransformationComponent

Applies a well-specified data transformation to a dataset.

Parameters

transformation (Path) – A unary data transformation
dataset (Path) – Dataset to be transformed

Output transformed_dataset

Transformed dataset

Type

transformed_dataset: Output

example.assets.example_components.azureml_assign_data_to_clusters(trained_model: Optional[pathlib.Path] = None, dataset: Optional[pathlib.Path] = None, check_for_append_or_uncheck_for_result_only: bool = True) → example.assets.example_components._assets._AzuremlAssignDataToClustersComponent

Assigns data to clusters using an existing trained clustering model.

Parameters

trained_model (Path) – Trained clustering model
dataset (Path) – Input data source
check_for_append_or_uncheck_for_result_only (bool) – Whether the output dataset contains the input dataset with the assignments column appended (checked) or only the assignments column (unchecked)

Output results_dataset

Input dataset with the assignments column appended, or the assignments column only

Type

results_dataset: Output
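To illustrate the append-versus-result-only switch, the sketch below assumes a K-means-style trained model represented simply as a list of centroids, and assigns each row to the nearest centroid by Euclidean distance. Both the centroid representation and the distance metric are assumptions; the reference above does not say which clustering algorithm the trained model uses.

```python
import math
from typing import List, Sequence


def assign_to_clusters(centroids: Sequence[Sequence[float]], rows: Sequence[Sequence[float]],
                       append: bool = True) -> List[list]:
    """Sketch: label each row with the index of its nearest centroid.

    append=True mirrors the checked option (input columns plus assignment);
    append=False mirrors the unchecked option (assignment column only)."""
    output = []
    for row in rows:
        distances = [math.dist(row, centroid) for centroid in centroids]
        label = distances.index(min(distances))  # ties go to the lowest index
        output.append(list(row) + [label] if append else [label])
    return output


centroids = [(0.0, 0.0), (10.0, 10.0)]  # hypothetical trained model
points = [(1.0, 2.0), (9.0, 8.0)]
print(assign_to_clusters(centroids, points))
```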

example.assets.example_components.azureml_boosted_decision_tree_regression(create_trainer_mode: example.assets.example_components._assets._AzuremlBoostedDecisionTreeRegressionCreateTrainerModeEnum = _AzuremlBoostedDecisionTreeRegressionCreateTrainerModeEnum.singleparameter, maximum_number_of_leaves_per_tree: int = 20, minimum_number_of_training_instances_required_to_form_a_leaf: int = 10, the_learning_rate: float = 0.2, total_number_of_trees_constructed: int = 100, range_for_maximum_number_of_leaves_per_tree: str = '2; 8; 32; 128', range_for_minimum_number_of_training_instances_required_to_form_a_leaf: str = '1; 10; 50', range_for_learning_rate: str = '0.025; 0.05; 0.1; 0.2; 0.4', range_for_total_number_of_trees_constructed: str = '20; 100; 500', random_number_seed: Optional[int] = None) → example.assets.example_components._assets._AzuremlBoostedDecisionTreeRegressionComponent

Creates a regression model using the Boosted Decision Tree algorithm.

Parameters

create_trainer_mode (_AzuremlBoostedDecisionTreeRegressionCreateTrainerModeEnum) – Create advanced learner options (enum: [‘SingleParameter’, ‘ParameterRange’])
maximum_number_of_leaves_per_tree (int) – Specify the maximum number of leaves per tree (optional, min: 2, max: 131072)
minimum_number_of_training_instances_required_to_form_a_leaf (int) – Specify the minimum number of cases required to form a leaf node (optional, min: 1)
the_learning_rate (float) – Specify the initial learning rate (optional, min: 2.220446049250313e-16, max: 1.0)
total_number_of_trees_constructed (int) – Specify the maximum number of trees that can be created during training (optional, min: 1)
range_for_maximum_number_of_leaves_per_tree (str) – Specify range for the maximum number of leaves allowed per tree (optional)
range_for_minimum_number_of_training_instances_required_to_form_a_leaf (str) – Specify the range for the minimum number of cases required to form a leaf (optional)
range_for_learning_rate (str) – Specify the range for the initial learning rate (optional)
range_for_total_number_of_trees_constructed (str) – Specify the range for the maximum number of trees that can be created during training (optional)
random_number_seed (int) – Provide a seed for the random number generator used by the model. Leave blank for default. (optional, max: 4294967295)

Output untrained_model

An untrained regression model that can be connected to the Train Generic Model or Cross Validate Model modules

Type

untrained_model: Output
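In `ParameterRange` mode the `range_for_*` parameters are semicolon-separated lists of candidate values. The sketch below parses the documented default range strings and enumerates the full Cartesian grid they describe; whether the learner is actually swept over the full grid or over a sampled subset is not stated in this reference, so the grid is only an upper bound on the candidate settings.

```python
from itertools import product
from typing import Callable, Dict, List


def parse_range(spec: str, cast: Callable[[str], object]) -> List[object]:
    """Split a semicolon-separated range string such as '2; 8; 32' into typed values."""
    return [cast(part.strip()) for part in spec.split(";") if part.strip()]


# Default range strings taken from the signature above.
ranges: Dict[str, List[object]] = {
    "maximum_number_of_leaves_per_tree": parse_range("2; 8; 32; 128", int),
    "minimum_number_of_training_instances_required_to_form_a_leaf": parse_range("1; 10; 50", int),
    "learning_rate": parse_range("0.025; 0.05; 0.1; 0.2; 0.4", float),
    "total_number_of_trees_constructed": parse_range("20; 100; 500", int),
}

# One dict per combination of candidate values.
grid = [dict(zip(ranges, values)) for values in product(*ranges.values())]
print(len(grid))  # 4 * 3 * 5 * 3 = 180
print(grid[0])
```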

example.assets.example_components.azureml_clean_missing_data(dataset: Optional[pathlib.Path] = None, columns_to_be_cleaned: Optional[str] = None, minimum_missing_value_ratio: float = 0.0, maximum_missing_value_ratio: float = 1.0, cleaning_mode: example.assets.example_components._assets._AzuremlCleanMissingDataCleaningModeEnum = _AzuremlCleanMissingDataCleaningModeEnum.custom_substitution_value, replacement_value: str = '0', generate_missing_value_indicator_column: bool = False, cols_with_all_missing_values: example.assets.example_components._assets._AzuremlCleanMissingDataColsWithAllMissingValuesEnum = _AzuremlCleanMissingDataColsWithAllMissingValuesEnum.remove) → example.assets.example_components._assets._AzuremlCleanMissingDataComponent

Specifies how to handle missing values in a dataset.

Parameters

dataset (Path) – Dataset to be cleaned
columns_to_be_cleaned (str) – Columns for missing values clean operation
minimum_missing_value_ratio (float) – Clean only columns whose missing-value ratio is above the specified value, out of the set of all selected columns (max: 1.0)
maximum_missing_value_ratio (float) – Clean only columns whose missing-value ratio is below the specified value, out of the set of all selected columns (max: 1.0)
cleaning_mode (_AzuremlCleanMissingDataCleaningModeEnum) – Algorithm to clean missing values (enum: [‘Custom substitution value’, ‘Replace with mean’, ‘Replace with median’, ‘Replace with mode’, ‘Remove entire row’, ‘Remove entire column’])
replacement_value (str) – Type the value that takes the place of missing values (optional)
generate_missing_value_indicator_column (bool) – Generate a column that indicates which rows were cleaned (optional)
cols_with_all_missing_values (_AzuremlCleanMissingDataColsWithAllMissingValuesEnum) – Action for columns in which all values are missing (optional, enum: [‘Propagate’, ‘Remove’])

Output cleaned_dataset

Cleaned dataset

Type

cleaned_dataset: Output

Output cleaning_transformation

Transformation to be passed to Apply Transformation module to clean new data

Type

cleaning_transformation: Output
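The ratio window and the substitution modes described above can be sketched in plain Python. This is a minimal illustration, not the component's implementation: it assumes a dataset held as a dict of column name to list, with None marking a missing value, treats the ratio bounds as inclusive, and covers only the substitution modes (not ‘Remove entire row’ or ‘Remove entire column’). The function name clean_missing is hypothetical.

```python
from statistics import mean, median, mode
from typing import Dict, List, Optional

Column = List[Optional[float]]


def clean_missing(
    data: Dict[str, Column],
    min_ratio: float = 0.0,
    max_ratio: float = 1.0,
    cleaning_mode: str = "Custom substitution value",
    replacement_value: float = 0.0,
) -> Dict[str, Column]:
    """Hypothetical sketch: fill missing values (None) column by column."""
    cleaned: Dict[str, Column] = {}
    for name, values in data.items():
        missing = sum(v is None for v in values)
        ratio = missing / len(values)
        present = [v for v in values if v is not None]
        # Leave a column alone if nothing is missing or its ratio is out of range.
        if missing == 0 or not (min_ratio <= ratio <= max_ratio):
            cleaned[name] = list(values)
            continue
        if cleaning_mode == "Custom substitution value":
            fill = replacement_value
        elif not present:
            # Mean/median/mode are undefined for an all-missing column.
            cleaned[name] = list(values)
            continue
        elif cleaning_mode == "Replace with mean":
            fill = mean(present)
        elif cleaning_mode == "Replace with median":
            fill = median(present)
        elif cleaning_mode == "Replace with mode":
            fill = mode(present)
        else:
            raise ValueError(f"mode not covered by this sketch: {cleaning_mode}")
        cleaned[name] = [fill if v is None else v for v in values]
    return cleaned
```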

example.assets.example_components.azureml_clip_values(input: Optional[pathlib.Path] = None, clipmode: example.assets.example_components._assets._AzuremlClipValuesClipmodeEnum = _AzuremlClipValuesClipmodeEnum.clippeaks, upperthreshold: example.assets.example_components._assets._AzuremlClipValuesUpperthresholdEnum = _AzuremlClipValuesUpperthresholdEnum.constant, constantupperthreshold: float = 99, percentileupperthreshold: float = 99, modeuppersubstitute: example.assets.example_components._assets._AzuremlClipValuesModeuppersubstituteEnum = _AzuremlClipValuesModeuppersubstituteEnum.threshold, lowerthreshold: example.assets.example_components._assets._AzuremlClipValuesLowerthresholdEnum = _AzuremlClipValuesLowerthresholdEnum.constant, constantlowerthreshold: float = 1, percentilelowerthreshold: float = 1, modeowersubstitute: example.assets.example_components._assets._AzuremlClipValuesModeowersubstituteEnum = _AzuremlClipValuesModeowersubstituteEnum.threshold, lowerupperthreshold: example.assets.example_components._assets._AzuremlClipValuesLowerupperthresholdEnum = _AzuremlClipValuesLowerupperthresholdEnum.constant, constantuthreshold: float = 99, constantlthreshold: float = 1, percentileuthreshold: float = 99, percentilelthreshold: float = 1, modeusubstitute: example.assets.example_components._assets._AzuremlClipValuesModeusubstituteEnum = _AzuremlClipValuesModeusubstituteEnum.threshold, modelsubstitute: example.assets.example_components._assets._AzuremlClipValuesModelsubstituteEnum = _AzuremlClipValuesModelsubstituteEnum.threshold, column_selector: Optional[str] = None, inplace_flag: bool = True, indicator_flag: bool = False) → example.assets.example_components._assets._AzuremlClipValuesComponent

Detects outliers and clips or replaces their values.

Parameters

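input (Path) – Input dataset (DataFrameDirectory)
clipmode (_AzuremlClipValuesClipmodeEnum) – Set of values to clip: peaks (values above the upper threshold), subpeaks (values below the lower threshold), or both (enum: [‘ClipPeaks’, ‘ClipSubPeaks’, ‘ClipPeaksAndSubpeaks’])
upperthreshold (_AzuremlClipValuesUpperthresholdEnum) – Type of the upper threshold (optional, enum: [‘Constant’, ‘Percentile’])
constantupperthreshold (float) – Constant value of the upper threshold (optional)
percentileupperthreshold (float) – Percentile value of the upper threshold (optional)
modeuppersubstitute (_AzuremlClipValuesModeuppersubstituteEnum) – Value substituted for peaks (optional, enum: [‘Threshold’, ‘Mean’, ‘Median’, ‘Missing’])
lowerthreshold (_AzuremlClipValuesLowerthresholdEnum) – Type of the lower threshold (optional, enum: [‘Constant’, ‘Percentile’])
constantlowerthreshold (float) – Constant value of the lower threshold (optional)
percentilelowerthreshold (float) – Percentile value of the lower threshold (optional)
modeowersubstitute (_AzuremlClipValuesModeowersubstituteEnum) – Value substituted for subpeaks (optional, enum: [‘Threshold’, ‘Mean’, ‘Median’, ‘Missing’])
lowerupperthreshold (_AzuremlClipValuesLowerupperthresholdEnum) – Type of thresholds used when clipping both peaks and subpeaks (optional, enum: [‘Constant’, ‘Percentile’])
constantuthreshold (float) – Constant value of the upper threshold when clipping both peaks and subpeaks (optional)
constantlthreshold (float) – Constant value of the lower threshold when clipping both peaks and subpeaks (optional)
percentileuthreshold (float) – Percentile value of the upper threshold when clipping both peaks and subpeaks (optional)
percentilelthreshold (float) – Percentile value of the lower threshold when clipping both peaks and subpeaks (optional)
modeusubstitute (_AzuremlClipValuesModeusubstituteEnum) – Value substituted for peaks when clipping both peaks and subpeaks (optional, enum: [‘Threshold’, ‘Mean’, ‘Median’, ‘Missing’])
modelsubstitute (_AzuremlClipValuesModelsubstituteEnum) – Value substituted for subpeaks when clipping both peaks and subpeaks (optional, enum: [‘Threshold’, ‘Mean’, ‘Median’, ‘Missing’])
column_selector (str) – Columns to clip (ColumnPicker)
inplace_flag (bool) – Whether to overwrite the original values in place
indicator_flag (bool) – Whether to add columns indicating which values were clipped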

Output result_dataset

DataFrameDirectory

Type

result_dataset: Output
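As an illustration of how clipmode and the Threshold, Mean, Median and Missing substitutes interact, here is a sketch for a single numeric column with constant thresholds. It is hypothetical: the name clip_values is invented, one substitute setting is shared by both sides for brevity (the component exposes separate ones), percentile thresholds are omitted, and Mean/Median are assumed to be computed over the whole input column.

```python
from statistics import mean, median
from typing import List, Optional


def clip_values(
    values: List[float],
    clipmode: str = "ClipPeaks",
    upper: float = 99.0,
    lower: float = 1.0,
    substitute: str = "Threshold",
) -> List[Optional[float]]:
    """Hypothetical sketch: replace peaks (> upper) and/or subpeaks (< lower)."""
    clip_peaks = clipmode in ("ClipPeaks", "ClipPeaksAndSubpeaks")
    clip_subpeaks = clipmode in ("ClipSubPeaks", "ClipPeaksAndSubpeaks")

    def replacement(threshold: float) -> Optional[float]:
        if substitute == "Mean":
            return mean(values)
        if substitute == "Median":
            return median(values)
        if substitute == "Missing":
            return None
        return threshold  # "Threshold": clamp to the threshold itself

    result: List[Optional[float]] = []
    for v in values:
        if clip_peaks and v > upper:
            result.append(replacement(upper))
        elif clip_subpeaks and v < lower:
            result.append(replacement(lower))
        else:
            result.append(v)
    return result
```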

example.assets.example_components.azureml_convert_to_csv(dataset: Optional[pathlib.Path] = None) → example.assets.example_components._assets._AzuremlConvertToCsvComponent

Converts data input to a comma-separated values format.

Parameters: dataset (Path) – Input dataset
Output results_dataset: Output dataset
Type: results_dataset: Output

example.assets.example_components.azureml_convert_to_dataset(dataset: Optional[pathlib.Path] = None, action: example.assets.example_components._assets._AzuremlConvertToDatasetActionEnum = _AzuremlConvertToDatasetActionEnum.none, custom_missing_value: str = '?', replace: example.assets.example_components._assets._AzuremlConvertToDatasetReplaceEnum = _AzuremlConvertToDatasetReplaceEnum.missing, custom_value: str = 'obs', new_value: str = '0') → example.assets.example_components._assets._AzuremlConvertToDatasetComponent

Converts data input to the internal Dataset format used by Azure Machine Learning designer.

Parameters

dataset (Path) – Input dataset
action (_AzuremlConvertToDatasetActionEnum) – Action to apply to input dataset (enum: [‘None’, ‘SetMissingValues’, ‘ReplaceValues’])
custom_missing_value (str) – Value indicating missing value token (optional)
replace (_AzuremlConvertToDatasetReplaceEnum) – Specifies type of replacement for values (optional, enum: [‘Missing’, ‘Custom’])
custom_value (str) – Value to be replaced (optional)
new_value (str) – Replacement value (optional)

Output results_dataset

Output dataset

Type

results_dataset: Output
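The two non-trivial action values can be sketched as follows, assuming one column of string values with None standing for a missing value. The function convert_values is hypothetical and only mirrors the parameter names above; it says nothing about how the designer stores missing values internally.

```python
from typing import List, Optional


def convert_values(
    values: List[Optional[str]],
    action: str = "None",
    custom_missing_value: str = "?",
    replace: str = "Missing",
    custom_value: str = "obs",
    new_value: str = "0",
) -> List[Optional[str]]:
    """Hypothetical sketch of the SetMissingValues and ReplaceValues actions."""
    if action == "SetMissingValues":
        # Treat every occurrence of the custom token as a missing value.
        return [None if v == custom_missing_value else v for v in values]
    if action == "ReplaceValues":
        if replace == "Missing":
            # Replace values that are already missing with new_value.
            return [new_value if v is None else v for v in values]
        # replace == "Custom": replace occurrences of custom_value.
        return [new_value if v == custom_value else v for v in values]
    # action == "None": pass the data through unchanged.
    return list(values)
```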

example.assets.example_components.azureml_convert_to_image_directory(input_dataset: Optional[pathlib.Path] = None) → example.assets.example_components._assets._AzuremlConvertToImageDirectoryComponent

Converts a dataset to image directory format.

Parameters: input_dataset (Path) – Input dataset
Output output_image_directory: Output image directory.
Type: output_image_directory: Output

example.assets.example_components.azureml_convert_to_indicator_values(dataset: Optional[pathlib.Path] = None, categorical_columns_to_convert: Optional[str] = None, overwrite_categorical_columns: bool = False) → example.assets.example_components._assets._AzuremlConvertToIndicatorValuesComponent

Converts categorical values in columns to indicator values.

Parameters

dataset (Path) – Dataset with categorical columns
categorical_columns_to_convert (str) – Select categorical columns to convert to indicator matrices.
overwrite_categorical_columns (bool) – If True, overwrite the selected categorical columns, otherwise append the resulting indicator matrices to the dataset (optional)

Output results_dataset

Dataset with categorical columns converted to indicator matrices.

Type

results_dataset: Output

Output indicator_values_transformation

Transformation to be passed to Apply Transformation module to convert indicator values for new data

Type

indicator_values_transformation: Output
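Converting a categorical column to indicator values is one-hot encoding: one 0/1 column per distinct category. A minimal sketch, assuming a dict-of-lists dataset; the helper name to_indicator_values and the "<column>-<category>" naming of the new columns are assumptions, not the component's documented output format.

```python
from typing import Dict, List


def to_indicator_values(
    data: Dict[str, list], column: str, overwrite: bool = False
) -> Dict[str, list]:
    """Hypothetical sketch: one 0/1 indicator column per distinct category."""
    categories = sorted(set(data[column]))
    # With overwrite=True the original categorical column is dropped.
    result = {k: list(v) for k, v in data.items() if not (overwrite and k == column)}
    for category in categories:
        # The "<column>-<category>" naming scheme is an assumption.
        result[f"{column}-{category}"] = [int(v == category) for v in data[column]]
    return result
```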

example.assets.example_components.azureml_convert_word_to_vector(dataset: Optional[pathlib.Path] = None, target_column: Optional[str] = None, word2vec_strategy: example.assets.example_components._assets._AzuremlConvertWordToVectorWord2VecStrategyEnum = _AzuremlConvertWordToVectorWord2VecStrategyEnum.gensim_word2vec, word2vec_training_algorithm: example.assets.example_components._assets._AzuremlConvertWordToVectorWord2VecTrainingAlgorithmEnum = _AzuremlConvertWordToVectorWord2VecTrainingAlgorithmEnum.skip_gram, length_of_word_embedding: int = 100, context_window_size: int = 5, number_of_epochs: int = 5, maximum_vocabulary_size: int = 10000, minimum_word_count: int = 5) → example.assets.example_components._assets._AzuremlConvertWordToVectorComponent

Converts words to vectors (word embeddings).

Parameters

dataset (Path) – Input data
target_column (str) – Select one target column whose vocabulary embeddings will be generated
word2vec_strategy (_AzuremlConvertWordToVectorWord2VecStrategyEnum) – Select the strategy for computing word embeddings (enum: [‘GloVe pretrained English Model’, ‘Gensim Word2Vec’, ‘Gensim FastText’])
word2vec_training_algorithm (_AzuremlConvertWordToVectorWord2VecTrainingAlgorithmEnum) – Select the training algorithm for training Word2Vec model (optional, enum: [‘Skip_gram’, ‘CBOW’])
length_of_word_embedding (int) – Specify the length of the word embedding/vector (optional, min: 10, max: 2000)
context_window_size (int) – Specify the maximum distance between the word being predicted and the current word (optional, min: 1, max: 100)
number_of_epochs (int) – Specify the number of epochs (iterations) over the corpus (optional, min: 1, max: 1024)
maximum_vocabulary_size (int) – Specify the maximum number of words in the vocabulary (min: 10, max: 2147483647)
minimum_word_count (int) – Ignores all words that have a frequency lower than this value (min: 1, max: 100)

Output vocabulary_with_embeddings

Vocabulary with embeddings

Type

vocabulary_with_embeddings: Output
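To show what context_window_size and minimum_word_count control for the Skip_gram training algorithm, the sketch below generates the (center word, context word) pairs that a skip-gram model learns from. It is a simplified, hypothetical helper: real Word2Vec implementations such as gensim add refinements (for example negative sampling and frequent-word subsampling) that are not shown here.

```python
from collections import Counter
from typing import List, Tuple


def skip_gram_pairs(
    tokens: List[str], window: int = 5, min_count: int = 1
) -> List[Tuple[str, str]]:
    """Hypothetical sketch: (center, context) pairs used to train skip-gram."""
    counts = Counter(tokens)
    # Drop words whose corpus frequency is below min_count before pairing.
    kept = [t for t in tokens if counts[t] >= min_count]
    pairs = []
    for i, center in enumerate(kept):
        start = max(0, i - window)
        stop = min(len(kept), i + window + 1)
        for j in range(start, stop):
            if j != i:  # every other word within the window is a context word
                pairs.append((center, kept[j]))
    return pairs
```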

example.assets.example_components.azureml_create_python_model(python_script: str = '\n# The script MUST define a class named AzureMLModel.\n# This class MUST at least define the following three methods: "__init__", "train" and "predict".\n# The signatures (method and argument names) of all these methods MUST be exactly the same as the following example.\n\n# Please do not install extra packages such as "pip install xgboost" in this script,\n# otherwise errors will be raised when reading models in down-stream modules.\n\nimport pandas as pd\nfrom sklearn.linear_model import LogisticRegression\n\n\nclass AzureMLModel:\n # The __init__ method is only invoked in module "Create Python Model",\n # and will not be invoked again in the following modules "Train Model" and "Score Model".\n # The attributes defined in the __init__ method are preserved and usable in the train and predict method.\n def __init__(self):\n # self.model must be assigned\n self.model = LogisticRegression()\n self.feature_column_names = list()\n\n # Train model\n # Param<df_train>: a pandas.DataFrame\n # Param<df_label>: a pandas.Series\n def train(self, df_train, df_label):\n # self.feature_column_names records the column names used for training.\n # It is recommended to set this attribute before training so that the\n # feature columns used in predict and train methods have the same names.\n self.feature_column_names = df_train.columns.tolist()\n self.model.fit(df_train, df_label)\n\n # Predict results\n # Param<df>: a pandas.DataFrame\n # Must return a pandas.DataFrame\n def predict(self, df):\n # The feature columns used for prediction MUST have the same names as the ones for training.\n # The name of score column ("Scored Labels" in this case) MUST be different from any other\n # columns in input data.\n return pd.DataFrame({\'Scored Labels\': self.model.predict(df[self.feature_column_names])})\n') → example.assets.example_components._assets._AzuremlCreatePythonModelComponent

Creates a Python model using a custom script.

Parameters: python_script (str) – The Python script to execute
Output untrained_model: An untrained custom Python model
Type: untrained_model: Output

example.assets.example_components.azureml_cross_validate_model(untrained_model: Optional[pathlib.Path] = None, dataset: Optional[pathlib.Path] = None, name_or_numerical_index_of_the_label_column: Optional[str] = None, random_seed: int = 0) → example.assets.example_components._assets._AzuremlCrossValidateModelComponent

Cross-validates a classification or regression model with standard metrics.

Parameters

untrained_model (Path) – Untrained learner
dataset (Path) – Training data
name_or_numerical_index_of_the_label_column (str) – Select the column that contains the label or outcome
random_seed (int) – Specify a numeric seed to use for random number generation. (max: 4294967295)

Output scored_results

Data scored results

Type

scored_results: Output

Output evaluation_results_by_fold

Data evaluation results by fold

Type

evaluation_results_by_fold: Output
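Cross-validation partitions the rows into folds, trains on all but one fold and scores the held-out fold, once per fold; random_seed makes the partition reproducible. This is a sketch of the partitioning step only, with a hypothetical helper name. The fold count is not a parameter of this component, so the value used here is an assumption.

```python
import random
from typing import List, Tuple


def k_fold_indices(
    n_rows: int, n_folds: int = 10, random_seed: int = 0
) -> List[Tuple[List[int], List[int]]]:
    """Hypothetical sketch: shuffle row indices once, then split into folds."""
    indices = list(range(n_rows))
    # A dedicated Random instance keeps the shuffle reproducible for a seed.
    random.Random(random_seed).shuffle(indices)
    folds = [indices[k::n_folds] for k in range(n_folds)]
    splits = []
    for k, test in enumerate(folds):
        # Training rows are every row outside the held-out fold.
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        splits.append((train, test))
    return splits
```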

example.assets.example_components.azureml_decision_forest_regression(create_trainer_mode: example.assets.example_components._assets._AzuremlDecisionForestRegressionCreateTrainerModeEnum = _AzuremlDecisionForestRegressionCreateTrainerModeEnum.singleparameter, number_of_decision_trees: int = 8, maximum_depth_of_the_decision_trees: int = 32, minimum_number_of_samples_per_leaf_node: int = 1, range_for_number_of_decision_trees: str = '1; 8; 32', range_for_the_maximum_depth_of_the_decision_trees: str = '1; 16; 64', range_for_the_minimum_number_of_samples_per_leaf_node: str = '1; 4; 16', resampling_method: example.assets.example_components._assets._AzuremlDecisionForestRegressionResamplingMethodEnum = _AzuremlDecisionForestRegressionResamplingMethodEnum.bagging_resampling) → example.assets.example_components._assets._AzuremlDecisionForestRegressionComponent

Creates a regression model using the decision forest algorithm.

Parameters

create_trainer_mode (_AzuremlDecisionForestRegressionCreateTrainerModeEnum) – Create advanced learner options (enum: [‘SingleParameter’, ‘ParameterRange’])
number_of_decision_trees (int) – Specify the number of decision trees to create in the ensemble (optional, min: 1)
maximum_depth_of_the_decision_trees (int) – Specify the maximum depth of any decision tree that can be created in the ensemble (optional, min: 1)
minimum_number_of_samples_per_leaf_node (int) – Specify the minimum number of training samples required to generate a leaf node (optional, min: 1)
range_for_number_of_decision_trees (str) – Specify range for the number of decision trees to create in the ensemble (optional)
range_for_the_maximum_depth_of_the_decision_trees (str) – Specify range for the maximum depth of the decision trees (optional)
range_for_the_minimum_number_of_samples_per_leaf_node (str) – Specify range for the minimum number of samples per leaf node (optional)
resampling_method (_AzuremlDecisionForestRegressionResamplingMethodEnum) – Choose a resampling method (enum: [‘Bagging Resampling’, ‘Replicate Resampling’])

Output untrained_model

An untrained regression model

Type

untrained_model: Output
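In ParameterRange mode the range_for_* strings list candidate values separated by semicolons, as in the default '1; 8; 32'. The sketch below parses such strings and enumerates every combination. Both helpers are hypothetical, and which combinations are actually tried depends on the module that performs the sweep (an exhaustive grid is only one option).

```python
from itertools import product
from typing import Dict, List


def parse_range(spec: str) -> List[int]:
    """Parse a semicolon-separated range string such as '1; 8; 32'."""
    return [int(part) for part in spec.split(";") if part.strip()]


def parameter_grid(ranges: Dict[str, str]) -> List[Dict[str, int]]:
    """Hypothetical sketch: every combination of the listed candidate values."""
    names = list(ranges)
    value_lists = [parse_range(ranges[name]) for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


# Using the default range strings from the signature above.
grid = parameter_grid({
    "number_of_decision_trees": "1; 8; 32",
    "maximum_depth": "1; 16; 64",
    "minimum_samples_per_leaf": "1; 4; 16",
})
```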

example.assets.example_components.azureml_densenet(model_name: example.assets.example_components._assets._AzuremlDensenetModelNameEnum = _AzuremlDensenetModelNameEnum.densenet201, pretrained: bool = True, memory_efficient: bool = False) → example.assets.example_components._assets._AzuremlDensenetComponent

Creates an image classification model using the DenseNet architecture.

Parameters

model_name (_AzuremlDensenetModelNameEnum) – Name of the DenseNet variant to use (enum: [‘densenet121’, ‘densenet161’, ‘densenet169’, ‘densenet201’])
pretrained (bool) – Indicate whether to use a model pre-trained on ImageNet
memory_efficient (bool) – Indicate whether to use checkpointing, which is much more memory efficient but slower

Output untrained_model

Untrained densenet model path

Type

untrained_model: Output

example.assets.example_components.azureml_edit_metadata(dataset: Optional[pathlib.Path] = None, column: Optional[str] = None, data_type: example.assets.example_components._assets._AzuremlEditMetadataDataTypeEnum = _AzuremlEditMetadataDataTypeEnum.unchanged, date_and_time_format: Optional[str] = None, categorical: example.assets.example_components._assets._AzuremlEditMetadataCategoricalEnum = _AzuremlEditMetadataCategoricalEnum.unchanged, fields: example.assets.example_components._assets._AzuremlEditMetadataFieldsEnum = _AzuremlEditMetadataFieldsEnum.unchanged, new_column_name: Optional[str] = None) → example.assets.example_components._assets._AzuremlEditMetadataComponent

Edits metadata associated with columns in a dataset.

Parameters

dataset (Path) – Input dataset
column (str) – Choose the columns to which your changes should apply
data_type (_AzuremlEditMetadataDataTypeEnum) – Specify the new data type of the column (enum: [‘Unchanged’, ‘String’, ‘Integer’, ‘Double’, ‘Boolean’, ‘DateTime’])
date_and_time_format (str) – Specify a custom format string for parsing DateTime values; see the Python standard library datetime.strftime() documentation for details. Leave empty for default permissive parsing (optional)
categorical (_AzuremlEditMetadataCategoricalEnum) – Indicate whether the column should be flagged as categorical (enum: [‘Unchanged’, ‘Categorical’, ‘NonCategorical’])
fields (_AzuremlEditMetadataFieldsEnum) – Specify whether the column should be considered a feature or label by learning algorithms (enum: [‘Unchanged’, ‘Features’, ‘Labels’, ‘ClearFeatures’, ‘ClearLabels’, ‘ClearScores’])
new_column_name (str) – Type the new names of the columns (optional)

Output results_dataset

Dataset with changed metadata

Type

results_dataset: Output
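date_and_time_format takes the same directives as datetime.strftime(); in Python, parsing text with those directives is done by datetime.strptime(). The format string and sample values below are hypothetical, and whether the component parses with strptime internally is an assumption.

```python
from datetime import datetime

# A hypothetical custom format: day/month/year, then 24-hour time.
fmt = "%d/%m/%Y %H:%M"
raw = ["03/01/2021 14:30", "15/12/2020 08:05"]

# strptime accepts the same directives that strftime produces.
parsed = [datetime.strptime(value, fmt) for value in raw]
print(parsed[0].isoformat())  # prints 2021-01-03T14:30:00
```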

example.assets.example_components.azureml_enter_data_manually(dataformat: example.assets.example_components._assets._AzuremlEnterDataManuallyDataformatEnum = _AzuremlEnterDataManuallyDataformatEnum.csv, hasheader: bool = True, data: Optional[str] = None) → example.assets.example_components._assets._AzuremlEnterDataManuallyComponent

Enables entering and editing small datasets by typing values.

Parameters

dataformat (_AzuremlEnterDataManuallyDataformatEnum) – Select the format in which data will be entered (enum: [‘ARFF’, ‘CSV’, ‘SvmLight’, ‘TSV’])
hasheader (bool) – Indicate whether the CSV or TSV data has a header (optional)
data (str) – Text to output as DataTable

Output dataset

Entered data

Type

dataset: Output

example.assets.example_components.azureml_evaluate_model(scored_dataset: Optional[pathlib.Path] = None, scored_dataset_to_compare: Optional[pathlib.Path] = None) → example.assets.example_components._assets._AzuremlEvaluateModelComponent

Evaluates the results of a classification or regression model with standard metrics.

Parameters

scored_dataset (Path) – Scored dataset
scored_dataset_to_compare (Path) – Scored dataset to compare (optional)

Output evaluation_results

Data evaluation result

Type

evaluation_results: Output
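For a regression model, typical standard metrics include mean absolute error, root mean squared error and the coefficient of determination. The exact set this component reports is not listed here, so treat the following as a sketch of those three formulas rather than a description of its output; the function name is hypothetical.

```python
import math
from typing import Dict, List


def regression_metrics(y_true: List[float], y_pred: List[float]) -> Dict[str, float]:
    """Hypothetical sketch of three common regression metrics."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    # R^2 is undefined when the true values are constant.
    r2 = 1.0 - ss_res / ss_tot if ss_tot else float("nan")
    return {"mae": mae, "rmse": rmse, "r2": r2}
```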

example.assets.example_components.azureml_evaluate_recommender(test_dataset: Optional[pathlib.Path] = None, scored_dataset: Optional[pathlib.Path] = None) → example.assets.example_components._assets._AzuremlEvaluateRecommenderComponent

Evaluates a recommendation model.

Parameters

test_dataset (Path) – Test dataset
scored_dataset (Path) – Scored dataset

Output metric

A table of evaluation metrics

Type

metric: Output

example.assets.example_components.azureml_execute_python_script(dataset1: Optional[pathlib.Path] = None, dataset2: Optional[pathlib.Path] = None, script_bundle: Optional[pathlib.Path] = None, python_script: str = '\n# The script MUST contain a function named azureml_main\n# which is the entry point for this module.\n\n# imports up here can be used to\nimport pandas as pd\n\n# The entry point function MUST have two input arguments.\n# If the input port is not connected, the corresponding\n# dataframe argument will be None.\n# Param<dataframe1>: a pandas.DataFrame\n# Param<dataframe2>: a pandas.DataFrame\ndef azureml_main(dataframe1 = None, dataframe2 = None):\n\n # Execution logic goes here\n print(f\'Input pandas.DataFrame #1: {dataframe1}\')\n\n # If a zip file is connected to the third input port,\n # it is unzipped under "./Script Bundle". This directory is added\n # to sys.path. Therefore, if your zip file contains a Python file\n # mymodule.py you can import it using:\n # import mymodule\n\n # Return value must be of a sequence of pandas.DataFrame\n # E.g.\n # - Single return value: return dataframe1,\n # - Two return values: return dataframe1, dataframe2\n return dataframe1,\n\n') → example.assets.example_components._assets._AzuremlExecutePythonScriptComponent

Executes a Python script from an Azure Machine Learning designer pipeline.

Parameters

dataset1 (Path) – Input dataset 1 (optional)
dataset2 (Path) – Input dataset 2 (optional)
script_bundle (Path) – Zip file containing custom resources (optional)
python_script (str) – The Python script to execute

Output result_dataset

Output Dataset

Type

result_dataset: Output

Output python_device

Output Dataset2

Type

python_device: Output

example.assets.example_components.azureml_execute_r_script(dataset1: Optional[pathlib.Path] = None, dataset2: Optional[pathlib.Path] = None, script_bundle: Optional[pathlib.Path] = None, r_script: str = '\n# R version: 3.5.1\n# The script MUST contain a function named azureml_main\n# which is the entry point for this module.\n\n# Please note that functions dependant on X11 library\n# such as "View" are not supported because X11 library\n# is not pre-installed.\n\n# The entry point function MUST have two input arguments.\n# If the input port is not connected, the corresponding\n# dataframe argument will be null.\n# Param<dataframe1>: a R DataFrame\n# Param<dataframe2>: a R DataFrame\nazureml_main <- function(dataframe1, dataframe2){\n print("R script run.")\n\n # If a zip file is connected to the third input port, it is\n # unzipped under "./Script Bundle". This directory is added\n # to sys.path.\n\n # Return datasets as a Named List\n return(list(dataset1=dataframe1, dataset2=dataframe2))\n}\n\n', random_seed: Optional[int] = None) → example.assets.example_components._assets._AzuremlExecuteRScriptComponent

Executes an R script from an Azure Machine Learning designer pipeline.

Parameters

dataset1 (Path) – Input dataset 1 (optional)
dataset2 (Path) – Input dataset 2 (optional)
script_bundle (Path) – Set of R sources (optional)
r_script (str) – Specify a StreamReader pointing to the R script sources
random_seed (int) – Define a random seed value for use inside the R environment. Calls “set.seed(value)” (optional)

Output result_dataset

Output Dataset

Type

result_dataset: Output

Output r_device

Output Dataset2

Type

r_device: Output

example.assets.example_components.azureml_export_data(input_path: Optional[pathlib.Path] = None, datastore_type: Optional[str] = None, output_data_store: Optional[str] = None, output_path: Optional[str] = None, output_file_type: Optional[str] = None, datatable_name: Optional[str] = None, column_list_to_be_saved: Optional[str] = None, column_list_datatable_columns: Optional[str] = None, number_rows_per_operation: int = 50) → example.assets.example_components._assets._AzuremlExportDataComponent

Writes a dataset to cloud-based storage in Azure, such as Azure Blob Storage, Azure Data Lake Storage Gen1, or Azure Data Lake Storage Gen2.

Parameters

input_path (Path) – Data to export
datastore_type (str) – Datastore type (optional)
output_data_store (str) – Location of the output data store
output_path (str) – Relative output path in the data store (optional)
output_file_type (str) – File type of the output (optional)
datatable_name (str) – Name of the exported data table (optional)
column_list_to_be_saved (str) – Column(s) to export (optional)
column_list_datatable_columns (str) – Column names in the exported data table (optional)
number_rows_per_operation (int) – Number of rows per operation (optional)

example.assets.example_components.azureml_extract_n_gram_features_from_text(dataset: Optional[pathlib.Path] = None, input_vocabulary: Optional[pathlib.Path] = None, text_column: Optional[str] = None, vocabulary_mode: example.assets.example_components._assets._AzuremlExtractNGramFeaturesFromTextVocabularyModeEnum = _AzuremlExtractNGramFeaturesFromTextVocabularyModeEnum.create, n_grams_size: int = 1, weighting_function: example.assets.example_components._assets._AzuremlExtractNGramFeaturesFromTextWeightingFunctionEnum = _AzuremlExtractNGramFeaturesFromTextWeightingFunctionEnum.binary_weight, minimum_word_length: int = 3, maximum_word_length: int = 25, minimum_n_gram_document_absolute_frequency: float = 5, maximum_n_gram_document_ratio: float = 1, normalize_n_gram_feature_vectors: bool = False) → example.assets.example_components._assets._AzuremlExtractNGramFeaturesFromTextComponent

Creates n-gram dictionary features and performs feature selection on them.

Parameters

dataset (Path) – Input data
input_vocabulary (Path) – Input vocabulary (optional)
text_column (str) – Name or index (one-based) of text column
vocabulary_mode (_AzuremlExtractNGramFeaturesFromTextVocabularyModeEnum) – Specify how the n-gram vocabulary should be created from the corpus (enum: [‘Create’, ‘ReadOnly’])
n_grams_size (int) – Indicate the maximum size of n-grams to create (min: 1)
weighting_function (_AzuremlExtractNGramFeaturesFromTextWeightingFunctionEnum) – Choose the weighting function to apply to each n-gram value (enum: [‘Binary Weight’, ‘TF Weight’, ‘IDF Weight’, ‘TF-IDF Weight’])
minimum_word_length (int) – Specify the minimum length of words to include in n-grams (min: 1)
maximum_word_length (int) – Specify the maximum length of words to include in n-grams (min: 2)
minimum_n_gram_document_absolute_frequency (float) – Minimum n-gram document absolute frequency (min: 1.0)
maximum_n_gram_document_ratio (float) – Maximum n-gram document ratio (min: 0.0001)
normalize_n_gram_feature_vectors (bool) – Normalize n-gram feature vectors. If true, then the n-gram feature vector is divided by its L2 norm.

Output results_dataset

Extracted features

Type

results_dataset: Output

Output result_vocabulary

Result vocabulary

Type

result_vocabulary: Output
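The interaction of n_grams_size, the word-length limits, the document-frequency filters and the ‘TF-IDF Weight’ option can be sketched as below. Assumptions: whitespace tokenization, lowercasing, n-grams containing an out-of-range word are dropped, and IDF is computed as log(N / df). The component's tokenizer and exact IDF variant may differ, and both function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def ngrams(text: str, n_max: int, min_len: int = 3, max_len: int = 25) -> List[str]:
    """All n-grams of size 1..n_max whose words all satisfy the length limits."""
    words = text.lower().split()
    ok = [min_len <= len(w) <= max_len for w in words]
    grams = []
    for n in range(1, n_max + 1):
        for i in range(len(words) - n + 1):
            if all(ok[i:i + n]):
                grams.append("_".join(words[i:i + n]))
    return grams


def tfidf_features(
    docs: List[str],
    n_max: int = 1,
    min_doc_freq: float = 1,
    max_doc_ratio: float = 1.0,
) -> List[Dict[str, float]]:
    """Hypothetical sketch: sparse TF-IDF vectors over a filtered vocabulary."""
    grams_per_doc = [ngrams(d, n_max) for d in docs]
    # Document frequency: number of documents that contain each n-gram.
    df = Counter(g for grams in grams_per_doc for g in set(grams))
    n_docs = len(docs)
    vocab = {g for g, c in df.items()
             if c >= min_doc_freq and c / n_docs <= max_doc_ratio}
    features = []
    for grams in grams_per_doc:
        tf = Counter(g for g in grams if g in vocab)
        features.append({g: count * math.log(n_docs / df[g]) for g, count in tf.items()})
    return features
```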

example.assets.example_components.azureml_fast_forest_quantile_regression(create_trainer_mode: example.assets.example_components._assets._AzuremlFastForestQuantileRegressionCreateTrainerModeEnum = _AzuremlFastForestQuantileRegressionCreateTrainerModeEnum.singleparameter, number_of_trees: int = 100, number_of_leaves: int = 20, minimum_number_of_training_instances_required_to_form_a_leaf: int = 10, bagging_fraction: float = 0.7, split_fraction: float = 0.7, quantiles_to_be_estimated: str = '0.25; 0.5; 0.75', range_for_total_number_of_trees_constructed: str = '16; 32; 64', range_for_maximum_number_of_leaves_per_tree: str = '16; 32; 64', range_for_minimum_number_of_training_instances_required_to_form_a_leaf: str = '1; 5; 10', range_for_bagging_fraction: str = '0.25; 0.5; 0.75', range_for_split_fraction: str = '0.25; 0.5; 0.75', required_quantile_values: str = '0.25; 0.5; 0.75', random_number_seed: Optional[int] = None) → example.assets.example_components._assets._AzuremlFastForestQuantileRegressionComponent

Creates a quantile regression model using the fast forest algorithm.

Parameters

create_trainer_mode (_AzuremlFastForestQuantileRegressionCreateTrainerModeEnum) – Create advanced learner options (enum: [‘SingleParameter’, ‘ParameterRange’])
number_of_trees (int) – Specifies the number of trees to be constructed (optional)
number_of_leaves (int) – Specifies the maximum number of leaves per tree. The default number is 20 (optional, min: 2)
minimum_number_of_training_instances_required_to_form_a_leaf (int) – Indicates the minimum number of training instances required to form a leaf (optional)
bagging_fraction (float) – Specifies the fraction of training data to use for each tree (optional)
split_fraction (float) – Specifies the fraction of features (chosen randomly) to use for each split (optional)
quantiles_to_be_estimated (str) – Specifies the quantiles to be estimated (optional)
range_for_total_number_of_trees_constructed (str) – Specify the range for the maximum number of trees that can be created during training (optional)
range_for_maximum_number_of_leaves_per_tree (str) – Specify range for the maximum number of leaves allowed per tree (optional)
range_for_minimum_number_of_training_instances_required_to_form_a_leaf (str) – Specify the range for the minimum number of cases required to form a leaf (optional)
range_for_bagging_fraction (str) – Specifies the range for fraction of training data to use for each tree (optional)
range_for_split_fraction (str) – Specifies the range for fraction of features (chosen randomly) to use for each split (optional)
required_quantile_values (str) – Required quantile value used during parameter sweep (optional)
random_number_seed (int) – Provide a seed for the random number generator used by the model. Leave blank for default. (optional, max: 4294967295)

Output untrained_model

An untrained quantile regression model that can be connected to the Train Generic Model or Cross Validate Model modules.

Type

untrained_model: Output
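quantiles_to_be_estimated takes semicolon-separated values such as '0.25; 0.5; 0.75'. Forest-based quantile regression commonly predicts by collecting the training targets associated with the leaves a new row falls into and reading quantiles off that empirical distribution. The sketch shows only that last step, with linear interpolation; whether this component uses exactly this estimator is an assumption, and the helper names are hypothetical.

```python
from typing import List


def parse_quantiles(spec: str) -> List[float]:
    """Parse a semicolon-separated string such as '0.25; 0.5; 0.75'."""
    return [float(part) for part in spec.split(";") if part.strip()]


def empirical_quantile(values: List[float], q: float) -> float:
    """Linear-interpolation quantile of a sample, for 0 <= q <= 1."""
    data = sorted(values)
    pos = q * (len(data) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(data) - 1)
    frac = pos - lower
    # Interpolate between the two order statistics that bracket pos.
    return data[lower] + (data[upper] - data[lower]) * frac
```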

example.assets.example_components.azureml_feature_hashing(dataset: Optional[pathlib.Path] = None, target_column: Optional[str] = None, hashing_bitsize: int = 10, n_grams: int = 2) → example.assets.example_components._assets._AzuremlFeatureHashingComponent

Converts text data to numeric features using the NimbusML library.

Parameters

dataset (Path) – Input dataset
target_column (str) – Choose the columns to which hashing will be applied
hashing_bitsize (int) – Type the number of bits used to hash the selected columns (min: 1, max: 31)
n_grams (int) – Specify the number of N-grams generated during hashing (max: 10)

Output transformed_dataset

Output dataset with hashed columns. The number of generated feature columns depends on the Hashing bitsize parameter.

Type

transformed_dataset: Output
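Feature hashing maps each token or n-gram to one of 2**hashing_bitsize buckets, so the number of generated feature columns grows with the bit size rather than with the vocabulary. This is a stdlib-only sketch: it uses MD5 purely for a deterministic hash (Python's built-in hash() of a string varies between processes), which is not necessarily the hash function NimbusML uses, and the function name is hypothetical.

```python
import hashlib
from typing import List


def hash_features(text: str, bitsize: int = 10, n_grams: int = 2) -> List[int]:
    """Hypothetical sketch: count 1..n_grams-grams in 2**bitsize hashed buckets."""
    n_buckets = 1 << bitsize
    vector = [0] * n_buckets
    tokens = text.lower().split()
    for n in range(1, n_grams + 1):
        for i in range(len(tokens) - n + 1):
            gram = " ".join(tokens[i:i + n])
            digest = hashlib.md5(gram.encode("utf-8")).digest()
            # Reduce the first 4 bytes of the digest to a bucket index.
            bucket = int.from_bytes(digest[:4], "little") % n_buckets
            vector[bucket] += 1
    return vector
```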

example.assets.example_components.azureml_filter_based_feature_selection(input_dataset: Optional[pathlib.Path] = None, operate_on_feature_columns_only: bool = True, target_column: Optional[str] = None, number_of_desired_features: int = 1, feature_scoring_method: example.assets.example_components._assets._AzuremlFilterBasedFeatureSelectionFeatureScoringMethodEnum = _AzuremlFilterBasedFeatureSelectionFeatureScoringMethodEnum.pearsoncorrelation) → example.assets.example_components._assets._AzuremlFilterBasedFeatureSelectionComponent

Identifies the features in a dataset with the greatest predictive power.

Parameters

input_dataset (Path) – Input dataset
operate_on_feature_columns_only (bool) – Indicate whether to use only feature columns in the scoring process (optional)
target_column (str) – Specify the target column
number_of_desired_features (int) – Specify the number of features to output in results
feature_scoring_method (_AzuremlFilterBasedFeatureSelectionFeatureScoringMethodEnum) – Choose the method to use for scoring (enum: [‘PearsonCorrelation’, ‘ChiSquared’])

Output filtered_dataset

Filtered dataset

Type

filtered_dataset: Output

Output features

Names of output columns and feature selection scores

Type

features: Output
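With the PearsonCorrelation scoring method, each feature column is scored by its correlation with the target column and the top number_of_desired_features are kept. This is a sketch under two assumptions: scores are ranked by absolute value, and constant columns score 0. The helper names are hypothetical.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def select_features(
    features: Dict[str, List[float]], target: List[float], k: int = 1
) -> List[str]:
    """Hypothetical sketch: names of the k features most correlated with target."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```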

example.assets.example_components.azureml_group_data_into_bins(dataset: Optional[pathlib.Path] = None, binning_mode: example.assets.example_components._assets._AzuremlGroupDataIntoBinsBinningModeEnum = _AzuremlGroupDataIntoBinsBinningModeEnum.quantiles, number_of_bins: int = 10, quantile_normalization: example.assets.example_components._assets._AzuremlGroupDataIntoBinsQuantileNormalizationEnum = _AzuremlGroupDataIntoBinsQuantileNormalizationEnum.percent, comma_separated_list_of_bin_edges: Optional[str] = None, columns_to_bin: Optional[str] = None, output_mode: example.assets.example_components._assets._AzuremlGroupDataIntoBinsOutputModeEnum = _AzuremlGroupDataIntoBinsOutputModeEnum.append, tag_columns_as_categorical: bool = True) → example.assets.example_components._assets._AzuremlGroupDataIntoBinsComponent

Maps input values to a smaller number of bins using a quantization function.

Parameters

dataset (Path) – Dataset to be analyzed
binning_mode (_AzuremlGroupDataIntoBinsBinningModeEnum) – Choose a binning method (enum: [‘Quantiles’, ‘Equal Width’, ‘Custom Edges’])
number_of_bins (int) – Specify the desired number of bins (optional, min: 1)
quantile_normalization (_AzuremlGroupDataIntoBinsQuantileNormalizationEnum) – Choose the method for normalizing quantiles (optional, enum: [‘Percent’, ‘PQuantile’, ‘Quantile Index’])
comma_separated_list_of_bin_edges (str) – Type a comma-separated list of numbers to use as bin edges (optional)
columns_to_bin (str) – Choose columns for quantization
output_mode (_AzuremlGroupDataIntoBinsOutputModeEnum) – Indicate how quantized columns should be output (enum: [‘Append’, ‘Inplace’, ‘Result Only’])
tag_columns_as_categorical (bool) – Indicate whether output columns should be tagged as categorical

Output quantized_dataset

Dataset with quantized columns

Type

quantized_dataset: Output

Output binning_transformation

Transformation that applies quantization to the dataset

Type

binning_transformation: Output
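The three binning_mode options can be sketched as follows. These helpers are hypothetical: bins are numbered from 0, quantile edges use a simple nearest-rank rule, and a value equal to a custom edge is placed in the bin to its right. The component's edge conventions may differ.

```python
from bisect import bisect_right
from typing import List


def equal_width_bins(values: List[float], n_bins: int) -> List[int]:
    """'Equal Width': equally spaced edges between the minimum and maximum."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid dividing by zero for constant data
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def quantile_bins(values: List[float], n_bins: int) -> List[int]:
    """'Quantiles': edges chosen so bins hold roughly equal numbers of rows."""
    ordered = sorted(values)
    n = len(ordered)
    edges = [ordered[(k * n) // n_bins] for k in range(1, n_bins)]
    return [bisect_right(edges, v) for v in values]


def custom_edge_bins(values: List[float], edges_spec: str) -> List[int]:
    """'Custom Edges': edges from a comma-separated list such as '1, 10'."""
    edges = sorted(float(e) for e in edges_spec.split(",") if e.strip())
    return [bisect_right(edges, v) for v in values]
```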

example.assets.example_components.azureml_import_data(input_dataset_request_dto: Optional[str] = None, data_store_type: Optional[str] = None, override_data_store_name: Optional[str] = None, override_data_path: Optional[str] = None) → example.assets.example_components._assets._AzuremlImportDataComponent

Load data from web URLs or from cloud-based storage services in Azure, such as Azure SQL Database, Azure Blob Storage, Azure Data Lake Storage Gen1, and Azure Data Lake Storage Gen2.

Parameters

input_dataset_request_dto (str) – Input dataset ID or object
data_store_type (str) – Data store type (optional)
override_data_store_name (str) – Overrides the data store name (optional)
override_data_path (str) – Overrides the data path (optional)

Output output_data

DataFrameDirectory

Type

output_data: Output

example.assets.example_components.azureml_init_image_transformation(resize: example.assets.example_components._assets._AzuremlInitImageTransformationResizeEnum = _AzuremlInitImageTransformationResizeEnum.true, size: int = 256, center_crop: example.assets.example_components._assets._AzuremlInitImageTransformationCenterCropEnum = _AzuremlInitImageTransformationCenterCropEnum.true, crop_size: int = 224, pad: example.assets.example_components._assets._AzuremlInitImageTransformationPadEnum = _AzuremlInitImageTransformationPadEnum.false, padding: int = 0, color_jitter: bool = False, grayscale: bool = False, random_resized_crop: example.assets.example_components._assets._AzuremlInitImageTransformationRandomResizedCropEnum = _AzuremlInitImageTransformationRandomResizedCropEnum.false, random_resized_crop_size: int = 256, random_crop: example.assets.example_components._assets._AzuremlInitImageTransformationRandomCropEnum = _AzuremlInitImageTransformationRandomCropEnum.false, random_crop_size: int = 224, random_horizontal_flip: bool = True, random_vertical_flip: bool = False, random_rotation: example.assets.example_components._assets._AzuremlInitImageTransformationRandomRotationEnum = _AzuremlInitImageTransformationRandomRotationEnum.false, random_rotation_degrees: int = 0, random_affine: example.assets.example_components._assets._AzuremlInitImageTransformationRandomAffineEnum = _AzuremlInitImageTransformationRandomAffineEnum.false, random_affine_degrees: int = 0, random_grayscale: bool = False, random_perspective: bool = False) → example.assets.example_components._assets._AzuremlInitImageTransformationComponent

Initialize image transformation.

Parameters

resize (_AzuremlInitImageTransformationResizeEnum) – Resize the input PIL Image to the given size (enum: [‘False’, ‘True’])
size (int) – Desired output size (optional, min: 1)
center_crop (_AzuremlInitImageTransformationCenterCropEnum) – Crops the given PIL Image at the center (enum: [‘False’, ‘True’])
crop_size (int) – Desired output size of the crop (optional, min: 1)
pad (_AzuremlInitImageTransformationPadEnum) – Pad the given PIL Image on all sides with the given “pad” value (enum: [‘False’, ‘True’])
padding (int) – Padding on each border (optional)
color_jitter (bool) – Randomly change the brightness, contrast and saturation of an image
grayscale (bool) – Convert image to grayscale
random_resized_crop (_AzuremlInitImageTransformationRandomResizedCropEnum) – Crop the given PIL Image to random size and aspect ratio (enum: [‘False’, ‘True’])
random_resized_crop_size (int) – Expected output size of each edge (optional, min: 1)
random_crop (_AzuremlInitImageTransformationRandomCropEnum) – Crop the given PIL Image at a random location (enum: [‘False’, ‘True’])
random_crop_size (int) – Desired output size of the crop (optional, min: 1)
random_horizontal_flip (bool) – Horizontally flip the given PIL Image randomly with a given probability
random_vertical_flip (bool) – Vertically flip the given PIL Image randomly with a given probability
random_rotation (_AzuremlInitImageTransformationRandomRotationEnum) – Rotate the image by angle (enum: [‘False’, ‘True’])
random_rotation_degrees (int) – Range of degrees to select from (optional, max: 180)
random_affine (_AzuremlInitImageTransformationRandomAffineEnum) – Random affine transformation of the image keeping center invariant (enum: [‘False’, ‘True’])
random_affine_degrees (int) – Range of degrees to select from (optional, max: 180)
random_grayscale (bool) – Randomly convert image to grayscale with a probability of p (default 0.1)
random_perspective (bool) – Perform a perspective transformation of the given PIL Image randomly with a given probability

Output output_image_transformation

Output image transformation

Type

output_image_transformation: Output
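Several options above share names with torchvision transforms. Assume the component follows torchvision semantics: an integer size resizes the shorter edge while keeping the aspect ratio, and the crop is centered using Python rounding. Under that assumption, the default pipeline (resize=True, size=256, then center_crop=True, crop_size=224) can be sketched as box arithmetic. The function names are hypothetical.

```python
from typing import Tuple


def resized_size(width: int, height: int, size: int) -> Tuple[int, int]:
    """New (width, height) after scaling the shorter edge to `size`."""
    if width <= height:
        return size, int(size * height / width)
    return int(size * width / height), size


def center_crop_box(width: int, height: int, crop_size: int) -> Tuple[int, int, int, int]:
    """(left, top, right, bottom) of a crop_size x crop_size square at the image center."""
    left = int(round((width - crop_size) / 2.0))
    top = int(round((height - crop_size) / 2.0))
    return left, top, left + crop_size, top + crop_size


w, h = resized_size(640, 480, size=256)      # (341, 256)
print(center_crop_box(w, h, crop_size=224))  # (58, 16, 282, 240)
```

The returned box uses the (left, upper, right, lower) order that PIL's `Image.crop` accepts.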

example.assets.example_components.azureml_join_data(left_dataset: Optional[pathlib.Path] = None, right_dataset: Optional[pathlib.Path] = None, comma_separated_case_sensitive_names_of_join_key_columns_for_l: Optional[str] = None, comma_separated_case_sensitive_names_of_join_key_columns_for_r: Optional[str] = None, match_case: bool = True, join_type: example.assets.example_components._assets._AzuremlJoinDataJoinTypeEnum = _AzuremlJoinDataJoinTypeEnum.inner_join, keep_right_key_columns_in_joined_table: bool = True) → example.assets.example_components._assets._AzuremlJoinDataComponent

Joins two datasets on selected key columns.

Parameters

left_dataset (Path) – First dataset to join
right_dataset (Path) – Second dataset to join
comma_separated_case_sensitive_names_of_join_key_columns_for_l (str) – Select the join key columns for the first dataset
comma_separated_case_sensitive_names_of_join_key_columns_for_r (str) – Select the join key columns for the second dataset
match_case (bool) – Indicate whether a case-sensitive comparison is allowed on key columns
join_type (_AzuremlJoinDataJoinTypeEnum) – Choose a join type (enum: [‘Inner Join’, ‘Left Outer Join’, ‘Full Outer Join’, ‘Left Semi-Join’])
keep_right_key_columns_in_joined_table (bool) – Indicate whether to keep key columns from the second dataset in the joined dataset (optional)

Output results_dataset

Result of join operation

Type

results_dataset: Output
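The four join types can be sketched as a hash join over lists of dictionaries. This is illustrative and is not the component's code. The join_type strings are hypothetical stand-ins for the enum values. Unmatched rows simply omit the other side's columns rather than receiving missing values. When column names clash, the right-hand value wins. keep_right_key_columns_in_joined_table is not modeled.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]


def join_rows(left: Sequence[Row], right: Sequence[Row],
              left_keys: Sequence[str], right_keys: Sequence[str],
              join_type: str = "inner", match_case: bool = True) -> List[Row]:
    def key(row: Row, cols: Sequence[str]) -> Tuple:
        # Lower-case string keys when a case-insensitive comparison is requested.
        return tuple(row[c] if match_case or not isinstance(row[c], str) else row[c].lower()
                     for c in cols)

    # Index the right dataset by its key columns.
    index: Dict[Tuple, List[int]] = {}
    for i, r in enumerate(right):
        index.setdefault(key(r, right_keys), []).append(i)

    out: List[Row] = []
    matched_right = set()
    for l in left:
        matches = index.get(key(l, left_keys), [])
        if join_type == "left_semi":
            # Semi-join: keep left rows that have a match, with left columns only.
            if matches:
                out.append(dict(l))
            continue
        for i in matches:
            matched_right.add(i)
            out.append({**l, **right[i]})
        if not matches and join_type in ("left_outer", "full_outer"):
            out.append(dict(l))
    if join_type == "full_outer":
        out.extend(dict(r) for i, r in enumerate(right) if i not in matched_right)
    return out
```

For example, joining `[{"id": 1}, {"id": 2}]` to `[{"rid": 1}]` on `id`/`rid` keeps one row for an inner join and two rows for a left outer join.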

example.assets.example_components.azureml_k_means_clustering(create_trainer_mode: example.assets.example_components._assets._AzuremlKMeansClusteringCreateTrainerModeEnum = _AzuremlKMeansClusteringCreateTrainerModeEnum.singleparameter, number_of_centroids: int = 2, initialization: example.assets.example_components._assets._AzuremlKMeansClusteringInitializationEnum = _AzuremlKMeansClusteringInitializationEnum.k_means, random_number_seed: Optional[int] = None, metric: example.assets.example_components._assets._AzuremlKMeansClusteringMetricEnum = _AzuremlKMeansClusteringMetricEnum.euclidean, should_input_instances_be_normalized: bool = True, iterations: int = 100, assign_label_mode: example.assets.example_components._assets._AzuremlKMeansClusteringAssignLabelModeEnum = _AzuremlKMeansClusteringAssignLabelModeEnum.ignore_label_column) → example.assets.example_components._assets._AzuremlKMeansClusteringComponent

Initialize K-Means clustering model.

Parameters

create_trainer_mode (_AzuremlKMeansClusteringCreateTrainerModeEnum) – Create advanced learner options (enum: [‘SingleParameter’])
number_of_centroids (int) – Number of centroids (optional, min: 2)
initialization (_AzuremlKMeansClusteringInitializationEnum) – Initialization algorithm (optional, enum: [‘Random’, ‘K-Means++’, ‘Default’])
random_number_seed (int) – Type a value to seed the random number generator that the training model uses to generate centroids. Leave blank to have a value chosen randomly at first train. (optional, max: 4294967295)
metric (_AzuremlKMeansClusteringMetricEnum) – Selected metric (enum: [‘Euclidean’])
should_input_instances_be_normalized (bool) – Indicate whether instances should be normalized
iterations (int) – Number of iterations (min: 1)
assign_label_mode (_AzuremlKMeansClusteringAssignLabelModeEnum) – Mode of value assignment to the labeled column (enum: [‘Ignore label column’, ‘Fill missing values’, ‘Overwrite from closest to center’])

Output untrained_model

Untrained K-Means clustering model

Type

untrained_model: Output
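To show how number_of_centroids, iterations, and the Euclidean metric fit together, the sketch below runs Lloyd's k-means iterations in plain Python. It uses 'Random'-style initialization, picking k distinct input points with a seeded generator, rather than K-Means++. It also skips instance normalization. It is not the component's implementation.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], number_of_centroids: int,
           iterations: int = 100, seed: int = 0) -> Tuple[List[Point], List[int]]:
    rng = random.Random(seed)
    k = number_of_centroids
    centroids: List[Point] = [tuple(p) for p in rng.sample(list(points), k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        updated: List[Point] = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid in place
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    return centroids, labels


centroids, labels = kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], number_of_centroids=2)
print(sorted(centroids))  # [(0.0, 0.5), (10.0, 10.5)]
```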

example.assets.example_components.azureml_latent_dirichlet_allocation(dataset: Optional[pathlib.Path] = None, target_columns: Optional[str] = None, number_of_topics_to_model: int = 5, n_grams: int = 2, normalize: bool = True, show_all_options: example.assets.example_components._assets._AzuremlLatentDirichletAllocationShowAllOptionsEnum = _AzuremlLatentDirichletAllocationShowAllOptionsEnum.false, rho_parameter: float = 0.01, alpha_parameter: float = 0.01, estimated_number_of_documents: int = 1000, size_of_the_batch: int = 32, initial_value_of_iteration_count: int = 10, power_applied_to_the_iteration_during_updates: float = 0.5, passes: int = 25, build_dictionary_of_ngrams_prior_to_lda: example.assets.example_components._assets._AzuremlLatentDirichletAllocationBuildDictionaryOfNgramsPriorToLdaEnum = _AzuremlLatentDirichletAllocationBuildDictionaryOfNgramsPriorToLdaEnum.true, maximum_number_of_ngrams_in_dictionary: int = 20000, hash_bits: int = 12, build_dictionary_of_ngrams: example.assets.example_components._assets._AzuremlLatentDirichletAllocationBuildDictionaryOfNgramsEnum = _AzuremlLatentDirichletAllocationBuildDictionaryOfNgramsEnum.true, maximum_size_of_ngram_dictionary: int = 20000, number_of_hash_bits: int = 12) → example.assets.example_components._assets._AzuremlLatentDirichletAllocationComponent

Topic Modeling: Latent Dirichlet Allocation.

Parameters

dataset (Path) – Input dataset
target_columns (str) – Target column name or index
number_of_topics_to_model (int) – Model the document distribution against N topics (min: 1, max: 1000)
n_grams (int) – Order of N-grams generated during hashing (min: 1, max: 10)
normalize (bool) – Normalize output to probabilities. The feature topic matrix will be P(word|topic).
show_all_options (_AzuremlLatentDirichletAllocationShowAllOptionsEnum) – Presents additional parameters specific to scikit-learn online LDA (enum: [‘True’, ‘False’])
rho_parameter (float) – Rho parameter (optional, min: 2.220446049250313e-16, max: 1.0)
alpha_parameter (float) – Alpha parameter (optional, min: 2.220446049250313e-16, max: 1.0)
estimated_number_of_documents (int) – Estimated number of documents (optional, min: 1, max: 2147483647)
size_of_the_batch (int) – Size of the batch (optional, min: 1, max: 1024)
initial_value_of_iteration_count (int) – Initial value of iteration count used in learning rate update schedule (optional, min: 1, max: 2147483647)
power_applied_to_the_iteration_during_updates (float) – Power applied to the iteration count during online updates (optional, min: 0.5, max: 1.0)
passes (int) – Number of training iterations (optional, min: 1, max: 1024)
build_dictionary_of_ngrams_prior_to_lda (_AzuremlLatentDirichletAllocationBuildDictionaryOfNgramsPriorToLdaEnum) – Builds a dictionary of ngrams prior to LDA. Useful for model inspection and interpretation (optional, enum: [‘True’, ‘False’])
maximum_number_of_ngrams_in_dictionary (int) – Maximum size of the dictionary. If the number of tokens in the input exceeds this size, collisions may occur (optional, min: 1, max: 2147483647)
hash_bits (int) – Number of bits to use for feature hashing (optional, min: 1, max: 31)
build_dictionary_of_ngrams (_AzuremlLatentDirichletAllocationBuildDictionaryOfNgramsEnum) – Builds a dictionary of ngrams prior to computing LDA. Useful for model inspection and interpretation (optional, enum: [‘True’, ‘False’])
maximum_size_of_ngram_dictionary (int) – Maximum size of the ngrams dictionary. If the number of tokens in the input exceeds this size, collisions may occur (optional, min: 1, max: 2147483647)
number_of_hash_bits (int) – Number of bits to use during feature hashing (optional, min: 1, max: 31)

Output transformed_dataset

Output dataset

Type

transformed_dataset: Output

Output feature_topic_matrix

Feature topic matrix produced by LDA

Type

feature_topic_matrix: Output

Output lda_transformation

Transformation that applies LDA to the dataset

Type

lda_transformation: Output
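The n_grams and hash_bits parameters describe feature hashing. Each n-gram is mapped to one of 2**hash_bits buckets, so a vocabulary larger than the bucket count forces collisions. A minimal sketch of that hashing step follows. Three things are assumptions: whitespace tokenization, CRC32 as the hash, and counting every n-gram order from 1 up to n_grams. The LDA fit itself is not shown.

```python
import zlib
from collections import Counter
from typing import Dict


def hashed_ngram_counts(text: str, n_grams: int = 2, hash_bits: int = 12) -> Dict[int, int]:
    tokens = text.lower().split()
    buckets = 1 << hash_bits  # 2**hash_bits feature slots
    counts: Counter = Counter()
    for n in range(1, n_grams + 1):
        for i in range(len(tokens) - n + 1):
            gram = " ".join(tokens[i:i + n])
            # A stable hash (unlike Python's salted str hash) keeps bucket ids reproducible.
            counts[zlib.crc32(gram.encode("utf-8")) % buckets] += 1
    return dict(counts)


features = hashed_ngram_counts("the cat sat on the mat", n_grams=2, hash_bits=12)
print(sum(features.values()))  # 11 (6 unigrams + 5 bigrams)
```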

example.assets.example_components.azureml_linear_regression(solution_method: example.assets.example_components._assets._AzuremlLinearRegressionSolutionMethodEnum = _AzuremlLinearRegressionSolutionMethodEnum.ordinary_least_squares, create_trainer_mode: example.assets.example_components._assets._AzuremlLinearRegressionCreateTrainerModeEnum = _AzuremlLinearRegressionCreateTrainerModeEnum.singleparameter, learning_rate: float = 0.1, number_of_epochs_over_which_algorithm_iterates_through_examples: int = 10, l2_regularization_term_weight: float = 0.001, range_for_learning_rate: str = '0.025; 0.05; 0.1; 0.2', range_for_number_of_epochs_over_which_algorithm_iterates_through_examples: str = '1; 10; 100', range_for_l2_regularization_term_weight: str = '0.001; 0.01; 0.1', should_input_instances_be_normalized: bool = True, decrease_learning_rate_as_iterations_progress: bool = True, l2_regularization_weight: float = 0.001, include_intercept_term: bool = True, random_number_seed: Optional[int] = None) → example.assets.example_components._assets._AzuremlLinearRegressionComponent

Creates a linear regression model.

Parameters

solution_method (_AzuremlLinearRegressionSolutionMethodEnum) – Choose an optimization method (enum: [‘Online Gradient Descent’, ‘Ordinary Least Squares’])
create_trainer_mode (_AzuremlLinearRegressionCreateTrainerModeEnum) – Create advanced learner options (optional, enum: [‘SingleParameter’, ‘ParameterRange’])
learning_rate (float) – Specify the initial learning rate for the stochastic gradient descent optimizer (optional, min: 2.220446049250313e-16)
number_of_epochs_over_which_algorithm_iterates_through_examples (int) – Specify how many times the algorithm should iterate through examples. For datasets with a small number of examples, this number should be large to reach convergence. (optional)
l2_regularization_term_weight (float) – Specify the weight for L2 regularization. Use a non-zero value to avoid overfitting. (optional)
range_for_learning_rate (str) – Specify the range for the initial learning rate for the stochastic gradient descent optimizer (optional)
range_for_number_of_epochs_over_which_algorithm_iterates_through_examples (str) – Specify range for how many times the algorithm should iterate through examples. For datasets with a small number of examples, this number should be large to reach convergence. (optional)
range_for_l2_regularization_term_weight (str) – Specify the range for the weight for L2 regularization. Use a non-zero value to avoid overfitting. (optional)
should_input_instances_be_normalized (bool) – Indicate whether instances should be normalized (optional)
decrease_learning_rate_as_iterations_progress (bool) – Indicate whether the learning rate should decrease as iterations progress (optional)
l2_regularization_weight (float) – Specify the weight for the L2 regularization. Use a non-zero value to avoid overfitting. (optional)
include_intercept_term (bool) – Indicate whether an additional term should be added for the intercept (optional)
random_number_seed (int) – Specify a value to seed the random number generator used by the model. Leave blank for default. (optional, max: 4294967295)

Output untrained_model

An untrained regression model

Type

untrained_model: Output
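With the default solution_method (Ordinary Least Squares) and include_intercept_term=True, a single-feature fit has a closed form: slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x). The sketch below covers only that case. It ignores l2_regularization_weight, multiple features, and normalization, and the function name is hypothetical.

```python
from typing import Sequence, Tuple


def ols_fit(xs: Sequence[float], ys: Sequence[float],
            include_intercept_term: bool = True) -> Tuple[float, float]:
    """Return (slope, intercept) minimizing the squared error for one feature."""
    if include_intercept_term:
        n = len(xs)
        mean_x, mean_y = sum(xs) / n, sum(ys) / n
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        sxx = sum((x - mean_x) ** 2 for x in xs)
        slope = sxy / sxx
        return slope, mean_y - slope * mean_x
    # Line through the origin: slope = sum(x*y) / sum(x*x).
    slope = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return slope, 0.0


print(ols_fit([0, 1, 2, 3], [1, 3, 5, 7]))  # (2.0, 1.0)
```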

example.assets.example_components.azureml_multiclass_boosted_decision_tree(create_trainer_mode: example.assets.example_components._assets._AzuremlMulticlassBoostedDecisionTreeCreateTrainerModeEnum = _AzuremlMulticlassBoostedDecisionTreeCreateTrainerModeEnum.singleparameter, maximum_number_of_leaves_per_tree: int = 20, minimum_number_of_training_instances_required_to_form_a_leaf: int = 10, the_learning_rate: float = 0.2, total_number_of_trees_constructed: int = 100, range_for_maximum_number_of_leaves_per_tree: str = '2; 8; 32; 128', range_for_minimum_number_of_training_instances_required_to_form_a_leaf: str = '1; 10; 50', range_for_learning_rate: str = '0.025; 0.05; 0.1; 0.2; 0.4', range_for_total_number_of_trees_constructed: str = '20; 100; 500', random_number_seed: Optional[int] = None) → example.assets.example_components._assets._AzuremlMulticlassBoostedDecisionTreeComponent

Creates a multiclass classifier using a boosted decision tree algorithm.

Parameters

create_trainer_mode (_AzuremlMulticlassBoostedDecisionTreeCreateTrainerModeEnum) – Create advanced learner options (enum: [‘SingleParameter’, ‘ParameterRange’])
maximum_number_of_leaves_per_tree (int) – Specify the maximum number of leaves allowed per tree (optional, min: 2, max: 131072)
minimum_number_of_training_instances_required_to_form_a_leaf (int) – Specify the minimum number of cases required to form a leaf (optional, min: 1)
the_learning_rate (float) – Specify the initial learning rate (optional, min: 2.220446049250313e-16, max: 1.0)
total_number_of_trees_constructed (int) – Specify the maximum number of trees that can be created during training (optional, min: 1)
range_for_maximum_number_of_leaves_per_tree (str) – Specify range for the maximum number of leaves allowed per tree (optional)
range_for_minimum_number_of_training_instances_required_to_form_a_leaf (str) – Specify the range for the minimum number of cases required to form a leaf (optional)
range_for_learning_rate (str) – Specify the range for the initial learning rate (optional)
range_for_total_number_of_trees_constructed (str) – Specify the range for the maximum number of trees that can be created during training (optional)
random_number_seed (int) – Type a value to seed the random number generator used by the model. Leave blank for default. (optional, max: 4294967295)

Output untrained_model

An untrained multiclass classification model

Type

untrained_model: Output
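In ParameterRange mode, the range_for_* parameters take semicolon-separated values, as the defaults show. The page does not say how the combinations are searched. Assuming a full grid, the defaults expand to 4 * 3 * 5 * 3 = 180 settings. The helpers and dictionary keys below are hypothetical.

```python
from itertools import product
from typing import Dict, List, Union

Number = Union[int, float]


def parse_range(spec: str) -> List[Number]:
    """Split a range string such as '20; 100; 500' into numbers."""
    values: List[Number] = []
    for token in spec.split(";"):
        token = token.strip()
        if token:
            values.append(int(token) if token.isdigit() else float(token))
    return values


def expand_grid(ranges: Dict[str, str]) -> List[Dict[str, Number]]:
    """Every combination of the parsed ranges (Cartesian product)."""
    names = list(ranges)
    return [dict(zip(names, combo)) for combo in product(*(parse_range(ranges[n]) for n in names))]


grid = expand_grid({
    "maximum_number_of_leaves_per_tree": "2; 8; 32; 128",
    "minimum_number_of_training_instances_required_to_form_a_leaf": "1; 10; 50",
    "learning_rate": "0.025; 0.05; 0.1; 0.2; 0.4",
    "total_number_of_trees_constructed": "20; 100; 500",
})
print(len(grid))  # 180
```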

example.assets.example_components.azureml_multiclass_decision_forest(create_trainer_mode: example.assets.example_components._assets._AzuremlMulticlassDecisionForestCreateTrainerModeEnum = _AzuremlMulticlassDecisionForestCreateTrainerModeEnum.singleparameter, number_of_decision_trees: int = 8, maximum_depth_of_the_decision_trees: int = 32, minimum_number_of_samples_per_leaf_node: int = 1, range_for_number_of_decision_trees: str = '1; 8; 32', range_for_the_maximum_depth_of_the_decision_trees: str = '1; 16; 64', range_for_the_minimum_number_of_samples_per_leaf_node: str = '1; 4; 16', resampling_method: example.assets.example_components._assets._AzuremlMulticlassDecisionForestResamplingMethodEnum = _AzuremlMulticlassDecisionForestResamplingMethodEnum.bagging_resampling) → example.assets.example_components._assets._AzuremlMulticlassDecisionForestComponent

Creates a multiclass classification model using the decision forest algorithm.

Parameters

create_trainer_mode (_AzuremlMulticlassDecisionForestCreateTrainerModeEnum) – Create advanced learner options (enum: [‘SingleParameter’, ‘ParameterRange’])
number_of_decision_trees (int) – Specify the number of decision trees to create in the ensemble (optional, min: 1)
maximum_depth_of_the_decision_trees (int) – Specify the maximum depth of any decision tree that can be created in the ensemble (optional, min: 1)
minimum_number_of_samples_per_leaf_node (int) – Specify the minimum number of training samples required to generate a leaf node (optional, min: 1)
range_for_number_of_decision_trees (str) – Specify range for the number of decision trees to create in the ensemble (optional)
range_for_the_maximum_depth_of_the_decision_trees (str) – Specify range for the maximum depth of the decision trees (optional)
range_for_the_minimum_number_of_samples_per_leaf_node (str) – Specify range for the minimum number of samples per leaf node (optional)
resampling_method (_AzuremlMulticlassDecisionForestResamplingMethodEnum) – Choose a resampling method (enum: [‘Bagging Resampling’, ‘Replicate Resampling’])

Output untrained_model

An untrained multiclass classification model

Type

untrained_model: Output
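The two resampling_method choices differ in what data each tree sees. This sketch assumes that 'Bagging Resampling' draws a bootstrap sample (same size, with replacement) for each tree, and that 'Replicate Resampling' trains every tree on the full training set. It only produces each tree's row indices, and the function name is hypothetical.

```python
import random
from typing import List


def tree_row_indices(n_rows: int, number_of_decision_trees: int = 8,
                     bagging: bool = True, seed: int = 0) -> List[List[int]]:
    """Row indices used to train each tree in the ensemble."""
    if not bagging:
        # Replicate: every tree is trained on exactly the same rows.
        return [list(range(n_rows)) for _ in range(number_of_decision_trees)]
    rng = random.Random(seed)
    # Bagging: an n_rows-sized bootstrap sample, drawn with replacement, per tree.
    return [[rng.randrange(n_rows) for _ in range(n_rows)]
            for _ in range(number_of_decision_trees)]


samples = tree_row_indices(n_rows=10)
print(len(samples), len(samples[0]))  # 8 10
```

For large datasets, a bootstrap sample contains on average about 63% (1 - 1/e) of the distinct rows. This variation between samples is what makes bagged trees differ from one another.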

example.assets.example_components.azureml_multiclass_logistic_regression(create_trainer_mode: example.assets.example_components._assets._AzuremlMulticlassLogisticRegressionCreateTrainerModeEnum = _AzuremlMulticlassLogisticRegressionCreateTrainerModeEnum.singleparameter, optimization_tolerance: float = 1e-07, l2_regularizaton_weight: float = 1.0, range_for_optimization_tolerance: str = '0.00001; 0.00000001', range_for_l2_regularization_weight: str = '0.01; 0.1; 1.0', random_number_seed: Optional[int] = None) → example.assets.example_components._assets._AzuremlMulticlassLogisticRegressionComponent

Creates a multiclass logistic regression classification model.

Parameters

create_trainer_mode (_AzuremlMulticlassLogisticRegressionCreateTrainerModeEnum) – Create advanced learner options (enum: [‘SingleParameter’, ‘ParameterRange’])
optimization_tolerance (float) – Specify a tolerance value for the L-BFGS optimizer (optional, min: 2.220446049250313e-16)
l2_regularizaton_weight (float) – Specify the L2 regularization weight. Use a non-zero value to avoid overfitting. (optional)
range_for_optimization_tolerance (str) – Specify a range for the tolerance value for the L-BFGS optimizer (optional)
range_for_l2_regularization_weight (str) – Specify the range for the L2 regularization weight. Use a non-zero value to avoid overfitting. (optional)
random_number_seed (int) – Type a value to seed the random number generator used by the model. Leave blank for default. (optional, max: 4294967295)

Output untrained_model

An untrained classification model

Type

untrained_model: Output
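Multiclass logistic regression scores each class linearly and converts the scores into probabilities with a softmax. Training minimizes the negative log-likelihood plus an L2 penalty weighted by l2_regularizaton_weight. The sketch below shows only the model and that objective, not the L-BFGS optimizer. The exact penalty scaling the component uses (for example, lambda versus lambda/2) is an assumption, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def softmax_probabilities(weights: Matrix, biases: Sequence[float], x: Sequence[float]) -> List[float]:
    """Class probabilities for one example; weights has one row per class."""
    scores = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    top = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def penalized_nll(weights: Matrix, biases: Sequence[float], xs: Sequence[Sequence[float]],
                  labels: Sequence[int], l2_weight: float) -> float:
    """Negative log-likelihood plus an L2 penalty on the weights (biases unpenalized)."""
    nll = -sum(math.log(softmax_probabilities(weights, biases, x)[y]) for x, y in zip(xs, labels))
    return nll + l2_weight * sum(w * w for row in weights for w in row)


probs = softmax_probabilities([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]], [0.0, 0.0, 0.0], [2.0, 0.0])
print(max(range(3), key=probs.__getitem__))  # 0
```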

example.assets.example_components.azureml_multiclass_neural_network(create_trainer_mode: example.assets.example_components._assets._AzuremlMulticlassNeuralNetworkCreateTrainerModeEnum = _AzuremlMulticlassNeuralNetworkCreateTrainerModeEnum.singleparameter, hidden_layer_specification: example.assets.example_components._assets._AzuremlMulticlassNeuralNetworkHiddenLayerSpecificationEnum = _AzuremlMulticlassNeuralNetworkHiddenLayerSpecificationEnum.fully_connected_case, number_of_hidden_nodes: str = '100', the_learning_rate: float = 0.1, number_of_learning_iterations: int = 100, hidden_layer_specification1: example.assets.example_components._assets._AzuremlMulticlassNeuralNetworkHiddenLayerSpecification1Enum = _AzuremlMulticlassNeuralNetworkHiddenLayerSpecification1Enum.fully_connected_case, number_of_hidden_nodes1: str = '100', range_for_learning_rate: str = '0.1; 0.2; 0.4', range_for_number_of_learning_iterations: str = '20; 40; 80; 160', the_momentum: float = 0, shuffle_examples: bool = True, random_number_seed: Optional[int] = None) → example.assets.example_components._assets._AzuremlMulticlassNeuralNetworkComponent

Creates a multiclass classification model using a neural network algorithm.

Parameters

create_trainer_mode (_AzuremlMulticlassNeuralNetworkCreateTrainerModeEnum) – Create advanced learner options (enum: [‘SingleParameter’, ‘ParameterRange’])
hidden_layer_specification (_AzuremlMulticlassNeuralNetworkHiddenLayerSpecificationEnum) – Specify the architecture of the hidden layer or layers (optional, enum: [‘Fully-connected case’])
number_of_hidden_nodes (str) – Type the number of nodes in the hidden layer. For multiple hidden layers, type a comma-separated list. (optional)
the_learning_rate (float) – Specify the size of each step in the learning process (optional, min: 2.220446049250313e-16, max: 2.0)
number_of_learning_iterations (int) – Specify the number of iterations while learning (optional, min: 1)
hidden_layer_specification1 (_AzuremlMulticlassNeuralNetworkHiddenLayerSpecification1Enum) – Specify the architecture of the hidden layer or layers for range (optional, enum: [‘Fully-connected case’])
number_of_hidden_nodes1 (str) – Type the number of nodes in the hidden layer, or for multiple hidden layers, type a comma-separated list. (optional)
range_for_learning_rate (str) – Specify the range for the size of each step in the learning process (optional)
range_for_number_of_learning_iterations (str) – Specify the range for the number of iterations while learning (optional)
the_momentum (float) – Specify a weight to apply during learning to nodes from previous iterations (max: 1.0)
shuffle_examples (bool) – Select this option to change the order of instances between learning iterations
random_number_seed (int) – Specify a numeric seed to use for random number generation. Leave blank to use the default seed. (optional, max: 4294967295)

Output untrained_model

An untrained multiclass classification model

Type

untrained_model: Output
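number_of_hidden_nodes is a string: either a single count or a comma-separated list with one count per hidden layer. Assume the fully-connected case with one bias per node. The sketch below parses that string and counts the trainable weights it implies for a given number of input features and classes. The helper names are hypothetical.

```python
from typing import List


def parse_hidden_nodes(spec: str) -> List[int]:
    """'100' -> [100]; '64, 32' -> [64, 32] (one entry per hidden layer)."""
    return [int(token) for token in spec.split(",") if token.strip()]


def fully_connected_parameter_count(n_inputs: int, hidden: List[int], n_outputs: int) -> int:
    """Weights plus biases when every layer is fully connected to the next."""
    sizes = [n_inputs] + hidden + [n_outputs]
    return sum((fan_in + 1) * fan_out for fan_in, fan_out in zip(sizes, sizes[1:]))


print(fully_connected_parameter_count(4, parse_hidden_nodes("100"), 3))  # 803
```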

example.assets.example_components.azureml_neural_network_regression(create_trainer_mode: example.assets.example_components._assets._AzuremlNeuralNetworkRegressionCreateTrainerModeEnum = _AzuremlNeuralNetworkRegressionCreateTrainerModeEnum.singleparameter, hidden_layer_specification: example.assets.example_components._assets._AzuremlNeuralNetworkRegressionHiddenLayerSpecificationEnum = _AzuremlNeuralNetworkRegressionHiddenLayerSpecificationEnum.fully_connected_case, number_of_hidden_nodes: str = '100', the_learning_rate: float = 0.1, number_of_learning_iterations: int = 100, hidden_layer_specification1: example.assets.example_components._assets._AzuremlNeuralNetworkRegressionHiddenLayerSpecification1Enum = _AzuremlNeuralNetworkRegressionHiddenLayerSpecification1Enum.fully_connected_case, number_of_hidden_nodes1: str = '100', range_for_learning_rate: str = '0.1; 0.2; 0.4', range_for_number_of_learning_iterations: str = '20; 40; 80; 160', the_momentum: float = 0, shuffle_examples: bool = True, random_number_seed: Optional[int] = None) → example.assets.example_components._assets._AzuremlNeuralNetworkRegressionComponent

Creates a regression model using a neural network algorithm.

Parameters

create_trainer_mode (_AzuremlNeuralNetworkRegressionCreateTrainerModeEnum) – Create advanced learner options (enum: [‘SingleParameter’, ‘ParameterRange’])
hidden_layer_specification (_AzuremlNeuralNetworkRegressionHiddenLayerSpecificationEnum) – Specify the architecture of the hidden layer or layers (optional, enum: [‘Fully-connected case’])
number_of_hidden_nodes (str) – Type the number of nodes in the hidden layer. For multiple hidden layers, type a comma-separated list. (optional)
the_learning_rate (float) – Specify the size of each step in the learning process (optional, min: 2.220446049250313e-16, max: 2.0)
number_of_learning_iterations (int) – Specify the number of iterations while learning (optional, min: 1)
hidden_layer_specification1 (_AzuremlNeuralNetworkRegressionHiddenLayerSpecification1Enum) – Specify the architecture of the hidden layer or layers for range (optional, enum: [‘Fully-connected case’])
number_of_hidden_nodes1 (str) – Type the number of nodes in the hidden layer, or for multiple hidden layers, type a comma-separated list. (optional)
range_for_learning_rate (str) – Specify the range for the size of each step in the learning process (optional)
range_for_number_of_learning_iterations (str) – Specify the range for the number of iterations while learning (optional)
the_momentum (float) – Specify a weight to apply during learning to nodes from previous iterations (max: 1.0)
shuffle_examples (bool) – Select this option to change the order of instances between learning iterations
random_number_seed (int) – Specify a numeric seed to use for random number generation. Leave blank to use the default seed. (optional, max: 4294967295)

Output untrained_model

An untrained regression model

Type

untrained_model: Output
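the_learning_rate and the_momentum interact as in gradient descent with momentum: each step carries over a fraction of the previous update. The sketch below uses the classical (heavy-ball) form on a one-dimensional function. Whether the component uses this form or a variant such as Nesterov momentum is an assumption, and the function name is hypothetical.

```python
from typing import Callable


def sgd_with_momentum(grad: Callable[[float], float], w0: float,
                      learning_rate: float = 0.1, the_momentum: float = 0.0,
                      iterations: int = 100) -> float:
    w, velocity = w0, 0.0
    for _ in range(iterations):
        # New step = decayed previous step minus the scaled current gradient.
        velocity = the_momentum * velocity - learning_rate * grad(w)
        w += velocity
    return w


# Minimize (w - 3)**2, whose gradient is 2 * (w - 3).
print(round(sgd_with_momentum(lambda w: 2 * (w - 3), w0=0.0, the_momentum=0.5), 4))  # 3.0
```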

example.assets.example_components.azureml_normalize_data(dataset: Optional[pathlib.Path] = None, transformation_method: example.assets.example_components._assets._AzuremlNormalizeDataTransformationMethodEnum = _AzuremlNormalizeDataTransformationMethodEnum.zscore, use_0_for_constant_columns_when_checked: bool = True, columns_to_transform: Optional[str] = None) → example.assets.example_components._assets._AzuremlNormalizeDataComponent

Rescales numeric data to constrain dataset values to a standard range.

Parameters

dataset (Path) – Input dataset
transformation_method (_AzuremlNormalizeDataTransformationMethodEnum) – Choose the mathematical method used for scaling (enum: [‘ZScore’, ‘MinMax’, ‘Logistic’, ‘LogNormal’, ‘Tanh’])
use_0_for_constant_columns_when_checked (bool) – Use 0 for constant columns when checked, or NaN when unchecked (optional)
columns_to_transform (str) – Select all columns to which the selected transformation should be applied

Output transformed_dataset

Transformed dataset

Type

transformed_dataset: Output

Output transformation_function

Definition of the transformation function, which can be applied to other datasets

Type

transformation_function: Output
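Two of the transformation_method options can be sketched per column as follows. The sketch includes the use_0_for_constant_columns_when_checked behavior for a column whose values are all equal. Using the population standard deviation for ZScore is an assumption, and the helper names are hypothetical. The Logistic, LogNormal, and Tanh methods are not shown.

```python
import math
from typing import List, Sequence


def _constant_fill(n: int, use_0_for_constant_columns: bool) -> List[float]:
    # Constant column: the scale is zero, so emit 0 (checked) or NaN (unchecked).
    return [0.0 if use_0_for_constant_columns else math.nan] * n


def zscore(column: Sequence[float], use_0_for_constant_columns: bool = True) -> List[float]:
    n = len(column)
    mean = sum(column) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in column) / n)  # population std
    if std == 0:
        return _constant_fill(n, use_0_for_constant_columns)
    return [(v - mean) / std for v in column]


def minmax(column: Sequence[float], use_0_for_constant_columns: bool = True) -> List[float]:
    lo, hi = min(column), max(column)
    if hi == lo:
        return _constant_fill(len(column), use_0_for_constant_columns)
    return [(v - lo) / (hi - lo) for v in column]


print(minmax([2, 4, 6]))  # [0.0, 0.5, 1.0]
print(zscore([5, 5, 5]))  # [0.0, 0.0, 0.0]
```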

example.assets.example_components.azureml_one_vs_all_multiclass(untrained_binary_classification_model: Optional[pathlib.Path] = None) → example.assets.example_components._assets._AzuremlOneVsAllMulticlassComponent

Creates a one-vs-all multiclass classification model from an ensemble of binary classification models.

Parameters

untrained_binary_classification_model (Path) – An untrained binary classification model

Output untrained_model

An untrained multiclass classification model

Type

untrained_model: Output

example.assets.example_components.azureml_one_vs_one_multiclass(untrained_binary_classification_model: Optional[pathlib.Path] = None) → example.assets.example_components._assets._AzuremlOneVsOneMulticlassComponent

Creates a one-vs-one multiclass classification model from an ensemble of binary classification models.

Parameters

untrained_binary_classification_model (Path) – An untrained binary classification model

Output untrained_model

An untrained multiclass classification model

Type

untrained_model: Output
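One-vs-one trains a binary model for every pair of classes, which is k * (k - 1) / 2 models for k classes, and predicts by majority vote. The sketch shows the pairing and the vote. The pair classifiers are hypothetical callables that each return one of their two class labels. Ties go to whichever class first received a vote, which may differ from the component's behavior.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

PairClassifier = Callable[[Sequence[float]], Hashable]


def class_pairs(classes: Sequence[Hashable]) -> List[Tuple[Hashable, Hashable]]:
    """All unordered class pairs; one binary model is trained per pair."""
    return list(combinations(classes, 2))


def one_vs_one_predict(classifiers: Dict[Tuple[Hashable, Hashable], PairClassifier],
                       x: Sequence[float]) -> Hashable:
    """Each pairwise model votes for one class; the class with the most votes wins."""
    votes = Counter(clf(x) for clf in classifiers.values())
    return votes.most_common(1)[0][0]


print(len(class_pairs(["a", "b", "c", "d"])))  # 6
pairwise = {("a", "b"): lambda x: "a", ("a", "c"): lambda x: "c", ("b", "c"): lambda x: "c"}
print(one_vs_one_predict(pairwise, [0.0]))  # c
```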

example.assets.example_components.azureml_partition_and_sample(dataset: Optional[pathlib.Path] = None, partition_or_sample_mode: example.assets.example_components._assets._AzuremlPartitionAndSamplePartitionOrSampleModeEnum = _AzuremlPartitionAndSamplePartitionOrSampleModeEnum.sampling, use_replacement_in_the_partitioning: bool = False, randomized_split: bool = True, random_seed: int = 0, specify_the_partitioner_method: example.assets.example_components._assets._AzuremlPartitionAndSampleSpecifyThePartitionerMethodEnum = _AzuremlPartitionAndSampleSpecifyThePartitionerMethodEnum.partition_evenly, specify_how_many_folds_do_you_want_to_split_evenly_into: int = 5, stratified_split: example.assets.example_components._assets._AzuremlPartitionAndSampleStratifiedSplitEnum = _AzuremlPartitionAndSampleStratifiedSplitEnum.false, stratification_key_column: Optional[str] = None, proportion_list_of_customized_folds_separated_by_comma: Optional[str] = None, stratified_split_for_customized_fold_assignment: example.assets.example_components._assets._AzuremlPartitionAndSampleStratifiedSplitForCustomizedFoldAssignmentEnum = _AzuremlPartitionAndSampleStratifiedSplitForCustomizedFoldAssignmentEnum.false, stratification_key_column_for_customized_fold_assignment: Optional[str] = None, specify_which_fold_to_be_sampled_from: int = 1, pick_complement_of_the_selected_fold: bool = False, rate_of_sampling: float = 0.01, random_seed_for_sampling: int = 0, stratified_split_for_sampling: example.assets.example_components._assets._AzuremlPartitionAndSampleStratifiedSplitForSamplingEnum = _AzuremlPartitionAndSampleStratifiedSplitForSamplingEnum.false, stratification_key_column_for_sampling: Optional[str] = None, number_of_rows_to_select: int = 10) → example.assets.example_components._assets._AzuremlPartitionAndSampleComponent

Creates multiple partitions of a dataset based on sampling.

Parameters

dataset (Path) – Dataset to be split
partition_or_sample_mode (_AzuremlPartitionAndSamplePartitionOrSampleModeEnum) – Select the partition or sampling mode (enum: [‘Assign to Folds’, ‘Pick Fold’, ‘Sampling’, ‘Head’])
use_replacement_in_the_partitioning (bool) – Indicate whether the dataset should be replaced when split, or split without replacement (optional)
randomized_split (bool) – Indicates whether split is random or not (optional)
random_seed (int) – Specify a seed for the random number generator (optional, max: 4294967295)
specify_the_partitioner_method (_AzuremlPartitionAndSampleSpecifyThePartitionerMethodEnum) – EvenSize where you specify number of folds, or ShapeInPct where you specify a list of percentage numbers (optional, enum: [‘Partition evenly’, ‘Partition with customized proportions’])
specify_how_many_folds_do_you_want_to_split_evenly_into (int) – Number of equal-sized folds to split the dataset into (optional, min: 1)
stratified_split (_AzuremlPartitionAndSampleStratifiedSplitEnum) – Indicates whether the split is stratified or not (optional, enum: [‘True’, ‘False’])
stratification_key_column (str) – Column containing stratification key (optional)
proportion_list_of_customized_folds_separated_by_comma (str) – List of proportions separated by comma (optional)
stratified_split_for_customized_fold_assignment (_AzuremlPartitionAndSampleStratifiedSplitForCustomizedFoldAssignmentEnum) – Indicates whether the split is stratified or not for customized fold assignments (optional, enum: [‘True’, ‘False’])
stratification_key_column_for_customized_fold_assignment (str) – Column containing stratification key for customized fold assignments (optional)
specify_which_fold_to_be_sampled_from (int) – Index of the partitioned fold to be sampled from (optional, min: 1)
pick_complement_of_the_selected_fold (bool) – Return all rows except those in the selected fold (optional)
rate_of_sampling (float) – Sampling rate (optional)
random_seed_for_sampling (int) – Random number generator seed for sampling (optional, max: 4294967295)
stratified_split_for_sampling (_AzuremlPartitionAndSampleStratifiedSplitForSamplingEnum) – Indicates whether the split is stratified or not for sampling (optional, enum: [‘True’, ‘False’])
stratification_key_column_for_sampling (str) – Column containing stratification key for sampling (optional)
number_of_rows_to_select (int) – Maximum number of records that will be allowed to pass through to the next module (optional)

Output odataset

Dataset resulting from the split

Type

odataset: Output
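As an illustration of the ‘Partition evenly’, ‘Pick Fold’, and ‘Sampling’ modes, here is a minimal plain-Python sketch over a list of rows. It is not the component's implementation: the helper names (assign_even_folds, pick_fold, sample_rows) are hypothetical, and the assumption that sampling keeps each row independently with probability rate_of_sampling is ours.

```python
import random


def assign_even_folds(n_rows, n_folds=5, randomized=True, seed=0):
    """Return a 1-based fold index per row; fold sizes differ by at most one."""
    order = list(range(n_rows))
    if randomized:
        random.Random(seed).shuffle(order)
    folds = [0] * n_rows
    for position, row in enumerate(order):
        folds[row] = position % n_folds + 1  # round-robin over the (shuffled) rows
    return folds


def pick_fold(rows, folds, fold=1, complement=False):
    """Rows in `fold`, or every other row when complement=True."""
    return [r for r, f in zip(rows, folds) if (f == fold) != complement]


def sample_rows(rows, rate=0.01, seed=0):
    """Assumed Bernoulli sampling: keep each row independently with probability `rate`."""
    rng = random.Random(seed)
    return [r for r in rows if rng.random() < rate]
```

For example, `pick_fold(rows, assign_even_folds(len(rows)), fold=1, complement=True)` mirrors setting specify_which_fold_to_be_sampled_from=1 together with pick_complement_of_the_selected_fold=True.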

example.assets.example_components.azureml_pca_based_anomaly_detection(training_mode: example.assets.example_components._assets._AzuremlPcaBasedAnomalyDetectionTrainingModeEnum = _AzuremlPcaBasedAnomalyDetectionTrainingModeEnum.singleparameter, number_of_components_to_use_in_pca: int = 2, oversampling_parameter_for_randomized_pca: int = 2, enable_input_feature_mean_normalization: bool = False) → example.assets.example_components._assets._AzuremlPcaBasedAnomalyDetectionComponent

Create a PCA-based anomaly detection model.

Parameters

training_mode (_AzuremlPcaBasedAnomalyDetectionTrainingModeEnum) – Specify learner options. Use ‘SingleParameter’ to manually specify all values. Use ‘ParameterRange’ to sweep over tunable parameters. (enum: [‘SingleParameter’])
number_of_components_to_use_in_pca (int) – Specify the number of components to use in PCA. (optional, min: 1)
oversampling_parameter_for_randomized_pca (int) – Specify the accuracy parameter for randomized PCA training. (optional)
enable_input_feature_mean_normalization (bool) – Specify whether to normalize the input features to zero mean.

Output untrained_model

An untrained PCA-based anomaly detection model.

Type

untrained_model: Output
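The idea behind a PCA-based anomaly detector can be sketched as follows: fit the top principal directions on (mostly normal) data, then score a point by how much of it the components fail to reconstruct. This is only an illustration under assumptions: it substitutes plain power iteration with deflation for the randomized PCA the component uses, always mean-centers the data, and the names fit_pca and anomaly_score are hypothetical.

```python
import math
import random


def fit_pca(rows, n_components=2, iters=200, seed=0):
    """Return feature means and the top principal directions (power iteration + deflation)."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - mean[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / n for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
        components.append(v)
        # Deflate so the next power iteration converges to the next direction.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)] for i in range(d)]
    return mean, components


def anomaly_score(x, mean, components):
    """Squared reconstruction error of x after projecting onto the components."""
    centered = [xi - mi for xi, mi in zip(x, mean)]
    residual = list(centered)
    for v in components:
        proj = sum(c * vi for c, vi in zip(centered, v))
        residual = [r - proj * vi for r, vi in zip(residual, v)]
    return sum(r * r for r in residual)
```

A point that lies along the dominant direction of the training data gets a score near zero, while a point far off that direction gets a large score.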

example.assets.example_components.azureml_permutation_feature_importance(trained_model: Optional[pathlib.Path] = None, test_data: Optional[pathlib.Path] = None, random_seed: int = 0, metric_for_measuring_performance: example.assets.example_components._assets._AzuremlPermutationFeatureImportanceMetricForMeasuringPerformanceEnum = _AzuremlPermutationFeatureImportanceMetricForMeasuringPerformanceEnum.accuracy) → example.assets.example_components._assets._AzuremlPermutationFeatureImportanceComponent

Computes the permutation feature importance scores of feature variables given a trained model and a test dataset.

Parameters

trained_model (Path) – Trained model to be used for scoring
test_data (Path) – Test dataset for scoring and evaluating a model after permutation of feature values
random_seed (int) – Random number generator seed value (max: 4294967295)
metric_for_measuring_performance (_AzuremlPermutationFeatureImportanceMetricForMeasuringPerformanceEnum) – Evaluation metric (enum: [‘Accuracy’, ‘Precision’, ‘Recall’, ‘Mean Absolute Error’, ‘Root Mean Squared Error’, ‘Relative Absolute Error’, ‘Relative Squared Error’, ‘Coefficient of Determination’])

Output feature_importance

Feature importance results

Type

feature_importance: Output
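Permutation feature importance can be sketched in a few lines: score the model once on the test data, then shuffle one feature column at a time and measure how much the chosen metric degrades. The sketch below is an illustration, not the component's code; the function names are hypothetical, and accuracy stands in for any of the listed metrics (for error metrics such as Mean Absolute Error, pass higher_is_better=False).

```python
import random


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, metric=accuracy, higher_is_better=True, seed=0):
    """Metric degradation per feature after shuffling that feature's column."""
    rng = random.Random(seed)
    baseline = metric(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        rng.shuffle(column)
        permuted = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
        score = metric(y, [predict(row) for row in permuted])
        # Flip the sign for error metrics so that larger always means "more important".
        importances.append(baseline - score if higher_is_better else score - baseline)
    return importances
```

A feature the model ignores gets an importance of exactly zero, because shuffling it leaves every prediction unchanged.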

example.assets.example_components.azureml_poisson_regression(create_trainer_mode: example.assets.example_components._assets._AzuremlPoissonRegressionCreateTrainerModeEnum = _AzuremlPoissonRegressionCreateTrainerModeEnum.singleparameter, tolerance_parameter_for_optimization_convergence_the_lower_the_value_the_slower_and_more_accurate_the_fitting: float = 1e-07, l1_regularization_weight: float = 1.0, l2_regularization_weight: float = 1.0, memory_size_for_l_bfgs_the_lower_the_value_the_faster_and_less_accurate_the_training: int = 20, range_for_optimization_tolerance: str = '0.00001; 0.00000001', range_for_l1_regularization_weight: str = '0.0; 0.01; 0.1; 1.0', range_for_l2_regularization_weight: str = '0.01; 0.1; 1.0', range_for_memory_size_for_l_bfgs_the_lower_the_value_the_faster_and_less_accurate_the_training: str = '5; 20; 50') → example.assets.example_components._assets._AzuremlPoissonRegressionComponent

Creates a regression model that assumes the data has a Poisson distribution.

Parameters

create_trainer_mode (_AzuremlPoissonRegressionCreateTrainerModeEnum) – Create advanced learner options (enum: [‘SingleParameter’, ‘ParameterRange’])
tolerance_parameter_for_optimization_convergence_the_lower_the_value_the_slower_and_more_accurate_the_fitting (float) – Specify a tolerance value for optimization convergence. The lower the value, the slower and more accurate the fitting. (optional, min: 2.220446049250313e-16)
l1_regularization_weight (float) – Specify the L1 regularization weight. Use a non-zero value to avoid overfitting the model. (optional)
l2_regularization_weight (float) – Specify the L2 regularization weight. Use a non-zero value to avoid overfitting the model. (optional)
memory_size_for_l_bfgs_the_lower_the_value_the_faster_and_less_accurate_the_training (int) – Indicate how much memory (in MB) to use for the L-BFGS optimizer. With less memory, training is faster but less accurate. (optional, min: 1)
range_for_optimization_tolerance (str) – Specify a range for the tolerance value for the L-BFGS optimizer (optional)
range_for_l1_regularization_weight (str) – Specify the range for the L1 regularization weight. Use a non-zero value to avoid overfitting. (optional)
range_for_l2_regularization_weight (str) – Specify the range for the L2 regularization weight. Use a non-zero value to avoid overfitting. (optional)
range_for_memory_size_for_l_bfgs_the_lower_the_value_the_faster_and_less_accurate_the_training (str) – Specify the range for the amount of memory (in MB) to use for the L-BFGS optimizer. The lower the value, the faster and less accurate the training. (optional)

Output untrained_model

An untrained Poisson regression model

Type

untrained_model: Output
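For orientation, the sketch below writes out the textbook objective such a model minimizes: the Poisson negative log-likelihood with a log link, plus L1 and L2 penalties weighted by l1_regularization_weight and l2_regularization_weight. It also parses the semicolon-separated range strings used in ‘ParameterRange’ mode. The exact penalty scaling used by the component is not documented here, so treat this as an assumption; the function names are hypothetical.

```python
import math


def parse_range(spec):
    """Parse a semicolon-separated range string such as '0.0; 0.01; 0.1; 1.0'."""
    return [float(part) for part in spec.split(";") if part.strip()]


def poisson_objective(weights, bias, X, y, l1=1.0, l2=1.0):
    """Regularized Poisson negative log-likelihood with a log link (assumed form)."""
    nll = 0.0
    for row, target in zip(X, y):
        eta = bias + sum(w * x for w, x in zip(weights, row))  # linear predictor
        mu = math.exp(eta)  # predicted mean count
        # -log P(y | mu) = mu - y * log(mu) + log(y!)
        nll += mu - target * eta + math.lgamma(target + 1)
    penalty = l1 * sum(abs(w) for w in weights) + l2 * sum(w * w for w in weights)
    return nll + penalty
```

The log link keeps the predicted mean exp(eta) positive, which is why the model suits non-negative count targets.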

example.assets.example_components.azureml_preprocess_text(dataset: Optional[pathlib.Path] = None, stop_words: Optional[pathlib.Path] = None, language: example.assets.example_components._assets._AzuremlPreprocessTextLanguageEnum = _AzuremlPreprocessTextLanguageEnum.english, expand_verb_contractions: bool = True, text_column_to_clean: Optional[str] = None, remove_stop_words: bool = True, use_lemmatization: bool = True, detect_sentences: bool = True, normalize_case_to_lowercase: bool = True, remove_numbers: bool = True, remove_special_characters: bool = True, remove_duplicate_characters: bool = True, remove_email_addresses: bool = True, remove_urls: bool = True, normalize_backslashes_to_slashes: bool = True, split_tokens_on_special_characters: bool = True, custom_regular_expression: Optional[str] = None, custom_replacement_string: Optional[str] = None) → example.assets.example_components._assets._AzuremlPreprocessTextComponent

Performs cleaning operations on text.

Parameters

dataset (Path) – Input data
stop_words (Path) – Custom list of stop words to remove (optional)
language (_AzuremlPreprocessTextLanguageEnum) – Select the language to preprocess (enum: [‘English’])
expand_verb_contractions (bool) – Expand verb contractions (English only) (optional)
text_column_to_clean (str) – Select the text column to clean
remove_stop_words (bool) – Remove stop words
use_lemmatization (bool) – Use lemmatization
detect_sentences (bool) – Detect sentences by adding a sentence terminator “|||” that can be used by the n-gram features extractor module
normalize_case_to_lowercase (bool) – Normalize case to lowercase
remove_numbers (bool) – Remove numbers
remove_special_characters (bool) – Remove non-alphanumeric special characters and replace them with the “|” character
remove_duplicate_characters (bool) – Remove duplicate characters
remove_email_addresses (bool) – Remove email addresses
remove_urls (bool) – Remove URLs
normalize_backslashes_to_slashes (bool) – Normalize backslashes to slashes
split_tokens_on_special_characters (bool) – Split tokens on special characters
custom_regular_expression (str) – Specify the custom regular expression (optional)
custom_replacement_string (str) – Specify the custom replacement string for the custom regular expression (optional)

Output results_dataset

Results dataset

Type

results_dataset: Output
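A few of the cleaning options above can be approximated with regular expressions, as in this sketch. It is an assumption-laden illustration rather than the component's behavior: the regexes, the order of the steps, and the tiny STOP_WORDS set are hypothetical; special characters are dropped instead of being replaced with “|”; and lemmatization and sentence detection are omitted.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
DIGITS_RE = re.compile(r"\d+")
# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = frozenset({"a", "an", "is", "the", "to"})


def preprocess(text, stop_words=STOP_WORDS):
    text = URL_RE.sub(" ", text)      # remove_urls
    text = EMAIL_RE.sub(" ", text)    # remove_email_addresses
    text = text.lower()               # normalize_case_to_lowercase
    text = DIGITS_RE.sub(" ", text)   # remove_numbers
    tokens = re.findall(r"[a-z]+", text)  # keep alphabetic runs; special characters are dropped
    return " ".join(t for t in tokens if t not in stop_words)  # remove_stop_words
```

URLs and email addresses are removed before lowercasing and digit removal so that their pieces do not survive as stray tokens.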

example.assets.example_components.azureml_remove_duplicate_rows(dataset: Optional[pathlib.Path] = None, key_column_selection_filter_expression: Optional[str] = None, retain_first_duplicate_row: bool = True) → example.assets.example_components._assets._AzuremlRemoveDuplicateRowsComponent

Removes the duplicate rows from a dataset.

Parameters

dataset (Path) – Input dataset
key_column_selection_filter_expression (str) – Choose the key columns to use when searching for duplicates
retain_first_duplicate_row (bool) – Indicate whether to keep the first row of a set of duplicates and discard the others. If false, the last duplicate row encountered is kept.

Output results_dataset

Filtered dataset

Type

results_dataset: Output
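The keep-first versus keep-last behavior of retain_first_duplicate_row can be sketched as below. Rows are dicts and duplicates are matched on a tuple of key columns. The function name is hypothetical, and the choice to preserve the original row order in the output is our assumption.

```python
def remove_duplicate_rows(rows, key_columns, retain_first=True):
    """Drop rows whose key-column values repeat; keep the first or last occurrence."""
    ordered = rows if retain_first else list(reversed(rows))
    seen = set()
    kept = []
    for row in ordered:
        key = tuple(row[c] for c in key_columns)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    # Undo the reversal so surviving rows keep their original relative order.
    return kept if retain_first else list(reversed(kept))
```

Scanning the rows in reverse and keeping first occurrences is equivalent to keeping the last occurrence in a forward scan.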

example.assets.example_components.azureml_resnet(model_name: example.assets.example_components._assets._AzuremlResnetModelNameEnum = _AzuremlResnetModelNameEnum.resnext101_32x8d, pretrained: bool = True, zero_init_residual: bool = False) → example.assets.example_components._assets._AzuremlResnetComponent

Creates an image classification model using the ResNet algorithm.

Parameters

model_name (_AzuremlResnetModelNameEnum) – Name of the ResNet architecture variant (enum: [‘resnet18’, ‘resnet34’, ‘resnet50’, ‘resnet101’, ‘resnet152’, ‘resnext50_32x4d’, ‘resnext101_32x8d’, ‘wide_resnet50_2’, ‘wide_resnet101_2’])
pretrained (bool) – Indicate whether to use a model pre-trained on ImageNet
zero_init_residual (bool) – Zero-initialize the last BN in each residual branch. (optional)

Output untrained_model

Untrained resnet model path

Type

untrained_model: Output

example.assets.example_components.azureml_score_image_model(trained_model: Optional[pathlib.Path] = None, dataset: Optional[pathlib.Path] = None) → example.assets.example_components._assets._AzuremlScoreImageModelComponent

Scores predictions for a trained image model.

Parameters

trained_model (Path) – Trained predictive model
dataset (Path) – Input data to score

Output scored_dataset

Dataset with obtained scores

Type

scored_dataset: Output

example.assets.example_components.azureml_score_model(trained_model: Optional[pathlib.Path] = None, dataset: Optional[pathlib.Path] = None, append_score_columns_to_output: bool = True) → example.assets.example_components._assets._AzuremlScoreModelComponent

Scores predictions for a trained classification or regression model.

Parameters

trained_model (Path) – Trained predictive model
dataset (Path) – Input test dataset
append_score_columns_to_output (bool) – If checked, append the score columns to the input dataset in the result; otherwise, return only the scores and, if available, the true labels.

Output scored_dataset

Dataset with obtained scores

Type

scored_dataset: Output

example.assets.example_components.azureml_score_svd_recommender(trained_svd_recommendation: Optional[pathlib.Path] = None, dataset_to_score: Optional[pathlib.Path] = None, training_data: Optional[pathlib.Path] = None, recommender_prediction_kind: example.assets.example_components._assets._AzuremlScoreSvdRecommenderRecommenderPredictionKindEnum = _AzuremlScoreSvdRecommenderRecommenderPredictionKindEnum.item_recommendation, recommended_item_selection: example.assets.example_components._assets._AzuremlScoreSvdRecommenderRecommendedItemSelectionEnum = _AzuremlScoreSvdRecommenderRecommendedItemSelectionEnum.from_rated_items_for_model_evaluation, minimum_size_of_the_recommendation_pool_for_a_single_user: int = 2, maximum_number_of_items_to_recommend_to_a_user: int = 5, whether_to_return_the_predicted_ratings_of_the_items_along_with_the_labels: bool = False) → example.assets.example_components._assets._AzuremlScoreSvdRecommenderComponent

Score a dataset using a trained SVD recommender.

Parameters

trained_svd_recommendation (Path) – Trained SVD recommendation
dataset_to_score (Path) – Dataset to score
training_data (Path) – Dataset containing the training data, used to filter already rated items out of the predictions (optional)
recommender_prediction_kind (_AzuremlScoreSvdRecommenderRecommenderPredictionKindEnum) – Specify the type of prediction the recommender should output (enum: [‘Rating Prediction’, ‘Item Recommendation’])
recommended_item_selection (_AzuremlScoreSvdRecommenderRecommendedItemSelectionEnum) – Select the set of items to make recommendations from (optional, enum: [‘From All Items’, ‘From Rated Items (for model evaluation)’, ‘From Unrated Items (to suggest new items to users)’])
minimum_size_of_the_recommendation_pool_for_a_single_user (int) – Specify the minimum size of the recommendation pool for each user (optional, min: 1)
maximum_number_of_items_to_recommend_to_a_user (int) – Specify the maximum number of items to recommend to a user (optional, min: 1)
whether_to_return_the_predicted_ratings_of_the_items_along_with_the_labels (bool) – Specify whether to return the predicted ratings of the items along with the labels (optional)

Output scored_dataset

Scored dataset

Type

scored_dataset: Output
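The recommended_item_selection, minimum_size_of_the_recommendation_pool_for_a_single_user, and maximum_number_of_items_to_recommend_to_a_user options can be pictured with the sketch below, which starts from already-predicted ratings. Several assumptions apply: `predicted` maps each user to per-item predicted ratings, `rated_items` holds the items each user has rated, and a user whose candidate pool is smaller than the minimum receives no recommendations. The function name and the selection labels are hypothetical.

```python
def recommend(predicted, rated_items, user, selection="unrated", max_items=5, min_pool=2):
    """Top-rated items for `user` from the chosen candidate pool."""
    scores = predicted[user]  # item -> predicted rating for this user
    rated = rated_items.get(user, set())
    if selection == "all":          # ‘From All Items’
        pool = set(scores)
    elif selection == "rated":      # ‘From Rated Items (for model evaluation)’
        pool = set(scores) & rated
    elif selection == "unrated":    # ‘From Unrated Items (to suggest new items to users)’
        pool = set(scores) - rated
    else:
        raise ValueError(f"unknown selection: {selection}")
    if len(pool) < min_pool:
        return []  # assumed behavior when the pool is too small
    # Highest predicted rating first; ties broken by item name for determinism.
    return sorted(pool, key=lambda item: (-scores[item], item))[:max_items]
```

The same selection logic would apply to the Wide and Deep scorer below, which exposes identical selection options.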

example.assets.example_components.azureml_score_vowpal_wabbit_model(trained_vowpal_wabbit_model: Optional[pathlib.Path] = None, test_data: Optional[pathlib.Path] = None, vw_arguments: Optional[str] = None, name_of_the_test_data_file: Optional[str] = None, specify_file_type: example.assets.example_components._assets._AzuremlScoreVowpalWabbitModelSpecifyFileTypeEnum = _AzuremlScoreVowpalWabbitModelSpecifyFileTypeEnum.vw, include_an_extra_column_containing_labels: bool = False, include_an_extra_column_containing_raw_scores: bool = False) → example.assets.example_components._assets._AzuremlScoreVowpalWabbitModelComponent

Score data using Vowpal Wabbit from the command line interface.

Parameters

trained_vowpal_wabbit_model (Path) – Trained Vowpal Wabbit model.
test_data (Path) – Test data.
vw_arguments (str) – Vowpal Wabbit command-line arguments. (optional)
name_of_the_test_data_file (str) – Name of the test data file. (optional)
specify_file_type (_AzuremlScoreVowpalWabbitModelSpecifyFileTypeEnum) – File type of the test data. (enum: [‘VW’, ‘SVMLight’])
include_an_extra_column_containing_labels (bool) – Whether to include an extra column containing labels in the scored dataset.
include_an_extra_column_containing_raw_scores (bool) – Whether to include an extra column containing raw scores in the scored dataset.

Output scored_dataset

Scored dataset

Type

scored_dataset: Output

example.assets.example_components.azureml_score_wide_and_deep_recommender(trained_wide_and_deep_recommendation_model: Optional[pathlib.Path] = None, dataset_to_score: Optional[pathlib.Path] = None, user_features: Optional[pathlib.Path] = None, item_features: Optional[pathlib.Path] = None, training_data: Optional[pathlib.Path] = None, recommender_prediction_kind: example.assets.example_components._assets._AzuremlScoreWideAndDeepRecommenderRecommenderPredictionKindEnum = _AzuremlScoreWideAndDeepRecommenderRecommenderPredictionKindEnum.item_recommendation, recommended_item_selection: example.assets.example_components._assets._AzuremlScoreWideAndDeepRecommenderRecommendedItemSelectionEnum = _AzuremlScoreWideAndDeepRecommenderRecommendedItemSelectionEnum.from_rated_items_for_model_evaluation, minimum_size_of_the_recommendation_pool_for_a_single_user: int = 2, maximum_number_of_items_to_recommend_to_a_user: int = 5, whether_to_return_the_predicted_ratings_of_the_items_along_with_the_labels: bool = False) → example.assets.example_components._assets._AzuremlScoreWideAndDeepRecommenderComponent

Score a dataset using the Wide and Deep recommendation model.

Parameters

trained_wide_and_deep_recommendation_model (Path) – Trained Wide and Deep recommendation model
dataset_to_score (Path) – Dataset to score
user_features (Path) – User features (optional)
item_features (Path) – Item features (optional)
training_data (Path) – Dataset containing the training data, used to filter already rated items out of the predictions (optional)
recommender_prediction_kind (_AzuremlScoreWideAndDeepRecommenderRecommenderPredictionKindEnum) – Specify the type of prediction the recommender should output (enum: [‘Rating Prediction’, ‘Item Recommendation’])
recommended_item_selection (_AzuremlScoreWideAndDeepRecommenderRecommendedItemSelectionEnum) – Select the set of items to make recommendations from (optional, enum: [‘From All Items’, ‘From Rated Items (for model evaluation)’, ‘From Unrated Items (to suggest new items to users)’])
minimum_size_of_the_recommendation_pool_for_a_single_user (int) – Specify the minimum size of the recommendation pool for each user (optional, min: 1)
maximum_number_of_items_to_recommend_to_a_user (int) – Specify the maximum number of items to recommend to a user (optional, min: 1)
whether_to_return_the_predicted_ratings_of_the_items_along_with_the_labels (bool) – Specify whether to return the predicted ratings of the items along with the labels (optional)

Output scored_dataset

Scored dataset

Type

scored_dataset: Output

example.assets.example_components.azureml_select_columns_in_dataset(dataset: Optional[pathlib.Path] = None, select_columns: Optional[str] = None) → example.assets.example_components._assets._AzuremlSelectColumnsInDatasetComponent

Selects columns to include in, or exclude from, a dataset.

Parameters

dataset (Path) – Input dataset
select_columns (str) – Select columns to keep in the projected dataset

Output results_dataset

Output dataset

Type

results_dataset: Output

example.assets.example_components.azureml_select_columns_transform(dataset_with_desired_columns: Optional[pathlib.Path] = None) → example.assets.example_components._assets._AzuremlSelectColumnsTransformComponent

Create a transformation that selects the same subset of columns as in the given dataset.

Parameters

dataset_with_desired_columns (Path) – Dataset containing desired set of columns

Output columns_selection_transformation

Transformation that selects the same subset of columns as in the given dataset.

Type

columns_selection_transformation: Output

example.assets.example_components.azureml_smote(samples: Optional[pathlib.Path] = None, label_column: Optional[str] = None, smote_percentage: int = 100, number_of_nearest_neighbors: int = 1, random_seed: int = 0) → example.assets.example_components._assets._AzuremlSmoteComponent

Increases the number of low incidence examples in a dataset.

Parameters

samples (Path) – A DataTable of samples
label_column (str) – Select the column that contains the label or outcome column
smote_percentage (int) – Amount of oversampling. If it is not an integral multiple of 100, minority class samples are generated for the next integral multiple of 100 and then randomly downsampled.
number_of_nearest_neighbors (int) – The number of nearest neighbors (min: 1)
random_seed (int) – Random number generator seed (max: 4294967295)

Output table

A DataTable containing the original samples plus T × smote_percentage / 100 synthetic minority class samples, where T is the number of minority class samples

Type

table: Output
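SMOTE creates each synthetic sample by picking one of a minority sample's k nearest minority neighbours and interpolating a random fraction of the way toward it. The sketch below follows that classic recipe and handles non-multiples of 100 by generating for the next multiple and downsampling, as the smote_percentage description states. It is an illustration rather than the component's code: it works on numeric feature vectors of the minority class only, and the function name is hypothetical.

```python
import math
import random


def smote(minority, percentage=100, k=1, seed=0):
    """Return len(minority) * percentage // 100 synthetic minority samples."""
    if len(minority) < 2:
        return []  # no neighbours to interpolate toward
    rng = random.Random(seed)
    per_sample = math.ceil(percentage / 100)  # next integral multiple of 100
    synthetic = []
    for i, x in enumerate(minority):
        others = [m for j, m in enumerate(minority) if j != i]
        # k nearest minority neighbours by Euclidean distance.
        neighbours = sorted(others, key=lambda m: math.dist(x, m))[:k]
        for _ in range(per_sample):
            nb = rng.choice(neighbours)
            gap = rng.random()  # position along the segment from x to nb
            synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    target = len(minority) * percentage // 100
    rng.shuffle(synthetic)
    return synthetic[:target]  # random downsampling for non-multiples of 100
```

Because every synthetic point lies on a segment between two real minority samples, the oversampled class stays within the region the original minority data occupies.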

example.assets.example_components.azureml_split_data(dataset: Optional[pathlib.Path] = None, splitting_mode: example.assets.example_components._assets._AzuremlSplitDataSplittingModeEnum = _AzuremlSplitDataSplittingModeEnum.split_rows, fraction_of_rows_in_the_first_output_dataset: float = 0.5, randomized_split: bool = True, random_seed: int = 0, stratified_split: example.assets.example_components._assets._AzuremlSplitDataStratifiedSplitEnum = _AzuremlSplitDataStratifiedSplitEnum.false, stratification_key_column: Optional[str] = None, regular_expression: str = '"column name" ^start', relational_expression: str = '"column name" > 3') → example.assets.example_components._assets._AzuremlSplitDataComponent

Partitions the rows of a dataset into two distinct sets.

Parameters

dataset (Path) – Dataset to split
splitting_mode (_AzuremlSplitDataSplittingModeEnum) – Choose the method for splitting the dataset (enum: [‘Split Rows’, ‘Regular Expression’, ‘Relative Expression’])
fraction_of_rows_in_the_first_output_dataset (float) – Specify a ratio representing the number of rows in the first output dataset over the number of rows in the input dataset (optional, max: 1.0)
randomized_split (bool) – Indicate whether rows should be randomly selected (optional)
random_seed (int) – Provide a value to seed the random number generator (optional, max: 4294967295)
stratified_split (_AzuremlSplitDataStratifiedSplitEnum) – Indicate whether the rows in each split should be grouped using a strata column (optional, enum: [‘True’, ‘False’])
stratification_key_column (str) – Select the column containing the stratification key (optional)
regular_expression (str) – Type a regular expression to use as criteria when splitting the dataset on a string column (optional)
relational_expression (str) – Type a relational expression to use in splitting the dataset on a numeric column (optional)

Output results_dataset1

Dataset containing selected rows

Type

results_dataset1: Output

Output results_dataset2

Dataset containing all other rows

Type

results_dataset2: Output
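The ‘Split Rows’ mode with fraction_of_rows_in_the_first_output_dataset, randomized_split, and stratified_split can be sketched as follows. With a stratification key, each stratum is shuffled and cut separately so both outputs keep roughly the same class mix. The rounding rule (floor per stratum), the function name, and the strata_key callable are assumptions for illustration.

```python
import random
from collections import defaultdict


def split_rows(rows, fraction=0.5, randomized=True, seed=0, strata_key=None):
    """Split rows into two lists; with strata_key, each stratum is split separately."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[strata_key(row) if strata_key else None].append(row)
    first, second = [], []
    for group in strata.values():  # dicts keep insertion order, so this is deterministic
        group = list(group)
        if randomized:
            rng.shuffle(group)
        cut = int(len(group) * fraction)  # rows from this stratum for the first output
        first.extend(group[:cut])
        second.extend(group[cut:])
    return first, second
```

With randomized=False the first output is simply the leading fraction of the rows in their original order.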

example.assets.example_components.azureml_split_image_directory(input_image_directory: Optional[pathlib.Path] = None, fraction_of_images_in_the_first_output: float = 0.9) → example.assets.example_components._assets._AzuremlSplitImageDirectoryComponent

Partitions the images of an image directory into two distinct sets.

Parameters

input_image_directory (Path) – Input image directory
fraction_of_images_in_the_first_output (float) – Fraction of images in the first output (min: 2.220446049250313e-16, max: 0.9999999999999998)

Output output_image_directory1

First output image directory

Type

output_image_directory1: Output

Output output_image_directory2

Second output image directory

Type

output_image_directory2: Output

example.assets.example_components.azureml_summarize_data(input: Optional[pathlib.Path] = None) → example.assets.example_components._assets._AzuremlSummarizeDataComponent

Generates a basic descriptive statistics report for the columns in a dataset.

Parameters

input (Path) – Input dataset (DataFrameDirectory)

Output result_dataset

Descriptive statistics report for the columns of the input dataset (DataFrameDirectory)

Type

result_dataset: Output

example.assets.example_components.azureml_train_anomaly_detection_model(untrained_model: Optional[pathlib.Path] = None, dataset: Optional[pathlib.Path] = None) → example.assets.example_components._assets._AzuremlTrainAnomalyDetectionModelComponent

Trains an anomaly detector model and labels data from the training set.

Parameters

untrained_model (Path) – Untrained learner
dataset (Path) – Input data source

Output trained_model

Trained anomaly detection model

Type

trained_model: Output

example.assets.example_components.azureml_train_clustering_model(untrained_model: Optional[pathlib.Path] = None, dataset: Optional[pathlib.Path] = None, column_set: Optional[str] = None, check_for_append_or_uncheck_for_result_only: bool = True) → example.assets.example_components._assets._AzuremlTrainClusteringModelComponent

Train clustering model and assign data to clusters.

Parameters

untrained_model (Path) – Untrained clustering model
dataset (Path) – Input data source
column_set (str) – Column selection pattern
check_for_append_or_uncheck_for_result_only (bool) – Whether the output dataset contains the input dataset with the assignments column appended (checked) or only the assignments column (unchecked)

Output trained_model

Trained clustering model

Type

trained_model: Output

Output results_dataset

Input dataset with the assignments column appended, or the assignments column only

Type

results_dataset: Output

example.assets.example_components.azureml_train_model(untrained_model: Optional[pathlib.Path] = None, dataset: Optional[pathlib.Path] = None, label_column: Optional[str] = None, model_explanations: bool = False) → example.assets.example_components._assets._AzuremlTrainModelComponent

Trains a classification or regression model in a supervised manner.

Parameters

untrained_model (Path) – Untrained learner
dataset (Path) – Training data
label_column (str) – Select the column that contains the label or outcome column
model_explanations (bool) – Whether to generate explanations for the trained model. Default is unchecked to reduce extra compute overhead. (optional)

Output trained_model

Trained learner

Type

trained_model: Output

example.assets.example_components.azureml_train_pytorch_model(untrained_model: Optional[pathlib.Path] = None, training_dataset: Optional[pathlib.Path] = None, validation_dataset: Optional[pathlib.Path] = None, epochs: int = 5, batch_size: int = 16, warmup_step_number: int = 0, learning_rate: float = 0.001, random_seed: int = 1, patience: int = 3, print_frequency: int = 10) → example.assets.example_components._assets._AzuremlTrainPytorchModelComponent

Train a PyTorch model from scratch or fine-tune it.

Parameters

untrained_model (Path) – Untrained model
training_dataset (Path) – Input dataset for training
validation_dataset (Path) – Input dataset for validation
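epochs (int) – Number of training epochs. (min: 1)
batch_size (int) – Number of samples per batch. (min: 1)
warmup_step_number (int) – Warmup step number (optional)
learning_rate (float) – Learning rate. (min: 2.220446049250313e-16, max: 2.0)
random_seed (int) – Random number generator seed.
patience (int) – Number of consecutive epochs without a decrease in validation loss before training stops early. (min: 1)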
print_frequency (int) – Training log print frequency over iterations in each epoch. (optional, min: 1)

Output trained_model

Trained model

Type

trained_model: Output
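Patience-based early stopping can be sketched with a precomputed list of per-epoch validation losses standing in for real training. Our assumption is that "no improvement" means the loss does not drop below the best value seen so far; the function name is hypothetical and the sketch does not model warmup or learning-rate schedules.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return (best_epoch, stop_epoch) for a sequence of per-epoch validation losses."""
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch  # stop early
    return best_epoch, len(val_losses)  # ran every epoch
```

With losses [1.0, 0.8, 0.85, 0.9, 0.95, 0.7] and patience=3, training stops after epoch 5 and never sees the improvement at epoch 6; a larger patience trades extra epochs for that chance.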

example.assets.example_components.azureml_train_svd_recommender(training_dataset_of_user_item_rating_triples: Optional[pathlib.Path] = None, number_of_factors: int = 200, number_of_recommendation_algorithm_iterations: int = 30, learning_rate: float = 0.005) → example.assets.example_components._assets._AzuremlTrainSvdRecommenderComponent

Train a collaborative filtering recommender using the SVD algorithm.

Parameters

training_dataset_of_user_item_rating_triples (Path) – Ratings of items by users, expressed as (User, Item, Rating) triples
number_of_factors (int) – Specify the number of latent factors the recommender uses (min: 1)
number_of_recommendation_algorithm_iterations (int) – Specify the maximum number of iterations to perform while training the recommendation model (min: 1)
learning_rate (float) – Specify the size of each step in the learning process (min: 2.220446049250313e-16, max: 2.0)

Output trained_svd_recommendation

Trained SVD recommendation

Type

trained_svd_recommendation: Output
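To show how number_of_factors, number_of_recommendation_algorithm_iterations, and learning_rate interact, here is a Funk-style SVD sketch: user and item factor vectors fitted to (user, item, rating) triples by stochastic gradient descent around the global mean rating. This is a common SVD-style recommender, not necessarily the component's exact algorithm; the regularization weight `reg` and the function names are hypothetical additions.

```python
import random


def train_mf(triples, n_factors=200, n_iters=30, lr=0.005, reg=0.02, seed=0):
    """Fit user/item factor vectors to (user, item, rating) triples with SGD."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in triples})
    items = sorted({i for _, i, _ in triples})
    P = {u: [rng.gauss(0.0, 0.1) for _ in range(n_factors)] for u in users}
    Q = {i: [rng.gauss(0.0, 0.1) for _ in range(n_factors)] for i in items}
    mu = sum(r for _, _, r in triples) / len(triples)  # global mean rating
    data = list(triples)
    for _ in range(n_iters):
        rng.shuffle(data)
        for u, i, r in data:
            pu, qi = P[u], Q[i]
            err = r - (mu + sum(a * b for a, b in zip(pu, qi)))
            for f in range(n_factors):
                puf, qif = pu[f], qi[f]
                pu[f] += lr * (err * qif - reg * puf)
                qi[f] += lr * (err * puf - reg * qif)
    return mu, P, Q


def predict(model, user, item):
    """Predicted rating: global mean plus the user/item factor dot product."""
    mu, P, Q = model
    return mu + sum(a * b for a, b in zip(P[user], Q[item]))
```

More factors let the model capture finer preference structure at a higher cost per step, while the learning rate controls how far each observed rating moves the two factor vectors.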

example.assets.example_components.azureml_train_vowpal_wabbit_model(pre_trained_vowpal_wabbit_model: Optional[pathlib.Path] = None, training_data: Optional[pathlib.Path] = None, vw_arguments: Optional[str] = None, name_of_the_training_data_file: Optional[str] = None, specify_file_type: example.assets.example_components._assets._AzuremlTrainVowpalWabbitModelSpecifyFileTypeEnum = _AzuremlTrainVowpalWabbitModelSpecifyFileTypeEnum.vw, output_readable_model_file: bool = False, output_inverted_hash_file: bool = False) → example.assets.example_components._assets._AzuremlTrainVowpalWabbitModelComponent

Train a Vowpal Wabbit model using the command line interface.

Parameters

pre_trained_vowpal_wabbit_model (Path) – Pre-trained Vowpal Wabbit model. (optional)
training_data (Path) – Training data.
vw_arguments (str) – Vowpal Wabbit command-line arguments. (optional)
name_of_the_training_data_file (str) – Name of the training data file. (optional)
specify_file_type (_AzuremlTrainVowpalWabbitModelSpecifyFileTypeEnum) – File type of the training data. (enum: [‘VW’, ‘SVMLight’])
output_readable_model_file (bool) – Output a readable model (--readable_model) file.
output_inverted_hash_file (bool) – Output an inverted hash (--invert_hash) file.

Output trained_vowpal_wabbit_model

Trained Vowpal Wabbit model

Type

trained_vowpal_wabbit_model: Output

example.assets.example_components.azureml_train_wide_and_deep_recommender(training_dataset_of_user_item_rating_triples: Optional[pathlib.Path] = None, user_features: Optional[pathlib.Path] = None, item_features: Optional[pathlib.Path] = None, epochs: int = 15, batch_size: int = 64, wide_part_optimizer: example.assets.example_components._assets._AzuremlTrainWideAndDeepRecommenderWidePartOptimizerEnum = _AzuremlTrainWideAndDeepRecommenderWidePartOptimizerEnum.adagrad, wide_optimizer_learning_rate: float = 0.1, crossed_feature_dimension: int = 1000, deep_part_optimizer: example.assets.example_components._assets._AzuremlTrainWideAndDeepRecommenderDeepPartOptimizerEnum = _AzuremlTrainWideAndDeepRecommenderDeepPartOptimizerEnum.adagrad, deep_optimizer_learning_rate: float = 0.1, user_embedding_dimension: int = 16, item_embedding_dimension: int = 16, categorical_features_embedding_dimension: int = 4, hidden_units: str = '256,128', activation_function: example.assets.example_components._assets._AzuremlTrainWideAndDeepRecommenderActivationFunctionEnum = _AzuremlTrainWideAndDeepRecommenderActivationFunctionEnum.relu, dropout: float = 0.8, batch_normalization: bool = True) → example.assets.example_components._assets._AzuremlTrainWideAndDeepRecommenderComponent

Train a recommender based on the Wide & Deep model.

Parameters

training_dataset_of_user_item_rating_triples (Path) – Ratings of items by users, expressed as (User, Item, Rating) triples
user_features (Path) – User features (optional)
item_features (Path) – Item features (optional)
epochs (int) – Maximum number of epochs to perform while training (min: 1)
batch_size (int) – Number of consecutive samples to combine in a single batch (min: 1)
wide_part_optimizer (_AzuremlTrainWideAndDeepRecommenderWidePartOptimizerEnum) – Optimizer used to apply gradients to the wide part of the model (enum: [‘Adagrad’, ‘Adam’, ‘Ftrl’, ‘RMSProp’, ‘SGD’, ‘Adadelta’])
wide_optimizer_learning_rate (float) – Size of each step in the learning process for wide part of the model (min: 2.220446049250313e-16, max: 2.0)
crossed_feature_dimension (int) – Crossed feature dimension for wide part model (min: 1)
deep_part_optimizer (_AzuremlTrainWideAndDeepRecommenderDeepPartOptimizerEnum) – Optimizer used to apply gradients to the deep part of the model (enum: [‘Adagrad’, ‘Adam’, ‘Ftrl’, ‘RMSProp’, ‘SGD’, ‘Adadelta’])
deep_optimizer_learning_rate (float) – Size of each step in the learning process for deep part of the model (min: 2.220446049250313e-16, max: 2.0)
user_embedding_dimension (int) – User embedding dimension for deep part model (min: 1)
item_embedding_dimension (int) – Item embedding dimension for deep part model (min: 1)
categorical_features_embedding_dimension (int) – Categorical features embedding dimension for deep part model (min: 1)
hidden_units (str) – Comma-separated number of hidden units per layer for the deep part of the model (for example, ‘256,128’)
activation_function (_AzuremlTrainWideAndDeepRecommenderActivationFunctionEnum) – Activation function applied to each layer in deep part model (enum: [‘ReLU’, ‘Sigmoid’, ‘Tanh’, ‘Linear’, ‘LeakyReLU’])
dropout (float) – Probability that each element is dropped in deep part model (max: 1.0)
batch_normalization (bool) – Whether to use batch normalization after each hidden layer

Output trained_wide_and_deep_recommendation_model

Trained Wide and Deep recommendation model

Type

trained_wide_and_deep_recommendation_model: Output
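The `hidden_units` string and the `crossed_feature_dimension` integer shape the deep and wide halves of the model, respectively. The sketch below is a hypothetical illustration, not this component's implementation: `parse_hidden_units` assumes the string is a comma-separated list of layer widths (as the default `'256,128'` suggests), and `crossed_feature_bucket` assumes crossed (user, item) features are hashed into `crossed_feature_dimension` buckets, a common way to build the wide part of a Wide & Deep model.

```python
import hashlib


def parse_hidden_units(hidden_units: str) -> list[int]:
    """Hypothetical helper: turn '256,128' into per-layer widths [256, 128]."""
    widths = [int(part) for part in hidden_units.split(",") if part.strip()]
    if not widths or any(width < 1 for width in widths):
        raise ValueError(f"invalid hidden_units: {hidden_units!r}")
    return widths


def crossed_feature_bucket(user: str, item: str, crossed_feature_dimension: int = 1000) -> int:
    """Hypothetical helper: hash a (user, item) cross into a fixed number of buckets.

    SHA-256 is used only as a stable hash; Python's built-in hash() is salted
    per process for strings, so it would give different buckets on each run.
    """
    key = f"{user}_x_{item}".encode("utf-8")
    return int(hashlib.sha256(key).hexdigest(), 16) % crossed_feature_dimension


print(parse_hidden_units("256,128"))  # [256, 128]
print(crossed_feature_bucket("user_42", "item_7"))  # an integer in [0, 1000)
```

Hashing keeps the wide part's input size fixed no matter how many distinct user and item IDs appear, at the cost of occasional collisions between crosses.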

example.assets.example_components.azureml_tune_model_hyperparameters(untrained_model: Optional[pathlib.Path] = None, training_dataset: Optional[pathlib.Path] = None, optional_validation_dataset: Optional[pathlib.Path] = None, specify_parameter_sweeping_mode: example.assets.example_components._assets._AzuremlTuneModelHyperparametersSpecifyParameterSweepingModeEnum = _AzuremlTuneModelHyperparametersSpecifyParameterSweepingModeEnum.random_sweep, maximum_number_of_runs_on_random_sweep: int = 5, random_seed: int = 0, name_or_numerical_index_of_the_label_column: Optional[str] = None, metric_for_measuring_performance_for_classification: example.assets.example_components._assets._AzuremlTuneModelHyperparametersMetricForMeasuringPerformanceForClassificationEnum = _AzuremlTuneModelHyperparametersMetricForMeasuringPerformanceForClassificationEnum.accuracy, metric_for_measuring_performance_for_regression: example.assets.example_components._assets._AzuremlTuneModelHyperparametersMetricForMeasuringPerformanceForRegressionEnum = _AzuremlTuneModelHyperparametersMetricForMeasuringPerformanceForRegressionEnum.mean_absolute_error) → example.assets.example_components._assets._AzuremlTuneModelHyperparametersComponent

Perform a parameter sweep on the model to determine the optimum parameter settings.

Parameters

untrained_model (Path) – Untrained model for parameter sweep
training_dataset (Path) – Input dataset for training
optional_validation_dataset (Path) – Input dataset for validation (for Train/Test validation mode) (optional)
specify_parameter_sweeping_mode (_AzuremlTuneModelHyperparametersSpecifyParameterSweepingModeEnum) – Sweep the entire grid of the parameter space, or sweep using a limited number of sample runs (enum: [‘Entire grid’, ‘Random sweep’])
maximum_number_of_runs_on_random_sweep (int) – Maximum number of runs to execute when using random sweep (optional, min: 1, max: 10000)
random_seed (int) – Provide a value to seed the random number generator (optional, max: 4294967295)
name_or_numerical_index_of_the_label_column (str) – Name or numerical index of the label column
metric_for_measuring_performance_for_classification (_AzuremlTuneModelHyperparametersMetricForMeasuringPerformanceForClassificationEnum) – Select the metric used for evaluating classification models (enum: [‘Accuracy’, ‘Precision’, ‘Recall’, ‘F-score’, ‘AUC’, ‘Average Log Loss’])
metric_for_measuring_performance_for_regression (_AzuremlTuneModelHyperparametersMetricForMeasuringPerformanceForRegressionEnum) – Select the metric used for evaluating regression models (enum: [‘Mean absolute error’, ‘Root of mean squared error’, ‘Relative absolute error’, ‘Relative squared error’, ‘Coefficient of determination’])

Output sweep_results

Results metric for parameter sweep runs

Type

sweep_results: Output

Output trained_best_model

Model with best performance on the training dataset

Type

trained_best_model: Output
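The two sweeping modes differ only in how many candidate settings are tried. A minimal sketch follows, assuming the parameter space is a discrete set of candidate values per parameter; `entire_grid`, `random_sweep`, and `best_settings` are hypothetical names, and the component may sample its random runs differently.

```python
import itertools
import random
from typing import Any, Callable


def entire_grid(space: dict[str, list[Any]]) -> list[dict[str, Any]]:
    """'Entire grid' mode: every combination of the candidate values."""
    names = list(space)
    return [dict(zip(names, combo)) for combo in itertools.product(*space.values())]


def random_sweep(space, maximum_number_of_runs_on_random_sweep=5, random_seed=0):
    """'Random sweep' mode: a reproducible random subset of the grid."""
    grid = entire_grid(space)
    rng = random.Random(random_seed)
    runs = min(maximum_number_of_runs_on_random_sweep, len(grid))
    return rng.sample(grid, runs)


def best_settings(candidates, evaluate: Callable[[dict], float]):
    """Keep the candidate with the highest score, e.g. accuracy.

    For error metrics such as mean absolute error, pass the negated error instead.
    """
    return max(candidates, key=evaluate)


space = {"learning_rate": [0.1, 0.5, 1.0], "iterations": [1, 10]}
print(len(entire_grid(space)))  # 6
print(len(random_sweep(space)))  # 5
```

Fixing `random_seed` makes the random subset reproducible, so two runs with the same seed evaluate the same settings.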

example.assets.example_components.azureml_two_class_averaged_perceptron(create_trainer_mode: example.assets.example_components._assets._AzuremlTwoClassAveragedPerceptronCreateTrainerModeEnum = _AzuremlTwoClassAveragedPerceptronCreateTrainerModeEnum.singleparameter, initial_learning_rate: float = 1.0, maximum_number_of_iterations: int = 10, range_for_initial_learning_rate: str = '0.1; 0.5; 1.0', range_for_maximum_number_of_iterations: str = '1; 10', random_number_seed: Optional[int] = None) → example.assets.example_components._assets._AzuremlTwoClassAveragedPerceptronComponent

Creates an averaged perceptron binary classification model.

Parameters

create_trainer_mode (_AzuremlTwoClassAveragedPerceptronCreateTrainerModeEnum) – Create advanced learner options (enum: [‘SingleParameter’, ‘ParameterRange’])
initial_learning_rate (float) – The initial learning rate for the Stochastic Gradient Descent optimizer. (optional, min: 2.220446049250313e-16)
maximum_number_of_iterations (int) – The number of Stochastic Gradient Descent iterations to be performed over the training dataset. (optional, min: 1)
range_for_initial_learning_rate (str) – Range for initial learning rate for the Stochastic Gradient Descent optimizer. (optional)
range_for_maximum_number_of_iterations (str) – Range for the number of Stochastic Gradient Descent iterations to be performed over the training dataset. (optional)
random_number_seed (int) – The seed for the random number generator used by the model. Leave blank for default. (optional, max: 4294967295)

Output untrained_model

An untrained binary classification model that can be connected to the Create One-vs-All Multi-class Classifier, Train Generic Model, or Cross Validate Model modules.

Type

untrained_model: Output
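An averaged perceptron updates its weights only when it misclassifies an example, then returns the average of the weights seen after every example rather than the final weights. The sketch below assumes labels of +1/−1, a constant learning rate, and no shuffling; it illustrates the technique and is not this component's code.

```python
def train_averaged_perceptron(X, y, initial_learning_rate=1.0, maximum_number_of_iterations=10):
    """Return averaged (weights, bias); labels in y are assumed to be +1 or -1."""
    n_features = len(X[0])
    w, b = [0.0] * n_features, 0.0
    w_sum, b_sum, steps = [0.0] * n_features, 0.0, 0
    for _ in range(maximum_number_of_iterations):
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score <= 0:  # mistake (or on the boundary): move toward yi
                w = [wj + initial_learning_rate * yi * xj for wj, xj in zip(w, xi)]
                b += initial_learning_rate * yi
            # Accumulate after every example so long-lived weights dominate the average.
            w_sum = [s + wj for s, wj in zip(w_sum, w)]
            b_sum += b
            steps += 1
    return [s / steps for s in w_sum], b_sum / steps


def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else -1


X = [[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-3.0, -1.0]]
y = [1, 1, -1, -1]
w, b = train_averaged_perceptron(X, y)
print([predict(w, b, x) for x in X])  # [1, 1, -1, -1]
```

Averaging smooths out the last few noisy updates, which is why it usually generalizes better than the plain perceptron's final weights.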

example.assets.example_components.azureml_two_class_boosted_decision_tree(create_trainer_mode: example.assets.example_components._assets._AzuremlTwoClassBoostedDecisionTreeCreateTrainerModeEnum = _AzuremlTwoClassBoostedDecisionTreeCreateTrainerModeEnum.singleparameter, maximum_number_of_leaves_per_tree: int = 20, minimum_number_of_training_instances_required_to_form_a_leaf: int = 10, the_learning_rate: float = 0.2, total_number_of_trees_constructed: int = 100, range_for_maximum_number_of_leaves_per_tree: str = '2; 8; 32; 128', range_for_minimum_number_of_training_instances_required_to_form_a_leaf: str = '1; 10; 50', range_for_learning_rate: str = '0.025; 0.05; 0.1; 0.2; 0.4', range_for_total_number_of_trees_constructed: str = '20; 100; 500', random_number_seed: Optional[int] = None) → example.assets.example_components._assets._AzuremlTwoClassBoostedDecisionTreeComponent

Creates a binary classifier using a boosted decision tree algorithm.

Parameters

create_trainer_mode (_AzuremlTwoClassBoostedDecisionTreeCreateTrainerModeEnum) – Create advanced learner options (enum: [‘SingleParameter’, ‘ParameterRange’])
maximum_number_of_leaves_per_tree (int) – Specify the maximum number of leaves allowed per tree (optional, min: 2, max: 131072)
minimum_number_of_training_instances_required_to_form_a_leaf (int) – Specify the minimum number of cases required to form a leaf (optional, min: 1)
the_learning_rate (float) – Specify the initial learning rate (optional, min: 2.220446049250313e-16, max: 1.0)
total_number_of_trees_constructed (int) – Specify the maximum number of trees that can be created during training (optional, min: 1)
range_for_maximum_number_of_leaves_per_tree (str) – Specify range for the maximum number of leaves allowed per tree (optional)
range_for_minimum_number_of_training_instances_required_to_form_a_leaf (str) – Specify the range for the minimum number of cases required to form a leaf (optional)
range_for_learning_rate (str) – Specify the range for the initial learning rate (optional)
range_for_total_number_of_trees_constructed (str) – Specify the range for the maximum number of trees that can be created during training (optional)
random_number_seed (int) – Type a value to seed the random number generator used by the model. Leave blank for default. (optional, max: 4294967295)

Output untrained_model

An untrained binary classification model

Type

untrained_model: Output
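In `ParameterRange` mode, each `range_for_*` string lists candidate values separated by semicolons. Presumably these are expanded into combinations when the untrained model is passed to a sweep such as `azureml_tune_model_hyperparameters`. The sketch below reads the default strings and counts the resulting grid; `parse_range` is a hypothetical helper, not part of this package.

```python
import itertools


def parse_range(spec: str, cast=float) -> list:
    """Hypothetical helper: split '2; 8; 32; 128' into [2, 8, 32, 128]."""
    return [cast(part) for part in spec.split(";") if part.strip()]


# Default range strings copied from the signature above.
ranges = {
    "maximum_number_of_leaves_per_tree": parse_range("2; 8; 32; 128", int),
    "minimum_number_of_training_instances_required_to_form_a_leaf": parse_range("1; 10; 50", int),
    "the_learning_rate": parse_range("0.025; 0.05; 0.1; 0.2; 0.4"),
    "total_number_of_trees_constructed": parse_range("20; 100; 500", int),
}

candidates = [dict(zip(ranges, combo)) for combo in itertools.product(*ranges.values())]
print(len(candidates))  # 4 * 3 * 5 * 3 = 180
```

The count grows multiplicatively with each range, which is why a random sweep with a capped number of runs is often preferable to the entire grid.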

example.assets.example_components.azureml_two_class_decision_forest(create_trainer_mode: example.assets.example_components._assets._AzuremlTwoClassDecisionForestCreateTrainerModeEnum = _AzuremlTwoClassDecisionForestCreateTrainerModeEnum.singleparameter, number_of_decision_trees: int = 8, maximum_depth_of_the_decision_trees: int = 32, minimum_number_of_samples_per_leaf_node: int = 1, range_for_number_of_decision_trees: str = '1; 8; 32', range_for_the_maximum_depth_of_the_decision_trees: str = '1; 16; 64', range_for_the_minimum_number_of_samples_per_leaf_node: str = '1; 4; 16', resampling_method: example.assets.example_components._assets._AzuremlTwoClassDecisionForestResamplingMethodEnum = _AzuremlTwoClassDecisionForestResamplingMethodEnum.bagging_resampling) → example.assets.example_components._assets._AzuremlTwoClassDecisionForestComponent

Creates a two-class classification model using the decision forest algorithm.

Parameters

create_trainer_mode (_AzuremlTwoClassDecisionForestCreateTrainerModeEnum) – Create advanced learner options (enum: [‘SingleParameter’, ‘ParameterRange’])
number_of_decision_trees (int) – Specify the number of decision trees to create in the ensemble (optional, min: 1)
maximum_depth_of_the_decision_trees (int) – Specify the maximum depth of any decision tree that can be created in the ensemble (optional, min: 1)
minimum_number_of_samples_per_leaf_node (int) – Specify the minimum number of training samples required to generate a leaf node (optional, min: 1)
range_for_number_of_decision_trees (str) – Specify range for the number of decision trees to create in the ensemble (optional)
range_for_the_maximum_depth_of_the_decision_trees (str) – Specify range for the maximum depth of the decision trees (optional)
range_for_the_minimum_number_of_samples_per_leaf_node (str) – Specify range for the minimum number of samples per leaf node (optional)
resampling_method (_AzuremlTwoClassDecisionForestResamplingMethodEnum) – Choose a resampling method (enum: [‘Bagging Resampling’, ‘Replicate Resampling’])

Output untrained_model

An untrained binary classification model

Type

untrained_model: Output
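The two `resampling_method` options are commonly described as follows: bagging trains each tree on a bootstrap sample drawn with replacement, while replicate trains every tree on exactly the same rows. A minimal sketch of how each tree's training rows could be chosen under those assumptions; `resample` and `training_sets` are hypothetical helpers.

```python
import random


def resample(rows, resampling_method="Bagging Resampling", rng=None):
    """Return the training rows for one tree."""
    if rng is None:
        rng = random.Random(0)
    if resampling_method == "Bagging Resampling":
        # Bootstrap sample: draw len(rows) rows with replacement.
        return [rng.choice(rows) for _ in range(len(rows))]
    if resampling_method == "Replicate Resampling":
        # Every tree is trained on exactly the same rows.
        return list(rows)
    raise ValueError(f"unknown resampling method: {resampling_method!r}")


def training_sets(rows, number_of_decision_trees=8, resampling_method="Bagging Resampling", seed=0):
    """One list of training rows per tree in the ensemble."""
    rng = random.Random(seed)
    return [resample(rows, resampling_method, rng) for _ in range(number_of_decision_trees)]


rows = list(range(10))
bagged = training_sets(rows)
print(len(bagged), [len(s) for s in bagged])  # 8 [10, 10, 10, 10, 10, 10, 10, 10]
```

With replicate resampling, diversity between trees has to come from randomness inside tree construction rather than from the data each tree sees.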

example.assets.example_components.azureml_two_class_logistic_regression(create_trainer_mode: example.assets.example_components._assets._AzuremlTwoClassLogisticRegressionCreateTrainerModeEnum = _AzuremlTwoClassLogisticRegressionCreateTrainerModeEnum.singleparameter, optimization_tolerance: float = 1e-07, l2_regularizaton_weight: float = 1.0, range_for_optimization_tolerance: str = '0.00001; 0.00000001', range_for_l2_regularization_weight: str = '0.01; 0.1; 1.0', random_number_seed: Optional[int] = None) → example.assets.example_components._assets._AzuremlTwoClassLogisticRegressionComponent

Creates a two-class logistic regression model.

Parameters

create_trainer_mode (_AzuremlTwoClassLogisticRegressionCreateTrainerModeEnum) – Create advanced learner options (enum: [‘SingleParameter’, ‘ParameterRange’])
optimization_tolerance (float) – Specify a tolerance value for the L-BFGS optimizer (optional, min: 2.220446049250313e-16)
l2_regularizaton_weight (float) – Specify the L2 regularization weight. Use a non-zero value to avoid overfitting. (optional)
range_for_optimization_tolerance (str) – Specify a range for the tolerance value for the L-BFGS optimizer (optional)
range_for_l2_regularization_weight (str) – Specify the range for the L2 regularization weight. Use a non-zero value to avoid overfitting. (optional)
random_number_seed (int) – Type a value to seed the random number generator used by the model. Leave blank for default. (optional, max: 4294967295)

Output untrained_model

An untrained classification model

Type

untrained_model: Output
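The parameters above describe an L2-regularized logistic regression fitted with L-BFGS until the objective changes by less than `optimization_tolerance`. The sketch below keeps the regularized objective and tolerance-based stopping but swaps in plain gradient descent for L-BFGS to stay dependency-free; `step_size` and `max_steps` are hypothetical parameters, and the exact scaling of the penalty is an assumption.

```python
import math


def sigmoid(z: float) -> float:
    # Split by sign so math.exp never overflows.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def objective(w, X, y, l2):
    """Mean log loss plus (l2 / 2) * ||w||^2; labels in y are 0 or 1."""
    loss = 0.0
    for xi, yi in zip(X, y):
        z = sum(wj * xj for wj, xj in zip(w, xi))
        loss += max(z, 0.0) + math.log1p(math.exp(-abs(z))) - yi * z  # stable log(1 + e^z) - y*z
    return loss / len(X) + 0.5 * l2 * sum(wj * wj for wj in w)


def train_logistic_regression(X, y, l2_regularizaton_weight=1.0, optimization_tolerance=1e-7,
                              step_size=0.1, max_steps=10_000):
    """Gradient descent stand-in for L-BFGS; stops when the objective changes by < tolerance."""
    w = [0.0] * len(X[0])
    previous = objective(w, X, y, l2_regularizaton_weight)
    for _ in range(max_steps):
        grad = [l2_regularizaton_weight * wj for wj in w]
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)))
            for j, xj in enumerate(xi):
                grad[j] += (p - yi) * xj / len(X)
        w = [wj - step_size * g for wj, g in zip(w, grad)]
        current = objective(w, X, y, l2_regularizaton_weight)
        if abs(previous - current) < optimization_tolerance:
            break
        previous = current
    return w


# The first column is a constant 1.0 acting as the bias (regularized here too, for brevity).
X = [[1.0, 2.0], [1.0, -2.0]]
y = [1, 0]
strong = train_logistic_regression(X, y, l2_regularizaton_weight=1.0)
weak = train_logistic_regression(X, y, l2_regularizaton_weight=0.01)
```

A larger L2 weight pulls the fitted weights toward zero. On this separable toy data both settings still classify correctly, but the weaker penalty yields a much larger weight norm.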

example.assets.example_components.azureml_two_class_neural_network(create_trainer_mode: example.assets.example_components._assets._AzuremlTwoClassNeuralNetworkCreateTrainerModeEnum = _AzuremlTwoClassNeuralNetworkCreateTrainerModeEnum.singleparameter, hidden_layer_specification: example.assets.example_components._assets._AzuremlTwoClassNeuralNetworkHiddenLayerSpecificationEnum = _AzuremlTwoClassNeuralNetworkHiddenLayerSpecificationEnum.fully_connected_case, number_of_hidden_nodes: str = '100', the_learning_rate: float = 0.1, number_of_learning_iterations: int = 100, hidden_layer_specification1: example.assets.example_components._assets._AzuremlTwoClassNeuralNetworkHiddenLayerSpecification1Enum = _AzuremlTwoClassNeuralNetworkHiddenLayerSpecification1Enum.fully_connected_case, number_of_hidden_nodes1: str = '100', range_for_learning_rate: str = '0.1; 0.2; 0.4', range_for_number_of_learning_iterations: str = '20; 40; 80; 160', the_momentum: float = 0, shuffle_examples: bool = True, random_number_seed: Optional[int] = None) → example.assets.example_components._assets._AzuremlTwoClassNeuralNetworkComponent

Creates a binary classifier using a neural network algorithm.

Parameters

create_trainer_mode (_AzuremlTwoClassNeuralNetworkCreateTrainerModeEnum) – Create advanced learner options (enum: [‘SingleParameter’, ‘ParameterRange’])
hidden_layer_specification (_AzuremlTwoClassNeuralNetworkHiddenLayerSpecificationEnum) – Specify the architecture of the hidden layer or layers (optional, enum: [‘Fully-connected case’])
number_of_hidden_nodes (str) – Type the number of nodes in the hidden layer. For multiple hidden layers, type a comma-separated list. (optional)
the_learning_rate (float) – Specify the size of each step in the learning process (optional, min: 2.220446049250313e-16, max: 2.0)
number_of_learning_iterations (int) – Specify the number of iterations while learning (optional, min: 1)
hidden_layer_specification1 (_AzuremlTwoClassNeuralNetworkHiddenLayerSpecification1Enum) – Specify the architecture of the hidden layer or layers for range (optional, enum: [‘Fully-connected case’])
number_of_hidden_nodes1 (str) – Type the number of nodes in the hidden layer, or for multiple hidden layers, type a comma-separated list. (optional)
range_for_learning_rate (str) – Specify the range for the size of each step in the learning process (optional)
range_for_number_of_learning_iterations (str) – Specify the range for the number of iterations while learning (optional)
the_momentum (float) – Specify a weight to apply during learning to nodes from previous iterations (max: 1.0)
shuffle_examples (bool) – Select this option to change the order of instances between learning iterations
random_number_seed (int) – Specify a numeric seed to use for random number generation. Leave blank to use the default seed. (optional, max: 4294967295)

Output untrained_model

An untrained binary classification model

Type

untrained_model: Output
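`number_of_hidden_nodes` accepts one width or a comma-separated list, and `the_momentum` weights the contribution of earlier updates. The sketch below assumes the classical (heavy-ball) momentum rule, which may differ from the component's exact formulation; `parse_hidden_nodes` and `momentum_step` are hypothetical helpers, and the quadratic objective is only a stand-in for a network's loss.

```python
def parse_hidden_nodes(number_of_hidden_nodes: str) -> list[int]:
    """Hypothetical helper: '100' -> [100]; '100,50' -> [100, 50] (one entry per hidden layer)."""
    return [int(part) for part in number_of_hidden_nodes.split(",") if part.strip()]


def momentum_step(weights, velocity, gradient, the_learning_rate=0.1, the_momentum=0.0):
    """Classical momentum: keep a fraction of the previous update and add the new gradient step."""
    velocity = [the_momentum * v - the_learning_rate * g for v, g in zip(velocity, gradient)]
    weights = [w + v for w, v in zip(weights, velocity)]
    return weights, velocity


# Minimize f(w) = (w0 - 3)^2 + (w1 + 1)^2, whose minimum is at (3, -1).
weights, velocity = [0.0, 0.0], [0.0, 0.0]
for _ in range(200):
    gradient = [2 * (weights[0] - 3), 2 * (weights[1] + 1)]
    weights, velocity = momentum_step(weights, velocity, gradient, the_learning_rate=0.1, the_momentum=0.5)
print([round(w, 6) for w in weights])  # [3.0, -1.0]
```

With `the_momentum=0` (the component's default), the update reduces to plain gradient descent.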

example.assets.example_components.azureml_two_class_support_vector_machine(create_trainer_mode: example.assets.example_components._assets._AzuremlTwoClassSupportVectorMachineCreateTrainerModeEnum = _AzuremlTwoClassSupportVectorMachineCreateTrainerModeEnum.singleparameter, number_of_iterations: int = 10, the_value_lambda: float = 0.001, range_for_number_of_iterations: str = '1; 10; 100', range_for_lambda: str = '0.00001; 0.0001; 0.001; 0.01; 0.1', normalize_the_features: bool = True, random_number_seed: Optional[int] = None) → example.assets.example_components._assets._AzuremlTwoClassSupportVectorMachineComponent

Creates a binary classification model using the Support Vector Machine algorithm.

Parameters

create_trainer_mode (_AzuremlTwoClassSupportVectorMachineCreateTrainerModeEnum) – Create advanced learner options (enum: [‘SingleParameter’, ‘ParameterRange’])
number_of_iterations (int) – The number of iterations. (optional, min: 1)
the_value_lambda (float) – Weight for L1 regularization. Using a non-zero value helps avoid overfitting the model to the training dataset. (optional, min: 2.220446049250313e-16)
range_for_number_of_iterations (str) – The range for the number of iterations. (optional)
range_for_lambda (str) – Weight range for L1 regularization. Using a non-zero value helps avoid overfitting the model to the training dataset. (optional)
normalize_the_features (bool) – If true, normalize the features.
random_number_seed (int) – The seed for the random number generator used by the model. Leave blank for default. (optional, max: 4294967295)

Output untrained_model

An untrained binary classification model that can be connected to the Create One-vs-All Multiclass Classification Model or Train Generic Model or Cross Validate Model modules.

Type

untrained_model: Output
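The parameters describe a linear SVM with an L1 penalty weighted by `the_value_lambda`, optional feature normalization, and a fixed number of passes. The sketch below is one plausible reading under stated assumptions: it uses plain subgradient descent on the hinge loss plus an L1 term, a constant hypothetical `step_size`, and z-score standardization for `normalize_the_features`. The component's actual optimizer and normalization method are not documented here.

```python
import statistics


def standardize(rows):
    """Assumed normalization: rescale each feature column to zero mean and unit variance."""
    columns = list(zip(*rows))
    stats = [(statistics.fmean(c), statistics.pstdev(c) or 1.0) for c in columns]
    return [[(v - mu) / sd for v, (mu, sd) in zip(row, stats)] for row in rows]


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def train_linear_svm(X, y, number_of_iterations=10, the_value_lambda=0.001, step_size=0.1):
    """Subgradient descent on hinge loss + the_value_lambda * ||w||_1; labels are +1 or -1."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(number_of_iterations):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            grad = [the_value_lambda * sign(wj) for wj in w]  # L1 subgradient
            if margin < 1:  # hinge loss is active for this example
                grad = [g - yi * xj for g, xj in zip(grad, xi)]
                b += step_size * yi
            w = [wj - step_size * g for wj, g in zip(w, grad)]
    return w, b


X = standardize([[1.0, 2.0], [2.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])
y = [1, 1, -1, -1]
w, b = train_linear_svm(X, y)
print([sign(sum(wj * xj for wj, xj in zip(w, x)) + b) for x in X])  # [1, 1, -1, -1]
```

The L1 term nudges every non-zero weight toward zero on each step, which tends to drive uninformative features' weights to zero.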

example.assets.example_components.bing_relevance_convert2ss(TextData: Optional[pathlib.Path] = None, ExtractionClause: Optional[str] = None) → example.assets.example_components._assets._BingRelevanceConvert2SsComponent

Convert ADLS test data to SS format

Parameters

TextData (Path) – Relative path on ADLS storage
ExtractionClause (str) – The extraction clause, for example “column1:string, column2:int”

Output SSPath

Output path of the SS data

Type

SSPath: Output

example.assets.example_components.bing_relevance_convert2ss_isresource(TextData: Optional[pathlib.Path] = None, ExtractionClause: Optional[str] = None) → example.assets.example_components._assets._BingRelevanceConvert2SsIsresourceComponent

Convert ADLS test data to SS format

Parameters

TextData (Path) – Relative path on ADLS storage
ExtractionClause (str) – The extraction clause, for example “column1:string, column2:int”

Output SSPath

Output path of the SS data

Type

SSPath: Output
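The `ExtractionClause` format is only shown by example (“column1:string, column2:int”). The sketch below illustrates one plausible way to read such a clause and apply it to a line of text; the tab delimiter, the type-name mapping, and the helpers `parse_extraction_clause` and `parse_row` are all assumptions, and the actual SS conversion happens inside the component.

```python
def parse_extraction_clause(clause: str) -> list[tuple[str, str]]:
    """Hypothetical helper: 'column1:string, column2:int' -> [('column1', 'string'), ('column2', 'int')]."""
    columns = []
    for part in clause.split(","):
        name, sep, type_name = part.strip().partition(":")
        if not sep or not name.strip() or not type_name.strip():
            raise ValueError(f"malformed column spec: {part!r}")
        columns.append((name.strip(), type_name.strip()))
    return columns


# Assumed mapping from clause type names to Python converters.
CONVERTERS = {"string": str, "int": int, "long": int, "double": float}


def parse_row(line: str, columns, delimiter: str = "\t") -> dict:
    """Split one text line (assumed tab-separated) and convert each field."""
    values = line.rstrip("\n").split(delimiter)
    if len(values) != len(columns):
        raise ValueError(f"expected {len(columns)} fields, got {len(values)}")
    return {name: CONVERTERS[type_name](value) for (name, type_name), value in zip(columns, values)}


columns = parse_extraction_clause("column1:string, column2:int")
print(parse_row("abc\t42\n", columns))  # {'column1': 'abc', 'column2': 42}
```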

example.assets.example_components.fine_tune_for_huggingface_text_classification(model: Optional[pathlib.Path] = None, dataset: Optional[pathlib.Path] = None, max_seq_length: int = 128, per_device_train_batch_size: int = 8, learning_rate: float = 5e-05, num_train_epochs: int = 1) → example.assets.example_components._assets._FineTuneForHuggingfaceTextClassificationComponent

Parameters

model (Path) – Path to the model
dataset (Path) – Path to the dataset
max_seq_length (int) – The maximum total input sequence length after tokenization. Longer sequences are truncated and shorter ones are padded. (optional)
per_device_train_batch_size (int) – Batch size per GPU/TPU core/CPU for training. (optional)
learning_rate (float) – The initial learning rate for AdamW. (optional)
num_train_epochs (int) – Total number of training epochs to perform. (optional)

Output output_model

Path to the fine-tuned model

Type

output_model: Output
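`max_seq_length` fixes the length of every tokenized input: longer sequences are cut off and shorter ones are filled with padding, with an attention mask marking which positions are real tokens. Hugging Face tokenizers apply this rule when called with `truncation=True`, `padding="max_length"`, and `max_length=...`; whether this component calls them exactly that way is an assumption. The pure-Python sketch below shows the rule itself, using illustrative token IDs and an assumed padding ID of 0 (the real padding ID is model-specific).

```python
def pad_or_truncate(token_ids: list[int], max_seq_length: int = 128, pad_id: int = 0):
    """Return (input_ids, attention_mask), each exactly max_seq_length long."""
    input_ids = list(token_ids[:max_seq_length])  # drop tokens past the limit
    attention_mask = [1] * len(input_ids)  # 1 marks real tokens
    padding = max_seq_length - len(input_ids)
    return input_ids + [pad_id] * padding, attention_mask + [0] * padding


ids, mask = pad_or_truncate([101, 7592, 102], max_seq_length=6)
print(ids)   # [101, 7592, 102, 0, 0, 0]
print(mask)  # [1, 1, 1, 0, 0, 0]
```

The same rule applies to the token-classification and scoring components that take `max_seq_length`.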

example.assets.example_components.fine_tune_for_huggingface_text_generation(model: Optional[pathlib.Path] = None, dataset: Optional[pathlib.Path] = None, per_device_train_batch_size: int = 8, learning_rate: float = 5e-05, num_train_epochs: int = 1) → example.assets.example_components._assets._FineTuneForHuggingfaceTextGenerationComponent

Parameters

model (Path) – Path to the model
dataset (Path) – Path to the dataset
per_device_train_batch_size (int) – Batch size per GPU/TPU core/CPU for training. (optional)
learning_rate (float) – The initial learning rate for AdamW. (optional)
num_train_epochs (int) – Total number of training epochs to perform. (optional)

Output output_model

Path to the fine-tuned model

Type

output_model: Output

example.assets.example_components.fine_tune_for_huggingface_token_classification(model: Optional[pathlib.Path] = None, dataset: Optional[pathlib.Path] = None, max_seq_length: int = 128, per_device_train_batch_size: int = 8, learning_rate: float = 5e-05, num_train_epochs: int = 1) → example.assets.example_components._assets._FineTuneForHuggingfaceTokenClassificationComponent

Parameters

model (Path) – Path to the model
dataset (Path) – Path to the dataset
max_seq_length (int) – The maximum total input sequence length after tokenization. Longer sequences are truncated and shorter ones are padded. (optional)
per_device_train_batch_size (int) – Batch size per GPU/TPU core/CPU for training. (optional)
learning_rate (float) – The initial learning rate for AdamW. (optional)
num_train_epochs (int) – Total number of training epochs to perform. (optional)

Output output_model

Path to the fine-tuned model

Type

output_model: Output

example.assets.example_components.microsoft_com_azureml_samples_hello_world_with_cpu_image(input_path: Optional[pathlib.Path] = None, string_parameter: Optional[str] = None) → example.assets.example_components._assets._MicrosoftComAzuremlSamplesHelloWorldWithCpuImageComponent

A hello world tutorial to create a module for ml.azure.com.

Parameters

input_path (Path) – The directory that contains a dataframe.
string_parameter (str) – A parameter that accepts a string value. (optional)

Output output_path

The directory that contains a dataframe.

Type

output_path: Output

example.assets.example_components.microsoft_com_azureml_samples_parallel_copy_files_v1(input_folder: Optional[pathlib.Path] = None) → example.assets.example_components._assets._MicrosoftComAzuremlSamplesParallelCopyFilesV1Component

A sample Parallel module to copy files.

Parameters

input_folder (Path) – AnyDirectory

Output output_folder

Output files

Type

output_folder: Output
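The description only says the module copies files in parallel. The following local sketch of that idea uses a thread pool from the standard library; it does not use Azure ML's parallel-run machinery, and `copy_one`, `parallel_copy`, and `max_workers` are hypothetical names.

```python
import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def copy_one(src: Path, input_folder: Path, output_folder: Path) -> Path:
    """Copy one file, preserving its path relative to input_folder."""
    dst = output_folder / src.relative_to(input_folder)
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(src, dst)
    return dst


def parallel_copy(input_folder, output_folder, max_workers: int = 4) -> list[Path]:
    """Copy every file under input_folder into output_folder using a thread pool."""
    input_folder, output_folder = Path(input_folder), Path(output_folder)
    files = [p for p in input_folder.rglob("*") if p.is_file()]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda p: copy_one(p, input_folder, output_folder), files))
```

Threads suit this workload because file copying is I/O-bound; `mkdir(exist_ok=True)` keeps concurrent workers from failing when they create the same subdirectory.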

example.assets.example_components.microsoft_com_azureml_samples_sweep_train(training_data: Optional[pathlib.Path] = None, max_epochs: Optional[int] = None, learning_rate: Optional[float] = None, subsample: Optional[float] = None) → example.assets.example_components._assets._MicrosoftComAzuremlSamplesSweepTrainComponent

A dummy train component

Parameters

training_data (Path) – Training data organized in the torchvision format/structure
max_epochs (int) – Maximum number of epochs for the training
learning_rate (float) – Learning rate (min: 0.001, max: 0.1)
subsample (float) – Subsample fraction (min: 0.1, max: 0.5)

Output saved_model

Path to the saved model

Type

saved_model: Output

Output other_output

Path to the other output

Type

other_output: Output
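`max_epochs`, `learning_rate`, and `subsample` all default to `None`, which suggests a sweep supplies their values. The sketch below draws trial settings uniformly within the documented bounds; the uniform distribution and the helper `sample_trials` are assumptions. Log-uniform sampling is often preferred for learning rates that span orders of magnitude.

```python
import random

# Bounds taken from the parameter descriptions above.
SEARCH_SPACE = {"learning_rate": (0.001, 0.1), "subsample": (0.1, 0.5)}


def sample_trials(n_trials: int, seed: int = 0, space=SEARCH_SPACE) -> list[dict[str, float]]:
    """Draw each hyperparameter uniformly within its bounds, one dict per trial."""
    rng = random.Random(seed)
    return [{name: rng.uniform(low, high) for name, (low, high) in space.items()}
            for _ in range(n_trials)]


for trial in sample_trials(3):
    print(trial)  # e.g. {'learning_rate': ..., 'subsample': ...}
```

The same approach applies to the tuning component below, which takes the same parameters.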

example.assets.example_components.microsoft_com_azureml_samples_train_in_spark(input_path: Optional[pathlib.Path] = None, regularization_rate: float = 0.01) → example.assets.example_components._assets._MicrosoftComAzuremlSamplesTrainInSparkComponent

Train a Spark ML model using an HDInsight Spark cluster

Parameters

input_path (Path) – Iris CSV file
regularization_rate (float) – Regularization rate when training with logistic regression (optional)

Output output_path

The output path to save the trained model to

Type

output_path: Output

example.assets.example_components.microsoft_com_azureml_samples_tune(training_data: Optional[pathlib.Path] = None, max_epochs: Optional[int] = None, learning_rate: Optional[float] = None, subsample: Optional[float] = None) → example.assets.example_components._assets._MicrosoftComAzuremlSamplesTuneComponent

A dummy hyperparameter tuning component

Parameters

training_data (Path) – Training data organized in the torchvision format/structure
max_epochs (int) – Maximum number of epochs for the training
learning_rate (float) – Learning rate (min: 0.001, max: 0.1)
subsample (float) – Subsample fraction (min: 0.1, max: 0.5)

Output best_model

The best model

Type

best_model: Output

Output saved_model

Path to the saved model

Type

saved_model: Output

Output other_output

Path to the other output

Type

other_output: Output

example.assets.example_components.score_for_huggingface_text_classification(model: Optional[pathlib.Path] = None, dataset: Optional[pathlib.Path] = None, max_seq_length: int = 128) → example.assets.example_components._assets._ScoreForHuggingfaceTextClassificationComponent

Parameters

model (Path) – Path to the model
dataset (Path) – Path to the dataset
max_seq_length (int) – The maximum total input sequence length after tokenization. Longer sequences are truncated and shorter ones are padded. (optional)

Output output_dir

Path to the scoring output directory

Type

output_dir: Output

example.assets.example_components.score_for_huggingface_text_generation(model: Optional[pathlib.Path] = None, dataset: Optional[pathlib.Path] = None) → example.assets.example_components._assets._ScoreForHuggingfaceTextGenerationComponent

Parameters

model (Path) – Path to the model
dataset (Path) – Path to the dataset

Output output_dir

Path to the scoring output directory

Type

output_dir: Output

example.assets.example_components.score_for_huggingface_token_classification(model: Optional[pathlib.Path] = None, dataset: Optional[pathlib.Path] = None, max_seq_length: int = 128) → example.assets.example_components._assets._ScoreForHuggingfaceTokenClassificationComponent

Parameters

model (Path) – Path to the model
dataset (Path) – Path to the dataset
max_seq_length (int) – The maximum total input sequence length after tokenization. Longer sequences are truncated and shorter ones are padded. (optional)

Output output_dir

Path to the scoring output directory

Type

output_dir: Output

example.assets.example_components.sweep_for_huggingface_text_classification(model: Optional[pathlib.Path] = None, dataset: Optional[pathlib.Path] = None, max_seq_length: Optional[int] = None, per_device_train_batch_size: int = 8, learning_rate: Optional[float] = None, num_train_epochs: int = 1) → example.assets.example_components._assets._SweepForHuggingfaceTextClassificationComponent

Parameters

model (Path) – Path to the model
dataset (Path) – Path to the dataset
max_seq_length (int) – The maximum total input sequence length after tokenization. Longer sequences are truncated and shorter ones are padded. (optional)
per_device_train_batch_size (int) – Batch size per GPU/TPU core/CPU for training. (optional)
learning_rate (float) – The initial learning rate for AdamW. (optional)
num_train_epochs (int) – Total number of training epochs to perform. (optional)

Output output_model

Path to the trained model

Type

output_model: Output

example.assets.example_components.sweep_for_huggingface_text_generation(model: Optional[pathlib.Path] = None, dataset: Optional[pathlib.Path] = None, per_device_train_batch_size: int = 8, learning_rate: Optional[float] = None, num_train_epochs: int = 1) → example.assets.example_components._assets._SweepForHuggingfaceTextGenerationComponent

Parameters

model (Path) – Path to the model
dataset (Path) – Path to the dataset
per_device_train_batch_size (int) – Batch size per GPU/TPU core/CPU for training. (optional)
learning_rate (float) – The initial learning rate for AdamW. (optional)
num_train_epochs (int) – Total number of training epochs to perform. (optional)

Output output_model

Path to the trained model

Type

output_model: Output

example.assets.example_components.sweep_for_huggingface_token_classification(model: Optional[pathlib.Path] = None, dataset: Optional[pathlib.Path] = None, max_seq_length: Optional[int] = None, per_device_train_batch_size: int = 8, learning_rate: Optional[float] = None, num_train_epochs: int = 1) → example.assets.example_components._assets._SweepForHuggingfaceTokenClassificationComponent

Parameters

model (Path) – Path to the model
dataset (Path) – Path to the dataset
max_seq_length (int) – The maximum total input sequence length after tokenization. Longer sequences are truncated and shorter ones are padded. (optional)
per_device_train_batch_size (int) – Batch size per GPU/TPU core/CPU for training. (optional)
learning_rate (float) – The initial learning rate for AdamW. (optional)
num_train_epochs (int) – Total number of training epochs to perform. (optional)

Output output_model

Path to the trained model

Type

output_model: Output