example.assets.example_components package

This module is auto-generated by azure-ml-component.

Assets included:
class example.assets.example_components.Datasets

Bases: object

property acronym_identification_default

Huggingface acronym_identification-default dataset

property ade_corpus_v2_ade_corpus_v2_classification

Huggingface ade_corpus_v2-Ade_corpus_v2_classification dataset

property ade_corpus_v2_ade_corpus_v2_drug_ade_relation

Huggingface ade_corpus_v2-Ade_corpus_v2_drug_ade_relation dataset

property ade_corpus_v2_ade_corpus_v2_drug_dosage_relation

Huggingface ade_corpus_v2-Ade_corpus_v2_drug_dosage_relation dataset

property adult_census_income_binary_classification_dataset

Census Income dataset

property adversarial_qa_adversarialqa

Huggingface adversarial_qa-adversarialQA dataset

property adversarial_qa_dbert

Huggingface adversarial_qa-dbert dataset

property adversarial_qa_dbidaf

Huggingface adversarial_qa-dbidaf dataset

property adversarial_qa_droberta

Huggingface adversarial_qa-droberta dataset

property aeslc_default

Huggingface aeslc-default dataset

property afrikaans_ner_corpus_afrikaans_ner_corpus

Huggingface afrikaans_ner_corpus-afrikaans_ner_corpus dataset

property ag_news_default

Huggingface ag_news-default dataset

property ai2_arc_arc_challenge

Huggingface ai2_arc-ARC-Challenge dataset

property ai2_arc_arc_easy

Huggingface ai2_arc-ARC-Easy dataset

property air_dialogue_air_dialogue_data

Huggingface air_dialogue-air_dialogue_data dataset

property air_dialogue_air_dialogue_kb

Huggingface air_dialogue-air_dialogue_kb dataset

property akhooli_gpt2_small_arabic

Huggingface akhooli/gpt2-small-arabic model

property akhooli_gpt2_small_arabic_poetry

Huggingface akhooli/gpt2-small-arabic-poetry model

property allegro_reviews_default

Huggingface allegro_reviews-default dataset

property allocine_allocine

Huggingface allocine-allocine dataset

property alt_alt_km

Huggingface alt-alt-km dataset

property alt_alt_my

Huggingface alt-alt-my dataset

property alt_alt_my_transliteration

Huggingface alt-alt-my-transliteration dataset

property alt_alt_my_west_transliteration

Huggingface alt-alt-my-west-transliteration dataset

property alt_alt_parallel

Huggingface alt-alt-parallel dataset

property alvaroalon2_biobert_chemical_ner

Huggingface alvaroalon2/biobert_chemical_ner model

property alvaroalon2_biobert_diseases_ner

Huggingface alvaroalon2/biobert_diseases_ner model

property amazon_polarity_amazon_polarity

Huggingface amazon_polarity-amazon_polarity dataset

property amazon_reviews_multi_all_languages

Huggingface amazon_reviews_multi-all_languages dataset

property amazon_reviews_multi_de

Huggingface amazon_reviews_multi-de dataset

property amazon_reviews_multi_en

Huggingface amazon_reviews_multi-en dataset

property amazon_reviews_multi_es

Huggingface amazon_reviews_multi-es dataset

property amazon_reviews_multi_fr

Huggingface amazon_reviews_multi-fr dataset

property amazon_reviews_multi_ja

Huggingface amazon_reviews_multi-ja dataset

property amazon_reviews_multi_zh

Huggingface amazon_reviews_multi-zh dataset

property ambig_qa_full

Huggingface ambig_qa-full dataset

property ambig_qa_light

Huggingface ambig_qa-light dataset

property amttl_amttl

Huggingface amttl-amttl dataset

property animal_images_dataset

This sample dataset is derived from Open Image Dataset and includes 3 animal categories (cat, dog, frog). Each category contains 10 images.

property anli_plain_text

Huggingface anli-plain_text dataset

property app_reviews_default

Huggingface app_reviews-default dataset

property aqua_rat_raw

Huggingface aqua_rat-raw dataset

property aqua_rat_tokenized

Huggingface aqua_rat-tokenized dataset

property ar_res_reviews_default

Huggingface ar_res_reviews-default dataset

property arcd_plain_text

Huggingface arcd-plain_text dataset

property arsentd_lev_default

Huggingface arsentd_lev-default dataset

property art_anli

Huggingface art-anli dataset

property asi_gpt_fr_cased_small

Huggingface asi/gpt-fr-cased-small model

property aslg_pc12_default

Huggingface aslg_pc12-default dataset

property asset_ratings

Huggingface asset-ratings dataset

property asset_simplification

Huggingface asset-simplification dataset

property assin2_default

Huggingface assin2-default dataset

property assin_full

Huggingface assin-full dataset

property assin_ptbr

Huggingface assin-ptbr dataset

property assin_ptpt

Huggingface assin-ptpt dataset

property atomic_atomic

Huggingface atomic-atomic dataset

property automobile_price_data_raw

Clean missing data module required. Prices of various automobiles against make, model and technical specifications

property autshumato_autshumato_en_tn

Huggingface autshumato-autshumato-en-tn dataset

property autshumato_autshumato_en_ts

Huggingface autshumato-autshumato-en-ts dataset

property autshumato_autshumato_en_ts_manual

Huggingface autshumato-autshumato-en-ts-manual dataset

property autshumato_autshumato_en_zu

Huggingface autshumato-autshumato-en-zu dataset

property avichr_hebert_sentiment_analysis

Huggingface avichr/heBERT_sentiment_analysis model

property bert_base_uncased

Huggingface bert-base-uncased model

property bertin_project_bertin_base_ner_conll2002_es

Huggingface bertin-project/bertin-base-ner-conll2002-es model

property bsc_tecla_tecla

Huggingface bsc/tecla-tecla dataset

property cahya_gpt2_small_indonesian_522m

Huggingface cahya/gpt2-small-indonesian-522M model

property cahya_gpt2_small_indonesian_story

Huggingface cahya/gpt2-small-indonesian-story model

property capreolus_bert_base_msmarco

Huggingface Capreolus/bert-base-msmarco model

property cardiffnlp_twitter_roberta_base_emotion

Huggingface cardiffnlp/twitter-roberta-base-emotion model

property chambliss_distilbert_for_food_extraction

Huggingface chambliss/distilbert-for-food-extraction model

property ckiplab_albert_base_chinese_ner

Huggingface ckiplab/albert-base-chinese-ner model

property ckiplab_albert_base_chinese_pos

Huggingface ckiplab/albert-base-chinese-pos model

property ckiplab_albert_base_chinese_ws

Huggingface ckiplab/albert-base-chinese-ws model

property ckiplab_albert_tiny_chinese_ws

Huggingface ckiplab/albert-tiny-chinese-ws model

property ckiplab_bert_base_chinese_ner

Huggingface ckiplab/bert-base-chinese-ner model

property ckiplab_bert_base_chinese_pos

Huggingface ckiplab/bert-base-chinese-pos model

property ckiplab_bert_base_chinese_ws

Huggingface ckiplab/bert-base-chinese-ws model

property colorfulscoop_gpt2_small_ja

Huggingface colorfulscoop/gpt2-small-ja model

property crm_appetency_labels_shared

CRM Appetency Labels

property crm_churn_labels_shared

CRM Churn Labels

property crm_dataset_shared

CRM Dataset

property crm_upselling_labels_shared

CRM Upselling Labels

property cross_encoder_ms_marco_electra_base

Huggingface cross-encoder/ms-marco-electra-base model

property cross_encoder_stsb_tinybert_l_4

Huggingface cross-encoder/stsb-TinyBERT-L-4 model

property datificate_gpt2_small_spanish

Huggingface datificate/gpt2-small-spanish model

property dbmdz_bert_base_cased_finetuned_conll03_english

Huggingface dbmdz/bert-base-cased-finetuned-conll03-english model

property distilbert_base_uncased_finetuned_sst_2_english

Huggingface distilbert-base-uncased-finetuned-sst-2-english model

property dslim_bert_base_ner

Huggingface dslim/bert-base-NER model

property dslim_bert_base_ner_uncased

Huggingface dslim/bert-base-NER-uncased model

property elastic_distilbert_base_cased_finetuned_conll03_english

Huggingface elastic/distilbert-base-cased-finetuned-conll03-english model

property elastic_distilbert_base_uncased_finetuned_conll03_english

Huggingface elastic/distilbert-base-uncased-finetuned-conll03-english model

property ethanyt_guwen_ner

Huggingface ethanyt/guwen-ner model

property ethanyt_guwen_punc

Huggingface ethanyt/guwen-punc model

property ferch423_gpt2_small_portuguese_wikipediabio

Huggingface Ferch423/gpt2-small-portuguese-wikipediabio model

property finiteautomata_bertweet_base_sentiment_analysis

Huggingface finiteautomata/bertweet-base-sentiment-analysis model

property finiteautomata_beto_sentiment_analysis

Huggingface finiteautomata/beto-sentiment-analysis model

property flight_delays_data

Flight Delays Data

property german_credit_card_uci_dataset

German Credit Card UCI dataset

property gilf_french_camembert_postag_model

Huggingface gilf/french-camembert-postag-model model

property glue_ax

Huggingface glue-ax dataset

property glue_cola

Huggingface glue-cola dataset

property glue_mnli

Huggingface glue-mnli dataset

property glue_mnli_matched

Huggingface glue-mnli_matched dataset

property glue_mnli_mismatched

Huggingface glue-mnli_mismatched dataset

property glue_mrpc

Huggingface glue-mrpc dataset

property glue_qnli

Huggingface glue-qnli dataset

property glue_qqp

Huggingface glue-qqp dataset

property glue_rte

Huggingface glue-rte dataset

property glue_sst2

Huggingface glue-sst2 dataset

property glue_stsb

Huggingface glue-stsb dataset

property glue_wnli

Huggingface glue-wnli dataset

property gronlp_gpt2_small_italian

Huggingface GroNLP/gpt2-small-italian model

property gunghio_distilbert_base_multilingual_cased_finetuned_conll2003_ner

Huggingface gunghio/distilbert-base-multilingual-cased-finetuned-conll2003-ner model

property hf_internal_testing_tiny_xlm_roberta

Huggingface hf-internal-testing/tiny-xlm-roberta model

property imdb_movie_titles

IMDB Movie Titles

property imdb_plain_text

Huggingface imdb-plain_text dataset

property jsfoon_slogan_generator

Huggingface jsfoon/slogan-generator model

property lilaboualili_bert_vanilla

Huggingface LilaBoualili/bert-vanilla model

property lordtt13_emo_mobilebert

Huggingface lordtt13/emo-mobilebert model

property maltehb_l_ctra_danish_electra_small_uncased_ner_dane

Huggingface Maltehb/-l-ctra-danish-electra-small-uncased-ner-dane model

property media1129_recipe_tag_model

Huggingface Media1129/recipe-tag-model model

property microsoft_codegpt_small_py

Huggingface microsoft/CodeGPT-small-py model

property microsoft_codegpt_small_py_adaptedgpt2

Huggingface microsoft/CodeGPT-small-py-adaptedGPT2 model

property microsoft_minilm_l12_h384_uncased

Huggingface microsoft/MiniLM-L12-H384-uncased model

property movie_ratings

Movie Ratings

property mrm8488_bert_spanish_cased_finetuned_ner

Huggingface mrm8488/bert-spanish-cased-finetuned-ner model

property mrm8488_bert_tiny_finetuned_sms_spam_detection

Huggingface mrm8488/bert-tiny-finetuned-sms-spam-detection model

property mrm8488_codebert_base_finetuned_stackoverflow_ner

Huggingface mrm8488/codebert-base-finetuned-stackoverflow-ner model

property mrm8488_mobilebert_finetuned_ner

Huggingface mrm8488/mobilebert-finetuned-ner model

property mrm8488_mobilebert_finetuned_pos

Huggingface mrm8488/mobilebert-finetuned-pos model

property myx4567_distilgpt2_finetuned_wikitext2

Huggingface MYX4567/distilgpt2-finetuned-wikitext2 model

property narsil_tiny_distilbert_sequence_classification

Huggingface Narsil/tiny-distilbert-sequence-classification model

property nateraw_bert_base_uncased_emotion

Huggingface nateraw/bert-base-uncased-emotion model

property oliverguhr_german_sentiment_bert

Huggingface oliverguhr/german-sentiment-bert model

property philschmid_distilroberta_base_ner_conll2003

Huggingface philschmid/distilroberta-base-ner-conll2003 model

property pierreguillou_gpt2_small_portuguese

Huggingface pierreguillou/gpt2-small-portuguese model

property pierrerappolt_disease_extraction

Huggingface pierrerappolt/disease-extraction model

property pranavpsv_gpt2_genre_story_generator

Huggingface pranavpsv/gpt2-genre-story-generator model

property prosusai_finbert

Huggingface ProsusAI/finbert model

property proycon_bert_ner_cased_sonar1_nld

Huggingface proycon/bert-ner-cased-sonar1-nld model

property recordedfuture_swedish_ner

Huggingface RecordedFuture/Swedish-NER model

property restaurant_customer_data

Contains customer features, such as drink_level, dress_preference and marital_status.

property restaurant_feature_data

Contains restaurant features, such as name, address and dress_code.

property restaurant_ratings

Contains ratings given by customers to restaurants on a scale from 0 to 2.

property sgugger_tiny_distilbert_classification

Huggingface sgugger/tiny-distilbert-classification model

property squad_adversarial_addonesent

Huggingface squad_adversarial-AddOneSent dataset

property squad_adversarial_addsent

Huggingface squad_adversarial-AddSent dataset

property squad_es_v1_1_0

Huggingface squad_es-v1.1.0 dataset

property squad_it_default

Huggingface squad_it-default dataset

property squad_plain_text

Huggingface squad-plain_text dataset

property squad_v1_pt_default

Huggingface squad_v1_pt-default dataset

property squad_v2_squad_v2

Huggingface squad_v2-squad_v2 dataset

property squadshifts_amazon

Huggingface squadshifts-amazon dataset

property squadshifts_new_wiki

Huggingface squadshifts-new_wiki dataset

property squadshifts_nyt

Huggingface squadshifts-nyt dataset

property squadshifts_reddit

Huggingface squadshifts-reddit dataset

property sshleifer_tiny_ctrl

Huggingface sshleifer/tiny-ctrl model

property sshleifer_tiny_dbmdz_bert_large_cased_finetuned_conll03_english

Huggingface sshleifer/tiny-dbmdz-bert-large-cased-finetuned-conll03-english model

property sshleifer_tiny_distilbert_base_cased

Huggingface sshleifer/tiny-distilbert-base-cased model

property sshleifer_tiny_distilbert_base_uncased_finetuned_sst_2_english

Huggingface sshleifer/tiny-distilbert-base-uncased-finetuned-sst-2-english model

property sshleifer_tiny_gpt2

Huggingface sshleifer/tiny-gpt2 model

property sshleifer_tiny_xlnet_base_cased

Huggingface sshleifer/tiny-xlnet-base-cased model

property super_glue_axb

Huggingface super_glue-axb dataset

property super_glue_axg

Huggingface super_glue-axg dataset

property super_glue_boolq

Huggingface super_glue-boolq dataset

property super_glue_cb

Huggingface super_glue-cb dataset

property super_glue_copa

Huggingface super_glue-copa dataset

property super_glue_multirc

Huggingface super_glue-multirc dataset

property super_glue_record

Huggingface super_glue-record dataset

property super_glue_rte

Huggingface super_glue-rte dataset

property super_glue_wic

Huggingface super_glue-wic dataset

property super_glue_wsc

Huggingface super_glue-wsc dataset

property super_glue_wsc_fixed

Huggingface super_glue-wsc.fixed dataset

property swag_full

Huggingface swag-full dataset

property swag_regular

Huggingface swag-regular dataset

property swahili_news_swahili_news

Huggingface swahili_news-swahili_news dataset

property swahili_swahili

Huggingface swahili-swahili dataset

property swda_default

Huggingface swda-default dataset

property swedish_ner_corpus_default

Huggingface swedish_ner_corpus-default dataset

property swedish_reviews_plain_text

Huggingface swedish_reviews-plain_text dataset

property tab_fact_blind_test

Huggingface tab_fact-blind_test dataset

property tab_fact_tab_fact

Huggingface tab_fact-tab_fact dataset

property tamilmixsentiment_default

Huggingface tamilmixsentiment-default dataset

property tanzil_bg_en

Huggingface tanzil-bg-en dataset

property tanzil_bn_hi

Huggingface tanzil-bn-hi dataset

property tanzil_en_tr

Huggingface tanzil-en-tr dataset

property tanzil_fa_sv

Huggingface tanzil-fa-sv dataset

property tanzil_ru_zh

Huggingface tanzil-ru-zh dataset

property tapaco_en

Huggingface tapaco-en dataset

property tapaco_eo

Huggingface tapaco-eo dataset

property tapaco_es

Huggingface tapaco-es dataset

property tapaco_et

Huggingface tapaco-et dataset

property tapaco_eu

Huggingface tapaco-eu dataset

property tapaco_fi

Huggingface tapaco-fi dataset

property tapaco_fr

Huggingface tapaco-fr dataset

property tapaco_gl

Huggingface tapaco-gl dataset

property tapaco_gos

Huggingface tapaco-gos dataset

property textattack_bert_base_uncased_cola

Huggingface textattack/bert-base-uncased-CoLA model

property textattack_bert_base_uncased_imdb

Huggingface textattack/bert-base-uncased-imdb model

property textattack_bert_base_uncased_mnli

Huggingface textattack/bert-base-uncased-MNLI model

property textattack_bert_base_uncased_snli

Huggingface textattack/bert-base-uncased-snli model

property textattack_bert_base_uncased_sst_2

Huggingface textattack/bert-base-uncased-SST-2 model

property textattack_distilbert_base_uncased_imdb

Huggingface textattack/distilbert-base-uncased-imdb model

property textattack_distilbert_base_uncased_rotten_tomatoes

Huggingface textattack/distilbert-base-uncased-rotten-tomatoes model

property textattack_roberta_base_imdb

Huggingface textattack/roberta-base-imdb model

property textattack_xlnet_base_cased_imdb

Huggingface textattack/xlnet-base-cased-imdb model

property transformersbook_codepage_small

Huggingface transformersbook/codepage-small model

property uer_gpt2_chinese_poem

Huggingface uer/gpt2-chinese-poem model

property uer_roberta_base_finetuned_cluener2020_chinese

Huggingface uer/roberta-base-finetuned-cluener2020-chinese model

property unitary_toxic_bert

Huggingface unitary/toxic-bert model

property vblagoje_bert_english_uncased_finetuned_pos

Huggingface vblagoje/bert-english-uncased-finetuned-pos model

property vishnun_distilgpt2_finetuned_distilgpt2_med_articles

Huggingface vishnun/distilgpt2-finetuned-distilgpt2-med_articles model

property vishnun_distilgpt2_finetuned_tamilmixsentiment

Huggingface vishnun/distilgpt2-finetuned-tamilmixsentiment model

property weather_dataset

Weather Dataset

property wietsedv_bert_base_multilingual_cased_finetuned_conll2002_ner

Huggingface wietsedv/bert-base-multilingual-cased-finetuned-conll2002-ner model

property wikipedia_sp_500_dataset

Wikipedia SP 500 Dataset

property xlnet_base_cased

Huggingface xlnet-base-cased model
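For the Hugging Face entries, the property names listed above appear to follow from the asset names in their descriptions: the name is lowercased and each run of non-alphanumeric characters becomes a single underscore. The sketch below reproduces that inferred pattern. `to_property_name` is a hypothetical helper, not part of the generated package, and the generator's actual rule may differ.

```python
import re


def to_property_name(asset_name: str) -> str:
    """Hypothetical helper: map an asset name to a property-style identifier."""
    # Lowercase, then collapse each run of characters outside [0-9a-z] into "_".
    return re.sub(r"[^0-9a-z]+", "_", asset_name.lower())


print(to_property_name("akhooli/gpt2-small-arabic"))
print(to_property_name("squad_es-v1.1.0"))
```

A helper like this may be handy for guessing which property corresponds to a given Hugging Face name before looking it up in the list.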

example.assets.example_components.azureml_add_columns(left_dataset: Optional[pathlib.Path] = None, right_dataset: Optional[pathlib.Path] = None) example.assets.example_components._assets._AzuremlAddColumnsComponent

Adds a set of columns from one dataset to another.

Parameters
  • left_dataset (Path) – Left dataset

  • right_dataset (Path) – Right dataset

Output combined_dataset

Combined dataset

Type

combined_dataset: Output
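As a rough illustration of what adding columns from one dataset to another means, the sketch below pairs rows by position using plain Python dictionaries. It does not call the component. The function name, the dict-per-row representation, and the padding of the shorter dataset with `None` are assumptions made for this example.

```python
from itertools import zip_longest


def add_columns(left_rows, right_rows, left_cols, right_cols):
    """Place the right dataset's columns beside the left dataset's, row by row."""
    combined = []
    # Assumption: rows are matched by position and the shorter side is padded.
    for left, right in zip_longest(left_rows, right_rows, fillvalue=None):
        row = {}
        for col in left_cols:
            row[col] = left[col] if left is not None else None
        for col in right_cols:
            row[col] = right[col] if right is not None else None
        combined.append(row)
    return combined


left = [{"id": 1}, {"id": 2}]
right = [{"score": 0.5}]
combined = add_columns(left, right, ["id"], ["score"])
print(combined)
```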

example.assets.example_components.azureml_add_rows(dataset1: Optional[pathlib.Path] = None, dataset2: Optional[pathlib.Path] = None) example.assets.example_components._assets._AzuremlAddRowsComponent

Appends a set of rows from an input dataset to the end of another dataset.

Parameters
  • dataset1 (Path) – Dataset rows to be added to the output dataset first

  • dataset2 (Path) – Dataset rows to be appended to the first dataset

Output results_dataset

Dataset that contains all rows of both input datasets

Type

results_dataset: Output

example.assets.example_components.azureml_apply_image_transformation(input_image_transformation: Optional[pathlib.Path] = None, input_image_directory: Optional[pathlib.Path] = None, mode: Optional[example.assets.example_components._assets._AzuremlApplyImageTransformationModeEnum] = None) example.assets.example_components._assets._AzuremlApplyImageTransformationComponent

Applies an image transformation to an image directory.

Parameters
  • input_image_transformation (Path) – Input image transformation

  • input_image_directory (Path) – Input image directory

  • mode (_AzuremlApplyImageTransformationModeEnum) – Whether ‘Random’ transform operations are applied: they are kept for training and excluded for inference (enum: [‘For training’, ‘For inference’])

Output output_image_directory

Output image directory

Type

output_image_directory: Output
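The effect of `mode` can be pictured as a filter over the list of transform operations. The sketch below is illustrative only: the operation names and the idea that random operations are identifiable by a "Random" name prefix are assumptions, and `select_operations` is a hypothetical helper rather than part of the component.

```python
def select_operations(operations, mode):
    """Return the operations that would run for the given mode."""
    if mode == "For training":
        return list(operations)
    if mode == "For inference":
        # Assumption: random operations are recognizable by a "Random" prefix.
        return [op for op in operations if not op.startswith("Random")]
    raise ValueError(f"Unknown mode: {mode!r}")


ops = ["Resize", "Random horizontal flip", "Center crop"]
print(select_operations(ops, "For inference"))
```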

example.assets.example_components.azureml_apply_math_operation(input: Optional[pathlib.Path] = None, category: example.assets.example_components._assets._AzuremlApplyMathOperationCategoryEnum = _AzuremlApplyMathOperationCategoryEnum.basic, basic_func: example.assets.example_components._assets._AzuremlApplyMathOperationBasicFuncEnum = _AzuremlApplyMathOperationBasicFuncEnum.abs, basic_arg_type: example.assets.example_components._assets._AzuremlApplyMathOperationBasicArgTypeEnum = _AzuremlApplyMathOperationBasicArgTypeEnum.constant, basic_constant: float = 1, basic_column_selector: Optional[str] = None, compare_func: example.assets.example_components._assets._AzuremlApplyMathOperationCompareFuncEnum = _AzuremlApplyMathOperationCompareFuncEnum.equalto, compare_arg_type: example.assets.example_components._assets._AzuremlApplyMathOperationCompareArgTypeEnum = _AzuremlApplyMathOperationCompareArgTypeEnum.constant, compare_constant: float = 1, compare_column_selector: Optional[str] = None, operations_func: example.assets.example_components._assets._AzuremlApplyMathOperationOperationsFuncEnum = _AzuremlApplyMathOperationOperationsFuncEnum.add, operations_arg_type: example.assets.example_components._assets._AzuremlApplyMathOperationOperationsArgTypeEnum = _AzuremlApplyMathOperationOperationsArgTypeEnum.constant, operations_constant: float = 1, operations_column_selector: Optional[str] = None, rounding_func: example.assets.example_components._assets._AzuremlApplyMathOperationRoundingFuncEnum = _AzuremlApplyMathOperationRoundingFuncEnum.ceiling, rounding_arg_type: example.assets.example_components._assets._AzuremlApplyMathOperationRoundingArgTypeEnum = _AzuremlApplyMathOperationRoundingArgTypeEnum.constant, rounding_constant: float = 1, rounding_column_selector: Optional[str] = None, special_func: example.assets.example_components._assets._AzuremlApplyMathOperationSpecialFuncEnum = _AzuremlApplyMathOperationSpecialFuncEnum.beta, special_arg_type: example.assets.example_components._assets._AzuremlApplyMathOperationSpecialArgTypeEnum = _AzuremlApplyMathOperationSpecialArgTypeEnum.constant, special_constant: float = 1, special_column_selector: Optional[str] = None, trigonometric_func: example.assets.example_components._assets._AzuremlApplyMathOperationTrigonometricFuncEnum = _AzuremlApplyMathOperationTrigonometricFuncEnum.acos, column_selector: Optional[str] = None, output_mode: example.assets.example_components._assets._AzuremlApplyMathOperationOutputModeEnum = _AzuremlApplyMathOperationOutputModeEnum.append) example.assets.example_components._assets._AzuremlApplyMathOperationComponent

Applies a mathematical operation to column values.

Parameters
  • input (Path) – DataFrameDirectory

  • category (_AzuremlApplyMathOperationCategoryEnum) – enum (enum: [‘Basic’, ‘Compare’, ‘Operations’, ‘Rounding’, ‘Special’, ‘Trigonometric’])

  • basic_func (_AzuremlApplyMathOperationBasicFuncEnum) – enum (optional, enum: [‘Abs’, ‘Atan2’, ‘Conj’, ‘Cuberoot’, ‘DoubleFactorial’, ‘Eps’, ‘Exp’, ‘Exp2’, ‘ExpMinus1’, ‘Factorial’, ‘Hypotenuse’, ‘ImaginaryPart’, ‘Ln’, ‘LnPlus1’, ‘Log’, ‘Log10’, ‘Log2’, ‘NthRoot’, ‘Pow’, ‘RealPart’, ‘Sqrt’, ‘SqrtPi’, ‘Square’])

  • basic_arg_type (_AzuremlApplyMathOperationBasicArgTypeEnum) – enum (optional, enum: [‘Constant’, ‘ColumnSet’])

  • basic_constant (float) – float (optional)

  • basic_column_selector (str) – ColumnPicker (optional)

  • compare_func (_AzuremlApplyMathOperationCompareFuncEnum) – enum (optional, enum: [‘EqualTo’, ‘GreaterThan’, ‘GreaterThanOrEqualTo’, ‘LessThan’, ‘LessThanOrEqualTo’, ‘NotEqualTo’, ‘PairMax’, ‘PairMin’])

  • compare_arg_type (_AzuremlApplyMathOperationCompareArgTypeEnum) – enum (optional, enum: [‘Constant’, ‘ColumnSet’])

  • compare_constant (float) – float (optional)

  • compare_column_selector (str) – ColumnPicker (optional)

  • operations_func (_AzuremlApplyMathOperationOperationsFuncEnum) – enum (optional, enum: [‘Add’, ‘Divide’, ‘Multiply’, ‘Subtract’])

  • operations_arg_type (_AzuremlApplyMathOperationOperationsArgTypeEnum) – enum (optional, enum: [‘Constant’, ‘ColumnSet’])

  • operations_constant (float) – float (optional)

  • operations_column_selector (str) – ColumnPicker (optional)

  • rounding_func (_AzuremlApplyMathOperationRoundingFuncEnum) – enum (optional, enum: [‘Ceiling’, ‘CeilingPower2’, ‘Floor’, ‘Mod’, ‘Quotient’, ‘Remainder’, ‘RoundDigits’, ‘RoundDown’, ‘RoundUp’, ‘ToEven’, ‘ToMultiple’, ‘ToOdd’, ‘Truncate’])

  • rounding_arg_type (_AzuremlApplyMathOperationRoundingArgTypeEnum) – enum (optional, enum: [‘Constant’, ‘ColumnSet’])

  • rounding_constant (float) – float (optional)

  • rounding_column_selector (str) – ColumnPicker (optional)

  • special_func (_AzuremlApplyMathOperationSpecialFuncEnum) – enum (optional, enum: [‘Beta’, ‘BetaLn’, ‘EllipticIntegralE’, ‘EllipticIntegralK’, ‘Erf’, ‘Erfc’, ‘ErfcScaled’, ‘ErfInverse’, ‘ExponentialIntegralEin’, ‘Gamma’, ‘GammaLn’, ‘GammaRegularizedP’, ‘GammaRegularizedPInverse’, ‘GammaRegularizedQ’, ‘GammaRegularizedQInverse’, ‘Polygamma’])

  • special_arg_type (_AzuremlApplyMathOperationSpecialArgTypeEnum) – enum (optional, enum: [‘Constant’, ‘ColumnSet’])

  • special_constant (float) – float (optional)

  • special_column_selector (str) – ColumnPicker (optional)

  • trigonometric_func (_AzuremlApplyMathOperationTrigonometricFuncEnum) – enum (optional, enum: [‘Acos’, ‘AcosDegrees’, ‘Acosh’, ‘Acot’, ‘AcotDegrees’, ‘Acoth’, ‘Acsc’, ‘AcscDegrees’, ‘Acsch’, ‘Arg’, ‘Asec’, ‘AsecDegrees’, ‘Asech’, ‘Asin’, ‘AsinDegrees’, ‘Asinh’, ‘Atan’, ‘AtanDegrees’, ‘Atanh’, ‘Cis’, ‘Cos’, ‘CosDegrees’, ‘Cosh’, ‘Cot’, ‘CotDegrees’, ‘Coth’, ‘Csc’, ‘CscDegrees’, ‘Csch’, ‘DegreesToRadians’, ‘RadiansToDegrees’, ‘Sec’, ‘SecDegrees’, ‘Sech’, ‘Sign’, ‘Sin’, ‘Sinc’, ‘SinDegrees’, ‘Sinh’, ‘Tan’, ‘TanDegrees’, ‘Tanh’])

  • column_selector (str) – ColumnPicker

  • output_mode (_AzuremlApplyMathOperationOutputModeEnum) – enum (enum: [‘Append’, ‘Inplace’, ‘ResultOnly’])

Output result_dataset

DataFrameDirectory

Type

result_dataset: Output
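The three `output_mode` values can be read as three ways of returning the computed column. The sketch below illustrates that reading on a column-oriented table held in a plain dictionary. It is not the component's implementation: `apply_basic`, the two-entry function table, and the `Func(column)` naming of the result column are assumptions chosen for the example.

```python
import math

# Two illustrative entries; the component lists many more 'Basic' functions.
BASIC_FUNCS = {"Abs": abs, "Sqrt": math.sqrt}


def apply_basic(table, column, func_name, output_mode="Append"):
    """Apply a unary 'Basic' function to one column of a column-oriented table."""
    result = [BASIC_FUNCS[func_name](value) for value in table[column]]
    result_name = f"{func_name}({column})"  # assumed naming, for illustration only
    if output_mode == "Append":
        # Keep every input column and add the result as a new column.
        return {**table, result_name: result}
    if output_mode == "Inplace":
        # Overwrite the selected column with the result.
        return {**table, column: result}
    if output_mode == "ResultOnly":
        # Return the result column alone.
        return {result_name: result}
    raise ValueError(f"Unknown output mode: {output_mode!r}")


table = {"x": [-1, 4], "y": [0, 0]}
print(apply_basic(table, "x", "Abs", "Append"))
```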

example.assets.example_components.azureml_apply_sql_transformation(t1: Optional[pathlib.Path] = None, t2: Optional[pathlib.Path] = None, t3: Optional[pathlib.Path] = None, sqlquery: str = 'select * from t1') example.assets.example_components._assets._AzuremlApplySqlTransformationComponent

Runs a SQLite query on input datasets to transform the data.

Parameters
  • t1 (Path) – DataFrameDirectory

  • t2 (Path) – DataFrameDirectory (optional)

  • t3 (Path) – DataFrameDirectory (optional)

  • sqlquery (str) – Script

Output result_dataset

DataFrameDirectory

Type

result_dataset: Output
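Because the query is SQLite and the inputs are referred to as `t1`, `t2` and `t3`, a query can be tried out locally with Python's standard `sqlite3` module before it is passed as `sqlquery`. The sketch below does only that. The table contents and the `run_query` helper are made up for the example, and how the component loads datasets into tables is not shown here.

```python
import sqlite3

# In-memory database standing in for the component's input datasets.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE t2 (id INTEGER, score REAL)")
conn.executemany("INSERT INTO t1 VALUES (?, ?)", [(1, "a"), (2, "b")])
conn.executemany("INSERT INTO t2 VALUES (?, ?)", [(1, 0.5), (2, 0.9)])


def run_query(connection, sqlquery="select * from t1"):
    """Run a query and return all rows; the default mirrors the signature above."""
    return connection.execute(sqlquery).fetchall()


default_rows = run_query(conn)
joined_rows = run_query(
    conn,
    "select t1.name, t2.score from t1 join t2 on t1.id = t2.id order by t1.id",
)
print(default_rows)
print(joined_rows)
```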

example.assets.example_components.azureml_apply_transformation(transformation: Optional[pathlib.Path] = None, dataset: Optional[pathlib.Path] = None) example.assets.example_components._assets._AzuremlApplyTransformationComponent

Applies a well-specified data transformation to a dataset.

Parameters
  • transformation (Path) – A unary data transformation

  • dataset (Path) – Dataset to be transformed

Output transformed_dataset

Transformed dataset

Type

transformed_dataset: Output

example.assets.example_components.azureml_assign_data_to_clusters(trained_model: Optional[pathlib.Path] = None, dataset: Optional[pathlib.Path] = None, check_for_append_or_uncheck_for_result_only: bool = True) example.assets.example_components._assets._AzuremlAssignDataToClustersComponent

Assigns data to clusters using an existing trained clustering model.

Parameters
  • trained_model (Path) – Trained clustering model

  • dataset (Path) – Input data source

  • check_for_append_or_uncheck_for_result_only (bool) – Whether the output dataset contains the input dataset with an assignments column appended (checked) or the assignments column only (unchecked)

Output results_dataset

Input dataset with an assignments column appended, or the assignments column only

Type

results_dataset: Output
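The two output shapes controlled by `check_for_append_or_uncheck_for_result_only` can be sketched with a toy nearest-centroid assignment. This is an illustration under stated assumptions, not the component's algorithm: the trained model is represented as a plain list of centroids, distance is squared Euclidean, and `assign_to_clusters` is a hypothetical name.

```python
def assign_to_clusters(centroids, rows, append=True):
    """Label each row with the index of its nearest centroid."""

    def nearest(point):
        # Squared Euclidean distance to every centroid; ties go to the first.
        distances = [
            sum((p - c) ** 2 for p, c in zip(point, centroid))
            for centroid in centroids
        ]
        return distances.index(min(distances))

    assignments = [nearest(row) for row in rows]
    if append:
        # Checked: input columns followed by the assignment.
        return [tuple(row) + (label,) for row, label in zip(rows, assignments)]
    # Unchecked: the assignment column only.
    return [(label,) for label in assignments]


centroids = [(0.0, 0.0), (10.0, 10.0)]
rows = [(1.0, 1.0), (9.0, 8.0)]
print(assign_to_clusters(centroids, rows, append=True))
print(assign_to_clusters(centroids, rows, append=False))
```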

example.assets.example_components.azureml_boosted_decision_tree_regression(create_trainer_mode: example.assets.example_components._assets._AzuremlBoostedDecisionTreeRegressionCreateTrainerModeEnum = _AzuremlBoostedDecisionTreeRegressionCreateTrainerModeEnum.singleparameter, maximum_number_of_leaves_per_tree: int = 20, minimum_number_of_training_instances_required_to_form_a_leaf: int = 10, the_learning_rate: float = 0.2, total_number_of_trees_constructed: int = 100, range_for_maximum_number_of_leaves_per_tree: str = '2; 8; 32; 128', range_for_minimum_number_of_training_instances_required_to_form_a_leaf: str = '1; 10; 50', range_for_learning_rate: str = '0.025; 0.05; 0.1; 0.2; 0.4', range_for_total_number_of_trees_constructed: str = '20; 100; 500', random_number_seed: Optional[int] = None) example.assets.example_components._assets._AzuremlBoostedDecisionTreeRegressionComponent

Creates a regression model using the Boosted Decision Tree algorithm.

Parameters
  • create_trainer_mode (_AzuremlBoostedDecisionTreeRegressionCreateTrainerModeEnum) – Create advanced learner options (enum: [‘SingleParameter’, ‘ParameterRange’])

  • maximum_number_of_leaves_per_tree (int) – Specify the maximum number of leaves per tree (optional, min: 2, max: 131072)

  • minimum_number_of_training_instances_required_to_form_a_leaf (int) – Specify the minimum number of cases required to form a leaf node (optional, min: 1)

  • the_learning_rate (float) – Specify the initial learning rate (optional, min: 2.220446049250313e-16, max: 1.0)

  • total_number_of_trees_constructed (int) – Specify the maximum number of trees that can be created during training (optional, min: 1)

  • range_for_maximum_number_of_leaves_per_tree (str) – Specify range for the maximum number of leaves allowed per tree (optional)

  • range_for_minimum_number_of_training_instances_required_to_form_a_leaf (str) – Specify the range for the minimum number of cases required to form a leaf (optional)

  • range_for_learning_rate (str) – Specify the range for the initial learning rate (optional)

  • range_for_total_number_of_trees_constructed (str) – Specify the range for the maximum number of trees that can be created during training (optional)

  • random_number_seed (int) – Provide a seed for the random number generator used by the model. Leave blank for default. (optional, max: 4294967295)

Output untrained_model

An untrained regression model that can be connected to the Train Generic Model or Cross Validate Model modules

Type

untrained_model: Output
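The `range_for_*` defaults are semicolon-separated strings such as '2; 8; 32; 128'. The sketch below parses the four default strings and enumerates every combination, which gives a feel for the size of the search space in ParameterRange mode. `parse_range` is a hypothetical helper, and whether a downstream module tries every combination or only a sample is not stated in this reference.

```python
from itertools import product


def parse_range(text, cast):
    """Split a ';'-separated string such as '2; 8; 32; 128' into typed values."""
    return [cast(part.strip()) for part in text.split(";") if part.strip()]


# Default strings copied from the signature above.
leaves = parse_range("2; 8; 32; 128", int)
min_instances = parse_range("1; 10; 50", int)
learning_rates = parse_range("0.025; 0.05; 0.1; 0.2; 0.4", float)
trees = parse_range("20; 100; 500", int)

# Every combination of the four ranges, in the order listed above.
grid = list(product(leaves, min_instances, learning_rates, trees))
print(len(grid))
```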

example.assets.example_components.azureml_clean_missing_data(dataset: Optional[pathlib.Path] = None, columns_to_be_cleaned: Optional[str] = None, minimum_missing_value_ratio: float = 0.0, maximum_missing_value_ratio: float = 1.0, cleaning_mode: example.assets.example_components._assets._AzuremlCleanMissingDataCleaningModeEnum = _AzuremlCleanMissingDataCleaningModeEnum.custom_substitution_value, replacement_value: str = '0', generate_missing_value_indicator_column: bool = False, cols_with_all_missing_values: example.assets.example_components._assets._AzuremlCleanMissingDataColsWithAllMissingValuesEnum = _AzuremlCleanMissingDataColsWithAllMissingValuesEnum.remove) example.assets.example_components._assets._AzuremlCleanMissingDataComponent

Specifies how to handle the values missing from a dataset.

Parameters
  • dataset (Path) – Dataset to be cleaned

  • columns_to_be_cleaned (str) – Columns for missing values clean operation

  • minimum_missing_value_ratio (float) – Clean only columns with missing value ratio above specified value, out of set of all selected columns (max: 1.0)

  • maximum_missing_value_ratio (float) – Clean only columns with missing value ratio below specified value, out of set of all selected columns (max: 1.0)

  • cleaning_mode (_AzuremlCleanMissingDataCleaningModeEnum) – Algorithm to clean missing values (enum: [‘Custom substitution value’, ‘Replace with mean’, ‘Replace with median’, ‘Replace with mode’, ‘Remove entire row’, ‘Remove entire column’])

  • replacement_value (str) – Type the value that takes the place of missing values (optional)

  • generate_missing_value_indicator_column (bool) – Generate a column that indicates which rows were cleaned (optional)

  • cols_with_all_missing_values (_AzuremlCleanMissingDataColsWithAllMissingValuesEnum) – Cols with all missing values (optional, enum: [‘Propagate’, ‘Remove’])

Output cleaned_dataset

Cleaned dataset

Type

cleaned_dataset: Output

Output cleaning_transformation

Transformation to be passed to Apply Transformation module to clean new data

Type

cleaning_transformation: Output
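
The interaction between the two ratio parameters and the 'Replace with mean' cleaning mode can be illustrated with a minimal sketch. This is not the component's implementation: the helper name clean_with_mean, the dict-of-lists table layout, and the choice of inclusive bounds on both ratios are assumptions made for illustration.

```python
from statistics import mean


def clean_with_mean(columns, min_ratio=0.0, max_ratio=1.0):
    """Fill None with the column mean when the missing ratio lies in [min_ratio, max_ratio].

    Hypothetical helper: inclusive bounds are an assumption, not documented behavior.
    """
    cleaned = {}
    for name, values in columns.items():
        missing = sum(v is None for v in values)
        ratio = missing / len(values) if values else 0.0
        present = [v for v in values if v is not None]
        if missing and present and min_ratio <= ratio <= max_ratio:
            fill = mean(present)
            cleaned[name] = [fill if v is None else v for v in values]
        else:
            # Column is outside the ratio window (or has nothing to fill): keep as-is.
            cleaned[name] = list(values)
    return cleaned


table = {"age": [30, None, 50, 40], "income": [None, None, None, 10.0]}
result = clean_with_mean(table, min_ratio=0.0, max_ratio=0.5)
```

Here "age" has a missing ratio of 0.25 and is filled, while "income" has a ratio of 0.75, which exceeds the maximum, so it is left untouched.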

example.assets.example_components.azureml_clip_values(input: Optional[pathlib.Path] = None, clipmode: example.assets.example_components._assets._AzuremlClipValuesClipmodeEnum = _AzuremlClipValuesClipmodeEnum.clippeaks, upperthreshold: example.assets.example_components._assets._AzuremlClipValuesUpperthresholdEnum = _AzuremlClipValuesUpperthresholdEnum.constant, constantupperthreshold: float = 99, percentileupperthreshold: float = 99, modeuppersubstitute: example.assets.example_components._assets._AzuremlClipValuesModeuppersubstituteEnum = _AzuremlClipValuesModeuppersubstituteEnum.threshold, lowerthreshold: example.assets.example_components._assets._AzuremlClipValuesLowerthresholdEnum = _AzuremlClipValuesLowerthresholdEnum.constant, constantlowerthreshold: float = 1, percentilelowerthreshold: float = 1, modeowersubstitute: example.assets.example_components._assets._AzuremlClipValuesModeowersubstituteEnum = _AzuremlClipValuesModeowersubstituteEnum.threshold, lowerupperthreshold: example.assets.example_components._assets._AzuremlClipValuesLowerupperthresholdEnum = _AzuremlClipValuesLowerupperthresholdEnum.constant, constantuthreshold: float = 99, constantlthreshold: float = 1, percentileuthreshold: float = 99, percentilelthreshold: float = 1, modeusubstitute: example.assets.example_components._assets._AzuremlClipValuesModeusubstituteEnum = _AzuremlClipValuesModeusubstituteEnum.threshold, modelsubstitute: example.assets.example_components._assets._AzuremlClipValuesModelsubstituteEnum = _AzuremlClipValuesModelsubstituteEnum.threshold, column_selector: Optional[str] = None, inplace_flag: bool = True, indicator_flag: bool = False) example.assets.example_components._assets._AzuremlClipValuesComponent

Detects outliers and clips or replaces their values.

Parameters
  • input (Path) – DataFrameDirectory

  • clipmode (_AzuremlClipValuesClipmodeEnum) – enum (enum: [‘ClipPeaks’, ‘ClipSubPeaks’, ‘ClipPeaksAndSubpeaks’])

  • upperthreshold (_AzuremlClipValuesUpperthresholdEnum) – enum (optional, enum: [‘Constant’, ‘Percentile’])

  • constantupperthreshold (float) – float (optional)

  • percentileupperthreshold (float) – float (optional)

  • modeuppersubstitute (_AzuremlClipValuesModeuppersubstituteEnum) – enum (optional, enum: [‘Threshold’, ‘Mean’, ‘Median’, ‘Missing’])

  • lowerthreshold (_AzuremlClipValuesLowerthresholdEnum) – enum (optional, enum: [‘Constant’, ‘Percentile’])

  • constantlowerthreshold (float) – float (optional)

  • percentilelowerthreshold (float) – float (optional)

  • modeowersubstitute (_AzuremlClipValuesModeowersubstituteEnum) – enum (optional, enum: [‘Threshold’, ‘Mean’, ‘Median’, ‘Missing’])

  • lowerupperthreshold (_AzuremlClipValuesLowerupperthresholdEnum) – enum (optional, enum: [‘Constant’, ‘Percentile’])

  • constantuthreshold (float) – float (optional)

  • constantlthreshold (float) – float (optional)

  • percentileuthreshold (float) – float (optional)

  • percentilelthreshold (float) – float (optional)

  • modeusubstitute (_AzuremlClipValuesModeusubstituteEnum) – enum (optional, enum: [‘Threshold’, ‘Mean’, ‘Median’, ‘Missing’])

  • modelsubstitute (_AzuremlClipValuesModelsubstituteEnum) – enum (optional, enum: [‘Threshold’, ‘Mean’, ‘Median’, ‘Missing’])

  • column_selector (str) – ColumnPicker

  • inplace_flag (bool) – boolean

  • indicator_flag (bool) – boolean

Output result_dataset

DataFrameDirectory

Type

result_dataset: Output
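
As a rough illustration of what clipping with constant thresholds and the 'Threshold' substitute might look like, the sketch below clamps values to the default bounds from the signature (1 and 99). The function name and the list-based input are hypothetical, and treating this as equivalent to the 'ClipPeaksAndSubpeaks' mode is an assumption.

```python
def clip_peaks_and_subpeaks(values, lower=1.0, upper=99.0):
    """Replace values above `upper` or below `lower` with the threshold itself.

    Hypothetical helper mirroring constant thresholds with 'Threshold' substitution.
    """
    return [min(max(v, lower), upper) for v in values]


clipped = clip_peaks_and_subpeaks([-5.0, 0.5, 42.0, 99.0, 150.0])
```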

example.assets.example_components.azureml_convert_to_csv(dataset: Optional[pathlib.Path] = None) example.assets.example_components._assets._AzuremlConvertToCsvComponent

Converts data input to a comma-separated values format.

Parameters

dataset (Path) – Input dataset

Output results_dataset

Output dataset

Type

results_dataset: Output

example.assets.example_components.azureml_convert_to_dataset(dataset: Optional[pathlib.Path] = None, action: example.assets.example_components._assets._AzuremlConvertToDatasetActionEnum = _AzuremlConvertToDatasetActionEnum.none, custom_missing_value: str = '?', replace: example.assets.example_components._assets._AzuremlConvertToDatasetReplaceEnum = _AzuremlConvertToDatasetReplaceEnum.missing, custom_value: str = 'obs', new_value: str = '0') example.assets.example_components._assets._AzuremlConvertToDatasetComponent

Converts data input to the internal Dataset format used by Azure Machine Learning designer.

Parameters
  • dataset (Path) – Input dataset

  • action (_AzuremlConvertToDatasetActionEnum) – Action to apply to input dataset (enum: [‘None’, ‘SetMissingValues’, ‘ReplaceValues’])

  • custom_missing_value (str) – Value indicating missing value token (optional)

  • replace (_AzuremlConvertToDatasetReplaceEnum) – Specifies type of replacement for values (optional, enum: [‘Missing’, ‘Custom’])

  • custom_value (str) – Value to be replaced (optional)

  • new_value (str) – Replacement value (optional)

Output results_dataset

Output dataset

Type

results_dataset: Output

example.assets.example_components.azureml_convert_to_image_directory(input_dataset: Optional[pathlib.Path] = None) example.assets.example_components._assets._AzuremlConvertToImageDirectoryComponent

Convert dataset to image directory format.

Parameters

input_dataset (Path) – Input dataset

Output output_image_directory

Output image directory.

Type

output_image_directory: Output

example.assets.example_components.azureml_convert_to_indicator_values(dataset: Optional[pathlib.Path] = None, categorical_columns_to_convert: Optional[str] = None, overwrite_categorical_columns: bool = False) example.assets.example_components._assets._AzuremlConvertToIndicatorValuesComponent

Converts categorical values in columns to indicator values.

Parameters
  • dataset (Path) – Dataset with categorical columns

  • categorical_columns_to_convert (str) – Select categorical columns to convert to indicator matrices.

  • overwrite_categorical_columns (bool) – If True, overwrite the selected categorical columns, otherwise append the resulting indicator matrices to the dataset (optional)

Output results_dataset

Dataset with categorical columns converted to indicator matrices.

Type

results_dataset: Output

Output indicator_values_transformation

Transformation to be passed to Apply Transformation module to convert indicator values for new data

Type

indicator_values_transformation: Output
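
The effect of overwrite_categorical_columns can be sketched with plain dictionaries. The helper below is hypothetical, and the "column-category" naming of the indicator columns is an assumption rather than the component's documented output format.

```python
def to_indicator_columns(rows, column, overwrite=False):
    """Add one 0/1 indicator column per category found in `column`.

    With overwrite=True the original categorical column is dropped;
    otherwise the indicator columns are appended next to it.
    """
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if not (overwrite and k == column)}
        for category in categories:
            # Assumed naming scheme: "<column>-<category>"
            new_row[f"{column}-{category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded


rows = [{"id": 1, "color": "red"}, {"id": 2, "color": "blue"}]
replaced = to_indicator_columns(rows, "color", overwrite=True)
appended = to_indicator_columns(rows, "color", overwrite=False)
```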

example.assets.example_components.azureml_convert_word_to_vector(dataset: Optional[pathlib.Path] = None, target_column: Optional[str] = None, word2vec_strategy: example.assets.example_components._assets._AzuremlConvertWordToVectorWord2VecStrategyEnum = _AzuremlConvertWordToVectorWord2VecStrategyEnum.gensim_word2vec, word2vec_training_algorithm: example.assets.example_components._assets._AzuremlConvertWordToVectorWord2VecTrainingAlgorithmEnum = _AzuremlConvertWordToVectorWord2VecTrainingAlgorithmEnum.skip_gram, length_of_word_embedding: int = 100, context_window_size: int = 5, number_of_epochs: int = 5, maximum_vocabulary_size: int = 10000, minimum_word_count: int = 5) example.assets.example_components._assets._AzuremlConvertWordToVectorComponent

Convert word to vector.

Parameters
  • dataset (Path) – Input data

  • target_column (str) – Select one target column whose vocabulary embeddings will be generated

  • word2vec_strategy (_AzuremlConvertWordToVectorWord2VecStrategyEnum) – Select the strategy for computing word embedding (enum: [‘GloVe pretrained English Model’, ‘Gensim Word2Vec’, ‘Gensim FastText’])

  • word2vec_training_algorithm (_AzuremlConvertWordToVectorWord2VecTrainingAlgorithmEnum) – Select the training algorithm for training Word2Vec model (optional, enum: [‘Skip_gram’, ‘CBOW’])

  • length_of_word_embedding (int) – Specify the length of the word embedding/vector (optional, min: 10, max: 2000)

  • context_window_size (int) – Specify the maximum distance between the word being predicted and the current word (optional, min: 1, max: 100)

  • number_of_epochs (int) – Specify the number of epochs (iterations) over the corpus (optional, min: 1, max: 1024)

  • maximum_vocabulary_size (int) – Specify the maximum number of the words in vocabulary (min: 10, max: 2147483647)

  • minimum_word_count (int) – Ignores all words that have a frequency lower than this value (min: 1, max: 100)

Output vocabulary_with_embeddings

Vocabulary with embeddings

Type

vocabulary_with_embeddings: Output

example.assets.example_components.azureml_create_python_model(python_script: str = '\n# The script MUST define a class named AzureMLModel.\n# This class MUST at least define the following three methods: "__init__", "train" and "predict".\n# The signatures (method and argument names) of all these methods MUST be exactly the same as the following example.\n\n# Please do not install extra packages such as "pip install xgboost" in this script,\n# otherwise errors will be raised when reading models in down-stream modules.\n\nimport pandas as pd\nfrom sklearn.linear_model import LogisticRegression\n\n\nclass AzureMLModel:\n    # The __init__ method is only invoked in module "Create Python Model",\n    # and will not be invoked again in the following modules "Train Model" and "Score Model".\n    # The attributes defined in the __init__ method are preserved and usable in the train and predict method.\n    def __init__(self):\n        # self.model must be assigned\n        self.model = LogisticRegression()\n        self.feature_column_names = list()\n\n    # Train model\n    #   Param<df_train>: a pandas.DataFrame\n    #   Param<df_label>: a pandas.Series\n    def train(self, df_train, df_label):\n        # self.feature_column_names records the column names used for training.\n        # It is recommended to set this attribute before training so that the\n        # feature columns used in predict and train methods have the same names.\n        self.feature_column_names = df_train.columns.tolist()\n        self.model.fit(df_train, df_label)\n\n    # Predict results\n    #   Param<df>: a pandas.DataFrame\n    #   Must return a pandas.DataFrame\n    def predict(self, df):\n        # The feature columns used for prediction MUST have the same names as the ones for training.\n        # The name of score column ("Scored Labels" in this case) MUST be different from any other\n        # columns in input data.\n        return pd.DataFrame({\'Scored Labels\': 
self.model.predict(df[self.feature_column_names])})\n') example.assets.example_components._assets._AzuremlCreatePythonModelComponent

Creates Python model using custom script.

Parameters

python_script (str) – The Python script to execute

Output untrained_model

An untrained custom Python model

Type

untrained_model: Output
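
The default script states that the script must define a class named AzureMLModel with at least the methods __init__, train and predict. A local pre-flight check along those lines could look like the sketch below. The helper missing_model_methods is hypothetical and not part of the package; it only inspects method names and does not verify argument names or run the script.

```python
import ast

REQUIRED_METHODS = {"__init__", "train", "predict"}


def missing_model_methods(script):
    """Return the required method names missing from the AzureMLModel class.

    Returns None when the script defines no module-level class named AzureMLModel.
    """
    tree = ast.parse(script)
    for node in tree.body:
        if isinstance(node, ast.ClassDef) and node.name == "AzureMLModel":
            defined = {n.name for n in node.body if isinstance(n, ast.FunctionDef)}
            return REQUIRED_METHODS - defined
    return None


good_script = '''
class AzureMLModel:
    def __init__(self):
        self.model = None

    def train(self, df_train, df_label):
        pass

    def predict(self, df):
        return df
'''

incomplete_script = '''
class AzureMLModel:
    def __init__(self):
        self.model = None
'''
```

An empty set means all three methods are present; a non-empty set lists what still needs to be defined before the script is passed as python_script.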

example.assets.example_components.azureml_cross_validate_model(untrained_model: Optional[pathlib.Path] = None, dataset: Optional[pathlib.Path] = None, name_or_numerical_index_of_the_label_column: Optional[str] = None, random_seed: int = 0) example.assets.example_components._assets._AzuremlCrossValidateModelComponent

Cross Validate a classification or regression model with standard metrics.

Parameters
  • untrained_model (Path) – Untrained learner

  • dataset (Path) – Training data

  • name_or_numerical_index_of_the_label_column (str) – Select the column that contains the label or outcome column

  • random_seed (int) – Specify a numeric seed to use for random number generation. (max: 4294967295)

Output scored_results

Data scored results

Type

scored_results: Output

Output evaluation_results_by_fold

Data evaluation results by fold

Type

evaluation_results_by_fold: Output

example.assets.example_components.azureml_decision_forest_regression(create_trainer_mode: example.assets.example_components._assets._AzuremlDecisionForestRegressionCreateTrainerModeEnum = _AzuremlDecisionForestRegressionCreateTrainerModeEnum.singleparameter, number_of_decision_trees: int = 8, maximum_depth_of_the_decision_trees: int = 32, minimum_number_of_samples_per_leaf_node: int = 1, range_for_number_of_decision_trees: str = '1; 8; 32', range_for_the_maximum_depth_of_the_decision_trees: str = '1; 16; 64', range_for_the_minimum_number_of_samples_per_leaf_node: str = '1; 4; 16', resampling_method: example.assets.example_components._assets._AzuremlDecisionForestRegressionResamplingMethodEnum = _AzuremlDecisionForestRegressionResamplingMethodEnum.bagging_resampling) example.assets.example_components._assets._AzuremlDecisionForestRegressionComponent

Creates a regression model using the decision forest algorithm.

Parameters
  • create_trainer_mode (_AzuremlDecisionForestRegressionCreateTrainerModeEnum) – Create advanced learner options (enum: [‘SingleParameter’, ‘ParameterRange’])

  • number_of_decision_trees (int) – Specify the number of decision trees to create in the ensemble (optional, min: 1)

  • maximum_depth_of_the_decision_trees (int) – Specify the maximum depth of any decision tree that can be created in the ensemble (optional, min: 1)

  • minimum_number_of_samples_per_leaf_node (int) – Specify the minimum number of training samples required to generate a leaf node (optional, min: 1)

  • range_for_number_of_decision_trees (str) – Specify range for the number of decision trees to create in the ensemble (optional)

  • range_for_the_maximum_depth_of_the_decision_trees (str) – Specify range for the maximum depth of the decision trees (optional)

  • range_for_the_minimum_number_of_samples_per_leaf_node (str) – Specify range for the minimum number of samples per leaf node (optional)

  • resampling_method (_AzuremlDecisionForestRegressionResamplingMethodEnum) – Choose a resampling method (enum: [‘Bagging Resampling’, ‘Replicate Resampling’])

Output untrained_model

An untrained regression model

Type

untrained_model: Output
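
The range_for_* parameters are passed as strings such as '1; 8; 32'. Assuming the format is a semicolon-separated list of integer candidates (inferred from the default values, not from a documented grammar), a hypothetical helper for building or checking such strings might be:

```python
def parse_range(text):
    """Split a '1; 8; 32' style string into a list of ints (assumed format)."""
    return [int(part) for part in text.split(";") if part.strip()]


def format_range(values):
    """Inverse of parse_range: join ints using the '; ' separator seen in the defaults."""
    return "; ".join(str(v) for v in values)


trees = parse_range("1; 8; 32")
depths = format_range([1, 16, 64])
```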

example.assets.example_components.azureml_densenet(model_name: example.assets.example_components._assets._AzuremlDensenetModelNameEnum = _AzuremlDensenetModelNameEnum.densenet201, pretrained: bool = True, memory_efficient: bool = False) example.assets.example_components._assets._AzuremlDensenetComponent

Creates an image classification model using the densenet algorithm.

Parameters
  • model_name (_AzuremlDensenetModelNameEnum) – Name of a certain densenet structure (enum: [‘densenet121’, ‘densenet161’, ‘densenet169’, ‘densenet201’])

  • pretrained (bool) – Indicate whether to use a model pre-trained on ImageNet

  • memory_efficient (bool) – Indicate whether to use checkpointing, which is much more memory efficient but slower

Output untrained_model

Untrained densenet model path

Type

untrained_model: Output

example.assets.example_components.azureml_edit_metadata(dataset: Optional[pathlib.Path] = None, column: Optional[str] = None, data_type: example.assets.example_components._assets._AzuremlEditMetadataDataTypeEnum = _AzuremlEditMetadataDataTypeEnum.unchanged, date_and_time_format: Optional[str] = None, categorical: example.assets.example_components._assets._AzuremlEditMetadataCategoricalEnum = _AzuremlEditMetadataCategoricalEnum.unchanged, fields: example.assets.example_components._assets._AzuremlEditMetadataFieldsEnum = _AzuremlEditMetadataFieldsEnum.unchanged, new_column_name: Optional[str] = None) example.assets.example_components._assets._AzuremlEditMetadataComponent

Edits metadata associated with columns in a dataset.

Parameters
  • dataset (Path) – Input dataset

  • column (str) – Choose the columns to which your changes should apply

  • data_type (_AzuremlEditMetadataDataTypeEnum) – Specify the new data type of the column (enum: [‘Unchanged’, ‘String’, ‘Integer’, ‘Double’, ‘Boolean’, ‘DateTime’])

  • date_and_time_format (str) – Specify custom format string for parsing DateTime, refer to Python standard library datetime.strftime() for detailed documentation. Leave empty for default permissive parsing (optional)

  • categorical (_AzuremlEditMetadataCategoricalEnum) – Indicate whether the column should be flagged as categorical (enum: [‘Unchanged’, ‘Categorical’, ‘NonCategorical’])

  • fields (_AzuremlEditMetadataFieldsEnum) – Specify whether the column should be considered a feature or label by learning algorithms (enum: [‘Unchanged’, ‘Features’, ‘Labels’, ‘ClearFeatures’, ‘ClearLabels’, ‘ClearScores’])

  • new_column_name (str) – Type the new names of the columns (optional)

Output results_dataset

Dataset with changed metadata

Type

results_dataset: Output

example.assets.example_components.azureml_enter_data_manually(dataformat: example.assets.example_components._assets._AzuremlEnterDataManuallyDataformatEnum = _AzuremlEnterDataManuallyDataformatEnum.csv, hasheader: bool = True, data: Optional[str] = None) example.assets.example_components._assets._AzuremlEnterDataManuallyComponent

Enables entering and editing small datasets by typing values.

Parameters
  • dataformat (_AzuremlEnterDataManuallyDataformatEnum) – Select which format data will be entered (enum: [‘ARFF’, ‘CSV’, ‘SvmLight’, ‘TSV’])

  • hasheader (bool) – CSV or TSV file has a header (optional)

  • data (str) – Text to output as DataTable

Output dataset

Entered data

Type

dataset: Output
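
To illustrate how the data and hasheader parameters relate for the CSV format, the sketch below parses a text block with the standard csv module. It is only an approximation of the idea: the helper name and the generated column names (Col1, Col2, ...) used when there is no header are assumptions.

```python
import csv
import io


def parse_entered_csv(data, has_header=True):
    """Parse CSV text into a list of row dicts.

    When has_header is False, column names Col1, Col2, ... are generated (assumed naming).
    """
    rows = list(csv.reader(io.StringIO(data.strip())))
    if not rows:
        return []
    if has_header:
        header, body = rows[0], rows[1:]
    else:
        header = [f"Col{i + 1}" for i in range(len(rows[0]))]
        body = rows
    return [dict(zip(header, row)) for row in body]


with_header = parse_entered_csv("name,score\nann,1\nbob,2\n")
without_header = parse_entered_csv("ann,1\nbob,2\n", has_header=False)
```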

example.assets.example_components.azureml_evaluate_model(scored_dataset: Optional[pathlib.Path] = None, scored_dataset_to_compare: Optional[pathlib.Path] = None) example.assets.example_components._assets._AzuremlEvaluateModelComponent

Evaluates the results of a classification or regression model with standard metrics.

Parameters
  • scored_dataset (Path) – Scored dataset

  • scored_dataset_to_compare (Path) – Scored dataset to compare (optional)

Output evaluation_results

Data evaluation result

Type

evaluation_results: Output

example.assets.example_components.azureml_evaluate_recommender(test_dataset: Optional[pathlib.Path] = None, scored_dataset: Optional[pathlib.Path] = None) example.assets.example_components._assets._AzuremlEvaluateRecommenderComponent

Evaluate a recommendation model.

Parameters
  • test_dataset (Path) – Test dataset

  • scored_dataset (Path) – Scored dataset

Output metric

A table of evaluation metrics

Type

metric: Output

example.assets.example_components.azureml_execute_python_script(dataset1: Optional[pathlib.Path] = None, dataset2: Optional[pathlib.Path] = None, script_bundle: Optional[pathlib.Path] = None, python_script: str = '\n# The script MUST contain a function named azureml_main\n# which is the entry point for this module.\n\n# imports up here can be used to\nimport pandas as pd\n\n# The entry point function MUST have two input arguments.\n# If the input port is not connected, the corresponding\n# dataframe argument will be None.\n#   Param<dataframe1>: a pandas.DataFrame\n#   Param<dataframe2>: a pandas.DataFrame\ndef azureml_main(dataframe1 = None, dataframe2 = None):\n\n    # Execution logic goes here\n    print(f\'Input pandas.DataFrame #1: {dataframe1}\')\n\n    # If a zip file is connected to the third input port,\n    # it is unzipped under "./Script Bundle". This directory is added\n    # to sys.path. Therefore, if your zip file contains a Python file\n    # mymodule.py you can import it using:\n    # import mymodule\n\n    # Return value must be of a sequence of pandas.DataFrame\n    # E.g.\n    #   -  Single return value: return dataframe1,\n    #   -  Two return values: return dataframe1, dataframe2\n    return dataframe1,\n\n') example.assets.example_components._assets._AzuremlExecutePythonScriptComponent

Executes a Python script from an Azure Machine Learning designer pipeline.

Parameters
  • dataset1 (Path) – Input dataset 1 (optional)

  • dataset2 (Path) – Input dataset 2 (optional)

  • script_bundle (Path) – Zip file containing custom resources (optional)

  • python_script (str) – The Python script to execute

Output result_dataset

Output Dataset

Type

result_dataset: Output

Output python_device

Output Dataset2

Type

python_device: Output
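
The entry-point contract spelled out in the default script (two arguments, either of which may be None, and a sequence as the return value) is easy to get wrong because of the trailing comma. The sketch below shows the shape of a conforming function. In the designer the arguments are pandas DataFrames; plain lists stand in for them here so the sketch runs without pandas, which is an illustration choice only.

```python
def azureml_main(dataframe1=None, dataframe2=None):
    # A disconnected input port arrives as None.
    if dataframe2 is None:
        # The trailing comma makes this a one-element tuple, not a bare object.
        return dataframe1,
    return dataframe1, dataframe2


single = azureml_main([1, 2, 3])
double = azureml_main([1, 2, 3], [4, 5])
```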

example.assets.example_components.azureml_execute_r_script(dataset1: Optional[pathlib.Path] = None, dataset2: Optional[pathlib.Path] = None, script_bundle: Optional[pathlib.Path] = None, r_script: str = '\n# R version: 3.5.1\n# The script MUST contain a function named azureml_main\n# which is the entry point for this module.\n\n# Please note that functions dependant on X11 library\n# such as "View" are not supported because X11 library\n# is not pre-installed.\n\n# The entry point function MUST have two input arguments.\n# If the input port is not connected, the corresponding\n# dataframe argument will be null.\n#   Param<dataframe1>: a R DataFrame\n#   Param<dataframe2>: a R DataFrame\nazureml_main <- function(dataframe1, dataframe2){\n  print("R script run.")\n\n  # If a zip file is connected to the third input port, it is\n  # unzipped under "./Script Bundle". This directory is added\n  # to sys.path.\n\n  # Return datasets as a Named List\n  return(list(dataset1=dataframe1, dataset2=dataframe2))\n}\n\n', random_seed: Optional[int] = None) example.assets.example_components._assets._AzuremlExecuteRScriptComponent

Executes an R script from an Azure Machine Learning designer pipeline.

Parameters
  • dataset1 (Path) – Input dataset 1 (optional)

  • dataset2 (Path) – Input dataset 2 (optional)

  • script_bundle (Path) – Set of R sources (optional)

  • r_script (str) – Specify a StreamReader pointing to the R script sources

  • random_seed (int) – Define a random seed value for use inside the R environment. Calls “set.seed(value)” (optional)

Output result_dataset

Output Dataset

Type

result_dataset: Output

Output r_device

Output Dataset2

Type

r_device: Output

example.assets.example_components.azureml_export_data(input_path: Optional[pathlib.Path] = None, datastore_type: Optional[str] = None, output_data_store: Optional[str] = None, output_path: Optional[str] = None, output_file_type: Optional[str] = None, datatable_name: Optional[str] = None, column_list_to_be_saved: Optional[str] = None, column_list_datatable_columns: Optional[str] = None, number_rows_per_operation: int = 50) example.assets.example_components._assets._AzuremlExportDataComponent

Writes a dataset to cloud-based storage in Azure, such as Azure blob storage, Azure Data Lake Storage Gen1, Azure Data Lake Storage Gen2.

Parameters
  • input_path (Path) – export data

  • datastore_type (str) – datastore type (optional)

  • output_data_store (str) – the location of output data store

  • output_path (str) – the relative output path in the data store (optional)

  • output_file_type (str) – the file type to be outputted (optional)

  • datatable_name (str) – export data table name (optional)

  • column_list_to_be_saved (str) – selected column(s) to be exported (optional)

  • column_list_datatable_columns (str) – column names in export data table (optional)

  • number_rows_per_operation (int) – number of rows per operation (optional)

example.assets.example_components.azureml_extract_n_gram_features_from_text(dataset: Optional[pathlib.Path] = None, input_vocabulary: Optional[pathlib.Path] = None, text_column: Optional[str] = None, vocabulary_mode: example.assets.example_components._assets._AzuremlExtractNGramFeaturesFromTextVocabularyModeEnum = _AzuremlExtractNGramFeaturesFromTextVocabularyModeEnum.create, n_grams_size: int = 1, weighting_function: example.assets.example_components._assets._AzuremlExtractNGramFeaturesFromTextWeightingFunctionEnum = _AzuremlExtractNGramFeaturesFromTextWeightingFunctionEnum.binary_weight, minimum_word_length: int = 3, maximum_word_length: int = 25, minimum_n_gram_document_absolute_frequency: float = 5, maximum_n_gram_document_ratio: float = 1, normalize_n_gram_feature_vectors: bool = False) example.assets.example_components._assets._AzuremlExtractNGramFeaturesFromTextComponent

Creates N-Gram dictionary features and does feature selection on them.

Parameters
  • dataset (Path) – Input data

  • input_vocabulary (Path) – Input vocabulary (optional)

  • text_column (str) – Name or index (one-based) of text column

  • vocabulary_mode (_AzuremlExtractNGramFeaturesFromTextVocabularyModeEnum) – Specify how the n-gram vocabulary should be created from the corpus (enum: [‘Create’, ‘ReadOnly’])

  • n_grams_size (int) – Indicate the maximum size of n-grams to create (min: 1)

  • weighting_function (_AzuremlExtractNGramFeaturesFromTextWeightingFunctionEnum) – Choose the weighting function to apply to each n-gram value (enum: [‘Binary Weight’, ‘TF Weight’, ‘IDF Weight’, ‘TF-IDF Weight’])

  • minimum_word_length (int) – Specify the minimum length of words to include in n-grams (min: 1)

  • maximum_word_length (int) – Specify the maximum length of words to include in n-grams (min: 2)

  • minimum_n_gram_document_absolute_frequency (float) – Minimum n-gram document absolute frequency (min: 1.0)

  • maximum_n_gram_document_ratio (float) – Maximum n-gram document ratio (min: 0.0001)

  • normalize_n_gram_feature_vectors (bool) – Normalize n-gram feature vectors. If true, then the n-gram feature vector is divided by its L2 norm.

Output results_dataset

Extracted features

Type

results_dataset: Output

Output result_vocabulary

Result vocabulary

Type

result_vocabulary: Output
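
How n_grams_size, the word-length limits, and the 'TF Weight' and 'Binary Weight' options might combine can be sketched as follows. The whitespace tokenization, the underscore-joined n-gram keys, and filtering words by length before forming n-grams are all assumptions for illustration; the component may differ on each point.

```python
from collections import Counter


def ngram_tf(text, n_max=1, min_word_len=3, max_word_len=25):
    """Count n-grams of size 1..n_max over words within the length limits (TF weight)."""
    words = [w for w in text.lower().split() if min_word_len <= len(w) <= max_word_len]
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(words) - n + 1):
            counts["_".join(words[i:i + n])] += 1
    return counts


def binary_weight(counts):
    """Binary weight: 1 for every n-gram that occurs at least once."""
    return {gram: 1 for gram in counts}


tf = ngram_tf("the cat sat on the mat", n_max=2)
binary = binary_weight(tf)
```

With the default minimum word length of 3, "on" is dropped before n-grams are formed, so "the" is counted twice and "sat_the" appears as a bigram.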

example.assets.example_components.azureml_fast_forest_quantile_regression(create_trainer_mode: example.assets.example_components._assets._AzuremlFastForestQuantileRegressionCreateTrainerModeEnum = _AzuremlFastForestQuantileRegressionCreateTrainerModeEnum.singleparameter, number_of_trees: int = 100, number_of_leaves: int = 20, minimum_number_of_training_instances_required_to_form_a_leaf: int = 10, bagging_fraction: float = 0.7, split_fraction: float = 0.7, quantiles_to_be_estimated: str = '0.25; 0.5; 0.75', range_for_total_number_of_trees_constructed: str = '16; 32; 64', range_for_maximum_number_of_leaves_per_tree: str = '16; 32; 64', range_for_minimum_number_of_training_instances_required_to_form_a_leaf: str = '1; 5; 10', range_for_bagging_fraction: str = '0.25; 0.5; 0.75', range_for_split_fraction: str = '0.25; 0.5; 0.75', required_quantile_values: str = '0.25; 0.5; 0.75', random_number_seed: Optional[int] = None) example.assets.example_components._assets._AzuremlFastForestQuantileRegressionComponent

Creates a quantile regression model.

Parameters
  • create_trainer_mode (_AzuremlFastForestQuantileRegressionCreateTrainerModeEnum) – Create advanced learner options (enum: [‘SingleParameter’, ‘ParameterRange’])

  • number_of_trees (int) – Specifies the number of trees to be constructed (optional)

  • number_of_leaves (int) – Specifies the maximum number of leaves per tree. The default number is 20 (optional, min: 2)

  • minimum_number_of_training_instances_required_to_form_a_leaf (int) – Indicates the minimum number of training instances required to form a leaf (optional)

  • bagging_fraction (float) – Specifies the fraction of training data to use for each tree (optional)

  • split_fraction (float) – Specifies the fraction of features (chosen randomly) to use for each split (optional)

  • quantiles_to_be_estimated (str) – Specifies the quantile to be estimated (optional)

  • range_for_total_number_of_trees_constructed (str) – Specify the range for the maximum number of trees that can be created during training (optional)

  • range_for_maximum_number_of_leaves_per_tree (str) – Specify range for the maximum number of leaves allowed per tree (optional)

  • range_for_minimum_number_of_training_instances_required_to_form_a_leaf (str) – Specify the range for the minimum number of cases required to form a leaf (optional)

  • range_for_bagging_fraction (str) – Specifies the range for fraction of training data to use for each tree (optional)

  • range_for_split_fraction (str) – Specifies the range for fraction of features (chosen randomly) to use for each split (optional)

  • required_quantile_values (str) – Required quantile value used during parameter sweep (optional)

  • random_number_seed (int) – Provide a seed for the random number generator used by the model. Leave blank for default. (optional, max: 4294967295)

Output untrained_model

An untrained quantile regression model that can be connected to the Train Generic Model or Cross Validate Model modules.

Type

untrained_model: Output

example.assets.example_components.azureml_feature_hashing(dataset: Optional[pathlib.Path] = None, target_column: Optional[str] = None, hashing_bitsize: int = 10, n_grams: int = 2) example.assets.example_components._assets._AzuremlFeatureHashingComponent

Converts text data to numeric features using nimbusml.

Parameters
  • dataset (Path) – Input dataset

  • target_column (str) – Choose the columns to which hashing will be applied

  • hashing_bitsize (int) – Type the number of bits used to hash the selected columns (min: 1, max: 31)

  • n_grams (int) – Specify the number of N-grams generated during hashing (max: 10)

Output transformed_dataset

Output dataset with hashed columns. The number of feature columns generated depends on the Hashing bitsize parameter.

Type

transformed_dataset: Output
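
The general idea behind feature hashing can be sketched with the standard library. This is a generic hashing-trick illustration, not the nimbusml implementation: the MD5 hash is a stand-in, the function name is hypothetical, and the assumption that hashing_bitsize bits yield 2**hashing_bitsize feature columns should be checked against the component's own documentation.

```python
import hashlib


def hash_features(text, bitsize=10, n_grams=2):
    """Map the 1..n_grams word n-grams of `text` into a vector of 2**bitsize counts."""
    size = 2 ** bitsize
    vector = [0] * size
    words = text.lower().split()
    for n in range(1, n_grams + 1):
        for i in range(len(words) - n + 1):
            gram = " ".join(words[i:i + n])
            # MD5 is used only as a deterministic stand-in hash function.
            digest = hashlib.md5(gram.encode("utf-8")).digest()
            index = int.from_bytes(digest[:4], "big") % size
            vector[index] += 1
    return vector


features = hash_features("good movie good plot", bitsize=4, n_grams=2)
```

The vector length depends only on bitsize, while the sum of its entries equals the number of n-grams hashed (four unigrams plus three bigrams in this example).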

example.assets.example_components.azureml_filter_based_feature_selection(input_dataset: Optional[pathlib.Path] = None, operate_on_feature_columns_only: bool = True, target_column: Optional[str] = None, number_of_desired_features: int = 1, feature_scoring_method: example.assets.example_components._assets._AzuremlFilterBasedFeatureSelectionFeatureScoringMethodEnum = _AzuremlFilterBasedFeatureSelectionFeatureScoringMethodEnum.pearsoncorrelation) example.assets.example_components._assets._AzuremlFilterBasedFeatureSelectionComponent

Identifies the features in a dataset with the greatest predictive power.

Parameters
  • input_dataset (Path) – Input dataset

  • operate_on_feature_columns_only (bool) – Indicate whether to use only feature columns in the scoring process (optional)

  • target_column (str) – Specify the target column

  • number_of_desired_features (int) – Specify the number of features to output in results

  • feature_scoring_method (_AzuremlFilterBasedFeatureSelectionFeatureScoringMethodEnum) – Choose the method to use for scoring (enum: [‘PearsonCorrelation’, ‘ChiSquared’])

Output filtered_dataset

Filtered dataset

Type

filtered_dataset: Output

Output features

Names of output columns and feature selection scores

Type

features: Output
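
For the 'PearsonCorrelation' scoring method, a minimal sketch of ranking features against a target column is shown below. The helper names are hypothetical, and ranking by the absolute value of the coefficient is an assumption about how the scores are ordered.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    # A constant column has no defined correlation; score it as 0.
    return cov / (sd_x * sd_y) if sd_x and sd_y else 0.0


def top_features(features, target, k=1):
    """Return the k feature names with the highest |r|, plus all scores."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


columns = {"x1": [1, 2, 3, 4], "x2": [1, -1, 1, -1]}
selected, scores = top_features(columns, target=[2, 4, 6, 8], k=1)
```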

example.assets.example_components.azureml_group_data_into_bins(dataset: Optional[pathlib.Path] = None, binning_mode: example.assets.example_components._assets._AzuremlGroupDataIntoBinsBinningModeEnum = _AzuremlGroupDataIntoBinsBinningModeEnum.quantiles, number_of_bins: int = 10, quantile_normalization: example.assets.example_components._assets._AzuremlGroupDataIntoBinsQuantileNormalizationEnum = _AzuremlGroupDataIntoBinsQuantileNormalizationEnum.percent, comma_separated_list_of_bin_edges: Optional[str] = None, columns_to_bin: Optional[str] = None, output_mode: example.assets.example_components._assets._AzuremlGroupDataIntoBinsOutputModeEnum = _AzuremlGroupDataIntoBinsOutputModeEnum.append, tag_columns_as_categorical: bool = True) example.assets.example_components._assets._AzuremlGroupDataIntoBinsComponent

Map input values to a smaller number of bins using a quantization function.

Parameters
  • dataset (Path) – Dataset to be analyzed

  • binning_mode (_AzuremlGroupDataIntoBinsBinningModeEnum) – Choose a binning method (enum: [‘Quantiles’, ‘Equal Width’, ‘Custom Edges’])

  • number_of_bins (int) – Specify the desired number of bins (optional, min: 1)

  • quantile_normalization (_AzuremlGroupDataIntoBinsQuantileNormalizationEnum) – Choose the method for normalizing quantiles (optional, enum: [‘Percent’, ‘PQuantile’, ‘Quantile Index’])

  • comma_separated_list_of_bin_edges (str) – Type a comma-separated list of numbers to use as bin edges (optional)

  • columns_to_bin (str) – Choose columns for quantization

  • output_mode (_AzuremlGroupDataIntoBinsOutputModeEnum) – Indicate how quantized columns should be output (enum: [‘Append’, ‘Inplace’, ‘Result Only’])

  • tag_columns_as_categorical (bool) – Indicate whether output columns should be tagged as categorical

Output quantized_dataset

Dataset with quantized columns

Type

quantized_dataset: Output

Output binning_transformation

Transformation that applies quantization to the dataset

Type

binning_transformation: Output
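
As a rough illustration of the 'Equal Width' and 'Custom Edges' binning modes, the sketch below assigns values to bins in plain Python. The helper names are hypothetical, and two conventions are assumptions rather than documented behavior: bin indices start at 1, and a value equal to an edge falls into the upper bin.

```python
from bisect import bisect_right

def equal_width_edges(values, number_of_bins):
    """Return the interior edges that split [min, max] into equal-width bins."""
    low, high = min(values), max(values)
    width = (high - low) / number_of_bins
    return [low + width * i for i in range(1, number_of_bins)]

def parse_edges(text):
    """Parse a comma-separated list of bin edges such as '18, 65'."""
    return sorted(float(part) for part in text.split(",") if part.strip())

def assign_bins(values, edges):
    """Map each value to a 1-based bin index using sorted interior edges.

    Assumption: a value equal to an edge is placed in the upper bin.
    """
    return [bisect_right(edges, v) + 1 for v in values]

ages = [3, 18, 25, 40, 67, 90]
width_bins = assign_bins(ages, equal_width_edges(ages, 3))
custom_bins = assign_bins(ages, parse_edges("18, 65"))
```

With number_of_bins set to 3, the interior edges fall at 32 and 61; with custom edges "18, 65", the same values are grouped into three bins around those two edges.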

example.assets.example_components.azureml_import_data(input_dataset_request_dto: Optional[str] = None, data_store_type: Optional[str] = None, override_data_store_name: Optional[str] = None, override_data_path: Optional[str] = None) example.assets.example_components._assets._AzuremlImportDataComponent

Load data from web URLs or from cloud-based storage in Azure, such as Azure SQL Database, Azure Blob Storage, Azure Data Lake Storage Gen1, and Azure Data Lake Storage Gen2.

Parameters
  • input_dataset_request_dto (str) – input dataset Id/Object

  • data_store_type (str) – data store type (optional)

  • override_data_store_name (str) – string (optional)

  • override_data_path (str) – string (optional)

Output output_data

DataFrameDirectory

Type

output_data: Output

example.assets.example_components.azureml_init_image_transformation(resize: example.assets.example_components._assets._AzuremlInitImageTransformationResizeEnum = _AzuremlInitImageTransformationResizeEnum.true, size: int = 256, center_crop: example.assets.example_components._assets._AzuremlInitImageTransformationCenterCropEnum = _AzuremlInitImageTransformationCenterCropEnum.true, crop_size: int = 224, pad: example.assets.example_components._assets._AzuremlInitImageTransformationPadEnum = _AzuremlInitImageTransformationPadEnum.false, padding: int = 0, color_jitter: bool = False, grayscale: bool = False, random_resized_crop: example.assets.example_components._assets._AzuremlInitImageTransformationRandomResizedCropEnum = _AzuremlInitImageTransformationRandomResizedCropEnum.false, random_resized_crop_size: int = 256, random_crop: example.assets.example_components._assets._AzuremlInitImageTransformationRandomCropEnum = _AzuremlInitImageTransformationRandomCropEnum.false, random_crop_size: int = 224, random_horizontal_flip: bool = True, random_vertical_flip: bool = False, random_rotation: example.assets.example_components._assets._AzuremlInitImageTransformationRandomRotationEnum = _AzuremlInitImageTransformationRandomRotationEnum.false, random_rotation_degrees: int = 0, random_affine: example.assets.example_components._assets._AzuremlInitImageTransformationRandomAffineEnum = _AzuremlInitImageTransformationRandomAffineEnum.false, random_affine_degrees: int = 0, random_grayscale: bool = False, random_perspective: bool = False) example.assets.example_components._assets._AzuremlInitImageTransformationComponent

Initialize image transformation.

Parameters
  • resize (_AzuremlInitImageTransformationResizeEnum) – Resize the input PIL Image to the given size (enum: [‘False’, ‘True’])

  • size (int) – Desired output size (optional, min: 1)

  • center_crop (_AzuremlInitImageTransformationCenterCropEnum) – Crops the given PIL Image at the center (enum: [‘False’, ‘True’])

  • crop_size (int) – Desired output size of the crop (optional, min: 1)

  • pad (_AzuremlInitImageTransformationPadEnum) – Pad the given PIL Image on all sides with the given “pad” value (enum: [‘False’, ‘True’])

  • padding (int) – Padding on each border (optional)

  • color_jitter (bool) – Randomly change the brightness, contrast and saturation of an image

  • grayscale (bool) – Convert image to grayscale

  • random_resized_crop (_AzuremlInitImageTransformationRandomResizedCropEnum) – Crop the given PIL Image to random size and aspect ratio (enum: [‘False’, ‘True’])

  • random_resized_crop_size (int) – Expected output size of each edge (optional, min: 1)

  • random_crop (_AzuremlInitImageTransformationRandomCropEnum) – Crop the given PIL Image at a random location (enum: [‘False’, ‘True’])

  • random_crop_size (int) – Desired output size of the crop (optional, min: 1)

  • random_horizontal_flip (bool) – Horizontally flip the given PIL Image randomly with a given probability

  • random_vertical_flip (bool) – Vertically flip the given PIL Image randomly with a given probability

  • random_rotation (_AzuremlInitImageTransformationRandomRotationEnum) – Rotate the image by angle (enum: [‘False’, ‘True’])

  • random_rotation_degrees (int) – Range of degrees to select from (optional, max: 180)

  • random_affine (_AzuremlInitImageTransformationRandomAffineEnum) – Random affine transformation of the image keeping center invariant (enum: [‘False’, ‘True’])

  • random_affine_degrees (int) – Range of degrees to select from (optional, max: 180)

  • random_grayscale (bool) – Randomly convert image to grayscale with a probability of p (default 0.1)

  • random_perspective (bool) – Performs Perspective transformation of the given PIL Image randomly with a given probability

Output output_image_transformation

Output image transformation

Type

output_image_transformation: Output
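
The default values size=256 and crop_size=224 describe a resize followed by a center crop. The sketch below works out the resulting geometry in plain Python. It assumes that an integer size scales the shorter image side while keeping the aspect ratio (as torchvision's Resize does for an int argument) and that the crop is square; both are assumptions, since the source does not state them, and the helper names are hypothetical.

```python
def resize_shorter_side(width, height, size):
    """Scale (width, height) so the shorter side equals `size`, keeping aspect ratio.

    Assumption: an integer `size` applies to the shorter side.
    """
    if width <= height:
        return size, round(height * size / width)
    return round(width * size / height), size

def center_crop_box(width, height, crop_size):
    """Return the (left, top, right, bottom) box of a centered square crop."""
    left = (width - crop_size) // 2
    top = (height - crop_size) // 2
    return left, top, left + crop_size, top + crop_size

resized = resize_shorter_side(640, 480, 256)  # a 640x480 input image
box = center_crop_box(*resized, 224)
```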

example.assets.example_components.azureml_join_data(left_dataset: Optional[pathlib.Path] = None, right_dataset: Optional[pathlib.Path] = None, comma_separated_case_sensitive_names_of_join_key_columns_for_l: Optional[str] = None, comma_separated_case_sensitive_names_of_join_key_columns_for_r: Optional[str] = None, match_case: bool = True, join_type: example.assets.example_components._assets._AzuremlJoinDataJoinTypeEnum = _AzuremlJoinDataJoinTypeEnum.inner_join, keep_right_key_columns_in_joined_table: bool = True) example.assets.example_components._assets._AzuremlJoinDataComponent

Joins two datasets on selected key columns.

Parameters
  • left_dataset (Path) – First dataset to join

  • right_dataset (Path) – Second dataset to join

  • comma_separated_case_sensitive_names_of_join_key_columns_for_l (str) – Select the join key columns for the first dataset

  • comma_separated_case_sensitive_names_of_join_key_columns_for_r (str) – Select the join key columns for the second dataset

  • match_case (bool) – Indicate whether a case-sensitive comparison is allowed on key columns

  • join_type (_AzuremlJoinDataJoinTypeEnum) – Choose a join type (enum: [‘Inner Join’, ‘Left Outer Join’, ‘Full Outer Join’, ‘Left Semi-Join’])

  • keep_right_key_columns_in_joined_table (bool) – Indicate whether to keep key columns from the second dataset in the joined dataset (optional)

Output results_dataset

Result of join operation

Type

results_dataset: Output
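
The join types can be compared with a small sketch over lists of dictionaries. It is an illustration only, not the component's code: the function join_rows is hypothetical, it handles a single key column per side, and 'Full Outer Join' is deliberately left out to keep the example short.

```python
SUPPORTED_JOIN_TYPES = ("Inner Join", "Left Outer Join", "Left Semi-Join")

def join_rows(left, right, left_key, right_key, join_type="Inner Join"):
    """Join two lists of dict rows on one key column each."""
    if join_type not in SUPPORTED_JOIN_TYPES:
        raise ValueError(f"unsupported join type in this sketch: {join_type}")

    # Index the right rows by key so each left row is matched in O(1).
    index = {}
    for row in right:
        index.setdefault(row[right_key], []).append(row)

    result = []
    for row in left:
        matches = index.get(row[left_key], [])
        if join_type == "Left Semi-Join":
            if matches:
                result.append(dict(row))  # left columns only, once per left row
        elif matches:
            # On a column-name clash the left value wins (an assumption).
            result.extend({**match, **row} for match in matches)
        elif join_type == "Left Outer Join":
            result.append(dict(row))  # unmatched left row is kept as-is
    return result

left = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}]
right = [{"id": 1, "score": 10}, {"id": 1, "score": 20}, {"id": 3, "score": 30}]

inner = join_rows(left, right, "id", "id", "Inner Join")
outer = join_rows(left, right, "id", "id", "Left Outer Join")
semi = join_rows(left, right, "id", "id", "Left Semi-Join")
```

The inner join repeats a left row once per matching right row, the left outer join additionally keeps the unmatched left row, and the semi-join returns each matching left row once without any right-hand columns.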

example.assets.example_components.azureml_k_means_clustering(create_trainer_mode: example.assets.example_components._assets._AzuremlKMeansClusteringCreateTrainerModeEnum = _AzuremlKMeansClusteringCreateTrainerModeEnum.singleparameter, number_of_centroids: int = 2, initialization: example.assets.example_components._assets._AzuremlKMeansClusteringInitializationEnum = _AzuremlKMeansClusteringInitializationEnum.k_means, random_number_seed: Optional[int] = None, metric: example.assets.example_components._assets._AzuremlKMeansClusteringMetricEnum = _AzuremlKMeansClusteringMetricEnum.euclidean, should_input_instances_be_normalized: bool = True, iterations: int = 100, assign_label_mode: example.assets.example_components._assets._AzuremlKMeansClusteringAssignLabelModeEnum = _AzuremlKMeansClusteringAssignLabelModeEnum.ignore_label_column) example.assets.example_components._assets._AzuremlKMeansClusteringComponent

Initialize K-Means clustering model.

Parameters
  • create_trainer_mode (_AzuremlKMeansClusteringCreateTrainerModeEnum) – Create advanced learner options (enum: [‘SingleParameter’])

  • number_of_centroids (int) – Number of Centroids (optional, min: 2)

  • initialization (_AzuremlKMeansClusteringInitializationEnum) – Initialization algorithm (optional, enum: [‘Random’, ‘K-Means++’, ‘Default’])

  • random_number_seed (int) – Type a value to seed the random number generator used for centroid generation by the training model. Leave blank to have the value randomly chosen at first train. (optional, max: 4294967295)

  • metric (_AzuremlKMeansClusteringMetricEnum) – Selected metric (enum: [‘Euclidean’])

  • should_input_instances_be_normalized (bool) – Indicate whether instances should be normalized

  • iterations (int) – Number of iterations (min: 1)

  • assign_label_mode (_AzuremlKMeansClusteringAssignLabelModeEnum) – Mode of value assignment to the labeled column (enum: [‘Ignore label column’, ‘Fill missing values’, ‘Overwrite from closest to center’])

Output untrained_model

Untrained K-Means clustering model

Type

untrained_model: Output
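
For readers unfamiliar with the algorithm behind number_of_centroids, metric and iterations, the following is a minimal sketch of Lloyd's K-Means iteration with the Euclidean metric. It is not the component's implementation: the function names are hypothetical, the starting centroids are passed in explicitly instead of using 'Random' or 'K-Means++' initialization, and no normalization is applied.

```python
import math

def nearest(point, centroids):
    """Index of the centroid closest to `point` by Euclidean distance."""
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))

def k_means(points, centroids, iterations=100):
    """Run Lloyd's algorithm from the given starting centroids."""
    for _ in range(iterations):
        # Assignment step: group points by their nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its cluster.
        updated = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
        if updated == centroids:
            break  # converged before reaching the iteration cap
        centroids = updated
    return centroids

points = [(0.0, 0.0), (0.0, 2.0), (10.0, 10.0), (10.0, 12.0)]
final = k_means(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
```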

example.assets.example_components.azureml_latent_dirichlet_allocation(dataset: Optional[pathlib.Path] = None, target_columns: Optional[str] = None, number_of_topics_to_model: int = 5, n_grams: int = 2, normalize: bool = True, show_all_options: example.assets.example_components._assets._AzuremlLatentDirichletAllocationShowAllOptionsEnum = _AzuremlLatentDirichletAllocationShowAllOptionsEnum.false, rho_parameter: float = 0.01, alpha_parameter: float = 0.01, estimated_number_of_documents: int = 1000, size_of_the_batch: int = 32, initial_value_of_iteration_count: int = 10, power_applied_to_the_iteration_during_updates: float = 0.5, passes: int = 25, build_dictionary_of_ngrams_prior_to_lda: example.assets.example_components._assets._AzuremlLatentDirichletAllocationBuildDictionaryOfNgramsPriorToLdaEnum = _AzuremlLatentDirichletAllocationBuildDictionaryOfNgramsPriorToLdaEnum.true, maximum_number_of_ngrams_in_dictionary: int = 20000, hash_bits: int = 12, build_dictionary_of_ngrams: example.assets.example_components._assets._AzuremlLatentDirichletAllocationBuildDictionaryOfNgramsEnum = _AzuremlLatentDirichletAllocationBuildDictionaryOfNgramsEnum.true, maximum_size_of_ngram_dictionary: int = 20000, number_of_hash_bits: int = 12) example.assets.example_components._assets._AzuremlLatentDirichletAllocationComponent

Topic Modeling: Latent Dirichlet Allocation.

Parameters
  • dataset (Path) – Input dataset

  • target_columns (str) – Target column name or index

  • number_of_topics_to_model (int) – Model the document distribution against N topics (min: 1, max: 1000)

  • n_grams (int) – Order of N-grams generated during hashing (min: 1, max: 10)

  • normalize (bool) – Normalize output to probabilities. The feature topic matrix will be P(word|topic).

  • show_all_options (_AzuremlLatentDirichletAllocationShowAllOptionsEnum) – Presents additional parameters specific to scikit-learn online LDA (enum: [‘True’, ‘False’])

  • rho_parameter (float) – Rho parameter (optional, min: 2.220446049250313e-16, max: 1.0)

  • alpha_parameter (float) – Alpha parameter (optional, min: 2.220446049250313e-16, max: 1.0)

  • estimated_number_of_documents (int) – Estimated number of documents (optional, min: 1, max: 2147483647)

  • size_of_the_batch (int) – Size of the batch (optional, min: 1, max: 1024)

  • initial_value_of_iteration_count (int) – Initial value of iteration count used in learning rate update schedule (optional, min: 1, max: 2147483647)

  • power_applied_to_the_iteration_during_updates (float) – Power applied to the iteration count during online updates (optional, min: 0.5, max: 1.0)

  • passes (int) – Number of training iterations (optional, min: 1, max: 1024)

  • build_dictionary_of_ngrams_prior_to_lda (_AzuremlLatentDirichletAllocationBuildDictionaryOfNgramsPriorToLdaEnum) – Builds a dictionary of ngrams prior to LDA. Useful for model inspection and interpretation (optional, enum: [‘True’, ‘False’])

  • maximum_number_of_ngrams_in_dictionary (int) – Maximum size of the dictionary. If the number of tokens in the input exceeds this size, collisions may occur (optional, min: 1, max: 2147483647)

  • hash_bits (int) – Number of bits to use for feature hashing (optional, min: 1, max: 31)

  • build_dictionary_of_ngrams (_AzuremlLatentDirichletAllocationBuildDictionaryOfNgramsEnum) – Builds a dictionary of ngrams prior to computing LDA. Useful for model inspection and interpretation (optional, enum: [‘True’, ‘False’])

  • maximum_size_of_ngram_dictionary (int) – Maximum size of the ngrams dictionary. If the number of tokens in the input exceeds this size, collisions may occur (optional, min: 1, max: 2147483647)

  • number_of_hash_bits (int) – Number of bits to use during feature hashing (optional, min: 1, max: 31)

Output transformed_dataset

Output dataset

Type

transformed_dataset: Output

Output feature_topic_matrix

Feature topic matrix produced by LDA

Type

feature_topic_matrix: Output

Output lda_transformation

Transformation that applies LDA to the dataset

Type

lda_transformation: Output
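
The n_grams and hash_bits parameters relate to n-gram generation and feature hashing. The sketch below shows the general idea: hash_bits bits give 2**hash_bits buckets, so the default of 12 yields 4096. Two points are assumptions for illustration: that n_grams=2 means all orders from 1 up to 2, and the use of CRC32 as the hash function, which is unlikely to match what the component uses internally. The helper names are hypothetical.

```python
import zlib

def ngrams(tokens, max_order):
    """All contiguous n-grams of order 1..max_order, joined by spaces.

    Assumption: n_grams is an upper bound on the order, not a single order.
    """
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, max_order + 1)
        for i in range(len(tokens) - n + 1)
    ]

def hash_bucket(term, hash_bits):
    """Map a term to one of 2**hash_bits buckets using CRC32 (illustrative hash)."""
    return zlib.crc32(term.encode("utf-8")) % (1 << hash_bits)

terms = ngrams("the quick brown fox".split(), 2)
buckets = [hash_bucket(t, 12) for t in terms]
```

Because distinct terms can land in the same bucket, a small hash_bits value increases the chance of collisions between n-grams.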

example.assets.example_components.azureml_linear_regression(solution_method: example.assets.example_components._assets._AzuremlLinearRegressionSolutionMethodEnum = _AzuremlLinearRegressionSolutionMethodEnum.ordinary_least_squares, create_trainer_mode: example.assets.example_components._assets._AzuremlLinearRegressionCreateTrainerModeEnum = _AzuremlLinearRegressionCreateTrainerModeEnum.singleparameter, learning_rate: float = 0.1, number_of_epochs_over_which_algorithm_iterates_through_examples: int = 10, l2_regularization_term_weight: float = 0.001, range_for_learning_rate: str = '0.025; 0.05; 0.1; 0.2', range_for_number_of_epochs_over_which_algorithm_iterates_through_examples: str = '1; 10; 100', range_for_l2_regularization_term_weight: str = '0.001; 0.01; 0.1', should_input_instances_be_normalized: bool = True, decrease_learning_rate_as_iterations_progress: bool = True, l2_regularization_weight: float = 0.001, include_intercept_term: bool = True, random_number_seed: Optional[int] = None) example.assets.example_components._assets._AzuremlLinearRegressionComponent

Creates a linear regression model.

Parameters
  • solution_method (_AzuremlLinearRegressionSolutionMethodEnum) – Choose an optimization method (enum: [‘Online Gradient Descent’, ‘Ordinary Least Squares’])

  • create_trainer_mode (_AzuremlLinearRegressionCreateTrainerModeEnum) – Create advanced learner options (optional, enum: [‘SingleParameter’, ‘ParameterRange’])

  • learning_rate (float) – Specify the initial learning rate for the stochastic gradient descent optimizer (optional, min: 2.220446049250313e-16)

  • number_of_epochs_over_which_algorithm_iterates_through_examples (int) – Specify how many times the algorithm should iterate through examples. For datasets with a small number of examples, this number should be large to reach convergence. (optional)

  • l2_regularization_term_weight (float) – Specify the weight for L2 regularization. Use a non-zero value to avoid overfitting. (optional)

  • range_for_learning_rate (str) – Specify the range for the initial learning rate for the stochastic gradient descent optimizer (optional)

  • range_for_number_of_epochs_over_which_algorithm_iterates_through_examples (str) – Specify range for how many times the algorithm should iterate through examples. For datasets with a small number of examples, this number should be large to reach convergence. (optional)

  • range_for_l2_regularization_term_weight (str) – Specify the range for the weight for L2 regularization. Use a non-zero value to avoid overfitting. (optional)

  • should_input_instances_be_normalized (bool) – Indicate whether instances should be normalized (optional)

  • decrease_learning_rate_as_iterations_progress (bool) – Indicate whether the learning rate should decrease as iterations progress (optional)

  • l2_regularization_weight (float) – Specify the weight for the L2 regularization. Use a non-zero value to avoid overfitting. (optional)

  • include_intercept_term (bool) – Indicate whether an additional term should be added for the intercept (optional)

  • random_number_seed (int) – Specify a value to seed the random number generator used by the model. Leave blank for default. (optional, max: 4294967295)

Output untrained_model

An untrained regression model

Type

untrained_model: Output
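
The range_for_* parameters take semicolon-separated strings such as '0.025; 0.05; 0.1; 0.2'. The sketch below parses the default strings and enumerates every combination. It assumes that the 'ParameterRange' mode hands these candidate values to a downstream tuning step; how that step actually explores them (full grid, random sweep, or otherwise) is not stated in the source, and parse_range is a hypothetical helper.

```python
from itertools import product

def parse_range(text, cast=float):
    """Parse a semicolon-separated range string such as '0.025; 0.05; 0.1'."""
    return [cast(part.strip()) for part in text.split(";") if part.strip()]

# Default range strings copied from the signature above.
learning_rates = parse_range("0.025; 0.05; 0.1; 0.2")
epochs = parse_range("1; 10; 100", cast=int)
l2_weights = parse_range("0.001; 0.01; 0.1")

# Assumption: an exhaustive grid over all candidate values.
grid = [
    {"learning_rate": lr, "epochs": n, "l2_weight": w}
    for lr, n, w in product(learning_rates, epochs, l2_weights)
]
```

With the defaults this yields 4 x 3 x 3 = 36 candidate settings, which gives a sense of how quickly the search space grows as values are added to each range.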

example.assets.example_components.azureml_multiclass_boosted_decision_tree(create_trainer_mode: example.assets.example_components._assets._AzuremlMulticlassBoostedDecisionTreeCreateTrainerModeEnum = _AzuremlMulticlassBoostedDecisionTreeCreateTrainerModeEnum.singleparameter, maximum_number_of_leaves_per_tree: int = 20, minimum_number_of_training_instances_required_to_form_a_leaf: int = 10, the_learning_rate: float = 0.2, total_number_of_trees_constructed: int = 100, range_for_maximum_number_of_leaves_per_tree: str = '2; 8; 32; 128', range_for_minimum_number_of_training_instances_required_to_form_a_leaf: str = '1; 10; 50', range_for_learning_rate: str = '0.025; 0.05; 0.1; 0.2; 0.4', range_for_total_number_of_trees_constructed: str = '20; 100; 500', random_number_seed: Optional[int] = None) example.assets.example_components._assets._AzuremlMulticlassBoostedDecisionTreeComponent

Creates a multiclass classifier using a boosted decision tree algorithm.

Parameters
  • create_trainer_mode (_AzuremlMulticlassBoostedDecisionTreeCreateTrainerModeEnum) – Create advanced learner options (enum: [‘SingleParameter’, ‘ParameterRange’])

  • maximum_number_of_leaves_per_tree (int) – Specify the maximum number of leaves allowed per tree (optional, min: 2, max: 131072)

  • minimum_number_of_training_instances_required_to_form_a_leaf (int) – Specify the minimum number of cases required to form a leaf (optional, min: 1)

  • the_learning_rate (float) – Specify the initial learning rate (optional, min: 2.220446049250313e-16, max: 1.0)

  • total_number_of_trees_constructed (int) – Specify the maximum number of trees that can be created during training (optional, min: 1)

  • range_for_maximum_number_of_leaves_per_tree (str) – Specify range for the maximum number of leaves allowed per tree (optional)

  • range_for_minimum_number_of_training_instances_required_to_form_a_leaf (str) – Specify the range for the minimum number of cases required to form a leaf (optional)

  • range_for_learning_rate (str) – Specify the range for the initial learning rate (optional)

  • range_for_total_number_of_trees_constructed (str) – Specify the range for the maximum number of trees that can be created during training (optional)

  • random_number_seed (int) – Type a value to seed the random number generator used by the model. Leave blank for default. (optional, max: 4294967295)

Output untrained_model

An untrained multiclass classification model

Type

untrained_model: Output

example.assets.example_components.azureml_multiclass_decision_forest(create_trainer_mode: example.assets.example_components._assets._AzuremlMulticlassDecisionForestCreateTrainerModeEnum = _AzuremlMulticlassDecisionForestCreateTrainerModeEnum.singleparameter, number_of_decision_trees: int = 8, maximum_depth_of_the_decision_trees: int = 32, minimum_number_of_samples_per_leaf_node: int = 1, range_for_number_of_decision_trees: str = '1; 8; 32', range_for_the_maximum_depth_of_the_decision_trees: str = '1; 16; 64', range_for_the_minimum_number_of_samples_per_leaf_node: str = '1; 4; 16', resampling_method: example.assets.example_components._assets._AzuremlMulticlassDecisionForestResamplingMethodEnum = _AzuremlMulticlassDecisionForestResamplingMethodEnum.bagging_resampling) example.assets.example_components._assets._AzuremlMulticlassDecisionForestComponent

Creates a multiclass classification model using the decision forest algorithm.

Parameters
  • create_trainer_mode (_AzuremlMulticlassDecisionForestCreateTrainerModeEnum) – Create advanced learner options (enum: [‘SingleParameter’, ‘ParameterRange’])

  • number_of_decision_trees (int) – Specify the number of decision trees to create in the ensemble (optional, min: 1)

  • maximum_depth_of_the_decision_trees (int) – Specify the maximum depth of any decision tree that can be created in the ensemble (optional, min: 1)

  • minimum_number_of_samples_per_leaf_node (int) – Specify the minimum number of training samples required to generate a leaf node (optional, min: 1)

  • range_for_number_of_decision_trees (str) – Specify range for the number of decision trees to create in the ensemble (optional)

  • range_for_the_maximum_depth_of_the_decision_trees (str) – Specify range for the maximum depth of the decision trees (optional)

  • range_for_the_minimum_number_of_samples_per_leaf_node (str) – Specify range for the minimum number of samples per leaf node (optional)

  • resampling_method (_AzuremlMulticlassDecisionForestResamplingMethodEnum) – Choose a resampling method (enum: [‘Bagging Resampling’, ‘Replicate Resampling’])

Output untrained_model

An untrained multiclass classification model

Type

untrained_model: Output
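
The two resampling_method options can be contrasted with a short sketch. It assumes the usual meaning of the terms: with 'Bagging Resampling' each tree is trained on a bootstrap sample drawn with replacement, and with 'Replicate Resampling' each tree is trained on the same full dataset. The function training_sets and its seed argument are hypothetical and not part of the component.

```python
import random

def training_sets(rows, number_of_decision_trees, resampling_method, seed=0):
    """Build one training set per tree according to the resampling method."""
    rng = random.Random(seed)
    if resampling_method == "Replicate Resampling":
        # Every tree sees an identical copy of the data.
        return [list(rows) for _ in range(number_of_decision_trees)]
    if resampling_method == "Bagging Resampling":
        # Every tree sees a bootstrap sample of the same size as the data.
        return [rng.choices(rows, k=len(rows)) for _ in range(number_of_decision_trees)]
    raise ValueError(f"unknown resampling method: {resampling_method}")

rows = list(range(10))
replicated = training_sets(rows, 8, "Replicate Resampling")
bagged = training_sets(rows, 8, "Bagging Resampling")
```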

example.assets.example_components.azureml_multiclass_logistic_regression(create_trainer_mode: example.assets.example_components._assets._AzuremlMulticlassLogisticRegressionCreateTrainerModeEnum = _AzuremlMulticlassLogisticRegressionCreateTrainerModeEnum.singleparameter, optimization_tolerance: float = 1e-07, l2_regularizaton_weight: float = 1.0, range_for_optimization_tolerance: str = '0.00001; 0.00000001', range_for_l2_regularization_weight: str = '0.01; 0.1; 1.0', random_number_seed: Optional[int] = None) example.assets.example_components._assets._AzuremlMulticlassLogisticRegressionComponent

Creates a multiclass logistic regression classification model.

Parameters
  • create_trainer_mode (_AzuremlMulticlassLogisticRegressionCreateTrainerModeEnum) – Create advanced learner options (enum: [‘SingleParameter’, ‘ParameterRange’])

  • optimization_tolerance (float) – Specify a tolerance value for the L-BFGS optimizer (optional, min: 2.220446049250313e-16)

  • l2_regularizaton_weight (float) – Specify the L2 regularization weight. Use a non-zero value to avoid overfitting. (optional)

  • range_for_optimization_tolerance (str) – Specify a range for the tolerance value for the L-BFGS optimizer (optional)

  • range_for_l2_regularization_weight (str) – Specify the range for the L2 regularization weight. Use a non-zero value to avoid overfitting. (optional)

  • random_number_seed (int) – Type a value to seed the random number generator used by the model. Leave blank for default. (optional, max: 4294967295)

Output untrained_model

An untrained classification model

Type

untrained_model: Output

example.assets.example_components.azureml_multiclass_neural_network(create_trainer_mode: example.assets.example_components._assets._AzuremlMulticlassNeuralNetworkCreateTrainerModeEnum = _AzuremlMulticlassNeuralNetworkCreateTrainerModeEnum.singleparameter, hidden_layer_specification: example.assets.example_components._assets._AzuremlMulticlassNeuralNetworkHiddenLayerSpecificationEnum = _AzuremlMulticlassNeuralNetworkHiddenLayerSpecificationEnum.fully_connected_case, number_of_hidden_nodes: str = '100', the_learning_rate: float = 0.1, number_of_learning_iterations: int = 100, hidden_layer_specification1: example.assets.example_components._assets._AzuremlMulticlassNeuralNetworkHiddenLayerSpecification1Enum = _AzuremlMulticlassNeuralNetworkHiddenLayerSpecification1Enum.fully_connected_case, number_of_hidden_nodes1: str = '100', range_for_learning_rate: str = '0.1; 0.2; 0.4', range_for_number_of_learning_iterations: str = '20; 40; 80; 160', the_momentum: float = 0, shuffle_examples: bool = True, random_number_seed: Optional[int] = None) example.assets.example_components._assets._AzuremlMulticlassNeuralNetworkComponent

Creates a multiclass classification model using a neural network algorithm.

Parameters
  • create_trainer_mode (_AzuremlMulticlassNeuralNetworkCreateTrainerModeEnum) – Create advanced learner options (enum: [‘SingleParameter’, ‘ParameterRange’])

  • hidden_layer_specification (_AzuremlMulticlassNeuralNetworkHiddenLayerSpecificationEnum) – Specify the architecture of the hidden layer or layers (optional, enum: [‘Fully-connected case’])

  • number_of_hidden_nodes (str) – Type the number of nodes in the hidden layer. For multiple hidden layers, type a comma-separated list. (optional)

  • the_learning_rate (float) – Specify the size of each step in the learning process (optional, min: 2.220446049250313e-16, max: 2.0)

  • number_of_learning_iterations (int) – Specify the number of iterations while learning (optional, min: 1)

  • hidden_layer_specification1 (_AzuremlMulticlassNeuralNetworkHiddenLayerSpecification1Enum) – Specify the architecture of the hidden layer or layers for range (optional, enum: [‘Fully-connected case’])

  • number_of_hidden_nodes1 (str) – Type the number of nodes in the hidden layer, or for multiple hidden layers, type a comma-separated list. (optional)

  • range_for_learning_rate (str) – Specify the range for the size of each step in the learning process (optional)

  • range_for_number_of_learning_iterations (str) – Specify the range for the number of iterations while learning (optional)

  • the_momentum (float) – Specify a weight to apply during learning to nodes from previous iterations (max: 1.0)

  • shuffle_examples (bool) – Select this option to change the order of instances between learning iterations

  • random_number_seed (int) – Specify a numeric seed to use for random number generation. Leave blank to use the default seed. (optional, max: 4294967295)

Output untrained_model

An untrained multiclass classification model

Type

untrained_model: Output
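
The number_of_hidden_nodes parameter is a string that may hold a comma-separated list for multiple hidden layers. As a sketch of how such a string could translate into a fully connected architecture, the hypothetical helper below derives the weight-matrix shape of each layer. The feature and class counts are made-up inputs for the example.

```python
def layer_shapes(number_of_features, number_of_hidden_nodes, number_of_classes):
    """Return (inputs, outputs) for each fully connected layer, in order."""
    hidden = [int(part) for part in number_of_hidden_nodes.split(",") if part.strip()]
    sizes = [number_of_features, *hidden, number_of_classes]
    return list(zip(sizes, sizes[1:]))

single = layer_shapes(20, "100", 3)      # the default: one hidden layer of 100 nodes
stacked = layer_shapes(20, "100, 50", 3)  # two hidden layers
weights = sum(n_in * n_out for n_in, n_out in stacked)  # weight count, biases excluded
```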

example.assets.example_components.azureml_neural_network_regression(create_trainer_mode: example.assets.example_components._assets._AzuremlNeuralNetworkRegressionCreateTrainerModeEnum = _AzuremlNeuralNetworkRegressionCreateTrainerModeEnum.singleparameter, hidden_layer_specification: example.assets.example_components._assets._AzuremlNeuralNetworkRegressionHiddenLayerSpecificationEnum = _AzuremlNeuralNetworkRegressionHiddenLayerSpecificationEnum.fully_connected_case, number_of_hidden_nodes: str = '100', the_learning_rate: float = 0.1, number_of_learning_iterations: int = 100, hidden_layer_specification1: example.assets.example_components._assets._AzuremlNeuralNetworkRegressionHiddenLayerSpecification1Enum = _AzuremlNeuralNetworkRegressionHiddenLayerSpecification1Enum.fully_connected_case, number_of_hidden_nodes1: str = '100', range_for_learning_rate: str = '0.1; 0.2; 0.4', range_for_number_of_learning_iterations: str = '20; 40; 80; 160', the_momentum: float = 0, shuffle_examples: bool = True, random_number_seed: Optional[int] = None) example.assets.example_components._assets._AzuremlNeuralNetworkRegressionComponent

Creates a regression model using a neural network algorithm.

Parameters
  • create_trainer_mode (_AzuremlNeuralNetworkRegressionCreateTrainerModeEnum) – Create advanced learner options (enum: [‘SingleParameter’, ‘ParameterRange’])

  • hidden_layer_specification (_AzuremlNeuralNetworkRegressionHiddenLayerSpecificationEnum) – Specify the architecture of the hidden layer or layers (optional, enum: [‘Fully-connected case’])

  • number_of_hidden_nodes (str) – Type the number of nodes in the hidden layer. For multiple hidden layers, type a comma-separated list. (optional)

  • the_learning_rate (float) – Specify the size of each step in the learning process (optional, min: 2.220446049250313e-16, max: 2.0)

  • number_of_learning_iterations (int) – Specify the number of iterations while learning (optional, min: 1)

  • hidden_layer_specification1 (_AzuremlNeuralNetworkRegressionHiddenLayerSpecification1Enum) – Specify the architecture of the hidden layer or layers for range (optional, enum: [‘Fully-connected case’])

  • number_of_hidden_nodes1 (str) – Type the number of nodes in the hidden layer, or for multiple hidden layers, type a comma-separated list. (optional)

  • range_for_learning_rate (str) – Specify the range for the size of each step in the learning process (optional)

  • range_for_number_of_learning_iterations (str) – Specify the range for the number of iterations while learning (optional)

  • the_momentum (float) – Specify a weight to apply during learning to nodes from previous iterations (max: 1.0)

  • shuffle_examples (bool) – Select this option to change the order of instances between learning iterations

  • random_number_seed (int) – Specify a numeric seed to use for random number generation. Leave blank to use the default seed. (optional, max: 4294967295)

Output untrained_model

An untrained regression model

Type

untrained_model: Output

example.assets.example_components.azureml_normalize_data(dataset: Optional[pathlib.Path] = None, transformation_method: example.assets.example_components._assets._AzuremlNormalizeDataTransformationMethodEnum = _AzuremlNormalizeDataTransformationMethodEnum.zscore, use_0_for_constant_columns_when_checked: bool = True, columns_to_transform: Optional[str] = None) example.assets.example_components._assets._AzuremlNormalizeDataComponent

Rescales numeric data to constrain dataset values to a standard range.

Parameters
  • dataset (Path) – Input dataset

  • transformation_method (_AzuremlNormalizeDataTransformationMethodEnum) – Choose the mathematical method used for scaling (enum: [‘ZScore’, ‘MinMax’, ‘Logistic’, ‘LogNormal’, ‘Tanh’])

  • use_0_for_constant_columns_when_checked (bool) – Use NaN for constant columns when unchecked or 0 when checked (optional)

  • columns_to_transform (str) – Select all columns to which the selected transformation should be applied

Output transformed_dataset

Transformed dataset

Type

transformed_dataset: Output

Output transformation_function

Definition of the transformation function, which can be applied to other datasets

Type

transformation_function: Output
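
Two of the listed transformation methods, 'ZScore' and 'MinMax', are sketched below together with the constant-column rule described for use_0_for_constant_columns_when_checked. This is an illustration under assumptions, not the component's code: the function names are hypothetical, and the population standard deviation is used because the source does not say which variant applies.

```python
import math

def z_score(values, use_zero_for_constant=True):
    """Standardize to zero mean and unit standard deviation.

    Assumption: population standard deviation (divide by n).
    """
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        # Constant column: 0 when the option is checked, NaN otherwise.
        fill = 0.0 if use_zero_for_constant else math.nan
        return [fill] * len(values)
    return [(v - mean) / std for v in values]

def min_max(values, use_zero_for_constant=True):
    """Rescale linearly so the minimum maps to 0 and the maximum to 1."""
    low, high = min(values), max(values)
    if high == low:
        fill = 0.0 if use_zero_for_constant else math.nan
        return [fill] * len(values)
    return [(v - low) / (high - low) for v in values]

scaled = min_max([2.0, 4.0, 6.0])
standardized = z_score([2.0, 4.0, 6.0])
constant = z_score([5.0, 5.0, 5.0])
```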

example.assets.example_components.azureml_one_vs_all_multiclass(untrained_binary_classification_model: Optional[pathlib.Path] = None) example.assets.example_components._assets._AzuremlOneVsAllMulticlassComponent

Creates a one-vs-all multiclass classification model from an ensemble of binary classification models.

Parameters

untrained_binary_classification_model (Path) – An untrained binary classification model

Output untrained_model

An untrained multi-class classification model

Type

untrained_model: Output

example.assets.example_components.azureml_one_vs_one_multiclass(untrained_binary_classification_model: Optional[pathlib.Path] = None) example.assets.example_components._assets._AzuremlOneVsOneMulticlassComponent

Creates a one-vs-one multiclass classification model from an ensemble of binary classification models.

Parameters

untrained_binary_classification_model (Path) – An untrained binary classification model

Output untrained_model

An untrained multi-class classification model

Type

untrained_model: Output
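
The one-vs-all and one-vs-one strategies differ in how a multiclass problem is broken into binary problems. The sketch below shows the decomposition only, with hypothetical helper names and no model training: one-vs-all creates one binary relabeling per class, while one-vs-one creates one binary problem per pair of classes, K(K-1)/2 in total for K classes.

```python
from itertools import combinations

def one_vs_all_tasks(labels):
    """One binary relabeling per class: that class (1) against all others (0)."""
    classes = sorted(set(labels))
    return {c: [1 if y == c else 0 for y in labels] for c in classes}

def one_vs_one_pairs(labels):
    """One binary problem per unordered pair of classes."""
    return list(combinations(sorted(set(labels)), 2))

labels = ["cat", "dog", "bird", "dog", "cat"]
ova = one_vs_all_tasks(labels)
ovo = one_vs_one_pairs(labels)
```

With three classes both strategies happen to produce three binary problems; the counts diverge as the number of classes grows, since one-vs-one scales quadratically.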

example.assets.example_components.azureml_partition_and_sample(dataset: Optional[pathlib.Path] = None, partition_or_sample_mode: example.assets.example_components._assets._AzuremlPartitionAndSamplePartitionOrSampleModeEnum = _AzuremlPartitionAndSamplePartitionOrSampleModeEnum.sampling, use_replacement_in_the_partitioning: bool = False, randomized_split: bool = True, random_seed: int = 0, specify_the_partitioner_method: example.assets.example_components._assets._AzuremlPartitionAndSampleSpecifyThePartitionerMethodEnum = _AzuremlPartitionAndSampleSpecifyThePartitionerMethodEnum.partition_evenly, specify_how_many_folds_do_you_want_to_split_evenly_into: int = 5, stratified_split: example.assets.example_components._assets._AzuremlPartitionAndSampleStratifiedSplitEnum = _AzuremlPartitionAndSampleStratifiedSplitEnum.false, stratification_key_column: Optional[str] = None, proportion_list_of_customized_folds_separated_by_comma: Optional[str] = None, stratified_split_for_customized_fold_assignment: example.assets.example_components._assets._AzuremlPartitionAndSampleStratifiedSplitForCustomizedFoldAssignmentEnum = _AzuremlPartitionAndSampleStratifiedSplitForCustomizedFoldAssignmentEnum.false, stratification_key_column_for_customized_fold_assignment: Optional[str] = None, specify_which_fold_to_be_sampled_from: int = 1, pick_complement_of_the_selected_fold: bool = False, rate_of_sampling: float = 0.01, random_seed_for_sampling: int = 0, stratified_split_for_sampling: example.assets.example_components._assets._AzuremlPartitionAndSampleStratifiedSplitForSamplingEnum = _AzuremlPartitionAndSampleStratifiedSplitForSamplingEnum.false, stratification_key_column_for_sampling: Optional[str] = None, number_of_rows_to_select: int = 10) example.assets.example_components._assets._AzuremlPartitionAndSampleComponent

Creates multiple partitions of a dataset based on sampling.

Parameters
  • dataset (Path) – Dataset to be split

  • partition_or_sample_mode (_AzuremlPartitionAndSamplePartitionOrSampleModeEnum) – Select the partition or sampling mode (enum: [‘Assign to Folds’, ‘Pick Fold’, ‘Sampling’, ‘Head’])

  • use_replacement_in_the_partitioning (bool) – Indicate whether rows are sampled with replacement when the dataset is partitioned, or split without replacement (optional)

  • randomized_split (bool) – Indicates whether split is random or not (optional)

  • random_seed (int) – Specify a seed for the random number generator (optional, max: 4294967295)

  • specify_the_partitioner_method (_AzuremlPartitionAndSampleSpecifyThePartitionerMethodEnum) – EvenSize where you specify number of folds, or ShapeInPct where you specify a list of percentage numbers (optional, enum: [‘Partition evenly’, ‘Partition with customized proportions’])

  • specify_how_many_folds_do_you_want_to_split_evenly_into (int) – Number of equal-sized partitions to split the dataset into (optional, min: 1)

  • stratified_split (_AzuremlPartitionAndSampleStratifiedSplitEnum) – Indicates whether the split is stratified or not (optional, enum: [‘True’, ‘False’])

  • stratification_key_column (str) – Column containing stratification key (optional)

  • proportion_list_of_customized_folds_separated_by_comma (str) – List of proportions separated by comma (optional)

  • stratified_split_for_customized_fold_assignment (_AzuremlPartitionAndSampleStratifiedSplitForCustomizedFoldAssignmentEnum) – Indicates whether the split is stratified or not for customized fold assignments (optional, enum: [‘True’, ‘False’])

  • stratification_key_column_for_customized_fold_assignment (str) – Column containing stratification key for customized fold assignments (optional)

  • specify_which_fold_to_be_sampled_from (int) – Index of the partitioned fold to be sampled from (optional, min: 1)

  • pick_complement_of_the_selected_fold (bool) – Select the complement of the chosen fold, that is, every row outside it (optional)

  • rate_of_sampling (float) – Sampling rate (optional)

  • random_seed_for_sampling (int) – Random number generator seed for sampling (optional, max: 4294967295)

  • stratified_split_for_sampling (_AzuremlPartitionAndSampleStratifiedSplitForSamplingEnum) – Indicates whether the split is stratified or not for sampling (optional, enum: [‘True’, ‘False’])

  • stratification_key_column_for_sampling (str) – Column containing stratification key for sampling (optional)

  • number_of_rows_to_select (int) – Maximum number of records that will be allowed to pass through to the next module (optional)

Output odataset

Dataset resulting from the split

Type

odataset: Output
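
The ‘Assign to Folds’ and ‘Pick Fold’ modes described above can be illustrated with a minimal sketch in plain Python. This is not the component's implementation: the helper names assign_to_folds and pick_fold are hypothetical, and the 1-based fold index is an assumption drawn from the "min: 1" constraint on specify_which_fold_to_be_sampled_from.

```python
import random


def assign_to_folds(rows, n_folds=5, randomized=True, seed=0):
    """Split rows into n_folds groups whose sizes differ by at most one."""
    indices = list(range(len(rows)))
    if randomized:
        # A fixed seed makes the shuffle, and so the folds, reproducible.
        random.Random(seed).shuffle(indices)
    folds = [[] for _ in range(n_folds)]
    for position, index in enumerate(indices):
        folds[position % n_folds].append(rows[index])
    return folds


def pick_fold(folds, fold_number, complement=False):
    """Return the 1-based fold, or every row outside it when complement is True."""
    if not 1 <= fold_number <= len(folds):
        raise ValueError("fold_number must be between 1 and the number of folds")
    if complement:
        return [row
                for number, fold in enumerate(folds, start=1)
                if number != fold_number
                for row in fold]
    return list(folds[fold_number - 1])


rows = list(range(23))
folds = assign_to_folds(rows, n_folds=5, seed=0)
held_out = pick_fold(folds, 1)                    # one fold for validation
training = pick_fold(folds, 1, complement=True)   # the remaining four folds
```

Picking a fold and then its complement gives the usual train/validation pair for one round of cross-validation.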

example.assets.example_components.azureml_pca_based_anomaly_detection(training_mode: example.assets.example_components._assets._AzuremlPcaBasedAnomalyDetectionTrainingModeEnum = _AzuremlPcaBasedAnomalyDetectionTrainingModeEnum.singleparameter, number_of_components_to_use_in_pca: int = 2, oversampling_parameter_for_randomized_pca: int = 2, enable_input_feature_mean_normalization: bool = False) example.assets.example_components._assets._AzuremlPcaBasedAnomalyDetectionComponent

Create a PCA-based anomaly detection model.

Parameters
  • training_mode (_AzuremlPcaBasedAnomalyDetectionTrainingModeEnum) – Specify learner options. Use ‘SingleParameter’ to manually specify all values. Use ‘ParameterRange’ to sweep over tunable parameters. (enum: [‘SingleParameter’])

  • number_of_components_to_use_in_pca (int) – Specify the number of components to use in PCA. (optional, min: 1)

  • oversampling_parameter_for_randomized_pca (int) – Specify the accuracy parameter for randomized PCA training. (optional)

  • enable_input_feature_mean_normalization (bool) – Specify if the input data is normalized to have zero mean.

Output untrained_model

An untrained PCA-based anomaly detection model.

Type

untrained_model: Output

example.assets.example_components.azureml_permutation_feature_importance(trained_model: Optional[pathlib.Path] = None, test_data: Optional[pathlib.Path] = None, random_seed: int = 0, metric_for_measuring_performance: example.assets.example_components._assets._AzuremlPermutationFeatureImportanceMetricForMeasuringPerformanceEnum = _AzuremlPermutationFeatureImportanceMetricForMeasuringPerformanceEnum.accuracy) example.assets.example_components._assets._AzuremlPermutationFeatureImportanceComponent

Computes the permutation feature importance scores of feature variables given a trained model and a test dataset.

Parameters
  • trained_model (Path) – Trained model to be used for scoring

  • test_data (Path) – Test dataset for scoring and evaluating a model after permutation of feature values

  • random_seed (int) – Random number generator seed value (max: 4294967295)

  • metric_for_measuring_performance (_AzuremlPermutationFeatureImportanceMetricForMeasuringPerformanceEnum) – Evaluation metric (enum: [‘Accuracy’, ‘Precision’, ‘Recall’, ‘Mean Absolute Error’, ‘Root Mean Squared Error’, ‘Relative Absolute Error’, ‘Relative Squared Error’, ‘Coefficient of Determination’])

Output feature_importance

Feature importance results

Type

feature_importance: Output
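
As a rough illustration of the idea behind permutation feature importance (not the component's code), the sketch below shuffles one feature column at a time and records how much a metric drops. The toy model, the accuracy helper, and the data are all hypothetical.

```python
import random


def accuracy(model, rows, labels):
    """Fraction of rows for which the model predicts the given label."""
    hits = sum(1 for row, label in zip(rows, labels) if model(row) == label)
    return hits / len(rows)


def permutation_importance(model, rows, labels, seed=0):
    """Return, per feature, the drop in accuracy after shuffling that feature."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    scores = []
    for column in range(len(rows[0])):
        shuffled = [row[column] for row in rows]
        rng.shuffle(shuffled)
        # Rebuild each row with only the current column replaced.
        permuted = [row[:column] + (value,) + row[column + 1:]
                    for row, value in zip(rows, shuffled)]
        scores.append(baseline - accuracy(model, permuted, labels))
    return scores


def toy_model(row):
    """Hypothetical model: looks only at the first feature."""
    return 1 if row[0] > 0 else 0


rows = [(-2, 7), (-1, 3), (1, 5), (2, 1), (-3, 8), (3, 2)]
labels = [toy_model(row) for row in rows]
scores = permutation_importance(toy_model, rows, labels, seed=0)
```

A feature the model ignores, such as the second column here, always scores zero because shuffling it cannot change any prediction.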

example.assets.example_components.azureml_poisson_regression(create_trainer_mode: example.assets.example_components._assets._AzuremlPoissonRegressionCreateTrainerModeEnum = _AzuremlPoissonRegressionCreateTrainerModeEnum.singleparameter, tolerance_parameter_for_optimization_convergence_the_lower_the_value_the_slower_and_more_accurate_the_fitting: float = 1e-07, l1_regularization_weight: float = 1.0, l2_regularization_weight: float = 1.0, memory_size_for_l_bfgs_the_lower_the_value_the_faster_and_less_accurate_the_training: int = 20, range_for_optimization_tolerance: str = '0.00001; 0.00000001', range_for_l1_regularization_weight: str = '0.0; 0.01; 0.1; 1.0', range_for_l2_regularization_weight: str = '0.01; 0.1; 1.0', range_for_memory_size_for_l_bfgs_the_lower_the_value_the_faster_and_less_accurate_the_training: str = '5; 20; 50') example.assets.example_components._assets._AzuremlPoissonRegressionComponent

Creates a regression model that assumes data has a Poisson distribution

Parameters
  • create_trainer_mode (_AzuremlPoissonRegressionCreateTrainerModeEnum) – Create advanced learner options (enum: [‘SingleParameter’, ‘ParameterRange’])

  • tolerance_parameter_for_optimization_convergence_the_lower_the_value_the_slower_and_more_accurate_the_fitting (float) – Specify a tolerance value for optimization convergence. The lower the value, the slower and more accurate the fitting. (optional, min: 2.220446049250313e-16)

  • l1_regularization_weight (float) – Specify the L1 regularization weight. Use a non-zero value to avoid overfitting the model. (optional)

  • l2_regularization_weight (float) – Specify the L2 regularization weight. Use a non-zero value to avoid overfitting the model. (optional)

  • memory_size_for_l_bfgs_the_lower_the_value_the_faster_and_less_accurate_the_training (int) – Indicate how much memory (in MB) to use for the L-BFGS optimizer. With less memory, training is faster but less accurate. (optional, min: 1)

  • range_for_optimization_tolerance (str) – Specify a range for the tolerance value for the L-BFGS optimizer (optional)

  • range_for_l1_regularization_weight (str) – Specify the range for the L1 regularization weight. Use a non-zero value to avoid overfitting. (optional)

  • range_for_l2_regularization_weight (str) – Specify the range for the L2 regularization weight. Use a non-zero value to avoid overfitting. (optional)

  • range_for_memory_size_for_l_bfgs_the_lower_the_value_the_faster_and_less_accurate_the_training (str) – Specify the range for the amount of memory (in MB) to use for the L-BFGS optimizer. The lower the value, the faster and less accurate the training. (optional)

Output untrained_model

An untrained Poisson regression model

Type

untrained_model: Output
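
The range_for_* parameters above take strings such as '0.0; 0.01; 0.1; 1.0'. How the component interprets them is not stated here; the sketch below assumes each string is a semicolon-separated list of candidate values and shows how such lists could be expanded into a parameter grid. The helper parse_range is hypothetical.

```python
from itertools import product


def parse_range(text, cast=float):
    """Turn a semicolon-separated string into a list of numbers."""
    return [cast(part) for part in text.split(";") if part.strip()]


# Default strings copied from the signature above.
l1_weights = parse_range("0.0; 0.01; 0.1; 1.0")
l2_weights = parse_range("0.01; 0.1; 1.0")
memory_sizes = parse_range("5; 20; 50", cast=int)

# Every combination of the candidate values (assumed sweep semantics).
grid = list(product(l1_weights, l2_weights, memory_sizes))
```

Under this assumption the three default ranges yield 4 × 3 × 3 = 36 candidate settings.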

example.assets.example_components.azureml_preprocess_text(dataset: Optional[pathlib.Path] = None, stop_words: Optional[pathlib.Path] = None, language: example.assets.example_components._assets._AzuremlPreprocessTextLanguageEnum = _AzuremlPreprocessTextLanguageEnum.english, expand_verb_contractions: bool = True, text_column_to_clean: Optional[str] = None, remove_stop_words: bool = True, use_lemmatization: bool = True, detect_sentences: bool = True, normalize_case_to_lowercase: bool = True, remove_numbers: bool = True, remove_special_characters: bool = True, remove_duplicate_characters: bool = True, remove_email_addresses: bool = True, remove_urls: bool = True, normalize_backslashes_to_slashes: bool = True, split_tokens_on_special_characters: bool = True, custom_regular_expression: Optional[str] = None, custom_replacement_string: Optional[str] = None) example.assets.example_components._assets._AzuremlPreprocessTextComponent

Performs cleaning operations on text.

Parameters
  • dataset (Path) – Input data

  • stop_words (Path) – Optional custom list of stop words to remove (optional)

  • language (_AzuremlPreprocessTextLanguageEnum) – Select the language to preprocess (enum: [‘English’])

  • expand_verb_contractions (bool) – Expand verb contractions (English only) (optional)

  • text_column_to_clean (str) – Select the text column to clean

  • remove_stop_words (bool) – Remove stop words

  • use_lemmatization (bool) – Use lemmatization

  • detect_sentences (bool) – Detect sentences by adding a sentence terminator “|||” that can be used by the n-gram features extractor module

  • normalize_case_to_lowercase (bool) – Normalize case to lowercase

  • remove_numbers (bool) – Remove numbers

  • remove_special_characters (bool) – Remove non-alphanumeric special characters and replace them with the “|” character

  • remove_duplicate_characters (bool) – Remove duplicate characters

  • remove_email_addresses (bool) – Remove email addresses

  • remove_urls (bool) – Remove URLs

  • normalize_backslashes_to_slashes (bool) – Normalize backslashes to slashes

  • split_tokens_on_special_characters (bool) – Split tokens on special characters

  • custom_regular_expression (str) – Specify the custom regular expression (optional)

  • custom_replacement_string (str) – Specify the custom replacement string for the custom regular expression (optional)

Output results_dataset

Results dataset

Type

results_dataset: Output
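
A few of the cleaning steps listed above can be approximated with the standard re module. This is only a sketch: the patterns are simplified assumptions, the rule for duplicate characters (collapsing runs of three or more down to two) is a guess, and the component's actual rules may differ.

```python
import re

# Simplified, illustrative patterns; real-world URLs and emails are messier.
URL_PATTERN = re.compile(r"https?://\S+")
EMAIL_PATTERN = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
NUMBER_PATTERN = re.compile(r"\d+")
DUPLICATE_PATTERN = re.compile(r"(.)\1{2,}")


def clean_text(text):
    """Lowercase the text, then strip URLs, emails, numbers and long repeats."""
    text = text.lower()
    text = URL_PATTERN.sub(" ", text)
    text = EMAIL_PATTERN.sub(" ", text)
    text = NUMBER_PATTERN.sub(" ", text)
    # Assumed rule: collapse any run of 3+ identical characters to two.
    text = DUPLICATE_PATTERN.sub(r"\1\1", text)
    # Normalize whitespace left behind by the removals.
    return " ".join(text.split())


cleaned = clean_text(
    "Visit https://example.com NOW or mail Bob@Example.org 42 times!!!!"
)
# cleaned == "visit now or mail times!!"
```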

example.assets.example_components.azureml_remove_duplicate_rows(dataset: Optional[pathlib.Path] = None, key_column_selection_filter_expression: Optional[str] = None, retain_first_duplicate_row: bool = True) example.assets.example_components._assets._AzuremlRemoveDuplicateRowsComponent

Removes the duplicate rows from a dataset.

Parameters
  • dataset (Path) – Input dataset

  • key_column_selection_filter_expression (str) – Choose the key columns to use when searching for duplicates

  • retain_first_duplicate_row (bool) – Indicate whether to keep the first row of a set of duplicates and discard the others. If false, the last duplicate row encountered is kept.

Output results_dataset

Filtered dataset

Type

results_dataset: Output
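
The effect of retain_first_duplicate_row can be sketched with a dictionary keyed on the selected columns. This is an illustration, not the component's code: the function below is hypothetical and represents rows as plain dicts.

```python
def remove_duplicate_rows(rows, key_columns, retain_first=True):
    """Keep one row per distinct combination of values in key_columns."""
    kept = {}
    for row in rows:
        key = tuple(row[column] for column in key_columns)
        if retain_first:
            kept.setdefault(key, row)   # ignore later duplicates
        else:
            kept[key] = row             # later duplicates overwrite earlier ones
    # Output order follows the first appearance of each key.
    return list(kept.values())


rows = [
    {"id": 1, "value": "a"},
    {"id": 2, "value": "b"},
    {"id": 1, "value": "c"},
]
first_kept = remove_duplicate_rows(rows, ["id"], retain_first=True)
last_kept = remove_duplicate_rows(rows, ["id"], retain_first=False)
```

With the sample rows, first_kept retains the row with value "a" for id 1, while last_kept retains the row with value "c".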

example.assets.example_components.azureml_resnet(model_name: example.assets.example_components._assets._AzuremlResnetModelNameEnum = _AzuremlResnetModelNameEnum.resnext101_32x8d, pretrained: bool = True, zero_init_residual: bool = False) example.assets.example_components._assets._AzuremlResnetComponent

Creates an image classification model using the ResNet algorithm.

Parameters
  • model_name (_AzuremlResnetModelNameEnum) – Name of a certain resnet structure (enum: [‘resnet18’, ‘resnet34’, ‘resnet50’, ‘resnet101’, ‘resnet152’, ‘resnext50_32x4d’, ‘resnext101_32x8d’, ‘wide_resnet50_2’, ‘wide_resnet101_2’])

  • pretrained (bool) – Indicate whether to use a model pre-trained on ImageNet

  • zero_init_residual (bool) – Zero-initialize the last BN in each residual branch. (optional)

Output untrained_model

Untrained resnet model path

Type

untrained_model: Output

example.assets.example_components.azureml_score_image_model(trained_model: Optional[pathlib.Path] = None, dataset: Optional[pathlib.Path] = None) example.assets.example_components._assets._AzuremlScoreImageModelComponent

Scores predictions for a trained image model.

Parameters
  • trained_model (Path) – Trained predictive model

  • dataset (Path) – Input data to score

Output scored_dataset

Dataset with obtained scores

Type

scored_dataset: Output

example.assets.example_components.azureml_score_model(trained_model: Optional[pathlib.Path] = None, dataset: Optional[pathlib.Path] = None, append_score_columns_to_output: bool = True) example.assets.example_components._assets._AzuremlScoreModelComponent

Scores predictions for a trained classification or regression model.

Parameters
  • trained_model (Path) – Trained predictive model

  • dataset (Path) – Input test dataset

  • append_score_columns_to_output (bool) – If checked, append score columns to the result dataset, otherwise only return the scores and true labels if available.

Output scored_dataset

Dataset with obtained scores

Type

scored_dataset: Output

example.assets.example_components.azureml_score_svd_recommender(trained_svd_recommendation: Optional[pathlib.Path] = None, dataset_to_score: Optional[pathlib.Path] = None, training_data: Optional[pathlib.Path] = None, recommender_prediction_kind: example.assets.example_components._assets._AzuremlScoreSvdRecommenderRecommenderPredictionKindEnum = _AzuremlScoreSvdRecommenderRecommenderPredictionKindEnum.item_recommendation, recommended_item_selection: example.assets.example_components._assets._AzuremlScoreSvdRecommenderRecommendedItemSelectionEnum = _AzuremlScoreSvdRecommenderRecommendedItemSelectionEnum.from_rated_items_for_model_evaluation, minimum_size_of_the_recommendation_pool_for_a_single_user: int = 2, maximum_number_of_items_to_recommend_to_a_user: int = 5, whether_to_return_the_predicted_ratings_of_the_items_along_with_the_labels: bool = False) example.assets.example_components._assets._AzuremlScoreSvdRecommenderComponent

Score a dataset using the SVD recommendation.

Parameters
  • trained_svd_recommendation (Path) – Trained SVD recommendation

  • dataset_to_score (Path) – Dataset to score

  • training_data (Path) – Dataset containing the training data. (Used to filter out already rated items from prediction) (optional)

  • recommender_prediction_kind (_AzuremlScoreSvdRecommenderRecommenderPredictionKindEnum) – Specify the type of prediction the recommendation should output (enum: [‘Rating Prediction’, ‘Item Recommendation’])

  • recommended_item_selection (_AzuremlScoreSvdRecommenderRecommendedItemSelectionEnum) – Select the set of items to make recommendations from (optional, enum: [‘From All Items’, ‘From Rated Items (for model evaluation)’, ‘From Unrated Items (to suggest new items to users)’])

  • minimum_size_of_the_recommendation_pool_for_a_single_user (int) – Specify the minimum size of the recommendation pool for each user (optional, min: 1)

  • maximum_number_of_items_to_recommend_to_a_user (int) – Specify the maximum number of items to recommend to a user (optional, min: 1)

  • whether_to_return_the_predicted_ratings_of_the_items_along_with_the_labels (bool) – Specify whether to return the predicted ratings of the items along with the labels (optional)

Output scored_dataset

Scored dataset

Type

scored_dataset: Output

example.assets.example_components.azureml_score_vowpal_wabbit_model(trained_vowpal_wabbit_model: Optional[pathlib.Path] = None, test_data: Optional[pathlib.Path] = None, vw_arguments: Optional[str] = None, name_of_the_test_data_file: Optional[str] = None, specify_file_type: example.assets.example_components._assets._AzuremlScoreVowpalWabbitModelSpecifyFileTypeEnum = _AzuremlScoreVowpalWabbitModelSpecifyFileTypeEnum.vw, include_an_extra_column_containing_labels: bool = False, include_an_extra_column_containing_raw_scores: bool = False) example.assets.example_components._assets._AzuremlScoreVowpalWabbitModelComponent

Score data using Vowpal Wabbit from the command line interface.

Parameters
  • trained_vowpal_wabbit_model (Path) – Trained Vowpal Wabbit model.

  • test_data (Path) – Test data.

  • vw_arguments (str) – Type Vowpal Wabbit command line arguments. (optional)

  • name_of_the_test_data_file (str) – Type name of the test data file. (optional)

  • specify_file_type (_AzuremlScoreVowpalWabbitModelSpecifyFileTypeEnum) – Please specify file type. (enum: [‘VW’, ‘SVMLight’])

  • include_an_extra_column_containing_labels (bool) – Whether to include an extra column containing labels in the scored dataset.

  • include_an_extra_column_containing_raw_scores (bool) – Whether to include an extra column containing raw scores in the scored dataset.

Output scored_dataset

Scored dataset

Type

scored_dataset: Output

example.assets.example_components.azureml_score_wide_and_deep_recommender(trained_wide_and_deep_recommendation_model: Optional[pathlib.Path] = None, dataset_to_score: Optional[pathlib.Path] = None, user_features: Optional[pathlib.Path] = None, item_features: Optional[pathlib.Path] = None, training_data: Optional[pathlib.Path] = None, recommender_prediction_kind: example.assets.example_components._assets._AzuremlScoreWideAndDeepRecommenderRecommenderPredictionKindEnum = _AzuremlScoreWideAndDeepRecommenderRecommenderPredictionKindEnum.item_recommendation, recommended_item_selection: example.assets.example_components._assets._AzuremlScoreWideAndDeepRecommenderRecommendedItemSelectionEnum = _AzuremlScoreWideAndDeepRecommenderRecommendedItemSelectionEnum.from_rated_items_for_model_evaluation, minimum_size_of_the_recommendation_pool_for_a_single_user: int = 2, maximum_number_of_items_to_recommend_to_a_user: int = 5, whether_to_return_the_predicted_ratings_of_the_items_along_with_the_labels: bool = False) example.assets.example_components._assets._AzuremlScoreWideAndDeepRecommenderComponent

Score a dataset using the Wide and Deep recommendation model.

Parameters
  • trained_wide_and_deep_recommendation_model (Path) – Trained Wide and Deep recommendation model

  • dataset_to_score (Path) – Dataset to score

  • user_features (Path) – User features (optional)

  • item_features (Path) – Item features (optional)

  • training_data (Path) – Dataset containing the training data. (Used to filter out already rated items from prediction) (optional)

  • recommender_prediction_kind (_AzuremlScoreWideAndDeepRecommenderRecommenderPredictionKindEnum) – Specify the type of prediction the recommendation should output (enum: [‘Rating Prediction’, ‘Item Recommendation’])

  • recommended_item_selection (_AzuremlScoreWideAndDeepRecommenderRecommendedItemSelectionEnum) – Select the set of items to make recommendations from (optional, enum: [‘From All Items’, ‘From Rated Items (for model evaluation)’, ‘From Unrated Items (to suggest new items to users)’])

  • minimum_size_of_the_recommendation_pool_for_a_single_user (int) – Specify the minimum size of the recommendation pool for each user (optional, min: 1)

  • maximum_number_of_items_to_recommend_to_a_user (int) – Specify the maximum number of items to recommend to a user (optional, min: 1)

  • whether_to_return_the_predicted_ratings_of_the_items_along_with_the_labels (bool) – Specify whether to return the predicted ratings of the items along with the labels (optional)

Output scored_dataset

Scored dataset

Type

scored_dataset: Output

example.assets.example_components.azureml_select_columns_in_dataset(dataset: Optional[pathlib.Path] = None, select_columns: Optional[str] = None) example.assets.example_components._assets._AzuremlSelectColumnsInDatasetComponent

Selects columns to include or exclude from a dataset in an operation.

Parameters
  • dataset (Path) – Input dataset

  • select_columns (str) – Select columns to keep in the projected dataset

Output results_dataset

Output dataset

Type

results_dataset: Output

example.assets.example_components.azureml_select_columns_transform(dataset_with_desired_columns: Optional[pathlib.Path] = None) example.assets.example_components._assets._AzuremlSelectColumnsTransformComponent

Create a transformation that selects the same subset of columns as in the given dataset.

Parameters

dataset_with_desired_columns (Path) – Dataset containing desired set of columns

Output columns_selection_transformation

Transformation that selects the same subset of columns as in the given dataset.

Type

columns_selection_transformation: Output

example.assets.example_components.azureml_smote(samples: Optional[pathlib.Path] = None, label_column: Optional[str] = None, smote_percentage: int = 100, number_of_nearest_neighbors: int = 1, random_seed: int = 0) example.assets.example_components._assets._AzuremlSmoteComponent

Increases the number of low incidence examples in a dataset.

Parameters
  • samples (Path) – A DataTable of samples

  • label_column (str) – Select the column that contains the label or outcome

  • smote_percentage (int) – Amount of oversampling. If the value is not an integral multiple of 100, the minority class will be randomized and downsampled from the next integral multiple of 100.

  • number_of_nearest_neighbors (int) – The number of nearest neighbors (min: 1)

  • random_seed (int) – Random number generator seed (max: 4294967295)

Output table

A DataTable containing the original samples plus additional synthetic minority class samples. The number of synthetic samples is (smote_percentage / 100) × T, where T is the number of minority class samples

Type

table: Output
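
For orientation, the core SMOTE step can be sketched as interpolation between a minority sample and one of its nearest minority neighbors. This sketch is not the component's implementation: it assumes the standard SMOTE convention that a percentage of N yields N / 100 synthetic samples per minority sample, and it handles only multiples of 100.

```python
import math
import random


def smote(minority, smote_percentage=100, k=1, seed=0):
    """Generate synthetic points between minority samples and their neighbors."""
    rng = random.Random(seed)
    per_sample = smote_percentage // 100  # multiples of 100 only in this sketch
    synthetic = []
    for sample in minority:
        others = [point for point in minority if point is not sample]
        # The k nearest minority neighbors by Euclidean distance.
        neighbors = sorted(others, key=lambda point: math.dist(sample, point))[:k]
        for _ in range(per_sample):
            neighbor = rng.choice(neighbors)
            gap = rng.random()  # position along the segment, in [0, 1)
            synthetic.append(tuple(a + gap * (b - a)
                                   for a, b in zip(sample, neighbor)))
    return synthetic


minority = [(0.0, 0.0), (1.0, 0.0), (4.0, 4.0)]
synthetic = smote(minority, smote_percentage=200, k=1, seed=0)
```

With three minority samples and a percentage of 200, the sketch produces six synthetic points, each lying on the segment between a sample and its nearest neighbor.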

example.assets.example_components.azureml_split_data(dataset: Optional[pathlib.Path] = None, splitting_mode: example.assets.example_components._assets._AzuremlSplitDataSplittingModeEnum = _AzuremlSplitDataSplittingModeEnum.split_rows, fraction_of_rows_in_the_first_output_dataset: float = 0.5, randomized_split: bool = True, random_seed: int = 0, stratified_split: example.assets.example_components._assets._AzuremlSplitDataStratifiedSplitEnum = _AzuremlSplitDataStratifiedSplitEnum.false, stratification_key_column: Optional[str] = None, regular_expression: str = '"column name" ^start', relational_expression: str = '"column name" > 3') example.assets.example_components._assets._AzuremlSplitDataComponent

Partitions the rows of a dataset into two distinct sets.

Parameters
  • dataset (Path) – Dataset to split

  • splitting_mode (_AzuremlSplitDataSplittingModeEnum) – Choose the method for splitting the dataset (enum: [‘Split Rows’, ‘Regular Expression’, ‘Relative Expression’])

  • fraction_of_rows_in_the_first_output_dataset (float) – Specify a ratio representing the number of rows in the first output dataset over the number of rows in the input dataset (optional, max: 1.0)

  • randomized_split (bool) – Indicate whether rows should be randomly selected (optional)

  • random_seed (int) – Provide a value to seed the random number generator (optional, max: 4294967295)

  • stratified_split (_AzuremlSplitDataStratifiedSplitEnum) – Indicate whether the rows in each split should be grouped using a strata column (optional, enum: [‘True’, ‘False’])

  • stratification_key_column (str) – Select the column containing the stratification key (optional)

  • regular_expression (str) – Type a regular expression to use as criteria when splitting the dataset on a string column (optional)

  • relational_expression (str) – Type a relational expression to use in splitting the dataset on a numeric column (optional)

Output results_dataset1

Dataset containing selected rows

Type

results_dataset1: Output

Output results_dataset2

Dataset containing all other rows

Type

results_dataset2: Output
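
The ‘Split Rows’ mode with a fraction, an optional shuffle, and a seed can be pictured with the following sketch. It is illustrative only: split_rows is a hypothetical helper, and rounding the cut point with round() is an assumption about how fractional row counts are handled.

```python
import random


def split_rows(rows, fraction=0.5, randomized=True, seed=0):
    """Return (first, second), where first holds about fraction of the rows."""
    indices = list(range(len(rows)))
    if randomized:
        # The same seed always produces the same split.
        random.Random(seed).shuffle(indices)
    cut = round(fraction * len(rows))  # assumed rounding rule
    first = [rows[i] for i in indices[:cut]]
    second = [rows[i] for i in indices[cut:]]
    return first, second


rows = list(range(10))
first, second = split_rows(rows, fraction=0.7, seed=0)
```

Every input row lands in exactly one of the two outputs, mirroring results_dataset1 and results_dataset2 above.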

example.assets.example_components.azureml_split_image_directory(input_image_directory: Optional[pathlib.Path] = None, fraction_of_images_in_the_first_output: float = 0.9) example.assets.example_components._assets._AzuremlSplitImageDirectoryComponent

Partitions the images of an image directory into two distinct sets.

Parameters
  • input_image_directory (Path) – Input image directory

  • fraction_of_images_in_the_first_output (float) – Fraction of images in the first output (min: 2.220446049250313e-16, max: 0.9999999999999998)

Output output_image_directory1

First output image directory

Type

output_image_directory1: Output

Output output_image_directory2

Second output image directory

Type

output_image_directory2: Output

example.assets.example_components.azureml_summarize_data(input: Optional[pathlib.Path] = None) example.assets.example_components._assets._AzuremlSummarizeDataComponent

Generates a basic descriptive statistics report for the columns in a dataset.

Parameters

input (Path) – DataFrameDirectory

Output result_dataset

DataFrameDirectory

Type

result_dataset: Output

example.assets.example_components.azureml_train_anomaly_detection_model(untrained_model: Optional[pathlib.Path] = None, dataset: Optional[pathlib.Path] = None) example.assets.example_components._assets._AzuremlTrainAnomalyDetectionModelComponent

Trains an anomaly detector model and labels data from the training set

Parameters
  • untrained_model (Path) – Untrained learner

  • dataset (Path) – Input data source

Output trained_model

Trained anomaly detection model

Type

trained_model: Output

example.assets.example_components.azureml_train_clustering_model(untrained_model: Optional[pathlib.Path] = None, dataset: Optional[pathlib.Path] = None, column_set: Optional[str] = None, check_for_append_or_uncheck_for_result_only: bool = True) example.assets.example_components._assets._AzuremlTrainClusteringModelComponent

Trains a clustering model and assigns data to clusters.

Parameters
  • untrained_model (Path) – Untrained clustering model

  • dataset (Path) – Input data source

  • column_set (str) – Column selection pattern

  • check_for_append_or_uncheck_for_result_only (bool) – Whether the output dataset contains the input dataset with the assignments column appended (Checked) or the assignments column only (Unchecked)

Output trained_model

Trained clustering model

Type

trained_model: Output

Output results_dataset

Input dataset with the assignments column appended, or the assignments column only

Type

results_dataset: Output

example.assets.example_components.azureml_train_model(untrained_model: Optional[pathlib.Path] = None, dataset: Optional[pathlib.Path] = None, label_column: Optional[str] = None, model_explanations: bool = False) example.assets.example_components._assets._AzuremlTrainModelComponent

Trains a classification or regression model in a supervised manner.

Parameters
  • untrained_model (Path) – Untrained learner

  • dataset (Path) – Training data

  • label_column (str) – Select the column that contains the label or outcome

  • model_explanations (bool) – Whether to generate explanations for the trained model. Default is unchecked to reduce extra compute overhead. (optional)

Output trained_model

Trained learner

Type

trained_model: Output

example.assets.example_components.azureml_train_pytorch_model(untrained_model: Optional[pathlib.Path] = None, training_dataset: Optional[pathlib.Path] = None, validation_dataset: Optional[pathlib.Path] = None, epochs: int = 5, batch_size: int = 16, warmup_step_number: int = 0, learning_rate: float = 0.001, random_seed: int = 1, patience: int = 3, print_frequency: int = 10) example.assets.example_components._assets._AzuremlTrainPytorchModelComponent

Trains a PyTorch model from scratch or fine-tunes it.

Parameters
  • untrained_model (Path) – Untrained model

  • training_dataset (Path) – Input dataset for training

  • validation_dataset (Path) – Input dataset for validation

  • epochs (int) – Epochs. (min: 1)

  • batch_size (int) – Batch size. (min: 1)

  • warmup_step_number (int) – Warmup step number (optional)

  • learning_rate (float) – Learning rate. (min: 2.220446049250313e-16, max: 2.0)

  • random_seed (int) – Random seed.

  • patience (int) – Patience. (min: 1)

  • print_frequency (int) – Training log print frequency over iterations in each epoch. (optional, min: 1)

Output trained_model

Trained model

Type

trained_model: Output

example.assets.example_components.azureml_train_svd_recommender(training_dataset_of_user_item_rating_triples: Optional[pathlib.Path] = None, number_of_factors: int = 200, number_of_recommendation_algorithm_iterations: int = 30, learning_rate: float = 0.005) example.assets.example_components._assets._AzuremlTrainSvdRecommenderComponent

Train a collaborative filtering recommendation using the SVD algorithm.

Parameters
  • training_dataset_of_user_item_rating_triples (Path) – Ratings of items by users, expressed as triple (User, Item, Rating)

  • number_of_factors (int) – Specify the number of factors to use with recommendation (min: 1)

  • number_of_recommendation_algorithm_iterations (int) – Specify the maximum number of iterations to perform while training the recommendation model (min: 1)

  • learning_rate (float) – Specify the size of each step in the learning process (min: 2.220446049250313e-16, max: 2.0)

Output trained_svd_recommendation

Trained SVD recommendation

Type

trained_svd_recommendation: Output

example.assets.example_components.azureml_train_vowpal_wabbit_model(pre_trained_vowpal_wabbit_model: Optional[pathlib.Path] = None, training_data: Optional[pathlib.Path] = None, vw_arguments: Optional[str] = None, name_of_the_training_data_file: Optional[str] = None, specify_file_type: example.assets.example_components._assets._AzuremlTrainVowpalWabbitModelSpecifyFileTypeEnum = _AzuremlTrainVowpalWabbitModelSpecifyFileTypeEnum.vw, output_readable_model_file: bool = False, output_inverted_hash_file: bool = False) example.assets.example_components._assets._AzuremlTrainVowpalWabbitModelComponent

Train a Vowpal Wabbit model using the command line interface.

Parameters
  • pre_trained_vowpal_wabbit_model (Path) – Trained Vowpal Wabbit model. (optional)

  • training_data (Path) – Training data.

  • vw_arguments (str) – Type Vowpal Wabbit command line arguments. (optional)

  • name_of_the_training_data_file (str) – Type name of the training data file. (optional)

  • specify_file_type (_AzuremlTrainVowpalWabbitModelSpecifyFileTypeEnum) – Please specify file type. (enum: [‘VW’, ‘SVMLight’])

  • output_readable_model_file (bool) – Output readable model (--readable_model) file.

  • output_inverted_hash_file (bool) – Output inverted hash (--invert_hash) file.

Output trained_vowpal_wabbit_model

Trained Vowpal Wabbit model

Type

trained_vowpal_wabbit_model: Output

example.assets.example_components.azureml_train_wide_and_deep_recommender(training_dataset_of_user_item_rating_triples: Optional[pathlib.Path] = None, user_features: Optional[pathlib.Path] = None, item_features: Optional[pathlib.Path] = None, epochs: int = 15, batch_size: int = 64, wide_part_optimizer: example.assets.example_components._assets._AzuremlTrainWideAndDeepRecommenderWidePartOptimizerEnum = _AzuremlTrainWideAndDeepRecommenderWidePartOptimizerEnum.adagrad, wide_optimizer_learning_rate: float = 0.1, crossed_feature_dimension: int = 1000, deep_part_optimizer: example.assets.example_components._assets._AzuremlTrainWideAndDeepRecommenderDeepPartOptimizerEnum = _AzuremlTrainWideAndDeepRecommenderDeepPartOptimizerEnum.adagrad, deep_optimizer_learning_rate: float = 0.1, user_embedding_dimension: int = 16, item_embedding_dimension: int = 16, categorical_features_embedding_dimension: int = 4, hidden_units: str = '256,128', activation_function: example.assets.example_components._assets._AzuremlTrainWideAndDeepRecommenderActivationFunctionEnum = _AzuremlTrainWideAndDeepRecommenderActivationFunctionEnum.relu, dropout: float = 0.8, batch_normalization: bool = True) example.assets.example_components._assets._AzuremlTrainWideAndDeepRecommenderComponent

Train a recommender based on the Wide & Deep model.

Parameters
  • training_dataset_of_user_item_rating_triples (Path) – Ratings of items by users, expressed as (User, Item, Rating) triples

  • user_features (Path) – User features (optional)

  • item_features (Path) – Item features (optional)

  • epochs (int) – Maximum number of epochs to perform while training (min: 1)

  • batch_size (int) – Number of consecutive samples to combine in a single batch (min: 1)

  • wide_part_optimizer (_AzuremlTrainWideAndDeepRecommenderWidePartOptimizerEnum) – Optimizer used to apply gradients to the wide part of the model (enum: [‘Adagrad’, ‘Adam’, ‘Ftrl’, ‘RMSProp’, ‘SGD’, ‘Adadelta’])

  • wide_optimizer_learning_rate (float) – Size of each step in the learning process for the wide part of the model (min: 2.220446049250313e-16, max: 2.0)

  • crossed_feature_dimension (int) – Crossed feature dimension for the wide part of the model (min: 1)

  • deep_part_optimizer (_AzuremlTrainWideAndDeepRecommenderDeepPartOptimizerEnum) – Optimizer used to apply gradients to the deep part of the model (enum: [‘Adagrad’, ‘Adam’, ‘Ftrl’, ‘RMSProp’, ‘SGD’, ‘Adadelta’])

  • deep_optimizer_learning_rate (float) – Size of each step in the learning process for the deep part of the model (min: 2.220446049250313e-16, max: 2.0)

  • user_embedding_dimension (int) – User embedding dimension for the deep part of the model (min: 1)

  • item_embedding_dimension (int) – Item embedding dimension for the deep part of the model (min: 1)

  • categorical_features_embedding_dimension (int) – Categorical features embedding dimension for the deep part of the model (min: 1)

  • hidden_units (str) – Hidden units per layer for the deep part of the model

  • activation_function (_AzuremlTrainWideAndDeepRecommenderActivationFunctionEnum) – Activation function applied to each layer in the deep part of the model (enum: [‘ReLU’, ‘Sigmoid’, ‘Tanh’, ‘Linear’, ‘LeakyReLU’])

  • dropout (float) – Probability that each element is dropped in the deep part of the model (max: 1.0)

  • batch_normalization (bool) – Whether to use batch normalization after each hidden layer

Output trained_wide_and_deep_recommendation_model

Trained Wide and Deep recommendation model

Type

trained_wide_and_deep_recommendation_model: Output
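
As a rough illustration of the documented constraints, the sketch below checks a set of keyword arguments against the bounds listed above before they would be passed to azureml_train_wide_and_deep_recommender, and splits hidden_units into layer sizes. The names BOUNDS, check_bounds and parse_hidden_units are hypothetical and not part of the package. Reading hidden_units as comma-separated integers is an assumption based on the default '256,128'.

```python
import sys

# The documented minimum 2.220446049250313e-16 is the double-precision epsilon.
EPS = sys.float_info.epsilon

# (min, max) bounds copied by hand from the parameter list above;
# None means no bound is documented. Hypothetical table, not package API.
BOUNDS = {
    "epochs": (1, None),
    "batch_size": (1, None),
    "wide_optimizer_learning_rate": (EPS, 2.0),
    "crossed_feature_dimension": (1, None),
    "deep_optimizer_learning_rate": (EPS, 2.0),
    "user_embedding_dimension": (1, None),
    "item_embedding_dimension": (1, None),
    "categorical_features_embedding_dimension": (1, None),
    "dropout": (None, 1.0),
}


def check_bounds(params):
    """Return a list of violations (empty if every value is within bounds)."""
    problems = []
    for name, value in params.items():
        low, high = BOUNDS.get(name, (None, None))
        if low is not None and value < low:
            problems.append(f"{name}={value} is below the minimum {low}")
        if high is not None and value > high:
            problems.append(f"{name}={value} is above the maximum {high}")
    return problems


def parse_hidden_units(spec):
    """Split a string such as '256,128' into a list of layer sizes."""
    return [int(part) for part in spec.split(",") if part.strip()]


print(check_bounds({"epochs": 0, "dropout": 0.8}))
print(parse_hidden_units("256,128"))
```

Checking values locally like this only catches out-of-range arguments early; any validation the component itself performs is not described in this reference.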

example.assets.example_components.azureml_tune_model_hyperparameters(untrained_model: Optional[pathlib.Path] = None, training_dataset: Optional[pathlib.Path] = None, optional_validation_dataset: Optional[pathlib.Path] = None, specify_parameter_sweeping_mode: example.assets.example_components._assets._AzuremlTuneModelHyperparametersSpecifyParameterSweepingModeEnum = _AzuremlTuneModelHyperparametersSpecifyParameterSweepingModeEnum.random_sweep, maximum_number_of_runs_on_random_sweep: int = 5, random_seed: int = 0, name_or_numerical_index_of_the_label_column: Optional[str] = None, metric_for_measuring_performance_for_classification: example.assets.example_components._assets._AzuremlTuneModelHyperparametersMetricForMeasuringPerformanceForClassificationEnum = _AzuremlTuneModelHyperparametersMetricForMeasuringPerformanceForClassificationEnum.accuracy, metric_for_measuring_performance_for_regression: example.assets.example_components._assets._AzuremlTuneModelHyperparametersMetricForMeasuringPerformanceForRegressionEnum = _AzuremlTuneModelHyperparametersMetricForMeasuringPerformanceForRegressionEnum.mean_absolute_error) example.assets.example_components._assets._AzuremlTuneModelHyperparametersComponent

Perform a parameter sweep on the model to determine the optimum parameter settings.

Parameters
  • untrained_model (Path) – Untrained model for parameter sweep

  • training_dataset (Path) – Input dataset for training

  • optional_validation_dataset (Path) – Input dataset for validation (for Train/Test validation mode) (optional)

  • specify_parameter_sweeping_mode (_AzuremlTuneModelHyperparametersSpecifyParameterSweepingModeEnum) – Sweep the entire grid of the parameter space, or sweep using a limited number of sample runs (enum: [‘Entire grid’, ‘Random sweep’])

  • maximum_number_of_runs_on_random_sweep (int) – Maximum number of runs to execute when using random sweep (optional, min: 1, max: 10000)

  • random_seed (int) – Provide a value to seed the random number generator (optional, max: 4294967295)

  • name_or_numerical_index_of_the_label_column (str) – Label column

  • metric_for_measuring_performance_for_classification (_AzuremlTuneModelHyperparametersMetricForMeasuringPerformanceForClassificationEnum) – Select the metric used for evaluating classification models (enum: [‘Accuracy’, ‘Precision’, ‘Recall’, ‘F-score’, ‘AUC’, ‘Average Log Loss’])

  • metric_for_measuring_performance_for_regression (_AzuremlTuneModelHyperparametersMetricForMeasuringPerformanceForRegressionEnum) – Select the metric used for evaluating regression models (enum: [‘Mean absolute error’, ‘Root of mean squared error’, ‘Relative absolute error’, ‘Relative squared error’, ‘Coefficient of determination’])

Output sweep_results

Metric results for the parameter sweep runs

Type

sweep_results: Output

Output trained_best_model

Model with best performance on the training dataset

Type

trained_best_model: Output
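
The difference between the two sweeping modes can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the component's implementation: entire_grid and random_sweep are hypothetical helpers, the parameter space is made up, and the random sweep is assumed to draw distinct combinations from the grid using the given seed.

```python
import itertools
import random


def entire_grid(space):
    """Enumerate every combination in a {name: [values]} parameter space."""
    names = list(space)
    return [dict(zip(names, combo)) for combo in itertools.product(*space.values())]


def random_sweep(space, max_runs, seed=0):
    """Pick at most max_runs distinct combinations, reproducibly for a seed."""
    grid = entire_grid(space)
    rng = random.Random(seed)
    return rng.sample(grid, min(max_runs, len(grid)))


# Made-up parameter space for illustration.
space = {"learning_rate": [0.1, 0.5, 1.0], "iterations": [1, 10]}
print(len(entire_grid(space)))      # 6 combinations (3 x 2)
print(len(random_sweep(space, 5)))  # 5 combinations
```

Here max_runs plays the role of maximum_number_of_runs_on_random_sweep and seed the role of random_seed. Each sampled combination would then be trained and scored with the selected metric.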

example.assets.example_components.azureml_two_class_averaged_perceptron(create_trainer_mode: example.assets.example_components._assets._AzuremlTwoClassAveragedPerceptronCreateTrainerModeEnum = _AzuremlTwoClassAveragedPerceptronCreateTrainerModeEnum.singleparameter, initial_learning_rate: float = 1.0, maximum_number_of_iterations: int = 10, range_for_initial_learning_rate: str = '0.1; 0.5; 1.0', range_for_maximum_number_of_iterations: str = '1; 10', random_number_seed: Optional[int] = None) example.assets.example_components._assets._AzuremlTwoClassAveragedPerceptronComponent

Creates an averaged perceptron binary classification model.

Parameters
  • create_trainer_mode (_AzuremlTwoClassAveragedPerceptronCreateTrainerModeEnum) – Create advanced learner options (enum: [‘SingleParameter’, ‘ParameterRange’])

  • initial_learning_rate (float) – The initial learning rate for the Stochastic Gradient Descent optimizer. (optional, min: 2.220446049250313e-16)

  • maximum_number_of_iterations (int) – The number of Stochastic Gradient Descent iterations to be performed over the training dataset. (optional, min: 1)

  • range_for_initial_learning_rate (str) – Range for initial learning rate for the Stochastic Gradient Descent optimizer. (optional)

  • range_for_maximum_number_of_iterations (str) – Range for the number of Stochastic Gradient Descent iterations to be performed over the training dataset. (optional)

  • random_number_seed (int) – The seed for the random number generator used by the model. Leave blank for default. (optional, max: 4294967295)

Output untrained_model

An untrained binary classification model that can be connected to the Create One-vs-All Multi-class Classifier, Train Generic Model, or Cross Validate Model modules.

Type

untrained_model: Output
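
The range_for_* defaults ('0.1; 0.5; 1.0' and '1; 10') suggest semicolon-separated lists of candidate values, used when create_trainer_mode is ParameterRange. The sketch below shows one way such a string could be read. It assumes that format, and parse_range is a hypothetical helper rather than a function of this package.

```python
def parse_range(spec, cast=float):
    """Split a semicolon-separated string into a list of numbers."""
    return [cast(item) for item in spec.split(";") if item.strip()]


learning_rates = parse_range("0.1; 0.5; 1.0")
iterations = parse_range("1; 10", cast=int)
print(learning_rates, iterations)
```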

example.assets.example_components.azureml_two_class_boosted_decision_tree(create_trainer_mode: example.assets.example_components._assets._AzuremlTwoClassBoostedDecisionTreeCreateTrainerModeEnum = _AzuremlTwoClassBoostedDecisionTreeCreateTrainerModeEnum.singleparameter, maximum_number_of_leaves_per_tree: int = 20, minimum_number_of_training_instances_required_to_form_a_leaf: int = 10, the_learning_rate: float = 0.2, total_number_of_trees_constructed: int = 100, range_for_maximum_number_of_leaves_per_tree: str = '2; 8; 32; 128', range_for_minimum_number_of_training_instances_required_to_form_a_leaf: str = '1; 10; 50', range_for_learning_rate: str = '0.025; 0.05; 0.1; 0.2; 0.4', range_for_total_number_of_trees_constructed: str = '20; 100; 500', random_number_seed: Optional[int] = None) example.assets.example_components._assets._AzuremlTwoClassBoostedDecisionTreeComponent

Creates a binary classifier using a boosted decision tree algorithm.

Parameters
  • create_trainer_mode (_AzuremlTwoClassBoostedDecisionTreeCreateTrainerModeEnum) – Create advanced learner options (enum: [‘SingleParameter’, ‘ParameterRange’])

  • maximum_number_of_leaves_per_tree (int) – Specify the maximum number of leaves allowed per tree (optional, min: 2, max: 131072)

  • minimum_number_of_training_instances_required_to_form_a_leaf (int) – Specify the minimum number of cases required to form a leaf (optional, min: 1)

  • the_learning_rate (float) – Specify the initial learning rate (optional, min: 2.220446049250313e-16, max: 1.0)

  • total_number_of_trees_constructed (int) – Specify the maximum number of trees that can be created during training (optional, min: 1)

  • range_for_maximum_number_of_leaves_per_tree (str) – Specify range for the maximum number of leaves allowed per tree (optional)

  • range_for_minimum_number_of_training_instances_required_to_form_a_leaf (str) – Specify the range for the minimum number of cases required to form a leaf (optional)

  • range_for_learning_rate (str) – Specify the range for the initial learning rate (optional)

  • range_for_total_number_of_trees_constructed (str) – Specify the range for the maximum number of trees that can be created during training (optional)

  • random_number_seed (int) – Type a value to seed the random number generator used by the model. Leave blank for default. (optional, max: 4294967295)

Output untrained_model

An untrained binary classification model

Type

untrained_model: Output

example.assets.example_components.azureml_two_class_decision_forest(create_trainer_mode: example.assets.example_components._assets._AzuremlTwoClassDecisionForestCreateTrainerModeEnum = _AzuremlTwoClassDecisionForestCreateTrainerModeEnum.singleparameter, number_of_decision_trees: int = 8, maximum_depth_of_the_decision_trees: int = 32, minimum_number_of_samples_per_leaf_node: int = 1, range_for_number_of_decision_trees: str = '1; 8; 32', range_for_the_maximum_depth_of_the_decision_trees: str = '1; 16; 64', range_for_the_minimum_number_of_samples_per_leaf_node: str = '1; 4; 16', resampling_method: example.assets.example_components._assets._AzuremlTwoClassDecisionForestResamplingMethodEnum = _AzuremlTwoClassDecisionForestResamplingMethodEnum.bagging_resampling) example.assets.example_components._assets._AzuremlTwoClassDecisionForestComponent

Creates a two-class classification model using the decision forest algorithm.

Parameters
  • create_trainer_mode (_AzuremlTwoClassDecisionForestCreateTrainerModeEnum) – Create advanced learner options (enum: [‘SingleParameter’, ‘ParameterRange’])

  • number_of_decision_trees (int) – Specify the number of decision trees to create in the ensemble (optional, min: 1)

  • maximum_depth_of_the_decision_trees (int) – Specify the maximum depth of any decision tree that can be created in the ensemble (optional, min: 1)

  • minimum_number_of_samples_per_leaf_node (int) – Specify the minimum number of training samples required to generate a leaf node (optional, min: 1)

  • range_for_number_of_decision_trees (str) – Specify range for the number of decision trees to create in the ensemble (optional)

  • range_for_the_maximum_depth_of_the_decision_trees (str) – Specify range for the maximum depth of the decision trees (optional)

  • range_for_the_minimum_number_of_samples_per_leaf_node (str) – Specify range for the minimum number of samples per leaf node (optional)

  • resampling_method (_AzuremlTwoClassDecisionForestResamplingMethodEnum) – Choose a resampling method (enum: [‘Bagging Resampling’, ‘Replicate Resampling’])

Output untrained_model

An untrained binary classification model

Type

untrained_model: Output

example.assets.example_components.azureml_two_class_logistic_regression(create_trainer_mode: example.assets.example_components._assets._AzuremlTwoClassLogisticRegressionCreateTrainerModeEnum = _AzuremlTwoClassLogisticRegressionCreateTrainerModeEnum.singleparameter, optimization_tolerance: float = 1e-07, l2_regularizaton_weight: float = 1.0, range_for_optimization_tolerance: str = '0.00001; 0.00000001', range_for_l2_regularization_weight: str = '0.01; 0.1; 1.0', random_number_seed: Optional[int] = None) example.assets.example_components._assets._AzuremlTwoClassLogisticRegressionComponent

Creates a two-class logistic regression model.

Parameters
  • create_trainer_mode (_AzuremlTwoClassLogisticRegressionCreateTrainerModeEnum) – Create advanced learner options (enum: [‘SingleParameter’, ‘ParameterRange’])

  • optimization_tolerance (float) – Specify a tolerance value for the L-BFGS optimizer (optional, min: 2.220446049250313e-16)

  • l2_regularizaton_weight (float) – Specify the L2 regularization weight. Use a non-zero value to avoid overfitting. (optional)

  • range_for_optimization_tolerance (str) – Specify a range for the tolerance value for the L-BFGS optimizer (optional)

  • range_for_l2_regularization_weight (str) – Specify the range for the L2 regularization weight. Use a non-zero value to avoid overfitting. (optional)

  • random_number_seed (int) – Type a value to seed the random number generator used by the model. Leave blank for default. (optional, max: 4294967295)

Output untrained_model

An untrained classification model

Type

untrained_model: Output

example.assets.example_components.azureml_two_class_neural_network(create_trainer_mode: example.assets.example_components._assets._AzuremlTwoClassNeuralNetworkCreateTrainerModeEnum = _AzuremlTwoClassNeuralNetworkCreateTrainerModeEnum.singleparameter, hidden_layer_specification: example.assets.example_components._assets._AzuremlTwoClassNeuralNetworkHiddenLayerSpecificationEnum = _AzuremlTwoClassNeuralNetworkHiddenLayerSpecificationEnum.fully_connected_case, number_of_hidden_nodes: str = '100', the_learning_rate: float = 0.1, number_of_learning_iterations: int = 100, hidden_layer_specification1: example.assets.example_components._assets._AzuremlTwoClassNeuralNetworkHiddenLayerSpecification1Enum = _AzuremlTwoClassNeuralNetworkHiddenLayerSpecification1Enum.fully_connected_case, number_of_hidden_nodes1: str = '100', range_for_learning_rate: str = '0.1; 0.2; 0.4', range_for_number_of_learning_iterations: str = '20; 40; 80; 160', the_momentum: float = 0, shuffle_examples: bool = True, random_number_seed: Optional[int] = None) example.assets.example_components._assets._AzuremlTwoClassNeuralNetworkComponent

Creates a binary classifier using a neural network algorithm.

Parameters
  • create_trainer_mode (_AzuremlTwoClassNeuralNetworkCreateTrainerModeEnum) – Create advanced learner options (enum: [‘SingleParameter’, ‘ParameterRange’])

  • hidden_layer_specification (_AzuremlTwoClassNeuralNetworkHiddenLayerSpecificationEnum) – Specify the architecture of the hidden layer or layers (optional, enum: [‘Fully-connected case’])

  • number_of_hidden_nodes (str) – Type the number of nodes in the hidden layer. For multiple hidden layers, type a comma-separated list. (optional)

  • the_learning_rate (float) – Specify the size of each step in the learning process (optional, min: 2.220446049250313e-16, max: 2.0)

  • number_of_learning_iterations (int) – Specify the number of iterations while learning (optional, min: 1)

  • hidden_layer_specification1 (_AzuremlTwoClassNeuralNetworkHiddenLayerSpecification1Enum) – Specify the architecture of the hidden layer or layers for range (optional, enum: [‘Fully-connected case’])

  • number_of_hidden_nodes1 (str) – Type the number of nodes in the hidden layer, or for multiple hidden layers, type a comma-separated list. (optional)

  • range_for_learning_rate (str) – Specify the range for the size of each step in the learning process (optional)

  • range_for_number_of_learning_iterations (str) – Specify the range for the number of iterations while learning (optional)

  • the_momentum (float) – Specify a weight to apply during learning to nodes from previous iterations (max: 1.0)

  • shuffle_examples (bool) – Select this option to change the order of instances between learning iterations

  • random_number_seed (int) – Specify a numeric seed to use for random number generation. Leave blank to use the default seed. (optional, max: 4294967295)

Output untrained_model

An untrained binary classification model

Type

untrained_model: Output

example.assets.example_components.azureml_two_class_support_vector_machine(create_trainer_mode: example.assets.example_components._assets._AzuremlTwoClassSupportVectorMachineCreateTrainerModeEnum = _AzuremlTwoClassSupportVectorMachineCreateTrainerModeEnum.singleparameter, number_of_iterations: int = 10, the_value_lambda: float = 0.001, range_for_number_of_iterations: str = '1; 10; 100', range_for_lambda: str = '0.00001; 0.0001; 0.001; 0.01; 0.1', normalize_the_features: bool = True, random_number_seed: Optional[int] = None) example.assets.example_components._assets._AzuremlTwoClassSupportVectorMachineComponent

Creates a binary classification model using the Support Vector Machine algorithm.

Parameters
  • create_trainer_mode (_AzuremlTwoClassSupportVectorMachineCreateTrainerModeEnum) – Create advanced learner options (enum: [‘SingleParameter’, ‘ParameterRange’])

  • number_of_iterations (int) – The number of iterations. (optional, min: 1)

  • the_value_lambda (float) – Weight for L1 regularization. Using a non-zero value avoids overfitting the model to the training dataset. (optional, min: 2.220446049250313e-16)

  • range_for_number_of_iterations (str) – The range for the number of iterations. (optional)

  • range_for_lambda (str) – Weight range for the L1 regularization. Using a non-zero value avoids overfitting the model to the training dataset. (optional)

  • normalize_the_features (bool) – If true, normalize the features.

  • random_number_seed (int) – The seed for the random number generator used by the model. Leave blank for default. (optional, max: 4294967295)

Output untrained_model

An untrained binary classification model that can be connected to the Create One-vs-All Multiclass Classification Model, Train Generic Model, or Cross Validate Model modules.

Type

untrained_model: Output

example.assets.example_components.bing_relevance_convert2ss(TextData: Optional[pathlib.Path] = None, ExtractionClause: Optional[str] = None) example.assets.example_components._assets._BingRelevanceConvert2SsComponent

Convert ADLS test data to SS format

Parameters
  • TextData (Path) – relative path on ADLS storage

  • ExtractionClause (str) – the extraction clause, something like “column1:string, column2:int”

Output SSPath

output path of ss

Type

SSPath: Output
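
To make the ExtractionClause shape concrete, the sketch below splits a clause such as "column1:string, column2:int" into (name, type) pairs. It is only an illustration of the string format shown above. parse_extraction_clause is a hypothetical helper, and how the component actually consumes the clause is not documented here.

```python
def parse_extraction_clause(clause):
    """Turn 'column1:string, column2:int' into a list of (name, type) pairs."""
    columns = []
    for item in clause.split(","):
        name, _, type_name = item.strip().partition(":")
        if not name.strip() or not type_name.strip():
            raise ValueError(f"expected 'name:type', got {item!r}")
        columns.append((name.strip(), type_name.strip()))
    return columns


print(parse_extraction_clause("column1:string, column2:int"))
```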

example.assets.example_components.bing_relevance_convert2ss_isresource(TextData: Optional[pathlib.Path] = None, ExtractionClause: Optional[str] = None) example.assets.example_components._assets._BingRelevanceConvert2SsIsresourceComponent

Convert ADLS test data to SS format

Parameters
  • TextData (Path) – relative path on ADLS storage

  • ExtractionClause (str) – the extraction clause, something like “column1:string, column2:int”

Output SSPath

output path of ss

Type

SSPath: Output

example.assets.example_components.fine_tune_for_huggingface_text_classification(model: Optional[pathlib.Path] = None, dataset: Optional[pathlib.Path] = None, max_seq_length: int = 128, per_device_train_batch_size: int = 8, learning_rate: float = 5e-05, num_train_epochs: int = 1) example.assets.example_components._assets._FineTuneForHuggingfaceTextClassificationComponent
Parameters
  • model (Path) – path

  • dataset (Path) – path

  • max_seq_length (int) – The maximum total input sequence length after tokenization. Sequences longer than this will be truncated, sequences shorter will be padded. (optional)

  • per_device_train_batch_size (int) – Batch size per GPU/TPU core/CPU for training. (optional)

  • learning_rate (float) – The initial learning rate for AdamW. (optional)

  • num_train_epochs (int) – Total number of training epochs to perform. (optional)

Output output_model

path

Type

output_model: Output
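
The effect of max_seq_length described above (longer sequences truncated, shorter ones padded) can be sketched on a plain list of token ids. This is a simplified illustration: fit_to_length is a hypothetical helper, pad_id=0 is an assumed padding id, and a real tokenizer would also handle special tokens and attention masks.

```python
def fit_to_length(token_ids, max_seq_length=128, pad_id=0):
    """Truncate or right-pad a list of token ids to exactly max_seq_length."""
    clipped = token_ids[:max_seq_length]
    return clipped + [pad_id] * (max_seq_length - len(clipped))


print(fit_to_length([7, 8, 9], max_seq_length=5))           # padded
print(fit_to_length(list(range(10)), max_seq_length=4))     # truncated
```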

example.assets.example_components.fine_tune_for_huggingface_text_generation(model: Optional[pathlib.Path] = None, dataset: Optional[pathlib.Path] = None, per_device_train_batch_size: int = 8, learning_rate: float = 5e-05, num_train_epochs: int = 1) example.assets.example_components._assets._FineTuneForHuggingfaceTextGenerationComponent
Parameters
  • model (Path) – path

  • dataset (Path) – path

  • per_device_train_batch_size (int) – Batch size per GPU/TPU core/CPU for training. (optional)

  • learning_rate (float) – The initial learning rate for AdamW. (optional)

  • num_train_epochs (int) – Total number of training epochs to perform. (optional)

Output output_model

path

Type

output_model: Output

example.assets.example_components.fine_tune_for_huggingface_token_classification(model: Optional[pathlib.Path] = None, dataset: Optional[pathlib.Path] = None, max_seq_length: int = 128, per_device_train_batch_size: int = 8, learning_rate: float = 5e-05, num_train_epochs: int = 1) example.assets.example_components._assets._FineTuneForHuggingfaceTokenClassificationComponent
Parameters
  • model (Path) – path

  • dataset (Path) – path

  • max_seq_length (int) – The maximum total input sequence length after tokenization. Sequences longer than this will be truncated, sequences shorter will be padded. (optional)

  • per_device_train_batch_size (int) – Batch size per GPU/TPU core/CPU for training. (optional)

  • learning_rate (float) – The initial learning rate for AdamW. (optional)

  • num_train_epochs (int) – Total number of training epochs to perform. (optional)

Output output_model

path

Type

output_model: Output

example.assets.example_components.microsoft_com_azureml_samples_hello_world_with_cpu_image(input_path: Optional[pathlib.Path] = None, string_parameter: Optional[str] = None) example.assets.example_components._assets._MicrosoftComAzuremlSamplesHelloWorldWithCpuImageComponent

A hello world tutorial to create a module for ml.azure.com.

Parameters
  • input_path (Path) – The directory containing the dataframe.

  • string_parameter (str) – A parameter that accepts a string value. (optional)

Output output_path

The directory containing a dataframe.

Type

output_path: Output

example.assets.example_components.microsoft_com_azureml_samples_parallel_copy_files_v1(input_folder: Optional[pathlib.Path] = None) example.assets.example_components._assets._MicrosoftComAzuremlSamplesParallelCopyFilesV1Component

A sample Parallel module to copy files.

Parameters
  • input_folder (Path) – AnyDirectory

Output output_folder

Output files

Type

output_folder: Output

example.assets.example_components.microsoft_com_azureml_samples_sweep_train(training_data: Optional[pathlib.Path] = None, max_epochs: Optional[int] = None, learning_rate: Optional[float] = None, subsample: Optional[float] = None) example.assets.example_components._assets._MicrosoftComAzuremlSamplesSweepTrainComponent

A dummy train component

Parameters
  • training_data (Path) – Training data organized in the torchvision format/structure

  • max_epochs (int) – Maximum number of epochs for the training

  • learning_rate (float) – learning_rate (min: 0.001, max: 0.1)

  • subsample (float) – subsample (min: 0.1, max: 0.5)

Output saved_model

path

Type

saved_model: Output

Output other_output

path

Type

other_output: Output

example.assets.example_components.microsoft_com_azureml_samples_train_in_spark(input_path: Optional[pathlib.Path] = None, regularization_rate: float = 0.01) example.assets.example_components._assets._MicrosoftComAzuremlSamplesTrainInSparkComponent

Train a Spark ML model using an HDInsight Spark cluster

Parameters
  • input_path (Path) – Iris csv file

  • regularization_rate (float) – Regularization rate when training with logistic regression (optional)

Output output_path

The output path to save the trained model to

Type

output_path: Output

example.assets.example_components.microsoft_com_azureml_samples_tune(training_data: Optional[pathlib.Path] = None, max_epochs: Optional[int] = None, learning_rate: Optional[float] = None, subsample: Optional[float] = None) example.assets.example_components._assets._MicrosoftComAzuremlSamplesTuneComponent

A dummy hyperparameter tuning component

Parameters
  • training_data (Path) – Training data organized in the torchvision format/structure

  • max_epochs (int) – Maximum number of epochs for the training

  • learning_rate (float) – learning_rate (min: 0.001, max: 0.1)

  • subsample (float) – subsample (min: 0.1, max: 0.5)

Output best_model

model

Type

best_model: Output

Output saved_model

path

Type

saved_model: Output

Output other_output

path

Type

other_output: Output

example.assets.example_components.score_for_huggingface_text_classification(model: Optional[pathlib.Path] = None, dataset: Optional[pathlib.Path] = None, max_seq_length: int = 128) example.assets.example_components._assets._ScoreForHuggingfaceTextClassificationComponent
Parameters
  • model (Path) – path

  • dataset (Path) – path

  • max_seq_length (int) – The maximum total input sequence length after tokenization. Sequences longer than this will be truncated, sequences shorter will be padded. (optional)

Output output_dir

path

Type

output_dir: Output

example.assets.example_components.score_for_huggingface_text_generation(model: Optional[pathlib.Path] = None, dataset: Optional[pathlib.Path] = None) example.assets.example_components._assets._ScoreForHuggingfaceTextGenerationComponent
Parameters
  • model (Path) – path

  • dataset (Path) – path

Output output_dir

path

Type

output_dir: Output

example.assets.example_components.score_for_huggingface_token_classification(model: Optional[pathlib.Path] = None, dataset: Optional[pathlib.Path] = None, max_seq_length: int = 128) example.assets.example_components._assets._ScoreForHuggingfaceTokenClassificationComponent
Parameters
  • model (Path) – path

  • dataset (Path) – path

  • max_seq_length (int) – The maximum total input sequence length after tokenization. Sequences longer than this will be truncated, sequences shorter will be padded. (optional)

Output output_dir

path

Type

output_dir: Output

example.assets.example_components.sweep_for_huggingface_text_classification(model: Optional[pathlib.Path] = None, dataset: Optional[pathlib.Path] = None, max_seq_length: Optional[int] = None, per_device_train_batch_size: int = 8, learning_rate: Optional[float] = None, num_train_epochs: int = 1) example.assets.example_components._assets._SweepForHuggingfaceTextClassificationComponent
Parameters
  • model (Path) – path

  • dataset (Path) – path

  • max_seq_length (int) – The maximum total input sequence length after tokenization. Sequences longer than this will be truncated, sequences shorter will be padded. (optional)

  • per_device_train_batch_size (int) – Batch size per GPU/TPU core/CPU for training. (optional)

  • learning_rate (float) – The initial learning rate for AdamW. (optional)

  • num_train_epochs (int) – Total number of training epochs to perform. (optional)

Output output_model

path

Type

output_model: Output

example.assets.example_components.sweep_for_huggingface_text_generation(model: Optional[pathlib.Path] = None, dataset: Optional[pathlib.Path] = None, per_device_train_batch_size: int = 8, learning_rate: Optional[float] = None, num_train_epochs: int = 1) example.assets.example_components._assets._SweepForHuggingfaceTextGenerationComponent
Parameters
  • model (Path) – path

  • dataset (Path) – path

  • per_device_train_batch_size (int) – Batch size per GPU/TPU core/CPU for training. (optional)

  • learning_rate (float) – The initial learning rate for AdamW. (optional)

  • num_train_epochs (int) – Total number of training epochs to perform. (optional)

Output output_model

path

Type

output_model: Output

example.assets.example_components.sweep_for_huggingface_token_classification(model: Optional[pathlib.Path] = None, dataset: Optional[pathlib.Path] = None, max_seq_length: Optional[int] = None, per_device_train_batch_size: int = 8, learning_rate: Optional[float] = None, num_train_epochs: int = 1) example.assets.example_components._assets._SweepForHuggingfaceTokenClassificationComponent
Parameters
  • model (Path) – path

  • dataset (Path) – path

  • max_seq_length (int) – The maximum total input sequence length after tokenization. Sequences longer than this will be truncated, sequences shorter will be padded. (optional)

  • per_device_train_batch_size (int) – Batch size per GPU/TPU core/CPU for training. (optional)

  • learning_rate (float) – The initial learning rate for AdamW. (optional)

  • num_train_epochs (int) – Total number of training epochs to perform. (optional)

Output output_model

path

Type

output_model: Output