ML to Predict Formulas:

Still in progress

Four labels:

more: Structured: predict an equation from a text formula. (important)
less: Noise: leave nonsense alone. (won't equate)
same: Well-trained label: an existing equation. (stronger)
unknown: Unstructured: not sure.

Features are words or formulas. A feature can only be added once you know how it relates to the other features.

Good
Bad
Amount
Frequency
Distance
Direction
Time
...
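
As a rough sketch of the rule above (a feature is usable only once it is registered alongside the others), the helper below turns named values into a row in a fixed order. The names make_row and FEATURE_NAMES are hypothetical and not part of the original code, and defaulting a missing feature to 0 is an assumption made for illustration.

```python
# Hypothetical helper: build one feature row in a fixed, known order.
FEATURE_NAMES = ["Good", "Bad", "Amount"]  # same order as dataset['feature_name']


def make_row(values, feature_names=FEATURE_NAMES):
    """Return the feature values as a list in feature_names order.

    Names that are not registered raise KeyError, because a feature
    should only be used once its relation to the others is known.
    Registered names that are missing default to 0 (an assumption).
    """
    unknown = set(values) - set(feature_names)
    if unknown:
        raise KeyError(f"unregistered feature(s): {sorted(unknown)}")
    return [values.get(name, 0) for name in feature_names]


row = make_row({"Good": 10, "Amount": 1})
print(row)  # [10, 0, 1]
```

Passing a name such as "Time" before it has been added to FEATURE_NAMES raises KeyError, which keeps half-defined features out of the training data.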

The first training formulas are the obvious ones:

import numpy as np

dataset = {}
dataset['target_name'] = np.array(['More', 'Less', 'Same', 'Unknown'])
# which means targets are identified by positions 0, 1, 2, 3

# Currently has 3 features; add more once you know how a new feature's value relates to the others
dataset['feature_name'] = np.array(['Good', 'Bad', 'Amount'])
BIG = 9999.0**9  # float, so NumPy stores float64 instead of falling back to dtype=object
dataset['feature_value'] = np.array([
    [BIG, -BIG, BIG],   # target More, position 0
    [-BIG, BIG, 0],     # target Less, position 1
    [BIG, BIG, BIG],    # target Same, position 2
    [1, -1, 1],         # target Unknown, position 3
])

# Each feature_value row maps to the target at the same position
dataset['target'] = np.array([0, 1, 2, 3])


################
# Now let's train a model
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=1)

knn.fit(dataset['feature_value'], dataset['target'])

###############
# Now let's predict
# 10 Good, 0 Bad, 1 Amount
what_is = np.array([[10, 0, 1]])

prediction = knn.predict(what_is)

print(dataset['target_name'][prediction])
# predict_proba is a method of the classifier, not of the prediction array
print(knn.predict_proba(what_is))
# Answer is: Unknown
# Certainty: [[0. 0. 0. 1.]]
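
To see why the answer comes out as Unknown, the sketch below recomputes the nearest-neighbour step by hand using only the standard library. It assumes plain Euclidean distance, which is what KNeighborsClassifier uses with its default settings. The names training_rows and query are illustrative and not part of the code above.

```python
import math

BIG = 9999.0 ** 9  # same magnitude as the extreme training values

# One training row per label, in the order Good, Bad, Amount.
training_rows = {
    "More":    [BIG, -BIG, BIG],
    "Less":    [-BIG, BIG, 0.0],
    "Same":    [BIG, BIG, BIG],
    "Unknown": [1.0, -1.0, 1.0],
}

query = [10.0, 0.0, 1.0]  # 10 Good, 0 Bad, 1 Amount

# Euclidean distance from the query to every training row.
distances = {label: math.dist(query, row) for label, row in training_rows.items()}

# With n_neighbors=1 the prediction is simply the label of the closest row.
nearest = min(distances, key=distances.get)

for label, d in distances.items():
    print(f"{label:8s} {d:.3e}")
print("nearest:", nearest)
```

The three extreme rows sit around 1e36 away from any small input, while the Unknown row is about 9 away, so every query with ordinary-sized values lands on Unknown. With a single neighbour the reported certainty is also always 0 or 1, so it says nothing about how close the match really was.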