What is a web demonstrator?
A demonstrator is a way to showcase research work.
In which format should my data be?
Your data must also satisfy the following constraints:
- 1. The dataset must have a header which contains the column names (typically the names of sensors and a time column).
- 2. The first column of the dataset must be the time/timestamp column, and furthermore must be named time.
- 3. The dataset must be normalized.
- 4. There can't be any missing or NULL values in the dataset. Make sure your data is valid.
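Some of these constraints can be checked automatically before uploading a file. The following is a minimal sketch using only the Python standard library; `check_dataset` is a hypothetical helper, not part of DyClee. It covers constraints 2 and 4 only, and treating the literal string `NULL` as a missing value is an assumption.

```python
import csv
import io


def check_dataset(csv_text):
    """Return a list of problems found in the CSV text (empty if none)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows or not rows[0]:
        return ["file is empty or has no header"]
    header, data = rows[0], rows[1:]
    problems = []
    # Constraint 2: the first column must be named "time".
    if header[0].strip() != "time":
        problems.append("first column is not named 'time'")
    # Constraint 4: no missing or NULL values in any row.
    for line_no, row in enumerate(data, start=2):
        has_gap = len(row) != len(header) or any(
            cell.strip() == "" or cell.strip().upper() == "NULL" for cell in row
        )
        if has_gap:
            problems.append(f"line {line_no}: missing or NULL value")
    return problems
```

An empty list means the checks passed; it does not tell you whether the data is normalized.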
Should my data be normalized before clustering?
At present, DyClee has no built-in preprocessing step that normalizes data for the user, so the user must do this manually.
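One common way to normalize is min-max scaling, which maps each sensor column to the range [0, 1]. The sketch below assumes that this kind of scaling suits your data; this page does not say which normalization DyClee expects. `min_max_normalize` is a hypothetical helper, and it is meant to be given the sensor columns only, not the time column.

```python
def min_max_normalize(columns):
    """Scale each column to [0, 1]; a constant column maps to 0.0."""
    normalized = {}
    for name, values in columns.items():
        lo, hi = min(values), max(values)
        span = hi - lo
        # Guard against division by zero when all values are equal.
        normalized[name] = [(v - lo) / span if span else 0.0 for v in values]
    return normalized


# Example with two made-up sensor columns.
scaled = min_max_normalize({"s1": [10.0, 20.0, 30.0], "s2": [5.0, 5.0]})
```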
How important is the choice of the window size parameter?
While the window size has no effect on the final clusters produced by the algorithm, it matters for the algorithm's dynamic behavior. It splits the clustering process into multiple steps, each of which can be visualized and each of which receives newly arrived data samples. If the window size parameter is not set, it defaults to the number of samples in the dataset, so only one step takes place and the dynamic behavior is lost. For data that arrives quickly, meaning the interval between two samples is relatively short, a smaller window size works better, and vice versa.
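To get a feel for how the window size determines the number of steps, the sketch below splits a list of samples into consecutive windows. It assumes non-overlapping windows with a possibly shorter last window, which may differ from how DyClee handles windows internally; `split_into_windows` is a hypothetical helper.

```python
def split_into_windows(samples, window_size=None):
    """Split samples into consecutive windows, one per clustering step."""
    if not samples:
        return []
    # As described above: no window size means one window with everything.
    if window_size is None:
        window_size = len(samples)
    return [
        samples[start:start + window_size]
        for start in range(0, len(samples), window_size)
    ]


samples = list(range(10))
steps_default = len(split_into_windows(samples))   # one single step
steps_small = len(split_into_windows(samples, 4))  # several steps
```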
Where can I find examples of the DyClee clustering process?
Under the Démos tab, you can find examples of DyClee applied to datasets of different sizes and kinds. There are 5 predefined DyClee configurations, each run on a specific dataset and shown together with the results of that run. You can also alter these configurations as you like (changing one or more parameters) and run the demo again to see whether the final clustering changes.
What preprocessing steps do I need to do before clustering?
Based on the given constraints, the user must execute the following preprocessing steps in order to use DyClee:
- 1. Add a header
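Adding a header to a headerless CSV file can be sketched as follows. `add_header` and the sensor names are hypothetical; the only name taken from this page is `time` for the first column.

```python
import csv
import io


def add_header(csv_text, sensor_names):
    """Prepend a header row: "time" followed by the given sensor names."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header = ["time"] + list(sensor_names)
    # Refuse to write a header that does not match the data width.
    if rows and len(rows[0]) != len(header):
        raise ValueError("number of names does not match number of columns")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()


with_header = add_header("0,0.1,0.2\n1,0.3,0.4\n", ["s1", "s2"])
```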
Do I need to have a header in my .csv file?
However, if you are using DyClee from the command line, the header is not necessary and you can simply give the indices of the columns you would like to use (except the first column, which is always used because it is automatically treated as the time column).
Do I need to have a time column?
Furthermore, the time column must be the first column in the dataset and must be named "time". DyClee automatically takes the first column into account and marks it as the column containing time or timestamps, and the DyClee parser will report an error if the first column is not named "time". When you use the DyClee parser via our interface, the parser prints out all column names so that you can make sure the first column is indeed time. Additionally, when you then choose which columns you wish to process using DyClee, you will not be able to check or uncheck the time column, as it is automatically taken into account.
The time column, as previously stated, can contain time in any format including timestamps, integers (order of arrival) or floating point values (time measured in seconds).