Rationale - The Second Deedle #13

pkese · 2019-01-09T14:51:07Z

Please provide a rationale somewhere in the main README explaining why someone should use, develop, or participate in this project rather than Deedle.

Oceania2018 · 2019-01-09T15:14:45Z

Good question. I know there will be some duplication of work, but there are some differences between the two projects, and users can choose according to their own situation. I just came up with some reasons.
We want Pandas.NET to:

be more like Pandas in everything (99%), including function names and letter case.
be friendly for C#.
have better performance (many people complain about Deedle's performance).
work more closely with the other SciSharp projects.
use dtype instead of a generic design.
target .NET Standard 2.x from the start.

@dotChris90 Do you have more?

@pkese I hope we can find some common ground and cooperate; we are all advocates of .NET.

pkese · 2019-01-09T15:37:01Z

@Oceania2018 Good. I'm all in if you wish to improve upon Deedle.

However, I have found that the number of annoyances in Deedle is approximately the same as in Pandas. Pandas has a somewhat larger surface area, whereas with Deedle you have to deal with types a bit more. Deedle can afford a smaller surface area because in .NET you can simply iterate over the data with a for-loop without a performance penalty, so there really isn't much need for 100% feature compatibility with Pandas (oftentimes Pandas code can be quite hard to read, and a for-loop over the data would be much more legible, albeit slow in Python).

Regarding performance, Deedle is good enough on average, and I'm not sure you can beat it in any substantial way. The main thing is that Deedle is way faster than Pandas (in my experience, what took 4 hours in Pandas took just 5 minutes in Deedle). Adding a few percent on top of that is negligible.

My main worry, however, is that there are 5 or 6 big SciSharp projects (Pandas.NET, TensorFlow, NumSharp, SciSharpLearn, etc.) with something like 3 active developers behind them. That is a rather large surface area for such a small team to cover, and there should be a solid reason for people to switch to or start participating in your project rather than more established projects with larger existing communities and solid documentation. By "reasons", I mean reasons besides not-invented-here or not-Pythonic-enough.

If you can't provide good answers to such questions, you are unlikely to gain much community support, and without a community you will eventually give up, wasting your (and even other people's) time. On the other hand, if you provide excellent answers to those questions, people might prefer to contribute their time and code to your project rather than to Deedle.

Oceania2018 · 2019-01-09T15:51:48Z

Pandas.NET is built on NumSharp just as Pandas is built on NumPy. Where is Deedle's NumPy?

NumSharp supports several providers; the default is implemented in pure C# (the slowest).
We have imported LAPACK and are working on MKL.
We plan to use C++ to optimize the for-loop issue.

Deedle uses object everywhere, so obviously the performance won't be good. Check out NumSharp's Benchmark project. We even gave up the generic design in some places because generics brought a performance penalty.

dotChris90 · 2019-01-09T15:52:12Z

The only thing I can say right now is that NumSharp uses its own NDArray type, not .NET arrays.
Most .NET numeric projects use .NET arrays and do not create their own.
Our NDArray stores its elements in a single 1D array (row-wise or column-wise; both are possible). We did this so that an array can easily be reshaped into a specific form, and because native libraries like LAPACK use 1D arrays rather than matrix or tensor types.
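The flat-storage idea can be sketched in Python. The thread shows no NumSharp code, so this is not NumSharp's implementation or API, and `flat_index` and `FlatArray` are hypothetical names. It shows how an N-d index maps to a position in one 1D buffer in row-major ("C") or column-major ("F", the layout LAPACK expects) order, and why reshaping needs no copying: only the shape changes.

```python
from math import prod


def flat_index(index, shape, order="C"):
    """Map an N-d index to a position in a flat 1-D buffer.

    order="C": row-major, the last axis varies fastest.
    order="F": column-major, the first axis varies fastest.
    """
    if len(index) != len(shape):
        raise ValueError("index and shape must have the same length")
    # Walk the axes from fastest-varying to slowest-varying.
    axes = range(len(shape)) if order == "F" else reversed(range(len(shape)))
    pos, stride = 0, 1
    for axis in axes:
        if not 0 <= index[axis] < shape[axis]:
            raise IndexError("index out of bounds")
        pos += index[axis] * stride
        stride *= shape[axis]
    return pos


class FlatArray:
    """Hypothetical N-d array backed by a single 1-D list (not NumSharp's API)."""

    def __init__(self, data, shape, order="C"):
        if prod(shape) != len(data):
            raise ValueError("shape does not match number of elements")
        self.data, self.shape, self.order = data, tuple(shape), order

    def __getitem__(self, index):
        return self.data[flat_index(index, self.shape, self.order)]

    def reshape(self, *shape):
        # Same buffer, new shape: elements are reinterpreted, not copied or moved.
        return FlatArray(self.data, shape, self.order)
```

For example, `FlatArray(list(range(6)), (2, 3))[1, 2]` reads element 5 of the buffer, and `reshape(3, 2)` returns an object that shares the very same list.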

Deedle uses .NET arrays (correct me if I am wrong).
So people could use Deedle and convert its .NET arrays to NumSharp NDArrays; that is possible.

We could also ask the FSLab community in general whether they are interested in a SciPy-like stack.

Also, when I tried out Deedle, I think it did not work well in PowerShell (another .NET language). But before anybody complains: this could be related to PowerShell's import mechanism; I'm not sure.

dotChris90 · 2019-01-09T15:52:40Z

@Oceania2018 lol, I was about to say the same thing as you.

pkese · 2019-01-09T15:55:04Z

Wonderful. That's exactly the kind of thing you need to expose a bit more and put up front.

Oceania2018 · 2019-01-09T15:55:39Z

@pkese Deedle uses object and generics everywhere, while NumSharp uses dtype. That's the biggest difference: it is more elegant and works exactly like the Python style. I really like NumSharp's dtype design.
Try the unit tests, think about it, and run some benchmarks.
You're welcome to discuss.
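The dtype-versus-generics point can be illustrated with a small Python sketch, assuming "dtype design" means a single non-generic container whose element type is a runtime value (as with NumPy's `dtype`) rather than a compile-time type parameter like Deedle's `Series<K, V>`. `Column`, `_TYPECODES`, and the dtype names are hypothetical; the buffer uses the standard `array` module so elements are stored unboxed.

```python
from array import array

# Hypothetical mapping from NumPy-style dtype names to `array` typecodes.
_TYPECODES = {"int32": "i", "int64": "q", "float32": "f", "float64": "d"}


class Column:
    """One non-generic container whose element type is a runtime dtype value."""

    def __init__(self, values, dtype="float64"):
        if dtype not in _TYPECODES:
            raise TypeError(f"unsupported dtype: {dtype}")
        self.dtype = dtype
        # Elements are stored unboxed in a typed buffer chosen at run time.
        self._buf = array(_TYPECODES[dtype], values)

    def astype(self, dtype):
        # The conversion is picked from a string at run time, not a type parameter.
        conv = float if dtype.startswith("float") else int
        return Column([conv(v) for v in self._buf], dtype)

    def sum(self):
        return sum(self._buf)

    @property
    def itemsize(self):
        return self._buf.itemsize
```

One consequence of this design is that code can choose or change the element type from data (for example, a dtype string read from a file header) without knowing it at compile time; the trade-off is that type errors surface at run time instead of at compile time.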

dotChris90 · 2019-01-09T15:56:25Z

@Oceania2018 yes, yes, but I have to admit @pkese is right: we need to extend the README. Otherwise people will think "yes, this is a second Deedle".

dotChris90 · 2019-01-09T15:57:13Z

or people will think "why did these guys make a second Deedle?"

Oceania2018 · 2019-01-09T15:58:17Z

We pursue a Python-like experience, just as smooth as Python, when you do machine learning in .NET. @pkese That's the other point. @dotChris90 Yes, we should explain more in the README.

tpetricek · 2019-02-09T14:45:23Z

I don't have enough time to join a detailed discussion, but saying that Deedle uses objects everywhere is not right. When you have a column of floating-point values, the data is actually stored as float[]. The public interface hides that somewhat, but when you get a column as type Series<DateTime, float>, you get pretty direct access to the underlying array of floats.
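To illustrate this storage model in Python terms (an analogy, not Deedle's implementation; `FloatSeries` and `values_view` are hypothetical names): a standard-library `array("d")` holds raw C doubles in one contiguous buffer, unlike a list of boxed float objects, and `memoryview` exposes that buffer without copying, much as a typed series can hand out its underlying float array.

```python
from array import array


class FloatSeries:
    """Hypothetical keyed series backed by one contiguous buffer of C doubles."""

    def __init__(self, keys, values):
        if len(keys) != len(values):
            raise ValueError("keys and values must have the same length")
        self._keys = list(keys)
        self._pos = {k: i for i, k in enumerate(self._keys)}  # key -> position
        self._values = array("d", values)  # unboxed 8-byte floats, not objects

    def __getitem__(self, key):
        return self._values[self._pos[key]]

    def values_view(self):
        # Zero-copy access to the underlying buffer, analogous to reaching
        # the float[] behind a typed series.
        return memoryview(self._values)
```

Writing through the view, for example `s.values_view()[0] = 4.0`, changes the value the series returns for its first key, because both refer to the same buffer.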

totalgit74 · 2019-02-10T03:00:34Z

Deedle is OK, but it suffers from poor performance on larger datasets. When I last compared them, I found the Extreme Optimization library to be far more performant (at least an order of magnitude). However, one is free and the other is licensed. Deedle is a small-dataset-only solution and in no way comparable to Pandas or to where Pandas is headed.
With regard to Pandas and copying it in .NET, I would make sure that you are copying where Pandas is headed, not where it has been. Wes McKinney has pointed out some major warts/flaws in Pandas and its Python implementation under the hood here. I would aim for the same end point of Apache Arrow usage; otherwise you'll just be a poor man's Pandas in .NET. Parquet file support would be a requirement. The last thing you want to do is spend a lot of time and effort creating the Pandas of 2017 in .NET in 2019.

NB: When I talk about large datasets, I mean only millions of rows, so not even big data. Deedle is palatable for perhaps thousands or tens of thousands of rows.

lidanger · 2019-04-10T03:10:24Z

Deedle's interfaces are so different from Pandas' that a .NET port of Pandas is absolutely necessary to make use of Python's achievements. After all, IronPython cannot be used as a replacement for Python.

lidanger · 2019-04-29T04:58:43Z

Recently, I found that the next version of the pythonnet project may solve many problems with C#/Python interoperability.

@Oceania2018 @Esther2013

Oceania2018 · 2019-04-29T11:18:03Z

@lidanger have you set it up in pythonnet?

lidanger · 2019-04-30T05:52:43Z

I have used pandas and other Python packages through pythonnet for several months. Version 2.3 is not so good for cross-platform use, but 2.4 has made great progress. Although it has not been released yet, I have been using it successfully these days in my project targeting .NET Core 2.1 and .NET Framework 4.6.1.

@Oceania2018

Oceania2018 mentioned this issue in "Rationale - The Second TensorFlowSharp" (SciSharp/TensorFlow.NET#107, closed) Jan 9, 2019

Oceania2018 changed the title from "Rationale" to "Rationale - The Second Deedle" Jan 9, 2019

Oceania2018 added the "further discuss" label (need further discuss to find the best solution) Apr 29, 2019
